Friday, March 30, 2012

Puzzled by Query Plan

I'm hoping somebody can explain exactly what's going on here - I can't
find it documented anywhere.

Go to the Northwind database, and run the following SQL:

create index IX_UnitPrice on [order details](unitprice)

Now, turn on SHOWPLAN (either graphical or text, it doesn't matter),
and run the following query:

select * from [order details]
where unitprice = 2

Output:

StmtText
|--Index Seek(OBJECT: ([Northwind].[dbo].[Order
Details].[IX_UnitPrice]), SEEK: ([Order
Details].[UnitPrice]=Convert([@.1])) ORDERED FORWARD)

Now, alter the SARG slightly by making it a float:

select unitprice from [order details]
where unitprice = 2.000

Output:

StmtText
|--Nested Loops(Inner Join, OUTER REFERENCES: ([Expr1003], [Expr1004],
[Expr1005]))
|--Compute Scalar(DEFINE: ([Expr1003]=Convert(Convert([@.1]))-1.00,
[Expr1004]=Convert(Convert([@.1]))+1.00, [Expr1005]=If
(Convert(Convert([@.1]))-1.00=NULL) then 0 else 6|If
(Convert(Convert([@.1]))+1.00=NULL) then 0 else 10))
| |--Constant Scan
|--Index Seek(OBJECT: ([Northwind].[dbo].[Order
Details].[IX_UnitPrice]), SEEK: ([Order Details].[UnitPrice] >
[Expr1003] AND [Order Details].[UnitPrice] < [Expr1004]), WHERE:
(Convert([Order Details].[UnitPrice])=Convert([@.1])) ORDERED FORWARD)

Right. I understand that in both cases the SARG datatype is different
from the column datatype (which is money), and that in the first
example the SARG constant gets implicitly converted from int -> money
(following the datatype hierarchy rules), and so the index can still
be used.

In the second example, the datatype hierarchy dictates that money is
lower than float, so the table column gets implicitly converted from
money -> float, which strictly speaking disallows the use of the index
on that column.

What I DON'T understand is what exactly all that gubbins about the
expressions (especially the definition of [Expr1005] is all about; how
does that statement decide whether Expr1005 is going to be NULL, 6, or
10?

I'm soon going to be giving some worked tutorials on index selection
and use of Showplan to our developers, and being a bolshi lot they're
bound to want to know exactly what all that output means. I'd rather
be able to tell them than to say I don't actually know!

How about it someone?

Thanks,

Phil"Philip Yale" wrote:

<snip
> select unitprice from [order details]
> where unitprice = 2.000
> Output:
> StmtText
> |--Nested Loops(Inner Join, OUTER REFERENCES: ([Expr1003], [Expr1004],
> [Expr1005]))
> |--Compute Scalar(DEFINE: ([Expr1003]=Convert(Convert([@.1]))-1.00,
> [Expr1004]=Convert(Convert([@.1]))+1.00, [Expr1005]=If
> (Convert(Convert([@.1]))-1.00=NULL) then 0 else 6|If
> (Convert(Convert([@.1]))+1.00=NULL) then 0 else 10))
> | |--Constant Scan
> |--Index Seek(OBJECT: ([Northwind].[dbo].[Order
> Details].[IX_UnitPrice]), SEEK: ([Order Details].[UnitPrice] >
> [Expr1003] AND [Order Details].[UnitPrice] < [Expr1004]), WHERE:
> (Convert([Order Details].[UnitPrice])=Convert([@.1])) ORDERED FORWARD)
>
> Right. I understand that in both cases the SARG datatype is different
> from the column datatype (which is money), and that in the first
> example the SARG constant gets implicitly converted from int -> money
> (following the datatype hierarchy rules), and so the index can still
> be used.
> In the second example, the datatype hierarchy dictates that money is
> lower than float, so the table column gets implicitly converted from
> money -> float, which strictly speaking disallows the use of the index
> on that column.
> What I DON'T understand is what exactly all that gubbins about the
> expressions (especially the definition of [Expr1005] is all about; how
> does that statement decide whether Expr1005 is going to be NULL, 6, or
> 10?

<snip
Phil,

It appears that SQL Server is converting your float SARG to 2 money scalars
(SARG - 1 and SARG + 1) so that it can perform an index seek with money
types and still handle loss of precision and floating point rounding. The
second part of the seek (listed as the WHERE) then converts the index values
to float: this way it doesn't have to convert the table column until after a
seek has been performed. Pretty smart if you ask me...

[Expr1005] use a bitwise or so that Expr1005 has a unique value for the
various NULL states of the other two calcuated values

6 = 0110 in binary
10 = 1010 in binary

SARG-1 SARG+1 Result
NOT NULL NOT NULL 6 | 10 = 1110
NOT NULL NULL 6 | 0 = 0110
NULL NOT NULL 0 | 10 = 1010
NULL NULL 0 | 0 = 0000

So you have a 4-bit value where you can examine bit 1 to see if you have any
non-null value, bit 2 to check SARG-1 for a non-null value, and bit 3 to
check SARG+1 for a non-null value. Bit 0 tells you nothing: I don't know
why unless it has something to do with the internal representation of null's
or float's or something else.

What I'm not clear on is exactly how this bit mask is used in the nested
loop join unless a row is rejected out of hand for a zero value for the
expression. Perhaps someone else can shed light on this...

Craig|||"Craig Kelly" <cnkelly.nospam@.nospam.net> wrote in message news:<L8ydd.14900$OD2.132@.bgtnsc05-news.ops.worldnet.att.net>...
> "Philip Yale" wrote:
> <snip>
> > select unitprice from [order details]
> > where unitprice = 2.000
> > Output:
> > StmtText
> > |--Nested Loops(Inner Join, OUTER REFERENCES: ([Expr1003], [Expr1004],
> [Expr1005]))
> > |--Compute Scalar(DEFINE: ([Expr1003]=Convert(Convert([@.1]))-1.00,
> > [Expr1004]=Convert(Convert([@.1]))+1.00, [Expr1005]=If
> > (Convert(Convert([@.1]))-1.00=NULL) then 0 else 6|If
> > (Convert(Convert([@.1]))+1.00=NULL) then 0 else 10))
> > | |--Constant Scan
> > |--Index Seek(OBJECT: ([Northwind].[dbo].[Order
> > Details].[IX_UnitPrice]), SEEK: ([Order Details].[UnitPrice] >
> > [Expr1003] AND [Order Details].[UnitPrice] < [Expr1004]), WHERE:
> > (Convert([Order Details].[UnitPrice])=Convert([@.1])) ORDERED FORWARD)
> > Right. I understand that in both cases the SARG datatype is different
> > from the column datatype (which is money), and that in the first
> > example the SARG constant gets implicitly converted from int -> money
> > (following the datatype hierarchy rules), and so the index can still
> > be used.
> > In the second example, the datatype hierarchy dictates that money is
> > lower than float, so the table column gets implicitly converted from
> > money -> float, which strictly speaking disallows the use of the index
> > on that column.
> > What I DON'T understand is what exactly all that gubbins about the
> > expressions (especially the definition of [Expr1005] is all about; how
> > does that statement decide whether Expr1005 is going to be NULL, 6, or
> > 10?
> <snip>
> Phil,
> It appears that SQL Server is converting your float SARG to 2 money scalars
> (SARG - 1 and SARG + 1) so that it can perform an index seek with money
> types and still handle loss of precision and floating point rounding. The
> second part of the seek (listed as the WHERE) then converts the index values
> to float: this way it doesn't have to convert the table column until after a
> seek has been performed. Pretty smart if you ask me...
> [Expr1005] use a bitwise or so that Expr1005 has a unique value for the
> various NULL states of the other two calcuated values
> 6 = 0110 in binary
> 10 = 1010 in binary
> SARG-1 SARG+1 Result
> NOT NULL NOT NULL 6 | 10 = 1110
> NOT NULL NULL 6 | 0 = 0110
> NULL NOT NULL 0 | 10 = 1010
> NULL NULL 0 | 0 = 0000
> So you have a 4-bit value where you can examine bit 1 to see if you have any
> non-null value, bit 2 to check SARG-1 for a non-null value, and bit 3 to
> check SARG+1 for a non-null value. Bit 0 tells you nothing: I don't know
> why unless it has something to do with the internal representation of null's
> or float's or something else.
> What I'm not clear on is exactly how this bit mask is used in the nested
> loop join unless a row is rejected out of hand for a zero value for the
> expression. Perhaps someone else can shed light on this...
> Craig

Craig,

Thanks very much for that - an extremely well-explained and detailed
answer. I'm intrigued to know how you knew all that stuff - or did you
just deduce it? (Don't want a job do you? :-) )

I must confess I never thought that the values might be bitmaps. Like
you say, it's all pretty smart. All we need to know now is just where
Expr1005 is actually used.

Phil|||"Craig Kelly" <cnkelly.nospam@.nospam.net> wrote in message news:<L8ydd.14900$OD2.132@.bgtnsc05-news.ops.worldnet.att.net>...
> "Philip Yale" wrote:
> <snip>
> > select unitprice from [order details]
> > where unitprice = 2.000
> > Output:
> > StmtText
> > |--Nested Loops(Inner Join, OUTER REFERENCES: ([Expr1003], [Expr1004],
> [Expr1005]))
> > |--Compute Scalar(DEFINE: ([Expr1003]=Convert(Convert([@.1]))-1.00,
> > [Expr1004]=Convert(Convert([@.1]))+1.00, [Expr1005]=If
> > (Convert(Convert([@.1]))-1.00=NULL) then 0 else 6|If
> > (Convert(Convert([@.1]))+1.00=NULL) then 0 else 10))
> > | |--Constant Scan
> > |--Index Seek(OBJECT: ([Northwind].[dbo].[Order
> > Details].[IX_UnitPrice]), SEEK: ([Order Details].[UnitPrice] >
> > [Expr1003] AND [Order Details].[UnitPrice] < [Expr1004]), WHERE:
> > (Convert([Order Details].[UnitPrice])=Convert([@.1])) ORDERED FORWARD)
> > Right. I understand that in both cases the SARG datatype is different
> > from the column datatype (which is money), and that in the first
> > example the SARG constant gets implicitly converted from int -> money
> > (following the datatype hierarchy rules), and so the index can still
> > be used.
> > In the second example, the datatype hierarchy dictates that money is
> > lower than float, so the table column gets implicitly converted from
> > money -> float, which strictly speaking disallows the use of the index
> > on that column.
> > What I DON'T understand is what exactly all that gubbins about the
> > expressions (especially the definition of [Expr1005] is all about; how
> > does that statement decide whether Expr1005 is going to be NULL, 6, or
> > 10?
> <snip>
> Phil,
> It appears that SQL Server is converting your float SARG to 2 money scalars
> (SARG - 1 and SARG + 1) so that it can perform an index seek with money
> types and still handle loss of precision and floating point rounding. The
> second part of the seek (listed as the WHERE) then converts the index values
> to float: this way it doesn't have to convert the table column until after a
> seek has been performed. Pretty smart if you ask me...
> [Expr1005] use a bitwise or so that Expr1005 has a unique value for the
> various NULL states of the other two calcuated values
> 6 = 0110 in binary
> 10 = 1010 in binary
> SARG-1 SARG+1 Result
> NOT NULL NOT NULL 6 | 10 = 1110
> NOT NULL NULL 6 | 0 = 0110
> NULL NOT NULL 0 | 10 = 1010
> NULL NULL 0 | 0 = 0000
> So you have a 4-bit value where you can examine bit 1 to see if you have any
> non-null value, bit 2 to check SARG-1 for a non-null value, and bit 3 to
> check SARG+1 for a non-null value. Bit 0 tells you nothing: I don't know
> why unless it has something to do with the internal representation of null's
> or float's or something else.
> What I'm not clear on is exactly how this bit mask is used in the nested
> loop join unless a row is rejected out of hand for a zero value for the
> expression. Perhaps someone else can shed light on this...
> Craig

Craig,

Thanks very much for that - an extremely well-explained and detailed
answer. I'm intrigued to know how you knew all that stuff - or did you
just deduce it? (Don't want a job do you? :-) )

I must confess I never thought that the values might be bitmaps. Like
you say, it's all pretty smart. All we need to know now is just where
Expr1005 is actually used.

Phil|||"Philip Yale" wrote:

> Craig,
> Thanks very much for that - an extremely well-explained and detailed
> answer. I'm intrigued to know how you knew all that stuff - or did you
> just deduce it? (Don't want a job do you? :-) )
> I must confess I never thought that the values might be bitmaps. Like
> you say, it's all pretty smart. All we need to know now is just where
> Expr1005 is actually used.
> Phil

Phil,

Thank you for the multiple (and very flattering) compliments!

I make it a habit to examine query plans and make sure I understand them
when unit testing stored procedures, so they aren't totally unfamiliar to
me; however, I don't consider myself a query plan expert by any means. I
was mostly able to deduce what was going on fairly quickly because that kind
of handling of floating point values is one of two fairly common idioms in
C/C++ (especially since portable floating point handling can be, ahem,
challenging given the differing implementations out there). But I'm still
very curious as to where the bitmap is used...

As far as employment is concerned, I'm fairly happy where I'm at but obscene
piles of money are always a great inducement ;)

Craig

No comments:

Post a Comment