Showing posts with label puzzled. Show all posts
Showing posts with label puzzled. Show all posts

Friday, March 30, 2012

Puzzled by Query Plan

I'm hoping somebody can explain exactly what's going on here - I can't
find it documented anywhere.

Go to the Northwind database, and run the following SQL:

create index IX_UnitPrice on [order details](unitprice)

Now, turn on SHOWPLAN (either graphical or text, it doesn't matter),
and run the following query:

select * from [order details]
where unitprice = 2

Output:

StmtText
|--Index Seek(OBJECT: ([Northwind].[dbo].[Order
Details].[IX_UnitPrice]), SEEK: ([Order
Details].[UnitPrice]=Convert([@.1])) ORDERED FORWARD)

Now, alter the SARG slightly by making it a float:

select unitprice from [order details]
where unitprice = 2.000

Output:

StmtText
|--Nested Loops(Inner Join, OUTER REFERENCES: ([Expr1003], [Expr1004],
[Expr1005]))
|--Compute Scalar(DEFINE: ([Expr1003]=Convert(Convert([@.1]))-1.00,
[Expr1004]=Convert(Convert([@.1]))+1.00, [Expr1005]=If
(Convert(Convert([@.1]))-1.00=NULL) then 0 else 6|If
(Convert(Convert([@.1]))+1.00=NULL) then 0 else 10))
| |--Constant Scan
|--Index Seek(OBJECT: ([Northwind].[dbo].[Order
Details].[IX_UnitPrice]), SEEK: ([Order Details].[UnitPrice] >
[Expr1003] AND [Order Details].[UnitPrice] < [Expr1004]), WHERE:
(Convert([Order Details].[UnitPrice])=Convert([@.1])) ORDERED FORWARD)

Right. I understand that in both cases the SARG datatype is different
from the column datatype (which is money), and that in the first
example the SARG constant gets implicitly converted from int -> money
(following the datatype hierarchy rules), and so the index can still
be used.

In the second example, the datatype hierarchy dictates that money is
lower than float, so the table column gets implicitly converted from
money -> float, which strictly speaking disallows the use of the index
on that column.

What I DON'T understand is what exactly all that gubbins about the
expressions (especially the definition of [Expr1005] is all about; how
does that statement decide whether Expr1005 is going to be NULL, 6, or
10?

I'm soon going to be giving some worked tutorials on index selection
and use of Showplan to our developers, and being a bolshi lot they're
bound to want to know exactly what all that output means. I'd rather
be able to tell them than to say I don't actually know!

How about it someone?

Thanks,

Phil"Philip Yale" wrote:

<snip
> select unitprice from [order details]
> where unitprice = 2.000
> Output:
> StmtText
> |--Nested Loops(Inner Join, OUTER REFERENCES: ([Expr1003], [Expr1004],
> [Expr1005]))
> |--Compute Scalar(DEFINE: ([Expr1003]=Convert(Convert([@.1]))-1.00,
> [Expr1004]=Convert(Convert([@.1]))+1.00, [Expr1005]=If
> (Convert(Convert([@.1]))-1.00=NULL) then 0 else 6|If
> (Convert(Convert([@.1]))+1.00=NULL) then 0 else 10))
> | |--Constant Scan
> |--Index Seek(OBJECT: ([Northwind].[dbo].[Order
> Details].[IX_UnitPrice]), SEEK: ([Order Details].[UnitPrice] >
> [Expr1003] AND [Order Details].[UnitPrice] < [Expr1004]), WHERE:
> (Convert([Order Details].[UnitPrice])=Convert([@.1])) ORDERED FORWARD)
>
> Right. I understand that in both cases the SARG datatype is different
> from the column datatype (which is money), and that in the first
> example the SARG constant gets implicitly converted from int -> money
> (following the datatype hierarchy rules), and so the index can still
> be used.
> In the second example, the datatype hierarchy dictates that money is
> lower than float, so the table column gets implicitly converted from
> money -> float, which strictly speaking disallows the use of the index
> on that column.
> What I DON'T understand is what exactly all that gubbins about the
> expressions (especially the definition of [Expr1005] is all about; how
> does that statement decide whether Expr1005 is going to be NULL, 6, or
> 10?

<snip
Phil,

It appears that SQL Server is converting your float SARG to 2 money scalars
(SARG - 1 and SARG + 1) so that it can perform an index seek with money
types and still handle loss of precision and floating point rounding. The
second part of the seek (listed as the WHERE) then converts the index values
to float: this way it doesn't have to convert the table column until after a
seek has been performed. Pretty smart if you ask me...

[Expr1005] use a bitwise or so that Expr1005 has a unique value for the
various NULL states of the other two calcuated values

6 = 0110 in binary
10 = 1010 in binary

SARG-1 SARG+1 Result
NOT NULL NOT NULL 6 | 10 = 1110
NOT NULL NULL 6 | 0 = 0110
NULL NOT NULL 0 | 10 = 1010
NULL NULL 0 | 0 = 0000

So you have a 4-bit value where you can examine bit 1 to see if you have any
non-null value, bit 2 to check SARG-1 for a non-null value, and bit 3 to
check SARG+1 for a non-null value. Bit 0 tells you nothing: I don't know
why unless it has something to do with the internal representation of null's
or float's or something else.

What I'm not clear on is exactly how this bit mask is used in the nested
loop join unless a row is rejected out of hand for a zero value for the
expression. Perhaps someone else can shed light on this...

Craig|||"Craig Kelly" <cnkelly.nospam@.nospam.net> wrote in message news:<L8ydd.14900$OD2.132@.bgtnsc05-news.ops.worldnet.att.net>...
> "Philip Yale" wrote:
> <snip>
> > select unitprice from [order details]
> > where unitprice = 2.000
> > Output:
> > StmtText
> > |--Nested Loops(Inner Join, OUTER REFERENCES: ([Expr1003], [Expr1004],
> [Expr1005]))
> > |--Compute Scalar(DEFINE: ([Expr1003]=Convert(Convert([@.1]))-1.00,
> > [Expr1004]=Convert(Convert([@.1]))+1.00, [Expr1005]=If
> > (Convert(Convert([@.1]))-1.00=NULL) then 0 else 6|If
> > (Convert(Convert([@.1]))+1.00=NULL) then 0 else 10))
> > | |--Constant Scan
> > |--Index Seek(OBJECT: ([Northwind].[dbo].[Order
> > Details].[IX_UnitPrice]), SEEK: ([Order Details].[UnitPrice] >
> > [Expr1003] AND [Order Details].[UnitPrice] < [Expr1004]), WHERE:
> > (Convert([Order Details].[UnitPrice])=Convert([@.1])) ORDERED FORWARD)
> > Right. I understand that in both cases the SARG datatype is different
> > from the column datatype (which is money), and that in the first
> > example the SARG constant gets implicitly converted from int -> money
> > (following the datatype hierarchy rules), and so the index can still
> > be used.
> > In the second example, the datatype hierarchy dictates that money is
> > lower than float, so the table column gets implicitly converted from
> > money -> float, which strictly speaking disallows the use of the index
> > on that column.
> > What I DON'T understand is what exactly all that gubbins about the
> > expressions (especially the definition of [Expr1005] is all about; how
> > does that statement decide whether Expr1005 is going to be NULL, 6, or
> > 10?
> <snip>
> Phil,
> It appears that SQL Server is converting your float SARG to 2 money scalars
> (SARG - 1 and SARG + 1) so that it can perform an index seek with money
> types and still handle loss of precision and floating point rounding. The
> second part of the seek (listed as the WHERE) then converts the index values
> to float: this way it doesn't have to convert the table column until after a
> seek has been performed. Pretty smart if you ask me...
> [Expr1005] use a bitwise or so that Expr1005 has a unique value for the
> various NULL states of the other two calcuated values
> 6 = 0110 in binary
> 10 = 1010 in binary
> SARG-1 SARG+1 Result
> NOT NULL NOT NULL 6 | 10 = 1110
> NOT NULL NULL 6 | 0 = 0110
> NULL NOT NULL 0 | 10 = 1010
> NULL NULL 0 | 0 = 0000
> So you have a 4-bit value where you can examine bit 1 to see if you have any
> non-null value, bit 2 to check SARG-1 for a non-null value, and bit 3 to
> check SARG+1 for a non-null value. Bit 0 tells you nothing: I don't know
> why unless it has something to do with the internal representation of null's
> or float's or something else.
> What I'm not clear on is exactly how this bit mask is used in the nested
> loop join unless a row is rejected out of hand for a zero value for the
> expression. Perhaps someone else can shed light on this...
> Craig

Craig,

Thanks very much for that - an extremely well-explained and detailed
answer. I'm intrigued to know how you knew all that stuff - or did you
just deduce it? (Don't want a job do you? :-) )

I must confess I never thought that the values might be bitmaps. Like
you say, it's all pretty smart. All we need to know now is just where
Expr1005 is actually used.

Phil|||"Craig Kelly" <cnkelly.nospam@.nospam.net> wrote in message news:<L8ydd.14900$OD2.132@.bgtnsc05-news.ops.worldnet.att.net>...
> "Philip Yale" wrote:
> <snip>
> > select unitprice from [order details]
> > where unitprice = 2.000
> > Output:
> > StmtText
> > |--Nested Loops(Inner Join, OUTER REFERENCES: ([Expr1003], [Expr1004],
> [Expr1005]))
> > |--Compute Scalar(DEFINE: ([Expr1003]=Convert(Convert([@.1]))-1.00,
> > [Expr1004]=Convert(Convert([@.1]))+1.00, [Expr1005]=If
> > (Convert(Convert([@.1]))-1.00=NULL) then 0 else 6|If
> > (Convert(Convert([@.1]))+1.00=NULL) then 0 else 10))
> > | |--Constant Scan
> > |--Index Seek(OBJECT: ([Northwind].[dbo].[Order
> > Details].[IX_UnitPrice]), SEEK: ([Order Details].[UnitPrice] >
> > [Expr1003] AND [Order Details].[UnitPrice] < [Expr1004]), WHERE:
> > (Convert([Order Details].[UnitPrice])=Convert([@.1])) ORDERED FORWARD)
> > Right. I understand that in both cases the SARG datatype is different
> > from the column datatype (which is money), and that in the first
> > example the SARG constant gets implicitly converted from int -> money
> > (following the datatype hierarchy rules), and so the index can still
> > be used.
> > In the second example, the datatype hierarchy dictates that money is
> > lower than float, so the table column gets implicitly converted from
> > money -> float, which strictly speaking disallows the use of the index
> > on that column.
> > What I DON'T understand is what exactly all that gubbins about the
> > expressions (especially the definition of [Expr1005] is all about; how
> > does that statement decide whether Expr1005 is going to be NULL, 6, or
> > 10?
> <snip>
> Phil,
> It appears that SQL Server is converting your float SARG to 2 money scalars
> (SARG - 1 and SARG + 1) so that it can perform an index seek with money
> types and still handle loss of precision and floating point rounding. The
> second part of the seek (listed as the WHERE) then converts the index values
> to float: this way it doesn't have to convert the table column until after a
> seek has been performed. Pretty smart if you ask me...
> [Expr1005] use a bitwise or so that Expr1005 has a unique value for the
> various NULL states of the other two calcuated values
> 6 = 0110 in binary
> 10 = 1010 in binary
> SARG-1 SARG+1 Result
> NOT NULL NOT NULL 6 | 10 = 1110
> NOT NULL NULL 6 | 0 = 0110
> NULL NOT NULL 0 | 10 = 1010
> NULL NULL 0 | 0 = 0000
> So you have a 4-bit value where you can examine bit 1 to see if you have any
> non-null value, bit 2 to check SARG-1 for a non-null value, and bit 3 to
> check SARG+1 for a non-null value. Bit 0 tells you nothing: I don't know
> why unless it has something to do with the internal representation of null's
> or float's or something else.
> What I'm not clear on is exactly how this bit mask is used in the nested
> loop join unless a row is rejected out of hand for a zero value for the
> expression. Perhaps someone else can shed light on this...
> Craig

Craig,

Thanks very much for that - an extremely well-explained and detailed
answer. I'm intrigued to know how you knew all that stuff - or did you
just deduce it? (Don't want a job do you? :-) )

I must confess I never thought that the values might be bitmaps. Like
you say, it's all pretty smart. All we need to know now is just where
Expr1005 is actually used.

Phil|||"Philip Yale" wrote:

> Craig,
> Thanks very much for that - an extremely well-explained and detailed
> answer. I'm intrigued to know how you knew all that stuff - or did you
> just deduce it? (Don't want a job do you? :-) )
> I must confess I never thought that the values might be bitmaps. Like
> you say, it's all pretty smart. All we need to know now is just where
> Expr1005 is actually used.
> Phil

Phil,

Thank you for the multiple (and very flattering) compliments!

I make it a habit to examine query plans and make sure I understand them
when unit testing stored procedures, so they aren't totally unfamiliar to
me; however, I don't consider myself a query plan expert by any means. I
was mostly able to deduce what was going on fairly quickly because that kind
of handling of floating point values is one of two fairly common idioms in
C/C++ (especially since portable floating point handling can be, ahem,
challenging given the differing implementations out there). But I'm still
very curious as to where the bitmap is used...

As far as employment is concerned, I'm fairly happy where I'm at but obscene
piles of money are always a great inducement ;)

Craig

Puzzled by installation error.

I have tried installed several times the SQL Server 2005 Express (Standard or Advanced Services) on my XP SP2 system and all failed.

The error messages are all about the Secure Socket Layer certificate and provider, either cannot find, create valid SSL certificate or the authentication provider is not trust (actually I don't know any of this and where to find them). I am learning at home the SQL Server and not going to do a website. Why I need a SSL? Or if it is really necessary, how do I obtained one?

I installed IIS (supposedly 5.0 since I have XP SP2), is this related? How to install without IIS?

Or do I need a Server OS?

Turn of the SSL on the default website (where Reporting Service will be installed) and you should be able to install the whole thing easily. Afterwards you can turn on SSL again on the default website.

HTH, Jens K. Suessmeyer.


http://www.sqlserver2005.de

|||

How to turn off SSL? In SQL Express setup or IIS setting in MMC?

My IIS setting has a default SSL port 440 but I cannot change it.

Puzzled by installation error.

I have tried installed several times the SQL Server 2005 Express (Standard or Advanced Services) on my XP SP2 system and all failed.

The error messages are all about the Secure Socket Layer certificate and provider, either cannot find, create valid SSL certificate or the authentication provider is not trust (actually I don't know any of this and where to find them). I am learning at home the SQL Server and not going to do a website. Why I need a SSL? Or if it is really necessary, how do I obtained one?

I installed IIS (supposedly 5.0 since I have XP SP2), is this related? How to install without IIS?

Or do I need a Server OS?

Turn of the SSL on the default website (where Reporting Service will be installed) and you should be able to install the whole thing easily. Afterwards you can turn on SSL again on the default website.

HTH, Jens K. Suessmeyer.


http://www.sqlserver2005.de

|||

How to turn off SSL? In SQL Express setup or IIS setting in MMC?

My IIS setting has a default SSL port 440 but I cannot change it.

Puzzled by duplicates

Hello all!

Have a question:

Is there way in SQL to determine duplicate rows without using count(), aggregate functions, group by or select distinct?

I only have regular select, join and delete features.

Basically, what are all the possible ways to determine duplicates in data like this?

col1|col2
----
s1--j1
s1--j4
s1--j1
s1--j3
s1--j2

I greatly appreciate your response!
Thanks!You need to add a third column, a unique ID column. Then you can do this

select t1.* from t1
join t2 on t1.col1=t2.col2 and t1.col2=t2.col2 and t1.id != t2.id

Check out my brand new SQL tutorial at http://www.bitesizeinc.net/index.php/sql.html

-Chris
http://www.bitesizeinc.net/|||Is there way to do it without a unique ID? Not to be persistent, but my original goal was to do it only with those simple SQL statements.

I just wanted to make sure that I tried every way possible. Maybe there is a way to use cartesian product or whichever, but it's gotta be based on these simple statements.

If it can only be done using unique ID or aggregate functions, it's a good answer as well.

Let me know if I sound confusing. Great site btw, very original!.

Thanks for your devotion.|||Thank you so much for visitting my site...

I think that you're out of options here.

The best way is using GROUP BY.

Otherwise, you could select DISTINCT CONCAT(field1,' ',field2)

Or use the unique ID

I can't think of anyway else...

-Chrissql

Puzzled by Config files

I created a package, created an XML Config file, created a deployment utility and then deployed to server. I scheduled it with agent and I get this error (changed the package name in the error to IMPORT to make it more readable):

Description: The configuration file name "C:\Documents and Settings\chris\My Documents\Visual Studio 2005\Projects\IMPORT\IMPORT\IMPORT.dtsConfig" is not valid. Check the configuration file name. End Warning Warning: 2006-07-12 13:11:39.96 Code: 0x80012059 Source: IMPORT Description: Failed to load at least one of the configuration entries for the package. Check configurations entries and previous warnings to see descriptions of which configuration failed. End Warning Progress: 2006-07-12 13:11:40.56 Source: Import Data Validating: 0% complete End Progress Progress: 2006-07-12 13:11:40.56 Source:... Process Exit Code 1. The step failed.

So basically its trying to point to a config file on my dev workstation. When I created the deploy utility it created a config file in the Deployment directory with the Manifest and the package. I installed from the deployment directory. When installing it asked where I wanted to install and I left it the default of "C:\Program Files\Microsoft SQL Server\90\DTS\Packages\IMPORT" and if I look in that folder the IMPORT.dtsconfig is there.

Ideas of where I went wrong? I want to make sure I understand the whole configuration file notion correctly as I am starting to roll a bunch of packages with config files out to production. I've done the tutorial and thought it was straightforward and made sense. I can fix it by editing the xml of the package and removing the line pointing to my dev box - just want to know what I am doing wrong :)

Nevermind... Found lots of other people with similar issues in diff threads on this forum. This whole config file thing is pretty confusing. When you do the install from the deployment folder it will show all the values from the config file in the deployment directory - but it doesn't use any of them when you actually run the package.

I am really not sure why they put the dtsconfig file in the deployment folder at all - all that does is confuse things. I had been placing the production config file over the one in the deployment folder and assumed it would use it... not so.

Puzzled by concurrent update

I have a puzzle in my mind here. I will thank anyone who can solve my puzzle.

I am not familiar with SQL and its theories behind, so please bear with me if I am asking a stupid newbie question.

My puzzle is generally a problem of generating sequence numbers. The following SQL is only a stripped down version - it fetches the max number, add 1 to it and updates the table with the new number.

DECLARE @.max int

SELECT @.max = MAX(next_number) + 1 from sequence_numbers

UPDATE sequence_numbers SET next_number = @.max WHERE next_number = @.max

Now if user1 gets 100 and user2 also gets 100 and they both try to update the table, what would happen? I fear that the result would be 101.

One of my coworker thinks that adding 'WHERE next_number = @.max' can solve the conflict - user2 will fail. His reasoning is like this:

After user1 updates the table, next_number would be 101 and user2's update will fail because his WHERE criteria is still 100.

But I think user2 still sees the old data (100) and still succeeds and thus both users update the table with number 101.

Thanks.

If you want to use a sequence table, you have to lock this table during the update. I wouldn′t use separate statement though the thing that you described first will happen.

You should either do that within a transaction while locking the table or do it within one statement using locking mechanism and queying this with your update statement:

UPDATE sequence_numbers
SET next_number =
( SELECT MAX(next_number) + 1 from sequence_numbers (TABLOCK) )

HTH, Jens Suessmeyer.

http://www.sqlserver2005.de

|||

Hi Neo,

One of the cruical components of DBMS(Database Management System) is the concurrency control (CC) subsytem. It ensures that concurrent execution of transactions against DB doesn't violate the consistency of the database. While the transactions execute concurrently in real, the CC component ensures that the database state is AS IF THEY EXECUTED SEQUENTIALLY.
In two words, the server "locks" rows that some transaction modifies in order to prevent others from reading/writing those rows, until the first transaction commits.
If you need serious understanding about how all this works, open books online, and look for concurrency control. There's a lot of material there, from concurrency control basics up to some advanced issues.
About your co-workers explanation: Imagine that the updates are done not in form "where next_number=@.max" but "where row_id=something". By the model proposed by your friend, this update will succeed, because the first update hasn't modified the row_id. In fact it doesn't-the explanation is wrong.

To Jens: I guess that under default serializability level, that is, read commited, second update just cannot read the modified rows, because they will be locked with EXCLUSIVE lock, which blocks any reading transactions. If the first transaction reads data(read lock), the second one reads(read lock), then both will be denied an exclusive access to the rows needed to update(this is a deadlock).

In either case the execution will be serializable, and there is no need for a TABLOCK, there is no need for explicit locking. Please correct me if I am wrong.

|||

I looked thru SQL BOL on Concurrency Control. But reading those technical explanation does not lead me to solve my puzzle. Can anyone explain more to me or direct to some 'newbie' level articles on CC (if such articles exist)? Thanks.

|||

You need to use a locking hint to single thread access to the table. There is a good article here: http://blogs.msdn.com/sqlcat/archive/2006/04/10/572848.aspx from the Microsoft SQL Server Development Customer Advisory Team.

You have the basics, and the rest is covered in that article.

|||

I read the blog, but it does not answer my problem.

Option 1 is only for 'low volume' - how low? Does 'low 'mean that if it is used in higher volum there will be concurrent issues, such as two users get duplicate seq number?

Option 2 is not very pratical.

|||

The tablock was useless, you are right, I was thinking one step further, updating the sequence table (if that would be the requirement) to ensure that nobody else will query meanwhile the sequence table. I saw that this was not a requirement, changed the query and forgot about the tablock.

To the original poster: if you don′t want to reset (pushing the sequence number one step further) you can use the update with the inline select, that should do the trick.

No deadlock will occur if two are using the same update command, because the only locked table will be the updated one. After this is not released the Select statement won′t take place.


HTH, Jens Suessmeyer.

http://www.sqlserver2005.de

|||

The problem with the first action is that you have to single thread access to the table of sequence numbers in a manner that will be slower than using an identity column. You will not get duplicates, but every user will have to wait.

The biggest problem comes in that when you lock the row to get the max, it has to stay locked until you do the update. But, what if this transaction is part of a greater transaction. Then all processes have to wait until the whole transaction completes because the UPDATE operation causes an exclusive lock that keeps the next user from asking for the max (to avoid dups).

Option 2 beats that by using INSERT operations with identity values that do not block one another and do not require an EXCLUSIVE lock that will block other rows from inserting with the next new value. You can get gaps if transactions are rolled back, but it will work nicely.

The second option can be done with very little work just like the first, though it will look messier.

|||

Thanks for all your replies. They are useful to me.

But how about my original question. Will the two users both succeed in updating (note tha next_number = @.max criteria)?

|||

You can use the TSQL update extension like:

UPDATE sequence_numbers

SET @.max = next_number = next_number + 1

WHERE name = @.blah

The above increments next_number by one and assigns the final result to max in one statement. There is no concurrency issue and the statement itself is serialized due to the exclusive lock on the table or row.

You code doesn't serialize access to getting current maximum number so it will not work.