Friday, March 30, 2012

Puzzled by concurrent update

I have a puzzle in my mind here. I will thank anyone who can solve my puzzle.

I am not familiar with SQL and its theories behind, so please bear with me if I am asking a stupid newbie question.

My puzzle is generally a problem of generating sequence numbers. The following SQL is only a stripped down version - it fetches the max number, add 1 to it and updates the table with the new number.

DECLARE @.max int

SELECT @.max = MAX(next_number) + 1 from sequence_numbers

UPDATE sequence_numbers SET next_number = @.max WHERE next_number = @.max

Now if user1 gets 100 and user2 also gets 100 and they both try to update the table, what would happen? I fear that the result would be 101.

One of my coworker thinks that adding 'WHERE next_number = @.max' can solve the conflict - user2 will fail. His reasoning is like this:

After user1 updates the table, next_number would be 101 and user2's update will fail because his WHERE criteria is still 100.

But I think user2 still sees the old data (100) and still succeeds and thus both users update the table with number 101.

Thanks.

If you want to use a sequence table, you have to lock this table during the update. I wouldn′t use separate statement though the thing that you described first will happen.

You should either do that within a transaction while locking the table or do it within one statement using locking mechanism and queying this with your update statement:

UPDATE sequence_numbers
SET next_number =
( SELECT MAX(next_number) + 1 from sequence_numbers (TABLOCK) )

HTH, Jens Suessmeyer.

http://www.sqlserver2005.de

|||

Hi Neo,

One of the cruical components of DBMS(Database Management System) is the concurrency control (CC) subsytem. It ensures that concurrent execution of transactions against DB doesn't violate the consistency of the database. While the transactions execute concurrently in real, the CC component ensures that the database state is AS IF THEY EXECUTED SEQUENTIALLY.
In two words, the server "locks" rows that some transaction modifies in order to prevent others from reading/writing those rows, until the first transaction commits.
If you need serious understanding about how all this works, open books online, and look for concurrency control. There's a lot of material there, from concurrency control basics up to some advanced issues.
About your co-workers explanation: Imagine that the updates are done not in form "where next_number=@.max" but "where row_id=something". By the model proposed by your friend, this update will succeed, because the first update hasn't modified the row_id. In fact it doesn't-the explanation is wrong.

To Jens: I guess that under default serializability level, that is, read commited, second update just cannot read the modified rows, because they will be locked with EXCLUSIVE lock, which blocks any reading transactions. If the first transaction reads data(read lock), the second one reads(read lock), then both will be denied an exclusive access to the rows needed to update(this is a deadlock).

In either case the execution will be serializable, and there is no need for a TABLOCK, there is no need for explicit locking. Please correct me if I am wrong.

|||

I looked thru SQL BOL on Concurrency Control. But reading those technical explanation does not lead me to solve my puzzle. Can anyone explain more to me or direct to some 'newbie' level articles on CC (if such articles exist)? Thanks.

|||

You need to use a locking hint to single thread access to the table. There is a good article here: http://blogs.msdn.com/sqlcat/archive/2006/04/10/572848.aspx from the Microsoft SQL Server Development Customer Advisory Team.

You have the basics, and the rest is covered in that article.

|||

I read the blog, but it does not answer my problem.

Option 1 is only for 'low volume' - how low? Does 'low 'mean that if it is used in higher volum there will be concurrent issues, such as two users get duplicate seq number?

Option 2 is not very pratical.

|||

The tablock was useless, you are right, I was thinking one step further, updating the sequence table (if that would be the requirement) to ensure that nobody else will query meanwhile the sequence table. I saw that this was not a requirement, changed the query and forgot about the tablock.

To the original poster: if you don′t want to reset (pushing the sequence number one step further) you can use the update with the inline select, that should do the trick.

No deadlock will occur if two are using the same update command, because the only locked table will be the updated one. After this is not released the Select statement won′t take place.


HTH, Jens Suessmeyer.

http://www.sqlserver2005.de

|||

The problem with the first action is that you have to single thread access to the table of sequence numbers in a manner that will be slower than using an identity column. You will not get duplicates, but every user will have to wait.

The biggest problem comes in that when you lock the row to get the max, it has to stay locked until you do the update. But, what if this transaction is part of a greater transaction. Then all processes have to wait until the whole transaction completes because the UPDATE operation causes an exclusive lock that keeps the next user from asking for the max (to avoid dups).

Option 2 beats that by using INSERT operations with identity values that do not block one another and do not require an EXCLUSIVE lock that will block other rows from inserting with the next new value. You can get gaps if transactions are rolled back, but it will work nicely.

The second option can be done with very little work just like the first, though it will look messier.

|||

Thanks for all your replies. They are useful to me.

But how about my original question. Will the two users both succeed in updating (note tha next_number = @.max criteria)?

|||

You can use the TSQL update extension like:

UPDATE sequence_numbers

SET @.max = next_number = next_number + 1

WHERE name = @.blah

The above increments next_number by one and assigns the final result to max in one statement. There is no concurrency issue and the statement itself is serialized due to the exclusive lock on the table or row.

You code doesn't serialize access to getting current maximum number so it will not work.

No comments:

Post a Comment