Wednesday, March 21, 2012

purge process from large table

Hi,
We have specific request here: there is a large table, around 5,000,000 rows
(3GB). It has unique clustered index, created on 6 of it's 10 columns. We
have to delete around 15% of rows every day, but in a way that table keeps
being available all the time. Deletion criteria is date (where
column_date<=getdate()). There is nonclustered index created on column_date
table. Since there is availability criteria, simple: delete <table name>
where column_date<=getdate() is out of the question because of exclusive
table lock on this table.
Any ideas?
Thanks,
PedjaBy "available" do you mean readable via a SELECT?
If you are using SQL 2005, you can take advantage of the new Read Committed
Snapshot Isolation level, which allows a SELECT to read the most
recently-committed version of a set of data, even while that data is being
modified.
If you are using SQL 2000, your readers can use WITH (NOLOCK) as an option
to the SELECT statements, putting the transaction into read uncommited. Your
readers won't have to wait for writers, but you may get inaccurate data.
Is it possible that you can split this delete operation so that it occurs
several times a day? That way it can delete fewer rows, resulting in less
blocking time.
"Pedja" wrote:

> Hi,
> We have specific request here: there is a large table, around 5,000,000 ro
ws
> (3GB). It has unique clustered index, created on 6 of it's 10 columns. We
> have to delete around 15% of rows every day, but in a way that table keeps
> being available all the time. Deletion criteria is date (where
> column_date<=getdate()). There is nonclustered index created on column_dat
e
> table. Since there is availability criteria, simple: delete <table name>
> where column_date<=getdate() is out of the question because of exclusive
> table lock on this table.
> Any ideas?
> Thanks,
> Pedja|||Mark,
We use sql server 2000. By available, I mean both, read/write operations.
NOLOCK hint won't help, because once it is grabbed by purge process, table i
s
being locked until it is completed (that is why I posted this question
initially), so neither reads nor writes are allowed during this time. My ide
a
was to split deletion process to batches of 1000 rows (set rowcount 1000),
but I wanted to hear some other ideas too.
Thanks
"Mark Williams" wrote:
> By "available" do you mean readable via a SELECT?
> If you are using SQL 2005, you can take advantage of the new Read Committe
d
> Snapshot Isolation level, which allows a SELECT to read the most
> recently-committed version of a set of data, even while that data is being
> modified.
> If you are using SQL 2000, your readers can use WITH (NOLOCK) as an option
> to the SELECT statements, putting the transaction into read uncommited. Yo
ur
> readers won't have to wait for writers, but you may get inaccurate data.
> Is it possible that you can split this delete operation so that it occurs
> several times a day? That way it can delete fewer rows, resulting in less
> blocking time.
> --
> "Pedja" wrote:
>|||Pedja,
you could split up your data into several tables, one table per day.
You can access them via a UNION ALL view. Then the purge is very fast,
you just re-create the view, which is a snap, and drop the oldest
table. There are some divantages: some of queries against the view
will work slower, and you will not be able to enforse unique (and
sometimes other) constraints just as easily|||Hi
Divide your deletion into small batches
SET ROWCOUNT 1000
WHILE 1 = 1
BEGIN
--Here your DML Statement
IF @.@.ROWCOUNT = 0
BEGIN
BREAK
END
END
SET ROWCOUNT 0
"Pedja" <Pedja@.discussions.microsoft.com> wrote in message
news:38445382-60B6-4327-A3FE-C5AD8BF88E41@.microsoft.com...
> Hi,
> We have specific request here: there is a large table, around 5,000,000
> rows
> (3GB). It has unique clustered index, created on 6 of it's 10 columns. We
> have to delete around 15% of rows every day, but in a way that table keeps
> being available all the time. Deletion criteria is date (where
> column_date<=getdate()). There is nonclustered index created on
> column_date
> table. Since there is availability criteria, simple: delete <table name>
> where column_date<=getdate() is out of the question because of exclusive
> table lock on this table.
> Any ideas?
> Thanks,
> Pedjasql

No comments:

Post a Comment