To Check or Not to Check (Uniqueness), by Kent Milligan (IBM Technology Expert Labs)

In Hamlet's famous "To Be or Not To Be" speech, he wrestles with thoughts about life and death. Obviously, slow application performance is not a life-or-death issue, but it can sometimes feel that way if you're the developer whose program is slowing down a critical business process.
Clients frequently engage our IBM Expert Labs team to analyze applications, identify performance bottlenecks, and find solutions to improve performance. A couple of months ago, I was reviewing the SQL statements embedded within a client's RPG program. The analysis of the SQL identified a coding pattern where the program would run a SELECT statement against a table and almost immediately turn around and run an INSERT against the same table. The code looked something like the following:
Exec SQL SELECT 1 INTO :outhv FROM sometab
           WHERE col1 = :hv1 AND col2 = :hv2 AND col3 = :hv3 AND col4 = :hv4;

If sqlCode = 100;                    // No row found, so the key values are unique
  Exec SQL INSERT INTO sometab
             VALUES(:hv1, :hv2, :hv3, :hv4, …);
Endif;

When I asked the customer about the purpose of this coding pattern, they shared that the columns referenced in the WHERE clause defined a unique key for the table. Thus, the SELECT statement was being run to verify whether the specified key value already existed in the table. If the SELECT statement didn't return a row, the program would know that there was no possibility of a duplicate key error, meaning that the INSERT statement would run successfully.

This explanation led the developer to ask if it would be more efficient to have Db2 just check for the duplicate key value on the INSERT statement. With this approach, the program would run a single SQL statement, as opposed to the coding pattern above, which results in two SQL statements being executed whenever the new values are unique. In general, the fewer calls that you make to Db2 for i (or any database), the faster your application will run.
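
As a rough sketch, the "not to check" variant of the embedded SQL above could be reduced to a single statement, with the program testing afterward for the duplicate key SQLSTATE ('23505', the same condition trapped by the handler shown later). The table and host variable names here simply mirror the earlier example:

Exec SQL INSERT INTO sometab
           VALUES(:hv1, :hv2, :hv3, :hv4, …);

If sqlState = '23505';               // Db2 detected a duplicate key value
  // Handle the duplicate key case here
Endif;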

I put together a small performance test to verify whether less is more when it comes to inserting new rows that may result in a duplicate key error. Essentially, is it faster to check first to avoid the duplicate key exception, or not to check by running the INSERT statement and relying on Db2 to detect duplicate keys?

My performance test used an SQL stored procedure with a loop to insert 1,000 rows; the loop had logic that would cause every other INSERT statement to fail with a duplicate key error. For the "not to check" version of my test, the following condition handler was used to trap the duplicate key error and allow the stored procedure to continue running its loop inserting rows. Even with the condition handler in place, Db2 still writes a duplicate key error message into the job log.
DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'
  SET errFlag = 'Y';
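
For context, here is a condensed sketch of what such a "not to check" test procedure might look like. The procedure name, the table (testtab, assumed to have a unique key on keycol), and the loop details are illustrative assumptions, not the original test code:

CREATE OR REPLACE PROCEDURE insert_nocheck()
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE errFlag CHAR(1) DEFAULT 'N';
  -- Trap duplicate key errors and keep the loop running
  DECLARE CONTINUE HANDLER FOR SQLSTATE '23505'
    SET errFlag = 'Y';

  WHILE i < 1000 DO
    SET errFlag = 'N';
    -- Integer division makes every other value repeat,
    -- forcing a duplicate key error on every second INSERT
    INSERT INTO testtab (keycol) VALUES(i / 2);
    SET i = i + 1;
  END WHILE;
END;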

I ran each version of the stored procedure multiple times to get consistent timings. In the end, the "not to check" version of the test consistently ran 5-6% faster than the "check" version of the procedure, which avoided the duplicate key error by first running a SELECT statement. The performance tests essentially showed that the overhead of running the second SQL statement was greater than the overhead of Db2 signaling an error back to the application.

These test results reinforce the earlier assertion that performance is usually best when your application program runs the fewest number of SQL statements possible. With this coding pattern related to the insertion of unique keys, the answer to the question posed at the start of this entry is: Not To Check!
