[[{“value”:” This entry’s title is derived from the Catch Me If You Can movie – a
good one if you haven’t seen it yet. This movie is based on
the true story of a con man who takes on the identity of a pilot, doctor and
lawyer as he steals millions of dollars by cashing fraudulent checks. You may
be wondering how a story like this relates to Db2 for i since none of the con’s
fake identities were IT professions like programmer or database engineer. However, Db2 for i database tables can take on an identity in the
form of an identity column.
An
identity column allows you to have Db2 automatically generate sequential
numeric values for the column as new rows are inserted into the table. Thus,
identity columns are ideal to use as a surrogate primary key or natural key columns like order number. Here’s a simple
example an employee table using an identity column to generate the employee
number as new employees are inserted into the table.
CREATE TABLE employee (
empno INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 10 INCREMENT BY 10),
name CHAR(30), deptno CHAR(4))
The
prior example didn’t include the CACHE setting for the identity column which
defaults to 20 when the CACHE option is not specified. Here’s the updated
version of our simple example to include the default setting for the CACHE
option.
CREATE TABLE employee (
empno INTEGER GENERATED ALWAYS AS IDENTITY
(START WITH 10 INCREMENT BY 10 CACHE 20),
name CHAR(30), deptno CHAR(4))
The
default cache setting of 20 results in Db2 generating the next 20 values for
that column and storing those generated values in memory. As new rows are
written to this employee table, the generated values for the identity column
are taken from the cached values in memory. This means that after 20 rows are
inserted into the table, Db2 will have to generate and cache another 20 values
in memory.
If the
table with the identity column has a low velocity rate of inserts, then the
default cache of 20 generated values will perform pretty well. However, what if the table has a high insert
velocity? That means that in the case
where the table is the target of a high rate of inserts, Db2 will have to pause
after 20 rows are inserted to generate values for the next 20 rows. If you have
a batch process that inserting hundreds of thousands of rows like one of our
recent IBM Technology Expert Labs clients, pausing after every 20 rows is not
very efficient. The client that our IBM
Technology Expert Labs team assisted was able to speed up the writes of all
those rows to the table by increasing the identity column cache size to 1024.
A larger
cache size can be specified using the CACHE clause when a table is initially
being created or after the fact with an ALTER TABLE statement like the
following:
ALTER TABLE employee
ALTER COLUMN empno
SET GENERATED ALWAYS AS IDENTITY
(START WITH 10 INCREMENT BY 10 CACHE 100)
The
obvious question to ask at this point is what’s the downside of using a larger
cache size. One tradeoff is memory. However, most IBM i servers have plenty of
memory to use these days … so you shouldn’t worry too much about the memory
needed to store 20 generated numeric values versus 100 numeric values. Another
consideration is the potential for cached generated values being lost with a
system crash. System failures are a
pretty rare event in the IBM i world, but if a crash did occur all the cached
generated values for an identity column will be lost.
Let’s
say that this simple employee table was created with an identity column cache
size of 100 and that a system failure occurs after first 50 employees have been
added to the table. After the system
IPL, the next employee number value will be 1010 and not 510. This gap in occurs because the unused
generated values (i.e., 510-1000) cached in memory were lost during the system
crash.
This gap
potential may seem concerning, but don’t forget your identity columns will
often have gaps in their values without a system crash. For example, if a row
is deleted, then that generated identity value is not reused. If an Insert
statement running under commitment control gets rolled back, the generated
identity value for the rolled back Insert is also not reused.
Caching
in the IT world can often provide benefits, so I hope that you’ve learned how caching can be beneficial with Db2 for i identity columns. If you’re interested in learning about other
SQL performance tuning techniques like this one, IBM is hosting a Db2 for i SQL
Performance enablement workshop December 10-12 at its facility in Montpellier,
France that I will be helping lead. Contact me if you’re interested in additional details.”}]] Read More