Wednesday, October 24, 2012

Using Autoincrementing Columns from a PowerBuilder DataWindow

Note:  This blog entry is basically the written version of a video I did for the SAP Database and Technology Academy.  You can either watch the video or read the blog entry, or both if you're so inclined.






Why use Autoincrementing Columns?

Before we get into how, first we need to look at why we want to use autoincrementing columns.  So let’s step back and consider how we want to create primary keys for the entities in our databases.  There are basically two approaches, using “natural” (intelligent) keys or “surrogate” (non-intelligent) key.  Natural (or intelligent) keys mean we pick some attribute of the entity itself that we believe provides a unique value we can use to reference that entity. We then use that key in foreign key references into other tables.

For example, let’s say we have a simple training database where we want to track employees, the classes offered, and if employees have taken the classes.  The entity-relationship diagram is shown below (see Figure 1).  If we’re using natural keys, we might use the employee’s
image001.png
social security number to identify the employee and the course number for the course to identify a course.  The courses an employee has taken are handled by the many-to-many associative table “training records” where the keys from the two entities (SSN from Employee and Course Number from Courses) are copied into the association records.  The Courses entity also has a “taught by” attribute that in addition uses the employee SSN to reference back to the Employee table to indicate who is teaching the course.

So far so good.  But then we run into some problems:

  1. Data entry error:  A year ago somebody entered the SSN for one of the employee incorrectly, and employee that not only takes courses but teaches some.  We just noticed the problem.  We need to change the SSN for the employee in the Employee table, but now there a numerous records in both the Courses and Training Records tables that have to be updated as well, and the foreign key constraints are making the update difficult.
  2. Changing requirements: The company decides that contractors need to take training classes as well, but they identify contractors through a contractor code rather than a SSN.  Suddenly the SSN field is no longer actually the SSN.
  3. Regulatory mandates:  The company determines that in order to meet privacy regulations all personally identifiable information (PII) must be removed from all but a handful of highly controlled HR systems.  The training database didn't make the list, and you now have to remove all records of SSNs from the system.

Sound far-fetched? It’s actually loosely based on a system at one of the companies I was at in my contracting days.

The way to prevent these problems before they start is to use surrogate key, arbitrary but unique values ( e.g., a sequential number, random numbers guaranteed to be unique, a GUID ).  Those are added to the entities as the primary key and propagated to other tables to represent the relationships (see Figure 2).
image003.png
You don’t show the values to the end users and apply no significance to them other than identifying the entities and the relationships between them.  (The minute you assign some other significance to them, you've turned them back into natural keys).  You can apply alternate keys or unique indexes on the natural keys that you still want to ensure are unique within an entity (e.g., SSN for the employee records). But you’ll find that updates to those data values, changes to what they signify or their removal are now much easier.

Different database vendors have different means of generating values that we can use for surrogate keys.  Oracle has objects called sequences that are separate from tables and generate sequential values.  What SAP Sybase Adaptive Server Enterprise (ASE), SAP Sybase SQL Anywhere (SA) and Microsoft SQL Server (MSS) have are autoincrementing columns.  (Microsoft SQL Server also provides functions to generate GUIDs as unique identifiers, but that’s beyond the scope of this article).

Creating them in the database

Creating an autoincrementing column in the database is fairly straightforward.  For ASE and SA, you simply define the column as a numeric data type (e.g., integer) and then for the default value specify ‘autoincrement’ (see Figure 3).

image005.png

For MSS, you handle it all through the data type of the column.  Each of the numeric data types had a subtype with a ‘identity’ suffix. So you could define a column as ‘numeric(5) identity’ (see Figure 4).

image007.pngAt this point, the database will start assigning incrementing numeric values to those columns each time a row in inserted into the table.  You don’t assign a value when you do the insert, you determine what value was assigned afterward and update your client with that.

Referencing them from the DataWindow


Once we have the column in the table it’s fairly easy to reference it from the DataWindow as well. The main thing we need to do
image009.png
is look at the Update Specifications for the DataWindow and ensure that the autoincrementing key is indicated as the “Identity Column” attribute in the update specifications (see Figure 5).  That tells PowerBuilder that it needs to get the value of that column for inserted rows and update the DataWindow buffer with the values.

How the DataWindow gets the value

OK, this is where things get a bit trickier.  So far it looks like magic.  You create a column in the database that autoincrements, you tell the DataWindow that such a column exists, and you’re done, right?
Not quite.  It’s important to know how PowerBuilder got that value; because it impacts the way you configure and deploy your application.  The secret behind this is a file in the PowerBuilder/Shared directory called PBODB[XXX].INI, where [XXX] is the major version number of PowerBuilder you are working with.  For example, if you are working with PowerBuilder 12.5, then the file is called PBODB125.INI.  Note that this file contains configuration information used to determine the value that was assigned to the new row even if you are not using the ODBC driver (e.g., you are using the OLE DB or SNC driver talk to MSS).  That bring us to the first important note:
If you are referencing autoincrement columns from DataWindows, you must deploy the PBODB[XXX].INI file with your application regardless of which driver you are using to talk to the database.
There is a section at the end of the file starting with the title “DBMS Driver / DBMS Settings” that explains what all the various entries in the file are and how they are used. We’ll walk through how it works for the autoincrementing columns in particular.
The first thing PowerBuilder does when it sees that you've reference an autoincrementing column that it needs to get the value for is looks to see which database you’re working with and then looks up the initial section of the file that references that database.  It also looks for the PBSyntax entry within that section.  For example, for SA, the entries look like this:

[Sybase SQL Anywhere]
PBSyntax='WATCOM50_SYNTAX'

For MSS:

[Microsoft SQL Server]
PBSyntax='MS_SQLSERVER_SYNTAX'

And for ASE I believe it uses this one (quite some time ago ASE was called SQL Server, but was renamed to distinguish it from its Microsoft cousin):

[Sybase SQL Server]
PBSyntax='SYBASE_SYNTAX'

What it then does it looks in the corresponding syntax section for an entry called GetIdentity. Once again, for SA you’ll find the following (there are a lot of other entries there, I've left them out for clarity):

[WATCOM50_SYNTAX]
GetIdentity='Select @@identity'

For MSS it has this. Note the alternate syntax and the comment explaining it.  We’ll get into that in a moment:

[MS_SQLSERVER_SYNTAX]
GetIdentity='Select @@identity'
; GetIdentity='Select max(IDENTCOL) from &TableName'
; Alternative technique if table has insert trigger associated with it.

ASE has much the same:

[SYBASE_SYNTAX]
GetIdentity='Select @@identity'
; GetIdentity='Select max(IDENTCOL) from &TableName'
; Alternative technique if table has insert trigger associated with it.

At this point we know the actual SQL statement that PowerBuilder is going to use to get the value of the identity column.  It queries for the value of a database session global variable called @@identity that holds the last value of an identity column assigned to any table for the current session.

And that’s where the rub comes in.  The main issue (as the comment indicates) is that if you have a trigger on the table, and the trigger inserts into another table that also has an autoincrementing column in it (perhaps a table that audits changes to the main table), the value PowerBuilder get’s back when it queries for @@identity may not be the value you want.

For SA this actually isn’t an issue.  SA treats @@identity as a stack.  So if you have a trigger that contains an autoincrementing column in it, SA will push the original value of @@identity down and put the value resulting from the trigger on the variable.  It will then have that value throughout the life of the trigger.  When the trigger ends, SA pops the new value off and restores the original value to the variable, the one that your insert actually created.

For MSS and ASE, it’s not quite such a happy situation.  Those databases will overwrite the value of @@identity that you generated with the value generated by the trigger.  That’s why there is the alternate method listed in the file.

GetIdentity='Select max(IDENTCOL) from &TableName'

It isn’t an ideal solution though.  Depending on how you have the isolation level set for your connection, doing a select for the max value of a column on a table might result in your blocking other sessions from updating the table or conversely may allow other sessions to update it resulting in your session still getting an incorrect value in response.

MSS has some other options that you might consider as well.  For example, this syntax should return you the last inserted value of the autoincrementing column without necessarily blocking access to the table.

GetIdentity='Select IDENT_CURRENT ( &TableName )’

You only really need to worry about this if you have a table you’re inserting records into with an autoincrementing column that has a trigger that does inserts into another table with an autoincrementing column in it.
Once you have even one table with an autoincrementing column that has a trigger that inserts into another table with an autoincrementing column way, you’ll need to change the PBODB[XXX].INI file to change the method used to retrieve the value for all tables with autoincrementing columns.
Otherwise though, just deploy the PBODB[XXX].INI file unaltered with your application and you’ll be fine.

Other things to watch out for


There are a couple of other things worth mentioning that you need to be aware of when using autoincrementing columns.  Both have to do with the way that the values are generated by the database.  When any session makes an initial insert into a table with an autoincrementing column in it, the database will not only generate the first value, but also a cache of a number of other values to be used for subsequent inserts.  The primary reason is to ensure that the generation of autoincrementing columns doesn't become a bottleneck when batch inserts are being done.  The number of values that get cached is something that can be configured for the database.

What happens though when the database is shutdown, either intentionally (for example to do full backups of the database files) or unintentionally (a crash of the database for some reason).  When that happens all those cached values are lost, and the next insert into that table will generate the next value after the last cached value, not the last used value, as well as a new set of cached values.  This is referred to as a ‘burn factor’ for the autoincrementing column, because of the values that are being burned (i.e., not used) for the column.
What this means is that, depending on how often the system is restarted, you might find huge gaps in the values being used.
This really shouldn't be an issue, because you shouldn't be using an autoincrementing column to generate something intelligent like a value that you need to have sequential without gaps (e.g., a PO number).  You should be aware of this though, and not surprised when you see it happen.
It also means that when you initially size the column that you will be using to store the values you will need to account for the burn factor.  Let’s say that you estimate that you estimate that your employee table will never, ever, have to contain more than 9,999 entries, because you’re a company with roughly 5,000 employees and you have low turnover.  If you size the autoincrementing column as a number(4), you may find out a few years down the line that you’ve run out of values that will fit in the column.
If you have a very high burn factor (because the database is caching a large number of values and you’re restarting the database on a rather frequent basis) you could end up taking more than 10,000 values to generate surrogate keys for a lot less than 5,000 records.
Just make sure that you allow a bit extra space in the data type for the autoincrementing column.

Summary


What you should realize now is how valuable autoincrementing columns are and how easy it is to configure the PowerBuilder DataWindow to take advantage of them.  There are a few things to watch out for as I’ve pointed out, but otherwise it’s a great feature that you should consider using.

No comments: