4.6.1.4 Duplicate Key Handling

More than one record with the same key value may be stored. Therefore, there must be a way to search the file to find the desired record. This is called Duplicate Key Handling. The choices for the various methods of accessing files are:

File Access Method Duplicate Key Handling

indexed none

hashed linear | quadratic

sequential linear

Only hashed files allow a choice in selecting Duplicate Key Handling. The following paragraphs outline the difference between the two.

In hashed RMSfiles, the hash function calculates a record number based on the primary key value and the maximum number of records in the RMSfile. The hash function transforms the (potentially very large) range of primary key values into a number between 1 and the maximum number of records in the RMSfile. Since the calculation depends on the maximum number of records in the RMSfile, you must specify the maximum number of records that the RMSfile can contain when you create it.

When a record is added to a hashed file, RMS attempts to place the record at the record number calculated by the hash function. If a record already exists at that location, then a collision has occurred and the record must be placed elsewhere.

RMS provides two methods for handling collisions. In the first method, linear, RMS inspects the next adjacent record slot to see if a record is stored there. If there is no record there, the new record is placed in that location. If the adjacent slot also contains a record, the slot immediately after that is inspected. This process continues until either an empty slot is found, or half the RMSfile has been searched. In the latter case, a file full error message is returned. If the end of the RMSfile is reached during the search, the search is started again from the beginning of the file.

As can be seen, it is possible to create extremely long sets of adjacent records with the linear method of Duplicate Key Handling. This can slow record access down significantly, since it might be necessary to read each record in the chain to find the desired one. So an alternate method of Duplicate Key Handling is provided, the quadratic method.

The quadratic duplicate key handling changes the method for finding the next record slot. As a point of reference let us call the slot where the record should be placed the home slot. If the home slot is occupied, then the first record slot past the home slot is checked, then the fourth past the home slot, then the ninth past the home slot, and so on. The progression goes:

1, 4, 9, 16, 25, 36, 49, ...

This progression takes the numbers:

1, 2, 3, 4, 5, 6, 7, ...

and squares the number to find the next slot to inspect. This process also halts with the file full error message once half of the records in the file have been inspected. This second method tends to spread the collisions in a hashed file throughout the file, rather than clustering them in one place.

The method of duplicate key handling to choose depends on whether a file will have many duplicate keys or not. If the number of duplicate keys will be few, then the linear method is probably sufficient. If, however, many records may have duplicate key values, then the quadratic method performs better. Once again note that the performance of hashed files is very dependent on the distribution of the primary keys and the size of the file.

The default value for the Duplicate Key Handling field for both hashed and sequential files is linear, whereas the default value for indexed files is none since no Duplicate Key Handling is necessary. This is because each record's location is in an index. If the field is blank when the RETURN key is pressed, the default value is entered. If the field is non-blank when the field is exited, the field value must be an abbreviation of a valid Duplicate Key Handling value. The full Duplicate Key Handling value is displayed after you exit the field.