SHASH

NAME

shash - compute statistics about a hashed RMSfile

SYNOPSIS

shash lfile

DESCRIPTION

Shash writes statistics about the named RMSfile lfile to standard output. Lfile must be either the logical name or the pathname of a hashed RMSfile.

Shash reads each record in the RMSfile by key value and counts the number of accesses needed to read it. The count includes one read for the record itself plus one read for each collision. The following is sample output from shash:

    211 records in file
    49 records in use
    23.2227% loading factor
    1 minimum accesses per record
    2 maximum accesses per record
    1 median accesses per record
    1.06122 average accesses per record
    0 minimum access time
    0 maximum access time
    0 median access time
    0 average access time

    acc   #nrec   %rec   # >=   % >=
      1 (    46,   93%,    49,  100% ) *********************************
      2 (     3,    6%,     3,    6% ) **

Records in file is the maximum number of records that can be in the RMSfile. It is the number of records specified when the RMSfile was created or expanded. This number should normally be a prime number.

Records in use is the number of active records in the RMSfile. Deleted and never used records are not counted.

Loading factor is the ratio of records in use to records in file, expressed as a percentage. The performance of a hashed file deteriorates as the loading factor rises. You can lower the loading factor by expanding the maximum number of records in the RMSfile.
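As a rough illustration of the arithmetic, the sketch below recomputes the loading factor from the sample output and estimates the effect of expanding the file. The function names and the policy of picking the next prime at or above double the current size are assumptions made for this example; they are not part of shash or C/Base.

```python
def loading_factor(in_use, in_file):
    """Records in use as a percentage of records in file."""
    return 100.0 * in_use / in_file


def is_prime(n):
    """Trial division; adequate for the small sizes used here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime >= n (hypothetical helper for choosing a new size)."""
    while not is_prime(n):
        n += 1
    return n


# Values from the sample output: 49 records in use, 211 records in file.
print(format(loading_factor(49, 211), "g") + "%")  # 23.2227%

# Assumed expansion policy: roughly double the file, rounded up to a prime.
bigger = next_prime(2 * 211)
print(bigger, format(loading_factor(49, bigger), "g") + "%")
```

With the number of records in use held constant, roughly doubling the records in file roughly halves the loading factor.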

Minimum accesses per record is the smallest number of accesses to find a record. This number is normally one.

Maximum accesses per record is the largest number of accesses to find a record.

Median accesses per record is the 'midway' point between the minimum number of accesses and the maximum number of accesses.

Average accesses per record is the average number of accesses to find a record.
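The four access figures can be reproduced from a list holding one access count per active record. The sketch below is illustrative only: the list is reconstructed from the sample histogram (46 records found in one access, 3 found in two), the function name is hypothetical, and the median is assumed to be the ordinary statistical (low) median. The description above could also be read as a midpoint between the minimum and maximum; for the sample data both readings give 1 once truncated to a whole number.

```python
import statistics


def access_summary(accesses):
    """Summarize a list holding one access count per active record."""
    return {
        "minimum": min(accesses),
        "maximum": max(accesses),
        # Assumption: statistical low median, not a min/max midpoint.
        "median": statistics.median_low(accesses),
        "average": sum(accesses) / len(accesses),
    }


# Access counts matching the sample output: 46 ones and 3 twos.
sample = [1] * 46 + [2] * 3
summary = access_summary(sample)
for name in ("minimum", "maximum", "median", "average"):
    print(format(summary[name], "g"), name, "accesses per record")
```

The average works out to (46 * 1 + 3 * 2) / 49 = 52 / 49, which prints as 1.06122 at six significant digits.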

Minimum access time is the shortest access time to find a record. This has a resolution of 1 second.

Maximum access time is the longest access time to find a record. This has a resolution of 1 second.

Median access time is the 'midway' point between the shortest and the longest access time to find a record. This has a resolution of 1 second.

Average access time is the average access time to find a record. This has a resolution of 1 second.
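The all-zero access times in the sample output are what a one-second clock would be expected to produce for fast reads. The sketch below shows the effect under the assumption that each read is timed by subtracting two whole-second clock readings; how shash actually measures time is not documented here, and the function and the fake clock are hypothetical.

```python
import time


def whole_second_elapsed(operation, clock=time.time):
    """Time one operation using a clock truncated to whole seconds."""
    start = int(clock())
    operation()
    return int(clock()) - start


def fake_clock(*readings):
    """Return a clock function that yields the given readings in order."""
    it = iter(readings)
    return lambda: next(it)


# A 0.7 second operation that starts and ends within the same second.
print(whole_second_elapsed(lambda: None, fake_clock(10.2, 10.9)))  # 0

# A 0.2 second operation that happens to cross a second boundary.
print(whole_second_elapsed(lambda: None, fake_clock(10.9, 11.1)))  # 1
```

With this kind of measurement a longer operation can report a smaller time than a shorter one, so the figures are only a coarse guide.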

The remaining output from shash is a histogram showing how many records need a given number of accesses to find them. The columns are:

    acc     the number of record accesses
    #nrec   the number of records that need that number of accesses
    %rec    #nrec as a percentage of the number of active records
    # >=    the number of records that need this or a greater number of accesses
    % >=    # >= as a percentage of the number of active records
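The histogram columns can be derived from the same per-record access counts. The following is a minimal sketch, not shash's actual code: the function name is hypothetical, percentages are assumed to be truncated to whole numbers (inferred from the sample, where 46 of 49 records prints as 93%), rows are produced only for access counts that actually occur, and the bar of asterisks is omitted because its scale is not documented.

```python
from collections import Counter


def access_histogram(accesses):
    """Return rows of (acc, nrec, pct_rec, n_at_least, pct_at_least)."""
    counts = Counter(accesses)
    active = len(accesses)
    rows = []
    at_least = active  # every record needs at least the smallest count
    for acc in sorted(counts):
        nrec = counts[acc]
        rows.append((
            acc,
            nrec,
            100 * nrec // active,      # assumed: truncated percentage
            at_least,
            100 * at_least // active,  # assumed: truncated percentage
        ))
        at_least -= nrec
    return rows


# Access counts matching the sample output: 46 ones and 3 twos.
sample = [1] * 46 + [2] * 3
print("acc #nrec %rec # >= % >=")
for acc, nrec, pct, ge, ge_pct in access_histogram(sample):
    print(f"{acc} ( {nrec}, {pct}%, {ge}, {ge_pct}% )")
```

For the sample data this yields the rows 1 ( 46, 93%, 49, 100% ) and 2 ( 3, 6%, 3, 6% ), matching the figures in the sample output.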

NOTES

The access timing information is very approximate; the resolution is one second, and it does not take system load into account.

This program is available only with the C/Base Utilities software package.