NPULL

NAME

npull - a fast data record extraction utility (uses indexes)

SYNOPSIS

npull [-abdfnpr] [-g#] [-s|R|i indexno|I indexlist] [-tSEP] [-Bstring] [-Estring] [-N#] datafile [printfields] [selectfields]

or npull -h [-tSEP] [-Bstring] [-Estring] datafile

or npull -c [-dfn(p[r])] [-g#] [-tSEP] [-Bstring] [-Estring] [-N#] controlfile [procname]

or npull ?

DESCRIPTION

Npull (C-1) reads the specified datafile and produces on the standard output one text line for each selected record. Each output line is made up of fields separated by carets (^), followed by the record number in the datafile. A field in a data record is output only if it is listed as one of the printfield parameters in the command line. An output line is generated only if the data values in the record fall within the bounds specified by the selectfield parameters.

A printfield parameter specifies a field name within the datafile. A value is output onto the standard output file for each printfield listed, in the order the printfields are listed.
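As a minimal sketch of how a consuming program might split one output line, the Python below assumes the default caret separator and that the record number is the last item on the line (that is, -r was not given). The function name parse_npull_line and the sample line are hypothetical and not part of npull (C-1).

```python
def parse_npull_line(line, sep="^"):
    """Split one npull output line into (field values, record number).

    Assumes the default layout described above: printfield values
    separated by `sep`, with the record number as the final item.
    A field value that itself contains `sep` is not handled.
    """
    parts = line.rstrip("\n").split(sep)
    *values, recno = parts
    return values, int(recno)


# Hypothetical output line for two printfields from record number 17.
values, recno = parse_npull_line("1000^Cash^17")
print(values, recno)
```

The values come back in the same order the printfields were listed on the command line.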

A selectfield parameter has one of the following forms:

"fieldname>value" greater than

"fieldname>=value" greater than or equal to

"fieldname=value" equal

"fieldname<>value" not equal to

"fieldname<=value" less than or equal to

"fieldname<value" less than

Only records whose values fall within the specified bounds generate output lines. If there are several selectfields, the record is output only if it meets all of the selectfield conditions. A selectfield does not cause an output field to be generated; if the field is to be listed, it must also be given as a printfield. Each selectfield should be enclosed in quotes to prevent the shell from interpreting the > and < symbols. A condition with an '@' appended to it causes npull to ignore the condition if there is no value present.
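The selection rules above can be modelled roughly as follows. This is an illustrative sketch, not npull's implementation: it assumes the '@' marker is the last character of the condition, treats an empty string as "no value present", and compares numerically when both sides parse as numbers and as text otherwise. The names parse_selectfield and record_matches are hypothetical.

```python
import operator
import re

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "<>": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    "=": operator.eq,
}
# Two-character operators are listed before their one-character prefixes
# so that ">=" is not read as ">" followed by "=value".
_PATTERN = re.compile(r"^(\w+)(>=|<=|<>|>|<|=)(.*)$")


def parse_selectfield(text):
    """Return (fieldname, operator, value, optional) for one selectfield."""
    optional = text.endswith("@")  # assumed placement of the '@' marker
    if optional:
        text = text[:-1]
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a selectfield: {text!r}")
    name, op, value = match.groups()
    return name, op, value, optional


def _comparable(a, b):
    """Compare as numbers when both sides parse as numbers, else as text."""
    try:
        return float(a), float(b)
    except ValueError:
        return a, b


def record_matches(record, selectfields):
    """True only if the record satisfies every selectfield (logical AND)."""
    for text in selectfields:
        name, op, value, optional = parse_selectfield(text)
        actual = record.get(name, "")
        if optional and actual == "":
            continue  # an '@' condition is skipped when no value is present
        left, right = _comparable(actual, value)
        if not _OPS[op](left, right):
            return False
    return True
```

For instance, record_matches({"glacct": "150"}, ["glacct>=100", "glacct<200"]) is true because both conditions hold.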

Whenever possible, npull (C-1) will try to use an index to read the data file, based upon the selectfields given. First it will try to use all "=" (equal) conditions on any of the keys in the file. If it finds a suitable key, it can then use a dfindk (C-3) search if it is the primary key, or a dfindm (C-3) search if it is a secondary key. If it cannot find a suitable key, it then tries to use part of a key or conditions that use ">" (greater than) or ">=" (greater than or equal to). If there is a suitable key for these conditions, it will then use a dfindi (C-3) search. If there are no suitable keys after these checks, then it will do a sequential read of the file, using dfind (C-3).
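The order of these checks can be sketched in Python. The text does not define exactly what makes a key "suitable", so the two tests used here are assumptions made for illustration: a key qualifies in the first pass when every one of its fields has an "=" condition, and in the second pass when its leading field has an "=", ">" or ">=" condition. The function choose_search is hypothetical; only the routine names come from the description above.

```python
def choose_search(keys, conditions):
    """Pick a search routine name for the given keys and conditions.

    keys: list of tuples of field names; keys[0] is taken as the primary key.
    conditions: list of (fieldname, operator) pairs from the selectfields.
    Returns (routine_name, index_number), with index_number None for a
    sequential read.
    """
    equal = {name for name, op in conditions if op == "="}
    ranged = {name for name, op in conditions if op in (">", ">=")}

    # Pass 1 (assumed rule): a key whose every field has an "=" condition.
    for number, fields in enumerate(keys):
        if fields and all(field in equal for field in fields):
            return ("dfindk" if number == 0 else "dfindm"), number

    # Pass 2 (assumed rule): a key whose leading field has =, > or >=.
    for number, fields in enumerate(keys):
        if fields and fields[0] in (equal | ranged):
            return "dfindi", number

    # Pass 3: no usable key, so fall back to a sequential read.
    return "dfind", None


# Hypothetical file with a primary key and one secondary key.
keys = [("invoice",), ("customer", "date")]
print(choose_search(keys, [("invoice", "=")]))
print(choose_search(keys, [("amount", "<")]))
```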

OPTIONS

If the -a option is specified, then all of the fields in the data file are pulled, and any printfields on the command line are ignored.

When -Bstring is used, npull (C-1) will prepend 'string' to the beginning of each output record. With -Estring, 'string' is appended to the end of each output record.

If the -b option is used, npull (C-1) reads the file backwards, starting at the end of the file.

If the -c option is used, the path of a controlfile must be specified. This controlfile allows you to pull multiple files simultaneously. This type of pull is used for grace reports in which you want to select records in one file based upon the contents of records in other files. An example would be sorting invoice detail records by the product in the detail and the date in the matching header records. The output from this type of pull could then be piped into wtr (C-1) using the -s option of wtr. The procname of the procedure you wish to use from the controlfile may optionally be specified.

If the -d option is used, DATE and TIME fields are output as decimal numbers equivalent to the internal representation of the field. Normally, DATE and TIME fields are output in their standard formats: dates as MM/DD/YY and times as HH:MM:SS. With -d, a date field is instead output as a long integer giving the number of days from January 1, 1800. This allows programs such as csort (C-1) to sort date fields in date order. Put (C-1) converts such fields back to their proper internal representation automatically.
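A program that reads -d output may want to turn the day counts back into calendar dates. The sketch below assumes that January 1, 1800 itself is day number 0; the text does not say whether counting starts at 0 or 1, so verify against real output before relying on it. The function names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1800, 1, 1)  # assumption: this date corresponds to day number 0


def days_to_date(days):
    """Convert a -d style day count to a calendar date."""
    return EPOCH + timedelta(days=days)


def date_to_days(d):
    """Convert a calendar date to a -d style day count."""
    return (d - EPOCH).days


def format_mmddyy(d):
    """Render a date in the MM/DD/YY form used without -d."""
    return d.strftime("%m/%d/%y")


sample = date(1999, 12, 31)
print(date_to_days(sample), format_mmddyy(sample))
```

Because the day count is a plain integer, sorting on it numerically gives date order, which the two-digit-year MM/DD/YY text form does not.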

If the -f option is specified, then the first line output contains the list of printfield names that are being output. Each printfield name is separated from the next by a caret in the same manner the printfield values are separated.
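With -f, the header line makes it possible to address values by printfield name. The following sketch assumes that the header names only the printfields, while each data line carries one extra trailing item, the record number. That layout is an assumption, as are the function name and the "_recno" key.

```python
def rows_from_npull_f(lines, sep="^"):
    """Turn `npull -f` output lines into a list of dicts keyed by field name.

    Assumption: the first line lists the printfield names, and every data
    line has the same values in the same order plus a trailing record number.
    """
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    names = lines[0].split(sep)
    rows = []
    for ln in lines[1:]:
        parts = ln.split(sep)
        row = dict(zip(names, parts))
        if len(parts) > len(names):
            row["_recno"] = parts[-1]  # hypothetical key for the record number
        rows.append(row)
    return rows


# Hypothetical output for two printfields.
print(rows_from_npull_f(["glacct^descr", "1000^Cash^17"]))
```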

If the -g# option is used, npull (C-1) will print debugging information to standard error. This output can be useful when npull (C-1) does not pull the records you are expecting. This option is most useful when using the -c option. There are 4 levels of debugging output. 1 gives the least amount of information, 4 the most.

If the -i# option is used, npull (C-1) will read the file using the specified index number.

If the -Iindexlist option is used, npull (C-1) will try to find an index in the file which starts with the field names in indexlist. The field names should be separated by commas (for example, -Iorder,sequence).

If the -n option is used, npull (C-1) will print to stderr the number of records pulled.

If the -Nn option is used, npull (C-1) will output up to n records then stop.

If the -p option is used, npull (C-1) will format its output in dprint (C-1) style. That is, it will print "fieldname=fieldvalue", one per line, and print the record number on the first line.

If the -r option is used, the record numbers are stripped from the output.

If the -s option is used, npull (C-1) will always use a sequential read of the file, rather than using one of the indexes. This is useful when you are attempting to repair a damaged file and the keys are corrupted.

If the -R option is used, npull (C-1) will read standard input for a list of record numbers to use when reading the file. Each record number must be at the end of its line and must be preceded by a delimiting caret (^). This is the same format that npull (C-1) uses for its output.
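Since -R accepts the same line format that npull (C-1) produces, an intermediate program can filter one pull's output and hand the surviving lines to a second pull. The sketch below shows such a filter, plus a helper that builds minimal input lines from bare record numbers. That a line consisting only of a caret and a number is acceptable follows from a literal reading of the rule above and is otherwise an assumption. Both function names are hypothetical.

```python
def keep_lines(lines, keep):
    """Yield npull output lines whose record passes the `keep` test.

    Lines are passed through unchanged, so each one still ends with a
    caret followed by its record number.
    """
    for line in lines:
        line = line.rstrip("\n")
        values = line.split("^")
        recno = int(values.pop())  # the last item is the record number
        if keep(values, recno):
            yield line


def lines_from_recnos(recnos):
    """Build minimal -R input lines: a caret followed by the record number."""
    return [f"^{n}" for n in recnos]


# Hypothetical output lines from an earlier pull.
pulled = ["1000^Cash^17", "2000^Sales^42"]
print(list(keep_lines(pulled, lambda values, recno: values[0] >= "2000")))
print(lines_from_recnos([17, 42]))
```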

If the -t option is used, npull (C-1) will use the string following the 't' as the field separator, rather than a caret.

If the -h option is used, npull (C-1) will only output the complete field list for the file. This will give you the same output as if you had specified the -f and -a options, but with no data being output.

The ? option provides additional information on each option.

The following is an example of npull (C-1) extracting all General Ledger Chart of Accounts records whose account is greater than or equal to "100" and less than "200":

npull -af -i0 cbooks~glacct "glacct>=100" "glacct<200"
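When a command like this is assembled by a program, the selectfields still need quoting before they reach a shell. As a sketch, Python's standard shlex module can do that quoting; it follows POSIX shell rules, so it is only an illustration where a different command interpreter is in use. The helper name build_npull_command is hypothetical.

```python
import shlex


def build_npull_command(options, datafile, selectfields):
    """Return a shell-safe npull command line as a single string."""
    args = ["npull", *options, datafile, *selectfields]
    # shlex.join quotes any argument containing shell metacharacters,
    # such as the < and > in selectfields.
    return shlex.join(args)


command = build_npull_command(
    ["-af", "-i0"], "cbooks~glacct", ["glacct>=100", "glacct<200"]
)
print(command)
```

Passing the argument list directly to a process-spawning call without a shell avoids the quoting question altogether.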

CONTROL FILES

Syntax for control files:

%BEGIN [procname] (required)

%FILE filename [selectfields...] (may be repeated)

%SEQUENTIAL filename (optional command)

%BACKWARDS filename (optional command)

%INDEXNO filename indexno (optional command)

%INDEXLIST filename indexlist (optional command)

%PRINT filename[.fieldname] [...] (may be repeated)

%PRINTALL filename [...] (may be repeated)

%END [comment] (required)

Lines that do not begin with one of the commands listed above are ignored. This allows an npull (C-1) control file to be placed inside a grace report file, within a section of comments.

You may put more than one control procedure in a control file. If a procname is given as a parameter on the command line, then that procedure will be used. If no procname is given, the first procedure encountered in the file will be used. An optional comment may be placed after the %END statement. The %FILE statement declares which files are to be opened and read. Files are scanned in the order specified. Selectfields are the same as those described earlier, with the addition that a field in another file may be used in place of a value. To do so, specify the other field in this form:
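The procedure rules in this section (non-command lines are ignored, %BEGIN and %END delimit a procedure, and the first procedure is the default) can be sketched as a small reader. It assumes that commands are written in upper case and are the first word on their line, possibly after leading white space. The function names are hypothetical and the code only illustrates the selection logic.

```python
COMMANDS = {
    "%BEGIN", "%FILE", "%SEQUENTIAL", "%BACKWARDS", "%INDEXNO",
    "%INDEXLIST", "%PRINT", "%PRINTALL", "%END",
}


def read_procedures(text):
    """Return a list of (procname, statements) pairs in file order."""
    procedures = []
    current = None
    for line in text.splitlines():
        words = line.split()
        if not words or words[0] not in COMMANDS:
            continue  # any line that does not start with a command is ignored
        command, args = words[0], words[1:]
        if command == "%BEGIN":
            current = (args[0] if args else None, [])
        elif current is not None:
            if command == "%END":
                procedures.append(current)  # text after %END is a comment
                current = None
            else:
                current[1].append((command, args))
    return procedures


def select_procedure(procedures, procname=None):
    """Return the named procedure, or the first one when no name is given."""
    if procname is None:
        return procedures[0]
    for procedure in procedures:
        if procedure[0] == procname:
            return procedure
    raise KeyError(procname)
```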

~filename.fieldname

%PRINT statements declare which fields are to be sent to standard output. If the fieldname is not given, and only the filename is there, then the current record number of that file will be output. %PRINTALL statements declare files in which all fields are to be output, including the record number. This is the same as using the -a option on a normal pull.

%SEQUENTIAL forces a sequential read of the specified file. %INDEXNO forces a read of the file based on the given index number (primary key is index #0). %INDEXLIST finds an index containing at least the specified fields. Fieldnames should be separated by commas and nothing else. %BACKWARDS forces a backwards read of the specified file. %BACKWARDS may be used with %INDEXNO, %INDEXLIST, and %SEQUENTIAL. %SEQUENTIAL cannot be used with %INDEXNO or %INDEXLIST. Also, %INDEXNO and %INDEXLIST cannot be used together.
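The compatibility rules above amount to allowing at most one of %SEQUENTIAL, %INDEXNO and %INDEXLIST per file, with %BACKWARDS free to accompany any of them. A hypothetical checker, written as a sketch over (command, arguments) pairs and not taken from npull itself, might look like this:

```python
ACCESS_COMMANDS = ("%SEQUENTIAL", "%INDEXNO", "%INDEXLIST")


def check_access_methods(statements):
    """Return a list of error strings for conflicting access commands.

    statements: list of (command, args) pairs, where args[0] is the filename.
    %BACKWARDS is not tracked because it may be combined with any of them.
    """
    chosen = {}  # filename -> set of access commands seen for that file
    for command, args in statements:
        if command in ACCESS_COMMANDS and args:
            chosen.setdefault(args[0], set()).add(command)
    errors = []
    for filename, commands in sorted(chosen.items()):
        if len(commands) > 1:
            names = " and ".join(sorted(commands))
            errors.append(f"{filename}: {names} cannot be combined")
    return errors


statements = [
    ("%FILE", ["product"]),
    ("%SEQUENTIAL", ["product"]),
    ("%INDEXNO", ["product", "0"]),
]
print(check_access_methods(statements))
```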

CONTROL FILE EXAMPLE

Suppose you wish to produce a report of all the products you sold, sorted by the invoice date, the product category, and the product number. You could use the following control file:

%BEGIN all_products_sold_by_category

%FILE invoiced

%SEQUENTIAL invoiced

%FILE product product=~invoiced.product

%INDEXNO product 0

%FILE invoicem invoice=~invoiced.invoice

%INDEXLIST invoicem invoice

%PRINT invoicem.date product.category invoiced.product invoiced

%END

This control file does a sequential read of the invoiced file, uses index number 0 (the primary key) on the product file, and finds and uses an index that starts with the invoice field on the invoicem file. If any of those commands were omitted, npull (C-1) would determine an appropriate index to use on each of the files.

Note: invoiced is the last item on the %PRINT line so that the invoiced record numbers are placed at the end of each line for wtr to use. The shell command to run this would be the following, assuming that the above control file was placed inside the grace program "report" as a comment and that the grace program was previously compiled into the file "report.rw":

npull -c report | csort -0 -1 -2 | wtr -s invoiced report.rw >c:\LPT1