npull - a fast data record extraction utility (uses indexes)


npull [-abdfnpr] [-g#] [-s|R|i indexno|I indexlist] [-tSEP] [-Bstring]
[-Estring] [-N#] datafile [printfields] [selectfields]
or npull -h [-tSEP] [-Bstring] [-Estring] datafile
or npull -c [-dfn(p[r])] [-g#] [-tSEP] [-Bstring] [-Estring] [-N#]
controlfile [procname]
or npull ?


npull(C-1) reads the specified datafile and produces on the standard
output one text line for each selected record. Each output line is made
up of fields separated by carets (^), followed by the record number in
the datafile. A field in a data record is output only if it is listed
as one of the printfield parameters in the command line. An output line
is generated only if the data values in the record fall within the bounds
specified by the selectfield parameters.

A printfield parameter specifies a field name within the datafile. A
value is output onto the standard output file for each printfield listed,
in the order the printfields are listed.

A selectfield parameter has one of the following forms:

"fieldname>value" greater than
"fieldname>=value" greater than or equal to
"fieldname=value" equal to
"fieldname<>value" not equal to
"fieldname<=value" less than or equal to
"fieldname<value" less than

Only records whose values fall within the bounds specified will generate
output lines. If there are several selectfields, a record is output only
if it meets all of the selectfield conditions. Selectfields do not cause
an output field to be generated; if the field is to be listed, it must
also be given as a printfield. Each selectfield should be enclosed in
quotes to prevent the shell from interpreting the > and < symbols. A
condition with an '@' appended to it causes npull(C-1) to ignore the
condition if there is no value present.
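The selection rules above can be pictured with the following minimal Python sketch. It is not npull's implementation, and several details are assumptions: the '@' is taken to be the last character of the condition, "no value present" is taken to mean that the record's field is empty or missing, and values are compared numerically when both sides look like numbers and as text otherwise. The names parse_condition, record_selected and Condition are hypothetical.

```python
import re
from dataclasses import dataclass

# Two-character operators are listed before their one-character prefixes so
# that "glacct>=100" is read as ">=" rather than ">".
_CONDITION = re.compile(r"^(\w+)(>=|<=|<>|>|<|=)(.*)$")


@dataclass
class Condition:
    field: str
    op: str
    value: str
    optional: bool  # True when the condition ended with '@' (assumed position)


def parse_condition(text):
    """Split a selectfield such as "glacct>=100" into its parts."""
    m = _CONDITION.match(text)
    if m is None:
        raise ValueError(f"not a selectfield: {text!r}")
    field, op, value = m.groups()
    optional = value.endswith("@")
    if optional:
        value = value[:-1]
    return Condition(field, op, value, optional)


def _compare(a, b):
    """Return -1, 0 or 1; numeric when both sides parse as numbers (assumption)."""
    try:
        x, y = float(a), float(b)
    except ValueError:
        x, y = a, b
    return (x > y) - (x < y)


_TESTS = {
    ">": lambda c: c > 0,
    ">=": lambda c: c >= 0,
    "=": lambda c: c == 0,
    "<>": lambda c: c != 0,
    "<=": lambda c: c <= 0,
    "<": lambda c: c < 0,
}


def record_selected(record, conditions):
    """True if the record (a dict of field values) meets every condition."""
    for cond in conditions:
        actual = record.get(cond.field, "")
        if cond.optional and actual == "":
            continue  # '@': skip this condition when the field has no value
        if not _TESTS[cond.op](_compare(actual, cond.value)):
            return False
    return True
```

With the conditions "glacct>=100" and "glacct<200", a record whose glacct is 150 is selected and one whose glacct is 200 is not, because every condition must hold.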

Whenever possible, npull(C-1) will try to use an index to read the
data file, based upon the selectfields given. First it will try to use
all "=" (equal) conditions on any of the keys in the file. If it finds
a suitable key, it can then use a dfindk(C-3) search if it is the primary
key, or a dfindm(C-3) search if it is a secondary key. If it cannot
find a suitable key, it then tries to use part of a key, or conditions
that use ">" (greater than) or ">=" (greater than or equal to). If there
is a suitable key for these conditions, it will then use a dfindi(C-3)
search. If there are no suitable keys after these checks, then it will
do a sequential read of the file, using dfind(C-3).
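The order of these checks can be modelled roughly as follows. This is a hypothetical Python sketch, not npull's source: keys are represented as tuples of field names with index 0 as the primary key, conditions as (field, operator) pairs, and "part of a key" is assumed to mean that the key's leading field has an "=", ">" or ">=" condition. The function only returns the name of the search routine it would pick.

```python
def choose_access(keys, conditions):
    """Return (search routine, index number) for a set of conditions.

    keys: list of tuples of field names; index 0 is the primary key.
    conditions: list of (field, operator) pairs such as ("glacct", ">=").
    """
    equal = {f for f, op in conditions if op == "="}
    lower = {f for f, op in conditions if op in (">", ">=")}

    # 1. A key whose fields are all fixed by "=" conditions.
    for n, key in enumerate(keys):
        if key and all(f in equal for f in key):
            return ("dfindk" if n == 0 else "dfindm", n)

    # 2. Part of a key (assumed: its leading field) with "=", ">" or ">=",
    #    which gives a starting point for an indexed read.
    for n, key in enumerate(keys):
        if key and (key[0] in equal or key[0] in lower):
            return ("dfindi", n)

    # 3. Nothing usable: fall back to a sequential read.
    return ("dfind", None)
```

For a file keyed on glacct, the conditions "glacct>=100" and "glacct<200" lead to step 2 and an indexed dfindi(C-3) read of index 0, while a condition on an unindexed field leads to a sequential read.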


If the -a option is specified, then all of the fields in the data file
are pulled, and any printfields on the command line are ignored.

When -Bstring is used, npull(C-1) will prepend 'string' to the beginning
of each output record. With -Estring, 'string' is appended to the end of
each output record.

If the -b option is used, npull(C-1) reads the file backwards, starting
at the end of the file.

If the -c option is used, the path of a controlfile must be specified.
This controlfile allows you to pull multiple files simultaneously.
This type of pull is used for grace reports in which you want to select
records in one file based upon the contents of records in other files.
An example would be sorting invoice detail records by the product in the
detail and the date in the matching header records. The output from this
type of pull could then be piped into wtr(C-1) using the -s option of
wtr. The procname of the procedure you wish to use from the controlfile
may optionally be specified.

If the -d option is used, DATE and TIME fields are output as decimal
numbers with a value equivalent to the internal representation of the
field. Normally, DATE and TIME fields are output in their standard
format, i.e., dates are output as MM/DD/YY and times are output as
HH:MM:SS. If the -d option is used, date fields are not output in
MM/DD/YY format, but instead are output in long integer format giving
the number of days from January 1, 1800. This allows programs such as
csort(C-1) to sort date fields in date order. put(C-1) converts such
fields back to their proper internal representation automatically.
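The day count produced by -d can be illustrated with Python's datetime module. This sketch assumes that January 1, 1800 itself is day 0; the text above says only that the value counts days "from January 1, 1800", so real output may be offset by one. The names EPOCH, date_to_days and days_to_date are hypothetical.

```python
from datetime import date, timedelta

# Assumption: January 1, 1800 is day 0.
EPOCH = date(1800, 1, 1)


def date_to_days(d):
    """Days since EPOCH, in the style of npull -d date output."""
    return (d - EPOCH).days


def days_to_date(n):
    """The reverse conversion, from a day count back to a calendar date."""
    return EPOCH + timedelta(days=n)
```

Because the day count grows with the date, a numeric sort on it gives date order. A text sort on MM/DD/YY does not: "01/15/99" sorts before "12/31/98".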

If the -f option is specified, then the first line output contains the
list of printfield names that are being output. Each printfield name
is separated from the next by a caret in the same manner the printfield
values are separated.
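A program reading npull output produced with -f can use the first line as column names. The Python sketch below assumes that the header line holds only the printfield names, that each data line carries one extra trailing field (the record number, i.e. -r was not used), and that no field value contains the separator. The function name read_pull_output and the "_recno" key are hypothetical.

```python
def read_pull_output(lines, sep="^"):
    """Turn `npull -f` output lines into a list of dicts."""
    it = iter(lines)
    names = next(it).rstrip("\n").split(sep)
    rows = []
    for line in it:
        line = line.rstrip("\n")
        if not line:
            continue
        *values, recno = line.split(sep)
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values: {line!r}")
        row = dict(zip(names, values))
        row["_recno"] = int(recno)  # record number kept under a made-up key
        rows.append(row)
    return rows
```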

If the -g# option is used, npull(C-1) will print debugging information
to standard error. This output can be useful when npull(C-1) does not
pull the records you are expecting. This option is most useful when
using the -c option. There are four levels of debugging output, from 1
(least information) to 4 (most).

If the -i# option is used, npull(C-1) will read the file using the
specified index number.

If the -Iindexlist option is used, npull(C-1) will try to find an index
in the file which starts with the field names in indexlist. The field
names should be separated by commas (for example, -Iorder,sequence).
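The "starts with" match used by -I can be sketched as a prefix comparison. In this hypothetical Python model each key is a tuple of field names, and the lowest-numbered matching index is returned; which index npull actually prefers when several match is an assumption.

```python
def find_index(keys, indexlist):
    """Return the number of the first key that starts with the fields in
    indexlist (a comma-separated string), or None if no key matches."""
    wanted = tuple(indexlist.split(","))
    for n, key in enumerate(keys):
        if tuple(key[:len(wanted)]) == wanted:
            return n
    return None
```

With keys (order), (order, sequence, line) and (customer, order), -Iorder,sequence would pick index 1, while -Isequence matches nothing because sequence does not lead any key.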

If the -n option is used, npull(C-1) will print to stderr the number of
records pulled.

If the -Nn option is used, npull(C-1) will output at most n records and
then stop.

If the -p option is used, npull(C-1) will format its output in
dprint(C-1) style. That is, it will print "fieldname=fieldvalue", one
per line, and print the record number on the first line.
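As a rough picture of the -p layout, the hypothetical Python function below formats one record: the record number alone on the first line, then one fieldname=fieldvalue line per printfield. Whether the first line contains anything besides the number, and how consecutive records are separated, are not stated above and are left out.

```python
def dprint_style(recno, fields):
    """Format one record as described for -p.

    recno: the record number; fields: (name, value) pairs in printfield order.
    """
    lines = [str(recno)]
    lines.extend(f"{name}={value}" for name, value in fields)
    return "\n".join(lines)
```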

If the -r option is used, the record numbers are stripped from the output.

If the -s option is used, npull(C-1) will always use a sequential read
of the file, rather than using one of the indexes. This is useful when
you are attempting to repair a damaged file and the keys are corrupted.

If the -R option is used, npull(C-1) will read standard input for a list
of record numbers to use when reading the file. Each record number must
be at the end of its line and must be preceded by a delimiting caret
(^). This is the same format that npull(C-1) uses for its output.
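The input format expected by -R can be illustrated by extracting the text after the last caret on each line. This Python sketch is hypothetical; skipping blank lines is an assumption, and a line with no caret is treated as an error.

```python
def record_numbers(lines):
    """Collect the record numbers that -R expects: the text after the last
    caret on each line."""
    numbers = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # blank lines skipped (assumption)
        _, caret, tail = line.rpartition("^")
        if not caret:
            raise ValueError(f"no caret before record number: {line!r}")
        numbers.append(int(tail))
    return numbers
```

Since ordinary npull output already ends in "^recno", lines filtered from one pull can serve as -R input for another.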

If the -t option is used, npull(C-1) will use the string following the
't' as the field separator, rather than a caret.
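The per-line options -t, -r, -B and -E can be combined as in the hypothetical Python function below. It assumes that the -t separator also precedes the record number, that -r drops the record number together with its separator, and that the -B and -E strings are added with no separator of their own.

```python
def format_record(values, recno, sep="^", begin="", end="", strip_recno=False):
    """Build one output line.

    sep plays the role of -t, strip_recno of -r, and begin/end of the
    -B and -E strings.
    """
    fields = [str(v) for v in values]
    if not strip_recno:
        fields.append(str(recno))
    return begin + sep.join(fields) + end
```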

If the -h option is used, npull(C-1) will only output the complete field
list for the file. This will give you the same output as if you had
specified the -f and -a options, but with no data being output.

The ? option provides additional information on each option.

The following is an example of npull(C-1) extracting all General Ledger
Chart of Accounts records whose glacct value is greater than or equal to
"100" and less than "200":
npull -af -i0 cbooks~glacct "glacct>=100" "glacct<200"


Syntax for control files:
%BEGIN [procname] (required)
%FILE filename [selectfields...] (may be repeated)
%SEQUENTIAL filename (optional command)
%BACKWARDS filename (optional command)
%INDEXNO filename indexno (optional command)
%INDEXLIST filename indexlist (optional command)
%PRINT filename[.fieldname] [...] (may be repeated)
%PRINTALL filename [...] (may be repeated)
%END [comment] (required)

Lines that do not begin with one of the commands listed above are ignored.
This allows an npull(C-1) control file to be placed inside a grace
report file, within a section of comments.
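How a procedure might be picked out of a file full of other text is sketched below in Python. The sketch treats a line as a command when its first whitespace-separated word is one of the keywords (an assumption about leading spaces), ignores everything else, and returns the statements of the named procedure, or of the first one when no name is given. The names COMMANDS and load_procedure are hypothetical.

```python
# Command keywords from the control-file syntax summary.
COMMANDS = {"%BEGIN", "%FILE", "%SEQUENTIAL", "%BACKWARDS", "%INDEXNO",
            "%INDEXLIST", "%PRINT", "%PRINTALL", "%END"}


def load_procedure(text, procname=None):
    """Return the (command, args) statements of one procedure."""
    current = None  # (name, statements) while between %BEGIN and %END
    for line in text.splitlines():
        words = line.split()
        if not words or words[0] not in COMMANDS:
            continue  # not a command line: ignored
        cmd, args = words[0], words[1:]
        if cmd == "%BEGIN":
            current = (args[0] if args else None, [])
        elif cmd == "%END":
            # Any words after %END are a comment and are not examined.
            if current is not None and procname in (None, current[0]):
                return current[1]
            current = None
        elif current is not None:
            current[1].append((cmd, args))
    raise LookupError(f"procedure not found: {procname!r}")
```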

You may put more than one control procedure in a control file. If a
procname is given as a parameter on the command line, then that procedure
will be used. If no procname is given, the first procedure encountered
in the file will be used. An optional comment may be placed after the
%END statement. The %FILE statement declares which files are to be
opened and read. Files are scanned in the order specified. Selectfields
are the same as those specified earlier, with the addition that fields
in other files may be used in place of a value. Instead of a value,
another field may be specified in the form (compare the %FILE lines in
the example below):

fieldname=~filename.fieldname

%PRINT statements declare which fields are to be sent to standard output.
If only a filename is given, without a fieldname, the current record
number of that file is output. %PRINTALL statements declare files in
which all fields are to be output, including the record number. This is
the same as using the -a option on a normal pull.
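The way a procedure combines several files can be pictured as a nested loop: files are scanned in %FILE order, each later file is matched against the current records of the earlier files, and every matching combination produces one %PRINT line. The Python sketch below is a simplified model under those assumptions, not npull's algorithm; in particular, npull would use indexes on the later files rather than scanning them in full. Records are dicts, the record number is stored under a made-up "_recno" key, and the references are given as (field, other_file, other_field) tuples instead of being parsed from the %FILE line.

```python
def scan(files, bound=None):
    """Yield {file name: record} for every combination of records that
    satisfies the cross-file references.

    Each entry of files is (name, records, refs); refs holds
    (field, other_file, other_field) tuples, one per condition of the
    form field=~other_file.other_field.
    """
    bound = {} if bound is None else bound
    if not files:
        yield dict(bound)
        return
    name, records, refs = files[0]
    for rec in records:
        if all(rec.get(f) == bound[o].get(of) for f, o, of in refs):
            bound[name] = rec
            yield from scan(files[1:], bound)
            del bound[name]


def print_line(combo, items, sep="^"):
    """Render %PRINT items: "file.field" prints a field, a bare "file"
    prints that file's current record number."""
    out = []
    for item in items:
        fname, _, field = item.partition(".")
        rec = combo[fname]
        out.append(str(rec[field] if field else rec["_recno"]))
    return sep.join(out)
```

With two invoiced records for products P1 and P2 and a product file giving their categories A and B, the items product.category, invoiced.product and invoiced yield the lines "A^P1^1" and "B^P2^2".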

%SEQUENTIAL forces a sequential read of the specified file. %INDEXNO
forces a read of the file based on the given index number (primary key
is index #0). %INDEXLIST finds an index containing at least the specified
fields; the fieldnames must be separated by commas only, with no spaces.
%BACKWARDS forces a backwards read of the specified file. %BACKWARDS
cannot be used with %INDEXNO or %INDEXLIST, and %INDEXNO and %INDEXLIST
cannot be used together.
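The two restrictions above can be checked mechanically. The hypothetical Python function below groups the access commands of a procedure by file name and reports forbidden combinations; it takes statements as (command, args) pairs and checks only the rules stated here, so it says nothing about %SEQUENTIAL combined with an index.

```python
ACCESS = {"%SEQUENTIAL", "%BACKWARDS", "%INDEXNO", "%INDEXLIST"}
INDEXED = {"%INDEXNO", "%INDEXLIST"}


def check_access_options(statements):
    """Return messages for forbidden access-command combinations.

    statements: (command, args) pairs, e.g. ("%INDEXNO", ["product", "0"]).
    """
    per_file = {}
    for cmd, args in statements:
        if cmd in ACCESS and args:
            per_file.setdefault(args[0], set()).add(cmd)
    problems = []
    for fname, cmds in per_file.items():
        if "%BACKWARDS" in cmds and cmds & INDEXED:
            problems.append(f"{fname}: %BACKWARDS cannot be used with "
                            "%INDEXNO or %INDEXLIST")
        if INDEXED <= cmds:
            problems.append(f"{fname}: %INDEXNO and %INDEXLIST cannot be "
                            "used together")
    return problems
```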


Suppose you wish to produce a report of all the products you sold, sorted
by the invoice date, the product category, and the product number. You
could use the following control file:

%BEGIN all_products_sold_by_category
%FILE invoiced
%SEQUENTIAL invoiced
%FILE product product=~invoiced.product
%INDEXNO product 0
%FILE invoicem invoice=~invoiced.invoice
%INDEXLIST invoicem invoice
%PRINT product.category invoiced.product invoiced
%END

This control file would do a sequential read on the invoiced file, use
index number 0 (the primary key) on the product file, and find an index
that started with the invoice field on the invoicem file and use it.
If any of these commands were omitted, npull(C-1) would determine an
appropriate index to use on that file itself.

Note: The bare invoiced entry is the last item on the %PRINT line so that
the invoiced record numbers are placed at the end of the line for wtr to
use. The shell command to use this would be (assuming that the above
controlfile is placed inside the grace program "report" as a comment, and
that the grace program has been compiled into the file ""):

npull -c report | csort -0 -1 -2 | wtr -s invoiced |lp -s