Reading Fixed-Width Data

Note: This section discusses an advanced feature of gawk. If you are a novice awk user, you might want to skip it on the first reading.

gawk version 2.13 introduced a facility for dealing with fixed-width fields with no distinctive field separator. For example, data of this nature arises in the input for old Fortran programs where numbers are run together, or in the output of programs that did not anticipate the use of their output as input for other programs.

An example of the latter is a table where all the columns are lined up by the use of a variable number of spaces and empty fields are just spaces. Clearly, awk's normal field splitting based on FS does not work well in this case. Although a portable awk program can use a series of substr calls on $0 (see String Manipulation Functions), this is awkward and inefficient for a large number of fields.
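
As a rough illustration of the substr approach, the following portable sketch splits each record by hand. The three widths (4, 6, and 5) and the names width, field, and pos are hypothetical, chosen only for this example; the point is that the program itself must keep track of where every field starts:

```awk
# Portable sketch: emulate fixed-width splitting with substr().
# The widths 4, 6, and 5 describe a hypothetical record layout.
BEGIN { n = split("4 6 5", width, " ") }

{
    pos = 1                              # starting column of the next field
    for (i = 1; i <= n; i++) {
        field[i] = substr($0, pos, width[i])
        pos += width[i]                  # advance past the field just taken
    }
    print field[1] "," field[2] "," field[3]
}
```

Given the input line 1234abcdef00042, this prints 1234,abcdef,00042. Every record pays for the loop and the substr calls, which is the inefficiency mentioned above.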

The splitting of an input record into fixed-width fields is specified by assigning a string containing space-separated numbers to the built-in variable FIELDWIDTHS. Each number specifies the width of one field, including any columns between fields. If you want to ignore the columns between fields, you can specify their width as a separate field that is subsequently ignored. It is a fatal error to supply a field width that is not a positive number.

The following data, the output of the Unix w utility, is useful for illustrating the use of FIELDWIDTHS:

 10:06pm  up 21 days, 14:04,  23 users
User     tty       login  idle   JCPU   PCPU  what
hzuo     ttyV0     8:58pm            9      5  vi p24.tex
hzang    ttyV3     6:37pm    50                -csh
eklye    ttyV5     9:53pm            7      1  em thes.tex
dportein ttyV6     8:17pm  1:47                -csh
gierd    ttyD3    10:00pm     1                elm
dave     ttyD4     9:47pm            4      4  w
brent    ttyp0    26Jun91  4:46  26:46   4:41  bash
dave     ttyq4    26Jun9115days     46     46  wnewmail

Note: The following program uses a number of awk features that haven't been introduced yet.

The program takes the above input, converts the idle time to a number of seconds, and prints the first two fields and the calculated idle time:

BEGIN  { FIELDWIDTHS = "9 6 10 6 7 7 35" }
NR > 2 {                    # skip the two header lines
    idle = $4
    sub(/^  */, "", idle)   # strip leading spaces
    if (idle == "")
        idle = 0
    if (idle ~ /:/) {       # two parts separated by a colon, e.g., 1:47
        split(idle, t, ":")
        idle = t[1] * 60 + t[2]
    }
    if (idle ~ /days/)      # e.g., 15days; the leading digits give the number
        idle *= 24 * 60 * 60

    print $1, $2, idle
}

Running the program on the data produces the following results:

hzuo      ttyV0  0
hzang     ttyV3  50
eklye     ttyV5  0
dportein  ttyV6  107
gierd     ttyD3  1
dave      ttyD4  0
brent     ttyp0  286
dave      ttyq4  1296000
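
The earlier remark about ignoring the columns between fields can be illustrated with a small sketch. The record layout here is hypothetical: three data fields of widths 4, 6, and 5, separated by two-column gaps. Each gap is given its own width in FIELDWIDTHS, so it becomes a field of its own ($2 and $4) that the program simply never uses:

```awk
# Sketch for gawk: treat the gaps between columns as throwaway fields.
# The layout (4, gap of 2, 6, gap of 2, 5) is a made-up example.
BEGIN {
    FIELDWIDTHS = "4 2 6 2 5"
    $0 = "1234  abcdef  00042"     # assigning $0 splits it using FIELDWIDTHS
    print $1 "," $3 "," $5         # $2 and $4 hold the gaps and are ignored
}
```

Run with gawk, this prints 1234,abcdef,00042.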

Another (possibly more practical) example of fixed-width input data is the input from a deck of balloting cards. In some parts of the United States, voters mark their choices by punching holes in computer cards. These cards are then processed to count the votes for any particular candidate or on any particular issue. Because a voter may choose not to vote on some issue, any column on the card may be empty. An awk program for processing such data could use the FIELDWIDTHS feature to simplify reading the data. (Of course, getting gawk to run on a system with card readers is another story!)

Assigning a value to FS causes gawk to use FS for field splitting again. Use FS = FS to make this happen, without having to know the current value of FS. In order to tell which kind of field splitting is in effect, use PROCINFO["FS"] (see Built-in Variables That Convey Information). The value is "FS" if regular field splitting is being used, or it is "FIELDWIDTHS" if fixed-width field splitting is being used:

if (PROCINFO["FS"] == "FS")
    regular field splitting ...
else
    fixed-width field splitting ...

This information is useful when writing a function that needs to temporarily change FS or FIELDWIDTHS, read some records, and then restore the original settings (see Reading the User Database, for an example of such a function).
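
A minimal sketch of such a save-and-restore pair follows. The function names save_splitting and restore_splitting and the saved_* global variables are hypothetical, not part of gawk, and the sketch assumes that only the two splitting modes described in this section are in play:

```awk
# Sketch for gawk: remember which kind of field splitting is active,
# change it temporarily, and then put things back.
function save_splitting()
{
    saved_using_fw = (PROCINFO["FS"] == "FIELDWIDTHS")
    saved_fs = FS
    saved_fw = FIELDWIDTHS
}

function restore_splitting()
{
    if (saved_using_fw)
        FIELDWIDTHS = saved_fw    # assignment turns fixed-width splitting back on
    else
        FS = saved_fs             # assignment turns FS splitting back on
}

BEGIN {
    FIELDWIDTHS = "3 3"           # start out with fixed-width splitting
    save_splitting()

    FS = ":"                      # temporarily switch to FS splitting
    $0 = "a:b:c"
    print $2

    restore_splitting()           # back to fixed-width splitting
    $0 = "abcdef"
    print $2
}
```

Run with gawk, this prints b and then def, showing that the second record is once again split by width. Restoring by assignment matters: simply leaving FIELDWIDTHS untouched would not switch gawk back, because the most recent assignment to FS or FIELDWIDTHS decides which method is used.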