When awk reads an input record, the record is automatically parsed
or separated by the interpreter into chunks called fields. By default,
fields are separated by whitespace, like words in a line. Whitespace
in awk means any string of one or more spaces, tabs, or newlines;
other characters, such as formfeed and vertical tab, that are
considered whitespace by other languages are not considered
whitespace by awk.
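As a small illustration of the default splitting described above, the following sketch feeds awk a hypothetical line whose words are separated by mixed runs of spaces and tabs and prints the number of fields found. The sample text and the variable name count are made up for this example.

```shell
# Three words separated by runs of spaces and tabs (hypothetical input).
# Each run of whitespace counts as a single separator.
count=$(printf 'alpha \t beta\t\tgamma\n' | awk '{ print NF }')
echo "$count"    # prints 3
```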
The purpose of fields is to make it more convenient for you to refer to
these pieces of the record. You don't have to use them--you can
operate on the whole record if you want--but fields are what make
simple awk programs so powerful.
A dollar sign ($) followed by the number of the field you want refers
to that field in an awk program. Thus, $1 refers to the first field,
$2 to the second, and so on. (Unlike in the Unix shells, field numbers
are not limited to single digits; $127 is the one hundred
twenty-seventh field in the record.)
For example, suppose the following is a line of input:
This seems like a pretty nice example.
Here the first field, or $1, is "This", the second field, or $2, is
"seems", and so on. Note that the last field, $7, is "example." with
the period included: because there is no space between the "e" and
the ".", the period is considered part of the seventh field.
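The field references just described can be tried directly on the sample line. This is a minimal sketch; the shell variable names (line, first, second, seventh) are chosen only for illustration.

```shell
line='This seems like a pretty nice example.'

# Pull out individual fields by number.
first=$(printf '%s\n' "$line" | awk '{ print $1 }')
second=$(printf '%s\n' "$line" | awk '{ print $2 }')
seventh=$(printf '%s\n' "$line" | awk '{ print $7 }')

echo "$first / $second / $seventh"   # prints: This / seems / example.
```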
NF is a built-in variable whose value is the number of fields
in the current record. awk automatically updates the value
of NF each time it reads a record. No matter how many fields
there are, the last field in a record can be represented by $NF.
So, for the line above, $NF is the same as $7, whose value is
"example." (period included).
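A quick way to see NF and $NF together, using the same sample line (the variable name summary is only illustrative):

```shell
line='This seems like a pretty nice example.'

# NF is the field count; $NF is the contents of the last field.
summary=$(printf '%s\n' "$line" | awk '{ print NF, $NF }')
echo "$summary"   # prints: 7 example.
```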
If you try to reference a field beyond the last one (such as $8
when the record has only seven fields), you get the empty string.
(If used in a numeric operation, you get zero.)
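The sketch below shows both behaviors on the seven-field sample line: $8 is printed between brackets to make the empty string visible, and then used in an addition. The brackets and the variable name result are illustrative choices.

```shell
line='This seems like a pretty nice example.'

# $8 does not exist here: it is empty as a string and 0 as a number.
result=$(printf '%s\n' "$line" | awk '{ print "[" $8 "]", ($8 + 0) }')
echo "$result"   # prints: [] 0
```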
The use of $0, which looks like a reference to the "zeroth" field, is
a special case: it represents the whole input record, for use
when you are not interested in specific fields.
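To make the difference between $0 and individual fields concrete, this sketch prints a hypothetical two-word record both ways. Printing $0 reproduces the record as read, including its original spacing, while printing $1 and $2 with a comma joins them with a single space (the default output separator).

```shell
# Hypothetical input with three spaces between the words.
whole=$(printf 'one   two\n' | awk '{ print $0 }')
parts=$(printf 'one   two\n' | awk '{ print $1, $2 }')

echo "$whole"   # prints: one   two
echo "$parts"   # prints: one two
```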
Here are some more examples:
$ awk '$1 ~ /foo/ { print $0 }' BBS-list
-| fooey 555-1234 2400/1200/300 B
-| foot 555-6699 1200/300 B
-| macfoo 555-6480 1200/300 A
-| sabafoo 555-2127 1200/300 C
This example prints each record in the file BBS-list whose first
field contains the string foo. The operator ~ is called a
matching operator (see How to Use Regular Expressions);
it tests whether a string (here, the field $1) matches a given regular
expression.
By contrast, the following example looks for foo in the entire record
and prints the first field and the last field for each matching input
record:
$ awk '/foo/ { print $1, $NF }' BBS-list
-| fooey B
-| foot B
-| macfoo A
-| sabafoo C
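For readers without the BBS-list file at hand, the two patterns can be compared on inline data. In this sketch the first four records are taken from the output shown above; the last two records are hypothetical additions, one with no foo at all and one with foo only in its last field, included to show where the two patterns differ.

```shell
# First four records: from the output shown above.
# Last two records: hypothetical, added for illustration only.
data='fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sabafoo 555-2127 1200/300 C
barfly 555-7685 1200/300 A
camelot 555-0542 300 foo'

# Match only when the first field contains foo.
first_field=$(printf '%s\n' "$data" | awk '$1 ~ /foo/ { print $1 }')

# Match when foo appears anywhere in the record.
any_field=$(printf '%s\n' "$data" | awk '/foo/ { print $1, $NF }')

echo "$first_field"
echo "$any_field"
```

With this data, the first command selects four records, while the second also selects the hypothetical camelot record, because there foo appears outside the first field.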