Floating Point Issues
As mentioned earlier, floating-point numbers represent what are called
"real" numbers, i.e., those that have a fractional part. awk
uses double-precision floating-point numbers to represent all
numeric values. This section describes some of the issues
involved in using floating-point numbers.
There is a very nice paper on floating-point arithmetic by David Goldberg, "What Every Computer Scientist Should Know About Floating-point Arithmetic," ACM Computing Surveys 23, 1 (March 1991), 5-48. This is worth reading if you are interested in the details, but it does require a background in computer science.
Internally, awk keeps both the numeric value (double-precision
floating-point) and the string value for a variable. Separately, awk
keeps track of what type the variable has (see Variable Typing and
Comparison Expressions), which plays a role in how variables are used
in comparisons.
It is important to note that the string value for a number may not
reflect the full value (all the digits) that the numeric value
actually contains.
The following program (values.awk) illustrates this:
    {
        $1 = $2 + $3
        # see it for what it is
        printf("$1 = %.12g\n", $1)
        # use CONVFMT
        a = "<" $1 ">"
        print "a =", a
        # use OFMT
        print "$1 =", $1
    }
This program shows the full value of the sum of $2 and $3 using
printf, and then prints the string values obtained from both automatic
conversion (via CONVFMT) and from printing (via OFMT).
Here is what happens when the program is run:
    $ echo 2 3.654321 1.2345678 | awk -f values.awk
    -| $1 = 4.8888888
    -| a = <4.88889>
    -| $1 = 4.88889
This makes it clear that the full numeric value is different from what the default string representations show.
CONVFMT's default value is "%.6g", which yields a value with at most six
significant digits. For some applications, you might want to change it
to specify more precision. On most modern machines, most of the time,
17 digits is enough to capture a floating-point number's value
exactly.(2)
Unlike numbers in the abstract sense (such as those you studied in high
school or college math), numbers stored in computers are limited in
certain ways. They cannot represent an infinite number of digits, nor
can they always represent things exactly. In particular, many decimal
fractions, such as 0.1 or 515.80, have no exact binary floating-point
representation, so the nearest representable value is stored instead.
Here is an example:
    $ awk '{ printf("%010d\n", $1 * 100) }'
    515.79
    -| 0000051579
    515.80
    -| 0000051579
    515.81
    -| 0000051580
    515.82
    -| 0000051582
    Ctrl-d
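This shows that some values come out as expected, whereas others do
not. For example, 515.80 is stored as a value slightly less than 515.80,
so multiplying it by 100 gives a result just under 51580, which %d
truncates to 51579. This is not a "bug" in awk, but simply an artifact
of how computers represent numbers.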
Another peculiarity of floating-point numbers on modern systems is that they often have more than one representation for the number zero! In particular, it is possible to represent "minus zero" as well as regular, or "positive" zero.
This example shows that negative and positive zero are distinct values
when stored internally, but that they are in fact equal to each other,
as well as to "regular" zero:
    $ gawk 'BEGIN { mz = -0 ; pz = 0
    > printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz
    > printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0
    > }'
    -| -0 = -0, +0 = 0, (-0 == +0) -> 1
    -| mz == 0 -> 1, pz == 0 -> 1
It helps to keep this in mind should you process numeric data that contains negative zero values; the fact that the zero is negative is noted and can affect comparisons.
(2) Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.