Floating-Point Number Caveats

As mentioned earlier, floating-point numbers represent what are called "real" numbers, i.e., those that can have a fractional part. awk uses double-precision floating-point numbers to represent all numeric values. This section describes some of the issues involved in using floating-point numbers.

There is a very nice paper on floating-point arithmetic by David Goldberg, "What Every Computer Scientist Should Know About Floating-point Arithmetic," ACM Computing Surveys 23, 1 (1991-03), 5-48.(1) It is worth reading if you are interested in the details, but it does require a background in computer science.

Internally, awk keeps both the numeric value (double-precision floating-point) and the string value for a variable. Separately, awk keeps track of what type the variable has (see Variable Typing and Comparison Expressions), which plays a role in how variables are used in comparisons.

It is important to note that the string value for a number may not reflect the full value (all the digits) that the numeric value actually contains. The following program (values.awk) illustrates this:

{
   $1 = $2 + $3
   # see it for what it is
   printf("$1 = %.12g\n", $1)
   # use CONVFMT
   a = "<" $1 ">"
   print "a =", a
   # use OFMT
   print "$1 =", $1
}

This program shows the full value of the sum of $2 and $3 using printf, and then prints the string values obtained from both automatic conversion (via CONVFMT) and from printing (via OFMT).

Here is what happens when the program is run:

$ echo 2 3.654321 1.2345678 | awk -f values.awk
-| $1 = 4.8888888
-| a = <4.88889>
-| $1 = 4.88889

This makes it clear that the full numeric value is different from what the default string representations show.
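The split between the two conversion paths can be seen by changing only one of them. The following is a minimal sketch, not part of values.awk: the shell variable name ofmt_demo and the choice of "%.10g" are illustrative assumptions. It raises OFMT and leaves CONVFMT at its default, so only the value printed directly should show more digits:

```shell
# Sketch: OFMT governs print; CONVFMT governs number-to-string conversion.
ofmt_demo=$(awk 'BEGIN {
    OFMT = "%.10g"               # affects numbers printed by print
    x = 3.654321 + 1.2345678
    print x                      # formatted with OFMT
    print "<" x ">"              # concatenation still uses CONVFMT
}')
printf '%s\n' "$ofmt_demo"
```

On a system with IEEE 754 doubles, the first line should read 4.8888888 and the second <4.88889>.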

CONVFMT's default value is "%.6g", which yields a value with at most six significant digits. For some applications, you might want to change it to specify more precision. On most modern machines, most of the time, 17 digits is enough to capture a floating-point number's value exactly.(2)
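One way to check whether a given CONVFMT keeps enough digits is to convert a number to a string and back, and compare the result with the original. The sketch below assumes IEEE 754 doubles and a C library whose string-to-number conversion is correctly rounded; the names roundtrip, s6, s17, ok6, and ok17 are made up for the illustration:

```shell
# Sketch: does a number survive a trip through its string value?
roundtrip=$(awk 'BEGIN {
    x = 3.654321 + 1.2345678
    s6 = x ""                    # converted with the default CONVFMT
    CONVFMT = "%.17g"
    s17 = x ""                   # converted with 17 significant digits
    ok6 = (s6 + 0 == x)          # 0 if digits were lost in conversion
    ok17 = (s17 + 0 == x)        # 1 if the string converts back to x
    print ok6, ok17
}')
printf '%s\n' "$roundtrip"
```

Under those assumptions the output should be "0 1": the six-digit string no longer identifies the original value, while the 17-digit string does.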

Unlike numbers in the abstract sense (such as what you studied in high school or college math), numbers stored in computers are limited in certain ways. They hold only a finite number of digits, so floating-point numbers cannot always represent values exactly. Here is an example:

$ awk '{ printf("%010d\n", $1 * 100) }'
515.79
-| 0000051579
515.80
-| 0000051579
515.81
-| 0000051580
515.82
-| 0000051582
Ctrl-d

This shows that some values can be represented exactly, whereas others are only approximated. This is not a "bug" in awk, but simply an artifact of how computers represent numbers.
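When the intent is to round to the nearest integer rather than to truncate, one common workaround is to let printf do the rounding with "%.0f". This is a sketch under the assumption of IEEE 754 doubles, not a recommendation from the original text; the names rounded and cents are illustrative:

```shell
# Sketch: %d truncates the approximated product, %.0f rounds it.
rounded=$(echo 515.80 | awk '{
    cents = $1 * 100             # stored as slightly less than 51580
    printf("%d %.0f\n", cents, cents)
}')
printf '%s\n' "$rounded"
```

This should print "51579 51580": the stored product falls just short of 51580, so truncation drops it to 51579 while rounding recovers the expected value.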

Another peculiarity of floating-point numbers on modern systems is that they often have more than one representation for the number zero! In particular, it is possible to represent "minus zero" as well as regular, or "positive" zero.

This example shows that negative and positive zero are distinct values when stored internally, but that they are in fact equal to each other, as well as to "regular" zero:

$ gawk 'BEGIN { mz = -0 ; pz = 0
> printf "-0 = %g, +0 = %g, (-0 == +0) -> %d\n", mz, pz, mz == pz
> printf "mz == 0 -> %d, pz == 0 -> %d\n", mz == 0, pz == 0
> }'
-| -0 = -0, +0 = 0, (-0 == +0) -> 1
-| mz == 0 -> 1, pz == 0 -> 1

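It helps to keep this in mind should you process numeric data that contains negative zero values; the sign of the zero is retained and shows up in formatted output (and so can affect string comparisons of that output), even though numeric comparisons treat the two zeros as equal.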


Footnotes

  1. http://www.validgh.com/goldberg/paper.ps.

  2. Pathological cases can require up to 752 digits (!), but we doubt that you need to worry about this.