In Part 1, we looked at the following Python code:
x = 3.141592653589793
y = 6.022e23
x = (y + x) - y
print(x)
The correct answer is clearly 3.141592653589793, but Python gives the result 0.0. This problem is called catastrophic cancellation.
Catastrophic cancellation is not specific to Python; we will run into the same problem in any programming language that makes use of floating-point arithmetic, including Matlab, FORTRAN, Java, R, etc. The problem does not arise from the specifics of any one programming language, but from the way that the continuous number line of real numbers is approximated by a finite, discrete combination of integer bits in the hardware of a digital computer.
As we saw in the previous post, floating-point representation allows us to vary the precision with which we represent a real number by varying where we place the binary point. Given a finite number of binary digits, this allows us to trade off the precision against the range of the numbers we can represent.
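Before returning to a decimal example, note that Python's built-in float is an IEEE-754 double-precision number, so we can query the resulting range and precision directly. Here is a minimal sketch using the standard sys module (the printed values are for a typical 64-bit build):

import sys

# Python floats are IEEE-754 doubles: 53 significant binary digits,
# with the remaining bits allocated to the sign and the exponent.
print(sys.float_info.mant_dig)  # 53 bits of precision in the mantissa
print(sys.float_info.max)       # largest representable float, ~1.8e308
print(sys.float_info.epsilon)   # gap from 1.0 to the next float, ~2.2e-16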
Switching back to decimal for an example: if we allocate two decimal digits for the mantissa and one digit for the exponent, we can represent large numbers like:
2,000,000 (mantissa 2.0, exponent 6)
2,100,000 (mantissa 2.1, exponent 6)
or small numbers like:
0.00020 (mantissa 2.0, exponent -4)
0.00021 (mantissa 2.1, exponent -4)
But notice from the above that when we fix the exponent (6 or -4 in the examples above), and then add the smallest possible increment to the mantissa (going from 2.0 to 2.1 in the examples), the absolute distance we move to the right on the number line depends on the exponent; for an exponent of 6, we incremented our final number by 100,000, whereas for an exponent of -4, we only moved a distance of 0.00001.
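We can observe exactly the same effect in binary floating point. Here is a minimal sketch, assuming Python 3.9+ for math.ulp, which returns the gap between a float and the next representable float above it:

import math

# The gap between adjacent floats grows with magnitude, just as the
# increment in the decimal example above depended on the exponent.
for x in [0.0002, 1.0, 2_000_000.0, 1e16]:
    print(x, math.ulp(x))

At 0.0002 the gap is about 2.7e-20, while at 1e16 adjacent floats are a full 2.0 apart.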
We can visualise the continuous real number line, and see how the infinite set of real numbers is mapped onto the finite set of floating-point numbers:
In the above visualization the real numbers are represented by the continuous line at the top of the diagram, whereas the finite floating-point numbers are represented by the discrete vertical lines. We can immediately see that there are many real numbers which get mapped onto the same floating-point number. Moreover, the density of floating-point numbers decreases as the magnitude of the numbers increases, and as density decreases, so does the precision with which we can represent a given real number.
To see this intuitively, let’s pick two numbers, x and y, from the real number line, which are shown in red and blue in the figure below.
Notice that both x and y get mapped onto the same floating-point number f. The round-off error for x is f - x, and for y it is y - f. It should be intuitively obvious that the potential round-off error increases as we move either to the extreme left or right of the number line.
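To make this concrete, here is a small sketch: at magnitude 2**53 adjacent floats are exactly 2.0 apart, so two distinct real numbers, 2**53 and 2**53 + 1, round to the same floating-point value f:

# The integers 2**53 and 2**53 + 1 are distinct real numbers, but the
# second is not representable and rounds to the same float as the first.
f = float(2**53)
print(f == float(2**53 + 1))  # True: both reals map onto the same float f
print(float(2**53 + 2) - f)   # 2.0: the gap to the next representable float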
This explains the catastrophic cancellation in the Python example we started with, which is repeated below.
x = 3.141592653589793
y = 6.022e23
x = (y + x) - y
print(x)
In the example above, the problem occurs in the statement x = (y + x) - y. At this point y is a very large number (Avogadro’s constant), and at this scale we have relatively little precision (a lower density of floats). Internally, y is represented by a floating-point number f. Adding x (pi) moves us only a tiny distance along the number line, far smaller than the gap to the next representable float, so the sum y + x is rounded to that same floating-point number f. The subtraction then computes f - f, which of course is equal to zero!
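We can verify this directly. A short sketch, again assuming Python 3.9+ for math.ulp:

import math

x = 3.141592653589793
y = 6.022e23

# Near Avogadro's constant, adjacent floats are about 6.7e7 apart, so
# adding pi cannot reach the next representable float: y + x rounds
# straight back to the float f that represents y.
print(math.ulp(y))  # 67108864.0
print(y + x == y)   # True, hence (y + x) - y == 0.0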
This is called catastrophic cancellation because the resulting error is not a small round-off error relative to the value we are trying to represent (pi): the entire value is wiped out. This illustrates the importance of assessing the relative error in floating-point calculations, which we will discuss in the next post.
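As a preview, the relative error here could be computed as follows; the result is 1.0, meaning the entire value was lost rather than a few trailing digits:

true_value = 3.141592653589793                 # what we should get back
computed = (6.022e23 + true_value) - 6.022e23  # actually gives 0.0

# Relative error = |computed - true| / |true|; here it equals 1.0 (100%).
print(abs(computed - true_value) / abs(true_value))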
Further reading
For a more detailed treatment of catastrophic cancellation and round-off error, see the textbooks below.