Numerical Computing in Python - Part 1
It's important to understand that digital computers are inherently discrete systems. This means that real numbers, which exist in a continuum, cannot always be exactly represented within a computer. If we do not take this into account in our code, it can have catastrophic consequences.
When we use a computer for mathematical modelling, we take a quantitative model typically expressed as equations and functions over the domain of real numbers. The set of real numbers is continuous, yet a digital computer has only a finite number of possible discrete states, represented by the on-off states of the billions of microscopic transistors etched into its chips. Each transistor can represent only a single on-off state, which encodes a binary digit, 1 or 0, i.e. a single bit. The versatility of a computer arises from being able to combine billions of bits to represent, or encode, arbitrary numbers, text or images.
As discussed, there is a continuum of real numbers. But how can we precisely represent an infinite set with a finite number of states? The answer is that we cannot! The best we can do is to approximate real numbers using integer arithmetic, and inevitably there will be errors in our calculations. The art of scientific computing is to keep these errors under control.
Fixed-point representation
In fixed-point representation, continuous values are represented as two integers \(x\) and \(y\). The real number is then obtained through an equation like

\[ r = x + y \times 10^{-d} \]

where the exponent \(d\) is *fixed* in advance and dictates the precision of the representation. For example, with \(d = 5\), the real number 500.4421 would be represented as the tuple of integers \(x = 500\), \(y = 44210\). However, this method suffers from a lack of versatility, particularly in representing a wide range of values, leading to its limited use in practice.
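To make this concrete, here is a minimal sketch of such a fixed-point scheme in pure Python. The helper names encode_fixed and decode_fixed are invented for this illustration, and the exponent is fixed at 5 as in the example above:

D = 5  # the fixed exponent: five decimal digits after the point

def encode_fixed(r):
    # Split a real number into its integer part x and scaled fractional part y
    x = int(r)
    y = round((r - x) * 10**D)
    return x, y

def decode_fixed(x, y):
    # Recover the represented value: r = x + y * 10**(-D)
    return x + y / 10**D

print(encode_fixed(500.4421))    # (500, 44210)
print(decode_fixed(500, 44210))  # 500.4421

Note that any digits beyond the fifth decimal place are simply lost in the rounding, which is exactly the precision limitation described above.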
Floating-point representation
In contrast, floating-point representation offers variable precision. It allows for a much broader range of values by employing a dynamic exponent, in a manner similar to scientific notation.
In fact, we humans also use a version of floating-point representation through scientific notation. In this system, numbers are expressed in the form \(m \times 10^n\), which can be abbreviated in ASCII as <m>e<n>. For example, \(6.72 \times 10^{11}\) would be written as 6.72e+11. In this expression, \(m\) is called the significand or mantissa, \(n\) is the exponent, and 10 is the base. When Python displays large floating-point numbers, it employs scientific notation for readability. For example, executing print(672000000000000000.0) in Python outputs 6.72e+17. However, it's crucial to understand that, internally, the number is not represented in this manner. Scientific notation is simply a human-readable format and does not reflect the digital representation of the number.
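A quick check in the interpreter confirms that scientific notation is purely a display convention; the same stored value can be printed either way:

x = 672000000000000000.0
print(x)           # 6.72e+17: Python chooses scientific notation for display
print(f"{x:.1f}")  # 672000000000000000.0: same stored value, different formatting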
In digital computers, the base used in floating-point representation is 2, not 10. Both the mantissa and the exponent are stored in binary, making the representation

\[ m \times 2^n \]

with the integers \(m\) and \(n\) encoded in binary.
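We can inspect this base-2 decomposition directly from Python using the standard library: math.frexp returns a mantissa and exponent such that x == m * 2**n, and float.hex shows the binary mantissa in hexadecimal:

import math

x = 0.1
m, n = math.frexp(x)  # decompose x so that x == m * 2**n, with 0.5 <= m < 1
print(m, n)           # 0.8 -3
print(x.hex())        # 0x1.999999999999ap-4: a repeating binary fraction, truncated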
The internal format used to approximate real numbers is known as floating-point. This format is specified by the IEEE 754 standard, which aims to provide a universal representation across different hardware implementations. Floating-point numbers are usually available in two standard sizes: 32-bit and 64-bit, with the latter often referred to as double precision or simply double values. Calculations involving these IEEE-standard floating-point numbers are typically performed in hardware on modern computers.
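As a rough illustration of the difference between the two sizes, we can round-trip a value through each format using the standard struct module ('f' packs a 32-bit float, 'd' a 64-bit double):

import struct

pi = 3.141592653589793
pi32 = struct.unpack('f', struct.pack('f', pi))[0]  # squeeze through 32 bits
pi64 = struct.unpack('d', struct.pack('d', pi))[0]  # squeeze through 64 bits
print(pi32)  # 3.1415927410125732: only about 7 significant digits survive
print(pi64)  # 3.141592653589793: about 16 significant digits survive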
Understanding these nuances is critical for anyone working with scientific computing, quantitative finance, or any field that relies on numerical calculations. The limitations of finite precision can have wide-reaching implications, and being aware of them is the first step in mitigating potential issues.
Finite precision in Python
Try running the following code in Python:
y = 0.1 + 0.2
x = 0.3
print(x == y)
Clearly the correct answer is True, yet Python outputs False! Why is this? Let's look at the actual value of y:

0.30000000000000004
We see that there is a small, but detectable, absolute error between the represented value and the actual value. This occurs because the fraction 3/10 cannot be written using a finite number of binary digits, in the same way that we cannot exactly represent 1/3 using a finite number of decimal digits.
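We can see the exact value that is actually stored by converting the float to a Decimal, since the decimal module performs this conversion exactly:

from decimal import Decimal

# Converting a float to Decimal is exact, revealing the stored binary value
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
print(Decimal(0.3))  # a value just below 0.3, since 3/10 is also inexact in binary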
Whenever we work with floating-point numbers, there is nearly always a finite absolute error like this in our calculations. The art of numerical computing is to keep this error as small as possible, and within acceptable bounds. This can be difficult in more complicated calculations: if, for example, we iteratively refine a solution inside a very long loop, then without care the errors can compound on each iteration, leaving a very large error by the time the loop has finished.
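A simple sketch of this compounding effect is to add 0.1 ten thousand times, where each addition contributes its own tiny rounding error:

total = 0.0
for _ in range(10_000):
    total += 0.1  # every addition rounds the intermediate result

print(total)            # something like 1000.0000000001588, not exactly 1000.0
print(total == 1000.0)  # False: the per-iteration errors have accumulated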
This problem predates modern digital computers, going back to the time of the abacus, and has given rise to an entire branch of mathematics called numerical analysis; algorithms that make use of floating-point numbers are called numerical algorithms. We have to be extremely careful when implementing numerical algorithms to ensure, for example, that the errors in our calculations remain stable and do not grow as the number of iterations increases.
Numpy
The numpy package in Python provides lots of functions that are helpful for numerical computing. To use numpy we first import it:
import numpy as np
We can now use the functions in numpy by prefixing them with np.
As we have seen, even if we are very careful, there will nearly always be some small finite absolute error in our calculations. This means that it does not make sense to use Python's standard equals operator (==) on floating-point types, otherwise we will run into the problem from our previous example above. Numpy provides an alternative way of comparing floating-point numbers that takes finite precision into account: the function isclose, which checks for equality within a relative or absolute error. We can modify our previous code to give the correct answer for 0.3 = 0.1 + 0.2.
import numpy as np
y = 0.1 + 0.2
x = 0.3
print(np.isclose(x, y))
Now we see that Python outputs True, in contrast to the code that made use of the == operator.
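By default, np.isclose treats x and y as equal when abs(x - y) <= atol + rtol * abs(y), with rtol=1e-05 and atol=1e-08; both tolerances can be tightened or loosened for a particular problem:

import numpy as np

print(np.isclose(0.3, 0.1 + 0.2))           # True: within the default tolerances
print(np.isclose(1e6, 1e6 + 1))             # True: 1 is small relative to 1e6
print(np.isclose(1e6, 1e6 + 1, rtol=1e-9))  # False: a much stricter tolerance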
This isn't the end of the story though. Let's look at another example of floating-point error.
x = 3.141592653589793
y = 6.022e23
z = (x + y) - y
print(z)
Clearly the above calculation should result in the output 3.141592653589793, yet it prints out 0.0!
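To see why, we can measure the gap between y and the next representable double using math.ulp (available from Python 3.9 onwards):

import math

y = 6.022e23
print(math.ulp(y))  # 67108864.0: the gap to the next representable double

Since x is tens of millions of times smaller than this gap, x + y rounds back to exactly y, and the final subtraction therefore yields zero.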
This is clearly a very serious error, and we can't simply resolve it by using np.isclose(). This problem is called catastrophic cancellation, and we will examine it in more detail in the next article.