In the clean, well-lit world of mathematics, we work with perfect objects. Numbers can be infinitely long, lines are perfectly straight, and theorems hold with absolute certainty. It’s a world of ideals.

But when we bring a computer into the room, things get messy. A computer is a finite machine. It can’t store an infinite number of digits, and it can’t run for an infinite amount of time. Numerical methods is the art and science of bridging this gap, of translating the perfect ideas of mathematics into practical, reliable instructions for a very real, very finite machine.

Our journey begins with the most fundamental difference of all: the numbers themselves. The real numbers are a continuous line; the computer’s numbers are a set of discrete points on that line, like stepping stones across a river. This single, simple fact is the source of our first great challenge and our first set of clever ideas.

Round-off Error and Error Propagation

Let’s start with a calculation so simple it feels trivial. What is $1 - 3\left(\tfrac{4}{3} - 1\right)$? You know the answer is zero. But let’s ask the computer.

# Code 1.1.3: A simple calculation
a = 4/3
b = a - 1
c = 3 * b
e = 1 - c
print(e)

The output isn’t 0. It’s something like 2.220446049250313e-16.

This tiny, non-zero result is round-off error. It happens because the computer can’t store $\tfrac{4}{3}$ perfectly. It stores the closest number it can, and this tiny initial error gets carried through the calculation. This isn’t a bug; it’s a fundamental consequence of using a finite machine.
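
To see the culprit directly, ask Python to print more digits than the default display shows. A minimal sketch (the stored value is the same on any machine with IEEE 754 doubles):

# What is actually stored when we write 4/3?
print(f"{4/3:.20f}")  # prints 1.33333333333333325932
# The stored value differs from the true 4/3 around the 16th significant digit,
# and that tiny discrepancy is what survives the calculation above.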

Machine Numbers: The Stepping Stones

Computers represent numbers using a floating-point format, which is like scientific notation. A number is stored as a sign, a mantissa (the significant digits), and an exponent. For a given base $B$ (usually 2), a number looks like this:

$$x = \pm\, 0.d_1 d_2 \ldots d_t \cdot B^{E}, \qquad d_1 \neq 0$$

The number of digits in the mantissa, $t$, is finite. This means there are gaps between the numbers the computer can represent.

As the numbers get larger, the gaps between them grow. This has a crucial consequence: we care less about the absolute size of an error and more about its size relative to the number we’re trying to represent.
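
You can measure those gaps directly: np.spacing gives the distance from a number to the next representable float (a quick illustrative check, not part of the original example):

import numpy as np

# Gap to the next representable 64-bit float at different magnitudes
print(np.spacing(1.0))    # ~2.2e-16
print(np.spacing(1e8))    # ~1.5e-8
print(np.spacing(1e16))   # 2.0 -- consecutive floats here are two apart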

Absolute vs. Relative Error

If we’re approximating a true value $x$ with a computed value $\tilde{x}$, we can define the error in two ways:

  • Absolute Error: The raw difference, $|x - \tilde{x}|$. An error of 1 cm is tiny if you’re measuring the distance to the moon, but huge if you’re measuring a microchip.
  • Relative Error: The error scaled by the true value, $\dfrac{|x - \tilde{x}|}{|x|}$. This is usually the more important metric. It tells us how many significant digits we got right.
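
A tiny made-up example makes the distinction concrete (the numbers are chosen purely for illustration):

x_true = 1.0e6        # true value
x_comp = 1.0e6 + 0.5  # computed value
print(abs(x_comp - x_true))                # absolute error: 0.5
print(abs(x_comp - x_true) / abs(x_true))  # relative error: 5e-7, i.e. about 6 correct digits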

A key number is the machine epsilon, $\varepsilon$. It’s the distance between 1.0 and the next representable floating-point number. It tells you the best possible relative accuracy you can hope for with numbers around 1.0.

import numpy as np
 
# Get info about standard 64-bit floating point numbers
finfo = np.finfo(np.float64)
 
# The machine epsilon for 64-bit floats
print(f"Machine Epsilon: {finfo.eps}")
 
# The largest representable number
print(f"Max Value: {finfo.max}")
 
# The smallest positive representable number
print(f"Min Value (tiny): {finfo.tiny}")

The Dangers: Overflow, Underflow, and Cancellation

When calculations go wrong, it’s usually for one of three reasons.

  1. Overflow: The result is bigger than the largest number the machine can store (like finfo.max). The computer gives up and returns inf. This is almost always a disaster.
  2. Underflow: The result is smaller than the smallest positive number the machine can store. The computer usually just rounds it to zero. This is often harmless, but not always.
  3. Catastrophic Cancellation: This is the silent killer. It happens when you subtract two numbers that are almost equal.
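
The first two are easy to provoke deliberately (a quick illustration using the finfo limits from above; NumPy may also emit a warning for the overflow):

big = np.finfo(np.float64).max
print(big * 2)       # overflow: the result is inf
tiny = np.finfo(np.float64).tiny
print(tiny / 1e20)   # underflow: the result is rounded to 0.0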

Let’s focus on that last one. It’s the most subtle and the most dangerous.

Catastrophic Cancellation: The Enemy Within

Everyone knows the quadratic formula for $x^2 + a x + b = 0$:

$$x_{1,2} = \frac{-a \pm \sqrt{a^2 - 4b}}{2}$$

Let’s test it on a polynomial whose roots we know, like $p(x) = (x - c)\left(x - \tfrac{1}{c}\right) = x^2 - \left(c + \tfrac{1}{c}\right)x + 1$. The roots are exactly $c$ and $\tfrac{1}{c}$.

What happens if $c$ is very large, say $c = 10^8$? The roots are $10^8$ and $10^{-8}$. The coefficient $a = -\left(c + \tfrac{1}{c}\right)$ is a large negative number. The term $\sqrt{a^2 - 4b} = \sqrt{a^2 - 4}$ will be very, very close to $-a$.

So, one of the roots, $x_1 = \frac{-a - \sqrt{a^2 - 4b}}{2}$, involves subtracting two nearly identical numbers. Let’s see the damage.

# Code 1.1.7: Unstable quadratic root-finding
import numpy as np
import matplotlib.pyplot as plt
 
def zeros_quad_poly(a, b):
    D = a**2 - 4*b
    wD = np.sqrt(D)
    z1 = (-a - wD) / 2
    z2 = (-a + wD) / 2
    return np.array([z1, z2])
 
# Test for p(x) = (x-c)*(x-1/c)
c_vals = np.logspace(1, 8, 100)
a_vals = -(c_vals + 1/c_vals)
 
# Exact roots
x0_exact = 1/c_vals
x1_exact = c_vals
 
# Computed roots
roots = zeros_quad_poly(a_vals, 1.0)
z0_computed = roots[0] # Should be the small root
z1_computed = roots[1] # Should be the large root
 
# Plot relative errors
plt.loglog(c_vals, abs(z0_computed - x0_exact) / x0_exact, 'r+', label='Small Root (Unstable)')
plt.loglog(c_vals, abs(z1_computed - x1_exact) / x1_exact, 'b.', label='Large Root (Stable)')
plt.ylabel('Relative Error')
plt.xlabel('c')
plt.grid()
plt.legend()
plt.show()

The result is a catastrophe. The large root is found with beautiful precision. But the relative error in the small root explodes. We lose all our significant digits.

Catastrophic Cancellation

When you subtract two nearly equal numbers, the leading digits cancel out. Any small errors in the original numbers are then magnified enormously relative to the tiny result. It’s like trying to weigh a feather by weighing a truck with and without the feather on it: the tiny difference you’re looking for is completely lost in the noise of the large measurements.
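
A self-contained illustration (my own example, separate from the quadratic above): for small x, the quantity (1 - cos x)/x² should be close to 1/2, but the subtraction 1 - cos(x) wipes out every significant digit.

x = 1e-8
naive  = (1.0 - np.cos(x)) / x**2       # cos(x) rounds to exactly 1.0, so this gives 0.0
stable = 2.0 * np.sin(x / 2)**2 / x**2  # same quantity via the half-angle identity: ~0.5
print(naive, stable)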

Sidestepping the Traps

The good news is that we can often avoid these problems with a bit of cleverness.

A Stable Quadratic Solver

How can we fix our quadratic solver? The problem came from the ± sign. One choice led to a stable addition, the other to an unstable subtraction. The trick is to use a different piece of math: Vieta’s formulas. For a quadratic $x^2 + a x + b$ with roots $x_1$ and $x_2$, we know the product of the roots is $x_1 x_2 = b$.

So, here’s the stable plan:

  1. First, calculate the root that doesn’t involve cancellation. This is the one whose magnitude is larger: $x_1 = \frac{-a - \operatorname{sign}(a)\sqrt{a^2 - 4b}}{2}$.
  2. Then, find the second root using Vieta’s formula, which involves a stable division: $x_2 = b / x_1$.

This simple algebraic rearrangement completely avoids the subtraction of nearly equal numbers.
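
Here is a minimal sketch of that plan, written to mirror the unstable zeros_quad_poly above (it reuses a_vals and x0_exact from Code 1.1.7 and assumes real roots with a ≠ 0):

def zeros_quad_poly_stable(a, b):
    # Roots of x^2 + a*x + b = 0, computed without cancellation
    wD = np.sqrt(a**2 - 4*b)
    # Step 1: the larger-magnitude root; -a and -np.sign(a)*wD have the same sign,
    # so the two terms add rather than cancel
    z1 = (-a - np.sign(a) * wD) / 2
    # Step 2: Vieta's formula -- the product of the roots is b
    z2 = b / z1
    return np.array([z1, z2])

# Rerun the experiment: the small root is now accurate to machine precision
roots_stable = zeros_quad_poly_stable(a_vals, 1.0)
print(np.max(np.abs(roots_stable[1] - x0_exact) / x0_exact))  # tiny relative error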

The Complex Step Derivative: A Bit of Magic

Here’s an even more surprising trick. To approximate a derivative, instead of stepping a small distance $h$ along the real line, let’s step a distance $ih$ into the complex plane.

The Taylor series for $f(x + ih)$ is:

$$f(x + ih) = f(x) + ih\,f'(x) - \frac{h^2}{2} f''(x) - \frac{ih^3}{6} f'''(x) + \cdots$$

Now, just take the imaginary part of both sides and divide by $h$:

$$\frac{\operatorname{Im} f(x + ih)}{h} = f'(x) - \frac{h^2}{6} f'''(x) + \cdots$$

This gives us a new formula for the derivative:

$$f'(x) \approx \frac{\operatorname{Im} f(x + ih)}{h}$$

This formula is beautiful for two reasons. First, there is no subtraction, so it is completely immune to catastrophic cancellation. Second, the error is $O(h^2)$, making it much more accurate than the standard forward difference, whose error is only $O(h)$. (The only requirement is that $f$ can be evaluated at complex arguments.) It’s a perfect example of how a little more mathematical thinking can lead to a vastly superior algorithm.
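
A short sketch comparing the two approaches on a function whose derivative we know exactly (np.exp is used purely as an example, and the step sizes are my own choices):

def complex_step_derivative(f, x, h=1e-20):
    # f must accept complex arguments
    return np.imag(f(x + 1j * h)) / h

def forward_difference(f, x, h=1e-8):
    return (f(x + h) - f(x)) / h

x0 = 1.0
exact = np.exp(x0)  # d/dx exp(x) = exp(x)
print(abs(complex_step_derivative(np.exp, x0) - exact))  # essentially machine precision
print(abs(forward_difference(np.exp, x0) - exact))       # ~1e-8, limited by cancellation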

Computational Cost: The Ticking Clock

It’s not enough for an algorithm to be accurate; it also has to be fast. The computational cost is how we measure this. We’re interested in how the runtime scales as the size of the problem, $n$, gets large. This is the algorithm’s asymptotic complexity.

We use Big O notation for this. An algorithm that takes $O(n^2)$ steps will be about four times slower if you double the problem size. An $O(n^3)$ algorithm will be eight times slower. For large $n$, the difference is enormous.
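
You can see the scaling on your own machine with a rough experiment (timings are illustrative only; cache effects and multithreaded BLAS will blur the exact ratio):

import time

def time_matmul(n):
    # Dense matrix-matrix multiplication costs O(n^3) operations
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)
    t0 = time.perf_counter()
    A @ B
    return time.perf_counter() - t0

time_matmul(100)                 # warm-up call so library start-up costs are not timed
t_small, t_big = time_matmul(500), time_matmul(1000)
print(t_big / t_small)           # roughly 8: doubling n for an O(n^3) operation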

The NumPy Advantage: Vectorization

Python is an easy language to write, but it’s not fast. A for loop in Python can be a thousand times slower than the equivalent loop in C or Fortran. So how can we do high-performance computing?

The answer is vectorization. Libraries like NumPy are our secret weapon. When you perform an operation on a whole NumPy array, like y = np.sin(x), NumPy doesn’t run a Python loop. It calls a highly optimized, pre-compiled C or Fortran routine that operates on the data in a tight, fast loop.

The golden rule of scientific computing in Python is: avoid Python loops. If you can express your calculation in terms of operations on entire arrays, NumPy will make it fly.

# A slow Python loop
import math
a = list(range(1_000_000))
b = [0.0] * 1_000_000
# %timeit for i in range(len(a)): b[i] = math.sin(a[i])
# On my machine: ~200 ms
 
# The fast NumPy way
x = np.arange(1_000_000)
y = np.zeros_like(x, dtype=float)
# %timeit y = np.sin(x)
# On my machine: ~5 ms

Thinking in Matrices

Linear algebra is the language of scientific computing. But to be effective, we need to go beyond the textbook definitions and think about how matrix operations actually work.

Think Before You Compute

A blind translation of a formula into code is a recipe for inefficiency. The structure of your matrices is everything.

Example 1: Diagonal Matrices Suppose $D$ is an $n \times n$ diagonal matrix and $A$ is a full $n \times n$ matrix. How do we compute the product $DA$?

  • The Naive Way: C = D @ A. This uses a general matrix-matrix multiplication algorithm, which costs $O(n^3)$ operations.
  • The Smart Way: We know that left-multiplying by a diagonal matrix just scales the rows. The $i$-th row of the product is simply the $i$-th row of $A$ multiplied by the scalar $d_i$, the $i$-th diagonal entry of $D$. We can do this efficiently with NumPy’s broadcasting.
n = 1000
A = np.random.rand(n, n)
d = np.random.rand(n)
D = np.diag(d)
 
# The Naive Way: O(n^3)
# %timeit C1 = D @ A
 
# The Smart Way: O(n^2)
# %timeit C2 = d[:, np.newaxis] * A

The smart way is orders of magnitude faster because it uses $O(n^2)$ operations instead of $O(n^3)$. It respects the sparse structure of $D$.

Example 2: Rank-1 Matrices A matrix $A$ of rank 1 can always be written as an outer product of two vectors, $A = u v^{\mathsf T}$. How do we compute the product $A x = u v^{\mathsf T} x$ with a vector $x$?

  • The Naive Way: First, build the full $n \times n$ matrix $A = u v^{\mathsf T}$. Then, multiply it by $x$. This costs $O(n^2)$ operations to build the matrix and another $O(n^2)$ to multiply.
  • The Smart Way: Use associativity. The parentheses in $A x = u\,(v^{\mathsf T} x)$ make all the difference. First, compute the inner product $v^{\mathsf T} x$. This is a scalar and costs only $O(n)$ operations. Then, multiply the vector $u$ by this scalar, which costs another $O(n)$. The total cost is just $O(n)$, a spectacular improvement over $O(n^2)$; see the sketch below.
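
A minimal sketch of the two approaches (the variable names are my own):

n = 2000
u = np.random.rand(n)
v = np.random.rand(n)
x = np.random.rand(n)

# The Naive Way: build the n-by-n matrix, then multiply -- O(n^2) work and O(n^2) memory
y_naive = np.outer(u, v) @ x

# The Smart Way: inner product first, then scale the vector -- O(n) work
y_smart = u * (v @ x)

print(np.allclose(y_naive, y_smart))  # True, up to round-off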

The lesson is clear, and it’s the motto of this course: Think before you compute. Don’t just translate formulas. Understand the structure, exploit it, and let the mathematics guide you to a better, faster, and more stable algorithm.

Next: Chapter 2 - Polynomial Interpolation