Recap and Motivation: The Unanswered Question

In our previous sessions, we established a powerful framework for polynomial interpolation. We saw that the choice of basis and evaluation method is critical.

  • The Monomial Basis is numerically unstable.
  • The Newton Basis provides a stable method to find coefficients.
  • The Lagrange Basis leads to the elegant and highly efficient barycentric form for evaluation, costing only $O(n)$ per point after an initial $O(n^2)$ setup to compute the weights.

Most importantly, we confronted the Achilles’ heel of high-degree interpolation: Runge’s phenomenon. We discovered that the intuitive choice of equally spaced nodes is a trap, leading to wild oscillations. The error formula showed us the way out:

$$f(x) - p_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^{n} (x - x_i).$$

To minimize this error, we must choose nodes that minimize the maximum magnitude of the nodal polynomial $\omega_{n+1}(x) = \prod_{i=0}^{n} (x - x_i)$. We stated that the solution was to use Chebyshev nodes.

This leaves us with a crucial unanswered question: Why? What is so special about these specific points? Why do they have this magical error-minimizing property? To answer this, we must dive deep into the world of the functions that generate them: the Chebyshev polynomials. Understanding them will not only justify our choice of nodes but also reveal an entirely new, and even more powerful, way to perform interpolation.

Chebyshev Polynomials: The Optimal Polynomials

The Chebyshev polynomials are a sequence of orthogonal polynomials that are fundamental to numerical analysis, especially in the context of approximation theory. They are the key to understanding and taming the behavior of high-degree polynomials.

Definition and Properties

There are two kinds of Chebyshev polynomials, but we will focus on the “first kind.”

Definition: Chebyshev Polynomials of the First Kind

The $n$-th Chebyshev polynomial, $T_n(x)$, is defined on the interval $[-1, 1]$ by the formula:

$$T_n(x) = \cos\big(n \arccos(x)\big), \qquad n = 0, 1, 2, \ldots$$

This definition seems strange. How can a trigonometric formula produce a polynomial? Let’s see. By setting $x = \cos\theta$ (so $\theta = \arccos x$), the definition becomes $T_n(\cos\theta) = \cos(n\theta)$. Using trigonometric identities, we can see they are indeed polynomials in $x$:

$$T_0(x) = 1, \qquad T_1(x) = x, \qquad T_2(x) = 2x^2 - 1, \qquad T_3(x) = 4x^3 - 3x, \; \ldots$$

All $T_n$ are indeed polynomials of degree $n$. They can be generated efficiently using a three-term recurrence relation, derived from the trigonometric identity $\cos((n{+}1)\theta) + \cos((n{-}1)\theta) = 2\cos\theta\,\cos(n\theta)$:

$$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x), \qquad T_0(x) = 1, \quad T_1(x) = x.$$
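
As a quick numerical sanity check, here is a minimal NumPy sketch (the helper name `chebyshev_T` is ours) that generates $T_0, \ldots, T_n$ by the recurrence and compares the result against the trigonometric definition:

```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_0, ..., T_n at the points x via the three-term recurrence."""
    x = np.asarray(x, dtype=float)
    T = np.empty((n + 1,) + x.shape)
    T[0] = 1.0
    if n >= 1:
        T[1] = x
    for k in range(1, n):
        T[k + 1] = 2.0 * x * T[k] - T[k - 1]  # T_{k+1} = 2x T_k - T_{k-1}
    return T

# The recurrence must agree with T_n(x) = cos(n * arccos(x)) on [-1, 1]:
x = np.linspace(-1.0, 1.0, 101)
T = chebyshev_T(5, x)
assert np.allclose(T[5], np.cos(5 * np.arccos(x)))
```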

The Chebyshev polynomials have several crucial properties directly relevant to our interpolation problem:

  1. Boundedness: Because $T_n$ is defined via cosine, it is perfectly bounded: $|T_n(x)| \le 1$ for all $x \in [-1, 1]$. This “equi-oscillation” behavior is the exact opposite of the monomial $x^n$, which is tiny near $x = 0$ and explodes near $x = \pm 1$. This property is the secret to their stability.
  2. Extrema (Points of max/min oscillation): The points where $T_n$ reaches its maximum and minimum values of $\pm 1$ are: $x_k = \cos(k\pi/n)$, $k = 0, 1, \ldots, n$.
  3. Roots (The Chebyshev Nodes): The roots of $T_n$ are the points where it crosses the x-axis. They are given by: $x_k = \cos\!\left(\frac{(2k+1)\pi}{2n}\right)$, $k = 0, 1, \ldots, n-1$. These are precisely the Chebyshev nodes we were looking for (see the sketch right after this list)!
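
A small sketch of the root formula (the helper name `chebyshev_nodes` is ours): compute the roots of $T_8$, observe the clustering toward the endpoints, and confirm they satisfy the trigonometric definition.

```python
import numpy as np

def chebyshev_nodes(n):
    """The n roots of T_n on [-1, 1]: x_k = cos((2k+1)π / (2n))."""
    k = np.arange(n)
    return np.cos((2 * k + 1) * np.pi / (2 * n))

nodes = chebyshev_nodes(8)
print(np.sort(nodes))  # note the clustering near -1 and +1
# All of them are roots of T_8(x) = cos(8 * arccos(x)):
assert np.allclose(np.cos(8 * np.arccos(nodes)), 0.0, atol=1e-12)
```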

The “Minimax” Property and Orthogonality: Two Solutions in One

Chebyshev polynomials offer two profound solutions to the challenges of interpolation.

1. The Minimax Property: The Best Nodes for Interpolation

Let’s return to our main goal: minimizing the nodal polynomial term in the error formula, $\omega_{n+1}(x) = \prod_{i=0}^{n}(x - x_i)$. We want to choose the nodes $x_i$ to make the peak value of this polynomial as small as possible across the interval. This is a classic engineering optimization problem: minimize the maximum error (a “minimax” problem).

The solution is provided by the Chebyshev polynomials.

Theorem 2.4.14 (The Minimax Property)

Among all monic polynomials of degree $n$ (polynomials where the leading coefficient, the coefficient of $x^n$, is 1), the polynomial $2^{1-n}\,T_n(x)$ has the smallest maximum magnitude on the interval $[-1, 1]$, namely $2^{1-n}$.

The practical consequence: To minimize the maximum value of the nodal polynomial $\omega_{n+1}(x) = \prod_{i=0}^{n}(x - x_i)$, you must choose the $n+1$ nodes $x_0, \ldots, x_n$ to be the roots of the next Chebyshev polynomial, $T_{n+1}$.

This theorem is the formal reason why Chebyshev nodes defeat Runge’s phenomenon. By clustering near the endpoints, they “pin down” the nodal polynomial where it would otherwise explode, forcing its oscillations to have a uniform, minimal amplitude across the entire interval. This answers the question from our last lecture. Chebyshev nodes are the optimal placement of sample points to minimize the worst-case interpolation error, regardless of which basis (Newton, Lagrange) you use.
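
The following sketch (our own choice of $n$ and grid resolution) makes the theorem concrete: it compares the peak of the nodal polynomial for equispaced versus Chebyshev nodes, and checks the Chebyshev peak against the theoretical optimum $2^{1-n}$.

```python
import numpy as np

def max_nodal_poly(nodes, num_samples=10_001):
    """Peak of |prod_i (x - x_i)| on a fine grid over [-1, 1]."""
    x = np.linspace(-1.0, 1.0, num_samples)
    omega = np.prod(x[:, None] - nodes[None, :], axis=1)
    return np.abs(omega).max()

n = 11  # number of nodes
equi = np.linspace(-1.0, 1.0, n)
cheb = np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))  # roots of T_11

print(f"equispaced nodes: {max_nodal_poly(equi):.3e}")
print(f"Chebyshev nodes : {max_nodal_poly(cheb):.3e}")
print(f"optimum 2**(1-n): {2.0 ** (1 - n):.3e}")
```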

2. Orthogonality: The Best Basis for Interpolation

We’ve just seen that Chebyshev nodes are the best points to use with our existing Lagrange/Barycentric method. But the Chebyshev polynomials are so powerful, they invite a new question: what if we use them not just to find nodes, but as the basis functions themselves?

Instead of writing $p(x) = a_0 + a_1 x + \cdots + a_n x^n$, let’s try to write:

$$p(x) = c_0\,T_0(x) + c_1\,T_1(x) + \cdots + c_n\,T_n(x).$$

Why would we do this? Because the Chebyshev polynomials form an orthogonal basis. This is a concept from linear algebra that makes finding the coefficients incredibly easy and efficient.

Discrete Orthogonality of Chebyshev Polynomials

When evaluated at the Chebyshev nodes $x_k$ (the $n+1$ roots of $T_{n+1}$), the Chebyshev polynomials are orthogonal. This means the “dot product” (a discrete sum) of any two different Chebyshev polynomials is zero:

$$\sum_{k=0}^{n} T_i(x_k)\, T_j(x_k) = 0 \qquad \text{for } i \neq j, \quad 0 \le i, j \le n,$$

while for $i = j$ the sum is $n+1$ (when $i = 0$) or $\tfrac{n+1}{2}$ (when $1 \le i \le n$).
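
A direct numerical check of this discrete orthogonality (sizes and tolerance are our choices): build the matrix $T_i(x_k)$ at the roots of $T_{n+1}$ and inspect its Gram matrix.

```python
import numpy as np

n = 7
# The n+1 roots of T_{n+1}:
nodes = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))
# V[i, k] = T_i(x_k), using the trigonometric definition:
V = np.cos(np.outer(np.arange(n + 1), np.arccos(nodes)))
G = V @ V.T  # G[i, j] = sum_k T_i(x_k) T_j(x_k)

# Off-diagonal "dot products" vanish; the diagonal is n+1 (i = 0) or (n+1)/2:
expected = np.diag([n + 1.0] + [(n + 1) / 2.0] * n)
assert np.allclose(G, expected, atol=1e-10)
```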

Why should you care about this? Because it completely eliminates the need to solve a system of equations to find the coefficients $c_j$.

To find a specific coefficient, say $c_m$, we can use a trick. Take the “dot product” of our polynomial with $T_m$:

$$\sum_{k=0}^{n} p(x_k)\, T_m(x_k) = \sum_{j=0}^{n} c_j \left( \sum_{k=0}^{n} T_j(x_k)\, T_m(x_k) \right).$$

Because of orthogonality, every term on the right side becomes zero except for one:

$$\sum_{k=0}^{n} p(x_k)\, T_m(x_k) = c_m \sum_{k=0}^{n} T_m(x_k)^2.$$

Solving for our coefficient is now trivial. For $m \ge 1$ the remaining sum equals $\tfrac{n+1}{2}$, so:

$$c_m = \frac{2}{n+1} \sum_{k=0}^{n} p(x_k)\, T_m(x_k).$$

Since we are interpolating, $p(x_k) = f(x_k)$, so the formula is simply:

$$c_m = \frac{2}{n+1} \sum_{k=0}^{n} f(x_k)\, T_m(x_k) \quad (m \ge 1), \qquad c_0 = \frac{1}{n+1} \sum_{k=0}^{n} f(x_k).$$
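
Putting the formula to work on Runge’s function (the test function and degree are our choices; NumPy’s `Chebyshev` class is used only to evaluate the resulting series):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)  # Runge's function

n = 20
k = np.arange(n + 1)
xk = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))  # roots of T_{n+1}
fk = f(xk)

# c_m = (2/(n+1)) * sum_k f(x_k) T_m(x_k), with half that factor for m = 0:
T = np.cos(np.outer(np.arange(n + 1), np.arccos(xk)))  # T[m, k] = T_m(x_k)
c = (2.0 / (n + 1)) * (T @ fk)
c[0] /= 2.0

# The resulting Chebyshev series interpolates f exactly at the nodes:
p = np.polynomial.chebyshev.Chebyshev(c)
assert np.allclose(p(xk), fk)
```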

The Engineering Payoff of Orthogonality

  • No Linear System to Solve: Orthogonality lets us compute each coefficient independently, bypassing the $O(n^2)$ work of Newton’s method or the $O(n^3)$ of a general linear solve.
  • Robustness: Errors in computing one coefficient do not affect the others.
  • Blazing Speed: This summation is not just any sum. It is a special type called a Discrete Cosine Transform (DCT), a close relative of the Fast Fourier Transform (FFT). It can be computed in only $O(n \log n)$ operations. This is asymptotically the fastest known way to find the coefficients of an interpolating polynomial (see the sketch after this list).
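
Since the coefficient sum is a DCT, we can hand it to a fast transform. A sketch using SciPy’s `scipy.fft.dct` (the scaling below is our reconciliation of SciPy’s DCT-II convention with the coefficient formula above, verified against the direct $O(n^2)$ sum):

```python
import numpy as np
from scipy.fft import dct

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
n = 20
k = np.arange(n + 1)
xk = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))
fk = f(xk)

# SciPy's DCT-II computes y_m = 2 * sum_k fk[k] * cos(pi * m * (2k+1) / (2N)),
# which is exactly our coefficient sum; only the scaling differs.
c_fast = dct(fk, type=2) / (n + 1)
c_fast[0] /= 2.0

# Direct O(n^2) evaluation of the same formula, for comparison:
T = np.cos(np.outer(np.arange(n + 1), np.arccos(xk)))
c_slow = (2.0 / (n + 1)) * (T @ fk)
c_slow[0] /= 2.0
assert np.allclose(c_fast, c_slow)
```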

Evaluating in the Chebyshev Basis: Clenshaw’s Algorithm

Once we have the coefficients in $O(n \log n)$ time, we need an efficient way to evaluate $p(x)$. The proper tool is Clenshaw’s Algorithm, a fast ($O(n)$) and numerically stable recurrence, analogous to Horner’s method for the monomial basis.

Clenshaw's Algorithm

To evaluate $p(x) = \sum_{j=0}^{n} c_j\,T_j(x)$ at a point $x$:

  1. Initialize: $b_{n+2} = 0$, $b_{n+1} = 0$.
  2. Iterate backwards: For $k = n, n-1, \ldots, 1$, compute: $b_k = c_k + 2x\,b_{k+1} - b_{k+2}$.
  3. The result is: $p(x) = c_0 + x\,b_1 - b_2$. This stable algorithm is what’s used in professional software libraries to evaluate functions represented in a Chebyshev series; a Python sketch follows this list.
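
A minimal Python sketch of Clenshaw’s recurrence (checked against NumPy’s reference evaluator `numpy.polynomial.chebyshev.chebval`):

```python
import numpy as np

def clenshaw(c, x):
    """Evaluate p(x) = sum_j c[j] * T_j(x) by Clenshaw's backward recurrence."""
    x = np.asarray(x, dtype=float)
    b1 = np.zeros_like(x)  # plays the role of b_{k+1}
    b2 = np.zeros_like(x)  # plays the role of b_{k+2}
    for k in range(len(c) - 1, 0, -1):  # k = n, n-1, ..., 1
        b1, b2 = c[k] + 2.0 * x * b1 - b2, b1
    return c[0] + x * b1 - b2

rng = np.random.default_rng(0)
c = rng.standard_normal(12)
x = np.linspace(-1.0, 1.0, 5)
assert np.allclose(clenshaw(c, x), np.polynomial.chebyshev.chebval(x, c))
```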

Beyond Polynomials: Trigonometric Interpolation

The definition of Chebyshev polynomials, $T_n(x) = \cos(n \arccos x)$, was not an accident. It reveals a deep connection between optimal polynomials and trigonometry. This leads to a natural question: what if the function we are trying to model is inherently periodic or cyclical?

Consider phenomena like:

  • The vibration of a guitar string.
  • The voltage in an AC power line.
  • Seasonal temperature fluctuations.

For these problems, polynomials are a poor choice. A non-constant polynomial will always fly off to $\pm\infty$, which fundamentally mismatches the bounded, repeating nature of the signal. We need a basis that is born from oscillations. This brings us to the revolutionary idea of Joseph Fourier.

The Fourier Idea: Decomposing Functions into Simple Waves

Fourier’s profound insight was that any reasonably well-behaved periodic function can be perfectly described as a sum of simple sine and cosine waves of different frequencies and amplitudes. This is like decomposing a musical chord into its individual notes.

This is a fundamentally different approach from a Taylor series.

  • A Taylor series is a local approximation, accurate near a single point.
  • A Fourier series is a global approximation, describing the function’s behavior over an entire interval by its frequency content.

The Mathematical Framework: The Space $L^2$ and the Fourier Basis

To make this precise, we work in the space of “finite energy” signals, $L^2$. This space is powerful because it includes functions with jumps and corners, which are common in real-world signals.

The Space $L^2([0,1))$: The Space of Finite Energy Signals

The space $L^2([0,1))$ consists of all complex-valued functions on the interval $[0, 1)$ for which the total “energy” is finite:

$$\int_0^1 |f(t)|^2 \, dt < \infty.$$

This space has a natural inner product:

$$\langle f, g \rangle = \int_0^1 f(t)\, \overline{g(t)}\, dt.$$

Our building blocks are the Fourier basis functions, which are complex exponentials that elegantly combine sines and cosines:

$$\varphi_k(t) = e^{2\pi i k t} = \cos(2\pi k t) + i\,\sin(2\pi k t), \qquad k \in \mathbb{Z}.$$

Here, the integer $k$ is the frequency. These functions form a complete orthonormal basis for the space. This means any function in this space can be written as an infinite sum (a Fourier series):

$$f(t) = \sum_{k=-\infty}^{\infty} \hat{f}_k \, e^{2\pi i k t}.$$

The coefficients $\hat{f}_k$ are the Fourier coefficients. Thanks to orthogonality, we find them with an inner product:

$$\hat{f}_k = \langle f, \varphi_k \rangle = \int_0^1 f(t)\, e^{-2\pi i k t}\, dt.$$
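
A numerical sketch of this coefficient formula (midpoint-rule quadrature; the helper name `fourier_coeff` is ours): for $f(t) = \sin(2\pi t)$ the only nonzero coefficients are $\hat{f}_{\pm 1} = \mp i/2$.

```python
import numpy as np

def fourier_coeff(f, k, num=4096):
    """Approximate f_hat_k = integral_0^1 f(t) e^{-2pi i k t} dt (midpoint rule)."""
    t = (np.arange(num) + 0.5) / num
    return np.mean(f(t) * np.exp(-2j * np.pi * k * t))

f = lambda t: np.sin(2 * np.pi * t)  # = (e^{2pi i t} - e^{-2pi i t}) / (2i)
assert abs(fourier_coeff(f, 1) - (-0.5j)) < 1e-12
assert abs(fourier_coeff(f, -1) - 0.5j) < 1e-12
assert abs(fourier_coeff(f, 2)) < 1e-12
```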

The Key Insight: Smoothness and the Decay of Fourier Coefficients

The power of this decomposition lies in the deep connection between a function’s visual “smoothness” and how quickly its Fourier coefficients $\hat{f}_k$ shrink to zero for high frequencies $|k| \to \infty$.

The plots on the slide illustrate this fundamental principle:

  • Red (Ramp Function): Discontinuous. Needs many high-frequency waves to create the sharp jump. Coefficients decay slowly, like $O(1/|k|)$.
  • Green (Hat Function): Continuous but has a sharp corner. Smoother. Coefficients decay faster, like $O(1/k^2)$.
  • Blue (Smooth Function): Infinitely smooth. No sharp features. Coefficients decay exponentially fast. The function can be accurately represented with just a few low-frequency terms (the sketch after this list reproduces these rates numerically).
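
We can reproduce these decay rates numerically (our own function choices; the FFT of equidistant samples approximates the Fourier coefficients up to aliasing):

```python
import numpy as np

t = np.arange(4096) / 4096
signals = {
    "ramp (discontinuous)": t,                        # jumps at the period boundary
    "hat (corner)":         1.0 - np.abs(2 * t - 1),  # continuous, kinked
    "smooth":               np.exp(np.sin(2 * np.pi * t)),
}
for name, y in signals.items():
    c = np.fft.fft(y) / len(y)  # approximate Fourier coefficients
    mags = ", ".join(f"|c_{k}| = {abs(c[k]):.1e}" for k in (3, 9, 27))
    print(f"{name:22s} {mags}")
# ramp: each 3x in k costs ~3x in magnitude; hat: ~9x; smooth: drops off a cliff.
```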

Theorem 3.1.16: Smoothness vs. Decay Rate

The core theorem states that if a function (viewed as periodic) has $p$ continuous derivatives, its Fourier coefficients decay at least as fast as $|\hat{f}_k| = O(|k|^{-p})$.

This is the theoretical foundation for lossy data compression (like JPEG and MP3). A smooth signal can be accurately represented by storing only its first few significant Fourier coefficients.

Furthermore, differentiation in the time domain becomes simple multiplication in the frequency domain:

$$\widehat{(f')}_k = 2\pi i k \, \hat{f}_k.$$

This transforms calculus into algebra, a cornerstone of modern numerical methods.
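
A sketch of this “calculus becomes algebra” principle via the FFT (the test function is our choice; frequencies come from `np.fft.fftfreq`): differentiate by multiplying each coefficient by $2\pi i k$.

```python
import numpy as np

N = 64
t = np.arange(N) / N
f = np.exp(np.sin(2 * np.pi * t))  # smooth and 1-periodic

k = np.fft.fftfreq(N, d=1.0 / N)   # integer frequencies 0, 1, ..., N/2-1, -N/2, ..., -1
df = np.real(np.fft.ifft(2j * np.pi * k * np.fft.fft(f)))

# Compare with the exact derivative of exp(sin(2πt)):
exact = 2 * np.pi * np.cos(2 * np.pi * t) * f
assert np.allclose(df, exact, atol=1e-8)
```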

The Gibbs Phenomenon

When we approximate a discontinuous function (like a square wave) with a finite number of Fourier terms, an artifact appears.

The approximation overshoots the true value at the jump. This is the Gibbs phenomenon. The overshoot’s height is a universal constant (about 9% of the jump) and does not disappear as we add more terms; it just gets squeezed into a narrower region. It’s the price we pay for approximating a sharp edge with smooth waves.
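
A sketch that measures the overshoot (the square wave and grid resolution are our choices; the sine series below is its classical Fourier expansion):

```python
import numpy as np

# Square wave sign(sin(2πt)) has the Fourier series (4/π) Σ_{odd m} sin(2πmt)/m.
t = np.linspace(0.0, 0.5, 50_001)  # fine grid near the jump at t = 0
for N in (16, 64, 256):
    m = np.arange(1, N + 1, 2)  # odd frequencies up to N
    s = (4 / np.pi) * np.sin(2 * np.pi * np.outer(t, m)) @ (1.0 / m)
    overshoot = (s.max() - 1.0) / 2.0  # excess, relative to the jump of size 2
    print(f"N = {N:4d}: overshoot = {overshoot:.4f}")  # stays near 0.0895
```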

From Continuous to Discrete: The Discrete Fourier Transform (DFT)

In practice, we have discrete samples, not a continuous function. For trigonometric interpolation, we take $N$ samples at the equidistant points $t_j = j/N$, $j = 0, 1, \ldots, N-1$.

Equidistant Points: Bad for Polynomials, Perfect for Fourier

This is a critical distinction. We just learned that equidistant points are terrible for high-degree polynomial interpolation due to Runge’s phenomenon. However, for trigonometric interpolation, they are the perfect choice because they preserve the orthogonality of the sampled sine and cosine waves.

We approximate the continuous Fourier integral with a discrete sum, which gives us the Discrete Fourier Transform (DFT).

The Discrete Fourier Transform (DFT)

Given $N$ data points $y_0, y_1, \ldots, y_{N-1}$, the DFT computes $N$ discrete Fourier coefficients $c_k$:

$$c_k = \frac{1}{N} \sum_{j=0}^{N-1} y_j\, e^{-2\pi i jk/N}, \qquad k = 0, 1, \ldots, N-1.$$

The trigonometric interpolant is then the finite sum using these coefficients:

$$p(t) = \sum_{k=0}^{N-1} c_k\, e^{2\pi i k t}.$$

This trigonometric polynomial is the unique function from its class that passes exactly through all the data points: $p(t_j) = y_j$ for all $j$.

A naive implementation of the DFT costs $O(N^2)$ operations. However, the revolutionary Fast Fourier Transform (FFT) algorithm computes the exact same result in an incredible $O(N \log N)$ time. This efficiency is what makes all of modern digital signal processing possible.
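
Here is a compact sketch using NumPy’s FFT (which implements the fast algorithm); `trig_interp` is our own helper, using the centered frequencies $-N/2, \ldots, N/2 - 1$ so the interpolant stays well-behaved between the samples:

```python
import numpy as np

# Sample a 1-periodic signal at N equidistant points t_j = j/N:
N = 16
j = np.arange(N)
y = np.exp(np.sin(2 * np.pi * j / N))

c = np.fft.fft(y) / N  # discrete Fourier coefficients, O(N log N)

def trig_interp(t):
    """Evaluate the trigonometric interpolant at arbitrary points t."""
    k = np.fft.fftfreq(N, d=1.0 / N)  # frequencies 0, 1, ..., N/2-1, -N/2, ..., -1
    return np.real(np.exp(2j * np.pi * np.outer(t, k)) @ c)

# It passes exactly through all the data points:
assert np.allclose(trig_interp(j / N), y)
```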

Continue here: 05 Discrete Fourier Transform