Recap and Motivation: The Unanswered Question

In our previous sessions, we established a powerful framework for polynomial interpolation. We saw that the choice of basis and evaluation method is critical.

  • The Monomial Basis is numerically unstable.
  • The Newton Basis provides a stable method to find coefficients.
  • The Lagrange Basis leads to the elegant and highly efficient barycentric form for evaluation, costing only $O(n)$ per point after an initial $O(n^2)$ setup to compute the weights.

Most importantly, we confronted the Achilles’ heel of high-degree interpolation: Runge’s phenomenon. We discovered that the intuitive choice of equally spaced nodes is a trap, leading to wild oscillations. The error formula showed us the way out:

$$f(x) - p_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{i=0}^{n} (x - x_i).$$

To minimize this error, we must choose nodes that minimize the maximum magnitude of the nodal polynomial $\omega_{n+1}(x) = \prod_{i=0}^{n} (x - x_i)$. We stated that the solution was to use Chebyshev nodes.

This leaves us with a crucial unanswered question: Why? What is so special about these specific points? Why do they have this magical error-minimizing property? To answer this, we must dive deep into the world of the functions that generate them: the Chebyshev polynomials. Understanding them will not only justify our choice of nodes but also reveal an entirely new, and even more powerful, way to perform interpolation.

Chebyshev Polynomials: The Optimal Polynomials

The Chebyshev polynomials are a sequence of orthogonal polynomials that are fundamental to numerical analysis, especially in the context of approximation theory. They are the key to understanding and taming the behavior of high-degree polynomials.

Definition and Properties

There are two kinds of Chebyshev polynomials, but we will focus on the “first kind.”

Definition: Chebyshev Polynomials of the First Kind

The $n$-th Chebyshev polynomial, $T_n(x)$, is defined on the interval $[-1, 1]$ by the formula:

$$T_n(x) = \cos\big(n \arccos(x)\big), \qquad n = 0, 1, 2, \ldots$$

This definition seems strange. How can a trigonometric formula produce a polynomial? Let’s see. By setting $x = \cos\theta$ (so $\theta = \arccos x$), the definition becomes $T_n(\cos\theta) = \cos(n\theta)$. Using trigonometric identities, we can see they are indeed polynomials in $x$:

$$T_0(x) = 1, \qquad T_1(x) = x, \qquad T_2(x) = 2x^2 - 1, \qquad T_3(x) = 4x^3 - 3x, \; \ldots$$

All $T_n$ are indeed polynomials of degree $n$. They can be generated efficiently using a three-term recurrence relation, derived from the trigonometric identity $\cos((n{+}1)\theta) + \cos((n{-}1)\theta) = 2\cos\theta\,\cos(n\theta)$:

$$T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x), \qquad T_0(x) = 1, \quad T_1(x) = x.$$
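
As a quick numerical sanity check, here is a minimal NumPy sketch (the helper name `chebyshev_T` is ours) that generates $T_0, \ldots, T_n$ by the recurrence and compares the result against the trigonometric definition:

```python
import numpy as np

def chebyshev_T(n, x):
    """Evaluate T_0, ..., T_n at the points x via the three-term recurrence."""
    x = np.asarray(x, dtype=float)
    T = np.empty((n + 1,) + x.shape)
    T[0] = 1.0
    if n >= 1:
        T[1] = x
    for k in range(1, n):
        T[k + 1] = 2.0 * x * T[k] - T[k - 1]  # T_{k+1} = 2x T_k - T_{k-1}
    return T

# The recurrence must agree with T_n(x) = cos(n * arccos(x)) on [-1, 1]:
x = np.linspace(-1.0, 1.0, 101)
T = chebyshev_T(5, x)
assert np.allclose(T[5], np.cos(5 * np.arccos(x)))
```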

The Chebyshev polynomials have several crucial properties directly relevant to our interpolation problem:

  1. Boundedness: Because $T_n$ is defined via cosine, it is perfectly bounded: $|T_n(x)| \le 1$ for all $x \in [-1, 1]$. This “equi-oscillation” behavior is the exact opposite of the monomial $x^n$, which is tiny near $x = 0$ and explodes near $x = \pm 1$. This property is the secret to their stability.
  2. Extrema (Points of max/min oscillation): The points where $T_n$ reaches its maximum and minimum values of $\pm 1$ are: $x_k = \cos(k\pi/n)$, $k = 0, 1, \ldots, n$.
  3. Roots (The Chebyshev Nodes): The roots of $T_n$ are the points where it crosses the x-axis. They are given by: $x_k = \cos\!\left(\frac{(2k+1)\pi}{2n}\right)$, $k = 0, 1, \ldots, n-1$. These are precisely the Chebyshev nodes we were looking for (see the sketch right after this list)!
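
A small sketch of the root formula (the helper name `chebyshev_nodes` is ours): compute the roots of $T_8$, observe the clustering toward the endpoints, and confirm they satisfy the trigonometric definition.

```python
import numpy as np

def chebyshev_nodes(n):
    """The n roots of T_n on [-1, 1]: x_k = cos((2k+1)π / (2n))."""
    k = np.arange(n)
    return np.cos((2 * k + 1) * np.pi / (2 * n))

nodes = chebyshev_nodes(8)
print(np.sort(nodes))  # note the clustering near -1 and +1
# All of them are roots of T_8(x) = cos(8 * arccos(x)):
assert np.allclose(np.cos(8 * np.arccos(nodes)), 0.0, atol=1e-12)
```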

The “Minimax” Property and Orthogonality: Two Solutions in One

Chebyshev polynomials offer two profound solutions to the challenges of interpolation.

1. The Minimax Property: The Best Nodes for Interpolation

Let’s return to our main goal: minimizing the nodal polynomial term in the error formula, $\omega_{n+1}(x) = \prod_{i=0}^{n}(x - x_i)$. We want to choose the nodes $x_i$ to make the peak value of this polynomial as small as possible across the interval. This is a classic engineering optimization problem: minimize the maximum error (a “minimax” problem).

The solution is provided by the Chebyshev polynomials.

Theorem 2.4.14 (The Minimax Property)

Among all monic polynomials of degree $n$ (polynomials where the leading coefficient, the coefficient of $x^n$, is 1), the polynomial $2^{1-n}\,T_n(x)$ has the smallest maximum magnitude on the interval $[-1, 1]$, namely $2^{1-n}$.

The practical consequence: To minimize the maximum value of the nodal polynomial $\omega_{n+1}(x) = \prod_{i=0}^{n}(x - x_i)$, you must choose the $n+1$ nodes $x_0, \ldots, x_n$ to be the roots of the next Chebyshev polynomial, $T_{n+1}$.

This theorem is the formal reason why Chebyshev nodes defeat Runge’s phenomenon. By clustering near the endpoints, they “pin down” the nodal polynomial where it would otherwise explode, forcing its oscillations to have a uniform, minimal amplitude across the entire interval. This answers the question from our last lecture. Chebyshev nodes are the optimal placement of sample points to minimize the worst-case interpolation error, regardless of which basis (Newton, Lagrange) you use.
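
The following sketch (our own choice of $n$ and grid resolution) makes the theorem concrete: it compares the peak of the nodal polynomial for equispaced versus Chebyshev nodes, and checks the Chebyshev peak against the theoretical optimum $2^{1-n}$.

```python
import numpy as np

def max_nodal_poly(nodes, num_samples=10_001):
    """Peak of |prod_i (x - x_i)| on a fine grid over [-1, 1]."""
    x = np.linspace(-1.0, 1.0, num_samples)
    omega = np.prod(x[:, None] - nodes[None, :], axis=1)
    return np.abs(omega).max()

n = 11  # number of nodes
equi = np.linspace(-1.0, 1.0, n)
cheb = np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))  # roots of T_11

print(f"equispaced nodes: {max_nodal_poly(equi):.3e}")
print(f"Chebyshev nodes : {max_nodal_poly(cheb):.3e}")
print(f"optimum 2**(1-n): {2.0 ** (1 - n):.3e}")
```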

2. Orthogonality: The Best Basis for Interpolation

We’ve just seen that Chebyshev nodes are the best points to use with our existing Lagrange/Barycentric method. But the Chebyshev polynomials are so powerful, they invite a new question: what if we use them not just to find nodes, but as the basis functions themselves?

Instead of writing $p(x) = a_0 + a_1 x + \cdots + a_n x^n$, let’s try to write:

$$p(x) = c_0\,T_0(x) + c_1\,T_1(x) + \cdots + c_n\,T_n(x).$$

Why would we do this? Because the Chebyshev polynomials form an orthogonal basis. This is a concept from linear algebra that makes finding the coefficients incredibly easy and efficient.

Discrete Orthogonality of Chebyshev Polynomials

When evaluated at the Chebyshev nodes $x_k$ (the $n+1$ roots of $T_{n+1}$), the Chebyshev polynomials are orthogonal. This means the “dot product” (a discrete sum) of any two different Chebyshev polynomials is zero:

$$\sum_{k=0}^{n} T_i(x_k)\, T_j(x_k) = 0 \qquad \text{for } i \neq j, \quad 0 \le i, j \le n,$$

while for $i = j$ the sum is $n+1$ (when $i = 0$) or $\tfrac{n+1}{2}$ (when $1 \le i \le n$).
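
A direct numerical check of this discrete orthogonality (sizes and tolerance are our choices): build the matrix $T_i(x_k)$ at the roots of $T_{n+1}$ and inspect its Gram matrix.

```python
import numpy as np

n = 7
# The n+1 roots of T_{n+1}:
nodes = np.cos((2 * np.arange(n + 1) + 1) * np.pi / (2 * (n + 1)))
# V[i, k] = T_i(x_k), using the trigonometric definition:
V = np.cos(np.outer(np.arange(n + 1), np.arccos(nodes)))
G = V @ V.T  # G[i, j] = sum_k T_i(x_k) T_j(x_k)

# Off-diagonal "dot products" vanish; the diagonal is n+1 (i = 0) or (n+1)/2:
expected = np.diag([n + 1.0] + [(n + 1) / 2.0] * n)
assert np.allclose(G, expected, atol=1e-10)
```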

Why should you care about this? Because it completely eliminates the need to solve a system of equations to find the coefficients $c_j$.

To find a specific coefficient, say $c_m$, we can use a trick. Take the “dot product” of our polynomial with $T_m$:

$$\sum_{k=0}^{n} p(x_k)\, T_m(x_k) = \sum_{j=0}^{n} c_j \left( \sum_{k=0}^{n} T_j(x_k)\, T_m(x_k) \right).$$

Because of orthogonality, every term on the right side becomes zero except for one:

$$\sum_{k=0}^{n} p(x_k)\, T_m(x_k) = c_m \sum_{k=0}^{n} T_m(x_k)^2.$$

Solving for our coefficient is now trivial. For $m \ge 1$ the remaining sum equals $\tfrac{n+1}{2}$, so:

$$c_m = \frac{2}{n+1} \sum_{k=0}^{n} p(x_k)\, T_m(x_k).$$

Since we are interpolating, $p(x_k) = f(x_k)$, so the formula is simply:

$$c_m = \frac{2}{n+1} \sum_{k=0}^{n} f(x_k)\, T_m(x_k) \quad (m \ge 1), \qquad c_0 = \frac{1}{n+1} \sum_{k=0}^{n} f(x_k).$$
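
Putting the formula to work on Runge’s function (the test function and degree are our choices; NumPy’s `Chebyshev` class is used only to evaluate the resulting series):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)  # Runge's function

n = 20
k = np.arange(n + 1)
xk = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))  # roots of T_{n+1}
fk = f(xk)

# c_m = (2/(n+1)) * sum_k f(x_k) T_m(x_k), with half that factor for m = 0:
T = np.cos(np.outer(np.arange(n + 1), np.arccos(xk)))  # T[m, k] = T_m(x_k)
c = (2.0 / (n + 1)) * (T @ fk)
c[0] /= 2.0

# The resulting Chebyshev series interpolates f exactly at the nodes:
p = np.polynomial.chebyshev.Chebyshev(c)
assert np.allclose(p(xk), fk)
```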

The Engineering Payoff of Orthogonality

  • No Linear System to Solve: Orthogonality lets us compute each coefficient independently, bypassing the $O(n^2)$ work of Newton’s method or the $O(n^3)$ of a general linear solve.
  • Robustness: Errors in computing one coefficient do not affect the others.
  • Blazing Speed: This summation is not just any sum. It is a special type called a Discrete Cosine Transform (DCT), a close relative of the Fast Fourier Transform (FFT). It can be computed in only $O(n \log n)$ operations. This is asymptotically the fastest known way to find the coefficients of an interpolating polynomial (see the sketch after this list).
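
Since the coefficient sum is a DCT, we can hand it to a fast transform. A sketch using SciPy’s `scipy.fft.dct` (the scaling below is our reconciliation of SciPy’s DCT-II convention with the coefficient formula above, verified against the direct $O(n^2)$ sum):

```python
import numpy as np
from scipy.fft import dct

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)
n = 20
k = np.arange(n + 1)
xk = np.cos((2 * k + 1) * np.pi / (2 * (n + 1)))
fk = f(xk)

# SciPy's DCT-II computes y_m = 2 * sum_k fk[k] * cos(pi * m * (2k+1) / (2N)),
# which is exactly our coefficient sum; only the scaling differs.
c_fast = dct(fk, type=2) / (n + 1)
c_fast[0] /= 2.0

# Direct O(n^2) evaluation of the same formula, for comparison:
T = np.cos(np.outer(np.arange(n + 1), np.arccos(xk)))
c_slow = (2.0 / (n + 1)) * (T @ fk)
c_slow[0] /= 2.0
assert np.allclose(c_fast, c_slow)
```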

Evaluating in the Chebyshev Basis: Clenshaw’s Algorithm

Once we have the coefficients in $O(n \log n)$ time, we need an efficient way to evaluate $p(x)$. The proper tool is Clenshaw’s Algorithm, a fast ($O(n)$) and numerically stable recurrence, analogous to Horner’s method for the monomial basis.

Clenshaw's Algorithm

To evaluate $p(x) = \sum_{j=0}^{n} c_j\,T_j(x)$ at a point $x$:

  1. Initialize: $b_{n+2} = 0$, $b_{n+1} = 0$.
  2. Iterate backwards: For $k = n, n-1, \ldots, 1$, compute: $b_k = c_k + 2x\,b_{k+1} - b_{k+2}$.
  3. The result is: $p(x) = c_0 + x\,b_1 - b_2$. This stable algorithm is what’s used in professional software libraries to evaluate functions represented in a Chebyshev series; a Python sketch follows this list.
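
A minimal Python sketch of Clenshaw’s recurrence (checked against NumPy’s reference evaluator `numpy.polynomial.chebyshev.chebval`):

```python
import numpy as np

def clenshaw(c, x):
    """Evaluate p(x) = sum_j c[j] * T_j(x) by Clenshaw's backward recurrence."""
    x = np.asarray(x, dtype=float)
    b1 = np.zeros_like(x)  # plays the role of b_{k+1}
    b2 = np.zeros_like(x)  # plays the role of b_{k+2}
    for k in range(len(c) - 1, 0, -1):  # k = n, n-1, ..., 1
        b1, b2 = c[k] + 2.0 * x * b1 - b2, b1
    return c[0] + x * b1 - b2

rng = np.random.default_rng(0)
c = rng.standard_normal(12)
x = np.linspace(-1.0, 1.0, 5)
assert np.allclose(clenshaw(c, x), np.polynomial.chebyshev.chebval(x, c))
```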

Beyond Polynomials: Trigonometric Interpolation

The definition of Chebyshev polynomials, $T_n(x) = \cos(n \arccos x)$, was not an accident. It reveals a deep connection between optimal polynomials and trigonometry. This leads to a natural question: what if the function we are trying to model is inherently periodic or cyclical?

Consider phenomena like:

  • The vibration of a guitar string.
  • The voltage in an AC power line.
  • Seasonal temperature fluctuations.

For these problems, polynomials are a poor choice. A non-constant polynomial will always fly off to $\pm\infty$, which fundamentally mismatches the bounded, repeating nature of the signal. We need a basis that is born from oscillations. This brings us to the revolutionary idea of Joseph Fourier.

The Fourier Idea: Decomposing Functions into Simple Waves

Fourier’s profound insight was that any reasonably well-behaved periodic function can be perfectly described as a sum of simple sine and cosine waves of different frequencies and amplitudes. This is like decomposing a musical chord into its individual notes.

This is a fundamentally different approach from a Taylor series.

  • A Taylor series is a local approximation, accurate near a single point.
  • A Fourier series is a global approximation, describing the function’s behavior over an entire interval by its frequency content.

The Mathematical Framework: The Space $L^2$ and the Fourier Basis

To make this precise, we work in the space of “finite energy” signals, $L^2$. This space is powerful because it includes functions with jumps and corners, which are common in real-world signals.

The Space $L^2([0,1))$: The Space of Finite Energy Signals

The space $L^2([0,1))$ consists of all complex-valued functions on the interval $[0, 1)$ for which the total “energy” is finite:

$$\int_0^1 |f(t)|^2 \, dt < \infty.$$

This space has a natural inner product:

$$\langle f, g \rangle = \int_0^1 f(t)\, \overline{g(t)}\, dt.$$

Our building blocks are the Fourier basis functions, which are complex exponentials that elegantly combine sines and cosines:

$$\varphi_k(t) = e^{2\pi i k t} = \cos(2\pi k t) + i\,\sin(2\pi k t), \qquad k \in \mathbb{Z}.$$

Here, the integer $k$ is the frequency. These functions form a complete orthonormal basis for the space. This means any function in this space can be written as an infinite sum (a Fourier series):

$$f(t) = \sum_{k=-\infty}^{\infty} \hat{f}_k \, e^{2\pi i k t}.$$

The coefficients $\hat{f}_k$ are the Fourier coefficients. Thanks to orthogonality, we find them with an inner product:

$$\hat{f}_k = \langle f, \varphi_k \rangle = \int_0^1 f(t)\, e^{-2\pi i k t}\, dt.$$
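
A numerical sketch of this coefficient formula (midpoint-rule quadrature; the helper name `fourier_coeff` is ours): for $f(t) = \sin(2\pi t)$ the only nonzero coefficients are $\hat{f}_{\pm 1} = \mp i/2$.

```python
import numpy as np

def fourier_coeff(f, k, num=4096):
    """Approximate f_hat_k = integral_0^1 f(t) e^{-2pi i k t} dt (midpoint rule)."""
    t = (np.arange(num) + 0.5) / num
    return np.mean(f(t) * np.exp(-2j * np.pi * k * t))

f = lambda t: np.sin(2 * np.pi * t)  # = (e^{2pi i t} - e^{-2pi i t}) / (2i)
assert abs(fourier_coeff(f, 1) - (-0.5j)) < 1e-12
assert abs(fourier_coeff(f, -1) - 0.5j) < 1e-12
assert abs(fourier_coeff(f, 2)) < 1e-12
```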

The Key Insight: Smoothness and the Decay of Fourier Coefficients

The power of this decomposition lies in the deep connection between a function’s visual “smoothness” and how quickly its Fourier coefficients $\hat{f}_k$ shrink to zero for high frequencies $|k| \to \infty$.

The plots on the slide illustrate this fundamental principle:

  • Red (Ramp Function): Discontinuous. Needs many high-frequency waves to create the sharp jump. Coefficients decay slowly, like $O(1/|k|)$.
  • Green (Hat Function): Continuous but has a sharp corner. Smoother. Coefficients decay faster, like $O(1/k^2)$.
  • Blue (Smooth Function): Infinitely smooth. No sharp features. Coefficients decay exponentially fast. The function can be accurately represented with just a few low-frequency terms (the sketch after this list reproduces these rates numerically).
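
We can reproduce these decay rates numerically (our own function choices; the FFT of equidistant samples approximates the Fourier coefficients up to aliasing):

```python
import numpy as np

t = np.arange(4096) / 4096
signals = {
    "ramp (discontinuous)": t,                        # jumps at the period boundary
    "hat (corner)":         1.0 - np.abs(2 * t - 1),  # continuous, kinked
    "smooth":               np.exp(np.sin(2 * np.pi * t)),
}
for name, y in signals.items():
    c = np.fft.fft(y) / len(y)  # approximate Fourier coefficients
    mags = ", ".join(f"|c_{k}| = {abs(c[k]):.1e}" for k in (3, 9, 27))
    print(f"{name:22s} {mags}")
# ramp: each 3x in k costs ~3x in magnitude; hat: ~9x; smooth: drops off a cliff.
```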

Theorem 3.1.16: Smoothness vs. Decay Rate

The core theorem states that if a function (viewed as periodic) has $p$ continuous derivatives, its Fourier coefficients decay at least as fast as $|\hat{f}_k| = O(|k|^{-p})$.

This is the theoretical foundation for lossy data compression (like JPEG and MP3). A smooth signal can be accurately represented by storing only its first few significant Fourier coefficients.

Furthermore, differentiation in the time domain becomes simple multiplication in the frequency domain:

$$\widehat{(f')}_k = 2\pi i k \, \hat{f}_k.$$

This transforms calculus into algebra, a cornerstone of modern numerical methods.
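
A sketch of this “calculus becomes algebra” principle via the FFT (the test function is our choice; frequencies come from `np.fft.fftfreq`): differentiate by multiplying each coefficient by $2\pi i k$.

```python
import numpy as np

N = 64
t = np.arange(N) / N
f = np.exp(np.sin(2 * np.pi * t))  # smooth and 1-periodic

k = np.fft.fftfreq(N, d=1.0 / N)   # integer frequencies 0, 1, ..., N/2-1, -N/2, ..., -1
df = np.real(np.fft.ifft(2j * np.pi * k * np.fft.fft(f)))

# Compare with the exact derivative of exp(sin(2πt)):
exact = 2 * np.pi * np.cos(2 * np.pi * t) * f
assert np.allclose(df, exact, atol=1e-8)
```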

The Gibbs Phenomenon

When we approximate a discontinuous function (like a square wave) with a finite number of Fourier terms, an artifact appears.

The approximation overshoots the true value at the jump. This is the Gibbs phenomenon. The overshoot’s height is a universal constant (about 9% of the jump) and does not disappear as we add more terms; it just gets squeezed into a narrower region. It’s the price we pay for approximating a sharp edge with smooth waves.
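
A sketch that measures the overshoot (the square wave and grid resolution are our choices; the sine series below is its classical Fourier expansion):

```python
import numpy as np

# Square wave sign(sin(2πt)) has the Fourier series (4/π) Σ_{odd m} sin(2πmt)/m.
t = np.linspace(0.0, 0.5, 50_001)  # fine grid near the jump at t = 0
for N in (16, 64, 256):
    m = np.arange(1, N + 1, 2)  # odd frequencies up to N
    s = (4 / np.pi) * np.sin(2 * np.pi * np.outer(t, m)) @ (1.0 / m)
    overshoot = (s.max() - 1.0) / 2.0  # excess, relative to the jump of size 2
    print(f"N = {N:4d}: overshoot = {overshoot:.4f}")  # stays near 0.0895
```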

From Continuous to Discrete: The Discrete Fourier Transform (DFT)

In practice, we have discrete samples, not a continuous function. For trigonometric interpolation, we take $N$ samples at the equidistant points $t_j = j/N$, $j = 0, 1, \ldots, N-1$.

Equidistant Points: Bad for Polynomials, Perfect for Fourier

This is a critical distinction. We just learned that equidistant points are terrible for high-degree polynomial interpolation due to Runge’s phenomenon. However, for trigonometric interpolation, they are the perfect choice because they preserve the orthogonality of the sampled sine and cosine waves.

We approximate the continuous Fourier integral with a discrete sum, which gives us the Discrete Fourier Transform (DFT).

The Discrete Fourier Transform (DFT)

Given $N$ data points $y_0, y_1, \ldots, y_{N-1}$, the DFT computes $N$ discrete Fourier coefficients $c_k$:

$$c_k = \frac{1}{N} \sum_{j=0}^{N-1} y_j\, e^{-2\pi i jk/N}, \qquad k = 0, 1, \ldots, N-1.$$

The trigonometric interpolant is then the finite sum using these coefficients:

$$p(t) = \sum_{k=0}^{N-1} c_k\, e^{2\pi i k t}.$$

This trigonometric polynomial is the unique function from its class that passes exactly through all the data points: $p(t_j) = y_j$ for all $j$.

A naive implementation of the DFT costs $O(N^2)$ operations. However, the revolutionary Fast Fourier Transform (FFT) algorithm computes the exact same result in an incredible $O(N \log N)$ time. This efficiency is what makes all of modern digital signal processing possible.
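
Here is a compact sketch using NumPy’s FFT (which implements the fast algorithm); `trig_interp` is our own helper, using the centered frequencies $-N/2, \ldots, N/2 - 1$ so the interpolant stays well-behaved between the samples:

```python
import numpy as np

# Sample a 1-periodic signal at N equidistant points t_j = j/N:
N = 16
j = np.arange(N)
y = np.exp(np.sin(2 * np.pi * j / N))

c = np.fft.fft(y) / N  # discrete Fourier coefficients, O(N log N)

def trig_interp(t):
    """Evaluate the trigonometric interpolant at arbitrary points t."""
    k = np.fft.fftfreq(N, d=1.0 / N)  # frequencies 0, 1, ..., N/2-1, -N/2, ..., -1
    return np.real(np.exp(2j * np.pi * np.outer(t, k)) @ c)

# It passes exactly through all the data points:
assert np.allclose(trig_interp(j / N), y)
```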

Continue here: 05 Discrete Fourier Transform