In our last lecture, we established that while polynomial interpolation is a powerful idea, the naive approach using the monomial basis is a numerical trap due to the ill-conditioned Vandermonde matrix. The Newton basis provided a much more stable and efficient alternative by constructing the polynomial incrementally.
Today, we will explore another perspective on the same problem: the Lagrange form of the interpolating polynomial. This approach will lead us to a profound insight: the choice of the interpolation points is just as important as the algorithm we use.
The Lagrange Basis: A Different Set of Building Blocks
The Newton basis was constructed to make the system matrix for the coefficients lower triangular. The Lagrange basis is built on an even simpler and more elegant idea.
Recall that we are looking for a polynomial $p$ in the space of polynomials of degree at most $n$, $\mathcal{P}_n$, such that $p(x_j) = y_j$ for $j = 0, \dots, n$. We can write this polynomial as a linear combination of basis functions:

$$p(x) = \sum_{j=0}^{n} y_j \, \ell_j(x)$$

Look closely at this formula. We are using the data values $y_j$ directly as the coefficients. For this to work, the basis functions $\ell_j$ must have a very special property. When we evaluate this sum at a node $x_i$, we want to get $y_i$. This means that for the sum to collapse to just the term $y_i \, \ell_i(x_i)$, we need $\ell_j(x_i)$ to be 1 when $j = i$ and 0 when $j \neq i$.
This leads to the definition of the Lagrange cardinal basis polynomials.
Recall: We looked at this in Discrete Math class: Lagrange Interpolation
Definition: Lagrange Cardinal Polynomials
For a given set of $n+1$ distinct nodes $x_0, x_1, \dots, x_n$, the $j$-th Lagrange cardinal polynomial, $\ell_j(x)$, is the unique polynomial of degree $n$ that satisfies:

$$\ell_j(x_i) = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$
This “picker” property is the defining characteristic of a cardinal basis.
How do we construct such a polynomial? To make $\ell_j(x)$ equal to zero at all nodes $x_i$ with $i \neq j$, the polynomial must contain the factors $(x - x_i)$ for every $i \neq j$. This gives us the numerator:

$$\prod_{i=0,\, i \neq j}^{n} (x - x_i)$$

This product is a polynomial of degree $n$ that is zero at all the correct nodes. Now we just need to ensure it equals 1 at $x_j$. We can achieve this by simply dividing by the value of the numerator at $x_j$:

$$\ell_j(x) = \prod_{i=0,\, i \neq j}^{n} \frac{x - x_i}{x_j - x_i}$$
![](/Semester-3/Numerical-Methods-for-Computer-Science/Lecture-Notes/attachments/Pasted-image-20250927124343.png)
The plot on the slide shows the five Lagrange basis polynomials for five nodes. Notice how each polynomial (e.g., the red one) is equal to 1 at its corresponding node and 0 at all other nodes.
The Lagrange Interpolating Polynomial
With this basis, constructing the final interpolating polynomial is trivial. We don’t need to solve any system of equations. The solution is given immediately by the Lagrange form:

$$p(x) = \sum_{j=0}^{n} y_j \, \ell_j(x)$$
This is the same unique interpolating polynomial we found with the Newton method, just expressed in a different basis.
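To make this concrete, here is a minimal NumPy sketch (the helper names `lagrange_basis` and `lagrange_eval` are my own, not from the lecture) that builds the cardinal polynomials straight from the product formula and evaluates the interpolant the naive way:

```python
import numpy as np

def lagrange_basis(x, nodes, j):
    """Evaluate the j-th Lagrange cardinal polynomial l_j at a scalar point x."""
    others = np.delete(nodes, j)
    return np.prod((x - others) / (nodes[j] - others))

def lagrange_eval(x, nodes, values):
    """Naive evaluation of the Lagrange form p(x) = sum_j y_j * l_j(x)."""
    return sum(values[j] * lagrange_basis(x, nodes, j) for j in range(len(nodes)))

# Example: interpolate f(x) = exp(x) at 5 nodes
nodes = np.linspace(0.0, 1.0, 5)
values = np.exp(nodes)

# The "picker" property: l_j(x_i) is 1 when i == j and 0 otherwise (identity matrix)
print(np.round([[lagrange_basis(xi, nodes, j) for xi in nodes] for j in range(5)], 12))

# The interpolant reproduces the data exactly and approximates f in between
print(lagrange_eval(nodes[2], nodes, values), values[2])
print(lagrange_eval(0.37, nodes, values), np.exp(0.37))
```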
A Beautiful Property: Partition of Unity
The Lagrange basis polynomials have a remarkable property: they sum to one, $\sum_{j=0}^{n} \ell_j(x) = 1$ for every $x$.
Proof: Consider interpolating the constant function $f(x) = 1$. The data values are $y_j = 1$ for all $j$. The unique polynomial of degree at most $n$ that passes through these points is simply the constant polynomial $p(x) = 1$. Using the Lagrange form, this polynomial is $p(x) = \sum_{j=0}^{n} 1 \cdot \ell_j(x) = \sum_{j=0}^{n} \ell_j(x)$. Therefore, the sum of the basis functions must be 1. This is called a “partition of unity.”
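A quick numerical sanity check of this property (a throwaway sketch; the node set is arbitrary):

```python
import numpy as np

nodes = np.array([-1.0, -0.3, 0.2, 0.8, 1.5])   # any distinct nodes work

def cardinal(x, j):
    others = np.delete(nodes, j)
    return np.prod((x - others) / (nodes[j] - others))

# The cardinal polynomials sum to 1 at every point, not just at the nodes
for x in (-0.75, 0.0, 0.41, 2.3):
    print(x, sum(cardinal(x, j) for j in range(len(nodes))))  # ~1.0 each time
```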
The Barycentric Form: A More Stable and Efficient Way to Evaluate
We’ve seen that the Lagrange form provides a beautiful theoretical solution to the interpolation problem:

$$p(x) = \sum_{j=0}^{n} y_j \, \ell_j(x), \qquad \ell_j(x) = \prod_{i=0,\, i \neq j}^{n} \frac{x - x_i}{x_j - x_i}$$
However, if we try to use this formula directly for computation, we run into two problems:
- Inefficiency: To evaluate $p(x)$ at a single point, we must compute each of the $n+1$ Lagrange polynomials $\ell_j(x)$. Each involves a product of $n$ terms. This leads to a total cost of $\mathcal{O}(n^2)$ operations for each point we want to evaluate. If we need to plot the polynomial, this becomes very expensive.
- Numerical Instability: For large $n$, the numerator and denominator in $\ell_j(x)$ can become very large or very small, leading to potential issues with overflow, underflow, and rounding errors, especially when $x$ is far from the interpolation interval.
We need a smarter way to evaluate the same polynomial. This is where the barycentric form comes in. It is not a different polynomial; it is an algebraic rearrangement of the Lagrange form that is far superior for computation.
Derivation: From Lagrange to Barycentric
Let’s start by defining two key components.
1. The Nodal Polynomial, $\omega(x) = \prod_{i=0}^{n} (x - x_i)$: This is the polynomial whose roots are simply our interpolation nodes.
We can use this to simplify the numerator of our Lagrange polynomial $\ell_j(x)$. The product in the numerator is just $\omega(x)$ with the factor $(x - x_j)$ missing. So, for $x \neq x_j$, we can write $\prod_{i \neq j} (x - x_i) = \dfrac{\omega(x)}{x - x_j}$.
2. The Barycentric Weights, $\lambda_j$: These are constants that depend only on the nodes $x_0, \dots, x_n$. The $j$-th weight is defined as $\lambda_j = \dfrac{1}{\prod_{i \neq j} (x_j - x_i)}$.
Notice that the denominator here is exactly the same as the denominator in the original Lagrange polynomial $\ell_j(x)$.
Now, let’s substitute these into the Lagrange formula for $\ell_j(x)$:

$$\ell_j(x) = \omega(x) \, \frac{\lambda_j}{x - x_j}$$

This is already a much cleaner expression. Now, substitute this back into the main Lagrange sum for $p(x)$:

$$p(x) = \sum_{j=0}^{n} y_j \, \omega(x) \, \frac{\lambda_j}{x - x_j}$$

Since $\omega(x)$ does not depend on the summation index $j$, we can factor it out:

$$p(x) = \omega(x) \sum_{j=0}^{n} \frac{\lambda_j}{x - x_j} \, y_j$$
This is the first barycentric form. It’s better, but we can make it even more stable.
The final trick comes from the “partition of unity” property we saw earlier: $\sum_{j=0}^{n} \ell_j(x) = 1$. Let’s rewrite this using our new expression for $\ell_j(x)$:

$$1 = \sum_{j=0}^{n} \ell_j(x) = \sum_{j=0}^{n} \omega(x) \, \frac{\lambda_j}{x - x_j}$$

Factoring out $\omega(x)$ and solving for it gives:

$$\omega(x) = \frac{1}{\sum_{j=0}^{n} \dfrac{\lambda_j}{x - x_j}}$$

Now, substitute this expression for $\omega(x)$ back into our first barycentric form. This gives us the final result:

$$p(x) = \frac{\displaystyle\sum_{j=0}^{n} \frac{\lambda_j}{x - x_j} \, y_j}{\displaystyle\sum_{j=0}^{n} \frac{\lambda_j}{x - x_j}}$$
This is the second barycentric form, and it is the state-of-the-art method for evaluating interpolating polynomials.
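Here is a compact sketch of the resulting prep-then-eval pattern (the helper names are mine, not a standard API): the weights cost $\mathcal{O}(n^2)$ once, and each evaluation of the second barycentric form then costs only $\mathcal{O}(n)$:

```python
import numpy as np

def barycentric_weights(nodes):
    """O(n^2) pre-computation: lambda_j = 1 / prod_{i != j} (x_j - x_i)."""
    n = len(nodes)
    w = np.empty(n)
    for j in range(n):
        w[j] = 1.0 / np.prod(nodes[j] - np.delete(nodes, j))
    return w

def barycentric_eval(x, nodes, values, w):
    """O(n) evaluation of the second barycentric form at a scalar point x."""
    hit = np.isclose(x, nodes)
    if hit.any():               # x coincides with a node: return the data value
        return values[hit][0]   # (avoids dividing by zero)
    t = w / (x - nodes)
    return np.dot(t, values) / np.sum(t)

nodes = np.linspace(-1.0, 1.0, 7)
values = np.cos(np.pi * nodes)

w = barycentric_weights(nodes)                                   # expensive part, done once
print(barycentric_eval(0.3, nodes, values, w), np.cos(np.pi * 0.3))
print(barycentric_eval(nodes[4], nodes, values, w), values[4])   # exact at a node
```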
Intuition: What Does “Barycentric” Mean?
The name “barycentric” comes from the concept of a center of mass (a barycenter). The formula expresses the value of the interpolating polynomial as a weighted average of the data values $y_j$.
The normalized weights $\dfrac{\lambda_j/(x - x_j)}{\sum_k \lambda_k/(x - x_k)}$ sum to 1. The term $\dfrac{\lambda_j}{x - x_j}$ acts as a measure of proximity. When the evaluation point $x$ is very close to a node $x_j$, this term becomes huge. This makes the weight for that point approach 1, and all other weights approach 0. Consequently, the value of the polynomial approaches $y_j$, exactly as it should. The value of the polynomial at any point is a blend of the values at the nodes, with the nearest nodes having the most influence.
Why the Barycentric Form is So Good
Let’s summarize the immense practical advantages of this formula.
Advantages of the Barycentric Form
Efficiency (The “Prep-then-Eval” Strategy): The most expensive part of the process is calculating the barycentric weights $\lambda_j$. This takes $\mathcal{O}(n^2)$ operations. However, the crucial point is that the weights depend only on the nodes $x_j$, not on the data values $y_j$. We can perform this expensive calculation once as a pre-computation step. After that, every single evaluation of $p(x)$ at a new point costs only $\mathcal{O}(n)$ operations. This is a massive improvement over the $\mathcal{O}(n^2)$-per-point cost of the naive Lagrange form.
Stability: The second barycentric form is numerically much more stable. By dividing the two sums, we cancel out the large, potentially problematic term $\omega(x)$. This avoids the intermediate calculation of very large or small numbers and makes the formula robust against rounding errors.
Flexibility: Imagine you are an experimentalist who has fixed measurement locations (the nodes $x_j$ are set). You can run your experiment multiple times, getting different sets of data values $y_j$. With the barycentric approach, you compute the expensive weights just once for your setup. Then, for each new set of measurements, you can evaluate the corresponding interpolating polynomial in just $\mathcal{O}(n)$ time per evaluation point.
For these reasons, high-quality numerical libraries like SciPy use the barycentric form for their polynomial interpolation routines. It represents the pinnacle of combining theoretical elegance with numerical robustness and efficiency.
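As a sketch of how this looks in practice, SciPy’s `scipy.interpolate.BarycentricInterpolator` implements this scheme; its `set_yi` method lets you swap in a new set of data values while reusing the precomputed weights (the nodes and data below are made up for illustration):

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

# Fixed measurement locations: the barycentric weights are computed once, here.
nodes = np.linspace(0.0, 2.0, 9)
p = BarycentricInterpolator(nodes, np.sin(nodes))

xs = np.linspace(0.0, 2.0, 5)
print(p(xs))                 # cheap evaluation at many points

# New measurements at the same locations: reuse the precomputed weights.
p.set_yi(np.cos(nodes))
print(p(xs))
```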
![](/Semester-3/Numerical-Methods-for-Computer-Science/Lecture-Notes/attachments/Pasted-image-20250927130619.png)
The Error in Polynomial Interpolation: How Good Is Our Guess?
We now have powerful tools, the Newton and barycentric forms, to construct and evaluate the unique interpolating polynomial. But this brings us to the most important question: how good is it? How close is our polynomial model, $p(x)$, to the true, underlying function, $f(x)$, that we are trying to discover?
The answer is given by a fundamental theorem that precisely quantifies the interpolation error.
![](/Semester-3/Numerical-Methods-for-Computer-Science/Lecture-Notes/attachments/Pasted-image-20250927130741.png)
Theorem 2.2.11 (The Error Formula)
Let $f$ be an $(n+1)$-times continuously differentiable function. Let $p$ be the polynomial of degree at most $n$ that interpolates $f$ at the distinct nodes $x_0, x_1, \dots, x_n$. Then for any $x$ in the interpolation interval, there exists some (unknown) point $\xi$ within that interval such that the error is given by:

$$f(x) - p(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!} \prod_{j=0}^{n} (x - x_j)$$
This formula is incredibly revealing. It’s not just an abstract bound; it tells us exactly what the error depends on. Let’s break it down into its three components:
- The Function’s Smoothness ($f^{(n+1)}(\xi)$): This term is the $(n+1)$-th derivative of the true function, evaluated at some mysterious point $\xi$. We don’t know where $\xi$ is, but we know its magnitude is bounded by the maximum value of the derivative on the interval. Intuition: This term tells us that “wiggly” functions are hard to interpolate. If a function has large higher-order derivatives, it changes direction rapidly, and a low-degree polynomial will struggle to keep up. Smooth, slowly-varying functions with small higher derivatives are easy to interpolate.
- The Number of Points ($(n+1)!$): The factorial in the denominator is a powerful force. As we increase the number of interpolation points (and thus the degree $n$), the factorial grows extremely fast. This suggests that, all else being equal, adding more points should dramatically decrease the error.
- The Choice of Nodes ($\omega(x)$): This is the nodal polynomial, $\omega(x) = \prod_{j=0}^{n} (x - x_j)$. This term depends on the geometry of our measurement points. This is the only part of the error that we, as designers of an experiment or algorithm, can directly control. To minimize the overall error, our goal must be to choose the nodes in a way that makes the magnitude of this product as small as possible across the entire interval.
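To see the formula in action, here is a small sketch (the node count and grid resolution are arbitrary choices of mine) that compares the bound with the actual error for $f(x) = \sin x$ on $[0, \pi]$; since every derivative of $\sin$ is bounded by 1, the bound reduces to $\max_x |\omega(x)| / (n+1)!$:

```python
import numpy as np
from math import factorial
from scipy.interpolate import BarycentricInterpolator

f = np.sin
a, b, n = 0.0, np.pi, 5                                   # degree-5 interpolation, 6 nodes
nodes = np.linspace(a, b, n + 1)
p = BarycentricInterpolator(nodes, f(nodes))

xs = np.linspace(a, b, 2001)
omega = np.prod(xs[:, None] - nodes[None, :], axis=1)     # nodal polynomial on a fine grid

bound = np.max(np.abs(omega)) / factorial(n + 1)          # |sin^(n+1)| <= 1 everywhere
actual = np.max(np.abs(f(xs) - p(xs)))
print(f"bound: {bound:.2e}   actual: {actual:.2e}")       # actual error stays below the bound
```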
The Trap of Equidistant Nodes: Runge’s Phenomenon
What is the most natural, intuitive way to choose our measurement points? We space them out evenly. This is called using equidistant nodes. For an interval $[a, b]$, we would choose $x_j = a + j h$ for $j = 0, \dots, n$, with a fixed step size $h = (b - a)/n$.
This seemingly sensible choice turns out to be a terrible idea for high-degree polynomial interpolation.
Let’s look at the nodal polynomial term, $\omega(x)$, for equidistant points. It can be shown that this function is relatively small in the middle of the interval but grows to be enormous near the endpoints. This imbalance means that while the interpolation might be good in the center, the error will explode near the boundaries.
This disastrous effect is known as Runge’s phenomenon. Even for a perfectly smooth and well-behaved function, the interpolating polynomial for equidistant nodes will develop wild oscillations near the ends of the interval as the degree $n$ increases.
![](/Semester-3/Numerical-Methods-for-Computer-Science/Lecture-Notes/attachments/Pasted-image-20250927130901.png)
The plots on slide 8 are the definitive illustration of this failure. They show the error when interpolating a smooth function.
- The red curve (Equidistant) shows the error for equally spaced nodes. In the center, the error is small. But near the boundaries, it explodes into oscillations that are far larger than the function itself. Adding more points would only make these oscillations worse.
- The blue curve (Chebyshev) shows the error for a smarter choice of nodes, which we will discuss next. The error is orders of magnitude smaller and is distributed evenly across the entire interval.
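The effect is easy to reproduce numerically. The sketch below uses Runge’s classic example $f(x) = \frac{1}{1 + 25x^2}$ (the textbook demonstration function, not necessarily the one from the slides) and shows the maximum error blowing up as more equidistant nodes are added:

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)       # Runge's classic example
xs = np.linspace(-1.0, 1.0, 2001)

for n in (5, 10, 15, 20):
    nodes = np.linspace(-1.0, 1.0, n + 1)      # equidistant nodes
    p = BarycentricInterpolator(nodes, f(nodes))
    print(n, np.max(np.abs(f(xs) - p(xs))))    # max error blows up as n grows
```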
This behavior is quantified by a value called the Lebesgue constant, $\Lambda_n$. It acts as an error amplification factor. The error bound can be stated as:

$$\|f - p\|_\infty \le (1 + \Lambda_n)\,\|f - p^*\|_\infty$$

where $p^*$ is the best possible polynomial approximation to $f$ of degree at most $n$. For equidistant points, the Lebesgue constant grows exponentially with $n$:

$$\Lambda_n \approx \frac{2^{n+1}}{e\, n \log n}$$

An error amplification factor that grows like $2^{n}$ means the method is completely useless for large $n$. The rounding errors and approximation errors are magnified to the point of absurdity.
The Peril of High-Degree Interpolation with Equidistant Nodes
Using a high-degree polynomial to interpolate data at equally spaced points is one of the most common and dangerous mistakes in numerical analysis. The result will almost certainly be a wildly oscillating and meaningless curve. The intuition that “more points must mean a better fit” is dangerously wrong in this context.
The Solution: Chebyshev Nodes
So, if not equally spaced, how should we choose the nodes? The error formula points the way: we must choose the nodes to minimize the maximum value of the nodal polynomial, $\max_x |\omega(x)|$, over the interval.
You might wanna skip to the 2nd half of the video…
The solution to this minimization problem is given by the Chebyshev nodes. These points are the projections onto the x-axis of equally spaced points on a semicircle.
On the standard interval $[-1, 1]$, the Chebyshev nodes are given by the simple formula:

$$x_j = \cos\!\left(\frac{(2j + 1)\,\pi}{2(n+1)}\right), \qquad j = 0, 1, \dots, n$$
![](/Semester-3/Numerical-Methods-for-Computer-Science/Lecture-Notes/attachments/Pasted-image-20250927132412.png)
Notice their distribution: they are clustered more densely near the endpoints of the interval (at -1 and 1) and are sparser in the middle. This non-uniform spacing is precisely what is needed to counteract the polynomial’s tendency to oscillate at the boundaries. It “pins down” the polynomial where it’s most likely to go wild.
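A minimal sketch (assuming the root formula above) that computes the Chebyshev nodes and repeats the earlier Runge experiment with them; with these nodes the maximum error now shrinks as points are added:

```python
import numpy as np
from scipy.interpolate import BarycentricInterpolator

def chebyshev_nodes(n):
    """Chebyshev nodes on [-1, 1]: x_j = cos((2j+1) pi / (2n+2)), j = 0..n."""
    j = np.arange(n + 1)
    return np.cos((2 * j + 1) * np.pi / (2 * (n + 1)))

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)        # same Runge example as before
xs = np.linspace(-1.0, 1.0, 2001)

for n in (5, 10, 15, 20):
    nodes = chebyshev_nodes(n)                  # clustered toward the endpoints
    p = BarycentricInterpolator(nodes, f(nodes))
    print(n, np.max(np.abs(f(xs) - p(xs))))     # error now decreases as n grows
```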
The results are dramatic. The Lebesgue constant for Chebyshev nodes grows only logarithmically, $\Lambda_n = \mathcal{O}(\log n)$, which is extremely slow. This guarantees that as you increase the number of points, the interpolating polynomial will converge to the true function (for any reasonably smooth function).
The Golden Rule of Polynomial Interpolation
If you have the freedom to choose where you take your measurements (i.e., you can plan your experiment), always choose the Chebyshev nodes (or points with a similar clustering at the boundaries). This choice tames the error, defeats Runge’s phenomenon, and ensures that your high-degree polynomial interpolation is a stable, convergent, and powerful numerical tool.
Continue here: 04 Chebyshev Polynomials and Trigonometric (Fourier) Interpolation