In the realm of probability, we often find ourselves in situations where calculating exact probabilities is either computationally infeasible or analytically intractable. Imagine trying to determine the precise probability of a complex event in a large, intricate system. Direct computation may be overwhelmingly complex. This is where the art of estimating probabilities becomes invaluable. Instead of seeking exact values, we aim to find bounds or approximations that provide useful insights into the likelihood of events. These estimations, while not perfectly precise, can be sufficient for making informed decisions and understanding the behavior of random systems.

Markov’s Inequality: A First Bound (2.7.1)

We begin with a remarkably simple yet surprisingly useful tool for probability estimation: Markov’s Inequality. It provides an upper bound on the probability that a non-negative random variable exceeds a certain threshold, using only its expectation. It is a testament to the power of expectation as a single-number summary of a distribution.

Satz 2.67 (Markov’s Inequality): Let X be a non-negative random variable, and let t be any positive real number (t > 0). Then the probability that X is greater than or equal to t is bounded by the ratio of its expectation to t:

Pr[X ≥ t] ≤ E[X] / t.

Or, equivalently, in a slightly rearranged form:

Pr[X ≥ t · E[X]] ≤ 1 / t.

This inequality is remarkably general, holding for any non-negative random variable, regardless of its specific distribution. This generality is both its strength and its weakness. It is widely applicable but can sometimes provide loose bounds compared to distribution-specific estimates.

Proof of Markov’s Inequality: The proof is a beautiful example of probabilistic reasoning based on the definition of expectation. Recall that for a non-negative discrete random variable X with range W_X, its expectation is defined as:

E[X] = Σ_{x ∈ W_X} x · Pr[X = x].

Since X is non-negative, all values in its range are greater than or equal to zero. We can split the sum into two parts: values of x that are less than t and values of x that are greater than or equal to t:

E[X] = Σ_{x < t} x · Pr[X = x] + Σ_{x ≥ t} x · Pr[X = x].

Since X is non-negative, the first term is non-negative. In the second term, we have x ≥ t for every summand. Therefore, we can lower bound the second term by replacing x with t:

E[X] ≥ Σ_{x ≥ t} t · Pr[X = x] = t · Pr[X ≥ t].

Dividing both sides by t (since t > 0), we arrive at Markov’s Inequality:

Pr[X ≥ t] ≤ E[X] / t. ∎

Markov’s Inequality provides a simple, expectation-based bound on tail probabilities for non-negative random variables. It is often used as a starting point for more refined probability estimations.
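To see the inequality in action, the following sketch compares the Markov bound with empirically estimated tail probabilities. The choice of random variable (the sum of two fair dice) is purely illustrative and not from the text.

```python
import random

# Empirically check Markov's inequality for a non-negative random variable.
# Illustrative choice: X = sum of two fair dice, so E[X] = 7.
random.seed(0)
samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(100_000)]
mean = sum(samples) / len(samples)

for t in (8, 10, 12):
    empirical = sum(s >= t for s in samples) / len(samples)
    markov = mean / t  # Markov: Pr[X >= t] <= E[X] / t
    print(f"t={t}: Pr[X>={t}] = {empirical:.3f} <= Markov bound {markov:.3f}")
```

Note how loose the bound is here (for t = 12 the true probability is 1/36 ≈ 0.028, while the bound is about 0.583); this is the price of the inequality's generality.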

Chebyshev’s Inequality: Leveraging Variance (2.7.1)

While Markov’s Inequality is powerful in its generality, it only utilizes the expectation of the random variable. To obtain tighter bounds, we can incorporate information about the variance, which measures the spread of the distribution around its mean. This leads us to Chebyshev’s Inequality, a refinement of Markov’s Inequality that utilizes both expectation and variance.

Satz 2.68 (Chebyshev’s Inequality): Let X be a random variable with expectation E[X] and variance Var[X]. For any real number t > 0, the probability that the absolute deviation of X from its expectation is greater than or equal to t is bounded by the ratio of the variance to t²:

Pr[|X − E[X]| ≥ t] ≤ Var[X] / t².

Or, equivalently, using the standard deviation σ = √Var[X] and setting t = k · σ:

Pr[|X − E[X]| ≥ k · σ] ≤ 1 / k².

Chebyshev’s Inequality provides a bound on the probability of deviations from the mean, using variance to quantify the spread. The bound decreases quadratically with increasing t: for a fixed variance, larger deviations from the mean become increasingly improbable, and for a fixed t, a smaller variance yields a tighter bound.

Proof of Chebyshev’s Inequality: The proof elegantly reduces Chebyshev’s Inequality to Markov’s Inequality by considering the squared deviation from the mean, (X − E[X])², which is always a non-negative random variable.

Let Y = (X − E[X])². Then Y is a non-negative random variable, and its expectation is the variance of X: E[Y] = Var[X]. We want to bound the probability Pr[|X − E[X]| ≥ t]. This event is equivalent to the event (X − E[X])² ≥ t², which is the event Y ≥ t². Applying Markov’s Inequality to the non-negative random variable Y with threshold t², we get:

Pr[|X − E[X]| ≥ t] = Pr[Y ≥ t²] ≤ E[Y] / t² = Var[X] / t². ∎

This completes the proof of Chebyshev’s Inequality. By squaring the deviation and applying Markov’s Inequality, Chebyshev’s Inequality leverages the variance to provide a tighter bound on tail probabilities compared to Markov’s Inequality alone.
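As a quick numerical check, the sketch below compares Chebyshev's bound with empirical deviation probabilities for a Binomial(100, 1/2) variable; the distribution and sample sizes are illustrative choices, not from the text.

```python
import random

# Compare Chebyshev's bound with empirical deviation probabilities for
# X ~ Binomial(100, 0.5): E[X] = 50, Var[X] = 25.
random.seed(1)
n, p = 100, 0.5
mu, var = n * p, n * p * (1 - p)

samples = [sum(random.random() < p for _ in range(n)) for _ in range(20_000)]

for t in (10, 15, 20):
    empirical = sum(abs(s - mu) >= t for s in samples) / len(samples)
    chebyshev = var / t**2  # Chebyshev: Pr[|X - mu| >= t] <= Var[X] / t^2
    print(f"t={t}: Pr[|X-50|>={t}] = {empirical:.4f} <= bound {chebyshev:.4f}")
```

The bound holds but is still conservative; for this concentrated distribution, the true deviation probabilities fall much faster than 1/t².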

Chernoff Bounds: Exponential Tail Bounds for Sums of Independent Bernoulli Variables (2.7.2)

For sums of independent Bernoulli random variables, we can obtain even tighter bounds on tail probabilities than those provided by Markov’s or Chebyshev’s Inequalities. These tighter bounds are known as Chernoff bounds. Chernoff bounds exploit the specific structure of sums of independent Bernoulli variables to provide exponential decay in the tail probabilities, offering much sharper estimates for rare events.

Satz 2.70 (Chernoff Bounds): Let X₁, …, Xₙ be independent Bernoulli random variables with Pr[Xᵢ = 1] = pᵢ and Pr[Xᵢ = 0] = 1 − pᵢ. Let X = X₁ + ⋯ + Xₙ be the sum of these variables, and let μ = E[X] = p₁ + ⋯ + pₙ be the expected value of X. Then the following bounds hold:

(i) Upper Tail Bound: For 0 < δ ≤ 1:

Pr[X ≥ (1 + δ) · μ] ≤ e^(−μδ²/3).

(ii) Lower Tail Bound: For 0 < δ ≤ 1:

Pr[X ≤ (1 − δ) · μ] ≤ e^(−μδ²/2).

(iii) Very Large Deviations: For t ≥ 2eμ:

Pr[X ≥ t] ≤ 2^(−t).

Chernoff bounds provide exponential tail bounds, meaning that the probability of deviations from the mean decays exponentially with the size of the deviation. This is a much faster decay than the polynomial decay provided by Chebyshev’s Inequality, making Chernoff bounds significantly tighter for sums of independent Bernoulli variables.

Proof Sketch of Chernoff Bound (iii): The proof of Chernoff bounds typically involves a technique called the moment-generating function method. For bound (iii), we consider the probability Pr[X ≥ t] and apply Markov’s Inequality to the random variable e^(λX) for a suitable parameter λ > 0: since x ↦ e^(λx) is increasing, Pr[X ≥ t] = Pr[e^(λX) ≥ e^(λt)] ≤ E[e^(λX)] / e^(λt). By carefully bounding the expectation E[e^(λX)] using the independence of the Xᵢ, and then choosing λ to minimize the resulting bound, we can derive the exponential tail bound. The details of the proof are somewhat technical and involve calculus and properties of exponential functions.
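To illustrate the gap between polynomial and exponential decay, the following sketch compares the exact upper tail of a Binomial(1000, 1/2) variable with the Chebyshev bound and the simplified Chernoff bound of form (i). The distribution and the choice of δ values are illustrative.

```python
import math

# Tail bounds for X ~ Binomial(1000, 1/2): mu = 500, Var[X] = 250.
n, p = 1000, 0.5
mu, var = n * p, n * p * (1 - p)

def exact_upper_tail(k):
    # Pr[X >= k], computed exactly from the binomial pmf.
    return sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n

for delta in (0.1, 0.2, 0.3):
    k = math.ceil((1 + delta) * mu)
    chernoff = math.exp(-mu * delta**2 / 3)   # bound (i), needs 0 < delta <= 1
    chebyshev = var / (delta * mu) ** 2       # two-sided Chebyshev bound
    print(f"delta={delta}: exact={exact_upper_tail(k):.2e}, "
          f"Chernoff={chernoff:.2e}, Chebyshev={chebyshev:.2e}")
```

For small δ the Chebyshev bound can still be smaller, but as δ grows the exponential decay of the Chernoff bound quickly dominates the 1/δ² decay of Chebyshev.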

Chernoff bounds are indispensable tools in the analysis of randomized algorithms, particularly for analyzing the probability of success or failure, tail probabilities of performance measures, and concentration of random variables around their means. They provide sharp probabilistic estimates that are crucial for designing and evaluating efficient and reliable randomized algorithms.
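As a typical application (a hypothetical example, not from the text): suppose a randomized algorithm answers correctly with probability 3/4 on each independent run, and we return the majority answer of n runs. With X the number of correct runs, μ = 3n/4, a wrong majority requires X ≤ n/2 = (1 − 1/3)μ, so lower tail bound (ii) with δ = 1/3 gives Pr[failure] ≤ e^(−μδ²/2) = e^(−n/24). The sketch below turns this into a repetition count for a target failure probability.

```python
import math

# Hypothetical setup: each independent run is correct with probability 3/4;
# we return the majority answer of n runs. With mu = 3n/4 and delta = 1/3,
# Chernoff bound (ii) gives Pr[wrong majority] <= e^(-mu * delta^2 / 2)
#                                              =  e^(-n/24).

def runs_needed(eps):
    # Smallest n with e^(-n/24) <= eps, i.e. n >= 24 * ln(1/eps).
    return math.ceil(24 * math.log(1 / eps))

for eps in (1e-3, 1e-6, 1e-9):
    print(f"failure probability <= {eps}: n = {runs_needed(eps)} runs suffice")
```

The logarithmic dependence on 1/ε is the hallmark of Chernoff-style analysis: driving the failure probability down by many orders of magnitude costs only a modest number of extra repetitions.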

In summary, this section has explored powerful techniques for estimating probabilities when exact calculations are difficult or unnecessary. Markov’s and Chebyshev’s Inequalities provide general bounds based on expectation and variance, while Chernoff bounds offer tighter, exponential tail bounds specifically for sums of independent Bernoulli variables. These tools are essential for probabilistic analysis and algorithm design, allowing us to reason about random phenomena and make informed decisions in the face of uncertainty.
