Binomial Distribution
Introduction
The binomial distribution is one of the most widely used probability distributions in statistics. It describes the number of successes in a sequence of n independent trials, each with probability p of success. This article covers its theory, practical applications, and worked examples.
What is the Binomial Distribution?
The binomial distribution is a discrete probability distribution that models the number of successes in n independent trials, where each trial has only two possible outcomes: success (with probability p) or failure (with probability 1-p).
Conditions for Binomial Distribution
For a situation to be modeled by a binomial distribution, the following conditions must be met:
- 1. Fixed number of trials (n): The experiment consists of n identical trials
- 2. Two possible outcomes: Each trial results in success or failure
- 3. Constant probability (p): The probability of success p is the same in each trial
- 4. Independence: Trials are independent of each other
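The four conditions can be made concrete with a small simulation. The sketch below is illustrative only: the function name `simulate_binomial`, the seed, and the values n = 10 and p = 0.5 are assumptions chosen for the example, not part of the definition.

```python
import random

def simulate_binomial(n, p, rng):
    """Count successes in n independent trials, each succeeding with probability p."""
    # Fixed n, two outcomes per trial, the same p every time, and independent
    # draws from the generator: the four conditions listed above.
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(0)  # arbitrary seed so the run is repeatable
draws = [simulate_binomial(10, 0.5, rng) for _ in range(10_000)]

print(min(draws), max(draws))    # every count lies between 0 and n
print(sum(draws) / len(draws))   # should land close to n * p = 5
```

If any condition is broken in the simulation (for example, letting p drift from trial to trial), the resulting counts no longer follow a single Binomial(n, p).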
Binomial Formula
The probability function of the binomial distribution is given by:
Probability Function
P(X = k) = C(n,k) × pᵏ × (1-p)ⁿ⁻ᵏ
P(X = k) = (n! / (k! × (n-k)!)) × pᵏ × (1-p)ⁿ⁻ᵏ
- • X: Random variable (number of successes)
- • k: Number of successes (0, 1, 2, ..., n)
- • n: Number of trials
- • p: Probability of success in each trial
- • C(n,k): Binomial coefficient (combinations)
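The formula translates almost directly into code. This is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is an assumption made for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        return 0.0  # outside the possible values 0, 1, ..., n
    # C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(2, 4, 0.5))  # C(4,2) * 0.5^4 = 6/16 = 0.375
```

For large n, the factorial form can overflow or lose precision in floating point; `math.comb` works with exact integers, which is why it is used here.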
Notation
Standard Notation
X ~ Binomial(n, p)
Read as: "X follows a binomial distribution with parameters n and p"
Parameters n and p
Parameter n
Number of trials: Must be a positive integer. Determines how many independent trials are performed.
Parameter p
Probability of success: Must be between 0 and 1 (0 ≤ p ≤ 1). Represents the probability of success in each individual trial.
Mean and Variance
The measures of central tendency and dispersion of the binomial distribution are:
Fundamental Statistics
Mean (Expected Value)
E[X] = μ = n × p
The expected number of successes in n trials.
Variance
Var(X) = σ² = n × p × (1-p)
Measures the spread of values around the mean.
Standard Deviation
σ = √(n × p × (1-p))
Square root of the variance.
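These closed forms can be checked against the definition of expectation by summing over the probability function. The sketch below assumes illustrative values n = 20 and p = 0.05; the helper `binomial_pmf` is a hypothetical name.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 20, 0.05  # illustrative values

# Mean and variance computed from the definition: sums over k = 0..n.
mean_from_pmf = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * binomial_pmf(k, n, p)
                   for k in range(n + 1))

print(mean_from_pmf, n * p)            # both close to 1.0
print(var_from_pmf, n * p * (1 - p))   # both close to 0.95
print(sqrt(n * p * (1 - p)))           # standard deviation
```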
Examples
Example 1: Coin Toss
One of the simplest and most didactic examples of the binomial distribution:
Problem
We toss a fair coin 10 times. What is the probability of getting exactly 6 heads?
Solution
- • n = 10, p = 0.5, k = 6
- • P(X = 6) = C(10,6) × (0.5)⁶ × (0.5)⁴
- • P(X = 6) ≈ 0.205
Therefore, the probability of getting exactly 6 heads in 10 tosses is approximately 20.5%.
Example 2: Quality Control
Problem
In a production line, 5% of products are defective. If we inspect 20 products randomly, what is the probability of finding exactly 2 defective products?
Solution
- • n = 20, p = 0.05, k = 2
- • P(X = 2) = C(20,2) × (0.05)² × (0.95)¹⁸
- • P(X = 2) ≈ 0.189
The probability of finding exactly 2 defective products is approximately 18.9%.
Example 3: Lottery
Problem
Suppose the probability of a person winning a prize in a lottery is 0.01 (1%). If 100 people participate, what is the probability of exactly 3 people winning?
Solution
- • n = 100, p = 0.01, k = 3
- • P(X = 3) = C(100,3) × (0.01)³ × (0.99)⁹⁷
- • P(X = 3) ≈ 0.061
The probability of exactly 3 people winning is approximately 6.1%.
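The three worked examples can be reproduced numerically. This is a small sketch using the standard library; the dictionary labels and the helper name `binomial_pmf` are illustrative choices.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# (k, n, p) for each example above
examples = {
    "coin toss": (6, 10, 0.5),
    "quality control": (2, 20, 0.05),
    "lottery": (3, 100, 0.01),
}

for name, (k, n, p) in examples.items():
    print(f"{name}: P(X = {k}) = {binomial_pmf(k, n, p):.3f}")
```

The printed values should agree with the rounded results given in the solutions (0.205, 0.189 and 0.061).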
Shape Properties
Shape of the Distribution
The shape of the binomial distribution depends on the values of n and p:
p = 0.5
Symmetric distribution around the mean
p < 0.5
Right-skewed distribution (positive skew): the longer tail extends toward larger values
p > 0.5
Left-skewed distribution (negative skew): the longer tail extends toward smaller values
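The direction of the skew can be checked by computing the third standardized moment from the probability function. The sketch below assumes n = 10 and three illustrative values of p; the function names are hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def skewness(n, p):
    """Third standardized moment of Binomial(n, p), summed from the PMF."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return sum(((k - mean) / sd) ** 3 * binomial_pmf(k, n, p)
               for k in range(n + 1))

for p in (0.2, 0.5, 0.8):
    # Expect a positive value for p < 0.5, about zero at 0.5, negative for p > 0.5.
    print(p, round(skewness(10, p), 3))
```

The result matches the known closed form (1 - 2p) / √(n × p × (1-p)), which also shows that the skew shrinks as n grows for a fixed p.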
Main Properties
- • Possible values: X can assume values from 0 to n
- • Sum of probabilities: Σ P(X = k) = 1 (sum from k = 0 to n)
- • Symmetry: If p = 0.5, the distribution is symmetric
- • Maximum variance: For a fixed n, the variance n × p × (1-p) is largest when p = 0.5
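- • Reproduction: The sum of independent binomial variables with the same p is also binomial: if X₁ ~ Binomial(n₁, p) and X₂ ~ Binomial(n₂, p), then X₁ + X₂ ~ Binomial(n₁ + n₂, p)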
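The additive property holds when the independent variables share the same success probability p. A short numerical check, under that assumption and with illustrative values n₁ = 4, n₂ = 6, p = 0.3, convolves the two probability functions and compares the result with Binomial(n₁ + n₂, p):

```python
from math import comb

def binomial_pmf(k, n, p):
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p)**(n - k)

def pmf_of_sum(k, n1, n2, p):
    """P(X1 + X2 = k) for independent X1 ~ Binomial(n1, p), X2 ~ Binomial(n2, p)."""
    # Discrete convolution: split k successes between the two variables.
    return sum(binomial_pmf(i, n1, p) * binomial_pmf(k - i, n2, p)
               for i in range(k + 1))

n1, n2, p = 4, 6, 0.3
max_gap = max(abs(pmf_of_sum(k, n1, n2, p) - binomial_pmf(k, n1 + n2, p))
              for k in range(n1 + n2 + 1))
print(max_gap)  # differences are at floating-point rounding level
```

With different success probabilities the sum is generally not binomial, so the check would fail.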
CDF
The cumulative distribution function (CDF) gives the probability that X is less than or equal to k:
CDF Formula
F(k) = P(X ≤ k) = Σ P(X = i) for i = 0 to k
The CDF accumulates all probabilities from 0 to k.
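A direct summation is enough for small n. The sketch below is one straightforward way to write it; the function names are illustrative.

```python
from math import comb

def binomial_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    if k < 0:
        return 0.0
    # Cap at n so that F(k) = 1 for any k >= n.
    return sum(binomial_pmf(i, n, p) for i in range(min(k, n) + 1))

print(binomial_cdf(2, 4, 0.5))      # (1 + 4 + 6) / 16 = 0.6875
print(1 - binomial_cdf(2, 4, 0.5))  # P(X > 2) via the complement
```

Probabilities such as P(X ≥ k) or P(a ≤ X ≤ b) follow from differences of CDF values, for example P(X ≥ k) = 1 - F(k - 1).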
Normal and Poisson Approximations
Normal Approximation
For large values of n, the binomial distribution can be approximated by the normal distribution:
Conditions for Normal Approximation
- • n × p ≥ 5
- • n × (1-p) ≥ 5
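To see how well the approximation behaves when both conditions hold, the sketch below compares an exact binomial probability with a normal distribution of mean n×p and variance n×p×(1-p). The values n = 50, p = 0.4, k = 22 are illustrative, and the +0.5 continuity correction is a common refinement added here as an assumption; it is not part of the rule of thumb above.

```python
from math import comb, erf, sqrt

def binomial_cdf(k, n, p):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mean, sd):
    # Normal CDF expressed through the error function.
    return 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

n, p, k = 50, 0.4, 22            # n*p = 20 and n*(1-p) = 30, both at least 5
mean = n * p
sd = sqrt(n * p * (1 - p))

exact = binomial_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mean, sd)  # +0.5: continuity correction (assumption)

print(exact, approx)  # the two values should be close
```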
When these conditions are met, we can use:
X ≈ N(n×p, n×p×(1-p))
Poisson Approximation
When n is large and p is small, we can approximate the binomial by Poisson:
Conditions for Poisson Approximation
- • n ≥ 20
- • p ≤ 0.05 (or n×p ≤ 5)
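A quick comparison shows how close the two distributions are under these conditions. The sketch below reuses the lottery values n = 100 and p = 0.01 and takes the Poisson rate as n×p; the function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

n, p = 100, 0.01
lam = n * p  # Poisson rate matched to the binomial mean

for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```

The columns should agree to roughly two decimal places; the gap widens as p grows for a fixed n.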
Under these conditions, we can use λ = n×p:
X ≈ Poisson(λ = n×p)
Applications
Application Areas
- • Quality Control: Number of defective items in samples
- • Surveys and Polls: Number of favorable responses
- • Medicine: Number of patients responding to treatment
- • Marketing: Number of customers who make a purchase
- • Engineering: Number of components that fail in tests
- • Finance: Number of successful investments
- • Data Analysis: Modeling binary events
When NOT to Use
⚠️ When NOT to Use the Binomial Distribution
- • Sampling without replacement: Use hypergeometric distribution
- • Variable probability: If p changes between trials
- • Dependent trials: If trials are not independent
- • More than two outcomes: Use multinomial distribution
- • Variable number of trials: If trials continue until a set number of successes is reached, use the geometric (first success) or negative binomial distribution
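The first case is easy to illustrate numerically. The sketch below uses a hypothetical lot of 50 items with 10 defectives and a sample of 10 drawn without replacement, then compares the hypergeometric probabilities with a binomial that wrongly treats the draws as independent with p = 10/50. All names and numbers are assumptions for the example.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(k successes) drawing n items without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

N, K, n = 50, 10, 10  # hypothetical lot size, defectives, sample size

for k in range(4):
    print(k,
          round(hypergeom_pmf(k, N, K, n), 4),
          round(binomial_pmf(k, n, K / N), 4))
```

The mismatch is visible because the sample is a large fraction of the lot. When the population is much larger than the sample, the two distributions become nearly identical and the binomial is a reasonable approximation.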
Statistical Tests
The binomial distribution is used in several statistical tests:
- • Binomial Test: Tests whether a proportion equals a specific value
- • Sign Test: Tests whether the median equals a specific value
- • McNemar Test: Tests whether the two marginal proportions differ in paired 2×2 tables, using only the discordant pairs
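As an illustration of the first item, an exact binomial test can be sketched in a few lines. The two-sided p-value below follows one common convention (summing the probabilities of all outcomes no more likely than the observed one); other conventions exist, and the function names and the example of 9 heads in 10 tosses are assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_test_two_sided(k, n, p0):
    """Exact two-sided p-value for observing k successes in n trials under p = p0."""
    observed = binomial_pmf(k, n, p0)
    # Small relative tolerance so ties are not lost to floating-point rounding.
    return sum(
        prob
        for prob in (binomial_pmf(i, n, p0) for i in range(n + 1))
        if prob <= observed * (1 + 1e-9)
    )

# 9 heads in 10 tosses of a coin assumed fair under the null hypothesis
print(binomial_test_two_sided(9, 10, 0.5))  # (1 + 10 + 10 + 1) / 1024, about 0.0215
```

A small p-value such as this one suggests the observed proportion is hard to reconcile with p0 = 0.5.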
Conclusion
The binomial distribution is a fundamental and powerful tool in statistics, especially useful for modeling situations where we have a fixed number of independent trials, each with two possible outcomes.
Understanding the binomial distribution in depth allows modeling a wide variety of real phenomena, from quality control to opinion polls. The key to effectively using the binomial distribution is to carefully verify that all necessary conditions are met in your specific situation.
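Remember: the binomial distribution assumes independent trials with constant success probability. If these conditions are not met, consider an alternative such as the hypergeometric distribution for sampling without replacement. The Poisson distribution serves as an approximation when n is large and p is small, not as a remedy for dependent trials.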