Binomial Distribution

Introduction

The binomial distribution is one of the most widely used probability distributions in statistics. It describes the number of successes in a sequence of n independent trials, each with the same probability p of success. This article covers its definition and formula, its mean and variance, worked examples, its shape and main properties, the normal and Poisson approximations, and common applications.

What is the Binomial Distribution?

The binomial distribution is a discrete probability distribution that models the number of successes in n independent trials, where each trial has only two possible outcomes: success (with probability p) or failure (with probability 1-p).

Conditions for Binomial Distribution

For a situation to be modeled by a binomial distribution, the following conditions must be met:

1. Fixed number of trials (n): The experiment consists of n identical trials
2. Two possible outcomes: Each trial results in success or failure
3. Constant probability (p): The probability of success p is the same in each trial
4. Independence: Trials are independent of each other
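
The four conditions can be illustrated with a small simulation. The sketch below is only an illustration: the function name count_successes, the seed, and the values n = 10, p = 0.5 and 20,000 repetitions are arbitrary choices made here. Each repetition runs a fixed number of independent two-outcome trials with constant p and records the number of successes.

```python
import random


def count_successes(n: int, p: float, rng: random.Random) -> int:
    """Run n independent trials, each succeeding with probability p."""
    return sum(1 for _ in range(n) if rng.random() < p)


rng = random.Random(2024)      # fixed seed so the run is repeatable
n, p, runs = 10, 0.5, 20_000   # illustrative values, not from the article

# tally[k] counts how many repetitions produced exactly k successes
tally = [0] * (n + 1)
for _ in range(runs):
    tally[count_successes(n, p, rng)] += 1

for k, count in enumerate(tally):
    print(f"{k:2d} successes: relative frequency {count / runs:.3f}")
```

The relative frequencies printed by such a run approximate the probabilities that the binomial distribution assigns to each count.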

Binomial Formula

The probability function of the binomial distribution is given by:

Probability Function

P(X = k) = C(n,k) × pᵏ × (1-p)ⁿ⁻ᵏ

P(X = k) = (n! / (k! × (n-k)!)) × pᵏ × (1-p)ⁿ⁻ᵏ

• X: Random variable (number of successes)
• k: Number of successes (0, 1, 2, ..., n)
• n: Number of trials
• p: Probability of success in each trial
• C(n,k): Binomial coefficient (combinations)
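
As a minimal sketch, the formula above translates almost directly into Python. The function name binomial_pmf is chosen here for illustration, and math.comb assumes Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        return 0.0  # counts outside 0..n are impossible
    # C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


print(binomial_pmf(3, 5, 0.5))  # C(5,3) / 2^5 = 10/32
```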

Notation

Standard Notation

X ~ Binomial(n, p)

Read as: "X follows a binomial distribution with parameters n and p"

Parameters n and p

Parameter n

Number of trials: Must be a positive integer. Determines how many independent trials are performed.

Parameter p

Probability of success: Must be between 0 and 1 (0 ≤ p ≤ 1). Represents the probability of success in each individual trial.

Mean and Variance

The measures of central tendency and dispersion of the binomial distribution are:

Fundamental Statistics

Mean (Expected Value)

E[X] = μ = n × p

The expected number of successes in n trials.

Variance

Var(X) = σ² = n × p × (1-p)

Measures the spread of values around the mean.

Standard Deviation

σ = √(n × p × (1-p))

Square root of the variance.
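
One way to check these closed-form expressions is to compare them with the mean and variance computed directly from the probability function. The sketch below does this for n = 20 and p = 0.3, values picked here only as an example; the function names are likewise illustrative.

```python
from math import comb, sqrt


def binomial_moments(n: int, p: float) -> tuple[float, float, float]:
    """Closed-form mean, variance and standard deviation."""
    variance = n * p * (1 - p)
    return n * p, variance, sqrt(variance)


def moments_from_pmf(n: int, p: float) -> tuple[float, float]:
    """Mean and variance obtained by summing over all outcomes 0..n."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, variance


mean, variance, std = binomial_moments(20, 0.3)
check_mean, check_variance = moments_from_pmf(20, 0.3)
print(mean, variance, std)
print(check_mean, check_variance)
```

Both approaches should agree up to floating-point rounding: the mean is 20 × 0.3 = 6 and the variance is 20 × 0.3 × 0.7 = 4.2.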

Examples

Example 1: Coin Toss

The coin toss is one of the simplest illustrations of the binomial distribution:

Problem

We toss a fair coin 10 times. What is the probability of getting exactly 6 heads?

Solution

• n = 10, p = 0.5, k = 6
• P(X = 6) = C(10,6) × (0.5)⁶ × (0.5)⁴
• P(X = 6) = 210 × (0.5)¹⁰ = 210/1024 ≈ 0.205

Therefore, the probability of getting exactly 6 heads in 10 tosses is approximately 20.5%.

Example 2: Quality Control

Problem

In a production line, 5% of products are defective. If we inspect 20 products randomly, what is the probability of finding exactly 2 defective products?

Solution

• n = 20, p = 0.05, k = 2
• P(X = 2) = C(20,2) × (0.05)² × (0.95)¹⁸
• P(X = 2) ≈ 190 × 0.0025 × 0.3972 ≈ 0.189

The probability of finding exactly 2 defective products is approximately 18.9%.

Example 3: Lottery

Problem

Suppose each participant in a lottery wins a prize with probability 0.01 (1%), independently of the others. If 100 people participate, what is the probability that exactly 3 of them win?

Solution

• n = 100, p = 0.01, k = 3
• P(X = 3) = C(100,3) × (0.01)³ × (0.99)⁹⁷
• P(X = 3) ≈ 161700 × 10⁻⁶ × 0.3772 ≈ 0.061

The probability of exactly 3 people winning is approximately 6.1%.
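
The three examples can be reproduced with a few lines of Python. This is a sketch using only the standard library; the helper name binomial_pmf and the dictionary of examples are illustrative choices made here.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# (k, n, p) for each worked example in this section
examples = {
    "coin toss": (6, 10, 0.5),
    "quality control": (2, 20, 0.05),
    "lottery": (3, 100, 0.01),
}

for name, (k, n, p) in examples.items():
    print(f"{name}: P(X = {k}) = {binomial_pmf(k, n, p):.3f}")
```

The printed values should round to the figures quoted in the solutions: 0.205, 0.189 and 0.061.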

Shape Properties

Shape of the Distribution

The shape of the binomial distribution depends on the values of n and p:

p = 0.5

Symmetric distribution around the mean

p < 0.5

Right-skewed distribution (positive skew): the longer tail extends toward larger values

p > 0.5

Left-skewed distribution (negative skew): the longer tail extends toward smaller values
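
The direction of the skew can be quantified with the skewness coefficient of the binomial distribution, (1 - 2p) / √(n × p × (1-p)), a standard result that is not derived in this article. A positive value means a longer right tail and a negative value a longer left tail. The sketch below evaluates it for a few illustrative values; the function name is chosen here and it assumes 0 < p < 1.

```python
from math import sqrt


def binomial_skewness(n: int, p: float) -> float:
    """Skewness of Binomial(n, p); requires 0 < p < 1."""
    return (1 - 2 * p) / sqrt(n * p * (1 - p))


for p in (0.1, 0.5, 0.9):
    print(f"n = 20, p = {p}: skewness = {binomial_skewness(20, p):+.3f}")

# For fixed p, the skew shrinks toward 0 as n grows
for n in (10, 100, 1000):
    print(f"n = {n}, p = 0.1: skewness = {binomial_skewness(n, 0.1):+.3f}")
```

The second loop illustrates that the asymmetry fades as n increases, which is why large-n binomial distributions look roughly bell-shaped even when p differs from 0.5.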

Main Properties

• Possible values: X can assume values from 0 to n
• Sum of probabilities: Σ P(X = k) = 1 (sum from k = 0 to n)
• Symmetry: If p = 0.5, the distribution is symmetric
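• Maximum variance: For a fixed n, the variance n × p × (1-p) is largest when p = 0.5
• Reproductive property: If X ~ Binomial(n₁, p) and Y ~ Binomial(n₂, p) are independent and share the same p, then X + Y ~ Binomial(n₁ + n₂, p). If the values of p differ, the sum is generally not binomial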
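
Two of these properties can be checked numerically. The sketch below, with helper names and parameter values chosen here for illustration, verifies that the probabilities sum to 1 and that adding two independent binomial variables with the same p (here 5 and 7 trials, p = 0.3) reproduces the distribution for 12 trials.

```python
from math import comb


def binomial_pmf_table(n: int, p: float) -> list[float]:
    """Probabilities P(X = 0), ..., P(X = n) for X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Distribution of the sum of two independent integer-valued variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


p = 0.3
direct = binomial_pmf_table(12, p)
summed = convolve(binomial_pmf_table(5, p), binomial_pmf_table(7, p))

total = sum(direct)
max_gap = max(abs(s - d) for s, d in zip(summed, direct))
print(f"sum of probabilities: {total:.12f}")
print(f"largest difference between the two tables: {max_gap:.2e}")
```

The shared value of p matters: convolving tables built with two different probabilities would not, in general, give a binomial table.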

CDF

The cumulative distribution function (CDF) gives the probability that X is less than or equal to k:

CDF Formula

F(k) = P(X ≤ k) = Σ P(X = i) for i = 0 to k

The CDF accumulates all probabilities from 0 to k.
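
A minimal sketch of the CDF sums the probability function term by term. The function name binomial_cdf is chosen here for illustration; the sample call asks for the probability of at most 6 heads in 10 tosses of a fair coin.

```python
from math import comb


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    k = min(k, n)  # everything above n adds no probability
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


at_most_six = binomial_cdf(6, 10, 0.5)
print(f"P(X <= 6) = {at_most_six:.4f}")       # 848/1024
print(f"P(X >= 7) = {1 - at_most_six:.4f}")   # complement rule
```

The second line uses the complement P(X ≥ k + 1) = 1 - F(k), which is often the easiest way to obtain upper-tail probabilities.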

Normal and Poisson Approximations

Normal Approximation

For large values of n, the binomial distribution can be approximated by the normal distribution:

Conditions for Normal Approximation

• n × p ≥ 5
• n × (1-p) ≥ 5

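When these conditions are met (some textbooks use the stricter threshold of 10), we can use a normal distribution with the same mean and variance:

X ≈ N(μ = n×p, σ² = n×p×(1-p))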
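
The quality of the approximation can be checked against the exact value. The sketch below uses n = 50, p = 0.4 and k = 22, values chosen here so that both conditions hold (n × p = 20 and n × (1-p) = 30). It also applies a continuity correction, evaluating the normal CDF at k + 0.5; this is a common refinement that the text above does not mention.

```python
from math import comb, erf, sqrt


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def normal_cdf(x: float, mean: float, std: float) -> float:
    """CDF of a normal distribution, written with the error function."""
    return 0.5 * (1 + erf((x - mean) / (std * sqrt(2))))


n, p, k = 50, 0.4, 22
mean, std = n * p, sqrt(n * p * (1 - p))

exact = binomial_cdf(k, n, p)
approx = normal_cdf(k + 0.5, mean, std)  # k + 0.5 is the continuity correction

print(f"exact  P(X <= {k}) = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```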

Poisson Approximation

When n is large and p is small, we can approximate the binomial by Poisson:

Conditions for Poisson Approximation

• n ≥ 20
• p ≤ 0.05 (or n×p ≤ 5)

Under these conditions, we can use λ = n×p:

X ≈ Poisson(λ = n×p)
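
As a rough check of this approximation, the sketch below compares the two probability functions for n = 100 and p = 0.01, the parameters of the lottery example, so that λ = 1. The helper names are illustrative.

```python
from math import comb, exp, factorial


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 100, 0.01
lam = n * p  # Poisson rate matched to the binomial mean

for k in range(6):
    b, q = binomial_pmf(k, n, p), poisson_pmf(k, lam)
    print(f"k = {k}: binomial {b:.4f}, Poisson {q:.4f}, difference {abs(b - q):.4f}")
```

For these parameters the two columns differ only in the third decimal place, which is typical when n is large and p is small.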

Applications

Application Areas

• Quality Control: Number of defective items in samples
• Surveys and Polls: Number of favorable responses
• Medicine: Number of patients responding to treatment
• Marketing: Number of customers who make a purchase
• Engineering: Number of components that fail in tests
• Finance: Number of successful investments
• Data Analysis: Modeling binary events

When NOT to Use

⚠️ When NOT to Use the Binomial Distribution

• Sampling without replacement: Use hypergeometric distribution
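• Variable probability: If p changes between trials, the number of successes follows a Poisson binomial distribution rather than a binomial one
• Dependent trials: If the outcome of one trial affects another, the binomial formula no longer applies
• More than two outcomes: Use multinomial distribution
• Number of trials not fixed in advance: Use the geometric distribution (trials until the first success) or the negative binomial distribution (trials until a fixed number of successes)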
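
The first item in the list can be made concrete with a small comparison. The numbers below are hypothetical: a lot of 20 items contains 5 defective ones and 5 items are drawn without replacement. The hypergeometric probability function used here, C(K,k) × C(N-K, n-k) / C(N,n), is the standard one; the function names are chosen for illustration.

```python
from math import comb


def hypergeometric_pmf(k: int, population: int, defective: int, draws: int) -> float:
    """P(exactly k defective items) when drawing without replacement."""
    return (
        comb(defective, k)
        * comb(population - defective, draws - k)
        / comb(population, draws)
    )


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p), i.e. drawing with replacement."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


population, defective, draws, k = 20, 5, 5, 2  # hypothetical lot
without_replacement = hypergeometric_pmf(k, population, defective, draws)
with_replacement = binomial_pmf(k, draws, defective / population)

print(f"hypergeometric: {without_replacement:.4f}")
print(f"binomial:       {with_replacement:.4f}")
```

With such a small population the two answers differ noticeably. The gap narrows as the population grows relative to the sample, which is why the binomial is often used as an approximation for small samples from large lots.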

Statistical Tests

The binomial distribution is used in several statistical tests:

• Binomial Test: Tests whether a proportion equals a specific value
• Sign Test: Tests whether the median equals a specific value
• McNemar Test: Tests whether two paired proportions differ (marginal homogeneity) in a paired 2×2 table, based on the discordant pairs
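
As a sketch of how the binomial test works, the function below computes an exact two-sided p-value by adding up the probabilities of every outcome that is no more likely than the observed one under the hypothesized proportion. This is one common convention for the two-sided test, not the only one, and the function name and the small tolerance factor are choices made here.

```python
from math import comb


def binomial_test_two_sided(k: int, n: int, p0: float) -> float:
    """Exact two-sided p-value for H0: success probability equals p0."""
    pmf = [comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so ties are not lost to floating-point rounding
    threshold = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical data: 8 heads in 10 tosses, testing whether the coin is fair
p_value = binomial_test_two_sided(8, 10, 0.5)
print(f"p-value = {p_value:.4f}")  # (1 + 10 + 45) * 2 / 1024
```

In this hypothetical case the outcomes at least as extreme as 8 heads are 0, 1, 2, 8, 9 and 10 heads, giving 112/1024 ≈ 0.109, so the data would not contradict a fair coin at the usual 5% level.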

Conclusion

The binomial distribution is a fundamental and powerful tool in statistics, especially useful for modeling situations where we have a fixed number of independent trials, each with two possible outcomes.

Understanding the binomial distribution in depth allows modeling a wide variety of real phenomena, from quality control to opinion polls. The key to effectively using the binomial distribution is to carefully verify that all necessary conditions are met in your specific situation.

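Remember: the binomial distribution assumes a fixed number of independent trials with constant success probability. If these conditions are not met, consider an alternative suited to the situation, such as the hypergeometric distribution (sampling without replacement), the multinomial distribution (more than two outcomes), or the negative binomial distribution (number of trials not fixed in advance).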