P(A) = favorable outcomes / total outcomes (when all outcomes are equally likely)
0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A') = 1 − P(A)
Probability quantifies uncertainty. These rules come from set theory — unions and intersections. Counting techniques like combinatorics (permutations and combinations) are essential for computing probabilities in finite sample spaces.
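The counting techniques above can be sketched with Python's standard library, which has `perm` and `comb` built in. The full-house hand is an illustrative example not taken from the text above:

```python
from math import comb, perm

# Permutations: ordered arrangements of r items drawn from n
print(perm(5, 2))   # 20
# Combinations: unordered selections of r items from n
print(comb(5, 2))   # 10

# Classical probability by counting: a full house in 5-card poker.
# Choose a rank for the triple and 3 of its 4 suits,
# then a rank for the pair and 2 of its 4 suits.
full_house = 13 * comb(4, 3) * 12 * comb(4, 2)
total_hands = comb(52, 5)
print(full_house / total_hands)  # ≈ 0.00144
```

Dividing a favorable count by the total count is exactly the first formula in this section, applied to a finite sample space of equally likely hands.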
Bayes' Theorem — P(A|B) = P(B|A)·P(A) / P(B) — is the backbone of Bayesian inference and machine learning. It lets us update beliefs with new evidence. The law of total probability connects this to a partition {Aᵢ} of the sample space: P(B) = Σᵢ P(B|Aᵢ)·P(Aᵢ).
Example: Disease Testing
Disease prevalence: 1%. Test sensitivity: 99%. False positive rate: 5%.
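Working the numbers above through Bayes' theorem shows why a positive result is less alarming than the 99% sensitivity suggests:

```python
# Numbers from the example: prevalence 1%, sensitivity 99%, false positive rate 5%.
prevalence = 0.01
sensitivity = 0.99       # P(+ | disease)
false_pos = 0.05         # P(+ | no disease)

# Total probability over the partition {disease, no disease}:
# P(+) = P(+|D)·P(D) + P(+|¬D)·P(¬D)
p_positive = sensitivity * prevalence + false_pos * (1 - prevalence)

# Bayes' theorem: P(D | +) = P(+|D)·P(D) / P(+)
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 3))  # ≈ 0.167
```

Despite the accurate test, only about 1 in 6 positive results indicates disease, because false positives from the large healthy population swamp the true positives from the rare sick one.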
Geometric(p): P(X = k) = (1−p)ᵏ⁻¹·p — number of trials until the first success. The (1−p)ᵏ⁻¹ term connects it to exponential decay.
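A short sketch of the geometric PMF and the trials-until-first-success interpretation (function names are my own):

```python
import random

def geometric_pmf(k, p):
    """P(X = k) = (1 - p)**(k - 1) * p: first success occurs on trial k."""
    return (1 - p) ** (k - 1) * p

# The PMF sums to 1 over k = 1, 2, ... (numerically, the first 50 terms suffice)
p = 0.5
print(sum(geometric_pmf(k, p) for k in range(1, 51)))  # ≈ 1.0

def sample_geometric(p, rng=random):
    """Simulate by counting Bernoulli(p) trials until the first success."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k
```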
Continuous Distributions
Normal: f(x) = (1/σ√(2π))·e^(−(x−μ)²/(2σ²))
Z-score: z = (x − μ)/σ
Standard normal: μ = 0, σ = 1
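The density and z-score formulas above translate directly into code; the exam score of 130 with μ = 100, σ = 15 is an illustrative example, not from the text:

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)**2 / (2 * sigma**2))"""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def std_normal_cdf(z):
    """CDF of the standard normal, expressed via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = z_score(130, 100, 15)
print(z)                  # 2.0
print(std_normal_cdf(z))  # ≈ 0.977: a score of 130 beats ~97.7% of the population
```

Standardizing reduces any normal distribution to the standard normal, which is why one table (or one CDF function) covers them all.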
For continuous distributions, probabilities are areas under the curve — you need integration. The normal (Gaussian) distribution is the most important, governing everything from measurement error to stock prices.
Uniform(a, b): f(x) = 1/(b−a) — constant density
Exponential(λ): f(x) = λe^(−λx) — time between events
Chi-squared: Sum of squared standard normals — used in goodness-of-fit tests
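One way to see "area under the curve" in action is inverse-transform sampling for the exponential: if U is Uniform(0, 1), then X = −ln(U)/λ has density λe^(−λx), and its sample mean should approach 1/λ. A minimal sketch (λ = 2 is an arbitrary choice):

```python
import random
from math import log

lam = 2.0
rng = random.Random(0)  # fixed seed for reproducibility

# Inverse-transform sampling: invert the CDF F(x) = 1 - e^(-lam*x)
samples = [-log(rng.random()) / lam for _ in range(100_000)]

mean = sum(samples) / len(samples)
print(mean)  # ≈ 1/lam = 0.5
```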
Central Limit Theorem
Central Limit Theorem: Regardless of the population distribution, the sample mean X̄ approaches a normal distribution N(μ, σ²/n) as n → ∞. This is why the normal distribution is so important, and why statistical inference works. The convergence concept mirrors limits in calculus.
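The CLT can be demonstrated by simulation: draw sample means from a heavily skewed population and check that they cluster as N(μ, σ²/n) predicts. Exponential(1) (mean 1, variance 1) is my choice of skewed population here:

```python
import random
from math import sqrt

rng = random.Random(42)

def sample_mean(n):
    """Mean of n draws from Exponential(1): mu = 1, sigma = 1."""
    return sum(rng.expovariate(1.0) for _ in range(n)) / n

n, trials = 50, 20_000
means = [sample_mean(n) for _ in range(trials)]

grand_mean = sum(means) / trials
spread = sqrt(sum((m - grand_mean) ** 2 for m in means) / trials)
print(grand_mean)  # ≈ mu = 1
print(spread)      # ≈ sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.141
```

Even though each draw comes from a sharply skewed distribution, the histogram of the 20,000 sample means is nearly symmetric and bell-shaped, with the standard deviation shrinking like 1/√n.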