
Probability & Distributions

Model uncertainty with probability and understand how random variables behave.

Probability Basics

P(A) = favorable outcomes / total outcomes
0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A') = 1 − P(A)

Probability quantifies uncertainty. These rules come from set theory — unions and intersections. Counting techniques like combinatorics (permutations and combinations) are essential for computing probabilities in finite sample spaces.
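The counting definition and the inclusion–exclusion rule can be checked directly by enumerating a finite sample space. A minimal sketch using two fair dice (the events A and B are illustrative choices, not from the text above):

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered rolls of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) = favorable outcomes / total outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

A = lambda o: o[0] + o[1] == 7   # sum of the dice is 7
B = lambda o: o[0] == 3          # first die shows 3

p_union = prob(lambda o: A(o) or B(o))
# Inclusion–exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
p_ie = prob(A) + prob(B) - prob(lambda o: A(o) and B(o))
assert p_union == p_ie == Fraction(11, 36)
```

Using `Fraction` keeps the arithmetic exact, so the identity holds with equality rather than within floating-point tolerance.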

Conditional Probability & Bayes' Theorem

P(A|B) = P(A ∩ B)/P(B)
Independent events: P(A ∩ B) = P(A)·P(B)
Bayes: P(A|B) = P(B|A)·P(A)/P(B)

Bayes' Theorem is the backbone of Bayesian inference and machine learning: it lets us update beliefs with new evidence. The law of total probability connects it to partitions of the sample space: P(B) = Σ P(B|Aᵢ)·P(Aᵢ).

Example: Disease Testing

Disease prevalence: 1%. Test sensitivity: 99%. False positive rate: 5%.

P(Disease | Positive) = (0.99×0.01)/(0.99×0.01 + 0.05×0.99) ≈ 0.0099/0.0594 ≈ 16.7%

Even with a good test, a positive result is only 16.7% likely to be a true positive when the disease is rare!
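The disease-testing calculation above is just Bayes' theorem with the law of total probability in the denominator. A minimal sketch (function name is illustrative):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(Disease | Positive) via Bayes' theorem.

    The denominator is the law of total probability:
    P(+) = P(+|D)·P(D) + P(+|no D)·P(no D)
    """
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Prevalence 1%, sensitivity 99%, false positive rate 5% (from the example above)
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(p, 4))  # 0.1667 — matches the ≈16.7% worked out above
```

Doubling the prevalence to 2% roughly doubles the posterior, which is why the same test is far more informative in a high-risk population.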

Discrete Distributions

  • Bernoulli(p): Single trial with success probability p. E[X] = p, Var = p(1−p)
  • Binomial(n, p): P(X = k) = C(n,k)·pᵏ(1−p)ⁿ⁻ᵏ — the coefficients come from the binomial theorem
  • Poisson(λ): P(X = k) = e⁻λ·λᵏ/k! — models counts of rare events per interval
  • Geometric(p): P(X = k) = (1−p)ᵏ⁻¹·p — trials until first success; the discrete analogue of the exponential distribution
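The PMFs above can be implemented with the standard library alone and sanity-checked against the properties listed (each PMF sums to 1; the binomial mean is n·p):

```python
import math

def binomial_pmf(n, p, k):
    """P(X = k) for Binomial(n, p): C(n,k)·p^k·(1−p)^(n−k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """P(X = k) for Poisson(λ): e^(−λ)·λ^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 10, 0.3
# The binomial PMF sums to 1 over k = 0..n.
assert abs(sum(binomial_pmf(n, p, k) for k in range(n + 1)) - 1) < 1e-12
# Its mean matches E[X] = n·p.
mean = sum(k * binomial_pmf(n, p, k) for k in range(n + 1))
assert abs(mean - n * p) < 1e-12
# The Poisson PMF sums to ~1 (truncated at k = 99; the tail is negligible).
assert abs(sum(poisson_pmf(4.0, k) for k in range(100)) - 1) < 1e-9
```

`math.comb` handles the C(n,k) counting term exactly, which keeps the binomial PMF numerically stable for moderate n.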

Continuous Distributions

Normal: f(x) = (1/(σ√(2π)))·e^(−(x−μ)²/(2σ²))
Z-score: z = (x − μ)/σ
Standard normal: μ = 0, σ = 1

For continuous distributions, probabilities are areas under the density curve, so computing them requires integration. The normal (Gaussian) distribution is the most important: it describes measurement error and, more generally, any quantity formed as the sum of many small independent effects.

  • Uniform(a, b): f(x) = 1/(b−a) — constant density
  • Exponential(λ): f(x) = λe⁻λˣ — time between events
  • t-distribution: Used in hypothesis testing with small samples
  • Chi-squared: Sum of squared standard normals — used in goodness-of-fit tests
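The normal PDF and z-score formulas above translate directly to code; since the normal CDF has no closed form, `math.erf` gives the "area under the curve" the text mentions. A minimal sketch:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²))."""
    z = (x - mu) / sigma  # z-score: distance from the mean in units of σ
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the density up to x, via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# About 68.27% of the probability mass lies within one σ of the mean.
within_one_sigma = normal_cdf(1) - normal_cdf(-1)
print(round(within_one_sigma, 4))  # 0.6827
# The standard normal peaks at 1/√(2π) ≈ 0.3989.
assert abs(normal_pdf(0) - 1 / math.sqrt(2 * math.pi)) < 1e-12
```

The z-score transform reduces any normal to the standard normal, which is why a single table (or a single `erf`) suffices for all of them.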

Central Limit Theorem

Central Limit Theorem: Regardless of the population distribution (provided it has finite variance), the sample mean X̄ of n observations is approximately N(μ, σ²/n) for large n; equivalently, √n(X̄ − μ)/σ converges to the standard normal as n → ∞. This is why the normal distribution is so important, and why statistical inference works. The convergence concept mirrors limits in calculus.