In This Lesson:
- Probability Basics
- Conditional Probability & Bayes' Theorem
- Discrete Distributions
- Continuous Distributions
- Central Limit Theorem

Probability Basics
P(A) = favorable outcomes / total outcomes
0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A') = 1 − P(A)
Probability quantifies uncertainty. These rules come from set theory: unions, intersections, and complements. Counting techniques from combinatorics (permutations and combinations) are essential for computing probabilities in finite sample spaces.
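As a concrete check, the rules above can be verified on a small finite sample space such as a single die roll. The events A and B below are illustrative choices, not from the lesson:

```python
# Verifying the basic probability rules on a single fair die roll.
# A and B are illustrative events, assuming equally likely outcomes.
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is at least 4"

def P(event):
    # P(A) = favorable outcomes / total outcomes
    return Fraction(len(event), len(outcomes))

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert P(A | B) == P(A) + P(B) - P(A & B)

# Complement rule: P(A') = 1 − P(A)
assert P(outcomes - A) == 1 - P(A)
```

Using `Fraction` keeps the arithmetic exact, so the set-theoretic identities hold with `==` rather than floating-point tolerances.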
Conditional Probability & Bayes' Theorem
P(A|B) = P(A ∩ B)/P(B)
Independent events: P(A ∩ B) = P(A)·P(B)
Bayes: P(A|B) = P(B|A)·P(A)/P(B)
Bayes' Theorem is the backbone of Bayesian inference and machine learning: it lets us update beliefs in light of new evidence. The law of total probability connects it to a partition {Aᵢ} of the sample space: P(B) = Σᵢ P(B|Aᵢ)·P(Aᵢ).
Example: Disease Testing

Disease prevalence: 1%. Test sensitivity: 99%. False positive rate: 5%.
P(Disease | Positive) = (0.99×0.01)/(0.99×0.01 + 0.05×0.99) ≈ 0.0099/0.0594 ≈ 16.7%
Even with a good test, a positive result is only 16.7% likely to be a true positive when the disease is rare!
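The worked example translates directly into a few lines of code; a minimal sketch using the numbers from the text:

```python
# Bayes' theorem applied to the disease-testing example above.
prevalence = 0.01       # P(Disease)
sensitivity = 0.99      # P(Positive | Disease)
false_positive = 0.05   # P(Positive | No Disease)

# Law of total probability: P(Positive)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes: P(Disease | Positive)
posterior = sensitivity * prevalence / p_positive
print(f"P(Disease | Positive) = {posterior:.1%}")  # 16.7%
```

Try raising `prevalence` to 0.10: the posterior jumps to about 69%, which shows how strongly the base rate drives the result.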
Discrete Distributions

Bernoulli(p): single trial with success probability p. E[X] = p, Var(X) = p(1−p)
Binomial(n, p): P(X = k) = C(n,k)·pᵏ(1−p)ⁿ⁻ᵏ — the coefficients come from the binomial theorem
Poisson(λ): P(X = k) = e⁻λ·λᵏ/k! — models counts of rare events per interval
Geometric(p): P(X = k) = (1−p)ᵏ⁻¹·p — trials until the first success; its continuous analogue is the exponential distribution

Continuous Distributions
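Before moving on to the continuous case, the discrete pmfs listed above can be sketched with only the standard library (the parameter values below are arbitrary examples):

```python
# Discrete pmfs from the formulas above, standard library only.
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    # P(X = k) = C(n,k) · p^k · (1−p)^(n−k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # P(X = k) = e^(−λ) · λ^k / k!
    return exp(-lam) * lam**k / factorial(k)

def geometric_pmf(k, p):
    # P(X = k) = (1−p)^(k−1) · p, for k = 1, 2, ...
    return (1 - p) ** (k - 1) * p

# Sanity check: each pmf sums to 1 over its support
# (Poisson and geometric are truncated here, so approximately).
assert abs(sum(binomial_pmf(k, 10, 0.3) for k in range(11)) - 1) < 1e-12
assert abs(sum(poisson_pmf(k, 4.0) for k in range(100)) - 1) < 1e-9
assert abs(sum(geometric_pmf(k, 0.3) for k in range(1, 200)) - 1) < 1e-9
```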
Normal: f(x) = (1/σ√(2π))·e^(−(x−μ)²/(2σ²))
Z-score: z = (x − μ)/σ
Standard normal: μ = 0, σ = 1
For continuous distributions, probabilities are areas under the density curve, so computing them requires integration. The normal (Gaussian) distribution is the most important, used to model everything from measurement error to (approximately) stock returns.
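The density, z-score, and area-under-the-curve idea can be sketched directly; the CDF below uses the standard error function `erf` rather than explicit numerical integration:

```python
# Normal density, z-score, and CDF, matching the formulas above.
from math import sqrt, pi, exp, erf

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = (1/(σ√(2π))) · e^(−(x−μ)²/(2σ²))
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def z_score(x, mu, sigma):
    # z = (x − μ)/σ
    return (x - mu) / sigma

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Area under the density up to x, via the error function.
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# About 68% of the probability mass lies within one standard deviation.
print(round(normal_cdf(1) - normal_cdf(-1), 4))  # 0.6827
```

The z-score reduces any normal to the standard normal, which is why one table (or one `normal_cdf`) suffices for all of them.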
Uniform(a, b): f(x) = 1/(b−a) — constant density on [a, b]
Exponential(λ): f(x) = λe⁻λˣ — time between events
t-distribution: used in hypothesis testing with small samples
Chi-squared: sum of squared standard normals — used in goodness-of-fit tests

Central Limit Theorem

Central Limit Theorem: Regardless of the population distribution (as long as it has finite variance), the distribution of the sample mean X̄ approaches a normal distribution N(μ, σ²/n) as n → ∞. This is why the normal distribution is so important, and why statistical inference works. The convergence concept mirrors limits in calculus.
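A quick Monte Carlo experiment makes the theorem tangible: means of Uniform(0, 1) samples (μ = 0.5, σ² = 1/12) cluster around a normal with standard deviation σ/√n. The sample size and trial count below are arbitrary choices:

```python
# Monte Carlo illustration of the CLT with Uniform(0, 1) samples.
import random
import statistics

random.seed(42)        # reproducible run
n = 30                 # size of each sample
trials = 20_000        # number of sample means to draw

# For Uniform(0, 1): μ = 0.5 and σ² = 1/12, so X̄ ≈ N(0.5, 1/(12n)).
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

print(round(statistics.fmean(means), 3))   # close to μ = 0.5
print(round(statistics.stdev(means), 3))   # close to sqrt(1/(12·30)) ≈ 0.053
```

Plotting a histogram of `means` would show the familiar bell shape even though each underlying draw is uniform, which is exactly the claim of the theorem.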