In This Lesson: Probability Basics · Conditional Probability & Bayes' Theorem · Discrete Distributions · Continuous Distributions · Central Limit Theorem

Probability Basics
P(A) = favorable outcomes / total outcomes
0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A') = 1 − P(A)
Probability quantifies uncertainty. These rules come from set theory — unions and intersections. Counting techniques like combinatorics (permutations and combinations) are essential for computing probabilities in finite sample spaces.
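The rules above can be checked directly on a small, equally likely sample space. A minimal sketch, using one roll of a fair die (the events `A` and `B` are illustrative choices, not from the lesson):

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def p(event):
    """P(A) = favorable outcomes / total outcomes (equally likely outcomes)."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
assert p(A | B) == p(A) + p(B) - p(A & B)

# Complement rule: P(A') = 1 − P(A)
assert p(omega - A) == 1 - p(A)

print(p(A | B))  # 2/3
```

Using `Fraction` keeps the probabilities exact, so the set-theoretic identities hold with `==` rather than a floating-point tolerance.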
Conditional Probability & Bayes' Theorem
P(A|B) = P(A ∩ B)/P(B)
Independent events: P(A ∩ B) = P(A)·P(B)
Bayes: P(A|B) = P(B|A)·P(A)/P(B)
Bayes' Theorem is the backbone of Bayesian inference and machine learning. It lets us update beliefs with new evidence. Total probability connects this to partition: P(B) = Σ P(B|Aᵢ)·P(Aᵢ).
Example: Disease Testing

Disease prevalence: 1%. Test sensitivity: 99%. False positive rate: 5%.
P(Disease | Positive) = (0.99×0.01)/(0.99×0.01 + 0.05×0.99) ≈ 0.0099/0.0594 ≈ 16.7%
Even with a good test, a positive result is only 16.7% likely to be a true positive when the disease is rare!
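The disease-testing calculation is a one-liner once the law of total probability supplies the denominator. A minimal sketch (the function name and parameters are illustrative):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(Disease | Positive) via Bayes' theorem.

    Denominator is the law of total probability:
    P(+) = P(+|D)·P(D) + P(+|no D)·P(no D)
    """
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(p, 3))  # 0.167
```

Raising the prior (screening a high-risk group) or lowering the false positive rate both push the posterior up, which is why rare-disease screening usually requires a confirmatory second test.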
Discrete Distributions

Bernoulli: single trial with success probability p. E[X] = p, Var(X) = p(1−p)
Binomial(n, p): P(X = k) = C(n,k)·pᵏ(1−p)ⁿ⁻ᵏ — the coefficients come from the binomial theorem
Poisson(λ): P(X = k) = e⁻λ·λᵏ/k! — models counts of rare events per interval
Geometric(p): P(X = k) = (1−p)ᵏ⁻¹·p — number of trials until the first success.
The geometric distribution's decay factor (1−p)ᵏ⁻¹ connects to exponential functions.

Continuous Distributions
Normal: f(x) = (1/σ√(2π))·e^(−(x−μ)²/(2σ²))
Z-score: z = (x − μ)/σ
Standard normal: μ = 0, σ = 1
For continuous distributions, probabilities are areas under the curve — you need integration. The normal (Gaussian) distribution is the most important, modeling everything from measurement error to stock returns.
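Those areas under the normal curve are usually computed from the CDF rather than by integrating the density directly. A sketch using the standard-library error function, where Φ(z) = (1 + erf(z/√2))/2:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the normal curve to the left of x, via the z-score."""
    z = (x - mu) / sigma                 # z = (x − μ)/σ
    return 0.5 * (1 + erf(z / sqrt(2)))

# P(μ − σ < X < μ + σ): the "68" of the 68-95-99.7 rule.
p = normal_cdf(1) - normal_cdf(-1)
print(round(p, 4))  # 0.6827
```

The same function with different `mu` and `sigma` handles any normal distribution, because the z-score reduces every case to the standard normal.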
Uniform(a, b): f(x) = 1/(b−a) — constant density
Exponential(λ): f(x) = λe⁻λˣ — time between events
t-distribution: used in hypothesis testing with small samples
Chi-squared: sum of squared standard normals — used in goodness-of-fit tests

Central Limit Theorem

Central Limit Theorem: regardless of the population distribution (provided it has finite mean μ and variance σ²), the sample mean X̄ approaches a normal distribution N(μ, σ²/n) as n → ∞. This is why the normal distribution is so important, and why statistical inference works. The convergence concept mirrors limits in calculus.
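The CLT is easy to see by simulation: draw many sample means from a decidedly non-normal population and watch their distribution match N(μ, σ²/n). A sketch using Uniform(0, 1), for which μ = 0.5 and σ² = 1/12 (the sample sizes are arbitrary choices):

```python
import random
import statistics

random.seed(0)  # reproducible runs

def sample_mean(n):
    # Mean of n draws from Uniform(0, 1) -- a flat, non-normal population.
    return statistics.fmean(random.random() for _ in range(n))

n = 30
means = [sample_mean(n) for _ in range(10_000)]

# CLT prediction: mean of X̄ ≈ μ = 0.5, std of X̄ ≈ σ/√n = sqrt(1/12/30) ≈ 0.053
print(statistics.fmean(means))   # close to 0.5
print(statistics.stdev(means))   # close to 0.053
```

Plotting a histogram of `means` would show the familiar bell shape, even though each individual draw is uniform.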