In This Lesson
Probability Basics
Conditional Probability & Bayes' Theorem
Discrete Distributions
Continuous Distributions
Central Limit Theorem

Probability Basics
P(A) = favorable outcomes / total outcomes
0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A') = 1 − P(A)
Probability quantifies uncertainty. These rules come from set theory — unions, intersections, and complements. Counting techniques from combinatorics (permutations and combinations) are essential for computing probabilities in finite sample spaces.
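As a quick illustration, the counting rule and the addition rule can be checked in a few lines of Python (a sketch using only the standard library):

```python
from math import comb

# Counting rule: P(A) = favorable outcomes / total outcomes.
# Probability of exactly 2 aces in a 5-card hand:
favorable = comb(4, 2) * comb(48, 3)   # pick 2 of 4 aces, 3 of 48 non-aces
total = comb(52, 5)                    # all possible 5-card hands
print(round(favorable / total, 4))     # ≈ 0.0399

# Addition rule on one die roll: A = even, B = greater than 3.
p_a, p_b, p_both = 3/6, 3/6, 2/6       # A = {2,4,6}, B = {4,5,6}, A∩B = {4,6}
p_union = p_a + p_b - p_both           # P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
print(round(p_union, 4))               # 0.6667, i.e. 4/6
```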
Conditional Probability & Bayes' Theorem
P(A|B) = P(A ∩ B)/P(B)
Independent events: P(A ∩ B) = P(A)·P(B)
Bayes: P(A|B) = P(B|A)·P(A)/P(B)
Bayes' Theorem is the backbone of Bayesian inference and machine learning: it lets us update beliefs with new evidence. The law of total probability connects it to a partition {Aᵢ} of the sample space: P(B) = Σ P(B|Aᵢ)·P(Aᵢ).
Example: Disease Testing
Disease prevalence: 1%. Test sensitivity: 99%. False positive rate: 5%.
P(Disease | Positive) = (0.99×0.01)/(0.99×0.01 + 0.05×0.99) ≈ 0.0099/0.0594 ≈ 16.7%
Even with a good test, a positive result is only 16.7% likely to be a true positive when the disease is rare!
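This arithmetic can be verified with a small Python function (a sketch; the function and parameter names are ours, not part of the lesson):

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes' theorem.

    The denominator is the law of total probability:
    P(+) = P(+|disease)·P(disease) + P(+|healthy)·P(healthy).
    """
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(round(p, 3))  # 0.167 — matches the 16.7% above
```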
Discrete Distributions
Bernoulli(p): single trial with success probability p. E[X] = p, Var(X) = p(1−p)
Binomial(n, p): P(X = k) = C(n,k)·pᵏ(1−p)ⁿ⁻ᵏ — the coefficients come from the binomial theorem
Poisson(λ): P(X = k) = e^(−λ)·λᵏ/k! — models rare events per interval
Geometric(p): P(X = k) = (1−p)ᵏ⁻¹·p — trials until first success
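These pmfs can be hand-rolled directly from the formulas (a minimal sketch; in practice a library such as scipy.stats provides tested versions):

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    # C(n,k) · p^k · (1−p)^(n−k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    # e^(−λ) · λ^k / k!
    return exp(-lam) * lam**k / factorial(k)

def geometric_pmf(k, p):
    # (1−p)^(k−1) · p : first success on trial k
    return (1 - p)**(k - 1) * p

print(round(binomial_pmf(3, 10, 0.5), 4))  # 0.1172 — 3 heads in 10 fair flips
print(round(poisson_pmf(2, 1.5), 4))       # ≈ 0.251
print(round(geometric_pmf(4, 0.25), 4))    # ≈ 0.1055
```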
The geometric pmf's exponential decay connects it to exponential functions.

Continuous Distributions
Normal: f(x) = (1/(σ√(2π)))·e^(−(x−μ)²/(2σ²))
Z-score: z = (x − μ)/σ
Standard normal: μ = 0, σ = 1
For continuous distributions, probabilities are areas under the curve — you need integration. The normal (Gaussian) distribution is the most important, governing everything from measurement error to stock prices.
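Because normal probabilities are areas under the curve, they are computed from the cumulative distribution function. A sketch using only the standard library (the standard normal CDF can be written in terms of the error function):

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X ≤ x) for X ~ N(mu, sigma²), via Φ(z) = (1 + erf(z/√2)) / 2."""
    z = (x - mu) / sigma  # z-score: standardize first
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# About 68% of a normal distribution lies within one σ of the mean.
p_within_1sigma = normal_cdf(1) - normal_cdf(-1)
print(round(p_within_1sigma, 4))  # ≈ 0.6827
```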
Uniform(a, b): f(x) = 1/(b−a) — constant density
Exponential(λ): f(x) = λe^(−λx) — time between events
t-distribution: used in hypothesis testing with small samples
Chi-squared: sum of squared standard normals — used in goodness-of-fit tests

Central Limit Theorem
Central Limit Theorem: Regardless of the population distribution, the sample mean X̄ approaches a normal distribution N(μ, σ²/n) as n → ∞. This is why the normal distribution is so important, and why statistical inference works. The convergence concept mirrors limits in calculus.
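The theorem can be seen in a quick simulation (a sketch; the sample size and trial count are arbitrary choices). Even though Exponential(1) is heavily skewed, the means of repeated samples cluster normally around μ = 1 with spread σ/√n:

```python
import random
import statistics

random.seed(0)
n, trials = 50, 2000

# Draw many samples of size n from a skewed population and record each mean.
means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

# Exponential(1) has μ = 1 and σ = 1, so the CLT predicts the sample
# means are roughly N(1, 1/50), i.e. standard deviation ≈ 0.141.
print(round(statistics.fmean(means), 2))
print(round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` would show the familiar bell shape despite the skewed population.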