Probability & Distributions


Model uncertainty with probability and understand how random variables behave.


Probability Basics

P(A) = favorable outcomes / total outcomes
0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A') = 1 − P(A)

Probability quantifies uncertainty. These rules come from set theory — unions and intersections. Counting techniques like combinatorics (permutations and combinations) are essential for computing probabilities in finite sample spaces.
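The counting definition and the union rule can be checked directly by enumerating a finite sample space. A minimal sketch, using two fair dice as an assumed example (exact fractions avoid floating-point noise):

```python
from itertools import product
from fractions import Fraction

# Sample space: all ordered rolls of two fair six-sided dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(A) = favorable outcomes / total outcomes, as an exact fraction."""
    favorable = sum(1 for outcome in space if event(outcome))
    return Fraction(favorable, len(space))

p_sum_7 = prob(lambda r: sum(r) == 7)                   # A: sum equals 7
p_doubles = prob(lambda r: r[0] == r[1])                # B: doubles
p_both = prob(lambda r: sum(r) == 7 and r[0] == r[1])   # A ∩ B (impossible here)

# Union rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
p_union = p_sum_7 + p_doubles - p_both
print(p_sum_7)   # 1/6
print(p_union)   # 1/3
```

Because A and B are disjoint here (a sum of 7 can never be doubles), the union rule reduces to simple addition.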

Conditional Probability & Bayes' Theorem

P(A|B) = P(A ∩ B)/P(B)
Independent events: P(A ∩ B) = P(A)·P(B)
Bayes: P(A|B) = P(B|A)·P(A)/P(B)

Bayes' Theorem is the backbone of Bayesian inference and machine learning. It lets us update beliefs with new evidence. The law of total probability ties this to a partition of the sample space: P(B) = Σ P(B|Aᵢ)·P(Aᵢ), which is how the denominator P(B) is usually computed.

Example: Disease Testing

Disease prevalence: 1%. Test sensitivity: 99%. False positive rate: 5%.

P(Disease | Positive) = (0.99×0.01)/(0.99×0.01 + 0.05×0.99) ≈ 0.0099/0.0594 ≈ 16.7%

Even with a good test, a positive result is only 16.7% likely to be a true positive when the disease is rare!
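The disease-testing numbers above translate directly into a few lines of arithmetic, with P(Positive) expanded via the law of total probability:

```python
# Bayes' theorem applied to the disease-testing numbers above.
prevalence = 0.01        # P(Disease)
sensitivity = 0.99       # P(Positive | Disease)
false_positive = 0.05    # P(Positive | No Disease)

# Law of total probability: P(Positive)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes: P(Disease | Positive)
posterior = sensitivity * prevalence / p_positive
print(round(posterior, 4))  # 0.1667
```

Swapping in a higher prevalence (say 0.10) shows how quickly the posterior climbs, which is why base rates matter so much.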

Discrete Distributions

  • Bernoulli: Single trial, p = success. E[X] = p, Var = p(1−p)
  • Binomial(n, p): P(X = k) = C(n,k)·pᵏ(1−p)ⁿ⁻ᵏ — the coefficients C(n,k) come from the binomial expansion
  • Poisson(λ): P(X = k) = e⁻λ·λᵏ/k! — models rare events per interval
  • Geometric(p): P(X = k) = (1−p)ᵏ⁻¹·p — trials until first success; the probability decays exponentially in k
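The discrete pmfs above need nothing beyond the standard library. A sketch, with a sanity check that each pmf sums to 1 over its support:

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) = C(n,k) * p^k * (1-p)^(n-k)"""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam^k / k!"""
    return exp(-lam) * lam**k / factorial(k)

def geometric_pmf(k, p):
    """P(X = k) = (1-p)^(k-1) * p, for k = 1, 2, ..."""
    return (1 - p)**(k - 1) * p

# Sanity checks: probabilities over the support sum (approximately) to 1.
assert abs(sum(binomial_pmf(k, 10, 0.3) for k in range(11)) - 1) < 1e-9
assert abs(sum(poisson_pmf(k, 4.0) for k in range(100)) - 1) < 1e-9
assert abs(sum(geometric_pmf(k, 0.2) for k in range(1, 500)) - 1) < 1e-9
```

The Poisson and geometric sums are truncated (their supports are infinite), but the tails are negligible at these cutoffs.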

Continuous Distributions

Normal: f(x) = (1/(σ√(2π)))·e^(−(x−μ)²/(2σ²))
Z-score: z = (x − μ)/σ
Standard normal: μ = 0, σ = 1

For continuous distributions, probabilities are areas under the curve — you need integration. The normal (Gaussian) distribution is the most important, governing everything from measurement error to stock prices.

  • Uniform(a, b): f(x) = 1/(b−a) — constant density
  • Exponential(λ): f(x) = λe⁻λˣ — time between events
  • t-distribution: Used in hypothesis testing with small samples
  • Chi-squared: Sum of squared standard normals — used in goodness-of-fit tests
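The normal density, z-scores, and "area under the curve" can all be sketched with the standard library; the CDF below uses the error function rather than numerical integration:

```python
from math import sqrt, pi, exp, erf

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = 1/(sigma*sqrt(2*pi)) * e^(-(x-mu)^2 / (2*sigma^2))"""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def z_score(x, mu, sigma):
    """Standardize: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the curve up to x, expressed via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# About 68% of the mass lies within one standard deviation of the mean.
print(round(normal_cdf(1) - normal_cdf(-1), 3))  # 0.683
```

The same `normal_cdf` with nonzero μ and σ handles any normal distribution, since standardizing via the z-score reduces everything to the standard normal.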

Central Limit Theorem

Central Limit Theorem: For any population with finite mean μ and variance σ², regardless of its distribution's shape, the sample mean X̄ approaches a normal distribution N(μ, σ²/n) as n → ∞. This is why the normal distribution is so important, and why statistical inference works. The convergence concept mirrors limits in calculus.
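The CLT is easy to see by simulation: draw many samples from a decidedly non-normal population and watch the sample means cluster normally. A sketch using Uniform(0, 1), which has μ = 0.5 and σ² = 1/12:

```python
import random
from statistics import mean, stdev

random.seed(0)  # reproducible demo

n = 30          # sample size
trials = 10_000  # number of sample means to collect

# Each entry is the mean of n draws from Uniform(0, 1).
sample_means = [mean(random.random() for _ in range(n)) for _ in range(trials)]

# CLT prediction: X̄ ≈ N(0.5, (1/12)/30), so std ≈ 0.0527.
print(round(mean(sample_means), 3))
print(round(stdev(sample_means), 3))
```

The observed mean and standard deviation of the sample means land close to the CLT's predicted 0.5 and 0.0527; a histogram of `sample_means` would show the familiar bell shape even though the underlying uniform density is flat.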