Probability & Distributions

Model uncertainty with probability and understand how random variables behave.


Probability Basics

P(A) = favorable outcomes / total outcomes
0 ≤ P(A) ≤ 1
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(A') = 1 − P(A)

Probability quantifies uncertainty. These rules follow from set theory — unions, intersections, and complements of events. Combinatorial counting techniques (permutations and combinations) are essential for computing probabilities in finite sample spaces, since both the favorable and total outcome counts are counting problems.
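As a sketch of the counting approach, here is the classic "exactly 2 aces in a 5-card hand" calculation (the hand/deck setup is an illustrative choice, not from the notes above):

```python
from math import comb

# Favorable outcomes: choose 2 of the 4 aces, then 3 of the 48 non-aces
favorable = comb(4, 2) * comb(48, 3)
# Total outcomes: all 5-card hands from a 52-card deck
total = comb(52, 5)

p = favorable / total  # P(exactly 2 aces) ≈ 0.0399
print(round(p, 4))
```

The same `favorable / total` pattern applies to any equally-likely finite sample space.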

Conditional Probability & Bayes' Theorem

P(A|B) = P(A ∩ B)/P(B)
Independent events: P(A ∩ B) = P(A)·P(B)
Bayes: P(A|B) = P(B|A)·P(A)/P(B)

Bayes' Theorem is the backbone of Bayesian inference and machine learning. It lets us update beliefs with new evidence. The law of total probability supplies the denominator when the sample space is partitioned into events Aᵢ: P(B) = Σᵢ P(B|Aᵢ)·P(Aᵢ).
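A minimal Bayesian update combining both formulas, using hypothetical spam-filter numbers (the probabilities below are made up for illustration):

```python
# Assumed priors and likelihoods (hypothetical values)
p_spam = 0.2              # P(spam)
p_word_given_spam = 0.6   # P("free" appears | spam)
p_word_given_ham = 0.05   # P("free" appears | not spam)

# Total probability: P(word) = P(word|spam)P(spam) + P(word|ham)P(ham)
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

# Bayes: P(spam|word) = P(word|spam)·P(spam) / P(word)
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)  # 0.75 — the prior 0.2 is updated by the evidence
```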


Example: Disease Testing


Disease prevalence: 1%. Test sensitivity: 99%. False positive rate: 5%.

P(Disease | Positive) = (0.99×0.01)/(0.99×0.01 + 0.05×0.99) ≈ 0.0099/0.0594 ≈ 16.7%

Even with a good test, a positive result is only 16.7% likely to be a true positive when the disease is rare!
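The calculation above, written out directly from the stated numbers:

```python
prevalence = 0.01       # P(Disease)
sensitivity = 0.99      # P(Positive | Disease)
false_positive = 0.05   # P(Positive | No Disease)

# Total probability: P(Positive)
p_positive = sensitivity * prevalence + false_positive * (1 - prevalence)

# Bayes: P(Disease | Positive)
p_disease = sensitivity * prevalence / p_positive
print(round(p_disease, 3))  # 0.167
```

Varying `prevalence` here makes the base-rate effect easy to see: at 10% prevalence the posterior jumps to about 69%.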

Discrete Distributions

  • Bernoulli(p): Single trial with success probability p. E[X] = p, Var(X) = p(1−p)
  • Binomial(n, p): P(X = k) = C(n,k)·pᵏ(1−p)ⁿ⁻ᵏ — the coefficients come from the binomial theorem
  • Poisson(λ): P(X = k) = e⁻λ·λᵏ/k! — models counts of rare events per interval. E[X] = Var(X) = λ
  • Geometric(p): P(X = k) = (1−p)ᵏ⁻¹·p — number of trials until the first success; the pmf decays geometrically in k
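The three pmfs above can be sketched with the standard library alone (the example arguments are arbitrary illustrations):

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

def geometric_pmf(k, p):
    """P(first success on trial k) for Geometric(p), k >= 1."""
    return (1 - p)**(k - 1) * p

print(binomial_pmf(3, 10, 0.5))   # about 0.117
print(poisson_pmf(2, 3.0))        # about 0.224
print(geometric_pmf(3, 0.5))      # 0.125
```

A quick sanity check on any pmf is that it sums to 1 over its support.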

Continuous Distributions

Normal: f(x) = (1/(σ√(2π)))·e^(−(x−μ)²/(2σ²))
Z-score: z = (x − μ)/σ
Standard normal: μ = 0, σ = 1

For continuous distributions, probabilities are areas under the density curve — you need integration. The normal (Gaussian) distribution is the most important, modeling everything from measurement error to sums and averages of many independent effects.

  • Uniform(a, b): f(x) = 1/(b−a) — constant density
  • Exponential(λ): f(x) = λe⁻λˣ for x ≥ 0 — time between events in a Poisson process
  • t-distribution: Used in hypothesis testing with small samples
  • Chi-squared: Sum of squared standard normals — used in goodness-of-fit tests
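A sketch of the normal density and z-score, with the cdf obtained from the error function (erf) so that areas under the curve can be computed without a numerical integrator:

```python
from math import sqrt, pi, exp, erf

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return (1 / (sigma * sqrt(2 * pi))) * exp(-((x - mu) ** 2) / (2 * sigma**2))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the curve from -inf to x, via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def z_score(x, mu, sigma):
    """Standardize x to the N(0, 1) scale."""
    return (x - mu) / sigma

# Probability of falling within one standard deviation of the mean
p_within_1sd = normal_cdf(1) - normal_cdf(-1)
print(round(p_within_1sd, 3))  # 0.683 — the first piece of the 68-95-99.7 rule
```

Standardizing with `z_score` first means a single N(0, 1) table (or `normal_cdf`) covers every normal distribution.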

Central Limit Theorem

Central Limit Theorem: Regardless of the population distribution, the sample mean X̄ approaches a normal distribution N(μ, σ²/n) as n → ∞. This is why the normal distribution is so important, and why statistical inference works. The convergence concept mirrors limits in calculus.
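The theorem is easy to check empirically: draw many sample means from a decidedly non-normal population and compare their mean and variance to the predicted μ and σ²/n. The population, sample size, and trial count below are arbitrary choices for the demonstration:

```python
import random
import statistics

random.seed(0)  # reproducible demonstration

n = 50        # sample size
trials = 2000 # number of sample means to draw

# Population: Uniform(0, 1), which is flat, not bell-shaped.
# Its mean is 0.5 and its variance is 1/12.
means = [statistics.fmean(random.random() for _ in range(n)) for _ in range(trials)]

mu, sigma2 = 0.5, 1 / 12
print(round(statistics.fmean(means), 3))     # close to mu = 0.5
print(round(statistics.variance(means), 5))  # close to sigma2 / n ≈ 0.00167
```

A histogram of `means` would already look convincingly bell-shaped at n = 50, even though each underlying draw is uniform.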