Statistical Inference

Make evidence-based conclusions about populations using sample data.

Point & Interval Estimation

Confidence interval for the mean:
x̄ ± z*(σ/√n)  (known σ)
x̄ ± t*(s/√n)  (unknown σ, use t-distribution)

A 95% CI means: if we repeated the sampling many times, about 95% of the intervals would contain the true parameter. This frequentist interpretation connects directly to probability theory. Because the standard error is σ/√n, the margin of error shrinks at a rate of 1/√n as n grows.
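The known-σ interval above can be sketched with only the Python standard library (statistics.NormalDist supplies the z* quantile); the function name is illustrative:

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """CI for the mean with known sigma: xbar ± z* · sigma/√n."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # ≈ 1.96 for 95%
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Using the numbers from the z-test example below: n=36, x̄=515, σ=60
lo, hi = z_confidence_interval(xbar=515, sigma=60, n=36)
# 95% CI ≈ (495.4, 534.6)
```

For unknown σ, substitute s and a t* critical value (which the standard library does not provide; a library such as SciPy's `t.ppf` is the usual route).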

Hypothesis Testing

The framework:

  1. State hypotheses: H₀ (null) vs. Hₐ (alternative)
  2. Choose α: Significance level (usually 0.05)
  3. Compute test statistic: z = (x̄ − μ₀)/(σ/√n)
  4. Find p-value: P(observing data this extreme | H₀ is true)
  5. Decision: If p-value < α, reject H₀

Example: One-sample z-test

Claim: μ = 500. Sample: n = 36, x̄ = 515, σ = 60

z = (515 − 500)/(60/√36) = 15/10 = 1.5

Two-sided p-value ≈ 0.134 > 0.05 → fail to reject H₀
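The five-step framework and the worked example can be reproduced in a few lines of Python; the function name is illustrative, and the p-value is two-sided:

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n):
    """Two-sided one-sample z-test; returns (z statistic, p-value)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p

z, p = one_sample_z_test(xbar=515, mu0=500, sigma=60, n=36)
# z = 1.5, p ≈ 0.134 → fail to reject H₀ at α = 0.05
```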

Type I & II Errors

  • Type I (α): Rejecting H₀ when it's true (false positive)
  • Type II (β): Failing to reject H₀ when it's false (false negative)
  • Power = 1 − β: Probability of correctly rejecting a false H₀

Increasing sample size increases power without inflating α. These trade-offs are fundamental to experimental design.
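The claim that larger n raises power at fixed α can be checked numerically. A minimal sketch for the two-sided z-test, using the standard approximation that ignores the far tail (function name illustrative):

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Approximate power of a two-sided z-test against a specific
    true mean, neglecting the negligible far-tail rejection region."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)          # rejection threshold
    effect = abs(mu_true - mu0) * sqrt(n) / sigma  # standardized shift
    return 1 - nd.cdf(z_crit - effect)

# Same setup as the example: detect μ = 515 when H₀ says μ = 500
p36 = z_test_power(500, 515, 60, n=36)   # ≈ 0.32
p100 = z_test_power(500, 515, 60, n=100)  # larger n → more power
```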

Regression Analysis

Simple linear regression: ŷ = b₀ + b₁x
b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ)/Σ(xᵢ − x̄)²
b₀ = ȳ − b₁x̄
R² = 1 − SS_res/SS_tot

Regression finds the line of best fit by minimizing the sum of squared residuals (ordinary least squares); setting the partial derivatives of SS_res to zero yields the formulas above. For multiple predictors, matrix algebra gives the solution: b = (XᵀX)⁻¹Xᵀy.
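The slope, intercept, and R² formulas translate directly into code; a stdlib-only sketch with illustrative names and made-up sample data:

```python
def simple_linear_regression(x, y):
    """Least-squares fit ŷ = b0 + b1·x; returns (b0, b1, R²)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # b1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
    b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b0 = ybar - b1 * xbar
    # R² = 1 − SS_res/SS_tot
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot

b0, b1, r2 = simple_linear_regression([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
# b1 ≈ 1.94, b0 ≈ 0.15, R² > 0.99 for this nearly linear data
```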

ANOVA & Chi-Square Tests

ANOVA (Analysis of Variance) tests whether means differ across groups; it generalizes the two-sample t-test to more than two groups. The F-statistic = MS_between / MS_within. Chi-Square tests independence in contingency tables and goodness-of-fit for categorical data.
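The one-way F-statistic can be computed from its definition without any external library; a minimal sketch (function name illustrative, toy data):

```python
def one_way_anova_f(*groups):
    """F = MS_between / MS_within for k independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares, df = k − 1
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares, df = n_total − k
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

f_stat = one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5])
# F = 3.0 for these three equally spaced toy groups
```

The p-value then comes from the F-distribution with (k − 1, n − k) degrees of freedom, which the standard library does not provide (SciPy's `f_oneway` handles both steps in practice).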

Modern statistics increasingly uses computational methods: bootstrapping, permutation tests, and Bayesian approaches. These still rely on the probability and descriptive foundations covered earlier, but add computational power to handle complex real-world data.
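Bootstrapping is the most accessible of these computational methods: resample the data with replacement many times and read the CI off the percentiles of the resampled statistics. A stdlib-only sketch with an illustrative function name:

```python
import random

def bootstrap_ci(data, stat=lambda s: sum(s) / len(s),
                 n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for any statistic (default: the mean)."""
    rng = random.Random(seed)  # seeded for reproducibility
    reps = sorted(stat(rng.choices(data, k=len(data)))
                  for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return reps[lo_idx], reps[hi_idx]

# Toy data: the integers 1..100, true mean 50.5
lo, hi = bootstrap_ci(list(range(1, 101)))
```

Unlike the z- and t-intervals earlier, this makes no normality assumption about the population, only that the sample is representative.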