Descriptive Statistics

Summarize, visualize, and understand data before diving into inference.


Measures of Center

Mean: x̄ = (Σxᵢ)/n
Median: middle value when sorted (the average of the two middle values when n is even)
Mode: most frequent value

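The mean uses every value in an arithmetic calculation, so a single outlier can shift it substantially. The median depends only on the middle of the sorted data and is more robust. For symmetric, unimodal distributions, mean ≈ median ≈ mode.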
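To illustrate the outlier sensitivity described above, here is a minimal sketch using Python's standard `statistics` module. The sample values, the added outlier of 100, and the variable names are illustrative assumptions, not data from a real study.

```python
import statistics

data = [4, 7, 8, 10, 11]        # small illustrative sample (assumed values)
with_outlier = data + [100]     # same sample plus one extreme value

mean_before = statistics.mean(data)
median_before = statistics.median(data)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# multimode returns every most-frequent value, so ties stay visible
modes = statistics.multimode([2, 3, 3, 5, 7, 7])

print("mean:", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
print("modes:", modes)
```

In this sketch the outlier moves the mean from 8 to about 23.3, while the median only moves from 8 to 9.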

Measures of Spread

Variance: σ² = Σ(xᵢ − x̄)²/n (population form; the sample variance s² divides by n − 1 instead)
Standard deviation: σ = √(σ²)
Range: max − min
IQR: Q3 − Q1

Variance measures the average squared deviation from the mean. Standard deviation has the same units as the data, which makes it the most widely used measure of spread. For approximately normal data, the 68-95-99.7 rule applies: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean.
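The 68-95-99.7 figures can be checked numerically. The sketch below assumes a standard normal distribution and uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Probability that a normal value lies within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

within = {k: coverage(k) for k in (1, 2, 3)}
for k, p in within.items():
    print(f"within {k} SD: {p:.4f}")
```

The rule holds only to the extent that the data are close to normal; for skewed or heavy-tailed data these percentages can be noticeably different.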

Example: Data: 4, 7, 8, 10, 11

Mean = 40/5 = 8

Deviations: −4, −1, 0, 2, 3. Squared: 16, 1, 0, 4, 9

Variance = 30/5 = 6 (dividing by n = 5). SD = √6 ≈ 2.45
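The worked example can be reproduced step by step in Python. This is a sketch using only the standard library; it also computes the n − 1 (sample) variance, the range, and the IQR for comparison. Quartile conventions differ between textbooks and libraries, so the IQR from `statistics.quantiles` (default method) may not match a hand calculation.

```python
import math
import statistics

data = [4, 7, 8, 10, 11]

mean = statistics.fmean(data)                   # 40 / 5
squared_devs = [(x - mean) ** 2 for x in data]  # 16, 1, 0, 4, 9
pop_var = sum(squared_devs) / len(data)         # divide by n
pop_sd = math.sqrt(pop_var)

sample_var = statistics.variance(data)          # divides by n - 1
data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)     # quartile cut points
iqr = q3 - q1

print("population variance:", pop_var)
print("population SD:", round(pop_sd, 2))
print("sample variance:", sample_var)
print("range:", data_range, "IQR:", iqr)
```

Dividing by n − 1 gives 30/4 = 7.5 rather than 6; the gap between the two forms shrinks as n grows.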

Data Visualization

  • Histograms: Show distribution shape and frequency (bar area represents frequency, which connects to integrating a density curve)
  • Box plots: Display median, quartiles, and outliers
  • Scatter plots: Show relationships between two variables → regression
  • Bar/pie charts: Compare categorical data
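The numbers behind a box plot can be computed without a plotting library. The sketch below assumes the common 1.5 × IQR convention for flagging outliers, which is not stated in the text above; the function name `box_plot_summary`, its `whisker` parameter, and the sample data are hypothetical.

```python
import statistics

def box_plot_summary(values, whisker=1.5):
    """Return the quantities a box plot draws, plus flagged outliers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond whisker * IQR from the box are outliers
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
        "outliers": outliers,
    }

summary = box_plot_summary([4, 7, 8, 10, 11, 40])
print(summary)
```

Plotting libraries may use a different quartile method, so their boxes can differ slightly from these values on small samples.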

Distribution Shape

Skewness describes asymmetry: right-skewed (long right tail, typically mean > median) and left-skewed (long left tail, typically mean < median). Kurtosis describes tail heaviness. The normal distribution has skewness 0 and kurtosis 3; by convention, "excess kurtosis" is kurtosis − 3, so it is 0 for the normal distribution.
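Skewness and kurtosis can be computed from central moments. The sketch below uses the simple moment-based (population) definitions, skewness = m₃/m₂^1.5 and kurtosis = m₄/m₂²; statistical packages often apply small-sample corrections, so their output may differ. The function name `moment_shape` and both data sets are illustrative assumptions.

```python
import statistics

def moment_shape(values):
    """Return (skewness, kurtosis, excess kurtosis) from central moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3.0

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]   # one large value stretches the right tail

sym_skew, sym_kurt, sym_excess = moment_shape(symmetric)
right_skew, right_kurt, right_excess = moment_shape(right_skewed)

print("symmetric skewness:", sym_skew)
print("right-skewed skewness:", round(right_skew, 3))
print("mean vs median:", statistics.fmean(right_skewed),
      statistics.median(right_skewed))
```

For the right-skewed sample the skewness is positive and the mean exceeds the median, matching the description above.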

Descriptive statistics is the first step of any data analysis. Before fitting regression models or running hypothesis tests, always visualize and summarize your data. As the saying goes: "Plot your data."