Descriptive Statistics

Summarize, visualize, and understand data before diving into inference.

Measures of Center

Mean: x̄ = (Σxᵢ)/n
Median: middle value when sorted
Mode: most frequent value

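The mean is easy to work with algebraically, but because it uses every value it is sensitive to outliers. The median depends only on the middle of the sorted data, so it is more robust. For symmetric, unimodal distributions, mean ≈ median ≈ mode.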
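To make the outlier sensitivity concrete, here is a minimal sketch using Python's standard statistics module. The two small data sets are illustrative assumptions, not measurements: the second replaces one value with an extreme one.

```python
import statistics

# Illustrative data (assumed for this sketch); the second list swaps 11 for 110.
data = [4, 7, 8, 10, 11]
with_outlier = [4, 7, 8, 10, 110]

mean_clean = statistics.mean(data)
median_clean = statistics.median(data)

mean_outlier = statistics.mean(with_outlier)      # pulled far toward the outlier
median_outlier = statistics.median(with_outlier)  # unchanged by the outlier

# multimode returns every most-frequent value, in first-seen order.
modes = statistics.multimode([2, 3, 3, 5, 5, 7])

print(mean_clean, median_clean)
print(mean_outlier, median_outlier)
print(modes)
```

A single extreme value moves the mean from 8 to 27.8 while the median stays at 8, which is the sense in which the median is the more robust measure.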

Measures of Spread

Variance: σ² = Σ(xᵢ − x̄)²/n (population form; the sample variance s² divides by n − 1 instead)
Standard deviation: σ = √(σ²)
Range: max − min
IQR: Q3 − Q1

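Variance measures the average squared deviation from the mean, so its units are the square of the data's units. Standard deviation has the same units as the data, which makes it the most widely used measure of spread. It connects to the normal distribution via the 68-95-99.7 rule: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean.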
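The 68-95-99.7 rule can be checked numerically. The sketch below assumes an exactly normal distribution and uses the identity P(|Z| < k) = erf(k/√2); the helper name normal_within is hypothetical, chosen for this example.

```python
import math


def normal_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {normal_within(k):.4f}")
```

The printed probabilities round to roughly 68%, 95%, and 99.7%. For data that is not close to normal, these percentages should not be expected to hold.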

Example: Data: 4, 7, 8, 10, 11

Mean = 40/5 = 8

Deviations: −4, −1, 0, 2, 3. Squared: 16, 1, 0, 4, 9 (sum = 30)

Variance = 30/5 = 6. SD = √6 ≈ 2.45
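The arithmetic above can be reproduced step by step. This is a small sketch using only the standard library; it also computes the sample variance (dividing by n − 1), which is an addition for comparison and not part of the worked example.

```python
import math
import statistics

data = [4, 7, 8, 10, 11]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]
squared = [d ** 2 for d in deviations]

# Population form: divide the sum of squares by n, as in the example.
pop_var = sum(squared) / len(data)
pop_sd = math.sqrt(pop_var)

# Sample form (for comparison only): divide by n - 1.
sample_var = statistics.variance(data)

data_range = max(data) - min(data)

print(mean, deviations, squared)
print(pop_var, round(pop_sd, 2), sample_var, data_range)
```

The population results match statistics.pvariance and statistics.pstdev, so either route can be used to check hand calculations.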

Data Visualization

  • Histograms: Show distribution shape and frequency (the total area of the bars plays the role of the area under a density curve, which connects to integration)
  • Box plots: Display median, quartiles, and outliers
  • Scatter plots: Show relationships between two variables → regression
  • Bar/pie charts: Compare categorical data
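The numbers a box plot displays can be computed without any plotting library. The sketch below makes two assumptions that are not stated in the text: quartiles use the "inclusive" method of statistics.quantiles (other quartile conventions give slightly different values), and outliers are flagged with the common 1.5 × IQR fence rule.

```python
import statistics

data = [4, 7, 8, 10, 11]

# Quartile conventions differ between tools; "inclusive" is an assumption here.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the quartiles count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

five_number = (min(data), q1, q2, q3, max(data))

print(five_number, iqr, outliers)
```

For this data the five-number summary is min 4, Q1 7, median 8, Q3 10, max 11, the IQR is 3, and no point falls outside the fences.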

Distribution Shape

Skewness describes asymmetry: right-skewed (long right tail, typically mean > median) and left-skewed (long left tail, typically mean < median). Kurtosis describes tail heaviness. The normal distribution has skewness 0 and kurtosis 3; "excess kurtosis" subtracts 3, so it is 0 for the normal distribution.
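Skewness and kurtosis can be sketched from central moments. The version below assumes the population (moment-based) definitions, skewness = m₃/m₂^1.5 and kurtosis = m₄/m₂², where mₖ is the k-th central moment; statistical packages often apply small-sample corrections, so their output may differ slightly. The function names and both data sets are illustrative.

```python
import statistics


def central_moment(values, k):
    """k-th central moment, dividing by n (population form)."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** k for x in values) / len(values)


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2


def excess_kurtosis(values):
    # Subtract 3 so that a normal distribution scores 0.
    return kurtosis(values) - 3


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 20]  # one large value creates a long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), statistics.fmean(right_skewed), statistics.median(right_skewed))
```

The symmetric list has skewness 0, while the right-skewed list has positive skewness and a mean above its median, matching the description above.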

Descriptive statistics is the first step of any data analysis. Before fitting regression models or running hypothesis tests, always visualize and summarize your data. As the saying goes: "Plot your data."