Summarize, visualize, and understand data before diving into inference.
Measures of Center
Mean: x̄ = (Σxᵢ)/n
Median: middle value when sorted
Mode: most frequent value
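The mean uses the magnitude of every value, so a single extreme value can pull it far from the bulk of the data; it is sensitive to outliers. The median depends only on the order of the values and is therefore more robust. For symmetric, unimodal distributions, mean ≈ median ≈ mode.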
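To make the outlier sensitivity concrete, here is a minimal sketch using Python's standard `statistics` module. The data values and the helper name `center` are made up for illustration. Appending one extreme value moves the mean a long way but barely moves the median.

```python
from statistics import mean, median, multimode

def center(values):
    """Return (mean, median, list of modes) for a list of numbers."""
    return mean(values), median(values), multimode(values)

data = [2, 3, 3, 4, 5, 5, 5, 6]   # hypothetical sample
with_outlier = data + [100]       # same sample plus one extreme value

print(center(data))          # (4.125, 4.5, [5])
print(center(with_outlier))  # mean jumps to about 14.78; median only moves to 5
```

`multimode` returns every most-frequent value, which handles ties more honestly than picking a single mode.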
Measures of Spread
Variance: σ² = Σ(xᵢ − μ)²/N for a population of N values; for a sample, s² = Σ(xᵢ − x̄)²/(n − 1)
Standard deviation: σ = √(σ²) (for a sample, s = √(s²))
Range: max − min
IQR: Q3 − Q1
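The formulas above can be computed directly. The sketch below assumes the input is a plain list of numbers. It shows the population divisor (n) and the sample divisor (n − 1) side by side. Quartiles use the standard library's default ("exclusive") method. Other tools use different quartile conventions, so their IQR may differ slightly. The function name `spread` is hypothetical.

```python
import math
import statistics

def spread(values):
    """Return common spread measures for a list of numbers."""
    xs = sorted(values)
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
    pop_var = ss / n          # treat the data as the whole population
    samp_var = ss / (n - 1)   # treat the data as a sample (Bessel's correction)
    # Quartiles via statistics.quantiles' default ("exclusive") method;
    # other quartile conventions can give a slightly different IQR.
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return {
        "pop_var": pop_var,
        "pop_sd": math.sqrt(pop_var),
        "samp_var": samp_var,
        "samp_sd": math.sqrt(samp_var),
        "range": xs[-1] - xs[0],
        "iqr": q3 - q1,
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, population SD 2
print(spread(data))
```

In practice, `statistics.pvariance`/`statistics.pstdev` (population) and `statistics.variance`/`statistics.stdev` (sample) compute the same quantities.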
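Variance measures the average squared deviation from the mean, so its units are the square of the data's units. Standard deviation, its square root, has the same units as the data and is the most widely used measure of spread. For a normal distribution, the 68-95-99.7 rule says that about 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean.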
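The 68-95-99.7 percentages can be checked from the normal CDF. This sketch uses `statistics.NormalDist` from the standard library. The helper `share_within` is a hypothetical name. The result does not depend on μ or σ, because "within k standard deviations" is measured on the standardized scale.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal value lies within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```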
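Visualization
Histograms: Show distribution shape and frequency (in a density-scaled histogram the bar areas sum to 1, mirroring how the area under a probability density curve, its integral, equals 1)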
Box plots: Display median, quartiles, and outliers
Scatter plots: Show relationships between two variables (the starting point for regression)
Bar/pie charts: Compare categorical data
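The numbers behind a box plot and a density histogram can be computed without a plotting library. This sketch makes some assumptions. It flags outliers with Tukey's 1.5 × IQR fences, a common convention, although plotting tools vary. It uses equal-width histogram bins. It requires at least two distinct values. The names `box_plot_stats` and `density_histogram` are hypothetical.

```python
import statistics

def box_plot_stats(values, k=1.5):
    """Quartiles, whisker ends, and outliers for a box plot.

    Assumes Tukey's convention: points more than k * IQR beyond the
    quartiles are flagged as outliers (k = 1.5 is common, not universal).
    """
    xs = sorted(values)
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": statistics.median(xs),
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

def density_histogram(values, bins):
    """Equal-width histogram scaled so the bar areas sum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("need at least two distinct values")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        i = min(int((x - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    heights = [c / (len(values) * width) for c in counts]  # density = count / (n * width)
    return edges, heights

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]  # hypothetical data with one extreme value
print(box_plot_stats(data))
edges, heights = density_histogram(data, bins=4)
print(sum(h * (edges[1] - edges[0]) for h in heights))  # total bar area, approximately 1
```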
Distribution Shape
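Skewness describes asymmetry: a right-skewed distribution has a long right tail and typically mean > median; a left-skewed distribution has a long left tail and typically mean < median. Kurtosis describes tail heaviness. The normal distribution has skewness 0 and kurtosis 3. By convention, "excess kurtosis" is defined as kurtosis − 3, so the normal distribution's excess kurtosis is 0.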
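Both shape measures come from standardized moments. The sketch below uses the simple population (moment-based) formulas. These are skewness = m₃ / m₂^(3/2) and kurtosis = m₄ / m₂², where mₖ is the k-th central moment. Libraries often also offer bias-corrected versions. For example, SciPy's `scipy.stats.kurtosis` reports excess kurtosis by default. The function name `shape_stats` and the sample data are hypothetical.

```python
def shape_stats(values):
    """Moment-based skewness, kurtosis, and excess kurtosis (population formulas).

    Assumes the values are not all identical (otherwise the variance is 0).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3  # last value is excess kurtosis

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 10]
print(shape_stats(symmetric))     # skewness 0, kurtosis 1.7, excess about -1.3
print(shape_stats(right_skewed))  # positive skewness; here the mean (about 2.86) exceeds the median (2)
```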
Descriptive statistics is the first step of any data analysis. Before fitting regression models or running hypothesis tests, always visualize and summarize your data. As the saying goes: "Plot your data."