Descriptive Statistics

Summarize, visualize, and understand data before diving into inference.

Measures of Center

Mean: x̄ = (Σxᵢ)/n
Median: middle value when sorted
Mode: most frequent value

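The mean uses every value in its calculation, so it is sensitive to outliers; the median depends only on the middle of the sorted data and is more robust. For a symmetric, unimodal distribution, the mean, median, and mode coincide (approximately, for sample data).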
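A quick way to see the difference in robustness is to compute all three measures with Python's standard `statistics` module, before and after appending one extreme value. The data values and the `center` helper below are made up for illustration.

```python
from statistics import mean, median, mode

def center(xs):
    """Return (mean, median, mode) of a list of numbers."""
    return mean(xs), median(xs), mode(xs)

# Made-up illustrative data, then the same data with one extreme value appended.
data = [2, 3, 3, 5, 7]
with_outlier = data + [100]

for label, xs in [("original", data), ("with outlier", with_outlier)]:
    m, med, mo = center(xs)
    print(f"{label:>12}: mean={m:.1f}  median={med}  mode={mo}")
```

Here the single outlier drags the mean from 4 to 20, while the median only moves from 3 to 4 and the mode stays at 3.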


Measures of Spread

Variance: σ² = Σ(xᵢ − μ)²/N for a population of N values with mean μ; for a sample, s² = Σ(xᵢ − x̄)²/(n − 1)
Standard deviation: σ = √(σ²)
Range: max − min
IQR: Q3 − Q1

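Variance measures the average squared deviation from the mean. Because it is in squared units, the standard deviation (its square root) is usually reported instead: it has the same units as the data and is the most widely used measure of spread. For approximately normal data, the 68-95-99.7 rule applies: about 68%, 95%, and 99.7% of values lie within 1, 2, and 3 standard deviations of the mean.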
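As a minimal sketch (assuming Python 3.8+ for `statistics.quantiles` and `statistics.NormalDist`), the four spread measures and the 68-95-99.7 coverage can be computed as follows. The `spread` helper name is hypothetical and the data are invented; note that quartile conventions differ between tools, so the IQR may not match other software exactly.

```python
from statistics import NormalDist, pstdev, pvariance, quantiles

def spread(xs):
    """Range, IQR, population variance and population SD of xs."""
    # Quartiles use statistics.quantiles' default 'exclusive' method; other tools
    # use other quartile conventions, so the IQR can differ slightly between them.
    q1, _, q3 = quantiles(xs, n=4)
    return {
        "range": max(xs) - min(xs),
        "iqr": q3 - q1,
        "variance": pvariance(xs),  # divides by n (population formula)
        "sd": pstdev(xs),           # square root of the population variance
    }

print(spread([2, 4, 4, 4, 5, 5, 7, 9]))  # made-up illustrative data

# 68-95-99.7 rule: P(|Z| < k) for a standard normal Z, k = 1, 2, 3.
z = NormalDist(mu=0, sigma=1)
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```

The coverage values come out to roughly 0.6827, 0.9545, and 0.9973, which is where the rounded 68%, 95%, and 99.7% come from.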

Example: Data: 4, 7, 8, 10, 11

Mean = 40/5 = 8

Deviations: −4, −1, 0, 2, 3. Squared: 16, 1, 0, 4, 9. Sum of squares = 30


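Variance = 30/5 = 6 (dividing by n, i.e. treating the five values as the whole population; the sample variance would be 30/4 = 7.5). SD = √6 ≈ 2.45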
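The worked example can be reproduced step by step in Python. This sketch mirrors the hand calculation and then cross-checks it against the standard library's `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n − 1).

```python
from math import sqrt
from statistics import pvariance, variance

data = [4, 7, 8, 10, 11]
n = len(data)

xbar = sum(data) / n                     # 40 / 5 = 8.0
deviations = [x - xbar for x in data]    # -4, -1, 0, 2, 3
squared = [d ** 2 for d in deviations]   # 16, 1, 0, 4, 9
ss = sum(squared)                        # sum of squared deviations = 30
pop_var = ss / n                         # divide by n, as in the worked example
sd = sqrt(pop_var)

print(f"mean={xbar}, variance={pop_var}, SD={sd:.2f}")
# Cross-check with the standard library; variance() divides by n - 1 instead.
print(f"pvariance={pvariance(data)}, sample variance={variance(data)}")
```

This should report a variance of 6 and an SD of about 2.45, matching the hand calculation, while `variance` gives the sample value 7.5.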

Data Visualization

  • Histograms: Show distribution shape and frequency (in a density-scaled histogram the bar areas sum to 1, just as the area under a probability density curve integrates to 1)
  • Box plots: Display median, quartiles, and outliers
  • Scatter plots: Show relationships between two variables → regression
  • Bar/pie charts: Compare categorical data
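Box plots and histograms are usually drawn with a plotting library, but the numbers behind them are simple to compute. The sketch below uses only the standard library; the helper names `box_plot_stats` and `text_histogram` are hypothetical, and it assumes Tukey's common 1.5 × IQR rule for flagging outliers (plotting tools may use other conventions).

```python
from statistics import quantiles

def box_plot_stats(xs):
    """Numbers a box plot draws, with outliers flagged by Tukey's 1.5 x IQR fences."""
    q1, med, q3 = quantiles(xs, n=4)          # 'exclusive' quartile method (the default)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "whisker_low": min(inside),           # whiskers reach the most extreme non-outliers
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": sorted(x for x in xs if x < low_fence or x > high_fence),
    }

def text_histogram(xs, bins, lo, hi):
    """Count values in equal-width bins over [lo, hi) and draw each count as a bar of '#'."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in xs:
        if lo <= x < hi:
            # Clamp guards against floating-point rounding pushing x into bin `bins`.
            counts[min(int((x - lo) // width), bins - 1)] += 1
    for i, c in enumerate(counts):
        print(f"[{lo + i * width:5.1f}, {lo + (i + 1) * width:5.1f})  {'#' * c}")
    return counts

data = [2, 4, 4, 4, 5, 5, 7, 9, 30]   # made-up data with one extreme value
print(box_plot_stats(data))
text_histogram(data, bins=4, lo=0, hi=40)
```

For this made-up data the box spans 4 to 8 with median 5, the upper whisker stops at 9, and 30 is flagged as an outlier; the text histogram shows the same right skew, with eight values in the first bin and one in the last.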

Distribution Shape

Skewness describes asymmetry: a right-skewed distribution has a long right tail and typically mean > median; a left-skewed distribution has a long left tail and typically mean < median. Kurtosis describes tail heaviness. The normal distribution has skewness 0 and kurtosis 3, so by convention its "excess kurtosis" (kurtosis − 3) is 0.
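One common way to compute these shape measures is from the central moments of the data. The sketch below assumes the simple population (moment) formulas; `shape_stats` is a hypothetical helper, and it requires data that are not all equal (otherwise the second moment is 0).

```python
def shape_stats(xs):
    """Moment-based (population) skewness and excess kurtosis of xs.

    skewness = m3 / m2**1.5, excess kurtosis = m4 / m2**2 - 3, where mk is the k-th
    central moment. Many packages apply small-sample bias corrections, so their
    numbers can differ somewhat from these.
    """
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # long right tail: mean 3 > median 1
for name, xs in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    skew, ex_kurt = shape_stats(xs)
    print(f"{name:>12}: skewness={skew:+.3f}  excess kurtosis={ex_kurt:+.3f}")
```

The symmetric data give skewness 0 (and a negative excess kurtosis, since five evenly spaced values have lighter tails than a normal), while the right-skewed data give positive skewness.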

Descriptive statistics is the first step of any data analysis. Before fitting regression models or running hypothesis tests, always visualize and summarize your data. As the saying goes: "Plot your data."