Descriptive Statistics

Summarize, visualize, and understand data before diving into inference.

Measures of Center

Mean: x̄ = (Σxᵢ)/n
Median: middle value when sorted
Mode: most frequent value

The mean uses every value in the data, so it is sensitive to outliers. The median depends only on the middle of the sorted data, so it is more robust. For symmetric, single-peaked distributions, mean ≈ median ≈ mode.
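
To make the outlier sensitivity concrete, here is a minimal sketch using Python's standard statistics module. The data values and the added outlier of 100 are assumptions chosen for illustration; they are not from a real dataset.

```python
from statistics import mean, median, mode

# Illustrative values (assumed); the second list adds one extreme value.
data = [4, 7, 8, 10, 11]
with_outlier = data + [100]

mean_before, median_before = mean(data), median(data)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean shifts a lot, the median only slightly.
print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)

# The mode is the most frequent value.
most_common = mode([2, 3, 3, 5])
print("mode:  ", most_common)
```

A single extreme value moves the mean from 8 to about 23.3, while the median only moves from 8 to 9.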

Measures of Spread

Variance: σ² = Σ(xᵢ − x̄)²/n (population form; the sample variance s² divides by n − 1 instead)
Standard deviation: σ = √(σ²)
Range: max − min
IQR: Q3 − Q1

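Variance measures the average squared deviation from the mean. Standard deviation has the same units as the data, which makes it the most widely used measure of spread. Both connect to the normal distribution via the 68-95-99.7 rule: about 68%, 95%, and 99.7% of normally distributed values fall within 1, 2, and 3 standard deviations of the mean.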
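
The 68-95-99.7 rule can be checked numerically. The sketch below uses statistics.NormalDist from the Python standard library; the helper name share_within is an assumption introduced for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Probability that a normal value lies within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The printed shares are approximately 0.6827, 0.9545, and 0.9973, which is where the rounded figures in the rule come from. The rule holds for normal distributions; other shapes can differ substantially.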

Example: Data: 4, 7, 8, 10, 11

Mean = 40/5 = 8

Deviations: −4, −1, 0, 2, 3. Squared: 16, 1, 0, 4, 9

Variance = 30/5 = 6. SD = √6 ≈ 2.45
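
The worked example can be reproduced step by step in Python. This is a sketch using only the standard library; note that statistics.quantiles uses one of several quartile conventions (its default "exclusive" method), so the IQR it gives may differ slightly from other textbooks or tools.

```python
from math import sqrt
from statistics import pvariance, quantiles, variance

data = [4, 7, 8, 10, 11]

# Step by step, following the formulas above (dividing by n).
n = len(data)
x_bar = sum(data) / n
deviations = [x - x_bar for x in data]
squared = [d ** 2 for d in deviations]
pop_var = sum(squared) / n
pop_sd = sqrt(pop_var)
print(x_bar, deviations, squared, pop_var, round(pop_sd, 2))

# Library cross-checks: pvariance divides by n, variance by n - 1.
lib_pop_var = pvariance(data)
sample_var = variance(data)

# Range and IQR for the same data.
data_range = max(data) - min(data)
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1
print(lib_pop_var, sample_var, data_range, iqr)
```

The sample variance for this data is 30/4 = 7.5, larger than the population value of 6, because dividing by n − 1 corrects for estimating the mean from the same data.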

Data Visualization

  • Histograms: Show distribution shape and frequency (the area of the bars connects to integration)
  • Box plots: Display median, quartiles, and outliers
  • Scatter plots: Show relationships between two variables → regression
  • Bar/pie charts: Compare categorical data
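
The numbers behind a box plot and a histogram can be computed without a plotting library. The sketch below is one possible approach under stated assumptions: the sample values are hypothetical, the helper names are invented for this example, and outliers are flagged with the common 1.5 × IQR convention, which the text above does not specify.

```python
from collections import Counter
from statistics import quantiles

values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 15]  # hypothetical sample

def five_number_summary(xs):
    """Min, Q1, median, Q3, max: the values a box plot draws."""
    xs = sorted(xs)
    q1, med, q3 = quantiles(xs, n=4)
    return {"min": xs[0], "q1": q1, "median": med, "q3": q3, "max": xs[-1]}

def flag_outliers(xs):
    """Points beyond 1.5 * IQR from the quartiles (assumed convention)."""
    s = five_number_summary(xs)
    iqr = s["q3"] - s["q1"]
    low, high = s["q1"] - 1.5 * iqr, s["q3"] + 1.5 * iqr
    return [x for x in xs if x < low or x > high]

def text_histogram(xs, bin_width):
    """One text row per bin of integer data; '#' marks one observation."""
    counts = Counter((x // bin_width) * bin_width for x in xs)
    rows = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start}-{start + bin_width - 1}"
        rows.append(f"{label:>6} {'#' * counts[start]}")
    return rows

summary = five_number_summary(values)
outliers = flag_outliers(values)
histogram = text_histogram(values, bin_width=4)
print(summary)
print(outliers)
print("\n".join(histogram))
```

For real work a plotting library is the usual choice; this version only shows what the plots summarize.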

Distribution Shape

Skewness describes asymmetry: right-skewed (typically mean > median, long right tail), left-skewed (typically mean < median, long left tail). Kurtosis describes tail heaviness. The normal distribution has skewness 0 and kurtosis 3; by convention, "excess kurtosis" is kurtosis − 3, so it equals 0 for the normal distribution.
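
Skewness and kurtosis can be computed from central moments. The sketch below assumes the simple population (moment-based) definitions, skewness = m₃ / m₂^1.5 and kurtosis = m₄ / m₂²; statistical packages often apply small-sample corrections, so their results may differ. The two datasets are hypothetical.

```python
from statistics import mean, median

def central_moment(xs, k):
    """Average of (x - mean)^k over the data."""
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)

def skewness(xs):
    """Moment-based skewness: third moment scaled by SD cubed."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    """Moment-based kurtosis: fourth moment scaled by variance squared."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2

def excess_kurtosis(xs):
    """Kurtosis relative to the normal distribution's value of 3."""
    return kurtosis(xs) - 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data
right_skewed = [1, 2, 2, 3, 10]    # hypothetical data with a long right tail

print(skewness(symmetric), skewness(right_skewed))
print(mean(right_skewed), median(right_skewed))
print(excess_kurtosis(symmetric))
```

The symmetric data has skewness 0, while the right-skewed data has positive skewness and a mean (3.6) above its median (2).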

Descriptive statistics is the first step of any data analysis. Before fitting regression models or running hypothesis tests, always visualize and summarize your data. As the saying goes: "Plot your data."