March 16, 2025

What is Descriptive Statistics?

Descriptive statistics summarize and organize characteristics of a data set.

Two main types:

  • Measures of central tendency (mean, median, mode)

  • Measures of dispersion (range, variance, standard deviation)

Benefits:

  • Simplifies large amounts of data

  • Identifies patterns and trends

  • Foundation for inferential statistics

Measures of Central Tendency

Three main measures:

  • Mean: 76.08 (arithmetic average)
  • Median: 75.23 (middle value when data is ordered)
  • Mode: 74 (most frequent value)

Each measure provides different insights about the data’s central location.
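
As a minimal sketch, the three measures can be computed in R as follows. The `scores` vector is hypothetical and is not the dataset behind the values quoted above. Base R has no built-in function for the statistical mode, so `stat_mode` is an illustrative helper, not a library function.

```r
# Hypothetical scores (assumption: not the data used for the values above)
scores <- c(62, 70, 74, 74, 75, 78, 81, 90)

# Illustrative helper: returns the most frequent value.
# Ties resolve to whichever tied value appears first in the input.
stat_mode <- function(v) {
  uniq <- unique(v)
  uniq[which.max(tabulate(match(v, uniq)))]
}

mean_score   <- mean(scores)      # arithmetic average: 75.5
median_score <- median(scores)    # middle of the ordered data: 74.5
mode_score   <- stat_mode(scores) # most frequent value: 74
```

Here the mean sits above the median because the single high score of 90 pulls the average upward, while the median and mode are unaffected by it.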

Measures of Dispersion

Dispersion tells us how spread out the data is:

  • Range: 52.64 (difference between max and min)
  • Variance: 132.59 (average squared deviation from mean)
  • Standard Deviation: 11.51 (square root of variance)

Higher values indicate greater data spread.
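
The same measures are available as base R functions. The sketch below again uses a hypothetical `scores` vector, not the data behind the values above. Note that `var()` and `sd()` use the sample (\(n-1\)) denominator.

```r
# Hypothetical scores (assumption: not the data used for the values above)
scores <- c(62, 70, 74, 74, 75, 78, 81, 90)

# range() returns c(min, max); diff() turns that into max - min
score_range <- diff(range(scores))  # 90 - 62 = 28

# Sample variance and standard deviation (n - 1 denominator)
score_var <- var(scores)            # 464 / 7, about 66.29
score_sd  <- sd(scores)             # square root of the variance, about 8.14
```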

Distribution Visualization with Histogram

Boxplot Visualization

3D Interactive Visualization

Mathematical Foundations - Formulas

For a dataset \(X = \{x_1, x_2, \ldots, x_n\}\) with \(n\) observations:

Mean: \[\bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i\]

Sample Variance: \[s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\]

Sample Standard Deviation: \[s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\]
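
To make the formulas concrete, the sketch below evaluates each one term by term and compares the result with R's built-in `mean()`, `var()`, and `sd()`. The vector `x` is a hypothetical example chosen only for illustration.

```r
# Hypothetical observations x_1, ..., x_n
x <- c(62, 70, 74, 74, 75, 78, 81, 90)
n <- length(x)

# Mean: (1/n) * sum of x_i
x_bar <- sum(x) / n

# Sample variance: (1/(n-1)) * sum of squared deviations from the mean
s2 <- sum((x - x_bar)^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
s <- sqrt(s2)

# The hand-computed values should agree with the built-ins
matches_builtin <- c(
  mean = isTRUE(all.equal(x_bar, mean(x))),
  var  = isTRUE(all.equal(s2, var(x))),
  sd   = isTRUE(all.equal(s, sd(x)))
)
```

All three entries of `matches_builtin` should be `TRUE`, which confirms that the built-ins implement the \(n-1\) versions of the formulas.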

Mathematical Foundations - Properties

Properties of the Mean:

If \(X = \{x_1, x_2, \ldots, x_n\}\) and \(Y = \{y_1, y_2, \ldots, y_n\}\) are datasets of \(n\) paired observations, and \(a\), \(b\), and \(c\) are constants:

  1. Linear Transformation: \[\overline{aX+b} = a\bar{X} + b\]

  2. Sum of Means: \[\overline{X+Y} = \bar{X} + \bar{Y}\]

  3. Minimization Property: \[\sum_{i=1}^{n}(x_i - \bar{x})^2 \leq \sum_{i=1}^{n}(x_i - c)^2\]

for any constant \(c\), with equality only when \(c = \bar{x}\)
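
These properties can be checked numerically. The following is a sketch on simulated data. The seed, sample size, and the constants `a` and `b` are arbitrary choices, and `sse` is an illustrative helper name.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
x <- rnorm(50)
y <- rnorm(50)
a <- 3
b <- -2

# 1. Linear transformation: mean(aX + b) equals a * mean(X) + b
lin_ok <- isTRUE(all.equal(mean(a * x + b), a * mean(x) + b))

# 2. Sum of means: mean(X + Y) equals mean(X) + mean(Y)
sum_ok <- isTRUE(all.equal(mean(x + y), mean(x) + mean(y)))

# 3. Minimization: the sum of squared deviations is smallest at the mean
sse <- function(center) sum((x - center)^2)
candidates <- mean(x) + c(-1, -0.1, 0.1, 1)   # values of c away from the mean
min_ok <- all(sse(mean(x)) < sapply(candidates, sse))
```

Comparisons of floating-point results use `all.equal()` rather than `==`, because the two sides of each identity can differ by rounding error.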

R Code for Descriptive Statistics

# Load plotly for interactive plots (it also provides the %>% pipe)
library(plotly)

# Create 3D data: x and y are independent normal samples,
# z is a linear combination of both plus random noise
set.seed(789)
n <- 100
x <- rnorm(n, mean = 10, sd = 2)
y <- rnorm(n, mean = 15, sd = 3)
z <- 2*x + 3*y + rnorm(n, mean = 0, sd = 5)

plot_data <- data.frame(x = x, y = y, z = z)

# Create 3D scatter plot, coloring each point by its z value
plot_ly(plot_data, x = ~x, y = ~y, z = ~z,
        marker = list(size = 5, color = ~z, colorscale = 'Viridis',
                      opacity = 0.8)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'X Variable'),
                      yaxis = list(title = 'Y Variable'),
                      zaxis = list(title = 'Z Variable')),
         title = "3D Scatter Plot of Variables")

Summary and Applications

Key Takeaways:

  • Descriptive statistics provide numeric and visual summaries

  • Different measures reveal different aspects of data

  • Critical foundation for data analysis

Applications:

  • Business: Financial performance metrics

  • Healthcare: Patient outcome measurements

  • Education: Student performance analysis

  • Research: Experimental results summary