Descriptive statistics describe the basic features of the data in a study. They provide simple summaries of the sample and the measures. Together with simple graphical analysis, they form the basis of virtually every quantitative analysis of data.
Measures of central tendency indicate the center of a distribution.
Mean: the arithmetic average of a set of values.
Formula: \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\]
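As a quick check of the formula, the sketch below computes the mean by hand and compares it with R's built-in `mean()`. The vector `x` is made-up illustration data, not part of the salary example.

```r
# Hypothetical illustration data (not from the salary dataset)
x <- c(2, 4, 4, 5, 10)

# Apply the formula directly: sum of the values divided by their count
manual_mean <- sum(x) / length(x)

# Built-in equivalent
builtin_mean <- mean(x)

manual_mean   # 5
builtin_mean  # 5
```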
Median: the middle value in a list of numbers sorted from lowest to highest. With an even number of values, it is the average of the two middle values.
Mode: the value that occurs most often.
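A short sketch of both definitions in R, using made-up data. Base R has `median()`, but its `mode()` function returns an object's storage mode rather than the statistical mode, so the helper `stat_mode()` below is a hypothetical name introduced here for illustration.

```r
# Hypothetical illustration data
x <- c(2, 4, 4, 5, 10)

# Median: middle value of the sorted data
median(x)          # 4

# With an even count, R averages the two middle values (4 and 5)
median(c(x, 12))   # 4.5

# Statistical mode: tabulate the values and keep the most frequent one(s).
# Returns every tied value if there is more than one mode.
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

stat_mode(x)       # 4
```

Returning all tied values is a design choice; a dataset can have several modes, and silently picking one would hide that.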
Dispersion tells us how “spread out” the data is.
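Variance: the average of the squared differences from the mean. For a sample, the sum is divided by \(n - 1\) rather than \(n\). \[s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}\]
Standard deviation: the square root of the variance, expressed in the same units as the data. It is the most common measure of spread. \[s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}\]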
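To connect the two formulas to R, the sketch below works them out step by step on made-up data and checks the results against `var()` and `sd()`, which both use the \(n - 1\) denominator.

```r
# Hypothetical illustration data
x <- c(2, 4, 4, 5, 10)
n <- length(x)

# Squared differences from the mean: 9, 1, 1, 0, 25 (sum = 36)
sq_diff <- (x - mean(x))^2

# Sample variance: divide by n - 1, not n
manual_var <- sum(sq_diff) / (n - 1)

# Standard deviation: square root of the variance
manual_sd <- sqrt(manual_var)

manual_var                      # 9
manual_sd                       # 3
all.equal(manual_var, var(x))   # TRUE
all.equal(manual_sd, sd(x))     # TRUE
```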
Let’s analyze a dataset of 50 employees in a startup.
# Create a dataset: most earn between 40k and 60k, but two top earners make 250k and 260k
set.seed(123)
salaries <- c(rnorm(48, mean = 50000, sd = 5000), 250000, 260000)
# Calculating Central Tendency
mean_sal <- mean(salaries)
median_sal <- median(salaries)
# Calculating Dispersion
sd_sal <- sd(salaries)
range_sal <- range(salaries)
iqr_sal <- IQR(salaries)
# Display results
cat("Mean Salary:", mean_sal,
    "\nMedian Salary:", median_sal,
    "\nStandard Deviation:", sd_sal)
## Mean Salary: 58302.36
## Median Salary: 50021.49
## Standard Deviation: 40830.6
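The mean sits well above the median because the two very large salaries pull it upward, and they inflate the standard deviation as well. One way to see this is to recompute each statistic without them. The sketch below assumes a cutoff of 100,000 to separate the two top salaries from the rest; that threshold and the names `typical` and `comparison` are choices made here for illustration.

```r
# Recreate the dataset so the block runs on its own
set.seed(123)
salaries <- c(rnorm(48, mean = 50000, sd = 5000), 250000, 260000)

# Assumed cutoff: everything below 100k counts as a "typical" salary
typical <- salaries[salaries < 100000]

# Compare each statistic with and without the two top salaries
comparison <- data.frame(
  statistic     = c("mean", "median", "sd", "IQR"),
  all_50        = c(mean(salaries), median(salaries),
                    sd(salaries), IQR(salaries)),
  without_top_2 = c(mean(typical), median(typical),
                    sd(typical), IQR(typical))
)

print(comparison)
```

The median and IQR should change far less than the mean and standard deviation, which is why they are often preferred for skewed data such as salaries.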
A histogram shows the frequency distribution. We will add vertical lines for the mean (dashed red) and the median (solid blue).
hist(salaries,
     breaks = 15,
     col = "lightgray",
     border = "white",
     main = "Distribution of Salaries",
     xlab = "Salary ($)")
abline(v = mean_sal, col = "red", lwd = 2, lty = 2)  # Dashed red = mean
abline(v = median_sal, col = "blue", lwd = 2)        # Solid blue = median
legend("topright", legend = c("Mean", "Median"),
       col = c("red", "blue"), lty = c(2, 1), lwd = 2)
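Boxplots make outliers easy to spot. By default, R draws any point lying more than 1.5 × IQR beyond the box as an individual point, so the two extreme salaries (250k and 260k) stand apart from the rest.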
boxplot(salaries,
        horizontal = TRUE,
        col = "lightblue",
        main = "Boxplot of Salaries",
        xlab = "Salary ($)")
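The points drawn individually on a boxplot can also be listed as numbers. A minimal sketch using `boxplot.stats()` from the grDevices package (loaded by default in R), which applies the same 1.5 × IQR rule through its `coef` argument; the variable name `flagged` is introduced here for illustration.

```r
# Recreate the dataset so the block runs on its own
set.seed(123)
salaries <- c(rnorm(48, mean = 50000, sd = 5000), 250000, 260000)

# $out holds the values beyond the whiskers (coef = 1.5 is the default)
flagged <- boxplot.stats(salaries, coef = 1.5)$out

# The flagged values include the two top salaries, 250000 and 260000
print(flagged)
```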
| Statistic | R Function | Description |
|---|---|---|
| Mean | mean(x) | Average value |
| Median | median(x) | Middle value |
| Standard Deviation | sd(x) | Typical distance from the mean (square root of the variance) |
| Variance | var(x) | Average squared distance from the mean |
| Interquartile Range | IQR(x) | Spread of the middle 50% |
| Summary | summary(x) | Min, Q1, Median, Mean, Q3, Max |
islands dataset in R (type data(islands)).