In statistical inference, we aim to make generalizations about a population based on a sample. To do this, we use two broad categories of statistical tests: Parametric and Non-Parametric.
The choice between these two depends on the nature of your data, the sample size, and the underlying distribution of the population.
Parametric statistics are based on the assumption that the data are sampled from a population that follows a specific probability distribution (usually the Normal Distribution).
One of the most common parametric tests is the Student’s t-test. The formula for a one-sample t-test is:
\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]
Where:

* \(\bar{x}\): Sample mean.
* \(\mu_0\): Population mean (hypothesized).
* \(s\): Sample standard deviation.
* \(n\): Sample size.
Blood Pressure Research: A pharmaceutical company tests a new drug. Since blood pressure in a large population is naturally normally distributed, they use a parametric t-test to compare the mean blood pressure before and after the medication.
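To see the formula at work, here is a minimal sketch in R; the blood-pressure readings and the hypothesized mean `mu0` below are made-up values for illustration, not data from the study:

```r
# Hypothetical blood-pressure readings (made up for illustration)
bp  <- c(128, 131, 125, 133, 129, 127, 132)
mu0 <- 130  # hypothesized population mean

# One-sample t statistic computed directly from the formula
t_manual <- (mean(bp) - mu0) / (sd(bp) / sqrt(length(bp)))

# t.test() applies the same formula and adds the p-value
t.test(bp, mu = mu0)$statistic
t_manual
```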
Non-parametric statistics (often called distribution-free tests) do not assume that the data follow a specific distribution. Instead of using the actual values, these tests often use the ranks of the data.
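Ranking is what makes these tests robust: an extreme value receives the same rank as any other largest value, so its magnitude stops mattering. The values below are made up purely to show this:

```r
# The outlier 850 simply becomes rank 5, like any largest value would
rank(c(3.1, 2.4, 850, 2.8, 3.0))
## [1] 4 1 5 2 3
```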
Instead of comparing means, the Mann-Whitney U test compares the ranks of two groups. The U statistic is calculated as:
\[U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1\]
Where:

* \(n_1, n_2\): Sample sizes of the two groups.
* \(R_1\): Sum of ranks for the first group.
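Here is a minimal sketch of the calculation in R with two small made-up groups. Note that conventions differ: `wilcox.test()` reports \(W = R_1 - n_1(n_1+1)/2\), which is the complement of the \(U\) above, and the two always satisfy \(U + W = n_1 n_2\):

```r
# Made-up data for two independent groups
g1 <- c(12, 15, 11, 19)
g2 <- c(22, 25, 14, 30, 17)
n1 <- length(g1); n2 <- length(g2)

# Sum of ranks for group 1 within the pooled sample
R1 <- sum(rank(c(g1, g2))[1:n1])

# U from the formula above
U <- n1 * n2 + n1 * (n1 + 1) / 2 - R1

wilcox.test(g1, g2)$statistic  # W = 3
U                              # U = 17, and 3 + 17 = n1 * n2 = 20
```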
Customer Satisfaction: A restaurant asks customers to rate their experience from 1 (Poor) to 5 (Excellent). Because the difference between “Excellent” and “Good” has no fixed numerical value, and the data are likely skewed, a non-parametric Mann-Whitney U or Wilcoxon test is used to compare two different branches.
| Feature | Parametric Tests | Non-Parametric Tests |
|---|---|---|
| Assumed Distribution | Normal | Any (Distribution-free) |
| Measure of Center | Mean | Median |
| Power | Higher (if assumptions met) | Lower (less sensitive) |
| Data Type | Ratio or Interval | Ordinal or Nominal |
| Outliers | Highly affected | Robust (less affected) |
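The “Outliers” row is easy to demonstrate: a single extreme value drags the mean far more than the median. A quick sketch with made-up numbers:

```r
x <- c(2.0, 2.2, 2.4, 2.5, 2.7)
mean(x); median(x)           # 2.36 and 2.4

x_out <- c(x, 25)            # add one extreme value
mean(x_out); median(x_out)   # mean jumps to ~6.13; median barely moves to 2.45
```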
In R, we often use the Shapiro-Wilk Test to check for normality before deciding which test to use.
```r
# Generate two groups of data
# Group A: Normally distributed
group_a <- c(2.1, 2.5, 3.0, 2.8, 3.2, 2.9, 3.5)
# Group B: Skewed data with an outlier
group_b <- c(1.2, 1.5, 1.8, 2.0, 1.9, 8.5, 1.1)
# 1. Test for Normality (Shapiro-Wilk)
# p > 0.05 implies normality
shapiro.test(group_a) # Likely Normal
##
## Shapiro-Wilk normality test
##
## data: group_a
## W = 0.98483, p-value = 0.9796
shapiro.test(group_b) # Likely Not Normal due to 8.5 outlier
##
## Shapiro-Wilk normality test
##
## data: group_b
## W = 0.5779, p-value = 0.000149
# 2. Decision:
# For Group A (Normal), we might use a t-test:
t_result <- t.test(group_a, mu = 2.5)
# For Group B (Non-Normal), we use the Wilcoxon test:
w_result <- wilcox.test(group_b, mu = 2.5)
print(paste("Parametric p-value:", round(t_result$p.value, 4)))
## [1] "Parametric p-value: 0.0846"
print(paste("Non-Parametric p-value:", round(w_result$p.value, 4)))
## [1] "Non-Parametric p-value: 0.2969"
If your data fail the parametric assumptions, you should switch to the non-parametric equivalent:
| Parametric Test | Non-Parametric Equivalent | Purpose |
|---|---|---|
| Independent t-test | Mann-Whitney U Test | Compare 2 independent groups |
| Paired t-test | Wilcoxon Signed-Rank Test | Compare 2 related groups (pre/post) |
| One-way ANOVA | Kruskal-Wallis Test | Compare 3 or more groups |
| Pearson Correlation | Spearman Correlation | Measure relationship between variables |
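The last row is quick to demonstrate: Pearson correlation works on the raw values and is pulled down by an outlier, while Spearman correlation works on ranks and sees a perfectly monotone relationship. The data below are made up for illustration:

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.1, 1.9, 3.2, 4.1, 5.0, 60)  # monotone, but with one extreme value

cor(x, y, method = "pearson")   # roughly 0.70, deflated by the outlier
cor(x, y, method = "spearman")  # exactly 1: the ranks rise in lockstep
```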
Parametric tests are more powerful when the data are “well-behaved” (approximately normal). Non-parametric tests, however, are essential tools in real-world scenarios where data are messy, samples are small, or observations are based on subjective rankings.