1. Introduction

In statistical inference, we aim to make generalizations about a population based on a sample. To do this, we use two broad categories of statistical tests: parametric and non-parametric.

The choice between these two depends on the nature of your data, the sample size, and the underlying distribution of the population.


2. Parametric Statistics

Parametric statistics are based on the assumption that the data are sampled from a population that follows a specific probability distribution (usually the Normal Distribution).

Key Assumptions

  1. Normality: The data (more precisely, the values within each group or the residuals) follow an approximately normal (bell-shaped) distribution.
  2. Homogeneity of Variance: The variance within groups is roughly equal.
  3. Independence: Observations are independent of each other.
  4. Data Type: Usually applied to Interval or Ratio scale data.
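
The first two assumptions can be checked numerically. The sketch below is only illustrative: the two groups are made-up values, and `shapiro.test()` and `var.test()` are one reasonable choice of checks among several (plots such as Q-Q plots are often used alongside them).

```r
# Hypothetical measurements for two independent groups (made-up values).
group_1 <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 5.0, 5.7)
group_2 <- c(4.8, 5.3, 5.5, 5.9, 5.1, 5.6, 4.7, 5.4)

# Normality: Shapiro-Wilk test on each group.
# A small p-value is evidence against normality; a large one is not proof of it.
normality_p <- c(
  group_1 = shapiro.test(group_1)$p.value,
  group_2 = shapiro.test(group_2)$p.value
)

# Homogeneity of variance: F test for the ratio of the two variances.
variance_p <- var.test(group_1, group_2)$p.value

print(round(normality_p, 3))
print(round(variance_p, 3))
```

Independence and data type cannot be tested this way; they follow from how the study was designed and how the variable was measured.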

Mathematical Formula (The t-test)

One of the most common parametric tests is the Student’s t-test. The formula for a one-sample t-test is:

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]

Where:

  * \(\bar{x}\): Sample mean.
  * \(\mu_0\): Population mean (hypothesized).
  * \(s\): Sample standard deviation.
  * \(n\): Sample size.
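
To make the formula concrete, the following sketch computes t by hand and compares it with R's built-in `t.test()`. The sample values and the hypothesized mean of 2.5 are illustrative assumptions, not results from a real study.

```r
# Illustrative sample and hypothesized population mean (assumed values).
x   <- c(2.1, 2.5, 3.0, 2.8, 3.2, 2.9, 3.5)
mu0 <- 2.5

x_bar <- mean(x)    # sample mean
s     <- sd(x)      # sample standard deviation (n - 1 in the denominator)
n     <- length(x)  # sample size

# One-sample t statistic: (x_bar - mu0) / (s / sqrt(n))
t_manual <- (x_bar - mu0) / (s / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# The built-in test should agree with the manual calculation.
t_builtin <- t.test(x, mu = mu0)
print(c(manual = t_manual, builtin = unname(t_builtin$statistic)))
print(c(manual = p_manual, builtin = t_builtin$p.value))
```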

Real-Life Example

Blood Pressure Research: A pharmaceutical company tests a new drug. Because blood pressure in a large population is approximately normally distributed, they use a paired t-test (a parametric test) to compare the mean blood pressure of the same patients before and after the medication.
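
A before/after design like this could be analysed in R roughly as follows. The blood pressure readings are invented for illustration, and the variable names `before` and `after` are assumptions of this sketch.

```r
# Hypothetical systolic blood pressure (mmHg) for the same 8 patients.
before <- c(148, 152, 139, 160, 155, 147, 151, 158)
after  <- c(141, 146, 138, 150, 149, 142, 147, 149)

# Paired t-test: each patient serves as their own control.
paired_result <- t.test(before, after, paired = TRUE)

# Equivalent formulation: a one-sample t-test on the per-patient differences.
diff_result <- t.test(before - after, mu = 0)

print(unname(paired_result$statistic))
print(unname(diff_result$statistic))
```

The two statistics are identical, which shows that a paired t-test is a one-sample t-test applied to the differences.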


3. Non-Parametric Statistics

Non-parametric statistics (often called distribution-free tests) do not assume that the data follow a specific distribution. Instead of using the actual values, these tests often use the ranks of the data.
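
A small sketch of why ranks help: replacing an extreme value with an even more extreme one leaves the ranks unchanged. The numbers are illustrative only.

```r
# Illustrative values with one large observation in position 6.
values  <- c(1.2, 1.5, 1.8, 2.0, 1.9, 8.5, 1.1)

# Make the large observation ten times larger.
extreme <- replace(values, 6, 85)

# rank() assigns 1 to the smallest value, 2 to the next, and so on.
print(rank(values))
print(rank(extreme))

# The ranks are the same, so a rank-based test gives the same result.
print(identical(rank(values), rank(extreme)))
```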

When to use Non-Parametric Tests?

  1. When the data is skewed or has heavy outliers.
  2. When the sample size is too small to check normality or to rely on the Central Limit Theorem.
  3. When the data is Ordinal (e.g., rankings or Likert scales).

Mathematical Formula (Mann-Whitney U Test)

Instead of comparing means, the Mann-Whitney U test compares the ranks of two groups. The U statistic is calculated as:

\[U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1\]

Where:

  * \(n_1, n_2\): Sample sizes of the two groups.
  * \(R_1\): Sum of ranks for the first group.
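
The formula can be worked through in R as sketched below, using illustrative data without ties. One caveat worth knowing: the statistic that `wilcox.test()` reports (W) uses the complementary convention \(W = R_1 - n_1(n_1 + 1)/2\), so it equals \(n_1 n_2 - U\) for the U defined above.

```r
# Illustrative data for two independent groups (no tied values).
group_1 <- c(2.1, 2.5, 3.0, 2.8, 3.2, 2.9, 3.5)
group_2 <- c(1.2, 1.5, 1.8, 2.0, 1.9, 8.5, 1.1)

n1 <- length(group_1)
n2 <- length(group_2)

# Rank all observations together, then sum the ranks of the first group.
all_ranks <- rank(c(group_1, group_2))
r1 <- sum(all_ranks[seq_len(n1)])

# U as defined in the formula above.
u1 <- n1 * n2 + n1 * (n1 + 1) / 2 - r1

# Statistic reported by R's built-in test.
w <- unname(wilcox.test(group_1, group_2)$statistic)

print(c(R1 = r1, U = u1, W = w))
```

For these data R1 is 70, U is 7, and W is 42, so U + W equals n1 * n2 = 49.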

Real-Life Example

Customer Satisfaction: A restaurant asks customers to rate their experience from 1 (Poor) to 5 (Excellent). Because the ratings are ordinal (the difference between “Excellent” and “Good” has no meaningful numerical value) and the data are likely skewed, a non-parametric test such as the Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is used to compare two different branches.
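
A comparison like this might look as follows in R. The ratings are invented, and `exact = FALSE` is chosen here because 1-to-5 ratings contain many ties, for which R cannot compute an exact p-value and falls back to a normal approximation anyway.

```r
# Hypothetical 1-5 satisfaction ratings from two branches (made-up values).
branch_a <- c(5, 4, 4, 5, 3, 4, 5, 4, 3, 5)
branch_b <- c(3, 2, 4, 3, 3, 2, 4, 3, 1, 3)

# Mann-Whitney U / Wilcoxon rank-sum test on the ordinal ratings.
rating_test <- wilcox.test(branch_a, branch_b, exact = FALSE)

print(c(median_a = median(branch_a), median_b = median(branch_b)))
print(rating_test$p.value)
```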


4. Comparison Table

| Feature              | Parametric Tests            | Non-Parametric Tests    |
|----------------------|-----------------------------|-------------------------|
| Assumed Distribution | Normal                      | Any (Distribution-free) |
| Measure of Center    | Mean                        | Median                  |
| Power                | Higher (if assumptions met) | Lower (less sensitive)  |
| Data Type            | Ratio or Interval           | Ordinal or Nominal      |
| Outliers             | Highly affected             | Robust (less affected)  |
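
The "Measure of Center" and "Outliers" rows can be illustrated with a short sketch. The values are illustrative, and the cut-off of 8 used to drop the outlier is an arbitrary choice for this example.

```r
# Illustrative values with one outlier (8.5).
weights    <- c(1.2, 1.5, 1.8, 2.0, 1.9, 8.5, 1.1)
no_outlier <- weights[weights < 8]  # arbitrary cut-off for this example

# Compare mean and median with and without the outlier.
summary_table <- rbind(
  with_outlier    = c(mean = mean(weights),    median = median(weights)),
  without_outlier = c(mean = mean(no_outlier), median = median(no_outlier))
)
print(round(summary_table, 2))
```

Removing the single outlier moves the mean from about 2.57 to about 1.58, while the median only moves from 1.80 to 1.65.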

5. R Implementation: Choosing the Test

In R, we often use the Shapiro-Wilk Test to check for normality before deciding which test to use.

Scenario: Testing Weight Loss in Two Groups Against a Reference Value of 2.5

# Generate two groups of data
# Group A: Normally distributed
group_a <- c(2.1, 2.5, 3.0, 2.8, 3.2, 2.9, 3.5)

# Group B: Skewed data with an outlier
group_b <- c(1.2, 1.5, 1.8, 2.0, 1.9, 8.5, 1.1) 

# 1. Test for Normality (Shapiro-Wilk)
# p > 0.05 means no significant evidence against normality (it does not prove normality)
shapiro.test(group_a) # Likely Normal
## 
##  Shapiro-Wilk normality test
## 
## data:  group_a
## W = 0.98483, p-value = 0.9796
shapiro.test(group_b) # Likely Not Normal due to 8.5 outlier
## 
##  Shapiro-Wilk normality test
## 
## data:  group_b
## W = 0.5779, p-value = 0.000149
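# 2. Decision (each group is tested against a hypothesized value of 2.5):
# For Group A (no evidence against normality), we might use a one-sample t-test:
t_result <- t.test(group_a, mu = 2.5)

# For Group B (non-normal), we use the one-sample Wilcoxon signed-rank test:
w_result <- wilcox.test(group_b, mu = 2.5)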

print(paste("Parametric p-value:", round(t_result$p.value, 4)))
## [1] "Parametric p-value: 0.0846"
print(paste("Non-Parametric p-value:", round(w_result$p.value, 4)))
## [1] "Non-Parametric p-value: 0.2969"
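
The code above tests each group separately against 2.5. To compare the two groups with each other directly, the same decision rule could be wrapped in a small helper, as sketched below. `compare_groups()` is a hypothetical function written for this handout, not part of R, and choosing a test from a normality pre-check is a common teaching heuristic rather than a universally recommended procedure.

```r
# Hypothetical helper: compare two independent samples, choosing the test
# from a Shapiro-Wilk pre-check on each sample.
compare_groups <- function(x, y, alpha = 0.05) {
  looks_normal <- shapiro.test(x)$p.value > alpha &&
    shapiro.test(y)$p.value > alpha
  if (looks_normal) {
    result <- t.test(x, y)       # Welch two-sample t-test (R's default)
  } else {
    result <- wilcox.test(x, y)  # Mann-Whitney U / Wilcoxon rank-sum test
  }
  list(method = result$method, p_value = result$p.value)
}

group_a <- c(2.1, 2.5, 3.0, 2.8, 3.2, 2.9, 3.5)
group_b <- c(1.2, 1.5, 1.8, 2.0, 1.9, 8.5, 1.1)

comparison <- compare_groups(group_a, group_b)
print(comparison$method)
print(round(comparison$p_value, 4))
```

Because Group B fails the normality check, the helper falls back to the rank-based test for this pair of samples.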

6. Summary of Corresponding Tests

If your data fail the parametric assumptions, consider switching to the corresponding non-parametric test:

| Parametric Test     | Non-Parametric Equivalent | Purpose                                |
|---------------------|---------------------------|----------------------------------------|
| Independent t-test  | Mann-Whitney U Test       | Compare 2 independent groups           |
| Paired t-test       | Wilcoxon Signed-Rank Test | Compare 2 related groups (pre/post)    |
| One-way ANOVA       | Kruskal-Wallis Test       | Compare 3 or more groups               |
| Pearson Correlation | Spearman Correlation      | Measure relationship between variables |
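
The last two rows of the table have direct counterparts in base R. The sketch below uses made-up data: three hypothetical diet groups for the Kruskal-Wallis test, and a perfectly monotonic but non-linear relationship to contrast Pearson with Spearman correlation.

```r
# Hypothetical scores for three independent groups (made-up values).
scores <- list(
  diet_a = c(2.1, 2.5, 3.0, 2.8),
  diet_b = c(1.2, 1.5, 1.8, 2.0),
  diet_c = c(3.6, 4.1, 3.9, 4.4)
)

# Kruskal-Wallis: rank-based alternative to one-way ANOVA.
kw <- kruskal.test(scores)
print(kw$p.value)

# A monotonic but non-linear relationship (response = dose cubed).
dose     <- 1:6
response <- dose^3

# Pearson measures linear association; Spearman measures monotonic association.
pearson_r    <- cor(dose, response, method = "pearson")
spearman_rho <- cor(dose, response, method = "spearman")
print(c(pearson = pearson_r, spearman = spearman_rho))
```

Spearman's rho is exactly 1 here because the ordering of `response` matches the ordering of `dose`, while Pearson's r is below 1 because the relationship is not a straight line.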

7. Conclusion

Parametric tests are more powerful when the data are “well-behaved” (approximately normal). However, non-parametric tests are essential tools in real-world scenarios where data are messy, samples are small, or observations are based on subjective rankings.


How to use this:

  1. Install RStudio.
  2. File -> New File -> R Markdown.
  3. Delete the template content and paste the code above.
  4. Click the Knit button to generate a professional PDF or HTML lecture hand-out.