In statistical inference, we aim to make generalizations about a population based on a sample. To do this, we use two broad categories of statistical tests: Parametric and Non-Parametric.
The choice between these two depends on the nature of your data, the sample size, and the underlying distribution of the population.
Parametric statistics are based on the assumption that the data are sampled from a population that follows a specific probability distribution (usually the Normal Distribution).
One of the most common parametric tests is the Student’s t-test. The formula for a one-sample t-test is:
\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}\]
Where:

* \(\bar{x}\): Sample mean.
* \(\mu_0\): Population mean (hypothesized).
* \(s\): Sample standard deviation.
* \(n\): Sample size.
Blood Pressure Research: A pharmaceutical company tests a new drug. Since blood pressure in a large population is naturally normally distributed, they use a parametric t-test to compare the mean blood pressure before and after the medication.
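To see the formula at work, here is a minimal sketch in R; the blood-pressure readings and the hypothesized mean `mu0` below are made-up values for illustration, not data from the study:

```r
# Hypothetical blood-pressure readings (made up for illustration)
bp  <- c(128, 131, 125, 133, 129, 127, 132)
mu0 <- 130  # hypothesized population mean

# One-sample t statistic computed directly from the formula
t_manual <- (mean(bp) - mu0) / (sd(bp) / sqrt(length(bp)))

# t.test() applies the same formula and adds the p-value
t.test(bp, mu = mu0)$statistic
t_manual
```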
Non-parametric statistics (often called distribution-free tests) do not assume that the data follow a specific distribution. Instead of using the actual values, these tests often use the ranks of the data.
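Ranking is what makes these tests robust: an extreme value receives the same rank as any other largest value, so its magnitude stops mattering. The values below are made up purely to show this:

```r
# The outlier 850 simply becomes rank 5, like any largest value would
rank(c(3.1, 2.4, 850, 2.8, 3.0))
## [1] 4 1 5 2 3
```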
Instead of comparing means, the Mann-Whitney U test compares the ranks of two groups. The U statistic is calculated as:
\[U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1\]
Where:

* \(n_1, n_2\): Sample sizes of the two groups.
* \(R_1\): Sum of ranks for the first group.
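Here is a minimal sketch of the calculation in R with two small made-up groups. Note that conventions differ: `wilcox.test()` reports \(W = R_1 - n_1(n_1+1)/2\), which is the complement of the \(U\) above, and the two always satisfy \(U + W = n_1 n_2\):

```r
# Made-up data for two independent groups
g1 <- c(12, 15, 11, 19)
g2 <- c(22, 25, 14, 30, 17)
n1 <- length(g1); n2 <- length(g2)

# Sum of ranks for group 1 within the pooled sample
R1 <- sum(rank(c(g1, g2))[1:n1])

# U from the formula above
U <- n1 * n2 + n1 * (n1 + 1) / 2 - R1

wilcox.test(g1, g2)$statistic  # W = 3
U                              # U = 17, and 3 + 17 = n1 * n2 = 20
```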
Customer Satisfaction: A restaurant asks customers to rate their experience from 1 (Poor) to 5 (Excellent). Because the difference between “Excellent” and “Good” has no fixed numerical value, and the data are likely skewed, a non-parametric Mann-Whitney U or Wilcoxon test is used to compare two different branches.
| Feature | Parametric Tests | Non-Parametric Tests |
|---|---|---|
| Assumed Distribution | Normal | Any (Distribution-free) |
| Measure of Center | Mean | Median |
| Power | Higher (if assumptions met) | Lower (less sensitive) |
| Data Type | Ratio or Interval | Ordinal or Nominal |
| Outliers | Highly affected | Robust (less affected) |
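The “Outliers” row is easy to demonstrate: a single extreme value drags the mean far more than the median. A quick sketch with made-up numbers:

```r
x <- c(2.0, 2.2, 2.4, 2.5, 2.7)
mean(x); median(x)           # 2.36 and 2.4

x_out <- c(x, 25)            # add one extreme value
mean(x_out); median(x_out)   # mean jumps to ~6.13; median barely moves to 2.45
```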
In R, we often use the Shapiro-Wilk Test to check for normality before deciding which test to use.
```r
# Generate two groups of data
# Group A: Normally distributed
group_a <- c(2.1, 2.5, 3.0, 2.8, 3.2, 2.9, 3.5)
# Group B: Skewed data with an outlier
group_b <- c(1.2, 1.5, 1.8, 2.0, 1.9, 8.5, 1.1)
# 1. Test for Normality (Shapiro-Wilk)
# p > 0.05 implies normality
shapiro.test(group_a) # Likely Normal
##
## Shapiro-Wilk normality test
##
## data: group_a
## W = 0.98483, p-value = 0.9796
shapiro.test(group_b) # Likely Not Normal due to 8.5 outlier
##
## Shapiro-Wilk normality test
##
## data: group_b
## W = 0.5779, p-value = 0.000149
# 2. Decision:
# For Group A (Normal), we might use a t-test:
t_result <- t.test(group_a, mu = 2.5)
# For Group B (Non-Normal), we use the Wilcoxon test:
w_result <- wilcox.test(group_b, mu = 2.5)
print(paste("Parametric p-value:", round(t_result$p.value, 4)))
## [1] "Parametric p-value: 0.0846"
print(paste("Non-Parametric p-value:", round(w_result$p.value, 4)))
## [1] "Non-Parametric p-value: 0.2969"
If your data fail the parametric assumptions, you should switch to the non-parametric equivalent:
| Parametric Test | Non-Parametric Equivalent | Purpose |
|---|---|---|
| Independent t-test | Mann-Whitney U Test | Compare 2 independent groups |
| Paired t-test | Wilcoxon Signed-Rank Test | Compare 2 related groups (pre/post) |
| One-way ANOVA | Kruskal-Wallis Test | Compare 3 or more groups |
| Pearson Correlation | Spearman Correlation | Measure relationship between variables |
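The last row is quick to demonstrate: Pearson correlation works on the raw values and is pulled down by an outlier, while Spearman correlation works on ranks and sees a perfectly monotone relationship. The data below are made up for illustration:

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.1, 1.9, 3.2, 4.1, 5.0, 60)  # monotone, but with one extreme value

cor(x, y, method = "pearson")   # roughly 0.70, deflated by the outlier
cor(x, y, method = "spearman")  # exactly 1: the ranks rise in lockstep
```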
Parametric tests are more powerful when the data are “well-behaved” (approximately normal). Non-parametric tests, however, are essential tools in real-world scenarios where data are messy, samples are small, or observations are based on subjective rankings.