Dataset: mtcars

The mtcars dataset contains fuel consumption and 10 aspects of automobile design and performance for 32 cars (1973–74 models).

Variable	Description
mpg	Miles/(US) gallon
cyl	Number of cylinders
disp	Displacement (cu.in.)
hp	Gross horsepower
drat	Rear axle ratio
wt	Weight (1000 lbs)
qsec	1/4 mile time
vs	Engine (0 = V-shaped, 1 = straight)
am	Transmission (0 = automatic, 1 = manual)
gear	Number of forward gears
carb	Number of carburetors

Required library

library(car)

## Loading required package: carData

library(emmeans)

## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'

library(multcomp)

## Loading required package: mvtnorm

## Loading required package: survival

## Loading required package: TH.data

## Loading required package: MASS

## 
## Attaching package: 'TH.data'

## The following object is masked from 'package:MASS':
## 
##     geyser

library(MASS)
library(nnet)
library(ggpubr)

## Loading required package: ggplot2

library(ggplot2)

data(mtcars)

Convert variable:

The am easier to interpret output (e.g., “Manual” vs “Automatic” instead of 0 vs 1). Correct statistical treatment: Many R functions (like aov, ggplot2, t.test, etc.) treat factors as categorical variables and automatically handle group separation.

The cyl (cylinders) variable is also numeric (e.g., 4, 6, 8), but in many statistical models or visualizations, we treat it as a categorical grouping variable. Converting it to a factor means: R treats each cylinder count as a distinct category, not a continuous variable.

mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
mtcars$cyl <- factor(mtcars$cyl)

All parametric test:

One Sample t-test

Q: Is the average mileage (mpg) of cars in this dataset significantly different from the industry standard of 20 mpg?

Explanation: We’re comparing a sample mean (mean mpg from mtcars) to a known population value (20 mpg). This checks if cars from that era were more/less fuel efficient than expected.

t.test(mtcars$mpg, mu = 20)

## 
##  One Sample t-test
## 
## data:  mtcars$mpg
## t = 0.08506, df = 31, p-value = 0.9328
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
##  17.91768 22.26357
## sample estimates:
## mean of x 
##  20.09062

Two Sample t-test (Independent Samples)

Q: Do cars with automatic and manual transmissions have different average mileages (mpg)?

Explanation: Here, we’re comparing two independent groups (am = 0 vs am = 1) on a continuous outcome (mpg). Useful for car manufacturers comparing fuel efficiency of designs.

t.test(mpg ~ am, data = mtcars) #response ~ predictor

## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group Automatic and group Manual is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

Paired Sample t-test

Q: Suppose we had mileage data for the same cars before and after a new fuel additive. Did the additive significantly change fuel efficiency?

Explanation: You’d compare mpg_before vs mpg_after for the same cars. Since such data isn’t in mtcars, this is a hypothetical use case.

set.seed(123)
mpg_before <- mtcars$mpg
mpg_after <- mpg_before + rnorm(32, 1, 0.5)

t.test(mpg_before, mpg_after, paired = TRUE)

## 
##  Paired t-test
## 
## data:  mpg_before and mpg_after
## t = -11.626, df = 31, p-value = 7.811e-13
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -1.1518914 -0.8080549
## sample estimates:
## mean difference 
##      -0.9799731

One-Way ANOVA

Q: Do cars with different numbers of cylinders (4, 6, 8) have significantly different mean horsepower (hp)?

Explanation: We’re comparing the mean of a continuous variable (hp) across more than two groups defined by cyl.

Note: One-Way ANOVA tests whether there are significant differences between the means of more than two groups, based on one independent (categorical) variable.

aov1 <- aov(hp ~ cyl, data = mtcars)
summary(aov1)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## cyl          2 104031   52015   36.18 1.32e-08 ***
## Residuals   29  41696    1438                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

TukeyHSD(aov1)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = hp ~ cyl, data = mtcars)
## 
## $cyl
##          diff       lwr       upr     p adj
## 6-4  39.64935 -5.627454  84.92616 0.0949068
## 8-4 126.57792 88.847251 164.30859 0.0000000
## 8-6  86.92857 43.579331 130.27781 0.0000839

plot(TukeyHSD(aov1), las = 1)

Comment:

The F value is 36.18 and p value is 0.00000000132. Since, p value < 0.05. Then we may conclude that there is enough evidence to reject null hypothesis. Therefore, here at least one cylinder group has a significantly different average horsepower.

Tukey HSD

To find which group is different, we have to test Post-Hoc Test (Tukey HSD). Difference between 4 and 6 cylinders is not statistically significant. Cars with 8 cylinders have significantly higher horsepower than both 4 and 6-cylinder cars.

Two-Way ANOVA

Q: How do the number of cylinders and transmission type together affect fuel efficiency (mpg)?

Explanation: Two independent variables (cyl and am) and one dependent variable (mpg). Useful in automotive design to assess interaction effects.The effects of two categorical variables (in this case, am and cyl) on a continuous outcome (mpg).

Note: Two-Way ANOVA tests the effect of two independent variables on one dependent variable, and whether there’s an interaction effect between them.

aov2 <- aov(mpg ~ am * cyl, data = mtcars)
summary(aov2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## am           1  405.2   405.2  44.064 4.85e-07 ***
## cyl          2  456.4   228.2  24.819 9.35e-07 ***
## am:cyl       2   25.4    12.7   1.383    0.269    
## Residuals   26  239.1     9.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Explanation: Look at interaction and main effects.
interaction.plot(mtcars$cyl, mtcars$am, mtcars$mpg,
                 col = c("blue", "red"), legend = TRUE,
                 main = "Interaction: Transmission & Cylinders on MPG")

Repeated Measure ANOVA

Q: Does the antihypertensive medication lead to a statistically significant reduction in systolic blood pressure over the 4-week period?

# Create the data
patient <- factor(1:10)
time1 <- c(150, 160, 155, 148, 152, 158, 149, 151, 157, 153)
time2 <- c(145, 155, 150, 144, 147, 153, 145, 146, 152, 148)
time3 <- c(140, 150, 145, 140, 142, 148, 141, 142, 147, 143)

# Combine into a data frame
data <- data.frame(patient, time1, time2, time3)

# Reshape the data to long format
library(tidyr)
data_long <- pivot_longer(data, cols = time1:time3, names_to = "time", values_to = "bp")
data_long$time <- factor(data_long$time, levels = c("time1", "time2", "time3"))

# Load necessary package
library(ez)

# Perform the ANOVA
anova_results <- ezANOVA(
  data = data_long,
  dv = bp,
  wid = patient,
  within = time,
  detailed = TRUE
)

# View the results
print(anova_results)

## $ANOVA
##        Effect DFn DFd         SSn   SSd         F            p p<.05       ges
## 1 (Intercept)   1   9 661864.5333 386.8 15400.157 7.277289e-16     * 0.9994108
## 2        time   2  18    451.2667   3.4  1194.529 7.312607e-20     * 0.5362859
## 
## $`Mauchly's Test for Sphericity`
##   Effect         W            p p<.05
## 2   time 0.1614764 0.0006798855     *
## 
## $`Sphericity Corrections`
##   Effect       GGe        p[GG] p[GG]<.05      HFe       p[HF] p[HF]<.05
## 2   time 0.5439147 1.133123e-11         * 0.561053 5.56485e-12         *

Mixed Model ANOVA

# Install & Load Required Packages.
library(lme4)

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

library(lmerTest)   # For p-values with lmer

## 
## Attaching package: 'lmerTest'

## The following object is masked from 'package:lme4':
## 
##     lmer

## The following object is masked from 'package:stats':
## 
##     step

library(emmeans)    # For post hoc analysis
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.4

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ Matrix::expand() masks tidyr::expand()
## ✖ dplyr::filter()  masks stats::filter()
## ✖ dplyr::lag()     masks stats::lag()
## ✖ Matrix::pack()   masks tidyr::pack()
## ✖ dplyr::recode()  masks car::recode()
## ✖ dplyr::select()  masks MASS::select()
## ✖ purrr::some()    masks car::some()
## ✖ Matrix::unpack() masks tidyr::unpack()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

# Example wide-format data
data_wide <- data.frame(
  Subject = 1:10,
  Group = rep(c("A", "B"), each = 5),
  Time1 = c(150, 148, 152, 149, 151, 149, 152, 151, 150, 153),
  Time2 = c(140, 138, 139, 137, 138, 147, 149, 149, 148, 151),
  Time3 = c(132, 129, 130, 127, 126, 145, 147, 146, 144, 148)
)

# Convert to long format
data_long <- data_wide %>%
  pivot_longer(cols = starts_with("Time"), 
               names_to = "Time", 
               values_to = "SBP")

# Convert variables to factors
data_long <- data_long %>%
  mutate(
    Group = factor(Group),
    Time = factor(Time, levels = c("Time1", "Time2", "Time3")),
    Subject = factor(Subject)
  )
# Random intercept model: repeated measures nested in Subject (Run Mixed Model ANOVA)
model <- lmer(SBP ~ Group * Time + (1 | Subject), data = data_long)

# ANOVA table
anova(model)

## Type III Analysis of Variance Table with Satterthwaite's method
##            Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)    
## Group      128.27  128.27     1     8  112.36 5.486e-06 ***
## Time       858.87  429.43     2    16  376.15 3.538e-14 ***
## Group:Time 330.87  165.43     2    16  144.91 5.615e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Post Hoc Tests
# a. Pairwise comparisons of Time within each Group
emmeans(model, pairwise ~ Time | Group, adjust = "bonferroni")

## $emmeans
## Group = A:
##  Time  emmean    SE   df lower.CL upper.CL
##  Time1    150 0.746 14.2      148      152
##  Time2    138 0.746 14.2      137      140
##  Time3    129 0.746 14.2      127      130
## 
## Group = B:
##  Time  emmean    SE   df lower.CL upper.CL
##  Time1    151 0.746 14.2      149      153
##  Time2    149 0.746 14.2      147      150
##  Time3    146 0.746 14.2      144      148
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
## Group = A:
##  contrast      estimate    SE df t.ratio p.value
##  Time1 - Time2     11.6 0.676 16  17.166  <.0001
##  Time1 - Time3     21.2 0.676 16  31.372  <.0001
##  Time2 - Time3      9.6 0.676 16  14.206  <.0001
## 
## Group = B:
##  contrast      estimate    SE df t.ratio p.value
##  Time1 - Time2      2.2 0.676 16   3.256  0.0149
##  Time1 - Time3      5.0 0.676 16   7.399  <.0001
##  Time2 - Time3      2.8 0.676 16   4.143  0.0023
## 
## Degrees-of-freedom method: kenward-roger 
## P value adjustment: bonferroni method for 3 tests

# b. Compare Groups at each Time point
emmeans(model, pairwise ~ Group | Time, adjust = "bonferroni")

## $emmeans
## Time = Time1:
##  Group emmean    SE   df lower.CL upper.CL
##  A        150 0.746 14.2      148      152
##  B        151 0.746 14.2      149      153
## 
## Time = Time2:
##  Group emmean    SE   df lower.CL upper.CL
##  A        138 0.746 14.2      137      140
##  B        149 0.746 14.2      147      150
## 
## Time = Time3:
##  Group emmean    SE   df lower.CL upper.CL
##  A        129 0.746 14.2      127      130
##  B        146 0.746 14.2      144      148
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
## Time = Time1:
##  contrast estimate   SE   df t.ratio p.value
##  A - B        -1.0 1.06 14.2  -0.948  0.3592
## 
## Time = Time2:
##  contrast estimate   SE   df t.ratio p.value
##  A - B       -10.4 1.06 14.2  -9.856  <.0001
## 
## Time = Time3:
##  contrast estimate   SE   df t.ratio p.value
##  A - B       -17.2 1.06 14.2 -16.301  <.0001
## 
## Degrees-of-freedom method: kenward-roger

ANCOVA (Analysis of Covariance)

Q: Do cars with different transmission types (am) differ in mpg after controlling for wt (weight)?

Explanation: ANCOVA is used to compare group means (transmission types) while controlling for continuous variables (covariates like weight).

mpg: Dependent variable (miles per gallon) am: Factor variable (transmission: Automatic vs Manual) wt: Covariate (car weight, continuous)

ancova <- aov(mpg ~ am + wt, data = mtcars)
summary(ancova)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## am           1  405.2   405.2   42.22 4.11e-07 ***
## wt           1  442.6   442.6   46.12 1.87e-07 ***
## Residuals   29  278.3     9.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Chi-Square Test of Independence

Q: Is there a relationship between engine type (vs: V-shaped or straight) and transmission type (am)?

Explanation: Both vs and am are categorical. This tests for independence—e.g., whether straight engines are more likely to have manual transmission.

crosstab <- table(mtcars$vs, mtcars$am)
chisq.test(crosstab)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  crosstab
## X-squared = 0.34754, df = 1, p-value = 0.5555

Parametric Test

Shefayet Hossain Monna

2025-05-28