The mtcars dataset contains fuel consumption and 10 aspects of automobile design and performance for 32 cars (1973–74 models).
| Variable | Description |
|---|---|
| mpg | Miles/(US) gallon |
| cyl | Number of cylinders |
| disp | Displacement (cu.in.) |
| hp | Gross horsepower |
| drat | Rear axle ratio |
| wt | Weight (1000 lbs) |
| qsec | 1/4 mile time |
| vs | Engine (0 = V-shaped, 1 = straight) |
| am | Transmission (0 = automatic, 1 = manual) |
| gear | Number of forward gears |
| carb | Number of carburetors |
library(car)
## Loading required package: carData
library(emmeans)
## Welcome to emmeans.
## Caution: You lose important information if you filter this package's results.
## See '? untidy'
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
library(MASS)
library(nnet)
library(ggpubr)
## Loading required package: ggplot2
library(ggplot2)
data(mtcars)
The am easier to interpret output (e.g., “Manual” vs “Automatic” instead of 0 vs 1). Correct statistical treatment: Many R functions (like aov, ggplot2, t.test, etc.) treat factors as categorical variables and automatically handle group separation.
The cyl (cylinders) variable is also numeric (e.g., 4, 6, 8), but in many statistical models or visualizations, we treat it as a categorical grouping variable. Converting it to a factor means: R treats each cylinder count as a distinct category, not a continuous variable.
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
mtcars$cyl <- factor(mtcars$cyl)
Q: Is the average mileage (mpg) of cars in this dataset significantly different from the industry standard of 20 mpg?
Explanation: We’re comparing a sample mean (mean mpg from mtcars) to a known population value (20 mpg). This checks if cars from that era were more/less fuel efficient than expected.
t.test(mtcars$mpg, mu = 20)
##
## One Sample t-test
##
## data: mtcars$mpg
## t = 0.08506, df = 31, p-value = 0.9328
## alternative hypothesis: true mean is not equal to 20
## 95 percent confidence interval:
## 17.91768 22.26357
## sample estimates:
## mean of x
## 20.09062
Q: Do cars with automatic and manual transmissions have different average mileages (mpg)?
Explanation: Here, we’re comparing two independent groups (am = 0 vs am = 1) on a continuous outcome (mpg). Useful for car manufacturers comparing fuel efficiency of designs.
t.test(mpg ~ am, data = mtcars) #response ~ predictor
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group Automatic and group Manual is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group Automatic mean in group Manual
## 17.14737 24.39231
Q: Suppose we had mileage data for the same cars before and after a new fuel additive. Did the additive significantly change fuel efficiency?
Explanation: You’d compare mpg_before vs mpg_after for the same cars. Since such data isn’t in mtcars, this is a hypothetical use case.
set.seed(123)
mpg_before <- mtcars$mpg
mpg_after <- mpg_before + rnorm(32, 1, 0.5)
t.test(mpg_before, mpg_after, paired = TRUE)
##
## Paired t-test
##
## data: mpg_before and mpg_after
## t = -11.626, df = 31, p-value = 7.811e-13
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -1.1518914 -0.8080549
## sample estimates:
## mean difference
## -0.9799731
Q: Do cars with different numbers of cylinders (4, 6, 8) have significantly different mean horsepower (hp)?
Explanation: We’re comparing the mean of a continuous variable (hp) across more than two groups defined by cyl.
Note: One-Way ANOVA tests whether there are significant differences between the means of more than two groups, based on one independent (categorical) variable.
aov1 <- aov(hp ~ cyl, data = mtcars)
summary(aov1)
## Df Sum Sq Mean Sq F value Pr(>F)
## cyl 2 104031 52015 36.18 1.32e-08 ***
## Residuals 29 41696 1438
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
TukeyHSD(aov1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = hp ~ cyl, data = mtcars)
##
## $cyl
## diff lwr upr p adj
## 6-4 39.64935 -5.627454 84.92616 0.0949068
## 8-4 126.57792 88.847251 164.30859 0.0000000
## 8-6 86.92857 43.579331 130.27781 0.0000839
plot(TukeyHSD(aov1), las = 1)
The F value is 36.18 and p value is 0.00000000132. Since, p value < 0.05. Then we may conclude that there is enough evidence to reject null hypothesis. Therefore, here at least one cylinder group has a significantly different average horsepower.
To find which group is different, we have to test Post-Hoc Test (Tukey HSD). Difference between 4 and 6 cylinders is not statistically significant. Cars with 8 cylinders have significantly higher horsepower than both 4 and 6-cylinder cars.
Q: How do the number of cylinders and transmission type together affect fuel efficiency (mpg)?
Explanation: Two independent variables (cyl and am) and one dependent variable (mpg). Useful in automotive design to assess interaction effects.The effects of two categorical variables (in this case, am and cyl) on a continuous outcome (mpg).
Note: Two-Way ANOVA tests the effect of two independent variables on one dependent variable, and whether there’s an interaction effect between them.
aov2 <- aov(mpg ~ am * cyl, data = mtcars)
summary(aov2)
## Df Sum Sq Mean Sq F value Pr(>F)
## am 1 405.2 405.2 44.064 4.85e-07 ***
## cyl 2 456.4 228.2 24.819 9.35e-07 ***
## am:cyl 2 25.4 12.7 1.383 0.269
## Residuals 26 239.1 9.2
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Explanation: Look at interaction and main effects.
interaction.plot(mtcars$cyl, mtcars$am, mtcars$mpg,
col = c("blue", "red"), legend = TRUE,
main = "Interaction: Transmission & Cylinders on MPG")
Q: Does the antihypertensive medication lead to a statistically significant reduction in systolic blood pressure over the 4-week period?
# Create the data
patient <- factor(1:10)
time1 <- c(150, 160, 155, 148, 152, 158, 149, 151, 157, 153)
time2 <- c(145, 155, 150, 144, 147, 153, 145, 146, 152, 148)
time3 <- c(140, 150, 145, 140, 142, 148, 141, 142, 147, 143)
# Combine into a data frame
data <- data.frame(patient, time1, time2, time3)
# Reshape the data to long format
library(tidyr)
data_long <- pivot_longer(data, cols = time1:time3, names_to = "time", values_to = "bp")
data_long$time <- factor(data_long$time, levels = c("time1", "time2", "time3"))
# Load necessary package
library(ez)
# Perform the ANOVA
anova_results <- ezANOVA(
data = data_long,
dv = bp,
wid = patient,
within = time,
detailed = TRUE
)
# View the results
print(anova_results)
## $ANOVA
## Effect DFn DFd SSn SSd F p p<.05 ges
## 1 (Intercept) 1 9 661864.5333 386.8 15400.157 7.277289e-16 * 0.9994108
## 2 time 2 18 451.2667 3.4 1194.529 7.312607e-20 * 0.5362859
##
## $`Mauchly's Test for Sphericity`
## Effect W p p<.05
## 2 time 0.1614764 0.0006798855 *
##
## $`Sphericity Corrections`
## Effect GGe p[GG] p[GG]<.05 HFe p[HF] p[HF]<.05
## 2 time 0.5439147 1.133123e-11 * 0.561053 5.56485e-12 *
# Install & Load Required Packages.
library(lme4)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
library(lmerTest) # For p-values with lmer
##
## Attaching package: 'lmerTest'
## The following object is masked from 'package:lme4':
##
## lmer
## The following object is masked from 'package:stats':
##
## step
library(emmeans) # For post hoc analysis
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.4 ✔ tibble 3.2.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ Matrix::expand() masks tidyr::expand()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ Matrix::pack() masks tidyr::pack()
## ✖ dplyr::recode() masks car::recode()
## ✖ dplyr::select() masks MASS::select()
## ✖ purrr::some() masks car::some()
## ✖ Matrix::unpack() masks tidyr::unpack()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Example wide-format data
data_wide <- data.frame(
Subject = 1:10,
Group = rep(c("A", "B"), each = 5),
Time1 = c(150, 148, 152, 149, 151, 149, 152, 151, 150, 153),
Time2 = c(140, 138, 139, 137, 138, 147, 149, 149, 148, 151),
Time3 = c(132, 129, 130, 127, 126, 145, 147, 146, 144, 148)
)
# Convert to long format
data_long <- data_wide %>%
pivot_longer(cols = starts_with("Time"),
names_to = "Time",
values_to = "SBP")
# Convert variables to factors
data_long <- data_long %>%
mutate(
Group = factor(Group),
Time = factor(Time, levels = c("Time1", "Time2", "Time3")),
Subject = factor(Subject)
)
# Random intercept model: repeated measures nested in Subject (Run Mixed Model ANOVA)
model <- lmer(SBP ~ Group * Time + (1 | Subject), data = data_long)
# ANOVA table
anova(model)
## Type III Analysis of Variance Table with Satterthwaite's method
## Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
## Group 128.27 128.27 1 8 112.36 5.486e-06 ***
## Time 858.87 429.43 2 16 376.15 3.538e-14 ***
## Group:Time 330.87 165.43 2 16 144.91 5.615e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Post Hoc Tests
# a. Pairwise comparisons of Time within each Group
emmeans(model, pairwise ~ Time | Group, adjust = "bonferroni")
## $emmeans
## Group = A:
## Time emmean SE df lower.CL upper.CL
## Time1 150 0.746 14.2 148 152
## Time2 138 0.746 14.2 137 140
## Time3 129 0.746 14.2 127 130
##
## Group = B:
## Time emmean SE df lower.CL upper.CL
## Time1 151 0.746 14.2 149 153
## Time2 149 0.746 14.2 147 150
## Time3 146 0.746 14.2 144 148
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
##
## $contrasts
## Group = A:
## contrast estimate SE df t.ratio p.value
## Time1 - Time2 11.6 0.676 16 17.166 <.0001
## Time1 - Time3 21.2 0.676 16 31.372 <.0001
## Time2 - Time3 9.6 0.676 16 14.206 <.0001
##
## Group = B:
## contrast estimate SE df t.ratio p.value
## Time1 - Time2 2.2 0.676 16 3.256 0.0149
## Time1 - Time3 5.0 0.676 16 7.399 <.0001
## Time2 - Time3 2.8 0.676 16 4.143 0.0023
##
## Degrees-of-freedom method: kenward-roger
## P value adjustment: bonferroni method for 3 tests
# b. Compare Groups at each Time point
emmeans(model, pairwise ~ Group | Time, adjust = "bonferroni")
## $emmeans
## Time = Time1:
## Group emmean SE df lower.CL upper.CL
## A 150 0.746 14.2 148 152
## B 151 0.746 14.2 149 153
##
## Time = Time2:
## Group emmean SE df lower.CL upper.CL
## A 138 0.746 14.2 137 140
## B 149 0.746 14.2 147 150
##
## Time = Time3:
## Group emmean SE df lower.CL upper.CL
## A 129 0.746 14.2 127 130
## B 146 0.746 14.2 144 148
##
## Degrees-of-freedom method: kenward-roger
## Confidence level used: 0.95
##
## $contrasts
## Time = Time1:
## contrast estimate SE df t.ratio p.value
## A - B -1.0 1.06 14.2 -0.948 0.3592
##
## Time = Time2:
## contrast estimate SE df t.ratio p.value
## A - B -10.4 1.06 14.2 -9.856 <.0001
##
## Time = Time3:
## contrast estimate SE df t.ratio p.value
## A - B -17.2 1.06 14.2 -16.301 <.0001
##
## Degrees-of-freedom method: kenward-roger
Q: Do cars with different transmission types (am) differ in mpg after controlling for wt (weight)?
Explanation: ANCOVA is used to compare group means (transmission types) while controlling for continuous variables (covariates like weight).
mpg: Dependent variable (miles per gallon) am: Factor variable (transmission: Automatic vs Manual) wt: Covariate (car weight, continuous)
ancova <- aov(mpg ~ am + wt, data = mtcars)
summary(ancova)
## Df Sum Sq Mean Sq F value Pr(>F)
## am 1 405.2 405.2 42.22 4.11e-07 ***
## wt 1 442.6 442.6 46.12 1.87e-07 ***
## Residuals 29 278.3 9.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Q: Is there a relationship between engine type (vs: V-shaped or straight) and transmission type (am)?
Explanation: Both vs and am are categorical. This tests for independence—e.g., whether straight engines are more likely to have manual transmission.
crosstab <- table(mtcars$vs, mtcars$am)
chisq.test(crosstab)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: crosstab
## X-squared = 0.34754, df = 1, p-value = 0.5555