R Markdown

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some
set.seed(123)

df <- mtcars %>%
  as_tibble() %>%
  mutate(
    am = factor(am, labels = c("Automatic", "Manual"))
  )

str(df)
## tibble [32 × 11] (S3: tbl_df/tbl/data.frame)
##  $ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num [1:32] 160 160 108 258 360 ...
##  $ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
##  $ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
head(df)
response <- "mpg"
group <- "am"

##A1
ggplot(df, aes(x = .data[[response]])) +
  geom_histogram(bins = 10) +
  facet_wrap(vars(.data[[group]])) +
  labs(title = "Histogram of mpg by transmission group",
       x = "Miles per Gallon", y = "Count")

ggplot(df, aes(x = .data[[group]], y = .data[[response]])) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.12, alpha = 0.7) +
  labs(title = "Boxplot of mpg by transmission group",
       x = "Transmission", y = "Miles per Gallon")

##The plots suggest that manual cars tend to have higher mpg than automatic cars.
##The center of the manual group is clearly higher. The manual group also looks
##more spread out than the automatic group, which the variance tests in part B
##examine formally.
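
##A numeric summary can back up the visual impression from the plots. The sketch
##below uses base R on the built-in mtcars data; the names mpg_by_am and
##group_summary are illustrative and not part of the analysis above.

```r
# Split mpg by transmission type, relabelling am (0/1) as in the analysis.
mpg_by_am <- split(mtcars$mpg,
                   factor(mtcars$am, labels = c("Automatic", "Manual")))

# One row per group: size, centre (mean, median) and spread (sd, IQR).
group_summary <- data.frame(
  group  = names(mpg_by_am),
  n      = sapply(mpg_by_am, length),
  mean   = sapply(mpg_by_am, mean),
  median = sapply(mpg_by_am, median),
  sd     = sapply(mpg_by_am, sd),
  iqr    = sapply(mpg_by_am, IQR),
  row.names = NULL
)

print(group_summary)
```

##Comparing the sd and iqr columns gives a quick check on how similar the spread
##is before any formal variance test is run.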

##A2
df_split <- split(df[[response]], df[[group]])
lapply(df_split, shapiro.test)
## $Automatic
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.97677, p-value = 0.8987
## 
## 
## $Manual
## 
##  Shapiro-Wilk normality test
## 
## data:  X[[i]]
## W = 0.9458, p-value = 0.5363
##The Shapiro-Wilk test checks whether the data are consistent with a normal distribution.
##Null hypothesis: the data follow a normal distribution.
##Both p-values are well above 0.05 (Automatic: 0.8987, Manual: 0.5363), so we do not
##reject normality in either group and the normality assumption is reasonable.
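
##The list output above can be condensed into one table. This is a minimal sketch
##in base R on the built-in mtcars data; shapiro_table and the normality_ok
##column are illustrative names, and 0.05 is the cutoff used in this analysis.

```r
# Split mpg by transmission type, relabelling am (0/1) as in the analysis.
mpg_by_am <- split(mtcars$mpg,
                   factor(mtcars$am, labels = c("Automatic", "Manual")))

# Run Shapiro-Wilk per group and keep only the statistic and the p-value.
shapiro_table <- do.call(rbind, lapply(names(mpg_by_am), function(g) {
  res <- shapiro.test(mpg_by_am[[g]])
  data.frame(group = g, W = unname(res$statistic), p_value = res$p.value)
}))

# Flag groups where normality is not rejected at the 0.05 level.
shapiro_table$normality_ok <- shapiro_table$p_value > 0.05

print(shapiro_table)
```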

ggplot(df, aes(sample = .data[[response]])) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(vars(.data[[group]])) +
  labs(title = "QQ plots of mpg by transmission group",
       x = "Theoretical quantiles", y = "Sample quantiles")

##B1
bartlett.test(df[[response]] ~ df[[group]])
## 
##  Bartlett test of homogeneity of variances
## 
## data:  df[[response]] by df[[group]]
## Bartlett's K-squared = 3.2259, df = 1, p-value = 0.07248
car::leveneTest(df[[response]] ~ df[[group]], center = median)
##B2
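##Null hypothesis: the variances of mpg are equal between transmission groups.
##Bartlett's test gives p = 0.07248, which is above 0.05, so we fail to reject the
##null hypothesis. The margin is narrow, so the evidence for equal variances is
##not strong. The same decision rule applies to the Levene test, which is less
##sensitive to departures from normality than Bartlett's test.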
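
##To see what the variance tests are comparing, the group variances can be
##computed directly. The sketch below uses base R on the built-in mtcars data.
##The F test via var.test() is an extra check that is not part of the original
##analysis, and the names mt, group_var, and var_ratio are illustrative.

```r
# Recreate the factor coding used in the analysis.
mt <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

# Sample variance of mpg within each transmission group.
group_var <- tapply(mt$mpg, mt$am, var)
var_ratio <- unname(group_var["Manual"] / group_var["Automatic"])

# Bartlett's test, written with a formula and a data argument.
bartlett_p <- bartlett.test(mpg ~ am, data = mt)$p.value

# Classical two-sample F test of equal variances (assumes normality).
# Its estimate is var(Automatic) / var(Manual), i.e. 1 / var_ratio.
f_test <- var.test(mpg ~ am, data = mt)

print(group_var)
print(var_ratio)
print(bartlett_p)
print(f_test$p.value)
```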

##C1
t_pooled <- t.test(df[[response]] ~ df[[group]], var.equal = TRUE)
t_pooled
## 
##  Two Sample t-test
## 
## data:  df[[response]] by df[[group]]
## t = -4.1061, df = 30, p-value = 0.000285
## alternative hypothesis: true difference in means between group Automatic and group Manual is not equal to 0
## 95 percent confidence interval:
##  -10.84837  -3.64151
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231
##C2
t_welch <- t.test(df[[response]] ~ df[[group]], var.equal = FALSE)
t_welch
## 
##  Welch Two Sample t-test
## 
## data:  df[[response]] by df[[group]]
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group Automatic and group Manual is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231
##C3
##Null hypothesis (H0): The mean mpg is the same for automatic and manual cars.
##Alternative hypothesis (H1): The mean mpg differs between transmission types.

##Bartlett's p-value (0.07248) is only slightly above 0.05, so the variances are
##not clearly equal and the Welch test is the safer choice.
##Both t-tests give p-values below 0.05 (pooled: 0.000285, Welch: 0.001374), so
##we reject the null hypothesis and conclude that there is a statistically
##significant difference in mean mpg between transmission types. The sample means
##are 17.15 mpg for automatic cars and 24.39 mpg for manual cars.
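
##The Welch statistic can be rebuilt by hand to show where t and the fractional
##df come from. This is a sketch in base R on the built-in mtcars data using the
##standard Welch-Satterthwaite formula; x, y, se2_x, se2_y, t_stat, and df_welch
##are illustrative names.

```r
# Split mpg by transmission type, relabelling am (0/1) as in the analysis.
mpg_by_am <- split(mtcars$mpg,
                   factor(mtcars$am, labels = c("Automatic", "Manual")))
x <- mpg_by_am$Automatic
y <- mpg_by_am$Manual

# Squared standard error of each group mean: s^2 / n.
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means over the unpooled standard error.
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation to the degrees of freedom.
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution with df_welch degrees of freedom.
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

# These should reproduce the Welch output reported above.
print(c(t = t_stat, df = df_welch, p = p_value))
```

##The df falls well below the pooled value of 30 because the two groups differ
##in both size and sample variance.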

##D1
w_rank <- wilcox.test(df[[response]] ~ df[[group]], exact = FALSE)
w_rank
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  df[[response]] by df[[group]]
## W = 42, p-value = 0.001871
## alternative hypothesis: true location shift is not equal to 0
##D2
#A nonparametric test like the Wilcoxon rank-sum test is preferred when data are
#strongly non-normal or contain large outliers. It compares the ranks
#of the values instead of the means.
#Here the test gives W = 42 with p = 0.001871, which is below 0.05, so it
#supports the same conclusion as the t-tests.
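
#The W statistic can be computed directly from the ranks. The sketch below uses
#base R on the built-in mtcars data; ranks, is_auto, n_auto, w_manual, and
#w_builtin are illustrative names.

```r
# Rank all 32 mpg values together; tied values receive their average rank.
ranks <- rank(mtcars$mpg)

# Automatic cars (am == 0) form the first group, as in the test above.
is_auto <- mtcars$am == 0
n_auto  <- sum(is_auto)

# W = rank sum of the first group minus its smallest possible rank sum.
w_manual <- sum(ranks[is_auto]) - n_auto * (n_auto + 1) / 2

# Same statistic from the built-in test, for comparison.
w_builtin <- unname(
  wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)$statistic
)

print(c(by_hand = w_manual, built_in = w_builtin))
```

#A small W means the automatic cars mostly occupy the low ranks, which matches
#the lower mpg seen in the plots.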

##E1
# The analysis compared miles per gallon (mpg) between automatic and
# manual transmission cars. Exploratory plots suggested that manual cars
# tend to have higher mpg values (sample means: 24.39 vs. 17.15 mpg). The
# Shapiro-Wilk tests did not reject normality in either group (p = 0.8987
# and p = 0.5363). Bartlett's test did not reject equal variances at the
# 0.05 level, although the margin was narrow (p = 0.07248), so the Welch
# test is the more cautious choice. Both two-sample t-tests showed a
# statistically significant difference in mean mpg at the 0.05 level
# (pooled: p = 0.000285, Welch: p = 0.001374). The Wilcoxon rank-sum test
# gave a similar result (p = 0.001871), so the parametric and nonparametric
# tests agree that manual cars tend to have higher fuel efficiency.
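
# The p-values discussed above can be gathered into one table for reporting.
# This is a sketch in base R on the built-in mtcars data. The Levene test is
# left out because it needs the car package, and the names mt, p_values, and
# summary_table are illustrative.

```r
# Recreate the factor coding used in the analysis.
mt <- transform(mtcars, am = factor(am, labels = c("Automatic", "Manual")))

# Collect the p-value from each assumption check and each comparison test.
p_values <- c(
  shapiro_automatic = shapiro.test(mt$mpg[mt$am == "Automatic"])$p.value,
  shapiro_manual    = shapiro.test(mt$mpg[mt$am == "Manual"])$p.value,
  bartlett          = bartlett.test(mpg ~ am, data = mt)$p.value,
  t_pooled          = t.test(mpg ~ am, data = mt, var.equal = TRUE)$p.value,
  t_welch           = t.test(mpg ~ am, data = mt, var.equal = FALSE)$p.value,
  wilcoxon          = wilcox.test(mpg ~ am, data = mt, exact = FALSE)$p.value
)

# Flag which p-values fall below the 0.05 level used throughout.
summary_table <- data.frame(
  test        = names(p_values),
  p_value     = signif(unname(p_values), 4),
  significant = unname(p_values) < 0.05
)

print(summary_table)
```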