#Searching for a dataset
data(package = .packages(all.available = TRUE))
Introduction
Unit of observation: individual car model
Number of observations: 32
Variables:
mpg (Miles per gallon) - Numeric, Continuous variable
hp (Horsepower) - Numeric, Continuous variable
wt (Weight) - Numeric, Continuous variable
qsec (Quarter Mile Time) - Numeric, Continuous variable
am (Transmission type) - Categorical, Nominal variable
Research question:
Is there a significant difference in the fuel efficiency (mpg) between cars with automatic transmissions (am=0) and cars with manual transmissions (am=1)?
H0: μ1 - μ2 = 0 (μ1 is the mean fuel efficiency, in miles per gallon, of automatic cars and μ2 is the mean fuel efficiency, in miles per gallon, of manual cars)
H1: μ1 - μ2 ≠ 0
Source of data: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411
#Loading the dataset in R (mtcars is part of the datasets package that ships with R, so no additional library is required)
mydata <- mtcars
#Removing variables cyl, disp, drat, vs, gear and carb from the dataset and displaying first 6 rows of the dataset
mydata1 <- mydata[, !(names(mydata) %in% c("cyl", "disp", "drat", "vs", "gear", "carb"))]
head(mydata1)
##                    mpg  hp    wt  qsec am
## Mazda RX4         21.0 110 2.620 16.46  1
## Mazda RX4 Wag     21.0 110 2.875 17.02  1
## Datsun 710        22.8  93 2.320 18.61  1
## Hornet 4 Drive    21.4 110 3.215 19.44  0
## Hornet Sportabout 18.7 175 3.440 17.02  0
## Valiant           18.1 105 3.460 20.22  0
str(mydata1)
## 'data.frame': 32 obs. of 5 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
Description
mpg: Miles per gallon, represents fuel efficiency of the car in miles per gallon
hp: Horsepower, represents the power of the car’s engine in horsepower
wt: Weight, represents the weight of a car in thousands of pounds
qsec: Quarter Mile Time, represents the time taken by the car to cover a quarter-mile distance, in seconds
am: Transmission type, represents the type of transmission (0 = automatic, 1 = manual)
#Convert categorical variables to factors
mydata1$amF <- factor(mydata1$am,
                      levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
#Descriptive statistics by group
library(psych)
result <- describeBy(mydata1$mpg, group = mydata1$amF)
print(result)
##
## Descriptive statistics by group
## group: Automatic
##    vars  n  mean   sd median trimmed  mad  min  max range skew kurtosis   se
## X1    1 19 17.15 3.83   17.3   17.12 3.11 10.4 24.4    14 0.01     -0.8 0.88
## ------------------------------------------------------------
## group: Manual
##    vars  n  mean   sd median trimmed  mad min  max range skew kurtosis   se
## X1    1 13 24.39 6.17   22.8   24.38 6.67  15 33.9  18.9 0.05    -1.46 1.71
Interpretation
The mean fuel efficiency is 17.15 mpg for automatic cars (n = 19) and 24.39 mpg for manual cars (n = 13). In this sample, manual cars therefore average about 7.2 mpg more than automatic cars.
The standard deviation is also larger for manual cars (6.17) than for automatic cars (3.83), which indicates greater variability in fuel efficiency among manual cars than among automatic cars.
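The group sizes, means and standard deviations used in this interpretation can also be reproduced without the psych package. The following is a minimal sketch using base R only; the names grp, summary_tab and diff_means are illustrative and are not part of the analysis above.

```r
# Sketch: reproduce the group sizes, means and SDs of mpg with base R only.
# mtcars ships with the datasets package; grp, summary_tab and diff_means are illustrative names.
grp <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

summary_tab <- data.frame(
  n    = as.vector(table(grp)),
  mean = as.vector(tapply(mtcars$mpg, grp, mean)),
  sd   = as.vector(tapply(mtcars$mpg, grp, sd)),
  row.names = levels(grp)
)
print(round(summary_tab, 2))

# Difference in sample means (Manual minus Automatic)
diff_means <- summary_tab["Manual", "mean"] - summary_tab["Automatic", "mean"]
print(round(diff_means, 2))
```

The rounded values should agree with the describeBy output above.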
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
ggplot(mydata1, aes(x = mpg, fill = amF)) +
  geom_histogram(position = position_dodge(width = 2), binwidth = 2, colour = "Black") +
  ylab("Frequency") +
  labs(fill = "amF")
H0: Miles per gallon (efficiency) is normally distributed within each group
H1: Miles per gallon (efficiency) is not normally distributed within at least one group
#Checking the normality assumption with the Shapiro-Wilk test
library(rstatix)
## Warning: package 'rstatix' was built under R version 4.3.2
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
##
## filter
mydata1 %>%
  group_by(amF) %>%
  shapiro_test(mpg)
## # A tibble: 2 × 4
##   amF       variable statistic     p
##   <fct>     <chr>        <dbl> <dbl>
## 1 Automatic mpg          0.977 0.899
## 2 Manual    mpg          0.946 0.536
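Both p-values in the output above (0.899 and 0.536) are larger than 0.05, so at that level the null hypothesis of normality is not rejected in either group. The decision can be sketched with base R as follows. The significance level of 0.05 is an assumption, since no level is stated in this report, and the names grp, alpha and p_values are illustrative.

```r
# Sketch: Shapiro-Wilk test per transmission group with base R, plus the decision rule.
# alpha = 0.05 is an assumed significance level; grp, alpha and p_values are illustrative names.
grp <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
alpha <- 0.05

# One p-value per group
p_values <- sapply(split(mtcars$mpg, grp), function(x) shapiro.test(x)$p.value)
print(round(p_values, 3))

# TRUE means H0 (normality) is not rejected for that group at the assumed alpha
print(p_values > alpha)
```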
#Additionally, I will check the normality assumption using ggqqplot (for small samples)
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.3.2
ggqqplot(mydata1,
         "mpg",
         facet.by = "amF")
t.test(mydata1$mpg ~ mydata1$amF,
       paired = FALSE,
       var.equal = FALSE,
       alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: mydata1$mpg by mydata1$amF
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group Automatic and group Manual is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual
##                17.14737                24.39231
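To make the Welch test output easier to follow, the test statistic, the degrees of freedom and the p-value can be recomputed by hand. This is a minimal sketch of the standard Welch formulas (with the Welch-Satterthwaite approximation for the degrees of freedom), not the internal code of t.test; all object names are illustrative.

```r
# Sketch: Welch two-sample t-test computed by hand for mpg by transmission type.
# Object names (x, y, se2_x, se2_y, t_stat, df_welch, p_value) are illustrative.
x <- mtcars$mpg[mtcars$am == 0]  # automatic
y <- mtcars$mpg[mtcars$am == 1]  # manual

# Squared standard error of each group mean
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch t statistic: difference in means divided by its standard error
t_stat <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The three values should agree with the t, df and p-value lines in the t.test output above.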
H0: μ1 - μ2 = 0
H1: μ1 - μ2 ≠ 0
library(effectsize)
##
## Attaching package: 'effectsize'
## The following objects are masked from 'package:rstatix':
##
## cohens_d, eta_squared
## The following object is masked from 'package:psych':
##
## phi
effectsize::cohens_d(mydata1$mpg ~ mydata1$amF,
                     pooled_sd = FALSE)
## Cohen's d | 95% CI
## --------------------------
## -1.41 | [-2.26, -0.53]
##
## - Estimated using un-pooled SD.
interpret_cohens_d(-1.41, rules = "sawilowsky2009")
## [1] "very large"
## (Rules: sawilowsky2009)
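The effect size can also be checked by hand. The sketch below assumes that the un-pooled standard deviation is the square root of the average of the two group variances. That assumption reproduces the value of -1.41 reported above, but it is not taken from the effectsize source code, and the object names are illustrative.

```r
# Sketch: Cohen's d with an un-pooled SD, computed by hand.
# Assumption: un-pooled SD = sqrt of the mean of the two group variances.
x <- mtcars$mpg[mtcars$am == 0]  # automatic
y <- mtcars$mpg[mtcars$am == 1]  # manual

sd_unpooled <- sqrt((var(x) + var(y)) / 2)
d <- (mean(x) - mean(y)) / sd_unpooled

print(round(d, 2))
```

The negative sign only reflects the order of the groups (Automatic minus Manual); the magnitude is what the interpretation rule is applied to.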
Conclusion: