Homework 1

#Searching for a dataset
data(package = .packages(all.available = TRUE))

Introduction

Unit of observation: individual car model
Number of observations: 32
Variables:
- mpg (Miles per gallon) - Numeric, Continuous variable
- hp (Horsepower) - Numeric, Continuous variable
- wt (Weight) - Numeric, Continuous variable
- qsec (Quarter Mile Time) - Numeric, Continuous variable
- am (Transmission type) - Categorical, Nominal variable
Research question:

Is there a significant difference in the fuel efficiency (mpg) between cars with automatic transmissions (am=0) and cars with manual transmissions (am=1)?

H0: μ1 - μ2 = 0 (μ1 is the average fuel efficiency, in miles per gallon, of automatic cars and μ2 is the average fuel efficiency, in miles per gallon, of manual cars)

H1: μ1 - μ2 ≠ 0
Source of data: Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411

#Importing the dataset into R (mtcars itself ships with the base datasets package)
library(carData)

## Warning: package 'carData' was built under R version 4.3.2

mydata <- force(mtcars)

#Removing variables cyl, disp, drat, vs, gear and carb from the dataset and displaying first 6 rows of the dataset
mydata1 <- mydata[, !(names(mydata) %in% c("cyl", "disp", "drat", "vs", "gear", "carb"))]
head(mydata1)

##                    mpg  hp    wt  qsec am
## Mazda RX4         21.0 110 2.620 16.46  1
## Mazda RX4 Wag     21.0 110 2.875 17.02  1
## Datsun 710        22.8  93 2.320 18.61  1
## Hornet 4 Drive    21.4 110 3.215 19.44  0
## Hornet Sportabout 18.7 175 3.440 17.02  0
## Valiant           18.1 105 3.460 20.22  0

str(mydata1)

## 'data.frame':    32 obs. of  5 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...

Description

mpg: Miles per gallon, the fuel efficiency of the car in miles per gallon
hp: Horsepower, the power of the car's engine in horsepower
wt: Weight, the weight of the car in thousands of pounds
qsec: Quarter Mile Time, the time the car takes to cover a quarter mile, in seconds
am: Transmission type, the type of transmission (0 = automatic, 1 = manual)

#Convert categorical variables to factors
mydata1$amF <- factor(mydata1$am,
                      levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

#Descriptive statistics by group
library(psych)

result <- describeBy(mydata1$mpg, group = mydata1$amF)

print(result)

## 
##  Descriptive statistics by group 
## group: Automatic
##    vars  n  mean   sd median trimmed  mad  min  max range skew kurtosis   se
## X1    1 19 17.15 3.83   17.3   17.12 3.11 10.4 24.4    14 0.01     -0.8 0.88
## ------------------------------------------------------------ 
## group: Manual
##    vars  n  mean   sd median trimmed  mad min  max range skew kurtosis   se
## X1    1 13 24.39 6.17   22.8   24.38 6.67  15 33.9  18.9 0.05    -1.46 1.71

Interpretation

The average fuel efficiency of automatic cars is 17.15 mpg, while the average for manual cars is 24.39 mpg. In this sample, manual cars are therefore about 7.2 mpg more efficient on average.
The standard deviation is higher for manual cars (6.17) than for automatic cars (3.83), which indicates that fuel efficiency varies more among manual cars than among automatic cars.
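
As a cross-check of the group summaries interpreted above, the same counts, means and standard deviations can be reproduced with base R alone. This is a minimal sketch that reads `mtcars` directly from the built-in datasets package; the names `grp`, `group_n`, `group_means` and `group_sds` are illustrative and not part of the original analysis.

```r
# Cross-check of the describeBy() summaries using base R only.
# grp mirrors the amF factor from the analysis (0 = Automatic, 1 = Manual).
grp <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

group_n     <- table(grp)                      # observations per group
group_means <- tapply(mtcars$mpg, grp, mean)   # mean mpg per group
group_sds   <- tapply(mtcars$mpg, grp, sd)     # sample SD of mpg per group

# Combine into one small table, rounded like the psych output
print(round(cbind(n = as.vector(group_n),
                  mean = group_means,
                  sd = group_sds), 2))
```

The values should agree with the `n`, `mean` and `sd` columns reported by `describeBy()`.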

library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following objects are masked from 'package:psych':
## 
##     %+%, alpha

ggplot(mydata1, aes(x = mpg, fill = amF)) +
  geom_histogram(position = position_dodge(width = 2), binwidth = 2, colour = "Black") +
  ylab("Frequency") +
  labs(fill = "amF")

H0: Miles per gallon (efficiency) is normally distributed within both groups

H1: Miles per gallon (efficiency) is not normally distributed in at least one of the groups

#Checking the normality assumption with the Shapiro-Wilk test
library(rstatix)

## Warning: package 'rstatix' was built under R version 4.3.2

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

mydata1 %>%
  group_by(amF) %>%
  shapiro_test(mpg)

## # A tibble: 2 × 4
##   amF       variable statistic     p
##   <fct>     <chr>        <dbl> <dbl>
## 1 Automatic mpg          0.977 0.899
## 2 Manual    mpg          0.946 0.536

I cannot reject H0 for either group, since both p-values are above 0.05 (0.899 for automatic and 0.536 for manual cars). Based on the Shapiro-Wilk test, I therefore assume that mpg is normally distributed within both groups.
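
The same per-group check can be sketched without `rstatix`, using `shapiro.test()` from base R. This is only an illustration under the assumption that `mtcars` is read from the built-in datasets package; `grp` and `shapiro_p` are hypothetical names introduced here.

```r
# Shapiro-Wilk test per transmission group, using only base R.
grp <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Run the test within each group and keep just the p-value
shapiro_p <- tapply(mtcars$mpg, grp, function(x) shapiro.test(x)$p.value)

print(round(shapiro_p, 3))

# TRUE when normality is not rejected in either group at the 5% level
print(all(shapiro_p > 0.05))
```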

#Additionally, I will check the normality assumption using ggqqplot (for small samples)
library(ggpubr)

## Warning: package 'ggpubr' was built under R version 4.3.2

ggqqplot(mydata1, 
         "mpg",
         facet.by = "amF")

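Since all the points lie within the grey confidence band, I conclude that mpg is normally distributed within both groups, so an independent-samples t-test is appropriate. Because the standard deviations of the two groups differ noticeably (3.83 vs. 6.17), I use the Welch correction, which does not assume equal variances.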

t.test(mydata1$mpg ~ mydata1$amF,
       paired = FALSE,
       var.equal = FALSE,
       alternative = "two.sided")

## 
##  Welch Two Sample t-test
## 
## data:  mydata1$mpg by mydata1$amF
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means between group Automatic and group Manual is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

H0: μ1 - μ2 = 0

H1: μ1 - μ2 ≠ 0

Based on the sample data, we reject H0 (p = 0.001 < 0.05) and conclude that average fuel efficiency differs between automatic and manual cars; manual cars are, on average, more efficient (24.39 vs. 17.15 mpg).
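
To make the Welch test less of a black box, the statistic, degrees of freedom and p-value can be recomputed by hand from the group means, variances and sample sizes. The sketch below uses the standard Welch-Satterthwaite formulas and base R only; the variable names (`auto`, `manual`, `t_stat`, `df_welch`, `p_value`) are illustrative and not part of the original analysis.

```r
# Manual Welch t-test for mpg by transmission type (base R only).
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Squared standard error contributed by each group: s^2 / n
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Test statistic: difference in means over the combined standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_welch)

print(c(t = t_stat, df = df_welch, p = p_value))
```

The three numbers should match the `t`, `df` and `p-value` fields of the `t.test()` output above.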

library(effectsize)

## 
## Attaching package: 'effectsize'

## The following objects are masked from 'package:rstatix':
## 
##     cohens_d, eta_squared

## The following object is masked from 'package:psych':
## 
##     phi

effectsize::cohens_d(mydata1$mpg ~ mydata1$amF,
                     pooled_sd = FALSE)

## Cohen's d |         95% CI
## --------------------------
## -1.41     | [-2.26, -0.53]
## 
## - Estimated using un-pooled SD.

interpret_cohens_d(-1.41, rules = "sawilowsky2009")

## [1] "very large"
## (Rules: sawilowsky2009)

The difference in fuel efficiency between automatic and manual cars is very large (Cohen's d = -1.41).
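
The reported effect size can be reproduced by hand. With an un-pooled standard deviation, Cohen's d divides the difference in means by the square root of the average of the two group variances. The sketch below is a base R illustration of that formula; `sd_unpooled` and `d_manual` are hypothetical names introduced here.

```r
# Manual Cohen's d with an un-pooled SD (base R only).
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Un-pooled SD: square root of the mean of the two group variances
sd_unpooled <- sqrt((var(auto) + var(manual)) / 2)

# Standardized mean difference (Automatic minus Manual, as in the output above)
d_manual <- (mean(auto) - mean(manual)) / sd_unpooled

print(round(d_manual, 2))
```

The negative sign only reflects the order of the groups (Automatic minus Manual); the magnitude is what the interpretation rules classify.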

Conclusion:

Based on the sample data, I found a statistically significant difference in fuel efficiency between manual and automatic cars (p = 0.001). The effect size is very large (Cohen's d = -1.41), with manual cars being more efficient on average.

Anja Ilic

2024-01-10