by Michael Ige
Introduction
Assume that a researcher wants to compare the drivability of two similar vehicles from two different manufacturers. Fifteen test drivers were hired, and the drivability ratings for the two cars are provided in the Drivability data file.
Objective
Conduct a t-test in R to determine if there is any statistically significant difference in the drivability of the two cars.
library(readr)
Drivability_data <- read_csv("C:/Users/babao/Desktop/R_wd/Practice Data/Drivability_data.csv")
## Rows: 15 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): Vehicle_1, Vehicle_2
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(Drivability_data)
print(Drivability_data)
## # A tibble: 15 × 2
##    Vehicle_1 Vehicle_2
##        <dbl>     <dbl>
##  1         8         8
##  2         8         7
##  3         8         9
##  4         9         9
##  5         7        10
##  6         9         9
##  7         9         8
##  8         8        10
##  9         9         6
## 10         8         7
## 11        10         7
## 12         7         7
## 13         7         6
## 14         8         8
## 15        10         9
Rating scale: 1 = very poor to drive, 10 = excellent to drive
Step 1: State the Hypotheses
Null Hypothesis: There is no difference in drivability between the two cars.
Alternative Hypothesis: There is a difference in drivability between the two cars.
Step 2: Determine if the groups are paired or unpaired
There was only one group of 15 drivers with ID numbers 1 to 15.
Each driver drove both cars, one after the other.
Conclusion: the data are PAIRED, because every driver rated both cars.
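Because each driver rated both cars, a paired analysis works on the per-driver differences. A minimal sketch, re-entering the 15 ratings from the printed table so the block runs without the CSV file (the names `v1`, `v2`, and `d` are illustrative and not part of the original script):

```r
# Ratings copied from the printed tibble above, one entry per driver.
v1 <- c(8, 8, 8, 9, 7, 9, 9, 8, 9, 8, 10, 7, 7, 8, 10)
v2 <- c(8, 7, 9, 9, 10, 9, 8, 10, 6, 7, 7, 7, 6, 8, 9)

# Per-driver difference: positive means the driver rated Vehicle 1 higher.
d <- v1 - v2
print(d)

# Count drivers who rated Vehicle 2 higher (-1), equal (0), or Vehicle 1 higher (1).
print(table(sign(d)))
```

Pairing matters because it removes driver-to-driver variation: a generous or harsh rater affects both of their scores, but not the difference between them.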
Step 3: Determine if the test is one-tailed or two-tailed.
The t-test is to determine if there is a difference in drivability.
The difference is assumed to be non-directional (i.e., can be higher or lower).
The t-test will be a TWO-TAILED TEST
Step 4: Determine the alpha
The testing alpha is assumed to be 0.05, which corresponds to a 95% confidence level.
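For a two-tailed test, alpha is split evenly between the two tails. As a rough illustration, the critical value can be looked up with base R's `qt`, assuming the paired design from Step 2 so that the degrees of freedom are the number of drivers minus one (the variable names are illustrative):

```r
alpha <- 0.05
n <- 15          # number of drivers, i.e. number of pairs
df <- n - 1      # degrees of freedom for a paired t-test

# Two-tailed critical value: H0 is rejected when |t| exceeds this.
t_crit <- qt(1 - alpha / 2, df)
print(t_crit)
```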
Step 5: Determine if there is equal or unequal variance
var(Drivability_data$Vehicle_1)
## [1] 0.952381
var(Drivability_data$Vehicle_2)
## [1] 1.714286
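The sample variances (0.95 and 1.71) differ by a factor of only 1.8, which is small for two samples of 15, so we treat the data as having EQUAL VARIANCE. This assumption matters only for an unpaired two-sample test; a paired test uses the per-driver differences instead.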
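Comparing two variances by eye is informal. One way to check the equal-variance reading is an F-test with base R's `var.test`; the sketch below re-enters the ratings from the printed table so it runs without the CSV file (`v1`, `v2`, and `f` are illustrative names):

```r
# Ratings copied from the printed tibble above.
v1 <- c(8, 8, 8, 9, 7, 9, 9, 8, 9, 8, 10, 7, 7, 8, 10)
v2 <- c(8, 7, 9, 9, 10, 9, 8, 10, 6, 7, 7, 7, 6, 8, 9)

# F-test of H0: the two population variances are equal.
# The larger sample variance (Vehicle 2) is placed in the numerator.
f <- var.test(v2, v1)
print(f$statistic)   # ratio of the sample variances
print(f$p.value)     # a p-value above 0.05 gives no evidence of unequal variances
```

The F-test is sensitive to non-normal data, so with ratings on a 1 to 10 scale it is best read as a rough check rather than a strict one.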
Step 6: Compute the t-Test
summary(Drivability_data)
##    Vehicle_1        Vehicle_2
##  Min.   : 7.000   Min.   : 6
##  1st Qu.: 8.000   1st Qu.: 7
##  Median : 8.000   Median : 8
##  Mean   : 8.333   Mean   : 8
##  3rd Qu.: 9.000   3rd Qu.: 9
##  Max.   :10.000   Max.   :10
dim(Drivability_data)
## [1] 15 2
Two Sample t-Test
library(psych)
## Warning: package 'psych' was built under R version 4.2.1
describe(Drivability_data)
##           vars  n mean   sd median trimmed  mad min max range skew kurtosis
## Vehicle_1    1 15 8.33 0.98      8    8.31 1.48   7  10     3 0.22    -1.11
## Vehicle_2    2 15 8.00 1.31      8    8.00 1.48   6  10     4 0.00    -1.37
##             se
## Vehicle_1 0.25
## Vehicle_2 0.34
# Test the observed ratings, not simulated draws from rnorm().
Vehicle_1 <- Drivability_data$Vehicle_1
Vehicle_2 <- Drivability_data$Vehicle_2
# Each driver rated both cars (Step 2), so the test is paired and two-sided (Step 3).
t.test(Vehicle_1, Vehicle_2, paired = TRUE, alternative = "two.sided", conf.level = 0.95)
For the 15 pairs of ratings listed above, the per-driver differences (Vehicle_1 minus Vehicle_2) have a mean of 0.33 and a standard deviation of about 1.59. This gives t ≈ 0.81 on 14 degrees of freedom, a two-sided p-value of about 0.43, and a 95% confidence interval for the mean difference of roughly -0.55 to 1.21.
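As a cross-check on the observed ratings, the paired t statistic can be computed by hand from the per-driver differences and compared with `t.test(..., paired = TRUE)`. This is a sketch that re-enters the ratings from the printed table so it runs without the CSV file; `v1`, `v2`, `d`, `t_manual`, and `p_manual` are illustrative names:

```r
# Ratings copied from the printed tibble above.
v1 <- c(8, 8, 8, 9, 7, 9, 9, 8, 9, 8, 10, 7, 7, 8, 10)
v2 <- c(8, 7, 9, 9, 10, 9, 8, 10, 6, 7, 7, 7, 6, 8, 9)

d <- v1 - v2                  # per-driver differences
n <- length(d)

# Paired t statistic: mean difference divided by its standard error.
t_manual <- mean(d) / (sd(d) / sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom.
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# The built-in paired test should agree with the manual calculation.
res <- t.test(v1, v2, paired = TRUE)
print(c(t_manual = t_manual, p_manual = p_manual))
print(res)
```

Note that `t.test` has no `var.test` argument; the equal-variance switch for unpaired tests is `var.equal`, and unrecognised arguments are silently ignored.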
Interpretation
The p-value is greater than the alpha (0.05), therefore:
We fail to reject the null hypothesis of no difference in drivability rating between the two cars.
The ratings do not show a statistically significant difference in drivability between the two cars (Vehicle 1 mean = 8.33, Vehicle 2 mean = 8.00).
The difference in mean rating is 0.33, about 4% of the Vehicle 2 mean, which is not statistically significant.
Finally, we conclude that there is no evidence that the two vehicles differ in drivability; both were rated about 8 out of 10 on average.
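The 4% figure and the "no significant difference" reading can be reproduced from the ratings. A hedged sketch, re-entering the data from the printed table and assuming the paired design from Step 2 (the names `mean_diff`, `rel_diff`, and `ci` are illustrative):

```r
# Ratings copied from the printed tibble above.
v1 <- c(8, 8, 8, 9, 7, 9, 9, 8, 9, 8, 10, 7, 7, 8, 10)
v2 <- c(8, 7, 9, 9, 10, 9, 8, 10, 6, 7, 7, 7, 6, 8, 9)

# Absolute and relative difference in mean rating (relative to Vehicle 2).
mean_diff <- mean(v1) - mean(v2)
rel_diff <- mean_diff / mean(v2)
print(round(c(mean_diff = mean_diff, rel_diff_pct = 100 * rel_diff), 2))

# 95% confidence interval for the mean per-driver difference.
# An interval that contains 0 is consistent with failing to reject H0.
ci <- t.test(v1, v2, paired = TRUE)$conf.int
print(ci)
```

Failing to reject the null hypothesis does not prove the cars are identical; the width of the interval shows how large a difference the data could still be hiding.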
Visualization
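hist(Drivability_data$Vehicle_1, main = "Vehicle 1 ratings", xlab = "Rating")
hist(Drivability_data$Vehicle_2, main = "Vehicle 2 ratings", xlab = "Rating")
boxplot(Drivability_data$Vehicle_1, Drivability_data$Vehicle_2)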
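# Each point is one driver: Vehicle 1 rating on the x-axis, Vehicle 2 rating on the y-axis.
plot(Drivability_data$Vehicle_1, Drivability_data$Vehicle_2,
pch = 15,
xlab = "Vehicle 1 rating",
ylab = "Vehicle 2 rating")
# Identity line: points on it are drivers who gave both cars the same rating.
abline(0, 1, col = "blue", lwd = 2)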
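# boxplot() draws one box per column of the data frame; the data= argument
# applies only to the formula interface, so it is dropped here.
boxplot(Drivability_data,
names = c("Vehicle 1", "Vehicle 2"),
main = "Drivability Ratings by Drivers",
xlab = "Vehicle",
ylab = "Ratings",
col = "steelblue",
border = "black"
)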
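Since the drivers rated both cars, a plot of the per-driver differences shows the paired structure more directly than side-by-side boxplots. A minimal sketch using base graphics, with the ratings re-entered from the printed table (`v1`, `v2`, and `d` are illustrative names):

```r
# Ratings copied from the printed tibble above.
v1 <- c(8, 8, 8, 9, 7, 9, 9, 8, 9, 8, 10, 7, 7, 8, 10)
v2 <- c(8, 7, 9, 9, 10, 9, 8, 10, 6, 7, 7, 7, 6, 8, 9)
d <- v1 - v2   # positive bars: the driver rated Vehicle 1 higher

# One bar per driver, labelled with the driver ID.
barplot(d,
        names.arg = seq_along(d),
        main = "Rating difference per driver (Vehicle 1 - Vehicle 2)",
        xlab = "Driver",
        ylab = "Difference in rating",
        col = "steelblue")

# Dashed line at the mean difference, the quantity the paired t-test examines.
abline(h = mean(d), lty = 2)
```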