Muhammad Azamuddin Nasution bin Raduan
Loo Si Jie
Yah Tian
Ling
Jashvinpal Singh A/L Gurpal Singh
An insurance company wants to know if the average speed at which men drive cars is greater than that of women drivers. The company took a random sample of 27 cars driven by men on a highway and recorded the following speeds:
## [1] 76 71 71 68 72 72 70 77 76 71 75 73 71 72 74 71 75 72 71 76 75 73 72 72 71
## [26] 72 72
Another sample of 18 cars driven by women on the same highway gave
## [1] 68 70 68 66 67 72 72 70 66 69 65 67 71 68 69 68 63 66
Assume that the speeds at which all men and women drive cars on this highway are both normally distributed with the same population standard deviation. Test at the 5% significance level whether the mean speed of cars driven by all men drivers on this highway is greater than that of cars driven by all women drivers.
We begin by visualising the two samples of speeds with a box plot,
to gain some insight into the sample data.
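The plot itself is not reproduced here. As a minimal sketch, assuming base R graphics were used (the group labels, axis label and title below are illustrative choices, not taken from the original report), side-by-side box plots of the two samples could be drawn as follows:

```r
# Sample data from the problem statement
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

# Side-by-side box plots; boxplot() invisibly returns the summary it draws
bp <- boxplot(list(Male = male, Female = female),
              ylab = "Speed",
              main = "Speeds of cars by driver group")

# Five-number summaries used for each box (one column per group)
bp$stats
```

Comparing the two boxes gives a first visual impression of the difference in location before any formal test is run.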
Running a two-sample t-test on our sample data with the t.test()
function gives the output below:
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)
t.test(male, female, var.equal = TRUE, alternative = 'greater')
##
## Two Sample t-test
##
## data: male and female
## t = 6.627, df = 43, p-value = 2.237e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 3.413768 Inf
## sample estimates:
## mean of x mean of y
## 72.62963 68.05556
Let \(\mu_{male}\) and \(\mu_{female}\) be the true (population) average speeds of male and female drivers respectively.
The Null Hypothesis, \(H_0\), states that the average speeds of male and female drivers are equal. In other words, \[ \tag{1} H_0: \mu_{male} = \mu_{female} \quad \textrm{or} \quad H_0: \mu_{male} - \mu_{female} = 0. \] The Alternative Hypothesis, \(H_{a}\), states that the average speed of male drivers is greater than that of female drivers. In other words, \[ \tag{2} H_{a}: \mu_{male} > \mu_{female} \quad \textrm{or} \quad H_{a}: \mu_{male} - \mu_{female} > 0. \]
For the test statistic, we use the formula \[ \tag{3.1} \frac{(M_{male}-M_{female}) - (\mu_{male} - \mu_{female})}{SE(M_{male}-M_{female})} \] where \[ \tag{3.2} SE(M_{male} - M_{female}) = s_{pooled}\sqrt{\frac{1}{n_{male}} + \frac{1}{n_{female}}} \] and \[ \tag{3.3} s_{pooled} = \sqrt{\frac{(n_{male}-1)s_{male}^2+(n_{female}-1)s_{female}^2}{n_{male} + n_{female} - 2}}. \]
Below is the summary of variables, their description, and their R-equivalent functions.
| Variable | Description | R equivalent |
|---|---|---|
| \(M_{male}\) | Sample mean of male drivers | mean(male) |
| \(M_{female}\) | Sample mean of female drivers | mean(female) |
| \(SE(M_{male}-M_{female})\) | Standard error of sample mean difference | - |
| \(n_{male}\) | Sample size (number of male drivers’ speeds observed) | length(male) |
| \(n_{female}\) | Sample size (number of female drivers’ speeds observed) | length(female) |
| \(s_{pooled}\) | Pooled standard deviation (SD) | - |
| \(s_{male}\) | Sample SD of male drivers | sd(male) |
| \(s_{female}\) | Sample SD of female drivers | sd(female) |
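The two quantities without a single R equivalent in the table, \(s_{pooled}\) and \(SE(M_{male}-M_{female})\), can be built from the others. The following is a minimal sketch of equations (3.2) and (3.3); the names `n_male`, `n_female`, `s_pooled` and `se_diff` are illustrative, and `var(x)` is used in place of `sd(x)^2`, which is the same quantity.

```r
# Sample data from the problem statement
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

n_male <- length(male)
n_female <- length(female)

# Equation (3.3): pooled standard deviation
s_pooled <- sqrt(((n_male - 1) * var(male) + (n_female - 1) * var(female)) /
                   (n_male + n_female - 2))

# Equation (3.2): standard error of the difference in sample means
se_diff <- s_pooled * sqrt(1 / n_male + 1 / n_female)

c(s_pooled = s_pooled, se_diff = se_diff)
```

Dividing `mean(male) - mean(female)` by `se_diff` then gives the test statistic in (3.1) under \(H_0\).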
Assuming \(H_0\) is true, \(\mu_{male} - \mu_{female} = 0\) and the t-statistic follows the distribution in (4.1), where \(t_{df}\) denotes the t-distribution with \(df = n_{male} + n_{female} - 2\) degrees of freedom.
\[ \tag{4.1} t = \frac{(M_{male}-M_{female}) - (\mu_{male} - \mu_{female})}{SE(M_{male}-M_{female})} \sim t_{n_{male} + n_{female} - 2} \]
The t-statistic can be computed in R as follows:
# Two-sample t-statistic with pooled variance
# (not named t, to avoid masking base R's t(), the matrix transpose)
t_stat <- function(x, y) {
  mean_difference <- mean(x) - mean(y)
  denom <- sqrt((1/length(x)) + (1/length(y)))
  s_pooled_num <- (length(x)-1)*sd(x)^2 + (length(y)-1)*sd(y)^2
  s_pooled_denom <- length(x) + length(y) - 2
  s_pooled_result <- sqrt(s_pooled_num/s_pooled_denom)
  standard_error <- denom*s_pooled_result
  result <- mean_difference/standard_error
  print(result)
}
t_stat(male, female)
## [1] 6.626997
which is our t-statistic value.
Now, we can find the p-value. Under \(H_0\), the p-value is the probability of observing a t-statistic at least as large as the one obtained, \[ \tag{4.2} p = P(t_{n_{male} + n_{female} - 2} > t)= P(t_{43} > 6.627 ). \] The R code to obtain the p-value can be written as
p_value <- function(x, y) {
  t_statistic <- x
  deg_of_freedom <- y
  # Upper-tail probability P(t_df > t_statistic)
  result <- pt(t_statistic, df = deg_of_freedom, lower.tail = FALSE)
  print(result)
}
p_value(6.627, 43)
## [1] 2.23712e-08
which is our p-value.
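As a cross-check, the same upper-tail probability can be compared with the p-value stored in the object returned by t.test(). This is a minimal sketch; the names `res`, `t_full` and `p_full` are illustrative. It uses the unrounded t-statistic rather than 6.627, so the result differs slightly in the last digits from the value above.

```r
# Sample data from the problem statement
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

res <- t.test(male, female, var.equal = TRUE, alternative = "greater")

# Unrounded t-statistic and degrees of freedom from the test object
t_full <- unname(res$statistic)
df_full <- unname(res$parameter)

# Upper-tail probability, as in equation (4.2)
p_full <- pt(t_full, df = df_full, lower.tail = FALSE)

# TRUE if the manual p-value agrees with the one reported by t.test()
all.equal(p_full, res$p.value)
```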
The critical t-value, denoted \(t_{\alpha}\), is defined such that \[
\tag{5.1}
P(t > t_{\alpha}) = \alpha
\] or, equivalently, \[
\tag{5.2}
P(t \leq t_{\alpha}) = 1 - \alpha.
\] The critical region is the set of values \(t > t_{\alpha}\). At the 5% significance level, we need the
quantile \(t_{\alpha}\) of the \(t_{43}\) distribution such that \[
\tag{5.3}
P(t > t_{\alpha}) = 0.05.
\] We can write the R code to find the critical t-value using the
qt() function, the quantile function (inverse of the cumulative distribution function) of the t-distribution:
critical_t_value <- function(x, y) {
  alpha <- x
  deg_of_freedom <- y
  # Upper-tail quantile: the value t_alpha with P(t > t_alpha) = alpha
  result <- qt(p = alpha, df = deg_of_freedom, lower.tail = FALSE)
  print(result)
}
critical_t_value(0.05, 43)
## [1] 1.681071
which is our critical t-value.
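The decision rule is to reject \(H_{0}\) when \(t > t_{\alpha}\). Since
\(t = 6.627 > t_{\alpha} = 1.681\), \(H_{0}\) is rejected at the 5% significance level. There is sufficient evidence that the average speed of male drivers is greater than that of female drivers.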
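The same decision can be reached in three equivalent ways: comparing the t-statistic with the critical value, comparing the p-value with \(\alpha\), or checking whether the lower bound of the one-sided 95% confidence interval lies above 0. The sketch below illustrates this; the names `t_obs`, `t_crit`, `se_diff` and `lower` are illustrative, and the standard error is recovered from the t.test() result as the mean difference divided by the t-statistic.

```r
# Sample data from the problem statement
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

alpha <- 0.05
res <- t.test(male, female, var.equal = TRUE, alternative = "greater")

t_obs <- unname(res$statistic)
t_crit <- qt(alpha, df = unname(res$parameter), lower.tail = FALSE)

# Standard error of the mean difference, recovered from t = difference / SE
mean_diff <- mean(male) - mean(female)
se_diff <- mean_diff / t_obs

# Lower bound of the one-sided confidence interval for the mean difference
lower <- mean_diff - t_crit * se_diff

# All three criteria lead to the same decision about H_0
c(t_rule = t_obs > t_crit,
  p_rule = res$p.value < alpha,
  ci_rule = lower > 0)
```

The computed lower bound corresponds to the value 3.413768 shown in the confidence interval of the t.test() output.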
myt.test function
Now, we can write the function myt.test.
myt.test <- function(x, y, a) {
  # State the hypotheses
  print("Null Hypothesis: mean difference is equal to 0")
  print("Alternative Hypothesis: mean difference is greater than 0")
  # Define the variables
  mean_difference <- mean(x) - mean(y)
  denom <- sqrt((1/length(x)) + (1/length(y)))
  s_pooled_num <- (length(x)-1)*sd(x)^2 + (length(y)-1)*sd(y)^2
  s_pooled_denom <- length(x) + length(y) - 2
  s_pooled_result <- sqrt(s_pooled_num/s_pooled_denom)
  standard_error <- denom*s_pooled_result
  deg_of_freedom <- length(x) + length(y) - 2
  # Find the t-statistic
  t_statistic <- mean_difference/standard_error
  print(paste("The t-statistic is", t_statistic))
  # Find the p-value (upper tail)
  p_value <- pt(t_statistic, df = deg_of_freedom, lower.tail = FALSE)
  print(paste("The p-value is", p_value))
  # Find the critical t-value at significance level a
  critical_t_value <- qt(p = a, df = deg_of_freedom, lower.tail = FALSE)
  print(paste("The critical t-value is", critical_t_value))
  # Implement the decision rule
  if (t_statistic > critical_t_value) {
    print(paste(t_statistic, ">", critical_t_value, ", Reject the Null Hypothesis"))
  } else {
    print(paste(t_statistic, "<=", critical_t_value, ", Fail to reject the Null Hypothesis"))
  }
}
Calling the function with the inputs male, female,
and 0.05, that is, myt.test(male, female, 0.05), we have
## [1] "Null Hypothesis: mean difference is equal to 0"
## [1] "Alternative Hypothesis: mean difference is greater than 0"
## [1] "The t-statistic is 6.62699658870628"
## [1] "The p-value is 2.23714586906683e-08"
## [1] "The critical t-value is 1.68107070320252"
## [1] "6.62699658870628 > 1.68107070320252 , Reject the Null Hypothesis"
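Because myt.test only prints its results, they cannot easily be reused in later calculations. As a minimal sketch of one possible variant (the function name `myt_test_values` and the list element names are hypothetical, not part of the original report), the same quantities could be returned in a list:

```r
# Sample data from the problem statement
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

# Hypothetical variant of myt.test that returns values instead of printing them
myt_test_values <- function(x, y, a) {
  deg_of_freedom <- length(x) + length(y) - 2
  # Pooled SD and standard error, equations (3.3) and (3.2)
  s_pooled <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
                     deg_of_freedom)
  standard_error <- s_pooled * sqrt(1 / length(x) + 1 / length(y))
  t_statistic <- (mean(x) - mean(y)) / standard_error
  critical_t_value <- qt(p = a, df = deg_of_freedom, lower.tail = FALSE)
  list(
    t_statistic = t_statistic,
    deg_of_freedom = deg_of_freedom,
    p_value = pt(t_statistic, df = deg_of_freedom, lower.tail = FALSE),
    critical_t_value = critical_t_value,
    reject_null = t_statistic > critical_t_value
  )
}

out <- myt_test_values(male, female, 0.05)
out$reject_null
```

Each element of `out` can then be compared directly with the printed values above or with the components of the t.test() result.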
* 2nd Discussion (10 January 2023) Presenting the final results,
correcting miscalculations, debugging the code.