SIT1002 Group Assignment Presentation

Muhammad Azamuddin Nasution bin Raduan
Loo Si Jie
Yah Tian Ling
Jashvinpal Singh A/L Gurpal Singh

Problem 1

An insurance company wants to know if the average speed at which men drive cars is greater than that of women drivers. The company took a random sample of 27 cars driven by men on a highway and found the following speeds:

##  [1] 76 71 71 68 72 72 70 77 76 71 75 73 71 72 74 71 75 72 71 76 75 73 72 72 71
## [26] 72 72

Another sample of 18 cars driven by women on the same highway gave

##  [1] 68 70 68 66 67 72 72 70 66 69 65 67 71 68 69 68 63 66

Assume that the speeds at which all men and women drive cars on this highway are both normally distributed with the same population standard deviation. Test at the 5% significance level whether the mean speed of cars driven by all men drivers on this highway is greater than that of cars driven by all women drivers.

Solution

We begin by visualising the data through a box plot of the two samples, to get a first impression of how the speeds are distributed.
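The box plot can be drawn with base R graphics. The following is a minimal sketch, not necessarily the plot used in the presentation: the list name `speeds`, the group labels, and the axis and title text are assumptions chosen for illustration, and the data are repeated so that the block runs on its own.

```r
# Sample speeds, copied from the problem statement
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

# A named list gives one box per group, labelled with the list names
speeds <- list(Men = male, Women = female)

# Side-by-side box plots of the two samples (labels are illustrative)
boxplot(speeds, ylab = "Speed", main = "Speeds of cars by driver group")
```

If the box for men sits visibly higher than the box for women, that is informal support for the one-sided alternative tested below.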

Running a two-sample t-test with the t.test() function on our sample data gives the output below:

male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)
t.test(male, female, var.equal = TRUE, alternative = 'greater')
## 
##  Two Sample t-test
## 
## data:  male and female
## t = 6.627, df = 43, p-value = 2.237e-08
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  3.413768      Inf
## sample estimates:
## mean of x mean of y 
##  72.62963  68.05556

Step 1 and 2: Stating The Hypotheses

Let \(\mu_{male}\) and \(\mu_{female}\) be the population mean speeds of male and female drivers respectively.

The Null Hypothesis, \(H_0\), states that the mean speeds of male and female drivers are equal. In other words, \[ \tag{1} H_0: \mu_{male} = \mu_{female} \quad \textrm{or} \quad H_0: \mu_{male} - \mu_{female} = 0. \]

The Alternative Hypothesis, \(H_{a}\), states that the mean speed of male drivers is greater than that of female drivers. In other words, \[ \tag{2} H_{a}: \mu_{male} > \mu_{female} \quad \textrm{or} \quad H_{a}: \mu_{male} - \mu_{female} > 0. \]

Step 3: The Test Statistic

For the test statistic, we use the formula \[ \tag{3.1} \frac{(M_{male}-M_{female}) - (\mu_{male} - \mu_{female})}{SE(M_{male}-M_{female})} \] where \[ \tag{3.2} SE(M_{male} - M_{female}) = s_{pooled}\sqrt{\frac{1}{n_{male}} + \frac{1}{n_{female}}} \] and \[ \tag{3.3} s_{pooled} = \sqrt{\frac{(n_{male}-1)s_{male}^2+(n_{female}-1)s_{female}^2}{n_{male} + n_{female} - 2}}. \]

Below is the summary of variables, their description, and their R-equivalent functions.

| Variable | Description | R equivalent |
|---|---|---|
| \(M_{male}\) | Sample mean of male drivers | mean(male) |
| \(M_{female}\) | Sample mean of female drivers | mean(female) |
| \(SE(M_{male}-M_{female})\) | Standard error of the difference in sample means | - |
| \(n_{male}\) | Sample size of male drivers | length(male) |
| \(n_{female}\) | Sample size of female drivers | length(female) |
| \(s_{pooled}\) | Pooled standard deviation (SD) | - |
| \(s_{male}\) | Sample SD of male drivers | sd(male) |
| \(s_{female}\) | Sample SD of female drivers | sd(female) |
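As a worked illustration, the sample values can be substituted into (3.1) to (3.3), taking \(\mu_{male} - \mu_{female} = 0\) as assumed under \(H_0\). The sample variances \(s_{male}^2 \approx 4.704\) and \(s_{female}^2 \approx 5.820\) are computed from the data above and rounded, so the final digits below are approximate.

```latex
\[
s_{pooled} = \sqrt{\frac{26(4.704) + 17(5.820)}{43}} \approx \sqrt{5.145} \approx 2.268
\]
\[
SE(M_{male} - M_{female}) \approx 2.268\sqrt{\frac{1}{27} + \frac{1}{18}} \approx 0.690
\]
\[
t \approx \frac{72.630 - 68.056}{0.690} \approx 6.63
\]
```

Up to rounding of the intermediate values, this agrees with the statistic \(t = 6.627\) reported by t.test().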

Step 4: Implementing The t-Statistic

Assuming \(H_0\) is true, \(\mu_{male} - \mu_{female} = 0\) and the t-statistic follows a t-distribution with \(df = n_{male} + n_{female} - 2\) degrees of freedom, as stated in (4.1).

\[ \tag{4.1} t = \frac{(M_{male}-M_{female}) - (\mu_{male} - \mu_{female})}{SE(M_{male}-M_{female})} \sim t_{n_{male} + n_{female} - 2} \] We may compute the t-statistic in R as follows:

# Pooled two-sample t-statistic for H0: mu_x - mu_y = 0.
# Named t_stat rather than t to avoid masking base R's transpose function t().
t_stat <- function(x, y) {
  n_x <- length(x)
  n_y <- length(y)
  mean_difference <- mean(x) - mean(y)
  # Pooled standard deviation, equation (3.3)
  s_pooled <- sqrt(((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2))
  # Standard error of the mean difference, equation (3.2)
  standard_error <- s_pooled * sqrt(1 / n_x + 1 / n_y)
  mean_difference / standard_error
}
t_stat(male, female)
## [1] 6.626997

which is our t-statistic value.
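As a sanity check, the hand-computed statistic can be compared with the value stored in the object returned by t.test(). The sketch below repeats the data and the pooled calculation so that it runs on its own; the variable names `builtin`, `t_builtin`, `df_builtin`, and `t_manual` are chosen here for illustration.

```r
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

# Built-in pooled, right-tailed test; the result is a list of class "htest"
builtin <- t.test(male, female, var.equal = TRUE, alternative = "greater")
t_builtin <- unname(builtin$statistic)   # t-statistic
df_builtin <- unname(builtin$parameter)  # degrees of freedom

# Manual calculation following equations (3.1) to (3.3), with H0 difference 0
n1 <- length(male)
n2 <- length(female)
s_pooled <- sqrt(((n1 - 1) * var(male) + (n2 - 1) * var(female)) / (n1 + n2 - 2))
t_manual <- (mean(male) - mean(female)) / (s_pooled * sqrt(1 / n1 + 1 / n2))

# TRUE when the two values agree up to floating-point tolerance
isTRUE(all.equal(t_manual, t_builtin))
```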

Now we can find the p-value. For this right-tailed test, the p-value is given by \[ \tag{4.2} p = P(t_{n_{male} + n_{female} - 2} > t)= P(t_{43} > 6.627 ) \] The R code to obtain the p-value is

# Right-tailed p-value: P(T > t_statistic) for a t-distribution
# with deg_of_freedom degrees of freedom
p_value <- function(t_statistic, deg_of_freedom) {
  pt(t_statistic, df = deg_of_freedom, lower.tail = FALSE)
}
p_value(6.627, 43)
## [1] 2.23712e-08

which is our p-value.
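The p-value here is computed from the rounded statistic 6.627, which is why it differs in the last digits from the 2.237e-08 printed by t.test(). A short sketch of the comparison, assuming the same data as above (the names `p_rounded` and `p_full` are illustrative):

```r
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

# p-value from the statistic rounded to three decimal places
p_rounded <- pt(6.627, df = 43, lower.tail = FALSE)

# p-value computed by t.test() from the unrounded statistic
p_full <- t.test(male, female, var.equal = TRUE, alternative = "greater")$p.value

# The rounded statistic is slightly larger, so its tail probability is slightly smaller
p_rounded < p_full
```

The difference is far too small to affect the conclusion, but passing the unrounded statistic avoids the discrepancy.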

Step 5: Finding The Critical t-Value

The critical t-value, denoted \(t_{\alpha}\), is defined such that \[ \tag{5.1} P(t > t_{\alpha}) = \alpha \] or, equivalently, \[ \tag{5.2} P(t \leq t_{\alpha}) = 1 - \alpha. \] The values \(t > t_{\alpha}\) form the critical (rejection) region. At the 5% significance level, we need the value \(t_{\alpha}\) for which \[ \tag{5.3} P(t > t_{\alpha}) = 0.05, \] that is, the 0.95 quantile of the t-distribution with 43 degrees of freedom. We can write the R code to find the critical t-value using the qt() function

# Right-tail critical value: the t for which P(T > t) = alpha
critical_t_value <- function(alpha, deg_of_freedom) {
  qt(p = alpha, df = deg_of_freedom, lower.tail = FALSE)
}
critical_t_value(0.05, 43)
## [1] 1.681071

which is our critical t-value.
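The same critical value also accounts for the lower limit 3.413768 of the one-sided 95 percent confidence interval in the t.test() output: it is the observed mean difference minus \(t_{\alpha}\) times the standard error. A minimal sketch, with variable names chosen for illustration:

```r
male <- c(76, 71, 71, 68, 72, 72, 70, 77, 76, 71, 75, 73, 71, 72, 74, 71, 75, 72,
          71, 76, 75, 73, 72, 72, 71, 72, 72)
female <- c(68, 70, 68, 66, 67, 72, 72, 70, 66, 69, 65, 67, 71, 68, 69, 68, 63, 66)

n1 <- length(male)
n2 <- length(female)
deg_of_freedom <- n1 + n2 - 2

# Pooled SD and standard error, equations (3.3) and (3.2)
s_pooled <- sqrt(((n1 - 1) * var(male) + (n2 - 1) * var(female)) / deg_of_freedom)
standard_error <- s_pooled * sqrt(1 / n1 + 1 / n2)

# Critical value for alpha = 0.05, right tail
t_crit <- qt(0.05, df = deg_of_freedom, lower.tail = FALSE)

# Lower limit of the one-sided 95% confidence interval for the mean difference
lower_bound <- (mean(male) - mean(female)) - t_crit * standard_error

# Lower limit reported by the built-in test, for comparison
builtin_lower <- t.test(male, female, var.equal = TRUE,
                        alternative = "greater")$conf.int[1]
c(manual = lower_bound, builtin = builtin_lower)
```

Because this lower limit is above 0, the interval excludes a mean difference of 0, which is consistent with rejecting \(H_0\).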

Step 6: The Decision Rule

The decision rule states that

  1. If \(t > t_{\alpha}\), that is, if \(t\) falls in the critical region, then \(H_{0}\) is rejected.
  2. If \(t \leq t_{\alpha}\), that is, if \(t\) falls outside the critical region, then \(H_{0}\) is not rejected.

Since \(t = 6.627 > t_{\alpha} = 1.681\), \(H_{0}\) is rejected. At the 5% significance level, there is sufficient evidence that the mean speed of male drivers is greater than that of female drivers.
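An equivalent way to apply the rule is to compare the p-value with the significance level: for a right-tailed test, \(t > t_{\alpha}\) holds exactly when \(p < \alpha\). A small sketch using the rounded statistic from Step 4 (the names `reject_by_t` and `reject_by_p` are illustrative):

```r
alpha <- 0.05
t_statistic <- 6.627
deg_of_freedom <- 43

# Critical-value form of the rule
t_crit <- qt(alpha, df = deg_of_freedom, lower.tail = FALSE)
reject_by_t <- t_statistic > t_crit

# p-value form of the rule
p_value <- pt(t_statistic, df = deg_of_freedom, lower.tail = FALSE)
reject_by_p <- p_value < alpha

# Both forms lead to the same decision
c(reject_by_t = reject_by_t, reject_by_p = reject_by_p)
```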

myt.test function

Now we can combine the steps above into a single function, myt.test.

myt.test <- function(x,y,a) {
  
  #State the Hypotheses
  print("Null Hypothesis: mean difference is equal to 0")
  print("Alternative Hypothesis: mean difference is greater than 0")
  
  #Define the variables
  mean_difference <- mean(x)-mean(y)
  denom <- sqrt((1/length(x)) + (1/length(y)))
  s_pooled_num <- (length(x)-1)*sd(x)^2 + (length(y)-1)*sd(y)^2
  s_pooled_denom <- length(x) + length(y) - 2
  s_pooled_result <- sqrt(s_pooled_num/s_pooled_denom)
  standard_error <- denom*s_pooled_result
  deg_of_freedom <- length(x) + length(y) - 2
  
  #Find the t-statistic
  t_statistic <- mean_difference/standard_error
  print(paste("The t-statistic is", t_statistic))
  
  #Find the p-value
  p_value <- pt(t_statistic, df = deg_of_freedom, lower.tail = FALSE)
  print(paste("The p-value is",p_value))
  
  #Find the critical t-value
  critical_t_value <- qt(p = a, df= deg_of_freedom, lower.tail=FALSE)
  print(paste("The critical t-value is",critical_t_value))
  
  #Implement the decision rule
  if (t_statistic > critical_t_value) {
    print(paste(t_statistic, ">", critical_t_value, ", Reject the Null Hypothesis"))
  } else {
    print(paste(t_statistic, "<=", critical_t_value, ", Fail to reject the Null Hypothesis"))
  }
  
}

Calling the function with male, female, and 0.05 as inputs gives

myt.test(male,female,0.05)
## [1] "Null Hypothesis: mean difference is equal to 0"
## [1] "Alternative Hypothesis: mean difference is greater than 0"
## [1] "The t-statistic is 6.62699658870628"
## [1] "The p-value is 2.23714586906683e-08"
## [1] "The critical t-value is 1.68107070320252"
## [1] "6.62699658870628 > 1.68107070320252 , Reject the Null Hypothesis"

Proof of discussion

* 2nd Discussion (10 January 2023) Presenting the final results, correcting miscalculations, debugging the code.