# Define some z/t-test statistic functions
## z-test (mean, known sigma)
z_test <- function(x, mu, sigma, n) {
  z <- (x - mu) / (sigma / sqrt(n))
  return(z)
}
## z-test (one proportion)
z_test2 <- function(x, mu, n) {
  z <- (x - mu) / sqrt(mu * (1 - mu) / n)
  return(z)
}
## t-test (mean, sample standard deviation)
t_test <- function(x, mu, s, n) {
  t <- (x - mu) / (s / sqrt(n))
  return(t)
}
## standard error (difference of two means)
standard_error <- function(sigma1, sigma2, n1, n2) {
  se <- sqrt((sigma1^2 / n1) + (sigma2^2 / n2))
  return(se)
}
## standard error (difference of two proportions, pooled)
standard_error_pro <- function(x1, n1, x2, n2) {
  # sample proportions
  p1 <- x1 / n1
  p2 <- x2 / n2
  # pooled sample proportion
  p <- (x1 + x2) / (n1 + n2)
  # pooled standard error
  se <- sqrt(p * (1 - p) * (1/n1 + 1/n2))
  return(se)
}
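These helpers return only the test statistic; the worked examples below compare that statistic with a critical value. If a p-value is preferred as the decision rule, it can be obtained from the matching distribution function. A minimal sketch with purely illustrative numbers (not tied to any problem below):
# hypothetical statistics, for illustration only
z_stat <- 2.0
2 * pnorm(abs(z_stat), lower.tail = FALSE)   # two-sided z-test p-value
t_stat <- -1.5
pt(t_stat, df = 10)                          # left-tailed t-test p-value with df = 10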
Using traditional methods, it takes 109 hours to receive a basic driving license. A new license training method using Computer Aided Instruction (CAI) has been proposed. A researcher used the technique with 190 students and observed that they had a mean of 110 hours. Assume the standard deviation is known to be 6. A level of significance of 0.05 will be used to determine if the technique performs differently than the traditional method. Make a decision to reject or fail to reject the null hypothesis.
\[ \begin{align*} \text{Null:} \quad H_0: \mu = 109 \\ \text{Alternative:} \quad H_a: \mu \neq 109 \end{align*} \]
# set up parameters
mu <- 109
xbar <- 110
sample_size <- 190
sample_std <- 6
alpha <- 0.05
# critical value (two-tail test)
cvalue <- qnorm(p = alpha / 2, sd = 1, lower.tail = FALSE) |>
  round(3)
# z-test statistic
zscore <- z_test(x = xbar, mu = mu, sigma = sample_std, n = sample_size) |>
  round(3)
# compare with the critical value (TRUE means the z-score exceeds the upper critical value, i.e. falls in the rejection region)
zscore > cvalue
## [1] TRUE
print(paste("critical value is", cvalue))
## [1] "critical value is 1.96"
print(paste("z score is", zscore))
## [1] "z score is 2.297"
# check manually
z <- (110 - 109) / (6 / sqrt(190))
z
## [1] 2.297341
Conclusion: We reject the null hypothesis at the 0.05 significance level because the z-score of 2.297 exceeds the critical value of 1.96. We are 95% confident that the new CAI system makes a difference in the average amount of time needed to obtain a driving licence.
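As a quick cross-check, the two-sided p-value of this z-score leads to the same decision: it is about 0.022, below the 0.05 significance level.
# two-sided p-value for the z-score computed above
2 * pnorm(zscore, lower.tail = FALSE)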
Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 5.3 parts/million (ppm). A researcher believes that the current ozone level is at an insufficient level. The mean of 5 samples is 5.0 ppm with a standard deviation of 1.1. Does the data support the claim at the 0.05 level? Assume the population distribution is approximately normal.
\[ \begin{align} \text{Null:} \quad H_0: \mu = 5.3 \\ \text{Alternative:} \quad H_a: \mu < 5.3 \end{align} \]
# parameter set up
mu <- 5.3
xbar <- 5.0
sample_size <- 5
sample_std <- 1.1
alpha <- 0.05
df <- sample_size - 1
# critical value (left-tail test)
cvalue <- qt(p = alpha, df = df) |>
  round(3)
# t-test statistic
tscore <- t_test(x = xbar, mu = mu, s = sample_std, n = sample_size) |>
  round(3)
# compare with the critical value (TRUE means the t-score is above it, i.e. not in the lower-tail rejection region)
tscore > cvalue
## [1] TRUE
print(paste("critical value is", cvalue))
## [1] "critical value is -2.132"
print(paste("t score is", tscore))
## [1] "t score is -0.61"
# plot the standard normal density on [-4, 4] (used here as a stand-in for the t density with df = 4)
curve(dnorm(x),
      xlim = c(-4, 4),
      main = 'Left-Tail Test at Significance Level 0.05',
      yaxs = 'i',
      xlab = 't-statistic',
      ylab = '',
      lwd = 2,
      axes = FALSE)
# add the x-axis, marking the critical value (-2.132) and the t-score (-0.61)
axis(1,
     at = c(-4, 0, -2.132, -0.61, 4),
     padj = 0.5,
     labels = c('', 0, expression(t == -2.132), expression(t == -0.61), ''),
     las = 2)
# shade the rejection region in the lower tail
polygon(x = c(-4, seq(-4, -2.132, 0.01), -2.132),
        y = c(0, dnorm(seq(-4, -2.132, 0.01)), 0),
        col = 'salmon')
# add a vertical line at the t-score and outline the area to its left
abline(v = -0.61, lty = 2, col = 'black')
polygon(x = c(-4, seq(-4, -0.61, 0.01), -0.61),
        y = c(0, dnorm(seq(-4, -0.61, 0.01)), 0))
Conclusion: We fail to reject the null hypothesis at the 0.05 significance level because the t-score of -0.61 does not fall below the critical value of -2.132.
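The left-tailed p-value of the t-score tells the same story; it is well above 0.05, so the data do not support the claim that the ozone level is below 5.3 ppm.
# left-tailed p-value for the t-score computed above (df = 4)
pt(tscore, df = df)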
\[ \begin{align} \text{Null:} \quad H_0: \mu = 5.3 \\ \text{Alternative:} \quad H_a: \mu \neq 5.3 \end{align} \]
# parameter set up
mu <- 5.3
xbar <- 5.0
sample_size <- 5
sample_std <- 1.1
alpha <- 0.05
df <- sample_size - 1
# critical value (two-tail test, lower critical value)
cvalue <- qt(p = alpha / 2, df = df) |>
  round(3)
# t-test statistic
tscore <- t_test(x = xbar, mu = mu, s = sample_std, n = sample_size) |>
  round(3)
# compare with the lower critical value (TRUE means the t-score is above it, i.e. not in the lower-tail rejection region)
tscore > cvalue
## [1] TRUE
print(paste("critical value is", cvalue))
## [1] "critical value is -2.776"
print(paste("t score is", tscore))
## [1] "t score is -0.61"
# plot the standard normal density on [-4, 4] (used here as a stand-in for the t density with df = 4)
curve(dnorm(x),
      xlim = c(-4, 4),
      main = 'Two-Tailed Test at Significance Level 0.05',
      yaxs = 'i',
      xlab = 't-statistic',
      ylab = '',
      lwd = 2,
      axes = FALSE)
# add the x-axis, marking both critical values and the t-score
axis(1,
     at = c(-4, -2.776, -0.61, 0, 2.776, 4),
     padj = 0.5,
     labels = c('', expression(t == -2.776), expression(t == -0.61), 0, expression(t == 2.776), ''),
     las = 2)
# shade the rejection regions in both tails
polygon(x = c(-4, seq(-4, -2.776, 0.01), -2.776),
        y = c(0, dnorm(seq(-4, -2.776, 0.01)), 0),
        col = 'salmon')
polygon(x = c(2.776, seq(2.776, 4, 0.01), 4),
        y = c(0, dnorm(seq(2.776, 4, 0.01)), 0),
        col = 'salmon')
# add a vertical line at the t-score
abline(v = -0.61, lty = 2, col = 'black')
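With the two-sided alternative the decision is unchanged: the t-score of -0.61 lies between the two critical values, so we fail to reject the null hypothesis at the 0.05 level. The symmetric rejection boundaries can also be built directly with qt(), as a quick check:
# two-sided rejection boundaries at alpha = 0.05 with df = 4 (approximately -2.776 and 2.776)
c(qt(alpha / 2, df = df), qt(1 - alpha / 2, df = df))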
Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 7.3 parts/million (ppm). A researcher believes that the current ozone level is not at a normal level. The mean of 51 samples is 7.1 ppm with a variance of 0.49. Assume the population is normally distributed. A level of significance of 0.01 will be used. Show all work and hypothesis testing steps.
\[ \begin{align} \text{Null:} \quad H_0: \mu = 7.3 \\ \text{Alternative:} \quad H_a: \mu \neq 7.3 \end{align} \]
# parameter set up
mu <- 7.3
xbar <- 7.1
sample_size <- 51
sample_std <- sqrt(0.49)
alpha <- 0.01
df <- sample_size - 1
# critical value (two-tail test, lower critical value)
cvalue <- qt(alpha / 2, df = df) |>
  round(3)
# t-test statistic
tscore <- t_test(x = xbar, mu = mu, s = sample_std, n = sample_size) |>
  round(3)
# compare with the lower critical value (TRUE means the t-score is above it, i.e. not in the lower-tail rejection region)
tscore > cvalue
## [1] TRUE
print(paste("critical value is", cvalue))
## [1] "critical value is -2.678"
print(paste("t score is", tscore))
## [1] "t score is -2.04"
# plot the standard normal density on [-4, 4] (a close approximation to the t density with df = 50)
curve(dnorm(x),
      xlim = c(-4, 4),
      main = 'Two-Tailed Test at Significance Level 0.01 with t-score of -2.04',
      yaxs = 'i',
      xlab = 't-statistic',
      ylab = '',
      lwd = 2,
      axes = FALSE)
# add the x-axis with rotated labels, marking both critical values and the t-score
axis(1,
     at = c(-4, -2.678, -2.04, 0, 2.678, 4),
     padj = 0.5,
     labels = c('', expression(t == -2.678), -2.04, 0, expression(t == 2.678), ''),
     las = 2)
# shade the rejection regions in both tails
polygon(x = c(-4, seq(-4, -2.678, 0.01), -2.678),
        y = c(0, dnorm(seq(-4, -2.678, 0.01)), 0),
        col = 'salmon')
polygon(x = c(2.678, seq(2.678, 4, 0.01), 4),
        y = c(0, dnorm(seq(2.678, 4, 0.01)), 0),
        col = 'salmon')
# add a vertical line at the t-score of -2.04
abline(v = -2.04, lty = 2, col = 'black')
Conclusion: We fail to reject the null hypothesis at the 0.01 significance level (99% confidence) because the t-score of -2.04 does not fall below the critical value of -2.678.
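A p-value cross-check agrees: the two-sided p-value for t = -2.04 with 50 degrees of freedom falls between 0.01 and 0.05, so it is not small enough to reject at the 0.01 level.
# two-sided p-value for the t-score computed above (df = 50)
2 * pt(tscore, df = df)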
A publisher reports that 36% of their readers own a laptop. A marketing executive wants to test the claim that the percentage is actually less than the reported percentage. A random sample of 100 found that 29% of the readers owned a laptop. Is there sufficient evidence at the 0.02 level to support the executive’s claim?
\[ \begin{align} \text{Null:} \quad H_0: p = 0.36 \\ \text{Alternative:} \quad H_a: p < 0.36 \end{align} \]
# parameter set up
mu <- 0.36
xbar <- 0.29
sample_size <- 100
alpha <- 0.02
# critical value (left-tail test)
cvalue <- qnorm(p = alpha, sd = 1, lower.tail = TRUE) |>
  round(3)
# z-test statistic (one proportion)
zscore <- z_test2(x = xbar, mu = mu, n = sample_size) |>
  round(3)
# compare with the critical value (TRUE means the z-score is above it, i.e. not in the rejection region)
zscore > cvalue
## [1] TRUE
print(paste("critical value is", cvalue))
## [1] "critical value is -2.054"
print(paste("z score is", zscore))
## [1] "z score is -1.458"
Conclusion: We fail to reject the null hypothesis at the 0.02 significance level (98% confidence). The z-score of -1.458 lies outside the rejection region, which is everything below the critical value of -2.054.
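The same one-proportion test can be run with R's built-in prop.test(); with correct = FALSE its chi-squared statistic equals the square of the z-score above, so it serves as a convenient cross-check.
# left-tailed one-sample proportion test without continuity correction
prop.test(x = 29, n = 100, p = 0.36, alternative = "less", correct = FALSE)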
A hospital director is told that 31% of the treated patients are uninsured. The director wants to test the claim that the percentage of uninsured patients is less than the expected percentage. A sample of 380 patients found that 95 were uninsured. Make the decision to reject or fail to reject the null hypothesis at the 0.05 level.
\[ \begin{align} \text{Null:} \quad H_0: p = 0.31 \\ \text{Alternative:} \quad H_a: p < 0.31 \end{align} \]
# parameter set up
mu <- 0.31
xbar <- 95 / 380
sample_size <- 380
alpha <- 0.05
# critical value (left-tail test)
cvalue <- qnorm(p = alpha, sd = 1, lower.tail = TRUE) |>
  round(3)
# z-test statistic (one proportion)
zscore <- z_test2(x = xbar, mu = mu, n = sample_size) |>
  round(3)
# compare with the critical value (FALSE means the z-score falls below it, i.e. in the rejection region)
zscore > cvalue
## [1] FALSE
print(paste("critical value is", cvalue))
## [1] "critical value is -1.645"
print(paste("z score is", zscore))
## [1] "z score is -2.529"
Conclusion: We reject the null hypothesis at the 0.05 significance level because the z-score of -2.529 falls in the rejection region, the area below the critical value of -1.645.
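Equivalently, the left-tailed p-value of the z-score is well below 0.05, which leads to the same rejection.
# left-tailed p-value for the z-score computed above
pnorm(zscore)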
A standardized test is given to a sixth-grade class. Historically the mean score has been 112 with a standard deviation of 24. The superintendent believes that the standard deviation of performance may have recently decreased. She randomly sampled 22 students and found a mean of 102 with a standard deviation of 15.4387. Is there evidence that the standard deviation has decreased at the 𝛼 = 0.1 level?
\[ \begin{align*} \text{Null:} \quad H_0: \sigma = 24\\ \text{Alternative:} \quad H_a: \sigma < 24 \end{align*} \]
# set up parameters
mu <- 112
population_std <- 24
xbar <- 102
sample_size <- 22
sample_std <- 15.4387
alpha <- 0.1
df <- sample_size - 1
# chi-square test statistic
xscore <- round(((sample_size - 1) * sample_std^2) / population_std^2, 3)
# critical value (left-tail chi-square test)
cvalue <- round(qchisq(p = alpha, df = df), 3)
print(paste("critical value is", cvalue))
## [1] "critical value is 13.24"
print(paste("x score is", xscore))
## [1] "x score is 8.69"
# plot the chi-square density with df = 21
curve(dchisq(x, df = 21), from = 0, to = 40,
      main = 'Chi-Square Distribution (df = 21)',
      ylab = 'Density',
      lwd = 2,
      col = 'steelblue')
Conclusion: Because the chi-square statistic of 8.69 falls below the critical value of 13.24 (the lower-tail rejection region), we reject the null hypothesis at the 0.1 significance level; there is evidence that the standard deviation has decreased.
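A p-value check gives the same answer: the lower-tail probability of the chi-square statistic is well below 0.1.
# left-tailed p-value for the chi-square statistic computed above (df = 21)
pchisq(xscore, df = df)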
A medical researcher wants to compare the pulse rates of smokers and non-smokers. He believes that the pulse rate for smokers and non-smokers is different and wants to test this claim at the 0.1 level of significance. The researcher checks 32 smokers and finds that they have a mean pulse rate of 87, and 31 non-smokers have a mean pulse rate of 84. The standard deviation of the pulse rates is found to be 9 for smokers and 10 for non-smokers. Let 𝜇1 be the true mean pulse rate for smokers and 𝜇2 be the true mean pulse rate for non-smokers.
\[ \begin{align*} \text{Null:} \quad H_0: \mu_1 = \mu_2\\ \text{Alternative:} \quad H_a: \mu_1 \neq \mu_2\\ \end{align*} \]
# parameters set up
alpha <- 0.1
## sample 1 (smokers)
n1 <- 32
xbar1 <- 87
sample_std1 <- 9
var1 <- sample_std1^2
## sample 2 (non-smokers)
n2 <- 31
xbar2 <- 84
sample_std2 <- 10
var2 <- sample_std2^2
# Welch-Satterthwaite degrees of freedom
num_df <- ((var1 / n1) + (var2 / n2))^2
den_df <- ((var1 / n1)^2 / (n1 - 1)) + ((var2 / n2)^2 / (n2 - 1))
df <- num_df / den_df
# critical value (two-tail test, lower critical value)
cvalue <- round(qt(p = alpha / 2, df = df), 3)
# standard error
se <- standard_error(sigma1 = sample_std1, sigma2 = sample_std2, n1 = n1, n2 = n2)
# t-score
tscore <- round(((xbar1 - xbar2) - 0) / se, 3)
# reject if the t-score exceeds the critical value in absolute value
abs(tscore) > abs(cvalue)
## [1] FALSE
print(paste("critical value is", cvalue))
## [1] "critical value is -1.671"
print(paste("t score is", tscore))
## [1] "t score is 1.25"
Conclusion: We fail to reject the null hypothesis at the 0.1 significance level (90% confidence) because |t| = 1.25 does not exceed the critical value of 1.671.
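As a cross-check, the two-sided p-value for t = 1.25 with the Welch degrees of freedom is well above 0.1, consistent with failing to reject.
# two-sided p-value for the Welch t-score computed above
2 * pt(abs(tscore), df = df, lower.tail = FALSE)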
Given two independent random samples with the following results:
\[ n_1 = 11, \quad \bar{x}_1 = 127, \quad s_1 = 33; \qquad n_2 = 18, \quad \bar{x}_2 = 157, \quad s_2 = 27 \]
Use this data to find the 95% confidence interval for the true difference between the population means. Assume that the population variances are not equal and that the two populations are normally distributed.
\[ \begin{align*} \text{Null:} \quad H_0: \mu_1 = \mu_2 \\ \text{Alternative:} \quad H_a: \mu_1 \neq \mu_2 \end{align*} \]
# parameters set up
alpha <- 0.05
## sample 1
n1 <- 11
xbar1 <- 127
sample_std1 <- 33
## sample 2
n2 <- 18
xbar2 <- 157
sample_std2 <- 27
# point estimate of the difference and its standard error
pediff <- xbar1 - xbar2
se <- standard_error(sigma1 = sample_std1, sigma2 = sample_std2, n1 = n1, n2 = n2)
tscore <- (pediff - 0) / se
# variance
var1 <- sample_std1^2
var2 <- sample_std2^2
# Welch-Satterthwaite degrees of freedom
num_df <- ((var1 / n1) + (var2 / n2))^2
den_df <- ((var1 / n1)^2 / (n1 - 1)) + ((var2 / n2)^2 / (n2 - 1))
df <- num_df / den_df
# critical value (two-tail test, lower critical value)
cvalue <- qt(p = alpha / 2, df = df)
cvalue <- round(cvalue, 3)
tscore <- round(tscore, 3)
print(paste("critical value is", cvalue))
## [1] "critical value is -2.1"
print(paste("t score is", tscore))
## [1] "t score is -2.54"
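The question also asks for the 95% confidence interval for the difference in means itself. A minimal sketch, reusing the point estimate, standard error, and Welch degrees of freedom computed above; an interval that excludes zero leads to the same decision as the test.
# 95% confidence interval for mu1 - mu2
margin_error <- qt(p = 1 - alpha / 2, df = df) * se
c(round(pediff - margin_error, 3), round(pediff + margin_error, 3))
# the interval lies entirely below zero, consistent with rejecting H0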
Conclusion: We reject the null hypothesis at the 0.05 significance level because |t| = 2.54 exceeds the critical value of 2.1; the t-score falls in the lower-tail rejection region.
Two men, A and B, who usually commute to work together decide to conduct an experiment to see whether one route is faster than the other. The men feel that their driving habits are approximately the same, so each morning for two weeks one driver is assigned to route I and the other to route II. The times, recorded to the nearest minute, are entered as the vectors i (route I) and ii (route II) in the code below.
Using this data, find the 98% confidence interval for the true mean difference between the average travel time for route I and the average travel time for route II. Let d = (route I travel time) − (route II travel time). Assume that the populations of travel times are normally distributed for both routes.
\[ \begin{align*} \text{Null:} \quad H_0: \mu_1 - \mu_2 = 0\\ \text{Alternative:} \quad H_a: \mu_1 - \mu_2 \neq 0 \end{align*} \]
# parameter set up
alpha <- 0.02
i <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)   # route I travel times
ii <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)  # route II travel times
## sample means
xbar1 <- mean(i)
xbar2 <- mean(ii)
## sample sizes
n1 <- length(i)
n2 <- length(ii)
## sample standard deviations
sample_std1 <- sd(i)
sample_std2 <- sd(ii)
## variances
var1 <- sample_std1^2
var2 <- sample_std2^2
# Welch-Satterthwaite degrees of freedom
num_df <- ((var1 / n1) + (var2 / n2))^2
den_df <- ((var1 / n1)^2 / (n1 - 1)) + ((var2 / n2)^2 / (n2 - 1))
df <- num_df / den_df
# point estimate of the difference and its standard error
pediff <- xbar1 - xbar2
se <- standard_error(sigma1 = sample_std1, sigma2 = sample_std2, n1 = n1, n2 = n2)
# margin of error
margin_error <- qt(p = 1 - alpha / 2, df = df) * se
# confidence interval
lower <- round(pediff - margin_error, 3)
upper <- round(pediff + margin_error, 3)
print(paste("lower bound is", lower))
## [1] "lower bound is -4.213"
print(paste("upper bound is", upper))
## [1] "upper bound is 4.413"
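Note that the problem statement frames these as paired data (d = route I time − route II time), since both routes are driven on the same mornings. For comparison with the two-independent-samples interval above, a minimal sketch of the paired-differences interval using R's built-in t.test() and the vectors i and ii defined above:
# 98% confidence interval for the mean paired difference
t.test(i, ii, paired = TRUE, conf.level = 0.98)$conf.int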
The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons have registered to vote. Can we conclude that the percentage of employed workers (𝑝1) who have registered to vote exceeds the percentage of unemployed workers (𝑝2) who have registered to vote? Use a significance level of 𝛼 = 0.05 for the test.
\[ \begin{align*} \text{Null:} \quad H_0: p_1 \leq p_2 \\ \text{Alternative:} \quad H_a: p_1 > p_2 \end{align*} \]
# parameters set up
alpha <- 0.05
## employed group
n1 <- 391
x1 <- 195
p1 <- x1 / n1
## unemployed group
n2 <- 510
x2 <- 193
p2 <- x2 / n2
# pooled standard error (using the standard_error_pro() helper defined above)
se <- standard_error_pro(x1 = x1, n1 = n1, x2 = x2, n2 = n2)
# z-value
zscore <- round((p1 - p2) / se, 3)
# critical value (right-tail test)
cvalue <- round(qnorm(1 - alpha), 3)
print(paste("zscore is", zscore))
## [1] "zscore is 3.614"
print(paste("critical value is", cvalue))
## [1] "critical value is 1.645"
# plot the standard normal density on the domain [-4, 4]
curve(dnorm(x),
      xlim = c(-4, 4),
      main = 'Right-Tailed Test at Significance Level 0.05',
      yaxs = 'i',
      xlab = 'z-statistic',
      ylab = '',
      lwd = 2,
      axes = FALSE)
# add the x-axis, marking the critical value and the z-score
axis(1,
     at = c(-4, 0, 1.645, 3.614, 4),
     padj = 0.5,
     labels = c('', 0, expression(z == 1.645), expression(z == 3.614), ''),
     las = 2)
# shade the rejection region in the upper tail
polygon(x = c(1.645, seq(1.645, 4, 0.01), 4),
        y = c(0, dnorm(seq(1.645, 4, 0.01)), 0),
        col = 'salmon')
# add a vertical line at the z-score
abline(v = 3.614, lty = 2, col = 'black')
Conclusion: We reject the null hypothesis at the 0.05 significance level because the z-score of 3.614 exceeds the critical value of 1.645; the data support the claim that the registration rate among employed workers exceeds that among unemployed workers.
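As a cross-check, R's built-in prop.test() runs the same two-sample proportion test; with correct = FALSE its chi-squared statistic equals the square of the pooled z-score above.
# right-tailed two-sample proportion test without continuity correction
prop.test(x = c(195, 193), n = c(391, 510), alternative = "greater", correct = FALSE)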