ADEC7310 Homework 5

Question 1 (In class Lecture notes)

Using traditional methods, it takes 109 hours to receive a basic driving license. A new license training method using Computer Aided Instruction (CAI) has been proposed. A researcher used the technique with 190 students and observed that they had a mean of 110 hours. Assume the standard deviation is known to be 6. A level of significance of 0.05 will be used to determine if the technique performs differently than the traditional method. Make a decision to reject or fail to reject the null hypothesis. Show all work in R.

Given: μ = 109, n = 190, x̄ = 110, σ = 6, α = 0.05

To Do: Determine if the technique performs differently than the traditional method. The burden of proof falls on the alternative hypothesis.
Thoughts:
I got these steps essentially from https://www.geeksforgeeks.org/r-language/hypothesis-testing-in-r-programming/, including the IF…ELSE decision code.

1. State the hypotheses
   a. Null hypothesis: mu = 109
   b. Alternative hypothesis: mu != 109
   c. Two-tailed, because we are looking for any difference, whether higher or lower
2. Test statistic
   a. Since sigma is known, we can use the z-distribution
3. Determine critical values (rejection and fail-to-reject regions)
   a. Split alpha = 0.05 between both tails (0.025 each)
4. Calculate the p-value
   a. Assumes the null hypothesis is TRUE
   b. Probability of observing a sample AT LEAST AS extreme as ours
5. Decide
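The five steps above can be bundled into one small function. This is a minimal sketch that works from summary statistics only; `z_test_mean()` is a hypothetical helper name I made up, not part of base R. The two-sided p-value is written as `2 * pnorm(-abs(z))`, which equals `2 * (1 - pnorm(abs(z)))` but avoids subtracting from 1.

```r
# Hypothetical helper: one-sample z-test from summary statistics (sigma known)
z_test_mean <- function(x_bar, mu0, sigma, n, alpha = 0.05,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  se <- sigma / sqrt(n)                  # Step 2: standard error
  z <- (x_bar - mu0) / se                # Step 2: test statistic
  p <- switch(alternative,               # Step 4: p-value assuming H0 is true
              two.sided = 2 * pnorm(-abs(z)),
              less      = pnorm(z),
              greater   = pnorm(z, lower.tail = FALSE))
  decision <- if (p < alpha) "Reject H0" else "Fail to reject H0"  # Step 5
  list(z = z, p_value = p, decision = decision)
}

# Question 1 values
res_q1 <- z_test_mean(x_bar = 110, mu0 = 109, sigma = 6, n = 190)
res_q1$z         # about 2.297
res_q1$p_value   # about 0.0216
res_q1$decision  # "Reject H0"
```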

### Question 1 

#From Professor Sharma
rm(list = ls()) # Clear environment
gc()            # Clear unused memory

##           used (Mb) gc trigger (Mb) max used (Mb)
## Ncells  618277 33.1    1408984 75.3   702071 37.5
## Vcells 1154454  8.9    8388608 64.0  1928053 14.8

cat("\f")       # Clear the console

#dev.off()      # Clear par(mfrow=c(3,2)) type commands 

# What I know
mu_h0 <- 109        # Traditional method mean (null hypothesis value)
n <- 190           # Sample size
x_bar <- 110       # Sample mean from CAI method
sigma <- 6         # Population standard deviation (known)
alpha <- 0.05      # Significance level

# Calculate the standard error
std_error <- sigma / sqrt(n)
cat("Standard Error =", std_error, "\n")

## Standard Error = 0.4352858

# Calculate the Z test statistic
Z_calc <- (x_bar - mu_h0) / std_error
cat("Calculated Z-statistic =", Z_calc, "\n")

## Calculated Z-statistic = 2.297341

# For a two-tailed test, find critical values
# Lower tail: 0.025, Upper tail: 0.975
zcrit_lower <- qnorm(alpha/2)
zcrit_upper <- qnorm(1 - alpha/2)

cat("Critical Z-values:\n")

## Critical Z-values:

cat("  Lower critical value =", zcrit_lower, "\n")

##   Lower critical value = -1.959964

cat("  Upper critical value =", zcrit_upper, "\n")

##   Upper critical value = 1.959964

# Calculate p-value for two-tailed test
# We need to account for both tails
p_value <- 2 * (1 - pnorm(abs(Z_calc)))
cat("P-value =", p_value, "\n")

## P-value = 0.0215993

# Decide
if(p_value < alpha) {
  cat("Decision: REJECT the null hypothesis\n")
  decision <- "Reject H0"
} else {
  cat("Decision: FAIL TO REJECT the null hypothesis\n")
  decision <- "Fail to reject H0"
}

## Decision: REJECT the null hypothesis

# Visualize using Professor Cojoc's given code
shadenorm = function(below=NULL, above=NULL, pcts = c(0.025,0.975), mu=109, sig=6, numpts = 500, color = "gray", dens = 40,
                     justabove= FALSE, justbelow = FALSE, lines=FALSE,between=NULL,outside=NULL){
  
  if(is.null(between)){
    below = ifelse(is.null(below), qnorm(pcts[1],mu,sig), below)
    above = ifelse(is.null(above), qnorm(pcts[2],mu,sig), above)
  }
  if(is.null(outside)==FALSE){
    below = min(outside)
    above = max(outside)
  }
  
  lowlim = mu - 4*sig                         # min point plotted on x axis
  uplim  = mu + 4*sig                         # max point plotted on x axis
  x.grid = seq(lowlim,uplim, length= numpts)
  dens.all = dnorm(x.grid,mean=mu, sd = sig)
  
  if(lines==FALSE){
    plot(x.grid, dens.all, type="l", xlab="X", ylab="Density")    # label y and x axis
  }
  
  if(lines==TRUE){
    lines(x.grid,dens.all)
  }
  
  if(justabove==FALSE){
    x.below    = x.grid[x.grid<below]
    dens.below = dens.all[x.grid<below]
    polygon(c(x.below,rev(x.below)),c(rep(0,length(x.below)),rev(dens.below)),col=color,density=dens)
  }
  if(justbelow==FALSE){
    x.above    = x.grid[x.grid>above]
    dens.above = dens.all[x.grid>above]
    polygon(c(x.above,rev(x.above)),c(rep(0,length(x.above)),rev(dens.above)),col=color,density=dens)
  }
  
  if(is.null(between)==FALSE){
    from = min(between)
    to   = max(between)
    x.between    = x.grid[x.grid>from&x.grid<to]
    dens.between = dens.all[x.grid>from&x.grid<to]
    polygon(c(x.between,rev(x.between)),c(rep(0,length(x.between)),rev(dens.between)),col=color,density=dens)
  }
}

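# function call: shade the rejection regions of the sampling distribution of x_bar under H0
# (its spread is the standard error, not sigma), then mark the observed sample mean
shadenorm(mu = mu_h0, sig = std_error, pcts = c(0.025, 0.975))
abline(v = x_bar, col = "blue", lwd = 2)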

CONCLUSION: There is sufficient evidence to conclude that the CAI training produces a different mean than the traditional method (~1 hour longer). Therefore, H0 is rejected.
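As a cross-check, a two-sided z-test at α = 0.05 rejects H0 exactly when the hypothesized mean falls outside the 95% z confidence interval for μ. A minimal sketch using the Question 1 numbers:

```r
# Duality check: two-sided test at alpha <=> (1 - alpha) confidence interval
x_bar <- 110; mu_h0 <- 109; sigma <- 6; n <- 190; alpha <- 0.05
me <- qnorm(1 - alpha / 2) * sigma / sqrt(n)    # margin of error
ci <- c(lower = x_bar - me, upper = x_bar + me)
ci                                               # roughly 109.15 to 110.85
mu_h0 >= ci["lower"] && mu_h0 <= ci["upper"]    # FALSE: 109 is outside, so reject H0
```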

Question 2 (Lecture notes)

Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 5.3 parts/million (ppm). A researcher believes that the current ozone level is at an insufficient level. The mean of 5 samples is 5.0 parts per million (ppm) with a standard deviation of 1.1. Does the data support the claim at the 0.05 level? Assume the population distribution is approximately normal.

Given: μ = 5.3, n = 5, x̄ = 5.0, s = 1.1, α = 0.05

To Do: The researcher believes that the current ozone level is at an insufficient level. Does the data support the claim at the 0.05 level?

#From Professor Sharma
#rm(list = ls()) # Clear environment
#gc()            # Clear unused memory
#cat("\f")       # Clear the console
#dev.off()      # Clear par(mfrow=c(3,2)) type commands 

# Set up the hypotheses
# H0: mu = 5.3          # Ozone level is at the normal amount
# Ha: mu < 5.3          # Ozone level is insufficient (lower than normal)
# Left-tailed test

# Calculate test statistic
# Using a t-test because sigma is unknown (only the sample SD is given) and n = 5 is small

# What I know
mu_h0 <- 5.3      # hypothesized population mean
xbar <- 5.0      # sample mean
s_sigma <- 1.1         # sample standard deviation
n <- 5           # sample size
alpha <- 0.05    # significance level


# Calculate standard error
std_error <- s_sigma / sqrt(n)
std_error

## [1] 0.491935

# Calculate t test statistic
t_stat <- (xbar - mu_h0) / std_error
t_stat

## [1] -0.6098367

# Calculate degrees of freedom
df <- n -1
df

## [1] 4

# Critical value (left-tailed)
t_crit <- qt(alpha, df)
t_crit

## [1] -2.131847

# P-value (left-tailed)
p_value <- pt(t_stat, df)
p_value

## [1] 0.2874568
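Base R's `t.test()` needs the raw observations, but the question only gives summary statistics. One workaround (my own illustration, not from the lecture notes) is to build an artificial sample of 5 values rescaled to have exactly mean 5.0 and SD 1.1. Because the one-sample t statistic depends only on x̄, s, and n, `t.test()` should reproduce the hand calculation. The individual values are meaningless; only the check matters.

```r
# Artificial sample with exactly mean 5.0 and SD 1.1 (values themselves are arbitrary)
set.seed(1)                                   # any seed gives the same t and p-value
base_draws <- rnorm(5)
pseudo_sample <- (base_draws - mean(base_draws)) / sd(base_draws) * 1.1 + 5.0

# Left-tailed one-sample t-test against mu = 5.3
res_q2 <- t.test(pseudo_sample, mu = 5.3, alternative = "less")
res_q2$statistic   # t = -0.6098...
res_q2$p.value     # about 0.287, same as pt(t_stat, df)
```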

# Visualization
curve(dt(x, df = 4), from = -4, to = 4, 
      main = "t-Distribution (df = 4)\nOzone Level Hypothesis Test",
      xlab = "t-value", 
      ylab = "Density",
      lwd = 3, 
      col = "black")

# Shade rejection region
x_shade <- seq(-4, t_crit, length.out = 10)
y_shade <- dt(x_shade, df = 4)
polygon(c(-4, x_shade, t_crit), c(0, y_shade, 0), 
        col = "red", border = NA)

# Add vertical lines
abline(v = t_crit, col = "red", lwd = 2, lty = 2)
abline(v = t_stat, col = "blue", lwd = 2)
abline(v = 0, col = "gray", lty = 2)

# Add labels
text(t_crit, 0.1, paste("Critical value\nt =", round(t_crit, 3)), pos = 2, col = "red", cex = 0.7)
text(t_stat, 0.15, paste("Test statistic\nt =", round(t_stat, 3)), pos = 4, col = "blue", cex = 0.7)
text(-3, 0.35, paste("alpha = 0.05\nReject H0"), col = "red", cex = 0.7)

legend("topright", 
       legend = c("t-distribution", "Rejection region", 
                  "Critical value", "Test statistic"),
       col = c("black", "red", "red", "blue"),
       lwd = c(3, NA, 2, 2), lty = c(1, NA, 2, 1),
       pch = c(NA, 15, NA, NA), bty = "n",
       cex = 0.8)

Conclusion:

p-value is ~0.287
alpha = 0.05
Since 0.287 > 0.05
DECISION: Fail to reject H0
In other words, the evidence for the researcher's claim is insufficient.

Question 3 (Lecture notes)

Our environment is very sensitive to the amount of ozone in the upper atmosphere. The level of ozone normally found is 7.3 parts/million (ppm). A researcher believes that the current ozone level is not at a normal level. The mean of 51 samples is 7.1 ppm with a variance of 0.49. Assume the population is normally distributed. A level of significance of 0.01 will be used. Show all work and hypothesis testing steps.

Given: μ = 7.3, n = 51, x̄ = 7.1, s² = 0.49, α = 0.01

To Do: The researcher believes that the current ozone level is not at a normal level. Thus, set a two-sided hypothesis.

Thoughts:
1. State the hypotheses
   - H0: mu = 7.3 (ozone is normal)
   - Ha: mu != 7.3 (ozone is NOT at normal levels)
2. Significance level
   - alpha = 0.01 (given)
   - alpha / 2 = 0.005 in each tail
3. Test statistic formula
   - t-distribution (sigma is unknown, so use the sample standard deviation)
   - degrees of freedom = n - 1 = 50
4. Find critical values
5. Decision
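The steps above can be sketched as a reusable function. This is a minimal sketch from summary statistics; `t_test_two_sided()` is a hypothetical helper name. The key detail is the p-value `2 * pt(-abs(t), df)`, which is correct whether the test statistic lands in the left or the right tail.

```r
# Hypothetical helper: two-sided one-sample t-test from summary statistics
t_test_two_sided <- function(x_bar, mu0, s, n, alpha) {
  se <- s / sqrt(n)                        # standard error of the mean
  t_stat <- (x_bar - mu0) / se
  df <- n - 1
  t_crit <- qt(1 - alpha / 2, df)          # positive critical value
  p_value <- 2 * pt(-abs(t_stat), df)      # valid for either sign of t_stat
  list(t = t_stat, df = df, t_crit = t_crit, p_value = p_value,
       reject = abs(t_stat) > t_crit)
}

# Question 3: variance 0.49, so s = sqrt(0.49) = 0.7
res_q3 <- t_test_two_sided(x_bar = 7.1, mu0 = 7.3, s = sqrt(0.49), n = 51, alpha = 0.01)
res_q3$t        # about -2.04
res_q3$t_crit   # about 2.678
res_q3$reject   # FALSE: fail to reject H0
```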

### Question 3
# What I know
mu_h0 <- 7.3              # Population mean of Null Hypothesis
n <- 51                   # Sample size
x_bar <- 7.1              # Sample mean
variance <- 0.49          # Sample variance
sigma <- sqrt(variance)   # Sample standard deviation (s = 0.7)
alpha <- 0.01             # Significance level

# Step 1: Calculate test statistic
SE <- sigma / sqrt(n)
t_stat <- (x_bar - mu_h0) / SE

# Step 2: Find critical values (two-tailed)
df <- n - 1
t_crit <- qt(1 - alpha/2, df)

# Step 3: Calculate p-value (two-tailed)
# Use -abs() so the formula is correct whether t_stat is negative or positive
p_value <- 2 * pt(-abs(t_stat), df)

# Step 4: Decision
if (abs(t_stat) > t_crit) {
  cat("Decision: REJECT H0\n")
  cat("Conclusion: There IS sufficient evidence that ozone level differs from 7.3 ppm\n")
} else {
  cat("Decision: FAIL TO REJECT H0\n")
  cat("Conclusion: There is NOT sufficient evidence that ozone level differs from 7.3 ppm\n")
}

## Decision: FAIL TO REJECT H0
## Conclusion: There is NOT sufficient evidence that ozone level differs from 7.3 ppm

# Visual representation
curve(dt(x, df), from = -4, to = 4, 
      main = "t-Distribution",
      xlab = "t-Value", ylab = "Density",
      lwd = 2)

# Shade rejection regions
x_left <- seq(-4, -t_crit, length = 100)
y_left <- dt(x_left, df)
polygon(c(x_left, rev(x_left)), c(y_left, rep(0, length(y_left))),
        col = "red", border = NA)

x_right <- seq(t_crit, 4, length = 100)
y_right <- dt(x_right, df)
polygon(c(x_right, rev(x_right)), c(y_right, rep(0, length(y_right))),
        col = "red", border = NA)

# Add vertical lines
abline(v = c(-t_crit, t_crit), col = "red", lwd = 2, lty = 2)
abline(v = t_stat, col = "blue", lwd = 2)
abline(v = 0, col = "black", lwd = 1, lty = 3)

# Add legend
legend("topright", 
       legend = c(paste("Critical values: ±", round(t_crit, 3)),
                  paste("Test statistic:", round(t_stat, 3)),
                  "Rejection regions"),
       col = c("red", "blue", "red"),
       lty = c(2, 1, NA), 
       lwd = c(3, 3, NA),
       pch = c(NA, NA, 15))

Question 4 (Lecture notes)

A publisher reports that 36% of their readers own a laptop. A marketing executive wants to test the claim that the percentage is actually less than the reported percentage. A random sample of 100 found that 29% of the readers owned a laptop. Is there sufficient evidence at the 0.02 level to support the executive’s claim? Show all work and hypothesis testing steps.

Given: π = 0.36, n = 100, p̂ = 0.29, α = 0.02

To Do: Executive wants to test the claim that the percentage is actually less than the reported percentage. Thus, set a single sided hypothesis.

### Question 4

# This is a LEFT-TAILED TEST because we're testing if the proportion is       
# LESS THAN the claimed value (not just different from it).                   

# Population parameter under the null hypothesis
# This is what we ASSUME is true until proven otherwise
pi_0 <- 0.36
n <- 100
p_hat <- 0.29
alpha <- 0.02         # Significance level - our tolerance

# Calculate the expected counts
expected_successes <- n * pi_0
expected_failures <- n * (1 - pi_0)

# Check if conditions are met
# Rule of thumb: both expected counts should be at least 5 to use the normal approximation
# (some texts, e.g. OpenIntro, use 10)
if (expected_successes >= 5 && expected_failures >= 5) {
  cat("Both conditions are satisfied!\n")
  cat("We can proceed with the normal approximation (Z-test)\n\n")
} else {
  cat("Conditions NOT met - normal approximation may not be valid\n\n")
}

## Both conditions are satisfied!
## We can proceed with the normal approximation (Z-test)

# Calculate the standard error
# The standard error tells us how much variability we expect in sample
# proportions if we repeated this survey many times with n=100
SE <- sqrt((pi_0 * (1 - pi_0)) / n)

# Calculate the Z test statistic
# This standardizes the difference between what we observed  
# and what we expected under h0
z_stat <- (p_hat - pi_0) / SE

# For a left-tailed test, we want the Z-value where P(Z < z_crit) = alpha
z_crit <- qnorm(alpha)

# pnorm gives us the cumulative probability P(Z < z)
# For a left-tailed test, this is exactly what we want
p_value <- pnorm(z_stat)

# MAKE THE DECISION
cat("Test Statistic:  Z =", round(z_stat, 4), "\n")

## Test Statistic:  Z = -1.4583

cat("Critical Value:  z_crit =", round(z_crit, 4), "\n\n")

## Critical Value:  z_crit = -2.0537

cat("Is Z_test < z_crit?\n")

## Is Z_test < z_crit?

cat(" ", round(z_stat, 4), "<", round(z_crit, 4), "?\n")

##   -1.4583 < -2.0537 ?

if (z_stat < z_crit) {
  cat("  YES → Z_test is in the rejection region\n")
  cat("  DECISION: REJECT h0\n\n")
  decision_cv <- "REJECT h0"
} else {
  cat("  NO →", round(z_stat, 4), "≥", round(z_crit, 4), "\n")
  cat("  Z_test is NOT in the rejection region\n")
  cat("  DECISION: FAIL TO REJECT h0\n\n")
  decision_cv <- "FAIL TO REJECT h0"
}

##   NO → -1.4583 ≥ -2.0537 
##   Z_test is NOT in the rejection region
##   DECISION: FAIL TO REJECT h0
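As a cross-check, base R's `prop.test()` with `correct = FALSE` runs the same one-sample proportion test without a continuity correction. It reports a chi-squared statistic (X-squared), which equals Z squared, and its one-sided p-value should match `pnorm(z_stat)`. A sketch using 29 laptop owners out of 100:

```r
# One-sample proportion test without continuity correction (same as the Z-test)
res_q4 <- prop.test(x = 29, n = 100, p = 0.36,
                    alternative = "less", correct = FALSE)
sqrt(unname(res_q4$statistic))   # |Z| = 1.4583...
res_q4$p.value                   # equals pnorm(z_stat); above alpha = 0.02
```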

# CREATE VISUALIZATION

# Draw the standard normal distribution curve
curve(dnorm(x), from = -4, to = 4, 
      xlab = "Z (Standard Normal)", 
      ylab = "Probability Density",
      main = "Left-Tailed Hypothesis Test\nH0: pi = 0.36",
      lwd = 2.5, 
      col = "blue",
      las = 1,           # Make axis labels horizontal
      cex.main = 1.2,    # Make title larger
      cex.lab = 1.1)     # Make axis labels larger

# Shade the rejection region (left tail, area = alpha)
# This represents the "extreme" values that would lead us to reject h0
z_sequence <- seq(-4, z_crit, length.out = 200)
polygon(c(z_sequence, z_crit, -4),
        c(dnorm(z_sequence), 0, 0),
        col = "red")

# Add a vertical line at critical value
abline(v = z_crit, col = "red", lwd = 2.5, lty = 2)
text(z_crit - 0.3, 0.15, 
     paste0("Critical Value\nz = ", round(z_crit, 3)),
     pos = 2, col = "red", cex = 0.90, font = 2)

# Add a vertical line at test statistic
abline(v = z_stat, col = "darkgreen", lwd = 2.5)
text(z_stat + 0.3, 0.28, 
     paste0("Test Statistic\nz = ", round(z_stat, 3)),
     pos = 4, col = "darkgreen", cex = 0.90, font = 2)

# Label the rejection region
text(-3.2, 0.04, 
     paste0("Rejection Region\n(alpha = ", alpha, ")"), 
     col = "brown", cex = 0.90, font = 2)

# Add a label for the non-rejection region
text(1, 0.05, 
     "Non-Rejection Region", 
     col = "blue", cex = 0.90, font = 2)

# Add legend
legend("topright", 
       legend = c("Standard Normal Distribution", 
                  "Rejection Region (alpha = 0.02)",
                  "Test Statistic", 
                  "Critical Value"),
       col = c("blue", "red", "darkgreen", "red"),
       lwd = c(2, 10, 2, 2),
       lty = c(1, 1, 1, 2),
       cex = 0.85,
       bg = "white")

# Add grid for easier reading
grid(col = "gray", lty = "dotted")

### Question 5: A hospital director is told that 31% of the treated patients are uninsured. The director wants to test the claim that the percentage of uninsured patients is less than the expected percentage. A sample of 380 patients found that 95 were uninsured. Make the decision to reject or fail to reject the null hypothesis at the 0.05 level. Show all work and hypothesis testing steps.

Given: π = 0.31, n = 380, p̂ = 95/380 = 0.25, α = 0.05

To Do: Researcher believes that the current ozone level is not at normal level. Thus, set a double sided hypothesis.

Thoughts: I might be going way out in left field, but the question says “…LESS THAN…”, which to me means a one-tailed (left) test, while the To Do says “…double sided hypothesis…”. I will assume the To Do is a misprint (it also mentions ozone levels, so it appears to be copied from Question 3). Therefore, I will use a left-tailed test.

# Clear the environment
rm(list = ls())
#dev.off()      # Clear par(mfrow=c(3,2)) type commands 

# What I know
alpha <- 0.05         # Significance level
pi_0 <- 0.31          # Hypothesized population proportion
n <- 380              # Sample size
x <- 95               # Number of uninsured patients in sample
p_hat <- x / n        # Sample proportion

# Calculate the expected counts
np <- n * pi_0
n_1minus_p <- n * (1 - pi_0)

# Check if conditions are met
# Rule of thumb: both expected counts should be at least 5 to use the normal approximation
# (some texts, e.g. OpenIntro, use 10)
if (np >= 5 && n_1minus_p >= 5) {
  cat("Both conditions are satisfied!\n")
  cat("We can proceed with the normal approximation (Z-test)\n\n")
} else {
  cat("Conditions NOT met - normal approximation may not be valid\n\n")
}

## Both conditions are satisfied!
## We can proceed with the normal approximation (Z-test)

# Calculate the test statistic, but first SE (standard error)
SE <- sqrt(pi_0 * (1 - pi_0) / n)
z <- (p_hat - pi_0) / SE

# Critical value for left-tailed test
z_crit <- qnorm(alpha)

# Decide
if (z < z_crit) {
  cat("YES:", round(z, 4), "<", round(z_crit, 4), "\n")
  cat("Decision: REJECT NULL Hypothesis\n")
} else {
  cat("NO:", round(z, 4), ">=", round(z_crit, 4), "\n")
  cat("Decision: FAIL TO REJECT NULL Hypothesis\n")
}

## YES: -2.5289 < -1.6449 
## Decision: REJECT NULL Hypothesis
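The decision above uses the critical-value approach; the p-value gives the same answer. As an extra check that does not rely on the normal approximation, base R's `binom.test()` computes an exact binomial p-value. Note that this is a swapped-in technique (exact binomial test), not the z-test from the lecture notes.

```r
# p-value for the left-tailed z-test in Question 5
z_q5 <- (95 / 380 - 0.31) / sqrt(0.31 * 0.69 / 380)
pnorm(z_q5)                      # about 0.006, well below alpha = 0.05

# Exact binomial test (no normal approximation) as a cross-check
res_q5 <- binom.test(x = 95, n = 380, p = 0.31, alternative = "less")
res_q5$p.value                   # also well below 0.05
```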

# Visualize
# Plot the standard normal distribution
curve(dnorm(x), from = -4, to = 4, 
      main = "Left-Tailed Hypothesis Test for Proportion\n",
      xlab = "Z-score", 
      ylab = "Probability Density",
      lwd = 2.5, 
      col = "navy",
      las = 1,
      cex.main = 1.2,
      cex.lab = 1.1)

# Shade the rejection region (left tail)
z_seq <- seq(-4, z_crit, length.out = 200)
polygon(c(z_seq, z_crit, -4), 
        c(dnorm(z_seq), 0, 0),
        col = "red")

# Add critical value line
abline(v = z_crit, col = "red", lwd = 2.5, lty = 2)
text(z_crit - 0.3, 0.25, 
     paste0("Critical Value\nZ = ", round(z_crit, 3), "\n(alpha = ", alpha, ")"), 
     pos = 2, col = "red", cex = 0.9, font = 2)

# Add test statistic line
abline(v = z, col = "darkgreen", lwd = 3)
text(z - 0.3, 0.35, 
     paste0("Test Statistic\nZ = ", round(z, 3)), 
     pos = 2, col = "darkgreen", cex = 0.9, font = 2)

# Add grid for readability
grid(col = "gray", lty = "dotted")

# Add legend
legend("topright", 
       legend = c("Standard Normal Distribution",
                  "Rejection Region (alpha = 0.05)", 
                  "Critical Value", 
                  "Test Statistic"),
       col = c("navy", "red", "red", "darkgreen"),
       lwd = c(2.5, 10, 2.5, 3),
       lty = c(1, 1, 2, 1),
       cex = 0.85,
       bg = "white")

### Question 6:
A medical researcher wants to compare the pulse rates of smokers and non-smokers. He believes that the pulse rate for smokers and non-smokers is different and wants to test this claim at the 0.1 level of significance. The researcher checks 32 smokers and finds that they have a mean pulse rate of 87, and 31 non-smokers have a mean pulse rate of 84. The standard deviation of the pulse rates is found to be 9 for smokers and 10 for non-smokers. Let μ1 be the true mean pulse rate for smokers and μ2 be the true mean pulse rate for non-smokers. Show all work and hypothesis testing steps.

Let the smoker group be indexed by 1 and the non-smoker group by 2. Given: n1 = 32, x̄1 = 87, n2 = 31, x̄2 = 84, σ1 = 9, σ2 = 10, α = 0.10

To Do: Test if the pulse rate for smokers and non-smokers is different at the 0.1 level of significance. Thus, double sided test.

# Clear the environment
rm(list = ls())
#dev.off()      # Clear par(mfrow=c(3,2)) type commands 

# State the hypotheses
# Null hypothesis (h0): mu_smoker = mu_non (the mean pulse rate for smokers equals the mean pulse rate for non-smokers)
# Alternative hypothesis (ha): mu_smoker != mu_non
# Therefore: two-tailed test, since the word "different" is used

# Significance level (per question)
alpha <- 0.10

# What I know
n_smoke <- 32 
xbar_smoke <- 87
sigma_smoke <- 9

n_non <- 31
xbar_non <- 84
sigma_non <- 10

# Calculate standard error
SE <- sqrt((sigma_smoke^2 / n_smoke) + (sigma_non^2 / n_non))

# Calculate z-test statistic
z <- (xbar_smoke - xbar_non) / SE

# Determine critical values
# Since alpha = 0.10, split it equally between the two tails (0.05 each)

# Lower and upper critical values
z_crit_low <- qnorm(alpha / 2)
z_crit_up <- qnorm(1 - alpha / 2)

# Make a decision
if (abs(z) > z_crit_up) {
  decision <- "Reject h0"
} else {
  decision <- "Fail to reject h0"
}

cat("Decision:", decision, "\n")

## Decision: Fail to reject h0
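The same decision can be read from a p-value or from a matching 90% confidence interval for μ1 − μ2 (a two-sided test at α = 0.10 pairs with a 90% interval). A minimal sketch with the Question 6 numbers:

```r
# Two-sample z-test (large samples): p-value and matching 90% CI for mu1 - mu2
se_q6 <- sqrt(9^2 / 32 + 10^2 / 31)
z_q6 <- (87 - 84) / se_q6
p_q6 <- 2 * pnorm(-abs(z_q6))                          # two-tailed p-value, about 0.21
ci_q6 <- (87 - 84) + c(-1, 1) * qnorm(0.95) * se_q6    # 90% confidence interval
p_q6
ci_q6   # contains 0, consistent with failing to reject h0
```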

# OK, let's try to visualize this
curve(dnorm(x), from = -4, to = 4, 
      main = "Two-Tailed Z-Test for Difference in Means\nSmokers vs Non-Smokers (pulse rates)",
      xlab = "Z-score", ylab = "Density",
      lwd = 2, col = "blue")

# Add critical regions (shaded)
z_vals_left <- seq(-4, z_crit_low, length.out = 100)
z_vals_right <- seq(z_crit_up, 4, length.out = 100)

polygon(c(z_vals_left, rev(z_vals_left)), 
        c(dnorm(z_vals_left), rep(0, length(z_vals_left))),
        col = "red", border = NA)

polygon(c(z_vals_right, rev(z_vals_right)), 
        c(dnorm(z_vals_right), rep(0, length(z_vals_right))),
        col = "red", border = NA)

# Add critical values
abline(v = c(z_crit_low, z_crit_up), 
       col = "red", lwd = 2, lty = 2)

# Add observed test statistic
abline(v = z, col = "darkgreen", lwd = 3)

# Add legend
legend("topleft", 
       legend = c("Rejection regions (alpha = 0.10)",
                  "Critical values (+/-1.645)",
                  paste0("Observed Z = ", round(z, 3))),
       col = c("red", "red", "darkgreen"),
       lwd = c(10, 2, 3),
       lty = c(1, 2, 1),
       cex = .75,
       bty = "o")

### Question 7:
Given two independent random samples with the following results:

n1 = 11, x̄1 = 127, σ1 = 33, n2 = 18, x̄2 = 157, σ2 = 27

Use this data to find the 95% confidence interval for the true difference between the population means. Assume that the population variances are not equal and that the two populations are normally distributed.

To Do: Create a 95% confidence interval for true difference between the population means.

# Set up the problem
# Sample 1: n1 = 11, xbar1 = 127, sigma1 = 33
# Sample 2: n2 = 18, xbar2 = 157, sigma2 = 27
# Confidence level: 95%

# What I know
n1 <- 11
xbar1 <- 127
sigma1 <- 33

n2 <- 18
xbar2 <- 157
sigma2 <- 27

confidence_level <- 0.95
alpha <- 1 - confidence_level

# Calculate the difference in sample means (point estimate)
point_est <- xbar1 - xbar2

# Calculate standard error
SE <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)

# From the readings in OpenIntro Statistics, we can use the smaller of n1-1 and n2-1 for the degrees of freedom
df <- min(n1 - 1, n2 - 1)  # = 10

# The population variances are unknown and unequal and the samples are small,
# so use a t critical value (not z) with the df above
t_crit <- qt(1 - alpha / 2, df)

# Calculate Margin of Error
margin_error <- t_crit * SE

# Confidence interval
CI_lower <- point_est - margin_error
CI_upper <- point_est + margin_error

# 95% Confidence Interval
cat("\n95% Confidence Interval:\n")

## 
## 95% Confidence Interval:

cat("(", round(CI_lower, 2), ",", round(CI_upper, 2), ")\n")

## ( -56.32 , -3.68 )

# Dan Leone has 0% confidence level in this answer
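Using min(n1 − 1, n2 − 1) degrees of freedom is the conservative hand-calculation shortcut. Software usually uses the Welch-Satterthwaite approximation instead (it is what R's `t.test()` does by default, since `var.equal = FALSE`). A sketch computing it from the summary statistics, treating 33 and 27 as the sample standard deviations:

```r
# Welch-Satterthwaite degrees of freedom and 95% CI from summary statistics
n1 <- 11; xbar1 <- 127; s1 <- 33
n2 <- 18; xbar2 <- 157; s2 <- 27
v1 <- s1^2 / n1                      # estimated variance of xbar1
v2 <- s2^2 / n2                      # estimated variance of xbar2
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
se_welch <- sqrt(v1 + v2)
ci_welch <- (xbar1 - xbar2) + c(-1, 1) * qt(0.975, df_welch) * se_welch
df_welch    # about 18.08
ci_welch    # narrower than the conservative df = 10 interval, still entirely below 0
```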

Question 8:

Two men, A and B, who usually commute to work together decide to conduct an experiment to see whether one route is faster than the other. The men feel that their driving habits are approximately the same, so each morning for two weeks one driver is assigned to route I and the other to route II. The times, recorded to the nearest minute, are shown in the following table. Using this data, find the 98% confidence interval for the true mean difference between the average travel time for route I and the average travel time for route II.

# What I know
# Paired data
r1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)
r2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)

# Calculate differences
d <- r1 - r2

# More stuff I know
n <-  length(d)
mean_d <- mean(d)
sigma_d <- sd(d)
SE <- sigma_d / sqrt(n)    # Standard error of the mean difference

# Define confidence level
confidence_level <- 0.98
alpha <- 1 - confidence_level
df <- n - 1

# Find critical t
t_crit <-  qt(1 - alpha / 2, df)

# Calculate margin of error
margin_error <- t_crit * SE

# Confidence interval
CI_lower <- mean_d - margin_error
CI_upper <- mean_d + margin_error

cat("98% Confidence Interval: (", round(CI_lower, 3), ",", round(CI_upper, 3), ")\n")

## 98% Confidence Interval: ( -2.767 , 2.967 )

Conclusion: We are 98% confident that the true mean difference between route I and route II travel times lies between -2.77 minutes and 2.97 minutes. Because this interval contains 0, the data do not show that either route is faster.

Also, just noticing that I did not include an official hypothesis test.
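Base R's `t.test()` with `paired = TRUE` reproduces the interval from the raw route times and also reports the two-sided test of H0: mean difference = 0, which covers the missing hypothesis test. A minimal sketch:

```r
# Paired t-test on the route times; conf.level = 0.98 gives the 98% interval
r1 <- c(32, 27, 34, 24, 31, 25, 30, 23, 27, 35)
r2 <- c(28, 28, 33, 25, 26, 29, 33, 27, 25, 33)
res_q8 <- t.test(r1, r2, paired = TRUE, conf.level = 0.98)
res_q8$conf.int   # about (-2.767, 2.967)
res_q8$p.value    # large, so no evidence that either route is faster
```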

Question 9:

The U.S. Census Bureau conducts annual surveys to obtain information on the percentage of the voting-age population that is registered to vote. Suppose that 391 employed persons and 510 unemployed persons are independently and randomly selected, and that 195 of the employed persons and 193 of the unemployed persons have registered to vote. Can we conclude that the percentage of employed workers (p1) who have registered to vote, exceeds the percentage of unemployed workers (p2) who have registered to vote? Use a significance level of 0.05 for the test. Show all work and hypothesis testing steps.

Q: Can we conclude that the percentage of employed workers (p1) who have registered to vote, exceeds the percentage of unemployed workers (p2) who have registered to vote?

# State the hypotheses
# Null hypothesis (h0): p1 <= p2 (employed registration rate does not exceed the unemployed rate)
# Alternative hypothesis (ha): p1 > p2 (employed registration rate exceeds the unemployed rate)
# This is a right-tailed test

n1 <- 391     # employed sample size
x1 <- 195     # employed registered voters
n2 <- 510     # unemployed sample size
x2 <- 193     # unemployed registered voters

# Sample proportions
phat1 <- x1 / n1
phat2 <- x2 / n2

# Calculate pooled proportions
pooled <- (x1 + x2) / (n1 + n2)

# Calculate standard error
SE <-  sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

# Calculate test statistic
z_stat <- (phat1 - phat2) / SE

# Significance level and critical value (right-tailed)
alpha <- 0.05
z_crit <- qnorm(1 - alpha)

# Wrap it up
cat("Critical value:", round(z_crit, 4), "\n")

## Critical value: 1.6449

cat("\nDecision rule: Reject h0 if z >", round(z_crit, 4), "\n")

## 
## Decision rule: Reject h0 if z > 1.6449

cat("Test statistic z =", round(z_stat, 4), "\n")

## Test statistic z = 3.614

cat("Since", round(z_stat, 4), ">", round(z_crit, 4), 
    ", we REJECT h0\n")

## Since 3.614 > 1.6449 , we REJECT h0
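As a cross-check, `prop.test()` with two groups and `correct = FALSE` uses the pooled proportion, so its chi-squared statistic equals the square of the z statistic above, and `alternative = "greater"` gives the right-tailed p-value. A sketch:

```r
# Two-sample proportion test without continuity correction (pooled, like the Z-test)
res_q9 <- prop.test(x = c(195, 193), n = c(391, 510),
                    alternative = "greater", correct = FALSE)
sqrt(unname(res_q9$statistic))   # equals the pooled z statistic, about 3.614
res_q9$p.value                   # far below alpha = 0.05
```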

# Visualize