Exercise 1: A company with a large fleet of cars wants to study its gasoline usage. They check the gasoline usage for 50 company trips chosen at random, finding a mean of 25.16 mpg. Based on past experience, they believe that gasoline usage is normally distributed and that the population standard deviation of gasoline usage is 4.82 mpg.

(a) Which kind of confidence interval is appropriate to use here, a z-interval or a t-interval?

Since the population standard deviation (\(\sigma = 4.82\) mpg) is known, a z-interval is appropriate to use here.

(b) Please use R to find the critical value they need when constructing a 90% CI.

alpha = 0.10
z_critical = qnorm(1 - alpha/2)
z_critical
## [1] 1.644854

(c) Please use R to construct a 90% CI for the mean of the general gasoline usage.

x_bar = 25.16
sigma = 4.82
n = 50
confidence_level = 0.90
z_critical = qnorm(1 - (1 - confidence_level)/2) # same critical value as in part (b)

# Calculate the standard error
se = sigma / sqrt(n)

# Calculate the margin of error
margin_of_error = z_critical * se

# Calculate the confidence interval
lower_bound = x_bar - margin_of_error
upper_bound = x_bar + margin_of_error

cat("90% Confidence Interval: (", lower_bound, ", ", upper_bound, ")\n")
## 90% Confidence Interval: ( 24.03878 ,  26.28122 )
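The steps in part (c) can also be bundled into one function so the interval is recomputed from its inputs in a single call. This is only a sketch: the name `z_interval` and its argument names are illustrative choices, not part of the exercise.

```r
# Hypothetical helper: z-interval for a mean when sigma is known
z_interval = function(x_bar, sigma, n, confidence_level = 0.90) {
  alpha = 1 - confidence_level
  z_critical = qnorm(1 - alpha/2)                # two-sided critical value
  margin_of_error = z_critical * sigma / sqrt(n) # critical value times standard error
  c(lower = x_bar - margin_of_error, upper = x_bar + margin_of_error)
}

# Same inputs as part (c)
ci = z_interval(x_bar = 25.16, sigma = 4.82, n = 50, confidence_level = 0.90)
ci
```

Calling it with the values from part (c) should reproduce the interval printed above.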

(d) If they want the width of the CI to be at most 0.60 mpg, what is the minimum number of trips they must sample? Use R to calculate.

width_target = 0.60
# Width = 2 * z_critical * (sigma / sqrt(n))
# Rearranging for n: n = (2 * z_critical * sigma / width)^2

n_required = (2 * z_critical * sigma / width_target)^2
ceiling(n_required) # Round up to the next whole number of trips
## [1] 699
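As a quick sanity check on the rounding, the interval width can be evaluated just below and at the reported sample size. The helper `width_at` is a hypothetical name introduced only for this check.

```r
# Quantities from Exercise 1
sigma = 4.82
z_critical = qnorm(1 - 0.10/2)

# Hypothetical helper: width of the 90% z-interval for a given sample size n
width_at = function(n) 2 * z_critical * sigma / sqrt(n)

width_at(698) # one trip fewer: slightly wider than 0.60
width_at(699) # reported answer: no wider than 0.60
```

If 698 trips already met the target, rounding up with `ceiling()` would not have been needed; the check suggests that 699 is the smallest sample size that works.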

(e) Create an R function whose argument is the width of the CI and whose output is the sample size necessary to achieve that accuracy. The confidence level is fixed at 90%.

calculate_sample_size = function(ci_width) {
  sigma = 4.82
  alpha = 0.10
  z_critical = qnorm(1 - alpha/2)
  n_required = (2 * z_critical * sigma / ci_width)^2
  return(ceiling(n_required))
}

(f) Apply the function you created in part (e) to demonstrate that a larger sample size is required to achieve better accuracy (i.e., a narrower CI width). The confidence level is fixed at 90%.

cat("Sample size for CI width of 1.0 mpg:", calculate_sample_size(1.0), "\n")
## Sample size for CI width of 1.0 mpg: 252
cat("Sample size for CI width of 0.5 mpg:", calculate_sample_size(0.5), "\n")
## Sample size for CI width of 0.5 mpg: 1006
cat("Sample size for CI width of 0.1 mpg:", calculate_sample_size(0.1), "\n")
## Sample size for CI width of 0.1 mpg: 25143
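Because the arithmetic inside the function is vectorized, the same comparison can be made in one call. The sketch below restates the function from part (e) so that it runs on its own; the data frame layout is just one possible way to display the result.

```r
# Restated from part (e) so this sketch is self-contained
calculate_sample_size = function(ci_width) {
  sigma = 4.82
  z_critical = qnorm(1 - 0.10/2)
  ceiling((2 * z_critical * sigma / ci_width)^2)
}

widths = c(1.0, 0.5, 0.1)
sizes = calculate_sample_size(widths) # one sample size per width
data.frame(ci_width = widths, sample_size = sizes)

# Halving the width multiplies the unrounded requirement by 4,
# so this ratio should be close to 4
sizes[2] / sizes[1]
```

The required sample size grows with the inverse square of the width, which is why the jump from 1.0 mpg to 0.1 mpg is roughly a hundredfold.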

Exercise 2: In a class survey, students are asked how many hours they sleep per night. In the sample of 21 students, the mean was 5.87 hours with a standard deviation of 1.56 hours. The parameter of interest is the mean number of hours slept per night in the population from which this sample was drawn.

(a) Which kind of confidence interval is appropriate to use here, a z-interval or a t-interval?

Since the population standard deviation is unknown and must be estimated by the sample standard deviation (\(s = 1.56\)), and the sample size is relatively small (\(n = 21 < 30\)), a t-interval is appropriate to use here.

(b) Please use R to find the critical value they need when constructing a 98% CI.

n = 21
alpha = 0.02
df = n - 1
t_critical = qt(1 - alpha/2, df)
t_critical
## [1] 2.527977

(c) Please use R to construct a 98% CI for the mean number of hours slept per night.

x_bar = 5.87
s = 1.56
n = 21
confidence_level = 0.98
df = n - 1
t_critical = qt(1 - (1 - confidence_level)/2, df) # same critical value as in part (b)

# Calculate the standard error
se = s / sqrt(n)

# Calculate the margin of error
margin_of_error = t_critical * se

# Calculate the confidence interval
lower_bound = x_bar - margin_of_error
upper_bound = x_bar + margin_of_error

cat("98% Confidence Interval: (", lower_bound, ", ", upper_bound, ")\n")
## 98% Confidence Interval: ( 5.009426 ,  6.730574 )
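To see what the choice of a t-interval costs in width, the t critical value can be compared with the normal critical value at the same confidence level. This is a side comparison only; the z value is not appropriate for this exercise.

```r
n = 21
alpha = 0.02

t_critical = qt(1 - alpha/2, df = n - 1) # t critical value with 20 degrees of freedom
z_critical = qnorm(1 - alpha/2)          # normal critical value, for comparison only

c(t = t_critical, z = z_critical)

# Ratio of the t margin of error to the z margin of error (same s and n)
t_critical / z_critical
```

The t critical value is the larger of the two, so the t-interval is wider, reflecting the extra uncertainty from estimating the standard deviation.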

Exercise 3: In the 2001 Youth Risk Behavior Survey conducted by the U.S. Centers for Disease Control, 750 out of 1200 female 12th graders said they always use a seatbelt when driving.

(a) We want to construct a confidence interval to estimate the proportion of 12th grade females in the U.S. population who always use a seatbelt when driving. Is it appropriate to use the traditional confidence interval? Why or why not?

First, let’s calculate the sample proportion \(\hat{p}\): \(\hat{p} = \frac{750}{1200} = 0.625\)

For a traditional confidence interval for proportions to be appropriate, we need to check the conditions:

\(n\hat{p} \ge 10\): \(1200 \times 0.625 = 750 \ge 10\)

\(n(1 - \hat{p}) \ge 10\): \(1200 \times (1 - 0.625) = 1200 \times 0.375 = 450 \ge 10\)

Both conditions are met, so yes, it is appropriate to use a traditional confidence interval. The sample size is large enough to assume that the sampling distribution of the sample proportion is approximately normal.
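The same two conditions can be checked in R. This is a minimal sketch using the counts from the exercise; the variable names `successes` and `failures` are illustrative.

```r
x = 750
n = 1200
p_hat = x / n

successes = n * p_hat       # expected count answering "always"
failures = n * (1 - p_hat)  # expected count not answering "always"

c(successes = successes, failures = failures)

# Both comparisons should be TRUE for the traditional interval to be appropriate
successes >= 10
failures >= 10
```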

(b) If you answered "Yes" in part (a), use R to find the 99% traditional CI for the proportion of 12th grade females in the population who always use a seatbelt when driving. Skip this part if you answered "No" in part (a).

x = 750
n = 1200
p_hat = x / n
confidence_level = 0.99
alpha = 1 - confidence_level

# Find the critical z-value
z_critical = qnorm(1 - alpha/2)

# Calculate the standard error of the proportion
se_p_hat = sqrt(p_hat * (1 - p_hat) / n)

# Calculate the margin of error
margin_of_error = z_critical * se_p_hat

# Calculate the confidence interval
lower_bound = p_hat - margin_of_error
upper_bound = p_hat + margin_of_error

cat("99% Confidence Interval: (", lower_bound, ", ", upper_bound, ")\n")
## 99% Confidence Interval: ( 0.5890017 ,  0.6609983 )
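For reference, the computation above corresponds to the traditional (Wald) interval formula, written out here with the numbers from this exercise:

```latex
\[
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
= 0.625 \pm 2.575829 \times \sqrt{\frac{0.625 \times 0.375}{1200}}
\approx 0.625 \pm 0.0360
\]
```

This gives approximately (0.5890, 0.6610), matching the R output.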

(c) Assuming there is no prior information or past experience available, what sample size is necessary to keep the width of the traditional 99% CI within 0.01?

confidence_level = 0.99
alpha = 1 - confidence_level
z_critical = qnorm(1 - alpha/2)
desired_width = 0.01
margin_of_error = desired_width / 2

# Use p-hat = 0.5 for conservative estimate
p_hat_conservative = 0.5

n_required = (z_critical / margin_of_error)^2 * p_hat_conservative * (1 - p_hat_conservative)
ceiling(n_required)
## [1] 66349
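The choice of 0.5 is conservative because the term \(p(1-p)\) in the sample size formula is largest there, so no other true proportion could require a bigger sample. A small grid search illustrates this; the grid spacing of 0.01 is an arbitrary choice for the sketch.

```r
# Evaluate p(1 - p) on a grid of candidate proportions
p_grid = seq(0, 1, by = 0.01)
variance_term = p_grid * (1 - p_grid)

p_grid[which.max(variance_term)] # proportion giving the largest variance term
max(variance_term)               # largest value of p(1 - p) on the grid
```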

Exercise 4: Consider the problem in Exercise 1 again. The company wants to conduct a test, with \(\alpha = 0.05\), to see whether the fleet average is less than 26 mpg.

(a) Which kind of test is appropriate to use here, a z-test or a t-test?

Since the population standard deviation (\(\sigma = 4.82\) mpg) is known, a z-test is appropriate to use here.

(b) Write appropriate hypotheses. Is the alternative hypothesis one-sided or two-sided?

Let \(\mu\) be the true mean gasoline usage of the fleet.

Null hypothesis (\(H_0\)): \(\mu = 26\) mpg

Alternative hypothesis (\(H_a\)): \(\mu < 26\) mpg

The alternative hypothesis is one-sided (left-tailed) because the company wants to see if the average is less than 26 mpg.

(c) Use R to compute the test statistic and construct the rejection region. State a conclusion using the test statistic and rejection region.

mu0 = 26 # Hypothesized mean
x_bar = 25.16 # Sample mean
sigma = 4.82 # Population standard deviation
n = 50 # Sample size
alpha = 0.05 # Significance level

# Calculate the test statistic (z-score)
z_test_statistic = (x_bar - mu0) / (sigma / sqrt(n))
z_test_statistic
## [1] -1.232302
# Construct the rejection region
# For a one-sided (left-tailed) test, the critical value is qnorm(alpha)
z_critical = qnorm(alpha)
z_critical
## [1] -1.644854
cat("Test Statistic (z):", z_test_statistic, "\n")
## Test Statistic (z): -1.232302
cat("Critical Value (z_alpha):", z_critical, "\n")
## Critical Value (z_alpha): -1.644854
if (z_test_statistic < z_critical) {
  cat("Conclusion: Since the test statistic (", z_test_statistic, ") is less than the critical value (", z_critical, "), we **reject the null hypothesis**.\n")
  cat("There is sufficient evidence to conclude that the fleet average gasoline usage is less than 26 mpg.\n")
} else {
  cat("Conclusion: Since the test statistic (", z_test_statistic, ") is not less than the critical value (", z_critical, "), we **fail to reject the null hypothesis**.\n")
  cat("There is not sufficient evidence to conclude that the fleet average gasoline usage is less than 26 mpg.\n")
}
## Conclusion: Since the test statistic ( -1.232302 ) is not less than the critical value ( -1.644854 ), we **fail to reject the null hypothesis**.
## There is not sufficient evidence to conclude that the fleet average gasoline usage is less than 26 mpg.
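The rejection region can equivalently be stated on the original mpg scale by converting the critical z-value back into a cutoff for the sample mean. This is a sketch; `x_bar_cutoff` is a hypothetical name.

```r
mu0 = 26      # Hypothesized mean
sigma = 4.82  # Population standard deviation
n = 50        # Sample size
alpha = 0.05  # Significance level
x_bar = 25.16 # Sample mean

# Boundary of the rejection region on the mpg scale:
# H_0 is rejected when the sample mean falls below this value
x_bar_cutoff = mu0 + qnorm(alpha) * sigma / sqrt(n)
x_bar_cutoff

x_bar < x_bar_cutoff # TRUE would mean "reject H_0"
```

The observed mean of 25.16 mpg lies above the cutoff, which agrees with the decision reached using the z-scale.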

(d) Use R to compute the p-value. State a conclusion using the p-value. Is it consistent with the conclusion in part (c)?

# Calculate the p-value for a one-sided (left-tailed) test
p_value = pnorm(z_test_statistic, lower.tail = TRUE)
p_value
## [1] 0.1089181
cat("P-value:", p_value, "\n")
## P-value: 0.1089181
cat("Significance level (alpha):", alpha, "\n")
## Significance level (alpha): 0.05
if (p_value < alpha) {
  cat("Conclusion: Since the p-value (", p_value, ") is less than the significance level (", alpha, "), we **reject the null hypothesis**.\n")
  cat("There is sufficient evidence to conclude that the fleet average gasoline usage is less than 26 mpg.\n")
} else {
  cat("Conclusion: Since the p-value (", p_value, ") is not less than the significance level (", alpha, "), we **fail to reject the null hypothesis**.\n")
  cat("There is not sufficient evidence to conclude that the fleet average gasoline usage is less than 26 mpg.\n")
}
## Conclusion: Since the p-value ( 0.1089181 ) is not less than the significance level ( 0.05 ), we **fail to reject the null hypothesis**.
## There is not sufficient evidence to conclude that the fleet average gasoline usage is less than 26 mpg.
cat("\nConsistency: Yes, the conclusion from the p-value method is consistent with the conclusion from the test statistic and rejection region method.\n")
## 
## Consistency: Yes, the conclusion from the p-value method is consistent with the conclusion from the test statistic and rejection region method.
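The agreement is not a coincidence. For a left-tailed z-test, `pnorm()` is strictly increasing, so the test statistic falls below `qnorm(alpha)` exactly when its p-value falls below `alpha`. The sketch below recomputes both decisions side by side; the names `reject_by_region` and `reject_by_p_value` are illustrative.

```r
z_test_statistic = (25.16 - 26) / (4.82 / sqrt(50))
alpha = 0.05

# Decision from the rejection region
reject_by_region = z_test_statistic < qnorm(alpha)

# Decision from the p-value
reject_by_p_value = pnorm(z_test_statistic) < alpha

c(region = reject_by_region, p_value = reject_by_p_value)
```

Both entries are FALSE here, so both methods fail to reject the null hypothesis.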