Homework 4

Complete All of the Problems

Complete all of the problems below and enter your code within the supplied code block.

Problem 1: Confidence Interval for Mean (Sigma Known)

Question: A study claims that the average height of adult men in a certain country is 175 cm with a population standard deviation of 10 cm. You collect a random sample of 40 men from the population, and the sample has a mean height of 173 cm. Construct a 95% confidence interval for the true mean height of adult men in the country.

# Given values
population_sd <- 10       # population standard deviation
sample_mean   <- 173      # sample mean
n             <- 40       # sample size
z_value       <- 1.96     # z-value for 95% confidence interval

# Standard error
SE <- population_sd / sqrt(n)

# Confidence interval
lower_bound <- sample_mean - z_value * SE
upper_bound <- sample_mean + z_value * SE

# Print results
lower_bound
## [1] 169.901
upper_bound
## [1] 176.099
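
The interval above hard-codes the critical value 1.96. As a minimal sketch, the same calculation can derive the critical value from the confidence level with `qnorm()`. The helper name `z_ci_mean` and its arguments are hypothetical and are not part of the assignment.

```r
# Hypothetical helper (not part of the assignment): z-based confidence
# interval for a mean when the population standard deviation is known.
z_ci_mean <- function(sample_mean, population_sd, n, conf_level = 0.95) {
  alpha   <- 1 - conf_level
  z_value <- qnorm(1 - alpha / 2)        # about 1.96 when conf_level = 0.95
  SE      <- population_sd / sqrt(n)     # standard error of the mean
  c(lower = sample_mean - z_value * SE,
    upper = sample_mean + z_value * SE)
}

ci <- z_ci_mean(sample_mean = 173, population_sd = 10, n = 40)
round(ci, 2)

# Check whether the claimed mean of 175 cm lies inside the interval
claimed_mean   <- 175
contains_claim <- ci[["lower"]] <= claimed_mean && claimed_mean <= ci[["upper"]]
contains_claim
```

Because 175 cm lies inside the interval, the sample does not contradict the study's claim at the 95% confidence level.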

Problem 2: Confidence Interval for Mean (Sigma Unknown)

Question: A group of researchers wants to estimate the average age at which people are diagnosed with diabetes. They sample 25 individuals diagnosed with diabetes and find an average age of 52 years with a sample standard deviation of 8 years. Construct a 99% confidence interval for the true average age of diagnosis.

# Given values
sample_mean <- 52      # sample mean age
sample_sd   <- 8       # sample standard deviation
n           <- 25      # sample size
alpha       <- 0.01    # for 99% CI

# t-value for 99% CI with df = n - 1
t_value <- qt(1 - alpha/2, df = n - 1)

# Standard error
SE <- sample_sd / sqrt(n)

# Confidence interval
lower_bound <- sample_mean - t_value * SE
upper_bound <- sample_mean + t_value * SE

# Print results
lower_bound
## [1] 47.5249
upper_bound
## [1] 56.4751
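
Problem 2 uses the t distribution because sigma is estimated from the sample. The sketch below compares the t and z critical values for the same confidence level and shows how much wider the t-based margin of error is. The variable names are illustrative only.

```r
# Same inputs as Problem 2
sample_sd <- 8
n         <- 25
alpha     <- 0.01

SE     <- sample_sd / sqrt(n)
t_crit <- qt(1 - alpha / 2, df = n - 1)   # t critical value, df = 24
z_crit <- qnorm(1 - alpha / 2)            # z critical value, for comparison

# Margin of error under each critical value
margins <- c(t_margin = t_crit * SE, z_margin = z_crit * SE)
margins
```

With only 24 degrees of freedom the t critical value is larger than the z critical value, so a z-based interval would be too narrow here.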

Problem 3: Confidence Interval for a Proportion

Question: In a survey, 200 out of 500 respondents stated that they prefer shopping online rather than in physical stores. Construct a 90% confidence interval for the true proportion of people who prefer online shopping.

# Given values
x <- 200          # number who prefer online shopping
n <- 500          # total respondents
p_hat <- x / n    # sample proportion

z_value <- 1.645  # z-value for 90% CI (qnorm(0.95), rounded)

# Standard error
SE <- sqrt((p_hat * (1 - p_hat)) / n)

# Confidence interval
lower_bound <- p_hat - z_value * SE
upper_bound <- p_hat + z_value * SE

# Print results
lower_bound
## [1] 0.3639599
upper_bound
## [1] 0.4360401
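
The interval above is the normal-approximation (Wald) interval. As a hedged cross-check, the sketch below first applies a common textbook rule of thumb (at least 10 successes and 10 failures; the threshold is an assumption, not part of the assignment) and then compares the Wald interval with the Wilson score interval that `prop.test()` returns when `correct = FALSE`.

```r
x     <- 200
n     <- 500
p_hat <- x / n

# Rule-of-thumb check for the normal approximation (threshold of 10 is assumed)
counts_ok <- c(successes = n * p_hat, failures = n * (1 - p_hat)) >= 10
counts_ok

# Wald interval, using qnorm() instead of the rounded 1.645
wald <- p_hat + c(-1, 1) * qnorm(0.95) * sqrt(p_hat * (1 - p_hat) / n)

# Wilson score interval from base R (no continuity correction)
wilson <- as.numeric(prop.test(x, n, conf.level = 0.90, correct = FALSE)$conf.int)

rbind(wald = wald, wilson = wilson)
```

With a sample this large and a proportion near 0.4, the two intervals should agree closely.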

Problem 4: Confidence Interval for the Difference in Means

Question: A medical researcher is comparing the effectiveness of two treatments for a particular disease. For Treatment A, a sample of 20 patients had an average recovery time of 10 days with a sample standard deviation of 2 days. For Treatment B, a sample of 25 patients had an average recovery time of 12 days with a sample standard deviation of 3 days. Construct a 95% confidence interval for the difference in mean recovery times between the two treatments.

# Given values
mean_A <- 10
sd_A   <- 2
n_A    <- 20

mean_B <- 12
sd_B   <- 3
n_B    <- 25

# Difference in sample means (A - B)
diff_mean <- mean_A - mean_B

# Standard error for difference in means
SE <- sqrt((sd_A^2 / n_A) + (sd_B^2 / n_B))

# Welch degrees of freedom
df <- ( (sd_A^2/n_A + sd_B^2/n_B)^2 ) /
      ( ((sd_A^2/n_A)^2)/(n_A - 1) + ((sd_B^2/n_B)^2)/(n_B - 1) )

# t-value for 95% CI
t_value <- qt(0.975, df = df)

# Confidence interval
lower_bound <- diff_mean - t_value * SE
upper_bound <- diff_mean + t_value * SE

# Print results
df
## [1] 41.78401
lower_bound
## [1] -3.510425
upper_bound
## [1] -0.4895746
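
The whole interval is below zero, which suggests that mean recovery time is shorter under Treatment A than under Treatment B. As a cross-check of the hand-computed Welch interval, the sketch below builds synthetic samples that match the given summary statistics exactly and passes them to `t.test()`. The data are made up for illustration (the problem supplies only summaries), and `make_sample` is a hypothetical helper.

```r
set.seed(1)  # any seed works; the rescaling below fixes the mean and SD exactly

# Hypothetical helper: n synthetic values with exactly the requested
# sample mean and sample standard deviation.
make_sample <- function(n, target_mean, target_sd) {
  z <- as.numeric(scale(rnorm(n)))   # rescaled to sample mean 0, sample SD 1
  target_mean + target_sd * z
}

recovery_A <- make_sample(20, 10, 2)
recovery_B <- make_sample(25, 12, 3)

# t.test() uses the Welch approximation by default (var.equal = FALSE)
welch <- t.test(recovery_A, recovery_B, conf.level = 0.95)
welch$parameter   # Welch degrees of freedom
welch$conf.int    # confidence interval for mean(A) - mean(B)
```

The Welch interval depends on the data only through the sample means, standard deviations, and sizes, so the result should match the degrees of freedom and bounds computed above.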

Problem 5: Confidence Interval for the Difference in Proportions

Question: In a clinical trial, 100 out of 200 patients who received Drug A recovered from an illness, while 150 out of 300 patients who received Drug B recovered. Construct a 95% confidence interval for the difference in recovery proportions between Drug A and Drug B.

# Given values
x1 <- 100   # recovered on Drug A
n1 <- 200   # total on Drug A
p1 <- x1/n1 # proportion for Drug A

x2 <- 150   # recovered on Drug B
n2 <- 300   # total on Drug B
p2 <- x2/n2 # proportion for Drug B

# Difference in sample proportions (A - B)
diff_p <- p1 - p2

# Standard error for difference in proportions
SE <- sqrt((p1*(1-p1))/n1 + (p2*(1-p2))/n2)

# z-value for 95% CI
z_value <- 1.96

# Confidence interval
lower_bound <- diff_p - z_value * SE
upper_bound <- diff_p + z_value * SE

# Print results
lower_bound
## [1] -0.08946135
upper_bound
## [1] 0.08946135
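
Both sample proportions equal 0.5, so the interval is centered on zero and gives no evidence of a difference between the drugs. As a cross-check, the sketch below runs the same comparison through base R's `prop.test()` with `correct = FALSE`, which turns off the continuity correction so that the interval uses the same unpooled standard error as above.

```r
recovered <- c(A = 100, B = 150)   # recoveries on Drug A and Drug B
treated   <- c(A = 200, B = 300)   # patients on Drug A and Drug B

# Two-sample test of proportions without continuity correction
result <- prop.test(recovered, treated, conf.level = 0.95, correct = FALSE)

ci <- as.numeric(result$conf.int)  # interval for p_A - p_B
ci

# Does the interval contain zero?
contains_zero <- ci[1] <= 0 && 0 <= ci[2]
contains_zero
```

`prop.test()` uses `qnorm(0.975)` rather than the rounded 1.96, so its bounds may differ from the hand calculation in the later decimal places.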

Problem 6: One-Way ANOVA

Scenario: A botanist is studying the effect of five different fertilizers (Fertilizer A, Fertilizer B, Fertilizer C, Fertilizer D, Fertilizer E) on plant growth. She randomly assigns plants to one of the five fertilizers and measures their growth in centimeters after 8 weeks.

The data (in centimeters grown) for each fertilizer group is as follows:

  • Fertilizer A: 12, 15, 14, 16, 13
  • Fertilizer B: 10, 9, 11, 10, 12
  • Fertilizer C: 17, 19, 18, 16, 17
  • Fertilizer D: 14, 13, 12, 15, 14
  • Fertilizer E: 11, 13, 12, 10, 11

Task:

Use a One-Way ANOVA to determine if there is a significant difference in mean growth across the five fertilizers.

# Fertilizer data
A <- c(12, 15, 14, 16, 13)
B <- c(10, 9, 11, 10, 12)
C <- c(17, 19, 18, 16, 17)
D <- c(14, 13, 12, 15, 14)
E <- c(11, 13, 12, 10, 11)

# Combine into one vector
growth <- c(A, B, C, D, E)

# Create group labels
group <- factor(rep(c("A", "B", "C", "D", "E"), each = 5))

# Run One-Way ANOVA
anova_result <- aov(growth ~ group)

# Show ANOVA summary
summary(anova_result)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## group        4  147.0   36.74   23.86 2.26e-07 ***
## Residuals   20   30.8    1.54                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Questions:

  1. Based on the ANOVA result, is there a significant difference in growth across the five fertilizers?

Yes. The p-value (about 2.26e-07) is far below 0.05, so we reject the null hypothesis that all five fertilizers produce the same mean growth. At least one fertilizer's mean growth differs from the others; the ANOVA alone does not identify which ones.

  2. What is the F-value and p-value?

F-value: ≈ 23.86; p-value: ≈ 0.000000226 (2.26e-07)
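
To show where the F-value comes from, the sketch below rebuilds the ANOVA table by hand from the between-group and within-group sums of squares. It uses only the data given in the problem; the variable names are illustrative.

```r
groups <- list(
  A = c(12, 15, 14, 16, 13),
  B = c(10, 9, 11, 10, 12),
  C = c(17, 19, 18, 16, 17),
  D = c(14, 13, 12, 15, 14),
  E = c(11, 13, 12, 10, 11)
)

k          <- length(groups)         # number of groups
N          <- sum(lengths(groups))   # total number of observations
grand_mean <- mean(unlist(groups))

# Between-group sum of squares: group size times squared distance of each
# group mean from the grand mean
SS_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))

# Within-group sum of squares: squared deviations from each group's own mean
SS_within <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

MS_between <- SS_between / (k - 1)   # df = 4
MS_within  <- SS_within / (N - k)    # df = 20

F_value <- MS_between / MS_within
p_value <- pf(F_value, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

c(SS_between = SS_between, SS_within = SS_within, F = F_value, p = p_value)
```

These values should reproduce the `Sum Sq`, `F value`, and `Pr(>F)` columns of the `summary()` output above.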

Problem 7: Two-Way ANOVA

Scenario: An educational researcher is interested in studying the effects of different study methods (Self-Study, Group Study) and exam difficulty levels (Easy, Moderate, Difficult) on students’ test scores. The researcher collects scores from students across both study methods and difficulty levels.

The test scores are as follows:

  • Self-Study (Easy): 88, 90, 85, 87
  • Self-Study (Moderate): 78, 80, 77, 76
  • Self-Study (Difficult): 65, 68, 70, 67
  • Group Study (Easy): 92, 95, 93, 90
  • Group Study (Moderate): 82, 84, 83, 81
  • Group Study (Difficult): 70, 72, 69, 71

Task:

Use a Two-Way ANOVA to analyze the effect of study method and exam difficulty on test scores.

scores <- c(
  88, 90, 85, 87,   # Self-Study, Easy
  78, 80, 77, 76,   # Self-Study, Moderate
  65, 68, 70, 67,   # Self-Study, Difficult
  92, 95, 93, 90,   # Group Study, Easy
  82, 84, 83, 81,   # Group Study, Moderate
  70, 72, 69, 71    # Group Study, Difficult
)

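# Study method: the first 12 scores are Self-Study, the last 12 are Group Study
StudyMethod <- factor(rep(c("SelfStudy", "GroupStudy"), each = 12))

# Exam difficulty: blocks of 4 scores per level, repeated once per study method
ExamDifficulty <- factor(rep(c("Easy", "Moderate", "Difficult"),
                             each = 4, times = 2))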

# Create dataframe
data <- data.frame(scores, StudyMethod, ExamDifficulty)

# Run Two-Way ANOVA with both main effects and their interaction
anova_result <- aov(scores ~ StudyMethod * ExamDifficulty, data = data)

# Display ANOVA table
summary(anova_result)
##                            Df Sum Sq Mean Sq F value   Pr(>F)    
## StudyMethod                 1  108.4   108.4   33.78 1.66e-05 ***
## ExamDifficulty              2 1766.1   883.0  275.23 3.20e-14 ***
## StudyMethod:ExamDifficulty  2    4.7     2.4    0.74    0.491    
## Residuals                  18   57.8     3.2                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
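
In the table above, both main effects are significant (study method, p ≈ 1.66e-05; exam difficulty, p ≈ 3.20e-14), while the interaction is not (p = 0.491). As a rough sketch of how to read that result, the code below tabulates the cell means and the Group Study advantage at each difficulty level. It rebuilds the data so that it runs on its own; `cell_means` and `group_advantage` are illustrative names.

```r
scores <- c(
  88, 90, 85, 87,   # Self-Study, Easy
  78, 80, 77, 76,   # Self-Study, Moderate
  65, 68, 70, 67,   # Self-Study, Difficult
  92, 95, 93, 90,   # Group Study, Easy
  82, 84, 83, 81,   # Group Study, Moderate
  70, 72, 69, 71    # Group Study, Difficult
)
StudyMethod    <- factor(rep(c("SelfStudy", "GroupStudy"), each = 12))
ExamDifficulty <- factor(rep(c("Easy", "Moderate", "Difficult"),
                             each = 4, times = 2))

# Mean score in each method-by-difficulty cell
cell_means <- tapply(scores, list(StudyMethod, ExamDifficulty), mean)
cell_means

# Group Study minus Self-Study at each difficulty level
group_advantage <- cell_means["GroupStudy", ] - cell_means["SelfStudy", ]
group_advantage
```

Group Study scores are higher at every difficulty level, and the gaps are of similar size, which is consistent with a non-significant interaction term.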