Complete all of the problems below and enter your code within the supplied code block.
Question: A study claims that the average height of adult men in a certain country is 175 cm with a population standard deviation of 10 cm. You collect a random sample of 40 men from the population, and the sample has a mean height of 173 cm. Construct a 95% confidence interval for the true mean height of adult men in the country.
# Given values
population_sd <- 10 # population standard deviation
sample_mean <- 173 # sample mean
n <- 40 # sample size
z_value <- 1.96 # z-value for 95% confidence interval
# Standard error
SE <- population_sd / sqrt(n)
# Confidence interval
lower_bound <- sample_mean - z_value * SE
upper_bound <- sample_mean + z_value * SE
# Print results
lower_bound
## [1] 169.901
upper_bound
## [1] 176.099
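The hard-coded 1.96 above is a rounded critical value. As a minimal sketch, the same interval can be computed with the critical value taken from `qnorm()`. The helper name `z_interval` and the `claim_inside` check are illustrative additions, not part of the original solution.

```r
# Hypothetical helper (not in the original solution): z-based confidence
# interval for a mean when the population standard deviation is known.
z_interval <- function(sample_mean, population_sd, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  z_value <- qnorm(1 - alpha / 2)   # critical value, about 1.96 for 95%
  SE <- population_sd / sqrt(n)     # standard error of the mean
  c(lower = sample_mean - z_value * SE,
    upper = sample_mean + z_value * SE)
}

ci <- z_interval(sample_mean = 173, population_sd = 10, n = 40)
ci

# Does the claimed mean of 175 cm fall inside the interval?
claim_inside <- ci[["lower"]] <= 175 && 175 <= ci[["upper"]]
claim_inside
```

The bounds agree with the values above to three decimal places. The claimed mean of 175 cm lies inside the interval, so this sample does not contradict the study's claim at the 5% level.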
Question: A group of researchers wants to estimate the average age at which people are diagnosed with diabetes. They sample 25 individuals diagnosed with diabetes and find an average age of 52 years with a sample standard deviation of 8 years. Construct a 99% confidence interval for the true average age of diagnosis.
# Given values
sample_mean <- 52 # sample mean age
sample_sd <- 8 # sample standard deviation
n <- 25 # sample size
alpha <- 0.01 # for 99% CI
# t-value for 99% CI with df = n - 1
t_value <- qt(1 - alpha/2, df = n - 1)
# Standard error
SE <- sample_sd / sqrt(n)
# Confidence interval
lower_bound <- sample_mean - t_value * SE
upper_bound <- sample_mean + t_value * SE
# Print results
lower_bound
## [1] 47.5249
upper_bound
## [1] 56.4751
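This problem uses a t critical value because the population standard deviation is unknown and the sample is small. The sketch below compares the t and normal critical values for the same data to show how much wider the t-based margin of error is. The names `margin_t` and `margin_z` are introduced here for illustration only.

```r
n <- 25          # sample size
alpha <- 0.01    # for a 99% CI
SE <- 8 / sqrt(n)  # sample_sd / sqrt(n)

t_value <- qt(1 - alpha / 2, df = n - 1)  # heavier tails for small n
z_value <- qnorm(1 - alpha / 2)           # normal critical value, for comparison

# Margin of error (half-width of the interval) under each critical value
margin_t <- t_value * SE
margin_z <- z_value * SE

c(t_value = t_value, z_value = z_value)
c(margin_t = margin_t, margin_z = margin_z)
```

The t critical value is larger (about 2.80 versus 2.58), so a z-based interval would be too narrow for a sample of this size.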
Question: In a survey, 200 out of 500 respondents stated that they prefer shopping online rather than in physical stores. Construct a 90% confidence interval for the true proportion of people who prefer online shopping.
# Given values
x <- 200 # number who prefer online shopping
n <- 500 # total respondents
p_hat <- x / n # sample proportion
z_value <- 1.645 # z-value for 90% CI
# Standard error
SE <- sqrt((p_hat * (1 - p_hat)) / n)
# Confidence interval
lower_bound <- p_hat - z_value * SE
upper_bound <- p_hat + z_value * SE
# Print results
lower_bound
## [1] 0.3639599
upper_bound
## [1] 0.4360401
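The interval above is the Wald (normal approximation) interval. As a rough cross-check, assuming a Wilson score interval is an acceptable alternative, `prop.test()` from base R can be called with the continuity correction switched off. The object name `wilson` is an illustrative choice.

```r
x <- 200   # number who prefer online shopping
n <- 500   # total respondents

# Rule-of-thumb check for the normal approximation:
# both the success and failure counts should be at least 10.
c(successes = x, failures = n - x)

# Wilson score interval (prop.test with correct = FALSE)
wilson <- prop.test(x, n, conf.level = 0.90, correct = FALSE)
wilson$conf.int
```

With a sample this large and a proportion this far from 0 and 1, the Wilson bounds should land very close to the Wald bounds computed above.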
Question: A medical researcher is comparing the effectiveness of two treatments for a particular disease. For Treatment A, a sample of 20 patients had an average recovery time of 10 days with a sample standard deviation of 2 days. For Treatment B, a sample of 25 patients had an average recovery time of 12 days with a sample standard deviation of 3 days. Construct a 95% confidence interval for the difference in mean recovery times between the two treatments.
# Given values
mean_A <- 10
sd_A <- 2
n_A <- 20
mean_B <- 12
sd_B <- 3
n_B <- 25
# Difference in sample means (A - B)
diff_mean <- mean_A - mean_B
# Standard error for difference in means
SE <- sqrt((sd_A^2 / n_A) + (sd_B^2 / n_B))
# Welch degrees of freedom
df <- ( (sd_A^2/n_A + sd_B^2/n_B)^2 ) /
  ( ((sd_A^2/n_A)^2)/(n_A - 1) + ((sd_B^2/n_B)^2)/(n_B - 1) )
# t-value for 95% CI
t_value <- qt(0.975, df = df)
# Confidence interval
lower_bound <- diff_mean - t_value * SE
upper_bound <- diff_mean + t_value * SE
# Print results
df
## [1] 41.78401
lower_bound
## [1] -3.510425
upper_bound
## [1] -0.4895746
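The solution above uses the Welch approach, which does not assume equal variances. For comparison, here is a sketch of the pooled-variance interval, which does assume the two treatments share a common population variance. The names `var_pooled`, `SE_pooled`, and `ci_pooled` are introduced here and are not part of the original solution.

```r
mean_A <- 10; sd_A <- 2; n_A <- 20
mean_B <- 12; sd_B <- 3; n_B <- 25

# Pooled variance (assumes equal population variances, unlike Welch)
df_pooled <- n_A + n_B - 2
var_pooled <- ((n_A - 1) * sd_A^2 + (n_B - 1) * sd_B^2) / df_pooled
SE_pooled <- sqrt(var_pooled * (1 / n_A + 1 / n_B))

# 95% CI for the difference in means (A - B)
t_pooled <- qt(0.975, df = df_pooled)
diff_mean <- mean_A - mean_B
ci_pooled <- c(lower = diff_mean - t_pooled * SE_pooled,
               upper = diff_mean + t_pooled * SE_pooled)
ci_pooled
```

Both the Welch and pooled intervals lie entirely below zero, so the conclusion that Treatment A has the shorter mean recovery time does not depend on the equal-variance assumption here.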
Question: In a clinical trial, 100 out of 200 patients who received Drug A recovered from an illness, while 150 out of 300 patients who received Drug B recovered. Construct a 95% confidence interval for the difference in recovery proportions between Drug A and Drug B.
# Given values
x1 <- 100 # recovered on Drug A
n1 <- 200 # total on Drug A
p1 <- x1/n1 # proportion for Drug A
x2 <- 150 # recovered on Drug B
n2 <- 300 # total on Drug B
p2 <- x2/n2 # proportion for Drug B
# Difference in sample proportions (A - B)
diff_p <- p1 - p2
# Standard error for difference in proportions
SE <- sqrt((p1*(1-p1))/n1 + (p2*(1-p2))/n2)
# z-value for 95% CI
z_value <- 1.96
# Confidence interval
lower_bound <- diff_p - z_value * SE
upper_bound <- diff_p + z_value * SE
# Print results
lower_bound
## [1] -0.08946135
upper_bound
## [1] 0.08946135
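The hand calculation can be cross-checked with base R's `prop.test()`, which reports a confidence interval for the difference of two proportions. This is a sketch with the continuity correction turned off so that it is comparable to the formula above. The object names `recovered`, `treated`, and `check` are illustrative.

```r
recovered <- c(100, 150)  # Drug A, Drug B
treated   <- c(200, 300)

check <- prop.test(recovered, treated, conf.level = 0.95, correct = FALSE)
check$estimate   # sample proportions for Drug A and Drug B
check$conf.int   # CI for the difference (A - B)
```

Both sample proportions are 0.5, so the interval is centered on zero and the data show no evidence of a difference in recovery rates.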
Scenario: A botanist is studying the effect of five different fertilizers (Fertilizer A, Fertilizer B, Fertilizer C, Fertilizer D, Fertilizer E) on plant growth. She randomly assigns plants to one of the five fertilizers and measures their growth in centimeters after 8 weeks.
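The data (growth in centimeters) for each fertilizer group are as follows:
Fertilizer A: 12, 15, 14, 16, 13
Fertilizer B: 10, 9, 11, 10, 12
Fertilizer C: 17, 19, 18, 16, 17
Fertilizer D: 14, 13, 12, 15, 14
Fertilizer E: 11, 13, 12, 10, 11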
Task:
Use a One-Way ANOVA to determine if there is a significant difference in mean growth across the five fertilizers.
# Fertilizer data
A <- c(12, 15, 14, 16, 13)
B <- c(10, 9, 11, 10, 12)
C <- c(17, 19, 18, 16, 17)
D <- c(14, 13, 12, 15, 14)
E <- c(11, 13, 12, 10, 11)
# Combine into one vector
growth <- c(A, B, C, D, E)
# Create group labels
group <- factor(rep(c("A", "B", "C", "D", "E"), each = 5))
# Run One-Way ANOVA
anova_result <- aov(growth ~ group)
# Show ANOVA summary
summary(anova_result)
##             Df Sum Sq Mean Sq F value   Pr(>F)
## group        4  147.0   36.74   23.86 2.26e-07 ***
## Residuals   20   30.8    1.54
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
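The F test (F = 23.86, p = 2.26e-07) indicates that at least one fertilizer mean differs, but it does not say which ones. One common follow-up, sketched here as an optional extra rather than as part of the assigned task, is Tukey's HSD via base R's `TukeyHSD()`. The names `fit`, `group_means`, and `tukey` are illustrative.

```r
growth <- c(12, 15, 14, 16, 13,   # Fertilizer A
            10,  9, 11, 10, 12,   # Fertilizer B
            17, 19, 18, 16, 17,   # Fertilizer C
            14, 13, 12, 15, 14,   # Fertilizer D
            11, 13, 12, 10, 11)   # Fertilizer E
group <- factor(rep(c("A", "B", "C", "D", "E"), each = 5))

fit <- aov(growth ~ group)

# Group means, to see the direction of any differences
group_means <- tapply(growth, group, mean)
group_means

# Tukey's HSD: all pairwise comparisons at a family-wise 95% confidence level
tukey <- TukeyHSD(fit, conf.level = 0.95)
tukey
```

Each row of the Tukey table is one pair of fertilizers. A pair whose interval excludes zero differs significantly at the family-wise 5% level.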
Questions:
Scenario: An educational researcher is interested in studying the effects of different study methods (Self-Study, Group Study) and exam difficulty levels (Easy, Moderate, Difficult) on students’ test scores. The researcher collects scores from students across both study methods and difficulty levels.
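The test scores are as follows:
Self-Study, Easy: 88, 90, 85, 87
Self-Study, Moderate: 78, 80, 77, 76
Self-Study, Difficult: 65, 68, 70, 67
Group Study, Easy: 92, 95, 93, 90
Group Study, Moderate: 82, 84, 83, 81
Group Study, Difficult: 70, 72, 69, 71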
Task:
Use a Two-Way ANOVA to analyze the effect of study method and exam difficulty on test scores.
scores <- c(
  88, 90, 85, 87, # Self-Study, Easy
  78, 80, 77, 76, # Self-Study, Moderate
  65, 68, 70, 67, # Self-Study, Difficult
  92, 95, 93, 90, # Group Study, Easy
  82, 84, 83, 81, # Group Study, Moderate
  70, 72, 69, 71  # Group Study, Difficult
)
StudyMethod <- factor(rep(c("SelfStudy", "GroupStudy"), each = 12))
ExamDifficulty <- factor(rep(c("Easy", "Moderate", "Difficult"),
                             each = 4, times = 2))
# Create dataframe
data <- data.frame(scores, StudyMethod, ExamDifficulty)
# Run Two-Way ANOVA
anova_result <- aov(scores ~ StudyMethod * ExamDifficulty, data = data)
# Display ANOVA table
summary(anova_result)
##                            Df Sum Sq Mean Sq F value   Pr(>F)
## StudyMethod                 1  108.4   108.4   33.78 1.66e-05 ***
## ExamDifficulty              2 1766.1   883.0  275.23 3.20e-14 ***
## StudyMethod:ExamDifficulty  2    4.7     2.4    0.74    0.491
## Residuals                  18   57.8     3.2
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
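Both main effects are significant, while the interaction is not (p = 0.491). To see what that means in the data, the sketch below tabulates the cell means and, as one possible follow-up, refits the model without the interaction term. The names `cell_means` and `additive_fit` are illustrative and not part of the original solution.

```r
scores <- c(
  88, 90, 85, 87, # Self-Study, Easy
  78, 80, 77, 76, # Self-Study, Moderate
  65, 68, 70, 67, # Self-Study, Difficult
  92, 95, 93, 90, # Group Study, Easy
  82, 84, 83, 81, # Group Study, Moderate
  70, 72, 69, 71  # Group Study, Difficult
)
StudyMethod <- factor(rep(c("SelfStudy", "GroupStudy"), each = 12))
ExamDifficulty <- factor(rep(c("Easy", "Moderate", "Difficult"),
                             each = 4, times = 2))

# Cell means: rows are study methods, columns are difficulty levels
# (factor levels are sorted alphabetically by default)
cell_means <- tapply(scores, list(StudyMethod, ExamDifficulty), mean)
cell_means

# Additive model: main effects only, dropping the non-significant interaction
additive_fit <- aov(scores ~ StudyMethod + ExamDifficulty)
summary(additive_fit)
```

Group Study has the higher mean at every difficulty level, and the gap is of similar size in each column, which is the pattern a non-significant interaction suggests.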