# Question 1

Imagine you are a public health researcher tasked with conducting a study to estimate the prevalence of a specific disease (e.g., diabetes, heart disease, cancer) in a given community. Your goal is to determine how large your sample should be to ensure that your estimates are reliable and accurate. Address the following tasks:

For this assignment, we use diabetes, with an assumed general prevalence of 10%, as an example.

  1. Use the formula for the sample size needed to estimate a proportion, with a 95% confidence level and a 5% margin of error, for your initial calculation.
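The formula in task 1 is the standard sample-size formula for estimating a single proportion. It assumes simple random sampling and the large-sample normal approximation, with no finite-population correction. Here p is the expected prevalence, d is the margin of error on the proportion scale, and z is the standard normal critical value. A worked version with this assignment's numbers (p = 0.10, d = 0.05, 95% confidence, so z ≈ 1.96):

$$
n = \frac{z_{1-\alpha/2}^{2}\, p\,(1-p)}{d^{2}}
  = \frac{(1.96)^{2} \times 0.10 \times 0.90}{(0.05)^{2}}
  \approx 138.3
$$

Because a sample cannot contain a fraction of a participant, this is rounded up to n = 139.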
```r
# Set the estimated prevalence rate of the disease to 10%
prev <- 0.1

# Set the significance level to 0.05 for a 95% confidence level
alpha <- 0.05

# Calculate the z-score for a 95% confidence level (1 point)
z_score <- qnorm(1 - alpha / 2)

# Display the z-score
print(z_score)
## [1] 1.959964

# Set the desired margin of error to 5%
d <- 0.05

# Calculate the required sample size using the formula for proportions (1 point)
sample_size <- z_score^2 * prev * (1 - prev) / (d^2)
sample_size
## [1] 138.2925

# Round up the calculated sample size to the nearest whole number
ceiling(sample_size)
## [1] 139
```
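The same calculation recurs in every task below, so it can be wrapped in a small function. This is a minimal sketch: the name `sample_size_prop` and its `conf` argument are hypothetical and not part of the assignment. The function derives z from the confidence level and rounds up with `ceiling()`, as the code above does.

```r
# Hypothetical helper (not from the assignment): sample size needed to
# estimate a proportion p within margin of error d at a given confidence level.
sample_size_prop <- function(p, d, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)    # two-sided critical value
  ceiling(z^2 * p * (1 - p) / d^2)  # round up to whole participants
}

sample_size_prop(p = 0.10, d = 0.05)  # returns 139, matching the result above
```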

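1.1 Explain clearly what a 5% margin of error means. (1 point) It means that, at a 95% confidence level, we expect the true prevalence to lie within 5 percentage points of our sample estimate. For an estimated diabetes prevalence of 10%, we are 95% confident that the true prevalence in the community is between 5% and 15%.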

1.2 If you estimated that 20% of adults in a community have hypertension with a 5% margin of error, does it mean that the actual percentage of the population with hypertension could range from 5% to 20%? (1 point) No. The margin of error is subtracted from and added to the estimate, so the plausible range is 20% ± 5 percentage points, that is, 15% to 25%.
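The interpretations in 1.1 and 1.2 can be checked numerically. The interval is the estimate minus and plus the margin of error. The helper name `ci_bounds` is hypothetical, and the sketch uses the planned margin d rather than one recomputed from observed data.

```r
# Hypothetical helper: interval implied by an estimate and its margin of error d
ci_bounds <- function(estimate, d) c(lower = estimate - d, upper = estimate + d)

ci_bounds(0.10, 0.05)  # diabetes example: 0.05 to 0.15
ci_bounds(0.20, 0.05)  # hypertension example: 0.15 to 0.25
```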

  2. Perform the calculation for different levels of estimated prevalence (e.g., if the prevalence of the disease is X%, redo the calculation for 0.5X, 2X, and 3X prevalence rates to see how the expected prevalence affects the required sample size).
```r
# Sample size calculation for 0.5X prevalence (1 point)
prev.5 <- prev * 0.5
sample_size.5 <- z_score^2 * prev.5 * (1 - prev.5) / (d^2)
sample_size.5
## [1] 72.98772

# Sample size calculation for 2X prevalence (1 point)
prev2 <- prev * 2
sample_size2 <- z_score^2 * prev2 * (1 - prev2) / (d^2)
sample_size2
## [1] 245.8534

# Sample size calculation for 3X prevalence (1 point)
prev3 <- prev * 3
sample_size3 <- z_score^2 * prev3 * (1 - prev3) / (d^2)
sample_size3
## [1] 322.6825
```

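2.1 What happens to the required sample size when the prevalence increases to 2x and 3x the original 10% rate? (1 point) The required sample size increases, from 139 at 10% prevalence to 246 at 20% and 323 at 30% (and it falls to 73 at 5%). The formula depends on prevalence through the term p(1 - p), which grows as p approaches 0.5, so a prevalence closer to 50% means more variability in the outcome and requires a larger sample.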
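The explanation in 2.1 can be checked by tabulating p(1 - p) over a range of prevalences. This is a sketch that keeps the assignment's 95% confidence and 5% margin. The extra prevalence values 0.50 and 0.70 are added for illustration and show that the required sample size peaks at p = 0.5 and is symmetric around it.

```r
# Sample size depends on prevalence only through p * (1 - p),
# which is largest at p = 0.5 and symmetric around it.
z <- qnorm(0.975)  # 95% confidence
d <- 0.05          # 5% margin of error
p <- c(0.05, 0.10, 0.20, 0.30, 0.50, 0.70)

by_prevalence <- data.frame(
  prevalence  = p,
  variance    = p * (1 - p),
  sample_size = ceiling(z^2 * p * (1 - p) / d^2)
)
by_prevalence
```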

  3. Redo your calculations with the margin of error changed to 2.5% and 7.5%, keeping the confidence level at 95%.
```r
# Sample size calculation for a 2.5% margin of error (1 point)
d <- 0.025
sample_size2.5d <- z_score^2 * prev * (1 - prev) / (d^2)
sample_size2.5d
## [1] 553.1701

# Sample size calculation for a 7.5% margin of error (1 point)
d <- 0.075
sample_size7.5d <- z_score^2 * prev * (1 - prev) / (d^2)
sample_size7.5d
## [1] 61.46334
```

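3.1 What happens to the required sample size when we decrease the margin of error? Why? (2 points) The required sample size increases: 554 participants are needed for a 2.5% margin of error versus 139 for 5%, while relaxing the margin to 7.5% needs only 62. A smaller margin of error demands a more precise estimate. Because d is squared in the denominator of n = z^2 p(1 - p) / d^2, halving the margin of error quadruples the required sample size.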
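The inverse-square relationship in 3.1 can be verified directly. This sketch holds prevalence at 10% and confidence at 95%. The function name `n_for_margin` is hypothetical and reads `z` and `p` from the enclosing environment for brevity.

```r
# n scales with 1 / d^2: halving the margin of error quadruples n.
z <- qnorm(0.975)  # 95% confidence
p <- 0.10          # assumed prevalence
n_for_margin <- function(d) z^2 * p * (1 - p) / d^2  # hypothetical helper

d_values <- c(0.075, 0.05, 0.025)
n_values <- n_for_margin(d_values)
round(n_values, 2)                        # approx. 61.46, 138.29, 553.17
n_for_margin(0.025) / n_for_margin(0.05)  # ratio of 4
```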

  4. Redo your calculations with the confidence level changed to 90% and 99%, keeping the margin of error at 5%.
```r
# Sample size calculation for a 90% confidence level (1 point)
alpha <- 0.1
d <- 0.05
z_score <- qnorm(1 - alpha / 2)
sample_size90CI <- z_score^2 * prev * (1 - prev) / (d^2)
sample_size90CI
## [1] 97.39956

# Sample size calculation for a 99% confidence level (1 point)
alpha <- 0.01
d <- 0.05
z_score <- qnorm(1 - alpha / 2)
sample_size99CI <- z_score^2 * prev * (1 - prev) / (d^2)
sample_size99CI
## [1] 238.8563
```

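4.1 How does changing the confidence level affect the required sample size? (1 point) A higher confidence level requires a larger sample: 98 participants at 90%, 139 at 95%, and 239 at 99%. Raising the confidence level increases the critical value z (about 1.645, 1.960, and 2.576, respectively), and the sample size grows with z^2.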
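The effect described in 4.1 can be summarized in one table. This sketch keeps the assignment's 10% prevalence and 5% margin of error and varies only the confidence level, which enters the formula through z^2.

```r
# Confidence level affects n only through the squared critical value z^2.
conf <- c(0.90, 0.95, 0.99)
z <- qnorm(1 - (1 - conf) / 2)  # two-sided critical values
p <- 0.10                       # assumed prevalence
d <- 0.05                       # margin of error

by_confidence <- data.frame(
  confidence  = conf,
  z           = round(z, 3),
  sample_size = ceiling(z^2 * p * (1 - p) / d^2)
)
by_confidence
```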