A nonprofit wants to understand the fraction of households that have elevated levels of lead in their drinking water. They expect at least 5% of homes will have elevated levels of lead, but not more than about 30%. They randomly sample 800 homes and work with the owners to retrieve water samples, and they compute the fraction of these homes with elevated lead levels. They repeat this 1,000 times and build a distribution of sample proportions.
(a) What is this distribution called?
This is called a sampling distribution; specifically, the sampling distribution of the sample proportion.
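# sample size and number of samples (the problem uses 1,000 samples)
n <- 800
reps <- 1000
# assumed true proportion of homes with elevated lead (8%, as in part (c))
p <- 0.08
# simulate 1,000 sample proportions: affected homes out of n, divided by n
sample.dist <- rbinom(reps, size = n, prob = p) / n
# histogram of the simulated proportions with the normal approximation overlaid
hist(sample.dist, ylim = c(0, 60), col = "blue", freq = FALSE, breaks = 20)
curve(dnorm(x, mean = p, sd = sqrt(p * (1 - p) / n)), col = "red", lwd = 3, add = TRUE)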
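As a sketch of the procedure described in the problem, the code below builds each sample home by home: every home is coded 1 (elevated lead) or 0, the sample proportion is the mean of those indicators, and this is repeated 1,000 times. The true proportion p = 0.08 is an assumption borrowed from part (c), the seed is arbitrary, and `one_sample_prop` is a hypothetical helper defined only for this illustration.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible
n <- 800      # homes per sample
reps <- 1000  # number of samples, as in the problem
p <- 0.08     # ASSUMED true proportion of homes with elevated lead

# Hypothetical helper: one sample of n homes, each coded 1 (elevated lead)
# with probability p and 0 otherwise; the sample proportion is their mean.
one_sample_prop <- function() {
  homes <- sample(c(1, 0), size = n, replace = TRUE, prob = c(p, 1 - p))
  mean(homes)
}

# Repeating the sampling reps times gives the sampling distribution of p-hat
p_hat <- replicate(reps, one_sample_prop())

summary(p_hat)
hist(p_hat, freq = FALSE, breaks = 20, col = "lightblue",
     main = "Simulated sampling distribution of p-hat",
     xlab = "sample proportion")
```

The mean of `p_hat` should sit close to the assumed 0.08, and the histogram approximates the sampling distribution asked about in (a).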
(b) Would you expect the shape of this distribution to be symmetric, right skewed, or left skewed? Explain your reasoning.
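The sampling distribution of a sample proportion is approximately normal, and therefore roughly symmetric, when the success-failure condition holds: np and n(1 − p) must both be at least 10. Because the nonprofit expects p to be at least 5%, we check the smallest plausible value, p = 0.05: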
n <- 800
p <- 0.05
n*p
## [1] 40
n * (1 - p)
## [1] 760
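Both np = 40 and n(1 − p) = 760 are at least 10. At the other end of the expected range, p = 0.30, we get np = 240 and n(1 − p) = 560, which also pass. The success-failure condition is therefore satisfied across the whole plausible range, and since the homes are randomly sampled, the sampling distribution should be approximately symmetric (nearly normal) rather than right or left skewed.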
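A minimal sketch that applies the success-failure condition at both ends of the expected range (5% and 30%) in one step; `success_failure_ok` is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: success-failure condition, n*p >= 10 and n*(1 - p) >= 10
success_failure_ok <- function(n, p) {
  (n * p >= 10) & (n * (1 - p) >= 10)
}

n <- 800
# ends of the range the nonprofit expects for the true proportion
p_range <- c(0.05, 0.30)

# expected successes and failures at each end, and whether the condition holds
data.frame(p = p_range,
           successes = n * p_range,
           failures = n * (1 - p_range),
           condition_met = success_failure_ok(n, p_range))
```

Because `&` is vectorized, the helper checks every value in `p_range` at once; both rows should report `TRUE`.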
(c) If the proportions are distributed around 8%, what is the variability of the distribution?
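The variability is measured by the standard deviation of the sample proportions, sqrt(p(1 − p)/n), evaluated at p = 0.08 and n = 800:
n1 <- 800
p1 <- 0.08
sqrt((p1 * (1 - p1)) / n1)
## [1] 0.009591663
So the sample proportions typically vary by about 0.0096, roughly one percentage point, around 8%.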
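As a rough cross-check, assuming the true proportion is exactly 8%, the formula value can be compared with the standard deviation of 1,000 simulated sample proportions. The seed is arbitrary; other seeds give slightly different simulated values, all near 0.0096.

```r
set.seed(2024)  # arbitrary seed for reproducibility
n <- 800
p <- 0.08       # proportion given in part (c)
reps <- 1000

# formula-based variability of a sample proportion
se_formula <- sqrt(p * (1 - p) / n)

# simulation-based check: spread of 1,000 simulated sample proportions
p_hat <- rbinom(reps, size = n, prob = p) / n
se_simulated <- sd(p_hat)

c(formula = se_formula, simulated = se_simulated)
```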
(d) What is the formal name of the value you computed in (c)?
The standard error (SE) of the sample proportion.
(e) Suppose the researchers’ budget is reduced, and they are only able to collect 250 observations per sample, but they can still collect 1,000 samples. They build a new distribution of sample proportions. How will the variability of this new distribution compare to the variability of the distribution when each sample contained 800 observations?
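The variability of the new distribution will be larger, because each sample is smaller (250 instead of 800 observations). The standard error is proportional to 1/sqrt(n), so it grows by a factor of sqrt(800/250) ≈ 1.79; at p = 0.08 it rises from about 0.0096 to about 0.0172. Keeping 1,000 samples only affects how closely the histogram approximates the sampling distribution, not the spread of that distribution.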
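A short sketch comparing the two standard errors, again assuming the proportions center on 8% as in part (c); `se_prop` is a hypothetical helper implementing sqrt(p(1 − p)/n).

```r
# Hypothetical helper: standard error of a sample proportion, sqrt(p(1 - p)/n)
se_prop <- function(p, n) sqrt(p * (1 - p) / n)

p <- 0.08  # ASSUMED center, same as part (c)
se_800 <- se_prop(p, 800)
se_250 <- se_prop(p, 250)

# SEs for both sample sizes and how many times larger the n = 250 SE is
round(c(n800 = se_800, n250 = se_250, ratio = se_250 / se_800), 4)
```

This gives about 0.0096 for n = 800 and 0.0172 for n = 250. The ratio, sqrt(800/250) ≈ 1.79, does not depend on the assumed p.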