Introduction: The US Geological Survey estimates the 29% of the Earth’s surface is covered with land. We can use this information and the binomial distribution to simulate our globe tossing excercise.
# Tossing the globe
# Tossing the globe is just another example of a binomial experiment. So
# we can set up a function (similar to what we did when studying the
# sampling distribution for a proportion) that does the following:
#
# 1. Tosses the globe "n" times and calculates the proportion
# of times we hit land.
# 2. Calculates the standard error
# 3. Calculates a confidence interval for a given value of "z"
# (i.e., for a given confidence level).
# Here's our function:
globe_tossing_ci <- function(n, z, pie = .29){
# Arguments:
# n = the sample size
# z = the standard normal variate for a given confidence level
# pie = the population proportion (or probability)
# Toss the globe "n" times and calculate proportion of land
p <- rbinom(1, n, prob = pie)/n
# Calculate the standard error
se <- sqrt((p*(1-p))/n)
# Calculate the confidence interval for a given "z"
low <- p - z*se
up <- p + z*se
# Return the "point" estimate and confidence interval
return(c(p, low, up))
}
Next, we can use the “replicate” function to repeatedly sample our globe_tossing_ci to get many different confidence intervals.
sims <- 100
cis <- as.data.frame(t(replicate(sims, globe_tossing_ci(n = 10000, z = 1.96))))
names(cis) <- c("p", "low", "up")
We have everything we need to calculate the “coverage” associated with our confidence interval:
cis$coverage <- ifelse(cis$low <= 0.29 & cis$up >= 0.29, 1, 0)
mean(cis$coverage)
## [1] 0.97
This last step is not necessary, but we can visualize coverage using ggplot2 with the following:
library(ggplot2)
cis$labels <- rownames(cis)
plt <- ggplot(cis, aes(x = labels, y = p)) +
geom_hline(yintercept = 0.29, color = "red") +
geom_pointrange(aes(ymax = up, ymin = low, color = coverage), size = 1, shape = 32) +
geom_point(aes(color = coverage)) +
scale_y_continuous(limits = c(.21,.38)) +
ylab("Proportion Land\n") +
xlab("") +
coord_flip() +
theme_bw() +
theme(
axis.text.y = element_blank(),
legend.position="none"
)
plt
Assume that we are interested in the proportion of lectures taking part in the upcomming strike. Assume further that we take a random sample of = 150 lectures and estimate the proportion participating to be p = 0.33 . Construct the 95% confience interval using the following formula: p +/- z * sqrt(p(1-p)/n)
Step 1: we start by calculating the standard error
#sqrt(p(1-p)/n)
se <- sqrt(0.33*(1 - 0.33)/150)
Step 2: plug in the standard error now the formula is: p +/- z * se, se is standard error The standard error of the sample mean depends on both the standard deviation and the sample size, 0.33 +/- z * se what is the value of z at 95% confidence interval? look in the table with z values, for 95% confidence interval, value of z = 1.96
0.33 - 1.96 * se
## [1] 0.2547503
0.33 + 1.96 * se
## [1] 0.4052497
[0.2547503, 0.4052497]
now, it’s your turn for “confidence levels” of 90 what is the value of z at 90% confidence interval? then replace in the formula
0.33 - 1.645 * se
## [1] 0.266844
0.33 + 1.645 * se
## [1] 0.393156
for “confidence levels” of 99 what is the value of z at 99% confidence interval? then replace in the formula
0.33 - 2.58 * se
## [1] 0.2309468
0.33 + 2.58 * se
## [1] 0.4290532
This was “by hand”. But we actually have functions in R that will calculate the confidence intervals: prop.test()
prop.test(x=50, n=150)
##
## 1-sample proportions test with continuity correction
##
## data: 50 out of 150, null probability 0.5
## X-squared = 16.007, df = 1, p-value = 6.312e-05
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.2598209 0.4155318
## sample estimates:
## p
## 0.3333333
Note: conf.level default is set at .95
Assume we take a randomm sample of 1,000 students and measure how many hours of sleep they get on average over the course of a week. We find a mean of 8 hours and a standard deviation 2 hours. What is the 90% confidence interval for the mean hours slept?
set.seed(5678)
sleep <- rnorm(1000, mean = 8, sd = 2)
Calculate the sample mean. Note that it is not exactly 8
x_bar = mean(sleep)
x_bar
## [1] 8.008523
formula x bar +/- t * (s/sqrt(n)) Step 1: calculate the standard error
# s/sqrt(n)
se <- sd(sleep)/(sqrt(1000))
x_bar
## [1] 8.008523
# formula is: x_bar +/- t * se
find the t value for a 90% confidence interval we do this using “qt” function
qt(c(.05, .95), df=999)
## [1] -1.64638 1.64638
we replace everything we know in the formula,
x_bar - 1.64638 * se
## [1] 7.903698
x_bar + 1.64638 * se
## [1] 8.113348
# [ 7.883581, 8.133466]
your turn, calculate the following: for “confidence levels” of 95
qt(c(.025, .975), df=999)
## [1] -1.962341 1.962341
x_bar - 1.96 * se
## [1] 7.88373
x_bar + 1.96 * se
## [1] 8.133317
for “confidence levels” of 99
qt(c(.005, .995), df=999)
## [1] -2.58076 2.58076
x_bar - 2.58 * se
## [1] 7.844255
x_bar + 2.58 * se
## [1] 8.172792
This was “by hand”. But we actually have functions in R that will calculate the confidence intervals for us: t.test
t.test(sleep, conf.level = 0.95)
##
## One Sample t-test
##
## data: sleep
## t = 125.78, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 7.883581 8.133466
## sample estimates:
## mean of x
## 8.008523
[7.903698 8.113348]
Assume that we take a random sample of 30 students and ask them how many units of alcohol they drink in a typical week. We estimate a sample mean (x bar) of 15.27 and we know that the standard deviation (s) is 6 CALCULATE “BY HAND” in R first, then using prop.test() or t.test()
Step 1: calculate the standard error
Step 2: find the value associated with a 95% confidence interval and n = 30:
double check with the appropriate function
Theresa May. Assume that we take a random sample of 100 potential voters in the U.K. and ask them to rate their feelings toward Theresa May using a “feeling thermometer” that ranges from 0 (extreme negative feelings) to 100 (extreme positive feelings). Assume further that x bar = 45 and we know s = 15. Please construct the 95% confidence interval for feelings towards May. CALCULATE “BY HAND” in R first, then using prop.test() or t.test()
Step 1: calculate the standard error
Step 2: find the value associated with a 95% confidence interval and n = 100:
double check with the appropriate function
The end