download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
population <- ames$Gr.Liv.Area
samp <- sample(population, 60)

Exercise 1

Describe the distribution of your sample. What would you say is the “typical” size within your sample? Also state precisely what you interpreted “typical” to mean.

hist(samp)

The distribution of my sample looks right-skewed. The typical size within the sample is around 1500 square feet. I interpret “typical” to mean the average.
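One way to make “typical” precise is to compare the sample mean and the sample median, since in a right-skewed sample the mean usually sits above the median. The sketch below is only an illustration: `population_demo` and `samp_demo` are hypothetical names, and the gamma-distributed values are a made-up stand-in for `ames$Gr.Liv.Area` so that the code runs without the download.

```r
set.seed(1)
# Hypothetical right-skewed stand-in for ames$Gr.Liv.Area (not the real data)
population_demo <- rgamma(2930, shape = 9, scale = 167)
samp_demo <- sample(population_demo, 60)

# Two common readings of "typical": the mean and the median
typical <- c(mean = mean(samp_demo), median = median(samp_demo))
typical
```

With the lab data, the same two summaries could be obtained from `mean(samp)` and `median(samp)`.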

Exercise 2

Would you expect another student’s distribution to be identical to yours? Would you expect it to be similar? Why or why not?

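No, I would not expect another student’s distribution to be identical to mine, because each of us draws a different random sample of 60 houses. I would expect it to be similar, though, because both samples come from the same population.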

sample_mean <- mean(samp)

se <- sd(samp) / sqrt(60)
lower <- sample_mean - 1.96 * se
upper <- sample_mean + 1.96 * se
c(lower, upper)
## [1] 1490.663 1757.837
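The value 1.96 used above is the normal critical value for a 95% interval. As a minimal sketch, the same calculation can be wrapped in a small helper that looks the critical value up with `qnorm()`. The function name `ci_mean` and the simulated vector `x` are assumptions made for illustration; they are not part of the lab.

```r
# Hypothetical helper (not part of the lab): normal-approximation CI for a mean
ci_mean <- function(x, level = 0.95) {
  z  <- qnorm(1 - (1 - level) / 2)  # critical value, about 1.96 when level = 0.95
  se <- sd(x) / sqrt(length(x))     # standard error of the sample mean
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

set.seed(1)
x <- rnorm(60, mean = 1500, sd = 500)  # made-up data, for illustration only
ci_mean(x)
```

Calling `ci_mean(samp)` on the lab sample should give an interval very close to the one computed by hand, since `qnorm(0.975)` is approximately 1.96.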

Exercise 3

For the confidence interval to be valid, the sample mean must be normally distributed and have standard error \(s / \sqrt{n}\). What conditions must be met for this to be true?

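The conditions that must be met are that the observations are independent (a random sample of less than 10% of the population), that the sample size is reasonably large (n ≥ 30), and that the population distribution is not strongly skewed.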
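These conditions can be checked roughly in code. The sketch below is an illustration under stated assumptions: `population_demo` is a simulated right-skewed stand-in for the Ames living areas, and `n_demo` and `sample_means` are hypothetical names. It checks the sample-size and 10% conditions, then approximates the sampling distribution of the mean by repeated sampling.

```r
set.seed(1)
# Hypothetical right-skewed stand-in for ames$Gr.Liv.Area (not the real data)
population_demo <- rgamma(2930, shape = 9, scale = 167)
n_demo <- 60

n_demo >= 30                               # sample size condition
n_demo / length(population_demo) < 0.10    # sample is under 10% of the population

# Approximate the sampling distribution of the mean with 5000 repeated samples
sample_means <- replicate(5000, mean(sample(population_demo, n_demo)))
hist(sample_means)                         # should look roughly bell-shaped
sd(sample_means)                           # empirical standard error
sd(population_demo) / sqrt(n_demo)         # approximate theoretical value
```

If the two standard errors are close and the histogram looks roughly symmetric, the normal approximation behind the interval is reasonable for this sample size.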

Exercise 4

What does “95% confidence” mean? If you’re not sure, see Section 4.2.2.

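95% confidence means that if we took many samples and built a confidence interval from each one, about 95% of those intervals would contain the true population mean.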
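This long-run reading of “95% confidence” can be illustrated by simulation. The following is a sketch under assumptions, not the lab's own code: `population_demo` is a simulated stand-in for the Ames population, and `covers`, `mu`, and `n_demo` are hypothetical names.

```r
set.seed(1)
# Hypothetical right-skewed stand-in for ames$Gr.Liv.Area (not the real data)
population_demo <- rgamma(2930, shape = 9, scale = 167)
mu <- mean(population_demo)
n_demo <- 60

# Build 2000 intervals and record whether each one contains the true mean
covers <- replicate(2000, {
  s  <- sample(population_demo, n_demo)
  se <- sd(s) / sqrt(n_demo)
  (mean(s) - 1.96 * se) <= mu && mu <= (mean(s) + 1.96 * se)
})
mean(covers)  # long-run proportion of intervals that contain mu
```

The proportion should land near 0.95, although it will vary slightly from run to run and with the shape of the population.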

Exercise 5

Does your confidence interval capture the true average size of houses in Ames? If you are working on this lab in a classroom, does your neighbor’s interval capture this value?

popMean <- mean(population)
lower < popMean && popMean < upper
## [1] TRUE

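Yes, my confidence interval captures the true average size of houses in Ames. My neighbor’s interval will probably capture this value as well, although that is not guaranteed, since about 5% of such intervals miss the true mean.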

Exercise 6

Each student in your class should have gotten a slightly different confidence interval. What proportion of those intervals would you expect to capture the true population mean? Why? If you are working in this lab in a classroom, collect data on the intervals created by other students in the class and calculate the proportion of intervals that capture the true population mean.

I would expect about 95% of those intervals to capture the true population mean, because each interval was built at the 95% confidence level.

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60

for(i in 1:50){
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}


lower_vector <- samp_mean - 1.96 * samp_sd / sqrt(n) 
upper_vector <- samp_mean + 1.96 * samp_sd / sqrt(n)

c(lower_vector[1], upper_vector[1])
## [1] 1439.014 1761.119

On Your Own

1) Using the following function (which was downloaded with the data set), plot all intervals. What proportion of your confidence intervals include the true population mean? Is this proportion exactly equal to the confidence level? If not, explain why.

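A 95% confidence level means that about 95% of intervals built this way will contain the population mean. The proportion in any particular set of 50 intervals does not have to equal 95% exactly, because each interval comes from a different random sample.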

plot_ci(lower_vector, upper_vector, mean(population))
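To see how much the observed proportion can differ from 95%, the number of intervals that capture the mean can be modeled as a binomial count. This is a sketch that assumes the 50 intervals are independent and that each one captures the mean with probability exactly 0.95; the names `k` and `prob` are illustrative.

```r
k <- 0:50
# P(exactly k of the 50 intervals capture the population mean)
prob <- dbinom(k, size = 50, prob = 0.95)
round(prob[k %in% 45:50], 3)  # probabilities for 45 through 50 captures

0.95 * 50  # 47.5 is not a whole number, so exactly 95% of 50 intervals is impossible
```

Under these assumptions the most likely outcome is 48 captures out of 50 (96%), and counts a few below or above that are also plausible.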

2) Pick a confidence level of your choosing, provided it is not 95%. What is the appropriate critical value?

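I picked a confidence level of 68%. The appropriate critical value is approximately 1, because about 68% of a normal distribution lies within 1 standard deviation of the mean.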

lower_vector1 <- samp_mean - 1 * samp_sd / sqrt(n)
upper_vector1 <- samp_mean + 1 * samp_sd / sqrt(n)

c(lower_vector1[1], upper_vector1[1])
## [1] 1517.897 1682.236
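The critical value of 1 used above is a rounded value. As a quick check, `qnorm()` gives the critical value for any confidence level, and `pnorm()` gives the coverage implied by a chosen critical value. The variable names `level` and `z_star` are illustrative.

```r
level <- 0.68
z_star <- qnorm(1 - (1 - level) / 2)  # cutoff leaving (1 - level) / 2 in each tail
z_star                                # about 0.994, which rounds to 1

pnorm(1) - pnorm(-1)                  # coverage obtained with a critical value of 1
```

Using 1 instead of the unrounded value corresponds to a confidence level of about 68.3% rather than exactly 68%.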

3) Calculate 50 confidence intervals at the confidence level you chose in the previous question. You do not need to obtain new samples, simply calculate new intervals based on the sample means and standard deviations you have already collected. Using the plot_ci function, plot all intervals and calculate the proportion of intervals that include the true population mean. How does this percentage compare to the confidence level selected for the intervals?

samp_mean <- rep(NA, 50)
samp_sd <- rep(NA, 50)
n <- 60

for(i in 1:50){
  samp <- sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] <- mean(samp)    # save sample mean in ith element of samp_mean
  samp_sd[i] <- sd(samp)        # save sample sd in ith element of samp_sd
}


lower_vector1 <- samp_mean - 1 * samp_sd / sqrt(n) 
upper_vector1 <- samp_mean + 1 * samp_sd / sqrt(n)

c(lower_vector1, upper_vector1)
##   [1] 1430.694 1357.474 1446.951 1385.252 1400.624 1409.875 1412.478
##   [8] 1537.998 1468.532 1380.785 1281.550 1498.989 1519.539 1434.525
##  [15] 1477.451 1429.270 1397.458 1479.359 1368.436 1387.773 1304.207
##  [22] 1370.785 1427.109 1433.006 1315.956 1415.217 1455.777 1537.019
##  [29] 1510.118 1516.126 1547.685 1287.637 1503.214 1472.473 1550.229
##  [36] 1317.295 1420.327 1420.878 1478.773 1384.091 1390.566 1432.451
##  [43] 1452.838 1450.586 1445.804 1438.761 1415.537 1368.924 1381.923
##  [50] 1437.495 1538.939 1482.926 1571.349 1533.648 1500.243 1547.258
##  [57] 1569.122 1667.769 1598.802 1529.148 1397.783 1674.544 1640.895
##  [64] 1574.808 1620.483 1537.597 1523.109 1616.907 1529.330 1501.560
##  [71] 1408.626 1491.715 1564.991 1539.394 1433.978 1510.483 1582.689
##  [78] 1666.615 1629.115 1649.240 1664.281 1384.896 1613.186 1611.227
##  [85] 1697.671 1422.639 1546.873 1564.055 1617.493 1510.976 1517.034
##  [92] 1565.015 1643.662 1579.147 1565.430 1561.939 1556.296 1459.042
##  [99] 1490.411 1563.139
plot_ci(lower_vector1, upper_vector1, mean(population))

(50-15)/50
## [1] 0.7

70% is pretty close to the chosen 68% confidence level.
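Instead of counting misses on the plot, the proportion can also be computed directly with a vectorized comparison, and its expected spread estimated from the binomial formula. The sketch below is an illustration under assumptions: `population_demo` is a simulated stand-in for the Ames population, and the names ending in `_demo`, along with `captured` and `se_prop`, are hypothetical.

```r
set.seed(1)
# Hypothetical right-skewed stand-in for ames$Gr.Liv.Area (not the real data)
population_demo <- rgamma(2930, shape = 9, scale = 167)
n_demo <- 60
samples <- replicate(50, sample(population_demo, n_demo))  # one sample per column
m <- colMeans(samples)
s <- apply(samples, 2, sd)
lower_demo <- m - 1 * s / sqrt(n_demo)
upper_demo <- m + 1 * s / sqrt(n_demo)

mu <- mean(population_demo)
captured <- lower_demo <= mu & mu <= upper_demo
mean(captured)                     # proportion of the 50 intervals containing mu

se_prop <- sqrt(0.68 * 0.32 / 50)  # expected spread of that proportion
se_prop
```

With the lab objects, the equivalent line would be `mean(lower_vector1 <= mean(population) & mean(population) <= upper_vector1)`. The spread of about 0.066 suggests that a gap of two percentage points between the observed and nominal levels is well within ordinary sampling variation.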