download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
population <- ames$Gr.Liv.Area
samp<-sample(population, 60)
summary(samp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 754 1094 1360 1482 1733 2794
hist(samp)
Exercise 1. The distribution is right-skewed. Most observations fall between 1000 and 1500 square feet. I take "typical" to mean the most frequent value.
Exercise 2. Another sample is very unlikely to be identical to the first. We pick 60 observations at random, and drawing exactly the same 60 observations again would be extremely unlikely. However, another sample is very likely to be similar to the original one. Both samples are drawn from the same population, and 60 observations should be enough to give a reasonable picture of it, so both samples should resemble the population and therefore each other.
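To illustrate the point made in Exercise 2, the sketch below draws two samples of 60 and compares them. So that it runs without the data file, it uses a simulated right-skewed population, `fake_population`, as a stand-in for `ames$Gr.Liv.Area`. That stand-in and its parameters are assumptions, not part of the lab.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical stand-in for ames$Gr.Liv.Area: a right-skewed population
fake_population <- rlnorm(2930, meanlog = 7.25, sdlog = 0.33)

samp_a <- sample(fake_population, 60)
samp_b <- sample(fake_population, 60)

# The two samples do not contain the same set of values...
same_values <- setequal(samp_a, samp_b)

# ...but their means can be compared with the population mean
means <- c(a = mean(samp_a), b = mean(samp_b), population = mean(fake_population))
print(same_values)
print(round(means))
```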
sample_mean <- mean(samp)
se <- sd(samp)/sqrt(60)
lower <- sample_mean-1.96*se
upper <- sample_mean+1.96*se
c(lower,upper)
## [1] 1358.825 1605.008
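The multiplier 1.96 used above is the rounded 97.5th percentile of the standard normal distribution. A minimal sketch of how it can be obtained for any confidence level follows. The variable names `conf_level`, `alpha` and `z_star` are illustrative and do not come from the lab.

```r
conf_level <- 0.95               # illustrative variable name
alpha <- 1 - conf_level          # total probability left in the two tails
z_star <- qnorm(1 - alpha / 2)   # two-sided critical value
round(z_star, 2)
## [1] 1.96
```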
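Exercise 3. The observations should be independent (a random sample), the sample size should be at least 30, and the population distribution should not be strongly skewed.
Exercise 4. "95% confidence" means that if we took many random samples of this size and built an interval from each one, about 95% of those intervals would contain the population mean. For this sample, we are 95% confident that the population mean lies between 1358.825 and 1605.008.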
mean(population)
## [1] 1499.69
Exercise 5. Yes. The interval (1358.825, 1605.008) contains the population mean of 1499.69.
Exercise 6. About 95% of the intervals should capture the population mean, provided our assumptions hold.
samp_mean <- rep(NA, 5000)
samp_sd <- rep(NA, 5000)
n <- 60
for (i in 1:5000){
samp<-sample(population,n)
samp_mean[i]<-mean(samp)
samp_sd[i]<-sd(samp)
}
lower_vector<-samp_mean-1.96*samp_sd/sqrt(n)
upper_vector<-samp_mean+1.96*samp_sd/sqrt(n)
c(lower_vector[1],upper_vector[1])
## [1] 1242.624 1464.143
On my own.
plot_ci(lower_vector, upper_vector, mean(population))
count<-0
for (i in 1:5000){
if (lower_vector[i]>mean(population)||upper_vector[i]<mean(population)){count<-count+1}
}
count/5000
## [1] 0.058
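The counting loop above can also be written without an explicit loop. The sketch below uses small made-up interval bounds (hypothetical values, not the Ames results) so that it runs on its own. With the real data, the same expression would be applied to `lower_vector`, `upper_vector` and `mean(population)`.

```r
# Hypothetical interval bounds and true mean, for illustration only
lower_demo <- c(10, 12, 15, 9)
upper_demo <- c(14, 16, 19, 13)
true_mean  <- 14.5

# TRUE where an interval lies entirely above or entirely below the true mean
missed <- lower_demo > true_mean | upper_demo < true_mean

# The mean of a logical vector is the proportion of TRUE values
miss_rate <- mean(missed)
miss_rate
## [1] 0.75
```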
lower_vector1<-samp_mean-1.645*samp_sd/sqrt(n)
upper_vector1<-samp_mean+1.645*samp_sd/sqrt(n)
c(lower_vector1[1],upper_vector1[1])
## [1] 1260.424 1446.342
plot_ci(lower_vector1, upper_vector1, mean(population))
count<-0
for (i in 1:5000){
if (lower_vector1[i]>mean(population)||upper_vector1[i]<mean(population)){count<-count+1}
}
count/5000
## [1] 0.1094
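The two runs above differ only in the critical value, so the whole procedure can be wrapped in one function that takes the confidence level as an argument. The sketch below is one possible way to do that. `ci_miss_rate`, its arguments and the simulated `fake_population` are hypothetical names and data introduced for illustration. They are not part of the lab or of the Ames data set.

```r
# Sketch: estimate how often intervals miss the true mean at a given level.
# ci_miss_rate and its arguments are hypothetical, not part of the lab.
ci_miss_rate <- function(pop, n, conf_level, reps = 2000) {
  z_star <- qnorm(1 - (1 - conf_level) / 2)  # two-sided critical value
  mu <- mean(pop)                            # true population mean
  misses <- replicate(reps, {
    s <- sample(pop, n)
    half_width <- z_star * sd(s) / sqrt(n)
    # TRUE when the interval does not contain the population mean
    mean(s) - half_width > mu || mean(s) + half_width < mu
  })
  mean(misses)                               # proportion of intervals that miss
}

set.seed(42)
# Simulated right-skewed stand-in for ames$Gr.Liv.Area (an assumption)
fake_population <- rlnorm(2930, meanlog = 7.25, sdlog = 0.33)

rate_95 <- ci_miss_rate(fake_population, n = 60, conf_level = 0.95)
rate_90 <- ci_miss_rate(fake_population, n = 60, conf_level = 0.90)
c(rate_95, rate_90)
```

If the conditions from Exercise 3 hold, the two rates should come out near 0.05 and 0.10, in line with the 0.058 and 0.1094 observed above.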