#### Name: Sonora Williams
#### Section: 01l
#### Date: September 24, 2013
### Exercises
#### Load data:
download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
area = ames$Gr.Liv.Area
price = ames$SalePrice
hist(area)
set.seed(554985)
samp1 = sample(area, 50)
hist(samp1)
samp2 = sample(area, 50)
hist(samp2)
sample_means50 = rep(0, 5000)
for (i in 1:5000) {
  samp = sample(area, 50)
  sample_means50[i] = mean(samp)
}
length(sample_means50)
## [1] 5000
summary(sample_means50)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1260 1450 1500 1500 1540 1760
sample_means_small = rep(0, 100)
for (i in 1:100) {
  samp = sample(area, 50)
  sample_means_small[i] = mean(samp)
}
# enter your UID
population = ames$Gr.Liv.Area
set.seed(554985)
samp = sample(population, 60)
population = ames$Gr.Liv.Area
samp = sample(population, 60)
summary(samp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 747 1120 1360 1430 1640 2940
hist(samp)
My neighbors had left, so an "imaginary friend" recreated on an adjacent computer will have to do. Repeating the sampling there, I got a different mean, 1421, and a different histogram. The histogram is still right-skewed, but the bin widths are smaller.
samp5 = sample(population, 60)
population = ames$Gr.Liv.Area
samp2 = sample(population, 60)
summary(samp5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 816 1180 1390 1540 1730 2640
hist(samp5)
The lower limit is 1415.763 and the upper limit is 1669.370, meaning that we can be 95% confident that the population mean lies within this interval.
sample_mean = mean(samp)
se = sd(samp)/sqrt(60)
lower = sample_mean - qnorm(0.975) * se
upper = sample_mean + qnorm(0.975) * se
c(lower, upper)
## [1] 1326 1541
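The interval above follows the usual normal-approximation recipe: sample mean plus or minus a critical value times the standard error. As a minimal sketch, the hypothetical helper `ci_mean` below (not part of the lab materials) wraps that calculation so the confidence level can be varied. The input vector is made up purely for illustration.

```r
# Hypothetical helper (not from the lab): normal-approximation confidence
# interval for a mean, given a numeric sample x and a confidence level.
ci_mean = function(x, level = 0.95) {
  n = length(x)
  se = sd(x) / sqrt(n)              # standard error of the sample mean
  z = qnorm(1 - (1 - level) / 2)    # critical value, about 1.96 for 95%
  c(lower = mean(x) - z * se, upper = mean(x) + z * se)
}

# Illustrative data only; in the lab this would be samp or samp5.
x = c(1200, 1350, 1500, 1650, 1800)
ci95 = ci_mean(x)
ci90 = ci_mean(x, level = 0.90)
print(ci95)
print(ci90)
```

The interval is always centered on the sample mean, and lowering the confidence level narrows it because the critical value shrinks.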
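Either the population has to be approximately normal or the sample size has to be large enough for the sampling distribution of the mean to be nearly normal. The population here is right-skewed rather than normal, but a random sample of 60 homes is large enough for the normal approximation to be reasonable, and it is a small fraction of the population, so the observations can be treated as independent.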
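To illustrate the sample-size condition, here is a rough sketch using a synthetic right-skewed population (an assumption standing in for the Ames data, so the numbers will not match the lab). It compares the skewness of the population with the skewness of 5000 sample means at n = 60. The `skewness` function is a simple hand-written moment estimate, not a library call.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Synthetic right-skewed "population" (an assumption; not the Ames data).
population_sim = rexp(3000, rate = 1 / 1500)

# 5000 sample means from samples of size 60, mirroring the lab's loop.
means_sim = replicate(5000, mean(sample(population_sim, 60)))

# Simple moment-based skewness: close to 0 for a symmetric distribution.
skewness = function(x) mean((x - mean(x))^3) / sd(x)^3

pop_skew = skewness(population_sim)
means_skew = skewness(means_sim)
print(c(population = pop_skew, sample_means = means_skew))
```

The sample means should come out far less skewed than the population itself, which is why the normal-based interval can be reasonable even when the population is not normal.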
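A 95% confidence level does not mean that the mean falls within this particular interval 95% of the time. More precisely, it means that if many samples were drawn and an interval were built from each one, about 95% of those intervals would contain the true population mean.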
My confidence interval does capture the true mean, and my imaginary friend's interval, 1294-1547, also captures the true mean.
mean(population)
## [1] 1500
sample_mean = mean(samp5)
se = sd(samp)/sqrt(60) # note: this uses the sd of samp, while the mean above is from samp5
lower = sample_mean - qnorm(0.975) * se
upper = sample_mean + qnorm(0.975) * se
c(lower, upper)
## [1] 1435 1650
I would expect about 95% of the intervals to contain the true mean, because that is what a 95% confidence level describes. In this run, 94% of the intervals contained the true mean.
The new interval 1291-1459 does not include the true mean of 1499.
samp_mean = rep(NA, 50)
samp_sd = rep(NA, 50)
n = 60
for (i in 1:50) {
  samp = sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] = mean(samp) # save sample mean in ith element of samp_mean
  samp_sd[i] = sd(samp) # save sample sd in ith element of samp_sd
}
lower = samp_mean - qnorm(0.975) * samp_sd/sqrt(n)
upper = samp_mean + qnorm(0.975) * samp_sd/sqrt(n)
c(lower[1], upper[1])
## [1] 1375 1640
There are fifty intervals in total, and three of them do not include the true mean. Therefore 47 of the 50 include the true mean, a coverage of 94%. This is not exactly equal to the confidence level, most likely because so few intervals were constructed: with only 50 intervals the observed percentage moves in steps of 2%, so it cannot equal 95% exactly. If this were repeated with 100 or more intervals, the percentage would probably approach the confidence level. Even so, 94% is quite close.
plot_ci(lower, upper, mean(population))
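The count of intervals that capture the true mean can also be computed directly instead of being read off the plot. The sketch below does this for 1000 intervals on a synthetic right-skewed population (an assumption, not the Ames data). The names ending in `_sim` are hypothetical and chosen so they do not overwrite the lab's `lower` and `upper`.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Synthetic right-skewed population (an assumption; not the Ames data).
population_sim = rexp(3000, rate = 1 / 1500)
true_mean = mean(population_sim)

n = 60
n_intervals = 1000  # many more than the 50 intervals used in the lab

mean_sim = rep(NA, n_intervals)
sd_sim = rep(NA, n_intervals)
for (i in 1:n_intervals) {
  s = sample(population_sim, n)
  mean_sim[i] = mean(s)
  sd_sim[i] = sd(s)
}

lower_sim = mean_sim - qnorm(0.975) * sd_sim / sqrt(n)
upper_sim = mean_sim + qnorm(0.975) * sd_sim / sqrt(n)

# An interval captures the mean when lower <= true mean <= upper.
captured = lower_sim <= true_mean & true_mean <= upper_sim
coverage = mean(captured)  # proportion of intervals containing the true mean
print(coverage)
```

With this many intervals the observed proportion is no longer restricted to steps of 2%, so it should typically land near the nominal 95%, though not exactly on it.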
samp_mean = rep(NA, 50)
samp_sd = rep(NA, 50)
n = 60
for (i in 1:50) {
  samp = sample(population, n) # obtain a sample of size n = 60 from the population
  samp_mean[i] = mean(samp) # save sample mean in ith element of samp_mean
  samp_sd[i] = sd(samp) # save sample sd in ith element of samp_sd
}
# qnorm(0.96) leaves 4% in each tail, so these are 92% confidence intervals
lower = samp_mean - qnorm(0.96) * samp_sd/sqrt(n)
upper = samp_mean + qnorm(0.96) * samp_sd/sqrt(n)
c(lower[1], upper[1])
## [1] 1366 1562
plot_ci(lower, upper, mean(population))
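The link between the argument passed to `qnorm` and the resulting confidence level is easy to get backwards, so here is a small sketch of the conversion in both directions. The two function names are hypothetical helpers introduced only for this illustration.

```r
# Confidence level implied by using qnorm(p) as the critical value:
# the interval leaves (1 - p) in each tail, so level = 1 - 2 * (1 - p).
level_from_p = function(p) 1 - 2 * (1 - p)

# Critical value for a given two-sided confidence level.
z_from_level = function(level) qnorm(1 - (1 - level) / 2)

print(level_from_p(0.975))  # 0.95
print(level_from_p(0.96))   # 0.92
print(z_from_level(0.92))   # same value as qnorm(0.96)
```

Under this conversion, the second set of intervals above corresponds to a 92% confidence level, so slightly fewer of them should be expected to capture the true mean than at 95%.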