library(knitr)
load(url("http://www.openintro.org/stat/data/ames.RData"))
area <- ames$Gr.Liv.Area
price <- ames$SalePrice
summary(area)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 334 1126 1442 1500 1743 5642
hist(area)
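The population distribution is right-skewed. The middle half of homes have between 1126 and 1743 square feet of living area, but a long right tail stretches from the minimum of 334 up to a maximum of 5642. As expected for right-skewed data, the mean (1500) sits above the median (1442).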
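The skew can also be checked numerically by comparing the mean and the median. The sketch below does not use the Ames data: `pop` is a hypothetical, simulated log-normal stand-in for `area` (2930 values), chosen only so the example runs without downloading `ames.RData`.

```r
# Hypothetical stand-in for `area`: a simulated right-skewed (log-normal)
# population, used so the example runs without the Ames data file.
set.seed(1)
pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.3)

# In a right-skewed distribution the long upper tail pulls the mean
# above the median.
pop_mean <- mean(pop)
pop_median <- median(pop)
is_right_skewed <- pop_mean > pop_median

# Five-number style summary (min, quartiles, max) plus mean and median.
print(quantile(pop, probs = c(0, 0.25, 0.5, 0.75, 1)))
print(c(mean = pop_mean, median = pop_median))
```

On the real data, `mean(area) > median(area)` gives the same check.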
samp1 <- sample(area, 50)
This sample is also right-skewed.
mean(samp1)
## [1] 1504.94
Take a second sample, also of size 50, and call it samp2. How does the mean of samp2 compare with the mean of samp1? Suppose we took two more samples, one of size 100 and one of size 1000. Which would you think would provide a more accurate estimate of the population mean?
mu1 <- mean(samp1)
samp2 <- sample(area, 50)
mu2 <- mean(samp2)
# Record which of the two sample means is larger
if (mu1 < mu2) {
  printo <- "The mean of samp2 is greater than samp1"
} else if (mu1 > mu2) {
  printo <- "The mean of samp2 is less than samp1"
} else {
  printo <- "The mean of samp2 is equal to samp1"
}
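The mean of samp2 is 1358.04, versus 1504.94 for samp1, so the mean of samp2 is less than the mean of samp1. Of two further samples of size 100 and 1000, the sample of size 1000 should give the more accurate estimate of the population mean, because the variability of a sample mean shrinks as the sample size grows.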
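The claim about sample size 100 versus 1000 can be checked by simulation. This is a minimal sketch, assuming a hypothetical simulated population `pop` in place of `area`; the helper `typical_error` is also hypothetical and simply averages how far each sample mean lands from the population mean.

```r
# Hypothetical stand-in for `area` (simulated, not the Ames data).
set.seed(42)
pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.3)
mu <- mean(pop)

# Average absolute distance between a sample mean and the population
# mean, estimated from `reps` independent samples of size n.
typical_error <- function(n, reps = 2000) {
  errors <- replicate(reps, abs(mean(sample(pop, n)) - mu))
  mean(errors)
}

err100 <- typical_error(100)
err1000 <- typical_error(1000)
print(c(n100 = err100, n1000 = err1000))
```

The typical error for samples of 1000 comes out several times smaller than for samples of 100.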
sample_means50 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(area, 50)
  sample_means50[i] <- mean(samp)
}
hist(sample_means50)
hist(sample_means50, breaks = 25)
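How many elements are there in sample_means50? Describe the sampling distribution, and be sure to specifically note its center. Would you expect the distribution to change if we instead collected 50,000 sample means?
sample_means50 has 5000 elements, with a mean of 1500.17. The distribution looks approximately normal and is centered at about 1500, essentially the population mean reported by summary(area). Collecting 50,000 sample means instead of 5000 should not meaningfully change the shape, center, or spread; the histogram would just look smoother.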
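To see why the number of simulated means matters little, the sketch below runs the same simulation with 5000 and with 50,000 repetitions and compares center and spread. It assumes a hypothetical simulated population `pop` standing in for `area`.

```r
# Hypothetical stand-in for `area` (simulated, not the Ames data).
set.seed(7)
pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.3)

# The same sampling experiment (n = 50) repeated 5000 and 50,000 times.
means_5k <- replicate(5000, mean(sample(pop, 50)))
means_50k <- replicate(50000, mean(sample(pop, 50)))

# Center and spread of each set of simulated sample means.
comparison <- rbind(
  reps_5000 = c(center = mean(means_5k), spread = sd(means_5k)),
  reps_50000 = c(center = mean(means_50k), spread = sd(means_50k))
)
print(comparison)
print(mean(pop))  # population mean, for reference
```

Both rows should agree closely with each other, and both centers should sit near `mean(pop)`: the sampling distribution is set by the sample size, not by how many times the experiment is repeated.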
# The loop repeats the same two steps for each index i; the first four
# iterations, written out by hand, look like this:
sample_means50 <- rep(NA, 5000)
samp <- sample(area, 50)
sample_means50[1] <- mean(samp)
samp <- sample(area, 50)
sample_means50[2] <- mean(samp)
samp <- sample(area, 50)
sample_means50[3] <- mean(samp)
samp <- sample(area, 50)
sample_means50[4] <- mean(samp)
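The same computation can also be written with base R's `replicate()`, which evaluates an expression a given number of times and collects the results. This sketch assumes a hypothetical simulated population `pop` in place of `area`, and resets the seed so that both versions use identical random draws.

```r
# Hypothetical stand-in for `area` (simulated, not the Ames data).
set.seed(3)
pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.3)

# Explicit loop: one sample of size 50 per iteration, mean stored at index i.
set.seed(100)
loop_means <- rep(NA, 5000)
for (i in 1:5000) {
  loop_means[i] <- mean(sample(pop, 50))
}

# replicate() evaluates the same expression 5000 times, in the same order.
set.seed(100)
rep_means <- replicate(5000, mean(sample(pop, 50)))

print(all.equal(loop_means, rep_means))
```

The loop makes each step explicit, while `replicate()` is shorter; with the same seed they produce the same vector of sample means.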
sample_means50 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(area, 50)
  sample_means50[i] <- mean(samp)
  # Print progress markers at the first and last iterations
  if (i == 1) {
    print("from ")
    print(i)
    print(" to ")
  }
  if (i == 5000) {
    print(i)
  }
}
## [1] "from "
## [1] 1
## [1] " to "
## [1] 5000
Initialize a vector of 100 elements called sample_means_small. Run a loop that takes a sample of size 50 from area and stores the sample mean in sample_means_small, but only iterate from 1 to 100. Print the output to your screen (type sample_means_small into the console and press enter). How many elements are there in this object called sample_means_small? What does each element represent?
sample_means_small <- rep(NA, 100)
for (i in 1:100) {
  samp <- sample(area, 50)
  sample_means_small[i] <- mean(samp)
  # Print progress markers at the first and last iterations
  if (i == 1) {
    print("from ")
    print(i)
    print(" to ")
  }
  if (i == 100) {
    print(i)
  }
}
## [1] "from "
## [1] 1
## [1] " to "
## [1] 100
sample_means_small has 100 elements, with a mean of 1492.16. Each element is the mean of one random sample of 50 living areas drawn from area; the loop draws 100 such samples.
hist(sample_means50)
sample_means10 <- rep(NA, 5000)
sample_means100 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(area, 10)
  sample_means10[i] <- mean(samp)
  samp <- sample(area, 100)
  sample_means100[i] <- mean(samp)
}
par(mfrow = c(3, 1))
xlimits <- range(sample_means10)
hist(sample_means10, breaks = 20, xlim = xlimits)
hist(sample_means50, breaks = 20, xlim = xlimits)
hist(sample_means100, breaks = 20, xlim = xlimits)
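The three histograms show the spread of the sample means narrowing as the sample size grows from 10 to 50 to 100. The sketch below puts numbers on that, comparing the simulated standard deviation of the sample means with the textbook standard error sigma / sqrt(n), multiplied by a finite population correction because `sample()` draws without replacement. It assumes a hypothetical simulated population `pop` in place of `area`.

```r
# Hypothetical stand-in for `area` (simulated, not the Ames data).
set.seed(10)
pop <- rlnorm(2930, meanlog = log(1450), sdlog = 0.3)
N <- length(pop)
sigma <- sqrt(mean((pop - mean(pop))^2))  # population SD (divides by N)

sizes <- c(10, 50, 100)

# Simulated SD of 5000 sample means for each sample size.
observed_se <- sapply(sizes, function(n) sd(replicate(5000, mean(sample(pop, n)))))

# Theoretical standard error for sampling without replacement.
theory_se <- sigma / sqrt(sizes) * sqrt((N - sizes) / (N - 1))

print(data.frame(n = sizes, observed_se = observed_se, theory_se = theory_se))
```

The simulated and theoretical columns should agree to within a few percent, and both shrink roughly in proportion to 1 / sqrt(n).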
So far, we have only focused on estimating the mean living area in homes in Ames. Now you’ll try to estimate the mean home price.
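Take a random sample of size 50 from price. Using this sample, what is your best point estimate of the population mean?
sample_price50 <- sample(price, 50)
mean_sample_price50 <- mean(sample_price50)
# Format the sample mean (not the raw sample) without scientific notation
point_estimate_sample_price50 <- format(mean_sample_price50, scientific = FALSE)
The best point estimate of the population mean price is the sample mean, mean_sample_price50.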
Simulate the sampling distribution of the sample mean price: take 5000 samples of size 50 from price and store the 5000 sample means in a vector called sample_means50. Plot the data, then describe the shape of this sampling distribution. Based on this sampling distribution, what would you guess the mean home price of the population to be? Finally, calculate and report the population mean.
sample_means50 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(price, 50)
  sample_means50[i] <- mean(samp)
}
hist(sample_means50, breaks=25)
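mean_sample_means50 <- mean(sample_means50)
population_mean_price <- mean(price)
The center of sample_means50 is between 180000 and 181000: mean_sample_means50 = 180893.96. The sampling distribution is approximately normal and centered at mean_sample_means50, so a reasonable guess for the population mean home price is about $180,894. The population mean itself is population_mean_price, computed directly with mean(price).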
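Using the center of the sampling distribution as a guess for the population mean works because the sample mean is an unbiased estimator: averaged over many samples, it lands on the population mean. The sketch below illustrates this with `price_pop`, a hypothetical simulated log-normal stand-in for `price` (not the Ames sale prices).

```r
# Hypothetical stand-in for `price` (simulated, not the Ames data).
set.seed(21)
price_pop <- rlnorm(2930, meanlog = log(160000), sdlog = 0.4)

# 5000 sample means, each from a sample of 50 homes.
sample_means <- replicate(5000, mean(sample(price_pop, 50)))

center_of_sampling_dist <- mean(sample_means)
population_mean <- mean(price_pop)
print(c(center = center_of_sampling_dist, population = population_mean))
```

The two printed values should agree to well within one percent, even though the individual sample means scatter widely.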
Change the sample size from 50 to 150 and repeat the simulation, storing the 5000 sample means in a new vector called sample_means150. Describe the shape of this sampling distribution, and compare it to the sampling distribution for a sample size of 50. Based on this sampling distribution, what would you guess to be the mean sale price of homes in Ames?
sample_means150 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(price, 150)
  sample_means150[i] <- mean(samp)
}
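mean_sample_means150 <- mean(sample_means150)
The center of sample_means150 is also between 180000 and 181000: mean_sample_means150 = 180708.61. This sampling distribution is approximately normal and centered at mean_sample_means150, with a narrower spread than the distribution for samples of size 50. A reasonable guess for the mean sale price of homes in Ames is therefore about $180,709.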
par(mfrow = c(1, 2))
xlimits <- range(sample_means50)
hist(sample_means50, breaks = 20, xlim = xlimits)
hist(sample_means150, breaks = 20, xlim = xlimits)
xlimits50 <- range(sample_means50)
range50diff <- xlimits50[2] - xlimits50[1]
xlimits150 <- range(sample_means150)
range150diff <- xlimits150[2] - xlimits150[1]
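The range of sample_means50 is 146487 to 233205.36, while the range of sample_means150 is 160778.4 to 203120.13. Since range150diff < range50diff (about 42342 versus 86718), sample means based on 150 homes vary less than those based on 50 homes: larger samples give more precise estimates of the population mean.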
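The range depends on the two most extreme of 5000 simulated means, so it is a noisy measure of spread. The sketch below compares it with the standard deviation for n = 50 and n = 150; tripling the sample size should shrink the standard deviation by roughly sqrt(3), about 1.73 (slightly more here because of the finite population). It assumes `price_pop`, a hypothetical simulated stand-in for `price`.

```r
# Hypothetical stand-in for `price` (simulated, not the Ames data).
set.seed(5)
price_pop <- rlnorm(2930, meanlog = log(160000), sdlog = 0.4)

means50 <- replicate(5000, mean(sample(price_pop, 50)))
means150 <- replicate(5000, mean(sample(price_pop, 150)))

# Two measures of spread for each sampling distribution.
spread <- data.frame(
  n = c(50, 150),
  range_width = c(diff(range(means50)), diff(range(means150))),
  sd = c(sd(means50), sd(means150))
)
print(spread)

# Ratio of standard deviations, expected to be near sqrt(3).
sd_ratio <- spread$sd[1] / spread$sd[2]
print(sd_ratio)
```

Both measures shrink with the larger sample size, but the standard deviation ratio is far more stable from run to run than the ratio of ranges.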