load("more/ames.RData")
area <- ames$Gr.Liv.Area
price <- ames$SalePrice
hist(area)
The population distribution appears to be right-skewed and unimodal.
samp1 <- sample(area, 50)
hist(samp1)
summary(samp1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     540    1302    1500    1498    1679    4476
Because this is a random sample, its distribution varies from one sample to the next. With only 50 observations it is a rough approximation of the population distribution, and the sample mean will move closer to the population mean if we increase the sample size.
samp2. How does the mean of samp2 compare with the mean of samp1? Suppose we took two more samples, one of size 100 and one of size 1000. Which would you think would provide a more accurate estimate of the population mean?
sample_means50 <- rep(NA, 5000)
for(i in 1:5000){
samp <- sample(area, 50)
sample_means50[i] <- mean(samp)
}
hist(sample_means50)
hist(sample_means50, breaks = 25)
summary(sample_means50)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1220    1450    1498    1498    1544    1862
A larger sample tends to give a sample mean closer to the population mean, so the sample of size 1000 would provide the more accurate estimate.
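The claim about sample size can be checked with a small simulation. The sketch below is only illustrative: `pop` is a synthetic right-skewed population (a hypothetical stand-in for area, so the code runs without ames.RData), and `mean_abs_error` is a helper name introduced here, not part of the lab.

```r
set.seed(42)
# Hypothetical stand-in for `area`: 2930 right-skewed values
pop <- rgamma(2930, shape = 9, scale = 167)

# Average distance between a sample mean and the population mean,
# estimated over `reps` repeated samples of size n
mean_abs_error <- function(n, reps = 2000) {
  errs <- replicate(reps, abs(mean(sample(pop, n)) - mean(pop)))
  mean(errs)
}

err <- sapply(c(50, 100, 1000), mean_abs_error)
names(err) <- c("n=50", "n=100", "n=1000")
err
```

Under these assumptions the average error should shrink as n grows, which is the reason for preferring the sample of size 1000.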
sample_means50? Describe the sampling distribution, and be sure to specifically note its center. Would you expect the distribution to change if we instead collected 50,000 sample means?
sample_means50 <- rep(NA, 5000)
samp <- sample(area, 50)
sample_means50[1] <- mean(samp)
samp <- sample(area, 50)
sample_means50[2] <- mean(samp)
samp <- sample(area, 50)
sample_means50[3] <- mean(samp)
samp <- sample(area, 50)
sample_means50[4] <- mean(samp)
sample_means50 <- rep(NA, 5000)
for(i in 1:5000){
samp <- sample(area, 50)
sample_means50[i] <- mean(samp)
# print(i)
}
With 5000 sample means, the sampling distribution is close to normal, and its center is almost the same as the population mean.
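One way to explore the 50,000 question is to run the simulation at both sizes and compare the results. This is a sketch under stated assumptions: `pop` is a synthetic right-skewed population standing in for area, and `replicate()` is used as a compact alternative to the for loop.

```r
set.seed(1)
# Hypothetical stand-in for `area`
pop <- rgamma(2930, shape = 9, scale = 167)

# 5000 and 50,000 sample means, each from a sample of size 50
means_5k  <- replicate(5000,  mean(sample(pop, 50)))
means_50k <- replicate(50000, mean(sample(pop, 50)))

# Centers compared with the population mean
c(population = mean(pop), m5k = mean(means_5k), m50k = mean(means_50k))
# Spreads of the two simulated sampling distributions
c(sd5k = sd(means_5k), sd50k = sd(means_50k))
```

In this setup the center and spread should be nearly identical for both runs. Collecting more sample means mainly smooths the histogram. It does not change the sampling distribution being approximated.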
sample_means_small. Run a loop that takes a sample of size 50 from area and stores the sample mean in sample_means_small, but only iterate from 1 to 100. Print the output to your screen (type sample_means_small into the console and press enter). How many elements are there in this object called sample_means_small? What does each element represent?
hist(sample_means50)
sample_means10 <- rep(NA, 5000)
sample_means100 <- rep(NA, 5000)
for(i in 1:5000){
samp <- sample(area, 10)
sample_means10[i] <- mean(samp)
samp <- sample(area, 100)
sample_means100[i] <- mean(samp)
}
par(mfrow = c(3, 1))
xlimits <- range(sample_means10)
hist(sample_means10, breaks = 20, xlim = xlimits)
hist(sample_means50, breaks = 20, xlim = xlimits)
hist(sample_means100, breaks = 20, xlim = xlimits)
There are 100 elements in sample_means_small, and each element is the mean of a random sample of 50 values from area.
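The loop for sample_means_small is not shown in this write-up. A minimal version might look like the following, assuming a synthetic population `pop` in place of area so that it runs on its own.

```r
set.seed(7)
# Hypothetical stand-in for `area`
pop <- rgamma(2930, shape = 9, scale = 167)

# Store 100 sample means, each from a random sample of size 50
sample_means_small <- rep(NA, 100)
for (i in 1:100) {
  samp <- sample(pop, 50)
  sample_means_small[i] <- mean(samp)
}

sample_means_small          # prints all 100 means
length(sample_means_small)  # number of elements
```

With the real data, replacing `pop` with `area` should give the object described in the exercise.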
As the sample size increases, the center of the sampling distribution stays close to the population mean and the spread becomes narrower.
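The narrowing can be put in numbers by comparing the standard deviation of the simulated sample means with the usual approximation sigma / sqrt(n). This is a rough sketch: `pop` is a synthetic stand-in for area, and the approximation ignores the small correction for sampling without replacement from a finite population.

```r
set.seed(123)
# Hypothetical stand-in for `area`
pop <- rgamma(2930, shape = 9, scale = 167)

sizes <- c(10, 50, 100)

# Standard deviation of 5000 simulated sample means at each sample size
observed_sd <- sapply(sizes, function(n) {
  sd(replicate(5000, mean(sample(pop, n))))
})

# Textbook approximation: population sd divided by sqrt(n)
approx_se <- sd(pop) / sqrt(sizes)

data.frame(n = sizes,
           observed_sd = round(observed_sd, 1),
           approx_se = round(approx_se, 1))
```

Under these assumptions the two columns should agree closely, and both fall as n goes from 10 to 100.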
price. Using this sample, what is your best point estimate of the population mean?
price1 <- sample(price, 50)
summary(price1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   84500  130125  163900  187648  221500  377426
The best point estimate of the population mean is the sample mean, about $187,648.
sample_means50. Plot the data, then describe the shape of this sampling distribution. Based on this sampling distribution, what would you guess the mean home price of the population to be? Finally, calculate and report the population mean.
samp1e <- sample(price, 50)
sample_means50 <- rep(NA, 5000)
for(i in 1:5000){
samp <- sample(price, 50)
sample_means50[i] <- mean(samp)
}
hist(sample_means50)
summary(sample_means50)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  148828  173015  180147  180801  187958  227868
Based on this sampling distribution, the mean home price of the population is probably about $180,800 (the mean of the 5000 sample means above).
sample_means150. Describe the shape of this sampling distribution, and compare it to the sampling distribution for a sample size of 50. Based on this sampling distribution, what would you guess to be the mean sale price of homes in Ames?
samp1e <- sample(price, 150)
sample_means150 <- rep(NA, 5000)
for(i in 1:5000){
samp <- sample(price, 150)
sample_means150[i] <- mean(samp)
}
hist(sample_means150)
summary(sample_means150)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##  160502  176257  180628  180659  184893  206507
Based on this sampling distribution, the mean sale price of homes in Ames is probably about $180,700 (the mean of the 5000 sample means above).
The sampling distribution from exercise 3 (samples of size 150) has the smaller spread. A smaller spread is preferable when we want estimates that are more often close to the true value.
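To make "closer to the true value" concrete, one can count how often a sample mean lands within a fixed distance of the population mean at each sample size. The sketch below assumes a synthetic log-normal population `price_sim` as a stand-in for price. The helper `share_within` and the $5,000 tolerance are illustrative choices, not part of the lab.

```r
set.seed(2024)
# Hypothetical stand-in for `price`: right-skewed sale prices
price_sim <- rlnorm(2930, meanlog = 12, sdlog = 0.4)
true_mean <- mean(price_sim)

# Share of simulated sample means within `tol` dollars of the true mean
share_within <- function(n, tol = 5000, reps = 5000) {
  means <- replicate(reps, mean(sample(price_sim, n)))
  mean(abs(means - true_mean) <= tol)
}

share50  <- share_within(50)
share150 <- share_within(150)
c(n50 = share50, n150 = share150)
```

Under these assumptions a noticeably larger share of the size-150 sample means should fall inside the tolerance, which is what a smaller spread means in practice.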