Question 1
So far, we have only focused on estimating the mean living area in homes in Ames. Now you'll try to estimate the mean home price. Take a random sample of size 50 from price. Using this sample, what is your best point estimate of the population mean?
mean(sample(price, 50))
## [1] 189814.4
Question 2
Since you have access to the population, simulate the sampling distribution for \(\bar{x}_{price}\) (the sample mean of price) by taking 5000 samples of size 50 from the population and computing 5000 sample means. Store these means in a vector called sample_means50. Plot the data, then describe the shape of this sampling distribution. Based on this sampling distribution, what would you guess the mean home price of the population to be? Finally, calculate and report the population mean.
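par(mfrow = c(1, 1))
# Pre-allocate a vector to hold the 5000 sample means
sample_means50 <- rep(NA, 5000)
for (i in 1:5000) {
  # Draw 50 prices without replacement and record their mean
  samp <- sample(price, 50)
  sample_means50[i] <- mean(samp)
}
hist(sample_means50)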
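Question 2 also asks for a comparison between the centre of the sampling distribution and the population mean. The following is a minimal sketch of that comparison, not the lab's own output: because the Ames data is not loaded in this snippet, `price` here is a synthetic right-skewed stand-in (an assumption, with an arbitrarily chosen size and shape), and `replicate()` is used as a compact alternative to the loop above.

```r
# Hypothetical stand-in for the Ames `price` vector: a right-skewed synthetic
# population. Swap in the real price vector when the data is loaded.
set.seed(42)
price <- rlnorm(2930, meanlog = 12, sdlog = 0.4)

# Same simulation as the for loop, written with replicate()
sample_means50 <- replicate(5000, mean(sample(price, 50)))

# The centre of the sampling distribution should sit very close to the
# population mean
mean(sample_means50)
mean(price)
```

With the real data, the same two calls at the end give the guess based on the sampling distribution and the true population mean.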
Question 3
Change your sample size from 50 to 150, then compute the sampling distribution using the same method as above, and store these means in a new vector called sample_means150. Describe the shape of this sampling distribution, and compare it to the sampling distribution for a sample size of 50. Based on this sampling distribution, what would you guess to be the mean sale price of homes in Ames?
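# Pre-allocate a vector to hold the 5000 sample means
sample_means150 <- rep(NA, 5000)
for (i in 1:5000) {
  # Draw 150 prices without replacement and record their mean
  samp <- sample(price, 150)
  sample_means150[i] <- mean(samp)
}
hist(sample_means150)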
Compared to the sampling distribution for a sample size of 50, the sampling distribution for a sample size of 150 is more concentrated around its mean. Based on these sample means, I would guess the population mean to be:
mean(sample_means150)
## [1] 180701.3
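The difference in concentration is easier to judge when both histograms share the same x-axis. The sketch below is one way to draw them stacked; it again assumes a synthetic stand-in for `price`, and the names `xlimits` and the `breaks = 40` setting are illustrative choices rather than part of the lab.

```r
# Hypothetical stand-in for the Ames `price` vector
set.seed(42)
price <- rlnorm(2930, meanlog = 12, sdlog = 0.4)

sample_means50  <- replicate(5000, mean(sample(price, 50)))
sample_means150 <- replicate(5000, mean(sample(price, 150)))

# Use one x-axis range for both panels so the spreads are directly comparable
xlimits <- range(c(sample_means50, sample_means150))

par(mfrow = c(2, 1))
hist(sample_means50,  breaks = 40, xlim = xlimits, main = "n = 50")
hist(sample_means150, breaks = 40, xlim = xlimits, main = "n = 150")
par(mfrow = c(1, 1))  # restore the single-panel layout
```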
Question 4
Of the sampling distributions from 2 and 3, which has a smaller spread? If we're concerned with making estimates that are more often close to the true value, would we prefer a distribution with a large or small spread?
sd(sample_means150) < sd(sample_means50)
## [1] TRUE
The sampling distribution based on samples of size 150 has the smaller spread. If we're concerned with making estimates that are more often close to the true value, we would prefer a distribution with a small spread, since each individual estimate then tends to fall closer to the true value.
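The smaller spread at n = 150 is what the standard error of the mean predicts: it shrinks roughly as \(\sigma/\sqrt{n}\). As a rough check, the sketch below compares the simulated spread with the theoretical value. It assumes a synthetic stand-in for `price`, and the helper names `theoretical_se` and `simulated_se` are hypothetical. Because `sample()` draws without replacement by default, the formula includes the finite population correction.

```r
# Hypothetical stand-in for the Ames `price` vector
set.seed(42)
price <- rlnorm(2930, meanlog = 12, sdlog = 0.4)

N <- length(price)
# Population standard deviation (divide by N, not N - 1)
sigma <- sqrt(mean((price - mean(price))^2))

# Theoretical standard error of the mean when sampling without replacement
theoretical_se <- function(n) sigma / sqrt(n) * sqrt((N - n) / (N - 1))

# Standard deviation of `reps` simulated sample means of size n
simulated_se <- function(n, reps = 5000) {
  sd(replicate(reps, mean(sample(price, n))))
}

se50  <- simulated_se(50)
se150 <- simulated_se(150)

c(simulated = se50,  theoretical = theoretical_se(50))
c(simulated = se150, theoretical = theoretical_se(150))
```

The simulated and theoretical values should agree closely, and tripling the sample size should cut the spread by a little more than a factor of \(\sqrt{3}\) once the correction is included.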