download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")
area <- ames$Gr.Liv.Area
price <- ames$SalePrice
summary(area)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     334    1126    1442    1500    1743    5642
hist(area)

Exercise 1. Describe this population distribution.
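The population distribution is roughly bell-shaped but right-skewed: the mean (1500) exceeds the median (1442), and a long right tail stretches to the maximum of 5642. Values range from 334 to 5642, with the middle 50% falling between 1126 and 1743.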
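The skew described above can also be checked numerically. The following is a minimal sketch, not part of the original lab: it uses `area` if it is already loaded and otherwise falls back to synthetic log-normal data (an assumption, used only so the snippet runs on its own). The helper name `describe_skew` is hypothetical.

```r
# Use the lab's `area` vector if it is loaded; otherwise fall back to
# synthetic right-skewed data (an assumption, for illustration only).
set.seed(1)
x <- if (exists("area", mode = "numeric")) {
  area
} else {
  rlnorm(3000, meanlog = 7.25, sdlog = 0.33)
}

# Hypothetical helper: summarize center, spread, and direction of skew.
describe_skew <- function(v) {
  q <- quantile(v, c(0.25, 0.5, 0.75), names = FALSE)
  list(
    mean = mean(v),
    median = q[2],
    range = range(v),
    # A mean above the median suggests a right skew.
    mean_above_median = mean(v) > q[2],
    # Quartile (Bowley) skewness: positive values indicate a right skew.
    bowley = (q[3] + q[1] - 2 * q[2]) / (q[3] - q[1])
  )
}

res <- describe_skew(x)
print(res)
```

A mean above the median together with a positive quartile skewness is consistent with the right skew seen in the histogram.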
The Unknown Sampling Distribution
samp1 <- sample(area, 50)
summary(samp1)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     858    1178    1374    1429    1634    2256
hist(samp1, xlim = c(500, 3000))

Exercise 2. Describe the distribution of this sample. How does it compare to the distribution of the population?
Exercise 3. Take a second sample, also of size 50, and call it samp2. How does the mean of samp2 compare with the mean of samp1? Suppose we took two more samples, one of size 100 and one of size 1000. Which would you think would provide a more accurate estimate of the population mean?
samp2 <- sample(area, 50)
mean(samp2)
## [1] 1603.44
samp3 <- sample(area, 100)
mean(samp3)
## [1] 1535.28
samp4 <- sample(area, 1000)
mean(samp4)
## [1] 1492.363
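The mean of samp2 (1603.44) differs from the mean of samp1 (1429) because each sample is a different random draw from the population. Larger samples tend to give more accurate estimates of the population mean, so the sample of size 1000 is expected to land closest. That holds here: the sample means are 1603.44 (n = 50), 1535.28 (n = 100), and 1492.363 (n = 1000), against a population mean of 1500.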
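A single draw at each size cannot show how reliable that size is, so the comparison can be repeated many times. The sketch below is an illustration rather than part of the original lab: it draws 2000 samples at each size and measures how much the sample means vary. It uses `area` if loaded and otherwise a synthetic log-normal stand-in (an assumption). The names `pop`, `sizes`, `sample_means`, and `spread` are hypothetical.

```r
set.seed(42)

# Use the lab's `area` vector if it is loaded; otherwise fall back to
# synthetic right-skewed data (an assumption, for illustration only).
pop <- if (exists("area", mode = "numeric")) {
  area
} else {
  rlnorm(3000, meanlog = 7.25, sdlog = 0.33)
}

sizes <- c(50, 100, 1000)

# For each sample size, draw 2000 samples (without replacement)
# and record the mean of each one.
sample_means <- lapply(sizes, function(n) {
  replicate(2000, mean(sample(pop, n)))
})
names(sample_means) <- paste0("n", sizes)

# Standard deviation of the sample means at each size: smaller values
# mean the estimates cluster more tightly around the population mean.
spread <- sapply(sample_means, sd)

print(mean(pop))
print(spread)
```

The spread should shrink as the sample size grows. Any one sample of size 1000 is not guaranteed to beat a particular sample of size 50, but on average it lands closer to the population mean.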