Intro to Inference

# Download the Ames, Iowa housing data and load the `ames` data frame
download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

# Above-grade living area (square feet) and sale price (US dollars) of each house
area <- ames$Gr.Liv.Area
price <- ames$SalePrice

summary(area)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     334    1126    1442    1500    1743    5642

hist(area)

Exercise 1. Describe this population distribution.

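The distribution is right-skewed rather than normal: the mean (1500) is larger than the median (1442), and a long right tail stretches to a maximum of 5642 square feet, far above the third quartile of 1743. Living areas range from 334 to 5642 square feet, and the middle 50% of houses fall between 1126 and 1743 square feet.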
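The mean-versus-median comparison above can be backed up with a single number. The sketch below is a hand-rolled moment coefficient of skewness (not the function from the e1071 or moments packages); the name `skewness` is our own. Positive values indicate a longer right tail, and values near 0 indicate rough symmetry.

```r
# Moment-based sample skewness (hypothetical helper, defined here):
# average cubed deviation from the mean divided by the cubed standard
# deviation. Positive = longer right tail; near 0 = roughly symmetric.
skewness <- function(x) {
  x <- x[!is.na(x)]           # drop missing values, if any
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))  # divides by n, not n - 1
  mean((x - m)^3) / s^3
}
```

Calling `skewness(area)` after loading the data should return a positive value if the right skew described above holds.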

The Unknown Sampling Distribution

# Simple random sample of 50 houses (drawn without replacement); results differ on each run
samp1 <- sample(area, 50)

summary(samp1)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     858    1178    1374    1429    1634    2256

hist(samp1, xlim = c(500, 3000))

Exercise 2. Describe the distribution of this sample. How does it compare to the distribution of the population?

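The sample of 50 houses has a center close to the population's: its mean is 1429 (population: 1500) and its median is 1374 (population: 1442). As in the population, the mean is above the median, which suggests a slight right skew. The sample values range only from 858 to 2256, a much narrower range than the population's 334 to 5642, because a sample of 50 is unlikely to include the rare, very small or very large houses in the tails of the population.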
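To make the comparison above easier to read, the population and sample summaries can be stacked into one table with the width of each range added as an extra column. This is a minimal sketch; `compare_summaries` is a hypothetical helper, and it assumes both arguments are numeric vectors such as `area` and `samp1`.

```r
# Hypothetical helper: stack summary() output for a population and a sample
# into one matrix and append the range width (max - min) as a column.
compare_summaries <- function(pop, samp) {
  s <- rbind(population = summary(pop), sample = summary(samp))
  cbind(s, Range = s[, "Max."] - s[, "Min."])
}
```

With the lab's objects, `compare_summaries(area, samp1)` prints one row per group, so the centers and range widths can be compared column by column.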

mean(samp1)

## [1] 1429.28

Exercise 3. Take a second sample, also of size 50, and call it samp2. How does the mean of samp2 compare with the mean of samp1? Suppose we took two more samples, one of size 100 and one of size 1000. Which would you think would provide a more accurate estimate of the population mean?

samp2 <- sample(area, 50)
mean(samp2)

## [1] 1603.44

samp3 <- sample(area, 100)
mean(samp3)

## [1] 1535.28

samp4 <- sample(area, 1000)
mean(samp4)

## [1] 1492.363

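The mean of samp2 (1603.44) differs from the mean of samp1 (1429.28) by about 174 square feet, even though both samples have size 50: each random sample produces a different estimate. A larger sample should give a more accurate estimate of the population mean, because the variability of the sample mean shrinks as the sample size grows. The sample of size 1000 is therefore the better choice. In this run its mean (1492.36) is also the closest to the population mean of about 1500, although a single large sample is not guaranteed to land closer than every smaller one.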
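The claim that larger samples give more accurate estimates can be checked by simulation: draw many samples of each size, record their means, and measure how spread out those means are. The standard deviation of the simulated means approximates the standard error of the mean, which is roughly $\sigma/\sqrt{n}$, so moving from $n = 50$ to $n = 1000$ (20 times larger) should shrink it by about $\sqrt{20} \approx 4.5$. Because `sample()` draws without replacement and 1000 houses is a sizeable fraction of the population, the simulated value at $n = 1000$ may come out somewhat smaller still. The function names and the default of 5000 repetitions are our own choices for this sketch.

```r
# Hypothetical helper: approximate the sampling distribution of the sample
# mean by drawing `reps` simple random samples of size n (without
# replacement) from `pop` and recording each sample's mean.
sample_means <- function(pop, n, reps = 5000) {
  replicate(reps, mean(sample(pop, n)))
}

# Standard deviation of the simulated means for each sample size; this
# approximates the standard error of the mean at that n.
se_by_size <- function(pop, sizes, reps = 5000) {
  out <- sapply(sizes, function(n) sd(sample_means(pop, n, reps)))
  names(out) <- sizes
  out
}
```

With the lab's data, `se_by_size(area, c(50, 100, 1000))` should produce three values that decrease from left to right, matching the reasoning above.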