load("more/ames.RData")
area <- ames$Gr.Liv.Area
price <- ames$SalePrice
summary(area)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 334 1126 1442 1500 1743 5642
hist(area)
samp1 <- sample(area, 50)
summary(samp1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 720 1099 1440 1459 1698 2656
hist(samp1)
mean(samp1)
## [1] 1458.8
samp2 <- sample(area, 50)
mean(samp2)
## [1] 1554.3
The mean of samp2 is larger than both the mean of samp1 and the mean of the entire data set.
I would expect that the larger the sample size, the more closely the sample mean will tend to approximate the mean of the actual population.
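That expectation can be checked with a small simulation. The sketch below is only an illustration: `population` is a synthetic, right-skewed stand-in generated with `rgamma()`, not the Ames data, and the sample sizes and the 2000 repetitions are arbitrary choices.

```r
# Synthetic stand-in for the population (an assumption, NOT the Ames data).
set.seed(1)
population <- rgamma(3000, shape = 9, scale = 500 / 3)
pop_mean <- mean(population)

# For each sample size, draw 2000 samples and record how far, on average,
# the sample mean lands from the population mean.
sizes <- c(10, 50, 200, 1000)
avg_error <- sapply(sizes, function(n) {
  means <- replicate(2000, mean(sample(population, n)))
  mean(abs(means - pop_mean))
})

# One row per sample size; avg_error should shrink as n grows.
data.frame(n = sizes, avg_error = round(avg_error, 1))
```

Under these assumptions the average error should fall steadily as `n` increases, which is the pattern the answer above anticipates.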
# Build a sampling distribution: 5000 sample means, each from 50 houses
sample_means50 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(area, 50)
  sample_means50[i] <- mean(samp)
}
hist(sample_means50)
hist(sample_means50, breaks = 25)
summary(sample_means50)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1249 1451 1498 1500 1548 1790
# Smaller version of the simulation: only 100 sample means, each from 50 houses
sample_means_small <- rep(0, 100)
for (i in 1:100) {
  samp <- sample(area, 50)
  sample_means_small[i] <- mean(samp)
}
sample_means_small
## [1] 1412.76 1598.40 1454.84 1435.26 1579.24 1469.72 1426.70 1445.74
## [9] 1539.02 1349.00 1431.48 1450.40 1582.42 1484.56 1543.88 1548.80
## [17] 1536.40 1510.60 1461.84 1481.30 1532.80 1520.28 1459.18 1596.72
## [25] 1364.86 1489.16 1428.54 1542.40 1522.30 1665.02 1356.34 1394.78
## [33] 1397.44 1511.06 1511.60 1540.66 1593.48 1540.40 1608.34 1413.00
## [41] 1612.56 1395.72 1504.42 1439.64 1507.40 1564.46 1477.38 1525.56
## [49] 1656.90 1488.52 1473.72 1428.52 1535.62 1454.92 1422.34 1570.18
## [57] 1593.02 1561.82 1608.60 1558.30 1642.56 1517.42 1417.44 1422.88
## [65] 1608.62 1597.18 1457.20 1460.48 1471.26 1507.34 1439.46 1403.52
## [73] 1515.78 1521.98 1490.90 1525.92 1584.00 1528.50 1501.14 1479.28
## [81] 1520.84 1472.42 1521.30 1510.28 1481.04 1593.38 1551.02 1403.10
## [89] 1448.04 1376.44 1448.44 1615.32 1455.92 1449.78 1521.78 1563.28
## [97] 1486.62 1548.92 1556.52 1437.34
There are 100 elements in sample_means_small. Each element represents the mean area of a random sample of 50 houses that were sold.
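As a side note, the same kind of vector can be built without an explicit loop. The following is a minimal sketch using base R's `replicate()`; `area_demo` and `sample_means_demo` are hypothetical names, and `area_demo` is synthetic data standing in for `area` so that the block runs on its own.

```r
# Hypothetical stand-in for `area` (synthetic values, NOT the Ames data).
set.seed(42)
area_demo <- rgamma(3000, shape = 9, scale = 500 / 3)

# replicate() evaluates the expression 100 times and collects the results,
# so no pre-allocated vector or loop index is needed.
sample_means_demo <- replicate(100, mean(sample(area_demo, 50)))

# One element per simulated sample of 50 values.
length(sample_means_demo)
```

Each element of `sample_means_demo` has the same interpretation as an element of `sample_means_small`: the mean of one random sample of size 50.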
6. When the sample size is larger, the spread of the sampling distribution becomes narrower, so the sample means cluster more tightly around the center.
#### Price of Homes
samp_price <- sample(price, 50)
mean(samp_price)
## [1] 155866.5
The mean house price in this sample of 50 homes is $155866.50.
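# 5000 sample means of price, each from 50 houses
# (this reuses the name sample_means50 and overwrites the area-based vector)
sample_means50 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(price, 50)
  sample_means50[i] <- mean(samp)
}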
hist(sample_means50, breaks=25)
summary(sample_means50)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 144571 172684 180290 180719 188040 226455
The sampling distribution is unimodal and roughly symmetric. Its mean is $180719.
mean(price)
## [1] 180796.1
The mean house price in the population is $180796.1.
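The sampling-distribution mean and the population mean reported above can be compared directly. The arithmetic below uses only the two printed values (180719 from `summary(sample_means50)` and 180796.1 from `mean(price)`):

```latex
\left| 180719 - 180796.1 \right| = 77.1,
\qquad
\frac{77.1}{180796.1} \approx 0.0004 \quad (\text{about } 0.04\%)
```

A gap this small is consistent with the sample mean being an unbiased estimator of the population mean, $E[\bar{x}] = \mu$.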
# 5000 sample means of price, each from 150 houses
sample_means150 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(price, 150)
  sample_means150[i] <- mean(samp)
}
hist(sample_means150, breaks=25)
summary(sample_means150)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 157326 176488 180790 180887 185184 201573
The sampling distribution is unimodal and symmetric. The sampling distribution with samples of size 150 has a narrower spread than the one with samples of size 50. Based on it, I would guess that the mean sale price of a house in Ames is close to $180887. To make estimates that are close to the true value, I would prefer a sampling distribution with a small spread.
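The narrowing can be roughly quantified. As a sketch, assume the usual standard error formula for independent draws, with $\sigma$ the population standard deviation and $n$ the sample size. It predicts how much wider the size-50 distribution should be, and the interquartile ranges from the two `summary()` outputs above give an observed ratio to compare against:

```latex
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{\mathrm{SE}_{n=50}}{\mathrm{SE}_{n=150}} = \sqrt{\frac{150}{50}} = \sqrt{3} \approx 1.73

\frac{\mathrm{IQR}_{50}}{\mathrm{IQR}_{150}}
= \frac{188040 - 172684}{185184 - 176488}
= \frac{15356}{8696} \approx 1.77
```

The observed ratio is close to the predicted one. The match is only approximate, partly because of simulation noise and partly because `sample()` draws without replacement by default, so a small finite-population correction applies that the simple formula ignores.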