The population distribution is unimodal and right-skewed, with a mean of approximately 1500, a range of 334 to 5642, and an IQR of 617.
The distribution of this sample has a right skew similar to that of the population distribution.
samp2. How does the mean of samp2 compare with the mean of samp1? Suppose we took two more samples, one of size 100 and one of size 1000. Which would you think would provide a more accurate estimate of the population mean?
The mean of samp2 differs slightly from the mean of samp1; in this example the two differ by about 31. If we took two more samples, of sizes 100 and 1000, the sample of size 1000 would be expected to give the more accurate estimate of the population mean, because sample means vary less from sample to sample as the sample size grows.
## [1] 1551
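The preference for the larger sample can be made concrete with the standard error of the sample mean. As a sketch, assuming independent draws from a population with standard deviation $\sigma$ (and ignoring the finite-population correction that applies when sampling without replacement), the typical error of a sample mean shrinks with the square root of the sample size:

```latex
\[
SE(\bar{x}) = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{SE_{n=100}}{SE_{n=1000}} = \sqrt{\frac{1000}{100}} = \sqrt{10} \approx 3.16
\]
```

Under these assumptions, a mean from a sample of size 1000 is typically about three times closer to the population mean than a mean from a sample of size 100.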
sample_means50? Describe the sampling distribution, and be sure to specifically note its center. Would you expect the distribution to change if we instead collected 50,000 sample means?
There are 5000 elements in the vector sample_means50. The sampling distribution is approximately symmetric, with its center at approximately 1500. If we collected 50,000 sample means, I would expect the histogram to look smoother and more clearly bell-shaped, with its center even closer to the population mean of about 1500; the spread should stay about the same, because each sample still has size 50.
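The effect of collecting more sample means can be checked with a small simulation. The sketch below does not use the lab's data: `area_sim` is a hypothetical stand-in for `area` (a synthetic right-skewed population with a mean near 1500), and `sim_sample_means` is an illustrative helper name, not something from the lab.

```r
# Hypothetical stand-in for the lab's `area` vector: 2930 right-skewed
# values with a mean near 1500 (synthetic, not the real housing data).
set.seed(42)
area_sim <- rgamma(2930, shape = 9, rate = 9 / 1500)

# Illustrative helper: draw `reps` samples of size `n` from `pop`
# and return the mean of each sample.
sim_sample_means <- function(pop, n, reps) {
  out <- numeric(reps)
  for (i in seq_len(reps)) {
    out[i] <- mean(sample(pop, n))
  }
  out
}

means_5k  <- sim_sample_means(area_sim, 50, 5000)
means_50k <- sim_sample_means(area_sim, 50, 50000)

# Center and spread of each simulated sampling distribution
summary_tbl <- data.frame(
  reps   = c(5000, 50000),
  center = c(mean(means_5k), mean(means_50k)),
  spread = c(sd(means_5k), sd(means_50k))
)
print(summary_tbl)
```

In a run like this, both rows should show nearly the same center and spread: adding more sample means gives a smoother picture of the same sampling distribution rather than a different one.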
sample_means_small. Run a loop that takes a sample of size 50 from area and stores the sample mean in sample_means_small, but only iterate from 1 to 100. Print the output to your screen (type sample_means_small into the console and press enter). How many elements are there in this object called sample_means_small? What does each element represent?
There are 100 elements, each representing the mean of a sample of size 50 taken from the area vector.
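# Pre-allocate a vector to hold 100 sample means
sample_means_small <- rep(NA, 100)
for (i in 1:100) {
  # Draw 50 observations from area (without replacement) and record their mean
  samp <- sample(area, 50)
  sample_means_small[i] <- mean(samp)
}
# Print all 100 sample means
sample_means_small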
## [1] 1494.20 1457.80 1450.32 1595.92 1445.08 1507.30 1576.04 1511.20
## [9] 1600.64 1480.92 1615.58 1448.68 1634.40 1510.34 1543.14 1553.00
## [17] 1469.40 1588.38 1527.68 1396.00 1496.40 1456.40 1529.42 1430.12
## [25] 1551.28 1457.06 1511.80 1455.44 1421.66 1431.70 1524.02 1452.08
## [33] 1499.16 1477.80 1555.28 1457.50 1599.56 1580.64 1399.98 1531.68
## [41] 1451.94 1400.24 1486.66 1482.30 1541.24 1589.60 1277.54 1482.16
## [49] 1434.88 1427.08 1428.72 1447.70 1408.84 1467.24 1577.72 1463.14
## [57] 1578.74 1549.44 1419.80 1515.28 1589.06 1661.50 1505.08 1543.56
## [65] 1484.86 1458.74 1446.86 1501.44 1518.72 1480.20 1367.72 1581.74
## [73] 1442.40 1555.32 1396.80 1487.60 1579.76 1564.84 1386.80 1502.04
## [81] 1494.26 1602.02 1486.10 1455.00 1550.58 1458.50 1314.08 1439.14
## [89] 1409.88 1496.30 1542.20 1497.86 1564.88 1492.32 1551.72 1522.56
## [97] 1704.32 1528.88 1548.28 1546.46
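The loop above can also be written more compactly with base R's `replicate()`. The sketch below is an alternative formulation, not the lab's own code; `area_sim` is a hypothetical synthetic stand-in for `area`, and the seed values are arbitrary.

```r
# Hypothetical stand-in for the lab's `area` vector (synthetic data).
set.seed(1)
area_sim <- rgamma(2930, shape = 9, rate = 9 / 1500)

# Loop version, mirroring the structure of the lab code
set.seed(123)
loop_means <- rep(NA, 100)
for (i in 1:100) {
  loop_means[i] <- mean(sample(area_sim, 50))
}

# replicate() version: evaluates the expression 100 times and
# collects the results into a numeric vector
set.seed(123)
repl_means <- replicate(100, mean(sample(area_sim, 50)))

# Both vectors hold 100 sample means; compare them element by element
print(length(repl_means))
print(all.equal(loop_means, repl_means))
```

Resetting the seed before each version makes the two approaches draw the same samples, so the comparison checks that they are interchangeable.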
When the sample size is larger, the center of the sampling distribution sits closer to the population mean and the spread is smaller.
price. Using this sample, what is your best point estimate of the population mean?
## [1] 198427.9
sample_means50. Plot the data, then describe the shape of this sampling distribution. Based on this sampling distribution, what would you guess the mean home price of the population to be? Finally, calculate and report the population mean.
Based on the sampling distribution, I would guess the mean home price of the population to be approximately 180000.
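set.seed(1)
# Pre-allocate a vector to hold 5000 sample means
sample_means50 <- rep(NA, 5000)
for (i in 1:5000) {
  # Draw 50 home prices and record their mean
  samp <- sample(price, 50)
  sample_means50[i] <- mean(samp)
}
# Histogram of the simulated sampling distribution (n = 50)
hist(sample_means50, breaks = 50)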
## [1] 180796.1
sample_means150. Describe the shape of this sampling distribution, and compare it to the sampling distribution for a sample size of 50. Based on this sampling distribution, what would you guess to be the mean sale price of homes in Ames?
Based on this sampling distribution, I would guess the mean sale price to be slightly above 180000.
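set.seed(2534)
# Pre-allocate a vector to hold 5000 sample means
sample_means150 <- rep(NA, 5000)
for (i in 1:5000) {
  # Draw 150 home prices and record their mean
  samp <- sample(price, 150)
  sample_means150[i] <- mean(samp)
}
# Histogram of the simulated sampling distribution (n = 150)
hist(sample_means150, breaks = 50)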
## [1] 180796.1
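The second sampling distribution, built from samples of size 150, has a smaller spread. We would prefer a distribution with a smaller spread, since its estimates are more often close to the true value.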
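The difference in spread can be quantified rather than read off the histograms. The sketch below is a simulation under stated assumptions: `price_sim` is a hypothetical synthetic stand-in for `price` (right-skewed, mean near 180000), not the real Ames data, and the comparison value `sigma / sqrt(n)` ignores the finite-population correction for sampling without replacement.

```r
# Hypothetical stand-in for the lab's `price` vector (synthetic data).
set.seed(2024)
price_sim <- rgamma(2930, shape = 5, rate = 5 / 180000)

# Standard deviation of 5000 simulated sample means for each sample size
sizes <- c(50, 150)
spread <- sapply(sizes, function(n) {
  sd(replicate(5000, mean(sample(price_sim, n))))
})

# Rough theoretical spread: population sd divided by sqrt(n)
theory <- sd(price_sim) / sqrt(sizes)

print(data.frame(n = sizes, simulated_sd = spread, sigma_over_sqrt_n = theory))
```

Tripling the sample size from 50 to 150 should cut the spread by a factor of roughly the square root of 3, which is why the sampling distribution for samples of size 150 looks noticeably narrower.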