Ans: The distribution is slightly right-skewed, the mode is between 1000 and 1500, and the mean is about 1500.
Ans: The random sample has size 50, which is greater than 30, so it should give a reasonable approximation of the population distribution.
samp2. How does the mean of samp2 compare with the mean of samp1? Suppose we took two more samples, one of size 100 and one of size 1000. Which would you think would provide a more accurate estimate of the population mean?
Ans: If both samp1 and samp2 approximate the population distribution well, their means will be very close, though rarely exactly equal. As the sample size increases, the sample mean tends to get closer to the population mean, so the sample of size 1000 will give the more accurate estimate.
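The effect of sample size on accuracy can be checked with a small simulation. This is only a sketch: `pop` is a hypothetical right-skewed population drawn with `rgamma`, not the Ames data, and `mean_abs_error` is a helper name made up for this illustration.

```r
set.seed(1)
# Hypothetical stand-in for the population (NOT the Ames data)
pop <- rgamma(3000, shape = 9, scale = 167)
pop_mean <- mean(pop)

# Average absolute error of the sample mean over `reps` samples of size n
mean_abs_error <- function(n, reps = 500) {
  errs <- replicate(reps, abs(mean(sample(pop, n)) - pop_mean))
  mean(errs)
}

errors <- sapply(c(50, 100, 1000), mean_abs_error)
names(errors) <- c("n=50", "n=100", "n=1000")
errors
```

Under these assumptions the average error should shrink as the sample size grows, which is the pattern described in the answer above.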
sample_means50? Describe the sampling distribution, and be sure to specifically note its center. Would you expect the distribution to change if we instead collected 50,000 sample means?
Ans: There are 5000 elements in sample_means50. With 50,000 sample means the bell curve would look smoother and more symmetric, but its center and spread should stay about the same.
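To see what changes when more sample means are collected, the two cases can be simulated side by side. This is a sketch under the assumption of a synthetic population `pop` (drawn with `rgamma`, not the Ames data); the names `means_5k` and `means_50k` are made up for the illustration.

```r
set.seed(2)
# Hypothetical right-skewed population (NOT the Ames data)
pop <- rgamma(3000, shape = 9, scale = 167)

# Same sample size (50), different number of sample means
means_5k  <- replicate(5000,  mean(sample(pop, 50)))
means_50k <- replicate(50000, mean(sample(pop, 50)))

# Center and spread of each simulated sampling distribution
c(center_5k = mean(means_5k), center_50k = mean(means_50k))
c(spread_5k = sd(means_5k),   spread_50k = sd(means_50k))
```

Collecting more sample means mainly gives a smoother histogram; the center and spread depend on the sample size, not on how many means are collected.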
sample_means_small. Run a loop that takes a sample of size 50 from area and stores the sample mean in sample_means_small, but only iterate from 1 to 100. Print the output to your screen (type sample_means_small into the console and press enter). How many elements are there in this object called sample_means_small? What does each element represent?
load("more/ames.RData")
area <- ames$Gr.Liv.Area
# Vector to hold the 100 sample means
sample_means_small <- rep(NA, 100)
for (i in 1:100) {
  samp <- sample(area, 50)             # random sample of 50 living areas
  sample_means_small[i] <- mean(samp)  # store its mean
}
sample_means_small
## [1] 1576.54 1530.34 1639.10 1586.96 1452.44 1372.86 1461.60 1501.08
## [9] 1367.06 1405.66 1436.46 1523.86 1536.76 1518.26 1406.92 1395.58
## [17] 1552.64 1502.64 1553.06 1586.54 1363.42 1582.14 1553.90 1561.92
## [25] 1374.02 1453.62 1498.94 1533.52 1492.40 1467.18 1463.08 1422.74
## [33] 1497.06 1562.58 1575.06 1418.30 1438.26 1371.80 1442.28 1620.48
## [41] 1460.62 1506.00 1373.44 1527.00 1459.24 1415.92 1424.34 1468.02
## [49] 1547.00 1439.64 1594.54 1380.02 1415.62 1520.78 1453.10 1548.46
## [57] 1606.18 1421.88 1470.42 1486.72 1415.16 1535.48 1501.98 1447.98
## [65] 1518.68 1444.62 1543.98 1369.02 1521.78 1523.74 1381.36 1545.72
## [73] 1543.84 1571.76 1432.08 1583.34 1468.16 1479.46 1392.12 1416.24
## [81] 1449.58 1446.92 1432.22 1498.42 1560.46 1550.70 1512.68 1561.76
## [89] 1417.50 1444.62 1440.84 1489.82 1439.56 1510.54 1505.96 1516.96
## [97] 1588.94 1445.34 1442.06 1560.48
Ans: There are 100 elements in sample_means_small. Each element is the mean of one random sample of 50 living areas.
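The same 100 sample means can also be built without an explicit loop. The sketch below uses base R's `replicate`; `area_demo` is a hypothetical stand-in for `area` so that the snippet runs without the Ames file.

```r
set.seed(3)
# Hypothetical stand-in for `area` (NOT the Ames data)
area_demo <- rgamma(3000, shape = 9, scale = 167)

# Each call draws 50 values and returns their mean; repeat 100 times
sample_means_small <- replicate(100, mean(sample(area_demo, 50)))

length(sample_means_small)  # one element per sample
```

Each element of the result is still the mean of one sample of size 50, exactly as in the loop version.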
Ans: The center stays near the middle of the distribution, and the gap between the median and the mean gets smaller.
So far, we have only focused on estimating the mean living area in homes in Ames. Now you’ll try to estimate the mean home price.
price. Using this sample, what is your best point estimate of the population mean?
price <- ames$SalePrice
# Means of 50 random samples, each of size 50
sample50 <- rep(NA, 50)
for (i in 1:50) {
  samp <- sample(price, 50)
  sample50[i] <- mean(samp)
}
summary(sample50)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 153756 173638 180819 180004 186624 200692
Ans: The best point estimate is the sample mean, which is 180004 in the summary above.
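For a single sample of size 50, as the question describes, the point estimate is just the mean of that one sample. A minimal sketch follows, assuming a synthetic `price_demo` vector (drawn with `rlnorm`) in place of `ames$SalePrice`; the standard error line is an addition for illustration and is not part of the lab.

```r
set.seed(4)
# Hypothetical stand-in for `price` (NOT the Ames data)
price_demo <- rlnorm(3000, meanlog = 12, sdlog = 0.4)

samp <- sample(price_demo, 50)               # one random sample of 50 prices
point_estimate <- mean(samp)                 # best point estimate of the population mean
std_error <- sd(samp) / sqrt(length(samp))   # rough measure of its uncertainty

c(estimate = point_estimate, se = std_error)
```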
sample_means50. Plot the data, then describe the shape of this sampling distribution. Based on this sampling distribution, what would you guess the mean home price of the population to be? Finally, calculate and report the population mean.
Ans:
# Means of 5000 random samples of price, each of size 50
sample_means50 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(price, 50)
  sample_means50[i] <- mean(samp)
}
hist(sample_means50)
summary(sample_means50)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 138061 172936 180046 180759 188259 227624
Ans: The mean and median are very close, and the bell curve is roughly symmetric. The mean of the 5000 sample means is 180759, which is my guess for the population mean.
sample_means150. Describe the shape of this sampling distribution, and compare it to the sampling distribution for a sample size of 50. Based on this sampling distribution, what would you guess to be the mean sale price of homes in Ames?
# Means of 5000 random samples of price, each of size 150
sample_means150 <- rep(NA, 5000)
for (i in 1:5000) {
  samp <- sample(price, 150)
  sample_means150[i] <- mean(samp)
}
hist(sample_means150)
summary(sample_means150)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 156679 176430 180641 180765 184859 205121
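Ans: The distribution of sample_means150 has almost the same bell shape and center as sample_means50, but it is narrower: its sample means range from 156679 to 205121, compared with 138061 to 227624 for sample_means50. In sample_means150 the difference between the median and the mean is 124; in sample_means50 it is 713. The mean sale price in Ames is close to 180765.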
par(mfrow = c(2, 1))
xlimits <- range(sample_means50)
hist(sample_means50, breaks = 20, xlim = xlimits)
hist(sample_means150, breaks = 20, xlim = xlimits)
Ans: The sampling distribution from 3 (sample_means150) has the smaller spread. A small spread is preferable because the estimates are then more often close to the true value.
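The difference in spread can be compared with the usual standard error formula, sd / sqrt(n). This is a sketch only: `price_demo` is a hypothetical stand-in for `price` and `sim_sd` is a helper name made up here. Because the samples are drawn without replacement from a finite population, the simulated values are expected to sit slightly below the formula.

```r
set.seed(5)
# Hypothetical stand-in for `price` (NOT the Ames data)
price_demo <- rlnorm(3000, meanlog = 12, sdlog = 0.4)

# Standard deviation of `reps` simulated sample means of size n
sim_sd <- function(n, reps = 2000) {
  sd(replicate(reps, mean(sample(price_demo, n))))
}

observed <- c(n50 = sim_sd(50), n150 = sim_sd(150))
theory   <- sd(price_demo) / sqrt(c(50, 150))

rbind(observed, theory)
```

Tripling the sample size from 50 to 150 should cut the spread by a factor of about sqrt(3), which is why the histogram for size 150 is narrower.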
This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This lab was written for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel.