Sample size and the Central Limit Theorem

March 5th, 2023


Professor: Andrés Castaño Zuluaga

Course: Business Statistics

University of Northern Iowa


Lab Goal

We have collected information on the population of houses sold in Cedar Falls between September 2022 and March 2023 using Zillow marketplace data. The data exclude townhomes, multi-family homes, condos, lots, and apartments. For each house we have the sale price and features such as the number of bedrooms and bathrooms. The dataset is stored in the file CedarFalls_HousesSoldLast6moths.csv.

We start by loading the data into RStudio with the function read.csv, then check its dimensions and look at the first and last rows:

SoldHouses <- read.csv("CedarFalls_HousesSoldLast6moths.csv")
dim(SoldHouses)
## [1] 185  13
head(SoldHouses, 5) # to show the first 5 rows of the dataset
##   id price_sold_usd Bedrooms Bathrooms sold_month sold_month_text  sold_date
## 1  1         425000        4         4          3           March 03/03/2023
## 2  2         170000        3         1          3           March 03/03/2023
## 3  3         295000        3         3          3           March 03/03/2023
## 4  4         443900        5         3          3           March 03/03/2023
## 5  5         865176        5         4          3           March 03/02/2023
##       street_address        city state zip_code Country
## 1   3623 Pheasant Dr Cedar Falls    IA    50613     USA
## 2     1900 Walnut St Cedar Falls    IA    50613     USA
## 3 5404 Meadowlark Ln Cedar Falls    IA    50613     USA
## 4     2715 Falcon Ln Cedar Falls    IA    50613     USA
## 5   5307 Fernwood Dr Cedar Falls    IA    50613     USA
##                                                                                Property.URL
## 1   https://www.zillow.com/homedetails/3623-Pheasant-Dr-Cedar-Falls-IA-50613/76666933_zpid/
## 2     https://www.zillow.com/homedetails/1900-Walnut-St-Cedar-Falls-IA-50613/76668231_zpid/
## 3 https://www.zillow.com/homedetails/5404-Meadowlark-Ln-Cedar-Falls-IA-50613/76671882_zpid/
## 4     https://www.zillow.com/homedetails/2715-Falcon-Ln-Cedar-Falls-IA-50613/93829357_zpid/
## 5 https://www.zillow.com/homedetails/5307-Fernwood-Dr-Cedar-Falls-IA-50613/2071239917_zpid/
tail(SoldHouses, 5) # to show the last 5 rows of the dataset
##      id price_sold_usd Bedrooms Bathrooms sold_month sold_month_text  sold_date
## 181 181         299000        3         3          9       September 09/08/2022
## 182 182         515000        6         3          9       September 09/08/2022
## 183 183         185000        4         1          9       September 09/08/2022
## 184 184         200000        4         1          9       September 09/07/2022
## 185 185         243700        3         1          9       September 09/06/2022
##         street_address        city state zip_code Country
## 181    1815 Grand Blvd Cedar Falls    IA    50613     USA
## 182 2908 Wellington Dr Cedar Falls    IA    50613     USA
## 183    532 Bonita Blvd Cedar Falls    IA    50613     USA
## 184    1522 Theimer St Cedar Falls    IA    50613     USA
## 185   4114 Heritage Rd Cedar Falls    IA    50613     USA
##                                                                                  Property.URL
## 181    https://www.zillow.com/homedetails/1815-Grand-Blvd-Cedar-Falls-IA-50613/76649890_zpid/
## 182 https://www.zillow.com/homedetails/2908-Wellington-Dr-Cedar-Falls-IA-50613/76669939_zpid/
## 183    https://www.zillow.com/homedetails/532-Bonita-Blvd-Cedar-Falls-IA-50613/76670979_zpid/
## 184    https://www.zillow.com/homedetails/1522-Theimer-St-Cedar-Falls-IA-50613/76666062_zpid/
## 185   https://www.zillow.com/homedetails/4114-Heritage-Rd-Cedar-Falls-IA-50613/76671154_zpid/
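
Because the file is described as the whole population of sales, it can help to compute the population mean and standard deviation of price_sold_usd before drawing any samples. The sketch below is only an illustration: it retypes the five prices printed by head(SoldHouses, 5) into a vector called first_prices (a hypothetical name) so that it runs without the CSV file. Note that R's sd() divides by n - 1, so a population standard deviation has to be computed by hand.

```r
# Prices of the first five rows shown by head(SoldHouses, 5), retyped by hand.
# With the full file loaded, use SoldHouses$price_sold_usd instead.
first_prices <- c(425000, 170000, 295000, 443900, 865176)

# Mean of the values
price_mean <- mean(first_prices)

# sd() treats the values as a sample and divides by (n - 1)
price_sd_sample <- sd(first_prices)

# Treating the values as a complete population means dividing by N
price_sd_population <- sqrt(mean((first_prices - price_mean)^2))

print(price_mean)
print(price_sd_sample)
print(price_sd_population)
```

With 185 houses the two standard deviations are nearly identical, but the distinction matters when the sampling distribution is compared with the population values.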

Now use the Shiny app below to create sampling distributions of the mean of price_sold_usd from samples of sizes 10, 50, and 100. Use 5,000 simulations. Shiny is an open-source R package that provides a web framework for building web applications using R.

  1. What does each observation in the sampling distribution represent?

  2. Effect of increasing the sample size: How do the mean, standard error, and shape of the sampling distribution change as the sample size increases?

  3. Effect of increasing the number of simulations: How (if at all) do these values change if you increase the number of simulations?

Let’s do it together!

Shiny applications not supported in static R Markdown documents
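
The app cannot run in a static document, but the same kind of simulation can be approximated in plain R. The following is a minimal sketch and not the app's actual code: the helper simulate_sample_means, the seed, and the choice to sample with replacement are all assumptions. So that it runs without the CSV file, it uses a synthetic right-skewed stand-in population of 185 values, not the Zillow prices.

```r
# Hypothetical helper: draws n_sims random samples of size n from x
# and returns the mean of each sample.
# Assumption: sampling with replacement; set replace = FALSE to change that.
simulate_sample_means <- function(x, n, n_sims = 5000, replace = TRUE) {
  replicate(n_sims, mean(sample(x, size = n, replace = replace)))
}

set.seed(2023)  # arbitrary seed so the run is reproducible

# Stand-in population: 185 synthetic right-skewed values, NOT the Zillow data.
# With the lab data loaded, use: population <- SoldHouses$price_sold_usd
population <- rlnorm(185, meanlog = 12.5, sdlog = 0.5)

sample_sizes <- c(10, 50, 100)

# One vector of 5,000 sample means per sample size
results <- lapply(sample_sizes, function(n) simulate_sample_means(population, n))
names(results) <- paste0("n_", sample_sizes)

# Mean and standard deviation (the standard error) of each sampling distribution
summary_table <- data.frame(
  n = sample_sizes,
  mean_of_means = sapply(results, mean),
  standard_error = sapply(results, sd)
)
print(summary_table)
```

Each element of results holds 5,000 sample means, so a call such as hist(results$n_10) draws one sampling distribution. Changing the n_sims argument and rerunning is one way to explore question 3.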