The Data

download.file("http://www.openintro.org/stat/data/ames.RData", destfile = "ames.RData")
load("ames.RData")

Exercise 1: Describe this population distribution.

head(ames)
##   Order       PID MS.SubClass MS.Zoning Lot.Frontage Lot.Area Street Alley
## 1     1 526301100          20        RL          141    31770   Pave  <NA>
## 2     2 526350040          20        RH           80    11622   Pave  <NA>
## 3     3 526351010          20        RL           81    14267   Pave  <NA>
## 4     4 526353030          20        RL           93    11160   Pave  <NA>
## 5     5 527105010          60        RL           74    13830   Pave  <NA>
## 6     6 527105030          60        RL           78     9978   Pave  <NA>
##   Lot.Shape Land.Contour Utilities Lot.Config Land.Slope Neighborhood
## 1       IR1          Lvl    AllPub     Corner        Gtl        NAmes
## 2       Reg          Lvl    AllPub     Inside        Gtl        NAmes
## 3       IR1          Lvl    AllPub     Corner        Gtl        NAmes
## 4       Reg          Lvl    AllPub     Corner        Gtl        NAmes
## 5       IR1          Lvl    AllPub     Inside        Gtl      Gilbert
## 6       IR1          Lvl    AllPub     Inside        Gtl      Gilbert
##   Condition.1 Condition.2 Bldg.Type House.Style Overall.Qual Overall.Cond
## 1        Norm        Norm      1Fam      1Story            6            5
## 2       Feedr        Norm      1Fam      1Story            5            6
## 3        Norm        Norm      1Fam      1Story            6            6
## 4        Norm        Norm      1Fam      1Story            7            5
## 5        Norm        Norm      1Fam      2Story            5            5
## 6        Norm        Norm      1Fam      2Story            6            6
##   Year.Built Year.Remod.Add Roof.Style Roof.Matl Exterior.1st Exterior.2nd
## 1       1960           1960        Hip   CompShg      BrkFace      Plywood
## 2       1961           1961      Gable   CompShg      VinylSd      VinylSd
## 3       1958           1958        Hip   CompShg      Wd Sdng      Wd Sdng
## 4       1968           1968        Hip   CompShg      BrkFace      BrkFace
## 5       1997           1998      Gable   CompShg      VinylSd      VinylSd
## 6       1998           1998      Gable   CompShg      VinylSd      VinylSd
##   Mas.Vnr.Type Mas.Vnr.Area Exter.Qual Exter.Cond Foundation Bsmt.Qual
## 1        Stone          112         TA         TA     CBlock        TA
## 2         None            0         TA         TA     CBlock        TA
## 3      BrkFace          108         TA         TA     CBlock        TA
## 4         None            0         Gd         TA     CBlock        TA
## 5         None            0         TA         TA      PConc        Gd
## 6      BrkFace           20         TA         TA      PConc        TA
##   Bsmt.Cond Bsmt.Exposure BsmtFin.Type.1 BsmtFin.SF.1 BsmtFin.Type.2
## 1        Gd            Gd            BLQ          639            Unf
## 2        TA            No            Rec          468            LwQ
## 3        TA            No            ALQ          923            Unf
## 4        TA            No            ALQ         1065            Unf
## 5        TA            No            GLQ          791            Unf
## 6        TA            No            GLQ          602            Unf
##   BsmtFin.SF.2 Bsmt.Unf.SF Total.Bsmt.SF Heating Heating.QC Central.Air
## 1            0         441          1080    GasA         Fa           Y
## 2          144         270           882    GasA         TA           Y
## 3            0         406          1329    GasA         TA           Y
## 4            0        1045          2110    GasA         Ex           Y
## 5            0         137           928    GasA         Gd           Y
## 6            0         324           926    GasA         Ex           Y
##   Electrical X1st.Flr.SF X2nd.Flr.SF Low.Qual.Fin.SF Gr.Liv.Area
## 1      SBrkr        1656           0               0        1656
## 2      SBrkr         896           0               0         896
## 3      SBrkr        1329           0               0        1329
## 4      SBrkr        2110           0               0        2110
## 5      SBrkr         928         701               0        1629
## 6      SBrkr         926         678               0        1604
##   Bsmt.Full.Bath Bsmt.Half.Bath Full.Bath Half.Bath Bedroom.AbvGr
## 1              1              0         1         0             3
## 2              0              0         1         0             2
## 3              0              0         1         1             3
## 4              1              0         2         1             3
## 5              0              0         2         1             3
## 6              0              0         2         1             3
##   Kitchen.AbvGr Kitchen.Qual TotRms.AbvGrd Functional Fireplaces
## 1             1           TA             7        Typ          2
## 2             1           TA             5        Typ          0
## 3             1           Gd             6        Typ          0
## 4             1           Ex             8        Typ          2
## 5             1           TA             6        Typ          1
## 6             1           Gd             7        Typ          1
##   Fireplace.Qu Garage.Type Garage.Yr.Blt Garage.Finish Garage.Cars
## 1           Gd      Attchd          1960           Fin           2
## 2         <NA>      Attchd          1961           Unf           1
## 3         <NA>      Attchd          1958           Unf           1
## 4           TA      Attchd          1968           Fin           2
## 5           TA      Attchd          1997           Fin           2
## 6           Gd      Attchd          1998           Fin           2
##   Garage.Area Garage.Qual Garage.Cond Paved.Drive Wood.Deck.SF
## 1         528          TA          TA           P          210
## 2         730          TA          TA           Y          140
## 3         312          TA          TA           Y          393
## 4         522          TA          TA           Y            0
## 5         482          TA          TA           Y          212
## 6         470          TA          TA           Y          360
##   Open.Porch.SF Enclosed.Porch X3Ssn.Porch Screen.Porch Pool.Area Pool.QC
## 1            62              0           0            0         0    <NA>
## 2             0              0           0          120         0    <NA>
## 3            36              0           0            0         0    <NA>
## 4             0              0           0            0         0    <NA>
## 5            34              0           0            0         0    <NA>
## 6            36              0           0            0         0    <NA>
##   Fence Misc.Feature Misc.Val Mo.Sold Yr.Sold Sale.Type Sale.Condition
## 1  <NA>         <NA>        0       5    2010       WD          Normal
## 2 MnPrv         <NA>        0       6    2010       WD          Normal
## 3  <NA>         Gar2    12500       6    2010       WD          Normal
## 4  <NA>         <NA>        0       4    2010       WD          Normal
## 5 MnPrv         <NA>        0       3    2010       WD          Normal
## 6  <NA>         <NA>        0       6    2010       WD          Normal
##   SalePrice
## 1    215000
## 2    105000
## 3    172000
## 4    244000
## 5    189900
## 6    195500
area <- ames$Gr.Liv.Area
price <- ames$SalePrice

summary(area)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     334    1126    1442    1500    1743    5642
hist(area)

The distribution for this population is right skewed, with most of the data being between 1000-2000.

Exercise 2: Describe the distribution of this sample. How does it compare to the distribution of the population?

samp1 <- sample(area,50)

par(mfrow = c(1,2))
hist(area)
hist(samp1, xlim = c(0,6000))

The distribution the population is right skewed whereas the distribution of the sample data resembles a normal distribution.

mean(samp1)
## [1] 1559.4

Exercise 2: Take a second sample, also of size 50, and call it samp2. How does the mean of samp2 compare with the mean of samp1? Suppose we took two more samples, one of size 100 and one of size 1000. Which would you think would provide a more accurate estimate of the population mean?

samp2 <- sample(area,50)
mean (samp2)
## [1] 1534.1

The mean of samp1 is only off by 18 whereas the mean of samp2 is off by 95. The mean of a sample size of 1000 would probably provide the most accurate estimate of the population mean.

samp3 <- sample(area,100)
mean(samp3)
## [1] 1551.48
samp4 <- sample(area,1000)
mean(samp4)
## [1] 1479.221
sample_means50 <- rep(NA, 5000)

for(i in 1:5000){
   samp <- sample(area, 50)
   sample_means50[i] <- mean(samp)
   }

hist(sample_means50, breaks = 25)

Exercise 4: How many elements are there in sample_means50? Describe the sampling distribution, and be sure to specifically note its center. Would you expect the distribution to change if we instead collected 50,000 sample means?

There are 5000 elements in sample_means50, with each element representing a sample of 50 means. The sampling distribution resembles a normal distribution. The center is around 1500 which is fairly close to the population mean of 1499. If 50000 sample means were collected instead of 5000, I would expect the distribution to fit a normal distribution even better and for the center to be closer to the population mean.

sample_means50 <- rep(NA, 50000)

for(i in 1:50000){
   samp <- sample(area, 50)
   sample_means50[i] <- mean(samp)
   }

hist(sample_means50, breaks = 25)