R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

This exercise is under construction. Please report any errors at https://forms.gle/2W4tffs4YJA1jeBv9

Goal: Understand and experience outlier detection techniques in action.

Background: The data for this question has been adapted from https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data. Please review information at https://www.kaggle.com/c/house-prices-advanced-regression-techniques/overview before you get started.

Before starting:
1. You are not allowed to search for solutions to this assignment.
2. You are allowed to search for information about packages and functions that can help you.

Individual assignment only: 70 total points (Rmd and html solution)
Team assignment: 20 points (written analysis)

[1 point] Q1.

Start by entering your name and today’s date in Lines 3 and 4, respectively, to indicate your compliance with the Fuqua Honor Code. Then, run the chunk of code below by clicking on the green arrow (that points to the right) on the top right of the chunk. Tip: I numbered the code chunks to match their question numbers. Chunk 1 specifies the knitting parameters.

[4 points] Q2.

Read and store the data from the file PricesBefore2009.csv into a variable called before2009. Then, inspect the data. Rubric: 1 point each for reading and storing; 1 point each for using 2 R commands for inspecting. Tip: I recommend using the read_csv() function from the tidyverse package for this and all subsequent assignments.

# Clear the workspace and confirm that it is empty
rm(list = ls())
ls()
## character(0)
# Read the data, then inspect the first 20 rows
before2009 <- read.csv("PricesBefore2009.csv")
head(before2009, 20)
##     X Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape
## 1   1  1         60       RL          65    8450   Pave  <NA>      Reg
## 2   2  2         20       RL          80    9600   Pave  <NA>      Reg
## 3   3  3         60       RL          68   11250   Pave  <NA>      IR1
## 4   4  4         70       RL          60    9550   Pave  <NA>      IR1
## 5   5  5         60       RL          84   14260   Pave  <NA>      IR1
## 6   6  7         20       RL          75   10084   Pave  <NA>      Reg
## 7   7  9         50       RM          51    6120   Pave  <NA>      Reg
## 8   8 10        190       RL          50    7420   Pave  <NA>      Reg
## 9   9 11         20       RL          70   11200   Pave  <NA>      Reg
## 10 10 12         60       RL          85   11924   Pave  <NA>      IR1
## 11 11 13         20       RL          NA   12968   Pave  <NA>      IR2
## 12 12 14         20       RL          91   10652   Pave  <NA>      IR1
## 13 13 15         20       RL          NA   10920   Pave  <NA>      IR1
## 14 14 16         45       RM          51    6120   Pave  <NA>      Reg
## 15 15 18         90       RL          72   10791   Pave  <NA>      Reg
## 16 16 19         20       RL          66   13695   Pave  <NA>      Reg
## 17 17 21         60       RL         101   14215   Pave  <NA>      IR1
## 18 18 22         45       RM          57    7449   Pave  Grvl      Reg
## 19 19 23         20       RL          75    9742   Pave  <NA>      Reg
## 20 20 24        120       RM          44    4224   Pave  <NA>      Reg
##    LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2
## 1          Lvl    AllPub    Inside       Gtl      CollgCr       Norm       Norm
## 2          Lvl    AllPub       FR2       Gtl      Veenker      Feedr       Norm
## 3          Lvl    AllPub    Inside       Gtl      CollgCr       Norm       Norm
## 4          Lvl    AllPub    Corner       Gtl      Crawfor       Norm       Norm
## 5          Lvl    AllPub       FR2       Gtl      NoRidge       Norm       Norm
## 6          Lvl    AllPub    Inside       Gtl      Somerst       Norm       Norm
## 7          Lvl    AllPub    Inside       Gtl      OldTown     Artery       Norm
## 8          Lvl    AllPub    Corner       Gtl      BrkSide     Artery     Artery
## 9          Lvl    AllPub    Inside       Gtl       Sawyer       Norm       Norm
## 10         Lvl    AllPub    Inside       Gtl      NridgHt       Norm       Norm
## 11         Lvl    AllPub    Inside       Gtl       Sawyer       Norm       Norm
## 12         Lvl    AllPub    Inside       Gtl      CollgCr       Norm       Norm
## 13         Lvl    AllPub    Corner       Gtl        NAmes       Norm       Norm
## 14         Lvl    AllPub    Corner       Gtl      BrkSide       Norm       Norm
## 15         Lvl    AllPub    Inside       Gtl       Sawyer       Norm       Norm
## 16         Lvl    AllPub    Inside       Gtl      SawyerW       RRAe       Norm
## 17         Lvl    AllPub    Corner       Gtl      NridgHt       Norm       Norm
## 18         Bnk    AllPub    Inside       Gtl       IDOTRR       Norm       Norm
## 19         Lvl    AllPub    Inside       Gtl      CollgCr       Norm       Norm
## 20         Lvl    AllPub    Inside       Gtl      MeadowV       Norm       Norm
##    BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle
## 1      1Fam     2Story           7           5      2003         2003     Gable
## 2      1Fam     1Story           6           8      1976         1976     Gable
## 3      1Fam     2Story           7           5      2001         2002     Gable
## 4      1Fam     2Story           7           5      1915         1970     Gable
## 5      1Fam     2Story           8           5      2000         2000     Gable
## 6      1Fam     1Story           8           5      2004         2005     Gable
## 7      1Fam     1.5Fin           7           5      1931         1950     Gable
## 8    2fmCon     1.5Unf           5           6      1939         1950     Gable
## 9      1Fam     1Story           5           5      1965         1965       Hip
## 10     1Fam     2Story           9           5      2005         2006       Hip
## 11     1Fam     1Story           5           6      1962         1962       Hip
## 12     1Fam     1Story           7           5      2006         2007     Gable
## 13     1Fam     1Story           6           5      1960         1960       Hip
## 14     1Fam     1.5Unf           7           8      1929         2001     Gable
## 15   Duplex     1Story           4           5      1967         1967     Gable
## 16     1Fam     1Story           5           5      2004         2004     Gable
## 17     1Fam     2Story           8           5      2005         2006     Gable
## 18     1Fam     1.5Unf           7           7      1930         1950     Gable
## 19     1Fam     1Story           8           5      2002         2002       Hip
## 20   TwnhsE     1Story           5           7      1976         1976     Gable
##    RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond
## 1   CompShg     VinylSd     VinylSd    BrkFace        196        Gd        TA
## 2   CompShg     MetalSd     MetalSd       None          0        TA        TA
## 3   CompShg     VinylSd     VinylSd    BrkFace        162        Gd        TA
## 4   CompShg     Wd Sdng     Wd Shng       None          0        TA        TA
## 5   CompShg     VinylSd     VinylSd    BrkFace        350        Gd        TA
## 6   CompShg     VinylSd     VinylSd      Stone        186        Gd        TA
## 7   CompShg     BrkFace     Wd Shng       None          0        TA        TA
## 8   CompShg     MetalSd     MetalSd       None          0        TA        TA
## 9   CompShg     HdBoard     HdBoard       None          0        TA        TA
## 10  CompShg     WdShing     Wd Shng      Stone        286        Ex        TA
## 11  CompShg     HdBoard     Plywood       None          0        TA        TA
## 12  CompShg     VinylSd     VinylSd      Stone        306        Gd        TA
## 13  CompShg     MetalSd     MetalSd    BrkFace        212        TA        TA
## 14  CompShg     Wd Sdng     Wd Sdng       None          0        TA        TA
## 15  CompShg     MetalSd     MetalSd       None          0        TA        TA
## 16  CompShg     VinylSd     VinylSd       None          0        TA        TA
## 17  CompShg     VinylSd     VinylSd    BrkFace        380        Gd        TA
## 18  CompShg     Wd Sdng     Wd Sdng       None          0        TA        TA
## 19  CompShg     VinylSd     VinylSd    BrkFace        281        Gd        TA
## 20  CompShg     CemntBd     CmentBd       None          0        TA        TA
##    Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1
## 1       PConc       Gd       TA           No          GLQ        706
## 2      CBlock       Gd       TA           Gd          ALQ        978
## 3       PConc       Gd       TA           Mn          GLQ        486
## 4      BrkTil       TA       Gd           No          ALQ        216
## 5       PConc       Gd       TA           Av          GLQ        655
## 6       PConc       Ex       TA           Av          GLQ       1369
## 7      BrkTil       TA       TA           No          Unf          0
## 8      BrkTil       TA       TA           No          GLQ        851
## 9      CBlock       TA       TA           No          Rec        906
## 10      PConc       Ex       TA           No          GLQ        998
## 11     CBlock       TA       TA           No          ALQ        737
## 12      PConc       Gd       TA           Av          Unf          0
## 13     CBlock       TA       TA           No          BLQ        733
## 14     BrkTil       TA       TA           No          Unf          0
## 15       Slab     <NA>     <NA>         <NA>         <NA>          0
## 16      PConc       TA       TA           No          GLQ        646
## 17      PConc       Ex       TA           Av          Unf          0
## 18      PConc       TA       TA           No          Unf          0
## 19      PConc       Gd       TA           No          Unf          0
## 20      PConc       Gd       TA           No          GLQ        840
##    BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir
## 1           Unf          0       150         856    GasA        Ex          Y
## 2           Unf          0       284        1262    GasA        Ex          Y
## 3           Unf          0       434         920    GasA        Ex          Y
## 4           Unf          0       540         756    GasA        Gd          Y
## 5           Unf          0       490        1145    GasA        Ex          Y
## 6           Unf          0       317        1686    GasA        Ex          Y
## 7           Unf          0       952         952    GasA        Gd          Y
## 8           Unf          0       140         991    GasA        Ex          Y
## 9           Unf          0       134        1040    GasA        Ex          Y
## 10          Unf          0       177        1175    GasA        Ex          Y
## 11          Unf          0       175         912    GasA        TA          Y
## 12          Unf          0      1494        1494    GasA        Ex          Y
## 13          Unf          0       520        1253    GasA        TA          Y
## 14          Unf          0       832         832    GasA        Ex          Y
## 15         <NA>          0         0           0    GasA        TA          Y
## 16          Unf          0       468        1114    GasA        Ex          Y
## 17          Unf          0      1158        1158    GasA        Ex          Y
## 18          Unf          0       637         637    GasA        Ex          Y
## 19          Unf          0      1777        1777    GasA        Ex          Y
## 20          Unf          0       200        1040    GasA        TA          Y
##    Electrical X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath
## 1       SBrkr       856       854            0      1710            1
## 2       SBrkr      1262         0            0      1262            0
## 3       SBrkr       920       866            0      1786            1
## 4       SBrkr       961       756            0      1717            1
## 5       SBrkr      1145      1053            0      2198            1
## 6       SBrkr      1694         0            0      1694            1
## 7       FuseF      1022       752            0      1774            0
## 8       SBrkr      1077         0            0      1077            1
## 9       SBrkr      1040         0            0      1040            1
## 10      SBrkr      1182      1142            0      2324            1
## 11      SBrkr       912         0            0       912            1
## 12      SBrkr      1494         0            0      1494            0
## 13      SBrkr      1253         0            0      1253            1
## 14      FuseA       854         0            0       854            0
## 15      SBrkr      1296         0            0      1296            0
## 16      SBrkr      1114         0            0      1114            1
## 17      SBrkr      1158      1218            0      2376            0
## 18      FuseF      1108         0            0      1108            0
## 19      SBrkr      1795         0            0      1795            0
## 20      SBrkr      1060         0            0      1060            1
##    BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual
## 1             0        2        1            3            1          Gd
## 2             1        2        0            3            1          TA
## 3             0        2        1            3            1          Gd
## 4             0        1        0            3            1          Gd
## 5             0        2        1            4            1          Gd
## 6             0        2        0            3            1          Gd
## 7             0        2        0            2            2          TA
## 8             0        1        0            2            2          TA
## 9             0        1        0            3            1          TA
## 10            0        3        0            4            1          Ex
## 11            0        1        0            2            1          TA
## 12            0        2        0            3            1          Gd
## 13            0        1        1            2            1          TA
## 14            0        1        0            2            1          TA
## 15            0        2        0            2            2          TA
## 16            0        1        1            3            1          Gd
## 17            0        3        1            4            1          Gd
## 18            0        1        0            3            1          Gd
## 19            0        2        0            3            1          Gd
## 20            0        1        0            3            1          TA
##    TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt
## 1             8        Typ          0        <NA>     Attchd        2003
## 2             6        Typ          1          TA     Attchd        1976
## 3             6        Typ          1          TA     Attchd        2001
## 4             7        Typ          1          Gd     Detchd        1998
## 5             9        Typ          1          TA     Attchd        2000
## 6             7        Typ          1          Gd     Attchd        2004
## 7             8       Min1          2          TA     Detchd        1931
## 8             5        Typ          2          TA     Attchd        1939
## 9             5        Typ          0        <NA>     Detchd        1965
## 10           11        Typ          2          Gd    BuiltIn        2005
## 11            4        Typ          0        <NA>     Detchd        1962
## 12            7        Typ          1          Gd     Attchd        2006
## 13            5        Typ          1          Fa     Attchd        1960
## 14            5        Typ          0        <NA>     Detchd        1991
## 15            6        Typ          0        <NA>    CarPort        1967
## 16            6        Typ          0        <NA>     Detchd        2004
## 17            9        Typ          1          Gd    BuiltIn        2005
## 18            6        Typ          1          Gd     Attchd        1930
## 19            7        Typ          1          Gd     Attchd        2002
## 20            6        Typ          1          TA     Attchd        1976
##    GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive
## 1           RFn          2        548         TA         TA          Y
## 2           RFn          2        460         TA         TA          Y
## 3           RFn          2        608         TA         TA          Y
## 4           Unf          3        642         TA         TA          Y
## 5           RFn          3        836         TA         TA          Y
## 6           RFn          2        636         TA         TA          Y
## 7           Unf          2        468         Fa         TA          Y
## 8           RFn          1        205         Gd         TA          Y
## 9           Unf          1        384         TA         TA          Y
## 10          Fin          3        736         TA         TA          Y
## 11          Unf          1        352         TA         TA          Y
## 12          RFn          3        840         TA         TA          Y
## 13          RFn          1        352         TA         TA          Y
## 14          Unf          2        576         TA         TA          Y
## 15          Unf          2        516         TA         TA          Y
## 16          Unf          2        576         TA         TA          Y
## 17          RFn          3        853         TA         TA          Y
## 18          Unf          1        280         TA         TA          N
## 19          RFn          2        534         TA         TA          Y
## 20          Unf          2        572         TA         TA          Y
##    WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC
## 1           0          61             0          0           0        0   <NA>
## 2         298           0             0          0           0        0   <NA>
## 3           0          42             0          0           0        0   <NA>
## 4           0          35           272          0           0        0   <NA>
## 5         192          84             0          0           0        0   <NA>
## 6         255          57             0          0           0        0   <NA>
## 7          90           0           205          0           0        0   <NA>
## 8           0           4             0          0           0        0   <NA>
## 9           0           0             0          0           0        0   <NA>
## 10        147          21             0          0           0        0   <NA>
## 11        140           0             0          0         176        0   <NA>
## 12        160          33             0          0           0        0   <NA>
## 13          0         213           176          0           0        0   <NA>
## 14         48         112             0          0           0        0   <NA>
## 15          0           0             0          0           0        0   <NA>
## 16          0         102             0          0           0        0   <NA>
## 17        240         154             0          0           0        0   <NA>
## 18          0           0           205          0           0        0   <NA>
## 19        171         159             0          0           0        0   <NA>
## 20        100         110             0          0           0        0   <NA>
##    Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
## 1   <NA>        <NA>       0      2   2008       WD        Normal    208500
## 2   <NA>        <NA>       0      5   2007       WD        Normal    181500
## 3   <NA>        <NA>       0      9   2008       WD        Normal    223500
## 4   <NA>        <NA>       0      2   2006       WD       Abnorml    140000
## 5   <NA>        <NA>       0     12   2008       WD        Normal    250000
## 6   <NA>        <NA>       0      8   2007       WD        Normal    307000
## 7   <NA>        <NA>       0      4   2008       WD       Abnorml    129900
## 8   <NA>        <NA>       0      1   2008       WD        Normal    118000
## 9   <NA>        <NA>       0      2   2008       WD        Normal    129500
## 10  <NA>        <NA>       0      7   2006      New       Partial    345000
## 11  <NA>        <NA>       0      9   2008       WD        Normal    144000
## 12  <NA>        <NA>       0      8   2007      New       Partial    279500
## 13  GdWo        <NA>       0      5   2008       WD        Normal    157000
## 14 GdPrv        <NA>       0      7   2007       WD        Normal    132000
## 15  <NA>        Shed     500     10   2006       WD        Normal     90000
## 16  <NA>        <NA>       0      6   2008       WD        Normal    159000
## 17  <NA>        <NA>       0     11   2006      New       Partial    325300
## 18 GdPrv        <NA>       0      6   2007       WD        Normal    139400
## 19  <NA>        <NA>       0      9   2008       WD        Normal    230000
## 20  <NA>        <NA>       0      6   2007       WD        Normal    129900
str(before2009)
## 'data.frame':    1933 obs. of  82 variables:
##  $ X            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Id           : int  1 2 3 4 5 7 9 10 11 12 ...
##  $ MSSubClass   : int  60 20 60 70 60 20 50 190 20 60 ...
##  $ MSZoning     : chr  "RL" "RL" "RL" "RL" ...
##  $ LotFrontage  : int  65 80 68 60 84 75 51 50 70 85 ...
##  $ LotArea      : int  8450 9600 11250 9550 14260 10084 6120 7420 11200 11924 ...
##  $ Street       : chr  "Pave" "Pave" "Pave" "Pave" ...
##  $ Alley        : chr  NA NA NA NA ...
##  $ LotShape     : chr  "Reg" "Reg" "IR1" "IR1" ...
##  $ LandContour  : chr  "Lvl" "Lvl" "Lvl" "Lvl" ...
##  $ Utilities    : chr  "AllPub" "AllPub" "AllPub" "AllPub" ...
##  $ LotConfig    : chr  "Inside" "FR2" "Inside" "Corner" ...
##  $ LandSlope    : chr  "Gtl" "Gtl" "Gtl" "Gtl" ...
##  $ Neighborhood : chr  "CollgCr" "Veenker" "CollgCr" "Crawfor" ...
##  $ Condition1   : chr  "Norm" "Feedr" "Norm" "Norm" ...
##  $ Condition2   : chr  "Norm" "Norm" "Norm" "Norm" ...
##  $ BldgType     : chr  "1Fam" "1Fam" "1Fam" "1Fam" ...
##  $ HouseStyle   : chr  "2Story" "1Story" "2Story" "2Story" ...
##  $ OverallQual  : int  7 6 7 7 8 8 7 5 5 9 ...
##  $ OverallCond  : int  5 8 5 5 5 5 5 6 5 5 ...
##  $ YearBuilt    : int  2003 1976 2001 1915 2000 2004 1931 1939 1965 2005 ...
##  $ YearRemodAdd : int  2003 1976 2002 1970 2000 2005 1950 1950 1965 2006 ...
##  $ RoofStyle    : chr  "Gable" "Gable" "Gable" "Gable" ...
##  $ RoofMatl     : chr  "CompShg" "CompShg" "CompShg" "CompShg" ...
##  $ Exterior1st  : chr  "VinylSd" "MetalSd" "VinylSd" "Wd Sdng" ...
##  $ Exterior2nd  : chr  "VinylSd" "MetalSd" "VinylSd" "Wd Shng" ...
##  $ MasVnrType   : chr  "BrkFace" "None" "BrkFace" "None" ...
##  $ MasVnrArea   : int  196 0 162 0 350 186 0 0 0 286 ...
##  $ ExterQual    : chr  "Gd" "TA" "Gd" "TA" ...
##  $ ExterCond    : chr  "TA" "TA" "TA" "TA" ...
##  $ Foundation   : chr  "PConc" "CBlock" "PConc" "BrkTil" ...
##  $ BsmtQual     : chr  "Gd" "Gd" "Gd" "TA" ...
##  $ BsmtCond     : chr  "TA" "TA" "TA" "Gd" ...
##  $ BsmtExposure : chr  "No" "Gd" "Mn" "No" ...
##  $ BsmtFinType1 : chr  "GLQ" "ALQ" "GLQ" "ALQ" ...
##  $ BsmtFinSF1   : int  706 978 486 216 655 1369 0 851 906 998 ...
##  $ BsmtFinType2 : chr  "Unf" "Unf" "Unf" "Unf" ...
##  $ BsmtFinSF2   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ BsmtUnfSF    : int  150 284 434 540 490 317 952 140 134 177 ...
##  $ TotalBsmtSF  : int  856 1262 920 756 1145 1686 952 991 1040 1175 ...
##  $ Heating      : chr  "GasA" "GasA" "GasA" "GasA" ...
##  $ HeatingQC    : chr  "Ex" "Ex" "Ex" "Gd" ...
##  $ CentralAir   : chr  "Y" "Y" "Y" "Y" ...
##  $ Electrical   : chr  "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
##  $ X1stFlrSF    : int  856 1262 920 961 1145 1694 1022 1077 1040 1182 ...
##  $ X2ndFlrSF    : int  854 0 866 756 1053 0 752 0 0 1142 ...
##  $ LowQualFinSF : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ GrLivArea    : int  1710 1262 1786 1717 2198 1694 1774 1077 1040 2324 ...
##  $ BsmtFullBath : int  1 0 1 1 1 1 0 1 1 1 ...
##  $ BsmtHalfBath : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ FullBath     : int  2 2 2 1 2 2 2 1 1 3 ...
##  $ HalfBath     : int  1 0 1 0 1 0 0 0 0 0 ...
##  $ BedroomAbvGr : int  3 3 3 3 4 3 2 2 3 4 ...
##  $ KitchenAbvGr : int  1 1 1 1 1 1 2 2 1 1 ...
##  $ KitchenQual  : chr  "Gd" "TA" "Gd" "Gd" ...
##  $ TotRmsAbvGrd : int  8 6 6 7 9 7 8 5 5 11 ...
##  $ Functional   : chr  "Typ" "Typ" "Typ" "Typ" ...
##  $ Fireplaces   : int  0 1 1 1 1 1 2 2 0 2 ...
##  $ FireplaceQu  : chr  NA "TA" "TA" "Gd" ...
##  $ GarageType   : chr  "Attchd" "Attchd" "Attchd" "Detchd" ...
##  $ GarageYrBlt  : int  2003 1976 2001 1998 2000 2004 1931 1939 1965 2005 ...
##  $ GarageFinish : chr  "RFn" "RFn" "RFn" "Unf" ...
##  $ GarageCars   : int  2 2 2 3 3 2 2 1 1 3 ...
##  $ GarageArea   : int  548 460 608 642 836 636 468 205 384 736 ...
##  $ GarageQual   : chr  "TA" "TA" "TA" "TA" ...
##  $ GarageCond   : chr  "TA" "TA" "TA" "TA" ...
##  $ PavedDrive   : chr  "Y" "Y" "Y" "Y" ...
##  $ WoodDeckSF   : int  0 298 0 0 192 255 90 0 0 147 ...
##  $ OpenPorchSF  : int  61 0 42 35 84 57 0 4 0 21 ...
##  $ EnclosedPorch: int  0 0 0 272 0 0 205 0 0 0 ...
##  $ X3SsnPorch   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ ScreenPorch  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolArea     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolQC       : chr  NA NA NA NA ...
##  $ Fence        : chr  NA NA NA NA ...
##  $ MiscFeature  : chr  NA NA NA NA ...
##  $ MiscVal      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ MoSold       : int  2 5 9 2 12 8 4 1 2 7 ...
##  $ YrSold       : int  2008 2007 2008 2006 2008 2007 2008 2008 2008 2006 ...
##  $ SaleType     : chr  "WD" "WD" "WD" "WD" ...
##  $ SaleCondition: chr  "Normal" "Normal" "Normal" "Abnorml" ...
##  $ SalePrice    : num  208500 181500 223500 140000 250000 ...
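
The tip in Q2 recommends read_csv(), while the solution above uses base R's read.csv(). As a minimal sketch of the recommended function, assuming the readr package (part of the tidyverse) is installed, the example below reads a small stand-in file. The temporary file, its three rows, and the name toy are hypothetical and only serve to make the example self-contained.

```r
library(readr)

# Hypothetical three-row stand-in for PricesBefore2009.csv, written to a temp file
tmp <- tempfile(fileext = ".csv")
writeLines(c("Id,LotFrontage,SalePrice",
             "1,65,208500",
             "2,NA,181500",
             "3,68,223500"), tmp)

# read_csv() returns a tibble; the literal string NA is read as a missing value
toy <- read_csv(tmp, show_col_types = FALSE)

# Two quick inspection commands
dim(toy)
summary(toy)
```

One practical difference: read.csv() alters column names that are not valid R names (for example X1stFlrSF in the output above), whereas read_csv() keeps names as they appear in the file, so later code must refer to whichever naming the chosen reader produces.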

[4 points] Q3.

Convert the following columns to character or factor type: MSSubClass, OverallQual, OverallCond. Then inspect the result to verify that your code works. Rubric: 3 points (1 point each) for conversion and 1 point for verification.

# Convert the columns to factor type
before2009$MSSubClass <- as.factor(before2009$MSSubClass)
before2009$OverallQual <- as.factor(before2009$OverallQual)
before2009$OverallCond <- as.factor(before2009$OverallCond)

# Inspect the result to verify the conversion
str(before2009[, c("MSSubClass", "OverallQual", "OverallCond")])
## 'data.frame':    1933 obs. of  3 variables:
##  $ MSSubClass : Factor w/ 16 levels "20","30","40",..: 6 1 6 7 6 1 5 16 1 6 ...
##  $ OverallQual: Factor w/ 10 levels "1","2","3","4",..: 7 6 7 7 8 8 7 5 5 9 ...
##  $ OverallCond: Factor w/ 9 levels "1","2","3","4",..: 5 8 5 5 5 5 5 6 5 5 ...
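
The three conversions above can also be written as one step. The following is a sketch of an alternative, assuming dplyr 1.0 or later is available; the data frame toy and its values are hypothetical and stand in for before2009.

```r
library(dplyr)

# Hypothetical toy data with the same three column names plus one numeric column
toy <- data.frame(MSSubClass  = c(60, 20, 60),
                  OverallQual = c(7, 6, 7),
                  OverallCond = c(5, 8, 5),
                  LotArea     = c(8450, 9600, 11250))

# Convert the three listed columns to factor in a single mutate() call
toy <- toy %>%
  mutate(across(c(MSSubClass, OverallQual, OverallCond), as.factor))

# Verify: the converted columns are factors, LotArea is still numeric
sapply(toy, class)
```

Listing the columns once inside across() avoids repeating the assignment for each column, which helps if more columns need the same treatment later.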

[7 points] Q4.

How many NAs does each column have? Display your answer as a dataframe (or tibble) called beforeNAs. The dataset beforeNAs should contain two columns, one containing the names of the columns of before2009, and the other containing the number of NAs in each column. Then, print only the first 10 (head) rows of this dataframe to verify that your code works. Rubric: 6 points for constructing beforeNAs and 1 point for verification.

#install.packages("dplyr")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#install.packages("tidyverse")
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.2     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
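# Count the NAs in each column, then transpose so the column names become row names
temp <- map(before2009, ~ sum(is.na(.))) %>% as_tibble() %>% t()
# Build a two-column tibble: the column name and its NA count
beforeNAs <- tibble(Columns = rownames(temp), NAs = temp[, 1])
beforeNAs %>% head(10)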
## # A tibble: 10 × 2
##    Columns       NAs
##    <chr>       <int>
##  1 X               0
##  2 Id              0
##  3 MSSubClass      0
##  4 MSZoning        3
##  5 LotFrontage   317
##  6 LotArea         0
##  7 Street          0
##  8 Alley        1797
##  9 LotShape        0
## 10 LandContour     0
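
The same per-column NA count can be obtained in base R without the transpose step. This is a minimal sketch on a hypothetical data frame (toy and toyNAs are illustrative names, not part of the assignment), shown as one possible alternative rather than the expected solution.

```r
# Hypothetical toy data frame with some missing values
toy <- data.frame(a = c(1, NA, 3),
                  b = c(NA, NA, "x"),
                  c = 1:3)

# is.na() on a data frame gives a logical matrix; colSums() counts TRUE per column
na_counts <- colSums(is.na(toy))

# Assemble the two-column summary: column name and number of NAs
toyNAs <- data.frame(Columns = names(na_counts),
                     NAs = as.integer(na_counts))
toyNAs
```

Here column a has one NA, b has two, and c has none, so the NAs column holds 1, 2, 0.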

[9 points] Q5.

Drop (remove) all the columns (except SalePrice) that have 20 or more missing values. Also, drop (remove) the columns called X1 (named X when the file is read with read.csv()), Id, and Utilities (all its values are the same). While some of the columns we drop here may contribute to the predictive accuracy of our model, the majority of the information will be contained in the remaining variables. Then, print only the first 10 (head) rows of this dataframe to verify that your code works. Rubric: 8 points for dropping the columns and 1 point for verification.

# Count the missing values in each column
count_NA <- sapply(before2009, function(x) sum(is.na(x)))

# Exempt SalePrice from the rule, then collect the columns with 20 or more NAs
count_NA["SalePrice"] <- 0
cols_NA <- names(count_NA[count_NA >= 20])

# Also drop the row-index column, Id, and the constant Utilities column
cols_drop <- c("X", "Id", "Utilities")
dropCols <- union(cols_NA, cols_drop)
# all_of() avoids the tidyselect deprecation warning for external vectors
before2009 <- select(before2009, -all_of(dropCols))
head(before2009, 20)
##    MSSubClass MSZoning LotArea Street LotShape LandContour LotConfig LandSlope
## 1          60       RL    8450   Pave      Reg         Lvl    Inside       Gtl
## 2          20       RL    9600   Pave      Reg         Lvl       FR2       Gtl
## 3          60       RL   11250   Pave      IR1         Lvl    Inside       Gtl
## 4          70       RL    9550   Pave      IR1         Lvl    Corner       Gtl
## 5          60       RL   14260   Pave      IR1         Lvl       FR2       Gtl
## 6          20       RL   10084   Pave      Reg         Lvl    Inside       Gtl
## 7          50       RM    6120   Pave      Reg         Lvl    Inside       Gtl
## 8         190       RL    7420   Pave      Reg         Lvl    Corner       Gtl
## 9          20       RL   11200   Pave      Reg         Lvl    Inside       Gtl
## 10         60       RL   11924   Pave      IR1         Lvl    Inside       Gtl
## 11         20       RL   12968   Pave      IR2         Lvl    Inside       Gtl
## 12         20       RL   10652   Pave      IR1         Lvl    Inside       Gtl
## 13         20       RL   10920   Pave      IR1         Lvl    Corner       Gtl
## 14         45       RM    6120   Pave      Reg         Lvl    Corner       Gtl
## 15         90       RL   10791   Pave      Reg         Lvl    Inside       Gtl
## 16         20       RL   13695   Pave      Reg         Lvl    Inside       Gtl
## 17         60       RL   14215   Pave      IR1         Lvl    Corner       Gtl
## 18         45       RM    7449   Pave      Reg         Bnk    Inside       Gtl
## 19         20       RL    9742   Pave      Reg         Lvl    Inside       Gtl
## 20        120       RM    4224   Pave      Reg         Lvl    Inside       Gtl
##    Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual
## 1       CollgCr       Norm       Norm     1Fam     2Story           7
## 2       Veenker      Feedr       Norm     1Fam     1Story           6
## 3       CollgCr       Norm       Norm     1Fam     2Story           7
## 4       Crawfor       Norm       Norm     1Fam     2Story           7
## 5       NoRidge       Norm       Norm     1Fam     2Story           8
## 6       Somerst       Norm       Norm     1Fam     1Story           8
## 7       OldTown     Artery       Norm     1Fam     1.5Fin           7
## 8       BrkSide     Artery     Artery   2fmCon     1.5Unf           5
## 9        Sawyer       Norm       Norm     1Fam     1Story           5
## 10      NridgHt       Norm       Norm     1Fam     2Story           9
## 11       Sawyer       Norm       Norm     1Fam     1Story           5
## 12      CollgCr       Norm       Norm     1Fam     1Story           7
## 13        NAmes       Norm       Norm     1Fam     1Story           6
## 14      BrkSide       Norm       Norm     1Fam     1.5Unf           7
## 15       Sawyer       Norm       Norm   Duplex     1Story           4
## 16      SawyerW       RRAe       Norm     1Fam     1Story           5
## 17      NridgHt       Norm       Norm     1Fam     2Story           8
## 18       IDOTRR       Norm       Norm     1Fam     1.5Unf           7
## 19      CollgCr       Norm       Norm     1Fam     1Story           8
## 20      MeadowV       Norm       Norm   TwnhsE     1Story           5
##    OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st
## 1            5      2003         2003     Gable  CompShg     VinylSd
## 2            8      1976         1976     Gable  CompShg     MetalSd
## 3            5      2001         2002     Gable  CompShg     VinylSd
## 4            5      1915         1970     Gable  CompShg     Wd Sdng
## 5            5      2000         2000     Gable  CompShg     VinylSd
## 6            5      2004         2005     Gable  CompShg     VinylSd
## 7            5      1931         1950     Gable  CompShg     BrkFace
## 8            6      1939         1950     Gable  CompShg     MetalSd
## 9            5      1965         1965       Hip  CompShg     HdBoard
## 10           5      2005         2006       Hip  CompShg     WdShing
## 11           6      1962         1962       Hip  CompShg     HdBoard
## 12           5      2006         2007     Gable  CompShg     VinylSd
## 13           5      1960         1960       Hip  CompShg     MetalSd
## 14           8      1929         2001     Gable  CompShg     Wd Sdng
## 15           5      1967         1967     Gable  CompShg     MetalSd
## 16           5      2004         2004     Gable  CompShg     VinylSd
## 17           5      2005         2006     Gable  CompShg     VinylSd
## 18           7      1930         1950     Gable  CompShg     Wd Sdng
## 19           5      2002         2002       Hip  CompShg     VinylSd
## 20           7      1976         1976     Gable  CompShg     CemntBd
##    Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtFinSF1
## 1      VinylSd    BrkFace        196        Gd        TA      PConc        706
## 2      MetalSd       None          0        TA        TA     CBlock        978
## 3      VinylSd    BrkFace        162        Gd        TA      PConc        486
## 4      Wd Shng       None          0        TA        TA     BrkTil        216
## 5      VinylSd    BrkFace        350        Gd        TA      PConc        655
## 6      VinylSd      Stone        186        Gd        TA      PConc       1369
## 7      Wd Shng       None          0        TA        TA     BrkTil          0
## 8      MetalSd       None          0        TA        TA     BrkTil        851
## 9      HdBoard       None          0        TA        TA     CBlock        906
## 10     Wd Shng      Stone        286        Ex        TA      PConc        998
## 11     Plywood       None          0        TA        TA     CBlock        737
## 12     VinylSd      Stone        306        Gd        TA      PConc          0
## 13     MetalSd    BrkFace        212        TA        TA     CBlock        733
## 14     Wd Sdng       None          0        TA        TA     BrkTil          0
## 15     MetalSd       None          0        TA        TA       Slab          0
## 16     VinylSd       None          0        TA        TA      PConc        646
## 17     VinylSd    BrkFace        380        Gd        TA      PConc          0
## 18     Wd Sdng       None          0        TA        TA      PConc          0
## 19     VinylSd    BrkFace        281        Gd        TA      PConc          0
## 20     CmentBd       None          0        TA        TA      PConc        840
##    BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical
## 1           0       150         856    GasA        Ex          Y      SBrkr
## 2           0       284        1262    GasA        Ex          Y      SBrkr
## 3           0       434         920    GasA        Ex          Y      SBrkr
## 4           0       540         756    GasA        Gd          Y      SBrkr
## 5           0       490        1145    GasA        Ex          Y      SBrkr
## 6           0       317        1686    GasA        Ex          Y      SBrkr
## 7           0       952         952    GasA        Gd          Y      FuseF
## 8           0       140         991    GasA        Ex          Y      SBrkr
## 9           0       134        1040    GasA        Ex          Y      SBrkr
## 10          0       177        1175    GasA        Ex          Y      SBrkr
## 11          0       175         912    GasA        TA          Y      SBrkr
## 12          0      1494        1494    GasA        Ex          Y      SBrkr
## 13          0       520        1253    GasA        TA          Y      SBrkr
## 14          0       832         832    GasA        Ex          Y      FuseA
## 15          0         0           0    GasA        TA          Y      SBrkr
## 16          0       468        1114    GasA        Ex          Y      SBrkr
## 17          0      1158        1158    GasA        Ex          Y      SBrkr
## 18          0       637         637    GasA        Ex          Y      FuseF
## 19          0      1777        1777    GasA        Ex          Y      SBrkr
## 20          0       200        1040    GasA        TA          Y      SBrkr
##    X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath
## 1        856       854            0      1710            1            0
## 2       1262         0            0      1262            0            1
## 3        920       866            0      1786            1            0
## 4        961       756            0      1717            1            0
## 5       1145      1053            0      2198            1            0
## 6       1694         0            0      1694            1            0
## 7       1022       752            0      1774            0            0
## 8       1077         0            0      1077            1            0
## 9       1040         0            0      1040            1            0
## 10      1182      1142            0      2324            1            0
## 11       912         0            0       912            1            0
## 12      1494         0            0      1494            0            0
## 13      1253         0            0      1253            1            0
## 14       854         0            0       854            0            0
## 15      1296         0            0      1296            0            0
## 16      1114         0            0      1114            1            0
## 17      1158      1218            0      2376            0            0
## 18      1108         0            0      1108            0            0
## 19      1795         0            0      1795            0            0
## 20      1060         0            0      1060            1            0
##    FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd
## 1         2        1            3            1          Gd            8
## 2         2        0            3            1          TA            6
## 3         2        1            3            1          Gd            6
## 4         1        0            3            1          Gd            7
## 5         2        1            4            1          Gd            9
## 6         2        0            3            1          Gd            7
## 7         2        0            2            2          TA            8
## 8         1        0            2            2          TA            5
## 9         1        0            3            1          TA            5
## 10        3        0            4            1          Ex           11
## 11        1        0            2            1          TA            4
## 12        2        0            3            1          Gd            7
## 13        1        1            2            1          TA            5
## 14        1        0            2            1          TA            5
## 15        2        0            2            2          TA            6
## 16        1        1            3            1          Gd            6
## 17        3        1            4            1          Gd            9
## 18        1        0            3            1          Gd            6
## 19        2        0            3            1          Gd            7
## 20        1        0            3            1          TA            6
##    Functional Fireplaces GarageCars GarageArea PavedDrive WoodDeckSF
## 1         Typ          0          2        548          Y          0
## 2         Typ          1          2        460          Y        298
## 3         Typ          1          2        608          Y          0
## 4         Typ          1          3        642          Y          0
## 5         Typ          1          3        836          Y        192
## 6         Typ          1          2        636          Y        255
## 7        Min1          2          2        468          Y         90
## 8         Typ          2          1        205          Y          0
## 9         Typ          0          1        384          Y          0
## 10        Typ          2          3        736          Y        147
## 11        Typ          0          1        352          Y        140
## 12        Typ          1          3        840          Y        160
## 13        Typ          1          1        352          Y          0
## 14        Typ          0          2        576          Y         48
## 15        Typ          0          2        516          Y          0
## 16        Typ          0          2        576          Y          0
## 17        Typ          1          3        853          Y        240
## 18        Typ          1          1        280          N          0
## 19        Typ          1          2        534          Y        171
## 20        Typ          1          2        572          Y        100
##    OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea MiscVal MoSold
## 1           61             0          0           0        0       0      2
## 2            0             0          0           0        0       0      5
## 3           42             0          0           0        0       0      9
## 4           35           272          0           0        0       0      2
## 5           84             0          0           0        0       0     12
## 6           57             0          0           0        0       0      8
## 7            0           205          0           0        0       0      4
## 8            4             0          0           0        0       0      1
## 9            0             0          0           0        0       0      2
## 10          21             0          0           0        0       0      7
## 11           0             0          0         176        0       0      9
## 12          33             0          0           0        0       0      8
## 13         213           176          0           0        0       0      5
## 14         112             0          0           0        0       0      7
## 15           0             0          0           0        0     500     10
## 16         102             0          0           0        0       0      6
## 17         154             0          0           0        0       0     11
## 18           0           205          0           0        0       0      6
## 19         159             0          0           0        0       0      9
## 20         110             0          0           0        0       0      6
##    YrSold SaleType SaleCondition SalePrice
## 1    2008       WD        Normal    208500
## 2    2007       WD        Normal    181500
## 3    2008       WD        Normal    223500
## 4    2006       WD       Abnorml    140000
## 5    2008       WD        Normal    250000
## 6    2007       WD        Normal    307000
## 7    2008       WD       Abnorml    129900
## 8    2008       WD        Normal    118000
## 9    2008       WD        Normal    129500
## 10   2006      New       Partial    345000
## 11   2008       WD        Normal    144000
## 12   2007      New       Partial    279500
## 13   2008       WD        Normal    157000
## 14   2007       WD        Normal    132000
## 15   2006       WD        Normal     90000
## 16   2008       WD        Normal    159000
## 17   2006      New       Partial    325300
## 18   2007       WD        Normal    139400
## 19   2008       WD        Normal    230000
## 20   2007       WD        Normal    129900

[5 points] Q6.

Conduct a multiple linear regression of SalePrice on all other variables. Set SalePrice as the response and store the results in regBefore2009. Then, print the summary of regBefore2009 to verify that your code works. Tip: The formula for this regression is lm(SalePrice ~ ., data = before2009). Rubric: 4 points for setting regBefore2009 and 1 point for verification.
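As a small illustration of the `.` shorthand in the formula, the sketch below uses R's built-in mtcars data (an assumption made only for this example; it is not the assignment data). The names toy, fit_dot, and fit_explicit are hypothetical.

```r
# Built-in data used purely for illustration; not the assignment data.
toy <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# `mpg ~ .` expands to every other column of `toy` as a predictor.
fit_dot <- lm(mpg ~ ., data = toy)

# The same model with the predictors written out explicitly.
fit_explicit <- lm(mpg ~ wt + hp + qsec, data = toy)

# The two fits should have identical coefficients.
same_coefs <- isTRUE(all.equal(coef(fit_dot), coef(fit_explicit)))
print(same_coefs)
```

Because `.` picks up every remaining column, any identifier or index column left in the data frame would also enter the model, so it is worth checking the column list before fitting.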

regBefore2009 <- lm(SalePrice ~ ., data = before2009)

summary(regBefore2009)
## 
## Call:
## lm(formula = SalePrice ~ ., data = before2009)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -178924   -4505     -76    4002  157196 
## 
## Coefficients: (3 not defined because of singularities)
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -2.564e+06  9.825e+05  -2.610 0.009139 ** 
## MSSubClass30         -2.601e+02  2.741e+03  -0.095 0.924405    
## MSSubClass40         -2.076e+02  8.694e+03  -0.024 0.980951    
## MSSubClass45          6.521e+03  1.191e+04   0.547 0.584141    
## MSSubClass50          3.890e+03  4.941e+03   0.787 0.431210    
## MSSubClass60          1.242e+03  4.708e+03   0.264 0.791926    
## MSSubClass70          3.569e+03  4.989e+03   0.715 0.474411    
## MSSubClass75          6.586e+02  7.683e+03   0.086 0.931698    
## MSSubClass80         -1.153e+04  7.335e+03  -1.572 0.116120    
## MSSubClass85         -7.530e+03  5.808e+03  -1.296 0.195004    
## MSSubClass90         -1.215e+04  4.669e+03  -2.602 0.009344 ** 
## MSSubClass120        -8.063e+03  7.107e+03  -1.135 0.256704    
## MSSubClass150         2.546e+03  1.911e+04   0.133 0.894003    
## MSSubClass160        -5.545e+03  8.866e+03  -0.625 0.531741    
## MSSubClass180        -5.791e+03  9.898e+03  -0.585 0.558595    
## MSSubClass190        -8.616e+03  1.274e+04  -0.676 0.499018    
## MSZoningFV            4.095e+04  6.833e+03   5.993 2.51e-09 ***
## MSZoningRH            2.490e+04  6.983e+03   3.567 0.000372 ***
## MSZoningRL            3.009e+04  5.710e+03   5.269 1.55e-07 ***
## MSZoningRM            3.019e+04  5.331e+03   5.663 1.74e-08 ***
## LotArea               5.643e-01  7.679e-02   7.348 3.13e-13 ***
## StreetPave            2.965e+04  6.601e+03   4.492 7.54e-06 ***
## LotShapeIR2           4.777e+03  2.431e+03   1.965 0.049604 *  
## LotShapeIR3           7.886e+03  5.038e+03   1.565 0.117678    
## LotShapeReg           5.291e+02  9.504e+02   0.557 0.577754    
## LandContourHLS        1.283e+04  2.912e+03   4.406 1.12e-05 ***
## LandContourLow        8.402e+02  4.041e+03   0.208 0.835341    
## LandContourLvl        1.041e+04  2.160e+03   4.818 1.58e-06 ***
## LotConfigCulDSac      6.820e+03  1.939e+03   3.517 0.000449 ***
## LotConfigFR2         -7.573e+03  2.562e+03  -2.956 0.003161 ** 
## LotConfigFR3         -1.235e+04  5.077e+03  -2.432 0.015109 *  
## LotConfigInside      -3.084e+03  1.062e+03  -2.904 0.003728 ** 
## LandSlopeMod          1.175e+04  2.376e+03   4.943 8.46e-07 ***
## LandSlopeSev         -2.130e+04  7.277e+03  -2.926 0.003476 ** 
## NeighborhoodBlueste  -1.081e+04  9.673e+03  -1.118 0.263830    
## NeighborhoodBrDale   -2.852e+03  6.729e+03  -0.424 0.671753    
## NeighborhoodBrkSide  -1.128e+04  5.441e+03  -2.073 0.038294 *  
## NeighborhoodClearCr  -2.081e+04  5.730e+03  -3.631 0.000290 ***
## NeighborhoodCollgCr  -1.806e+04  4.275e+03  -4.224 2.53e-05 ***
## NeighborhoodCrawfor   3.828e+03  4.936e+03   0.775 0.438221    
## NeighborhoodEdwards  -2.614e+04  4.686e+03  -5.578 2.83e-08 ***
## NeighborhoodGilbert  -1.959e+04  4.545e+03  -4.312 1.71e-05 ***
## NeighborhoodIDOTRR   -1.865e+04  5.904e+03  -3.159 0.001613 ** 
## NeighborhoodMeadowV  -2.288e+04  6.805e+03  -3.362 0.000792 ***
## NeighborhoodMitchel  -2.956e+04  4.753e+03  -6.220 6.26e-10 ***
## NeighborhoodNAmes    -2.254e+04  4.574e+03  -4.929 9.09e-07 ***
## NeighborhoodNoRidge   1.408e+04  5.029e+03   2.799 0.005181 ** 
## NeighborhoodNPkVill   5.527e+03  1.022e+04   0.541 0.588839    
## NeighborhoodNridgHt   1.329e+04  4.439e+03   2.994 0.002796 ** 
## NeighborhoodNWAmes   -2.666e+04  4.712e+03  -5.659 1.79e-08 ***
## NeighborhoodOldTown  -2.330e+04  5.415e+03  -4.303 1.78e-05 ***
## NeighborhoodSawyer   -1.786e+04  4.745e+03  -3.763 0.000174 ***
## NeighborhoodSawyerW  -1.319e+04  4.649e+03  -2.838 0.004596 ** 
## NeighborhoodSomerst  -1.647e+04  5.199e+03  -3.169 0.001559 ** 
## NeighborhoodStoneBr   2.869e+04  5.073e+03   5.656 1.81e-08 ***
## NeighborhoodSWISU    -1.355e+04  5.847e+03  -2.318 0.020557 *  
## NeighborhoodTimber   -1.265e+04  4.795e+03  -2.638 0.008427 ** 
## NeighborhoodVeenker  -4.591e+03  5.740e+03  -0.800 0.423928    
## Condition1Feedr       2.806e+03  2.954e+03   0.950 0.342241    
## Condition1Norm        1.340e+04  2.470e+03   5.425 6.62e-08 ***
## Condition1PosA        1.274e+04  5.569e+03   2.288 0.022259 *  
## Condition1PosN        7.585e+03  4.569e+03   1.660 0.097089 .  
## Condition1RRAe       -1.425e+04  4.652e+03  -3.063 0.002226 ** 
## Condition1RRAn        1.101e+04  3.895e+03   2.826 0.004765 ** 
## Condition1RRNe       -1.495e+03  8.783e+03  -0.170 0.864886    
## Condition1RRNn        6.815e+03  9.224e+03   0.739 0.460134    
## Condition2Feedr      -8.786e+03  1.040e+04  -0.845 0.398354    
## Condition2Norm       -3.235e+03  9.104e+03  -0.355 0.722366    
## Condition2PosA       -7.289e+03  1.437e+04  -0.507 0.612077    
## Condition2PosN       -2.439e+05  1.368e+04 -17.834  < 2e-16 ***
## Condition2RRAe       -1.074e+05  2.368e+04  -4.537 6.11e-06 ***
## Condition2RRAn       -7.330e+03  1.870e+04  -0.392 0.695047    
## Condition2RRNn       -3.959e+02  1.467e+04  -0.027 0.978476    
## BldgType2fmCon       -7.442e+02  1.277e+04  -0.058 0.953546    
## BldgTypeDuplex               NA         NA      NA       NA    
## BldgTypeTwnhs        -1.811e+04  7.659e+03  -2.365 0.018163 *  
## BldgTypeTwnhsE       -1.752e+04  7.082e+03  -2.474 0.013460 *  
## HouseStyle1.5Unf      7.953e+03  1.114e+04   0.714 0.475592    
## HouseStyle1Story      1.086e+04  4.960e+03   2.190 0.028633 *  
## HouseStyle2.5Fin     -1.193e+04  1.025e+04  -1.164 0.244636    
## HouseStyle2.5Unf     -8.651e+03  7.330e+03  -1.180 0.238061    
## HouseStyle2Story     -6.055e+03  4.809e+03  -1.259 0.208150    
## HouseStyleSFoyer      1.538e+04  6.497e+03   2.367 0.018057 *  
## HouseStyleSLvl        1.827e+04  7.821e+03   2.336 0.019626 *  
## OverallQual2          2.961e+04  1.988e+04   1.490 0.136484    
## OverallQual3          3.496e+04  1.859e+04   1.880 0.060248 .  
## OverallQual4          3.600e+04  1.847e+04   1.949 0.051465 .  
## OverallQual5          4.016e+04  1.853e+04   2.167 0.030388 *  
## OverallQual6          4.530e+04  1.858e+04   2.438 0.014872 *  
## OverallQual7          5.296e+04  1.860e+04   2.847 0.004468 ** 
## OverallQual8          6.663e+04  1.867e+04   3.569 0.000368 ***
## OverallQual9          8.783e+04  1.890e+04   4.647 3.63e-06 ***
## OverallQual10         1.369e+05  1.948e+04   7.028 3.03e-12 ***
## OverallCond2          1.315e+04  2.332e+04   0.564 0.572872    
## OverallCond3          2.004e+04  1.342e+04   1.494 0.135405    
## OverallCond4          2.603e+04  1.336e+04   1.948 0.051607 .  
## OverallCond5          3.390e+04  1.336e+04   2.536 0.011290 *  
## OverallCond6          4.023e+04  1.342e+04   2.998 0.002759 ** 
## OverallCond7          4.572e+04  1.345e+04   3.400 0.000690 ***
## OverallCond8          5.177e+04  1.350e+04   3.836 0.000130 ***
## OverallCond9          6.002e+04  1.400e+04   4.288 1.90e-05 ***
## YearBuilt             3.595e+02  4.731e+01   7.599 4.92e-14 ***
## YearRemodAdd          1.055e+02  3.212e+01   3.286 0.001038 ** 
## RoofStyleGable       -4.815e+03  8.656e+03  -0.556 0.578143    
## RoofStyleGambrel     -2.186e+03  9.745e+03  -0.224 0.822557    
## RoofStyleHip         -3.696e+03  8.704e+03  -0.425 0.671165    
## RoofStyleMansard      5.986e+03  1.100e+04   0.544 0.586339    
## RoofStyleShed         7.902e+04  1.515e+04   5.215 2.06e-07 ***
## RoofMatlCompShg       6.617e+05  2.022e+04  32.718  < 2e-16 ***
## RoofMatlMembran       7.396e+05  2.887e+04  25.620  < 2e-16 ***
## RoofMatlMetal         6.992e+05  2.872e+04  24.346  < 2e-16 ***
## RoofMatlRoll          6.538e+05  2.634e+04  24.818  < 2e-16 ***
## RoofMatlTar&Grv       6.672e+05  2.175e+04  30.676  < 2e-16 ***
## RoofMatlWdShake       6.437e+05  2.152e+04  29.911  < 2e-16 ***
## RoofMatlWdShngl       7.432e+05  2.132e+04  34.854  < 2e-16 ***
## Exterior1stAsphShn   -1.850e+04  2.276e+04  -0.813 0.416323    
## Exterior1stBrkComm   -7.338e+03  1.387e+04  -0.529 0.596760    
## Exterior1stBrkFace    7.740e+03  7.342e+03   1.054 0.291930    
## Exterior1stCemntBd   -1.151e+04  1.224e+04  -0.940 0.347353    
## Exterior1stHdBoard   -1.014e+04  7.086e+03  -1.431 0.152611    
## Exterior1stImStucc   -6.986e+04  1.799e+04  -3.884 0.000107 ***
## Exterior1stMetalSd    1.209e+03  7.963e+03   0.152 0.879352    
## Exterior1stPlywood   -1.547e+04  6.952e+03  -2.226 0.026170 *  
## Exterior1stStone     -2.641e+04  1.549e+04  -1.705 0.088463 .  
## Exterior1stStucco    -4.405e+03  8.089e+03  -0.544 0.586173    
## Exterior1stVinylSd   -1.611e+04  8.066e+03  -1.997 0.045966 *  
## Exterior1stWd Sdng   -8.627e+03  6.955e+03  -1.240 0.215021    
## Exterior1stWdShing   -3.086e+03  7.381e+03  -0.418 0.675971    
## Exterior2ndAsphShn    2.238e+03  1.431e+04   0.156 0.875675    
## Exterior2ndBrk Cmn    4.508e+03  1.355e+04   0.333 0.739463    
## Exterior2ndBrkFace   -3.269e+02  8.265e+03  -0.040 0.968454    
## Exterior2ndCmentBd    1.024e+04  1.261e+04   0.812 0.417099    
## Exterior2ndHdBoard    1.608e+03  7.571e+03   0.212 0.831880    
## Exterior2ndImStucc    3.464e+04  8.906e+03   3.890 0.000104 ***
## Exterior2ndMetalSd   -3.830e+03  8.357e+03  -0.458 0.646778    
## Exterior2ndOther     -1.006e+04  1.825e+04  -0.551 0.581676    
## Exterior2ndPlywood    3.119e+03  7.283e+03   0.428 0.668547    
## Exterior2ndStone      1.465e+04  1.425e+04   1.029 0.303846    
## Exterior2ndStucco    -2.973e+03  8.510e+03  -0.349 0.726834    
## Exterior2ndVinylSd    1.104e+04  8.440e+03   1.308 0.191016    
## Exterior2ndWd Sdng    4.350e+03  7.496e+03   0.580 0.561750    
## Exterior2ndWd Shng   -4.197e+03  7.836e+03  -0.536 0.592332    
## MasVnrTypeBrkFace     8.602e+03  3.955e+03   2.175 0.029777 *  
## MasVnrTypeNone        1.164e+04  3.963e+03   2.938 0.003349 ** 
## MasVnrTypeStone       1.311e+04  4.195e+03   3.125 0.001807 ** 
## MasVnrArea            1.884e+01  3.395e+00   5.550 3.32e-08 ***
## ExterQualFa           1.336e+04  6.288e+03   2.125 0.033726 *  
## ExterQualGd          -8.024e+03  3.085e+03  -2.601 0.009375 ** 
## ExterQualTA          -1.003e+04  3.394e+03  -2.956 0.003160 ** 
## ExterCondFa          -3.499e+03  7.066e+03  -0.495 0.620488    
## ExterCondGd          -1.037e+04  6.377e+03  -1.625 0.104247    
## ExterCondTA          -6.483e+03  6.373e+03  -1.017 0.309186    
## FoundationCBlock      1.819e+03  1.801e+03   1.010 0.312470    
## FoundationPConc       6.034e+03  1.964e+03   3.072 0.002159 ** 
## FoundationSlab        6.341e+03  4.468e+03   1.419 0.156047    
## FoundationStone       4.014e+03  7.200e+03   0.557 0.577301    
## FoundationWood       -2.481e+04  1.172e+04  -2.116 0.034456 *  
## BsmtFinSF1            3.360e+01  2.392e+00  14.048  < 2e-16 ***
## BsmtFinSF2            2.188e+01  3.201e+00   6.835 1.14e-11 ***
## BsmtUnfSF             1.270e+01  2.228e+00   5.698 1.43e-08 ***
## TotalBsmtSF                  NA         NA      NA       NA    
## HeatingGasA           2.343e+03  1.677e+04   0.140 0.888850    
## HeatingGasW          -6.304e+03  1.730e+04  -0.364 0.715552    
## HeatingGrav          -3.139e+03  1.894e+04  -0.166 0.868428    
## HeatingOthW          -2.839e+04  2.062e+04  -1.377 0.168675    
## HeatingWall           4.965e+03  2.129e+04   0.233 0.815610    
## HeatingQCFa          -1.928e+03  2.706e+03  -0.712 0.476280    
## HeatingQCGd          -3.737e+03  1.183e+03  -3.160 0.001606 ** 
## HeatingQCPo           1.191e+04  1.245e+04   0.956 0.339135    
## HeatingQCTA          -3.674e+03  1.197e+03  -3.069 0.002180 ** 
## CentralAirY          -3.873e+02  2.091e+03  -0.185 0.853039    
## ElectricalFuseF      -2.989e+03  3.552e+03  -0.842 0.400166    
## ElectricalFuseP      -9.314e+03  7.068e+03  -1.318 0.187772    
## ElectricalMix         9.603e+03  2.643e+04   0.363 0.716405    
## ElectricalSBrkr      -1.970e+03  1.715e+03  -1.148 0.251012    
## X1stFlrSF             5.350e+01  2.858e+00  18.718  < 2e-16 ***
## X2ndFlrSF             6.487e+01  3.035e+00  21.375  < 2e-16 ***
## LowQualFinSF          1.211e+01  1.099e+01   1.102 0.270735    
## GrLivArea                    NA         NA      NA       NA    
## BsmtFullBath          2.144e+03  1.114e+03   1.925 0.054395 .  
## BsmtHalfBath          8.477e+02  1.648e+03   0.514 0.607154    
## FullBath              3.703e+03  1.295e+03   2.860 0.004292 ** 
## HalfBath             -7.771e+01  1.241e+03  -0.063 0.950057    
## BedroomAbvGr         -3.233e+03  7.977e+02  -4.053 5.28e-05 ***
## KitchenAbvGr         -6.874e+03  4.161e+03  -1.652 0.098739 .  
## KitchenQualFa        -1.566e+04  3.644e+03  -4.296 1.84e-05 ***
## KitchenQualGd        -2.059e+04  2.131e+03  -9.663  < 2e-16 ***
## KitchenQualTA        -1.784e+04  2.372e+03  -7.523 8.70e-14 ***
## TotRmsAbvGrd          7.895e+02  5.523e+02   1.430 0.153012    
## FunctionalMaj2       -5.674e+03  9.442e+03  -0.601 0.547938    
## FunctionalMin1        5.206e+03  5.666e+03   0.919 0.358376    
## FunctionalMin2        6.318e+03  5.832e+03   1.083 0.278778    
## FunctionalMod        -5.581e+03  6.326e+03  -0.882 0.377781    
## FunctionalSev        -5.304e+04  1.823e+04  -2.910 0.003666 ** 
## FunctionalTyp         1.758e+04  5.075e+03   3.465 0.000544 ***
## Fireplaces            4.200e+03  7.970e+02   5.270 1.54e-07 ***
## GarageCars            2.574e+03  1.278e+03   2.015 0.044109 *  
## GarageArea            1.710e+01  4.373e+00   3.910 9.61e-05 ***
## PavedDriveP          -3.467e+03  2.994e+03  -1.158 0.247036    
## PavedDriveY          -2.394e+03  1.905e+03  -1.257 0.208887    
## WoodDeckSF            1.471e+01  3.400e+00   4.325 1.61e-05 ***
## OpenPorchSF           1.655e+01  6.399e+00   2.586 0.009807 ** 
## EnclosedPorch         6.099e+00  6.780e+00   0.900 0.368493    
## X3SsnPorch            5.566e+01  1.690e+01   3.293 0.001012 ** 
## ScreenPorch           2.423e+01  7.027e+00   3.449 0.000577 ***
## PoolArea              6.602e+01  9.534e+00   6.924 6.20e-12 ***
## MiscVal              -5.596e-01  6.572e-01  -0.852 0.394563    
## MoSold               -5.360e+02  1.417e+02  -3.784 0.000160 ***
## YrSold                4.488e+02  4.868e+02   0.922 0.356638    
## SaleTypeCon           3.738e+04  9.712e+03   3.849 0.000123 ***
## SaleTypeConLD         1.265e+04  5.268e+03   2.402 0.016432 *  
## SaleTypeConLI        -5.307e+03  9.946e+03  -0.534 0.593718    
## SaleTypeConLw        -5.056e+02  8.633e+03  -0.059 0.953308    
## SaleTypeCWD           2.103e+04  5.336e+03   3.942 8.40e-05 ***
## SaleTypeNew           1.496e+04  8.815e+03   1.697 0.089953 .  
## SaleTypeOth           1.122e+04  9.702e+03   1.156 0.247642    
## SaleTypeWD           -1.063e+03  2.538e+03  -0.419 0.675428    
## SaleConditionAdjLand  9.288e+03  5.828e+03   1.594 0.111226    
## SaleConditionAlloca   6.959e+03  6.002e+03   1.160 0.246397    
## SaleConditionFamily  -2.638e+02  3.233e+03  -0.082 0.934961    
## SaleConditionNormal   4.383e+03  1.704e+03   2.573 0.010165 *  
## SaleConditionPartial  5.334e+03  8.442e+03   0.632 0.527558    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 15630 on 1684 degrees of freedom
##   (30 observations deleted due to missingness)
## Multiple R-squared:  0.966,  Adjusted R-squared:  0.9616 
## F-statistic: 219.2 on 218 and 1684 DF,  p-value: < 2.2e-16
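
The summary reports "3 not defined because of singularities" and shows NA rows for BldgTypeDuplex, TotalBsmtSF, and GrLivArea. In the 20 rows printed by head(), TotalBsmtSF equals BsmtFinSF1 + BsmtFinSF2 + BsmtUnfSF, and GrLivArea equals X1stFlrSF + X2ndFlrSF + LowQualFinSF, so these columns are likely exact linear combinations of other predictors. The sketch below reproduces both effects (an aliased coefficient and rows dropped for missing values) on a small simulated data set. All names in it (toy, total, fit, fit_na) are hypothetical and not part of the assignment.

```r
# Simulated data for illustration only.
set.seed(1)
toy <- data.frame(a = rnorm(20), b = rnorm(20))
toy$total <- toy$a + toy$b                 # exact linear combination of a and b
toy$y <- 2 * toy$a - toy$b + rnorm(20)

# `total` carries no information beyond a and b, so lm() cannot estimate it.
fit <- lm(y ~ a + b + total, data = toy)
print(coef(fit))                           # the coefficient for `total` is NA

# alias() reports which terms are linear combinations of the others.
print(alias(fit)$Complete)

# Rows containing NA are dropped by default (na.action = na.omit).
toy_na <- toy
toy_na$a[1:3] <- NA
fit_na <- lm(y ~ a + b, data = toy_na)
print(nobs(fit_na))                        # number of rows actually used in the fit
```

On the assignment data, calling alias(regBefore2009) and colSums(is.na(before2009)) would be one way to check which columns cause the singularities and the 30 dropped observations.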

[9 points] Q7.

Using the results of regBefore2009 and your general understanding of which variables should matter in determining SalePrice, choose a maximum of 15 variables, fit a smaller regression, and store it in regBefore2009optimal. Then, print the summary of regBefore2009optimal to verify that your code works. Tip: Normally you would do a more detailed variable selection using a backward or stepwise selection approach, but this is NOT required for this question. Tip: The formula for this regression is lm(SalePrice ~ var1 + var2 + … + varN, data = before2009), where var1, etc. are the variables of your choice. Tip: Pick the variables with the lowest Pr(>|t|). Rubric: 8 points for setting regBefore2009optimal and 1 point for verification.
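
The coefficient table lists one Pr(>|t|) per factor level rather than per variable, which can make it awkward to rank whole variables. One possible alternative, sketched below on R's built-in mtcars data (an assumption for illustration only, not the assignment data), is drop1() with test = "F", which gives a single p-value per model term. The names toy, fit, and term_tests are hypothetical, and this is not presented as the required approach.

```r
# Built-in data used purely for illustration; not the assignment data.
toy <- mtcars
toy$cyl <- factor(toy$cyl)                 # a factor with three levels

fit <- lm(mpg ~ wt + hp + cyl, data = toy)

# One F-test per term: each row tests dropping that whole variable.
term_tests <- drop1(fit, test = "F")

# Sort terms by p-value, smallest first; the "<none>" row has NA and sorts last.
term_ranked <- term_tests[order(term_tests[["Pr(>F)"]]), ]
print(term_ranked)
```

Here cyl appears once in the table even though the coefficient summary would show a separate row for each non-reference level.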

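# Pull the coefficient table out of the full-model summary
summary_reg <- summary(regBefore2009)
coef_df <- as.data.frame(summary_reg$coefficients)

# Sort the coefficients by p-value, smallest first
coef_df_pval <- coef_df[order(coef_df[, "Pr(>|t|)"]), ]

# Names of the 40 coefficients with the lowest p-values (each factor level is a separate row)
top_var <- rownames(coef_df_pval)[1:40]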
top_var
##  [1] "RoofMatlWdShngl"     "RoofMatlCompShg"     "RoofMatlTar&Grv"    
##  [4] "RoofMatlWdShake"     "RoofMatlMembran"     "RoofMatlRoll"       
##  [7] "RoofMatlMetal"       "X2ndFlrSF"           "X1stFlrSF"          
## [10] "Condition2PosN"      "BsmtFinSF1"          "KitchenQualGd"      
## [13] "YearBuilt"           "KitchenQualTA"       "LotArea"            
## [16] "OverallQual10"       "PoolArea"            "BsmtFinSF2"         
## [19] "NeighborhoodMitchel" "MSZoningFV"          "BsmtUnfSF"          
## [22] "MSZoningRM"          "NeighborhoodNWAmes"  "NeighborhoodStoneBr"
## [25] "NeighborhoodEdwards" "MasVnrArea"          "Condition1Norm"     
## [28] "Fireplaces"          "MSZoningRL"          "RoofStyleShed"      
## [31] "LandSlopeMod"        "NeighborhoodNAmes"   "LandContourLvl"     
## [34] "OverallQual9"        "Condition2RRAe"      "StreetPave"         
## [37] "LandContourHLS"      "WoodDeckSF"          "NeighborhoodGilbert"
## [40] "NeighborhoodOldTown"
regBefore2009optimal <- lm(SalePrice ~ RoofMatl + LandSlope + BsmtUnfSF + OverallQual +
                             Condition2 + MSZoning + Neighborhood + LotArea + OverallCond +
                             Foundation + BedroomAbvGr + EnclosedPorch + BsmtFinSF1 + BsmtFinSF2 +
                             MasVnrType, data = before2009)

summary(regBefore2009optimal)
## 
## Call:
## lm(formula = SalePrice ~ RoofMatl + LandSlope + BsmtUnfSF + OverallQual + 
##     Condition2 + MSZoning + Neighborhood + LotArea + OverallCond + 
##     Foundation + BedroomAbvGr + EnclosedPorch + BsmtFinSF1 + 
##     BsmtFinSF2 + MasVnrType, data = before2009)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -126988  -15520   -1473   13956  187597 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -6.723e+05  5.381e+04 -12.493  < 2e-16 ***
## RoofMatlCompShg      5.839e+05  3.168e+04  18.433  < 2e-16 ***
## RoofMatlMembran      6.160e+05  4.481e+04  13.747  < 2e-16 ***
## RoofMatlMetal        6.445e+05  4.510e+04  14.292  < 2e-16 ***
## RoofMatlRoll         5.786e+05  4.287e+04  13.495  < 2e-16 ***
## RoofMatlTar&Grv      5.862e+05  3.250e+04  18.040  < 2e-16 ***
## RoofMatlWdShake      5.870e+05  3.362e+04  17.460  < 2e-16 ***
## RoofMatlWdShngl      6.633e+05  3.346e+04  19.825  < 2e-16 ***
## LandSlopeMod         1.080e+04  3.589e+03   3.009 0.002654 ** 
## LandSlopeSev        -4.146e+04  1.213e+04  -3.418 0.000645 ***
## BsmtUnfSF            2.807e+01  2.443e+00  11.488  < 2e-16 ***
## OverallQual2         4.370e+04  3.308e+04   1.321 0.186691    
## OverallQual3         3.858e+04  3.060e+04   1.261 0.207594    
## OverallQual4         4.067e+04  3.048e+04   1.334 0.182334    
## OverallQual5         4.746e+04  3.054e+04   1.554 0.120390    
## OverallQual6         6.640e+04  3.062e+04   2.169 0.030228 *  
## OverallQual7         9.085e+04  3.065e+04   2.964 0.003074 ** 
## OverallQual8         1.245e+05  3.070e+04   4.054 5.24e-05 ***
## OverallQual9         1.729e+05  3.092e+04   5.594 2.56e-08 ***
## OverallQual10        2.870e+05  3.162e+04   9.079  < 2e-16 ***
## Condition2Feedr     -7.713e+03  1.734e+04  -0.445 0.656452    
## Condition2Norm       1.214e+03  1.464e+04   0.083 0.933906    
## Condition2PosA      -4.793e+04  2.348e+04  -2.041 0.041388 *  
## Condition2PosN      -2.343e+05  2.270e+04 -10.320  < 2e-16 ***
## Condition2RRAe       2.725e+04  3.238e+04   0.842 0.400075    
## Condition2RRAn      -2.672e+04  3.244e+04  -0.824 0.410288    
## Condition2RRNn       1.542e+03  2.498e+04   0.062 0.950802    
## MSZoningFV           4.900e+04  1.121e+04   4.373 1.30e-05 ***
## MSZoningRH           2.496e+04  1.181e+04   2.113 0.034729 *  
## MSZoningRL           4.049e+04  9.214e+03   4.394 1.18e-05 ***
## MSZoningRM           3.501e+04  8.642e+03   4.052 5.30e-05 ***
## NeighborhoodBlueste -2.377e+04  1.643e+04  -1.447 0.148186    
## NeighborhoodBrDale  -3.148e+04  9.979e+03  -3.155 0.001634 ** 
## NeighborhoodBrkSide -1.670e+04  8.244e+03  -2.025 0.042981 *  
## NeighborhoodClearCr -1.231e+04  9.202e+03  -1.338 0.181206    
## NeighborhoodCollgCr -9.268e+03  6.955e+03  -1.333 0.182795    
## NeighborhoodCrawfor  8.885e+03  7.769e+03   1.144 0.252898    
## NeighborhoodEdwards -3.248e+04  7.502e+03  -4.329 1.58e-05 ***
## NeighborhoodGilbert -3.688e+03  7.360e+03  -0.501 0.616348    
## NeighborhoodIDOTRR  -2.445e+04  8.843e+03  -2.765 0.005757 ** 
## NeighborhoodMeadowV -3.192e+04  9.947e+03  -3.209 0.001355 ** 
## NeighborhoodMitchel -3.232e+04  7.699e+03  -4.197 2.83e-05 ***
## NeighborhoodNAmes   -2.768e+04  7.232e+03  -3.827 0.000134 ***
## NeighborhoodNoRidge  4.475e+04  8.059e+03   5.553 3.22e-08 ***
## NeighborhoodNPkVill -2.614e+04  1.190e+04  -2.196 0.028248 *  
## NeighborhoodNridgHt  2.934e+04  7.500e+03   3.912 9.49e-05 ***
## NeighborhoodNWAmes  -2.052e+04  7.602e+03  -2.699 0.007009 ** 
## NeighborhoodOldTown -2.335e+04  8.078e+03  -2.891 0.003889 ** 
## NeighborhoodSawyer  -3.012e+04  7.633e+03  -3.946 8.23e-05 ***
## NeighborhoodSawyerW -8.314e+03  7.599e+03  -1.094 0.274112    
## NeighborhoodSomerst -8.833e+03  8.598e+03  -1.027 0.304385    
## NeighborhoodStoneBr  3.185e+04  8.446e+03   3.770 0.000168 ***
## NeighborhoodSWISU   -2.465e+04  9.145e+03  -2.696 0.007091 ** 
## NeighborhoodTimber  -6.031e+03  7.992e+03  -0.755 0.450570    
## NeighborhoodVeenker  1.021e+04  9.536e+03   1.071 0.284485    
## LotArea              1.321e+00  1.176e-01  11.231  < 2e-16 ***
## OverallCond2         1.281e+04  2.991e+04   0.428 0.668375    
## OverallCond3         2.431e+04  2.223e+04   1.094 0.274248    
## OverallCond4         3.282e+04  2.204e+04   1.489 0.136691    
## OverallCond5         3.912e+04  2.199e+04   1.779 0.075474 .  
## OverallCond6         4.538e+04  2.203e+04   2.060 0.039556 *  
## OverallCond7         5.190e+04  2.205e+04   2.354 0.018690 *  
## OverallCond8         5.651e+04  2.213e+04   2.553 0.010746 *  
## OverallCond9         7.254e+04  2.274e+04   3.190 0.001445 ** 
## FoundationCBlock     4.564e+03  2.831e+03   1.612 0.107069    
## FoundationPConc      1.883e+04  3.082e+03   6.109 1.22e-09 ***
## FoundationSlab       3.196e+04  6.716e+03   4.759 2.10e-06 ***
## FoundationStone     -8.538e+03  1.225e+04  -0.697 0.485947    
## FoundationWood       7.437e+03  2.099e+04   0.354 0.723098    
## BedroomAbvGr         1.307e+04  9.057e+02  14.432  < 2e-16 ***
## EnclosedPorch       -1.836e+00  1.095e+01  -0.168 0.866779    
## BsmtFinSF1           5.586e+01  2.483e+00  22.499  < 2e-16 ***
## BsmtFinSF2           5.029e+01  4.559e+00  11.030  < 2e-16 ***
## MasVnrTypeBrkFace    2.000e+04  6.728e+03   2.973 0.002989 ** 
## MasVnrTypeNone       1.593e+04  6.630e+03   2.402 0.016394 *  
## MasVnrTypeStone      2.614e+04  7.143e+03   3.660 0.000260 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 28640 on 1828 degrees of freedom
##   (29 observations deleted due to missingness)
## Multiple R-squared:  0.8759, Adjusted R-squared:  0.8708 
## F-statistic:   172 on 75 and 1828 DF,  p-value: < 2.2e-16
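
The ranking by Pr(>|t|) above lists individual dummy coefficients (for example, seven separate RoofMatl levels), not whole variables. A minimal sketch of one way to get a single p-value per variable is shown below. It uses the built-in mtcars data as a stand-in rather than the housing data, and `drop1()` is swapped in here as an illustration; it is not part of the original solution.

```r
# Toy illustration on the built-in mtcars data (not the housing data).
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)   # factor with 3 levels -> 2 dummy coefficients

fit <- lm(mpg ~ cyl + wt + hp + qsec, data = cars)

# drop1() refits the model with each term removed in turn and reports
# one F-test p-value per term, so a factor is judged as a whole.
term_tests <- drop1(fit, test = "F")
term_tests <- term_tests[rownames(term_tests) != "<none>", ]

# Rank terms from smallest to largest p-value.
ranked_terms <- rownames(term_tests)[order(term_tests[["Pr(>F)"]])]
ranked_terms
```

Applied to the housing model, the same idea would rank RoofMatl, Neighborhood, and the other factors as single terms, which maps more directly onto the "maximum of 15 variables" limit.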

[5 points] Q8.

Display diagnostic plots of your regression. Tip: The diagnostic plots include a QQ plot, a Residuals vs Fitted Values plot, a \(\sqrt{|Standardized \; Residuals|}\) vs Fitted Values plot, and a Standardized Residuals vs Leverage plot. Do not worry if your residuals have a slight curve to them. Tip: Google “Plotting Diagnostics for Linear Models - CRAN” and don’t use any arguments for the function autoplot at this time.

#install.packages("ggfortify")
library(ggfortify)
## Warning: package 'ggfortify' was built under R version 4.3.2
regBefore2009optimal %>%
  autoplot()
## Warning: Removed 1904 rows containing missing values (`geom_line()`).
## Warning: Removed 7 rows containing missing values (`geom_point()`).
## Warning: Removed 14 rows containing missing values (`geom_line()`).
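
The quantities behind the four diagnostic plots can also be computed directly, which makes it possible to list suspicious observations rather than only eyeball them. The sketch below is a minimal illustration on the built-in mtcars data, not the housing model. The cutoffs (|standardized residual| > 2, leverage > 2p/n, Cook's distance > 4/n) are common rules of thumb and are assumptions here, not thresholds given in the assignment.

```r
# Toy illustration on the built-in mtcars data (not the housing data).
fit <- lm(mpg ~ wt + hp, data = mtcars)

diag_df <- data.frame(
  fitted    = fitted(fit),
  std_resid = rstandard(fit),       # standardized residuals
  leverage  = hatvalues(fit),       # diagonal of the hat matrix
  cooks_d   = cooks.distance(fit)   # Cook's distance
)

n <- nrow(diag_df)
p <- length(coef(fit))              # number of estimated coefficients

# Rule-of-thumb flags (conventions, not hard limits).
diag_df$flag_resid    <- abs(diag_df$std_resid) > 2
diag_df$flag_leverage <- diag_df$leverage > 2 * p / n
diag_df$flag_cooks    <- diag_df$cooks_d > 4 / n

# Rows flagged by at least one rule.
diag_df[diag_df$flag_resid | diag_df$flag_leverage | diag_df$flag_cooks, ]
```

Replacing `fit` with regBefore2009optimal would give the same table for the housing regression, with row names matching the row indices of the observations used in the fit.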

[5 points] Q9.

Now read in the PricesAfter2009.csv data and assign it to a variable called after2009. The dataset contains data for house prices after 2009. Then, repeat your data manipulation operations from Q2 and Q3 on this new dataset. Drop (remove) the unnecessary columns that you dropped in Q5. Rubric: 1 point for reading and 4 points for data manipulation.

after2009 <- read.csv("PricesAfter2009.csv")
head(after2009,20)
##     X Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape
## 1   1  6         50       RL          85   14115   Pave  <NA>      IR1
## 2   2  8         60       RL          NA   10382   Pave  <NA>      IR1
## 3   3 17         20       RL          NA   11241   Pave  <NA>      IR1
## 4   4 20         20       RL          70    7560   Pave  <NA>      Reg
## 5   5 25         20       RL          NA    8246   Pave  <NA>      IR1
## 6   6 26         20       RL         110   14230   Pave  <NA>      Reg
## 7   7 27         20       RL          60    7200   Pave  <NA>      Reg
## 8   8 28         20       RL          98   11478   Pave  <NA>      Reg
## 9   9 34         20       RL          70   10552   Pave  <NA>      IR1
## 10 10 37         20       RL         112   10859   Pave  <NA>      Reg
## 11 11 38         20       RL          74    8532   Pave  <NA>      Reg
## 12 12 39         20       RL          68    7922   Pave  <NA>      Reg
## 13 13 46        120       RL          61    7658   Pave  <NA>      Reg
## 14 14 47         50       RL          48   12822   Pave  <NA>      IR1
## 15 15 49        190       RM          33    4456   Pave  <NA>      Reg
## 16 16 53         90       RM         110    8472   Grvl  <NA>      IR2
## 17 17 57        160       FV          24    2645   Pave  Pave      Reg
## 18 18 64         70       RM          50   10300   Pave  <NA>      IR1
## 19 19 65         60       RL          NA    9375   Pave  <NA>      Reg
## 20 20 67         20       RL          NA   19900   Pave  <NA>      Reg
##    LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2
## 1          Lvl    AllPub    Inside       Gtl      Mitchel       Norm       Norm
## 2          Lvl    AllPub    Corner       Gtl       NWAmes       PosN       Norm
## 3          Lvl    AllPub   CulDSac       Gtl        NAmes       Norm       Norm
## 4          Lvl    AllPub    Inside       Gtl        NAmes       Norm       Norm
## 5          Lvl    AllPub    Inside       Gtl       Sawyer       Norm       Norm
## 6          Lvl    AllPub    Corner       Gtl      NridgHt       Norm       Norm
## 7          Lvl    AllPub    Corner       Gtl        NAmes       Norm       Norm
## 8          Lvl    AllPub    Inside       Gtl      NridgHt       Norm       Norm
## 9          Lvl    AllPub    Inside       Gtl        NAmes       Norm       Norm
## 10         Lvl    AllPub    Corner       Gtl      CollgCr       Norm       Norm
## 11         Lvl    AllPub    Inside       Gtl        NAmes       Norm       Norm
## 12         Lvl    AllPub    Inside       Gtl        NAmes       Norm       Norm
## 13         Lvl    AllPub    Inside       Gtl      NridgHt       Norm       Norm
## 14         Lvl    AllPub   CulDSac       Gtl      Mitchel       Norm       Norm
## 15         Lvl    AllPub    Inside       Gtl      OldTown       Norm       Norm
## 16         Bnk    AllPub    Corner       Mod       IDOTRR       RRNn       Norm
## 17         Lvl    AllPub    Inside       Gtl      Somerst       Norm       Norm
## 18         Bnk    AllPub    Inside       Gtl      OldTown       RRAn      Feedr
## 19         Lvl    AllPub    Inside       Gtl      CollgCr       Norm       Norm
## 20         Lvl    AllPub    Inside       Gtl        NAmes       PosA       Norm
##    BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle
## 1      1Fam     1.5Fin           5           5      1993         1995     Gable
## 2      1Fam     2Story           7           6      1973         1973     Gable
## 3      1Fam     1Story           6           7      1970         1970     Gable
## 4      1Fam     1Story           5           6      1958         1965       Hip
## 5      1Fam     1Story           5           8      1968         2001     Gable
## 6      1Fam     1Story           8           5      2007         2007     Gable
## 7      1Fam     1Story           5           7      1951         2000     Gable
## 8      1Fam     1Story           8           5      2007         2008     Gable
## 9      1Fam     1Story           5           5      1959         1959       Hip
## 10     1Fam     1Story           5           5      1994         1995     Gable
## 11     1Fam     1Story           5           6      1954         1990       Hip
## 12     1Fam     1Story           5           7      1953         2007     Gable
## 13   TwnhsE     1Story           9           5      2005         2005       Hip
## 14     1Fam     1.5Fin           7           5      2003         2003     Gable
## 15   2fmCon     2Story           4           5      1920         2008     Gable
## 16   Duplex     1Story           5           5      1963         1963     Gable
## 17    Twnhs     2Story           8           5      1999         2000     Gable
## 18     1Fam     2Story           7           6      1921         1950     Gable
## 19     1Fam     2Story           7           5      1997         1998     Gable
## 20     1Fam     1Story           7           5      1970         1989     Gable
##    RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond
## 1   CompShg     VinylSd     VinylSd       None          0        TA        TA
## 2   CompShg     HdBoard     HdBoard      Stone        240        TA        TA
## 3   CompShg     Wd Sdng     Wd Sdng    BrkFace        180        TA        TA
## 4   CompShg     BrkFace     Plywood       None          0        TA        TA
## 5   CompShg     Plywood     Plywood       None          0        TA        Gd
## 6   CompShg     VinylSd     VinylSd      Stone        640        Gd        TA
## 7   CompShg     Wd Sdng     Wd Sdng       None          0        TA        TA
## 8   CompShg     VinylSd     VinylSd      Stone        200        Gd        TA
## 9   CompShg     BrkFace     BrkFace       None          0        TA        TA
## 10  CompShg     VinylSd     VinylSd       None          0        TA        TA
## 11  CompShg     Wd Sdng     Wd Sdng    BrkFace        650        TA        TA
## 12  CompShg     VinylSd     VinylSd       None          0        TA        Gd
## 13  CompShg     MetalSd     MetalSd    BrkFace        412        Ex        TA
## 14  CompShg     VinylSd     VinylSd       None          0        Gd        TA
## 15  CompShg     MetalSd     MetalSd       None          0        TA        TA
## 16  CompShg     Wd Sdng     Wd Sdng       None          0        Fa        TA
## 17  CompShg     MetalSd     MetalSd    BrkFace        456        Gd        TA
## 18  CompShg      Stucco      Stucco       None          0        TA        TA
## 19  CompShg     VinylSd     VinylSd    BrkFace        573        TA        TA
## 20  CompShg     Plywood     Plywood    BrkFace        287        TA        TA
##    Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1
## 1        Wood       Gd       TA           No          GLQ        732
## 2      CBlock       Gd       TA           Mn          ALQ        859
## 3      CBlock       TA       TA           No          ALQ        578
## 4      CBlock       TA       TA           No          LwQ        504
## 5      CBlock       TA       TA           Mn          Rec        188
## 6       PConc       Gd       TA           No          Unf          0
## 7      CBlock       TA       TA           Mn          BLQ        234
## 8       PConc       Ex       TA           No          GLQ       1218
## 9      CBlock       TA       TA           No          Rec       1018
## 10      PConc       Gd       TA           No          Unf          0
## 11     CBlock       TA       TA           No          Rec       1213
## 12     CBlock       TA       TA           No          GLQ        731
## 13      PConc       Ex       TA           No          GLQ        456
## 14      PConc       Ex       TA           No          GLQ       1351
## 15     BrkTil       TA       TA           No          Unf          0
## 16     CBlock       Gd       TA           Gd          LwQ        104
## 17      PConc       Gd       TA           No          GLQ        649
## 18     BrkTil       TA       TA           No          Unf          0
## 19      PConc       Gd       TA           No          GLQ        739
## 20     CBlock       Gd       TA           Gd          GLQ        912
##    BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir
## 1           Unf          0        64         796    GasA        Ex          Y
## 2           BLQ         32       216        1107    GasA        Ex          Y
## 3           Unf          0       426        1004    GasA        Ex          Y
## 4           Unf          0       525        1029    GasA        TA          Y
## 5           ALQ        668       204        1060    GasA        Ex          Y
## 6           Unf          0      1566        1566    GasA        Ex          Y
## 7           Rec        486       180         900    GasA        TA          Y
## 8           Unf          0       486        1704    GasA        Ex          Y
## 9           Unf          0       380        1398    GasA        Gd          Y
## 10          Unf          0      1097        1097    GasA        Ex          Y
## 11          Unf          0        84        1297    GasA        Gd          Y
## 12          Unf          0       326        1057    GasA        TA          Y
## 13          Unf          0      1296        1752    GasA        Ex          Y
## 14          Unf          0        83        1434    GasA        Ex          Y
## 15          Unf          0       736         736    GasA        Gd          Y
## 16          GLQ        712         0         816    GasA        TA          N
## 17          Unf          0       321         970    GasA        Ex          Y
## 18          Unf          0       576         576    GasA        Gd          Y
## 19          Unf          0       318        1057    GasA        Ex          Y
## 20          Unf          0      1035        1947    GasA        TA          Y
##    Electrical X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath
## 1       SBrkr       796       566            0      1362            1
## 2       SBrkr      1107       983            0      2090            1
## 3       SBrkr      1004         0            0      1004            1
## 4       SBrkr      1339         0            0      1339            0
## 5       SBrkr      1060         0            0      1060            1
## 6       SBrkr      1600         0            0      1600            0
## 7       SBrkr       900         0            0       900            0
## 8       SBrkr      1704         0            0      1704            1
## 9       SBrkr      1700         0            0      1700            0
## 10      SBrkr      1097         0            0      1097            0
## 11      SBrkr      1297         0            0      1297            0
## 12      SBrkr      1057         0            0      1057            1
## 13      SBrkr      1752         0            0      1752            1
## 14      SBrkr      1518       631            0      2149            1
## 15      SBrkr       736       716            0      1452            0
## 16      SBrkr       816         0            0       816            1
## 17      SBrkr       983       756            0      1739            1
## 18      SBrkr       902       808            0      1710            0
## 19      SBrkr      1057       977            0      2034            1
## 20      SBrkr      2207         0            0      2207            1
##    BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual
## 1             0        1        1            1            1          TA
## 2             0        2        1            3            1          TA
## 3             0        1        0            2            1          TA
## 4             0        1        0            3            1          TA
## 5             0        1        0            3            1          Gd
## 6             0        2        0            3            1          Gd
## 7             1        1        0            3            1          Gd
## 8             0        2        0            3            1          Gd
## 9             1        1        1            4            1          Gd
## 10            0        1        1            3            1          TA
## 11            1        1        0            3            1          TA
## 12            0        1        0            3            1          Gd
## 13            0        2        0            2            1          Ex
## 14            0        1        1            1            1          Gd
## 15            0        2        0            2            3          TA
## 16            0        1        0            2            1          TA
## 17            0        2        1            3            1          Gd
## 18            0        2        0            3            1          TA
## 19            0        2        1            3            1          Gd
## 20            0        2        0            3            1          TA
##    TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt
## 1             5        Typ          0        <NA>     Attchd        1993
## 2             7        Typ          2          TA     Attchd        1973
## 3             5        Typ          1          TA     Attchd        1970
## 4             6       Min1          0        <NA>     Attchd        1958
## 5             6        Typ          1          TA     Attchd        1968
## 6             7        Typ          1          Gd     Attchd        2007
## 7             5        Typ          0        <NA>     Detchd        2005
## 8             7        Typ          1          Gd     Attchd        2008
## 9             6        Typ          1          Gd     Attchd        1959
## 10            6        Typ          0        <NA>     Attchd        1995
## 11            5        Typ          1          TA     Attchd        1954
## 12            5        Typ          0        <NA>     Detchd        1953
## 13            6        Typ          1          Gd     Attchd        2005
## 14            6        Typ          1          Ex     Attchd        2003
## 15            8        Typ          0        <NA>       <NA>          NA
## 16            5        Typ          0        <NA>    CarPort        1963
## 17            7        Typ          0        <NA>     Attchd        1999
## 18            9        Typ          0        <NA>     Detchd        1990
## 19            8        Typ          0        <NA>     Attchd        1998
## 20            7       Min1          1          Gd     Attchd        1970
##    GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive
## 1           Unf          2        480         TA         TA          Y
## 2           RFn          2        484         TA         TA          Y
## 3           Fin          2        480         TA         TA          Y
## 4           Unf          1        294         TA         TA          Y
## 5           Unf          1        270         TA         TA          Y
## 6           RFn          3        890         TA         TA          Y
## 7           Unf          2        576         TA         TA          Y
## 8           RFn          3        772         TA         TA          Y
## 9           RFn          2        447         TA         TA          Y
## 10          Unf          2        672         TA         TA          Y
## 11          Fin          2        498         TA         TA          Y
## 12          Unf          1        246         TA         TA          Y
## 13          RFn          2        576         TA         TA          Y
## 14          RFn          2        670         TA         TA          Y
## 15         <NA>          0          0       <NA>       <NA>          N
## 16          Unf          2        516         TA         TA          Y
## 17          Fin          2        480         TA         TA          Y
## 18          Unf          2        480         TA         TA          Y
## 19          RFn          2        645         TA         TA          Y
## 20          RFn          2        576         TA         TA          Y
##    WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC
## 1          40          30             0        320           0        0     NA
## 2         235         204           228          0           0        0     NA
## 3           0           0             0          0           0        0     NA
## 4           0           0             0          0           0        0     NA
## 5         406          90             0          0           0        0     NA
## 6           0          56             0          0           0        0     NA
## 7         222          32             0          0           0        0     NA
## 8           0          50             0          0           0        0     NA
## 9           0          38             0          0           0        0     NA
## 10        392          64             0          0           0        0     NA
## 11          0           0             0          0           0        0     NA
## 12          0          52             0          0           0        0     NA
## 13        196          82             0          0           0        0     NA
## 14        168          43             0          0         198        0     NA
## 15          0           0           102          0           0        0     NA
## 16        106           0             0          0           0        0     NA
## 17        115           0             0          0           0        0     NA
## 18         12          11            64          0           0        0     NA
## 19        576          36             0          0           0        0     NA
## 20        301           0             0          0           0        0     NA
##    Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
## 1  MnPrv        Shed     700     10   2009       WD        Normal    143000
## 2   <NA>        Shed     350     11   2009       WD        Normal    200000
## 3   <NA>        Shed     700      3   2010       WD        Normal    149000
## 4  MnPrv        <NA>       0      5   2009      COD       Abnorml    139000
## 5  MnPrv        <NA>       0      5   2010       WD        Normal    154000
## 6   <NA>        <NA>       0      7   2009       WD        Normal    256300
## 7   <NA>        <NA>       0      5   2010       WD        Normal    134800
## 8   <NA>        <NA>       0      5   2010       WD        Normal    306000
## 9   <NA>        <NA>       0      4   2010       WD        Normal    165500
## 10  <NA>        <NA>       0      6   2009       WD        Normal    145000
## 11  <NA>        <NA>       0     10   2009       WD        Normal    153000
## 12  <NA>        <NA>       0      1   2010       WD       Abnorml    109000
## 13  <NA>        <NA>       0      2   2010       WD        Normal    319900
## 14  <NA>        <NA>       0      8   2009       WD       Abnorml    239686
## 15  <NA>        <NA>       0      6   2009      New       Partial    113000
## 16  <NA>        <NA>       0      5   2010       WD        Normal    110000
## 17  <NA>        <NA>       0      8   2009       WD       Abnorml    172500
## 18 GdPrv        <NA>       0      4   2010       WD        Normal    140000
## 19 GdPrv        <NA>       0      2   2009       WD        Normal    219500
## 20  <NA>        <NA>       0      7   2010       WD        Normal    180000
str(after2009)
## 'data.frame':    986 obs. of  82 variables:
##  $ X            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Id           : int  6 8 17 20 25 26 27 28 34 37 ...
##  $ MSSubClass   : int  50 60 20 20 20 20 20 20 20 20 ...
##  $ MSZoning     : chr  "RL" "RL" "RL" "RL" ...
##  $ LotFrontage  : int  85 NA NA 70 NA 110 60 98 70 112 ...
##  $ LotArea      : int  14115 10382 11241 7560 8246 14230 7200 11478 10552 10859 ...
##  $ Street       : chr  "Pave" "Pave" "Pave" "Pave" ...
##  $ Alley        : chr  NA NA NA NA ...
##  $ LotShape     : chr  "IR1" "IR1" "IR1" "Reg" ...
##  $ LandContour  : chr  "Lvl" "Lvl" "Lvl" "Lvl" ...
##  $ Utilities    : chr  "AllPub" "AllPub" "AllPub" "AllPub" ...
##  $ LotConfig    : chr  "Inside" "Corner" "CulDSac" "Inside" ...
##  $ LandSlope    : chr  "Gtl" "Gtl" "Gtl" "Gtl" ...
##  $ Neighborhood : chr  "Mitchel" "NWAmes" "NAmes" "NAmes" ...
##  $ Condition1   : chr  "Norm" "PosN" "Norm" "Norm" ...
##  $ Condition2   : chr  "Norm" "Norm" "Norm" "Norm" ...
##  $ BldgType     : chr  "1Fam" "1Fam" "1Fam" "1Fam" ...
##  $ HouseStyle   : chr  "1.5Fin" "2Story" "1Story" "1Story" ...
##  $ OverallQual  : int  5 7 6 5 5 8 5 8 5 5 ...
##  $ OverallCond  : int  5 6 7 6 8 5 7 5 5 5 ...
##  $ YearBuilt    : int  1993 1973 1970 1958 1968 2007 1951 2007 1959 1994 ...
##  $ YearRemodAdd : int  1995 1973 1970 1965 2001 2007 2000 2008 1959 1995 ...
##  $ RoofStyle    : chr  "Gable" "Gable" "Gable" "Hip" ...
##  $ RoofMatl     : chr  "CompShg" "CompShg" "CompShg" "CompShg" ...
##  $ Exterior1st  : chr  "VinylSd" "HdBoard" "Wd Sdng" "BrkFace" ...
##  $ Exterior2nd  : chr  "VinylSd" "HdBoard" "Wd Sdng" "Plywood" ...
##  $ MasVnrType   : chr  "None" "Stone" "BrkFace" "None" ...
##  $ MasVnrArea   : int  0 240 180 0 0 640 0 200 0 0 ...
##  $ ExterQual    : chr  "TA" "TA" "TA" "TA" ...
##  $ ExterCond    : chr  "TA" "TA" "TA" "TA" ...
##  $ Foundation   : chr  "Wood" "CBlock" "CBlock" "CBlock" ...
##  $ BsmtQual     : chr  "Gd" "Gd" "TA" "TA" ...
##  $ BsmtCond     : chr  "TA" "TA" "TA" "TA" ...
##  $ BsmtExposure : chr  "No" "Mn" "No" "No" ...
##  $ BsmtFinType1 : chr  "GLQ" "ALQ" "ALQ" "LwQ" ...
##  $ BsmtFinSF1   : int  732 859 578 504 188 0 234 1218 1018 0 ...
##  $ BsmtFinType2 : chr  "Unf" "BLQ" "Unf" "Unf" ...
##  $ BsmtFinSF2   : int  0 32 0 0 668 0 486 0 0 0 ...
##  $ BsmtUnfSF    : int  64 216 426 525 204 1566 180 486 380 1097 ...
##  $ TotalBsmtSF  : int  796 1107 1004 1029 1060 1566 900 1704 1398 1097 ...
##  $ Heating      : chr  "GasA" "GasA" "GasA" "GasA" ...
##  $ HeatingQC    : chr  "Ex" "Ex" "Ex" "TA" ...
##  $ CentralAir   : chr  "Y" "Y" "Y" "Y" ...
##  $ Electrical   : chr  "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
##  $ X1stFlrSF    : int  796 1107 1004 1339 1060 1600 900 1704 1700 1097 ...
##  $ X2ndFlrSF    : int  566 983 0 0 0 0 0 0 0 0 ...
##  $ LowQualFinSF : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ GrLivArea    : int  1362 2090 1004 1339 1060 1600 900 1704 1700 1097 ...
##  $ BsmtFullBath : int  1 1 1 0 1 0 0 1 0 0 ...
##  $ BsmtHalfBath : int  0 0 0 0 0 0 1 0 1 0 ...
##  $ FullBath     : int  1 2 1 1 1 2 1 2 1 1 ...
##  $ HalfBath     : int  1 1 0 0 0 0 0 0 1 1 ...
##  $ BedroomAbvGr : int  1 3 2 3 3 3 3 3 4 3 ...
##  $ KitchenAbvGr : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ KitchenQual  : chr  "TA" "TA" "TA" "TA" ...
##  $ TotRmsAbvGrd : int  5 7 5 6 6 7 5 7 6 6 ...
##  $ Functional   : chr  "Typ" "Typ" "Typ" "Min1" ...
##  $ Fireplaces   : int  0 2 1 0 1 1 0 1 1 0 ...
##  $ FireplaceQu  : chr  NA "TA" "TA" NA ...
##  $ GarageType   : chr  "Attchd" "Attchd" "Attchd" "Attchd" ...
##  $ GarageYrBlt  : int  1993 1973 1970 1958 1968 2007 2005 2008 1959 1995 ...
##  $ GarageFinish : chr  "Unf" "RFn" "Fin" "Unf" ...
##  $ GarageCars   : int  2 2 2 1 1 3 2 3 2 2 ...
##  $ GarageArea   : int  480 484 480 294 270 890 576 772 447 672 ...
##  $ GarageQual   : chr  "TA" "TA" "TA" "TA" ...
##  $ GarageCond   : chr  "TA" "TA" "TA" "TA" ...
##  $ PavedDrive   : chr  "Y" "Y" "Y" "Y" ...
##  $ WoodDeckSF   : int  40 235 0 0 406 0 222 0 0 392 ...
##  $ OpenPorchSF  : int  30 204 0 0 90 56 32 50 38 64 ...
##  $ EnclosedPorch: int  0 228 0 0 0 0 0 0 0 0 ...
##  $ X3SsnPorch   : int  320 0 0 0 0 0 0 0 0 0 ...
##  $ ScreenPorch  : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolArea     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ PoolQC       : logi  NA NA NA NA NA NA ...
##  $ Fence        : chr  "MnPrv" NA NA "MnPrv" ...
##  $ MiscFeature  : chr  "Shed" "Shed" "Shed" NA ...
##  $ MiscVal      : int  700 350 700 0 0 0 0 0 0 0 ...
##  $ MoSold       : int  10 11 3 5 5 7 5 5 4 6 ...
##  $ YrSold       : int  2009 2009 2010 2009 2010 2009 2010 2010 2010 2009 ...
##  $ SaleType     : chr  "WD" "WD" "WD" "COD" ...
##  $ SaleCondition: chr  "Normal" "Normal" "Normal" "Abnorml" ...
##  $ SalePrice    : num  143000 200000 149000 139000 154000 ...
after2009$MSSubClass <- as.factor(after2009$MSSubClass)
after2009$OverallQual <- as.factor(after2009$OverallQual)
after2009$OverallCond <- as.factor(after2009$OverallCond)

str(after2009[, c("MSSubClass", "OverallQual","OverallCond")])
## 'data.frame':    986 obs. of  3 variables:
##  $ MSSubClass : Factor w/ 15 levels "20","30","40",..: 5 6 1 1 1 1 1 1 1 1 ...
##  $ OverallQual: Factor w/ 10 levels "1","2","3","4",..: 5 7 6 5 5 8 5 8 5 5 ...
##  $ OverallCond: Factor w/ 9 levels "1","2","3","4",..: 5 6 7 6 8 5 7 5 5 5 ...
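# NA counts in the new data (for inspection only)
count_NA_after <- sapply(after2009, function(x) sum(is.na(x)))
count_NA_after["SalePrice"] <- 0

# Reuse count_NA (computed on before2009 earlier) so that the same
# NA-heavy columns are dropped here as in Q5
cols_NA_after <- names(count_NA[count_NA >= 20])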

cols_drop_after <- c("X","Id","Utilities")

dropCols_after <- union(cols_NA_after, cols_drop_after)

# all_of() avoids the tidyselect deprecation warning for external vectors
after2009 <- select(after2009, -all_of(dropCols_after))
head(after2009, 20)
##    MSSubClass MSZoning LotArea Street LotShape LandContour LotConfig LandSlope
## 1          50       RL   14115   Pave      IR1         Lvl    Inside       Gtl
## 2          60       RL   10382   Pave      IR1         Lvl    Corner       Gtl
## 3          20       RL   11241   Pave      IR1         Lvl   CulDSac       Gtl
## 4          20       RL    7560   Pave      Reg         Lvl    Inside       Gtl
## 5          20       RL    8246   Pave      IR1         Lvl    Inside       Gtl
## 6          20       RL   14230   Pave      Reg         Lvl    Corner       Gtl
## 7          20       RL    7200   Pave      Reg         Lvl    Corner       Gtl
## 8          20       RL   11478   Pave      Reg         Lvl    Inside       Gtl
## 9          20       RL   10552   Pave      IR1         Lvl    Inside       Gtl
## 10         20       RL   10859   Pave      Reg         Lvl    Corner       Gtl
## 11         20       RL    8532   Pave      Reg         Lvl    Inside       Gtl
## 12         20       RL    7922   Pave      Reg         Lvl    Inside       Gtl
## 13        120       RL    7658   Pave      Reg         Lvl    Inside       Gtl
## 14         50       RL   12822   Pave      IR1         Lvl   CulDSac       Gtl
## 15        190       RM    4456   Pave      Reg         Lvl    Inside       Gtl
## 16         90       RM    8472   Grvl      IR2         Bnk    Corner       Mod
## 17        160       FV    2645   Pave      Reg         Lvl    Inside       Gtl
## 18         70       RM   10300   Pave      IR1         Bnk    Inside       Gtl
## 19         60       RL    9375   Pave      Reg         Lvl    Inside       Gtl
## 20         20       RL   19900   Pave      Reg         Lvl    Inside       Gtl
##    Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual
## 1       Mitchel       Norm       Norm     1Fam     1.5Fin           5
## 2        NWAmes       PosN       Norm     1Fam     2Story           7
## 3         NAmes       Norm       Norm     1Fam     1Story           6
## 4         NAmes       Norm       Norm     1Fam     1Story           5
## 5        Sawyer       Norm       Norm     1Fam     1Story           5
## 6       NridgHt       Norm       Norm     1Fam     1Story           8
## 7         NAmes       Norm       Norm     1Fam     1Story           5
## 8       NridgHt       Norm       Norm     1Fam     1Story           8
## 9         NAmes       Norm       Norm     1Fam     1Story           5
## 10      CollgCr       Norm       Norm     1Fam     1Story           5
## 11        NAmes       Norm       Norm     1Fam     1Story           5
## 12        NAmes       Norm       Norm     1Fam     1Story           5
## 13      NridgHt       Norm       Norm   TwnhsE     1Story           9
## 14      Mitchel       Norm       Norm     1Fam     1.5Fin           7
## 15      OldTown       Norm       Norm   2fmCon     2Story           4
## 16       IDOTRR       RRNn       Norm   Duplex     1Story           5
## 17      Somerst       Norm       Norm    Twnhs     2Story           8
## 18      OldTown       RRAn      Feedr     1Fam     2Story           7
## 19      CollgCr       Norm       Norm     1Fam     2Story           7
## 20        NAmes       PosA       Norm     1Fam     1Story           7
##    OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st
## 1            5      1993         1995     Gable  CompShg     VinylSd
## 2            6      1973         1973     Gable  CompShg     HdBoard
## 3            7      1970         1970     Gable  CompShg     Wd Sdng
## 4            6      1958         1965       Hip  CompShg     BrkFace
## 5            8      1968         2001     Gable  CompShg     Plywood
## 6            5      2007         2007     Gable  CompShg     VinylSd
## 7            7      1951         2000     Gable  CompShg     Wd Sdng
## 8            5      2007         2008     Gable  CompShg     VinylSd
## 9            5      1959         1959       Hip  CompShg     BrkFace
## 10           5      1994         1995     Gable  CompShg     VinylSd
## 11           6      1954         1990       Hip  CompShg     Wd Sdng
## 12           7      1953         2007     Gable  CompShg     VinylSd
## 13           5      2005         2005       Hip  CompShg     MetalSd
## 14           5      2003         2003     Gable  CompShg     VinylSd
## 15           5      1920         2008     Gable  CompShg     MetalSd
## 16           5      1963         1963     Gable  CompShg     Wd Sdng
## 17           5      1999         2000     Gable  CompShg     MetalSd
## 18           6      1921         1950     Gable  CompShg      Stucco
## 19           5      1997         1998     Gable  CompShg     VinylSd
## 20           5      1970         1989     Gable  CompShg     Plywood
##    Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtFinSF1
## 1      VinylSd       None          0        TA        TA       Wood        732
## 2      HdBoard      Stone        240        TA        TA     CBlock        859
## 3      Wd Sdng    BrkFace        180        TA        TA     CBlock        578
## 4      Plywood       None          0        TA        TA     CBlock        504
## 5      Plywood       None          0        TA        Gd     CBlock        188
## 6      VinylSd      Stone        640        Gd        TA      PConc          0
## 7      Wd Sdng       None          0        TA        TA     CBlock        234
## 8      VinylSd      Stone        200        Gd        TA      PConc       1218
## 9      BrkFace       None          0        TA        TA     CBlock       1018
## 10     VinylSd       None          0        TA        TA      PConc          0
## 11     Wd Sdng    BrkFace        650        TA        TA     CBlock       1213
## 12     VinylSd       None          0        TA        Gd     CBlock        731
## 13     MetalSd    BrkFace        412        Ex        TA      PConc        456
## 14     VinylSd       None          0        Gd        TA      PConc       1351
## 15     MetalSd       None          0        TA        TA     BrkTil          0
## 16     Wd Sdng       None          0        Fa        TA     CBlock        104
## 17     MetalSd    BrkFace        456        Gd        TA      PConc        649
## 18      Stucco       None          0        TA        TA     BrkTil          0
## 19     VinylSd    BrkFace        573        TA        TA      PConc        739
## 20     Plywood    BrkFace        287        TA        TA     CBlock        912
##    BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical
## 1           0        64         796    GasA        Ex          Y      SBrkr
## 2          32       216        1107    GasA        Ex          Y      SBrkr
## 3           0       426        1004    GasA        Ex          Y      SBrkr
## 4           0       525        1029    GasA        TA          Y      SBrkr
## 5         668       204        1060    GasA        Ex          Y      SBrkr
## 6           0      1566        1566    GasA        Ex          Y      SBrkr
## 7         486       180         900    GasA        TA          Y      SBrkr
## 8           0       486        1704    GasA        Ex          Y      SBrkr
## 9           0       380        1398    GasA        Gd          Y      SBrkr
## 10          0      1097        1097    GasA        Ex          Y      SBrkr
## 11          0        84        1297    GasA        Gd          Y      SBrkr
## 12          0       326        1057    GasA        TA          Y      SBrkr
## 13          0      1296        1752    GasA        Ex          Y      SBrkr
## 14          0        83        1434    GasA        Ex          Y      SBrkr
## 15          0       736         736    GasA        Gd          Y      SBrkr
## 16        712         0         816    GasA        TA          N      SBrkr
## 17          0       321         970    GasA        Ex          Y      SBrkr
## 18          0       576         576    GasA        Gd          Y      SBrkr
## 19          0       318        1057    GasA        Ex          Y      SBrkr
## 20          0      1035        1947    GasA        TA          Y      SBrkr
##    X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath
## 1        796       566            0      1362            1            0
## 2       1107       983            0      2090            1            0
## 3       1004         0            0      1004            1            0
## 4       1339         0            0      1339            0            0
## 5       1060         0            0      1060            1            0
## 6       1600         0            0      1600            0            0
## 7        900         0            0       900            0            1
## 8       1704         0            0      1704            1            0
## 9       1700         0            0      1700            0            1
## 10      1097         0            0      1097            0            0
## 11      1297         0            0      1297            0            1
## 12      1057         0            0      1057            1            0
## 13      1752         0            0      1752            1            0
## 14      1518       631            0      2149            1            0
## 15       736       716            0      1452            0            0
## 16       816         0            0       816            1            0
## 17       983       756            0      1739            1            0
## 18       902       808            0      1710            0            0
## 19      1057       977            0      2034            1            0
## 20      2207         0            0      2207            1            0
##    FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd
## 1         1        1            1            1          TA            5
## 2         2        1            3            1          TA            7
## 3         1        0            2            1          TA            5
## 4         1        0            3            1          TA            6
## 5         1        0            3            1          Gd            6
## 6         2        0            3            1          Gd            7
## 7         1        0            3            1          Gd            5
## 8         2        0            3            1          Gd            7
## 9         1        1            4            1          Gd            6
## 10        1        1            3            1          TA            6
## 11        1        0            3            1          TA            5
## 12        1        0            3            1          Gd            5
## 13        2        0            2            1          Ex            6
## 14        1        1            1            1          Gd            6
## 15        2        0            2            3          TA            8
## 16        1        0            2            1          TA            5
## 17        2        1            3            1          Gd            7
## 18        2        0            3            1          TA            9
## 19        2        1            3            1          Gd            8
## 20        2        0            3            1          TA            7
##    Functional Fireplaces GarageCars GarageArea PavedDrive WoodDeckSF
## 1         Typ          0          2        480          Y         40
## 2         Typ          2          2        484          Y        235
## 3         Typ          1          2        480          Y          0
## 4        Min1          0          1        294          Y          0
## 5         Typ          1          1        270          Y        406
## 6         Typ          1          3        890          Y          0
## 7         Typ          0          2        576          Y        222
## 8         Typ          1          3        772          Y          0
## 9         Typ          1          2        447          Y          0
## 10        Typ          0          2        672          Y        392
## 11        Typ          1          2        498          Y          0
## 12        Typ          0          1        246          Y          0
## 13        Typ          1          2        576          Y        196
## 14        Typ          1          2        670          Y        168
## 15        Typ          0          0          0          N          0
## 16        Typ          0          2        516          Y        106
## 17        Typ          0          2        480          Y        115
## 18        Typ          0          2        480          Y         12
## 19        Typ          0          2        645          Y        576
## 20       Min1          1          2        576          Y        301
##    OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea MiscVal MoSold
## 1           30             0        320           0        0     700     10
## 2          204           228          0           0        0     350     11
## 3            0             0          0           0        0     700      3
## 4            0             0          0           0        0       0      5
## 5           90             0          0           0        0       0      5
## 6           56             0          0           0        0       0      7
## 7           32             0          0           0        0       0      5
## 8           50             0          0           0        0       0      5
## 9           38             0          0           0        0       0      4
## 10          64             0          0           0        0       0      6
## 11           0             0          0           0        0       0     10
## 12          52             0          0           0        0       0      1
## 13          82             0          0           0        0       0      2
## 14          43             0          0         198        0       0      8
## 15           0           102          0           0        0       0      6
## 16           0             0          0           0        0       0      5
## 17           0             0          0           0        0       0      8
## 18          11            64          0           0        0       0      4
## 19          36             0          0           0        0       0      2
## 20           0             0          0           0        0       0      7
##    YrSold SaleType SaleCondition SalePrice
## 1    2009       WD        Normal    143000
## 2    2009       WD        Normal    200000
## 3    2010       WD        Normal    149000
## 4    2009      COD       Abnorml    139000
## 5    2010       WD        Normal    154000
## 6    2009       WD        Normal    256300
## 7    2010       WD        Normal    134800
## 8    2010       WD        Normal    306000
## 9    2010       WD        Normal    165500
## 10   2009       WD        Normal    145000
## 11   2009       WD        Normal    153000
## 12   2010       WD       Abnorml    109000
## 13   2010       WD        Normal    319900
## 14   2009       WD       Abnorml    239686
## 15   2009      New       Partial    113000
## 16   2010       WD        Normal    110000
## 17   2009       WD       Abnorml    172500
## 18   2010       WD        Normal    140000
## 19   2009       WD        Normal    219500
## 20   2010       WD        Normal    180000

[8 points] Q10.

Local authorities found in 2011 that housing fraud had taken place in several neighborhoods, including NAmes, Gilbert and NridgHt, in 2009 and 2010. Make a density plot (which data scientists often use to catch outliers or anomalous activity) of SalePrice (after 2009) for every neighborhood (with or without fraud) and arrange the plots in a grid. Tip: I recommend using ggplot2 for these plots with facet_wrap(~ Neighborhood). Your call will look something like this: ggplot(data = …, aes(…)) + geom_density() + facet_wrap(~ …) + ggtitle("…") + xlab("…")

#install.packages("ggplot2")
library(ggplot2)

ggplot(data = after2009, aes(x = SalePrice)) + 
  geom_density() +
  facet_wrap(~ Neighborhood) +
  ggtitle("Density Plot of SalePrice by Neighborhood") +
  xlab('SalePrice')
## Warning: Removed 5 rows containing non-finite values (`stat_density()`).

[8 points] Q11.

As you can see, the density plot for NAmes between 2009 and 2010 does not look any different from other density plots. If there are fraudsters, they are making an effort to mask their activities. Now, make 2 density plots, one for SalePrice in NAmes before 2009 and the other for after 2009. Compare the two to see if there is visual evidence of anomalous activity. Then, do the same for Gilbert and see if anything anomalous is detectable between these plots. Tip: I recommend using the gridExtra library’s grid.arrange function for all four plots so you can see the plots for each neighborhood side by side.

#install.packages("gridExtra")
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.3.2
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
NAmes_before2009 <- before2009[before2009$Neighborhood == "NAmes", ]
NAmes_after2009 <- after2009[after2009$Neighborhood == "NAmes", ]

plot_NAmes_before2009 <- ggplot(NAmes_before2009, aes(x = SalePrice)) + 
  geom_density() + ggtitle("NAmes Before 2009") + xlim(0,400000) +xlab("SalePrice")
plot_NAmes_after2009 <- ggplot(NAmes_after2009, aes(x = SalePrice)) +
  geom_density() + ggtitle("NAmes After 2009") + xlim(0,400000) + xlab("SalePrice")

Gilbert_before2009 <- before2009[before2009$Neighborhood == "Gilbert", ]
Gilbert_after2009 <- after2009[after2009$Neighborhood == "Gilbert", ]

plot_Gilbert_before2009 <- ggplot(Gilbert_before2009, aes(x = SalePrice)) + 
  geom_density() + ggtitle("Gilbert Before 2009") + xlim(0,400000) +xlab("SalePrice")
plot_Gilbert_after2009 <- ggplot(Gilbert_after2009, aes(x = SalePrice)) + 
  geom_density() + ggtitle("Gilbert After 2009") + xlim(0,400000) +xlab("SalePrice")

grid.arrange(plot_NAmes_before2009, plot_NAmes_after2009, plot_Gilbert_before2009, 
             plot_Gilbert_after2009, ncol = 2)
## Warning: Removed 4 rows containing non-finite values (`stat_density()`).
## Warning: Removed 1 rows containing non-finite values (`stat_density()`).
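
The visual comparison can be backed by a small numeric check. The sketch below locates the modes (local maxima) of a kernel density estimate, so a second peak that appears only after 2009 shows up as an extra value. The helper name `density_peaks`, the `min_rel_height` threshold, and the toy prices are assumptions made for illustration; none of them come from the assignment data.

```r
# Locate the modes (local maxima) of a kernel density estimate.
# `min_rel_height` is an illustrative cutoff that ignores tiny bumps in the tails.
density_peaks <- function(x, min_rel_height = 0.25) {
  d <- density(x[is.finite(x)])  # drop NA/Inf values first, as stat_density() does
  # A grid point is a local maximum where the slope flips from positive to negative
  is_peak <- which(diff(sign(diff(d$y))) == -2) + 1
  # Keep only peaks that are tall relative to the highest one
  keep <- d$y[is_peak] >= min_rel_height * max(d$y)
  d$x[is_peak[keep]]
}

# Synthetic bimodal prices, invented for illustration only
set.seed(1)
toy_prices <- c(rnorm(200, mean = 145000, sd = 3000),
                rnorm(200, mean = 190000, sd = 8000))
peaks <- density_peaks(toy_prices)
print(round(peaks))
```

On the real data, the same helper could be applied to `Gilbert_before2009$SalePrice` and `Gilbert_after2009$SalePrice` and the two results compared.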

* * * * * * * * * * * * * * * * * * * * * * * * * * *

The questions above were in the Previous Assignment.

Team Assignment worth a total of 60 Points.

* * * * * * * * * * * * * * * * * * * * * * * * * * *

We pick up this story from new Question 12 below and continue the investigation after you have learned regression in more detail. Tip: I bookended this assignment with the regression module so you can reinforce your understanding and apply it. (I also wanted to have empathy for your learning-life blend.) This will also, hopefully, cement your understanding and build your confidence.

[5 points] Q12

Analyze the visualizations above for Gilbert and NAmes to detect possible fraud. Tip: Look for a fraud pattern.

### This section doesn't require code. Just answer the question as a comment.

# The distribution of home prices differs before and after 2009 for both NAmes and Gilbert. Slightly lower prices after 2009 are plausible given the financial crisis, so a downward shift alone is not suspicious.
# For Gilbert, however, the after-2009 sale prices follow a bimodal distribution with a local peak at around 145,000. This second peak is absent from the Gilbert distribution before 2009, so it may indicate fraud affecting a group of homes sold at around 145,000.
# For NAmes, the after-2009 distribution has a slightly lower kurtosis than the before-2009 one. This could be a good cover-up for fraud that pushed some home prices below the mean while keeping the overall shape of the distribution largely intact.
# Overall, we believe the density plots show evidence of possible fraud that warrants further investigation. We suggest subsetting the Gilbert homes priced around the first local peak (140,000 to 150,000) and checking whether other variables look abnormal compared with similarly priced homes before 2009.
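
One way to follow up on the suggested subsetting is to count how often each exact sale price occurs within a neighborhood, since genuine prices rarely repeat many times. The sketch below is a minimal illustration using base R only. The helper `repeated_prices`, the `min_count` threshold, and the toy data frame are hypothetical and not part of the assignment files.

```r
# Count how often each exact SalePrice occurs within a neighborhood and
# return the combinations that repeat at least `min_count` times.
repeated_prices <- function(df, min_count = 3) {
  df <- df[!is.na(df$SalePrice), ]                 # ignore missing prices
  counts <- aggregate(n ~ Neighborhood + SalePrice,
                      data = transform(df, n = 1L), FUN = sum)
  counts <- counts[counts$n >= min_count, ]
  counts[order(-counts$n), ]                       # most frequent first
}

# Toy data, invented for illustration only
toy <- data.frame(
  Neighborhood = c("Gilbert", "Gilbert", "Gilbert", "Gilbert", "NAmes", "NAmes"),
  SalePrice    = c(145000, 145000, 145000, 181000, 139000, NA)
)
suspects <- repeated_prices(toy)
print(suspects)
```

With the real data, `repeated_prices(after2009)` would list any price that appears suspiciously often in a single neighborhood.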

# 

[5 points] Q13.

You may feel that the fraudsters were not very careful in masking their activity after identifying the fraud pattern. However, we don’t have sufficient evidence to claim that this is fraudulent activity (just based on the density plots). We will now use multiple linear regression to attempt to get more evidence. Run a regression on the data in after2009 using variables you already know to be good at predicting the SalePrice. Store the result in a variable called regAfter2009optimal. Then print the summary of regAfter2009optimal to verify that your code works. Tip: You can reuse your previous work on before2009. Rubric: 4 points for regression, 1 point for printing summary.

regAfter2009optimal <- lm(SalePrice ~ RoofMatl + LandSlope + BsmtUnfSF + 
                            OverallQual+ Condition2 + MSZoning + Neighborhood + 
                            LotArea + OverallCond +Foundation + BedroomAbvGr + 
                            EnclosedPorch + BsmtFinSF1 +BsmtFinSF2 + MasVnrType, 
                          data = after2009)

summary(regAfter2009optimal)
## 
## Call:
## lm(formula = SalePrice ~ RoofMatl + LandSlope + BsmtUnfSF + OverallQual + 
##     Condition2 + MSZoning + Neighborhood + LotArea + OverallCond + 
##     Foundation + BedroomAbvGr + EnclosedPorch + BsmtFinSF1 + 
##     BsmtFinSF2 + MasVnrType, data = after2009)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -243969  -13856     127   14684  238698 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.886e+04  5.753e+04   0.849 0.395872    
## RoofMatlTar&Grv      1.856e+04  1.391e+04   1.334 0.182580    
## RoofMatlWdShake      7.726e+04  2.476e+04   3.121 0.001860 ** 
## RoofMatlWdShngl     -1.076e+04  3.251e+04  -0.331 0.740735    
## LandSlopeMod        -2.704e+02  5.305e+03  -0.051 0.959363    
## LandSlopeSev        -7.526e+04  2.174e+04  -3.462 0.000561 ***
## BsmtUnfSF            2.568e+01  3.843e+00   6.681 4.13e-11 ***
## OverallQual2        -1.350e+05  4.692e+04  -2.877 0.004105 ** 
## OverallQual3        -1.199e+05  4.632e+04  -2.588 0.009806 ** 
## OverallQual4        -1.141e+05  4.534e+04  -2.517 0.012001 *  
## OverallQual5        -1.122e+05  4.519e+04  -2.482 0.013230 *  
## OverallQual6        -9.499e+04  4.541e+04  -2.092 0.036739 *  
## OverallQual7        -7.374e+04  4.555e+04  -1.619 0.105802    
## OverallQual8        -5.187e+04  4.579e+04  -1.133 0.257623    
## OverallQual9         2.456e+04  4.616e+04   0.532 0.594773    
## OverallQual10        7.218e+04  4.760e+04   1.517 0.129732    
## Condition2Feedr      3.476e+04  3.739e+04   0.930 0.352778    
## Condition2Norm       2.134e+04  3.378e+04   0.632 0.527682    
## Condition2PosA       6.305e+04  4.788e+04   1.317 0.188188    
## Condition2PosN      -1.972e+05  4.806e+04  -4.104 4.43e-05 ***
## MSZoningFV           4.551e+04  1.895e+04   2.401 0.016556 *  
## MSZoningRH           2.095e+04  1.773e+04   1.182 0.237706    
## MSZoningRL           3.404e+04  1.429e+04   2.382 0.017427 *  
## MSZoningRM           3.030e+04  1.327e+04   2.283 0.022669 *  
## NeighborhoodBlueste  5.123e+03  1.906e+04   0.269 0.788103    
## NeighborhoodBrDale   2.061e+03  1.719e+04   0.120 0.904626    
## NeighborhoodBrkSide  1.432e+03  1.439e+04   0.099 0.920763    
## NeighborhoodClearCr  2.748e+04  1.533e+04   1.792 0.073452 .  
## NeighborhoodCollgCr -1.987e+02  1.211e+04  -0.016 0.986908    
## NeighborhoodCrawfor  3.172e+04  1.376e+04   2.306 0.021322 *  
## NeighborhoodEdwards -1.744e+04  1.297e+04  -1.345 0.179001    
## NeighborhoodGilbert  2.554e+03  1.259e+04   0.203 0.839267    
## NeighborhoodIDOTRR  -1.767e+04  1.633e+04  -1.082 0.279391    
## NeighborhoodMeadowV -1.513e+04  1.705e+04  -0.887 0.375234    
## NeighborhoodMitchel -2.466e+03  1.301e+04  -0.190 0.849686    
## NeighborhoodNAmes   -1.207e+04  1.253e+04  -0.964 0.335502    
## NeighborhoodNoRidge  5.246e+04  1.360e+04   3.857 0.000123 ***
## NeighborhoodNPkVill -1.985e+03  1.503e+04  -0.132 0.894976    
## NeighborhoodNridgHt  8.247e+03  1.270e+04   0.649 0.516341    
## NeighborhoodNWAmes  -2.732e+03  1.294e+04  -0.211 0.832786    
## NeighborhoodOldTown -1.649e+04  1.413e+04  -1.167 0.243435    
## NeighborhoodSawyer  -1.734e+04  1.326e+04  -1.308 0.191330    
## NeighborhoodSawyerW  4.546e+03  1.253e+04   0.363 0.716771    
## NeighborhoodSomerst  2.615e+03  1.639e+04   0.160 0.873298    
## NeighborhoodStoneBr  4.473e+04  1.481e+04   3.021 0.002592 ** 
## NeighborhoodSWISU   -8.673e+03  1.444e+04  -0.600 0.548330    
## NeighborhoodTimber   7.695e+03  1.377e+04   0.559 0.576318    
## NeighborhoodVeenker  4.835e+04  2.014e+04   2.401 0.016573 *  
## LotArea              1.290e+00  1.765e-01   7.307 5.96e-13 ***
## OverallCond2         5.772e+04  2.699e+04   2.139 0.032692 *  
## OverallCond3         4.801e+04  2.466e+04   1.947 0.051844 .  
## OverallCond4         5.431e+04  2.440e+04   2.226 0.026285 *  
## OverallCond5         6.200e+04  2.398e+04   2.585 0.009880 ** 
## OverallCond6         6.349e+04  2.399e+04   2.647 0.008266 ** 
## OverallCond7         7.203e+04  2.403e+04   2.998 0.002794 ** 
## OverallCond8         7.083e+04  2.438e+04   2.905 0.003757 ** 
## OverallCond9         7.590e+04  2.535e+04   2.994 0.002826 ** 
## FoundationCBlock     5.948e+02  4.528e+03   0.131 0.895529    
## FoundationPConc      1.804e+04  5.022e+03   3.592 0.000346 ***
## FoundationSlab       1.666e+04  9.084e+03   1.834 0.066999 .  
## FoundationStone      4.058e+04  1.517e+04   2.675 0.007612 ** 
## FoundationWood      -1.902e+04  1.917e+04  -0.992 0.321310    
## BedroomAbvGr         1.269e+04  1.434e+03   8.845  < 2e-16 ***
## EnclosedPorch        1.616e+01  1.853e+01   0.872 0.383414    
## BsmtFinSF1           5.861e+01  3.953e+00  14.828  < 2e-16 ***
## BsmtFinSF2           3.723e+01  6.907e+00   5.390 8.97e-08 ***
## MasVnrTypeBrkFace   -4.891e+03  1.469e+04  -0.333 0.739188    
## MasVnrTypeNone      -5.224e+03  1.458e+04  -0.358 0.720281    
## MasVnrTypeStone      1.395e+04  1.507e+04   0.926 0.354951    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 31500 on 910 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.8405, Adjusted R-squared:  0.8286 
## F-statistic: 70.51 on 68 and 910 DF,  p-value: < 2.2e-16

[2 points] Q14.

Now, display diagnostic plots of your regression (regAfter2009optimal). Tip: You already know how to use autoplot.

regAfter2009optimal %>%
  autoplot()
## Warning: Removed 979 rows containing missing values (`geom_line()`).
## Warning: Removed 3 rows containing missing values (`geom_point()`).
## Warning: Removed 10 rows containing missing values (`geom_line()`).

[6 points] Q15.

Now, let’s focus on the Residual vs. Fitted graph by plotting it by itself using ggplot. Tip: Call ggplot with the data parameter in regAfter2009optimal. The aes parameters are (.fitted, .resid), respectively. You can use stat_smooth() for the trendline and appropriately title the plot and label both axes. Tip: Check out cheatsheets such as https://rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf.

library(broom)
library(ggplot2)

regAfter2009optimal_aug <- augment(regAfter2009optimal)

ggplot(regAfter2009optimal_aug, aes(x = .fitted, y = .resid)) +
  geom_point() +
  stat_smooth(method = "loess", colour = "blue") +
  ggtitle("Residuals vs Fitted") +
  xlab("Fitted values") +
  ylab("Residuals")
## `geom_smooth()` using formula = 'y ~ x'

[5 points] Q16.

Identify any outliers in the visualization from the last two chunks.

### This section doesn't require code. Just answer the question as a comment.

# In the "Residuals vs Fitted" diagnostic plot, the points labeled 280, 348, and 533 lie at a significant distance from the fitted line.
# Normal Q-Q plot: points that deviate from the sloped reference line, such as 280, may be outliers.
# Scale-Location plot: points far from the horizontal center line, such as 280, are likely outliers.
# Residuals vs Leverage plot: points outside the dashed horizontal lines, such as 529, are outliers.
# These points can be identified as outliers because their residuals deviate significantly from the rest.
# The ggplot version of Residuals vs Fitted shows a similar pattern: the points with residuals above 2e+05 lie far from the trend line. We also see heteroscedasticity, with the spread of the residuals increasing for larger fitted values.
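
Reading labels off the diagnostic plots can be complemented by flagging the same points numerically. The sketch below uses standardized residuals and Cook's distance from base R. The helper `flag_outliers` and its cutoffs (|standardized residual| > 3, Cook's distance > 4/n) are common rules of thumb assumed here for illustration, and the data are simulated.

```r
# Flag observations with large standardized residuals or high Cook's distance.
# The cutoffs are rule-of-thumb assumptions, not values prescribed by the assignment.
flag_outliers <- function(model, resid_cut = 3) {
  r <- rstandard(model)        # standardized residuals
  d <- cooks.distance(model)   # influence of each observation on the fit
  n <- length(r)
  flagged <- which(abs(r) > resid_cut | d > 4 / n)  # which() skips NA/NaN values
  data.frame(row = names(r)[flagged],
             std_resid = unname(r[flagged]),
             cooks_d = unname(d[flagged]))
}

# Simulated data with one planted outlier in row 50
set.seed(42)
toy <- data.frame(x = 1:50)
toy$y <- 2 * toy$x + rnorm(50)
toy$y[50] <- toy$y[50] + 40
fit <- lm(y ~ x, data = toy)
flags <- flag_outliers(fit)
print(flags)
```

Calling `flag_outliers(regAfter2009optimal)` would return the row names of the flagged observations, which can then be compared with the labels shown in the plots.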

## 

[20 points] Q17.

Now, let’s think like a fraudster and do something smarter fraudsters may do. Instead of misrepresenting values by just reporting the mean value of the houses sold in NAmes before 2009, what is a more clever and nuanced way the fraudsters could report these values? Specifically, consider a method smarter fraudsters may use to set the rows in which the prices are misrepresented. Then, using this method, generate and set values for the SalePrice in those rows. Then, try your fraud inspection techniques of comparing old and new density plots as well as using the diagnostic plots to show that the fraud is now much harder to catch. Tip: You must use exact commands/functions to set the values and tell us why you chose to generate values this way. You must share the resulting diagnostic plots with us. Tip: Consider using more information (instead of the mean values) to generate the fraudulent values using what you learned from your work above. You can do this in two steps: Step 1: Find the rows set by the stupid fraudsters (by searching for the SalePrice of 142769.7). Step 2: Use a smarter way to generate and replace these values. Tip: For plotting, you may use ggplot to plot NAmes and NAmes. My ggplot call looked like this: before2009 %>% filter(Neighborhood == "???") %>% ggplot(aes(x = SalePrice)) + geom_density(fill = "???", alpha = 0.5) + ggtitle("???") + xlab("???") Tip: Always refine your model as fraudsters adapt their methods after they find out that you can catch them. Rubric: 10 points each for the fraud method and the plots.

### This section requires you to first explain your idea. Just answer this as a comment.

# Instead of simply reporting the average sale price of houses sold in NAmes before 2009, a more sophisticated approach is to use a regression model to predict the SalePrice of each row we wish to manipulate, and then add random noise so that the reported SalePrice does not match the model's prediction exactly.

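# Step 1: find the rows set by the naive fraudsters. A small tolerance is used
# in case the stored value has more decimals than the printed 142769.7;
# which() also skips rows where SalePrice is NA.
fraud_idx <- which(abs(after2009$SalePrice - 142769.7) < 0.05)
fraud_price <- after2009[fraud_idx, ]

# Step 2: replace the flat value with the regression prediction plus noise
fraud_price_new <- predict(regAfter2009optimal, newdata = fraud_price)

set.seed(123)
random_noise <- rnorm(length(fraud_price_new))
fraud_price$SalePrice <- fraud_price_new + random_noise

# Write the new values back into after2009
after2009$SalePrice[fraud_idx] <- fraud_price$SalePrice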
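
The rnorm() call above uses the default standard deviation of 1, so each replaced price sits within a few dollars of its prediction and its residual is close to zero. A possible refinement, sketched below under that assumption, is to draw the noise on the scale of the model's own residuals using sigma(). The helper `simulate_prices` and the toy data are hypothetical and not part of the original solution.

```r
# Simulate replacement prices as model prediction plus noise whose standard
# deviation matches the model's residual standard error.
simulate_prices <- function(model, newdata, seed = 123) {
  set.seed(seed)
  pred <- predict(model, newdata = newdata)
  pred + rnorm(length(pred), mean = 0, sd = sigma(model))
}

# Toy demonstration on simulated data, invented for illustration only
set.seed(1)
toy <- data.frame(area = runif(100, 800, 2500))
toy$price <- 50000 + 80 * toy$area + rnorm(100, sd = 15000)
fit <- lm(price ~ area, data = toy)
fake <- simulate_prices(fit, toy[1:10, , drop = FALSE])
print(round(fake))
```

On the assignment data, the call might look like `simulate_prices(regAfter2009optimal, fraud_price)`. The simulated values then scatter around the fitted line much like genuine sales do.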

p_before <- ggplot(before2009 %>% filter(Neighborhood == "Gilbert"), aes(x = SalePrice)) + geom_density(fill = "red", alpha = 0.5) + ggtitle("Density Plot for Gilbert Before 2009") + xlab("SalePrice")
p_after <- ggplot(after2009 %>% filter(Neighborhood == "Gilbert"), aes(x = SalePrice)) + geom_density(fill = "blue", alpha = 0.5) + ggtitle("Density Plot for Gilbert After 2009 with fraud") + xlab("SalePrice")
p_before1 <- ggplot(before2009 %>% filter(Neighborhood == "NAmes"), aes(x = SalePrice)) + geom_density(fill = "yellow", alpha = 0.5) + ggtitle("Density Plot for NAmes Before 2009") + xlab("SalePrice")
p_after1 <- ggplot(after2009 %>% filter(Neighborhood == "NAmes"), aes(x = SalePrice)) + geom_density(fill = "green", alpha = 0.5) + ggtitle("Density Plot for NAmes After 2009 with fraud") + xlab("SalePrice")

grid.arrange(p_before, p_after, p_before1, p_after1, ncol = 2)
## Warning: Removed 4 rows containing non-finite values (`stat_density()`).
## Warning: Removed 1 rows containing non-finite values (`stat_density()`).
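
As a possible numeric complement to eyeballing the density plots, the sketch below checks how much of a sample is taken up by its single most common value. Mean substitution tends to create one heavily repeated value, while model-plus-noise substitution does not. This runs on simulated prices only; the vectors genuine, naive_fraud, and smart_fraud and the helper top_share() are hypothetical names introduced for this illustration.

```r
# Simulated prices (hypothetical, not the assignment data).
set.seed(1)
genuine <- round(rnorm(200, mean = 145000, sd = 30000))

# Naive fraud: 40 rows all report the mean.
naive_fraud <- genuine
naive_fraud[1:40] <- mean(genuine)

# Smarter fraud: 40 rows report the mean plus noise at the genuine spread.
smart_fraud <- genuine
smart_fraud[1:40] <- mean(genuine) + rnorm(40, sd = sd(genuine))

# Share of observations taken by the single most common value.
top_share <- function(x) max(table(x)) / length(x)

shares <- c(genuine = top_share(genuine),
            naive   = top_share(naive_fraud),
            smart   = top_share(smart_fraud))
print(shares)
```

A large share for one value corresponds to the sharp spike seen in a density plot of mean-substituted prices; a small share means this particular check no longer separates the manipulated sample from the genuine one.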

[5 points] Q18.

Now, run a regression on the new data in after2009 using variables you know are good at predicting SalePrice. Store the result in a variable called regAfter2009optimalFraud. Then print the summary of regAfter2009optimalFraud to verify that your code works. Tip: You can reuse your previous work on before2009. Rubric: 4 points for regression, 1 point for printing summary.

regAfter2009optimalFraud <- lm(SalePrice ~ RoofMatl + LandSlope + BsmtUnfSF +
                                 OverallQual + Condition2 + MSZoning + Neighborhood +
                                 LotArea + OverallCond + Foundation + BedroomAbvGr +
                                 EnclosedPorch + BsmtFinSF1 + BsmtFinSF2 +
                                 MasVnrType, data = after2009)

summary(regAfter2009optimalFraud)
## 
## Call:
## lm(formula = SalePrice ~ RoofMatl + LandSlope + BsmtUnfSF + OverallQual + 
##     Condition2 + MSZoning + Neighborhood + LotArea + OverallCond + 
##     Foundation + BedroomAbvGr + EnclosedPorch + BsmtFinSF1 + 
##     BsmtFinSF2 + MasVnrType, data = after2009)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -243969  -13856     127   14684  238698 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          4.886e+04  5.753e+04   0.849 0.395872    
## RoofMatlTar&Grv      1.856e+04  1.391e+04   1.334 0.182580    
## RoofMatlWdShake      7.726e+04  2.476e+04   3.121 0.001860 ** 
## RoofMatlWdShngl     -1.076e+04  3.251e+04  -0.331 0.740735    
## LandSlopeMod        -2.704e+02  5.305e+03  -0.051 0.959363    
## LandSlopeSev        -7.526e+04  2.174e+04  -3.462 0.000561 ***
## BsmtUnfSF            2.568e+01  3.843e+00   6.681 4.13e-11 ***
## OverallQual2        -1.350e+05  4.692e+04  -2.877 0.004105 ** 
## OverallQual3        -1.199e+05  4.632e+04  -2.588 0.009806 ** 
## OverallQual4        -1.141e+05  4.534e+04  -2.517 0.012001 *  
## OverallQual5        -1.122e+05  4.519e+04  -2.482 0.013230 *  
## OverallQual6        -9.499e+04  4.541e+04  -2.092 0.036739 *  
## OverallQual7        -7.374e+04  4.555e+04  -1.619 0.105802    
## OverallQual8        -5.187e+04  4.579e+04  -1.133 0.257623    
## OverallQual9         2.456e+04  4.616e+04   0.532 0.594773    
## OverallQual10        7.218e+04  4.760e+04   1.517 0.129732    
## Condition2Feedr      3.476e+04  3.739e+04   0.930 0.352778    
## Condition2Norm       2.134e+04  3.378e+04   0.632 0.527682    
## Condition2PosA       6.305e+04  4.788e+04   1.317 0.188188    
## Condition2PosN      -1.972e+05  4.806e+04  -4.104 4.43e-05 ***
## MSZoningFV           4.551e+04  1.895e+04   2.401 0.016556 *  
## MSZoningRH           2.095e+04  1.773e+04   1.182 0.237706    
## MSZoningRL           3.404e+04  1.429e+04   2.382 0.017427 *  
## MSZoningRM           3.030e+04  1.327e+04   2.283 0.022669 *  
## NeighborhoodBlueste  5.123e+03  1.906e+04   0.269 0.788103    
## NeighborhoodBrDale   2.061e+03  1.719e+04   0.120 0.904626    
## NeighborhoodBrkSide  1.432e+03  1.439e+04   0.099 0.920763    
## NeighborhoodClearCr  2.748e+04  1.533e+04   1.792 0.073452 .  
## NeighborhoodCollgCr -1.987e+02  1.211e+04  -0.016 0.986908    
## NeighborhoodCrawfor  3.172e+04  1.376e+04   2.306 0.021322 *  
## NeighborhoodEdwards -1.744e+04  1.297e+04  -1.345 0.179001    
## NeighborhoodGilbert  2.554e+03  1.259e+04   0.203 0.839267    
## NeighborhoodIDOTRR  -1.767e+04  1.633e+04  -1.082 0.279391    
## NeighborhoodMeadowV -1.513e+04  1.705e+04  -0.887 0.375234    
## NeighborhoodMitchel -2.466e+03  1.301e+04  -0.190 0.849686    
## NeighborhoodNAmes   -1.207e+04  1.253e+04  -0.964 0.335502    
## NeighborhoodNoRidge  5.246e+04  1.360e+04   3.857 0.000123 ***
## NeighborhoodNPkVill -1.985e+03  1.503e+04  -0.132 0.894976    
## NeighborhoodNridgHt  8.247e+03  1.270e+04   0.649 0.516341    
## NeighborhoodNWAmes  -2.732e+03  1.294e+04  -0.211 0.832786    
## NeighborhoodOldTown -1.649e+04  1.413e+04  -1.167 0.243435    
## NeighborhoodSawyer  -1.734e+04  1.326e+04  -1.308 0.191330    
## NeighborhoodSawyerW  4.546e+03  1.253e+04   0.363 0.716771    
## NeighborhoodSomerst  2.615e+03  1.639e+04   0.160 0.873298    
## NeighborhoodStoneBr  4.473e+04  1.481e+04   3.021 0.002592 ** 
## NeighborhoodSWISU   -8.673e+03  1.444e+04  -0.600 0.548330    
## NeighborhoodTimber   7.695e+03  1.377e+04   0.559 0.576318    
## NeighborhoodVeenker  4.835e+04  2.014e+04   2.401 0.016573 *  
## LotArea              1.290e+00  1.765e-01   7.307 5.96e-13 ***
## OverallCond2         5.772e+04  2.699e+04   2.139 0.032692 *  
## OverallCond3         4.801e+04  2.466e+04   1.947 0.051844 .  
## OverallCond4         5.431e+04  2.440e+04   2.226 0.026285 *  
## OverallCond5         6.200e+04  2.398e+04   2.585 0.009880 ** 
## OverallCond6         6.349e+04  2.399e+04   2.647 0.008266 ** 
## OverallCond7         7.203e+04  2.403e+04   2.998 0.002794 ** 
## OverallCond8         7.083e+04  2.438e+04   2.905 0.003757 ** 
## OverallCond9         7.590e+04  2.535e+04   2.994 0.002826 ** 
## FoundationCBlock     5.948e+02  4.528e+03   0.131 0.895529    
## FoundationPConc      1.804e+04  5.022e+03   3.592 0.000346 ***
## FoundationSlab       1.666e+04  9.084e+03   1.834 0.066999 .  
## FoundationStone      4.058e+04  1.517e+04   2.675 0.007612 ** 
## FoundationWood      -1.902e+04  1.917e+04  -0.992 0.321310    
## BedroomAbvGr         1.269e+04  1.434e+03   8.845  < 2e-16 ***
## EnclosedPorch        1.616e+01  1.853e+01   0.872 0.383414    
## BsmtFinSF1           5.861e+01  3.953e+00  14.828  < 2e-16 ***
## BsmtFinSF2           3.723e+01  6.907e+00   5.390 8.97e-08 ***
## MasVnrTypeBrkFace   -4.891e+03  1.469e+04  -0.333 0.739188    
## MasVnrTypeNone      -5.224e+03  1.458e+04  -0.358 0.720281    
## MasVnrTypeStone      1.395e+04  1.507e+04   0.926 0.354951    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 31500 on 910 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.8405, Adjusted R-squared:  0.8286 
## F-statistic: 70.51 on 68 and 910 DF,  p-value: < 2.2e-16

[2 points] Q19.

Now, display diagnostic plots of your regression (regAfter2009optimalFraud). Tip: You already know how to use autoplot.

regAfter2009optimalFraud %>%
  autoplot()
## Warning: Removed 979 rows containing missing values (`geom_line()`).
## Warning: Removed 3 rows containing missing values (`geom_point()`).
## Warning: Removed 10 rows containing missing values (`geom_line()`).

[5 points] Q20.

Now, look for outliers in the diagnostic plots of your regression (regAfter2009optimal). Tip: You already know how to use autoplot.

### This section doesn't require code. Just answer the question as a comment.
# Residuals vs. Fitted plot: outliers are points far from the horizontal center line, such as observation 280.
# Q-Q plot (quantile-quantile plot): points that deviate from the sloped reference line, such as observation 533, may be outliers.
# Scale-Location plot: points far from the horizontal center line, such as observation 280, would be outliers.
# Residuals vs. Leverage plot: points outside the dashed horizontal lines, such as observation 524, are outliers.
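
The visual reading above can be cross-checked numerically. The following is a minimal sketch using base R diagnostics (rstandard(), hatvalues(), cooks.distance()) on the built-in mtcars data, since the assignment data is not reproduced here. The model formula and the cutoffs (|standardized residual| > 2, leverage > 2p/n, Cook's distance > 4/n) are common rules of thumb chosen for illustration, not thresholds prescribed by the assignment.

```r
# Fit a small example model on built-in data (stand-in for the housing regression).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Per-observation diagnostics that underlie the standard diagnostic plots.
diag_tbl <- data.frame(
  std_resid = rstandard(fit),       # standardized residuals
  leverage  = hatvalues(fit),       # diagonal of the hat matrix
  cooks_d   = cooks.distance(fit)   # influence on the fitted coefficients
)

n <- nrow(mtcars)
p <- length(coef(fit))

# Flag observations that exceed any of the rule-of-thumb cutoffs.
flagged <- diag_tbl[abs(diag_tbl$std_resid) > 2 |
                      diag_tbl$leverage > 2 * p / n |
                      diag_tbl$cooks_d > 4 / n, ]
print(flagged)
```

Applied to the housing regression, the same three quantities would give a table of row labels to compare against the observation numbers picked out from the plots.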

[5 points] Q21.

Knit to html after eliminating all the errors. Submit both the Rmd and html files. Tip: Do not worry about minor formatting issues.

### This section doesn't require code. Just knit and submit the Rmd and html files.###