Complete all Exercises, and submit answers to Questions on the Coursera platform.
This initial quiz will concern exploratory data analysis (EDA) of the Ames Housing dataset. EDA is essential when working with any source of data and helps inform modeling.
First, let us load the data:
# Install the tool to download packages from GitHub
library(devtools)
## Warning: package 'devtools' was built under R version 3.4.1
install_github("StatsWithR/statsr")
## Skipping install of 'statsr' from a github remote, the SHA1 (6f64cf27) has not changed since last install.
## Use `force = TRUE` to force installation
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(statsr)
load("ames_train.Rdata")
# missing data handling
library(Rcpp)
require(Amelia)
## Loading required package: Amelia
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.7.4, built: 2015-12-05)
## ## Copyright (C) 2005-2017 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
missmap(ames_train, main="Missing Map")
## Warning in if (class(obj) == "amelia") {: the condition has length > 1 and
## only the first element will be used
## Warning: Unknown or uninitialised column: 'arguments'.
## Warning: Unknown or uninitialised column: 'arguments'.
## Warning: Unknown or uninitialised column: 'imputations'.
# Proportion of missing values in each column
sapply(ames_train, function(df) {
  sum(is.na(df)) / length(df)
})
## PID area price MS.SubClass
## 0.000 0.000 0.000 0.000
## MS.Zoning Lot.Frontage Lot.Area Street
## 0.000 0.167 0.000 0.000
## Alley Lot.Shape Land.Contour Utilities
## 0.933 0.000 0.000 0.000
## Lot.Config Land.Slope Neighborhood Condition.1
## 0.000 0.000 0.000 0.000
## Condition.2 Bldg.Type House.Style Overall.Qual
## 0.000 0.000 0.000 0.000
## Overall.Cond Year.Built Year.Remod.Add Roof.Style
## 0.000 0.000 0.000 0.000
## Roof.Matl Exterior.1st Exterior.2nd Mas.Vnr.Type
## 0.000 0.000 0.000 0.000
## Mas.Vnr.Area Exter.Qual Exter.Cond Foundation
## 0.007 0.000 0.000 0.000
## Bsmt.Qual Bsmt.Cond Bsmt.Exposure BsmtFin.Type.1
## 0.021 0.021 0.021 0.021
## BsmtFin.SF.1 BsmtFin.Type.2 BsmtFin.SF.2 Bsmt.Unf.SF
## 0.001 0.021 0.001 0.001
## Total.Bsmt.SF Heating Heating.QC Central.Air
## 0.001 0.000 0.000 0.000
## Electrical X1st.Flr.SF X2nd.Flr.SF Low.Qual.Fin.SF
## 0.000 0.000 0.000 0.000
## Bsmt.Full.Bath Bsmt.Half.Bath Full.Bath Half.Bath
## 0.001 0.001 0.000 0.000
## Bedroom.AbvGr Kitchen.AbvGr Kitchen.Qual TotRms.AbvGrd
## 0.000 0.000 0.000 0.000
## Functional Fireplaces Fireplace.Qu Garage.Type
## 0.000 0.000 0.491 0.046
## Garage.Yr.Blt Garage.Finish Garage.Cars Garage.Area
## 0.048 0.046 0.001 0.001
## Garage.Qual Garage.Cond Paved.Drive Wood.Deck.SF
## 0.047 0.047 0.000 0.000
## Open.Porch.SF Enclosed.Porch X3Ssn.Porch Screen.Porch
## 0.000 0.000 0.000 0.000
## Pool.Area Pool.QC Fence Misc.Feature
## 0.000 0.997 0.798 0.971
## Misc.Val Mo.Sold Yr.Sold Sale.Type
## 0.000 0.000 0.000 0.000
## Sale.Condition
## 0.000
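The per-column proportions above can be ranked directly rather than read off by eye. The following is a minimal sketch on a small made-up data frame, `toy`, which stands in for `ames_train`; its values are hypothetical.

```r
# Toy data frame standing in for ames_train (hypothetical values).
toy <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, NA, NA, 4),
  c = c(1, 2, 3, 4),
  d = c(NA, NA, 3, 4)
)

# is.na() returns a logical matrix; colMeans() turns it into the
# proportion of missing values per column.
na_prop <- colMeans(is.na(toy))

# Sort in decreasing order and keep the three columns with the most NAs.
top3 <- head(sort(na_prop, decreasing = TRUE), 3)
print(top3)
```

Applied to the real data, `colMeans(is.na(ames_train))` should give the same proportions as the `sapply()` call, up to rounding in the printed output.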
Misc.Feature, Fence, Pool.QC
Misc.Feature, Alley, Pool.QC
Pool.QC, Alley, Fence
Fireplace.Qu, Pool.QC, Lot.Frontage
# type your code for Question 1 here, and Knit
int? Change them to factors when conducting your analysis.
# type your code for Question 2 here, and Knit
str(ames_train)
## Classes 'tbl_df', 'tbl' and 'data.frame': 1000 obs. of 81 variables:
## $ PID : int 909176150 905476230 911128020 535377150 534177230 908128060 902135020 528228540 923426010 908186050 ...
## $ area : int 856 1049 1001 1039 1665 1922 936 1246 889 1072 ...
## $ price : int 126000 139500 124900 114000 227000 198500 93000 187687 137500 140000 ...
## $ MS.SubClass : int 30 120 30 70 60 85 20 20 20 180 ...
## $ MS.Zoning : Factor w/ 7 levels "A (agr)","C (all)",..: 6 6 2 6 6 6 7 6 6 7 ...
## $ Lot.Frontage : int NA 42 60 80 70 64 60 53 74 35 ...
## $ Lot.Area : int 7890 4235 6060 8146 8400 7301 6000 3710 12395 3675 ...
## $ Street : Factor w/ 2 levels "Grvl","Pave": 2 2 2 2 2 2 2 2 2 2 ...
## $ Alley : Factor w/ 2 levels "Grvl","Pave": NA NA NA NA NA NA 2 NA NA NA ...
## $ Lot.Shape : Factor w/ 4 levels "IR1","IR2","IR3",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Land.Contour : Factor w/ 4 levels "Bnk","HLS","Low",..: 4 4 4 4 4 4 1 4 4 4 ...
## $ Utilities : Factor w/ 3 levels "AllPub","NoSeWa",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ Lot.Config : Factor w/ 5 levels "Corner","CulDSac",..: 1 5 5 1 5 1 5 5 1 5 ...
## $ Land.Slope : Factor w/ 3 levels "Gtl","Mod","Sev": 1 1 1 1 1 1 2 1 1 1 ...
## $ Neighborhood : Factor w/ 28 levels "Blmngtn","Blueste",..: 26 8 12 21 20 8 21 1 15 8 ...
## $ Condition.1 : Factor w/ 9 levels "Artery","Feedr",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Condition.2 : Factor w/ 8 levels "Artery","Feedr",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Bldg.Type : Factor w/ 5 levels "1Fam","2fmCon",..: 1 5 1 1 1 1 2 1 1 5 ...
## $ House.Style : Factor w/ 8 levels "1.5Fin","1.5Unf",..: 3 3 3 6 6 7 3 3 3 7 ...
## $ Overall.Qual : int 6 5 5 4 8 7 4 7 5 6 ...
## $ Overall.Cond : int 6 5 9 8 6 5 4 5 6 5 ...
## $ Year.Built : int 1939 1984 1930 1900 2001 2003 1953 2007 1984 2005 ...
## $ Year.Remod.Add : int 1950 1984 2007 2003 2001 2003 1953 2008 1984 2005 ...
## $ Roof.Style : Factor w/ 6 levels "Flat","Gable",..: 2 2 4 2 2 2 2 2 2 2 ...
## $ Roof.Matl : Factor w/ 8 levels "ClyTile","CompShg",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Exterior.1st : Factor w/ 16 levels "AsbShng","AsphShn",..: 15 7 9 9 14 7 9 16 7 14 ...
## $ Exterior.2nd : Factor w/ 17 levels "AsbShng","AsphShn",..: 16 7 9 9 15 7 9 17 11 15 ...
## $ Mas.Vnr.Type : Factor w/ 6 levels "","BrkCmn","BrkFace",..: 5 3 5 5 5 3 5 3 5 6 ...
## $ Mas.Vnr.Area : int 0 149 0 0 0 500 0 20 0 76 ...
## $ Exter.Qual : Factor w/ 4 levels "Ex","Fa","Gd",..: 4 3 3 3 3 3 2 3 4 4 ...
## $ Exter.Cond : Factor w/ 5 levels "Ex","Fa","Gd",..: 5 5 5 3 5 5 5 5 5 5 ...
## $ Foundation : Factor w/ 6 levels "BrkTil","CBlock",..: 2 2 1 1 3 4 2 3 2 3 ...
## $ Bsmt.Qual : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 4 6 3 4 NA 3 4 6 4 ...
## $ Bsmt.Cond : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 6 6 6 NA 6 6 6 6 ...
## $ Bsmt.Exposure : Factor w/ 5 levels "","Av","Gd","Mn",..: 5 4 5 5 5 NA 5 3 5 3 ...
## $ BsmtFin.Type.1 : Factor w/ 7 levels "","ALQ","BLQ",..: 6 4 2 7 4 NA 7 7 2 4 ...
## $ BsmtFin.SF.1 : int 238 552 737 0 643 0 0 0 647 467 ...
## $ BsmtFin.Type.2 : Factor w/ 7 levels "","ALQ","BLQ",..: 7 2 7 7 7 NA 7 7 7 7 ...
## $ BsmtFin.SF.2 : int 0 393 0 0 0 0 0 0 0 0 ...
## $ Bsmt.Unf.SF : int 618 104 100 405 167 0 936 1146 217 80 ...
## $ Total.Bsmt.SF : int 856 1049 837 405 810 0 936 1146 864 547 ...
## $ Heating : Factor w/ 6 levels "Floor","GasA",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ Heating.QC : Factor w/ 5 levels "Ex","Fa","Gd",..: 5 5 1 3 1 1 5 1 5 1 ...
## $ Central.Air : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 1 2 2 2 ...
## $ Electrical : Factor w/ 6 levels "","FuseA","FuseF",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ X1st.Flr.SF : int 856 1049 1001 717 810 495 936 1246 889 1072 ...
## $ X2nd.Flr.SF : int 0 0 0 322 855 1427 0 0 0 0 ...
## $ Low.Qual.Fin.SF: int 0 0 0 0 0 0 0 0 0 0 ...
## $ Bsmt.Full.Bath : int 1 1 0 0 1 0 0 0 0 1 ...
## $ Bsmt.Half.Bath : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Full.Bath : int 1 2 1 1 2 3 1 2 1 1 ...
## $ Half.Bath : int 0 0 0 0 1 0 0 0 0 0 ...
## $ Bedroom.AbvGr : int 2 2 2 2 3 4 2 2 3 2 ...
## $ Kitchen.AbvGr : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Kitchen.Qual : Factor w/ 5 levels "Ex","Fa","Gd",..: 5 3 3 5 3 3 5 3 5 3 ...
## $ TotRms.AbvGrd : int 4 5 5 6 6 7 4 5 6 5 ...
## $ Functional : Factor w/ 8 levels "Maj1","Maj2",..: 8 8 8 8 8 8 4 8 8 8 ...
## $ Fireplaces : int 1 0 0 0 0 1 0 1 0 0 ...
## $ Fireplace.Qu : Factor w/ 5 levels "Ex","Fa","Gd",..: 3 NA NA NA NA 1 NA 3 NA NA ...
## $ Garage.Type : Factor w/ 6 levels "2Types","Attchd",..: 6 2 6 6 2 4 6 2 2 3 ...
## $ Garage.Yr.Blt : int 1939 1984 1930 1940 2001 2003 1974 2007 1984 2005 ...
## $ Garage.Finish : Factor w/ 4 levels "","Fin","RFn",..: 4 2 4 4 2 3 4 2 4 2 ...
## $ Garage.Cars : int 2 1 1 1 2 2 2 2 2 2 ...
## $ Garage.Area : int 399 266 216 281 528 672 576 428 484 525 ...
## $ Garage.Qual : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ Garage.Cond : Factor w/ 6 levels "","Ex","Fa","Gd",..: 6 6 5 6 6 6 6 6 6 6 ...
## $ Paved.Drive : Factor w/ 3 levels "N","P","Y": 3 3 1 1 3 3 3 3 3 3 ...
## $ Wood.Deck.SF : int 0 0 154 0 0 0 0 100 0 0 ...
## $ Open.Porch.SF : int 0 105 0 0 45 0 32 24 0 44 ...
## $ Enclosed.Porch : int 0 0 42 168 0 177 112 0 0 0 ...
## $ X3Ssn.Porch : int 0 0 86 0 0 0 0 0 0 0 ...
## $ Screen.Porch : int 166 0 0 111 0 0 0 0 0 0 ...
## $ Pool.Area : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Pool.QC : Factor w/ 4 levels "Ex","Fa","Gd",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Fence : Factor w/ 4 levels "GdPrv","GdWo",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Misc.Feature : Factor w/ 5 levels "Elev","Gar2",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Misc.Val : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Mo.Sold : int 3 2 11 5 11 7 2 3 4 5 ...
## $ Yr.Sold : int 2010 2009 2007 2009 2009 2009 2009 2008 2008 2007 ...
## $ Sale.Type : Factor w/ 10 levels "COD","Con","ConLD",..: 10 10 10 10 10 3 10 7 10 10 ...
## $ Sale.Condition : Factor w/ 6 levels "Abnorml","AdjLand",..: 5 5 5 5 5 5 5 6 5 5 ...
summary(ames_train)
## PID area price MS.SubClass
## Min. :5.263e+08 Min. : 334 Min. : 12789 Min. : 20.00
## 1st Qu.:5.285e+08 1st Qu.:1092 1st Qu.:129763 1st Qu.: 20.00
## Median :5.354e+08 Median :1411 Median :159467 Median : 50.00
## Mean :7.059e+08 Mean :1477 Mean :181190 Mean : 57.15
## 3rd Qu.:9.071e+08 3rd Qu.:1743 3rd Qu.:213000 3rd Qu.: 70.00
## Max. :1.007e+09 Max. :4676 Max. :615000 Max. :190.00
##
## MS.Zoning Lot.Frontage Lot.Area Street Alley
## A (agr): 0 Min. : 21.00 Min. : 1470 Grvl: 3 Grvl: 33
## C (all): 9 1st Qu.: 57.00 1st Qu.: 7314 Pave:997 Pave: 34
## FV : 56 Median : 69.00 Median : 9317 NA's:933
## I (all): 1 Mean : 69.21 Mean : 10352
## RH : 7 3rd Qu.: 80.00 3rd Qu.: 11650
## RL :772 Max. :313.00 Max. :215245
## RM :155 NA's :167
## Lot.Shape Land.Contour Utilities Lot.Config Land.Slope
## IR1:338 Bnk: 33 AllPub:1000 Corner :173 Gtl:962
## IR2: 30 HLS: 38 NoSeWa: 0 CulDSac: 76 Mod: 33
## IR3: 3 Low: 20 NoSewr: 0 FR2 : 36 Sev: 5
## Reg:629 Lvl:909 FR3 : 5
## Inside :710
##
##
## Neighborhood Condition.1 Condition.2 Bldg.Type House.Style
## NAmes :155 Norm :875 Norm :988 1Fam :823 1Story :521
## CollgCr: 85 Feedr : 53 Feedr : 6 2fmCon: 20 2Story :286
## Somerst: 74 Artery : 23 Artery : 2 Duplex: 35 1.5Fin : 98
## OldTown: 71 RRAn : 14 PosN : 2 Twnhs : 38 SLvl : 41
## Sawyer : 61 PosN : 11 PosA : 1 TwnhsE: 84 SFoyer : 36
## Edwards: 60 RRAe : 11 RRNn : 1 2.5Unf : 10
## (Other):494 (Other): 13 (Other): 0 (Other): 8
## Overall.Qual Overall.Cond Year.Built Year.Remod.Add
## Min. : 1.000 Min. :1.000 Min. :1872 Min. :1950
## 1st Qu.: 5.000 1st Qu.:5.000 1st Qu.:1955 1st Qu.:1966
## Median : 6.000 Median :5.000 Median :1975 Median :1992
## Mean : 6.095 Mean :5.559 Mean :1972 Mean :1984
## 3rd Qu.: 7.000 3rd Qu.:6.000 3rd Qu.:2001 3rd Qu.:2004
## Max. :10.000 Max. :9.000 Max. :2010 Max. :2010
##
## Roof.Style Roof.Matl Exterior.1st Exterior.2nd Mas.Vnr.Type
## Flat : 9 CompShg:984 VinylSd:349 VinylSd:345 : 7
## Gable :775 Tar&Grv: 11 HdBoard:164 HdBoard:150 BrkCmn : 8
## Gambrel: 8 WdShake: 2 MetalSd:147 MetalSd:148 BrkFace:317
## Hip :204 WdShngl: 2 Wd Sdng:138 Wd Sdng:130 CBlock : 0
## Mansard: 4 Metal : 1 Plywood: 74 Plywood: 96 None :593
## Shed : 0 ClyTile: 0 CemntBd: 40 CmentBd: 40 Stone : 75
## (Other): 0 (Other): 88 (Other): 91
## Mas.Vnr.Area Exter.Qual Exter.Cond Foundation Bsmt.Qual Bsmt.Cond
## Min. : 0.0 Ex: 39 Ex: 4 BrkTil:102 : 1 : 1
## 1st Qu.: 0.0 Fa: 11 Fa: 19 CBlock:430 Ex : 87 Ex : 2
## Median : 0.0 Gd:337 Gd:116 PConc :453 Fa : 28 Fa : 23
## Mean : 104.1 TA:613 Po: 0 Slab : 12 Gd :424 Gd : 44
## 3rd Qu.: 160.0 TA:861 Stone : 3 Po : 1 Po : 1
## Max. :1290.0 Wood : 0 TA :438 TA :908
## NA's :7 NA's: 21 NA's: 21
## Bsmt.Exposure BsmtFin.Type.1 BsmtFin.SF.1 BsmtFin.Type.2
## : 2 GLQ :294 Min. : 0.0 Unf :863
## Av :157 Unf :279 1st Qu.: 0.0 LwQ : 31
## Gd : 98 ALQ :163 Median : 400.0 Rec : 29
## Mn : 87 Rec :107 Mean : 464.1 BLQ : 24
## No :635 BLQ : 87 3rd Qu.: 773.0 ALQ : 20
## NA's: 21 (Other): 49 Max. :2260.0 (Other): 12
## NA's : 21 NA's :1 NA's : 21
## BsmtFin.SF.2 Bsmt.Unf.SF Total.Bsmt.SF Heating
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Floor: 0
## 1st Qu.: 0.00 1st Qu.: 223.5 1st Qu.: 797.5 GasA :988
## Median : 0.00 Median : 461.0 Median : 998.0 GasW : 8
## Mean : 48.07 Mean : 547.0 Mean :1059.2 Grav : 2
## 3rd Qu.: 0.00 3rd Qu.: 783.0 3rd Qu.:1301.0 OthW : 1
## Max. :1526.00 Max. :2336.0 Max. :3138.0 Wall : 1
## NA's :1 NA's :1 NA's :1
## Heating.QC Central.Air Electrical X1st.Flr.SF X2nd.Flr.SF
## Ex:516 N: 55 : 0 Min. : 334.0 Min. : 0.0
## Fa: 22 Y:945 FuseA: 54 1st Qu.: 876.2 1st Qu.: 0.0
## Gd:157 FuseF: 12 Median :1080.5 Median : 0.0
## Po: 1 FuseP: 2 Mean :1157.1 Mean : 315.2
## TA:304 Mix : 0 3rd Qu.:1376.2 3rd Qu.: 688.2
## SBrkr:932 Max. :3138.0 Max. :1836.0
##
## Low.Qual.Fin.SF Bsmt.Full.Bath Bsmt.Half.Bath Full.Bath
## Min. : 0.00 Min. :0.0000 Min. :0.00000 Min. :0.000
## 1st Qu.: 0.00 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:1.000
## Median : 0.00 Median :0.0000 Median :0.00000 Median :2.000
## Mean : 4.32 Mean :0.4474 Mean :0.06106 Mean :1.541
## 3rd Qu.: 0.00 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:2.000
## Max. :1064.00 Max. :3.0000 Max. :2.00000 Max. :4.000
## NA's :1 NA's :1
## Half.Bath Bedroom.AbvGr Kitchen.AbvGr Kitchen.Qual
## Min. :0.000 Min. :0.000 Min. :0.000 Ex: 67
## 1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.000 Fa: 20
## Median :0.000 Median :3.000 Median :1.000 Gd:403
## Mean :0.378 Mean :2.806 Mean :1.039 Po: 1
## 3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.000 TA:509
## Max. :2.000 Max. :6.000 Max. :2.000
##
## TotRms.AbvGrd Functional Fireplaces Fireplace.Qu Garage.Type
## Min. : 2.00 Typ :935 Min. :0.000 Ex : 16 2Types : 10
## 1st Qu.: 5.00 Min2 : 24 1st Qu.:0.000 Fa : 24 Attchd :610
## Median : 6.00 Min1 : 18 Median :1.000 Gd :232 Basment: 11
## Mean : 6.34 Mod : 16 Mean :0.597 Po : 18 BuiltIn: 56
## 3rd Qu.: 7.00 Maj1 : 4 3rd Qu.:1.000 TA :219 CarPort: 1
## Max. :13.00 Maj2 : 2 Max. :4.000 NA's:491 Detchd :266
## (Other): 1 NA's : 46
## Garage.Yr.Blt Garage.Finish Garage.Cars Garage.Area Garage.Qual
## Min. :1900 : 2 Min. :0.000 Min. : 0.0 : 1
## 1st Qu.:1961 Fin :247 1st Qu.:1.000 1st Qu.: 312.0 Ex : 1
## Median :1979 RFn :278 Median :2.000 Median : 480.0 Fa : 37
## Mean :1978 Unf :427 Mean :1.767 Mean : 475.4 Gd : 7
## 3rd Qu.:2002 NA's: 46 3rd Qu.:2.000 3rd Qu.: 576.0 Po : 3
## Max. :2010 Max. :5.000 Max. :1390.0 TA :904
## NA's :48 NA's :1 NA's :1 NA's: 47
## Garage.Cond Paved.Drive Wood.Deck.SF Open.Porch.SF
## : 1 N: 67 Min. : 0.00 Min. : 0.00
## Ex : 1 P: 29 1st Qu.: 0.00 1st Qu.: 0.00
## Fa : 21 Y:904 Median : 0.00 Median : 28.00
## Gd : 6 Mean : 93.84 Mean : 48.93
## Po : 6 3rd Qu.:168.00 3rd Qu.: 74.00
## TA :918 Max. :857.00 Max. :742.00
## NA's: 47
## Enclosed.Porch X3Ssn.Porch Screen.Porch Pool.Area
## Min. : 0.00 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 0.00 Median : 0.000 Median : 0.00 Median : 0.000
## Mean : 23.48 Mean : 3.118 Mean : 14.77 Mean : 1.463
## 3rd Qu.: 0.00 3rd Qu.: 0.000 3rd Qu.: 0.00 3rd Qu.: 0.000
## Max. :432.00 Max. :508.000 Max. :440.00 Max. :800.000
##
## Pool.QC Fence Misc.Feature Misc.Val Mo.Sold
## Ex : 1 GdPrv: 43 Elev: 0 Min. : 0.00 Min. : 1.000
## Fa : 1 GdWo : 37 Gar2: 2 1st Qu.: 0.00 1st Qu.: 4.000
## Gd : 1 MnPrv:120 Othr: 1 Median : 0.00 Median : 6.000
## TA : 0 MnWw : 2 Shed: 25 Mean : 45.81 Mean : 6.243
## NA's:997 NA's :798 TenC: 1 3rd Qu.: 0.00 3rd Qu.: 8.000
## NA's:971 Max. :15500.00 Max. :12.000
##
## Yr.Sold Sale.Type Sale.Condition
## Min. :2006 WD :863 Abnorml: 61
## 1st Qu.:2007 New : 79 AdjLand: 2
## Median :2008 COD : 27 Alloca : 4
## Mean :2008 ConLD : 7 Family : 17
## 3rd Qu.:2009 ConLw : 6 Normal :834
## Max. :2010 Con : 5 Partial: 82
## (Other): 13
StoneBr
Timber
Veenker
NridgHt
# type your code for Question 3 here, and Knit
aggregate(ames_train$price, by=list(ames_train$Neighborhood), FUN=sd)
## Group.1 x
## 1 Blmngtn 26454.86
## 2 Blueste 10381.23
## 3 BrDale 13337.59
## 4 BrkSide 37309.91
## 5 ClearCr 48068.69
## 6 CollgCr 52786.08
## 7 Crawfor 71267.56
## 8 Edwards 54851.63
## 9 Gilbert 41190.38
## 10 Greens 29063.42
## 11 GrnHill 70710.68
## 12 IDOTRR 31530.44
## 13 MeadowV 18939.78
## 14 Mitchel 39682.94
## 15 NAmes 27267.97
## 16 NoRidge 35888.97
## 17 NPkVill 11958.37
## 18 NridgHt 105088.90
## 19 NWAmes 41340.50
## 20 OldTown 36429.69
## 21 Sawyer 21216.22
## 22 SawyerW 48354.36
## 23 Somerst 65199.49
## 24 StoneBr 123459.10
## 25 SWISU 27375.76
## 26 Timber 84029.57
## 27 Veenker 72545.41
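Instead of scanning the table for the largest standard deviation, the group with the maximum can be picked out programmatically. This is a sketch on made-up data; the data frame `toy` and its columns `nbhd` and `price` are hypothetical stand-ins for `ames_train`.

```r
# Toy data: two neighborhoods with clearly different price spreads
# (hypothetical values, not taken from the Ames data).
toy <- data.frame(
  nbhd  = c("A", "A", "A", "B", "B", "B"),
  price = c(100, 110, 120, 100, 200, 300)
)

# Standard deviation of price within each neighborhood.
sd_by_nbhd <- tapply(toy$price, toy$nbhd, sd)

# Name of the neighborhood with the largest standard deviation.
# which.max() skips NA entries, such as groups with one observation.
most_variable <- names(which.max(sd_by_nbhd))
print(sd_by_nbhd)
print(most_variable)
```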
price?
Lot.Area
Bedroom.AbvGr
Overall.Qual
Year.Built
# type your code for Question 4 here, and Knit
ames_train$Overall.Qual <- as.numeric(ames_train$Overall.Qual)
library(ggplot2)
ggplot(data = ames_train, aes(x = Lot.Area, y = price)) +
  geom_point()
ggplot(data = ames_train, aes(x = Bedroom.AbvGr, y = price)) +
  geom_point()
ggplot(data = ames_train, aes(x = Overall.Qual, y = price)) +
  geom_point()
ggplot(data = ames_train, aes(x = Year.Built, y = price)) +
  geom_point()
price and area. Which of the following variable transformations makes the relationship appear to be the most linear?
price or area
price but not area
area but not price
price and area
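A rough numeric complement to judging the scatterplots by eye is to compare the Pearson correlation under each of the four transformation options. This is only a heuristic, and the sketch below runs on simulated data (the vectors `area` and `price` here are hypothetical stand-ins, not the Ames columns).

```r
set.seed(42)

# Simulated stand-ins for area and price (hypothetical; not the Ames data).
# By construction, log(price) is linear in log(area) plus small noise.
area  <- exp(rnorm(200, mean = 0, sd = 1.5))
price <- exp(0.5 * log(area) + rnorm(200, sd = 0.1))

# Pearson correlation for each of the four transformation choices.
linearity <- c(
  neither   = cor(area, price),
  log_price = cor(area, log(price)),
  log_area  = cor(log(area), price),
  both      = cor(log(area), log(price))
)
print(round(linearity, 3))
```

A correlation close to 1 in absolute value suggests a near-linear relationship, but the plots remain the primary check because correlation is sensitive to outliers.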
# type your code for Question 5 here, and Knit
ggplot(data = ames_train, aes(x = area, y = price)) +
  geom_point()
ggplot(data = ames_train, aes(x = area, y = log(price))) +
  geom_point()
ggplot(data = ames_train, aes(x = log(area), y = price)) +
  geom_point()
ggplot(data = ames_train, aes(x = log(area), y = log(price))) +
  geom_point()
# type your code for Question 6 here, and Knit
table(ames_train$Garage.Cars >= 1)
##
## FALSE TRUE
## 46 953
# type your code for Question 7 here, and Knit
table(ames_train$Year.Built > 1999)
##
## FALSE TRUE
## 728 272
median(ames_train$price) - mean(ames_train$price)
## [1] -21723.08
table(ames_train$Total.Bsmt.SF > 0)
##
## FALSE TRUE
## 21 978
table(ames_train$Street)
##
## Grvl Pave
## 3 997
# type your code for Question 8 here, and Knit
with(ames_train, t.test(area[Garage.Cars>0], area[Garage.Cars==0]))
##
## Welch Two Sample t-test
##
## data: area[Garage.Cars > 0] and area[Garage.Cars == 0]
## t = 5.134, df = 50.702, p-value = 4.535e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 211.4183 482.9963
## sample estimates:
## mean of x mean of y
## 1492.251 1145.043
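The confidence interval does not have to be copied from the printed output: `t.test()` returns an object whose `conf.int` and `estimate` components can be read directly. The sketch below uses simulated vectors `x` and `y` (hypothetical data, not the Ames areas).

```r
set.seed(1)

# Simulated samples standing in for the two groups (hypothetical data).
x <- rnorm(50, mean = 10, sd = 2)
y <- rnorm(30, mean = 8, sd = 3)

# t.test() performs Welch's unequal-variance test by default.
fit <- t.test(x, y)

# 95% confidence interval for the difference in means (x minus y).
ci <- fit$conf.int

# Point estimate of the difference, from the two sample means.
diff_means <- unname(fit$estimate[1] - fit$estimate[2])

print(ci)
print(diff_means)
```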
# type your code for Question 9 here, and Knit
# k * theta = mean
# sqrt(k) * theta = sd
# k = (mean / sd) ^ 2 = 9
# theta = mean / k = 1/3
k = 9
theta = 1/3
ames_q9 = ames_train[ames_train$area > 2000, ]
k_star = k + sum(ames_q9$Bedroom.AbvGr)
theta_star <- theta / (length(ames_q9$Bedroom.AbvGr) * theta + 1)
post_mean <- k_star * theta_star
post_sd <- theta_star * sqrt(k_star)
post_mean
## [1] 3.617021
post_sd
## [1] 0.1601644
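The update above is the standard conjugate result for a Gamma(k, θ) prior (shape and scale) combined with Poisson counts: the posterior is Gamma(k + Σx, θ / (nθ + 1)). The sketch below wraps it in a small helper. The function name `gamma_poisson_update` and the example counts are hypothetical, introduced only for illustration.

```r
# Conjugate Gamma-Poisson update with a shape/scale parameterisation.
# k, theta: prior shape and scale; x: vector of observed counts.
# (Hypothetical helper, not part of statsr or any other package.)
gamma_poisson_update <- function(k, theta, x) {
  k_star     <- k + sum(x)
  theta_star <- theta / (length(x) * theta + 1)
  list(
    k     = k_star,
    theta = theta_star,
    mean  = k_star * theta_star,        # posterior mean
    sd    = sqrt(k_star) * theta_star   # posterior standard deviation
  )
}

# Made-up counts, with the same prior as above (k = 9, theta = 1/3).
post <- gamma_poisson_update(k = 9, theta = 1/3, x = c(3, 4, 2, 5))
print(post$mean)
print(post$sd)
```

With the real data, the call would be `gamma_poisson_update(9, 1/3, ames_q9$Bedroom.AbvGr)`.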
price) on \(\log\)(area), there are some outliers. Which of the following do the three most outlying points have in common?
# type your code for Question 10 here, and Knit
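# Regress log(price) on log(area) and square the residuals
lm_q10 = lm(log(price) ~ log(area), ames_train)
res_q10 = resid(lm_q10)^2
# 997th smallest of the 1000 squared residuals; only the three largest exceed it
threshold = sort(res_q10, partial=997)[997]
which(res_q10 > threshold)
## 206 428 741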
## 206 428 741
ames_train$Sale.Condition[c(206, 428, 741)]
## [1] Abnorml Abnorml Normal
## Levels: Abnorml AdjLand Alloca Family Normal Partial
ames_train$Bedroom.AbvGr[c(206, 428, 741)]
## [1] 3 2 3
ames_train$Overall.Qual[c(206, 428, 741)]
## [1] 4 2 4
ames_train$Year.Built[c(206, 428, 741)]
## [1] 1910 1923 1920
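An alternative to the partial-sort threshold is to order the absolute residuals and take the first three indices. The sketch below uses the built-in `mtcars` data as a stand-in, so the variables `mpg`, `wt` and `cyl` are unrelated to the Ames analysis.

```r
# Fit a log-log regression on a built-in dataset (stand-in for ames_train).
fit <- lm(log(mpg) ~ log(wt), data = mtcars)
res <- resid(fit)

# Row positions of the three largest residuals in absolute value.
top3 <- order(abs(res), decreasing = TRUE)[1:3]

# Inspect a few columns for those rows to look for shared characteristics.
print(mtcars[top3, c("mpg", "wt", "cyl")])
```

Ranking by absolute value selects the same observations as ranking by squared residuals, since squaring preserves the ordering of magnitudes.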
price if used as a dependent variable in a linear regression?
price is right-skewed.
price cannot take on negative values.
price can only take on integer values.
# type your code for Question 11 here, and Knit
hist(ames_train$price)
hist(log(ames_train$price))
Bldg.Type = 1Fam)
ames_train %>%
  group_by(Neighborhood) %>%
  summarise(flag=mean(Bldg.Type == '1Fam')) %>%
  arrange(-flag)
## # A tibble: 27 x 2
## Neighborhood flag
## <fctr> <dbl>
## 1 ClearCr 1.0000000
## 2 NoRidge 1.0000000
## 3 Timber 1.0000000
## 4 Gilbert 0.9795918
## 5 BrkSide 0.9756098
## 6 NWAmes 0.9756098
## 7 CollgCr 0.9647059
## 8 IDOTRR 0.9428571
## 9 NAmes 0.9225806
## 10 Sawyer 0.9180328
## # ... with 17 more rows
area) and the number of bedrooms above ground (Bedroom.AbvGr)?
# type your code for Question 13 here, and Knit
cor(ames_train$Bedroom.AbvGr, log(ames_train$area))
## [1] 0.5457625
ggplot(ames_train, aes(Bedroom.AbvGr, log(area))) + geom_point()
# type your code for Question 14 here, and Knit
ames_train %>%
  filter(Bsmt.Unf.SF > 0) %>%
  summarise(mean(Bsmt.Unf.SF))
## # A tibble: 1 x 1
## `mean(Bsmt.Unf.SF)`
## <dbl>
## 1 595.2527
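`filter()` drops rows where the condition evaluates to `NA`, so the single missing `Bsmt.Unf.SF` value reported in the summary does not affect this mean. In base R the missing values have to be excluded explicitly. This is a sketch on a made-up vector `x` with hypothetical values.

```r
# Made-up vector with zeros and a missing value (hypothetical data).
x <- c(0, 100, NA, 300, 0)

# Keep only entries that are both non-missing and strictly positive.
positive <- x[!is.na(x) & x > 0]

# Mean over the retained entries only.
mean_positive <- mean(positive)
print(mean_positive)
```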