library(tidyverse)
library(gridExtra)
library(kableExtra)
library(psych)
library(corrplot)
library(matrixcalc)
library(MASS)
library(Rmisc)
library(mice)
library(VIM)
library(broom)

Problem 1

Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean of \(\mu = \sigma = (N+1)/2\).

set.seed(123)
N <- 7
mn <- sd <- (N + 1)/2
X <- runif(10000, 1, N)
Y <- rnorm(10000, mean = mn, sd=sd)

Probability

Calculate as a minimum the below probabilities a through c. Assume the small letter “x” is estimated as the median of the X variable, and the small letter “y” is estimated as the 1st quartile of the Y variable. Interpret the meaning of all probabilities.

x <- median(X)
y <- unname(quantile(Y, 0.25)) # first quartile of Y
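As a sanity check on the sample estimates, the theoretical median of X and the theoretical first quartile of Y can be read off the distribution functions. This is a minimal sketch assuming the same N = 7 as above; the names `x_theory` and `y_theory` are illustrative and are not part of the original analysis.

```r
# Theoretical counterparts of the sample estimates x and y (assumes N = 7).
N  <- 7
mu <- (N + 1) / 2                              # mean and sd of Y

x_theory <- qunif(0.5, min = 1, max = N)       # median of Uniform(1, N)
y_theory <- qnorm(0.25, mean = mu, sd = mu)    # first quartile of Normal(mu, mu)

x_theory
y_theory
```

With 10,000 draws, the sample median and sample first quartile should land close to these values (4 and roughly 1.30).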
  • \(P(X>x | X>y)\)
sum(X > x & X > y) / sum(X > y)
## [1] 0.5309547

There is a 53.1% chance that \(X\) is greater than its median given that \(X\) is greater than the first quartile of \(Y\).

  • \(P(X>x, Y>y)\)
sum(X > x & Y > y)/10000
## [1] 0.3756

There is a 37.56% probability that \(X\) is greater than its median and \(Y\) is greater than its first quartile.

  • \(P(X<x | X>y)\)
sum(X < x & X > y) / sum(X > y)
## [1] 0.4690453

Given that \(X\) is greater than the first quartile of \(Y\), there is a 46.9% probability that \(X\) is less than its median. This is the complement of the first probability: 0.531 + 0.469 = 1.
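The three simulated probabilities can be compared with their theoretical values. The sketch below assumes N = 7 and that X and Y are independent (they are generated separately); the variable names are illustrative only.

```r
# Theoretical values of the three probabilities (assumes N = 7, X and Y independent).
N   <- 7
mu  <- (N + 1) / 2
x_t <- qunif(0.5, 1, N)      # theoretical median of X
y_t <- qnorm(0.25, mu, mu)   # theoretical first quartile of Y

p_X_gt_y <- punif(y_t, 1, N, lower.tail = FALSE)                 # P(X > y)

# a) P(X > x | X > y) = P(X > max(x, y)) / P(X > y)
p_a <- punif(max(x_t, y_t), 1, N, lower.tail = FALSE) / p_X_gt_y

# b) P(X > x, Y > y) = P(X > x) * P(Y > y) under independence
p_b <- 0.5 * 0.75

# c) P(X < x | X > y) = P(y < X < x) / P(X > y)
p_c <- max(0, punif(x_t, 1, N) - punif(y_t, 1, N)) / p_X_gt_y

c(a = p_a, b = p_b, c = p_c)
```

Probabilities a and c condition on the same event and cover complementary outcomes, so they should sum to 1 both in theory and in the simulation.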

Investigate whether \(P(X>x \ and \ Y>y)=P(X>x)P(Y>y)\) by building a table and evaluating the marginal and joint probabilities.

Contingency Table 1
#Marginal
Xgx <- sum(X > x)/10000
Xlx <- sum(X < x)/10000
Ygy <- sum(Y > y)/10000
Yly <- sum(Y < y)/10000

#joint
Xgy <- sum(X>x & Y>y)/10000
Xgly <- sum(X>x & Y<y)/10000
Xly <- sum(X<x & Y<y)/10000
Xlgy <-sum(X<x & Y>y)/10000

row1 <- c(Xgx, Xgy, 0, Xgly)
row2 <- c(Xgy, Ygy, Xlgy, 0)
row3 <- c(0, Xlgy, Xlx, Xly) # diagonal entry is P(X<x), not P(X>x)
row4 <- c(Xgly, 0, Xly, Yly)
contingency_table1 <- data.frame(rbind(row1, row2, row3, row4))
colnames(contingency_table1) <- c("X>x", "Y>y", "X<x", "Y<y")
row.names(contingency_table1) <- c("X>x", "Y>y", "X<x", "Y<y")

kable(contingency_table1) %>% kable_styling(bootstrap_options = c("striped", "responsive"), full_width = F, position = "left")
       X>x    Y>y    X<x    Y<y
X>x 0.5000 0.3756 0.0000 0.1244
Y>y 0.3756 0.7500 0.3744 0.0000
X<x 0.0000 0.3744 0.5000 0.1256
Y<y 0.1244 0.0000 0.1256 0.2500
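The same joint and marginal probabilities can also be obtained in one step with `table()`, `prop.table()` and `addmargins()`. This is an alternative sketch, not the method used above; it regenerates the data with the same seed and N = 7 so that it runs on its own, and the object names are illustrative.

```r
# Joint and marginal probabilities from a single 2x2 table (alternative sketch).
set.seed(123)
N <- 7
X <- runif(10000, 1, N)
Y <- rnorm(10000, mean = (N + 1) / 2, sd = (N + 1) / 2)

x <- median(X)
y <- unname(quantile(Y, 0.25))

# Cross-tabulate the two events and convert counts to proportions
joint <- prop.table(table(X > x, Y > y, dnn = c("X > x", "Y > y")))

# Append row and column sums: the "Sum" margins are the marginal probabilities
with_margins <- addmargins(joint)
with_margins
```

The inner 2x2 cells are the joint probabilities, and the `Sum` row and column hold the marginals, which avoids assembling the table by hand.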

Based on the table above:

\(P(X>x \ and \ Y>y) = 0.3756\)

\(P(X>x) = 0.5\)

\(P(Y>y) = 0.75\)

0.75 * 0.5
## [1] 0.375

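The joint probability (0.3756) is approximately equal to the product of the marginals (0.375), which is consistent with \(X\) and \(Y\) being independent.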
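To judge whether the gap between 0.3756 and 0.375 is meaningful, it can be compared with the sampling error of a proportion based on 10,000 draws. The sketch below uses the values reported above and a simple binomial standard error as a rough yardstick; it is an informal check, not a formal test.

```r
# Rough yardstick: how large is the gap relative to binomial sampling error?
n         <- 10000
p_joint   <- 0.3756          # observed P(X > x and Y > y)
p_product <- 0.5 * 0.75      # P(X > x) * P(Y > y)

se <- sqrt(p_product * (1 - p_product) / n)  # standard error of a proportion
z  <- (p_joint - p_product) / se             # gap in standard-error units

c(difference = p_joint - p_product, se = se, z = z)
```

A gap well under one standard error is what independent variables would be expected to produce.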

Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?

Contingency Table 2
jm_df <- cbind(X, Y)
contingency_table2 <- prop.table(table(jm_df[,1]>x, jm_df[,2]>y))

row.names(contingency_table2) <- c("X<x", "X>x")

colnames(contingency_table2) <- c("Y<y", "Y>y")

kable(contingency_table2) %>% kable_styling("striped", full_width = F, position = "left")
       Y<y    Y>y
X<x 0.1256 0.3744
X>x 0.1244 0.3756

Fisher’s Exact Test

fisher.test(contingency_table2)
## Warning in fisher.test(contingency_table2): 'x' has been rounded to
## integer: Mean relative difference: 1
## 
##  Fisher's Exact Test for Count Data
## 
## data:  contingency_table2
## p-value = 1
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##    0 Inf
## sample estimates:
## odds ratio 
##          0

Chi Square Test

chisq.test(contingency_table2)
## Warning in chisq.test(contingency_table2): Chi-squared approximation may be
## incorrect
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  contingency_table2
## X-squared = 1.5407e-33, df = 1, p-value = 1

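For both tests the p-value is greater than 0.05, so we fail to reject the null hypothesis that the variables are independent. Note, however, that contingency_table2 holds proportions (from prop.table) rather than counts. This is why both functions emit warnings and why Fisher's test reports an odds ratio of 0; both tests expect a table of counts, so they should be rerun on the raw counts before relying on these p-values.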

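Fisher's exact test computes an exact p-value and is typically used for 2×2 contingency tables with small cell counts, while the chi-square test relies on a large-sample approximation that works well when the expected cell counts are large (commonly at least 5). With 10,000 observations and every cell count above 1,000, the chi-square test is appropriate here; Fisher's exact test remains valid but offers no practical advantage at this sample size.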
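Both tests are designed for tables of counts. The following sketch reruns them on the counts implied by Contingency Table 2 (each proportion multiplied by 10,000); the object names `counts`, `chi` and `fis` are illustrative.

```r
# Independence tests on counts rather than proportions.
# Counts are the proportions in Contingency Table 2 times n = 10,000.
counts <- matrix(c(1256, 1244,    # column Y<y: rows X<x, X>x
                   3744, 3756),   # column Y>y: rows X<x, X>x
                 nrow = 2,
                 dimnames = list(c("X<x", "X>x"), c("Y<y", "Y>y")))

chi <- chisq.test(counts)    # Pearson chi-square with Yates' correction (2x2 default)
fis <- fisher.test(counts)   # Fisher's exact test

chi$statistic
chi$p.value
fis$estimate                 # conditional estimate of the odds ratio
fis$p.value
```

On counts, neither call should produce the rounding or approximation warnings seen above, and the odds ratio estimate is close to 1 rather than 0.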


Problem 2

You are to register for Kaggle.com (free) and compete in the House Prices: Advanced Regression Techniques competition. https://www.kaggle.com/c/house-prices-advanced-regression-techniques .

Descriptive and Inferential Statistics

Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot matrix for at least two of the independent variables and the dependent variable. Derive a correlation matrix for any three quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?
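The correlation test with an 80% confidence interval and the familywise error question can be illustrated on simulated data. The sketch below uses two hypothetical variables `a` and `b`, not columns of the housing data, and the familywise figure assumes the three pairwise tests are independent, which is only an approximation.

```r
# Correlation test with an 80% confidence interval (simulated, hypothetical data).
set.seed(1)
n <- 200
a <- rnorm(n)
b <- 0.6 * a + rnorm(n, sd = 0.8)   # b is constructed to correlate with a

ct <- cor.test(a, b, conf.level = 0.80)
ct$estimate    # sample correlation
ct$conf.int    # 80% confidence interval
ct$p.value     # test of H0: correlation = 0

# Familywise error rate for k tests at level alpha, assuming independent tests
alpha <- 0.20
k     <- 3
fwer       <- 1 - (1 - alpha)^k   # chance of at least one false positive
alpha_bonf <- alpha / k           # Bonferroni-adjusted per-test level

c(fwer = fwer, alpha_bonf = alpha_bonf)
```

With three tests at the 0.20 level, the chance of at least one false positive under this assumption is 0.488, which is why familywise error is worth considering when several correlations are tested at once.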

Import Data

train_data <- read.csv('train.csv', sep = ',', header = TRUE, stringsAsFactors = FALSE)
test_data <- read.csv("test.csv", sep = ',', header = TRUE, stringsAsFactors = FALSE)

Preview

kable(head(train_data, 100)) %>% kable_styling(bootstrap_options = "striped" ,font_size = 11) %>% scroll_box(height = "500px")
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
1 60 RL 65 8450 Pave NA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NA Attchd 2003 RFn 2 548 TA TA Y 0 61 0 0 0 0 NA NA NA 0 2 2008 WD Normal 208500
2 20 RL 80 9600 Pave NA Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976 RFn 2 460 TA TA Y 298 0 0 0 0 0 NA NA NA 0 5 2007 WD Normal 181500
3 60 RL 68 11250 Pave NA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001 RFn 2 608 TA TA Y 0 42 0 0 0 0 NA NA NA 0 9 2008 WD Normal 223500
4 70 RL 60 9550 Pave NA IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998 Unf 3 642 TA TA Y 0 35 272 0 0 0 NA NA NA 0 2 2006 WD Abnorml 140000
5 60 RL 84 14260 Pave NA IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000 RFn 3 836 TA TA Y 192 84 0 0 0 0 NA NA NA 0 12 2008 WD Normal 250000
6 50 RL 85 14115 Pave NA IR1 Lvl AllPub Inside Gtl Mitchel Norm Norm 1Fam 1.5Fin 5 5 1993 1995 Gable CompShg VinylSd VinylSd None 0 TA TA Wood Gd TA No GLQ 732 Unf 0 64 796 GasA Ex Y SBrkr 796 566 0 1362 1 0 1 1 1 1 TA 5 Typ 0 NA Attchd 1993 Unf 2 480 TA TA Y 40 30 0 320 0 0 NA MnPrv Shed 700 10 2009 WD Normal 143000
7 20 RL 75 10084 Pave NA Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 8 5 2004 2005 Gable CompShg VinylSd VinylSd Stone 186 Gd TA PConc Ex TA Av GLQ 1369 Unf 0 317 1686 GasA Ex Y SBrkr 1694 0 0 1694 1 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2004 RFn 2 636 TA TA Y 255 57 0 0 0 0 NA NA NA 0 8 2007 WD Normal 307000
8 60 RL NA 10382 Pave NA IR1 Lvl AllPub Corner Gtl NWAmes PosN Norm 1Fam 2Story 7 6 1973 1973 Gable CompShg HdBoard HdBoard Stone 240 TA TA CBlock Gd TA Mn ALQ 859 BLQ 32 216 1107 GasA Ex Y SBrkr 1107 983 0 2090 1 0 2 1 3 1 TA 7 Typ 2 TA Attchd 1973 RFn 2 484 TA TA Y 235 204 228 0 0 0 NA NA Shed 350 11 2009 WD Normal 200000
9 50 RM 51 6120 Pave NA Reg Lvl AllPub Inside Gtl OldTown Artery Norm 1Fam 1.5Fin 7 5 1931 1950 Gable CompShg BrkFace Wd Shng None 0 TA TA BrkTil TA TA No Unf 0 Unf 0 952 952 GasA Gd Y FuseF 1022 752 0 1774 0 0 2 0 2 2 TA 8 Min1 2 TA Detchd 1931 Unf 2 468 Fa TA Y 90 0 205 0 0 0 NA NA NA 0 4 2008 WD Abnorml 129900
10 190 RL 50 7420 Pave NA Reg Lvl AllPub Corner Gtl BrkSide Artery Artery 2fmCon 1.5Unf 5 6 1939 1950 Gable CompShg MetalSd MetalSd None 0 TA TA BrkTil TA TA No GLQ 851 Unf 0 140 991 GasA Ex Y SBrkr 1077 0 0 1077 1 0 1 0 2 2 TA 5 Typ 2 TA Attchd 1939 RFn 1 205 Gd TA Y 0 4 0 0 0 0 NA NA NA 0 1 2008 WD Normal 118000
11 20 RL 70 11200 Pave NA Reg Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 5 1965 1965 Hip CompShg HdBoard HdBoard None 0 TA TA CBlock TA TA No Rec 906 Unf 0 134 1040 GasA Ex Y SBrkr 1040 0 0 1040 1 0 1 0 3 1 TA 5 Typ 0 NA Detchd 1965 Unf 1 384 TA TA Y 0 0 0 0 0 0 NA NA NA 0 2 2008 WD Normal 129500
12 60 RL 85 11924 Pave NA IR1 Lvl AllPub Inside Gtl NridgHt Norm Norm 1Fam 2Story 9 5 2005 2006 Hip CompShg WdShing Wd Shng Stone 286 Ex TA PConc Ex TA No GLQ 998 Unf 0 177 1175 GasA Ex Y SBrkr 1182 1142 0 2324 1 0 3 0 4 1 Ex 11 Typ 2 Gd BuiltIn 2005 Fin 3 736 TA TA Y 147 21 0 0 0 0 NA NA NA 0 7 2006 New Partial 345000
13 20 RL NA 12968 Pave NA IR2 Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 6 1962 1962 Hip CompShg HdBoard Plywood None 0 TA TA CBlock TA TA No ALQ 737 Unf 0 175 912 GasA TA Y SBrkr 912 0 0 912 1 0 1 0 2 1 TA 4 Typ 0 NA Detchd 1962 Unf 1 352 TA TA Y 140 0 0 0 176 0 NA NA NA 0 9 2008 WD Normal 144000
14 20 RL 91 10652 Pave NA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 7 5 2006 2007 Gable CompShg VinylSd VinylSd Stone 306 Gd TA PConc Gd TA Av Unf 0 Unf 0 1494 1494 GasA Ex Y SBrkr 1494 0 0 1494 0 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2006 RFn 3 840 TA TA Y 160 33 0 0 0 0 NA NA NA 0 8 2007 New Partial 279500
15 20 RL NA 10920 Pave NA IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 6 5 1960 1960 Hip CompShg MetalSd MetalSd BrkFace 212 TA TA CBlock TA TA No BLQ 733 Unf 0 520 1253 GasA TA Y SBrkr 1253 0 0 1253 1 0 1 1 2 1 TA 5 Typ 1 Fa Attchd 1960 RFn 1 352 TA TA Y 0 213 176 0 0 0 NA GdWo NA 0 5 2008 WD Normal 157000
16 45 RM 51 6120 Pave NA Reg Lvl AllPub Corner Gtl BrkSide Norm Norm 1Fam 1.5Unf 7 8 1929 2001 Gable CompShg Wd Sdng Wd Sdng None 0 TA TA BrkTil TA TA No Unf 0 Unf 0 832 832 GasA Ex Y FuseA 854 0 0 854 0 0 1 0 2 1 TA 5 Typ 0 NA Detchd 1991 Unf 2 576 TA TA Y 48 112 0 0 0 0 NA GdPrv NA 0 7 2007 WD Normal 132000
17 20 RL NA 11241 Pave NA IR1 Lvl AllPub CulDSac Gtl NAmes Norm Norm 1Fam 1Story 6 7 1970 1970 Gable CompShg Wd Sdng Wd Sdng BrkFace 180 TA TA CBlock TA TA No ALQ 578 Unf 0 426 1004 GasA Ex Y SBrkr 1004 0 0 1004 1 0 1 0 2 1 TA 5 Typ 1 TA Attchd 1970 Fin 2 480 TA TA Y 0 0 0 0 0 0 NA NA Shed 700 3 2010 WD Normal 149000
18 90 RL 72 10791 Pave NA Reg Lvl AllPub Inside Gtl Sawyer Norm Norm Duplex 1Story 4 5 1967 1967 Gable CompShg MetalSd MetalSd None 0 TA TA Slab NA NA NA NA 0 NA 0 0 0 GasA TA Y SBrkr 1296 0 0 1296 0 0 2 0 2 2 TA 6 Typ 0 NA CarPort 1967 Unf 2 516 TA TA Y 0 0 0 0 0 0 NA NA Shed 500 10 2006 WD Normal 90000
19 20 RL 66 13695 Pave NA Reg Lvl AllPub Inside Gtl SawyerW RRAe Norm 1Fam 1Story 5 5 2004 2004 Gable CompShg VinylSd VinylSd None 0 TA TA PConc TA TA No GLQ 646 Unf 0 468 1114 GasA Ex Y SBrkr 1114 0 0 1114 1 0 1 1 3 1 Gd 6 Typ 0 NA Detchd 2004 Unf 2 576 TA TA Y 0 102 0 0 0 0 NA NA NA 0 6 2008 WD Normal 159000
20 20 RL 70 7560 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 6 1958 1965 Hip CompShg BrkFace Plywood None 0 TA TA CBlock TA TA No LwQ 504 Unf 0 525 1029 GasA TA Y SBrkr 1339 0 0 1339 0 0 1 0 3 1 TA 6 Min1 0 NA Attchd 1958 Unf 1 294 TA TA Y 0 0 0 0 0 0 NA MnPrv NA 0 5 2009 COD Abnorml 139000
21 60 RL 101 14215 Pave NA IR1 Lvl AllPub Corner Gtl NridgHt Norm Norm 1Fam 2Story 8 5 2005 2006 Gable CompShg VinylSd VinylSd BrkFace 380 Gd TA PConc Ex TA Av Unf 0 Unf 0 1158 1158 GasA Ex Y SBrkr 1158 1218 0 2376 0 0 3 1 4 1 Gd 9 Typ 1 Gd BuiltIn 2005 RFn 3 853 TA TA Y 240 154 0 0 0 0 NA NA NA 0 11 2006 New Partial 325300
22 45 RM 57 7449 Pave Grvl Reg Bnk AllPub Inside Gtl IDOTRR Norm Norm 1Fam 1.5Unf 7 7 1930 1950 Gable CompShg Wd Sdng Wd Sdng None 0 TA TA PConc TA TA No Unf 0 Unf 0 637 637 GasA Ex Y FuseF 1108 0 0 1108 0 0 1 0 3 1 Gd 6 Typ 1 Gd Attchd 1930 Unf 1 280 TA TA N 0 0 205 0 0 0 NA GdPrv NA 0 6 2007 WD Normal 139400
23 20 RL 75 9742 Pave NA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 8 5 2002 2002 Hip CompShg VinylSd VinylSd BrkFace 281 Gd TA PConc Gd TA No Unf 0 Unf 0 1777 1777 GasA Ex Y SBrkr 1795 0 0 1795 0 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2002 RFn 2 534 TA TA Y 171 159 0 0 0 0 NA NA NA 0 9 2008 WD Normal 230000
24 120 RM 44 4224 Pave NA Reg Lvl AllPub Inside Gtl MeadowV Norm Norm TwnhsE 1Story 5 7 1976 1976 Gable CompShg CemntBd CmentBd None 0 TA TA PConc Gd TA No GLQ 840 Unf 0 200 1040 GasA TA Y SBrkr 1060 0 0 1060 1 0 1 0 3 1 TA 6 Typ 1 TA Attchd 1976 Unf 2 572 TA TA Y 100 110 0 0 0 0 NA NA NA 0 6 2007 WD Normal 129900
25 20 RL NA 8246 Pave NA IR1 Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 8 1968 2001 Gable CompShg Plywood Plywood None 0 TA Gd CBlock TA TA Mn Rec 188 ALQ 668 204 1060 GasA Ex Y SBrkr 1060 0 0 1060 1 0 1 0 3 1 Gd 6 Typ 1 TA Attchd 1968 Unf 1 270 TA TA Y 406 90 0 0 0 0 NA MnPrv NA 0 5 2010 WD Normal 154000
26 20 RL 110 14230 Pave NA Reg Lvl AllPub Corner Gtl NridgHt Norm Norm 1Fam 1Story 8 5 2007 2007 Gable CompShg VinylSd VinylSd Stone 640 Gd TA PConc Gd TA No Unf 0 Unf 0 1566 1566 GasA Ex Y SBrkr 1600 0 0 1600 0 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2007 RFn 3 890 TA TA Y 0 56 0 0 0 0 NA NA NA 0 7 2009 WD Normal 256300
27 20 RL 60 7200 Pave NA Reg Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 5 7 1951 2000 Gable CompShg Wd Sdng Wd Sdng None 0 TA TA CBlock TA TA Mn BLQ 234 Rec 486 180 900 GasA TA Y SBrkr 900 0 0 900 0 1 1 0 3 1 Gd 5 Typ 0 NA Detchd 2005 Unf 2 576 TA TA Y 222 32 0 0 0 0 NA NA NA 0 5 2010 WD Normal 134800
28 20 RL 98 11478 Pave NA Reg Lvl AllPub Inside Gtl NridgHt Norm Norm 1Fam 1Story 8 5 2007 2008 Gable CompShg VinylSd VinylSd Stone 200 Gd TA PConc Ex TA No GLQ 1218 Unf 0 486 1704 GasA Ex Y SBrkr 1704 0 0 1704 1 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2008 RFn 3 772 TA TA Y 0 50 0 0 0 0 NA NA NA 0 5 2010 WD Normal 306000
29 20 RL 47 16321 Pave NA IR1 Lvl AllPub CulDSac Gtl NAmes Norm Norm 1Fam 1Story 5 6 1957 1997 Gable CompShg MetalSd MetalSd None 0 TA TA CBlock TA TA Gd BLQ 1277 Unf 0 207 1484 GasA TA Y SBrkr 1600 0 0 1600 1 0 1 0 2 1 TA 6 Typ 2 Gd Attchd 1957 RFn 1 319 TA TA Y 288 258 0 0 0 0 NA NA NA 0 12 2006 WD Normal 207500
30 30 RM 60 6324 Pave NA IR1 Lvl AllPub Inside Gtl BrkSide Feedr RRNn 1Fam 1Story 4 6 1927 1950 Gable CompShg MetalSd MetalSd None 0 TA TA BrkTil TA TA No Unf 0 Unf 0 520 520 GasA Fa N SBrkr 520 0 0 520 0 0 1 0 1 1 Fa 4 Typ 0 NA Detchd 1920 Unf 1 240 Fa TA Y 49 0 87 0 0 0 NA NA NA 0 5 2008 WD Normal 68500
31 70 C (all) 50 8500 Pave Pave Reg Lvl AllPub Inside Gtl IDOTRR Feedr Norm 1Fam 2Story 4 4 1920 1950 Gambrel CompShg BrkFace BrkFace None 0 TA Fa BrkTil TA TA No Unf 0 Unf 0 649 649 GasA TA N SBrkr 649 668 0 1317 0 0 1 0 3 1 TA 6 Typ 0 NA Detchd 1920 Unf 1 250 TA Fa N 0 54 172 0 0 0 NA MnPrv NA 0 7 2008 WD Normal 40000
32 20 RL NA 8544 Pave NA IR1 Lvl AllPub CulDSac Gtl Sawyer Norm Norm 1Fam 1Story 5 6 1966 2006 Gable CompShg HdBoard HdBoard None 0 TA TA CBlock TA TA No Unf 0 Unf 0 1228 1228 GasA Gd Y SBrkr 1228 0 0 1228 0 0 1 1 3 1 Gd 6 Typ 0 NA Attchd 1966 Unf 1 271 TA TA Y 0 65 0 0 0 0 NA MnPrv NA 0 6 2008 WD Normal 149350
33 20 RL 85 11049 Pave NA Reg Lvl AllPub Corner Gtl CollgCr Norm Norm 1Fam 1Story 8 5 2007 2007 Gable CompShg VinylSd VinylSd None 0 Gd TA PConc Ex TA Av Unf 0 Unf 0 1234 1234 GasA Ex Y SBrkr 1234 0 0 1234 0 0 2 0 3 1 Gd 7 Typ 0 NA Attchd 2007 RFn 2 484 TA TA Y 0 30 0 0 0 0 NA NA NA 0 1 2008 WD Normal 179900
34 20 RL 70 10552 Pave NA IR1 Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 5 1959 1959 Hip CompShg BrkFace BrkFace None 0 TA TA CBlock TA TA No Rec 1018 Unf 0 380 1398 GasA Gd Y SBrkr 1700 0 0 1700 0 1 1 1 4 1 Gd 6 Typ 1 Gd Attchd 1959 RFn 2 447 TA TA Y 0 38 0 0 0 0 NA NA NA 0 4 2010 WD Normal 165500
35 120 RL 60 7313 Pave NA Reg Lvl AllPub Inside Gtl NridgHt Norm Norm TwnhsE 1Story 9 5 2005 2005 Hip CompShg MetalSd MetalSd BrkFace 246 Ex TA PConc Ex TA No GLQ 1153 Unf 0 408 1561 GasA Ex Y SBrkr 1561 0 0 1561 1 0 2 0 2 1 Ex 6 Typ 1 Gd Attchd 2005 Fin 2 556 TA TA Y 203 47 0 0 0 0 NA NA NA 0 8 2007 WD Normal 277500
36 60 RL 108 13418 Pave NA Reg Lvl AllPub Inside Gtl NridgHt Norm Norm 1Fam 2Story 8 5 2004 2005 Gable CompShg VinylSd VinylSd Stone 132 Gd TA PConc Ex TA Av Unf 0 Unf 0 1117 1117 GasA Ex Y SBrkr 1132 1320 0 2452 0 0 3 1 4 1 Gd 9 Typ 1 Gd BuiltIn 2004 Fin 3 691 TA TA Y 113 32 0 0 0 0 NA NA NA 0 9 2006 WD Normal 309000
37 20 RL 112 10859 Pave NA Reg Lvl AllPub Corner Gtl CollgCr Norm Norm 1Fam 1Story 5 5 1994 1995 Gable CompShg VinylSd VinylSd None 0 TA TA PConc Gd TA No Unf 0 Unf 0 1097 1097 GasA Ex Y SBrkr 1097 0 0 1097 0 0 1 1 3 1 TA 6 Typ 0 NA Attchd 1995 Unf 2 672 TA TA Y 392 64 0 0 0 0 NA NA NA 0 6 2009 WD Normal 145000
38 20 RL 74 8532 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 6 1954 1990 Hip CompShg Wd Sdng Wd Sdng BrkFace 650 TA TA CBlock TA TA No Rec 1213 Unf 0 84 1297 GasA Gd Y SBrkr 1297 0 0 1297 0 1 1 0 3 1 TA 5 Typ 1 TA Attchd 1954 Fin 2 498 TA TA Y 0 0 0 0 0 0 NA NA NA 0 10 2009 WD Normal 153000
39 20 RL 68 7922 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 7 1953 2007 Gable CompShg VinylSd VinylSd None 0 TA Gd CBlock TA TA No GLQ 731 Unf 0 326 1057 GasA TA Y SBrkr 1057 0 0 1057 1 0 1 0 3 1 Gd 5 Typ 0 NA Detchd 1953 Unf 1 246 TA TA Y 0 52 0 0 0 0 NA NA NA 0 1 2010 WD Abnorml 109000
40 90 RL 65 6040 Pave NA Reg Lvl AllPub Inside Gtl Edwards Norm Norm Duplex 1Story 4 5 1955 1955 Gable CompShg AsbShng Plywood None 0 TA TA PConc NA NA NA NA 0 NA 0 0 0 GasA TA N FuseP 1152 0 0 1152 0 0 2 0 2 2 Fa 6 Typ 0 NA NA NA NA 0 0 NA NA N 0 0 0 0 0 0 NA NA NA 0 6 2008 WD AdjLand 82000
41 20 RL 84 8658 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 6 5 1965 1965 Gable CompShg Wd Sdng Wd Sdng BrkFace 101 TA TA CBlock TA TA No Rec 643 Unf 0 445 1088 GasA Ex Y SBrkr 1324 0 0 1324 0 0 2 0 3 1 TA 6 Typ 1 TA Attchd 1965 RFn 2 440 TA TA Y 0 138 0 0 0 0 NA GdWo NA 0 12 2006 WD Abnorml 160000
42 20 RL 115 16905 Pave NA Reg Lvl AllPub Inside Gtl Timber Norm Norm 1Fam 1Story 5 6 1959 1959 Gable CompShg VinylSd VinylSd None 0 TA Gd CBlock TA TA Gd BLQ 967 Unf 0 383 1350 GasA Gd Y SBrkr 1328 0 0 1328 0 1 1 1 2 1 TA 5 Typ 2 Gd Attchd 1959 RFn 1 308 TA TA P 0 104 0 0 0 0 NA NA NA 0 7 2007 WD Normal 170000
43 85 RL NA 9180 Pave NA IR1 Lvl AllPub CulDSac Gtl SawyerW Norm Norm 1Fam SFoyer 5 7 1983 1983 Gable CompShg HdBoard HdBoard None 0 TA TA CBlock Gd TA Av ALQ 747 LwQ 93 0 840 GasA Gd Y SBrkr 884 0 0 884 1 0 1 0 2 1 Gd 5 Typ 0 NA Attchd 1983 RFn 2 504 TA Gd Y 240 0 0 0 0 0 NA MnPrv NA 0 12 2007 WD Normal 144000
44 20 RL NA 9200 Pave NA IR1 Lvl AllPub CulDSac Gtl CollgCr Norm Norm 1Fam 1Story 5 6 1975 1980 Hip CompShg VinylSd VinylSd None 0 TA TA CBlock Gd TA Av LwQ 280 BLQ 491 167 938 GasA TA Y SBrkr 938 0 0 938 1 0 1 0 3 1 TA 5 Typ 0 NA Detchd 1977 Unf 1 308 TA TA Y 145 0 0 0 0 0 NA MnPrv NA 0 7 2008 WD Normal 130250
45 20 RL 70 7945 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 6 1959 1959 Gable CompShg BrkFace Wd Sdng None 0 TA TA CBlock TA TA No ALQ 179 BLQ 506 465 1150 GasA Ex Y FuseA 1150 0 0 1150 1 0 1 0 3 1 TA 6 Typ 0 NA Attchd 1959 RFn 1 300 TA TA Y 0 0 0 0 0 0 NA NA NA 0 5 2006 WD Normal 141000
46 120 RL 61 7658 Pave NA Reg Lvl AllPub Inside Gtl NridgHt Norm Norm TwnhsE 1Story 9 5 2005 2005 Hip CompShg MetalSd MetalSd BrkFace 412 Ex TA PConc Ex TA No GLQ 456 Unf 0 1296 1752 GasA Ex Y SBrkr 1752 0 0 1752 1 0 2 0 2 1 Ex 6 Typ 1 Gd Attchd 2005 RFn 2 576 TA TA Y 196 82 0 0 0 0 NA NA NA 0 2 2010 WD Normal 319900
47 50 RL 48 12822 Pave NA IR1 Lvl AllPub CulDSac Gtl Mitchel Norm Norm 1Fam 1.5Fin 7 5 2003 2003 Gable CompShg VinylSd VinylSd None 0 Gd TA PConc Ex TA No GLQ 1351 Unf 0 83 1434 GasA Ex Y SBrkr 1518 631 0 2149 1 0 1 1 1 1 Gd 6 Typ 1 Ex Attchd 2003 RFn 2 670 TA TA Y 168 43 0 0 198 0 NA NA NA 0 8 2009 WD Abnorml 239686
48 20 FV 84 11096 Pave NA Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 8 5 2006 2006 Gable CompShg VinylSd VinylSd None 0 Gd TA PConc Gd TA Av GLQ 24 Unf 0 1632 1656 GasA Ex Y SBrkr 1656 0 0 1656 0 0 2 0 3 1 Gd 7 Typ 0 NA Attchd 2006 RFn 3 826 TA TA Y 0 146 0 0 0 0 NA NA NA 0 7 2007 WD Normal 249700
49 190 RM 33 4456 Pave NA Reg Lvl AllPub Inside Gtl OldTown Norm Norm 2fmCon 2Story 4 5 1920 2008 Gable CompShg MetalSd MetalSd None 0 TA TA BrkTil TA TA No Unf 0 Unf 0 736 736 GasA Gd Y SBrkr 736 716 0 1452 0 0 2 0 2 3 TA 8 Typ 0 NA NA NA NA 0 0 NA NA N 0 0 102 0 0 0 NA NA NA 0 6 2009 New Partial 113000
50 20 RL 66 7742 Pave NA Reg Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 7 1966 1966 Gable CompShg HdBoard HdBoard None 0 TA TA CBlock TA TA No BLQ 763 Unf 0 192 955 GasA Ex Y SBrkr 955 0 0 955 1 0 1 0 3 1 TA 6 Typ 0 NA Attchd 1966 Unf 1 386 TA TA Y 0 0 0 0 0 0 NA MnPrv NA 0 1 2007 WD Normal 127000
51 60 RL NA 13869 Pave NA IR2 Lvl AllPub Corner Gtl Gilbert Norm Norm 1Fam 2Story 6 6 1997 1997 Gable CompShg VinylSd VinylSd None 0 TA TA PConc Gd TA Av GLQ 182 Unf 0 612 794 GasA Gd Y SBrkr 794 676 0 1470 0 1 2 0 3 1 TA 6 Typ 0 NA Attchd 1997 Fin 2 388 TA TA Y 0 75 0 0 0 0 NA NA NA 0 7 2007 WD Normal 177000
52 50 RM 52 6240 Pave NA Reg Lvl AllPub Inside Gtl BrkSide Norm Norm 1Fam 1.5Fin 6 6 1934 1950 Gable CompShg Wd Sdng Wd Sdng None 0 TA TA PConc TA TA No Unf 0 Unf 0 816 816 GasA TA Y SBrkr 816 0 360 1176 0 0 1 0 3 1 TA 6 Typ 1 Gd Detchd 1985 Unf 2 528 TA TA Y 112 0 0 0 0 0 NA MnPrv Shed 400 9 2006 WD Normal 114500
53 90 RM 110 8472 Grvl NA IR2 Bnk AllPub Corner Mod IDOTRR RRNn Norm Duplex 1Story 5 5 1963 1963 Gable CompShg Wd Sdng Wd Sdng None 0 Fa TA CBlock Gd TA Gd LwQ 104 GLQ 712 0 816 GasA TA N SBrkr 816 0 0 816 1 0 1 0 2 1 TA 5 Typ 0 NA CarPort 1963 Unf 2 516 TA TA Y 106 0 0 0 0 0 NA NA NA 0 5 2010 WD Normal 110000
54 20 RL 68 50271 Pave NA IR1 Low AllPub Inside Gtl Veenker Norm Norm 1Fam 1Story 9 5 1981 1987 Gable WdShngl WdShing Wd Shng None 0 Gd TA CBlock Ex TA Gd GLQ 1810 Unf 0 32 1842 GasA Gd Y SBrkr 1842 0 0 1842 2 0 0 1 0 1 Gd 5 Typ 1 Gd Attchd 1981 Fin 3 894 TA TA Y 857 72 0 0 0 0 NA NA NA 0 11 2006 WD Normal 385000
55 80 RL 60 7134 Pave NA Reg Bnk AllPub Inside Mod NAmes Norm Norm 1Fam SLvl 5 5 1955 1955 Gable CompShg MetalSd MetalSd None 0 TA TA CBlock TA TA No ALQ 384 Unf 0 0 384 GasA TA Y SBrkr 1360 0 0 1360 0 0 1 0 3 1 TA 6 Min1 1 TA Detchd 1962 Unf 2 572 TA TA Y 0 50 0 0 0 0 NA MnPrv NA 0 2 2007 WD Normal 130000
56 20 RL 100 10175 Pave NA IR1 Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 6 5 1964 1964 Gable CompShg HdBoard Plywood BrkFace 272 TA TA CBlock TA TA No BLQ 490 Unf 0 935 1425 GasA Gd Y SBrkr 1425 0 0 1425 0 0 2 0 3 1 TA 7 Typ 1 Gd Attchd 1964 RFn 2 576 TA TA Y 0 0 0 407 0 0 NA NA NA 0 7 2008 WD Normal 180500
57 160 FV 24 2645 Pave Pave Reg Lvl AllPub Inside Gtl Somerst Norm Norm Twnhs 2Story 8 5 1999 2000 Gable CompShg MetalSd MetalSd BrkFace 456 Gd TA PConc Gd TA No GLQ 649 Unf 0 321 970 GasA Ex Y SBrkr 983 756 0 1739 1 0 2 1 3 1 Gd 7 Typ 0 NA Attchd 1999 Fin 2 480 TA TA Y 115 0 0 0 0 0 NA NA NA 0 8 2009 WD Abnorml 172500
58 60 RL 89 11645 Pave NA IR1 Lvl AllPub Corner Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2004 2004 Gable CompShg VinylSd VinylSd None 0 Gd TA PConc Gd TA No Unf 0 Unf 0 860 860 GasA Ex Y SBrkr 860 860 0 1720 0 0 2 1 3 1 Gd 7 Typ 0 NA Attchd 2004 RFn 2 565 TA TA Y 0 70 0 0 0 0 NA NA NA 0 8 2006 WD Normal 196500
59 60 RL 66 13682 Pave NA IR2 HLS AllPub CulDSac Gtl StoneBr Norm Norm 1Fam 2Story 10 5 2006 2006 Hip CompShg VinylSd VinylSd BrkFace 1031 Ex TA PConc Ex TA Gd Unf 0 Unf 0 1410 1410 GasA Ex Y SBrkr 1426 1519 0 2945 0 0 3 1 3 1 Gd 10 Typ 1 Gd BuiltIn 2006 Fin 3 641 TA TA Y 192 0 37 0 0 0 NA NA NA 0 10 2006 New Partial 438780
60 20 RL 60 7200 Pave NA Reg Bnk AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 5 7 1972 1972 Gable CompShg HdBoard HdBoard None 0 TA TA CBlock TA TA Av ALQ 632 Unf 0 148 780 GasA Ex Y SBrkr 780 0 0 780 0 0 1 0 2 1 TA 4 Typ 0 NA Detchd 1973 Unf 1 352 TA TA Y 196 0 0 0 0 0 NA MnPrv NA 0 1 2008 WD Normal 124900
61 20 RL 63 13072 Pave NA Reg Lvl AllPub Inside Gtl SawyerW RRAe Norm 1Fam 1Story 6 5 2004 2004 Gable CompShg VinylSd VinylSd None 0 TA TA PConc Gd TA No ALQ 941 Unf 0 217 1158 GasA Ex Y SBrkr 1158 0 0 1158 1 0 1 1 3 1 Gd 5 Typ 0 NA Detchd 2006 Unf 2 576 TA TA Y 0 50 0 0 0 0 NA NA NA 0 5 2006 New Partial 158000
62 75 RM 60 7200 Pave NA Reg Lvl AllPub Inside Gtl IDOTRR Norm Norm 1Fam 2.5Unf 5 7 1920 1996 Gable CompShg MetalSd MetalSd None 0 TA TA BrkTil TA Fa No Unf 0 Unf 0 530 530 GasA TA N SBrkr 581 530 0 1111 0 0 1 0 3 1 Fa 6 Typ 0 NA Detchd 1935 Unf 1 288 TA TA N 0 0 144 0 0 0 NA NA NA 0 3 2007 WD Normal 101000
63 120 RL 44 6442 Pave NA IR1 Lvl AllPub Inside Gtl NridgHt Norm Norm TwnhsE 1Story 8 5 2006 2006 Gable CompShg VinylSd VinylSd Stone 178 Gd TA PConc Gd Gd Mn GLQ 24 Unf 0 1346 1370 GasA Ex Y SBrkr 1370 0 0 1370 0 0 2 0 2 1 Gd 6 Typ 1 Gd Attchd 2006 RFn 2 484 TA TA Y 120 49 0 0 0 0 NA NA NA 0 10 2007 WD Normal 202500
64 70 RM 50 10300 Pave NA IR1 Bnk AllPub Inside Gtl OldTown RRAn Feedr 1Fam 2Story 7 6 1921 1950 Gable CompShg Stucco Stucco None 0 TA TA BrkTil TA TA No Unf 0 Unf 0 576 576 GasA Gd Y SBrkr 902 808 0 1710 0 0 2 0 3 1 TA 9 Typ 0 NA Detchd 1990 Unf 2 480 TA TA Y 12 11 64 0 0 0 NA GdPrv NA 0 4 2010 WD Normal 140000
65 60 RL NA 9375 Pave NA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 1997 1998 Gable CompShg VinylSd VinylSd BrkFace 573 TA TA PConc Gd TA No GLQ 739 Unf 0 318 1057 GasA Ex Y SBrkr 1057 977 0 2034 1 0 2 1 3 1 Gd 8 Typ 0 NA Attchd 1998 RFn 2 645 TA TA Y 576 36 0 0 0 0 NA GdPrv NA 0 2 2009 WD Normal 219500
66 60 RL 76 9591 Pave NA Reg Lvl AllPub Inside Gtl NridgHt Norm Norm 1Fam 2Story 8 5 2004 2005 Gable CompShg VinylSd VinylSd BrkFace 344 Gd TA PConc Ex TA Av Unf 0 Unf 0 1143 1143 GasA Ex Y SBrkr 1143 1330 0 2473 0 0 2 1 4 1 Gd 9 Typ 1 Gd BuiltIn 2004 RFn 3 852 TA TA Y 192 151 0 0 0 0 NA NA NA 0 10 2007 WD Normal 317000
67 20 RL NA 19900 Pave NA Reg Lvl AllPub Inside Gtl NAmes PosA Norm 1Fam 1Story 7 5 1970 1989 Gable CompShg Plywood Plywood BrkFace 287 TA TA CBlock Gd TA Gd GLQ 912 Unf 0 1035 1947 GasA TA Y SBrkr 2207 0 0 2207 1 0 2 0 3 1 TA 7 Min1 1 Gd Attchd 1970 RFn 2 576 TA TA Y 301 0 0 0 0 0 NA NA NA 0 7 2010 WD Normal 180000
68 20 RL 72 10665 Pave NA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 167 Gd TA PConc Gd TA Av GLQ 1013 Unf 0 440 1453 GasA Ex Y SBrkr 1479 0 0 1479 1 0 2 0 3 1 Gd 7 Typ 0 NA Attchd 2003 RFn 2 558 TA TA Y 144 29 0 0 0 0 NA NA NA 0 6 2007 WD Normal 226000
69 30 RM 47 4608 Pave NA Reg Lvl AllPub Corner Gtl OldTown Artery Norm 1Fam 1Story 4 6 1945 1950 Gable CompShg MetalSd MetalSd None 0 TA Gd CBlock TA TA No Unf 0 Unf 0 747 747 GasA TA Y SBrkr 747 0 0 747 0 0 1 0 2 1 TA 4 Typ 0 NA Attchd 1945 Unf 1 220 TA TA Y 0 0 0 0 0 0 NA NA NA 0 6 2010 WD Normal 80000
70 50 RL 81 15593 Pave NA Reg Lvl AllPub Corner Gtl ClearCr Norm Norm 1Fam 1.5Fin 7 4 1953 1953 Gable CompShg BrkFace AsbShng None 0 Gd TA CBlock TA TA No BLQ 603 Unf 0 701 1304 GasW TA Y SBrkr 1304 983 0 2287 0 0 2 0 3 1 TA 7 Typ 1 TA Attchd 1953 Fin 2 667 TA TA Y 0 21 114 0 0 0 NA NA NA 0 7 2006 WD Normal 225000
71 20 RL 95 13651 Pave NA IR1 Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 7 6 1973 1973 Gable CompShg Plywood Plywood BrkFace 1115 TA Gd CBlock Gd TA Gd ALQ 1880 Unf 0 343 2223 GasA Ex Y SBrkr 2223 0 0 2223 1 0 2 0 3 1 TA 8 Typ 2 Gd Attchd 1973 Fin 2 516 TA TA Y 300 0 0 0 0 0 NA NA NA 0 2 2007 WD Normal 244000
72 20 RL 69 7599 Pave NA Reg Lvl AllPub Corner Gtl Mitchel Norm Norm 1Fam 1Story 4 6 1982 2006 Gable CompShg HdBoard Plywood None 0 TA TA CBlock TA TA No ALQ 565 Unf 0 280 845 GasA TA Y SBrkr 845 0 0 845 1 0 1 0 2 1 TA 4 Typ 0 NA Detchd 1987 Unf 2 360 TA TA Y 0 0 0 0 0 0 NA NA NA 0 6 2007 WD Normal 129500
73 60 RL 74 10141 Pave NA IR1 Lvl AllPub Corner Gtl Gilbert Norm Norm 1Fam 2Story 7 5 1998 1998 Gable CompShg VinylSd VinylSd BrkFace 40 TA TA PConc Gd TA No Unf 0 Unf 0 832 832 GasA Gd Y SBrkr 885 833 0 1718 0 0 2 1 3 1 TA 7 Typ 1 TA Attchd 1998 Fin 2 427 TA TA Y 0 94 0 0 291 0 NA NA NA 0 12 2009 WD Normal 185000
74 20 RL 85 10200 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 7 1954 2003 Gable CompShg Wd Sdng Wd Sdng BrkFace 104 TA TA CBlock TA TA No ALQ 320 BLQ 362 404 1086 GasA Gd Y SBrkr 1086 0 0 1086 1 0 1 0 3 1 TA 6 Typ 0 NA Attchd 1989 Unf 2 490 TA TA Y 0 0 0 0 0 0 NA GdWo NA 0 5 2010 WD Normal 144900
75 50 RM 60 5790 Pave NA Reg Lvl AllPub Corner Gtl OldTown Norm Norm 1Fam 2Story 3 6 1915 1950 Gambrel CompShg VinylSd VinylSd None 0 Gd Gd CBlock Fa TA No Unf 0 Unf 0 840 840 GasA Gd N SBrkr 840 765 0 1605 0 0 2 0 3 2 TA 8 Typ 0 NA Detchd 1915 Unf 1 379 TA TA Y 0 0 202 0 0 0 NA NA NA 0 5 2010 WD Normal 107400
76 180 RM 21 1596 Pave NA Reg Lvl AllPub Inside Gtl MeadowV Norm Norm Twnhs SLvl 4 5 1973 1973 Gable CompShg CemntBd CmentBd None 0 TA TA CBlock Gd TA Gd GLQ 462 Unf 0 0 462 GasA TA Y SBrkr 526 462 0 988 1 0 1 0 2 1 TA 5 Typ 0 NA BuiltIn 1973 Unf 1 297 TA TA Y 120 101 0 0 0 0 NA GdWo NA 0 11 2009 WD Normal 91000
77 20 RL NA 8475 Pave NA IR1 Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 4 7 1956 1956 Gable CompShg VinylSd VinylSd None 0 TA TA CBlock TA TA No ALQ 228 Unf 0 724 952 GasA Ex Y FuseA 952 0 0 952 0 0 1 0 2 1 TA 4 Typ 0 NA Detchd 1956 Unf 1 283 TA TA Y 0 0 0 0 0 0 NA NA NA 0 4 2008 WD Normal 135750
78 50 RM 50 8635 Pave NA Reg Lvl AllPub Inside Gtl BrkSide Norm Norm 1Fam 1.5Fin 5 5 1948 2001 Gable CompShg Wd Sdng Wd Sdng None 0 TA TA CBlock TA TA No BLQ 336 GLQ 41 295 672 GasA TA Y SBrkr 1072 213 0 1285 1 0 1 0 2 1 TA 6 Min1 0 NA Detchd 1948 Unf 1 240 TA TA Y 0 0 0 0 0 0 NA MnPrv NA 0 1 2008 WD Normal 127000
79 90 RL 72 10778 Pave NA Reg Lvl AllPub Inside Gtl Sawyer Norm Norm Duplex 1Story 4 5 1968 1968 Hip CompShg HdBoard HdBoard None 0 TA TA CBlock TA TA No Unf 0 Unf 0 1768 1768 GasA TA N SBrkr 1768 0 0 1768 0 0 2 0 4 2 TA 8 Typ 0 NA NA NA NA 0 0 NA NA Y 0 0 0 0 0 0 NA NA NA 0 4 2010 WD Normal 136500
80 50 RM 60 10440 Pave Grvl Reg Lvl AllPub Corner Gtl OldTown Norm Norm 1Fam 2Story 5 6 1910 1981 Gable CompShg Wd Sdng Wd Sdng None 0 TA TA PConc TA TA No Unf 0 Unf 0 440 440 GasA Gd Y SBrkr 682 548 0 1230 0 0 1 1 2 1 TA 5 Typ 0 NA Detchd 1966 Unf 2 440 TA TA Y 74 0 128 0 0 0 NA MnPrv NA 0 5 2009 WD Normal 110000
81 60 RL 100 13000 Pave NA Reg Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 2Story 6 6 1968 1968 Gable CompShg VinylSd VinylSd BrkFace 576 TA Gd CBlock Gd TA No Rec 448 Unf 0 448 896 GasA TA Y SBrkr 1182 960 0 2142 0 0 2 1 4 1 Gd 8 Typ 1 Gd Attchd 1968 Fin 1 509 TA TA Y 0 72 0 0 252 0 NA NA NA 0 6 2009 WD Normal 193500
82 120 RM 32 4500 Pave NA Reg Lvl AllPub FR2 Gtl Mitchel Norm Norm TwnhsE 1Story 6 5 1998 1998 Hip CompShg VinylSd VinylSd BrkFace 443 TA Gd PConc Ex Gd No GLQ 1201 Unf 0 36 1237 GasA Ex Y SBrkr 1337 0 0 1337 1 0 2 0 2 1 TA 5 Typ 0 NA Attchd 1998 Fin 2 405 TA TA Y 0 199 0 0 0 0 NA NA NA 0 3 2006 WD Normal 153500
83 20 RL 78 10206 Pave NA Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 8 5 2007 2007 Gable CompShg VinylSd VinylSd Stone 468 TA TA PConc Gd TA No GLQ 33 Unf 0 1530 1563 GasA Ex Y SBrkr 1563 0 0 1563 0 0 2 0 3 1 Gd 6 Typ 1 Gd Attchd 2007 RFn 3 758 TA TA Y 144 99 0 0 0 0 NA NA NA 0 10 2008 WD Normal 245000
84 20 RL 80 8892 Pave NA IR1 Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 5 1960 1960 Gable CompShg MetalSd MetalSd BrkCmn 66 TA TA CBlock TA TA No Unf 0 Unf 0 1065 1065 GasA Gd Y SBrkr 1065 0 0 1065 0 0 1 1 3 1 TA 6 Typ 0 NA Detchd 1974 Unf 2 461 TA TA Y 74 0 0 0 0 0 NA NA NA 0 7 2007 COD Normal 126500
85 80 RL NA 8530 Pave NA IR1 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam SLvl 7 5 1995 1996 Gable CompShg HdBoard HdBoard BrkFace 22 TA TA PConc Gd TA No Unf 0 Unf 0 384 384 GasA Gd Y SBrkr 804 670 0 1474 0 0 2 1 3 1 TA 7 Typ 1 TA BuiltIn 1995 Fin 2 400 TA TA Y 120 72 0 0 0 0 NA NA Shed 700 5 2009 WD Normal 168500
86 60 RL 121 16059 Pave NA Reg Lvl AllPub Corner Gtl NoRidge Norm Norm 1Fam 2Story 8 5 1991 1992 Hip CompShg HdBoard HdBoard BrkFace 284 Gd TA CBlock Gd TA No Unf 0 Unf 0 1288 1288 GasA Ex Y SBrkr 1301 1116 0 2417 0 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 1991 Unf 2 462 TA TA Y 127 82 0 0 0 0 NA NA NA 0 4 2006 WD Normal 260000
87 60 RL 122 11911 Pave NA IR2 Lvl AllPub Inside Gtl Gilbert Norm Norm 1Fam 2Story 6 5 2005 2005 Gable CompShg VinylSd VinylSd None 0 Gd TA PConc Gd TA Av Unf 0 Unf 0 684 684 GasA Ex Y SBrkr 684 876 0 1560 0 0 2 1 3 1 Gd 6 Typ 1 Gd BuiltIn 2005 Fin 2 400 TA TA Y 100 38 0 0 0 0 NA NA NA 0 3 2009 WD Normal 174000
88 160 FV 40 3951 Pave Pave Reg Lvl AllPub Corner Gtl Somerst Norm Norm TwnhsE 2Story 6 5 2009 2009 Gable CompShg VinylSd VinylSd Stone 76 Gd TA PConc Gd TA Av Unf 0 Unf 0 612 612 GasA Ex Y SBrkr 612 612 0 1224 0 0 2 1 2 1 Gd 4 Typ 0 NA Detchd 2009 RFn 2 528 TA TA Y 0 234 0 0 0 0 NA NA NA 0 6 2009 New Partial 164500
89 50 C (all) 105 8470 Pave NA IR1 Lvl AllPub Corner Gtl IDOTRR Feedr Feedr 1Fam 1.5Fin 3 2 1915 1982 Hip CompShg Plywood Plywood None 0 Fa Fa CBlock TA Fa No Unf 0 Unf 0 1013 1013 GasA TA N SBrkr 1013 0 513 1526 0 0 1 0 2 1 Fa 6 Typ 0 NA NA NA NA 0 0 NA NA N 0 0 156 0 0 0 NA MnPrv NA 0 10 2009 ConLD Abnorml 85000
90 20 RL 60 8070 Pave NA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 4 5 1994 1995 Gable CompShg VinylSd VinylSd None 0 TA TA PConc Gd TA No GLQ 588 Unf 0 402 990 GasA Ex Y SBrkr 990 0 0 990 1 0 1 0 3 1 TA 5 Typ 0 NA NA NA NA 0 0 NA NA Y 0 0 0 0 0 0 NA NA NA 0 8 2007 WD Normal 123600
91 20 RL 60 7200 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 4 5 1950 1950 Gable CompShg BrkFace Wd Sdng None 0 TA TA Slab NA NA NA NA 0 NA 0 0 0 GasA TA Y FuseA 1040 0 0 1040 0 0 1 0 2 1 TA 4 Typ 0 NA Detchd 1950 Unf 2 420 TA TA Y 0 29 0 0 0 0 NA NA NA 0 7 2006 WD Normal 109900
92 20 RL 85 8500 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 3 1961 1961 Hip CompShg HdBoard HdBoard BrkCmn 203 TA TA CBlock TA TA No Rec 600 Unf 0 635 1235 GasA TA Y SBrkr 1235 0 0 1235 0 0 1 0 2 1 TA 6 Typ 0 NA Attchd 1961 Unf 2 480 TA TA Y 0 0 0 0 0 0 NA GdWo NA 0 12 2006 WD Abnorml 98600
93 30 RL 80 13360 Pave Grvl IR1 HLS AllPub Inside Gtl Crawfor Norm Norm 1Fam 1Story 5 7 1921 2006 Gable CompShg Wd Sdng Wd Sdng None 0 TA Gd BrkTil Gd TA No ALQ 713 Unf 0 163 876 GasA Ex Y SBrkr 964 0 0 964 1 0 1 0 2 1 TA 5 Typ 0 NA Detchd 1921 Unf 2 432 TA TA Y 0 0 44 0 0 0 NA NA NA 0 8 2009 WD Normal 163500
94 190 C (all) 60 7200 Pave NA Reg Lvl AllPub Corner Gtl OldTown Norm Norm 2fmCon 2.5Unf 6 6 1910 1998 Hip CompShg MetalSd MetalSd None 0 TA TA BrkTil TA Fa Mn Rec 1046 Unf 0 168 1214 GasW Ex N SBrkr 1260 1031 0 2291 0 1 2 0 4 2 TA 9 Typ 1 Gd Detchd 1900 Unf 2 506 TA TA Y 0 0 0 0 99 0 NA NA NA 0 11 2007 WD Normal 133900
95 60 RL 69 9337 Pave NA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 6 5 1997 1997 Gable CompShg VinylSd VinylSd None 0 TA Gd PConc Gd TA No GLQ 648 Unf 0 176 824 GasA Ex Y SBrkr 905 881 0 1786 1 0 2 1 3 1 Gd 7 Typ 0 NA Attchd 1997 RFn 2 684 TA TA Y 0 162 0 0 0 0 NA NA NA 0 5 2007 WD Normal 204750
96 60 RL NA 9765 Pave NA IR2 Lvl AllPub Corner Gtl Gilbert Norm Norm 1Fam 2Story 6 8 1993 1993 Gable CompShg VinylSd VinylSd BrkFace 68 Ex Gd PConc Gd Gd No ALQ 310 Unf 0 370 680 GasA Gd Y SBrkr 680 790 0 1470 0 0 2 1 3 1 TA 6 Typ 1 TA BuiltIn 1993 Fin 2 420 TA TA Y 232 63 0 0 0 0 NA NA Shed 480 4 2009 WD Normal 185000
97 20 RL 78 10264 Pave NA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 7 5 1999 1999 Gable CompShg VinylSd VinylSd BrkFace 183 Gd TA PConc Gd TA Av ALQ 1162 Unf 0 426 1588 GasA Ex Y SBrkr 1588 0 0 1588 0 0 2 0 3 1 Gd 6 Typ 0 NA Attchd 1999 RFn 2 472 TA TA Y 158 29 0 0 0 0 NA NA NA 0 8 2006 WD Normal 214000
98 20 RL 73 10921 Pave NA Reg HLS AllPub Inside Gtl Edwards Norm Norm 1Fam 1Story 4 5 1965 1965 Hip CompShg HdBoard HdBoard BrkFace 48 TA TA CBlock TA TA No Rec 520 Unf 0 440 960 GasA TA Y FuseF 960 0 0 960 1 0 1 0 3 1 TA 6 Typ 0 NA Attchd 1965 Fin 1 432 TA TA P 120 0 0 0 0 0 NA NA NA 0 5 2007 WD Normal 94750
99 30 RL 85 10625 Pave NA Reg Lvl AllPub Corner Gtl Edwards Norm Norm 1Fam 1Story 5 5 1920 1950 Gable CompShg Wd Sdng Wd Sdng None 0 TA TA BrkTil TA TA No ALQ 108 Unf 0 350 458 GasA Fa N SBrkr 835 0 0 835 0 0 1 0 2 1 TA 5 Typ 0 NA Basment 1920 Unf 1 366 Fa TA Y 0 0 77 0 0 0 NA NA Shed 400 5 2010 COD Abnorml 83000
100 20 RL 77 9320 Pave NA IR1 Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 4 5 1959 1959 Gable CompShg Plywood Plywood None 0 TA TA CBlock TA TA No ALQ 569 Unf 0 381 950 GasA Fa Y SBrkr 1225 0 0 1225 1 0 1 1 3 1 TA 6 Typ 0 NA NA NA NA 0 0 NA NA Y 352 0 0 0 0 0 NA NA Shed 400 1 2010 WD Normal 128950
dim(train_data)
## [1] 1460   81

There are 81 variables and 1460 observations in the training dataset.

dim(test_data)
## [1] 1459   80

There are 80 variables and 1459 observations in the test dataset.

Summary

summary(train_data)
##        Id           MSSubClass      MSZoning          LotFrontage    
##  Min.   :   1.0   Min.   : 20.0   Length:1460        Min.   : 21.00  
##  1st Qu.: 365.8   1st Qu.: 20.0   Class :character   1st Qu.: 59.00  
##  Median : 730.5   Median : 50.0   Mode  :character   Median : 69.00  
##  Mean   : 730.5   Mean   : 56.9                      Mean   : 70.05  
##  3rd Qu.:1095.2   3rd Qu.: 70.0                      3rd Qu.: 80.00  
##  Max.   :1460.0   Max.   :190.0                      Max.   :313.00  
##                                                      NA's   :259     
##     LotArea          Street             Alley             LotShape        
##  Min.   :  1300   Length:1460        Length:1460        Length:1460       
##  1st Qu.:  7554   Class :character   Class :character   Class :character  
##  Median :  9478   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 10517                                                           
##  3rd Qu.: 11602                                                           
##  Max.   :215245                                                           
##                                                                           
##  LandContour         Utilities          LotConfig        
##  Length:1460        Length:1460        Length:1460       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   LandSlope         Neighborhood        Condition1       
##  Length:1460        Length:1460        Length:1460       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   Condition2          BldgType          HouseStyle         OverallQual    
##  Length:1460        Length:1460        Length:1460        Min.   : 1.000  
##  Class :character   Class :character   Class :character   1st Qu.: 5.000  
##  Mode  :character   Mode  :character   Mode  :character   Median : 6.000  
##                                                           Mean   : 6.099  
##                                                           3rd Qu.: 7.000  
##                                                           Max.   :10.000  
##                                                                           
##   OverallCond      YearBuilt     YearRemodAdd   RoofStyle        
##  Min.   :1.000   Min.   :1872   Min.   :1950   Length:1460       
##  1st Qu.:5.000   1st Qu.:1954   1st Qu.:1967   Class :character  
##  Median :5.000   Median :1973   Median :1994   Mode  :character  
##  Mean   :5.575   Mean   :1971   Mean   :1985                     
##  3rd Qu.:6.000   3rd Qu.:2000   3rd Qu.:2004                     
##  Max.   :9.000   Max.   :2010   Max.   :2010                     
##                                                                  
##    RoofMatl         Exterior1st        Exterior2nd       
##  Length:1460        Length:1460        Length:1460       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   MasVnrType          MasVnrArea      ExterQual          ExterCond        
##  Length:1460        Min.   :   0.0   Length:1460        Length:1460       
##  Class :character   1st Qu.:   0.0   Class :character   Class :character  
##  Mode  :character   Median :   0.0   Mode  :character   Mode  :character  
##                     Mean   : 103.7                                        
##                     3rd Qu.: 166.0                                        
##                     Max.   :1600.0                                        
##                     NA's   :8                                             
##   Foundation          BsmtQual           BsmtCond        
##  Length:1460        Length:1460        Length:1460       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##  BsmtExposure       BsmtFinType1         BsmtFinSF1     BsmtFinType2      
##  Length:1460        Length:1460        Min.   :   0.0   Length:1460       
##  Class :character   Class :character   1st Qu.:   0.0   Class :character  
##  Mode  :character   Mode  :character   Median : 383.5   Mode  :character  
##                                        Mean   : 443.6                     
##                                        3rd Qu.: 712.2                     
##                                        Max.   :5644.0                     
##                                                                           
##    BsmtFinSF2        BsmtUnfSF       TotalBsmtSF       Heating         
##  Min.   :   0.00   Min.   :   0.0   Min.   :   0.0   Length:1460       
##  1st Qu.:   0.00   1st Qu.: 223.0   1st Qu.: 795.8   Class :character  
##  Median :   0.00   Median : 477.5   Median : 991.5   Mode  :character  
##  Mean   :  46.55   Mean   : 567.2   Mean   :1057.4                     
##  3rd Qu.:   0.00   3rd Qu.: 808.0   3rd Qu.:1298.2                     
##  Max.   :1474.00   Max.   :2336.0   Max.   :6110.0                     
##                                                                        
##   HeatingQC          CentralAir         Electrical          X1stFlrSF   
##  Length:1460        Length:1460        Length:1460        Min.   : 334  
##  Class :character   Class :character   Class :character   1st Qu.: 882  
##  Mode  :character   Mode  :character   Mode  :character   Median :1087  
##                                                           Mean   :1163  
##                                                           3rd Qu.:1391  
##                                                           Max.   :4692  
##                                                                         
##    X2ndFlrSF     LowQualFinSF       GrLivArea     BsmtFullBath   
##  Min.   :   0   Min.   :  0.000   Min.   : 334   Min.   :0.0000  
##  1st Qu.:   0   1st Qu.:  0.000   1st Qu.:1130   1st Qu.:0.0000  
##  Median :   0   Median :  0.000   Median :1464   Median :0.0000  
##  Mean   : 347   Mean   :  5.845   Mean   :1515   Mean   :0.4253  
##  3rd Qu.: 728   3rd Qu.:  0.000   3rd Qu.:1777   3rd Qu.:1.0000  
##  Max.   :2065   Max.   :572.000   Max.   :5642   Max.   :3.0000  
##                                                                  
##   BsmtHalfBath        FullBath        HalfBath       BedroomAbvGr  
##  Min.   :0.00000   Min.   :0.000   Min.   :0.0000   Min.   :0.000  
##  1st Qu.:0.00000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:2.000  
##  Median :0.00000   Median :2.000   Median :0.0000   Median :3.000  
##  Mean   :0.05753   Mean   :1.565   Mean   :0.3829   Mean   :2.866  
##  3rd Qu.:0.00000   3rd Qu.:2.000   3rd Qu.:1.0000   3rd Qu.:3.000  
##  Max.   :2.00000   Max.   :3.000   Max.   :2.0000   Max.   :8.000  
##                                                                    
##   KitchenAbvGr   KitchenQual         TotRmsAbvGrd     Functional       
##  Min.   :0.000   Length:1460        Min.   : 2.000   Length:1460       
##  1st Qu.:1.000   Class :character   1st Qu.: 5.000   Class :character  
##  Median :1.000   Mode  :character   Median : 6.000   Mode  :character  
##  Mean   :1.047                      Mean   : 6.518                     
##  3rd Qu.:1.000                      3rd Qu.: 7.000                     
##  Max.   :3.000                      Max.   :14.000                     
##                                                                        
##    Fireplaces    FireplaceQu         GarageType         GarageYrBlt  
##  Min.   :0.000   Length:1460        Length:1460        Min.   :1900  
##  1st Qu.:0.000   Class :character   Class :character   1st Qu.:1961  
##  Median :1.000   Mode  :character   Mode  :character   Median :1980  
##  Mean   :0.613                                         Mean   :1979  
##  3rd Qu.:1.000                                         3rd Qu.:2002  
##  Max.   :3.000                                         Max.   :2010  
##                                                        NA's   :81    
##  GarageFinish         GarageCars      GarageArea      GarageQual       
##  Length:1460        Min.   :0.000   Min.   :   0.0   Length:1460       
##  Class :character   1st Qu.:1.000   1st Qu.: 334.5   Class :character  
##  Mode  :character   Median :2.000   Median : 480.0   Mode  :character  
##                     Mean   :1.767   Mean   : 473.0                     
##                     3rd Qu.:2.000   3rd Qu.: 576.0                     
##                     Max.   :4.000   Max.   :1418.0                     
##                                                                        
##   GarageCond         PavedDrive          WoodDeckSF      OpenPorchSF    
##  Length:1460        Length:1460        Min.   :  0.00   Min.   :  0.00  
##  Class :character   Class :character   1st Qu.:  0.00   1st Qu.:  0.00  
##  Mode  :character   Mode  :character   Median :  0.00   Median : 25.00  
##                                        Mean   : 94.24   Mean   : 46.66  
##                                        3rd Qu.:168.00   3rd Qu.: 68.00  
##                                        Max.   :857.00   Max.   :547.00  
##                                                                         
##  EnclosedPorch      X3SsnPorch      ScreenPorch        PoolArea      
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.000  
##  1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.00   1st Qu.:  0.000  
##  Median :  0.00   Median :  0.00   Median :  0.00   Median :  0.000  
##  Mean   : 21.95   Mean   :  3.41   Mean   : 15.06   Mean   :  2.759  
##  3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.00   3rd Qu.:  0.000  
##  Max.   :552.00   Max.   :508.00   Max.   :480.00   Max.   :738.000  
##                                                                      
##     PoolQC             Fence           MiscFeature       
##  Length:1460        Length:1460        Length:1460       
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##     MiscVal             MoSold           YrSold       SaleType        
##  Min.   :    0.00   Min.   : 1.000   Min.   :2006   Length:1460       
##  1st Qu.:    0.00   1st Qu.: 5.000   1st Qu.:2007   Class :character  
##  Median :    0.00   Median : 6.000   Median :2008   Mode  :character  
##  Mean   :   43.49   Mean   : 6.322   Mean   :2008                     
##  3rd Qu.:    0.00   3rd Qu.: 8.000   3rd Qu.:2009                     
##  Max.   :15500.00   Max.   :12.000   Max.   :2010                     
##                                                                       
##  SaleCondition        SalePrice     
##  Length:1460        Min.   : 34900  
##  Class :character   1st Qu.:129975  
##  Mode  :character   Median :163000  
##                     Mean   :180921  
##                     3rd Qu.:214000  
##                     Max.   :755000  
## 

To name a few highlights: the median house has 2 full baths (the mean is about 1.57) and the maximum is 3. The mean sale price is about $180,921, while the most expensive house sold for $755,000.

Histogram of all numeric values

train_data %>% 
  keep(is.numeric) %>% 
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 348 rows containing non-finite values (stat_bin).

Histogram of Sale Prices

SP_hist <- ggplot(train_data, aes(x = SalePrice)) + geom_histogram(fill = "lightblue") + geom_vline(aes(xintercept=mean(SalePrice)), color="darkgreen", linetype="dashed", size=1)

SP_boxplot <- ggplot(train_data, aes(x = "", y = SalePrice)) + geom_boxplot(fill = "lightblue") + xlab("SalePrice")

grid.arrange(SP_hist, SP_boxplot, ncol=2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Sale Price against Sale Condition.

ggplot(train_data, aes(x=SaleCondition, y=SalePrice, fill=SaleCondition)) + geom_bar(stat="identity") + theme_minimal() + scale_fill_brewer(palette="Set3")

For better readability:

  • Normal: Normal Sale
  • Abnorml: Abnormal Sale (trade, foreclosure, short sale)
  • AdjLand: Adjoining Land Purchase
  • Alloca: Allocation (two linked properties with separate deeds, typically condo with a garage unit)
  • Family: Sale between family members
  • Partial: Home was not completed when last assessed (associated with New Homes)

Overall Quality of the houses

hquality <- data.frame(table(train_data$OverallQual))
ggplot(hquality, aes(x=Var1, y=Freq)) + geom_bar(stat="identity", color="brown", fill="orange") + xlab("Rate")

Most houses sold have an overall quality rating in the middle of the scale, around 5 to 6 (the median is 6 and the mean is about 6.1). This makes sense because fixing up a house in very poor condition is expensive, and the average buyer cannot afford houses rated 10, which are also expensive. The distribution is slightly skewed.

How old are the houses?

year_built <- data.frame(table(train_data$YearBuilt))
ggplot(year_built, aes(x=Var1, y=Freq)) + geom_bar(stat="identity", color="blue", fill="white") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab( "Year")

The oldest houses date from the 19th century; construction years range from 1872 to 2010. Construction peaks in the mid-2000s, around 2005.

House Foundation

Price for houses based on type of foundation.

ggplot(train_data, aes(x=Foundation, y=SalePrice, fill = Foundation)) + geom_boxplot()

For reference:

  • BrkTil: Brick & Tile
  • CBlock: Cinder Block
  • PConc: Poured Concrete
  • Slab: Slab
  • Stone: Stone
  • Wood: Wood

Building Type Sales

describeBy(train_data$SalePrice, group = train_data$BldgType)
## 
##  Descriptive statistics by group 
## group: 1Fam
##    vars    n     mean      sd median  trimmed      mad   min    max  range
## X1    1 1220 185763.8 82648.5 167900 175198.4 60638.34 34900 755000 720100
##    skew kurtosis      se
## X1 1.83     6.09 2366.22
## -------------------------------------------------------- 
## group: 2fmCon
##    vars  n     mean       sd median trimmed   mad   min    max  range skew
## X1    1 31 128432.3 35458.55 127500  126598 29652 55000 228950 173950 0.56
##    kurtosis      se
## X1     0.68 6368.54
## -------------------------------------------------------- 
## group: Duplex
##    vars  n     mean       sd median  trimmed      mad   min    max  range
## X1    1 52 133541.1 27833.25 135980 131867.5 21453.22 82000 206300 124300
##    skew kurtosis      se
## X1 0.48     0.24 3859.78
## -------------------------------------------------------- 
## group: Twnhs
##    vars  n     mean       sd median  trimmed     mad   min    max  range
## X1    1 43 135911.6 41013.22 137500 133648.6 54114.9 75000 230000 155000
##    skew kurtosis      se
## X1 0.27    -1.01 6254.46
## -------------------------------------------------------- 
## group: TwnhsE
##    vars   n     mean       sd median  trimmed      mad   min    max  range
## X1    1 114 181959.3 60626.11 172200 176496.9 45712.26 75500 392500 317000
##    skew kurtosis      se
## X1 0.98     1.07 5678.16

For Reference:

  • 1Fam: Single-family Detached
  • 2fmCon: Two-family Conversion; originally built as one-family dwelling
  • Duplex: Duplex
  • TwnhsE: Townhouse End Unit
  • Twnhs: Townhouse Inside Unit

Scatterplot Matrix

# columns 5, 19, 39, 63, 81: LotArea, OverallCond, TotalBsmtSF, GarageArea, SalePrice
pairs(train_data[,c(5, 19, 39, 63, 81)], pch = 19)

We can spot a positive association between SalePrice and LotArea and between SalePrice and TotalBsmtSF.

Correlation Matrix

corr_mat <- cor(train_data[, c(5, 63, 81)]); corr_mat
##              LotArea GarageArea SalePrice
## LotArea    1.0000000  0.1804028 0.2638434
## GarageArea 0.1804028  1.0000000 0.6234314
## SalePrice  0.2638434  0.6234314 1.0000000
corrplot(corr_mat, type="upper", order="hclust")

Test Hypothesis of Correlations

\(H_0: \rho = 0 \rightarrow \text{There is no correlation}\\H_A: \rho \neq 0 \rightarrow \text{There is a correlation}\)

Lot Area and Sale Price

cor.test(train_data$LotArea, train_data$SalePrice, conf.level = 0.8)
## 
##  Pearson's product-moment correlation
## 
## data:  train_data$LotArea and train_data$SalePrice
## t = 10.445, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.2323391 0.2947946
## sample estimates:
##       cor 
## 0.2638434

Garage Area and Sale Price

cor.test(train_data$GarageArea, train_data$SalePrice, conf.level = 0.8)
## 
##  Pearson's product-moment correlation
## 
## data:  train_data$GarageArea and train_data$SalePrice
## t = 30.446, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.6024756 0.6435283
## sample estimates:
##       cor 
## 0.6234314

Lot Area and Garage Area

cor.test(train_data$LotArea, train_data$GarageArea, conf.level = 0.8)
## 
##  Pearson's product-moment correlation
## 
## data:  train_data$LotArea and train_data$GarageArea
## t = 7.0034, df = 1458, p-value = 3.803e-12
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
##  0.1477356 0.2126767
## sample estimates:
##       cor 
## 0.1804028

For all three pairs we reject the null hypothesis of no correlation: every p-value is far below 0.05, and none of the 80% confidence intervals contains zero. Garage Area and Sale Price are the most strongly correlated pair (r ≈ 0.62), while Lot Area is only weakly correlated with Sale Price (r ≈ 0.26) and with Garage Area (r ≈ 0.18).
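As a check on the output above, the 80% interval and t statistic for Lot Area and Sale Price can be reproduced by hand. This is a minimal sketch that assumes the standard Fisher z-transformation for a Pearson correlation and plugs in the printed values (r = 0.2638434, n = 1460); the variable names are illustrative only.

```r
# Reproduce the cor.test() interval for LotArea vs SalePrice by hand.
r <- 0.2638434          # sample correlation printed above
n <- 1460               # number of observations (df = n - 2 = 1458)
conf_level <- 0.80

# Fisher z-transformation: atanh(r) is approximately normal
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
crit <- qnorm(1 - (1 - conf_level) / 2)

# Back-transform the interval to the correlation scale
ci <- tanh(z + c(-1, 1) * crit * se)

# t statistic for H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

print(ci)
print(t_stat)
```

The same steps apply to the other two pairs by swapping in their printed correlations.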

Family-wise error

The familywise error rate (FWE or FWER) is the probability of reaching at least one false positive conclusion in a series of hypothesis tests. In other words, it is the probability of making at least one Type I error.

Let us check the family-wise error rate \(\rightarrow FWE \leq 1 - (1 - \alpha_{IT})^c\)

Where:

\(\alpha_{IT}\) = alpha level for an individual test (e.g. .05), c = Number of comparisons.

Alpha Level 5% or 95% confidence interval

1 - (1 - 0.05)^3
## [1] 0.142625

In this case we would not worry much about familywise error: the probability of at least one Type I error across the three tests is just over 14%, which is low because we performed only three tests.

Alpha Level 20% or 80% confidence interval

1 - (1 - 0.2)^3
## [1] 0.488

At the 20% level, on the other hand, we would worry about familywise error: the probability of at least one Type I error is about 49%, which is high considering that we performed only three tests.
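The two calculations above can be wrapped in a small helper, together with a Bonferroni adjustment as one common way to control the familywise rate. This is a sketch only: `fwer` is a hypothetical helper name, the bound assumes independent tests, and the p-values passed to `p.adjust` are the upper bounds printed by `cor.test` (`< 2.2e-16`), not exact values.

```r
# Family-wise error bound for n_tests independent tests at per-test level alpha
fwer <- function(alpha, n_tests) 1 - (1 - alpha)^n_tests

n_tests <- 3
fwer_05 <- fwer(0.05, n_tests)   # per-test alpha of 5%
fwer_20 <- fwer(0.20, n_tests)   # per-test alpha of 20%

# Bonferroni: split a target family-wise level evenly across the tests
target <- 0.20
alpha_bonf <- target / n_tests
fwer_bonf <- fwer(alpha_bonf, n_tests)   # stays at or below the target

# Bonferroni-adjusted p-values (upper bounds taken from the output above)
p_raw <- c(LotArea_SalePrice = 2.2e-16,
           GarageArea_SalePrice = 2.2e-16,
           LotArea_GarageArea = 3.803e-12)
p_adj <- p.adjust(p_raw, method = "bonferroni")

print(c(fwer_05 = fwer_05, fwer_20 = fwer_20, fwer_bonf = fwer_bonf))
print(p_adj)
```

Even after adjustment the three p-values remain far below any conventional threshold, so the correlation conclusions do not depend on the choice of per-test alpha.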


Linear Algebra and Correlation

Invert your correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.) Multiply the correlation matrix by the precision matrix, and then multiply the precision matrix by the correlation matrix. Conduct LU decomposition on the matrix.

Invert Correlation Matrix (Precision Matrix)

precision_mat <- solve(corr_mat)
precision_mat
##                LotArea  GarageArea  SalePrice
## LotArea     1.07530074 -0.02799273 -0.2662594
## GarageArea -0.02799273  1.63649778 -1.0128585
## SalePrice  -0.26625940 -1.01285847  1.7016986
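To illustrate why the diagonal of the precision matrix holds the variance inflation factors, the sketch below rebuilds the correlation matrix from the printed values and compares each diagonal entry with \(1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing variable j on the other two. The names `corr_printed` and `vif_from_r2` are illustrative and not part of the original analysis.

```r
# Correlation matrix re-entered from the printed output above
vars <- c("LotArea", "GarageArea", "SalePrice")
corr_printed <- matrix(c(1,         0.1804028, 0.2638434,
                         0.1804028, 1,         0.6234314,
                         0.2638434, 0.6234314, 1),
                       nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

precision <- solve(corr_printed)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 computed from the correlations alone
vif_from_r2 <- sapply(seq_along(vars), function(j) {
  r_j    <- corr_printed[-j, j]    # correlations of variable j with the others
  R_rest <- corr_printed[-j, -j]   # correlations among the other variables
  r2 <- drop(t(r_j) %*% solve(R_rest) %*% r_j)
  1 / (1 - r2)
})
names(vif_from_r2) <- vars

print(cbind(precision_diag = diag(precision), vif = vif_from_r2))
```

All three values are well below common rule-of-thumb cutoffs such as 5 or 10, which suggests little multicollinearity among these variables.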

Correlation Matrix \(\times\) Precision Matrix

corr_mat %*% precision_mat
##            LotArea GarageArea SalePrice
## LotArea          1          0         0
## GarageArea       0          1         0
## SalePrice        0          0         1

Precision Matrix \(\times\) Correlation Matrix

mult_mat <- precision_mat %*% corr_mat
mult_mat
##            LotArea   GarageArea     SalePrice
## LotArea          1 0.000000e+00  0.000000e+00
## GarageArea       0 1.000000e+00 -2.220446e-16
## SalePrice        0 2.220446e-16  1.000000e+00

LU Decomposition

lu_mult_mat <-lu.decomposition(mult_mat)

lu_mult_mat
## $L
##      [,1]         [,2] [,3]
## [1,]    1 0.000000e+00    0
## [2,]    0 1.000000e+00    0
## [3,]    0 2.220446e-16    1
## 
## $U
##      [,1] [,2]          [,3]
## [1,]    1    0  0.000000e+00
## [2,]    0    1 -2.220446e-16
## [3,]    0    0  1.000000e+00
#proof
lu_mult_mat$L %*% lu_mult_mat$U
##      [,1]         [,2]          [,3]
## [1,]    1 0.000000e+00  0.000000e+00
## [2,]    0 1.000000e+00 -2.220446e-16
## [3,]    0 2.220446e-16  1.000000e+00

Calculus-Based Probability & Statistics

Many times, it makes sense to fit a closed form distribution to data. Select a variable in the Kaggle.com training dataset that is skewed to the right, shift it so that the minimum value is absolutely above zero if necessary. Then load the MASS package and run fitdistr to fit an exponential probability density function. (See https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html ). Find the optimal value of \(\lambda\) for this distribution, and then take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, \(\lambda\))). Plot a histogram and compare it with a histogram of your original variable. Using the exponential pdf, find the \(5^{th}\) and \(95^{th}\) percentiles using the cumulative distribution function (CDF). Also generate a 95% confidence interval from the empirical data, assuming normality. Finally, provide the empirical 5th percentile and 95th percentile of the data. Discuss.

The variable I will use is GrLivArea.

# no need to shift: the minimum is already above zero
min(train_data$GrLivArea)
## [1] 334
# fit exponential pdf
fit <- fitdistr(train_data$GrLivArea, densfun = "exponential")

#optimal value for lambda
lambda <- fit$estimate

#1000 samples
exp_hist <- rexp(1000, lambda)

# compare histograms
G1 <- ggplot(train_data, aes(x = GrLivArea)) + geom_histogram(fill = "white", col = "black")
G2 <- ggplot()+ aes(exp_hist) + geom_histogram(fill = "white", col = "black") + xlab("Exp_GrLivArea")

grid.arrange(G1, G2, ncol= 2)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Compared with the original data, the exponential sample is concentrated much closer to zero and has an even longer right tail (more spread).

qexp(c(0.05, 0.95),rate=lambda) ## 5th and 95th percentile
## [1]   77.73313 4539.92351
#0.95 confidence interval
CI(train_data$GrLivArea, ci=0.95)
##    upper     mean    lower 
## 1542.440 1515.464 1488.487
# 5th and 95th percentile of empirical data
quantile(train_data$GrLivArea, c(0.05, 0.95))
##     5%    95% 
##  848.0 2466.1

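Based on the results above, the exponential model is not a good fit for this data. The exponential 5th and 95th percentiles (about 78 and 4,540) span a far wider range than the empirical ones (848 and 2,466.1), so the fitted distribution puts too much mass both near zero and in the far right tail. Note that the 95% confidence interval (about 1,488.5 to 1,542.4) is an interval for the mean of GrLivArea under a normality assumption, not a range expected to contain 95% of the observations, which is why it is so much narrower than either set of percentiles.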
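The exponential percentiles can also be derived in closed form. For an exponential distribution the maximum-likelihood rate is \(\hat{\lambda} = 1/\bar{x}\), and inverting the CDF \(F(x) = 1 - e^{-\lambda x}\) gives the quantile \(x_p = -\ln(1 - p)/\lambda\). The sketch below uses the sample mean printed by `CI()` above (1515.464); the variable names are illustrative.

```r
# Closed-form exponential quantiles from the sample mean of GrLivArea
x_bar <- 1515.464            # sample mean printed by CI() above
lambda_hat <- 1 / x_bar      # MLE of the exponential rate

p <- c(0.05, 0.95)
q_closed  <- -log(1 - p) / lambda_hat     # inverse of F(x) = 1 - exp(-lambda * x)
q_builtin <- qexp(p, rate = lambda_hat)   # same quantiles via the built-in function

print(q_closed)
print(q_builtin)
```

Both approaches agree with the `qexp` output reported above, up to rounding of the mean.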


Modeling

Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score.

Missing Data

aggr(train_data)

#Impute data using sample mean (Will only work for quantitative variables)
train_data <- complete(mice(data = train_data, m = 1, method = "mean"))
## 
##  iter imp variable
##   1   1  LotFrontage  MasVnrArea  GarageYrBlt
##   2   1  LotFrontage  MasVnrArea  GarageYrBlt
##   3   1  LotFrontage  MasVnrArea  GarageYrBlt
##   4   1  LotFrontage  MasVnrArea  GarageYrBlt
##   5   1  LotFrontage  MasVnrArea  GarageYrBlt
## Warning: Number of logged events: 58
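For intuition, mean imputation simply replaces each missing numeric value with the mean of the observed values in that column. The base R sketch below shows this on a small made-up data frame; `toy` and `impute_mean` are hypothetical and only illustrate the idea, they are not part of the mice workflow above.

```r
# Toy data frame with missing values in one numeric column (values are made up)
toy <- data.frame(LotFrontage = c(65, NA, 80, NA, 70),
                  LotArea     = c(8450, 9600, 11250, 9550, 14260))

# Replace NAs in a numeric column with the mean of its observed values
impute_mean <- function(col) {
  if (is.numeric(col)) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
  }
  col
}

toy_imputed <- as.data.frame(lapply(toy, impute_mean))
print(toy_imputed)
```

Because every missing entry receives the same value, this approach shrinks the variance of the imputed column, which is worth keeping in mind when interpreting the regression below.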

Original Model

predSales.lm <- lm(SalePrice ~ MSSubClass +  LotArea + YearBuilt + YearRemodAdd + TotalBsmtSF + GrLivArea + FullBath + BedroomAbvGr + KitchenAbvGr, data = train_data)

summary(predSales.lm)
## 
## Call:
## lm(formula = SalePrice ~ MSSubClass + LotArea + YearBuilt + YearRemodAdd + 
##     TotalBsmtSF + GrLivArea + FullBath + BedroomAbvGr + KitchenAbvGr, 
##     data = train_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -613667  -19392   -2680   15284  260699 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.024e+06  1.293e+05 -15.648  < 2e-16 ***
## MSSubClass   -1.404e+02  2.862e+01  -4.906 1.04e-06 ***
## LotArea       4.441e-01  1.138e-01   3.903 9.92e-05 ***
## YearBuilt     5.965e+02  5.010e+01  11.907  < 2e-16 ***
## YearRemodAdd  4.647e+02  6.760e+01   6.874 9.23e-12 ***
## TotalBsmtSF   3.191e+01  3.212e+00   9.936  < 2e-16 ***
## GrLivArea     9.067e+01  3.307e+00  27.420  < 2e-16 ***
## FullBath      4.227e+03  2.938e+03   1.439    0.151    
## BedroomAbvGr -1.318e+04  1.673e+03  -7.879 6.45e-15 ***
## KitchenAbvGr -2.921e+04  5.364e+03  -5.447 6.01e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 40740 on 1450 degrees of freedom
## Multiple R-squared:  0.7387, Adjusted R-squared:  0.737 
## F-statistic: 455.3 on 9 and 1450 DF,  p-value: < 2.2e-16

Final Model

Predictors with low significance to the model are removed manually as shown below step-by-step with the update function.

Removing FullBath

predSales.lm <- update(predSales.lm, .~. - FullBath, data = train_data)
summary(predSales.lm)
## 
## Call:
## lm(formula = SalePrice ~ MSSubClass + LotArea + YearBuilt + YearRemodAdd + 
##     TotalBsmtSF + GrLivArea + BedroomAbvGr + KitchenAbvGr, data = train_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -620246  -18610   -2167   15265  260299 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.106e+06  1.159e+05 -18.171  < 2e-16 ***
## MSSubClass   -1.384e+02  2.860e+01  -4.838 1.45e-06 ***
## LotArea       4.449e-01  1.138e-01   3.909 9.70e-05 ***
## YearBuilt     6.230e+02  4.661e+01  13.367  < 2e-16 ***
## YearRemodAdd  4.806e+02  6.672e+01   7.204 9.36e-13 ***
## TotalBsmtSF   3.164e+01  3.207e+00   9.865  < 2e-16 ***
## GrLivArea     9.270e+01  2.990e+00  31.000  < 2e-16 ***
## BedroomAbvGr -1.281e+04  1.653e+03  -7.746 1.77e-14 ***
## KitchenAbvGr -2.786e+04  5.282e+03  -5.274 1.53e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 40750 on 1451 degrees of freedom
## Multiple R-squared:  0.7383, Adjusted R-squared:  0.7368 
## F-statistic: 511.6 on 8 and 1451 DF,  p-value: < 2.2e-16

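This model is acceptable. Every remaining predictor has a very small p-value, so each one is statistically significant. According to the \(Adjusted \ R^2\), the model explains 73.68% of the variation of the observed values around the mean, essentially unchanged from the 73.7% obtained before FullBath was removed. The F-statistic of 511.6 on 8 and 1451 DF has an extremely small p-value, which means the predictors are jointly significant compared with an intercept-only model. It does not show that removing FullBath improved the fit; dropping it simply gives a more parsimonious model at almost no cost in explained variance.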
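Whether dropping a predictor is justified can be checked directly with a partial F-test between the two nested models. The sketch below is a minimal illustration on simulated data, not on the housing data: the data frame `sim` and its columns `area`, `baths` and `price` are hypothetical names chosen for the example.

```r
# Simulated stand-in for the housing data; every name and value is hypothetical.
set.seed(42)
n <- 200
sim <- data.frame(
  area  = rnorm(n, mean = 1500, sd = 400),
  baths = rpois(n, lambda = 2)
)
# Price depends on area only, so `baths` is pure noise by construction.
sim$price <- 50000 + 90 * sim$area + rnorm(n, sd = 20000)

full_fit    <- lm(price ~ area + baths, data = sim)
reduced_fit <- update(full_fit, . ~ . - baths)

# Partial F-test: does dropping `baths` significantly worsen the fit?
cmp <- anova(reduced_fit, full_fit)
print(cmp)

# Adjusted R-squared of both fits, side by side.
adj_r2 <- c(full    = summary(full_fit)$adj.r.squared,
            reduced = summary(reduced_fit)$adj.r.squared)
print(adj_r2)
```

When only one term is dropped, the p-value of this F-test equals the p-value of that term's t-test in the full model, so the same comparison on the models above would be expected to reproduce the 0.151 reported for FullBath.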

Residual Analysis

predSales_df <- augment(predSales.lm)

ggplot(predSales_df, aes(x = .fitted, y = .resid)) + geom_point() + geom_hline(yintercept=0, color = 'brown', size = 1) + ggtitle('Residual vs Fitted')

plot(predSales.lm, 1)

The residuals show a slightly curved pattern, which indicates a small degree of non-linearity. This could be because we did not transform or normalize our data. The second plot also shows about three to four extreme values far above and below the zero line, which means that the model did not capture those data points well. If we delete them, the model could improve.

ggplot(predSales_df, aes(x = .std.resid)) + geom_histogram(aes(y = after_stat(density)), bins = 50, colour = "black") +
 geom_density(alpha = .2, fill = "yellow") + ggtitle('Histogram of Residuals')

Based on the histogram above, the residuals have a heavy-tailed distribution. The central part of the histogram looks approximately normal, but both ends extend much further than we would expect under a normal distribution.

qplot(sample =.std.resid, data = predSales_df) + geom_abline()+ ggtitle('Normal Q-Q Plot')

plot(predSales.lm, 2)

Here we check whether the residuals follow a normal distribution. In our case the residual points follow the dotted line closely except at the tails, where they curve away from the line in opposite directions. This is consistent with the heavy tails seen in the histogram.
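A formal test can complement the visual reading of the Q-Q plot. The sketch below applies the Shapiro-Wilk test from base R to the standardized residuals of a small simulated regression; the variables `x`, `y` and `fit` are hypothetical, and the errors are drawn from a t distribution with 3 degrees of freedom only to mimic heavy tails.

```r
# Simulated regression with heavy-tailed errors; all names are hypothetical.
set.seed(1)
n <- 500
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rt(n, df = 3)   # t(3) errors have heavier tails than normal errors

fit <- lm(y ~ x)

# Shapiro-Wilk test of normality on the standardized residuals.
# A small p-value is evidence against normally distributed residuals.
sw <- shapiro.test(rstandard(fit))
print(sw)
```

`shapiro.test()` accepts between 3 and 5000 values, so it could be applied to the residuals of a model fitted on 1460 rows. With samples this large the test tends to flag even mild departures from normality, so it is best read together with the plot.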

plot(predSales.lm, 3)

The scale-location plot shows the spread of the residuals across the range of predicted values. A horizontal red line would be ideal and would indicate homoscedasticity (equal or uniform variance). However, our model shows heteroscedasticity (non-uniform variance), as the residuals spread out. The higher the price of the houses, the fewer observations there are; the observations are densely concentrated around the line in the low and medium price ranges.
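The visual impression of non-uniform variance can be backed by a Breusch-Pagan style test. The sketch below computes Koenker's studentized version by hand in base R on simulated data, where the error spread grows with the predictor by construction. The names `x`, `y`, `fit` and `aux` are hypothetical; the `lmtest` package provides `bptest()` for the same purpose, but it is not among the libraries loaded in this report.

```r
# Simulated heteroscedastic data; all names and values are hypothetical.
set.seed(7)
n <- 400
x <- runif(n, 1, 10)
y <- 5 + 2 * x + rnorm(n, sd = 0.5 * x)   # error sd increases with x

fit <- lm(y ~ x)

# Auxiliary regression: squared residuals on the original predictor(s).
sq_res <- resid(fit)^2
aux <- lm(sq_res ~ x)

# Studentized (Koenker) statistic: n times the R-squared of the auxiliary fit,
# compared with a chi-square distribution with one df per predictor.
lm_stat <- n * summary(aux)$r.squared
df_aux  <- length(coef(aux)) - 1
p_value <- pchisq(lm_stat, df = df_aux, lower.tail = FALSE)

cat("LM statistic:", lm_stat, " df:", df_aux, " p-value:", p_value, "\n")
```

A small p-value rejects the hypothesis of constant variance. For the housing model, the auxiliary regression would use the same eight predictors as the final model.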

plot(predSales.lm, 4)

This plot of Cook's distance tells us which points have the greatest influence on the regression (influential points). We see that observations 524, 692 and 1299 have the greatest influence on the model. Perhaps, if we were to remove those outliers, the residuals of our model would be closer to normal.
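The effect of removing influential points can be explored numerically with `cooks.distance()`. The sketch below is a minimal example on simulated data with one planted extreme observation; the data frame `dat` is hypothetical, and the 4/n cutoff is a common rule of thumb rather than a formal test.

```r
# Simulated data with one deliberately extreme observation; names are hypothetical.
set.seed(99)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
x[n] <- 6      # far from the other x values (high leverage)
y[n] <- -20    # far from the value the linear trend would predict
dat <- data.frame(x = x, y = y)

fit <- lm(y ~ x, data = dat)

# Cook's distance for every observation and a rule-of-thumb cutoff.
cd <- cooks.distance(fit)
threshold <- 4 / n
flagged <- which(cd > threshold)
print(flagged)

# Refit without the flagged rows. Logical indexing stays safe even if
# nothing is flagged, unlike negative indexing with an empty vector.
refit <- lm(y ~ x, data = dat[cd <= threshold, ])

# Compare the coefficients before and after removing the flagged rows.
print(rbind(original = coef(fit), refit = coef(refit)))
```

Comparing the two sets of coefficients shows how much the flagged points were pulling the fit. Removing observations is a judgment call, so it is worth checking whether a flagged row is a data error or a genuine but unusual sale before dropping it.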

Predict Model

mypred <- predict(predSales.lm, subset(test_data, select = c(MSSubClass, LotArea, YearBuilt, TotalBsmtSF, YearRemodAdd, GrLivArea, BedroomAbvGr, KitchenAbvGr)))

kaggle_pred <- data.frame(cbind(test_data$Id, mypred))
colnames(kaggle_pred) <- c("ID", "SalePrice")
kaggle_pred[is.na(kaggle_pred)] <- 0

write.csv(kaggle_pred, file = "PredSub1.csv", row.names = FALSE) # write predictions to file
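Filling missing predictions with 0 keeps the submission file complete, but 0 is far from any plausible sale price. One possible alternative, sketched here on simulated data, is to fill the missing predictor values with the training median before calling `predict()`. The data frames `train` and `test` and the column `area` are hypothetical stand-ins, not the assignment's data.

```r
# Simulated train/test split; all names and values are hypothetical.
set.seed(2024)
train <- data.frame(area = runif(50, 500, 3000))
train$price <- 40000 + 100 * train$area + rnorm(50, sd = 10000)
test <- data.frame(id = 1:4, area = c(1200, NA, 2500, 800))

fit <- lm(price ~ area, data = train)

# predict() returns NA for any row with a missing predictor.
raw_pred <- predict(fit, newdata = test)
print(raw_pred)

# Fill the missing predictor with the training median, then predict again.
test_imp <- test
test_imp$area[is.na(test_imp$area)] <- median(train$area)
imp_pred <- predict(fit, newdata = test_imp)

submission <- data.frame(Id = test$id, SalePrice = imp_pred)
print(submission)
```

Using the training median rather than the test median keeps information from the test set out of the imputation step. The `mice` package loaded at the top of this report offers more elaborate imputation if a single median is too crude.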

Predictions

kable(head(kaggle_pred, 10)) %>% kable_styling(bootstrap_options = c("striped", "condensed", "responsive"), full_width = F)
ID SalePrice
1461 118066.4
1462 157409.7
1463 210328.0
1464 206856.3
1465 183692.0
1466 201398.6
1467 181089.2
1468 188934.4
1469 203909.2
1470 125268.2

My Kaggle username is javernw and my score is 0.19039.

The code for this assignment can be found on my GitHub.