Elina Azrilyan

December 11th, 2019
#install.packages("GGally")
#install.packages('MASS')
suppressMessages(library(kableExtra))
suppressMessages(library(GGally))
suppressMessages(library(ggplot2))
suppressMessages(library(pracma))
## Warning: package 'pracma' was built under R version 3.5.2
suppressMessages(library(MASS))
## Warning: package 'MASS' was built under R version 3.5.2

Problem 1.

Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean of μ=σ=(N+1)/2.

set.seed(101)
N <- 6
X <- runif(10000, 1, N)

Let’s take a look at the distribution of X

hist(X)

mu <- (N+1)/2
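# Note: sd is left at its default of 1 here; the problem statement specifies sigma = mu, which would need sd=mu.
Y <- rnorm(10000,mean=mu)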

Let’s take a look at the distribution of Y

hist(Y)

Probability. Part 1. 5 points.

Calculate as a minimum the below probabilities a through c. Assume the small letter “x” is estimated as the median of the X variable, and the small letter “y” is estimated as the 1st quartile of the Y variable. Interpret the meaning of all probabilities.

x <- median(X)
x
## [1] 3.487115
y <- quantile(Y, 0.25)
y
##      25% 
## 2.829464

a. P(X > x | X > y)

We need to calculate P(X>x and X>y) and divide it by P(X>y).

#P(X>x and X>y)
P1<-sum(X>x & X>y)/10000
#P(X>y)
P2<-sum(X>y)/10000
round(P1/P2,3)
## [1] 0.785
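
The same ratio can be written more compactly by taking the mean of a logical vector restricted to the conditioning event. The sketch below assumes a hypothetical helper, cond_prob (not part of the original analysis), and regenerates X and Y with the same seed and calls used above.

```r
# Hypothetical helper: estimate P(event | given) from two logical vectors.
cond_prob <- function(event, given) {
  stopifnot(is.logical(event), is.logical(given), length(event) == length(given))
  if (!any(given)) return(NA_real_)  # conditioning event never occurs
  mean(event[given])                 # share of TRUE among the rows where `given` holds
}

set.seed(101)
N <- 6
X <- runif(10000, 1, N)
Y <- rnorm(10000, mean = (N + 1) / 2)  # sd left at its default, as in the code above
x <- median(X)
y <- quantile(Y, 0.25)

p_a <- cond_prob(X > x, X > y)
round(p_a, 3)
```

Because mean(event[given]) equals sum(event & given)/sum(given), this is the same estimate as P1/P2, just without the intermediate variables.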

b. P(X > x, Y > y)

#P(X>x and Y>y)
P3<-sum(X>x & Y>y)/10000
round(P3,3)
## [1] 0.378

c. P (X < x | X > y)

#P(X<x and X>y)
P4<-sum((X<x) & (X>y))/10000
#P(X>y)
P2<-sum(X>y)/10000
round(P4/P2,3)
## [1] 0.215
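
As a sanity check on parts a and c, the simulated values can be compared with what the Uniform(1, N) distribution implies. This is a minimal sketch that plugs in the sample values of x and y printed above and treats them as fixed constants, which is an assumption made for illustration.

```r
N <- 6
x <- 3.487115  # sample median of X, copied from the output above
y <- 2.829464  # sample first quartile of Y, copied from the output above

# Upper-tail probabilities of X ~ Uniform(1, N)
p_X_gt_y <- punif(y, min = 1, max = N, lower.tail = FALSE)
p_X_gt_x <- punif(x, min = 1, max = N, lower.tail = FALSE)

# Since x > y, the event {X > x} lies inside {X > y}
theory_a <- p_X_gt_x / p_X_gt_y                # P(X > x | X > y)
theory_c <- (p_X_gt_y - p_X_gt_x) / p_X_gt_y   # P(X < x | X > y)
round(c(a = theory_a, c = theory_c), 3)
```

The two conditional probabilities are complements, so they add up to 1, just as the simulated 0.785 and 0.215 do.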

Probability. Part 2. 5 points.

Investigate whether P(X>x and Y>y)=P(X>x)P(Y>y) by building a table and evaluating the marginal and joint probabilities.

df<-data.frame("Xgx" =c(sum(X>x & Y<y), sum(X>x & Y>y), sum(X>x & Y<y)+sum(X>x & Y>y)),
              "Xlx" = c(sum(X<x & Y<y), sum(X<x & Y>y), sum(X<x & Y<y)+sum(X<x & Y>y)),
              "Total" = c(sum(X>x & Y<y)+sum(X<x & Y<y), sum(X>x & Y>y)+sum(X<x & Y>y), sum(X>x & Y<y)+sum(X>x & Y>y)+ sum(X<x & Y<y)+sum(X<x & Y>y)))
names(df) <- c("X>x","X<x","Total")
row.names(df) <- c("Y<y","Y>y", "Total")

df %>% kable(caption = "Table of Probabilities") %>% kable_styling("striped", full_width = TRUE)
Table of Probabilities
        X>x    X<x    Total
Y<y     1222   1278   2500
Y>y     3778   3722   7500
Total   5000   5000   10000
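
The same counts can also be produced with base R's table() and addmargins() instead of summing each cell by hand. The sketch below is one possible alternative, not the code used for the table above; the factor labels are chosen here for illustration.

```r
set.seed(101)
N <- 6
X <- runif(10000, 1, N)
Y <- rnorm(10000, mean = (N + 1) / 2)  # sd left at its default, as in the code above
x <- median(X)
y <- quantile(Y, 0.25)

# Cross-tabulate the two events; factor() fixes the row/column order and labels.
counts <- table(
  Y = factor(Y > y, levels = c(FALSE, TRUE), labels = c("Y<=y", "Y>y")),
  X = factor(X > x, levels = c(TRUE, FALSE), labels = c("X>x", "X<=x"))
)

with_totals <- addmargins(counts)  # appends a "Sum" row and a "Sum" column
with_totals
```

Keeping counts (without margins) as a separate object is convenient, because the independence tests expect the bare 2x2 table.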

Let’s use the table to locate P(X>x and Y>y) = 3778/10000.

df[2,1]/df[3,3]
## [1] 0.3778

Now, let’s find P(X>x)P(Y>y), where P(X>x) = 5000/10000 and P(Y>y) = 7500/10000.

#P(X>x)
df[3,1]/df[3,3]
## [1] 0.5
#P(Y>y)
df[2,3]/df[3,3]
## [1] 0.75
#P(X>x)P(Y>y)
(df[3,1]/df[3,3]) * (df[2,3]/df[3,3])
## [1] 0.375

We conclude that the events are independent, since P(X>x and Y>y) = 0.3778 is approximately equal to P(X>x)P(Y>y) = 0.375.
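
To judge how close the two numbers are, the gap can be compared with the sampling noise expected for a proportion estimated from 10,000 draws. This is a rough sketch: it uses a simple binomial standard error and ignores the fact that x and y were themselves estimated from the sample.

```r
n <- 10000
joint   <- 3778 / n                 # observed P(X>x and Y>y), from the table above
product <- (5000 / n) * (7500 / n)  # P(X>x) * P(Y>y)

# Approximate standard error of a sample proportion near `product` (binomial approximation)
se <- sqrt(product * (1 - product) / n)
z  <- (joint - product) / se        # gap expressed in standard errors

c(difference = joint - product, se = se, z = z)
```

A gap of well under two standard errors is what we would expect to see from chance alone.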

Probability. Part 3. 5 points.

Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?

Fisher’s Exact Test.

fisher.test(df, simulate.p.value=TRUE)
## 
##  Fisher's Exact Test for Count Data with simulated p-value (based
##  on 2000 replicates)
## 
## data:  df
## p-value = 0.7966
## alternative hypothesis: two.sided

The p-value (0.7966) is far above any conventional significance level, so we do not reject the null hypothesis; the data are consistent with these variables being independent.

Chi Square Test.

chisq.test(df)
## 
##  Pearson's Chi-squared test
## 
## data:  df
## X-squared = 1.6725, df = 4, p-value = 0.7957

With a small test statistic and a p-value of 0.7957, we again do not reject the null hypothesis; the data are consistent with these variables being independent.

Difference between Fisher’s Exact Test and the Chi Square Test:

Fisher’s Exact Test computes an exact p-value and works well even with small sample sizes, which is why many statistics textbooks recommend it over the Chi Square Test in that setting. The Chi Square Test relies on a large-sample approximation, so it is very accurate when the cell counts are large. Fisher’s Exact Test is more appropriate.
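
Both calls above were given df, which includes the Total row and column; that is why the Chi Square output reports 4 degrees of freedom. As a hedged alternative, the sketch below runs both tests on the inner 2x2 counts only, re-entered by hand from the table. The object names are chosen here for illustration.

```r
# Inner 2x2 counts from the table above (margins excluded); matrix() fills by column.
counts <- matrix(
  c(1222, 3778,   # column X>x: Y<y, Y>y
    1278, 3722),  # column X<x: Y<y, Y>y
  nrow = 2,
  dimnames = list(c("Y<y", "Y>y"), c("X>x", "X<x"))
)

chi <- chisq.test(counts)   # applies Yates' continuity correction by default for 2x2 tables
fis <- fisher.test(counts)  # exact test; no simulated p-value is needed for a 2x2 table

chi$parameter               # degrees of freedom: (2 - 1) * (2 - 1) = 1
c(chisq_p = chi$p.value, fisher_p = fis$p.value)
```

The p-values from this version will differ from the ones printed above, but both tests still fail to reject independence at the usual 0.05 level.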

Problem 2.

You are to register for Kaggle.com (free) and compete in the House Prices: Advanced Regression Techniques competition. https://www.kaggle.com/c/house-prices-advanced-regression-techniques. I want you to do the following.

Problem 2. Part 1. 5 points. Descriptive and Inferential Statistics.

Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot matrix for at least two of the independent variables and the dependent variable. Derive a correlation matrix for any three quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?

Loading the Train and Test datasets

train <- read.csv('train.csv', sep = ',', header = TRUE, stringsAsFactors = FALSE)
test <- read.csv("test.csv", sep = ',', header = TRUE, stringsAsFactors = FALSE)

Descriptive Statistics and Plots

Let’s view the first 20 rows of the Train data.

head(train,20) %>% kable(caption = "Train") %>% kable_styling("striped", full_width = TRUE) %>% scroll_box("width:400px")
Train
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
1 60 RL 65 8450 Pave NA Reg Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2003 2003 Gable CompShg VinylSd VinylSd BrkFace 196 Gd TA PConc Gd TA No GLQ 706 Unf 0 150 856 GasA Ex Y SBrkr 856 854 0 1710 1 0 2 1 3 1 Gd 8 Typ 0 NA Attchd 2003 RFn 2 548 TA TA Y 0 61 0 0 0 0 NA NA NA 0 2 2008 WD Normal 208500
2 20 RL 80 9600 Pave NA Reg Lvl AllPub FR2 Gtl Veenker Feedr Norm 1Fam 1Story 6 8 1976 1976 Gable CompShg MetalSd MetalSd None 0 TA TA CBlock Gd TA Gd ALQ 978 Unf 0 284 1262 GasA Ex Y SBrkr 1262 0 0 1262 0 1 2 0 3 1 TA 6 Typ 1 TA Attchd 1976 RFn 2 460 TA TA Y 298 0 0 0 0 0 NA NA NA 0 5 2007 WD Normal 181500
3 60 RL 68 11250 Pave NA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 2Story 7 5 2001 2002 Gable CompShg VinylSd VinylSd BrkFace 162 Gd TA PConc Gd TA Mn GLQ 486 Unf 0 434 920 GasA Ex Y SBrkr 920 866 0 1786 1 0 2 1 3 1 Gd 6 Typ 1 TA Attchd 2001 RFn 2 608 TA TA Y 0 42 0 0 0 0 NA NA NA 0 9 2008 WD Normal 223500
4 70 RL 60 9550 Pave NA IR1 Lvl AllPub Corner Gtl Crawfor Norm Norm 1Fam 2Story 7 5 1915 1970 Gable CompShg Wd Sdng Wd Shng None 0 TA TA BrkTil TA Gd No ALQ 216 Unf 0 540 756 GasA Gd Y SBrkr 961 756 0 1717 1 0 1 0 3 1 Gd 7 Typ 1 Gd Detchd 1998 Unf 3 642 TA TA Y 0 35 272 0 0 0 NA NA NA 0 2 2006 WD Abnorml 140000
5 60 RL 84 14260 Pave NA IR1 Lvl AllPub FR2 Gtl NoRidge Norm Norm 1Fam 2Story 8 5 2000 2000 Gable CompShg VinylSd VinylSd BrkFace 350 Gd TA PConc Gd TA Av GLQ 655 Unf 0 490 1145 GasA Ex Y SBrkr 1145 1053 0 2198 1 0 2 1 4 1 Gd 9 Typ 1 TA Attchd 2000 RFn 3 836 TA TA Y 192 84 0 0 0 0 NA NA NA 0 12 2008 WD Normal 250000
6 50 RL 85 14115 Pave NA IR1 Lvl AllPub Inside Gtl Mitchel Norm Norm 1Fam 1.5Fin 5 5 1993 1995 Gable CompShg VinylSd VinylSd None 0 TA TA Wood Gd TA No GLQ 732 Unf 0 64 796 GasA Ex Y SBrkr 796 566 0 1362 1 0 1 1 1 1 TA 5 Typ 0 NA Attchd 1993 Unf 2 480 TA TA Y 40 30 0 320 0 0 NA MnPrv Shed 700 10 2009 WD Normal 143000
7 20 RL 75 10084 Pave NA Reg Lvl AllPub Inside Gtl Somerst Norm Norm 1Fam 1Story 8 5 2004 2005 Gable CompShg VinylSd VinylSd Stone 186 Gd TA PConc Ex TA Av GLQ 1369 Unf 0 317 1686 GasA Ex Y SBrkr 1694 0 0 1694 1 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2004 RFn 2 636 TA TA Y 255 57 0 0 0 0 NA NA NA 0 8 2007 WD Normal 307000
8 60 RL NA 10382 Pave NA IR1 Lvl AllPub Corner Gtl NWAmes PosN Norm 1Fam 2Story 7 6 1973 1973 Gable CompShg HdBoard HdBoard Stone 240 TA TA CBlock Gd TA Mn ALQ 859 BLQ 32 216 1107 GasA Ex Y SBrkr 1107 983 0 2090 1 0 2 1 3 1 TA 7 Typ 2 TA Attchd 1973 RFn 2 484 TA TA Y 235 204 228 0 0 0 NA NA Shed 350 11 2009 WD Normal 200000
9 50 RM 51 6120 Pave NA Reg Lvl AllPub Inside Gtl OldTown Artery Norm 1Fam 1.5Fin 7 5 1931 1950 Gable CompShg BrkFace Wd Shng None 0 TA TA BrkTil TA TA No Unf 0 Unf 0 952 952 GasA Gd Y FuseF 1022 752 0 1774 0 0 2 0 2 2 TA 8 Min1 2 TA Detchd 1931 Unf 2 468 Fa TA Y 90 0 205 0 0 0 NA NA NA 0 4 2008 WD Abnorml 129900
10 190 RL 50 7420 Pave NA Reg Lvl AllPub Corner Gtl BrkSide Artery Artery 2fmCon 1.5Unf 5 6 1939 1950 Gable CompShg MetalSd MetalSd None 0 TA TA BrkTil TA TA No GLQ 851 Unf 0 140 991 GasA Ex Y SBrkr 1077 0 0 1077 1 0 1 0 2 2 TA 5 Typ 2 TA Attchd 1939 RFn 1 205 Gd TA Y 0 4 0 0 0 0 NA NA NA 0 1 2008 WD Normal 118000
11 20 RL 70 11200 Pave NA Reg Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 5 1965 1965 Hip CompShg HdBoard HdBoard None 0 TA TA CBlock TA TA No Rec 906 Unf 0 134 1040 GasA Ex Y SBrkr 1040 0 0 1040 1 0 1 0 3 1 TA 5 Typ 0 NA Detchd 1965 Unf 1 384 TA TA Y 0 0 0 0 0 0 NA NA NA 0 2 2008 WD Normal 129500
12 60 RL 85 11924 Pave NA IR1 Lvl AllPub Inside Gtl NridgHt Norm Norm 1Fam 2Story 9 5 2005 2006 Hip CompShg WdShing Wd Shng Stone 286 Ex TA PConc Ex TA No GLQ 998 Unf 0 177 1175 GasA Ex Y SBrkr 1182 1142 0 2324 1 0 3 0 4 1 Ex 11 Typ 2 Gd BuiltIn 2005 Fin 3 736 TA TA Y 147 21 0 0 0 0 NA NA NA 0 7 2006 New Partial 345000
13 20 RL NA 12968 Pave NA IR2 Lvl AllPub Inside Gtl Sawyer Norm Norm 1Fam 1Story 5 6 1962 1962 Hip CompShg HdBoard Plywood None 0 TA TA CBlock TA TA No ALQ 737 Unf 0 175 912 GasA TA Y SBrkr 912 0 0 912 1 0 1 0 2 1 TA 4 Typ 0 NA Detchd 1962 Unf 1 352 TA TA Y 140 0 0 0 176 0 NA NA NA 0 9 2008 WD Normal 144000
14 20 RL 91 10652 Pave NA IR1 Lvl AllPub Inside Gtl CollgCr Norm Norm 1Fam 1Story 7 5 2006 2007 Gable CompShg VinylSd VinylSd Stone 306 Gd TA PConc Gd TA Av Unf 0 Unf 0 1494 1494 GasA Ex Y SBrkr 1494 0 0 1494 0 0 2 0 3 1 Gd 7 Typ 1 Gd Attchd 2006 RFn 3 840 TA TA Y 160 33 0 0 0 0 NA NA NA 0 8 2007 New Partial 279500
15 20 RL NA 10920 Pave NA IR1 Lvl AllPub Corner Gtl NAmes Norm Norm 1Fam 1Story 6 5 1960 1960 Hip CompShg MetalSd MetalSd BrkFace 212 TA TA CBlock TA TA No BLQ 733 Unf 0 520 1253 GasA TA Y SBrkr 1253 0 0 1253 1 0 1 1 2 1 TA 5 Typ 1 Fa Attchd 1960 RFn 1 352 TA TA Y 0 213 176 0 0 0 NA GdWo NA 0 5 2008 WD Normal 157000
16 45 RM 51 6120 Pave NA Reg Lvl AllPub Corner Gtl BrkSide Norm Norm 1Fam 1.5Unf 7 8 1929 2001 Gable CompShg Wd Sdng Wd Sdng None 0 TA TA BrkTil TA TA No Unf 0 Unf 0 832 832 GasA Ex Y FuseA 854 0 0 854 0 0 1 0 2 1 TA 5 Typ 0 NA Detchd 1991 Unf 2 576 TA TA Y 48 112 0 0 0 0 NA GdPrv NA 0 7 2007 WD Normal 132000
17 20 RL NA 11241 Pave NA IR1 Lvl AllPub CulDSac Gtl NAmes Norm Norm 1Fam 1Story 6 7 1970 1970 Gable CompShg Wd Sdng Wd Sdng BrkFace 180 TA TA CBlock TA TA No ALQ 578 Unf 0 426 1004 GasA Ex Y SBrkr 1004 0 0 1004 1 0 1 0 2 1 TA 5 Typ 1 TA Attchd 1970 Fin 2 480 TA TA Y 0 0 0 0 0 0 NA NA Shed 700 3 2010 WD Normal 149000
18 90 RL 72 10791 Pave NA Reg Lvl AllPub Inside Gtl Sawyer Norm Norm Duplex 1Story 4 5 1967 1967 Gable CompShg MetalSd MetalSd None 0 TA TA Slab NA NA NA NA 0 NA 0 0 0 GasA TA Y SBrkr 1296 0 0 1296 0 0 2 0 2 2 TA 6 Typ 0 NA CarPort 1967 Unf 2 516 TA TA Y 0 0 0 0 0 0 NA NA Shed 500 10 2006 WD Normal 90000
19 20 RL 66 13695 Pave NA Reg Lvl AllPub Inside Gtl SawyerW RRAe Norm 1Fam 1Story 5 5 2004 2004 Gable CompShg VinylSd VinylSd None 0 TA TA PConc TA TA No GLQ 646 Unf 0 468 1114 GasA Ex Y SBrkr 1114 0 0 1114 1 0 1 1 3 1 Gd 6 Typ 0 NA Detchd 2004 Unf 2 576 TA TA Y 0 102 0 0 0 0 NA NA NA 0 6 2008 WD Normal 159000
20 20 RL 70 7560 Pave NA Reg Lvl AllPub Inside Gtl NAmes Norm Norm 1Fam 1Story 5 6 1958 1965 Hip CompShg BrkFace Plywood None 0 TA TA CBlock TA TA No LwQ 504 Unf 0 525 1029 GasA TA Y SBrkr 1339 0 0 1339 0 0 1 0 3 1 TA 6 Min1 0 NA Attchd 1958 Unf 1 294 TA TA Y 0 0 0 0 0 0 NA MnPrv NA 0 5 2009 COD Abnorml 139000

The next step is to review a summary of the Train dataset to get a better idea of the types of variables available for analysis.

summary(train) %>% kable(caption = "Train Summary All Columns") %>% kable_styling("striped", full_width = TRUE) %>% scroll_box("width:400px")
Train Summary All Columns
Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition SalePrice
Min. : 1.0 Min. : 20.0 Length:1460 Min. : 21.00 Min. : 1300 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Min. : 1.000 Min. :1.000 Min. :1872 Min. :1950 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Min. : 0.0 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Length:1460 Min. : 0.0 Length:1460 Min. : 0.00 Min. : 0.0 Min. : 0.0 Length:1460 Length:1460 Length:1460 Length:1460 Min. : 334 Min. : 0 Min. : 0.000 Min. : 334 Min. :0.0000 Min. :0.00000 Min. :0.000 Min. :0.0000 Min. :0.000 Min. :0.000 Length:1460 Min. : 2.000 Length:1460 Min. :0.000 Length:1460 Length:1460 Min. :1900 Length:1460 Min. :0.000 Min. : 0.0 Length:1460 Length:1460 Length:1460 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000 Length:1460 Length:1460 Length:1460 Min. : 0.00 Min. : 1.000 Min. :2006 Length:1460 Length:1460 Min. : 34900
1st Qu.: 365.8 1st Qu.: 20.0 Class :character 1st Qu.: 59.00 1st Qu.: 7554 Class :character Class :character Class :character Class :character Class :character Class :character Class :character Class :character Class :character Class :character Class :character Class :character 1st Qu.: 5.000 1st Qu.:5.000 1st Qu.:1954 1st Qu.:1967 Class :character Class :character Class :character Class :character Class :character 1st Qu.: 0.0 Class :character Class :character Class :character Class :character Class :character Class :character Class :character 1st Qu.: 0.0 Class :character 1st Qu.: 0.00 1st Qu.: 223.0 1st Qu.: 795.8 Class :character Class :character Class :character Class :character 1st Qu.: 882 1st Qu.: 0 1st Qu.: 0.000 1st Qu.:1130 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:1.000 Class :character 1st Qu.: 5.000 Class :character 1st Qu.:0.000 Class :character Class :character 1st Qu.:1961 Class :character 1st Qu.:1.000 1st Qu.: 334.5 Class :character Class :character Class :character 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 Class :character Class :character Class :character 1st Qu.: 0.00 1st Qu.: 5.000 1st Qu.:2007 Class :character Class :character 1st Qu.:129975
Median : 730.5 Median : 50.0 Mode :character Median : 69.00 Median : 9478 Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Median : 6.000 Median :5.000 Median :1973 Median :1994 Mode :character Mode :character Mode :character Mode :character Mode :character Median : 0.0 Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Mode :character Median : 383.5 Mode :character Median : 0.00 Median : 477.5 Median : 991.5 Mode :character Mode :character Mode :character Mode :character Median :1087 Median : 0 Median : 0.000 Median :1464 Median :0.0000 Median :0.00000 Median :2.000 Median :0.0000 Median :3.000 Median :1.000 Mode :character Median : 6.000 Mode :character Median :1.000 Mode :character Mode :character Median :1980 Mode :character Median :2.000 Median : 480.0 Mode :character Mode :character Mode :character Median : 0.00 Median : 25.00 Median : 0.00 Median : 0.00 Median : 0.00 Median : 0.000 Mode :character Mode :character Mode :character Median : 0.00 Median : 6.000 Median :2008 Mode :character Mode :character Median :163000
Mean : 730.5 Mean : 56.9 NA Mean : 70.05 Mean : 10517 NA NA NA NA NA NA NA NA NA NA NA NA Mean : 6.099 Mean :5.575 Mean :1971 Mean :1985 NA NA NA NA NA Mean : 103.7 NA NA NA NA NA NA NA Mean : 443.6 NA Mean : 46.55 Mean : 567.2 Mean :1057.4 NA NA NA NA Mean :1163 Mean : 347 Mean : 5.845 Mean :1515 Mean :0.4253 Mean :0.05753 Mean :1.565 Mean :0.3829 Mean :2.866 Mean :1.047 NA Mean : 6.518 NA Mean :0.613 NA NA Mean :1979 NA Mean :1.767 Mean : 473.0 NA NA NA Mean : 94.24 Mean : 46.66 Mean : 21.95 Mean : 3.41 Mean : 15.06 Mean : 2.759 NA NA NA Mean : 43.49 Mean : 6.322 Mean :2008 NA NA Mean :180921
3rd Qu.:1095.2 3rd Qu.: 70.0 NA 3rd Qu.: 80.00 3rd Qu.: 11602 NA NA NA NA NA NA NA NA NA NA NA NA 3rd Qu.: 7.000 3rd Qu.:6.000 3rd Qu.:2000 3rd Qu.:2004 NA NA NA NA NA 3rd Qu.: 166.0 NA NA NA NA NA NA NA 3rd Qu.: 712.2 NA 3rd Qu.: 0.00 3rd Qu.: 808.0 3rd Qu.:1298.2 NA NA NA NA 3rd Qu.:1391 3rd Qu.: 728 3rd Qu.: 0.000 3rd Qu.:1777 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:1.000 NA 3rd Qu.: 7.000 NA 3rd Qu.:1.000 NA NA 3rd Qu.:2002 NA 3rd Qu.:2.000 3rd Qu.: 576.0 NA NA NA 3rd Qu.:168.00 3rd Qu.: 68.00 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.: 0.000 NA NA NA 3rd Qu.: 0.00 3rd Qu.: 8.000 3rd Qu.:2009 NA NA 3rd Qu.:214000
Max. :1460.0 Max. :190.0 NA Max. :313.00 Max. :215245 NA NA NA NA NA NA NA NA NA NA NA NA Max. :10.000 Max. :9.000 Max. :2010 Max. :2010 NA NA NA NA NA Max. :1600.0 NA NA NA NA NA NA NA Max. :5644.0 NA Max. :1474.00 Max. :2336.0 Max. :6110.0 NA NA NA NA Max. :4692 Max. :2065 Max. :572.000 Max. :5642 Max. :3.0000 Max. :2.00000 Max. :3.000 Max. :2.0000 Max. :8.000 Max. :3.000 NA Max. :14.000 NA Max. :3.000 NA NA Max. :2010 NA Max. :4.000 Max. :1418.0 NA NA NA Max. :857.00 Max. :547.00 Max. :552.00 Max. :508.00 Max. :480.00 Max. :738.000 NA NA NA Max. :15500.00 Max. :12.000 Max. :2010 NA NA Max. :755000
NA NA NA NA’s :259 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA’s :8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA’s :81 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
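
The summary reports missing-value counts only for numeric columns; character columns show just Length, Class and Mode. One way to count missing values across every column is colSums(is.na(...)). The sketch below uses a small made-up data frame, toy, as a stand-in for train so that it runs without the Kaggle files; its values are for illustration only.

```r
# Toy stand-in for `train`; the values are invented for illustration only.
toy <- data.frame(
  LotFrontage = c(65, 80, NA, 60),
  Alley       = c(NA, NA, "Pave", NA),
  SalePrice   = c(208500, 181500, 223500, 140000),
  stringsAsFactors = FALSE
)

na_counts <- colSums(is.na(toy))  # number of NA entries per column, numeric or character
na_counts[na_counts > 0]          # keep only the columns that have missing values
```

Running the same two lines on train would list every column with missing values, including the character ones that summary() does not flag.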
summary(train$SalePrice)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   34900  129975  163000  180921  214000  755000
hist(train$SalePrice, main="Histogram of Sale Price", xlab="Sale Price")