#install.packages("GGally")
#install.packages('MASS')
suppressMessages(library(kableExtra))
suppressMessages(library(GGally))
suppressMessages(library(ggplot2))
suppressMessages(library(pracma))
## Warning: package 'pracma' was built under R version 3.5.2
suppressMessages(library(MASS))
## Warning: package 'MASS' was built under R version 3.5.2
Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean of μ=σ=(N+1)/2.
set.seed(101)
N <- 6
X <- runif(10000, 1, N)
Let’s take a look at the distribution of X
hist(X)
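mu <- (N + 1) / 2
# note: sd keeps its default of 1 here (the prompt asks for sd = mu); all outputs below use sd = 1
Y <- rnorm(10000, mean = mu)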
Let’s take a look at the distribution of Y.
hist(Y)
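The prompt specifies σ = (N+1)/2 as well as μ, but the code above leaves `sd` at its default of 1. Below is a minimal sketch of drawing Y with both parameters set as the prompt describes. `Y_prompt` is a hypothetical name, and the rest of this analysis keeps the sd = 1 draw:

```r
set.seed(101)
N <- 6
mu <- (N + 1) / 2
# Y_prompt (hypothetical name): normal draws with mean = sd = (N + 1) / 2
Y_prompt <- rnorm(10000, mean = mu, sd = mu)
# the sample mean and sd should both land close to 3.5
round(c(mean = mean(Y_prompt), sd = sd(Y_prompt)), 2)
```

With this wider spread the population 1st quartile would be about 3.5 - 0.674 × 3.5 ≈ 1.14 rather than about 2.83. Every downstream probability would change accordingly.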
Calculate as a minimum the below probabilities a through c. Assume the small letter “x” is estimated as the median of the X variable, and the small letter “y” is estimated as the 1st quartile of the Y variable. Interpret the meaning of all probabilities.
x <- median(X)
x
## [1] 3.487115
y <- quantile(Y, 0.25)
y
## 25%
## 2.829464
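As a sanity check, compare the sample estimates above with their population counterparts. The median of Uniform(1, N) is (N+1)/2, and the 1st quartile of Normal(μ, 1) is μ + qnorm(0.25), where qnorm(0.25) ≈ -0.674. Here is a short sketch using base R's quantile functions, assuming N = 6 and sd = 1 as in the code above:

```r
N <- 6
mu <- (N + 1) / 2
# population median of Uniform(1, N) and population 1st quartile of Normal(mu, 1)
x_theory <- qunif(0.5, min = 1, max = N)
y_theory <- qnorm(0.25, mean = mu, sd = 1)
round(c(x = x_theory, y = y_theory), 3)
```

These come out to 3.5 and about 2.826, close to the simulated 3.487 and 2.829.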
a. P(X>x | X>y): we calculate P(X>x and X>y) and divide it by P(X>y).
#P(X>x and X>y)
P1 <- sum(X>x & X>y)/10000
#P(X>y)
P2 <- sum(X>y)/10000
round(P1/P2,3)
## [1] 0.785
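Because x > y here, the event {X > x} lies inside {X > y}. For X ~ Uniform(1, N) this gives P(X>x | X>y) = P(X>x)/P(X>y) = (N - x)/(N - y). The sketch below computes the population value by plugging in the theoretical median and 1st quartile. This is an assumption: the simulation itself uses the sample estimates.

```r
N <- 6
x <- qunif(0.5, 1, N)                         # population median of X
y <- qnorm(0.25, mean = (N + 1) / 2, sd = 1)  # population 1st quartile of Y
# {X > x} is a subset of {X > y} because x > y, so the intersection is just {X > x}
p_a <- punif(x, 1, N, lower.tail = FALSE) / punif(y, 1, N, lower.tail = FALSE)
round(p_a, 3)
```

This gives about 0.788, in line with the simulated 0.785. In words: among draws of X that exceed y, roughly 79% also exceed the median x.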
b. P(X>x, Y>y): the joint probability that both events occur.
#P(X>x and Y>y)
P3 <- sum(X>x & Y>y)/10000
round(P3,3)
## [1] 0.378
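X and Y are drawn independently. So when x and y are the exact median and 1st quartile, the population value is P(X>x, Y>y) = P(X>x)·P(Y>y) = 0.5 × 0.75 = 0.375. Here is a sketch of that calculation with base R distribution functions, again assuming N = 6 and sd = 1:

```r
N <- 6
mu <- (N + 1) / 2
x <- qunif(0.5, 1, N)
y <- qnorm(0.25, mean = mu, sd = 1)
p_x <- punif(x, 1, N, lower.tail = FALSE)               # P(X > x) = 0.5
p_y <- pnorm(y, mean = mu, sd = 1, lower.tail = FALSE)  # P(Y > y) = 0.75
p_b <- p_x * p_y  # joint probability under independence
round(p_b, 3)
```

The simulated 0.378 is close to 0.375. In other words, about 37.5% of (X, Y) pairs should have X above its median and Y above its 1st quartile.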
c. P(X<x | X>y): we calculate P(X<x and X>y) and divide it by P(X>y).
#P(X<x and X>y)
P4 <- sum((X<x) & (X>y))/10000
#P(X>y)
P2 <- sum(X>y)/10000
round(P4/P2,3)
## [1] 0.215
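Since X is continuous, P(X = x) = 0, so given X > y every draw falls either above or below x. The conditional probabilities in a and c should therefore sum to 1 (0.785 + 0.215 = 1). The sketch below re-creates the simulation and wraps the repeated pattern in a small helper, `cond_prob` (a hypothetical name). It relies on the fact that `mean()` of a logical vector equals `sum(...)/length(...)`:

```r
set.seed(101)
N <- 6
X <- runif(10000, 1, N)
Y <- rnorm(10000, mean = (N + 1) / 2)
x <- median(X)
y <- quantile(Y, 0.25)

# P(event | given): share of draws in both events divided by share of draws in `given`
cond_prob <- function(event, given) mean(event & given) / mean(given)

p_a <- cond_prob(X > x, X > y)
p_c <- cond_prob(X < x, X > y)
c(a = p_a, c = p_c, total = p_a + p_c)  # total should be 1
```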
Investigate whether P(X>x and Y>y)=P(X>x)P(Y>y) by building a table and evaluating the marginal and joint probabilities.
# 2x2 cell counts plus a Total row and a Total column
df <- data.frame("Xgx" =c(sum(X>x & Y<y), sum(X>x & Y>y), sum(X>x & Y<y)+sum(X>x & Y>y)),
"Xlx" = c(sum(X<x & Y<y), sum(X<x & Y>y), sum(X<x & Y<y)+sum(X<x & Y>y)),
"Total" = c(sum(X>x & Y<y)+sum(X<x & Y<y), sum(X>x & Y>y)+sum(X<x & Y>y), sum(X>x & Y<y)+sum(X>x & Y>y)+ sum(X<x & Y<y)+sum(X<x & Y>y)))
names(df) <- c("X>x","X<x","Total")
row.names(df) <- c("Y<y","Y>y", "Total")
df %>% kable(caption = "Table of Probabilities") %>% kable_styling("striped", full_width = TRUE)
Table of Probabilities
        X>x    X<x    Total
Y<y     1222   1278    2500
Y>y     3778   3722    7500
Total   5000   5000   10000
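The same counts can be tabulated without writing out each `sum()` by hand, using base R's `table()` and `addmargins()`. `prop.table()` then turns the counts into joint probabilities. The sketch below re-creates the simulation from above. The row and column order may differ from the kable table, because `table()` sorts the labels.

```r
set.seed(101)
N <- 6
X <- runif(10000, 1, N)
Y <- rnorm(10000, mean = (N + 1) / 2)
x <- median(X)
y <- quantile(Y, 0.25)

# label each draw by which side of the cutoff it falls on
counts <- table(Y = ifelse(Y > y, "Y>y", "Y<y"),
                X = ifelse(X > x, "X>x", "X<x"))
addmargins(counts)            # cell counts plus a Sum row and column
round(prop.table(counts), 4)  # joint probabilities that sum to 1
```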
Let’s use the table to locate P(X>x and Y>y) = 3778/10000.
df[2,1]/df[3,3]
## [1] 0.3778
Now let’s find P(X>x)P(Y>y), where P(X>x) = 5000/10000 and P(Y>y) = 7500/10000.
#P(X>x)
df[3,1]/df[3,3]
## [1] 0.5
#P(Y>y)
df[2,3]/df[3,3]
## [1] 0.75
#P(X>x)P(Y>y)
(df[3,1]/df[3,3]) * (df[2,3]/df[3,3])
## [1] 0.375
Since P(X>x and Y>y) = 0.3778 is very close to P(X>x)P(Y>y) = 0.375, the data are consistent with the events X>x and Y>y being independent.
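Independence can also be checked cell by cell. Under independence, each expected count is (row total × column total) / n, which is exactly what the chi-square test compares the observed counts against. Here is a sketch using the counts from the table above, typed in by hand:

```r
# 2x2 cell counts from the table (rows: Y<y, Y>y; columns: X>x, X<x)
counts <- matrix(c(1222, 3778, 1278, 3722), nrow = 2,
                 dimnames = list(c("Y<y", "Y>y"), c("X>x", "X<x")))
n <- sum(counts)
# expected counts if X and Y are independent
expected <- outer(rowSums(counts), colSums(counts)) / n
expected
counts - expected  # observed minus expected
```

Every cell differs from its expected count (1250 or 3750) by only 28. That is why the observed joint probability 0.3778 sits so close to the product 0.375.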
Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?
Fisher’s Exact Test.
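# note: df includes the Total row and column, so this tests a 3x3 table; the 2x2 cell counts alone are df[1:2, 1:2]
fisher.test(df, simulate.p.value=TRUE)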
##
## Fisher's Exact Test for Count Data with simulated p-value (based
## on 2000 replicates)
##
## data: df
## p-value = 0.7966
## alternative hypothesis: two.sided
The p-value is close to 1, so we do not reject the null hypothesis of independence. The data are consistent with the two variables being independent.
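One caveat: `df` still contains the Total row and column, so the call above tests a 3×3 table rather than the 2×2 table of cell counts. Below is a sketch of the test on the cell counts alone, typed in from the table. For a 2×2 table R computes the exact p-value directly, so simulation is not needed:

```r
counts <- matrix(c(1222, 3778, 1278, 3722), nrow = 2,
                 dimnames = list(c("Y<y", "Y>y"), c("X>x", "X<x")))
ft <- fisher.test(counts)  # exact conditional test on the 2x2 table
ft$p.value
ft$estimate                # conditional MLE of the odds ratio
```

The p-value should still be well above 0.05, so the conclusion of not rejecting independence is unchanged.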
Chi Square Test.
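# note: df includes the Total row and column, so R treats it as a 3x3 table (hence df = 4 below); the 2x2 cell counts are df[1:2, 1:2]
chisq.test(df)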
##
## Pearson's Chi-squared test
##
## data: df
## X-squared = 1.6725, df = 4, p-value = 0.7957
With a small test statistic and a p-value close to 1, we again do not reject the null hypothesis of independence.
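The `df = 4` in this output is another sign that the Total row and column were included, since a 2×2 table has (2-1)(2-1) = 1 degree of freedom. Below is a sketch on the cell counts alone. By default `chisq.test()` applies Yates' continuity correction to 2×2 tables, and `correct = FALSE` turns it off:

```r
counts <- matrix(c(1222, 3778, 1278, 3722), nrow = 2,
                 dimnames = list(c("Y<y", "Y>y"), c("X>x", "X<x")))
cs_yates <- chisq.test(counts)                   # with continuity correction
cs_plain <- chisq.test(counts, correct = FALSE)  # plain Pearson statistic
c(df = unname(cs_plain$parameter),
  stat_yates = unname(cs_yates$statistic),
  stat_plain = unname(cs_plain$statistic),
  p_plain = cs_plain$p.value)
```

The uncorrected statistic works out to (28² / 1250) × 2 + (28² / 3750) × 2 ≈ 1.6725, the same value as above. On 1 degree of freedom, however, its p-value is about 0.2 rather than 0.7957. That is still well above 0.05, so the conclusion does not change.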
Difference between Fisher’s Exact Test and the Chi Square Test:
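Fisher’s exact test computes the p-value from the exact conditional distribution of the table given its row and column totals. It therefore remains valid for small samples and is generally recommended over the chi-square test when expected cell counts are small. In the run above, R estimated that p-value by simulation from 2000 replicates. The chi-square test relies on a large-sample approximation, which is very accurate when the counts are large. Here the two p-values (0.7966 and 0.7957) are nearly identical. Because it does not depend on an approximation, Fisher’s exact test is the more appropriate choice.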
You are to register for Kaggle.com (free) and compete in the House Prices: Advanced Regression Techniques competition. https://www.kaggle.com/c/house-prices-advanced-regression-techniques. I want you to do the following.
Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot matrix for at least two of the independent variables and the dependent variable. Derive a correlation matrix for any three quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?
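The hypothesis tests and 80% intervals asked for here map onto base R's `cor.test()` with `conf.level = 0.80`. Below is a sketch of a helper, `pairwise_cor_tests` (a hypothetical name), that runs the test for every pair of chosen variables. It also adds Bonferroni-adjusted p-values via `p.adjust()` as one way to address familywise error. It is demonstrated on the built-in `mtcars` data so it runs without the Kaggle files:

```r
# pairwise_cor_tests (hypothetical helper): Pearson test and CI for every pair of vars
pairwise_cor_tests <- function(df, vars, conf.level = 0.80) {
  pairs <- combn(vars, 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    ct <- cor.test(df[[p[1]]], df[[p[2]]], conf.level = conf.level)
    data.frame(var1 = p[1], var2 = p[2],
               r = unname(ct$estimate),
               lower = ct$conf.int[1], upper = ct$conf.int[2],
               p.value = ct$p.value,
               stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)
  # Bonferroni adjustment for the number of pairwise tests (familywise error)
  out$p.bonferroni <- p.adjust(out$p.value, method = "bonferroni")
  out
}

res <- pairwise_cor_tests(mtcars, c("mpg", "wt", "hp"))
res
```

Once `train` is loaded with `read.csv()`, the helper could be called as `pairwise_cor_tests(train, c("GrLivArea", "TotalBsmtSF", "SalePrice"))`.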
train <- read.csv('train.csv', sep = ',', header = TRUE, stringsAsFactors = FALSE)
test <- read.csv("test.csv", sep = ',', header = TRUE, stringsAsFactors = FALSE)
Let’s view the first 20 rows of the training data.
head(train,20) %>% kable(caption = "Train") %>% kable_styling("striped", full_width = TRUE) %>% scroll_box(width = "400px")
Id  MSSubClass  MSZoning  LotFrontage  LotArea  Street  Alley  LotShape  LandContour  Utilities  LotConfig  LandSlope  Neighborhood  Condition1  Condition2  BldgType  HouseStyle  OverallQual  OverallCond  YearBuilt  YearRemodAdd  RoofStyle  RoofMatl  Exterior1st  Exterior2nd  MasVnrType  MasVnrArea  ExterQual  ExterCond  Foundation  BsmtQual  BsmtCond  BsmtExposure  BsmtFinType1  BsmtFinSF1  BsmtFinType2  BsmtFinSF2  BsmtUnfSF  TotalBsmtSF  Heating  HeatingQC  CentralAir  Electrical  X1stFlrSF  X2ndFlrSF  LowQualFinSF  GrLivArea  BsmtFullBath  BsmtHalfBath  FullBath  HalfBath  BedroomAbvGr  KitchenAbvGr  KitchenQual  TotRmsAbvGrd  Functional  Fireplaces  FireplaceQu  GarageType  GarageYrBlt  GarageFinish  GarageCars  GarageArea  GarageQual  GarageCond  PavedDrive  WoodDeckSF  OpenPorchSF  EnclosedPorch  X3SsnPorch  ScreenPorch  PoolArea  PoolQC  Fence  MiscFeature  MiscVal  MoSold  YrSold  SaleType  SaleCondition  SalePrice 

1  60  RL  65  8450  Pave  NA  Reg  Lvl  AllPub  Inside  Gtl  CollgCr  Norm  Norm  1Fam  2Story  7  5  2003  2003  Gable  CompShg  VinylSd  VinylSd  BrkFace  196  Gd  TA  PConc  Gd  TA  No  GLQ  706  Unf  0  150  856  GasA  Ex  Y  SBrkr  856  854  0  1710  1  0  2  1  3  1  Gd  8  Typ  0  NA  Attchd  2003  RFn  2  548  TA  TA  Y  0  61  0  0  0  0  NA  NA  NA  0  2  2008  WD  Normal  208500 
2  20  RL  80  9600  Pave  NA  Reg  Lvl  AllPub  FR2  Gtl  Veenker  Feedr  Norm  1Fam  1Story  6  8  1976  1976  Gable  CompShg  MetalSd  MetalSd  None  0  TA  TA  CBlock  Gd  TA  Gd  ALQ  978  Unf  0  284  1262  GasA  Ex  Y  SBrkr  1262  0  0  1262  0  1  2  0  3  1  TA  6  Typ  1  TA  Attchd  1976  RFn  2  460  TA  TA  Y  298  0  0  0  0  0  NA  NA  NA  0  5  2007  WD  Normal  181500 
3  60  RL  68  11250  Pave  NA  IR1  Lvl  AllPub  Inside  Gtl  CollgCr  Norm  Norm  1Fam  2Story  7  5  2001  2002  Gable  CompShg  VinylSd  VinylSd  BrkFace  162  Gd  TA  PConc  Gd  TA  Mn  GLQ  486  Unf  0  434  920  GasA  Ex  Y  SBrkr  920  866  0  1786  1  0  2  1  3  1  Gd  6  Typ  1  TA  Attchd  2001  RFn  2  608  TA  TA  Y  0  42  0  0  0  0  NA  NA  NA  0  9  2008  WD  Normal  223500 
4  70  RL  60  9550  Pave  NA  IR1  Lvl  AllPub  Corner  Gtl  Crawfor  Norm  Norm  1Fam  2Story  7  5  1915  1970  Gable  CompShg  Wd Sdng  Wd Shng  None  0  TA  TA  BrkTil  TA  Gd  No  ALQ  216  Unf  0  540  756  GasA  Gd  Y  SBrkr  961  756  0  1717  1  0  1  0  3  1  Gd  7  Typ  1  Gd  Detchd  1998  Unf  3  642  TA  TA  Y  0  35  272  0  0  0  NA  NA  NA  0  2  2006  WD  Abnorml  140000 
5  60  RL  84  14260  Pave  NA  IR1  Lvl  AllPub  FR2  Gtl  NoRidge  Norm  Norm  1Fam  2Story  8  5  2000  2000  Gable  CompShg  VinylSd  VinylSd  BrkFace  350  Gd  TA  PConc  Gd  TA  Av  GLQ  655  Unf  0  490  1145  GasA  Ex  Y  SBrkr  1145  1053  0  2198  1  0  2  1  4  1  Gd  9  Typ  1  TA  Attchd  2000  RFn  3  836  TA  TA  Y  192  84  0  0  0  0  NA  NA  NA  0  12  2008  WD  Normal  250000 
6  50  RL  85  14115  Pave  NA  IR1  Lvl  AllPub  Inside  Gtl  Mitchel  Norm  Norm  1Fam  1.5Fin  5  5  1993  1995  Gable  CompShg  VinylSd  VinylSd  None  0  TA  TA  Wood  Gd  TA  No  GLQ  732  Unf  0  64  796  GasA  Ex  Y  SBrkr  796  566  0  1362  1  0  1  1  1  1  TA  5  Typ  0  NA  Attchd  1993  Unf  2  480  TA  TA  Y  40  30  0  320  0  0  NA  MnPrv  Shed  700  10  2009  WD  Normal  143000 
7  20  RL  75  10084  Pave  NA  Reg  Lvl  AllPub  Inside  Gtl  Somerst  Norm  Norm  1Fam  1Story  8  5  2004  2005  Gable  CompShg  VinylSd  VinylSd  Stone  186  Gd  TA  PConc  Ex  TA  Av  GLQ  1369  Unf  0  317  1686  GasA  Ex  Y  SBrkr  1694  0  0  1694  1  0  2  0  3  1  Gd  7  Typ  1  Gd  Attchd  2004  RFn  2  636  TA  TA  Y  255  57  0  0  0  0  NA  NA  NA  0  8  2007  WD  Normal  307000 
8  60  RL  NA  10382  Pave  NA  IR1  Lvl  AllPub  Corner  Gtl  NWAmes  PosN  Norm  1Fam  2Story  7  6  1973  1973  Gable  CompShg  HdBoard  HdBoard  Stone  240  TA  TA  CBlock  Gd  TA  Mn  ALQ  859  BLQ  32  216  1107  GasA  Ex  Y  SBrkr  1107  983  0  2090  1  0  2  1  3  1  TA  7  Typ  2  TA  Attchd  1973  RFn  2  484  TA  TA  Y  235  204  228  0  0  0  NA  NA  Shed  350  11  2009  WD  Normal  200000 
9  50  RM  51  6120  Pave  NA  Reg  Lvl  AllPub  Inside  Gtl  OldTown  Artery  Norm  1Fam  1.5Fin  7  5  1931  1950  Gable  CompShg  BrkFace  Wd Shng  None  0  TA  TA  BrkTil  TA  TA  No  Unf  0  Unf  0  952  952  GasA  Gd  Y  FuseF  1022  752  0  1774  0  0  2  0  2  2  TA  8  Min1  2  TA  Detchd  1931  Unf  2  468  Fa  TA  Y  90  0  205  0  0  0  NA  NA  NA  0  4  2008  WD  Abnorml  129900 
10  190  RL  50  7420  Pave  NA  Reg  Lvl  AllPub  Corner  Gtl  BrkSide  Artery  Artery  2fmCon  1.5Unf  5  6  1939  1950  Gable  CompShg  MetalSd  MetalSd  None  0  TA  TA  BrkTil  TA  TA  No  GLQ  851  Unf  0  140  991  GasA  Ex  Y  SBrkr  1077  0  0  1077  1  0  1  0  2  2  TA  5  Typ  2  TA  Attchd  1939  RFn  1  205  Gd  TA  Y  0  4  0  0  0  0  NA  NA  NA  0  1  2008  WD  Normal  118000 
11  20  RL  70  11200  Pave  NA  Reg  Lvl  AllPub  Inside  Gtl  Sawyer  Norm  Norm  1Fam  1Story  5  5  1965  1965  Hip  CompShg  HdBoard  HdBoard  None  0  TA  TA  CBlock  TA  TA  No  Rec  906  Unf  0  134  1040  GasA  Ex  Y  SBrkr  1040  0  0  1040  1  0  1  0  3  1  TA  5  Typ  0  NA  Detchd  1965  Unf  1  384  TA  TA  Y  0  0  0  0  0  0  NA  NA  NA  0  2  2008  WD  Normal  129500 
12  60  RL  85  11924  Pave  NA  IR1  Lvl  AllPub  Inside  Gtl  NridgHt  Norm  Norm  1Fam  2Story  9  5  2005  2006  Hip  CompShg  WdShing  Wd Shng  Stone  286  Ex  TA  PConc  Ex  TA  No  GLQ  998  Unf  0  177  1175  GasA  Ex  Y  SBrkr  1182  1142  0  2324  1  0  3  0  4  1  Ex  11  Typ  2  Gd  BuiltIn  2005  Fin  3  736  TA  TA  Y  147  21  0  0  0  0  NA  NA  NA  0  7  2006  New  Partial  345000 
13  20  RL  NA  12968  Pave  NA  IR2  Lvl  AllPub  Inside  Gtl  Sawyer  Norm  Norm  1Fam  1Story  5  6  1962  1962  Hip  CompShg  HdBoard  Plywood  None  0  TA  TA  CBlock  TA  TA  No  ALQ  737  Unf  0  175  912  GasA  TA  Y  SBrkr  912  0  0  912  1  0  1  0  2  1  TA  4  Typ  0  NA  Detchd  1962  Unf  1  352  TA  TA  Y  140  0  0  0  176  0  NA  NA  NA  0  9  2008  WD  Normal  144000 
14  20  RL  91  10652  Pave  NA  IR1  Lvl  AllPub  Inside  Gtl  CollgCr  Norm  Norm  1Fam  1Story  7  5  2006  2007  Gable  CompShg  VinylSd  VinylSd  Stone  306  Gd  TA  PConc  Gd  TA  Av  Unf  0  Unf  0  1494  1494  GasA  Ex  Y  SBrkr  1494  0  0  1494  0  0  2  0  3  1  Gd  7  Typ  1  Gd  Attchd  2006  RFn  3  840  TA  TA  Y  160  33  0  0  0  0  NA  NA  NA  0  8  2007  New  Partial  279500 
15  20  RL  NA  10920  Pave  NA  IR1  Lvl  AllPub  Corner  Gtl  NAmes  Norm  Norm  1Fam  1Story  6  5  1960  1960  Hip  CompShg  MetalSd  MetalSd  BrkFace  212  TA  TA  CBlock  TA  TA  No  BLQ  733  Unf  0  520  1253  GasA  TA  Y  SBrkr  1253  0  0  1253  1  0  1  1  2  1  TA  5  Typ  1  Fa  Attchd  1960  RFn  1  352  TA  TA  Y  0  213  176  0  0  0  NA  GdWo  NA  0  5  2008  WD  Normal  157000 
16  45  RM  51  6120  Pave  NA  Reg  Lvl  AllPub  Corner  Gtl  BrkSide  Norm  Norm  1Fam  1.5Unf  7  8  1929  2001  Gable  CompShg  Wd Sdng  Wd Sdng  None  0  TA  TA  BrkTil  TA  TA  No  Unf  0  Unf  0  832  832  GasA  Ex  Y  FuseA  854  0  0  854  0  0  1  0  2  1  TA  5  Typ  0  NA  Detchd  1991  Unf  2  576  TA  TA  Y  48  112  0  0  0  0  NA  GdPrv  NA  0  7  2007  WD  Normal  132000 
17  20  RL  NA  11241  Pave  NA  IR1  Lvl  AllPub  CulDSac  Gtl  NAmes  Norm  Norm  1Fam  1Story  6  7  1970  1970  Gable  CompShg  Wd Sdng  Wd Sdng  BrkFace  180  TA  TA  CBlock  TA  TA  No  ALQ  578  Unf  0  426  1004  GasA  Ex  Y  SBrkr  1004  0  0  1004  1  0  1  0  2  1  TA  5  Typ  1  TA  Attchd  1970  Fin  2  480  TA  TA  Y  0  0  0  0  0  0  NA  NA  Shed  700  3  2010  WD  Normal  149000 
18  90  RL  72  10791  Pave  NA  Reg  Lvl  AllPub  Inside  Gtl  Sawyer  Norm  Norm  Duplex  1Story  4  5  1967  1967  Gable  CompShg  MetalSd  MetalSd  None  0  TA  TA  Slab  NA  NA  NA  NA  0  NA  0  0  0  GasA  TA  Y  SBrkr  1296  0  0  1296  0  0  2  0  2  2  TA  6  Typ  0  NA  CarPort  1967  Unf  2  516  TA  TA  Y  0  0  0  0  0  0  NA  NA  Shed  500  10  2006  WD  Normal  90000 
19  20  RL  66  13695  Pave  NA  Reg  Lvl  AllPub  Inside  Gtl  SawyerW  RRAe  Norm  1Fam  1Story  5  5  2004  2004  Gable  CompShg  VinylSd  VinylSd  None  0  TA  TA  PConc  TA  TA  No  GLQ  646  Unf  0  468  1114  GasA  Ex  Y  SBrkr  1114  0  0  1114  1  0  1  1  3  1  Gd  6  Typ  0  NA  Detchd  2004  Unf  2  576  TA  TA  Y  0  102  0  0  0  0  NA  NA  NA  0  6  2008  WD  Normal  159000 
20  20  RL  70  7560  Pave  NA  Reg  Lvl  AllPub  Inside  Gtl  NAmes  Norm  Norm  1Fam  1Story  5  6  1958  1965  Hip  CompShg  BrkFace  Plywood  None  0  TA  TA  CBlock  TA  TA  No  LwQ  504  Unf  0  525  1029  GasA  TA  Y  SBrkr  1339  0  0  1339  0  0  1  0  3  1  TA  6  Min1  0  NA  Attchd  1958  Unf  1  294  TA  TA  Y  0  0  0  0  0  0  NA  MnPrv  NA  0  5  2009  COD  Abnorml  139000 
Next, we review a summary of the training set to see which types of variables are available for analysis.
summary(train) %>% kable(caption = "Train Summary All Columns") %>% kable_styling("striped", full_width = TRUE) %>% scroll_box(width = "400px")
Id  MSSubClass  MSZoning  LotFrontage  LotArea  Street  Alley  LotShape  LandContour  Utilities  LotConfig  LandSlope  Neighborhood  Condition1  Condition2  BldgType  HouseStyle  OverallQual  OverallCond  YearBuilt  YearRemodAdd  RoofStyle  RoofMatl  Exterior1st  Exterior2nd  MasVnrType  MasVnrArea  ExterQual  ExterCond  Foundation  BsmtQual  BsmtCond  BsmtExposure  BsmtFinType1  BsmtFinSF1  BsmtFinType2  BsmtFinSF2  BsmtUnfSF  TotalBsmtSF  Heating  HeatingQC  CentralAir  Electrical  X1stFlrSF  X2ndFlrSF  LowQualFinSF  GrLivArea  BsmtFullBath  BsmtHalfBath  FullBath  HalfBath  BedroomAbvGr  KitchenAbvGr  KitchenQual  TotRmsAbvGrd  Functional  Fireplaces  FireplaceQu  GarageType  GarageYrBlt  GarageFinish  GarageCars  GarageArea  GarageQual  GarageCond  PavedDrive  WoodDeckSF  OpenPorchSF  EnclosedPorch  X3SsnPorch  ScreenPorch  PoolArea  PoolQC  Fence  MiscFeature  MiscVal  MoSold  YrSold  SaleType  SaleCondition  SalePrice
Min. : 1.0  Min. : 20.0  Length:1460  Min. : 21.00  Min. : 1300  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Min. : 1.000  Min. :1.000  Min. :1872  Min. :1950  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Min. : 0.0  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Length:1460  Min. : 0.0  Length:1460  Min. : 0.00  Min. : 0.0  Min. : 0.0  Length:1460  Length:1460  Length:1460  Length:1460  Min. : 334  Min. : 0  Min. : 0.000  Min. : 334  Min. :0.0000  Min. :0.00000  Min. :0.000  Min. :0.0000  Min. :0.000  Min. :0.000  Length:1460  Min. : 2.000  Length:1460  Min. :0.000  Length:1460  Length:1460  Min. :1900  Length:1460  Min. :0.000  Min. : 0.0  Length:1460  Length:1460  Length:1460  Min. : 0.00  Min. : 0.00  Min. : 0.00  Min. : 0.00  Min. : 0.00  Min. : 0.000  Length:1460  Length:1460  Length:1460  Min. : 0.00  Min. : 1.000  Min. :2006  Length:1460  Length:1460  Min. : 34900  
1st Qu.: 365.8  1st Qu.: 20.0  Class :character  1st Qu.: 59.00  1st Qu.: 7554  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  1st Qu.: 5.000  1st Qu.:5.000  1st Qu.:1954  1st Qu.:1967  Class :character  Class :character  Class :character  Class :character  Class :character  1st Qu.: 0.0  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  Class :character  1st Qu.: 0.0  Class :character  1st Qu.: 0.00  1st Qu.: 223.0  1st Qu.: 795.8  Class :character  Class :character  Class :character  Class :character  1st Qu.: 882  1st Qu.: 0  1st Qu.: 0.000  1st Qu.:1130  1st Qu.:0.0000  1st Qu.:0.00000  1st Qu.:1.000  1st Qu.:0.0000  1st Qu.:2.000  1st Qu.:1.000  Class :character  1st Qu.: 5.000  Class :character  1st Qu.:0.000  Class :character  Class :character  1st Qu.:1961  Class :character  1st Qu.:1.000  1st Qu.: 334.5  Class :character  Class :character  Class :character  1st Qu.: 0.00  1st Qu.: 0.00  1st Qu.: 0.00  1st Qu.: 0.00  1st Qu.: 0.00  1st Qu.: 0.000  Class :character  Class :character  Class :character  1st Qu.: 0.00  1st Qu.: 5.000  1st Qu.:2007  Class :character  Class :character  1st Qu.:129975  
Median : 730.5  Median : 50.0  Mode :character  Median : 69.00  Median : 9478  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Median : 6.000  Median :5.000  Median :1973  Median :1994  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Median : 0.0  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Mode :character  Median : 383.5  Mode :character  Median : 0.00  Median : 477.5  Median : 991.5  Mode :character  Mode :character  Mode :character  Mode :character  Median :1087  Median : 0  Median : 0.000  Median :1464  Median :0.0000  Median :0.00000  Median :2.000  Median :0.0000  Median :3.000  Median :1.000  Mode :character  Median : 6.000  Mode :character  Median :1.000  Mode :character  Mode :character  Median :1980  Mode :character  Median :2.000  Median : 480.0  Mode :character  Mode :character  Mode :character  Median : 0.00  Median : 25.00  Median : 0.00  Median : 0.00  Median : 0.00  Median : 0.000  Mode :character  Mode :character  Mode :character  Median : 0.00  Median : 6.000  Median :2008  Mode :character  Mode :character  Median :163000  
Mean : 730.5  Mean : 56.9  NA  Mean : 70.05  Mean : 10517  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  Mean : 6.099  Mean :5.575  Mean :1971  Mean :1985  NA  NA  NA  NA  NA  Mean : 103.7  NA  NA  NA  NA  NA  NA  NA  Mean : 443.6  NA  Mean : 46.55  Mean : 567.2  Mean :1057.4  NA  NA  NA  NA  Mean :1163  Mean : 347  Mean : 5.845  Mean :1515  Mean :0.4253  Mean :0.05753  Mean :1.565  Mean :0.3829  Mean :2.866  Mean :1.047  NA  Mean : 6.518  NA  Mean :0.613  NA  NA  Mean :1979  NA  Mean :1.767  Mean : 473.0  NA  NA  NA  Mean : 94.24  Mean : 46.66  Mean : 21.95  Mean : 3.41  Mean : 15.06  Mean : 2.759  NA  NA  NA  Mean : 43.49  Mean : 6.322  Mean :2008  NA  NA  Mean :180921  
3rd Qu.:1095.2  3rd Qu.: 70.0  NA  3rd Qu.: 80.00  3rd Qu.: 11602  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  3rd Qu.: 7.000  3rd Qu.:6.000  3rd Qu.:2000  3rd Qu.:2004  NA  NA  NA  NA  NA  3rd Qu.: 166.0  NA  NA  NA  NA  NA  NA  NA  3rd Qu.: 712.2  NA  3rd Qu.: 0.00  3rd Qu.: 808.0  3rd Qu.:1298.2  NA  NA  NA  NA  3rd Qu.:1391  3rd Qu.: 728  3rd Qu.: 0.000  3rd Qu.:1777  3rd Qu.:1.0000  3rd Qu.:0.00000  3rd Qu.:2.000  3rd Qu.:1.0000  3rd Qu.:3.000  3rd Qu.:1.000  NA  3rd Qu.: 7.000  NA  3rd Qu.:1.000  NA  NA  3rd Qu.:2002  NA  3rd Qu.:2.000  3rd Qu.: 576.0  NA  NA  NA  3rd Qu.:168.00  3rd Qu.: 68.00  3rd Qu.: 0.00  3rd Qu.: 0.00  3rd Qu.: 0.00  3rd Qu.: 0.000  NA  NA  NA  3rd Qu.: 0.00  3rd Qu.: 8.000  3rd Qu.:2009  NA  NA  3rd Qu.:214000  
Max. :1460.0  Max. :190.0  NA  Max. :313.00  Max. :215245  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  Max. :10.000  Max. :9.000  Max. :2010  Max. :2010  NA  NA  NA  NA  NA  Max. :1600.0  NA  NA  NA  NA  NA  NA  NA  Max. :5644.0  NA  Max. :1474.00  Max. :2336.0  Max. :6110.0  NA  NA  NA  NA  Max. :4692  Max. :2065  Max. :572.000  Max. :5642  Max. :3.0000  Max. :2.00000  Max. :3.000  Max. :2.0000  Max. :8.000  Max. :3.000  NA  Max. :14.000  NA  Max. :3.000  NA  NA  Max. :2010  NA  Max. :4.000  Max. :1418.0  NA  NA  NA  Max. :857.00  Max. :547.00  Max. :552.00  Max. :508.00  Max. :480.00  Max. :738.000  NA  NA  NA  Max. :15500.00  Max. :12.000  Max. :2010  NA  NA  Max. :755000  
NA  NA  NA  NA’s :259  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA’s :8  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA’s :81  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 
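`summary()` reports `NA's` only for numeric columns (LotFrontage, MasVnrArea and GarageYrBlt above); for character columns it shows just Length, Class and Mode. `read.csv()` converts the string "NA" to a missing value by default, so character columns such as Alley or PoolQC, where the first rows above already show NA, will also contain missing values. Below is a sketch of a hypothetical helper that counts missing values in every column, demonstrated on a small made-up data frame:

```r
# missing_summary (hypothetical helper): count NAs per column, largest first
missing_summary <- function(df) {
  na_counts <- colSums(is.na(df))
  sort(na_counts[na_counts > 0], decreasing = TRUE)
}

# small illustrative data frame; on the Kaggle data call missing_summary(train)
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "x"), d = 1:3,
                  stringsAsFactors = FALSE)
missing_summary(toy)
```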
summary(train$SalePrice)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34900 129975 163000 180921 214000 755000
hist(train$SalePrice, main="Histogram of Sale Price", xlab="Sale Price")
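The summary above already hints that SalePrice is right-skewed, since the mean (180,921) is well above the median (163,000). Below is a minimal sketch of the moment-based sample skewness in base R. `sample_skewness` is a hypothetical helper, checked here on small vectors:

```r
# sample_skewness (hypothetical helper): third standardized moment, NAs dropped
sample_skewness <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

sample_skewness(c(1, 2, 3, 4, 5))   # symmetric: 0
sample_skewness(c(1, 1, 1, 1, 10))  # long right tail: positive
```

On the Kaggle data, `sample_skewness(train$SalePrice)` should come out positive. `hist(log(train$SalePrice))` is a common way to see whether a log scale makes the distribution more symmetric.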