#install.packages("GGally")
#install.packages('MASS')
suppressMessages(library(kableExtra))
suppressMessages(library(GGally))
suppressMessages(library(ggplot2))
suppressMessages(library(pracma))
## Warning: package 'pracma' was built under R version 3.5.2
suppressMessages(library(MASS))
## Warning: package 'MASS' was built under R version 3.5.2
Using R, generate a random variable X that has 10,000 random uniform numbers from 1 to N, where N can be any number of your choosing greater than or equal to 6. Then generate a random variable Y that has 10,000 random normal numbers with a mean and standard deviation of μ=σ=(N+1)/2.
set.seed(101)
N <- 6
X <- runif(10000, 1, N)
Let’s take a look at the distribution of X
hist(X)
mu <- (N+1)/2
Y <- rnorm(10000,mean=mu)
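Let’s take a look at the distribution of Y. Note that the rnorm call above sets only the mean, so sd stays at its default of 1 rather than σ=(N+1)/2; all results below reflect that choice.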
hist(Y)
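The problem statement asks for μ=σ=(N+1)/2. A minimal sketch of the draw with both parameters set follows; `sigma` and `Y_sd` are illustrative names that do not appear elsewhere in this document, and every number downstream would change if this variable were used in place of Y.

```r
set.seed(101)
N <- 6
mu    <- (N + 1) / 2
sigma <- (N + 1) / 2  # assumption: sd equal to the mean, per the problem statement

# Normal draws with both mean and sd specified (Y_sd is a hypothetical name).
Y_sd <- rnorm(10000, mean = mu, sd = sigma)

# The sample mean and sd should both land near 3.5.
c(mean = mean(Y_sd), sd = sd(Y_sd))
```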
Calculate as a minimum the below probabilities a through c. Assume the small letter “x” is estimated as the median of the X variable, and the small letter “y” is estimated as the 1st quartile of the Y variable. Interpret the meaning of all probabilities.
x <- median(X)
x
## [1] 3.487115
y <- quantile(Y, 0.25)
y
## 25%
## 2.829464
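As a sanity check, the sample values can be compared with the theoretical quantiles of the two distributions. This is a sketch that assumes Y has sd = 1, as in the rnorm call used here; `x_theory` and `y_theory` are illustrative names.

```r
N  <- 6
mu <- (N + 1) / 2

# Median of a Uniform(1, N) distribution.
x_theory <- qunif(0.5, min = 1, max = N)

# First quartile of a Normal(mu, 1) distribution (sd = 1 is the rnorm default).
y_theory <- qnorm(0.25, mean = mu, sd = 1)

c(x_theory = x_theory, y_theory = y_theory)
```

Both values sit close to the sample estimates printed above, which is what we expect with 10,000 draws.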
The first probability is the conditional probability P(X>x | X>y). By definition, we calculate P(X>x and X>y) and divide it by P(X>y).
#P(X>x and X>y)
P1<-sum(X>x & X>y)/10000
#P(X>y)
P2<-sum(X>y)/10000
round(P1/P2,3)
## [1] 0.785
#P(X>x and Y>y)
P3<-sum(X>x & Y>y)/10000
round(P3,3)
## [1] 0.378
#P(X<x and X>y)
P4<-sum((X<x) & (X>y))/10000
#P(X>y)
P2<-sum(X>y)/10000
round(P4/P2,3)
## [1] 0.215
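In words: given that X exceeds the first quartile of Y, X is above its own median about 78.5% of the time and below it about 21.5% of the time, while X is above its median and Y is above its first quartile together about 37.8% of the time. The two conditional probabilities share the same condition and cover complementary events, so they must sum to 1. The sketch below repeats the three calculations with a small helper; `cond_prob` is a hypothetical function written for this illustration.

```r
# Hypothetical helper: empirical P(A | B) from two logical vectors.
cond_prob <- function(A, B) sum(A & B) / sum(B)

set.seed(101)
N <- 6
X <- runif(10000, 1, N)
Y <- rnorm(10000, mean = (N + 1) / 2)  # sd left at its default of 1
x <- median(X)
y <- quantile(Y, 0.25)

p_a <- cond_prob(X > x, X > y)  # P(X>x | X>y)
p_b <- mean(X > x & Y > y)      # P(X>x and Y>y)
p_c <- cond_prob(X < x, X > y)  # P(X<x | X>y)

round(c(p_a = p_a, p_b = p_b, p_c = p_c), 3)
p_a + p_c  # complementary events under the same condition
```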
Investigate whether P(X>x and Y>y)=P(X>x)P(Y>y) by building a table and evaluating the marginal and joint probabilities.
df<-data.frame("Xgx" =c(sum(X>x & Y<y), sum(X>x & Y>y), sum(X>x & Y<y)+sum(X>x & Y>y)),
"Xlx" = c(sum(X<x & Y<y), sum(X<x & Y>y), sum(X<x & Y<y)+sum(X<x & Y>y)),
"Total" = c(sum(X>x & Y<y)+sum(X<x & Y<y), sum(X>x & Y>y)+sum(X<x & Y>y), sum(X>x & Y<y)+sum(X>x & Y>y)+ sum(X<x & Y<y)+sum(X<x & Y>y)))
names(df) <- c("X>x","X<x","Total")
row.names(df) <- c("Y<y","Y>y", "Total")
df %>% kable(caption = "Table of Probabilities") %>% kable_styling("striped", full_width = TRUE)
| | X>x | X<x | Total |
|---|---|---|---|
| Y<y | 1222 | 1278 | 2500 |
| Y>y | 3778 | 3722 | 7500 |
| Total | 5000 | 5000 | 10000 |
Let’s use the table to locate P(X>x and Y>y) = 3778/10000.
df[2,1]/df[3,3]
## [1] 0.3778
Now, let’s find P(X>x)P(Y>y), P(X>x) = 5000/10000, P(Y>y) = 7500/10000.
#P(X>x)
df[3,1]/df[3,3]
## [1] 0.5
#P(Y>y)
df[2,3]/df[3,3]
## [1] 0.75
#P(X>x)P(Y>y)
(df[3,1]/df[3,3]) * (df[2,3]/df[3,3])
## [1] 0.375
We conclude that the events are independent, since P(X>x and Y>y) = 0.3778 is approximately equal to P(X>x)P(Y>y) = 0.375. The small gap is consistent with sampling variation.
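The same comparison can be made for all four cells at once. The sketch below rebuilds the 2×2 counts from the table above, converts them to joint proportions, and compares them with the product of the marginals; `counts`, `joint`, and `expected` are illustrative names.

```r
# Cell counts copied from the table above (Total row and column excluded).
counts <- matrix(c(1222, 3778, 1278, 3722), nrow = 2,
                 dimnames = list(c("Y<y", "Y>y"), c("X>x", "X<x")))

joint    <- prop.table(counts)                     # joint probabilities
expected <- outer(rowSums(joint), colSums(joint))  # product of the marginals

joint
expected
joint - expected  # differences should all be close to 0 under independence
```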
Check to see if independence holds by using Fisher’s Exact Test and the Chi Square Test. What is the difference between the two? Which is most appropriate?
Fisher’s Exact Test.
fisher.test(df, simulate.p.value=TRUE)
##
## Fisher's Exact Test for Count Data with simulated p-value (based
## on 2000 replicates)
##
## data: df
## p-value = 0.7966
## alternative hypothesis: two.sided
The p-value (0.7966) is well above 0.05, so we fail to reject the null hypothesis that these variables are independent.
Chi Square Test.
chisq.test(df)
##
## Pearson's Chi-squared test
##
## data: df
## X-squared = 1.6725, df = 4, p-value = 0.7957
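The test statistic is small and the p-value (0.7957) is well above 0.05, so again we fail to reject the null hypothesis of independence. Note that df = 4 because the table passed to both tests includes the Total row and column, which makes it 3×3. Strictly, only the 2×2 block of cell counts, df[1:2, 1:2], should be tested, and that table has 1 degree of freedom.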
Difference between Fisher’s Exact Test and the Chi Square Test:
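Fisher’s exact test computes an exact p-value and works well with small sample sizes, where the chi-square approximation can be unreliable. The chi-square test relies on a large-sample approximation and is accurate when the expected cell counts are large. Fisher’s exact test is the more appropriate choice in general because it does not depend on that approximation; with counts as large as these, both tests lead to the same conclusion.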
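A sketch of both tests restricted to the 2×2 cell counts, without the Total margins, is shown below. The counts are copied from the table above; `counts`, `chi`, and `fis` are illustrative names, and `correct = FALSE` is a choice made here to turn off the continuity correction that chisq.test applies to 2×2 tables by default.

```r
# 2x2 cell counts copied from the table above (Total margins excluded).
counts <- matrix(c(1222, 3778, 1278, 3722), nrow = 2,
                 dimnames = list(c("Y<y", "Y>y"), c("X>x", "X<x")))

chi <- chisq.test(counts, correct = FALSE)  # Pearson test, no continuity correction
fis <- fisher.test(counts)                  # exact test, no simulation needed for 2x2

chi$statistic  # same X-squared as before; the margin cells contributed nothing
chi$parameter  # degrees of freedom: (2 - 1) * (2 - 1) = 1
c(chi_p = chi$p.value, fisher_p = fis$p.value)
```

The statistic is unchanged because each margin cell in the 3×3 table equals its expected value, but the p-value is now computed with 1 degree of freedom instead of 4. Both p-values remain above 0.05, so the conclusion of independence still holds.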
You are to register for Kaggle.com (free) and compete in the House Prices: Advanced Regression Techniques competition. https://www.kaggle.com/c/house-prices-advanced-regression-techniques. I want you to do the following.
Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot matrix for at least two of the independent variables and the dependent variable. Derive a correlation matrix for any three quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?
train <- read.csv("train.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
test <- read.csv("test.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
Let’s view the first 20 rows of the training data.
head(train,20) %>% kable(caption = "Train") %>% kable_styling("striped", full_width = TRUE) %>% scroll_box("width:400px")
Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | X1stFlrSF | X2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | X3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 60 | RL | 65 | 8450 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2003 | 2003 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 196 | Gd | TA | PConc | Gd | TA | No | GLQ | 706 | Unf | 0 | 150 | 856 | GasA | Ex | Y | SBrkr | 856 | 854 | 0 | 1710 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 8 | Typ | 0 | NA | Attchd | 2003 | RFn | 2 | 548 | TA | TA | Y | 0 | 61 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 2 | 2008 | WD | Normal | 208500 |
2 | 20 | RL | 80 | 9600 | Pave | NA | Reg | Lvl | AllPub | FR2 | Gtl | Veenker | Feedr | Norm | 1Fam | 1Story | 6 | 8 | 1976 | 1976 | Gable | CompShg | MetalSd | MetalSd | None | 0 | TA | TA | CBlock | Gd | TA | Gd | ALQ | 978 | Unf | 0 | 284 | 1262 | GasA | Ex | Y | SBrkr | 1262 | 0 | 0 | 1262 | 0 | 1 | 2 | 0 | 3 | 1 | TA | 6 | Typ | 1 | TA | Attchd | 1976 | RFn | 2 | 460 | TA | TA | Y | 298 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 5 | 2007 | WD | Normal | 181500 |
3 | 60 | RL | 68 | 11250 | Pave | NA | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 2Story | 7 | 5 | 2001 | 2002 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 162 | Gd | TA | PConc | Gd | TA | Mn | GLQ | 486 | Unf | 0 | 434 | 920 | GasA | Ex | Y | SBrkr | 920 | 866 | 0 | 1786 | 1 | 0 | 2 | 1 | 3 | 1 | Gd | 6 | Typ | 1 | TA | Attchd | 2001 | RFn | 2 | 608 | TA | TA | Y | 0 | 42 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 9 | 2008 | WD | Normal | 223500 |
4 | 70 | RL | 60 | 9550 | Pave | NA | IR1 | Lvl | AllPub | Corner | Gtl | Crawfor | Norm | Norm | 1Fam | 2Story | 7 | 5 | 1915 | 1970 | Gable | CompShg | Wd Sdng | Wd Shng | None | 0 | TA | TA | BrkTil | TA | Gd | No | ALQ | 216 | Unf | 0 | 540 | 756 | GasA | Gd | Y | SBrkr | 961 | 756 | 0 | 1717 | 1 | 0 | 1 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Detchd | 1998 | Unf | 3 | 642 | TA | TA | Y | 0 | 35 | 272 | 0 | 0 | 0 | NA | NA | NA | 0 | 2 | 2006 | WD | Abnorml | 140000 |
5 | 60 | RL | 84 | 14260 | Pave | NA | IR1 | Lvl | AllPub | FR2 | Gtl | NoRidge | Norm | Norm | 1Fam | 2Story | 8 | 5 | 2000 | 2000 | Gable | CompShg | VinylSd | VinylSd | BrkFace | 350 | Gd | TA | PConc | Gd | TA | Av | GLQ | 655 | Unf | 0 | 490 | 1145 | GasA | Ex | Y | SBrkr | 1145 | 1053 | 0 | 2198 | 1 | 0 | 2 | 1 | 4 | 1 | Gd | 9 | Typ | 1 | TA | Attchd | 2000 | RFn | 3 | 836 | TA | TA | Y | 192 | 84 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 12 | 2008 | WD | Normal | 250000 |
6 | 50 | RL | 85 | 14115 | Pave | NA | IR1 | Lvl | AllPub | Inside | Gtl | Mitchel | Norm | Norm | 1Fam | 1.5Fin | 5 | 5 | 1993 | 1995 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | Wood | Gd | TA | No | GLQ | 732 | Unf | 0 | 64 | 796 | GasA | Ex | Y | SBrkr | 796 | 566 | 0 | 1362 | 1 | 0 | 1 | 1 | 1 | 1 | TA | 5 | Typ | 0 | NA | Attchd | 1993 | Unf | 2 | 480 | TA | TA | Y | 40 | 30 | 0 | 320 | 0 | 0 | NA | MnPrv | Shed | 700 | 10 | 2009 | WD | Normal | 143000 |
7 | 20 | RL | 75 | 10084 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | Somerst | Norm | Norm | 1Fam | 1Story | 8 | 5 | 2004 | 2005 | Gable | CompShg | VinylSd | VinylSd | Stone | 186 | Gd | TA | PConc | Ex | TA | Av | GLQ | 1369 | Unf | 0 | 317 | 1686 | GasA | Ex | Y | SBrkr | 1694 | 0 | 0 | 1694 | 1 | 0 | 2 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Attchd | 2004 | RFn | 2 | 636 | TA | TA | Y | 255 | 57 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 8 | 2007 | WD | Normal | 307000 |
8 | 60 | RL | NA | 10382 | Pave | NA | IR1 | Lvl | AllPub | Corner | Gtl | NWAmes | PosN | Norm | 1Fam | 2Story | 7 | 6 | 1973 | 1973 | Gable | CompShg | HdBoard | HdBoard | Stone | 240 | TA | TA | CBlock | Gd | TA | Mn | ALQ | 859 | BLQ | 32 | 216 | 1107 | GasA | Ex | Y | SBrkr | 1107 | 983 | 0 | 2090 | 1 | 0 | 2 | 1 | 3 | 1 | TA | 7 | Typ | 2 | TA | Attchd | 1973 | RFn | 2 | 484 | TA | TA | Y | 235 | 204 | 228 | 0 | 0 | 0 | NA | NA | Shed | 350 | 11 | 2009 | WD | Normal | 200000 |
9 | 50 | RM | 51 | 6120 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | OldTown | Artery | Norm | 1Fam | 1.5Fin | 7 | 5 | 1931 | 1950 | Gable | CompShg | BrkFace | Wd Shng | None | 0 | TA | TA | BrkTil | TA | TA | No | Unf | 0 | Unf | 0 | 952 | 952 | GasA | Gd | Y | FuseF | 1022 | 752 | 0 | 1774 | 0 | 0 | 2 | 0 | 2 | 2 | TA | 8 | Min1 | 2 | TA | Detchd | 1931 | Unf | 2 | 468 | Fa | TA | Y | 90 | 0 | 205 | 0 | 0 | 0 | NA | NA | NA | 0 | 4 | 2008 | WD | Abnorml | 129900 |
10 | 190 | RL | 50 | 7420 | Pave | NA | Reg | Lvl | AllPub | Corner | Gtl | BrkSide | Artery | Artery | 2fmCon | 1.5Unf | 5 | 6 | 1939 | 1950 | Gable | CompShg | MetalSd | MetalSd | None | 0 | TA | TA | BrkTil | TA | TA | No | GLQ | 851 | Unf | 0 | 140 | 991 | GasA | Ex | Y | SBrkr | 1077 | 0 | 0 | 1077 | 1 | 0 | 1 | 0 | 2 | 2 | TA | 5 | Typ | 2 | TA | Attchd | 1939 | RFn | 1 | 205 | Gd | TA | Y | 0 | 4 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 1 | 2008 | WD | Normal | 118000 |
11 | 20 | RL | 70 | 11200 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | Sawyer | Norm | Norm | 1Fam | 1Story | 5 | 5 | 1965 | 1965 | Hip | CompShg | HdBoard | HdBoard | None | 0 | TA | TA | CBlock | TA | TA | No | Rec | 906 | Unf | 0 | 134 | 1040 | GasA | Ex | Y | SBrkr | 1040 | 0 | 0 | 1040 | 1 | 0 | 1 | 0 | 3 | 1 | TA | 5 | Typ | 0 | NA | Detchd | 1965 | Unf | 1 | 384 | TA | TA | Y | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 2 | 2008 | WD | Normal | 129500 |
12 | 60 | RL | 85 | 11924 | Pave | NA | IR1 | Lvl | AllPub | Inside | Gtl | NridgHt | Norm | Norm | 1Fam | 2Story | 9 | 5 | 2005 | 2006 | Hip | CompShg | WdShing | Wd Shng | Stone | 286 | Ex | TA | PConc | Ex | TA | No | GLQ | 998 | Unf | 0 | 177 | 1175 | GasA | Ex | Y | SBrkr | 1182 | 1142 | 0 | 2324 | 1 | 0 | 3 | 0 | 4 | 1 | Ex | 11 | Typ | 2 | Gd | BuiltIn | 2005 | Fin | 3 | 736 | TA | TA | Y | 147 | 21 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 7 | 2006 | New | Partial | 345000 |
13 | 20 | RL | NA | 12968 | Pave | NA | IR2 | Lvl | AllPub | Inside | Gtl | Sawyer | Norm | Norm | 1Fam | 1Story | 5 | 6 | 1962 | 1962 | Hip | CompShg | HdBoard | Plywood | None | 0 | TA | TA | CBlock | TA | TA | No | ALQ | 737 | Unf | 0 | 175 | 912 | GasA | TA | Y | SBrkr | 912 | 0 | 0 | 912 | 1 | 0 | 1 | 0 | 2 | 1 | TA | 4 | Typ | 0 | NA | Detchd | 1962 | Unf | 1 | 352 | TA | TA | Y | 140 | 0 | 0 | 0 | 176 | 0 | NA | NA | NA | 0 | 9 | 2008 | WD | Normal | 144000 |
14 | 20 | RL | 91 | 10652 | Pave | NA | IR1 | Lvl | AllPub | Inside | Gtl | CollgCr | Norm | Norm | 1Fam | 1Story | 7 | 5 | 2006 | 2007 | Gable | CompShg | VinylSd | VinylSd | Stone | 306 | Gd | TA | PConc | Gd | TA | Av | Unf | 0 | Unf | 0 | 1494 | 1494 | GasA | Ex | Y | SBrkr | 1494 | 0 | 0 | 1494 | 0 | 0 | 2 | 0 | 3 | 1 | Gd | 7 | Typ | 1 | Gd | Attchd | 2006 | RFn | 3 | 840 | TA | TA | Y | 160 | 33 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 8 | 2007 | New | Partial | 279500 |
15 | 20 | RL | NA | 10920 | Pave | NA | IR1 | Lvl | AllPub | Corner | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 5 | 1960 | 1960 | Hip | CompShg | MetalSd | MetalSd | BrkFace | 212 | TA | TA | CBlock | TA | TA | No | BLQ | 733 | Unf | 0 | 520 | 1253 | GasA | TA | Y | SBrkr | 1253 | 0 | 0 | 1253 | 1 | 0 | 1 | 1 | 2 | 1 | TA | 5 | Typ | 1 | Fa | Attchd | 1960 | RFn | 1 | 352 | TA | TA | Y | 0 | 213 | 176 | 0 | 0 | 0 | NA | GdWo | NA | 0 | 5 | 2008 | WD | Normal | 157000 |
16 | 45 | RM | 51 | 6120 | Pave | NA | Reg | Lvl | AllPub | Corner | Gtl | BrkSide | Norm | Norm | 1Fam | 1.5Unf | 7 | 8 | 1929 | 2001 | Gable | CompShg | Wd Sdng | Wd Sdng | None | 0 | TA | TA | BrkTil | TA | TA | No | Unf | 0 | Unf | 0 | 832 | 832 | GasA | Ex | Y | FuseA | 854 | 0 | 0 | 854 | 0 | 0 | 1 | 0 | 2 | 1 | TA | 5 | Typ | 0 | NA | Detchd | 1991 | Unf | 2 | 576 | TA | TA | Y | 48 | 112 | 0 | 0 | 0 | 0 | NA | GdPrv | NA | 0 | 7 | 2007 | WD | Normal | 132000 |
17 | 20 | RL | NA | 11241 | Pave | NA | IR1 | Lvl | AllPub | CulDSac | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 6 | 7 | 1970 | 1970 | Gable | CompShg | Wd Sdng | Wd Sdng | BrkFace | 180 | TA | TA | CBlock | TA | TA | No | ALQ | 578 | Unf | 0 | 426 | 1004 | GasA | Ex | Y | SBrkr | 1004 | 0 | 0 | 1004 | 1 | 0 | 1 | 0 | 2 | 1 | TA | 5 | Typ | 1 | TA | Attchd | 1970 | Fin | 2 | 480 | TA | TA | Y | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | Shed | 700 | 3 | 2010 | WD | Normal | 149000 |
18 | 90 | RL | 72 | 10791 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | Sawyer | Norm | Norm | Duplex | 1Story | 4 | 5 | 1967 | 1967 | Gable | CompShg | MetalSd | MetalSd | None | 0 | TA | TA | Slab | NA | NA | NA | NA | 0 | NA | 0 | 0 | 0 | GasA | TA | Y | SBrkr | 1296 | 0 | 0 | 1296 | 0 | 0 | 2 | 0 | 2 | 2 | TA | 6 | Typ | 0 | NA | CarPort | 1967 | Unf | 2 | 516 | TA | TA | Y | 0 | 0 | 0 | 0 | 0 | 0 | NA | NA | Shed | 500 | 10 | 2006 | WD | Normal | 90000 |
19 | 20 | RL | 66 | 13695 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | SawyerW | RRAe | Norm | 1Fam | 1Story | 5 | 5 | 2004 | 2004 | Gable | CompShg | VinylSd | VinylSd | None | 0 | TA | TA | PConc | TA | TA | No | GLQ | 646 | Unf | 0 | 468 | 1114 | GasA | Ex | Y | SBrkr | 1114 | 0 | 0 | 1114 | 1 | 0 | 1 | 1 | 3 | 1 | Gd | 6 | Typ | 0 | NA | Detchd | 2004 | Unf | 2 | 576 | TA | TA | Y | 0 | 102 | 0 | 0 | 0 | 0 | NA | NA | NA | 0 | 6 | 2008 | WD | Normal | 159000 |
20 | 20 | RL | 70 | 7560 | Pave | NA | Reg | Lvl | AllPub | Inside | Gtl | NAmes | Norm | Norm | 1Fam | 1Story | 5 | 6 | 1958 | 1965 | Hip | CompShg | BrkFace | Plywood | None | 0 | TA | TA | CBlock | TA | TA | No | LwQ | 504 | Unf | 0 | 525 | 1029 | GasA | TA | Y | SBrkr | 1339 | 0 | 0 | 1339 | 0 | 0 | 1 | 0 | 3 | 1 | TA | 6 | Min1 | 0 | NA | Attchd | 1958 | Unf | 1 | 294 | TA | TA | Y | 0 | 0 | 0 | 0 | 0 | 0 | NA | MnPrv | NA | 0 | 5 | 2009 | COD | Abnorml | 139000 |
The next step is to review a summary of the training data set to get a better idea of the types of variables available for analysis.
summary(train) %>% kable(caption = "Train Summary All Columns") %>% kable_styling("striped", full_width = TRUE) %>% scroll_box("width:400px")
Id | MSSubClass | MSZoning | LotFrontage | LotArea | Street | Alley | LotShape | LandContour | Utilities | LotConfig | LandSlope | Neighborhood | Condition1 | Condition2 | BldgType | HouseStyle | OverallQual | OverallCond | YearBuilt | YearRemodAdd | RoofStyle | RoofMatl | Exterior1st | Exterior2nd | MasVnrType | MasVnrArea | ExterQual | ExterCond | Foundation | BsmtQual | BsmtCond | BsmtExposure | BsmtFinType1 | BsmtFinSF1 | BsmtFinType2 | BsmtFinSF2 | BsmtUnfSF | TotalBsmtSF | Heating | HeatingQC | CentralAir | Electrical | X1stFlrSF | X2ndFlrSF | LowQualFinSF | GrLivArea | BsmtFullBath | BsmtHalfBath | FullBath | HalfBath | BedroomAbvGr | KitchenAbvGr | KitchenQual | TotRmsAbvGrd | Functional | Fireplaces | FireplaceQu | GarageType | GarageYrBlt | GarageFinish | GarageCars | GarageArea | GarageQual | GarageCond | PavedDrive | WoodDeckSF | OpenPorchSF | EnclosedPorch | X3SsnPorch | ScreenPorch | PoolArea | PoolQC | Fence | MiscFeature | MiscVal | MoSold | YrSold | SaleType | SaleCondition | SalePrice | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Min. : 1.0 | Min. : 20.0 | Length:1460 | Min. : 21.00 | Min. : 1300 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Min. : 1.000 | Min. :1.000 | Min. :1872 | Min. :1950 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Min. : 0.0 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Min. : 0.0 | Length:1460 | Min. : 0.00 | Min. : 0.0 | Min. : 0.0 | Length:1460 | Length:1460 | Length:1460 | Length:1460 | Min. : 334 | Min. : 0 | Min. : 0.000 | Min. : 334 | Min. :0.0000 | Min. :0.00000 | Min. :0.000 | Min. :0.0000 | Min. :0.000 | Min. :0.000 | Length:1460 | Min. : 2.000 | Length:1460 | Min. :0.000 | Length:1460 | Length:1460 | Min. :1900 | Length:1460 | Min. :0.000 | Min. : 0.0 | Length:1460 | Length:1460 | Length:1460 | Min. : 0.00 | Min. : 0.00 | Min. : 0.00 | Min. : 0.00 | Min. : 0.00 | Min. : 0.000 | Length:1460 | Length:1460 | Length:1460 | Min. : 0.00 | Min. : 1.000 | Min. :2006 | Length:1460 | Length:1460 | Min. : 34900 | |
1st Qu.: 365.8 | 1st Qu.: 20.0 | Class :character | 1st Qu.: 59.00 | 1st Qu.: 7554 | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | 1st Qu.: 5.000 | 1st Qu.:5.000 | 1st Qu.:1954 | 1st Qu.:1967 | Class :character | Class :character | Class :character | Class :character | Class :character | 1st Qu.: 0.0 | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | Class :character | 1st Qu.: 0.0 | Class :character | 1st Qu.: 0.00 | 1st Qu.: 223.0 | 1st Qu.: 795.8 | Class :character | Class :character | Class :character | Class :character | 1st Qu.: 882 | 1st Qu.: 0 | 1st Qu.: 0.000 | 1st Qu.:1130 | 1st Qu.:0.0000 | 1st Qu.:0.00000 | 1st Qu.:1.000 | 1st Qu.:0.0000 | 1st Qu.:2.000 | 1st Qu.:1.000 | Class :character | 1st Qu.: 5.000 | Class :character | 1st Qu.:0.000 | Class :character | Class :character | 1st Qu.:1961 | Class :character | 1st Qu.:1.000 | 1st Qu.: 334.5 | Class :character | Class :character | Class :character | 1st Qu.: 0.00 | 1st Qu.: 0.00 | 1st Qu.: 0.00 | 1st Qu.: 0.00 | 1st Qu.: 0.00 | 1st Qu.: 0.000 | Class :character | Class :character | Class :character | 1st Qu.: 0.00 | 1st Qu.: 5.000 | 1st Qu.:2007 | Class :character | Class :character | 1st Qu.:129975 | |
Median : 730.5 | Median : 50.0 | Mode :character | Median : 69.00 | Median : 9478 | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Median : 6.000 | Median :5.000 | Median :1973 | Median :1994 | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Median : 0.0 | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Mode :character | Median : 383.5 | Mode :character | Median : 0.00 | Median : 477.5 | Median : 991.5 | Mode :character | Mode :character | Mode :character | Mode :character | Median :1087 | Median : 0 | Median : 0.000 | Median :1464 | Median :0.0000 | Median :0.00000 | Median :2.000 | Median :0.0000 | Median :3.000 | Median :1.000 | Mode :character | Median : 6.000 | Mode :character | Median :1.000 | Mode :character | Mode :character | Median :1980 | Mode :character | Median :2.000 | Median : 480.0 | Mode :character | Mode :character | Mode :character | Median : 0.00 | Median : 25.00 | Median : 0.00 | Median : 0.00 | Median : 0.00 | Median : 0.000 | Mode :character | Mode :character | Mode :character | Median : 0.00 | Median : 6.000 | Median :2008 | Mode :character | Mode :character | Median :163000 | |
Mean : 730.5 | Mean : 56.9 | NA | Mean : 70.05 | Mean : 10517 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | Mean : 6.099 | Mean :5.575 | Mean :1971 | Mean :1985 | NA | NA | NA | NA | NA | Mean : 103.7 | NA | NA | NA | NA | NA | NA | NA | Mean : 443.6 | NA | Mean : 46.55 | Mean : 567.2 | Mean :1057.4 | NA | NA | NA | NA | Mean :1163 | Mean : 347 | Mean : 5.845 | Mean :1515 | Mean :0.4253 | Mean :0.05753 | Mean :1.565 | Mean :0.3829 | Mean :2.866 | Mean :1.047 | NA | Mean : 6.518 | NA | Mean :0.613 | NA | NA | Mean :1979 | NA | Mean :1.767 | Mean : 473.0 | NA | NA | NA | Mean : 94.24 | Mean : 46.66 | Mean : 21.95 | Mean : 3.41 | Mean : 15.06 | Mean : 2.759 | NA | NA | NA | Mean : 43.49 | Mean : 6.322 | Mean :2008 | NA | NA | Mean :180921 | |
3rd Qu.:1095.2 | 3rd Qu.: 70.0 | NA | 3rd Qu.: 80.00 | 3rd Qu.: 11602 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | 3rd Qu.: 7.000 | 3rd Qu.:6.000 | 3rd Qu.:2000 | 3rd Qu.:2004 | NA | NA | NA | NA | NA | 3rd Qu.: 166.0 | NA | NA | NA | NA | NA | NA | NA | 3rd Qu.: 712.2 | NA | 3rd Qu.: 0.00 | 3rd Qu.: 808.0 | 3rd Qu.:1298.2 | NA | NA | NA | NA | 3rd Qu.:1391 | 3rd Qu.: 728 | 3rd Qu.: 0.000 | 3rd Qu.:1777 | 3rd Qu.:1.0000 | 3rd Qu.:0.00000 | 3rd Qu.:2.000 | 3rd Qu.:1.0000 | 3rd Qu.:3.000 | 3rd Qu.:1.000 | NA | 3rd Qu.: 7.000 | NA | 3rd Qu.:1.000 | NA | NA | 3rd Qu.:2002 | NA | 3rd Qu.:2.000 | 3rd Qu.: 576.0 | NA | NA | NA | 3rd Qu.:168.00 | 3rd Qu.: 68.00 | 3rd Qu.: 0.00 | 3rd Qu.: 0.00 | 3rd Qu.: 0.00 | 3rd Qu.: 0.000 | NA | NA | NA | 3rd Qu.: 0.00 | 3rd Qu.: 8.000 | 3rd Qu.:2009 | NA | NA | 3rd Qu.:214000 | |
Max. :1460.0 | Max. :190.0 | NA | Max. :313.00 | Max. :215245 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | Max. :10.000 | Max. :9.000 | Max. :2010 | Max. :2010 | NA | NA | NA | NA | NA | Max. :1600.0 | NA | NA | NA | NA | NA | NA | NA | Max. :5644.0 | NA | Max. :1474.00 | Max. :2336.0 | Max. :6110.0 | NA | NA | NA | NA | Max. :4692 | Max. :2065 | Max. :572.000 | Max. :5642 | Max. :3.0000 | Max. :2.00000 | Max. :3.000 | Max. :2.0000 | Max. :8.000 | Max. :3.000 | NA | Max. :14.000 | NA | Max. :3.000 | NA | NA | Max. :2010 | NA | Max. :4.000 | Max. :1418.0 | NA | NA | NA | Max. :857.00 | Max. :547.00 | Max. :552.00 | Max. :508.00 | Max. :480.00 | Max. :738.000 | NA | NA | NA | Max. :15500.00 | Max. :12.000 | Max. :2010 | NA | NA | Max. :755000 | |
NA | NA | NA | NA’s :259 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA’s :8 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA’s :81 | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA | NA |
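For character columns, summary() reports only length, class, and mode, so most of the table above carries little information. One option is to summarise only the numeric columns. The sketch below shows the idea on a toy data frame built from three rows of the listing above; `toy` and `num_cols` are illustrative names, and the same two lines can be applied to `train`.

```r
# Toy stand-in for `train`, using values from the first three rows shown earlier.
toy <- data.frame(
  MSZoning  = c("RL", "RL", "RL"),
  LotArea   = c(8450, 9600, 11250),
  SalePrice = c(208500, 181500, 223500),
  stringsAsFactors = FALSE
)

# Keep only the columns that hold numeric data, then summarise those.
num_cols <- names(toy)[sapply(toy, is.numeric)]
summary(toy[num_cols])
```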
summary(train$SalePrice)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34900 129975 163000 180921 214000 755000
hist(train$SalePrice, main="Histogram of Sale Price", xlab="Sale Price")