You are to register for Kaggle.com (free) and compete in the House Prices: Advanced Regression Techniques competition. https://www.kaggle.com/c/house-prices-advanced-regression-techniques . I want you to do the following.
Descriptive and Inferential Statistics. Provide univariate descriptive statistics and appropriate plots for the training data set. Provide a scatterplot matrix for at least two of the independent variables and the dependent variable. Derive a correlation matrix for any three quantitative variables in the dataset. Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval. Discuss the meaning of your analysis. Would you be worried about familywise error? Why or why not?
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.2
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2
## ──
## ✔ ggplot2 3.4.1 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ stringr 1.5.0
## ✔ tidyr 1.3.0 ✔ forcats 1.0.0
## ✔ readr 2.1.4
## Warning: package 'ggplot2' was built under R version 4.2.2
## Warning: package 'tidyr' was built under R version 4.2.2
## Warning: package 'readr' was built under R version 4.2.2
## Warning: package 'purrr' was built under R version 4.2.2
## Warning: package 'stringr' was built under R version 4.2.2
## Warning: package 'forcats' was built under R version 4.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(rstatix) # summary statistics and statistical tests
## Warning: package 'rstatix' was built under R version 4.2.2
##
## Attaching package: 'rstatix'
##
## The following object is masked from 'package:stats':
##
## filter
library(corrplot)
## Warning: package 'corrplot' was built under R version 4.2.2
## corrplot 0.92 loaded
library(GGally)
## Warning: package 'GGally' was built under R version 4.2.2
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(plotly)
## Warning: package 'plotly' was built under R version 4.2.2
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(infer)
## Warning: package 'infer' was built under R version 4.2.2
##
## Attaching package: 'infer'
##
## The following objects are masked from 'package:rstatix':
##
## chisq_test, prop_test, t_test
library(forcats)
library(DT)
## Warning: package 'DT' was built under R version 4.2.2
# Import training and testing data
train <- read.csv('https://raw.githubusercontent.com/enidroman/Data_605_Fundamentals_of_Computational_Mathematics/main/train.csv')
test <- read.csv('https://raw.githubusercontent.com/enidroman/Data_605_Fundamentals_of_Computational_Mathematics/main/test.csv')
Provide univariate descriptive statistics and appropriate plots for the training data set.
# Preview of the train dataset.
head(train) # first 6 observations
## Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour
## 1 1 60 RL 65 8450 Pave <NA> Reg Lvl
## 2 2 20 RL 80 9600 Pave <NA> Reg Lvl
## 3 3 60 RL 68 11250 Pave <NA> IR1 Lvl
## 4 4 70 RL 60 9550 Pave <NA> IR1 Lvl
## 5 5 60 RL 84 14260 Pave <NA> IR1 Lvl
## 6 6 50 RL 85 14115 Pave <NA> IR1 Lvl
## Utilities LotConfig LandSlope Neighborhood Condition1 Condition2 BldgType
## 1 AllPub Inside Gtl CollgCr Norm Norm 1Fam
## 2 AllPub FR2 Gtl Veenker Feedr Norm 1Fam
## 3 AllPub Inside Gtl CollgCr Norm Norm 1Fam
## 4 AllPub Corner Gtl Crawfor Norm Norm 1Fam
## 5 AllPub FR2 Gtl NoRidge Norm Norm 1Fam
## 6 AllPub Inside Gtl Mitchel Norm Norm 1Fam
## HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle RoofMatl
## 1 2Story 7 5 2003 2003 Gable CompShg
## 2 1Story 6 8 1976 1976 Gable CompShg
## 3 2Story 7 5 2001 2002 Gable CompShg
## 4 2Story 7 5 1915 1970 Gable CompShg
## 5 2Story 8 5 2000 2000 Gable CompShg
## 6 1.5Fin 5 5 1993 1995 Gable CompShg
## Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond Foundation
## 1 VinylSd VinylSd BrkFace 196 Gd TA PConc
## 2 MetalSd MetalSd None 0 TA TA CBlock
## 3 VinylSd VinylSd BrkFace 162 Gd TA PConc
## 4 Wd Sdng Wd Shng None 0 TA TA BrkTil
## 5 VinylSd VinylSd BrkFace 350 Gd TA PConc
## 6 VinylSd VinylSd None 0 TA TA Wood
## BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2
## 1 Gd TA No GLQ 706 Unf
## 2 Gd TA Gd ALQ 978 Unf
## 3 Gd TA Mn GLQ 486 Unf
## 4 TA Gd No ALQ 216 Unf
## 5 Gd TA Av GLQ 655 Unf
## 6 Gd TA No GLQ 732 Unf
## BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir Electrical
## 1 0 150 856 GasA Ex Y SBrkr
## 2 0 284 1262 GasA Ex Y SBrkr
## 3 0 434 920 GasA Ex Y SBrkr
## 4 0 540 756 GasA Gd Y SBrkr
## 5 0 490 1145 GasA Ex Y SBrkr
## 6 0 64 796 GasA Ex Y SBrkr
## X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath BsmtHalfBath FullBath
## 1 856 854 0 1710 1 0 2
## 2 1262 0 0 1262 0 1 2
## 3 920 866 0 1786 1 0 2
## 4 961 756 0 1717 1 0 1
## 5 1145 1053 0 2198 1 0 2
## 6 796 566 0 1362 1 0 1
## HalfBath BedroomAbvGr KitchenAbvGr KitchenQual TotRmsAbvGrd Functional
## 1 1 3 1 Gd 8 Typ
## 2 0 3 1 TA 6 Typ
## 3 1 3 1 Gd 6 Typ
## 4 0 3 1 Gd 7 Typ
## 5 1 4 1 Gd 9 Typ
## 6 1 1 1 TA 5 Typ
## Fireplaces FireplaceQu GarageType GarageYrBlt GarageFinish GarageCars
## 1 0 <NA> Attchd 2003 RFn 2
## 2 1 TA Attchd 1976 RFn 2
## 3 1 TA Attchd 2001 RFn 2
## 4 1 Gd Detchd 1998 Unf 3
## 5 1 TA Attchd 2000 RFn 3
## 6 0 <NA> Attchd 1993 Unf 2
## GarageArea GarageQual GarageCond PavedDrive WoodDeckSF OpenPorchSF
## 1 548 TA TA Y 0 61
## 2 460 TA TA Y 298 0
## 3 608 TA TA Y 0 42
## 4 642 TA TA Y 0 35
## 5 836 TA TA Y 192 84
## 6 480 TA TA Y 40 30
## EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC Fence MiscFeature
## 1 0 0 0 0 <NA> <NA> <NA>
## 2 0 0 0 0 <NA> <NA> <NA>
## 3 0 0 0 0 <NA> <NA> <NA>
## 4 272 0 0 0 <NA> <NA> <NA>
## 5 0 0 0 0 <NA> <NA> <NA>
## 6 0 320 0 0 <NA> MnPrv Shed
## MiscVal MoSold YrSold SaleType SaleCondition SalePrice
## 1 0 2 2008 WD Normal 208500
## 2 0 5 2007 WD Normal 181500
## 3 0 9 2008 WD Normal 223500
## 4 0 2 2006 WD Abnorml 140000
## 5 0 12 2008 WD Normal 250000
## 6 700 10 2009 WD Normal 143000
Data Structure
# Structure of dataset
str(train)
## 'data.frame': 1460 obs. of 81 variables:
## $ Id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ MSSubClass : int 60 20 60 70 60 50 20 60 50 190 ...
## $ MSZoning : chr "RL" "RL" "RL" "RL" ...
## $ LotFrontage : int 65 80 68 60 84 85 75 NA 51 50 ...
## $ LotArea : int 8450 9600 11250 9550 14260 14115 10084 10382 6120 7420 ...
## $ Street : chr "Pave" "Pave" "Pave" "Pave" ...
## $ Alley : chr NA NA NA NA ...
## $ LotShape : chr "Reg" "Reg" "IR1" "IR1" ...
## $ LandContour : chr "Lvl" "Lvl" "Lvl" "Lvl" ...
## $ Utilities : chr "AllPub" "AllPub" "AllPub" "AllPub" ...
## $ LotConfig : chr "Inside" "FR2" "Inside" "Corner" ...
## $ LandSlope : chr "Gtl" "Gtl" "Gtl" "Gtl" ...
## $ Neighborhood : chr "CollgCr" "Veenker" "CollgCr" "Crawfor" ...
## $ Condition1 : chr "Norm" "Feedr" "Norm" "Norm" ...
## $ Condition2 : chr "Norm" "Norm" "Norm" "Norm" ...
## $ BldgType : chr "1Fam" "1Fam" "1Fam" "1Fam" ...
## $ HouseStyle : chr "2Story" "1Story" "2Story" "2Story" ...
## $ OverallQual : int 7 6 7 7 8 5 8 7 7 5 ...
## $ OverallCond : int 5 8 5 5 5 5 5 6 5 6 ...
## $ YearBuilt : int 2003 1976 2001 1915 2000 1993 2004 1973 1931 1939 ...
## $ YearRemodAdd : int 2003 1976 2002 1970 2000 1995 2005 1973 1950 1950 ...
## $ RoofStyle : chr "Gable" "Gable" "Gable" "Gable" ...
## $ RoofMatl : chr "CompShg" "CompShg" "CompShg" "CompShg" ...
## $ Exterior1st : chr "VinylSd" "MetalSd" "VinylSd" "Wd Sdng" ...
## $ Exterior2nd : chr "VinylSd" "MetalSd" "VinylSd" "Wd Shng" ...
## $ MasVnrType : chr "BrkFace" "None" "BrkFace" "None" ...
## $ MasVnrArea : int 196 0 162 0 350 0 186 240 0 0 ...
## $ ExterQual : chr "Gd" "TA" "Gd" "TA" ...
## $ ExterCond : chr "TA" "TA" "TA" "TA" ...
## $ Foundation : chr "PConc" "CBlock" "PConc" "BrkTil" ...
## $ BsmtQual : chr "Gd" "Gd" "Gd" "TA" ...
## $ BsmtCond : chr "TA" "TA" "TA" "Gd" ...
## $ BsmtExposure : chr "No" "Gd" "Mn" "No" ...
## $ BsmtFinType1 : chr "GLQ" "ALQ" "GLQ" "ALQ" ...
## $ BsmtFinSF1 : int 706 978 486 216 655 732 1369 859 0 851 ...
## $ BsmtFinType2 : chr "Unf" "Unf" "Unf" "Unf" ...
## $ BsmtFinSF2 : int 0 0 0 0 0 0 0 32 0 0 ...
## $ BsmtUnfSF : int 150 284 434 540 490 64 317 216 952 140 ...
## $ TotalBsmtSF : int 856 1262 920 756 1145 796 1686 1107 952 991 ...
## $ Heating : chr "GasA" "GasA" "GasA" "GasA" ...
## $ HeatingQC : chr "Ex" "Ex" "Ex" "Gd" ...
## $ CentralAir : chr "Y" "Y" "Y" "Y" ...
## $ Electrical : chr "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
## $ X1stFlrSF : int 856 1262 920 961 1145 796 1694 1107 1022 1077 ...
## $ X2ndFlrSF : int 854 0 866 756 1053 566 0 983 752 0 ...
## $ LowQualFinSF : int 0 0 0 0 0 0 0 0 0 0 ...
## $ GrLivArea : int 1710 1262 1786 1717 2198 1362 1694 2090 1774 1077 ...
## $ BsmtFullBath : int 1 0 1 1 1 1 1 1 0 1 ...
## $ BsmtHalfBath : int 0 1 0 0 0 0 0 0 0 0 ...
## $ FullBath : int 2 2 2 1 2 1 2 2 2 1 ...
## $ HalfBath : int 1 0 1 0 1 1 0 1 0 0 ...
## $ BedroomAbvGr : int 3 3 3 3 4 1 3 3 2 2 ...
## $ KitchenAbvGr : int 1 1 1 1 1 1 1 1 2 2 ...
## $ KitchenQual : chr "Gd" "TA" "Gd" "Gd" ...
## $ TotRmsAbvGrd : int 8 6 6 7 9 5 7 7 8 5 ...
## $ Functional : chr "Typ" "Typ" "Typ" "Typ" ...
## $ Fireplaces : int 0 1 1 1 1 0 1 2 2 2 ...
## $ FireplaceQu : chr NA "TA" "TA" "Gd" ...
## $ GarageType : chr "Attchd" "Attchd" "Attchd" "Detchd" ...
## $ GarageYrBlt : int 2003 1976 2001 1998 2000 1993 2004 1973 1931 1939 ...
## $ GarageFinish : chr "RFn" "RFn" "RFn" "Unf" ...
## $ GarageCars : int 2 2 2 3 3 2 2 2 2 1 ...
## $ GarageArea : int 548 460 608 642 836 480 636 484 468 205 ...
## $ GarageQual : chr "TA" "TA" "TA" "TA" ...
## $ GarageCond : chr "TA" "TA" "TA" "TA" ...
## $ PavedDrive : chr "Y" "Y" "Y" "Y" ...
## $ WoodDeckSF : int 0 298 0 0 192 40 255 235 90 0 ...
## $ OpenPorchSF : int 61 0 42 35 84 30 57 204 0 4 ...
## $ EnclosedPorch: int 0 0 0 272 0 0 0 228 205 0 ...
## $ X3SsnPorch : int 0 0 0 0 0 320 0 0 0 0 ...
## $ ScreenPorch : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolArea : int 0 0 0 0 0 0 0 0 0 0 ...
## $ PoolQC : chr NA NA NA NA ...
## $ Fence : chr NA NA NA NA ...
## $ MiscFeature : chr NA NA NA NA ...
## $ MiscVal : int 0 0 0 0 0 700 0 350 0 0 ...
## $ MoSold : int 2 5 9 2 12 10 8 11 4 1 ...
## $ YrSold : int 2008 2007 2008 2006 2008 2009 2007 2009 2008 2008 ...
## $ SaleType : chr "WD" "WD" "WD" "WD" ...
## $ SaleCondition: chr "Normal" "Normal" "Normal" "Abnorml" ...
## $ SalePrice : int 208500 181500 223500 140000 250000 143000 307000 200000 129900 118000 ...
The dataset contains 1460 observations and 81 variables.
Data fields
Here’s a brief version of what you’ll find in the data description file.
SalePrice - the property’s sale price in dollars. This is the target variable that you’re trying to predict.
MSSubClass: The building class
MSZoning: The general zoning classification
LotFrontage: Linear feet of street connected to property
LotArea: Lot size in square feet
Street: Type of road access
Alley: Type of alley access
LotShape: General shape of property
LandContour: Flatness of the property
Utilities: Type of utilities available
LotConfig: Lot configuration
LandSlope: Slope of property
Neighborhood: Physical locations within Ames city limits
Condition1: Proximity to main road or railroad
Condition2: Proximity to main road or railroad (if a second is present)
BldgType: Type of dwelling
HouseStyle: Style of dwelling
OverallQual: Overall material and finish quality
OverallCond: Overall condition rating
YearBuilt: Original construction date
YearRemodAdd: Remodel date
RoofStyle: Type of roof
RoofMatl: Roof material
Exterior1st: Exterior covering on house
Exterior2nd: Exterior covering on house (if more than one material)
MasVnrType: Masonry veneer type
MasVnrArea: Masonry veneer area in square feet
ExterQual: Exterior material quality
ExterCond: Present condition of the material on the exterior
Foundation: Type of foundation
BsmtQual: Height of the basement
BsmtCond: General condition of the basement
BsmtExposure: Walkout or garden level basement walls
BsmtFinType1: Quality of basement finished area
BsmtFinSF1: Type 1 finished square feet
BsmtFinType2: Quality of second finished area (if present)
BsmtFinSF2: Type 2 finished square feet
BsmtUnfSF: Unfinished square feet of basement area
TotalBsmtSF: Total square feet of basement area
Heating: Type of heating
HeatingQC: Heating quality and condition
CentralAir: Central air conditioning
Electrical: Electrical system
1stFlrSF: First Floor square feet
2ndFlrSF: Second floor square feet
LowQualFinSF: Low quality finished square feet (all floors)
GrLivArea: Above grade (ground) living area square feet
BsmtFullBath: Basement full bathrooms
BsmtHalfBath: Basement half bathrooms
FullBath: Full bathrooms above grade
HalfBath: Half baths above grade
Bedroom: Number of bedrooms above basement level
Kitchen: Number of kitchens
KitchenQual: Kitchen quality
TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
Functional: Home functionality rating
Fireplaces: Number of fireplaces
FireplaceQu: Fireplace quality
GarageType: Garage location
GarageYrBlt: Year garage was built
GarageFinish: Interior finish of the garage
GarageCars: Size of garage in car capacity
GarageArea: Size of garage in square feet
GarageQual: Garage quality
GarageCond: Garage condition
PavedDrive: Paved driveway
WoodDeckSF: Wood deck area in square feet
OpenPorchSF: Open porch area in square feet
EnclosedPorch: Enclosed porch area in square feet
3SsnPorch: Three season porch area in square feet
ScreenPorch: Screen porch area in square feet
PoolArea: Pool area in square feet
PoolQC: Pool quality
Fence: Fence quality
MiscFeature: Miscellaneous feature not covered in other categories
MiscVal: $Value of miscellaneous feature
MoSold: Month Sold
YrSold: Year Sold
SaleType: Type of sale
SaleCondition: Condition of sale
Summary of Each Variable for the Train Dataset
summary(train)
## Id MSSubClass MSZoning LotFrontage
## Min. : 1.0 Min. : 20.0 Length:1460 Min. : 21.00
## 1st Qu.: 365.8 1st Qu.: 20.0 Class :character 1st Qu.: 59.00
## Median : 730.5 Median : 50.0 Mode :character Median : 69.00
## Mean : 730.5 Mean : 56.9 Mean : 70.05
## 3rd Qu.:1095.2 3rd Qu.: 70.0 3rd Qu.: 80.00
## Max. :1460.0 Max. :190.0 Max. :313.00
## NA's :259
## LotArea Street Alley LotShape
## Min. : 1300 Length:1460 Length:1460 Length:1460
## 1st Qu.: 7554 Class :character Class :character Class :character
## Median : 9478 Mode :character Mode :character Mode :character
## Mean : 10517
## 3rd Qu.: 11602
## Max. :215245
##
## LandContour Utilities LotConfig LandSlope
## Length:1460 Length:1460 Length:1460 Length:1460
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Neighborhood Condition1 Condition2 BldgType
## Length:1460 Length:1460 Length:1460 Length:1460
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## HouseStyle OverallQual OverallCond YearBuilt
## Length:1460 Min. : 1.000 Min. :1.000 Min. :1872
## Class :character 1st Qu.: 5.000 1st Qu.:5.000 1st Qu.:1954
## Mode :character Median : 6.000 Median :5.000 Median :1973
## Mean : 6.099 Mean :5.575 Mean :1971
## 3rd Qu.: 7.000 3rd Qu.:6.000 3rd Qu.:2000
## Max. :10.000 Max. :9.000 Max. :2010
##
## YearRemodAdd RoofStyle RoofMatl Exterior1st
## Min. :1950 Length:1460 Length:1460 Length:1460
## 1st Qu.:1967 Class :character Class :character Class :character
## Median :1994 Mode :character Mode :character Mode :character
## Mean :1985
## 3rd Qu.:2004
## Max. :2010
##
## Exterior2nd MasVnrType MasVnrArea ExterQual
## Length:1460 Length:1460 Min. : 0.0 Length:1460
## Class :character Class :character 1st Qu.: 0.0 Class :character
## Mode :character Mode :character Median : 0.0 Mode :character
## Mean : 103.7
## 3rd Qu.: 166.0
## Max. :1600.0
## NA's :8
## ExterCond Foundation BsmtQual BsmtCond
## Length:1460 Length:1460 Length:1460 Length:1460
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## BsmtExposure BsmtFinType1 BsmtFinSF1 BsmtFinType2
## Length:1460 Length:1460 Min. : 0.0 Length:1460
## Class :character Class :character 1st Qu.: 0.0 Class :character
## Mode :character Mode :character Median : 383.5 Mode :character
## Mean : 443.6
## 3rd Qu.: 712.2
## Max. :5644.0
##
## BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Length:1460
## 1st Qu.: 0.00 1st Qu.: 223.0 1st Qu.: 795.8 Class :character
## Median : 0.00 Median : 477.5 Median : 991.5 Mode :character
## Mean : 46.55 Mean : 567.2 Mean :1057.4
## 3rd Qu.: 0.00 3rd Qu.: 808.0 3rd Qu.:1298.2
## Max. :1474.00 Max. :2336.0 Max. :6110.0
##
## HeatingQC CentralAir Electrical X1stFlrSF
## Length:1460 Length:1460 Length:1460 Min. : 334
## Class :character Class :character Class :character 1st Qu.: 882
## Mode :character Mode :character Mode :character Median :1087
## Mean :1163
## 3rd Qu.:1391
## Max. :4692
##
## X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath
## Min. : 0 Min. : 0.000 Min. : 334 Min. :0.0000
## 1st Qu.: 0 1st Qu.: 0.000 1st Qu.:1130 1st Qu.:0.0000
## Median : 0 Median : 0.000 Median :1464 Median :0.0000
## Mean : 347 Mean : 5.845 Mean :1515 Mean :0.4253
## 3rd Qu.: 728 3rd Qu.: 0.000 3rd Qu.:1777 3rd Qu.:1.0000
## Max. :2065 Max. :572.000 Max. :5642 Max. :3.0000
##
## BsmtHalfBath FullBath HalfBath BedroomAbvGr
## Min. :0.00000 Min. :0.000 Min. :0.0000 Min. :0.000
## 1st Qu.:0.00000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:2.000
## Median :0.00000 Median :2.000 Median :0.0000 Median :3.000
## Mean :0.05753 Mean :1.565 Mean :0.3829 Mean :2.866
## 3rd Qu.:0.00000 3rd Qu.:2.000 3rd Qu.:1.0000 3rd Qu.:3.000
## Max. :2.00000 Max. :3.000 Max. :2.0000 Max. :8.000
##
## KitchenAbvGr KitchenQual TotRmsAbvGrd Functional
## Min. :0.000 Length:1460 Min. : 2.000 Length:1460
## 1st Qu.:1.000 Class :character 1st Qu.: 5.000 Class :character
## Median :1.000 Mode :character Median : 6.000 Mode :character
## Mean :1.047 Mean : 6.518
## 3rd Qu.:1.000 3rd Qu.: 7.000
## Max. :3.000 Max. :14.000
##
## Fireplaces FireplaceQu GarageType GarageYrBlt
## Min. :0.000 Length:1460 Length:1460 Min. :1900
## 1st Qu.:0.000 Class :character Class :character 1st Qu.:1961
## Median :1.000 Mode :character Mode :character Median :1980
## Mean :0.613 Mean :1979
## 3rd Qu.:1.000 3rd Qu.:2002
## Max. :3.000 Max. :2010
## NA's :81
## GarageFinish GarageCars GarageArea GarageQual
## Length:1460 Min. :0.000 Min. : 0.0 Length:1460
## Class :character 1st Qu.:1.000 1st Qu.: 334.5 Class :character
## Mode :character Median :2.000 Median : 480.0 Mode :character
## Mean :1.767 Mean : 473.0
## 3rd Qu.:2.000 3rd Qu.: 576.0
## Max. :4.000 Max. :1418.0
##
## GarageCond PavedDrive WoodDeckSF OpenPorchSF
## Length:1460 Length:1460 Min. : 0.00 Min. : 0.00
## Class :character Class :character 1st Qu.: 0.00 1st Qu.: 0.00
## Mode :character Mode :character Median : 0.00 Median : 25.00
## Mean : 94.24 Mean : 46.66
## 3rd Qu.:168.00 3rd Qu.: 68.00
## Max. :857.00 Max. :547.00
##
## EnclosedPorch X3SsnPorch ScreenPorch PoolArea
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 0.00 Median : 0.00 Median : 0.00 Median : 0.000
## Mean : 21.95 Mean : 3.41 Mean : 15.06 Mean : 2.759
## 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.: 0.00 3rd Qu.: 0.000
## Max. :552.00 Max. :508.00 Max. :480.00 Max. :738.000
##
## PoolQC Fence MiscFeature MiscVal
## Length:1460 Length:1460 Length:1460 Min. : 0.00
## Class :character Class :character Class :character 1st Qu.: 0.00
## Mode :character Mode :character Mode :character Median : 0.00
## Mean : 43.49
## 3rd Qu.: 0.00
## Max. :15500.00
##
## MoSold YrSold SaleType SaleCondition
## Min. : 1.000 Min. :2006 Length:1460 Length:1460
## 1st Qu.: 5.000 1st Qu.:2007 Class :character Class :character
## Median : 6.000 Median :2008 Mode :character Mode :character
## Mean : 6.322 Mean :2008
## 3rd Qu.: 8.000 3rd Qu.:2009
## Max. :12.000 Max. :2010
##
## SalePrice
## Min. : 34900
## 1st Qu.:129975
## Median :163000
## Mean :180921
## 3rd Qu.:214000
## Max. :755000
##
Independent Variables and Dependent Variable
Dependent Variable
SalePrice - the property’s sale price in dollars. This is the target variable that you’re trying to predict.
Independent Variables
LotArea - Lot size in square feet
GrLivArea - Above grade (ground) living area square feet
GarageArea - Size of garage in square feet
PoolArea - Pool area in square feet
The get_summary_stats() function from rstatix returns summary statistics as a data frame, which is convenient for later calculations or plotting.
Here I use get_summary_stats() to compute the summary statistics for the independent variables and the dependent variable used in my analysis.
variables_sum <- train %>%
# columns to calculate for
get_summary_stats(SalePrice, LotArea, GrLivArea, GarageArea, PoolArea,
# summary stats to return
type = "common")
variables_sum
## # A tibble: 5 × 10
## variable n min max median iqr mean sd se ci
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SalePrice 1460 34900 755000 163000 84025 180921. 79443. 2079. 4078.
## 2 LotArea 1460 1300 215245 9478. 4048 10517. 9981. 261. 512.
## 3 GrLivArea 1460 334 5642 1464 647. 1515. 525. 13.8 27.0
## 4 GarageArea 1460 0 1418 480 242. 473. 214. 5.60 11.0
## 5 PoolArea 1460 0 738 0 0 2.76 40.2 1.05 2.06
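The se and ci columns in the table above can be checked by hand. The sketch below assumes that se is sd / sqrt(n) and that ci is the half-width of a 95% t-based confidence interval for the mean. The inputs are the rounded SalePrice values printed above, so the results should agree only approximately. The names sd_price, se_price and ci_price are introduced here for illustration.

```r
# Rounded SalePrice values copied from the summary table above
n <- 1460
sd_price <- 79443

# Standard error of the mean
se_price <- sd_price / sqrt(n)

# Half-width of a 95% confidence interval for the mean (t distribution, n - 1 df)
ci_price <- qt(0.975, df = n - 1) * se_price

round(c(se = se_price, ci = ci_price), 1)
```

Adding and subtracting ci_price from the mean of 180921 would give the corresponding 95% interval for the mean sale price.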
Preparation of Dataset with Variables Needed
# Subsetting LotArea, GrLivArea, GarageArea, PoolArea, and SalePrice from dataset train.
subset.train <- subset(train, select = c("LotArea", "GrLivArea", "GarageArea", "PoolArea","SalePrice"))
head(subset.train)
## LotArea GrLivArea GarageArea PoolArea SalePrice
## 1 8450 1710 548 0 208500
## 2 9600 1262 460 0 181500
## 3 11250 1786 608 0 223500
## 4 9550 1717 642 0 140000
## 5 14260 2198 836 0 250000
## 6 14115 1362 480 0 143000
# Subsetting LotArea, GrLivArea, GarageArea, and PoolArea from dataset train excluding SalePrice
subset.ind <- subset(train, select = c("LotArea", "GrLivArea", "GarageArea", "PoolArea"))
head(subset.ind)
## LotArea GrLivArea GarageArea PoolArea
## 1 8450 1710 548 0
## 2 9600 1262 460 0
## 3 11250 1786 608 0
## 4 9550 1717 642 0
## 5 14260 2198 836 0
## 6 14115 1362 480 0
Histogram with Density of Each Independent Variable
#install.packages("gridExtra")
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 4.2.3
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
#install.packages("plotfunctions")
library(plotfunctions)
## Warning: package 'plotfunctions' was built under R version 4.2.3
##
## Attaching package: 'plotfunctions'
## The following object is masked from 'package:plotly':
##
## add_bars
## The following object is masked from 'package:ggplot2':
##
## alpha
# after_stat(density) replaces the ..density.. notation deprecated in ggplot2 3.4.0
p1 <- ggplot(train, aes(x = LotArea)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", bins = 50) +
  geom_density(alpha = .2, fill = "green") +
  labs(title = "Lot Area", x = "", y = "")
p2 <- ggplot(train, aes(x = GrLivArea)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", bins = 50) +
  geom_density(alpha = .2, fill = "green") +
  labs(title = "Ground Living Area", x = "", y = "")
p3 <- ggplot(train, aes(x = GarageArea)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", bins = 50) +
  geom_density(alpha = .2, fill = "green") +
  labs(title = "Garage Area", x = "", y = "")
p4 <- ggplot(train, aes(x = PoolArea)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", bins = 50) +
  geom_density(alpha = .2, fill = "green") +
  labs(title = "Pool Area", x = "", y = "")
# Arrange the four histograms in a 2 x 2 grid
grid.arrange(p1, p2, p3, p4, nrow = 2)
summary(train[c("LotArea", "GrLivArea", "GarageArea", "PoolArea")])
## LotArea GrLivArea GarageArea PoolArea
## Min. : 1300 Min. : 334 Min. : 0.0 Min. : 0.000
## 1st Qu.: 7554 1st Qu.:1130 1st Qu.: 334.5 1st Qu.: 0.000
## Median : 9478 Median :1464 Median : 480.0 Median : 0.000
## Mean : 10517 Mean :1515 Mean : 473.0 Mean : 2.759
## 3rd Qu.: 11602 3rd Qu.:1777 3rd Qu.: 576.0 3rd Qu.: 0.000
## Max. :215245 Max. :5642 Max. :1418.0 Max. :738.000
Density curves come in all shapes and sizes and they allow us to gain a quick visual understanding of the distribution of values in a given dataset.
Histograms are one of the most intuitive ways of representing the shape of a data set’s distribution along a single numeric variable.
Of the four variables, Lot Area, Ground Living Area, and Pool Area are skewed to the right. Right (positive) skewness means that the mean is greater than the median: most of the values are near the lower end of the range, and higher values are infrequent. All three are unimodal because each distribution has only one peak.
Garage Area shows little skew: its mean (473.0) is close to its median (480.0). Its distribution is multimodal, meaning it has two or more peaks. In this case it looks like it has 4 peaks.
The Pool Area histogram shows only 1 visible bar and the Lot Area histogram about 6. When the number of visible bars is this small, important features of the data may be obscured.
Judging by the Pool Area values, very few of the houses have a pool installed.
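The direction of skew described above can also be checked numerically. The following is a minimal sketch, assuming a moment-based definition of skewness (third central moment divided by the cubed standard deviation, population form). skewness_moment is a helper name introduced here, and the toy vector is simulated data, not the Kaggle data.

```r
# Moment-based skewness: third central moment over the cubed (population) standard deviation.
# skewness_moment is a hypothetical helper written for this sketch.
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

# Simulated right-skewed data standing in for a variable such as LotArea
set.seed(605)
toy <- rexp(1000, rate = 1 / 10000)

skewness_moment(toy)       # positive for a right-skewed sample
mean(toy) > median(toy)    # mean above median is typical of right skew
```

With the real data loaded, sapply(subset.ind, skewness_moment) would give one value per independent variable; positive values indicate right skew and values near zero indicate little skew.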
Q-Q Plot of Each Independent Variable
p1 <- ggplot(train, aes(sample = LotArea))+
stat_qq()+
stat_qq_line()+
labs(title="Lot Area",x = "", y = "")
p2 <- ggplot(train, aes(sample = GrLivArea))+
stat_qq()+
stat_qq_line()+
labs(title="Ground Living Area", x = "", y = "")
p3 <- ggplot(train, aes(sample = GarageArea))+
stat_qq()+
stat_qq_line()+
labs(title="Garage Area", x = "", y = "")
p4 <- ggplot(train, aes(sample = PoolArea))+
stat_qq()+
stat_qq_line()+
labs(title="Pool Area", x = "", y = "")
grid.arrange(p1, p2, p3, p4, nrow=2)
In the Q-Q plot of Ground Living Area, the points fall roughly along a straight line. Note that the x-axis shows the theoretical quantiles, that is, the quantiles of the standard normal distribution with mean 0 and standard deviation 1.
In the Q-Q plot of Garage Area, the points start out flat and off the line, then follow the line through the middle of the range, and drift away from it again as the values increase.
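To make the reading of a Q-Q plot concrete, the sketch below builds the plotted pairs by hand: sorted sample values against standard normal quantiles at the plotting positions from ppoints(), which is what qqnorm() uses. The two samples are simulated stand-ins, not columns of the training data, and the correlation between the two sets of quantiles is used here only as a rough summary of how straight the plot is.

```r
# Simulated samples: one normal, one right-skewed (hypothetical stand-ins)
set.seed(605)
normal_toy <- rnorm(200, mean = 1500, sd = 500)
skewed_toy <- rexp(200, rate = 1 / 1500)

# Theoretical standard normal quantiles at the plotting positions used by qqnorm()
theoretical <- qnorm(ppoints(200))

# Correlation between theoretical and sorted sample quantiles:
# values close to 1 mean the Q-Q points lie near a straight line
qq_cor_normal <- cor(theoretical, sort(normal_toy))
qq_cor_skewed <- cor(theoretical, sort(skewed_toy))
c(normal = qq_cor_normal, skewed = qq_cor_skewed)
```

The skewed sample gives the lower correlation because its largest values sit well above the line, the same pattern that a long right tail produces in the plots above.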
Histogram with Density and Q-Q Plot of Dependent Variable
# Histogram with density curve; after_stat(density) replaces the deprecated ..density..
p1 <- ggplot(train, aes(x = SalePrice)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white", bins = 50) +
  geom_density(alpha = .2, fill = "green") +
  labs(title = "SalePrice", x = "", y = "")
# Normal Q-Q plot of SalePrice
p2 <- ggplot(train, aes(sample = SalePrice)) +
  stat_qq() +
  stat_qq_line() +
  labs(title = "SalePrice", x = "", y = "")
grid.arrange(p1, p2, nrow=1)
summary(train$SalePrice)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 34900 129975 163000 180921 214000 755000
The distribution of SalePrice is skewed to the right with some prices that are outliers towards the tail. The minimum sale price is $34,900 and the maximum sale price is $755,000. The median sale price is $163,000.
Provide a scatterplot matrix for at least two of the independent variables and the dependent variable.
# Drawing a scatterplot matrix of LotArea, GrLivArea, GarageArea, PoolArea, SalePrice using the pairs function
pairs(subset.train, pch = 16, col = "blue", main = "Matrix Scatterplot of LotArea, GrLivArea, GarageArea, PoolArea, SalePrice")
Derive a correlation matrix for any three quantitative variables in the dataset.
cor_matrix<-cor(subset.train)
cor_matrix
## LotArea GrLivArea GarageArea PoolArea SalePrice
## LotArea 1.00000000 0.2631162 0.18040276 0.07767239 0.26384335
## GrLivArea 0.26311617 1.0000000 0.46899748 0.17020534 0.70862448
## GarageArea 0.18040276 0.4689975 1.00000000 0.06104727 0.62343144
## PoolArea 0.07767239 0.1702053 0.06104727 1.00000000 0.09240355
## SalePrice 0.26384335 0.7086245 0.62343144 0.09240355 1.00000000
train %>%
  dplyr::select(LotArea, GrLivArea, GarageArea, PoolArea, SalePrice) %>%
  cor() %>%
  corrplot(method = "color", order = "hclust", addrect = 3, number.cex = 1,
           sig.level = 0.20,       # only takes effect when a p.mat of p-values is supplied
           addCoef.col = "black",  # print the correlation coefficients in black
           tl.srt = 90,            # rotate the text labels by 90 degrees
           diag = TRUE)            # keep the diagonal
Test the hypotheses that the correlations between each pairwise set of variables is 0 and provide an 80% confidence interval.
Discuss the meaning of your analysis.
Hypotheses
H0: The true correlation between each pair of variables is 0.
HA: The true correlation between each pair of variables is not 0.
cor.test(subset.train$LotArea, subset.train$SalePrice, conf.level = 0.80)
##
## Pearson's product-moment correlation
##
## data: subset.train$LotArea and subset.train$SalePrice
## t = 10.445, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.2323391 0.2947946
## sample estimates:
## cor
## 0.2638434
The correlation coefficient between LotArea and SalePrice is 0.2638434, which indicates a weak positive correlation between the two variables.
The t-value is 10.445 with 1458 degrees of freedom and the p-value is less than 2.2e-16. This is far below the 0.20 significance level implied by the 80 percent confidence level, and also below the conventional 0.05, so we reject the null hypothesis that the true correlation is equal to zero.
A correlation this large would be very unlikely if the null hypothesis were true, so the result supports the alternative hypothesis that the true correlation is not equal to 0.
The 80 percent confidence interval is [0.2323391, 0.2947946], which means that we can be 80 percent confident that the true correlation coefficient lies between these two values.
Overall, this test tells us that there is a statistically significant but weak positive correlation between LotArea and SalePrice.
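The numbers reported by cor.test() can be reproduced from r and n alone. The sketch below assumes the standard formulas for a Pearson correlation: a t statistic with n - 2 degrees of freedom for the test, and Fisher's z transformation for the confidence interval. The inputs are the rounded values printed above, so the results should land close to the printed output but need not match in every digit.

```r
# Values reported by cor.test() above for LotArea and SalePrice
r <- 0.2638434
n <- 1460
conf_level <- 0.80

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# Confidence interval via Fisher's z transformation
z <- atanh(r)                               # transform r to the z scale
se_z <- 1 / sqrt(n - 3)                     # approximate standard error of z
z_crit <- qnorm(1 - (1 - conf_level) / 2)   # two-sided critical value
ci <- tanh(z + c(-1, 1) * z_crit * se_z)    # back-transform to the r scale

round(t_stat, 3)
round(ci, 7)
```

Because the interval is built on the z scale and transformed back, it is not exactly symmetric around r. Changing conf_level to 0.95 would widen it.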
cor.test(subset.train$GrLivArea, subset.train$SalePrice, conf.level = 0.80)
##
## Pearson's product-moment correlation
##
## data: subset.train$GrLivArea and subset.train$SalePrice
## t = 38.348, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.6915087 0.7249450
## sample estimates:
## cor
## 0.7086245
The correlation coefficient between GrLivArea and SalePrice is 0.7086. This indicates a strong positive linear relationship between the two variables.
The t-value for the test of the null hypothesis that the true correlation between the two variables is zero is 38.348. The degrees of freedom for the t-test are 1458, and the p-value is less than 2.2e-16, indicating strong evidence against the null hypothesis.
The 80 percent confidence interval for the true correlation coefficient between the two variables is (0.6915, 0.7249). This indicates that we are 80 percent confident that the true correlation between the two variables lies in this interval.
Overall, this test shows that there is a statistically significant, strong positive linear relationship between GrLivArea and SalePrice.
cor.test(subset.train$GarageArea, subset.train$SalePrice, conf.level = 0.80)
##
## Pearson's product-moment correlation
##
## data: subset.train$GarageArea and subset.train$SalePrice
## t = 30.446, df = 1458, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.6024756 0.6435283
## sample estimates:
## cor
## 0.6234314
The correlation coefficient between GarageArea and SalePrice is 0.6234, which indicates a moderate positive correlation between the two variables.
The t-value of 30.446 with 1458 degrees of freedom and p-value less than 2.2e-16 suggests that the observed correlation is statistically significant and unlikely to have occurred by chance.
The 80 percent confidence interval suggests that we can be reasonably confident that the true correlation between GarageArea and SalePrice falls between 0.6025 and 0.6435.
Overall, these results suggest that there is a moderate positive relationship between the GarageArea and SalePrice variables, with larger garage areas generally associated with higher sale prices.
cor.test(subset.train$PoolArea, subset.train$SalePrice, conf.level = 0.80)
##
## Pearson's product-moment correlation
##
## data: subset.train$PoolArea and subset.train$SalePrice
## t = 3.5435, df = 1458, p-value = 0.0004073
## alternative hypothesis: true correlation is not equal to 0
## 80 percent confidence interval:
## 0.05902496 0.12557575
## sample estimates:
## cor
## 0.09240355
The correlation coefficient between PoolArea and SalePrice is 0.0924, which indicates a weak positive correlation between the two variables.
The t value is 3.5435, and the p-value is 0.0004073, which means that the correlation coefficient is statistically significant at a significance level of 0.05. Therefore, we can reject the null hypothesis that there is no correlation between PoolArea and SalePrice.
The 80 percent confidence interval for the true correlation coefficient lies between 0.059 and 0.126, which means that we can be 80 percent confident that the true correlation coefficient falls within this interval.
Would you be worried about familywise error? Why or why not?
Familywise error rate (FWER) is the probability of making one or more type I errors in a family of hypothesis tests. A type I error occurs when a researcher rejects a null hypothesis that is actually true.
You should worry about familywise error when you are performing multiple statistical tests simultaneously, such as in a hypothesis testing scenario where you are comparing the means or variances of multiple groups or testing the correlation between multiple pairs of variables.
The familywise error rate increases as the number of tests increases. This means that the more tests you perform, the greater the chance of falsely rejecting at least one null hypothesis, even if each individual test has a low probability of error.
If you don’t account for familywise error, you may end up with a higher chance of finding significant results purely by chance. This can lead to incorrect conclusions and invalid interpretations of your data. Therefore, it is important to adjust your statistical tests to control for familywise error, especially when you are performing a large number of tests.
For example:
k <- 4
alpha <- .05
1 - (1-alpha)^k
## [1] 0.1854938
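The familywise error rate in this case is 0.1854938. This means the probability of committing at least one Type I error across the four tests is about 18.55% if every null hypothesis is true and the tests are independent, which is well above the 5% level of a single test. In this analysis, however, every p-value is far below 0.05 (the largest is 0.0004073), so the conclusions would not change after a correction for multiple testing.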
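One way to act on this is to adjust the four p-values with p.adjust. The sketch below is illustrative: the three p-values reported as "< 2.2e-16" are entered as their upper bound, which is an assumption, and the 20% line simply mirrors the 80 percent confidence level used for the intervals.

```r
# p-values from the four cor.test() calls against SalePrice.
# "< 2.2e-16" is entered as 2.2e-16 (assumed upper bound).
p_raw <- c(LotArea = 2.2e-16, GrLivArea = 2.2e-16,
           GarageArea = 2.2e-16, PoolArea = 0.0004073)
k <- length(p_raw)

# Familywise error rate for k independent tests at a given per-test alpha
fwer_05 <- 1 - (1 - 0.05)^k   # per-test alpha of 0.05
fwer_20 <- 1 - (1 - 0.20)^k   # per-test alpha of 0.20 (80% intervals)

# Bonferroni per-test threshold and adjusted p-values
alpha_bonf <- 0.05 / k
p_bonf <- p.adjust(p_raw, method = "bonferroni")
p_holm <- p.adjust(p_raw, method = "holm")

fwer_05
fwer_20
p_bonf
```

Even the Bonferroni-adjusted p-value for PoolArea stays below 0.05, so all four correlations remain significant after adjustment.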
Linear Algebra and Correlation. Invert your correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.) Multiply the correlation matrix by the precision matrix, and then multiply the precision matrix by the correlation matrix. Conduct LU decomposition on the matrix.
Invert your correlation matrix from above. (This is known as the precision matrix and contains variance inflation factors on the diagonal.)
# Correlation matrix from above.
cor_matrix<-cor(subset.train)
cor_matrix
## LotArea GrLivArea GarageArea PoolArea SalePrice
## LotArea 1.00000000 0.2631162 0.18040276 0.07767239 0.26384335
## GrLivArea 0.26311617 1.0000000 0.46899748 0.17020534 0.70862448
## GarageArea 0.18040276 0.4689975 1.00000000 0.06104727 0.62343144
## PoolArea 0.07767239 0.1702053 0.06104727 1.00000000 0.09240355
## SalePrice 0.26384335 0.7086245 0.62343144 0.09240355 1.00000000
# Invert the correlation matrix from above.
prec_matrix <- solve(cor_matrix)
prec_matrix
## LotArea GrLivArea GarageArea PoolArea SalePrice
## LotArea 1.09042226 -0.15680158 -0.021168829 -0.041975537 -0.15951123
## GrLivArea -0.15680158 2.08181516 -0.087419540 -0.211165985 -1.35984155
## GarageArea -0.02116883 -0.08741954 1.640186020 0.004680966 -0.95544319
## PoolArea -0.04197554 -0.21116598 0.004680966 1.033156947 0.06232672
## SalePrice -0.15951123 -1.35984155 -0.955443187 0.062326722 2.59559710
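The claim that the diagonal of the precision matrix holds the variance inflation factors can be checked directly. The sketch below re-enters the printed correlation matrix (rounded to the digits shown, so results agree only to about that precision) and compares the diagonal of its inverse with 1 / (1 - R²), where R² comes from regressing each variable on the other four. The names cor_m and vif_manual are illustrative.

```r
vars <- c("LotArea", "GrLivArea", "GarageArea", "PoolArea", "SalePrice")
cor_m <- matrix(c(
  1.00000000, 0.26311617, 0.18040276, 0.07767239, 0.26384335,
  0.26311617, 1.00000000, 0.46899748, 0.17020534, 0.70862448,
  0.18040276, 0.46899748, 1.00000000, 0.06104727, 0.62343144,
  0.07767239, 0.17020534, 0.06104727, 1.00000000, 0.09240355,
  0.26384335, 0.70862448, 0.62343144, 0.09240355, 1.00000000),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Diagonal of the inverse correlation matrix
vif_diag <- diag(solve(cor_m))

# VIF computed from its definition: R^2 of variable j on all the others
vif_manual <- sapply(seq_along(vars), function(j) {
  r_j <- cor_m[-j, j]
  r2  <- drop(t(r_j) %*% solve(cor_m[-j, -j]) %*% r_j)
  1 / (1 - r2)
})
names(vif_manual) <- vars

rbind(vif_diag, vif_manual)
```

All values are close to 1 except SalePrice and GrLivArea, which is consistent with those two variables being the most strongly correlated pair.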
Multiply the correlation matrix by the precision matrix.
# Multiply the correlation matrix by the precision matrix
cor_prec <- cor_matrix %*% prec_matrix
cor_prec
## LotArea GrLivArea GarageArea PoolArea SalePrice
## LotArea 1.000000e+00 5.551115e-17 0.000000e+00 -3.469447e-18 0
## GrLivArea 1.387779e-17 1.000000e+00 0.000000e+00 0.000000e+00 0
## GarageArea 4.163336e-17 0.000000e+00 1.000000e+00 -6.938894e-18 0
## PoolArea 1.040834e-17 5.551115e-17 -1.387779e-17 1.000000e+00 0
## SalePrice 8.326673e-17 0.000000e+00 1.110223e-16 0.000000e+00 1
Multiply the precision matrix by the correlation matrix.
prec_cor <- prec_matrix %*% cor_matrix
prec_cor
## LotArea GrLivArea GarageArea PoolArea
## LotArea 1.000000e+00 -2.775558e-17 0.000000e+00 6.938894e-18
## GrLivArea 5.551115e-17 1.000000e+00 0.000000e+00 5.551115e-17
## GarageArea 0.000000e+00 1.110223e-16 1.000000e+00 -1.387779e-17
## PoolArea -1.734723e-17 -2.775558e-17 -6.938894e-18 1.000000e+00
## SalePrice 1.110223e-16 0.000000e+00 0.000000e+00 0.000000e+00
## SalePrice
## LotArea -2.775558e-17
## GrLivArea 0.000000e+00
## GarageArea 0.000000e+00
## PoolArea -2.775558e-17
## SalePrice 1.000000e+00
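Both products are the identity matrix up to floating-point error, which is why the off-diagonal entries show values around 1e-17 instead of exact zeros. A small check, sketched here on the 3 x 3 block for LotArea, GrLivArea and GarageArea (an arbitrary subset chosen to keep the example short), makes that explicit.

```r
A <- matrix(c(
  1.00000000, 0.26311617, 0.18040276,
  0.26311617, 1.00000000, 0.46899748,
  0.18040276, 0.46899748, 1.00000000),
  nrow = 3, byrow = TRUE)

A_inv <- solve(A)
left  <- A %*% A_inv   # matrix times its inverse
right <- A_inv %*% A   # inverse times the matrix

# all.equal() compares within a numerical tolerance
is_identity_left  <- isTRUE(all.equal(left,  diag(3)))
is_identity_right <- isTRUE(all.equal(right, diag(3)))

# zapsmall() rounds the tiny floating-point residue to zero for display
zapsmall(left)
```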
Conduct LU decomposition on the matrix.
#The function lu.decomposition is used from the matrixcalc package.
#install.packages('matrixcalc')
library('matrixcalc')
lu_decomp <- lu.decomposition(cor_matrix)
The lower triangular matrix.
L <- lu_decomp$L
L
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.00000000 0.0000000 0.0000000 0.00000000 0
## [2,] 0.26311617 1.0000000 0.0000000 0.00000000 0
## [3,] 0.18040276 0.4528838 1.0000000 0.00000000 0
## [4,] 0.07767239 0.1609082 -0.0267758 1.00000000 0
## [5,] 0.26384335 0.6867466 0.3687445 -0.02401248 1
The upper triangular.
U <- lu_decomp$U
U
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 0.2631162 0.1804028 0.07767239 0.26384335
## [2,] 0 0.9307699 0.4215306 0.14976847 0.63920303
## [3,] 0 0.0000000 0.7765505 -0.02079276 0.28634868
## [4,] 0 0.0000000 0.0000000 0.96931129 -0.02327557
## [5,] 0 0.0000000 0.0000000 0.00000000 0.38526781
Multiplying the lower triangular matrix by the upper triangular matrix results in the correlation matrix.
L %*% U
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.00000000 0.2631162 0.18040276 0.07767239 0.26384335
## [2,] 0.26311617 1.0000000 0.46899748 0.17020534 0.70862448
## [3,] 0.18040276 0.4689975 1.00000000 0.06104727 0.62343144
## [4,] 0.07767239 0.1702053 0.06104727 1.00000000 0.09240355
## [5,] 0.26384335 0.7086245 0.62343144 0.09240355 1.00000000
LU is equivalent to the cor_matrix.
cor_matrix
## LotArea GrLivArea GarageArea PoolArea SalePrice
## LotArea 1.00000000 0.2631162 0.18040276 0.07767239 0.26384335
## GrLivArea 0.26311617 1.0000000 0.46899748 0.17020534 0.70862448
## GarageArea 0.18040276 0.4689975 1.00000000 0.06104727 0.62343144
## PoolArea 0.07767239 0.1702053 0.06104727 1.00000000 0.09240355
## SalePrice 0.26384335 0.7086245 0.62343144 0.09240355 1.00000000
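To see what the decomposition is doing, here is a minimal sketch of a Doolittle-style LU factorization without pivoting (L has ones on its diagonal, as in the output above). The helper lu_doolittle is hypothetical and written for illustration; it is not the matrixcalc implementation. It is applied to the 3 x 3 block for LotArea, GrLivArea and GarageArea.

```r
# Hypothetical helper: Doolittle LU without pivoting (assumes nonzero pivots)
lu_doolittle <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  U <- matrix(0, n, n)
  for (i in seq_len(n)) {
    prev <- seq_len(i - 1)           # columns/rows already processed
    for (j in i:n) {                 # fill row i of U
      U[i, j] <- A[i, j] - sum(L[i, prev] * U[prev, j])
    }
    if (i < n) {
      for (k in (i + 1):n) {         # fill column i of L
        L[k, i] <- (A[k, i] - sum(L[k, prev] * U[prev, i])) / U[i, i]
      }
    }
  }
  list(L = L, U = U)
}

A <- matrix(c(
  1.00000000, 0.26311617, 0.18040276,
  0.26311617, 1.00000000, 0.46899748,
  0.18040276, 0.46899748, 1.00000000),
  nrow = 3, byrow = TRUE)

res <- lu_doolittle(A)
res$L
res$U
res$L %*% res$U   # should reproduce A
```

The leading 3 x 3 entries of L and U match the corresponding entries printed by lu.decomposition for the full 5 x 5 matrix.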
Calculus-Based Probability & Statistics. Many times, it makes sense to fit a closed form distribution to data. Select a variable in the Kaggle.com training dataset that is skewed to the right, shift it so that the minimum value is absolutely above zero if necessary. Then load the MASS package and run fitdistr to fit an exponential probability density function. (See https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html ). Find the optimal value of λ for this distribution, and then take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, λ)). Plot a histogram and compare it with a histogram of your original variable. Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function (CDF). Also generate a 95% confidence interval from the empirical data, assuming normality. Finally, provide the empirical 5th percentile and 95th percentile of the data. Discuss.
Select a variable in the Kaggle.com training dataset that is skewed to the right, shift it so that the minimum value is absolutely above zero if necessary.
I selected the variable BsmtUnfSF, Unfinished square feet of basement area.
fit_data <- train$BsmtUnfSF
fit_data <- fit_data[complete.cases(fit_data)]
The distribution is skewed to the right.
hist(fit_data)
summary(train$BsmtUnfSF)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 223.0 477.5 567.2 808.0 2336.0
length(fit_data[fit_data == 0])
## [1] 118
Out of the 1460 houses, there are 118 that have no unfinished basement area.
fit_data <- fit_data + .01
Because the data measures area, adding a value of .01 should be negligible and gets rid of the zero values. A property with an unfinished basement area of .01 square feet effectively has no unfinished basement area.
load the MASS package and run fitdistr to fit an exponential probability density function. (See https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html ).
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:plotly':
##
## select
## The following object is masked from 'package:rstatix':
##
## select
## The following object is masked from 'package:dplyr':
##
## select
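# Run fitdistr to fit an exponential probability density function.
# Note: the fit uses the unshifted train$BsmtUnfSF; the exponential MLE is 1/mean,
# so the .01 shift applied to fit_data would change the rate only negligibly.
BsmtUnfSF_exp_dist <- fitdistr(train$BsmtUnfSF,'exponential')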
Find the optimal value of λ for this distribution.
BsmtUnfSF_lamb <- BsmtUnfSF_exp_dist$estimate
BsmtUnfSF_lamb
## rate
## 0.001762921
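For the exponential distribution the maximum-likelihood estimate of the rate has a closed form, 1 / mean(x), which is why the rate above is roughly 1 / 567.2. The sketch below confirms this on simulated data; the sample and its rate of 0.002 are made up for illustration.

```r
library(MASS)

set.seed(42)
x <- rexp(500, rate = 0.002)   # simulated stand-in for a right-skewed variable

fit <- fitdistr(x, "exponential")
rate_mle         <- unname(fit$estimate)
rate_closed_form <- 1 / mean(x)

c(fitdistr = rate_mle, one_over_mean = rate_closed_form)
```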
Take 1000 samples from this exponential distribution using this value (e.g., rexp(1000, λ)).
set.seed(1000)
BsmtUnfSF_sample <- rexp(1000,BsmtUnfSF_lamb)
Plot a histogram and compare it with a histogram of your original variable.
hist(BsmtUnfSF_sample)
Compare it with a histogram of your original variable.
par(mfrow=c(1,2))
hist(fit_data)
hist(BsmtUnfSF_sample)
The histograms of fit_data and BsmtUnfSF_sample are both right skewed; however, the second bin of BsmtUnfSF_sample has a frequency that is about double the frequency of fit_data, even though fit_data has more observations (1460 versus 1000 draws). The shapes of the two distributions are therefore not similar.
Using the exponential pdf, find the 5th and 95th percentiles using the cumulative distribution function (CDF).
qexp(.05, rate=BsmtUnfSF_lamb)
## [1] 29.09563
qexp(.95, rate=BsmtUnfSF_lamb)
## [1] 1699.3
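qexp inverts the exponential CDF F(x) = 1 - exp(-λx), so the p-th percentile is -log(1 - p) / λ. The sketch below applies that formula to the fitted rate printed earlier; the function name exp_quantile is illustrative.

```r
lambda <- 0.001762921   # fitted rate from fitdistr

# Invert F(x) = 1 - exp(-lambda * x) for x
exp_quantile <- function(p, rate) -log(1 - p) / rate

q05 <- exp_quantile(0.05, lambda)
q95 <- exp_quantile(0.95, lambda)

c(q05 = q05, q95 = q95)
c(qexp(0.05, lambda), qexp(0.95, lambda))   # same values from base R
```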
Generate a 95% confidence interval from the empirical data, assuming normality.
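# Normal-theory confidence interval for the mean of `data`
norm.interval = function(data, variance = var(data), conf.level = 0.95)
{
  z = qnorm((1 - conf.level)/2, lower.tail = FALSE)  # two-sided critical value
  xbar = mean(data)
  sdx = sqrt(variance/length(data))                  # standard error of the mean
  c(xbar - z * sdx, xbar + z * sdx)
}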
norm.interval(fit_data, variance=var(fit_data), conf.level = 0.95)
## [1] 544.5850 589.9158
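This interval is for the mean, so it is built from the standard error rather than the standard deviation. With a sample this large it is nearly identical to the t-based interval from t.test. The sketch below shows the comparison on simulated data; the sample size and rate are assumptions chosen to resemble the variable, not the actual data.

```r
set.seed(1)
x <- rexp(1460, rate = 1 / 567)   # simulated right-skewed sample
n <- length(x)

# z-based interval for the mean
z_ci <- mean(x) + c(-1, 1) * qnorm(0.975) * sd(x) / sqrt(n)

# t-based interval for the mean
t_ci <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

rbind(z_ci, t_ci)
```

The t interval is slightly wider, but the difference is negligible at this sample size.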
Provide the empirical 5th percentile and 95th percentile of the data. Discuss.
quantile(x=fit_data, probs=c(.05, .95))
## 5% 95%
## 0.01 1468.01
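We are 95% confident that the mean of the unfinished basement area is between 544.5850 and 589.9158 square feet. This interval describes the mean, not the spread of the data, so it is much narrower than the percentile ranges. The exponential model gives a 5th percentile of 29.10 and a 95th percentile of 1699.30, while the empirical values are 0.01 and 1468.01, so the exponential distribution captures the right skew but is only an approximate fit in both tails.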
Modeling. Build some type of multiple regression model and submit your model to the competition board. Provide your complete model summary and results with analysis. Report your Kaggle.com user name and score. Provide a screen snapshot of your score with your name identifiable.
Build some type of multiple regression model and submit your model to the competition board.
For my multiple regression model I chose the following variables:
Dependent Variable
SalePrice - the property’s sale price in dollars. This is the target variable that we are trying to predict.
Independent Variable
LotArea - Lot size in square feet
BsmtUnfSF - Unfinished square feet of basement area
TotalBsmtSF - Total square feet of basement area
GrLivArea - Above grade (ground) living area square feet
GarageArea - Size of garage in square feet
We want to build a model for estimating SalePrice based on the LotArea, BsmtUnfSF, TotalBsmtSF, GrLivArea, and GarageArea of each house.
# Subsetting LotArea, BsmtUnfSF, TotalBsmtSF, FullBath, GrLivArea, GarageArea, YearBuilt, YearRemodAdd, and SalePrice from dataset train.
new_subset_train <- subset(train, select =c("LotArea", "BsmtUnfSF", "TotalBsmtSF", "FullBath","GrLivArea", "GarageArea", "YearBuilt", "YearRemodAdd","SalePrice"))
head(new_subset_train)
## LotArea BsmtUnfSF TotalBsmtSF FullBath GrLivArea GarageArea YearBuilt
## 1 8450 150 856 2 1710 548 2003
## 2 9600 284 1262 2 1262 460 1976
## 3 11250 434 920 2 1786 608 2001
## 4 9550 540 756 1 1717 642 1915
## 5 14260 490 1145 2 2198 836 2000
## 6 14115 64 796 1 1362 480 1993
## YearRemodAdd SalePrice
## 1 2003 208500
## 2 1976 181500
## 3 2002 223500
## 4 1970 140000
## 5 2000 250000
## 6 1995 143000
# Preview of the test dataset
head(test) # 6 observations
## Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape
## 1 1461 20 RH 80 11622 Pave <NA> Reg
## 2 1462 20 RL 81 14267 Pave <NA> IR1
## 3 1463 60 RL 74 13830 Pave <NA> IR1
## 4 1464 60 RL 78 9978 Pave <NA> IR1
## 5 1465 120 RL 43 5005 Pave <NA> IR1
## 6 1466 60 RL 75 10000 Pave <NA> IR1
## LandContour Utilities LotConfig LandSlope Neighborhood Condition1 Condition2
## 1 Lvl AllPub Inside Gtl NAmes Feedr Norm
## 2 Lvl AllPub Corner Gtl NAmes Norm Norm
## 3 Lvl AllPub Inside Gtl Gilbert Norm Norm
## 4 Lvl AllPub Inside Gtl Gilbert Norm Norm
## 5 HLS AllPub Inside Gtl StoneBr Norm Norm
## 6 Lvl AllPub Corner Gtl Gilbert Norm Norm
## BldgType HouseStyle OverallQual OverallCond YearBuilt YearRemodAdd RoofStyle
## 1 1Fam 1Story 5 6 1961 1961 Gable
## 2 1Fam 1Story 6 6 1958 1958 Hip
## 3 1Fam 2Story 5 5 1997 1998 Gable
## 4 1Fam 2Story 6 6 1998 1998 Gable
## 5 TwnhsE 1Story 8 5 1992 1992 Gable
## 6 1Fam 2Story 6 5 1993 1994 Gable
## RoofMatl Exterior1st Exterior2nd MasVnrType MasVnrArea ExterQual ExterCond
## 1 CompShg VinylSd VinylSd None 0 TA TA
## 2 CompShg Wd Sdng Wd Sdng BrkFace 108 TA TA
## 3 CompShg VinylSd VinylSd None 0 TA TA
## 4 CompShg VinylSd VinylSd BrkFace 20 TA TA
## 5 CompShg HdBoard HdBoard None 0 Gd TA
## 6 CompShg HdBoard HdBoard None 0 TA TA
## Foundation BsmtQual BsmtCond BsmtExposure BsmtFinType1 BsmtFinSF1
## 1 CBlock TA TA No Rec 468
## 2 CBlock TA TA No ALQ 923
## 3 PConc Gd TA No GLQ 791
## 4 PConc TA TA No GLQ 602
## 5 PConc Gd TA No ALQ 263
## 6 PConc Gd TA No Unf 0
## BsmtFinType2 BsmtFinSF2 BsmtUnfSF TotalBsmtSF Heating HeatingQC CentralAir
## 1 LwQ 144 270 882 GasA TA Y
## 2 Unf 0 406 1329 GasA TA Y
## 3 Unf 0 137 928 GasA Gd Y
## 4 Unf 0 324 926 GasA Ex Y
## 5 Unf 0 1017 1280 GasA Ex Y
## 6 Unf 0 763 763 GasA Gd Y
## Electrical X1stFlrSF X2ndFlrSF LowQualFinSF GrLivArea BsmtFullBath
## 1 SBrkr 896 0 0 896 0
## 2 SBrkr 1329 0 0 1329 0
## 3 SBrkr 928 701 0 1629 0
## 4 SBrkr 926 678 0 1604 0
## 5 SBrkr 1280 0 0 1280 0
## 6 SBrkr 763 892 0 1655 0
## BsmtHalfBath FullBath HalfBath BedroomAbvGr KitchenAbvGr KitchenQual
## 1 0 1 0 2 1 TA
## 2 0 1 1 3 1 Gd
## 3 0 2 1 3 1 TA
## 4 0 2 1 3 1 Gd
## 5 0 2 0 2 1 Gd
## 6 0 2 1 3 1 TA
## TotRmsAbvGrd Functional Fireplaces FireplaceQu GarageType GarageYrBlt
## 1 5 Typ 0 <NA> Attchd 1961
## 2 6 Typ 0 <NA> Attchd 1958
## 3 6 Typ 1 TA Attchd 1997
## 4 7 Typ 1 Gd Attchd 1998
## 5 5 Typ 0 <NA> Attchd 1992
## 6 7 Typ 1 TA Attchd 1993
## GarageFinish GarageCars GarageArea GarageQual GarageCond PavedDrive
## 1 Unf 1 730 TA TA Y
## 2 Unf 1 312 TA TA Y
## 3 Fin 2 482 TA TA Y
## 4 Fin 2 470 TA TA Y
## 5 RFn 2 506 TA TA Y
## 6 Fin 2 440 TA TA Y
## WoodDeckSF OpenPorchSF EnclosedPorch X3SsnPorch ScreenPorch PoolArea PoolQC
## 1 140 0 0 0 120 0 <NA>
## 2 393 36 0 0 0 0 <NA>
## 3 212 34 0 0 0 0 <NA>
## 4 360 36 0 0 0 0 <NA>
## 5 0 82 0 0 144 0 <NA>
## 6 157 84 0 0 0 0 <NA>
## Fence MiscFeature MiscVal MoSold YrSold SaleType SaleCondition
## 1 MnPrv <NA> 0 6 2010 WD Normal
## 2 <NA> Gar2 12500 6 2010 WD Normal
## 3 MnPrv <NA> 0 3 2010 WD Normal
## 4 <NA> <NA> 0 6 2010 WD Normal
## 5 <NA> <NA> 0 1 2010 WD Normal
## 6 <NA> <NA> 0 4 2010 WD Normal
# Subsetting LotArea, BsmtUnfSF, TotalBsmtSF, FullBath, GrLivArea, GarageArea, YearBuilt, and YearRemodAdd from dataset test (test has no SalePrice column).
new_subset_test <- subset(test, select =c("LotArea", "BsmtUnfSF", "TotalBsmtSF", "FullBath","GrLivArea", "GarageArea", "YearBuilt", "YearRemodAdd"))
head(new_subset_test)
## LotArea BsmtUnfSF TotalBsmtSF FullBath GrLivArea GarageArea YearBuilt
## 1 11622 270 882 1 896 730 1961
## 2 14267 406 1329 1 1329 312 1958
## 3 13830 137 928 2 1629 482 1997
## 4 9978 324 926 2 1604 470 1998
## 5 5005 1017 1280 2 1280 506 1992
## 6 10000 763 763 2 1655 440 1993
## YearRemodAdd
## 1 1961
## 2 1958
## 3 1998
## 4 1998
## 5 1992
## 6 1994
Data Cleaning
Checked for missing values in both the train and test subsets. The train subset has none; the test subset has 3.
# create a data frame
stats <- data.frame(new_subset_train)
# find location of missing values
print("Position of missing values -")
## [1] "Position of missing values -"
which(is.na(stats))
## integer(0)
# count total missing values
print("Count of total missing values - ")
## [1] "Count of total missing values - "
sum(is.na(stats))
## [1] 0
# create a data frame
stats <- data.frame(new_subset_test)
# find location of missing values
print("Position of missing values -")
## [1] "Position of missing values -"
which(is.na(stats))
## [1] 2120 3579 8412
# count total missing values
print("Count of total missing values - ")
## [1] "Count of total missing values - "
sum(is.na(stats))
## [1] 3
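The positions returned by which(is.na(...)) on a data frame are column-major linear indices, so they do not directly say which row and column are affected. The sketch below converts them with arrayInd, assuming the test subset has 1459 rows (an assumption, since the row count is not printed here) and the eight columns selected above. A per-column count with colSums is shown on a toy data frame.

```r
n_rows <- 1459   # assumed number of rows in the test subset
cols <- c("LotArea", "BsmtUnfSF", "TotalBsmtSF", "FullBath",
          "GrLivArea", "GarageArea", "YearBuilt", "YearRemodAdd")

pos <- c(2120, 3579, 8412)                         # output of which(is.na(stats))
idx <- arrayInd(pos, .dim = c(n_rows, length(cols)))
located <- data.frame(row = idx[, 1], column = cols[idx[, 2]])
located

# Per-column counts are often easier to read than linear positions
toy <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))
na_per_column <- colSums(is.na(toy))
na_per_column
```

Under that row-count assumption, the missing values fall in BsmtUnfSF, TotalBsmtSF and GarageArea, all of which are predictors in the models below.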
Compute the model coefficients
model_1 <- lm(SalePrice ~ LotArea + BsmtUnfSF + TotalBsmtSF + GrLivArea + GarageArea, data = new_subset_train)
summary(model_1)
##
## Call:
## lm(formula = SalePrice ~ LotArea + BsmtUnfSF + TotalBsmtSF +
## GrLivArea + GarageArea, data = new_subset_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -681901 -19033 362 19594 273221
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.318e+04 4.033e+03 -5.749 1.09e-08 ***
## LotArea 1.318e-01 1.279e-01 1.031 0.303
## BsmtUnfSF -1.235e+01 3.032e+00 -4.074 4.88e-05 ***
## TotalBsmtSF 5.364e+01 3.568e+00 15.034 < 2e-16 ***
## GrLivArea 6.914e+01 2.758e+00 25.065 < 2e-16 ***
## GarageArea 1.019e+02 6.802e+00 14.988 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 45950 on 1454 degrees of freedom
## Multiple R-squared: 0.6665, Adjusted R-squared: 0.6654
## F-statistic: 581.3 on 5 and 1454 DF, p-value: < 2.2e-16
summary(model_1)$coefficients[, "Pr(>|t|)"]
## (Intercept) LotArea BsmtUnfSF TotalBsmtSF GrLivArea
## 1.093188e-08 3.028421e-01 4.877380e-05 1.362393e-47 1.541041e-115
## GarageArea
## 2.483565e-47
The model is a multiple linear regression model with SalePrice as the response variable and LotArea, BsmtUnfSF, TotalBsmtSF, GrLivArea, and GarageArea as the predictor variables.
The coefficients table shows the estimated regression coefficients for each predictor variable, as well as their standard errors, t-values, and associated p-values.
The intercept coefficient (-2.318e+04) represents the estimated SalePrice when all predictor variables are zero.
The p-values for each predictor variable show whether they are statistically significant in predicting SalePrice or not. In this case, LotArea is not significant (p-value = 0.303), while all the other predictor variables have very small p-values (less than 0.001), indicating strong evidence of a significant linear relationship between each predictor variable and SalePrice.
Multiple R-squared is 0.6665. The adjusted R-squared value of 0.6654 suggests that the model explains about 66.5% of the variation in SalePrice after accounting for the number of predictor variables in the model.
The F-statistic of 581.3 with a very small p-value (< 2.2e-16) suggests that the overall model is statistically significant in predicting SalePrice.
The residual standard error of 45950 is an estimate of the standard deviation of the errors or residuals, and indicates the degree of variability of the response variable that is not explained by the model. Whether this is small depends on the scale of SalePrice: a typical prediction error of about $46,000, together with an R-squared of 0.67, points to a moderate rather than a tight fit.
To see which predictor variables are significant, you can examine the coefficients table, which shows the estimates of the regression beta coefficients and the associated t-statistic p-values:
summary(model_1)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.318421e+04 4032.843200 -5.748851 1.093188e-08
## LotArea 1.318336e-01 0.127904 1.030723 3.028421e-01
## BsmtUnfSF -1.235047e+01 3.031774 -4.073676 4.877380e-05
## TotalBsmtSF 5.364311e+01 3.568100 15.034082 1.362393e-47
## GrLivArea 6.914123e+01 2.758440 25.065337 1.541041e-115
## GarageArea 1.019489e+02 6.801995 14.988084 2.483565e-47
The table presents the estimates of the coefficients for each predictor variable. The “Estimate” column shows the estimated effect of each predictor variable on SalePrice. For example, the estimated effect of LotArea on SalePrice is 0.1318.
The “Std. Error” column shows the standard error of each coefficient estimate. The “t value” column shows the t-statistic for each coefficient, which measures the number of standard errors that the estimate is away from zero.
Finally, the “Pr(>|t|)” column shows the p-value for each coefficient, which indicates the probability of observing a t-statistic as extreme or more extreme than the observed value, assuming that the null hypothesis is true (i.e., the coefficient is equal to zero). A p-value less than the significance level (usually 0.05) indicates that the coefficient is statistically significant, meaning that we reject the null hypothesis and conclude that the predictor variable is associated with the response variable. In this case, all predictor variables except for LotArea have a p-value less than 0.05, indicating that they are statistically significant. The adjusted R-squared value of 0.6654 suggests that the model explains approximately 67% of the variation in SalePrice.
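The t value and p-value columns follow directly from the estimate, its standard error and the residual degrees of freedom. As a sketch, the LotArea row can be recomputed from the printed numbers (which are rounded, so agreement is to the digits shown).

```r
est      <- 1.318336e-01   # LotArea estimate
se       <- 0.127904       # LotArea standard error
df_resid <- 1454           # residual degrees of freedom of model_1

t_val <- est / se                          # t statistic
p_val <- 2 * pt(-abs(t_val), df = df_resid)  # two-sided p-value

c(t = t_val, p = p_val)
```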
plot(model_1$fitted.values, model_1$residuals,
xlab="Fitted Values", ylab="Residuals", main="Fitted Values vs. Residuals")
abline(h=0, col='blue')
This plot shows the residuals against the fitted (predicted) values, and whether there is any pattern in the residuals. If the points are randomly scattered around the horizontal line at 0 with roughly constant spread, the linearity and constant-variance assumptions are plausible. If there is a clear pattern (e.g., a U-shape or a curve), then it suggests that the model is not adequately capturing some important nonlinear relationship between the predictor and outcome variables.
This plot of residuals versus fits shows that the residual variance (vertical spread) increases as the fitted values (predicted values of sale price) increase. This violates the assumption of constant error variance.
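A common remedy for a fan-shaped residual plot is to model the logarithm of the response. The sketch below uses simulated data, not the Kaggle data, with errors that are multiplicative by construction; it compares how strongly the absolute residuals track the fitted values before and after the log transform. All names and parameter values are assumptions for illustration.

```r
set.seed(2024)
n     <- 2000
area  <- runif(n, 500, 3000)
# Simulated prices with multiplicative (log-scale) noise
price <- exp(11 + 0.0004 * area + rnorm(n, sd = 0.3))

fit_raw <- lm(price ~ area)        # spread grows with the fitted value
fit_log <- lm(log(price) ~ area)   # spread roughly constant

# Correlation between |residual| and fitted value as a rough spread check
spread_raw <- cor(abs(resid(fit_raw)), fitted(fit_raw))
spread_log <- cor(abs(resid(fit_log)), fitted(fit_log))

c(raw = spread_raw, log = spread_log)
```

If the log model were used for the competition, predictions would need to be transformed back with exp() before submission.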
qqnorm(model_1$residuals); qqline(model_1$residuals)
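The residuals appear approximately normally distributed, although the extreme residuals in the model summary (minimum -681901) suggest heavier tails than a normal distribution would produce.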
The reference line in this plot is a straight line that passes through the first and third quartiles of the data, and it is used to check whether the residuals are approximately normally distributed. If the residuals are normally distributed, they will fall roughly along this line.
The pattern of the normal probability plot is straight, so this plot also provides evidence that it is reasonable to assume that the errors have a normal distribution.
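A quick numerical complement to the Q-Q plot is to express the most extreme residual in units of the residual standard error. The sketch below uses the values printed in the model_1 summary and treats the residual standard error as the standard deviation of the residuals, which is an approximation.

```r
rse       <- 45950     # residual standard error of model_1
min_resid <- -681901   # minimum residual from the summary
n         <- 1460      # number of observations (assumed from the degrees of freedom)

z_min <- min_resid / rse   # standardized size of the largest negative residual

# Expected number of residuals this extreme under exact normality
expected_below <- n * pnorm(z_min)

c(z_min = z_min, expected_below = expected_below)
```

A residual roughly 15 standard errors below zero would essentially never occur under exact normality, so at least one observation deserves a closer look as a potential outlier.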
Model 2
I was not happy with my numbers in Model 1, especially the R-squared of 0.6665, so I added 3 variables to increase the R-squared.
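Adding a variable never decreases R-squared, even when the new variable is unrelated to the response, so a higher R-squared alone does not show that the model improved. Some of the independent variables will be statistically significant; this may reflect an actual relationship or just a chance correlation.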
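This behavior is easy to demonstrate on simulated data. In the sketch below, a predictor that is pure noise is added to a model; R-squared cannot go down, while adjusted R-squared is free to. The data and names are made up for illustration.

```r
set.seed(123)
n     <- 200
x1    <- rnorm(n)
noise <- rnorm(n)              # unrelated to y by construction
y     <- 2 + 3 * x1 + rnorm(n)

fit_small <- lm(y ~ x1)
fit_big   <- lm(y ~ x1 + noise)

r2  <- c(small = summary(fit_small)$r.squared,
         big   = summary(fit_big)$r.squared)
adj <- c(small = summary(fit_small)$adj.r.squared,
         big   = summary(fit_big)$adj.r.squared)

rbind(r2, adj)
```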
Added:
Independent Variable
FullBath - Full bathrooms above grade
YearBuilt - Original construction date
YearRemodAdd - Remodel date
We want to build a model 2 for estimating SalePrice based on the LotArea, BsmtUnfSF, TotalBsmtSF, FullBath, GrLivArea, GarageArea, YearBuilt, and YearRemodAdd of each house.
model_2 <- lm(SalePrice ~ LotArea + BsmtUnfSF + TotalBsmtSF + FullBath + GrLivArea + GarageArea + YearBuilt + YearRemodAdd, data = new_subset_train)
summary(model_2)
##
## Call:
## lm(formula = SalePrice ~ LotArea + BsmtUnfSF + TotalBsmtSF +
## FullBath + GrLivArea + GarageArea + YearBuilt + YearRemodAdd,
## data = new_subset_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -626171 -17752 -3960 14599 287055
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.148e+06 1.258e+05 -17.071 < 2e-16 ***
## LotArea 4.073e-01 1.154e-01 3.530 0.000429 ***
## BsmtUnfSF -1.324e+01 2.777e+00 -4.769 2.04e-06 ***
## TotalBsmtSF 4.168e+01 3.335e+00 12.497 < 2e-16 ***
## FullBath -9.406e+02 2.920e+03 -0.322 0.747409
## GrLivArea 6.911e+01 3.069e+00 22.515 < 2e-16 ***
## GarageArea 5.802e+01 6.566e+00 8.837 < 2e-16 ***
## YearBuilt 4.969e+02 5.212e+01 9.534 < 2e-16 ***
## YearRemodAdd 5.934e+02 6.705e+01 8.850 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41120 on 1451 degrees of freedom
## Multiple R-squared: 0.7335, Adjusted R-squared: 0.7321
## F-statistic: 499.3 on 8 and 1451 DF, p-value: < 2.2e-16
By adding the 3 variables, the R-squared increases to 0.7335. The adjusted R-squared value of 0.7321 suggests that the model explains about 73.2% of the variation in SalePrice after accounting for the number of predictor variables in the model.
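Adjusted R-squared penalizes R-squared for the number of predictors p: 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below applies the formula to both models using the rounded R-squared values from the summaries and n = 1460 (assumed from the degrees of freedom); adj_r2 is an illustrative helper.

```r
# Adjusted R-squared from R-squared, sample size n and predictor count p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 1460
adj_model_1 <- adj_r2(0.6665, n, p = 5)
adj_model_2 <- adj_r2(0.7335, n, p = 8)

c(model_1 = adj_model_1, model_2 = adj_model_2)
```

Because n is large relative to p, the penalty is small for both models, and the gain from Model 1 to Model 2 is not an artifact of adding predictors.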
This is a summary of a linear regression model fitted to the data set new_subset_train. The model is trying to predict the SalePrice of houses based on a set of predictor variables. The summary provides information on the goodness of fit of the model and the significance of the coefficients.
Residuals: This section provides summary statistics on the residuals (i.e., the differences between the actual and predicted values of the dependent variable). The minimum and maximum residuals are -626171 and 287055, respectively. The median residual is -3960, indicating that the model overestimates the sale price for slightly more than half of the houses; the mean residual of a least-squares fit with an intercept is zero by construction.
Coefficients: This section provides estimates of the regression coefficients (i.e., the slopes) and their significance levels. The intercept is -2.148e+06, meaning that the predicted sale price when all predictor variables are zero is -2.148 million; this is an extrapolation far outside the data and has no practical meaning. The coefficient for LotArea is 0.4073, indicating that for every 1-square-foot increase in LotArea, the predicted sale price increases by about $0.41, holding the other predictors constant. The coefficient for BsmtUnfSF is -13.24, indicating that for every 1-square-foot increase in BsmtUnfSF, the predicted sale price decreases by $13.24. The coefficients for the other variables can be interpreted similarly.
Significance: The significance levels (p-values) for the coefficients are also provided. All variables except FullBath have p-values less than 0.05, indicating that they are statistically significant predictors of SalePrice. The adjusted R-squared value of 0.7321 indicates that the model explains about 73% of the variability in SalePrice.
The coefficient for ‘FullBath’ is -940.62, but the p-value is 0.7474, which is greater than 0.05. Therefore, we cannot reject the null hypothesis that the coefficient is equal to zero, and we conclude that there is not a significant relationship between ‘FullBath’ and ‘SalePrice’.
Residual standard error: This is an estimate of the standard deviation of the residuals. It is a measure of the average distance that the data points fall from the regression line. The residual standard error of 41120 means that on average, the predicted sale price can be off by $41120.
F-statistic: This is a test of whether the model as a whole is significant. The F-statistic of 499.3 and the associated p-value of < 2.2e-16 suggest that the model is significant and that at least one of the predictor variables is related to SalePrice.
summary(model_2)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.147928e+06 1.258257e+05 -17.0706563 1.105332e-59
## LotArea 4.073197e-01 1.153903e-01 3.5299309 4.286577e-04
## BsmtUnfSF -1.324043e+01 2.776587e+00 -4.7685990 2.041667e-06
## TotalBsmtSF 4.168114e+01 3.335253e+00 12.4971448 4.118002e-34
## FullBath -9.406210e+02 2.920109e+03 -0.3221185 7.474093e-01
## GrLivArea 6.910657e+01 3.069297e+00 22.5154419 1.592333e-96
## GarageArea 5.802276e+01 6.565973e+00 8.8368869 2.785968e-18
## YearBuilt 4.968937e+02 5.211734e+01 9.5341339 6.111098e-21
## YearRemodAdd 5.933854e+02 6.704808e+01 8.8501474 2.489138e-18
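A fitted value is just the intercept plus each coefficient times the corresponding predictor. As a sketch, the prediction for the first house in the test subset (the row with LotArea 11622 shown earlier) can be rebuilt by hand from the coefficients printed above. Because those coefficients are rounded to seven significant digits, the result agrees with predict() only to within about a dollar.

```r
coefs <- c("(Intercept)" = -2.147928e+06, LotArea = 4.073197e-01,
           BsmtUnfSF = -1.324043e+01, TotalBsmtSF = 4.168114e+01,
           FullBath = -9.406210e+02, GrLivArea = 6.910657e+01,
           GarageArea = 5.802276e+01, YearBuilt = 4.968937e+02,
           YearRemodAdd = 5.933854e+02)

# First row of the test subset, with 1 standing in for the intercept
house <- c("(Intercept)" = 1, LotArea = 11622, BsmtUnfSF = 270,
           TotalBsmtSF = 882, FullBath = 1, GrLivArea = 896,
           GarageArea = 730, YearBuilt = 1961, YearRemodAdd = 1961)

# Linear predictor: sum of coefficient * value, matched by name
pred <- sum(coefs * house[names(coefs)])
pred
```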
plot(model_2$fitted.values, model_2$residuals,
xlab="Fitted Values", ylab="Residuals", main="Fitted Values vs. Residuals")
abline(h=0, col='blue')
This plot shows the residuals against the fitted (predicted) values, and whether there is any pattern in the residuals. If the points are randomly scattered around the horizontal line at 0 with roughly constant spread, the linearity and constant-variance assumptions are plausible. If there is a clear pattern (e.g., a U-shape or a curve), then it suggests that the model is not adequately capturing some important nonlinear relationship between the predictor and outcome variables.
This plot of residuals versus fits shows that the residual variance (vertical spread) increases as the fitted values (predicted values of sale price) increase. This violates the assumption of constant error variance.
qqnorm(model_2$residuals); qqline(model_2$residuals)
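The residuals appear approximately normally distributed, although the extreme residuals in the model summary (minimum -626171) suggest heavier tails than a normal distribution would produce.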
The reference line in this plot is a straight line that passes through the first and third quartiles of the data, and it is used to check whether the residuals are approximately normally distributed. If the residuals are normally distributed, they will fall roughly along this line.
The pattern of the normal probability plot is straight, so this plot also provides evidence that it is reasonable to assume that the errors have a normal distribution.
House Sale Prices Prediction
mySalePrice <- predict(model_2,test)
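## Create a data frame of Ids and predicted sale prices
prediction <- data.frame( Id = test[,"Id"], SalePrice = mySalePrice)
# Clamp negative predictions to 0
prediction[prediction<0] <- 0
# Rows with missing predictors produce NA predictions; set them to 0 so every Id has a value
prediction <- replace(prediction,is.na(prediction),0)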
prediction
## Id SalePrice
## 1 1461 131366.75
## 2 1462 151673.68
## 3 1463 211112.71
## 4 1464 205057.34
## 5 1465 181767.87
## 6 1466 189385.47
## 7 1467 186175.76
## 8 1468 178474.73
## 9 1469 192370.41
## 10 1470 130579.64
## 11 1471 207965.00
## 12 1472 100084.58
## 13 1473 113691.18
## 14 1474 161692.48
## 15 1475 104782.07
## 16 1476 296482.96
## 17 1477 247379.35
## 18 1478 250341.26
## 19 1479 259987.45
## 20 1480 381456.57
## 21 1481 293022.05
## 22 1482 202236.43
## 23 1483 197835.95
## 24 1484 175854.94
## 25 1485 173926.81
## 26 1486 207578.71
## 27 1487 308856.01
## 28 1488 247102.50
## 29 1489 202944.52
## 30 1490 233010.40
## 31 1491 206330.61
## 32 1492 84296.02
## 33 1493 202685.44
## 34 1494 286324.80
## 35 1495 276768.82
## 36 1496 229369.03
## 37 1497 210648.66
## 38 1498 167970.40
## 39 1499 170011.73
## 40 1500 172546.28
## 41 1501 189653.37
## 42 1502 156717.45
## 43 1503 255620.16
## 44 1504 229951.28
## 45 1505 227339.70
## 46 1506 215184.73
## 47 1507 289687.02
## 48 1508 228031.73
## 49 1509 167941.55
## 50 1510 156381.82
## 51 1511 159852.54
## 52 1512 189379.83
## 53 1513 161490.72
## 54 1514 206284.02
## 55 1515 246619.71
## 56 1516 143822.70
## 57 1517 163622.12
## 58 1518 178211.73
## 59 1519 236992.86
## 60 1520 133113.61
## 61 1521 139495.04
## 62 1522 170891.72
## 63 1523 118449.83
## 64 1524 104858.48
## 65 1525 108157.70
## 66 1526 110170.98
## 67 1527 115270.23
## 68 1528 144222.43
## 69 1529 146309.13
## 70 1530 182370.12
## 71 1531 118817.84
## 72 1532 63066.35
## 73 1533 168373.89
## 74 1534 127591.16
## 75 1535 182331.59
## 76 1536 108058.99
## 77 1537 101597.74
## 78 1538 138131.56
## 79 1539 182783.30
## 80 1540 139838.32
## 81 1541 156202.24
## 82 1542 178434.58
## 83 1543 198051.10
## 84 1544 55507.54
## 85 1545 89743.56
## 86 1546 110486.97
## 87 1547 120748.35
## 88 1548 120739.88
## 89 1549 117310.68
## 90 1550 135162.35
## 91 1551 101447.07
## 92 1552 157112.89
## 93 1553 128548.07
## 94 1554 109803.23
## 95 1555 179781.35
## 96 1556 79735.18
## 97 1557 108068.22
## 98 1558 73500.35
## 99 1559 109707.15
## 100 1560 135122.45
## 101 1561 187185.92
## 102 1562 142526.58
## 103 1563 110779.57
## 104 1564 194879.81
## 105 1565 158748.67
## 106 1566 226391.40
## 107 1567 66064.13
## 108 1568 233748.33
## 109 1569 181210.65
## 110 1570 130481.07
## 111 1571 138674.05
## 112 1572 159718.77
## 113 1573 247329.63
## 114 1574 150555.82
## 115 1575 211790.68
## 116 1576 249982.99
## 117 1577 199854.40
## 118 1578 129872.25
## 119 1579 168455.62
## 120 1580 208213.69
## 121 1581 158527.90
## 122 1582 124098.59
## 123 1583 302789.94
## 124 1584 241893.04
## 125 1585 152754.34
## 126 1586 70243.07
## 127 1587 83037.39
## 128 1588 151221.46
## 129 1589 107395.36
## 130 1590 138824.99
## 131 1591 82154.41
## 132 1592 132261.49
## 133 1593 84726.56
## 134 1594 179894.48
## 135 1595 112300.04
## 136 1596 215671.44
## 137 1597 219762.22
## 138 1598 198104.03
## 139 1599 165261.33
## 140 1600 168342.69
## 141 1601 44954.15
## 142 1602 116055.94
## 143 1603 60330.32
## 144 1604 244980.96
## 145 1605 247043.13
## 146 1606 170179.95
## 147 1607 251082.10
## 148 1608 214658.91
## 149 1609 193742.28
## 150 1610 175797.03
## 151 1611 155800.16
## 152 1612 215522.14
## 153 1613 205403.00
## 154 1614 122204.34
## 155 1615 92905.97
## 156 1616 90748.36
## 157 1617 113412.04
## 158 1618 134091.22
## 159 1619 140025.25
## 160 1620 254636.61
## 161 1621 171417.98
## 162 1622 150285.42
## 163 1623 250663.88
## 164 1624 234179.09
## 165 1625 111647.68
## 166 1626 188250.34
## 167 1627 208109.64
## 168 1628 273186.43
## 169 1629 180218.84
## 170 1630 334501.64
## 171 1631 207279.35
## 172 1632 234568.41
## 173 1633 164109.12
## 174 1634 204183.28
## 175 1635 184935.25
## 176 1636 166845.39
## 177 1637 214277.54
## 178 1638 208185.22
## 179 1639 204912.59
## 180 1640 268461.03
## 181 1641 212343.39
## 182 1642 235350.92
## 183 1643 228110.84
## 184 1644 228592.43
## 185 1645 213042.93
## 186 1646 157788.45
## 187 1647 161663.94
## 188 1648 144953.41
## 189 1649 131879.44
## 190 1650 129162.67
## 191 1651 145385.75
## 192 1652 103562.70
## 193 1653 105851.04
## 194 1654 164947.48
## 195 1655 146915.32
## 196 1656 162246.09
## 197 1657 167336.21
## 198 1658 161886.03
## 199 1659 115044.37
## 200 1660 170093.27
## 201 1661 347077.94
## 202 1662 333497.07
## 203 1663 309847.93
## 204 1664 378056.06
## 205 1665 261920.43
## 206 1666 290752.74
## 207 1667 318330.36
## 208 1668 280831.39
## 209 1669 244642.49
## 210 1670 284814.36
## 211 1671 262078.78
## 212 1672 356945.65
## 213 1673 278059.90
## 214 1674 248195.99
## 215 1675 210633.95
## 216 1676 213554.65
## 217 1677 215096.09
## 218 1678 368312.07
## 219 1679 315794.89
## 220 1680 275103.10
## 221 1681 215268.50
## 222 1682 263864.34
## 223 1683 200226.31
## 224 1684 184604.66
## 225 1685 187328.49
## 226 1686 180344.63
## 227 1687 168629.11
## 228 1688 201583.91
## 229 1689 198633.01
## 230 1690 194653.84
## 231 1691 179285.81
## 232 1692 247399.67
## 233 1693 172845.04
## 234 1694 193297.57
## 235 1695 166361.76
## 236 1696 279650.26
## 237 1697 166388.52
## 238 1698 325530.35
## 239 1699 299928.63
## 240 1700 250796.80
## 241 1701 260490.03
## 242 1702 265869.27
## 243 1703 246235.05
## 244 1704 274637.09
## 245 1705 213981.98
## 246 1706 358010.19
## 247 1707 214092.45
## 248 1708 216016.18
## 249 1709 250450.94
## 250 1710 220495.64
## 251 1711 245656.38
## 252 1712 252468.86
## 253 1713 264239.19
## 254 1714 238755.00
## 255 1715 216524.20
## 256 1716 195467.77
## 257 1717 181385.83
## 258 1718 152065.58
## 259 1719 216143.28
## 260 1720 252841.81
## 261 1721 184057.58
## 262 1722 150043.09
## 263 1723 192842.33
## 264 1724 230215.14
## 265 1725 245141.77
## 266 1726 204070.57
## 267 1727 184743.62
## 268 1728 192875.43
## 269 1729 167008.37
## 270 1730 178962.77
## 271 1731 125640.91
## 272 1732 128122.94
## 273 1733 114117.59
## 274 1734 119909.36
## 275 1735 128143.23
## 276 1736 106421.92
## 277 1737 286084.43
## 278 1738 226124.38
## 279 1739 324524.46
## 280 1740 241948.84
## 281 1741 211422.06
## 282 1742 190203.56
## 283 1743 187817.48
## 284 1744 250926.30
## 285 1745 219920.81
## 286 1746 222916.84
## 287 1747 225279.21
## 288 1748 255637.48
## 289 1749 168214.55
## 290 1750 152314.79
## 291 1751 240956.30
## 292 1752 109578.55
## 293 1753 178105.03
## 294 1754 220011.43
## 295 1755 173570.06
## 296 1756 111010.24
## 297 1757 97785.60
## 298 1758 167181.29
## 299 1759 181853.42
## 300 1760 187509.57
## 301 1761 166133.50
## 302 1762 202282.81
## 303 1763 165984.20
## 304 1764 103993.29
## 305 1765 211826.22
## 306 1766 182394.61
## 307 1767 227114.65
## 308 1768 127519.08
## 309 1769 155513.74
## 310 1770 135937.11
## 311 1771 130606.49
## 312 1772 149982.10
## 313 1773 134051.68
## 314 1774 199564.17
## 315 1775 112163.46
## 316 1776 110311.38
## 317 1777 78694.22
## 318 1778 140155.39
## 319 1779 103690.65
## 320 1780 168359.32
## 321 1781 117580.24
## 322 1782 86619.91
## 323 1783 162832.43
## 324 1784 86723.24
## 325 1785 88711.80
## 326 1786 195329.15
## 327 1787 162631.65
## 328 1788 27632.63
## 329 1789 79030.69
## 330 1790 66830.99
## 331 1791 264817.49
## 332 1792 156300.79
## 333 1793 142034.96
## 334 1794 148719.12
## 335 1795 121955.97
## 336 1796 95314.93
## 337 1797 141811.39
## 338 1798 107497.31
## 339 1799 77311.19
## 340 1800 110718.20
## 341 1801 154341.24
## 342 1802 168472.71
## 343 1803 170538.39
## 344 1804 148184.33
## 345 1805 122773.49
## 346 1806 130120.44
## 347 1807 149725.85
## 348 1808 119593.86
## 349 1809 106754.14
## 350 1810 130394.77
## 351 1811 99049.76
## 352 1812 84206.11
## 353 1813 128058.81
## 354 1814 65983.61
## 355 1815 34784.53
## 356 1816 56362.04
## 357 1817 125651.43
## 358 1818 140303.19
## 359 1819 119070.33
## 360 1820 32782.24
## 361 1821 106401.38
## 362 1822 155435.53
## 363 1823 18202.64
## 364 1824 107014.08
## 365 1825 129550.81
## 366 1826 80410.68
## 367 1827 107704.33
## 368 1828 127287.79
## 369 1829 143314.58
## 370 1830 147548.20
## 371 1831 154985.03
## 372 1832 140181.30
## 373 1833 146707.11
## 374 1834 127335.46
## 375 1835 144499.05
## 376 1836 139503.98
## 377 1837 62657.92
## 378 1838 114691.40
## 379 1839 84537.10
## 380 1840 153572.64
## 381 1841 153308.23
## 382 1842 78866.00
## 383 1843 148802.35
## 384 1844 130674.20
## 385 1845 155076.10
## 386 1846 140960.64
## 387 1847 176144.15
## 388 1848 24961.40
## 389 1849 112375.14
## 390 1850 115414.76
## 391 1851 131961.30
## 392 1852 107708.76
## 393 1853 145303.09
## 394 1854 165752.96
## 395 1855 168557.91
## 396 1856 222271.42
## 397 1857 159709.46
## 398 1858 206457.73
## 399 1859 139999.59
## 400 1860 182124.54
## 401 1861 138474.77
## 402 1862 317951.30
## 403 1863 316356.24
## 404 1864 316371.31
## 405 1865 285828.27
## 406 1866 275363.01
## 407 1867 238861.79
## 408 1868 271316.51
## 409 1869 217097.84
## 410 1870 224544.98
## 411 1871 236952.02
## 412 1872 185359.44
## 413 1873 233245.11
## 414 1874 161220.71
## 415 1875 221142.17
## 416 1876 217553.40
## 417 1877 227347.95
## 418 1878 217523.76
## 419 1879 130982.82
## 420 1880 136899.16
## 421 1881 281399.59
## 422 1882 253420.31
## 423 1883 211504.94
## 424 1884 227776.97
## 425 1885 229533.58
## 426 1886 265228.17
## 427 1887 214220.03
## 428 1888 264691.76
## 429 1889 182184.51
## 430 1890 144046.52
## 431 1891 166575.27
## 432 1892 101231.24
## 433 1893 132753.29
## 434 1894 154182.47
## 435 1895 154413.93
## 436 1896 109368.70
## 437 1897 110590.78
## 438 1898 84902.42
## 439 1899 127320.03
## 440 1900 102327.24
## 441 1901 145128.30
## 442 1902 146548.44
## 443 1903 198599.92
## 444 1904 127572.93
## 445 1905 184153.12
## 446 1906 178284.90
## 447 1907 207200.72
## 448 1908 122809.79
## 449 1909 151540.92
## 450 1910 138234.56
## 451 1911 209345.55
## 452 1912 291713.33
## 453 1913 185292.61
## 454 1914 36309.78
## 455 1915 247328.71
## 456 1916 38394.23
## 457 1917 255209.95
## 458 1918 130752.37
## 459 1919 170645.23
## 460 1920 220383.41
## 461 1921 316465.55
## 462 1922 271550.80
## 463 1923 234867.53
## 464 1924 226397.60
## 465 1925 231067.48
## 466 1926 311570.58
## 467 1927 129552.02
## 468 1928 188718.12
## 469 1929 111976.55
## 470 1930 145269.41
## 471 1931 162717.68
## 472 1932 163802.04
## 473 1933 199714.88
## 474 1934 205417.72
## 475 1935 180804.70
## 476 1936 208289.81
## 477 1937 195847.36
## 478 1938 196120.70
## 479 1939 233854.76
## 480 1940 178312.11
## 481 1941 199172.34
## 482 1942 199356.66
## 483 1943 182630.60
## 484 1944 286602.86
## 485 1945 281740.22
## 486 1946 169072.53
## 487 1947 254502.90
## 488 1948 183464.47
## 489 1949 230772.10
## 490 1950 184121.47
## 491 1951 265548.23
## 492 1952 236923.58
## 493 1953 167203.12
## 494 1954 208543.02
## 495 1955 144518.72
## 496 1956 337830.95
## 497 1957 177121.66
## 498 1958 288288.19
## 499 1959 183240.59
## 500 1960 113435.92
## 501 1961 130545.02
## 502 1962 106263.75
## 503 1963 110696.02
## 504 1964 128752.24
## 505 1965 163738.20
## 506 1966 140150.17
## 507 1967 263785.45
## 508 1968 350155.07
## 509 1969 293640.74
## 510 1970 353397.21
## 511 1971 361834.95
## 512 1972 317186.63
## 513 1973 251854.74
## 514 1974 288422.78
## 515 1975 358595.79
## 516 1976 251150.02
## 517 1977 321962.12
## 518 1978 322943.47
## 519 1979 280790.37
## 520 1980 202907.21
## 521 1981 288627.66
## 522 1982 214448.30
## 523 1983 203564.67
## 524 1984 197209.31
## 525 1985 227992.17
## 526 1986 238396.56
## 527 1987 184019.27
## 528 1988 188725.08
## 529 1989 202748.37
## 530 1990 223956.69
## 531 1991 213871.69
## 532 1992 225579.76
## 533 1993 177293.93
## 534 1994 260014.25
## 535 1995 196915.75
## 536 1996 246679.11
## 537 1997 268392.26
## 538 1998 304725.82
## 539 1999 253259.00
## 540 2000 281890.70
## 541 2001 270062.48
## 542 2002 248380.81
## 543 2003 257556.35
## 544 2004 260193.63
## 545 2005 221060.60
## 546 2006 210507.83
## 547 2007 242932.43
## 548 2008 199591.40
## 549 2009 218946.78
## 550 2010 206223.62
## 551 2011 158736.37
## 552 2012 183527.86
## 553 2013 196632.62
## 554 2014 203935.73
## 555 2015 204040.88
## 556 2016 217323.26
## 557 2017 203832.12
## 558 2018 124604.21
## 559 2019 138530.05
## 560 2020 116714.81
## 561 2021 100709.75
## 562 2022 188515.90
## 563 2023 188161.59
## 564 2024 272817.98
## 565 2025 317078.16
## 566 2026 187303.83
## 567 2027 170502.87
## 568 2028 186800.71
## 569 2029 191605.82
## 570 2030 234109.13
## 571 2031 207331.24
## 572 2032 219866.94
## 573 2033 226997.44
## 574 2034 182020.67
## 575 2035 224906.74
## 576 2036 203596.45
## 577 2037 208302.40
## 578 2038 285661.32
## 579 2039 195312.70
## 580 2040 302961.48
## 581 2041 235494.89
## 582 2042 215493.99
## 583 2043 181576.21
## 584 2044 204727.06
## 585 2045 204477.83
## 586 2046 183158.14
## 587 2047 159344.31
## 588 2048 153337.70
## 589 2049 199082.08
## 590 2050 179024.77
## 591 2051 72454.66
## 592 2052 128037.33
## 593 2053 147796.63
## 594 2054 60204.81
## 595 2055 166113.34
## 596 2056 142799.04
## 597 2057 109728.57
## 598 2058 223483.38
## 599 2059 128984.10
## 600 2060 189998.37
## 601 2061 177719.19
## 602 2062 129371.65
## 603 2063 81377.76
## 604 2064 142756.79
## 605 2065 112452.61
## 606 2066 173088.38
## 607 2067 141670.56
## 608 2068 194941.88
## 609 2069 62987.45
## 610 2070 85861.14
## 611 2071 90465.43
## 612 2072 178620.57
## 613 2073 130467.93
## 614 2074 181176.23
## 615 2075 150870.92
## 616 2076 105791.24
## 617 2077 143575.17
## 618 2078 122156.35
## 619 2079 129181.06
## 620 2080 95007.50
## 621 2081 119912.52
## 622 2082 167280.93
## 623 2083 157014.17
## 624 2084 85254.15
## 625 2085 118719.37
## 626 2086 147318.90
## 627 2087 149379.95
## 628 2088 123176.80
## 629 2089 54567.66
## 630 2090 134059.71
## 631 2091 122014.04
## 632 2092 157893.33
## 633 2093 129933.36
## 634 2094 95148.91
## 635 2095 155898.68
## 636 2096 47463.31
## 637 2097 71155.40
## 638 2098 164469.09
## 639 2099 35100.71
## 640 2100 128114.54
## 641 2101 147817.05
## 642 2102 109546.97
## 643 2103 85497.61
## 644 2104 187147.01
## 645 2105 112587.96
## 646 2106 85709.53
## 647 2107 209503.88
## 648 2108 107465.36
## 649 2109 135968.80
## 650 2110 122080.22
## 651 2111 126719.40
## 652 2112 137110.19
## 653 2113 132570.83
## 654 2114 100899.59
## 655 2115 162063.12
## 656 2116 112190.76
## 657 2117 172602.50
## 658 2118 104392.97
## 659 2119 74788.80
## 660 2120 114007.59
## 661 2121 0.00
## 662 2122 88364.98
## 663 2123 69240.38
## 664 2124 170632.98
## 665 2125 164146.30
## 666 2126 175226.45
## 667 2127 175152.17
## 668 2128 106131.20
## 669 2129 70246.76
## 670 2130 134324.57
## 671 2131 173276.85
## 672 2132 118290.62
## 673 2133 112624.86
## 674 2134 97807.60
## 675 2135 85526.00
## 676 2136 71170.95
## 677 2137 121578.17
## 678 2138 134026.14
## 679 2139 152487.54
## 680 2140 146302.23
## 681 2141 130297.44
## 682 2142 123036.58
## 683 2143 168615.04
## 684 2144 123699.55
## 685 2145 138294.78
## 686 2146 145996.03
## 687 2147 187863.54
## 688 2148 140267.76
## 689 2149 152419.24
## 690 2150 200437.66
## 691 2151 83520.93
## 692 2152 177536.93
## 693 2153 165017.34
## 694 2154 103961.86
## 695 2155 143107.94
## 696 2156 304468.22
## 697 2157 233997.50
## 698 2158 235595.73
## 699 2159 217950.62
## 700 2160 182357.55
## 701 2161 249858.07
## 702 2162 371965.22
## 703 2163 282088.83
## 704 2164 231480.29
## 705 2165 183978.69
## 706 2166 165886.92
## 707 2167 219631.02
## 708 2168 223714.93
## 709 2169 208664.76
## 710 2170 233783.53
## 711 2171 159374.02
## 712 2172 150461.37
## 713 2173 191584.41
## 714 2174 254237.84
## 715 2175 267412.11
## 716 2176 275417.88
## 717 2177 265963.97
## 718 2178 213585.21
## 719 2179 153708.55
## 720 2180 276559.53
## 721 2181 207625.97
## 722 2182 243421.23
## 723 2183 210491.51
## 724 2184 128525.31
## 725 2185 130089.66
## 726 2186 159634.77
## 727 2187 154906.34
## 728 2188 168698.14
## 729 2189 326402.56
## 730 2190 63746.21
## 731 2191 64085.51
## 732 2192 72307.96
## 733 2193 99404.29
## 734 2194 83053.85
## 735 2195 101252.90
## 736 2196 97808.83
## 737 2197 131406.97
## 738 2198 116230.74
## 739 2199 170025.52
## 740 2200 104592.25
## 741 2201 122115.28
## 742 2202 137805.84
## 743 2203 135521.35
## 744 2204 183912.45
## 745 2205 108134.06
## 746 2206 114447.77
## 747 2207 168776.18
## 748 2208 219073.09
## 749 2209 221401.04
## 750 2210 128295.49
## 751 2211 101322.03
## 752 2212 117609.36
## 753 2213 75671.32
## 754 2214 150455.35
## 755 2215 95653.79
## 756 2216 181487.40
## 757 2217 64208.97
## 758 2218 61705.85
## 759 2219 82999.20
## 760 2220 55012.82
## 761 2221 249823.69
## 762 2222 243823.26
## 763 2223 283336.19
## 764 2224 230483.92
## 765 2225 166212.11
## 766 2226 232884.54
## 767 2227 198727.89
## 768 2228 251422.91
## 769 2229 238496.23
## 770 2230 174748.71
## 771 2231 217048.35
## 772 2232 195253.63
## 773 2233 191957.14
## 774 2234 219544.42
## 775 2235 238044.91
## 776 2236 259600.91
## 777 2237 305596.44
## 778 2238 225638.29
## 779 2239 150215.90
## 780 2240 179479.67
## 781 2241 179297.49
## 782 2242 146737.78
## 783 2243 146948.69
## 784 2244 129741.37
## 785 2245 107338.09
## 786 2246 141247.81
## 787 2247 116330.77
## 788 2248 117785.56
## 789 2249 134593.82
## 790 2250 141952.17
## 791 2251 137545.40
## 792 2252 199840.94
## 793 2253 160285.70
## 794 2254 182111.66
## 795 2255 216170.44
## 796 2256 186421.85
## 797 2257 247667.09
## 798 2258 166130.50
## 799 2259 188728.53
## 800 2260 149951.87
## 801 2261 165369.61
## 802 2262 191819.33
## 803 2263 303913.23
## 804 2264 375260.71
## 805 2265 218894.87
## 806 2266 240623.18
## 807 2267 324126.89
## 808 2268 291454.56
## 809 2269 174260.14
## 810 2270 183174.19
## 811 2271 230457.94
## 812 2272 205976.56
## 813 2273 171324.08
## 814 2274 192956.20
## 815 2275 197842.70
## 816 2276 184869.56
## 817 2277 208870.99
## 818 2278 180193.57
## 819 2279 140974.64
## 820 2280 105156.28
## 821 2281 193585.12
## 822 2282 207485.20
## 823 2283 112995.19
## 824 2284 121692.51
## 825 2285 137741.20
## 826 2286 124396.88
## 827 2287 276965.87
## 828 2288 257613.38
## 829 2289 325340.74
## 830 2290 360097.97
## 831 2291 289967.94
## 832 2292 311059.94
## 833 2293 371048.75
## 834 2294 328644.93
## 835 2295 360026.51
## 836 2296 281711.18
## 837 2297 269475.13
## 838 2298 267529.91
## 839 2299 346381.29
## 840 2300 301993.06
## 841 2301 248357.98
## 842 2302 264819.55
## 843 2303 241569.02
## 844 2304 256326.40
## 845 2305 214408.68
## 846 2306 216580.93
## 847 2307 208994.68
## 848 2308 227446.98
## 849 2309 237179.17
## 850 2310 201215.43
## 851 2311 213369.80
## 852 2312 202295.16
## 853 2313 192162.97
## 854 2314 187654.17
## 855 2315 190732.08
## 856 2316 202666.72
## 857 2317 205191.65
## 858 2318 192586.69
## 859 2319 186055.91
## 860 2320 190695.99
## 861 2321 236199.88
## 862 2322 185899.99
## 863 2323 179058.50
## 864 2324 171249.97
## 865 2325 221062.46
## 866 2326 169352.40
## 867 2327 221118.43
## 868 2328 231485.61
## 869 2329 209957.87
## 870 2330 201246.91
## 871 2331 329232.26
## 872 2332 341287.16
## 873 2333 309437.88
## 874 2334 262914.24
## 875 2335 245444.20
## 876 2336 314096.44
## 877 2337 211063.68
## 878 2338 259206.36
## 879 2339 219420.21
## 880 2340 319340.28
## 881 2341 218402.64
## 882 2342 228096.77
## 883 2343 236012.03
## 884 2344 239060.70
## 885 2345 254202.70
## 886 2346 207434.03
## 887 2347 205138.84
## 888 2348 230089.82
## 889 2349 204592.65
## 890 2350 261362.75
## 891 2351 231329.57
## 892 2352 244653.16
## 893 2353 284368.71
## 894 2354 149181.78
## 895 2355 152782.93
## 896 2356 190415.28
## 897 2357 217406.86
## 898 2358 215470.32
## 899 2359 134974.69
## 900 2360 120726.25
## 901 2361 151850.02
## 902 2362 277905.46
## 903 2363 144306.21
## 904 2364 173876.45
## 905 2365 213914.01
## 906 2366 180189.20
## 907 2367 218964.51
## 908 2368 210681.86
## 909 2369 221889.57
## 910 2370 188487.31
## 911 2371 192868.58
## 912 2372 201427.35
## 913 2373 254175.15
## 914 2374 305967.58
## 915 2375 222848.93
## 916 2376 288612.32
## 917 2377 331635.00
## 918 2378 131020.60
## 919 2379 241771.52
## 920 2380 144624.20
## 921 2381 170575.06
## 922 2382 191440.34
## 923 2383 209883.03
## 924 2384 241741.06
## 925 2385 143242.12
## 926 2386 128078.18
## 927 2387 142558.87
## 928 2388 111277.03
## 929 2389 117902.37
## 930 2390 133478.03
## 931 2391 141320.19
## 932 2392 96924.56
## 933 2393 147770.65
## 934 2394 130051.82
## 935 2395 224309.04
## 936 2396 148665.36
## 937 2397 222323.06
## 938 2398 125445.76
## 939 2399 46150.27
## 940 2400 34848.10
## 941 2401 116952.57
## 942 2402 118252.20
## 943 2403 197518.70
## 944 2404 141819.73
## 945 2405 152785.83
## 946 2406 134438.44
## 947 2407 114732.73
## 948 2408 141392.16
## 949 2409 122875.00
## 950 2410 190275.44
## 951 2411 103376.03
## 952 2412 159530.03
## 953 2413 128647.21
## 954 2414 145822.61
## 955 2415 168504.58
## 956 2416 105919.76
## 957 2417 115901.52
## 958 2418 130596.68
## 959 2419 148218.23
## 960 2420 109351.63
## 961 2421 158386.32
## 962 2422 104154.60
## 963 2423 118439.27
## 964 2424 193785.62
## 965 2425 312164.05
## 966 2426 162942.83
## 967 2427 110263.49
## 968 2428 176208.65
## 969 2429 96712.33
## 970 2430 122362.10
## 971 2431 116669.26
## 972 2432 141795.92
## 973 2433 130894.92
## 974 2434 141035.48
## 975 2435 162602.82
## 976 2436 73350.58
## 977 2437 101595.78
## 978 2438 109608.35
## 979 2439 109832.27
## 980 2440 118861.21
## 981 2441 85811.16
## 982 2442 92061.06
## 983 2443 94192.59
## 984 2444 124839.80
## 985 2445 71975.13
## 986 2446 129987.87
## 987 2447 180068.77
## 988 2448 131018.65
## 989 2449 91896.06
## 990 2450 154461.44
## 991 2451 138405.65
## 992 2452 194711.44
## 993 2453 73331.77
## 994 2454 129186.78
## 995 2455 113482.19
## 996 2456 135677.73
## 997 2457 127542.02
## 998 2458 92602.97
## 999 2459 74360.09
## 1000 2460 126937.94
## 1001 2461 102791.44
## 1002 2462 134066.78
## 1003 2463 104407.92
## 1004 2464 201611.09
## 1005 2465 116228.05
## 1006 2466 114428.89
## 1007 2467 135696.78
## 1008 2468 66025.40
## 1009 2469 79128.99
## 1010 2470 217594.04
## 1011 2471 203755.41
## 1012 2472 193747.52
## 1013 2473 136572.59
## 1014 2474 100362.98
## 1015 2475 211723.43
## 1016 2476 120289.16
## 1017 2477 121389.26
## 1018 2478 190821.48
## 1019 2479 130528.03
## 1020 2480 158674.13
## 1021 2481 129516.62
## 1022 2482 139929.63
## 1023 2483 108588.61
## 1024 2484 130267.98
## 1025 2485 113807.53
## 1026 2486 175465.16
## 1027 2487 237564.22
## 1028 2488 129253.94
## 1029 2489 166178.84
## 1030 2490 161101.44
## 1031 2491 74016.10
## 1032 2492 204625.11
## 1033 2493 155614.14
## 1034 2494 177338.47
## 1035 2495 101533.15
## 1036 2496 247811.91
## 1037 2497 152050.04
## 1038 2498 101527.47
## 1039 2499 94988.76
## 1040 2500 145196.30
## 1041 2501 135183.27
## 1042 2502 170276.07
## 1043 2503 101820.33
## 1044 2504 210268.69
## 1045 2505 225156.04
## 1046 2506 253422.36
## 1047 2507 302776.90
## 1048 2508 253852.90
## 1049 2509 238839.00
## 1050 2510 226451.06
## 1051 2511 194032.76
## 1052 2512 223171.40
## 1053 2513 225358.54
## 1054 2514 206046.97
## 1055 2515 181257.56
## 1056 2516 181325.72
## 1057 2517 152994.56
## 1058 2518 162394.19
## 1059 2519 230769.96
## 1060 2520 219028.53
## 1061 2521 206546.80
## 1062 2522 228986.86
## 1063 2523 128811.40
## 1064 2524 135926.56
## 1065 2525 151076.59
## 1066 2526 162258.73
## 1067 2527 125033.13
## 1068 2528 129310.33
## 1069 2529 147295.78
## 1070 2530 143025.86
## 1071 2531 251093.26
## 1072 2532 242994.14
## 1073 2533 211645.27
## 1074 2534 223572.32
## 1075 2535 297472.27
## 1076 2536 243664.69
## 1077 2537 209102.68
## 1078 2538 196717.87
## 1079 2539 198181.01
## 1080 2540 196372.69
## 1081 2541 192795.22
## 1082 2542 191084.06
## 1083 2543 125690.56
## 1084 2544 143919.72
## 1085 2545 126908.20
## 1086 2546 142533.37
## 1087 2547 144842.66
## 1088 2548 199655.42
## 1089 2549 194295.15
## 1090 2550 673124.18
## 1091 2551 164922.65
## 1092 2552 142336.64
## 1093 2553 64100.58
## 1094 2554 94377.05
## 1095 2555 89730.13
## 1096 2556 88673.06
## 1097 2557 109902.72
## 1098 2558 188150.41
## 1099 2559 127798.25
## 1100 2560 120876.71
## 1101 2561 102340.70
## 1102 2562 94607.17
## 1103 2563 126023.73
## 1104 2564 150919.93
## 1105 2565 91539.05
## 1106 2566 164565.55
## 1107 2567 126334.54
## 1108 2568 204941.22
## 1109 2569 192593.20
## 1110 2570 128677.71
## 1111 2571 233488.12
## 1112 2572 156414.35
## 1113 2573 215010.53
## 1114 2574 259248.16
## 1115 2575 126241.51
## 1116 2576 135490.99
## 1117 2577 0.00
## 1118 2578 50159.38
## 1119 2579 49466.72
## 1120 2580 150305.00
## 1121 2581 160377.85
## 1122 2582 118800.98
## 1123 2583 246736.09
## 1124 2584 172493.59
## 1125 2585 219642.80
## 1126 2586 228406.04
## 1127 2587 214830.65
## 1128 2588 126539.68
## 1129 2589 148129.12
## 1130 2590 206918.13
## 1131 2591 215922.20
## 1132 2592 231173.92
## 1133 2593 230162.80
## 1134 2594 174814.62
## 1135 2595 191916.87
## 1136 2596 282768.65
## 1137 2597 208129.79
## 1138 2598 278557.03
## 1139 2599 275033.37
## 1140 2600 199129.56
## 1141 2601 161316.69
## 1142 2602 110783.96
## 1143 2603 107125.86
## 1144 2604 95667.72
## 1145 2605 113880.47
## 1146 2606 157802.37
## 1147 2607 187097.16
## 1148 2608 233071.38
## 1149 2609 167153.36
## 1150 2610 90997.15
## 1151 2611 187777.16
## 1152 2612 181992.45
## 1153 2613 120698.05
## 1154 2614 127913.54
## 1155 2615 147505.22
## 1156 2616 115837.18
## 1157 2617 230570.95
## 1158 2618 192008.63
## 1159 2619 211646.71
## 1160 2620 192414.46
## 1161 2621 192842.56
## 1162 2622 202160.06
## 1163 2623 245307.16
## 1164 2624 289253.99
## 1165 2625 309738.83
## 1166 2626 173436.60
## 1167 2627 161908.75
## 1168 2628 330502.22
## 1169 2629 365417.18
## 1170 2630 286191.98
## 1171 2631 341566.71
## 1172 2632 313319.17
## 1173 2633 252482.18
## 1174 2634 296814.43
## 1175 2635 163424.90
## 1176 2636 194226.47
## 1177 2637 162074.62
## 1178 2638 269984.28
## 1179 2639 205844.40
## 1180 2640 150872.83
## 1181 2641 129871.33
## 1182 2642 220954.73
## 1183 2643 119302.25
## 1184 2644 150072.37
## 1185 2645 106213.13
## 1186 2646 100068.56
## 1187 2647 112617.90
## 1188 2648 146638.37
## 1189 2649 149964.17
## 1190 2650 135035.97
## 1191 2651 158023.32
## 1192 2652 338655.54
## 1193 2653 250585.35
## 1194 2654 262175.36
## 1195 2655 307320.63
## 1196 2656 264810.98
## 1197 2657 278331.18
## 1198 2658 265479.96
## 1199 2659 275701.18
## 1200 2660 304974.75
## 1201 2661 287185.87
## 1202 2662 281035.68
## 1203 2663 269594.79
## 1204 2664 229637.68
## 1205 2665 270615.66
## 1206 2666 241258.35
## 1207 2667 184916.59
## 1208 2668 191086.86
## 1209 2669 196163.63
## 1210 2670 271021.13
## 1211 2671 190612.74
## 1212 2672 198861.15
## 1213 2673 214290.00
## 1214 2674 218694.58
## 1215 2675 175160.86
## 1216 2676 199113.10
## 1217 2677 227446.66
## 1218 2678 266982.08
## 1219 2679 260573.35
## 1220 2680 273260.26
## 1221 2681 336167.73
## 1222 2682 304974.88
## 1223 2683 398907.70
## 1224 2684 291858.56
## 1225 2685 316122.62
## 1226 2686 252971.27
## 1227 2687 280920.09
## 1228 2688 218048.03
## 1229 2689 199678.60
## 1230 2690 352321.47
## 1231 2691 218183.06
## 1232 2692 152007.78
## 1233 2693 213360.39
## 1234 2694 153385.05
## 1235 2695 208038.55
## 1236 2696 197192.90
## 1237 2697 225085.26
## 1238 2698 206679.60
## 1239 2699 203151.09
## 1240 2700 174338.10
## 1241 2701 168204.72
## 1242 2702 127025.39
## 1243 2703 154247.20
## 1244 2704 167543.34
## 1245 2705 118630.16
## 1246 2706 110923.87
## 1247 2707 115578.69
## 1248 2708 126033.00
## 1249 2709 93196.43
## 1250 2710 115693.75
## 1251 2711 292847.15
## 1252 2712 326673.13
## 1253 2713 183142.61
## 1254 2714 166669.50
## 1255 2715 188993.81
## 1256 2716 154319.34
## 1257 2717 189889.37
## 1258 2718 221036.39
## 1259 2719 162604.32
## 1260 2720 164927.44
## 1261 2721 142513.06
## 1262 2722 190162.38
## 1263 2723 156487.62
## 1264 2724 114737.48
## 1265 2725 127014.67
## 1266 2726 157172.64
## 1267 2727 191468.25
## 1268 2728 166457.81
## 1269 2729 148856.46
## 1270 2730 139179.17
## 1271 2731 73832.20
## 1272 2732 94241.71
## 1273 2733 182599.45
## 1274 2734 151531.63
## 1275 2735 125677.23
## 1276 2736 147196.63
## 1277 2737 109690.22
## 1278 2738 182276.46
## 1279 2739 157177.44
## 1280 2740 118587.31
## 1281 2741 143476.58
## 1282 2742 145023.77
## 1283 2743 142661.41
## 1284 2744 152628.22
## 1285 2745 142058.01
## 1286 2746 138110.04
## 1287 2747 120603.73
## 1288 2748 117817.30
## 1289 2749 111224.03
## 1290 2750 122565.46
## 1291 2751 119994.79
## 1292 2752 202319.05
## 1293 2753 198877.21
## 1294 2754 303815.79
## 1295 2755 129362.19
## 1296 2756 92818.50
## 1297 2757 79227.91
## 1298 2758 62412.72
## 1299 2759 125969.12
## 1300 2760 151563.46
## 1301 2761 130233.19
## 1302 2762 148072.07
## 1303 2763 207232.94
## 1304 2764 120600.92
## 1305 2765 261263.18
## 1306 2766 153778.48
## 1307 2767 74795.32
## 1308 2768 142388.71
## 1309 2769 131250.87
## 1310 2770 136557.81
## 1311 2771 103083.47
## 1312 2772 105714.54
## 1313 2773 183791.68
## 1314 2774 147412.07
## 1315 2775 142610.07
## 1316 2776 136576.62
## 1317 2777 121935.59
## 1318 2778 64115.76
## 1319 2779 149470.66
## 1320 2780 87171.55
## 1321 2781 61910.78
## 1322 2782 86834.51
## 1323 2783 86529.78
## 1324 2784 119764.16
## 1325 2785 124239.30
## 1326 2786 31061.09
## 1327 2787 108212.34
## 1328 2788 59860.82
## 1329 2789 196896.17
## 1330 2790 67754.58
## 1331 2791 119721.05
## 1332 2792 62712.74
## 1333 2793 160633.95
## 1334 2794 90301.76
## 1335 2795 112921.65
## 1336 2796 61291.12
## 1337 2797 253391.99
## 1338 2798 104761.03
## 1339 2799 118208.29
## 1340 2800 39385.15
## 1341 2801 88569.69
## 1342 2802 125566.14
## 1343 2803 216844.50
## 1344 2804 122076.03
## 1345 2805 59486.74
## 1346 2806 67140.12
## 1347 2807 167983.70
## 1348 2808 145064.94
## 1349 2809 112678.79
## 1350 2810 128418.14
## 1351 2811 179076.45
## 1352 2812 156055.44
## 1353 2813 235137.00
## 1354 2814 239543.17
## 1355 2815 75141.09
## 1356 2816 235399.75
## 1357 2817 140206.75
## 1358 2818 125899.23
## 1359 2819 171306.81
## 1360 2820 145974.21
## 1361 2821 93560.28
## 1362 2822 177504.96
## 1363 2823 359752.24
## 1364 2824 206543.65
## 1365 2825 149778.60
## 1366 2826 132838.84
## 1367 2827 147853.06
## 1368 2828 261854.86
## 1369 2829 235725.08
## 1370 2830 257323.39
## 1371 2831 198961.59
## 1372 2832 272115.64
## 1373 2833 254673.50
## 1374 2834 230176.75
## 1375 2835 231599.06
## 1376 2836 207980.67
## 1377 2837 177725.69
## 1378 2838 166738.49
## 1379 2839 183611.75
## 1380 2840 216603.94
## 1381 2841 214751.99
## 1382 2842 231546.96
## 1383 2843 150260.44
## 1384 2844 180051.72
## 1385 2845 111394.38
## 1386 2846 222561.33
## 1387 2847 217900.34
## 1388 2848 206370.70
## 1389 2849 223902.81
## 1390 2850 267628.87
## 1391 2851 234737.63
## 1392 2852 239867.14
## 1393 2853 239254.58
## 1394 2854 153350.16
## 1395 2855 218062.07
## 1396 2856 218564.39
## 1397 2857 206439.30
## 1398 2858 213005.04
## 1399 2859 145455.40
## 1400 2860 117302.25
## 1401 2861 151280.89
## 1402 2862 215051.19
## 1403 2863 136529.12
## 1404 2864 261458.60
## 1405 2865 166575.27
## 1406 2866 203876.84
## 1407 2867 76822.18
## 1408 2868 108793.30
## 1409 2869 130938.28
## 1410 2870 169459.13
## 1411 2871 88911.88
## 1412 2872 30635.63
## 1413 2873 108902.64
## 1414 2874 138221.85
## 1415 2875 127954.57
## 1416 2876 139663.43
## 1417 2877 106659.42
## 1418 2878 118633.96
## 1419 2879 126842.50
## 1420 2880 106679.73
## 1421 2881 119942.96
## 1422 2882 153298.82
## 1423 2883 146804.49
## 1424 2884 159889.75
## 1425 2885 143828.89
## 1426 2886 213410.41
## 1427 2887 87888.63
## 1428 2888 115702.48
## 1429 2889 26955.59
## 1430 2890 45926.98
## 1431 2891 150536.98
## 1432 2892 30107.00
## 1433 2893 96235.04
## 1434 2894 34578.79
## 1435 2895 256292.43
## 1436 2896 247522.51
## 1437 2897 219667.95
## 1438 2898 208239.13
## 1439 2899 209426.75
## 1440 2900 161967.90
## 1441 2901 208838.05
## 1442 2902 214128.52
## 1443 2903 286238.11
## 1444 2904 279637.76
## 1445 2905 117231.54
## 1446 2906 223454.86
## 1447 2907 124817.87
## 1448 2908 129234.13
## 1449 2909 219827.91
## 1450 2910 67946.50
## 1451 2911 110775.81
## 1452 2912 160286.03
## 1453 2913 112595.47
## 1454 2914 90596.02
## 1455 2915 90763.02
## 1456 2916 110677.01
## 1457 2917 186612.19
## 1458 2918 124653.74
## 1459 2919 241923.02
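The table above contains two predictions of exactly 0.00 (Ids 2121 and 2577), which come from replacing missing or negative predictions with 0. Because the competition scores submissions on the logarithm of the sale price, a zero is likely to be penalized heavily. A possible alternative, sketched below on made-up numbers, is to fill such cases with the median of the valid predictions. The helper name `clean_predictions` is hypothetical, and median imputation is an assumption here, not the method used in this report.

```r
# Replace missing or non-positive predictions with the median of the valid ones.
# clean_predictions is a hypothetical helper name.
clean_predictions <- function(pred) {
  bad <- is.na(pred) | pred <= 0
  pred[bad] <- median(pred[!bad])
  pred
}

# Toy input (made-up values for illustration): one NA and one negative value
raw_pred <- c(131366.75, NA, -5000, 151673.68, 211112.71)
cleaned  <- clean_predictions(raw_pred)
print(cleaned)
```

Applied to this report, the call would look like `prediction$SalePrice <- clean_predictions(mySalePrice)`.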
Write the predictions to a CSV file for upload to Kaggle
write.csv(prediction, file="prediction.csv", row.names = FALSE)
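Before uploading, it can help to read the file back and confirm its shape. The sketch below is a minimal check under stated assumptions: `check_submission` is a hypothetical helper, and the demonstration writes a three-row toy file to a temporary path instead of touching prediction.csv.

```r
# Read a submission file back and verify its basic structure.
# check_submission is a hypothetical helper name.
check_submission <- function(path, expected_rows) {
  sub <- read.csv(path)
  stopifnot(
    identical(names(sub), c("Id", "SalePrice")),  # expected column names
    nrow(sub) == expected_rows,                   # one row per test house
    !anyNA(sub$SalePrice),                        # no missing predictions
    anyDuplicated(sub$Id) == 0                    # each Id appears once
  )
  invisible(TRUE)
}

# Toy demonstration on a temporary file (made-up values)
demo_path <- tempfile(fileext = ".csv")
demo_sub  <- data.frame(Id = 1461:1463,
                        SalePrice = c(131366.75, 151673.68, 211112.71))
write.csv(demo_sub, file = demo_path, row.names = FALSE)
demo_ok <- check_submission(demo_path, expected_rows = 3)
print(demo_ok)
```

For the file written above, the call would be `check_submission("prediction.csv", expected_rows = 1459)`, matching the 1459 rows printed earlier.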