PROJECT: HOUSE PRICES

Dataset

  • The original data set is available on Kaggle.

  • This is an individual project and is not intended for commercial use.

  • Kaggle-House Prices

Data fields

Here’s a brief version of what you’ll find in the data description file.

SalePrice: the property’s sale price in dollars. This is the target variable that you’re trying to predict.

MSSubClass: The building class

MSZoning: The general zoning classification

LotFrontage: Linear feet of street connected to property

LotArea: Lot size in square feet

Street: Type of road access

Alley: Type of alley access

LotShape: General shape of property

LandContour: Flatness of the property

Utilities: Type of utilities available

LotConfig: Lot configuration

LandSlope: Slope of property

Neighborhood: Physical locations within Ames city limits

Condition1: Proximity to main road or railroad

Condition2: Proximity to main road or railroad (if a second is present)

BldgType: Type of dwelling

HouseStyle: Style of dwelling

OverallQual: Overall material and finish quality

OverallCond: Overall condition rating

YearBuilt: Original construction date

YearRemodAdd: Remodel date

RoofStyle: Type of roof

RoofMatl: Roof material

Exterior1st: Exterior covering on house

Exterior2nd: Exterior covering on house (if more than one material)

MasVnrType: Masonry veneer type

MasVnrArea: Masonry veneer area in square feet

ExterQual: Exterior material quality

ExterCond: Present condition of the material on the exterior

Foundation: Type of foundation

BsmtQual: Height of the basement

BsmtCond: General condition of the basement

BsmtExposure: Walkout or garden level basement walls

BsmtFinType1: Quality of basement finished area

BsmtFinSF1: Type 1 finished square feet

BsmtFinType2: Quality of second finished area (if present)

BsmtFinSF2: Type 2 finished square feet

BsmtUnfSF: Unfinished square feet of basement area

TotalBsmtSF: Total square feet of basement area

Heating: Type of heating

HeatingQC: Heating quality and condition

CentralAir: Central air conditioning

Electrical: Electrical system

1stFlrSF: First Floor square feet

2ndFlrSF: Second floor square feet

LowQualFinSF: Low quality finished square feet (all floors)

GrLivArea: Above grade (ground) living area square feet

BsmtFullBath: Basement full bathrooms

BsmtHalfBath: Basement half bathrooms

FullBath: Full bathrooms above grade

HalfBath: Half baths above grade

BedroomAbvGr: Number of bedrooms above basement level

KitchenAbvGr: Number of kitchens

KitchenQual: Kitchen quality

TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)

Functional: Home functionality rating

Fireplaces: Number of fireplaces

FireplaceQu: Fireplace quality

GarageType: Garage location

GarageYrBlt: Year garage was built

GarageFinish: Interior finish of the garage

GarageCars: Size of garage in car capacity

GarageArea: Size of garage in square feet

GarageQual: Garage quality

GarageCond: Garage condition

PavedDrive: Paved driveway

WoodDeckSF: Wood deck area in square feet

OpenPorchSF: Open porch area in square feet

EnclosedPorch: Enclosed porch area in square feet

3SsnPorch: Three season porch area in square feet

ScreenPorch: Screen porch area in square feet

PoolArea: Pool area in square feet

PoolQC: Pool quality

Fence: Fence quality

MiscFeature: Miscellaneous feature not covered in other categories

MiscVal: Dollar value of miscellaneous feature

MoSold: Month Sold

YrSold: Year Sold

SaleType: Type of sale

SaleCondition: Condition of sale

Load libraries

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.8
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(caret)
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
library(ggpubr)

Collect data

  • This section uses the Kaggle data, which is already split into train and test data sets. Only the train set is loaded here.
data_df <- read.csv("~/Desktop/r_houseprice/house/train.csv")
glimpse(data_df)
## Rows: 1,460
## Columns: 81
## $ Id            <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
## $ MSSubClass    <int> 60, 20, 60, 70, 60, 50, 20, 60, 50, 190, 20, 60, 20, 20,…
## $ MSZoning      <chr> "RL", "RL", "RL", "RL", "RL", "RL", "RL", "RL", "RM", "R…
## $ LotFrontage   <int> 65, 80, 68, 60, 84, 85, 75, NA, 51, 50, 70, 85, NA, 91, …
## $ LotArea       <int> 8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382, 612…
## $ Street        <chr> "Pave", "Pave", "Pave", "Pave", "Pave", "Pave", "Pave", …
## $ Alley         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ LotShape      <chr> "Reg", "Reg", "IR1", "IR1", "IR1", "IR1", "Reg", "IR1", …
## $ LandContour   <chr> "Lvl", "Lvl", "Lvl", "Lvl", "Lvl", "Lvl", "Lvl", "Lvl", …
## $ Utilities     <chr> "AllPub", "AllPub", "AllPub", "AllPub", "AllPub", "AllPu…
## $ LotConfig     <chr> "Inside", "FR2", "Inside", "Corner", "FR2", "Inside", "I…
## $ LandSlope     <chr> "Gtl", "Gtl", "Gtl", "Gtl", "Gtl", "Gtl", "Gtl", "Gtl", …
## $ Neighborhood  <chr> "CollgCr", "Veenker", "CollgCr", "Crawfor", "NoRidge", "…
## $ Condition1    <chr> "Norm", "Feedr", "Norm", "Norm", "Norm", "Norm", "Norm",…
## $ Condition2    <chr> "Norm", "Norm", "Norm", "Norm", "Norm", "Norm", "Norm", …
## $ BldgType      <chr> "1Fam", "1Fam", "1Fam", "1Fam", "1Fam", "1Fam", "1Fam", …
## $ HouseStyle    <chr> "2Story", "1Story", "2Story", "2Story", "2Story", "1.5Fi…
## $ OverallQual   <int> 7, 6, 7, 7, 8, 5, 8, 7, 7, 5, 5, 9, 5, 7, 6, 7, 6, 4, 5,…
## $ OverallCond   <int> 5, 8, 5, 5, 5, 5, 5, 6, 5, 6, 5, 5, 6, 5, 5, 8, 7, 5, 5,…
## $ YearBuilt     <int> 2003, 1976, 2001, 1915, 2000, 1993, 2004, 1973, 1931, 19…
## $ YearRemodAdd  <int> 2003, 1976, 2002, 1970, 2000, 1995, 2005, 1973, 1950, 19…
## $ RoofStyle     <chr> "Gable", "Gable", "Gable", "Gable", "Gable", "Gable", "G…
## $ RoofMatl      <chr> "CompShg", "CompShg", "CompShg", "CompShg", "CompShg", "…
## $ Exterior1st   <chr> "VinylSd", "MetalSd", "VinylSd", "Wd Sdng", "VinylSd", "…
## $ Exterior2nd   <chr> "VinylSd", "MetalSd", "VinylSd", "Wd Shng", "VinylSd", "…
## $ MasVnrType    <chr> "BrkFace", "None", "BrkFace", "None", "BrkFace", "None",…
## $ MasVnrArea    <int> 196, 0, 162, 0, 350, 0, 186, 240, 0, 0, 0, 286, 0, 306, …
## $ ExterQual     <chr> "Gd", "TA", "Gd", "TA", "Gd", "TA", "Gd", "TA", "TA", "T…
## $ ExterCond     <chr> "TA", "TA", "TA", "TA", "TA", "TA", "TA", "TA", "TA", "T…
## $ Foundation    <chr> "PConc", "CBlock", "PConc", "BrkTil", "PConc", "Wood", "…
## $ BsmtQual      <chr> "Gd", "Gd", "Gd", "TA", "Gd", "Gd", "Ex", "Gd", "TA", "T…
## $ BsmtCond      <chr> "TA", "TA", "TA", "Gd", "TA", "TA", "TA", "TA", "TA", "T…
## $ BsmtExposure  <chr> "No", "Gd", "Mn", "No", "Av", "No", "Av", "Mn", "No", "N…
## $ BsmtFinType1  <chr> "GLQ", "ALQ", "GLQ", "ALQ", "GLQ", "GLQ", "GLQ", "ALQ", …
## $ BsmtFinSF1    <int> 706, 978, 486, 216, 655, 732, 1369, 859, 0, 851, 906, 99…
## $ BsmtFinType2  <chr> "Unf", "Unf", "Unf", "Unf", "Unf", "Unf", "Unf", "BLQ", …
## $ BsmtFinSF2    <int> 0, 0, 0, 0, 0, 0, 0, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ BsmtUnfSF     <int> 150, 284, 434, 540, 490, 64, 317, 216, 952, 140, 134, 17…
## $ TotalBsmtSF   <int> 856, 1262, 920, 756, 1145, 796, 1686, 1107, 952, 991, 10…
## $ Heating       <chr> "GasA", "GasA", "GasA", "GasA", "GasA", "GasA", "GasA", …
## $ HeatingQC     <chr> "Ex", "Ex", "Ex", "Gd", "Ex", "Ex", "Ex", "Ex", "Gd", "E…
## $ CentralAir    <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "…
## $ Electrical    <chr> "SBrkr", "SBrkr", "SBrkr", "SBrkr", "SBrkr", "SBrkr", "S…
## $ X1stFlrSF     <int> 856, 1262, 920, 961, 1145, 796, 1694, 1107, 1022, 1077, …
## $ X2ndFlrSF     <int> 854, 0, 866, 756, 1053, 566, 0, 983, 752, 0, 0, 1142, 0,…
## $ LowQualFinSF  <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ GrLivArea     <int> 1710, 1262, 1786, 1717, 2198, 1362, 1694, 2090, 1774, 10…
## $ BsmtFullBath  <int> 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1,…
## $ BsmtHalfBath  <int> 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ FullBath      <int> 2, 2, 2, 1, 2, 1, 2, 2, 2, 1, 1, 3, 1, 2, 1, 1, 1, 2, 1,…
## $ HalfBath      <int> 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,…
## $ BedroomAbvGr  <int> 3, 3, 3, 3, 4, 1, 3, 3, 2, 2, 3, 4, 2, 3, 2, 2, 2, 2, 3,…
## $ KitchenAbvGr  <int> 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ KitchenQual   <chr> "Gd", "TA", "Gd", "Gd", "Gd", "TA", "Gd", "TA", "TA", "T…
## $ TotRmsAbvGrd  <int> 8, 6, 6, 7, 9, 5, 7, 7, 8, 5, 5, 11, 4, 7, 5, 5, 5, 6, 6…
## $ Functional    <chr> "Typ", "Typ", "Typ", "Typ", "Typ", "Typ", "Typ", "Typ", …
## $ Fireplaces    <int> 0, 1, 1, 1, 1, 0, 1, 2, 2, 2, 0, 2, 0, 1, 1, 0, 1, 0, 0,…
## $ FireplaceQu   <chr> NA, "TA", "TA", "Gd", "TA", NA, "Gd", "TA", "TA", "TA", …
## $ GarageType    <chr> "Attchd", "Attchd", "Attchd", "Detchd", "Attchd", "Attch…
## $ GarageYrBlt   <int> 2003, 1976, 2001, 1998, 2000, 1993, 2004, 1973, 1931, 19…
## $ GarageFinish  <chr> "RFn", "RFn", "RFn", "Unf", "RFn", "Unf", "RFn", "RFn", …
## $ GarageCars    <int> 2, 2, 2, 3, 3, 2, 2, 2, 2, 1, 1, 3, 1, 3, 1, 2, 2, 2, 2,…
## $ GarageArea    <int> 548, 460, 608, 642, 836, 480, 636, 484, 468, 205, 384, 7…
## $ GarageQual    <chr> "TA", "TA", "TA", "TA", "TA", "TA", "TA", "TA", "Fa", "G…
## $ GarageCond    <chr> "TA", "TA", "TA", "TA", "TA", "TA", "TA", "TA", "TA", "T…
## $ PavedDrive    <chr> "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "…
## $ WoodDeckSF    <int> 0, 298, 0, 0, 192, 40, 255, 235, 90, 0, 0, 147, 140, 160…
## $ OpenPorchSF   <int> 61, 0, 42, 35, 84, 30, 57, 204, 0, 4, 0, 21, 0, 33, 213,…
## $ EnclosedPorch <int> 0, 0, 0, 272, 0, 0, 0, 228, 205, 0, 0, 0, 0, 0, 176, 0, …
## $ X3SsnPorch    <int> 0, 0, 0, 0, 0, 320, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ ScreenPorch   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 176, 0, 0, 0, 0, 0, …
## $ PoolArea      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ PoolQC        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ Fence         <chr> NA, NA, NA, NA, NA, "MnPrv", NA, NA, NA, NA, NA, NA, NA,…
## $ MiscFeature   <chr> NA, NA, NA, NA, NA, "Shed", NA, "Shed", NA, NA, NA, NA, …
## $ MiscVal       <int> 0, 0, 0, 0, 0, 700, 0, 350, 0, 0, 0, 0, 0, 0, 0, 0, 700,…
## $ MoSold        <int> 2, 5, 9, 2, 12, 10, 8, 11, 4, 1, 2, 7, 9, 8, 5, 7, 3, 10…
## $ YrSold        <int> 2008, 2007, 2008, 2006, 2008, 2009, 2007, 2009, 2008, 20…
## $ SaleType      <chr> "WD", "WD", "WD", "WD", "WD", "WD", "WD", "WD", "WD", "W…
## $ SaleCondition <chr> "Normal", "Normal", "Normal", "Abnorml", "Normal", "Norm…
## $ SalePrice     <int> 208500, 181500, 223500, 140000, 250000, 143000, 307000, …

Explore and Plot data

  • Start with the distribution of SalePrice in the train data.
data_df %>%
  ggplot(aes(SalePrice)) +
  geom_histogram(color = "black", bins = 20, fill =  "#b3cde0") +
  theme_minimal()+
  labs(title = "The SalePrice ranges of property")+
  theme(plot.title = element_text(hjust = 0.5))

MSSubClass: Identifies the type of dwelling involved in the sale.

    20  1-STORY 1946 & NEWER ALL STYLES
    30  1-STORY 1945 & OLDER
    40  1-STORY W/FINISHED ATTIC ALL AGES
    45  1-1/2 STORY - UNFINISHED ALL AGES
    50  1-1/2 STORY FINISHED ALL AGES
    60  2-STORY 1946 & NEWER
    70  2-STORY 1945 & OLDER
    75  2-1/2 STORY ALL AGES
    80  SPLIT OR MULTI-LEVEL
    85  SPLIT FOYER
    90  DUPLEX - ALL STYLES AND AGES
   120  1-STORY PUD (Planned Unit Development) - 1946 & NEWER
   150  1-1/2 STORY PUD - ALL AGES
   160  2-STORY PUD - 1946 & NEWER
   180  PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
   190  2 FAMILY CONVERSION - ALL STYLES AND AGES
data_df %>%
  ggplot(aes(MSSubClass, SalePrice))+
  geom_jitter(color = "#68a7d9") +
  theme_minimal() +
  labs(title = "The types of dwelling involved in SalePrice")+
  theme(plot.title = element_text(hjust = 0.5))
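
MSSubClass is stored as an integer, but the codes listed above are category labels rather than quantities. The following is a minimal sketch, assuming a small made-up vector of codes, of how they could be turned into a labelled factor. The names `subclass_labels` and `codes` are illustrative and not part of the original analysis.

```r
# Lookup table built from a few of the MSSubClass codes listed above
subclass_labels <- c(
  "20"  = "1-STORY 1946 & NEWER ALL STYLES",
  "60"  = "2-STORY 1946 & NEWER",
  "70"  = "2-STORY 1945 & OLDER",
  "190" = "2 FAMILY CONVERSION - ALL STYLES AND AGES"
)

# Made-up sample of codes (hypothetical, for illustration only)
codes <- c(60, 20, 60, 70, 190)

# factor() treats each code as a category instead of a number
subclass <- factor(codes,
                   levels = as.integer(names(subclass_labels)),
                   labels = unname(subclass_labels))

table(subclass)
```

With a factor on the x axis, a plot of this kind would show one discrete position per dwelling type instead of a continuous numeric scale.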

MSZoning: Identifies the general zoning classification of the sale.

   A    Agriculture
   C    Commercial
   FV   Floating Village Residential
   I    Industrial
   RH   Residential High Density
   RL   Residential Low Density
   RP   Residential Low Density Park 
   RM   Residential Medium Density
data_df %>%
  ggplot(aes(MSZoning, SalePrice))+
  geom_jitter(color = "#68a7d9") +
  theme_minimal() +
  labs(title = "The types of dwelling zones involved in SalePrice")+
  theme(plot.title = element_text(hjust = 0.5))

  • SalePrice vs. Street and Alley

Street: Type of road access to property

   Grvl Gravel  
   Pave Paved

Alley: Type of alley access to property

   Grvl Gravel
   Pave Paved
   NA   No alley access
street <- data_df %>%
            ggplot(aes( Street))+
            geom_bar(color = "black", fill = "#b3cde0", alpha = 0.8) +
            theme_minimal() +
            labs(title = "The types of Street access to property")+
            theme(plot.title = element_text(hjust = 0.5))

alley <- data_df %>%
            ggplot(aes( Alley))+
            geom_bar(color = "black", fill = "#b3cde0", alpha = 0.8) +
            theme_minimal() +
            labs(title = "The types of Alley access to property")+
            theme(plot.title = element_text(hjust = 0.5))

ggarrange(street, alley)

data_df %>%
      ggplot(aes( Street, SalePrice))+
      geom_jitter(color = "#68a7d9") +
      theme_minimal() +
      facet_wrap(~Alley)+
      labs(title = "The types of Street and Alley access to property involved in SalePrice",
           subtitle = "Alley")+
      theme(plot.title = element_text(hjust = 0.5), 
            plot.subtitle  = element_text(hjust = 0.5) )
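
In the Alley column, NA does not mean the value is unknown: according to the table above it means "No alley access". The following hedged sketch, using a made-up vector rather than the Kaggle data, shows one way to recode NA as an explicit category so it appears as a named group. The label "None" is an assumption chosen here for illustration.

```r
# Made-up Alley values; NA stands for "No alley access"
alley <- c(NA, "Grvl", NA, "Pave", NA)

# Recode NA to an explicit label so it is counted as its own category
alley_filled <- ifelse(is.na(alley), "None", alley)

table(alley_filled)
```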

LotShape: General shape of property

   Reg  Regular 
   IR1  Slightly irregular
   IR2  Moderately Irregular
   IR3  Irregular

LandContour: Flatness of the property

   Lvl  Near Flat/Level 
   Bnk  Banked - Quick and significant rise from street grade to building
   HLS  Hillside - Significant slope from side to side
   Low  Depression

Utilities: Type of utilities available

   AllPub   All public Utilities (E,G,W,& S)    
   NoSeWa   Electricity and Gas Only
   

LandSlope: Slope of property

   Gtl  Gentle slope
   Mod  Moderate Slope  
   Sev  Severe Slope
   
lot_shape <- data_df %>%
                ggplot(aes(LotShape, SalePrice))+
                geom_col(color ="#b3cde0") +
                theme_minimal() +
                labs(title = "The general shape of property vs. SalePrice")+
                theme(plot.title = element_text(hjust = 0.5))

land_contour <- data_df %>%
                    ggplot(aes(LandContour, SalePrice))+
                    geom_col(color = "#b3cde0") +
                    theme_minimal() +
                    labs(title = "The Flatness of the property vs. SalePrice")+
                    theme(plot.title = element_text(hjust = 0.5))

util <- data_df %>%
          ggplot(aes(Utilities, SalePrice))+
          geom_col(color = "#b3cde0") +
          theme_minimal() +
          labs(title = "The utilities available vs. SalePrice")+
          theme(plot.title = element_text(hjust = 0.5))

land_slope <- data_df %>%
                ggplot(aes(LandSlope, SalePrice))+
                geom_col(color = "#b3cde0") +
                theme_minimal() +
                labs(title = "The slope of property vs. SalePrice")+
                theme(plot.title = element_text(hjust = 0.5))

ggarrange(lot_shape, land_contour, util, land_slope + rremove("x.text"),
          ncol = 2, nrow = 2)
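
Note that geom_col() stacks one bar segment per row, so the bar heights in these panels show the total SalePrice per category. That total depends heavily on how many houses fall in each category. The sketch below uses a made-up data frame `toy` (not the Kaggle data) to show how the total and the median per category can point in different directions.

```r
# Made-up data: three regular lots and two irregular ones
toy <- data.frame(
  LotShape  = c("Reg", "Reg", "Reg", "IR1", "IR2"),
  SalePrice = c(100000, 120000, 110000, 200000, 250000)
)

# Total per category: what stacked geom_col() bars add up to
total_price <- tapply(toy$SalePrice, toy$LotShape, sum)

# Median per category: a typical price, independent of group size
median_price <- tapply(toy$SalePrice, toy$LotShape, median)

total_price
median_price
```

In this toy example "Reg" has the largest total only because it has the most rows, while "IR2" has the highest median price.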

Neighborhood: Physical locations within Ames city limits

   Blmngtn  Bloomington Heights
   Blueste  Bluestem
   BrDale   Briardale
   BrkSide  Brookside
   ClearCr  Clear Creek
   CollgCr  College Creek
   Crawfor  Crawford
   Edwards  Edwards
   Gilbert  Gilbert
   IDOTRR   Iowa DOT and Rail Road
   MeadowV  Meadow Village
   Mitchel  Mitchell
   NAmes    North Ames
   NoRidge  Northridge
   NPkVill  Northpark Villa
   NridgHt  Northridge Heights
   NWAmes   Northwest Ames
   OldTown  Old Town
   SWISU    South & West of Iowa State University
   Sawyer   Sawyer
   SawyerW  Sawyer West
   Somerst  Somerset
   StoneBr  Stone Brook
   Timber   Timberland
   Veenker  Veenker
   
data_df %>%
  ggplot(aes(Neighborhood, SalePrice))+
  geom_col(color = "#b3cde0") +
  theme_minimal() +
  labs(title = "The Physical locations within Ames city limits(Neighborhood) involved in SalePrice")+
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x  = element_text(angle = 90, vjust = 0.5))

Condition1: Proximity to various conditions

   Artery   Adjacent to arterial street
   Feedr    Adjacent to feeder street   
   Norm Normal  
   RRNn Within 200' of North-South Railroad
   RRAn Adjacent to North-South Railroad
   PosN Near positive off-site feature--park, greenbelt, etc.
   PosA Adjacent to positive off-site feature
   RRNe Within 200' of East-West Railroad
   RRAe Adjacent to East-West Railroad

Condition2: Proximity to various conditions (if more than one is present)

   Artery   Adjacent to arterial street
   Feedr    Adjacent to feeder street   
   Norm Normal  
   RRNn Within 200' of North-South Railroad
   RRAn Adjacent to North-South Railroad
   PosN Near positive off-site feature--park, greenbelt, etc.
   PosA Adjacent to positive off-site feature
   RRNe Within 200' of East-West Railroad
   RRAe Adjacent to East-West Railroad
con1 <- data_df %>%
                ggplot(aes(Condition1, SalePrice))+
                geom_col(color = "#b3cde0") +
                theme_minimal() +
                labs(title = "The Condition1 vs. SalePrice")+
                theme(plot.title = element_text(hjust = 0.5),
                axis.text.x  = element_text(angle = 90, vjust = 0.5))
con2 <- data_df %>%
                ggplot(aes(Condition2, SalePrice))+
                geom_col(color = "#b3cde0") +
                theme_minimal() +
                labs(title = "The Condition2 vs. SalePrice")+
                theme(plot.title = element_text(hjust = 0.5),
                axis.text.x  = element_text(angle = 90, vjust = 0.5))

ggarrange(con1, con2)

BldgType: Type of dwelling

   1Fam Single-family Detached  
   2FmCon   Two-family Conversion; originally built as one-family dwelling
   Duplx    Duplex
   TwnhsE   Townhouse End Unit
   TwnhsI   Townhouse Inside Unit
data_df %>%
  ggplot(aes(BldgType)) +
  geom_bar(color = "black",fill =  "#b3cde0") +
  theme_minimal() +
  labs(title = "The type of dwelling") +
  theme(plot.title = element_text(hjust = 0.5))

HouseStyle: Style of dwelling

   1Story   One story
   1.5Fin   One and one-half story: 2nd level finished
   1.5Unf   One and one-half story: 2nd level unfinished
   2Story   Two story
   2.5Fin   Two and one-half story: 2nd level finished
   2.5Unf   Two and one-half story: 2nd level unfinished
   SFoyer   Split Foyer
   SLvl Split Level
data_df %>%
  ggplot(aes(HouseStyle)) +
  geom_bar(color = "black",fill =  "#b3cde0") +
  theme_minimal() +
  labs(title = "The style of dwelling") +
  theme(plot.title = element_text(hjust = 0.5))

data_df %>%
    ggplot(aes(BldgType, SalePrice)) +
    geom_boxplot(fill = "#b3cde0") +
    theme_minimal() +
    labs(title = "The type of dwelling vs. SalePrice") +
    theme(plot.title = element_text(hjust = 0.5))

data_df %>%
    ggplot(aes(HouseStyle, SalePrice)) +
    geom_boxplot(fill = "#b3cde0") +
    theme_minimal() +
    labs(title = "The style of dwelling vs. SalePrice") +
    theme(plot.title = element_text(hjust = 0.5))

OverallQual: Rates the overall material and finish of the house

   10   Very Excellent
   9    Excellent
   8    Very Good
   7    Good
   6    Above Average
   5    Average
   4    Below Average
   3    Fair
   2    Poor
   1    Very Poor

OverallCond: Rates the overall condition of the house

   10   Very Excellent
   9    Excellent
   8    Very Good
   7    Good
   6    Above Average   
   5    Average
   4    Below Average   
   3    Fair
   2    Poor
   1    Very Poor
   
data_df %>%
    ggplot(aes(OverallQual)) +
    geom_bar(color = "black", fill = "#b3cde0") +
    theme_minimal() +
    labs(title = "Rates the overall material and finish of the house") +
    theme(plot.title = element_text(hjust = 0.5))

data_df %>%
    ggplot(aes(OverallCond)) +
    geom_bar(color = "black", fill = "#b3cde0") +
    theme_minimal() +
    labs(title = "Rates the overall condition of the house") +
    theme(plot.title = element_text(hjust = 0.5))

data_df %>%
    ggplot(aes(OverallQual, SalePrice)) +
    geom_point(color = "#7db0df") +
    geom_rug(color = "#6b7b86") +
    theme_minimal() +
    labs(title = "Rates the overall material and finish of the house") +
    theme(plot.title = element_text(hjust = 0.5))

data_df %>%
    ggplot(aes(OverallCond, SalePrice)) +
    geom_point(color = "#7db0df") +
    geom_rug(color = "#6b7b86") +
    theme_minimal() +
    labs(title = "Rates the overall condition of the house") +
    theme(plot.title = element_text(hjust = 0.5))

data_df %>% 
  ggplot(aes(OverallQual, SalePrice)) +
  geom_col(color =  "#6fa8dc") +
  facet_grid(~MSZoning)+
  theme_minimal() +
  labs(title = "Zone of property with overall quality and sale price",
       subtitle = "Zoning Classification of property sale") +
  theme(plot.title = element_text(hjust = 0.5))

YearBuilt: Original construction date

YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)

y_original <- data_df %>%
          ggplot(aes(YearBuilt)) +
          geom_histogram(bins = 15, color = "black", fill = "#b3cde0") +
          theme_minimal() +
          labs(title = "Original construction date")+
          theme(plot.title = element_text(hjust = 0.5))

y_remodel <- data_df %>%
                ggplot(aes(YearRemodAdd)) +
                geom_histogram(bins = 15, color = "black", fill = "#b3cde0") +
                theme_minimal() +
                labs(title = "Remodel date")+
                theme(plot.title = element_text(hjust = 0.5))
ggarrange(y_original, y_remodel)
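
Since YearRemodAdd equals YearBuilt when a house was never remodeled, the two columns can be compared to flag remodeled houses. The following is a minimal sketch using the first four rows shown in the glimpse output above. The column names `Remodeled` and `YearsToRemodel` are hypothetical and are not used elsewhere in this project.

```r
# First four YearBuilt / YearRemodAdd values from the glimpse output
toy <- data.frame(
  YearBuilt    = c(2003, 1976, 2001, 1915),
  YearRemodAdd = c(2003, 1976, 2002, 1970)
)

# A house counts as remodeled when the two years differ
toy$Remodeled <- toy$YearRemodAdd != toy$YearBuilt

# Years between construction and the recorded remodel
toy$YearsToRemodel <- toy$YearRemodAdd - toy$YearBuilt

toy
```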

RoofStyle: Type of roof

   Flat Flat
   Gable    Gable
   Gambrel  Gambrel (Barn)
   Hip  Hip
   Mansard  Mansard
   Shed Shed
    

RoofMatl: Roof material

   ClyTile  Clay or Tile
   CompShg  Standard (Composite) Shingle
   Membran  Membrane
   Metal    Metal
   Roll Roll
   Tar&Grv  Gravel & Tar
   WdShake  Wood Shakes
   WdShngl  Wood Shingles
r_style <- data_df %>%
              ggplot(aes(RoofStyle, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Type of roof vs SalePrice") +
              theme(plot.title = element_text(hjust = 0.5),
                    axis.text.x  = element_text(angle = 90, vjust = 0.5))

r_matl <- data_df %>%
              ggplot(aes(RoofMatl, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Material of roof vs SalePrice") +
              theme(plot.title = element_text(hjust = 0.5),
                    axis.text.x  = element_text(angle = 90, vjust = 0.5))
ggarrange(r_style, r_matl,nrow = 1, ncol = 2 )

Foundation: Type of foundation

   BrkTil   Brick & Tile
   CBlock   Cinder Block
   PConc    Poured Concrete 
   Slab Slab
   Stone    Stone
   Wood Wood
   

KitchenQual: Kitchen quality

   Ex   Excellent
   Gd   Good
   TA   Typical/Average
   Fa   Fair
   Po   Poor
f_style <- data_df %>%
              ggplot(aes(Foundation, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Type of foundation") +
              theme(plot.title = element_text(hjust = 0.5))

kitchen <- data_df %>%
              ggplot(aes(KitchenQual, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Kitchen quality") +
              theme(plot.title = element_text(hjust = 0.5))
ggarrange(f_style, kitchen, nrow = 1, ncol = 2)
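
The quality codes (Ex, Gd, TA, Fa, Po) have a natural order, but as plain character values they are sorted alphabetically on a plot axis. The following is a small sketch, assuming a made-up sample of KitchenQual values, of storing them as an ordered factor. The names `qual_levels` and `kitchen_qual` are illustrative.

```r
# Made-up KitchenQual values (hypothetical sample)
kitchen_qual <- c("Gd", "TA", "Ex", "Fa", "TA")

# Order the levels from worst to best, following the rating table above
qual_levels <- c("Po", "Fa", "TA", "Gd", "Ex")
kitchen_qual <- factor(kitchen_qual, levels = qual_levels, ordered = TRUE)

levels(kitchen_qual)

# Comparisons respect the rating order
kitchen_qual >= "Gd"
```

A factor with these levels would place the boxplots in rating order rather than alphabetical order.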

Fireplaces: Number of fireplaces

FireplaceQu: Fireplace quality

   Ex   Excellent - Exceptional Masonry Fireplace
   Gd   Good - Masonry Fireplace in main level
   TA   Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
   Fa   Fair - Prefabricated Fireplace in basement
   Po   Poor - Ben Franklin Stove
   NA   No Fireplace
    
data_df %>%
  ggplot(aes(Fireplaces)) +
  geom_bar(color = "black", fill = "#b3cde0")+
  theme_minimal() +
  facet_wrap(~MSZoning)+
  labs(title = "Fireplace Number in Zoning") +
  theme(plot.title = element_text(hjust = 0.5))

data_df %>%
  ggplot(aes(FireplaceQu, SalePrice)) +
  geom_boxplot(fill = "#b3cde0")+
  theme_minimal() +
  labs(title = "Fireplace quality") +
  theme(plot.title = element_text(hjust = 0.5))

GarageType: Garage location

   2Types   More than one type of garage
   Attchd   Attached to home
   Basment  Basement Garage
   BuiltIn  Built-In (Garage part of house - typically has room above garage)
   CarPort  Car Port
   Detchd   Detached from home
   NA   No Garage

GarageFinish: Interior finish of the garage

   Fin  Finished
   RFn  Rough Finished  
   Unf  Unfinished
   NA   No Garage
   

GarageQual: Garage quality

   Ex   Excellent
   Gd   Good
   TA   Typical/Average
   Fa   Fair
   Po   Poor
   NA   No Garage
    

GarageCond: Garage condition

   Ex   Excellent
   Gd   Good
   TA   Typical/Average
   Fa   Fair
   Po   Poor
   NA   No Garage
garage_t <- data_df %>%
              ggplot(aes(GarageType, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Garage type vs Price") +
              theme(plot.title = element_text(hjust = 0.5),
                    axis.text.x  = element_text(angle = 90, vjust = 0.5))

garage_f <- data_df %>%
              ggplot(aes(GarageFinish, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Garage finish interior vs Price") +
              theme(plot.title = element_text(hjust = 0.5))

garage_q <- data_df %>%
              ggplot(aes(GarageQual, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Garage quality vs Price") +
              theme(plot.title = element_text(hjust = 0.5))

garage_c <- data_df %>%
              ggplot(aes(GarageCond, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Garage condition vs Price") +
              theme(plot.title = element_text(hjust = 0.5))

ggarrange(garage_t, garage_f, garage_q, garage_c, nrow = 2, ncol = 2 )

PoolQC: Pool quality

   Ex   Excellent
   Gd   Good
   TA   Average/Typical
   Fa   Fair
   NA   No Pool
    

Fence: Fence quality

   GdPrv    Good Privacy
   MnPrv    Minimum Privacy
   GdWo Good Wood
   MnWw Minimum Wood/Wire
   NA   No Fence
pool <- data_df %>%
              ggplot(aes(PoolQC, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Pool quality vs SalePrice") +
              theme(plot.title = element_text(hjust = 0.5))

fence <- data_df %>%
              ggplot(aes(Fence, SalePrice)) +
              geom_boxplot(fill = "#b3cde0")+
              theme_minimal() +
              labs(title = "Fence quality vs SalePrice") +
              theme(plot.title = element_text(hjust = 0.5))

ggarrange( pool, fence,nrow = 1, ncol = 2 )

SaleType: Type of sale

   WD   Warranty Deed - Conventional
   CWD  Warranty Deed - Cash
   VWD  Warranty Deed - VA Loan
   New  Home just constructed and sold
   COD  Court Officer Deed/Estate
   Con  Contract 15% Down payment regular terms
   ConLw    Contract Low Down payment and low interest
   ConLI    Contract Low Interest
   ConLD    Contract Low Down
   Oth  Other
   
data_df %>% 
  ggplot(aes(SaleType, SalePrice)) +
  geom_col(color =  "#6fa8dc") +
  facet_grid(~MSZoning)+
  theme_minimal() +
  labs(title = "Sale type vs sale price")+
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 90, vjust =0.5))

SaleCondition: Condition of sale

   Normal   Normal Sale
   Abnorml  Abnormal Sale -  trade, foreclosure, short sale
   AdjLand  Adjoining Land Purchase
   Alloca   Allocation - two linked properties with separate deeds, typically condo with a garage unit  
   Family   Sale between family members
   Partial  Home was not completed when last assessed (associated with New Homes)
   
data_df %>% 
  ggplot(aes(SaleCondition, SalePrice)) +
  geom_col(color =  "#6fa8dc") +
  facet_grid(~MSZoning)+
  theme_minimal() +
  labs(title = "Zone of property with sale condition and sale price",
       subtitle = "Zoning Classification of property sale") +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        axis.text.x  = element_text(angle = 90, vjust = 0.5))

Cleaning Data and Normalization for training models

  • Check for NA values in both data sets.

  • NA values in numeric columns are replaced with 0.

  • NA values in character columns are replaced with “None”.

mean(complete.cases(data_df))
## [1] 0
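
A result of 0 from mean(complete.cases(...)) means that every row contains at least one NA. To see which columns are responsible, the missing values can be counted per column. The sketch below uses a made-up data frame `toy`, not the Kaggle data.

```r
# Made-up data with missing values in two columns
toy <- data.frame(
  LotFrontage = c(65, NA, 68, NA),
  Alley       = c(NA, NA, "Grvl", NA),
  LotArea     = c(8450, 9600, 11250, 9550),
  stringsAsFactors = FALSE
)

# Number of missing values in each column, largest first
na_counts <- sort(colSums(is.na(toy)), decreasing = TRUE)
na_counts

# Share of rows with no missing value at all
mean(complete.cases(toy))
```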
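# Keep the original Id values before the column is rescaled and dropped
data_id <- data_df$Id
# Rescale every numeric column to [0, 1], including Id and the target SalePrice
process_norm <- preProcess(as.data.frame(data_df), method = c('range'))
data_df <- predict(process_norm, as.data.frame(data_df))
# Replace the remaining NAs: 0 for numeric columns, "None" for character columns
data_df <- data_df %>% mutate_if(is.numeric, ~replace_na(.,0)) 
data_df <- data_df %>% mutate_if(is.character, ~replace_na(.,"None"))
# Drop the first column (Id)
data_df <- select(data_df, -1)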

mean(complete.cases(data_df))
## [1] 1
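
The 'range' method rescales each numeric column with (x - min) / (max - min). The following is a minimal base R sketch of that formula, not caret's own implementation. The helper `rescale01` is a hypothetical name, and the example assumes 20 and 190 are the smallest and largest MSSubClass codes in the train data.

```r
# Min-max rescaling: (x - min) / (max - min), ignoring NA when finding the range
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# A few MSSubClass codes, including the assumed minimum (20) and maximum (190)
ms_sub_class <- c(60, 20, 60, 70, 190)
round(rescale01(ms_sub_class), 7)
```

For the code 60 this gives (60 - 20) / (190 - 20) = 0.2352941. Because NA values are replaced with 0 after the rescaling step, a missing numeric value ends up with the same value as the column minimum.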
data_df <- data_df %>% 
               mutate(across(where(is.character), as.factor)) 
glimpse(data_df)
## Rows: 1,460
## Columns: 80
## $ MSSubClass    <dbl> 0.2352941, 0.0000000, 0.2352941, 0.2941176, 0.2352941, 0…
## $ MSZoning      <fct> RL, RL, RL, RL, RL, RL, RL, RL, RM, RL, RL, RL, RL, RL, …
## $ LotFrontage   <dbl> 0.15068493, 0.20205479, 0.16095890, 0.13356164, 0.215753…
## $ LotArea       <dbl> 0.03341980, 0.03879502, 0.04650728, 0.03856131, 0.060576…
## $ Street        <fct> Pave, Pave, Pave, Pave, Pave, Pave, Pave, Pave, Pave, Pa…
## $ Alley         <fct> None, None, None, None, None, None, None, None, None, No…
## $ LotShape      <fct> Reg, Reg, IR1, IR1, IR1, IR1, Reg, IR1, Reg, Reg, Reg, I…
## $ LandContour   <fct> Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, Lvl, L…
## $ Utilities     <fct> AllPub, AllPub, AllPub, AllPub, AllPub, AllPub, AllPub, …
## $ LotConfig     <fct> Inside, FR2, Inside, Corner, FR2, Inside, Inside, Corner…
## $ LandSlope     <fct> Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, Gtl, G…
## $ Neighborhood  <fct> CollgCr, Veenker, CollgCr, Crawfor, NoRidge, Mitchel, So…
## $ Condition1    <fct> Norm, Feedr, Norm, Norm, Norm, Norm, Norm, PosN, Artery,…
## $ Condition2    <fct> Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, Norm, Ar…
## $ BldgType      <fct> 1Fam, 1Fam, 1Fam, 1Fam, 1Fam, 1Fam, 1Fam, 1Fam, 1Fam, 2f…
## $ HouseStyle    <fct> 2Story, 1Story, 2Story, 2Story, 2Story, 1.5Fin, 1Story, …
## $ OverallQual   <dbl> 0.6666667, 0.5555556, 0.6666667, 0.6666667, 0.7777778, 0…
## $ OverallCond   <dbl> 0.500, 0.875, 0.500, 0.500, 0.500, 0.500, 0.500, 0.625, …
## $ YearBuilt     <dbl> 0.9492754, 0.7536232, 0.9347826, 0.3115942, 0.9275362, 0…
## $ YearRemodAdd  <dbl> 0.8833333, 0.4333333, 0.8666667, 0.3333333, 0.8333333, 0…
## $ RoofStyle     <fct> Gable, Gable, Gable, Gable, Gable, Gable, Gable, Gable, …
## $ RoofMatl      <fct> CompShg, CompShg, CompShg, CompShg, CompShg, CompShg, Co…
## $ Exterior1st   <fct> VinylSd, MetalSd, VinylSd, Wd Sdng, VinylSd, VinylSd, Vi…
## $ Exterior2nd   <fct> VinylSd, MetalSd, VinylSd, Wd Shng, VinylSd, VinylSd, Vi…
## $ MasVnrType    <fct> BrkFace, None, BrkFace, None, BrkFace, None, Stone, Ston…
## $ MasVnrArea    <dbl> 0.122500, 0.000000, 0.101250, 0.000000, 0.218750, 0.0000…
## $ ExterQual     <fct> Gd, TA, Gd, TA, Gd, TA, Gd, TA, TA, TA, TA, Ex, TA, Gd, …
## $ ExterCond     <fct> TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, …
## $ Foundation    <fct> PConc, CBlock, PConc, BrkTil, PConc, Wood, PConc, CBlock…
## $ BsmtQual      <fct> Gd, Gd, Gd, TA, Gd, Gd, Ex, Gd, TA, TA, TA, Ex, TA, Gd, …
## $ BsmtCond      <fct> TA, TA, TA, Gd, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, …
## $ BsmtExposure  <fct> No, Gd, Mn, No, Av, No, Av, Mn, No, No, No, No, No, Av, …
## $ BsmtFinType1  <fct> GLQ, ALQ, GLQ, ALQ, GLQ, GLQ, GLQ, ALQ, Unf, GLQ, Rec, G…
## $ BsmtFinSF1    <dbl> 0.12508859, 0.17328136, 0.08610914, 0.03827073, 0.116052…
## $ BsmtFinType2  <fct> Unf, Unf, Unf, Unf, Unf, Unf, Unf, BLQ, Unf, Unf, Unf, U…
## $ BsmtFinSF2    <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.000000…
## $ BsmtUnfSF     <dbl> 0.06421233, 0.12157534, 0.18578767, 0.23116438, 0.209760…
## $ TotalBsmtSF   <dbl> 0.1400982, 0.2065466, 0.1505728, 0.1237316, 0.1873977, 0…
## $ Heating       <fct> GasA, GasA, GasA, GasA, GasA, GasA, GasA, GasA, GasA, Ga…
## $ HeatingQC     <fct> Ex, Ex, Ex, Gd, Ex, Ex, Ex, Ex, Gd, Ex, Ex, Ex, TA, Ex, …
## $ CentralAir    <fct> Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y,…
## $ Electrical    <fct> SBrkr, SBrkr, SBrkr, SBrkr, SBrkr, SBrkr, SBrkr, SBrkr, …
## $ X1stFlrSF     <dbl> 0.1197797, 0.2129417, 0.1344654, 0.1438733, 0.1860945, 0…
## $ X2ndFlrSF     <dbl> 0.4135593, 0.0000000, 0.4193705, 0.3661017, 0.5099274, 0…
## $ LowQualFinSF  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ GrLivArea     <dbl> 0.25923135, 0.17483044, 0.27354936, 0.26055011, 0.351168…
## $ BsmtFullBath  <dbl> 0.3333333, 0.0000000, 0.3333333, 0.3333333, 0.3333333, 0…
## $ BsmtHalfBath  <dbl> 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0…
## $ FullBath      <dbl> 0.6666667, 0.6666667, 0.6666667, 0.3333333, 0.6666667, 0…
## $ HalfBath      <dbl> 0.5, 0.0, 0.5, 0.0, 0.5, 0.5, 0.0, 0.5, 0.0, 0.0, 0.0, 0…
## $ BedroomAbvGr  <dbl> 0.375, 0.375, 0.375, 0.375, 0.500, 0.125, 0.375, 0.375, …
## $ KitchenAbvGr  <dbl> 0.3333333, 0.3333333, 0.3333333, 0.3333333, 0.3333333, 0…
## $ KitchenQual   <fct> Gd, TA, Gd, Gd, Gd, TA, Gd, TA, TA, TA, TA, Ex, TA, Gd, …
## $ TotRmsAbvGrd  <dbl> 0.5000000, 0.3333333, 0.3333333, 0.4166667, 0.5833333, 0…
## $ Functional    <fct> Typ, Typ, Typ, Typ, Typ, Typ, Typ, Typ, Min1, Typ, Typ, …
## $ Fireplaces    <dbl> 0.0000000, 0.3333333, 0.3333333, 0.3333333, 0.3333333, 0…
## $ FireplaceQu   <fct> None, TA, TA, Gd, TA, None, Gd, TA, TA, TA, None, Gd, No…
## $ GarageType    <fct> Attchd, Attchd, Attchd, Detchd, Attchd, Attchd, Attchd, …
## $ GarageYrBlt   <dbl> 0.9363636, 0.6909091, 0.9181818, 0.8909091, 0.9090909, 0…
## $ GarageFinish  <fct> RFn, RFn, RFn, Unf, RFn, Unf, RFn, RFn, Unf, RFn, Unf, F…
## $ GarageCars    <dbl> 0.50, 0.50, 0.50, 0.75, 0.75, 0.50, 0.50, 0.50, 0.50, 0.…
## $ GarageArea    <dbl> 0.3864598, 0.3244006, 0.4287729, 0.4527504, 0.5895628, 0…
## $ GarageQual    <fct> TA, TA, TA, TA, TA, TA, TA, TA, Fa, Gd, TA, TA, TA, TA, …
## $ GarageCond    <fct> TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, TA, …
## $ PavedDrive    <fct> Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y, Y,…
## $ WoodDeckSF    <dbl> 0.00000000, 0.34772462, 0.00000000, 0.00000000, 0.224037…
## $ OpenPorchSF   <dbl> 0.111517367, 0.000000000, 0.076782450, 0.063985375, 0.15…
## $ EnclosedPorch <dbl> 0.0000000, 0.0000000, 0.0000000, 0.4927536, 0.0000000, 0…
## $ X3SsnPorch    <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0…
## $ ScreenPorch   <dbl> 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0.0000000, 0…
## $ PoolArea      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ PoolQC        <fct> None, None, None, None, None, None, None, None, None, No…
## $ Fence         <fct> None, None, None, None, None, MnPrv, None, None, None, N…
## $ MiscFeature   <fct> None, None, None, None, None, Shed, None, Shed, None, No…
## $ MiscVal       <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.000000…
## $ MoSold        <dbl> 0.09090909, 0.36363636, 0.72727273, 0.09090909, 1.000000…
## $ YrSold        <dbl> 0.50, 0.25, 0.50, 0.00, 0.50, 0.75, 0.25, 0.75, 0.50, 0.…
## $ SaleType      <fct> WD, WD, WD, WD, WD, WD, WD, WD, WD, WD, WD, New, WD, New…
## $ SaleCondition <fct> Normal, Normal, Normal, Abnorml, Normal, Normal, Normal,…
## $ SalePrice     <dbl> 0.24107763, 0.20358284, 0.26190807, 0.14595195, 0.298708…
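
Every numeric column above lies in [0, 1], which is consistent with min-max scaling. If that is what was applied, scaled values and error metrics can be mapped back to dollars as in the sketch below. The helper names and the bounds 34900 and 755000 are assumptions for illustration, not taken from the original code; with those bounds the first two scaled SalePrice values above map back to whole-dollar prices.

```r
# Hypothetical helpers: min-max scaling and its inverse.
min_max <- function(x, lo = min(x), hi = max(x)) (x - lo) / (hi - lo)
inv_min_max <- function(z, lo, hi) z * (hi - lo) + lo

# Assumed SalePrice bounds in dollars (not taken from the original code).
price_lo <- 34900
price_hi <- 755000

scaled <- c(0.24107763, 0.20358284)   # first two SalePrice values above
dollars <- inv_min_max(scaled, price_lo, price_hi)
print(round(dollars))

# An RMSE on the scaled target converts with the range only (no offset).
rmse_scaled <- 0.05                   # illustrative value
rmse_dollars <- rmse_scaled * (price_hi - price_lo)
print(rmse_dollars)
```

Because the target is scaled, the RMSE and MAE values reported in the sections that follow are in these scaled units rather than dollars.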

Train and Test Data Sets

  1. 80% train data

  2. 20% test data

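set.seed(55)  # fix the RNG so the partition is reproducible
nrow(data_df)
## [1] 1460
# Draw 80% of the row indices, stratified on SalePrice
id <- createDataPartition(y = data_df$SalePrice,
                          p = 0.8,
                          list = FALSE)
train_df <- data_df[id, ]   # rows selected for training
test_df <- data_df[-id, ]   # remaining rows are held out for testing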
nrow(train_df)
## [1] 1169
nrow(test_df)
## [1] 291
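
For a numeric outcome, caret documents that createDataPartition splits the values into percentile-based groups and samples within each group. The base-R sketch below illustrates that idea on synthetic data. `stratified_split()` is a hypothetical helper and the binning details are assumptions, so it is not expected to reproduce caret's exact indices. Rounding up inside each group may be why the training set above has 1169 rows instead of exactly 80% of 1460.

```r
# Base-R sketch of a stratified 80/20 split on a numeric outcome.
# stratified_split() is a hypothetical helper, not part of caret.
stratified_split <- function(y, p = 0.8, groups = 5) {
  # Bin the outcome into quantile-based groups.
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  # Sample a fraction p of the row indices inside each bin (rounded up).
  picked <- lapply(split(seq_along(y), bins), function(idx) {
    idx[sample.int(length(idx), size = ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(55)
y <- runif(1460)                 # synthetic stand-in for SalePrice
id <- stratified_split(y)
length(id)                       # training rows
length(y) - length(id)           # test rows
```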

Linear Regression with RepeatedCV

set.seed(55)
control <- trainControl(method = "repeatedcv",
                        repeats = 5,
                        number = 5,
                        verboseIter = FALSE)

lm_model <- train(SalePrice ~ .,
                  data = train_df,
                  method = "lm",
                  trControl = control)
lm_model
## Linear Regression 
## 
## 1169 samples
##   79 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 934, 935, 935, 936, 936, 936, ... 
## Resampling results:
## 
##   RMSE        Rsquared   MAE       
##   0.07895367  0.6360975  0.03034698
## 
## Tuning parameter 'intercept' was held constant at a value of TRUE
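
The resampling line in the output ("5 fold, repeated 5 times") means the reported RMSE, Rsquared, and MAE are averages over 25 held-out folds. The sketch below shows how such resamples can be generated in base R. `repeated_folds()` is a hypothetical helper using plain random folds, whereas caret builds its folds with its own routine, so the sizes can differ by a row or so from those in the output above.

```r
# Base-R sketch of 5-fold cross-validation repeated 5 times.
# repeated_folds() is a hypothetical helper, not caret's implementation.
repeated_folds <- function(n, k = 5, repeats = 5) {
  out <- list()
  for (r in seq_len(repeats)) {
    # Shuffle fold labels 1..k over the n rows for this repeat.
    fold_id <- sample(rep(seq_len(k), length.out = n))
    for (f in seq_len(k)) {
      name <- sprintf("Fold%d.Rep%d", f, r)
      out[[name]] <- which(fold_id != f)   # rows used for fitting
    }
  }
  out
}

set.seed(55)
folds <- repeated_folds(n = 1169)
length(folds)                    # number of resamples
range(lengths(folds))            # training-set sizes per resample
```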

Random Forest with RepeatedCV

set.seed(55)
control <- trainControl(method = "repeatedcv",
                        number = 5,
                        repeats = 5,
                        verboseIter = FALSE)

rf_model <- train(SalePrice ~ .,
                  data = train_df,
                  method = "rf",
                  trControl = control)
rf_model
## Random Forest 
## 
## 1169 samples
##   79 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 934, 935, 935, 936, 936, 936, ... 
## Resampling results across tuning parameters:
## 
##   mtry  RMSE        Rsquared   MAE       
##     2   0.06774708  0.7727620  0.04299229
##   131   0.04076242  0.8692313  0.02358575
##   260   0.04167289  0.8609170  0.02447494
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 131.
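
The mtry candidates 2, 131 and 260 exceed the 79 predictors because the formula interface expands each factor into indicator columns before the forest is fit. The toy example below (made-up values, column names borrowed from the data set) shows that expansion with base R's model.matrix(). It also shows that three evenly spaced candidates between 2 and 260 match the grid above, assuming caret spaces its default mtry values that way.

```r
# Toy data frame: one numeric column and two factors (made-up values).
toy <- data.frame(
  LotArea  = c(0.03, 0.04, 0.05, 0.04),
  MSZoning = factor(c("RL", "RM", "FV", "RL")),
  Street   = factor(c("Pave", "Pave", "Grvl", "Pave"))
)

# model.matrix() applies the same kind of formula expansion:
# each factor with k levels becomes k - 1 indicator columns.
x <- model.matrix(~ ., data = toy)[, -1]   # drop the intercept column
colnames(x)
ncol(x)                                    # 1 numeric + 2 + 1 indicators

# Three evenly spaced mtry candidates between 2 and the column count.
p <- 260                                   # column count implied by the mtry grid
round(seq(2, p, length.out = 3))
```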

Lasso Regression with RepeatedCV

set.seed(55)
control <- trainControl(method = "repeatedcv",
                        number = 5,
                        repeats = 5,
                        verboseIter = FALSE)
grid <- expand.grid(alpha = 1,
                    lambda = seq(0.001, 0.1, by = 0.0002))

lasso_model <- train(SalePrice ~ .,
                     data = train_df,
                     method = "glmnet",
                     trControl = control,
                     tuneGrid = grid)
lasso_model
## glmnet 
## 
## 1169 samples
##   79 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 934, 935, 935, 936, 936, 936, ... 
## Resampling results across tuning parameters:
## 
##   lambda  RMSE        Rsquared   MAE       
##   0.0010  0.05226546  0.7849217  0.02610218
##   0.0012  0.05207359  0.7857421  0.02623435
##   0.0014  0.05190789  0.7865095  0.02640744
##   0.0016  0.05175243  0.7873268  0.02655905
##   0.0018  0.05160822  0.7881541  0.02669449
##   0.0020  0.05147554  0.7889570  0.02682417
##   0.0022  0.05136430  0.7896807  0.02697084
##   0.0024  0.05128917  0.7901747  0.02713268
##   0.0026  0.05123168  0.7906097  0.02728465
##   0.0028  0.05118295  0.7910329  0.02742169
##   0.0030  0.05114188  0.7914336  0.02753800
##   0.0032  0.05111793  0.7917323  0.02764722
##   0.0034  0.05110687  0.7919780  0.02775666
##   0.0036  0.05110274  0.7922053  0.02787167
##   0.0038  0.05111690  0.7923209  0.02799838
##   0.0040  0.05114556  0.7923279  0.02813092
##   0.0042  0.05119038  0.7922295  0.02826661
##   0.0044  0.05125279  0.7919964  0.02840915
##   0.0046  0.05132559  0.7916825  0.02854963
##   0.0048  0.05141128  0.7912749  0.02869435
##   0.0050  0.05150414  0.7908201  0.02884708
##   0.0052  0.05159512  0.7903932  0.02898953
##   0.0054  0.05169268  0.7899258  0.02913321
##   0.0056  0.05179221  0.7894537  0.02926440
##   0.0058  0.05189805  0.7889437  0.02939244
##   0.0060  0.05200396  0.7884392  0.02951537
##   0.0062  0.05211556  0.7878903  0.02963836
##   0.0064  0.05223353  0.7872896  0.02976349
##   0.0066  0.05235501  0.7866649  0.02988964
##   0.0068  0.05248017  0.7860107  0.03001732
##   0.0070  0.05260881  0.7853175  0.03014629
##   0.0072  0.05273427  0.7846379  0.03027608
##   0.0074  0.05284954  0.7840486  0.03040061
##   0.0076  0.05296858  0.7834229  0.03052606
##   0.0078  0.05309234  0.7827547  0.03065536
##   0.0080  0.05321002  0.7821345  0.03078241
##   0.0082  0.05332552  0.7815290  0.03090880
##   0.0084  0.05344313  0.7809028  0.03103547
##   0.0086  0.05355929  0.7802871  0.03115803
##   0.0088  0.05367159  0.7797028  0.03127742
##   0.0090  0.05378306  0.7791196  0.03139517
##   0.0092  0.05389777  0.7785053  0.03151345
##   0.0094  0.05401319  0.7778790  0.03163160
##   0.0096  0.05412754  0.7772642  0.03174674
##   0.0098  0.05424502  0.7766278  0.03186104
##   0.0100  0.05436434  0.7759716  0.03197587
##   0.0102  0.05448538  0.7752998  0.03209068
##   0.0104  0.05460844  0.7746180  0.03220548
##   0.0106  0.05473312  0.7739290  0.03231859
##   0.0108  0.05486168  0.7732052  0.03243159
##   0.0110  0.05499315  0.7724513  0.03254525
##   0.0112  0.05512860  0.7716560  0.03266115
##   0.0114  0.05526693  0.7708326  0.03277540
##   0.0116  0.05540622  0.7700080  0.03288943
##   0.0118  0.05554635  0.7691817  0.03300400
##   0.0120  0.05568685  0.7683536  0.03311758
##   0.0122  0.05582886  0.7675125  0.03323164
##   0.0124  0.05597165  0.7666710  0.03334654
##   0.0126  0.05611326  0.7658458  0.03346065
##   0.0128  0.05625259  0.7650466  0.03357103
##   0.0130  0.05639313  0.7642360  0.03368194
##   0.0132  0.05653575  0.7634051  0.03379334
##   0.0134  0.05668004  0.7625595  0.03390518
##   0.0136  0.05682367  0.7617308  0.03401522
##   0.0138  0.05696631  0.7609149  0.03412371
##   0.0140  0.05710522  0.7601386  0.03422855
##   0.0142  0.05724294  0.7593773  0.03433276
##   0.0144  0.05738185  0.7586044  0.03443815
##   0.0146  0.05752075  0.7578302  0.03454377
##   0.0148  0.05766066  0.7570473  0.03464960
##   0.0150  0.05779924  0.7562806  0.03475404
##   0.0152  0.05793679  0.7555234  0.03485804
##   0.0154  0.05806982  0.7548128  0.03496040
##   0.0156  0.05820357  0.7540939  0.03506403
##   0.0158  0.05833765  0.7533705  0.03516774
##   0.0160  0.05847228  0.7526401  0.03527160
##   0.0162  0.05860676  0.7519102  0.03537543
##   0.0164  0.05873751  0.7512171  0.03547722
##   0.0166  0.05886234  0.7505885  0.03557586
##   0.0168  0.05898390  0.7499969  0.03567407
##   0.0170  0.05910481  0.7494155  0.03577302
##   0.0172  0.05922508  0.7488438  0.03587281
##   0.0174  0.05934621  0.7482631  0.03597383
##   0.0176  0.05946824  0.7476730  0.03607625
##   0.0178  0.05959034  0.7470819  0.03617881
##   0.0180  0.05971023  0.7465170  0.03628039
##   0.0182  0.05982758  0.7459853  0.03638092
##   0.0184  0.05994130  0.7455062  0.03647969
##   0.0186  0.06005231  0.7450664  0.03657636
##   0.0188  0.06016284  0.7446364  0.03667258
##   0.0190  0.06027365  0.7442072  0.03676887
##   0.0192  0.06038471  0.7437839  0.03686497
##   0.0194  0.06049656  0.7433568  0.03696148
##   0.0196  0.06060703  0.7429531  0.03705743
##   0.0198  0.06071647  0.7425729  0.03715312
##   0.0200  0.06082533  0.7422078  0.03724830
##   0.0202  0.06093109  0.7418977  0.03734134
##   0.0204  0.06103457  0.7416287  0.03743271
##   0.0206  0.06113828  0.7413646  0.03752451
##   0.0208  0.06124247  0.7411030  0.03761654
##   0.0210  0.06134657  0.7408514  0.03770848
##   0.0212  0.06145143  0.7405971  0.03780133
##   0.0214  0.06155665  0.7403452  0.03789504
##   0.0216  0.06166220  0.7401017  0.03798955
##   0.0218  0.06176723  0.7398756  0.03808334
##   0.0220  0.06187277  0.7396514  0.03817735
##   0.0222  0.06197740  0.7394519  0.03827059
##   0.0224  0.06208283  0.7392498  0.03836490
##   0.0226  0.06218916  0.7390436  0.03845995
##   0.0228  0.06229621  0.7388355  0.03855571
##   0.0230  0.06240411  0.7386244  0.03865208
##   0.0232  0.06251286  0.7384086  0.03874902
##   0.0234  0.06262244  0.7381891  0.03884700
##   0.0236  0.06273264  0.7379717  0.03894596
##   0.0238  0.06284373  0.7377500  0.03904633
##   0.0240  0.06295562  0.7375238  0.03914728
##   0.0242  0.06306833  0.7372931  0.03924905
##   0.0244  0.06318164  0.7370596  0.03935116
##   0.0246  0.06329584  0.7368198  0.03945412
##   0.0248  0.06341090  0.7365748  0.03955850
##   0.0250  0.06352679  0.7363247  0.03966396
##   0.0252  0.06364350  0.7360698  0.03977012
##   0.0254  0.06376095  0.7358110  0.03987683
##   0.0256  0.06387923  0.7355468  0.03998454
##   0.0258  0.06399824  0.7352789  0.04009349
##   0.0260  0.06411779  0.7350100  0.04020307
##   0.0262  0.06423803  0.7347357  0.04031308
##   0.0264  0.06435898  0.7344560  0.04042375
##   0.0266  0.06448062  0.7341704  0.04053469
##   0.0268  0.06460311  0.7338759  0.04064545
##   0.0270  0.06472630  0.7335748  0.04075665
##   0.0272  0.06485025  0.7332683  0.04086871
##   0.0274  0.06497494  0.7329562  0.04098198
##   0.0276  0.06510038  0.7326385  0.04109604
##   0.0278  0.06522641  0.7323174  0.04121087
##   0.0280  0.06535316  0.7319905  0.04132642
##   0.0282  0.06548044  0.7316615  0.04144284
##   0.0284  0.06560784  0.7313366  0.04155960
##   0.0286  0.06573566  0.7310104  0.04167679
##   0.0288  0.06586393  0.7306826  0.04179458
##   0.0290  0.06599287  0.7303497  0.04191305
##   0.0292  0.06612255  0.7300102  0.04203234
##   0.0294  0.06625304  0.7296625  0.04215220
##   0.0296  0.06638423  0.7293083  0.04227254
##   0.0298  0.06651612  0.7289474  0.04239350
##   0.0300  0.06664850  0.7285815  0.04251476
##   0.0302  0.06678151  0.7282096  0.04263644
##   0.0304  0.06691485  0.7278404  0.04275824
##   0.0306  0.06704861  0.7274721  0.04288009
##   0.0308  0.06718301  0.7270983  0.04300274
##   0.0310  0.06731784  0.7267249  0.04312587
##   0.0312  0.06745323  0.7263471  0.04324940
##   0.0314  0.06758871  0.7259719  0.04337319
##   0.0316  0.06772361  0.7256120  0.04349695
##   0.0318  0.06785913  0.7252465  0.04362151
##   0.0320  0.06799486  0.7248837  0.04374615
##   0.0322  0.06812970  0.7245408  0.04386984
##   0.0324  0.06826501  0.7241940  0.04399418
##   0.0326  0.06840095  0.7238408  0.04411911
##   0.0328  0.06853732  0.7234850  0.04424452
##   0.0330  0.06867402  0.7231275  0.04437017
##   0.0332  0.06881118  0.7227691  0.04449612
##   0.0334  0.06894880  0.7224098  0.04462266
##   0.0336  0.06908705  0.7220434  0.04474986
##   0.0338  0.06922592  0.7216699  0.04487792
##   0.0340  0.06936525  0.7212919  0.04500645
##   0.0342  0.06950477  0.7209139  0.04513521
##   0.0344  0.06964444  0.7205399  0.04526443
##   0.0346  0.06978320  0.7201897  0.04539317
##   0.0348  0.06992230  0.7198374  0.04552248
##   0.0350  0.07006186  0.7194832  0.04565206
##   0.0352  0.07020107  0.7191468  0.04578150
##   0.0354  0.07034032  0.7188169  0.04591106
##   0.0356  0.07048017  0.7184802  0.04604106
##   0.0358  0.07062060  0.7181368  0.04617133
##   0.0360  0.07076157  0.7177876  0.04630174
##   0.0362  0.07090313  0.7174310  0.04643250
##   0.0364  0.07104527  0.7170671  0.04656365
##   0.0366  0.07118800  0.7166959  0.04669514
##   0.0368  0.07133132  0.7163168  0.04682721
##   0.0370  0.07147523  0.7159298  0.04695991
##   0.0372  0.07161968  0.7155352  0.04709307
##   0.0374  0.07176419  0.7151421  0.04722577
##   0.0376  0.07190902  0.7147446  0.04735838
##   0.0378  0.07205398  0.7143480  0.04749086
##   0.0380  0.07219819  0.7139685  0.04762239
##   0.0382  0.07234289  0.7135821  0.04775415
##   0.0384  0.07248814  0.7131875  0.04788625
##   0.0386  0.07263325  0.7127977  0.04801829
##   0.0388  0.07277853  0.7124091  0.04815051
##   0.0390  0.07292376  0.7120256  0.04828250
##   0.0392  0.07306953  0.7116338  0.04841464
##   0.0394  0.07321476  0.7112574  0.04854599
##   0.0396  0.07336016  0.7108788  0.04867710
##   0.0398  0.07350606  0.7104923  0.04880831
##   0.0400  0.07365249  0.7100974  0.04893973
##   0.0402  0.07379941  0.7096943  0.04907166
##   0.0404  0.07394683  0.7092827  0.04920380
##   0.0406  0.07409478  0.7088622  0.04933606
##   0.0408  0.07424310  0.7084359  0.04946829
##   0.0410  0.07439110  0.7080173  0.04959996
##   0.0412  0.07453895  0.7076033  0.04973132
##   0.0414  0.07468698  0.7071910  0.04986273
##   0.0416  0.07483421  0.7068009  0.04999322
##   0.0418  0.07498052  0.7064326  0.05012252
##   0.0420  0.07512731  0.7060563  0.05025197
##   0.0422  0.07527441  0.7056758  0.05038147
##   0.0424  0.07542013  0.7053320  0.05050978
##   0.0426  0.07556501  0.7050085  0.05063731
##   0.0428  0.07571015  0.7046804  0.05076484
##   0.0430  0.07585574  0.7043455  0.05089267
##   0.0432  0.07600170  0.7040056  0.05102107
##   0.0434  0.07614782  0.7036636  0.05114944
##   0.0436  0.07629424  0.7033171  0.05127790
##   0.0438  0.07644106  0.7029641  0.05140673
##   0.0440  0.07658811  0.7026096  0.05153559
##   0.0442  0.07673527  0.7022600  0.05166433
##   0.0444  0.07688287  0.7019033  0.05179337
##   0.0446  0.07703090  0.7015393  0.05192262
##   0.0448  0.07717900  0.7011808  0.05205151
##   0.0450  0.07732714  0.7008236  0.05218026
##   0.0452  0.07747523  0.7004722  0.05230863
##   0.0454  0.07762354  0.7001217  0.05243717
##   0.0456  0.07777206  0.6997701  0.05256584
##   0.0458  0.07792037  0.6994262  0.05269426
##   0.0460  0.07806900  0.6990767  0.05282295
##   0.0462  0.07821781  0.6987262  0.05295179
##   0.0464  0.07836628  0.6983842  0.05307993
##   0.0466  0.07851351  0.6980648  0.05320624
##   0.0468  0.07866062  0.6977484  0.05333215
##   0.0470  0.07880792  0.6974287  0.05345811
##   0.0472  0.07895558  0.6971029  0.05358430
##   0.0474  0.07910362  0.6967706  0.05371092
##   0.0476  0.07925203  0.6964317  0.05383798
##   0.0478  0.07940079  0.6960862  0.05396542
##   0.0480  0.07954993  0.6957338  0.05409309
##   0.0482  0.07969927  0.6953789  0.05422077
##   0.0484  0.07984896  0.6950174  0.05434864
##   0.0486  0.07999900  0.6946488  0.05447676
##   0.0488  0.08014940  0.6942727  0.05460512
##   0.0490  0.08030015  0.6938891  0.05473375
##   0.0492  0.08045125  0.6934977  0.05486299
##   0.0494  0.08060271  0.6930984  0.05499264
##   0.0496  0.08075450  0.6926909  0.05512269
##   0.0498  0.08090665  0.6922752  0.05525307
##   0.0500  0.08105914  0.6918509  0.05538371
##   0.0502  0.08121197  0.6914179  0.05551475
##   0.0504  0.08136514  0.6909760  0.05564602
##   0.0506  0.08151865  0.6905249  0.05577743
##   0.0508  0.08167249  0.6900645  0.05590894
##   0.0510  0.08182638  0.6895944  0.05604011
##   0.0512  0.08198057  0.6891149  0.05617159
##   0.0514  0.08213508  0.6886257  0.05630327
##   0.0516  0.08228991  0.6881266  0.05643517
##   0.0518  0.08244506  0.6876173  0.05656734
##   0.0520  0.08260052  0.6870975  0.05669973
##   0.0522  0.08275630  0.6865670  0.05683233
##   0.0524  0.08291240  0.6860256  0.05696531
##   0.0526  0.08306880  0.6854730  0.05709877
##   0.0528  0.08322544  0.6849111  0.05723267
##   0.0530  0.08338230  0.6843401  0.05736696
##   0.0532  0.08353947  0.6837572  0.05750141
##   0.0534  0.08369695  0.6831620  0.05763618
##   0.0536  0.08385472  0.6825543  0.05777115
##   0.0538  0.08401239  0.6819492  0.05790614
##   0.0540  0.08416857  0.6813990  0.05804037
##   0.0542  0.08432506  0.6808368  0.05817483
##   0.0544  0.08448184  0.6802623  0.05830945
##   0.0546  0.08463851  0.6796932  0.05844400
##   0.0548  0.08479539  0.6791152  0.05857869
##   0.0550  0.08495256  0.6785244  0.05871360
##   0.0552  0.08511002  0.6779203  0.05884872
##   0.0554  0.08526777  0.6773026  0.05898400
##   0.0556  0.08542581  0.6766709  0.05911939
##   0.0558  0.08558413  0.6760249  0.05925487
##   0.0560  0.08574274  0.6753642  0.05939056
##   0.0562  0.08590163  0.6746883  0.05952641
##   0.0564  0.08606081  0.6739970  0.05966251
##   0.0566  0.08622026  0.6732896  0.05979902
##   0.0568  0.08637999  0.6725659  0.05993573
##   0.0570  0.08654000  0.6718252  0.06007259
##   0.0572  0.08670029  0.6710672  0.06020960
##   0.0574  0.08685911  0.6703600  0.06034543
##   0.0576  0.08701817  0.6696377  0.06048152
##   0.0578  0.08717749  0.6688984  0.06061775
##   0.0580  0.08733578  0.6681975  0.06075333
##   0.0582  0.08749349  0.6675197  0.06088839
##   0.0584  0.08765086  0.6668561  0.06102305
##   0.0586  0.08780850  0.6661767  0.06115800
##   0.0588  0.08796640  0.6654811  0.06129318
##   0.0590  0.08812457  0.6647689  0.06142856
##   0.0592  0.08828291  0.6640430  0.06156399
##   0.0594  0.08844119  0.6633132  0.06169923
##   0.0596  0.08859932  0.6625829  0.06183433
##   0.0598  0.08875742  0.6618517  0.06196929
##   0.0600  0.08891505  0.6611445  0.06210383
##   0.0602  0.08907221  0.6604565  0.06223779
##   0.0604  0.08922853  0.6597922  0.06237102
##   0.0606  0.08938445  0.6591322  0.06250392
##   0.0608  0.08954063  0.6584560  0.06263701
##   0.0610  0.08969675  0.6577768  0.06276993
##   0.0612  0.08985219  0.6571242  0.06290218
##   0.0614  0.09000637  0.6565293  0.06303363
##   0.0616  0.09015859  0.6560326  0.06316367
##   0.0618  0.09031076  0.6555390  0.06329349
##   0.0620  0.09046318  0.6550326  0.06342339
##   0.0622  0.09061583  0.6545130  0.06355336
##   0.0624  0.09076872  0.6539797  0.06368357
##   0.0626  0.09092186  0.6534323  0.06381386
##   0.0628  0.09107520  0.6528714  0.06394426
##   0.0630  0.09122861  0.6523048  0.06407465
##   0.0632  0.09138224  0.6517230  0.06420522
##   0.0634  0.09153611  0.6511253  0.06433586
##   0.0636  0.09169021  0.6505112  0.06446669
##   0.0638  0.09184450  0.6498835  0.06459769
##   0.0640  0.09199864  0.6492642  0.06472857
##   0.0642  0.09215301  0.6486274  0.06485958
##   0.0644  0.09230761  0.6479726  0.06499090
##   0.0646  0.09246243  0.6472991  0.06512247
##   0.0648  0.09261748  0.6466063  0.06525428
##   0.0650  0.09277275  0.6458935  0.06538617
##   0.0652  0.09292704  0.6452376  0.06551726
##   0.0654  0.09307999  0.6446583  0.06564698
##   0.0656  0.09323295  0.6440739  0.06577659
##   0.0658  0.09338612  0.6434725  0.06590635
##   0.0660  0.09353901  0.6428932  0.06603578
##   0.0662  0.09368973  0.6424661  0.06616299
##   0.0664  0.09383848  0.6421445  0.06628853
##   0.0666  0.09398731  0.6418195  0.06641417
##   0.0668  0.09413635  0.6414846  0.06654002
##   0.0670  0.09428537  0.6411551  0.06666582
##   0.0672  0.09443425  0.6408334  0.06679141
##   0.0674  0.09458160  0.6406166  0.06691560
##   0.0676  0.09472803  0.6404808  0.06703876
##   0.0678  0.09487454  0.6403516  0.06716194
##   0.0680  0.09502123  0.6402180  0.06728516
##   0.0682  0.09516814  0.6400796  0.06740845
##   0.0684  0.09531526  0.6399360  0.06753181
##   0.0686  0.09546240  0.6398005  0.06765505
##   0.0688  0.09560906  0.6397058  0.06777782
##   0.0690  0.09575594  0.6396075  0.06790085
##   0.0692  0.09590302  0.6395052  0.06802409
##   0.0694  0.09605030  0.6393990  0.06814747
##   0.0696  0.09619779  0.6392885  0.06827101
##   0.0698  0.09634548  0.6391737  0.06839475
##   0.0700  0.09649338  0.6390542  0.06851871
##   0.0702  0.09664147  0.6389299  0.06864278
##   0.0704  0.09678977  0.6388005  0.06876687
##   0.0706  0.09693826  0.6386658  0.06889116
##   0.0708  0.09708696  0.6385256  0.06901572
##   0.0710  0.09723585  0.6383795  0.06914032
##   0.0712  0.09738493  0.6382274  0.06926499
##   0.0714  0.09753421  0.6380687  0.06938976
##   0.0716  0.09768369  0.6379034  0.06951456
##   0.0718  0.09783321  0.6377434  0.06963923
##   0.0720  0.09798291  0.6375778  0.06976396
##   0.0722  0.09813280  0.6374049  0.06988879
##   0.0724  0.09828287  0.6372262  0.07001378
##   0.0726  0.09843295  0.6370662  0.07013877
##   0.0728  0.09858322  0.6368994  0.07026384
##   0.0730  0.09873368  0.6367246  0.07038906
##   0.0732  0.09888433  0.6365413  0.07051431
##   0.0734  0.09903517  0.6363491  0.07063962
##   0.0736  0.09918619  0.6361474  0.07076505
##   0.0738  0.09933679  0.6359806  0.07089004
##   0.0740  0.09948676  0.6358684  0.07101445
##   0.0742  0.09963651  0.6357877  0.07113863
##   0.0744  0.09978645  0.6357029  0.07126291
##   0.0746  0.09993599  0.6356741  0.07138660
##   0.0748  0.10008567  0.6356477  0.07151044
##   0.0750  0.10023553  0.6356206  0.07163440
##   0.0752  0.10038557  0.6355927  0.07175843
##   0.0754  0.10053557  0.6355874  0.07188238
##   0.0756  0.10068569  0.6355874  0.07200637
##   0.0758  0.10083600  0.6355874  0.07213039
##   0.0760  0.10098647  0.6355874  0.07225450
##   0.0762  0.10113712  0.6355874  0.07237883
##   0.0764  0.10128794  0.6355874  0.07250332
##   0.0766  0.10143894  0.6355874  0.07262797
##   0.0768  0.10159010  0.6355874  0.07275272
##   0.0770  0.10174144  0.6355874  0.07287762
##   0.0772  0.10189294  0.6355874  0.07300269
##   0.0774  0.10204462  0.6355874  0.07312782
##   0.0776  0.10219646  0.6355874  0.07325319
##   0.0778  0.10234847  0.6355874  0.07337885
##   0.0780  0.10250064  0.6355874  0.07350464
##   0.0782  0.10265298  0.6355874  0.07363050
##   0.0784  0.10280549  0.6355874  0.07375649
##   0.0786  0.10295816  0.6355874  0.07388272
##   0.0788  0.10311099  0.6355874  0.07400902
##   0.0790  0.10326399  0.6355874  0.07413548
##   0.0792  0.10341715  0.6355874  0.07426195
##   0.0794  0.10357047  0.6355874  0.07438848
##   0.0796  0.10372395  0.6355874  0.07451512
##   0.0798  0.10387759  0.6355874  0.07464191
##   0.0800  0.10403138  0.6355874  0.07476875
##   0.0802  0.10418534  0.6355874  0.07489561
##   0.0804  0.10433946  0.6355874  0.07502270
##   0.0806  0.10449373  0.6355874  0.07514984
##   0.0808  0.10464816  0.6355874  0.07527708
##   0.0810  0.10480274  0.6355874  0.07540440
##   0.0812  0.10495748  0.6355874  0.07553180
##   0.0814  0.10511237  0.6355874  0.07565934
##   0.0816  0.10526742  0.6355874  0.07578693
##   0.0818  0.10542262  0.6355874  0.07591467
##   0.0820  0.10557797  0.6355874  0.07604245
##   0.0822  0.10573347  0.6355874  0.07617027
##   0.0824  0.10588913  0.6355874  0.07629817
##   0.0826  0.10604493  0.6355874  0.07642611
##   0.0828  0.10620089  0.6355874  0.07655424
##   0.0830  0.10635676  0.6367800  0.07668237
##   0.0832  0.10650637  0.6367800  0.07680517
##   0.0834  0.10665612  0.6367800  0.07692827
##   0.0836  0.10680601  0.6367800  0.07705152
##   0.0838  0.10695604  0.6367800  0.07717487
##   0.0840  0.10710359  0.6359810  0.07729644
##   0.0842  0.10724723  0.6359810  0.07741480
##   0.0844  0.10738810  0.6352574  0.07753082
##   0.0846  0.10752555  0.6352574  0.07764388
##   0.0848  0.10766312  0.6352574  0.07775705
##   0.0850  0.10780083  0.6352574  0.07787031
##   0.0852  0.10793866  0.6352574  0.07798365
##   0.0854  0.10807661  0.6352574  0.07809741
##   0.0856  0.10821470  0.6352574  0.07821140
##   0.0858  0.10834714  0.6375306  0.07832041
##   0.0860  0.10847953  0.6375306  0.07842937
##   0.0862  0.10860196  0.6345138  0.07853012
##   0.0864  0.10871640  0.6303803  0.07862439
##   0.0866  0.10882853  0.6303803  0.07871674
##   0.0868  0.10893691  0.6323395  0.07880635
##   0.0870  0.10904303  0.6323395  0.07889427
##   0.0872  0.10914443  0.6278031  0.07897825
##   0.0874  0.10923181  0.6263924  0.07905069
##   0.0876  0.10930490  0.6353794  0.07911149
##   0.0878  0.10937349  0.6353794  0.07916847
##   0.0880  0.10944215  0.6353794  0.07922547
##   0.0882  0.10951087  0.6353794  0.07928247
##   0.0884  0.10957966  0.6353794  0.07933950
##   0.0886  0.10964347  0.6335537  0.07939220
##   0.0888  0.10970486  0.6337036  0.07944290
##   0.0890  0.10974487  0.6259503  0.07947652
##   0.0892  0.10976753  0.6009704  0.07949590
##   0.0894  0.10977941  0.6009704  0.07950619
##   0.0896  0.10979129  0.6009704  0.07951648
##   0.0898  0.10979785  0.5842176  0.07952231
##   0.0900  0.10980354  0.5842176  0.07952742
##   0.0902  0.10980924  0.5842176  0.07953253
##   0.0904  0.10981495  0.5842176  0.07953764
##   0.0906  0.10982066  0.5842176  0.07954275
##   0.0908  0.10982136        NaN  0.07954336
##   0.0910  0.10982136        NaN  0.07954336
##   0.0912  0.10982136        NaN  0.07954336
##   0.0914  0.10982136        NaN  0.07954336
##   0.0916  0.10982136        NaN  0.07954336
##   0.0918  0.10982136        NaN  0.07954336
##   0.0920  0.10982136        NaN  0.07954336
##   0.0922  0.10982136        NaN  0.07954336
##   0.0924  0.10982136        NaN  0.07954336
##   0.0926  0.10982136        NaN  0.07954336
##   0.0928  0.10982136        NaN  0.07954336
##   0.0930  0.10982136        NaN  0.07954336
##   0.0932  0.10982136        NaN  0.07954336
##   0.0934  0.10982136        NaN  0.07954336
##   0.0936  0.10982136        NaN  0.07954336
##   0.0938  0.10982136        NaN  0.07954336
##   0.0940  0.10982136        NaN  0.07954336
##   0.0942  0.10982136        NaN  0.07954336
##   0.0944  0.10982136        NaN  0.07954336
##   0.0946  0.10982136        NaN  0.07954336
##   0.0948  0.10982136        NaN  0.07954336
##   0.0950  0.10982136        NaN  0.07954336
##   0.0952  0.10982136        NaN  0.07954336
##   0.0954  0.10982136        NaN  0.07954336
##   0.0956  0.10982136        NaN  0.07954336
##   0.0958  0.10982136        NaN  0.07954336
##   0.0960  0.10982136        NaN  0.07954336
##   0.0962  0.10982136        NaN  0.07954336
##   0.0964  0.10982136        NaN  0.07954336
##   0.0966  0.10982136        NaN  0.07954336
##   0.0968  0.10982136        NaN  0.07954336
##   0.0970  0.10982136        NaN  0.07954336
##   0.0972  0.10982136        NaN  0.07954336
##   0.0974  0.10982136        NaN  0.07954336
##   0.0976  0.10982136        NaN  0.07954336
##   0.0978  0.10982136        NaN  0.07954336
##   0.0980  0.10982136        NaN  0.07954336
##   0.0982  0.10982136        NaN  0.07954336
##   0.0984  0.10982136        NaN  0.07954336
##   0.0986  0.10982136        NaN  0.07954336
##   0.0988  0.10982136        NaN  0.07954336
##   0.0990  0.10982136        NaN  0.07954336
##   0.0992  0.10982136        NaN  0.07954336
##   0.0994  0.10982136        NaN  0.07954336
##   0.0996  0.10982136        NaN  0.07954336
##   0.0998  0.10982136        NaN  0.07954336
##   0.1000  0.10982136        NaN  0.07954336
## 
## Tuning parameter 'alpha' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 1 and lambda = 0.0036.
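
The NaN entries in the Rsquared column for lambda of 0.0908 and above are likely a side effect of the penalty removing every coefficient: RMSE and MAE stop changing there, which is what a constant prediction would produce, and a squared correlation against a constant is undefined. The sketch below illustrates both points with made-up numbers. Soft-thresholding is a simplified view of the lasso shrinkage step, not glmnet's full algorithm.

```r
# Soft-thresholding: the shrinkage step behind the lasso penalty.
soft_threshold <- function(beta, lambda) {
  sign(beta) * pmax(abs(beta) - lambda, 0)
}

beta <- c(0.08, -0.03, 0.002)            # made-up coefficients
soft_threshold(beta, lambda = 0.0036)    # small penalty: smallest coefficient drops out
soft_threshold(beta, lambda = 0.1)       # large penalty: every coefficient is zero

# With all coefficients at zero the model predicts a constant,
# so the squared correlation with the observations is undefined.
obs  <- c(0.24, 0.20, 0.26, 0.15)
pred <- rep(mean(obs), length(obs))
r2 <- suppressWarnings(cor(pred, obs)^2)
r2
```

Since the selected lambda (0.0036) sits near the low end of the grid, most of the 496 candidates add little; a shorter grid concentrated below about 0.01 would probably cover the useful range.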

Stochastic Gradient Boosting with RepeatedCV

set.seed(55)
control <- trainControl(method = "repeatedcv",
                        number = 5,
                        repeats = 5,
                        verboseIter = FALSE)

gbm_model <- train(SalePrice ~ .,
                   data = train_df,
                   method = "gbm",
                   trControl = control)
gbm_model
## Stochastic Gradient Boosting 
## 
## 1169 samples
##   79 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 934, 935, 935, 936, 936, 936, ... 
## Resampling results across tuning parameters:
## 
##   interaction.depth  n.trees  RMSE        Rsquared   MAE       
##   1                   50      0.04988156  0.8108476  0.03208121
##   1                  100      0.04549907  0.8325540  0.02850307
##   1                  150      0.04474028  0.8378777  0.02753639
##   2                   50      0.04559703  0.8335385  0.02811659
##   2                  100      0.04330617  0.8482977  0.02585975
##   2                  150      0.04227966  0.8551180  0.02493875
##   3                   50      0.04389564  0.8445934  0.02642121
##   3                  100      0.04185455  0.8574481  0.02447859
##   3                  150      0.04080081  0.8651538  0.02376450
## 
## Tuning parameter 'shrinkage' was held constant at a value of 0.1
## 
## Tuning parameter 'n.minobsinnode' was held constant at a value of 10
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 150, interaction.depth =
##  3, shrinkage = 0.1 and n.minobsinnode = 10.
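
In the results above, RMSE is still falling at the largest values caret tried by default (n.trees = 150, interaction.depth = 3), so the optimum may lie outside that grid. The following is a minimal sketch of how a wider grid could be supplied through `tuneGrid`. The grid values and the names `gbm_grid` and `gbm_tuned` are illustrative assumptions, not settings used in this project, and the fit is guarded so the block only trains where `train_df` and `control` already exist.

```r
# Hypothetical wider grid; the values are illustrative, not tuned.
gbm_grid <- expand.grid(
  n.trees = c(150, 300, 500),
  interaction.depth = c(3, 4, 5),
  shrinkage = c(0.05, 0.1),
  n.minobsinnode = 10
)
nrow(gbm_grid)  # 3 * 3 * 2 * 1 = 18 candidate settings

# Runs only where caret and the document's train_df and control exist.
if (requireNamespace("caret", quietly = TRUE) &&
    exists("train_df") && exists("control")) {
  set.seed(55)
  gbm_tuned <- caret::train(SalePrice ~ .,
                            data = train_df,
                            method = "gbm",
                            trControl = control,
                            tuneGrid = gbm_grid,
                            verbose = FALSE)  # passed through to gbm
  print(gbm_tuned$bestTune)
}
```

With 5 folds repeated 5 times, each candidate setting is fitted 25 times, so a larger grid raises the run time proportionally.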

eXtreme Gradient Boosting with RepeatedCV

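set.seed(55)  # same seed as before so the models are compared on the same basis
# 5-fold cross-validation, repeated 5 times
control <- trainControl(method = "repeatedcv",
                        number = 5,
                        repeats = 5,
                        verboseIter = FALSE)

# Fit xgbTree over caret's default tuning grid
xgboost_model <- train(SalePrice ~ .,
                       data = train_df,
                       method = "xgbTree",
                       trControl = control)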
xgboost_model
## eXtreme Gradient Boosting 
## 
## 1169 samples
##   79 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 934, 935, 935, 936, 936, 936, ... 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  RMSE        Rsquared 
##   0.3  1          0.6               0.50        50      0.04666810  0.8222816
##   0.3  1          0.6               0.50       100      0.04389896  0.8434252
##   0.3  1          0.6               0.50       150      0.04307133  0.8492586
##   0.3  1          0.6               0.75        50      0.04522468  0.8340231
##   0.3  1          0.6               0.75       100      0.04314480  0.8490399
##   0.3  1          0.6               0.75       150      0.04210926  0.8564211
##   0.3  1          0.6               1.00        50      0.04450992  0.8400402
##   0.3  1          0.6               1.00       100      0.04179542  0.8590594
##   0.3  1          0.6               1.00       150      0.04072539  0.8662231
##   0.3  1          0.8               0.50        50      0.04603343  0.8282794
##   0.3  1          0.8               0.50       100      0.04460901  0.8379362
##   0.3  1          0.8               0.50       150      0.04420564  0.8408984
##   0.3  1          0.8               0.75        50      0.04546706  0.8322798
##   0.3  1          0.8               0.75       100      0.04342871  0.8469451
##   0.3  1          0.8               0.75       150      0.04251441  0.8533721
##   0.3  1          0.8               1.00        50      0.04472090  0.8382726
##   0.3  1          0.8               1.00       100      0.04229729  0.8551669
##   0.3  1          0.8               1.00       150      0.04128870  0.8618764
##   0.3  2          0.6               0.50        50      0.04455563  0.8392005
##   0.3  2          0.6               0.50       100      0.04354272  0.8461002
##   0.3  2          0.6               0.50       150      0.04292920  0.8509368
##   0.3  2          0.6               0.75        50      0.04195861  0.8565356
##   0.3  2          0.6               0.75       100      0.04119405  0.8630885
##   0.3  2          0.6               0.75       150      0.04068898  0.8664690
##   0.3  2          0.6               1.00        50      0.04124803  0.8616308
##   0.3  2          0.6               1.00       100      0.03993713  0.8702749
##   0.3  2          0.6               1.00       150      0.03950825  0.8732313
##   0.3  2          0.8               0.50        50      0.04426876  0.8415029
##   0.3  2          0.8               0.50       100      0.04276487  0.8525843
##   0.3  2          0.8               0.50       150      0.04233918  0.8562013
##   0.3  2          0.8               0.75        50      0.04215356  0.8556172
##   0.3  2          0.8               0.75       100      0.04096162  0.8636792
##   0.3  2          0.8               0.75       150      0.04047555  0.8670668
##   0.3  2          0.8               1.00        50      0.04096968  0.8651001
##   0.3  2          0.8               1.00       100      0.03967595  0.8736744
##   0.3  2          0.8               1.00       150      0.03928677  0.8760867
##   0.3  3          0.6               0.50        50      0.04395390  0.8446129
##   0.3  3          0.6               0.50       100      0.04350677  0.8480621
##   0.3  3          0.6               0.50       150      0.04354990  0.8478870
##   0.3  3          0.6               0.75        50      0.04192066  0.8590029
##   0.3  3          0.6               0.75       100      0.04151973  0.8620221
##   0.3  3          0.6               0.75       150      0.04130315  0.8634516
##   0.3  3          0.6               1.00        50      0.04086992  0.8653029
##   0.3  3          0.6               1.00       100      0.04022240  0.8692102
##   0.3  3          0.6               1.00       150      0.04005197  0.8702001
##   0.3  3          0.8               0.50        50      0.04259803  0.8533075
##   0.3  3          0.8               0.50       100      0.04217077  0.8563402
##   0.3  3          0.8               0.50       150      0.04191975  0.8581681
##   0.3  3          0.8               0.75        50      0.04116572  0.8620571
##   0.3  3          0.8               0.75       100      0.04071964  0.8655462
##   0.3  3          0.8               0.75       150      0.04065327  0.8659396
##   0.3  3          0.8               1.00        50      0.04082889  0.8658059
##   0.3  3          0.8               1.00       100      0.04017897  0.8702059
##   0.3  3          0.8               1.00       150      0.04008968  0.8708165
##   0.4  1          0.6               0.50        50      0.04909205  0.8070808
##   0.4  1          0.6               0.50       100      0.04629840  0.8279314
##   0.4  1          0.6               0.50       150      0.04529632  0.8353251
##   0.4  1          0.6               0.75        50      0.04537138  0.8331203
##   0.4  1          0.6               0.75       100      0.04326242  0.8488621
##   0.4  1          0.6               0.75       150      0.04233739  0.8556701
##   0.4  1          0.6               1.00        50      0.04477728  0.8376783
##   0.4  1          0.6               1.00       100      0.04198712  0.8572797
##   0.4  1          0.6               1.00       150      0.04075005  0.8656987
##   0.4  1          0.8               0.50        50      0.04672960  0.8222226
##   0.4  1          0.8               0.50       100      0.04477340  0.8377956
##   0.4  1          0.8               0.50       150      0.04399271  0.8436509
##   0.4  1          0.8               0.75        50      0.04608577  0.8282226
##   0.4  1          0.8               0.75       100      0.04395508  0.8437708
##   0.4  1          0.8               0.75       150      0.04328696  0.8477514
##   0.4  1          0.8               1.00        50      0.04464676  0.8386428
##   0.4  1          0.8               1.00       100      0.04201217  0.8575910
##   0.4  1          0.8               1.00       150      0.04085886  0.8652383
##   0.4  2          0.6               0.50        50      0.04570131  0.8312397
##   0.4  2          0.6               0.50       100      0.04495909  0.8366255
##   0.4  2          0.6               0.50       150      0.04480582  0.8379476
##   0.4  2          0.6               0.75        50      0.04306136  0.8504284
##   0.4  2          0.6               0.75       100      0.04174834  0.8591871
##   0.4  2          0.6               0.75       150      0.04161111  0.8603478
##   0.4  2          0.6               1.00        50      0.04180129  0.8588491
##   0.4  2          0.6               1.00       100      0.04065542  0.8669360
##   0.4  2          0.6               1.00       150      0.04015888  0.8702007
##   0.4  2          0.8               0.50        50      0.04503367  0.8345773
##   0.4  2          0.8               0.50       100      0.04380941  0.8429576
##   0.4  2          0.8               0.50       150      0.04367132  0.8443337
##   0.4  2          0.8               0.75        50      0.04217579  0.8545164
##   0.4  2          0.8               0.75       100      0.04106282  0.8621827
##   0.4  2          0.8               0.75       150      0.04078791  0.8637340
##   0.4  2          0.8               1.00        50      0.04113160  0.8630861
##   0.4  2          0.8               1.00       100      0.04037025  0.8684953
##   0.4  2          0.8               1.00       150      0.03990118  0.8716743
##   0.4  3          0.6               0.50        50      0.04485524  0.8379344
##   0.4  3          0.6               0.50       100      0.04437346  0.8420352
##   0.4  3          0.6               0.50       150      0.04455475  0.8406748
##   0.4  3          0.6               0.75        50      0.04291738  0.8509567
##   0.4  3          0.6               0.75       100      0.04277700  0.8520056
##   0.4  3          0.6               0.75       150      0.04266493  0.8529617
##   0.4  3          0.6               1.00        50      0.04256776  0.8535387
##   0.4  3          0.6               1.00       100      0.04195062  0.8576612
##   0.4  3          0.6               1.00       150      0.04180185  0.8587902
##   0.4  3          0.8               0.50        50      0.04503155  0.8372493
##   0.4  3          0.8               0.50       100      0.04497415  0.8375359
##   0.4  3          0.8               0.50       150      0.04509112  0.8366474
##   0.4  3          0.8               0.75        50      0.04316254  0.8483986
##   0.4  3          0.8               0.75       100      0.04291841  0.8506797
##   0.4  3          0.8               0.75       150      0.04281373  0.8514134
##   0.4  3          0.8               1.00        50      0.04129532  0.8628121
##   0.4  3          0.8               1.00       100      0.04086382  0.8655789
##   0.4  3          0.8               1.00       150      0.04074503  0.8663587
##   MAE       
##   0.03001082
##   0.02736020
##   0.02637073
##   0.02928508
##   0.02684440
##   0.02577014
##   0.02864958
##   0.02638172
##   0.02527026
##   0.02950558
##   0.02779960
##   0.02686287
##   0.02919927
##   0.02690909
##   0.02572395
##   0.02859874
##   0.02639199
##   0.02535859
##   0.02705550
##   0.02594649
##   0.02545092
##   0.02573945
##   0.02444504
##   0.02394855
##   0.02524177
##   0.02399348
##   0.02347391
##   0.02652959
##   0.02516193
##   0.02485469
##   0.02533370
##   0.02421569
##   0.02377764
##   0.02529678
##   0.02404901
##   0.02358020
##   0.02614266
##   0.02588245
##   0.02593841
##   0.02497857
##   0.02449108
##   0.02435101
##   0.02416022
##   0.02351333
##   0.02330846
##   0.02544564
##   0.02503152
##   0.02498328
##   0.02458161
##   0.02414855
##   0.02417589
##   0.02422676
##   0.02366818
##   0.02357368
##   0.03129159
##   0.02878475
##   0.02761522
##   0.02936444
##   0.02694053
##   0.02591381
##   0.02923004
##   0.02661285
##   0.02544173
##   0.03009048
##   0.02796800
##   0.02684121
##   0.02947253
##   0.02705066
##   0.02614667
##   0.02898076
##   0.02645329
##   0.02533884
##   0.02780905
##   0.02675767
##   0.02648729
##   0.02651737
##   0.02518546
##   0.02478980
##   0.02586885
##   0.02463968
##   0.02414782
##   0.02727373
##   0.02638282
##   0.02608367
##   0.02591073
##   0.02479389
##   0.02437764
##   0.02548640
##   0.02436406
##   0.02389855
##   0.02750330
##   0.02741406
##   0.02764022
##   0.02607903
##   0.02587065
##   0.02584057
##   0.02556683
##   0.02500330
##   0.02489852
##   0.02751847
##   0.02750749
##   0.02761151
##   0.02583099
##   0.02554132
##   0.02561996
##   0.02514468
##   0.02469098
##   0.02452470
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning
##  parameter 'min_child_weight' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nrounds = 150, max_depth = 2, eta
##  = 0.3, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1 and subsample
##  = 1.
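
Because the MAE column wraps in the printed table, the winning row is hard to spot by eye. The sketch below shows the selection rule caret reports (smallest RMSE) on four rows copied from the table above. On the fitted object itself, the same information should be available as `xgboost_model$results` and `xgboost_model$bestTune`. The names `results` and `best` are only for illustration.

```r
# Four rows copied from the tuning table above (gamma = 0 and
# min_child_weight = 1 were held constant, so they are omitted).
results <- data.frame(
  eta = c(0.3, 0.3, 0.3, 0.4),
  max_depth = c(2, 2, 2, 2),
  colsample_bytree = c(0.6, 0.8, 0.8, 0.8),
  subsample = c(1, 1, 1, 1),
  nrounds = c(150, 100, 150, 150),
  RMSE = c(0.03950825, 0.03967595, 0.03928677, 0.03990118)
)

# Pick the row with the smallest RMSE, as caret does
best <- results[which.min(results$RMSE), ]
best
```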

Plot the RMSE of all models

# Best (lowest) cross-validated RMSE for each model.
# Built in one step to avoid temporary variables named lm, rf, gbm, ...
# that would mask functions of the same name.
rmse_models <- data.frame(
  model = c("lm", "rf", "lasso", "gbm", "xgboost"),
  RMSE = c(min(lm_model$results$RMSE),
           min(rf_model$results$RMSE),
           min(lasso_model$results$RMSE),
           min(gbm_model$results$RMSE),
           min(xgboost_model$results$RMSE))
)
rmse_models
##     model       RMSE
## 1      lm 0.07895367
## 2      rf 0.04076242
## 3   lasso 0.05110274
## 4     gbm 0.04080081
## 5 xgboost 0.03928677
rmse_models %>%
  ggplot(aes(model, RMSE)) +
  geom_point(color = "#6fa8dc", size = 3) +
  scale_x_discrete(name = "Model",
                   limits = c("lm", "lasso", "rf", "gbm", "xgboost")) +
  theme_minimal() +
  labs(title = "RMSE of train models with RepeatedCV 5 reps") +
  theme(plot.title = element_text(hjust = 0.5))
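
The plot fixes the model order by hand. As a small sketch, the same comparison can be ranked programmatically, with each model's RMSE expressed as a reduction relative to the linear model. The RMSE values are copied from the table printed above; `ranked`, `baseline`, and `pct_below_lm` are illustrative names.

```r
# RMSE values copied from the rmse_models output above
rmse_models <- data.frame(
  model = c("lm", "rf", "lasso", "gbm", "xgboost"),
  RMSE = c(0.07895367, 0.04076242, 0.05110274, 0.04080081, 0.03928677)
)

# Sort from best (lowest RMSE) to worst
ranked <- rmse_models[order(rmse_models$RMSE), ]

# Percentage reduction in RMSE relative to the linear model baseline
baseline <- rmse_models$RMSE[rmse_models$model == "lm"]
ranked$pct_below_lm <- round(100 * (1 - ranked$RMSE / baseline), 1)
ranked
```

Note that rf and gbm differ only in the fifth decimal place, so their relative order is unlikely to be meaningful.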

Scoring eXtreme Gradient Boosting (XGBoost) Model

The eXtreme Gradient Boosting (XGBoost) model, which had the lowest cross-validated RMSE, is used to score the test data.

xgboost_pred <- predict(xgboost_model, newdata = test_df)
xgboost_pred
##   [1] 0.13150251 0.44251642 0.25439680 0.17116161 0.14257967 0.15466762
##   [7] 0.13675860 0.02799611 0.07449380 0.05345283 0.17289414 0.10403908
##  [13] 0.15116212 0.28815109 0.05480476 0.11477421 0.11968883 0.10978008
##  [19] 0.06646258 0.41327205 0.26152790 0.45851231 0.07919944 0.07636599
##  [25] 0.24337997 0.24508503 0.15811375 0.12343169 0.30912596 0.28613058
##  [31] 0.22271970 0.10243414 0.23148160 0.23764129 0.24974819 0.11196637
##  [37] 0.21666287 0.15151586 0.28270608 0.14155176 0.13543865 0.26642409
##  [43] 0.22964168 0.18288073 0.14534533 0.13676836 0.13539080 0.15092991
##  [49] 0.33427900 0.21512036 0.24807067 0.22445007 0.21119039 0.32499161
##  [55] 0.19627394 0.17907615 0.31452334 0.20481187 0.14581819 0.26950783
##  [61] 0.09320923 0.49407145 0.30332670 0.05720381 0.34738237 0.12261432
##  [67] 0.26205599 0.36085150 0.06661431 0.14788111 0.39886516 0.18646064
##  [73] 0.15192996 0.18586703 0.23160821 0.05268119 0.15621568 0.12527624
##  [79] 0.18733585 0.34893045 0.08611120 0.28900957 0.08832998 0.24036060
##  [85] 0.09708202 0.13939299 0.15248437 0.16009468 0.24568677 0.22660200
##  [91] 0.11887615 0.27373832 0.09770814 0.11843304 0.19613321 0.14204632
##  [97] 0.33370957 0.45607248 0.19986430 0.46360207 0.13290356 0.21588953
## [103] 0.21792135 0.09701488 0.15922403 0.22134537 0.12542012 0.21933360
## [109] 0.11122136 0.45698202 0.28073066 0.11548221 0.09895558 0.11429666
## [115] 0.21407373 0.11335277 0.10693188 0.14832817 0.23150600 0.18698069
## [121] 0.16716379 0.25191271 0.38142872 0.43913841 0.36676925 0.02669199
## [127] 0.33664861 0.27153680 0.40985250 0.15649031 0.43348277 0.08837362
## [133] 0.26677337 0.15738320 0.38629380 0.18934050 0.28617549 0.16239768
## [139] 0.16678494 0.15947028 0.24206369 0.24941534 0.07198805 0.38887686
## [145] 0.21220617 0.32104957 0.26406756 0.13399285 0.26776198 0.40295184
## [151] 0.21369059 0.15836287 0.16764195 0.14922999 0.17664856 0.56240207
## [157] 0.22914331 0.14560381 0.08633146 0.14659415 0.14809911 0.18666627
## [163] 0.14015257 0.18344754 0.14261857 0.19146712 0.14181578 0.20267802
## [169] 0.20886506 0.16610961 0.15190566 0.24268080 0.39168942 0.43590230
## [175] 0.15152995 0.11650676 0.22192718 0.14591286 0.21977021 0.21988076
## [181] 0.22568497 0.27137905 0.27486956 0.25116962 0.12541458 0.38111007
## [187] 0.16188148 0.12980931 0.17115897 0.21023828 0.09753671 0.17998758
## [193] 0.28217828 0.19206160 0.21467726 0.11713880 0.28229335 0.06138631
## [199] 0.41509894 0.20702155 0.13764364 0.54881531 0.22673418 0.27730498
## [205] 0.41055229 0.23054627 0.14308000 0.31015179 0.22322597 0.12924951
## [211] 0.15492305 0.15638930 0.14095055 0.13639787 0.10015264 0.21818969
## [217] 0.13948335 0.07463707 0.17571108 0.28945550 0.24134709 0.25195652
## [223] 0.25830767 0.10971408 0.22105433 0.20453313 0.18250215 0.17515698
## [229] 0.13202187 0.09886902 0.22007719 0.23213108 0.09765033 0.12236453
## [235] 0.06487637 0.16121688 0.14195485 0.16901554 0.14525028 0.16411780
## [241] 0.26578727 0.24731115 0.17508003 0.13228719 0.20442735 0.21831466
## [247] 0.15339199 0.14577495 0.12253786 0.15535423 0.15445724 0.23753421
## [253] 0.23274359 0.15245944 0.16486089 0.16169873 0.14730576 0.18775576
## [259] 0.37469339 0.19548468 0.47440901 0.10817692 0.26178977 0.12640603
## [265] 0.13251229 0.05894818 0.32768363 0.45641726 0.19958034 0.18651265
## [271] 0.27640688 0.28581539 0.27146292 0.14948590 0.17688911 0.08806807
## [277] 0.15303612 0.14972748 0.16079675 0.18398161 0.14680324 0.10120692
## [283] 0.24523167 0.27141416 0.21198121 0.14505224 0.09843111 0.21380579
## [289] 0.06798968 0.15830880 0.20961148

Evaluating the XGBoost Model

  • In this part, the predicted values are compared with the actual values to compute the Root Mean Square Error (RMSE).

  • Cross-validated RMSE on the training data vs. RMSE on the test data

test_rmse <- sqrt(mean((test_df$SalePrice - xgboost_pred)^2))

cat("The train model RMSE: ",min(xgboost_model$results$RMSE))
## The train model RMSE:  0.03928677
cat("\nThe test model RMSE: ", test_rmse)
## 
## The test model RMSE:  0.03853753
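
The RMSE calculation above can be wrapped in a small helper, sketched here with a toy example and with the two reported figures compared as a relative gap. The function name `rmse` and the variables `reported_cv_rmse`, `reported_test_rmse`, and `rel_gap` are illustrative and do not appear in the original analysis.

```r
# Root Mean Square Error between two numeric vectors of equal length
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Toy check: errors are 0, 0 and -2, so RMSE = sqrt(4 / 3)
rmse(c(1, 2, 3), c(1, 2, 5))

# Relative gap between the two RMSE values reported above
reported_cv_rmse <- 0.03928677
reported_test_rmse <- 0.03853753
rel_gap <- (reported_test_rmse - reported_cv_rmse) / reported_cv_rmse
round(100 * rel_gap, 1)  # negative means the test RMSE is lower
```

The test RMSE comes out about 1.9% below the cross-validated RMSE, a small gap relative to the differences between models.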

Sum Up

In this House Prices project, the data were cleaned and normalized before modeling.

Of the five models compared, eXtreme Gradient Boosting (XGBoost) gave the lowest cross-validated Root Mean Square Error (RMSE), 0.03928677.

On the test data (an unseen data set), the model's RMSE is 0.03853753.

The test RMSE is slightly lower than the cross-validated RMSE, so there is no sign of overfitting, and the XGBoost model appears to be an effective model for predicting house prices on this data set.