To what extent do a home’s living area, overall quality, and garage area predict its sale price in Ames, Iowa?
Housing prices are influenced by a combination of structural and quality-related factors. Understanding these relationships is essential in real estate economics, valuation modeling, and data-driven decision-making. This project uses regression analysis to quantify how physical attributes of a house impact its market price.
The dataset used in this study contains detailed information on residential properties in Ames, Iowa, including structural characteristics, quality ratings, and sale prices.
Each row represents a single house, and each column represents a measurable feature of that house.
OpenIntro Ames Housing Dataset
https://www.openintro.org/data/
The dataset contains 2,930 homes and 82 variables. The code below loads it and inspects its structure.
library(tidyverse)
ames <- read_csv("ames.csv")
## Rows: 2930 Columns: 82
head(ames)
## # A tibble: 6 × 82
## Order PID area price MS.SubClass MS.Zoning Lot.Frontage Lot.Area Street
## <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <chr>
## 1 1 5.26e8 1656 215000 20 RL 141 31770 Pave
## 2 2 5.26e8 896 105000 20 RH 80 11622 Pave
## 3 3 5.26e8 1329 172000 20 RL 81 14267 Pave
## 4 4 5.26e8 2110 244000 20 RL 93 11160 Pave
## 5 5 5.27e8 1629 189900 60 RL 74 13830 Pave
## 6 6 5.27e8 1604 195500 60 RL 78 9978 Pave
## # ℹ 73 more variables: Alley <chr>, Lot.Shape <chr>, Land.Contour <chr>,
## # Utilities <chr>, Lot.Config <chr>, Land.Slope <chr>, Neighborhood <chr>,
## # Condition.1 <chr>, Condition.2 <chr>, Bldg.Type <chr>, House.Style <chr>,
## # Overall.Qual <dbl>, Overall.Cond <dbl>, Year.Built <dbl>,
## # Year.Remod.Add <dbl>, Roof.Style <chr>, Roof.Matl <chr>,
## # Exterior.1st <chr>, Exterior.2nd <chr>, Mas.Vnr.Type <chr>,
## # Mas.Vnr.Area <dbl>, Exter.Qual <chr>, Exter.Cond <chr>, Foundation <chr>, …
str(ames)
## spc_tbl_ [2,930 × 82] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Order : num [1:2930] 1 2 3 4 5 6 7 8 9 10 ...
## $ PID : num [1:2930] 5.26e+08 5.26e+08 5.26e+08 5.26e+08 5.27e+08 ...
## $ area : num [1:2930] 1656 896 1329 2110 1629 ...
## $ price : num [1:2930] 215000 105000 172000 244000 189900 ...
## $ MS.SubClass : num [1:2930] 20 20 20 20 60 60 120 120 120 60 ...
## $ MS.Zoning : chr [1:2930] "RL" "RH" "RL" "RL" ...
## $ Lot.Frontage : num [1:2930] 141 80 81 93 74 78 41 43 39 60 ...
## $ Lot.Area : num [1:2930] 31770 11622 14267 11160 13830 ...
## $ Street : chr [1:2930] "Pave" "Pave" "Pave" "Pave" ...
## $ Alley : chr [1:2930] NA NA NA NA ...
## $ Lot.Shape : chr [1:2930] "IR1" "Reg" "IR1" "Reg" ...
## $ Land.Contour : chr [1:2930] "Lvl" "Lvl" "Lvl" "Lvl" ...
## $ Utilities : chr [1:2930] "AllPub" "AllPub" "AllPub" "AllPub" ...
## $ Lot.Config : chr [1:2930] "Corner" "Inside" "Corner" "Corner" ...
## $ Land.Slope : chr [1:2930] "Gtl" "Gtl" "Gtl" "Gtl" ...
## $ Neighborhood : chr [1:2930] "NAmes" "NAmes" "NAmes" "NAmes" ...
## $ Condition.1 : chr [1:2930] "Norm" "Feedr" "Norm" "Norm" ...
## $ Condition.2 : chr [1:2930] "Norm" "Norm" "Norm" "Norm" ...
## $ Bldg.Type : chr [1:2930] "1Fam" "1Fam" "1Fam" "1Fam" ...
## $ House.Style : chr [1:2930] "1Story" "1Story" "1Story" "1Story" ...
## $ Overall.Qual : num [1:2930] 6 5 6 7 5 6 8 8 8 7 ...
## $ Overall.Cond : num [1:2930] 5 6 6 5 5 6 5 5 5 5 ...
## $ Year.Built : num [1:2930] 1960 1961 1958 1968 1997 ...
## $ Year.Remod.Add : num [1:2930] 1960 1961 1958 1968 1998 ...
## $ Roof.Style : chr [1:2930] "Hip" "Gable" "Hip" "Hip" ...
## $ Roof.Matl : chr [1:2930] "CompShg" "CompShg" "CompShg" "CompShg" ...
## $ Exterior.1st : chr [1:2930] "BrkFace" "VinylSd" "Wd Sdng" "BrkFace" ...
## $ Exterior.2nd : chr [1:2930] "Plywood" "VinylSd" "Wd Sdng" "BrkFace" ...
## $ Mas.Vnr.Type : chr [1:2930] "Stone" "None" "BrkFace" "None" ...
## $ Mas.Vnr.Area : num [1:2930] 112 0 108 0 0 20 0 0 0 0 ...
## $ Exter.Qual : chr [1:2930] "TA" "TA" "TA" "Gd" ...
## $ Exter.Cond : chr [1:2930] "TA" "TA" "TA" "TA" ...
## $ Foundation : chr [1:2930] "CBlock" "CBlock" "CBlock" "CBlock" ...
## $ Bsmt.Qual : chr [1:2930] "TA" "TA" "TA" "TA" ...
## $ Bsmt.Cond : chr [1:2930] "Gd" "TA" "TA" "TA" ...
## $ Bsmt.Exposure : chr [1:2930] "Gd" "No" "No" "No" ...
## $ BsmtFin.Type.1 : chr [1:2930] "BLQ" "Rec" "ALQ" "ALQ" ...
## $ BsmtFin.SF.1 : num [1:2930] 639 468 923 1065 791 ...
## $ BsmtFin.Type.2 : chr [1:2930] "Unf" "LwQ" "Unf" "Unf" ...
## $ BsmtFin.SF.2 : num [1:2930] 0 144 0 0 0 0 0 0 0 0 ...
## $ Bsmt.Unf.SF : num [1:2930] 441 270 406 1045 137 ...
## $ Total.Bsmt.SF : num [1:2930] 1080 882 1329 2110 928 ...
## $ Heating : chr [1:2930] "GasA" "GasA" "GasA" "GasA" ...
## $ Heating.QC : chr [1:2930] "Fa" "TA" "TA" "Ex" ...
## $ Central.Air : chr [1:2930] "Y" "Y" "Y" "Y" ...
## $ Electrical : chr [1:2930] "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
## $ X1st.Flr.SF : num [1:2930] 1656 896 1329 2110 928 ...
## $ X2nd.Flr.SF : num [1:2930] 0 0 0 0 701 678 0 0 0 776 ...
## $ Low.Qual.Fin.SF: num [1:2930] 0 0 0 0 0 0 0 0 0 0 ...
## $ Bsmt.Full.Bath : num [1:2930] 1 0 0 1 0 0 1 0 1 0 ...
## $ Bsmt.Half.Bath : num [1:2930] 0 0 0 0 0 0 0 0 0 0 ...
## $ Full.Bath : num [1:2930] 1 1 1 2 2 2 2 2 2 2 ...
## $ Half.Bath : num [1:2930] 0 0 1 1 1 1 0 0 0 1 ...
## $ Bedroom.AbvGr : num [1:2930] 3 2 3 3 3 3 2 2 2 3 ...
## $ Kitchen.AbvGr : num [1:2930] 1 1 1 1 1 1 1 1 1 1 ...
## $ Kitchen.Qual : chr [1:2930] "TA" "TA" "Gd" "Ex" ...
## $ TotRms.AbvGrd : num [1:2930] 7 5 6 8 6 7 6 5 5 7 ...
## $ Functional : chr [1:2930] "Typ" "Typ" "Typ" "Typ" ...
## $ Fireplaces : num [1:2930] 2 0 0 2 1 1 0 0 1 1 ...
## $ Fireplace.Qu : chr [1:2930] "Gd" NA NA "TA" ...
## $ Garage.Type : chr [1:2930] "Attchd" "Attchd" "Attchd" "Attchd" ...
## $ Garage.Yr.Blt : num [1:2930] 1960 1961 1958 1968 1997 ...
## $ Garage.Finish : chr [1:2930] "Fin" "Unf" "Unf" "Fin" ...
## $ Garage.Cars : num [1:2930] 2 1 1 2 2 2 2 2 2 2 ...
## $ Garage.Area : num [1:2930] 528 730 312 522 482 470 582 506 608 442 ...
## $ Garage.Qual : chr [1:2930] "TA" "TA" "TA" "TA" ...
## $ Garage.Cond : chr [1:2930] "TA" "TA" "TA" "TA" ...
## $ Paved.Drive : chr [1:2930] "P" "Y" "Y" "Y" ...
## $ Wood.Deck.SF : num [1:2930] 210 140 393 0 212 360 0 0 237 140 ...
## $ Open.Porch.SF : num [1:2930] 62 0 36 0 34 36 0 82 152 60 ...
## $ Enclosed.Porch : num [1:2930] 0 0 0 0 0 0 170 0 0 0 ...
## $ X3Ssn.Porch : num [1:2930] 0 0 0 0 0 0 0 0 0 0 ...
## $ Screen.Porch : num [1:2930] 0 120 0 0 0 0 0 144 0 0 ...
## $ Pool.Area : num [1:2930] 0 0 0 0 0 0 0 0 0 0 ...
## $ Pool.QC : chr [1:2930] NA NA NA NA ...
## $ Fence : chr [1:2930] NA "MnPrv" NA NA ...
## $ Misc.Feature : chr [1:2930] NA NA "Gar2" NA ...
## $ Misc.Val : num [1:2930] 0 0 12500 0 0 0 0 0 0 0 ...
## $ Mo.Sold : num [1:2930] 5 6 6 4 3 6 4 1 3 6 ...
## $ Yr.Sold : num [1:2930] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## $ Sale.Type : chr [1:2930] "WD" "WD" "WD" "WD" ...
## $ Sale.Condition : chr [1:2930] "Normal" "Normal" "Normal" "Normal" ...
dim(ames)
## [1] 2930 82
summary(ames)
## Order PID area price
## Min. : 1.0 Min. :5.263e+08 Min. : 334 Min. : 12789
## 1st Qu.: 733.2 1st Qu.:5.285e+08 1st Qu.:1126 1st Qu.:129500
## Median :1465.5 Median :5.355e+08 Median :1442 Median :160000
## Mean :1465.5 Mean :7.145e+08 Mean :1500 Mean :180796
## 3rd Qu.:2197.8 3rd Qu.:9.072e+08 3rd Qu.:1743 3rd Qu.:213500
## Max. :2930.0 Max. :1.007e+09 Max. :5642 Max. :755000
##
## MS.SubClass MS.Zoning Lot.Frontage Lot.Area
## Min. : 20.00 Length:2930 Min. : 21.00 Min. : 1300
## 1st Qu.: 20.00 Class :character 1st Qu.: 58.00 1st Qu.: 7440
## Median : 50.00 Mode :character Median : 68.00 Median : 9436
## Mean : 57.39 Mean : 69.22 Mean : 10148
## 3rd Qu.: 70.00 3rd Qu.: 80.00 3rd Qu.: 11555
## Max. :190.00 Max. :313.00 Max. :215245
## NA's :490
## Street Alley Lot.Shape Land.Contour
## Length:2930 Length:2930 Length:2930 Length:2930
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Utilities Lot.Config Land.Slope Neighborhood
## Length:2930 Length:2930 Length:2930 Length:2930
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Condition.1 Condition.2 Bldg.Type House.Style
## Length:2930 Length:2930 Length:2930 Length:2930
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Overall.Qual Overall.Cond Year.Built Year.Remod.Add
## Min. : 1.000 Min. :1.000 Min. :1872 Min. :1950
## 1st Qu.: 5.000 1st Qu.:5.000 1st Qu.:1954 1st Qu.:1965
## Median : 6.000 Median :5.000 Median :1973 Median :1993
## Mean : 6.095 Mean :5.563 Mean :1971 Mean :1984
## 3rd Qu.: 7.000 3rd Qu.:6.000 3rd Qu.:2001 3rd Qu.:2004
## Max. :10.000 Max. :9.000 Max. :2010 Max. :2010
##
## Roof.Style Roof.Matl Exterior.1st Exterior.2nd
## Length:2930 Length:2930 Length:2930 Length:2930
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Mas.Vnr.Type Mas.Vnr.Area Exter.Qual Exter.Cond
## Length:2930 Min. : 0.0 Length:2930 Length:2930
## Class :character 1st Qu.: 0.0 Class :character Class :character
## Mode :character Median : 0.0 Mode :character Mode :character
## Mean : 101.9
## 3rd Qu.: 164.0
## Max. :1600.0
## NA's :23
## Foundation Bsmt.Qual Bsmt.Cond Bsmt.Exposure
## Length:2930 Length:2930 Length:2930 Length:2930
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## BsmtFin.Type.1 BsmtFin.SF.1 BsmtFin.Type.2 BsmtFin.SF.2
## Length:2930 Min. : 0.0 Length:2930 Min. : 0.00
## Class :character 1st Qu.: 0.0 Class :character 1st Qu.: 0.00
## Mode :character Median : 370.0 Mode :character Median : 0.00
## Mean : 442.6 Mean : 49.72
## 3rd Qu.: 734.0 3rd Qu.: 0.00
## Max. :5644.0 Max. :1526.00
## NA's :1 NA's :1
## Bsmt.Unf.SF Total.Bsmt.SF Heating Heating.QC
## Min. : 0.0 Min. : 0 Length:2930 Length:2930
## 1st Qu.: 219.0 1st Qu.: 793 Class :character Class :character
## Median : 466.0 Median : 990 Mode :character Mode :character
## Mean : 559.3 Mean :1052
## 3rd Qu.: 802.0 3rd Qu.:1302
## Max. :2336.0 Max. :6110
## NA's :1 NA's :1
## Central.Air Electrical X1st.Flr.SF X2nd.Flr.SF
## Length:2930 Length:2930 Min. : 334.0 Min. : 0.0
## Class :character Class :character 1st Qu.: 876.2 1st Qu.: 0.0
## Mode :character Mode :character Median :1084.0 Median : 0.0
## Mean :1159.6 Mean : 335.5
## 3rd Qu.:1384.0 3rd Qu.: 703.8
## Max. :5095.0 Max. :2065.0
##
## Low.Qual.Fin.SF Bsmt.Full.Bath Bsmt.Half.Bath Full.Bath
## Min. : 0.000 Min. :0.0000 Min. :0.00000 Min. :0.000
## 1st Qu.: 0.000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:1.000
## Median : 0.000 Median :0.0000 Median :0.00000 Median :2.000
## Mean : 4.677 Mean :0.4314 Mean :0.06113 Mean :1.567
## 3rd Qu.: 0.000 3rd Qu.:1.0000 3rd Qu.:0.00000 3rd Qu.:2.000
## Max. :1064.000 Max. :3.0000 Max. :2.00000 Max. :4.000
## NA's :2 NA's :2
## Half.Bath Bedroom.AbvGr Kitchen.AbvGr Kitchen.Qual
## Min. :0.0000 Min. :0.000 Min. :0.000 Length:2930
## 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:1.000 Class :character
## Median :0.0000 Median :3.000 Median :1.000 Mode :character
## Mean :0.3795 Mean :2.854 Mean :1.044
## 3rd Qu.:1.0000 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :2.0000 Max. :8.000 Max. :3.000
##
## TotRms.AbvGrd Functional Fireplaces Fireplace.Qu
## Min. : 2.000 Length:2930 Min. :0.0000 Length:2930
## 1st Qu.: 5.000 Class :character 1st Qu.:0.0000 Class :character
## Median : 6.000 Mode :character Median :1.0000 Mode :character
## Mean : 6.443 Mean :0.5993
## 3rd Qu.: 7.000 3rd Qu.:1.0000
## Max. :15.000 Max. :4.0000
##
## Garage.Type Garage.Yr.Blt Garage.Finish Garage.Cars
## Length:2930 Min. :1895 Length:2930 Min. :0.000
## Class :character 1st Qu.:1960 Class :character 1st Qu.:1.000
## Mode :character Median :1979 Mode :character Median :2.000
## Mean :1978 Mean :1.767
## 3rd Qu.:2002 3rd Qu.:2.000
## Max. :2207 Max. :5.000
## NA's :159 NA's :1
## Garage.Area Garage.Qual Garage.Cond Paved.Drive
## Min. : 0.0 Length:2930 Length:2930 Length:2930
## 1st Qu.: 320.0 Class :character Class :character Class :character
## Median : 480.0 Mode :character Mode :character Mode :character
## Mean : 472.8
## 3rd Qu.: 576.0
## Max. :1488.0
## NA's :1
## Wood.Deck.SF Open.Porch.SF Enclosed.Porch X3Ssn.Porch
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 0.00 Median : 27.00 Median : 0.00 Median : 0.000
## Mean : 93.75 Mean : 47.53 Mean : 23.01 Mean : 2.592
## 3rd Qu.: 168.00 3rd Qu.: 70.00 3rd Qu.: 0.00 3rd Qu.: 0.000
## Max. :1424.00 Max. :742.00 Max. :1012.00 Max. :508.000
##
## Screen.Porch Pool.Area Pool.QC Fence
## Min. : 0 Min. : 0.000 Length:2930 Length:2930
## 1st Qu.: 0 1st Qu.: 0.000 Class :character Class :character
## Median : 0 Median : 0.000 Mode :character Mode :character
## Mean : 16 Mean : 2.243
## 3rd Qu.: 0 3rd Qu.: 0.000
## Max. :576 Max. :800.000
##
## Misc.Feature Misc.Val Mo.Sold Yr.Sold
## Length:2930 Min. : 0.00 Min. : 1.000 Min. :2006
## Class :character 1st Qu.: 0.00 1st Qu.: 4.000 1st Qu.:2007
## Mode :character Median : 0.00 Median : 6.000 Median :2008
## Mean : 50.64 Mean : 6.216 Mean :2008
## 3rd Qu.: 0.00 3rd Qu.: 8.000 3rd Qu.:2009
## Max. :17000.00 Max. :12.000 Max. :2010
##
## Sale.Type Sale.Condition
## Length:2930 Length:2930
## Class :character Class :character
## Mode :character Mode :character
##
##
##
##
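I keep only the variables needed for the analysis: sale price, the three predictors, and Neighborhood, which is used only for a descriptive plot. I then remove rows with missing values in any of the model variables.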
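ames_clean <- ames |>
  # keep the response, the three predictors, and Neighborhood (plotting only)
  select(price, area, Overall.Qual, Garage.Area, Neighborhood) |>
  # drop rows missing any variable used in the regression models
  filter(!is.na(price), !is.na(area), !is.na(Overall.Qual), !is.na(Garage.Area))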
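For readers who prefer base R, the same row filtering can be sketched with complete.cases(). The small data frame below is hypothetical: its values are made up and are not rows from ames.csv. It only shows that a row with a missing Garage.Area is dropped. In the tidyverse, tidyr::drop_na(price, area, Overall.Qual, Garage.Area) is a shorter equivalent of the filter() call above.

```r
# Hypothetical toy data standing in for ames (made-up values)
toy <- data.frame(
  price        = c(200000, 150000, 175000),
  area         = c(1500, 1000, 1300),
  Overall.Qual = c(6, 5, 6),
  Garage.Area  = c(500, NA, 300)
)

# complete.cases() is TRUE for rows with no NA in the listed columns
model_vars <- c("price", "area", "Overall.Qual", "Garage.Area")
keep <- complete.cases(toy[, model_vars])
toy_clean <- toy[keep, ]

nrow(toy) - nrow(toy_clean)  # number of rows dropped: 1
```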
ggplot(ames_clean, aes(x = price)) +
  geom_histogram(bins = 30, fill = "orange", color = "black") +
  labs(title = "Distribution of House Prices",
       x = "Sale price (USD)",
       y = "Count")
ggplot(ames_clean, aes(x = price, y = Neighborhood, fill = Neighborhood)) +
  # the y-axis already names each neighborhood, so the fill legend is redundant
  geom_boxplot(show.legend = FALSE) +
  labs(title = "House Prices by Neighborhood",
       x = "Sale price (USD)",
       y = "Neighborhood")
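The cleaned dataset keeps sale price, the three predictors, and Neighborhood. Among the model variables, only Garage.Area has a missing value (one home). As a result, ames_clean retains 2,929 of the 2,930 homes, which is consistent with the 2,927 residual degrees of freedom in the simple regression below. The summary statistics also show that sale prices are right-skewed: the mean ($180,796) exceeds the median ($160,000), and the maximum reaches $755,000.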
The primary objective of this analysis is to determine whether a home’s living area, overall quality, and garage area significantly predict its selling price. Multiple linear regression is appropriate because the response variable, sale price, is quantitative and there are several numeric predictors. Overall.Qual is an ordinal rating from 1 to 10 that the model treats as numeric. This assumes that each one-point step has the same effect on price. Before fitting the regression model, I explore the relationships between the variables with a correlation matrix.
correlation_data <- ames_clean |>
select(price, area, Overall.Qual, Garage.Area)
cor(correlation_data)
## price area Overall.Qual Garage.Area
## price 1.0000000 0.7069307 0.7992639 0.6404008
## area 0.7069307 1.0000000 0.5708278 0.4848923
## Overall.Qual 0.7992639 0.5708278 1.0000000 0.5635025
## Garage.Area 0.6404008 0.4848923 0.5635025 1.0000000
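The correlation matrix above summarizes the strength and direction of the linear relationship between each pair of variables. All correlations are positive, so higher values of each predictor tend to accompany higher prices. Overall quality has the strongest correlation with price (0.80), followed by living area (0.71) and garage area (0.64). The predictors are also moderately correlated with one another (0.48 to 0.57), so part of the information they carry about price overlaps.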
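One way to gauge whether that overlap among predictors is a problem is the variance inflation factor (VIF). VIFs are not part of the original analysis. The sketch below computes them from the predictor correlations printed above, using the identity that the VIFs are the diagonal of the inverse correlation matrix. Values near 1 mean little inflation. Common rules of thumb flag values above about 5 or 10.

```r
# Correlations among the three predictors, copied from the cor() output above
preds <- c("area", "Overall.Qual", "Garage.Area")
R <- matrix(
  c(1.0000000, 0.5708278, 0.4848923,
    0.5708278, 1.0000000, 0.5635025,
    0.4848923, 0.5635025, 1.0000000),
  nrow = 3, byrow = TRUE,
  dimnames = list(preds, preds)
)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal of solve(R)
vif <- diag(solve(R))
round(vif, 2)
```

With these correlations, every VIF falls between roughly 1.5 and 1.8. This suggests the overlap among predictors is moderate rather than severe.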
As a baseline, I first fit a simple linear regression of price on living area alone.
house_model1 <- lm(price ~ area, data = ames_clean)
summary(house_model1)
##
## Call:
## lm(formula = price ~ area, data = ames_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -483611 -30182 -1961 22742 334275
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13268.557 3269.535 4.058 5.07e-05 ***
## area 111.723 2.066 54.075 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56520 on 2927 degrees of freedom
## Multiple R-squared: 0.4998, Adjusted R-squared: 0.4996
## F-statistic: 2924 on 1 and 2927 DF, p-value: < 2.2e-16
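In a simple regression, R-squared equals the squared correlation between the response and the predictor. The sketch below checks this against the printed correlation of 0.7069307. It then confirms the identity on a small simulated dataset, which is hypothetical and unrelated to Ames.

```r
# Correlation between price and area, copied from the cor() output above
r_price_area <- 0.7069307
r_price_area^2  # about 0.49975, i.e. 0.4998 when rounded to 4 digits

# General check on simulated data: in simple regression, R^2 equals cor(x, y)^2
set.seed(1)
x <- rnorm(100)
y <- 2 * x + rnorm(100)
all.equal(summary(lm(y ~ x))$r.squared, cor(x, y)^2)  # TRUE
```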
I then add overall quality and garage area to fit the full multiple regression model.
house_model <- lm(price ~ area + Overall.Qual + Garage.Area, data = ames_clean)
house_model
##
## Call:
## lm(formula = price ~ area + Overall.Qual + Garage.Area, data = ames_clean)
##
## Coefficients:
## (Intercept) area Overall.Qual Garage.Area
## -104174.44 51.07 28392.86 74.73
## Model Summary
summary(house_model)
##
## Call:
## lm(formula = price ~ area + Overall.Qual + Garage.Area, data = ames_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -413865 -21713 -1702 18451 292647
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.042e+05 3.268e+03 -31.88 <2e-16 ***
## area 5.107e+01 1.804e+00 28.32 <2e-16 ***
## Overall.Qual 2.839e+04 6.841e+02 41.51 <2e-16 ***
## Garage.Area 7.473e+01 4.214e+00 17.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 39320 on 2925 degrees of freedom
## Multiple R-squared: 0.7581, Adjusted R-squared: 0.7578
## F-statistic: 3055 on 3 and 2925 DF, p-value: < 2.2e-16
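The adjusted R-squared and the overall F-statistic are both functions of R-squared, the sample size, and the number of predictors. The sketch below reproduces them from the printed values, with n = 2,929 inferred from the 2,925 residual degrees of freedom. Small differences in the last digit occur because R-squared is printed rounded.

```r
# Values reported by summary(house_model) above
r2 <- 0.7581  # Multiple R-squared (rounded in the printout)
n  <- 2929    # 2925 residual df + 3 slopes + 1 intercept
p  <- 3       # number of predictors

# Adjusted R^2 penalizes R^2 for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Overall F-statistic for H0: all three slopes are zero
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

round(c(adj_r2 = adj_r2, F = f_stat), 4)
```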
Slope (Living Area): For each additional square foot of living area, the predicted house price increases by approximately $51.07, holding overall quality and garage area constant.
Slope (Overall Quality): For each 1-point increase in the overall quality rating, the predicted house price increases by approximately $28,392.86, holding living area and garage area constant.
Slope (Garage Area): For each additional square foot of garage area, the predicted house price increases by approximately $74.73, holding living area and overall quality constant.
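"Holding the other predictors constant" can be made concrete by comparing two hypothetical houses that differ in only one predictor. When you take the difference, the intercept and the unchanged terms cancel, leaving that predictor's slope times the change. The helper price_hat() below is a hand-written illustration using the rounded coefficients, not a replacement for predict().

```r
# Coefficients of house_model, rounded as printed above
b <- c(intercept = -104174.44, area = 51.07, qual = 28392.86, garage = 74.73)

# Hypothetical helper: predicted price from the fitted equation
price_hat <- function(area, qual, garage) {
  unname(b["intercept"] + b["area"] * area + b["qual"] * qual + b["garage"] * garage)
}

# Hypothetical houses that differ only in living area (by 100 sq ft)
price_hat(1600, 6, 480) - price_hat(1500, 6, 480)  # about 5107 = 100 * 51.07

# Hypothetical houses that differ only in quality (by one point)
price_hat(1500, 7, 480) - price_hat(1500, 6, 480)  # about 28392.86
```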
P-values: All three predictors have p-values below 0.001. This indicates that living area, overall quality, and garage area are each statistically significant predictors of house price, given the other two. The overall model is also highly significant (F-test p-value < 0.001), providing strong evidence that the predictors jointly explain variation in house prices.
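The intercept (-104,174.44) is the predicted house price when living area, overall quality, and garage area are all zero. These values are impossible for a house (quality ratings start at 1), so the intercept has no practical interpretation; it only positions the fitted plane. Extrapolating toward the low end of the data is risky in general. For example, consider a hypothetical house at the smallest observed value of each predictor: 334 square feet of living area, a quality rating of 1, and no garage. For this house, the fitted equation gives a negative price of about -$58,700.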
The fitted regression model has the form
Price = −104174.44 + 51.07(Area) + 28392.86(Overall.Qual) + 74.73(Garage.Area)
where
Price is the predicted selling price (USD).
Area is the home’s living area (square feet).
Overall.Qual is the overall construction quality rating (1 to 10).
Garage.Area is the garage size (square feet).
ggplot(ames_clean, aes(x = area, y = price)) +
  geom_point(alpha = 0.3) +
  # simple regression of price on area only; formula stated to silence the message
  geom_smooth(method = "lm", formula = y ~ x, color = "red", se = FALSE) +
  labs(title = "Regression Line: Living Area vs House Price",
       x = "Living area (square feet)",
       y = "Sale price (USD)")
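The scatterplot with the fitted regression line illustrates the positive relationship between living area and house price. This line comes from a simple regression of price on living area alone. Its slope therefore matches house_model1 (about $111.72 per square foot), not the $51.07 partial slope from the multiple regression, which holds quality and garage area fixed.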
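Why does the slope on living area fall from about $111.72 to $51.07 once quality is added? Because area and quality are positively correlated, a regression on area alone absorbs part of quality's effect. The sketch below shows this with simulated data. The numbers are hypothetical, chosen only to loosely resemble the Ames setting. The true partial slope of x1 is 50, but the simple regression estimates roughly 110.

```r
set.seed(42)
n  <- 2000
x1 <- rnorm(n, mean = 1500, sd = 500)            # hypothetical stand-in for living area
x2 <- 3 + 0.002 * x1 + rnorm(n, sd = 1)          # hypothetical quality, correlated with x1
y  <- 50 * x1 + 30000 * x2 + rnorm(n, sd = 40000) # true partial slope of x1 is 50

simple_slope   <- coef(lm(y ~ x1))[["x1"]]       # absorbs x2's effect via the correlation
multiple_slope <- coef(lm(y ~ x1 + x2))[["x1"]]  # close to the true value of 50
round(c(simple = simple_slope, multiple = multiple_slope), 1)
```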
plot(house_model, which = 1)  # residuals vs fitted values
plot(house_model, which = 2)  # normal Q-Q plot of the residuals
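In the normal Q-Q plot, the points deviate noticeably from the reference line at both ends. This indicates that the residuals have heavier tails than a normal distribution and that a few large outliers are present. The most extreme residuals are about -$414,000 and +$293,000.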
rmse <- sqrt(mean(residuals(house_model)^2))
rmse
## [1] 39293.87
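The root mean squared error (RMSE) indicates that the model’s predictions typically miss the actual selling price by about $39,294. Because errors are squared before averaging, RMSE gives extra weight to large misses, so it is not the same as the average absolute error. It is also computed on the same data used to fit the model, so it likely understates the error on new homes. A smaller RMSE indicates better predictive accuracy.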
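The sketch below contrasts RMSE with mean absolute error (MAE) on a hypothetical set of residuals, where a single large miss pulls RMSE well above MAE. It also shows how the RMSE above relates to the "Residual standard error" in summary(house_model). The residual standard error divides the sum of squared residuals by the residual degrees of freedom (2,925) rather than by n (2,929). The helper names are illustrative; they avoid overwriting the rmse object defined above.

```r
calc_rmse <- function(res) sqrt(mean(res^2))
calc_mae  <- function(res) mean(abs(res))

# Hypothetical residuals: four small errors and one large one
res <- c(1000, -1000, 1000, -1000, 20000)
c(rmse = calc_rmse(res), mae = calc_mae(res))  # about 8989 vs 4800

# Convert the reported RMSE to the residual standard error: sqrt(SSE / (n - p - 1))
n_obs    <- 2929
df_resid <- 2925
rse_from_rmse <- 39293.87 * sqrt(n_obs / df_resid)
rse_from_rmse  # about 39320.7, matching "Residual standard error: 39320"
```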
new_house <- data.frame(area = 2000, Overall.Qual = 7, Garage.Area = 500)
predict(house_model, newdata = new_house)
## 1
## 234083.1
The model predicts that a house with 2,000 square feet of living area, an overall quality rating of 7, and a garage area of 500 square feet will have an estimated selling price of approximately $234,083.
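The prediction can be checked by plugging the new house into the fitted equation by hand. Using the rounded coefficients gives about $234,080.58, slightly below the predict() result of $234,083.1, because predict() uses the unrounded estimates. If a range is wanted rather than a single number, predict.lm also accepts interval = "prediction", which would return a prediction interval for an individual house.

```r
# Coefficients (intercept, area, Overall.Qual, Garage.Area), rounded as printed above
b <- c(-104174.44, 51.07, 28392.86, 74.73)

# Leading 1 multiplies the intercept; then the new_house values
x <- c(1, 2000, 7, 500)

pred_by_hand <- sum(b * x)
pred_by_hand  # about 234080.6
```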
The multiple linear regression model shows that living area, overall quality, and garage area are all significant predictors of house price (p < 0.001). Specifically, for every 1-square-foot increase in living area, the predicted house price increases by about $51.07, holding the other variables constant. For every 1-point increase in overall quality, the predicted house price increases by about $28,392.86. For every 1-square-foot increase in garage area, the predicted price increases by about $74.73, assuming the other predictors remain unchanged. The model explains approximately 75.8% of the variation in house prices (R² = 0.758), which indicates a strong in-sample fit. Its accuracy on new homes has not been tested.
Multiple linear regression quantifies how measurable housing characteristics are associated with selling price, which gives a clearer picture of the housing market in Ames, Iowa.
The objective of this project was to investigate whether key housing characteristics could predict the selling price of homes in Ames, Iowa. Using multiple linear regression, the relationship between house price and three predictors (living area, overall quality, and garage area) was examined.
The results indicate that these variables are useful predictors of house price: larger homes, higher construction quality, and larger garages are associated with higher selling prices. The full model explains about 75.8% of the variation in sale prices (R² = 0.758), compared with about 50% for living area alone. Its RMSE of about $39,294 summarizes the typical size of its in-sample prediction errors.
Overall, the analysis shows that structural characteristics explain much of the variation in residential property values. The neighborhood boxplot suggests that location matters as well, although Neighborhood was not included in the regression model.
Several improvements could strengthen this model in future studies.
Additional predictors such as the year built, basement area, number of bathrooms, lot size, and overall condition could be incorporated into a more comprehensive regression model. Feature selection techniques could also be used to determine the most influential variables. Furthermore, interaction effects between variables and nonlinear regression models may improve predictive performance.
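Interaction effects, mentioned above, can be sketched with R's formula syntax: area * qual expands to area + qual + area:qual. The example below uses simulated, hypothetical data in which the value of each square foot rises with quality. It compares an additive fit with an interaction fit using anova(). This illustrates the technique and is not a fitted Ames result.

```r
set.seed(7)
n    <- 1000
area <- runif(n, 800, 3000)             # hypothetical living area
qual <- sample(3:9, n, replace = TRUE)  # hypothetical quality rating

# Simulated price where the per-square-foot value grows with quality (true area:qual = 8)
price <- 20000 + 20 * area + 10000 * qual + 8 * area * qual + rnorm(n, sd = 20000)

fit_add <- lm(price ~ area + qual)
fit_int <- lm(price ~ area * qual)      # adds the area:qual term

anova(fit_add, fit_int)                 # F-test for the added interaction term
coef(fit_int)[["area:qual"]]            # estimated interaction slope
```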
Future research may also compare several regression models using cross-validation to determine which model produces the most accurate predictions.
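A minimal k-fold cross-validation for lm() can be written in base R, as sketched below. The function cv_rmse() is a hypothetical helper, not from a package. It reseeds the random number generator so that every model is scored on the same folds, and it returns the average held-out RMSE. The demonstration uses simulated data in place of ames_clean. With the real data, a call such as cv_rmse(price ~ area + Overall.Qual + Garage.Area, ames_clean) should work in the same way, although it has not been run here.

```r
# Hypothetical helper: average held-out RMSE of lm(formula) over k random folds
cv_rmse <- function(formula, data, k = 5, seed = 1) {
  set.seed(seed)                                          # same folds for every model
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  response <- all.vars(formula)[1]
  fold_rmse <- numeric(k)
  for (i in seq_len(k)) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- lm(formula, data = train)
    err   <- test[[response]] - predict(fit, newdata = test)
    fold_rmse[i] <- sqrt(mean(err^2))
  }
  mean(fold_rmse)
}

# Simulated, hypothetical data standing in for ames_clean
set.seed(123)
n_sim <- 500
sim <- data.frame(area = runif(n_sim, 800, 3000),
                  qual = sample(3:9, n_sim, replace = TRUE))
sim$price <- -100000 + 50 * sim$area + 28000 * sim$qual + rnorm(n_sim, sd = 40000)

cv_rmse(price ~ area, sim)         # omits quality, so larger held-out error
cv_rmse(price ~ area + qual, sim)  # matches the simulated structure
```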
OpenIntro. (2024). Ames Housing Dataset.
https://www.openintro.org/data/
Wickham, H., Averick, M., Bryan, J., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.
R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Quarto Documentation.