Introduction

Research Question

To what extent do a home’s living area, overall quality, and garage area predict its sale price in Ames, Iowa?

Housing prices are influenced by a combination of structural and quality-related factors. Understanding these relationships is essential in real estate economics, valuation modeling, and data-driven decision-making. This project uses regression analysis to quantify how physical attributes of a house impact its market price.

The dataset used in this study contains detailed information on residential properties in Ames, Iowa, including structural characteristics, quality ratings, and sale prices.

Each row represents a single house, and each column represents a measurable feature of that house.

Dataset Source

OpenIntro Ames Housing Dataset
https://www.openintro.org/data/

The dataset contains:

  • 2,930 observations
  • 82 variables

Load Libraries

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Import Dataset

ames <- read_csv("ames.csv")
## Rows: 2930 Columns: 82
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (43): MS.Zoning, Street, Alley, Lot.Shape, Land.Contour, Utilities, Lot....
## dbl (39): Order, PID, area, price, MS.SubClass, Lot.Frontage, Lot.Area, Over...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Inspection

First Rows

head(ames)
## # A tibble: 6 × 82
##   Order      PID  area  price MS.SubClass MS.Zoning Lot.Frontage Lot.Area Street
##   <dbl>    <dbl> <dbl>  <dbl>       <dbl> <chr>            <dbl>    <dbl> <chr> 
## 1     1   5.26e8  1656 215000          20 RL                 141    31770 Pave  
## 2     2   5.26e8   896 105000          20 RH                  80    11622 Pave  
## 3     3   5.26e8  1329 172000          20 RL                  81    14267 Pave  
## 4     4   5.26e8  2110 244000          20 RL                  93    11160 Pave  
## 5     5   5.27e8  1629 189900          60 RL                  74    13830 Pave  
## 6     6   5.27e8  1604 195500          60 RL                  78     9978 Pave  
## # ℹ 73 more variables: Alley <chr>, Lot.Shape <chr>, Land.Contour <chr>,
## #   Utilities <chr>, Lot.Config <chr>, Land.Slope <chr>, Neighborhood <chr>,
## #   Condition.1 <chr>, Condition.2 <chr>, Bldg.Type <chr>, House.Style <chr>,
## #   Overall.Qual <dbl>, Overall.Cond <dbl>, Year.Built <dbl>,
## #   Year.Remod.Add <dbl>, Roof.Style <chr>, Roof.Matl <chr>,
## #   Exterior.1st <chr>, Exterior.2nd <chr>, Mas.Vnr.Type <chr>,
## #   Mas.Vnr.Area <dbl>, Exter.Qual <chr>, Exter.Cond <chr>, Foundation <chr>, …

Structure

str(ames)
## spc_tbl_ [2,930 × 82] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Order          : num [1:2930] 1 2 3 4 5 6 7 8 9 10 ...
##  $ PID            : num [1:2930] 5.26e+08 5.26e+08 5.26e+08 5.26e+08 5.27e+08 ...
##  $ area           : num [1:2930] 1656 896 1329 2110 1629 ...
##  $ price          : num [1:2930] 215000 105000 172000 244000 189900 ...
##  $ MS.SubClass    : num [1:2930] 20 20 20 20 60 60 120 120 120 60 ...
##  $ MS.Zoning      : chr [1:2930] "RL" "RH" "RL" "RL" ...
##  $ Lot.Frontage   : num [1:2930] 141 80 81 93 74 78 41 43 39 60 ...
##  $ Lot.Area       : num [1:2930] 31770 11622 14267 11160 13830 ...
##  $ Street         : chr [1:2930] "Pave" "Pave" "Pave" "Pave" ...
##  $ Alley          : chr [1:2930] NA NA NA NA ...
##  $ Lot.Shape      : chr [1:2930] "IR1" "Reg" "IR1" "Reg" ...
##  $ Land.Contour   : chr [1:2930] "Lvl" "Lvl" "Lvl" "Lvl" ...
##  $ Utilities      : chr [1:2930] "AllPub" "AllPub" "AllPub" "AllPub" ...
##  $ Lot.Config     : chr [1:2930] "Corner" "Inside" "Corner" "Corner" ...
##  $ Land.Slope     : chr [1:2930] "Gtl" "Gtl" "Gtl" "Gtl" ...
##  $ Neighborhood   : chr [1:2930] "NAmes" "NAmes" "NAmes" "NAmes" ...
##  $ Condition.1    : chr [1:2930] "Norm" "Feedr" "Norm" "Norm" ...
##  $ Condition.2    : chr [1:2930] "Norm" "Norm" "Norm" "Norm" ...
##  $ Bldg.Type      : chr [1:2930] "1Fam" "1Fam" "1Fam" "1Fam" ...
##  $ House.Style    : chr [1:2930] "1Story" "1Story" "1Story" "1Story" ...
##  $ Overall.Qual   : num [1:2930] 6 5 6 7 5 6 8 8 8 7 ...
##  $ Overall.Cond   : num [1:2930] 5 6 6 5 5 6 5 5 5 5 ...
##  $ Year.Built     : num [1:2930] 1960 1961 1958 1968 1997 ...
##  $ Year.Remod.Add : num [1:2930] 1960 1961 1958 1968 1998 ...
##  $ Roof.Style     : chr [1:2930] "Hip" "Gable" "Hip" "Hip" ...
##  $ Roof.Matl      : chr [1:2930] "CompShg" "CompShg" "CompShg" "CompShg" ...
##  $ Exterior.1st   : chr [1:2930] "BrkFace" "VinylSd" "Wd Sdng" "BrkFace" ...
##  $ Exterior.2nd   : chr [1:2930] "Plywood" "VinylSd" "Wd Sdng" "BrkFace" ...
##  $ Mas.Vnr.Type   : chr [1:2930] "Stone" "None" "BrkFace" "None" ...
##  $ Mas.Vnr.Area   : num [1:2930] 112 0 108 0 0 20 0 0 0 0 ...
##  $ Exter.Qual     : chr [1:2930] "TA" "TA" "TA" "Gd" ...
##  $ Exter.Cond     : chr [1:2930] "TA" "TA" "TA" "TA" ...
##  $ Foundation     : chr [1:2930] "CBlock" "CBlock" "CBlock" "CBlock" ...
##  $ Bsmt.Qual      : chr [1:2930] "TA" "TA" "TA" "TA" ...
##  $ Bsmt.Cond      : chr [1:2930] "Gd" "TA" "TA" "TA" ...
##  $ Bsmt.Exposure  : chr [1:2930] "Gd" "No" "No" "No" ...
##  $ BsmtFin.Type.1 : chr [1:2930] "BLQ" "Rec" "ALQ" "ALQ" ...
##  $ BsmtFin.SF.1   : num [1:2930] 639 468 923 1065 791 ...
##  $ BsmtFin.Type.2 : chr [1:2930] "Unf" "LwQ" "Unf" "Unf" ...
##  $ BsmtFin.SF.2   : num [1:2930] 0 144 0 0 0 0 0 0 0 0 ...
##  $ Bsmt.Unf.SF    : num [1:2930] 441 270 406 1045 137 ...
##  $ Total.Bsmt.SF  : num [1:2930] 1080 882 1329 2110 928 ...
##  $ Heating        : chr [1:2930] "GasA" "GasA" "GasA" "GasA" ...
##  $ Heating.QC     : chr [1:2930] "Fa" "TA" "TA" "Ex" ...
##  $ Central.Air    : chr [1:2930] "Y" "Y" "Y" "Y" ...
##  $ Electrical     : chr [1:2930] "SBrkr" "SBrkr" "SBrkr" "SBrkr" ...
##  $ X1st.Flr.SF    : num [1:2930] 1656 896 1329 2110 928 ...
##  $ X2nd.Flr.SF    : num [1:2930] 0 0 0 0 701 678 0 0 0 776 ...
##  $ Low.Qual.Fin.SF: num [1:2930] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Bsmt.Full.Bath : num [1:2930] 1 0 0 1 0 0 1 0 1 0 ...
##  $ Bsmt.Half.Bath : num [1:2930] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Full.Bath      : num [1:2930] 1 1 1 2 2 2 2 2 2 2 ...
##  $ Half.Bath      : num [1:2930] 0 0 1 1 1 1 0 0 0 1 ...
##  $ Bedroom.AbvGr  : num [1:2930] 3 2 3 3 3 3 2 2 2 3 ...
##  $ Kitchen.AbvGr  : num [1:2930] 1 1 1 1 1 1 1 1 1 1 ...
##  $ Kitchen.Qual   : chr [1:2930] "TA" "TA" "Gd" "Ex" ...
##  $ TotRms.AbvGrd  : num [1:2930] 7 5 6 8 6 7 6 5 5 7 ...
##  $ Functional     : chr [1:2930] "Typ" "Typ" "Typ" "Typ" ...
##  $ Fireplaces     : num [1:2930] 2 0 0 2 1 1 0 0 1 1 ...
##  $ Fireplace.Qu   : chr [1:2930] "Gd" NA NA "TA" ...
##  $ Garage.Type    : chr [1:2930] "Attchd" "Attchd" "Attchd" "Attchd" ...
##  $ Garage.Yr.Blt  : num [1:2930] 1960 1961 1958 1968 1997 ...
##  $ Garage.Finish  : chr [1:2930] "Fin" "Unf" "Unf" "Fin" ...
##  $ Garage.Cars    : num [1:2930] 2 1 1 2 2 2 2 2 2 2 ...
##  $ Garage.Area    : num [1:2930] 528 730 312 522 482 470 582 506 608 442 ...
##  $ Garage.Qual    : chr [1:2930] "TA" "TA" "TA" "TA" ...
##  $ Garage.Cond    : chr [1:2930] "TA" "TA" "TA" "TA" ...
##  $ Paved.Drive    : chr [1:2930] "P" "Y" "Y" "Y" ...
##  $ Wood.Deck.SF   : num [1:2930] 210 140 393 0 212 360 0 0 237 140 ...
##  $ Open.Porch.SF  : num [1:2930] 62 0 36 0 34 36 0 82 152 60 ...
##  $ Enclosed.Porch : num [1:2930] 0 0 0 0 0 0 170 0 0 0 ...
##  $ X3Ssn.Porch    : num [1:2930] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Screen.Porch   : num [1:2930] 0 120 0 0 0 0 0 144 0 0 ...
##  $ Pool.Area      : num [1:2930] 0 0 0 0 0 0 0 0 0 0 ...
##  $ Pool.QC        : chr [1:2930] NA NA NA NA ...
##  $ Fence          : chr [1:2930] NA "MnPrv" NA NA ...
##  $ Misc.Feature   : chr [1:2930] NA NA "Gar2" NA ...
##  $ Misc.Val       : num [1:2930] 0 0 12500 0 0 0 0 0 0 0 ...
##  $ Mo.Sold        : num [1:2930] 5 6 6 4 3 6 4 1 3 6 ...
##  $ Yr.Sold        : num [1:2930] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ Sale.Type      : chr [1:2930] "WD" "WD" "WD" "WD" ...
##  $ Sale.Condition : chr [1:2930] "Normal" "Normal" "Normal" "Normal" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Order = col_double(),
##   ..   PID = col_double(),
##   ..   area = col_double(),
##   ..   price = col_double(),
##   ..   MS.SubClass = col_double(),
##   ..   MS.Zoning = col_character(),
##   ..   Lot.Frontage = col_double(),
##   ..   Lot.Area = col_double(),
##   ..   Street = col_character(),
##   ..   Alley = col_character(),
##   ..   Lot.Shape = col_character(),
##   ..   Land.Contour = col_character(),
##   ..   Utilities = col_character(),
##   ..   Lot.Config = col_character(),
##   ..   Land.Slope = col_character(),
##   ..   Neighborhood = col_character(),
##   ..   Condition.1 = col_character(),
##   ..   Condition.2 = col_character(),
##   ..   Bldg.Type = col_character(),
##   ..   House.Style = col_character(),
##   ..   Overall.Qual = col_double(),
##   ..   Overall.Cond = col_double(),
##   ..   Year.Built = col_double(),
##   ..   Year.Remod.Add = col_double(),
##   ..   Roof.Style = col_character(),
##   ..   Roof.Matl = col_character(),
##   ..   Exterior.1st = col_character(),
##   ..   Exterior.2nd = col_character(),
##   ..   Mas.Vnr.Type = col_character(),
##   ..   Mas.Vnr.Area = col_double(),
##   ..   Exter.Qual = col_character(),
##   ..   Exter.Cond = col_character(),
##   ..   Foundation = col_character(),
##   ..   Bsmt.Qual = col_character(),
##   ..   Bsmt.Cond = col_character(),
##   ..   Bsmt.Exposure = col_character(),
##   ..   BsmtFin.Type.1 = col_character(),
##   ..   BsmtFin.SF.1 = col_double(),
##   ..   BsmtFin.Type.2 = col_character(),
##   ..   BsmtFin.SF.2 = col_double(),
##   ..   Bsmt.Unf.SF = col_double(),
##   ..   Total.Bsmt.SF = col_double(),
##   ..   Heating = col_character(),
##   ..   Heating.QC = col_character(),
##   ..   Central.Air = col_character(),
##   ..   Electrical = col_character(),
##   ..   X1st.Flr.SF = col_double(),
##   ..   X2nd.Flr.SF = col_double(),
##   ..   Low.Qual.Fin.SF = col_double(),
##   ..   Bsmt.Full.Bath = col_double(),
##   ..   Bsmt.Half.Bath = col_double(),
##   ..   Full.Bath = col_double(),
##   ..   Half.Bath = col_double(),
##   ..   Bedroom.AbvGr = col_double(),
##   ..   Kitchen.AbvGr = col_double(),
##   ..   Kitchen.Qual = col_character(),
##   ..   TotRms.AbvGrd = col_double(),
##   ..   Functional = col_character(),
##   ..   Fireplaces = col_double(),
##   ..   Fireplace.Qu = col_character(),
##   ..   Garage.Type = col_character(),
##   ..   Garage.Yr.Blt = col_double(),
##   ..   Garage.Finish = col_character(),
##   ..   Garage.Cars = col_double(),
##   ..   Garage.Area = col_double(),
##   ..   Garage.Qual = col_character(),
##   ..   Garage.Cond = col_character(),
##   ..   Paved.Drive = col_character(),
##   ..   Wood.Deck.SF = col_double(),
##   ..   Open.Porch.SF = col_double(),
##   ..   Enclosed.Porch = col_double(),
##   ..   X3Ssn.Porch = col_double(),
##   ..   Screen.Porch = col_double(),
##   ..   Pool.Area = col_double(),
##   ..   Pool.QC = col_character(),
##   ..   Fence = col_character(),
##   ..   Misc.Feature = col_character(),
##   ..   Misc.Val = col_double(),
##   ..   Mo.Sold = col_double(),
##   ..   Yr.Sold = col_double(),
##   ..   Sale.Type = col_character(),
##   ..   Sale.Condition = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

Dimensions

dim(ames)
## [1] 2930   82

Summary Statistics

summary(ames)
##      Order             PID                 area          price       
##  Min.   :   1.0   Min.   :5.263e+08   Min.   : 334   Min.   : 12789  
##  1st Qu.: 733.2   1st Qu.:5.285e+08   1st Qu.:1126   1st Qu.:129500  
##  Median :1465.5   Median :5.355e+08   Median :1442   Median :160000  
##  Mean   :1465.5   Mean   :7.145e+08   Mean   :1500   Mean   :180796  
##  3rd Qu.:2197.8   3rd Qu.:9.072e+08   3rd Qu.:1743   3rd Qu.:213500  
##  Max.   :2930.0   Max.   :1.007e+09   Max.   :5642   Max.   :755000  
##                                                                      
##   MS.SubClass      MS.Zoning          Lot.Frontage       Lot.Area     
##  Min.   : 20.00   Length:2930        Min.   : 21.00   Min.   :  1300  
##  1st Qu.: 20.00   Class :character   1st Qu.: 58.00   1st Qu.:  7440  
##  Median : 50.00   Mode  :character   Median : 68.00   Median :  9436  
##  Mean   : 57.39                      Mean   : 69.22   Mean   : 10148  
##  3rd Qu.: 70.00                      3rd Qu.: 80.00   3rd Qu.: 11555  
##  Max.   :190.00                      Max.   :313.00   Max.   :215245  
##                                      NA's   :490                      
##     Street             Alley            Lot.Shape         Land.Contour      
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Utilities          Lot.Config         Land.Slope        Neighborhood      
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Condition.1        Condition.2         Bldg.Type         House.Style       
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Overall.Qual     Overall.Cond     Year.Built   Year.Remod.Add
##  Min.   : 1.000   Min.   :1.000   Min.   :1872   Min.   :1950  
##  1st Qu.: 5.000   1st Qu.:5.000   1st Qu.:1954   1st Qu.:1965  
##  Median : 6.000   Median :5.000   Median :1973   Median :1993  
##  Mean   : 6.095   Mean   :5.563   Mean   :1971   Mean   :1984  
##  3rd Qu.: 7.000   3rd Qu.:6.000   3rd Qu.:2001   3rd Qu.:2004  
##  Max.   :10.000   Max.   :9.000   Max.   :2010   Max.   :2010  
##                                                                
##   Roof.Style         Roof.Matl         Exterior.1st       Exterior.2nd      
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  Mas.Vnr.Type        Mas.Vnr.Area     Exter.Qual         Exter.Cond       
##  Length:2930        Min.   :   0.0   Length:2930        Length:2930       
##  Class :character   1st Qu.:   0.0   Class :character   Class :character  
##  Mode  :character   Median :   0.0   Mode  :character   Mode  :character  
##                     Mean   : 101.9                                        
##                     3rd Qu.: 164.0                                        
##                     Max.   :1600.0                                        
##                     NA's   :23                                            
##   Foundation         Bsmt.Qual          Bsmt.Cond         Bsmt.Exposure     
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  BsmtFin.Type.1      BsmtFin.SF.1    BsmtFin.Type.2      BsmtFin.SF.2    
##  Length:2930        Min.   :   0.0   Length:2930        Min.   :   0.00  
##  Class :character   1st Qu.:   0.0   Class :character   1st Qu.:   0.00  
##  Mode  :character   Median : 370.0   Mode  :character   Median :   0.00  
##                     Mean   : 442.6                      Mean   :  49.72  
##                     3rd Qu.: 734.0                      3rd Qu.:   0.00  
##                     Max.   :5644.0                      Max.   :1526.00  
##                     NA's   :1                           NA's   :1        
##   Bsmt.Unf.SF     Total.Bsmt.SF    Heating           Heating.QC       
##  Min.   :   0.0   Min.   :   0   Length:2930        Length:2930       
##  1st Qu.: 219.0   1st Qu.: 793   Class :character   Class :character  
##  Median : 466.0   Median : 990   Mode  :character   Mode  :character  
##  Mean   : 559.3   Mean   :1052                                        
##  3rd Qu.: 802.0   3rd Qu.:1302                                        
##  Max.   :2336.0   Max.   :6110                                        
##  NA's   :1        NA's   :1                                           
##  Central.Air         Electrical         X1st.Flr.SF      X2nd.Flr.SF    
##  Length:2930        Length:2930        Min.   : 334.0   Min.   :   0.0  
##  Class :character   Class :character   1st Qu.: 876.2   1st Qu.:   0.0  
##  Mode  :character   Mode  :character   Median :1084.0   Median :   0.0  
##                                        Mean   :1159.6   Mean   : 335.5  
##                                        3rd Qu.:1384.0   3rd Qu.: 703.8  
##                                        Max.   :5095.0   Max.   :2065.0  
##                                                                         
##  Low.Qual.Fin.SF    Bsmt.Full.Bath   Bsmt.Half.Bath      Full.Bath    
##  Min.   :   0.000   Min.   :0.0000   Min.   :0.00000   Min.   :0.000  
##  1st Qu.:   0.000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:1.000  
##  Median :   0.000   Median :0.0000   Median :0.00000   Median :2.000  
##  Mean   :   4.677   Mean   :0.4314   Mean   :0.06113   Mean   :1.567  
##  3rd Qu.:   0.000   3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:2.000  
##  Max.   :1064.000   Max.   :3.0000   Max.   :2.00000   Max.   :4.000  
##                     NA's   :2        NA's   :2                        
##    Half.Bath      Bedroom.AbvGr   Kitchen.AbvGr   Kitchen.Qual      
##  Min.   :0.0000   Min.   :0.000   Min.   :0.000   Length:2930       
##  1st Qu.:0.0000   1st Qu.:2.000   1st Qu.:1.000   Class :character  
##  Median :0.0000   Median :3.000   Median :1.000   Mode  :character  
##  Mean   :0.3795   Mean   :2.854   Mean   :1.044                     
##  3rd Qu.:1.0000   3rd Qu.:3.000   3rd Qu.:1.000                     
##  Max.   :2.0000   Max.   :8.000   Max.   :3.000                     
##                                                                     
##  TotRms.AbvGrd     Functional          Fireplaces     Fireplace.Qu      
##  Min.   : 2.000   Length:2930        Min.   :0.0000   Length:2930       
##  1st Qu.: 5.000   Class :character   1st Qu.:0.0000   Class :character  
##  Median : 6.000   Mode  :character   Median :1.0000   Mode  :character  
##  Mean   : 6.443                      Mean   :0.5993                     
##  3rd Qu.: 7.000                      3rd Qu.:1.0000                     
##  Max.   :15.000                      Max.   :4.0000                     
##                                                                         
##  Garage.Type        Garage.Yr.Blt  Garage.Finish       Garage.Cars   
##  Length:2930        Min.   :1895   Length:2930        Min.   :0.000  
##  Class :character   1st Qu.:1960   Class :character   1st Qu.:1.000  
##  Mode  :character   Median :1979   Mode  :character   Median :2.000  
##                     Mean   :1978                      Mean   :1.767  
##                     3rd Qu.:2002                      3rd Qu.:2.000  
##                     Max.   :2207                      Max.   :5.000  
##                     NA's   :159                       NA's   :1      
##   Garage.Area     Garage.Qual        Garage.Cond        Paved.Drive       
##  Min.   :   0.0   Length:2930        Length:2930        Length:2930       
##  1st Qu.: 320.0   Class :character   Class :character   Class :character  
##  Median : 480.0   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 472.8                                                           
##  3rd Qu.: 576.0                                                           
##  Max.   :1488.0                                                           
##  NA's   :1                                                                
##   Wood.Deck.SF     Open.Porch.SF    Enclosed.Porch     X3Ssn.Porch     
##  Min.   :   0.00   Min.   :  0.00   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:   0.00   1st Qu.:  0.00   1st Qu.:   0.00   1st Qu.:  0.000  
##  Median :   0.00   Median : 27.00   Median :   0.00   Median :  0.000  
##  Mean   :  93.75   Mean   : 47.53   Mean   :  23.01   Mean   :  2.592  
##  3rd Qu.: 168.00   3rd Qu.: 70.00   3rd Qu.:   0.00   3rd Qu.:  0.000  
##  Max.   :1424.00   Max.   :742.00   Max.   :1012.00   Max.   :508.000  
##                                                                        
##   Screen.Porch   Pool.Area         Pool.QC             Fence          
##  Min.   :  0   Min.   :  0.000   Length:2930        Length:2930       
##  1st Qu.:  0   1st Qu.:  0.000   Class :character   Class :character  
##  Median :  0   Median :  0.000   Mode  :character   Mode  :character  
##  Mean   : 16   Mean   :  2.243                                        
##  3rd Qu.:  0   3rd Qu.:  0.000                                        
##  Max.   :576   Max.   :800.000                                        
##                                                                       
##  Misc.Feature          Misc.Val           Mo.Sold          Yr.Sold    
##  Length:2930        Min.   :    0.00   Min.   : 1.000   Min.   :2006  
##  Class :character   1st Qu.:    0.00   1st Qu.: 4.000   1st Qu.:2007  
##  Mode  :character   Median :    0.00   Median : 6.000   Median :2008  
##                     Mean   :   50.64   Mean   : 6.216   Mean   :2008  
##                     3rd Qu.:    0.00   3rd Qu.: 8.000   3rd Qu.:2009  
##                     Max.   :17000.00   Max.   :12.000   Max.   :2010  
##                                                                       
##   Sale.Type         Sale.Condition    
##  Length:2930        Length:2930       
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 

Data Cleaning

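I keep only the variables used in the analysis (the response, the three predictors, and Neighborhood for one exploratory plot) and drop rows with a missing value in any modeled variable. In the summary above, only Garage.Area has a missing value among these variables (one row), so 2,929 of the 2,930 observations remain.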

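# Keep the response, the three predictors, and Neighborhood (plot only),
# then drop rows with a missing value in any modeled variable.
ames_clean <- ames |>
  select(price, area, Overall.Qual, Garage.Area, Neighborhood) |>
  filter(!is.na(price), !is.na(area), !is.na(Overall.Qual), !is.na(Garage.Area))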
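
Before dropping rows, it can help to count how many missing values each variable has and how many rows the filter removes. The following is a minimal base R sketch on a small made-up data frame (`toy` and its values are hypothetical, not the Ames data), assuming the same complete-case rule as above.

```r
# Hypothetical toy data standing in for a few Ames columns
toy <- data.frame(
  price       = c(215000, 105000, NA, 244000),
  area        = c(1656, 896, 1329, 2110),
  Garage.Area = c(528, NA, 312, 522)
)

# Number of missing values per column
na_counts <- colSums(is.na(toy))

# Keep only rows with no missing value in any column
toy_clean <- toy[complete.cases(toy), ]

# Number of rows removed by the complete-case filter
rows_dropped <- nrow(toy) - nrow(toy_clean)

print(na_counts)
print(rows_dropped)
```

Running the same two checks on the real data would confirm how many observations the cleaning step discards.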

Exploratory Data Analysis (EDA)

Distribution of House Prices

ggplot(ames_clean, aes(x = price)) +
  geom_histogram(bins = 30, fill = "orange", color = "black") +
  labs(title = "Distribution of House Prices",
       x = "Price",
       y = "Count")


Price by Neighborhood (Basic grouping preview)

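# Neighborhood goes on the x-axis so the axis labels and rotated tick text match the data.
# The fill legend repeats the x-axis categories, so it is hidden.
ggplot(ames_clean, aes(x = Neighborhood, y = price, fill = Neighborhood)) +
  geom_boxplot(show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "House Prices by Neighborhood",
       x = "Neighborhood",
       y = "Price")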


Summary of Cleaning Step

The dataset has been reduced to key predictive variables relevant to housing price modeling. This ensures a clean structure for regression analysis in the next section.

Regression Analysis

The primary objective of this analysis is to determine whether a home’s living area, overall quality, and garage area can significantly predict its selling price. Multiple linear regression is appropriate because the response variable, house price, is quantitative, and there are multiple predictor variables that can be treated as quantitative (overall quality is an ordinal 1 to 10 rating that is entered as a number here). Before fitting the regression model, the relationships between the variables are explored using correlation analysis.


Correlation Analysis

correlation_data <- ames_clean |>
  select(price, area, Overall.Qual, Garage.Area)

cor(correlation_data)
##                  price      area Overall.Qual Garage.Area
## price        1.0000000 0.7069307    0.7992639   0.6404008
## area         0.7069307 1.0000000    0.5708278   0.4848923
## Overall.Qual 0.7992639 0.5708278    1.0000000   0.5635025
## Garage.Area  0.6404008 0.4848923    0.5635025   1.0000000

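The correlation matrix above summarizes the strength and direction of the linear relationship between each pair of variables. All correlations are positive, so as one variable increases, the other tends to increase as well. Price is most strongly correlated with overall quality (0.80), followed by living area (0.71) and garage area (0.64). The predictors are also moderately correlated with one another (0.48 to 0.57), so each coefficient in the multiple regression will differ from the slope that predictor would have on its own.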
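
One common way to judge whether correlations among predictors are large enough to cause concern is the variance inflation factor (VIF), which is not part of the original analysis. As a sketch, the VIFs can be obtained as the diagonal of the inverse of the predictor correlation matrix. The values below are copied from the correlation output above; the object names `pred_cor` and `vif` are illustrative.

```r
# Correlations among the three predictors, copied from the cor() output
pred_names <- c("area", "Overall.Qual", "Garage.Area")
pred_cor <- matrix(
  c(1.0000000, 0.5708278, 0.4848923,
    0.5708278, 1.0000000, 0.5635025,
    0.4848923, 0.5635025, 1.0000000),
  nrow = 3, byrow = TRUE,
  dimnames = list(pred_names, pred_names)
)

# VIF for each predictor = diagonal of the inverse correlation matrix
vif <- diag(solve(pred_cor))

print(round(vif, 2))
```

All three values come out below 2, well under the thresholds of 5 or 10 that are often used as rules of thumb, which suggests the predictors overlap but not to a problematic degree.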


Fit the Multiple Linear Regression Model

# Baseline: simple linear regression on living area alone
house_model1 <- lm(price ~ area, data = ames_clean)

summary(house_model1)
## 
## Call:
## lm(formula = price ~ area, data = ames_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -483611  -30182   -1961   22742  334275 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13268.557   3269.535   4.058 5.07e-05 ***
## area          111.723      2.066  54.075  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56520 on 2927 degrees of freedom
## Multiple R-squared:  0.4998, Adjusted R-squared:  0.4996 
## F-statistic:  2924 on 1 and 2927 DF,  p-value: < 2.2e-16

# Full model: add overall quality and garage area
house_model <- lm(price ~ area + Overall.Qual + Garage.Area, data = ames_clean)

house_model
## 
## Call:
## lm(formula = price ~ area + Overall.Qual + Garage.Area, data = ames_clean)
## 
## Coefficients:
##  (Intercept)          area  Overall.Qual   Garage.Area  
##   -104174.44         51.07      28392.86         74.73

Model Summary

summary(house_model)
## 
## Call:
## lm(formula = price ~ area + Overall.Qual + Garage.Area, data = ames_clean)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -413865  -21713   -1702   18451  292647 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -1.042e+05  3.268e+03  -31.88   <2e-16 ***
## area          5.107e+01  1.804e+00   28.32   <2e-16 ***
## Overall.Qual  2.839e+04  6.841e+02   41.51   <2e-16 ***
## Garage.Area   7.473e+01  4.214e+00   17.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 39320 on 2925 degrees of freedom
## Multiple R-squared:  0.7581, Adjusted R-squared:  0.7578 
## F-statistic:  3055 on 3 and 2925 DF,  p-value: < 2.2e-16

Interpretation

Slope (Living Area): For every additional square foot of living area, the predicted house price increases by approximately $51.07, holding overall quality and garage area constant. This is less than half the $111.72 slope from the simple model, because larger homes also tend to have higher quality ratings and larger garages.

Slope (Overall Quality): For every 1-point increase in the overall quality rating, the predicted house price increases by approximately $28,392.86, holding the other variables constant.

Slope (Garage Area): For every additional square foot of garage area, the predicted house price increases by approximately $74.73, holding the other predictors constant.

P-values: All three predictors have p-values below 2e-16, far below 0.001, indicating that living area, overall quality, and garage area are all statistically significant predictors of house price. The overall model is also highly significant (F = 3055 on 3 and 2925 degrees of freedom, p < 2.2e-16), providing strong evidence that the model explains variation in house prices.

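The intercept (-104,174.44) is the predicted house price when the living area, overall quality, and garage area are all zero. No house in the data has zero living area or a quality rating of zero (the rating scale starts at 1), so the intercept is an extrapolation with no practical interpretation.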

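The adjusted R-squared is 0.7578, indicating that the model explains about 75.8% of the variation in house prices after adjusting for the number of predictors. The median residual (-1,702) is small relative to typical sale prices, which suggests that the predictions are not systematically too high or too low.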
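
The 0.7578 figure in the model summary is the adjusted R-squared, which penalizes the ordinary R-squared for the number of predictors. As a quick check, the sketch below reproduces it from the reported R-squared using the standard adjustment formula. The sample size of 2,929 is inferred from the 2,925 residual degrees of freedom plus 4 estimated coefficients.

```r
# Values taken from the summary(house_model) output
r_squared <- 0.7581
n_obs     <- 2925 + 4   # residual df + number of estimated coefficients
n_pred    <- 3          # area, Overall.Qual, Garage.Area

# Adjusted R-squared = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_pred - 1)

print(round(adj_r_squared, 3))
```

With nearly 3,000 observations and only three predictors, the adjustment is tiny, which is why the two R-squared values in the summary are almost identical.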

Regression Equation

The fitted regression model has the form

Predicted Price = −104,174.44 + 51.07 × Area + 28,392.86 × Overall.Qual + 74.73 × Garage.Area

where

Predicted Price is the predicted selling price in dollars.

Area is the home’s living area in square feet.

Overall.Qual is the overall construction quality rating (1 to 10).

Garage.Area is the garage size in square feet.


Regression Diagnostic Plot

ggplot(ames_clean, aes(x = area, y = price)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(title = "Regression Line: Living Area vs House Price",
       x = "Living Area",
       y = "House Price")
## `geom_smooth()` using formula = 'y ~ x'

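The scatterplot illustrates the positive relationship between living area and house price. The red line is the simple regression of price on living area alone (house_model1, with a slope of about $111.72 per square foot), not the multiple regression model.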


Residual Plot

plot(house_model, which = 1)

Interpretation

The residuals are generally scattered around zero, which suggests that a linear model is a reasonable starting point. However, the curved red trend line and the increasing spread of the residuals at higher fitted values indicate some non-linearity and non-constant variance (heteroscedasticity).
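
A rough numerical companion to the residual plot is to compare the spread of the residuals in the lower and upper halves of the fitted values. The sketch below does this on simulated data whose noise grows with living area. The data, coefficients, and object names (`area_sim`, `price_sim`, `fit_sim`) are made up for illustration and are not the Ames data or the fitted model above.

```r
set.seed(1)

# Simulated data: the noise standard deviation grows with area,
# which mimics heteroscedasticity (assumed values, not estimates)
n <- 1000
area_sim  <- runif(n, min = 500, max = 4000)
price_sim <- 20000 + 110 * area_sim + rnorm(n, mean = 0, sd = 15 * area_sim)

fit_sim <- lm(price_sim ~ area_sim)

# Split observations at the median fitted value
upper_half <- fitted(fit_sim) > median(fitted(fit_sim))

# Residual spread in each half
sd_lower <- sd(residuals(fit_sim)[!upper_half])
sd_upper <- sd(residuals(fit_sim)[upper_half])

# A ratio well above 1 means the residuals fan out at higher fitted values
spread_ratio <- sd_upper / sd_lower
print(spread_ratio)
```

Applying the same split to `fitted(house_model)` and `residuals(house_model)` would give a single number to accompany the visual impression from the plot.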

Normal Q-Q Plot

plot(house_model, which = 2)

Interpretation

The points deviate noticeably from the reference line at both ends. This indicates that the residuals have heavier tails than a normal distribution and that a few outliers are present.


Calculate RMSE

rmse <- sqrt(mean(residuals(house_model)^2))

rmse
## [1] 39293.87

Interpretation

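The root mean squared error (RMSE) is the square root of the average squared residual, so it summarizes the typical size of the model’s prediction errors: about $39,294. It is computed on the same data used to fit the model, so it may understate the error on new houses. A smaller RMSE indicates better predictive accuracy.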
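
The RMSE is closely related to the residual standard error reported by `summary()`. Both are based on the same sum of squared residuals, but the RMSE divides by the number of observations, while the residual standard error divides by the residual degrees of freedom. The sketch below converts one into the other using the numbers reported in this document. The count of 2,929 observations is inferred from the 2,925 residual degrees of freedom plus 4 coefficients.

```r
# Values taken from the output in this document
rmse_reported <- 39293.87
n_obs  <- 2929   # observations used in the fit
n_coef <- 4      # intercept + three slopes

# Residual standard error = RMSE * sqrt(n / (n - number of coefficients))
rse <- rmse_reported * sqrt(n_obs / (n_obs - n_coef))

print(rse)
```

The result agrees with the residual standard error of 39320 in the model summary to the four significant digits shown there.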


Predict House Price

new_house <- data.frame(area = 2000, Overall.Qual = 7, Garage.Area = 500)

predict(house_model, newdata = new_house)
##        1 
## 234083.1

Interpretation

The model predicts that a house with 2,000 square feet of living area, an overall quality rating of 7, and a garage area of 500 square feet will have an estimated selling price of approximately $234,083.
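
The same prediction can be checked by hand from the regression equation. The sketch below plugs the house’s values into the rounded coefficients printed in the model output. The vector names `coefs` and `house_values` are illustrative.

```r
# Rounded coefficients as printed in the model output
coefs <- c(intercept    = -104174.44,
           area         = 51.07,
           Overall.Qual = 28392.86,
           Garage.Area  = 74.73)

# The house described above
house_values <- c(area = 2000, Overall.Qual = 7, Garage.Area = 500)

# Intercept plus the sum of coefficient * value for each predictor
pred_by_hand <- coefs[["intercept"]] +
  sum(coefs[names(house_values)] * house_values)

print(pred_by_hand)
```

The hand calculation lands within a few dollars of the `predict()` output. The small gap arises because `predict()` uses the unrounded coefficients.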


Model Assessment

The multiple linear regression model shows that living area, overall quality, and garage area are all statistically significant predictors of house price (p < 0.001). For every additional square foot of living area, the predicted house price increases by about $51.07, holding the other variables constant. Each 1-point increase in overall quality is associated with an increase of about $28,392.86, and each additional square foot of garage area with an increase of about $74.73, again holding the other predictors constant. The model explains approximately 75.8% of the variation in house prices (R² = 0.758), compared with about 50% for living area alone. This indicates a strong in-sample fit, although the diagnostic plots show some non-linearity, non-constant variance, and outliers, and predictive performance has not been checked on held-out data.

Discussion

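Multiple linear regression describes how measurable housing characteristics are associated with selling price. Because the data are observational, the coefficients describe associations rather than causal effects. Together, the correlation analysis, the fitted model, and the diagnostic plots give a fuller picture of price variation in the Ames, Iowa housing market.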


Conclusion

The objective of this project was to investigate whether important housing characteristics could predict the selling price of homes in Ames, Iowa. Using multiple linear regression, the relationship between house price and three important predictors—living area, overall quality, and garage area—was examined.

The results of the regression model indicate that these variables are useful predictors of house price. Larger homes, higher construction quality, and larger garage areas are generally associated with higher selling prices. The model explains about 75.8% of the variation in house prices (R² = 0.758), and its in-sample RMSE of about $39,294 gives the typical size of its prediction error.

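Overall, the analysis demonstrates that structural and quality characteristics explain a large share of the variation in residential property values in Ames. Neighborhood was explored only graphically and was not included in the regression model, so its contribution remains to be quantified.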


Future Directions

Several improvements could strengthen this model in future studies.

Additional predictors such as the year built, basement area, number of bathrooms, lot size, and overall condition could be incorporated into a more comprehensive regression model. Feature selection techniques could also be used to determine the most influential variables. Furthermore, interaction effects between variables and nonlinear regression models may improve predictive performance.

Future research may also compare several regression models using cross-validation to determine which model produces the most accurate predictions.
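
As a sketch of what such a comparison could look like, the base R code below runs 5-fold cross-validation for a one-predictor and a three-predictor model. It uses simulated data, not the Ames data. The data frame `sim`, its column names, the coefficients used to generate it, and the helper `cv_rmse()` are all hypothetical.

```r
set.seed(42)

# Simulated housing-like data (assumed values for illustration only)
n <- 500
sim <- data.frame(
  area   = runif(n, min = 600, max = 3500),
  qual   = sample(3:10, n, replace = TRUE),
  garage = runif(n, min = 0, max = 900)
)
sim$price <- -100000 + 50 * sim$area + 28000 * sim$qual +
  75 * sim$garage + rnorm(n, mean = 0, sd = 40000)

# Randomly assign each row to one of k folds
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# Hypothetical helper: mean out-of-fold RMSE for a model formula
cv_rmse <- function(formula, data, fold) {
  response <- all.vars(formula)[1]
  fold_rmse <- sapply(sort(unique(fold)), function(i) {
    fit  <- lm(formula, data = data[fold != i, ])          # train on other folds
    pred <- predict(fit, newdata = data[fold == i, ])      # predict held-out fold
    sqrt(mean((data[[response]][fold == i] - pred)^2))     # held-out RMSE
  })
  mean(fold_rmse)
}

rmse_simple <- cv_rmse(price ~ area, sim, fold)
rmse_full   <- cv_rmse(price ~ area + qual + garage, sim, fold)

print(c(simple = rmse_simple, full = rmse_full))
```

Because each fold is predicted by a model that never saw it, the resulting RMSE is a fairer estimate of error on new houses than the in-sample RMSE. The model with the lower cross-validated RMSE would be preferred.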


References

OpenIntro. (2024). Ames Housing Dataset. https://www.openintro.org/data/

Wickham, H., Averick, M., Bryan, J., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686.

R Core Team. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Quarto Documentation. https://quarto.org/