Data101 Project 3

A- Introduction (1-2 paragraphs):

My research question:

How do house features such as square footage, number of bedrooms, age of the house (when it was built), bathroom count, or neighborhood quality affect the sale price of homes?

My Data-set:

The data-set I will be using is called “ames”. It contains 2930 observations and 82 variables. Ames is a city in Iowa, and the information in this data-set was gathered from the Ames Assessor’s Office, where it was used in computing the assessed values of residential properties that were sold. In this project, I will be looking at houses built after 1900, using the variables SalePrice (the continuous outcome), Year.Built, Neighborhood, Overall.Qual, and Gr.Liv.Area. I chose these variables because I found that the location, size, and condition of a house are the top factors in home sale prices (Nancy-Nash 2025).

Data-set link: https://www.openintro.org/data/index.php?data=ames

B- Data Analysis (1 paragraph and 3-5 chunks of code): In your paragraph, describe the type of data analysis you will perform and the types of plots you will generate to address your research question.

In this section, I will filter the data to houses built after 1900, select the variables I will be using (SalePrice, Year.Built, Neighborhood, Overall.Qual, Gr.Liv.Area), and filter out NAs in my variables so I don’t run into errors when coding my plots and model. I will be performing a multiple linear regression because I am analyzing how these factors affect the prices of houses in Ames, Iowa.

library(tidyverse)
library(ggplot2)
library(dplyr)

#Setting Working directory
setwd("C:/Users/Joanne G/OneDrive/Data101(Fall 2025)/Datasets")

#read the ames.csv in here
ames_houses_df <- read.csv("ames.csv")

Clean the data-set and conduct exploratory data analysis (EDA) to better understand the data (2 functions minimum)

# EDA Data-set Chunk

#dimensions
dim(ames_houses_df)
## [1] 2930   82
#head
head(ames_houses_df)
##   Order       PID MS.SubClass MS.Zoning Lot.Frontage Lot.Area Street Alley
## 1     1 526301100          20        RL          141    31770   Pave  <NA>
## 2     2 526350040          20        RH           80    11622   Pave  <NA>
## 3     3 526351010          20        RL           81    14267   Pave  <NA>
## 4     4 526353030          20        RL           93    11160   Pave  <NA>
## 5     5 527105010          60        RL           74    13830   Pave  <NA>
## 6     6 527105030          60        RL           78     9978   Pave  <NA>
##   Lot.Shape Land.Contour Utilities Lot.Config Land.Slope Neighborhood
## 1       IR1          Lvl    AllPub     Corner        Gtl        NAmes
## 2       Reg          Lvl    AllPub     Inside        Gtl        NAmes
## 3       IR1          Lvl    AllPub     Corner        Gtl        NAmes
## 4       Reg          Lvl    AllPub     Corner        Gtl        NAmes
## 5       IR1          Lvl    AllPub     Inside        Gtl      Gilbert
## 6       IR1          Lvl    AllPub     Inside        Gtl      Gilbert
##   Condition.1 Condition.2 Bldg.Type House.Style Overall.Qual Overall.Cond
## 1        Norm        Norm      1Fam      1Story            6            5
## 2       Feedr        Norm      1Fam      1Story            5            6
## 3        Norm        Norm      1Fam      1Story            6            6
## 4        Norm        Norm      1Fam      1Story            7            5
## 5        Norm        Norm      1Fam      2Story            5            5
## 6        Norm        Norm      1Fam      2Story            6            6
##   Year.Built Year.Remod.Add Roof.Style Roof.Matl Exterior.1st Exterior.2nd
## 1       1960           1960        Hip   CompShg      BrkFace      Plywood
## 2       1961           1961      Gable   CompShg      VinylSd      VinylSd
## 3       1958           1958        Hip   CompShg      Wd Sdng      Wd Sdng
## 4       1968           1968        Hip   CompShg      BrkFace      BrkFace
## 5       1997           1998      Gable   CompShg      VinylSd      VinylSd
## 6       1998           1998      Gable   CompShg      VinylSd      VinylSd
##   Mas.Vnr.Type Mas.Vnr.Area Exter.Qual Exter.Cond Foundation Bsmt.Qual
## 1        Stone          112         TA         TA     CBlock        TA
## 2         None            0         TA         TA     CBlock        TA
## 3      BrkFace          108         TA         TA     CBlock        TA
## 4         None            0         Gd         TA     CBlock        TA
## 5         None            0         TA         TA      PConc        Gd
## 6      BrkFace           20         TA         TA      PConc        TA
##   Bsmt.Cond Bsmt.Exposure BsmtFin.Type.1 BsmtFin.SF.1 BsmtFin.Type.2
## 1        Gd            Gd            BLQ          639            Unf
## 2        TA            No            Rec          468            LwQ
## 3        TA            No            ALQ          923            Unf
## 4        TA            No            ALQ         1065            Unf
## 5        TA            No            GLQ          791            Unf
## 6        TA            No            GLQ          602            Unf
##   BsmtFin.SF.2 Bsmt.Unf.SF Total.Bsmt.SF Heating Heating.QC Central.Air
## 1            0         441          1080    GasA         Fa           Y
## 2          144         270           882    GasA         TA           Y
## 3            0         406          1329    GasA         TA           Y
## 4            0        1045          2110    GasA         Ex           Y
## 5            0         137           928    GasA         Gd           Y
## 6            0         324           926    GasA         Ex           Y
##   Electrical X1st.Flr.SF X2nd.Flr.SF Low.Qual.Fin.SF Gr.Liv.Area Bsmt.Full.Bath
## 1      SBrkr        1656           0               0        1656              1
## 2      SBrkr         896           0               0         896              0
## 3      SBrkr        1329           0               0        1329              0
## 4      SBrkr        2110           0               0        2110              1
## 5      SBrkr         928         701               0        1629              0
## 6      SBrkr         926         678               0        1604              0
##   Bsmt.Half.Bath Full.Bath Half.Bath Bedroom.AbvGr Kitchen.AbvGr Kitchen.Qual
## 1              0         1         0             3             1           TA
## 2              0         1         0             2             1           TA
## 3              0         1         1             3             1           Gd
## 4              0         2         1             3             1           Ex
## 5              0         2         1             3             1           TA
## 6              0         2         1             3             1           Gd
##   TotRms.AbvGrd Functional Fireplaces Fireplace.Qu Garage.Type Garage.Yr.Blt
## 1             7        Typ          2           Gd      Attchd          1960
## 2             5        Typ          0         <NA>      Attchd          1961
## 3             6        Typ          0         <NA>      Attchd          1958
## 4             8        Typ          2           TA      Attchd          1968
## 5             6        Typ          1           TA      Attchd          1997
## 6             7        Typ          1           Gd      Attchd          1998
##   Garage.Finish Garage.Cars Garage.Area Garage.Qual Garage.Cond Paved.Drive
## 1           Fin           2         528          TA          TA           P
## 2           Unf           1         730          TA          TA           Y
## 3           Unf           1         312          TA          TA           Y
## 4           Fin           2         522          TA          TA           Y
## 5           Fin           2         482          TA          TA           Y
## 6           Fin           2         470          TA          TA           Y
##   Wood.Deck.SF Open.Porch.SF Enclosed.Porch X3Ssn.Porch Screen.Porch Pool.Area
## 1          210            62              0           0            0         0
## 2          140             0              0           0          120         0
## 3          393            36              0           0            0         0
## 4            0             0              0           0            0         0
## 5          212            34              0           0            0         0
## 6          360            36              0           0            0         0
##   Pool.QC Fence Misc.Feature Misc.Val Mo.Sold Yr.Sold Sale.Type Sale.Condition
## 1    <NA>  <NA>         <NA>        0       5    2010       WD          Normal
## 2    <NA> MnPrv         <NA>        0       6    2010       WD          Normal
## 3    <NA>  <NA>         Gar2    12500       6    2010       WD          Normal
## 4    <NA>  <NA>         <NA>        0       4    2010       WD          Normal
## 5    <NA> MnPrv         <NA>        0       3    2010       WD          Normal
## 6    <NA>  <NA>         <NA>        0       6    2010       WD          Normal
##   SalePrice
## 1    215000
## 2    105000
## 3    172000
## 4    244000
## 5    189900
## 6    195500
summary(ames_houses_df)
##      Order             PID             MS.SubClass      MS.Zoning        
##  Min.   :   1.0   Min.   :5.263e+08   Min.   : 20.00   Length:2930       
##  1st Qu.: 733.2   1st Qu.:5.285e+08   1st Qu.: 20.00   Class :character  
##  Median :1465.5   Median :5.355e+08   Median : 50.00   Mode  :character  
##  Mean   :1465.5   Mean   :7.145e+08   Mean   : 57.39                     
##  3rd Qu.:2197.8   3rd Qu.:9.072e+08   3rd Qu.: 70.00                     
##  Max.   :2930.0   Max.   :1.007e+09   Max.   :190.00                     
##                                                                          
##   Lot.Frontage       Lot.Area         Street             Alley          
##  Min.   : 21.00   Min.   :  1300   Length:2930        Length:2930       
##  1st Qu.: 58.00   1st Qu.:  7440   Class :character   Class :character  
##  Median : 68.00   Median :  9436   Mode  :character   Mode  :character  
##  Mean   : 69.22   Mean   : 10148                                        
##  3rd Qu.: 80.00   3rd Qu.: 11555                                        
##  Max.   :313.00   Max.   :215245                                        
##  NA's   :490                                                            
##   Lot.Shape         Land.Contour        Utilities          Lot.Config       
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Land.Slope        Neighborhood       Condition.1        Condition.2       
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Bldg.Type         House.Style         Overall.Qual     Overall.Cond  
##  Length:2930        Length:2930        Min.   : 1.000   Min.   :1.000  
##  Class :character   Class :character   1st Qu.: 5.000   1st Qu.:5.000  
##  Mode  :character   Mode  :character   Median : 6.000   Median :5.000  
##                                        Mean   : 6.095   Mean   :5.563  
##                                        3rd Qu.: 7.000   3rd Qu.:6.000  
##                                        Max.   :10.000   Max.   :9.000  
##                                                                        
##    Year.Built   Year.Remod.Add  Roof.Style         Roof.Matl        
##  Min.   :1872   Min.   :1950   Length:2930        Length:2930       
##  1st Qu.:1954   1st Qu.:1965   Class :character   Class :character  
##  Median :1973   Median :1993   Mode  :character   Mode  :character  
##  Mean   :1971   Mean   :1984                                        
##  3rd Qu.:2001   3rd Qu.:2004                                        
##  Max.   :2010   Max.   :2010                                        
##                                                                     
##  Exterior.1st       Exterior.2nd       Mas.Vnr.Type        Mas.Vnr.Area   
##  Length:2930        Length:2930        Length:2930        Min.   :   0.0  
##  Class :character   Class :character   Class :character   1st Qu.:   0.0  
##  Mode  :character   Mode  :character   Mode  :character   Median :   0.0  
##                                                           Mean   : 101.9  
##                                                           3rd Qu.: 164.0  
##                                                           Max.   :1600.0  
##                                                           NA's   :23      
##   Exter.Qual         Exter.Cond         Foundation         Bsmt.Qual        
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   Bsmt.Cond         Bsmt.Exposure      BsmtFin.Type.1      BsmtFin.SF.1   
##  Length:2930        Length:2930        Length:2930        Min.   :   0.0  
##  Class :character   Class :character   Class :character   1st Qu.:   0.0  
##  Mode  :character   Mode  :character   Mode  :character   Median : 370.0  
##                                                           Mean   : 442.6  
##                                                           3rd Qu.: 734.0  
##                                                           Max.   :5644.0  
##                                                           NA's   :1       
##  BsmtFin.Type.2      BsmtFin.SF.2      Bsmt.Unf.SF     Total.Bsmt.SF 
##  Length:2930        Min.   :   0.00   Min.   :   0.0   Min.   :   0  
##  Class :character   1st Qu.:   0.00   1st Qu.: 219.0   1st Qu.: 793  
##  Mode  :character   Median :   0.00   Median : 466.0   Median : 990  
##                     Mean   :  49.72   Mean   : 559.3   Mean   :1052  
##                     3rd Qu.:   0.00   3rd Qu.: 802.0   3rd Qu.:1302  
##                     Max.   :1526.00   Max.   :2336.0   Max.   :6110  
##                     NA's   :1         NA's   :1        NA's   :1     
##    Heating           Heating.QC        Central.Air         Electrical       
##  Length:2930        Length:2930        Length:2930        Length:2930       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##   X1st.Flr.SF      X2nd.Flr.SF     Low.Qual.Fin.SF     Gr.Liv.Area  
##  Min.   : 334.0   Min.   :   0.0   Min.   :   0.000   Min.   : 334  
##  1st Qu.: 876.2   1st Qu.:   0.0   1st Qu.:   0.000   1st Qu.:1126  
##  Median :1084.0   Median :   0.0   Median :   0.000   Median :1442  
##  Mean   :1159.6   Mean   : 335.5   Mean   :   4.677   Mean   :1500  
##  3rd Qu.:1384.0   3rd Qu.: 703.8   3rd Qu.:   0.000   3rd Qu.:1743  
##  Max.   :5095.0   Max.   :2065.0   Max.   :1064.000   Max.   :5642  
##                                                                     
##  Bsmt.Full.Bath   Bsmt.Half.Bath      Full.Bath       Half.Bath     
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:1.000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.00000   Median :2.000   Median :0.0000  
##  Mean   :0.4314   Mean   :0.06113   Mean   :1.567   Mean   :0.3795  
##  3rd Qu.:1.0000   3rd Qu.:0.00000   3rd Qu.:2.000   3rd Qu.:1.0000  
##  Max.   :3.0000   Max.   :2.00000   Max.   :4.000   Max.   :2.0000  
##  NA's   :2        NA's   :2                                         
##  Bedroom.AbvGr   Kitchen.AbvGr   Kitchen.Qual       TotRms.AbvGrd   
##  Min.   :0.000   Min.   :0.000   Length:2930        Min.   : 2.000  
##  1st Qu.:2.000   1st Qu.:1.000   Class :character   1st Qu.: 5.000  
##  Median :3.000   Median :1.000   Mode  :character   Median : 6.000  
##  Mean   :2.854   Mean   :1.044                      Mean   : 6.443  
##  3rd Qu.:3.000   3rd Qu.:1.000                      3rd Qu.: 7.000  
##  Max.   :8.000   Max.   :3.000                      Max.   :15.000  
##                                                                     
##   Functional          Fireplaces     Fireplace.Qu       Garage.Type       
##  Length:2930        Min.   :0.0000   Length:2930        Length:2930       
##  Class :character   1st Qu.:0.0000   Class :character   Class :character  
##  Mode  :character   Median :1.0000   Mode  :character   Mode  :character  
##                     Mean   :0.5993                                        
##                     3rd Qu.:1.0000                                        
##                     Max.   :4.0000                                        
##                                                                           
##  Garage.Yr.Blt  Garage.Finish       Garage.Cars     Garage.Area    
##  Min.   :1895   Length:2930        Min.   :0.000   Min.   :   0.0  
##  1st Qu.:1960   Class :character   1st Qu.:1.000   1st Qu.: 320.0  
##  Median :1979   Mode  :character   Median :2.000   Median : 480.0  
##  Mean   :1978                      Mean   :1.767   Mean   : 472.8  
##  3rd Qu.:2002                      3rd Qu.:2.000   3rd Qu.: 576.0  
##  Max.   :2207                      Max.   :5.000   Max.   :1488.0  
##  NA's   :159                       NA's   :1       NA's   :1       
##  Garage.Qual        Garage.Cond        Paved.Drive         Wood.Deck.SF    
##  Length:2930        Length:2930        Length:2930        Min.   :   0.00  
##  Class :character   Class :character   Class :character   1st Qu.:   0.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :   0.00  
##                                                           Mean   :  93.75  
##                                                           3rd Qu.: 168.00  
##                                                           Max.   :1424.00  
##                                                                            
##  Open.Porch.SF    Enclosed.Porch     X3Ssn.Porch       Screen.Porch
##  Min.   :  0.00   Min.   :   0.00   Min.   :  0.000   Min.   :  0  
##  1st Qu.:  0.00   1st Qu.:   0.00   1st Qu.:  0.000   1st Qu.:  0  
##  Median : 27.00   Median :   0.00   Median :  0.000   Median :  0  
##  Mean   : 47.53   Mean   :  23.01   Mean   :  2.592   Mean   : 16  
##  3rd Qu.: 70.00   3rd Qu.:   0.00   3rd Qu.:  0.000   3rd Qu.:  0  
##  Max.   :742.00   Max.   :1012.00   Max.   :508.000   Max.   :576  
##                                                                    
##    Pool.Area         Pool.QC             Fence           Misc.Feature      
##  Min.   :  0.000   Length:2930        Length:2930        Length:2930       
##  1st Qu.:  0.000   Class :character   Class :character   Class :character  
##  Median :  0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :  2.243                                                           
##  3rd Qu.:  0.000                                                           
##  Max.   :800.000                                                           
##                                                                            
##     Misc.Val           Mo.Sold          Yr.Sold      Sale.Type        
##  Min.   :    0.00   Min.   : 1.000   Min.   :2006   Length:2930       
##  1st Qu.:    0.00   1st Qu.: 4.000   1st Qu.:2007   Class :character  
##  Median :    0.00   Median : 6.000   Median :2008   Mode  :character  
##  Mean   :   50.63   Mean   : 6.216   Mean   :2008                     
##  3rd Qu.:    0.00   3rd Qu.: 8.000   3rd Qu.:2009                     
##  Max.   :17000.00   Max.   :12.000   Max.   :2010                     
##                                                                       
##  Sale.Condition       SalePrice     
##  Length:2930        Min.   : 12789  
##  Class :character   1st Qu.:129500  
##  Mode  :character   Median :160000  
##                     Mean   :180796  
##                     3rd Qu.:213500  
##                     Max.   :755000  
## 
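Since the cleaning step below is meant to remove NAs, a quick count of missing values in just the model variables can confirm how many there are before filtering. This is a minimal sketch: `count_missing` and `model_vars` are names I made up for illustration, and the last lines only run if `ames_houses_df` has been loaded as above.

```r
# Columns used later in the model (names as they appear in ames.csv)
model_vars <- c("SalePrice", "Year.Built", "Neighborhood",
                "Overall.Qual", "Gr.Liv.Area")

# Hypothetical helper: number of NAs in each selected column
count_missing <- function(df, cols) {
  colSums(is.na(df[, cols, drop = FALSE]))
}

# Runs only if the data frame from the chunk above is loaded
if (exists("ames_houses_df")) {
  print(count_missing(ames_houses_df, model_vars))
}
```

A count of zero for a column would mean the matching `!is.na()` filter does not remove any rows for that variable.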

Use a minimum of three dplyr functions (filter, select, mutate, summary, mean, max, etc.,) to manipulate the data-set and prepare it for modeling.

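# Keep houses built after 1900, keep only the model variables,
# and drop rows with a missing sale price or quality rating
houses_cleaned <- ames_houses_df |>
  filter(Year.Built > 1900) |>
  select(SalePrice, Year.Built, Neighborhood, Overall.Qual, Gr.Liv.Area) |>
  filter(!is.na(SalePrice), !is.na(Overall.Qual))

summary(houses_cleaned)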
##    SalePrice        Year.Built   Neighborhood        Overall.Qual   
##  Min.   : 12789   Min.   :1901   Length:2875        Min.   : 1.000  
##  1st Qu.:130000   1st Qu.:1955   Class :character   1st Qu.: 5.000  
##  Median :161900   Median :1974   Mode  :character   Median : 6.000  
##  Mean   :181617   Mean   :1973                      Mean   : 6.109  
##  3rd Qu.:214000   3rd Qu.:2001                      3rd Qu.: 7.000  
##  Max.   :755000   Max.   :2010                      Max.   :10.000  
##   Gr.Liv.Area  
##  Min.   : 334  
##  1st Qu.:1124  
##  Median :1440  
##  Mean   :1494  
##  3rd Qu.:1734  
##  Max.   :5642
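As a quick sanity check on the filtering step, the row counts printed above (2930 rows from dim() and a Length of 2875 in the cleaned summary) show how many houses were dropped. The variable names in this sketch are my own.

```r
rows_before <- 2930  # from dim(ames_houses_df)
rows_after  <- 2875  # from the Length shown in summary(houses_cleaned)

# Number and percentage of houses removed by the cleaning step
rows_dropped <- rows_before - rows_after
pct_dropped  <- round(100 * rows_dropped / rows_before, 1)

rows_dropped  # 55
pct_dropped   # 1.9
```

So the cleaning step removes 55 houses, a little under 2% of the data.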

C- Regression Analysis (1 paragraph and 1-3 chunks of code):

Clearly state your final model (use lm() or glm(family = binomial))

Final Model (using lm()):

multiple_reg_model <- lm(SalePrice ~ Year.Built + Neighborhood + Overall.Qual + Gr.Liv.Area, data = houses_cleaned)

Present the model summary with coefficients, standard errors, p-values, and confidence intervals

summary(multiple_reg_model)
## 
## Call:
## lm(formula = SalePrice ~ Year.Built + Neighborhood + Overall.Qual + 
##     Gr.Liv.Area, data = houses_cleaned)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -340882  -16391    -276   14709  278128 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         -9.305e+05  1.037e+05  -8.976  < 2e-16 ***
## Year.Built           4.509e+02  5.227e+01   8.626  < 2e-16 ***
## NeighborhoodBlueste -1.800e+04  1.308e+04  -1.376 0.168805    
## NeighborhoodBrDale  -3.133e+04  9.455e+03  -3.314 0.000931 ***
## NeighborhoodBrkSide  1.110e+04  8.349e+03   1.330 0.183784    
## NeighborhoodClearCr  3.278e+04  8.810e+03   3.721 0.000202 ***
## NeighborhoodCollgCr  1.232e+04  7.043e+03   1.749 0.080400 .  
## NeighborhoodCrawfor  3.610e+04  8.114e+03   4.449 8.94e-06 ***
## NeighborhoodEdwards  1.283e+03  7.599e+03   0.169 0.865973    
## NeighborhoodGilbert -2.985e+03  7.267e+03  -0.411 0.681223    
## NeighborhoodGreens   5.713e+03  1.429e+04   0.400 0.689368    
## NeighborhoodGrnHill  9.242e+04  2.589e+04   3.570 0.000363 ***
## NeighborhoodIDOTRR   1.966e+03  8.588e+03   0.229 0.818967    
## NeighborhoodLandmrk -2.662e+04  3.600e+04  -0.739 0.459698    
## NeighborhoodMeadowV -1.164e+04  9.145e+03  -1.273 0.203186    
## NeighborhoodMitchel  1.220e+04  7.602e+03   1.605 0.108624    
## NeighborhoodNAmes    1.126e+04  7.289e+03   1.545 0.122401    
## NeighborhoodNoRidge  6.020e+04  8.092e+03   7.439 1.33e-13 ***
## NeighborhoodNPkVill -1.618e+04  1.006e+04  -1.609 0.107808    
## NeighborhoodNridgHt  7.150e+04  7.291e+03   9.806  < 2e-16 ***
## NeighborhoodNWAmes   4.506e+03  7.541e+03   0.597 0.550245    
## NeighborhoodOldTown  3.756e+02  8.121e+03   0.046 0.963109    
## NeighborhoodSawyer   1.313e+04  7.629e+03   1.720 0.085471 .  
## NeighborhoodSawyerW -1.445e+03  7.466e+03  -0.194 0.846572    
## NeighborhoodSomerst  1.655e+04  7.185e+03   2.303 0.021344 *  
## NeighborhoodStoneBr  7.420e+04  8.395e+03   8.839  < 2e-16 ***
## NeighborhoodSWISU   -7.436e+03  9.236e+03  -0.805 0.420817    
## NeighborhoodTimber   3.539e+04  7.908e+03   4.475 7.92e-06 ***
## NeighborhoodVeenker  3.691e+04  9.933e+03   3.716 0.000206 ***
## Overall.Qual         1.987e+04  8.129e+02  24.447  < 2e-16 ***
## Gr.Liv.Area          5.778e+01  1.741e+00  33.181  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35360 on 2844 degrees of freedom
## Multiple R-squared:  0.806,  Adjusted R-squared:  0.8039 
## F-statistic: 393.7 on 30 and 2844 DF,  p-value: < 2.2e-16
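The summary above gives the coefficients, standard errors, and p-values but not the confidence intervals. A minimal sketch of how they could be added with base R's confint() is below. `coef_table_with_ci` is a hypothetical helper name, and the last lines only run if `multiple_reg_model` exists from the chunk above.

```r
# Hypothetical helper: coefficient table with confidence intervals appended
coef_table_with_ci <- function(model, level = 0.95) {
  est <- summary(model)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
  ci  <- confint(model, level = level) # lower and upper bounds
  cbind(est, ci[rownames(est), , drop = FALSE])
}

# Runs only if the model from the chunk above has been fitted
if (exists("multiple_reg_model")) {
  print(round(coef_table_with_ci(multiple_reg_model), 3))
}
```

An interval that does not contain zero agrees with a p-value below 0.05 for that coefficient.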

Interpret the coefficients in the context of your research question (including odds ratios for logistic regression)

To interpret the coefficients, I will start with Gr.Liv.Area (above-ground living area). Holding Year Built, Neighborhood, and Overall Quality constant, each additional square foot of above-ground living area is associated with an estimated increase in Sale Price of about $58 (coefficient 57.78). Moving on to Overall.Qual (overall quality of the house): controlling for the other variables, each 1-unit increase in overall quality (e.g., from 5 → 6) is associated with a predicted increase in Sale Price of about $19,870. Next is Year.Built (the year the house was built): holding all other variables constant, each additional year newer the home is predicts an increase in Sale Price of about $451, which is small per year but adds up across decades. Lastly, each Neighborhood coefficient represents the difference in predicted Sale Price compared to the reference neighborhood, holding the other predictors constant. R uses the alphabetically first level as the baseline, which here is Blmngtn, the one neighborhood that does not appear in the coefficient table. For example, a house in NridgHt is predicted to sell for about $71,500 more than a comparable house in Blmngtn. As for the multiple R² value, 0.806 (about 0.81) means that about 81% of the variation in home sale prices is explained by the combination of the year the house was built, its neighborhood, its overall quality, and its above-ground living area. I also want to note that since this model is an lm() linear regression, I interpreted raw dollar changes in the outcome, not odds ratios.
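To make the interpretation concrete, here is a small worked example using the estimates printed in the summary table. The two houses being compared are made up for illustration, and the variable names are my own.

```r
# Estimates copied from the summary(multiple_reg_model) output above
b_gr_liv_area  <- 57.78   # Gr.Liv.Area  (5.778e+01)
b_overall_qual <- 19870   # Overall.Qual (1.987e+04)
b_year_built   <- 450.9   # Year.Built   (4.509e+02)

# Hypothetical comparison: two houses in the same neighborhood, where the
# second has 200 more sq ft, is one quality point higher, and is 10 years newer
price_diff <- 200 * b_gr_liv_area + 1 * b_overall_qual + 10 * b_year_built
price_diff  # about 35935
```

Under the model, the second house would be predicted to sell for roughly $35,935 more. Because both houses are in the same neighborhood, the neighborhood coefficient cancels out of the difference.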

D- Model Assumptions and Diagnostics (1-2 paragraphs and 2-4 chunks of code)

Explicitly check and discuss the following assumptions:

  1. Linearity:

Multiple linear regression assumes a linear relationship between each predictor and the outcome variable. After fitting the model, the Residuals vs. Fitted plot should show points scattered randomly around 0 without a pattern. If we observe curvature, funnel shapes, or clustering, it suggests the relationship may not be linear or that transformations may be required. In our model, we inspect the plot to evaluate whether the residuals deviate from linearity.

  2. Independence of observations:

Independence means that the residual for one home should not depend on any other home in the dataset. Since the Ames Housing dataset consists of individual, unrelated home sales, independence is generally reasonable.

  3. Homoscedasticity:

Homoscedasticity means that the residuals have constant variance across all fitted values. This is assessed with both the Residuals vs. Fitted plot and the Scale-Location plot. If the residuals spread out as fitted values increase (fan shape), the assumption is violated. If they remain evenly scattered, the assumption is met.

  4. Normality of residuals:

Normality is checked by examining the Normal Q–Q plot. If the residuals follow the diagonal line closely, the assumption is satisfied. Moderate deviations at the tails are common in real datasets; severe S-shaped curves suggest non-normality.

  5. Multicollinearity:

Multicollinearity occurs when predictors are highly correlated with each other. This can be checked using the VIF (Variance Inflation Factor).

  • VIF < 5 → acceptable

  • VIF > 10 → problematic
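The VIF check described above could be run along these lines. This is a sketch under assumptions, not output from my analysis: `gvif_table` is a hypothetical base-R helper that computes a generalized VIF per model term from the correlation matrix of the design-matrix columns. For a single numeric predictor this reduces to the usual 1 / (1 - R²). I believe vif() in the car package reports the same quantity, but I have not run it here.

```r
# Hypothetical helper: generalized VIF per model term, in base R
gvif_table <- function(model) {
  X <- model.matrix(model)
  term_id <- attr(X, "assign")          # which term each column belongs to
  X <- X[, term_id != 0, drop = FALSE]  # drop the intercept column
  term_id <- term_id[term_id != 0]
  R <- cor(X)                           # correlations among predictor columns
  labels <- attr(terms(model), "term.labels")
  out <- t(sapply(seq_along(labels), function(k) {
    idx <- which(term_id == k)
    gvif <- det(R[idx, idx, drop = FALSE]) *
      det(R[-idx, -idx, drop = FALSE]) / det(R)
    # Adj = GVIF^(1/(2*Df)) makes multi-column terms comparable
    c(GVIF = gvif, Df = length(idx), Adj = gvif^(1 / (2 * length(idx))))
  }))
  rownames(out) <- labels
  out
}

# Runs only if the model from Section C has been fitted
if (exists("multiple_reg_model")) {
  print(round(gvif_table(multiple_reg_model), 3))
}
```

The cutoffs of 5 and 10 apply to the GVIF of single-column predictors such as Year.Built, Overall.Qual, and Gr.Liv.Area. Neighborhood spans many dummy columns, so for it the squared Adj value is the closer comparison.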

Include diagnostic plots (residuals vs fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage) and interpret them

Residuals vs Fitted:

plot(multiple_reg_model, which = 1)  

Interpretation:

In this plot, the residuals generally cluster near the horizontal line at zero, but there is some mild curvature and more spread as the fitted SalePrice values increase. This indicates that the relationship between the predictors (Year Built, Neighborhood, Overall Quality, and Above-Ground Living Area) and SalePrice is mostly linear but may include nonlinear components, especially for very high-priced homes. Additionally, the model predicts expensive houses less consistently than lower-priced ones. This is common in housing data, where luxury homes vary much more widely in price. The presence of a few extreme observations (e.g., labels 1734, 2137, 1467) suggests outliers that may influence the regression fit. Overall, the linear model is reasonable but not perfect: linearity is mostly met.

Normal Q-Q:

plot(multiple_reg_model, which = 2)
## Warning: not plotting observations with leverage one:
##   2734

Interpretation:

In the Q-Q plot, the middle portion of the residuals aligns fairly well with the theoretical normal line, indicating that most residuals are approximately normal. However, the tails show clear deviations: the lower tail dips below the line, and the upper tail rises sharply above it. This indicates a heavy-tailed distribution, meaning there are more extreme residuals than a normal distribution would produce. These likely arise from homes with unusually high or low sale prices relative to what the model predicts based on Year Built, Neighborhood, Overall Quality, and Living Area. To conclude, we see a slight violation of normality, which suggests that some model estimates, especially confidence intervals, may be slightly less reliable.
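Scale-Location:

The Scale-Location plot, which checks homoscedasticity, can be drawn the same way as the other diagnostics using which = 3. The sketch below also pulls out the values that plot is based on. `scale_location_data` is a hypothetical helper name, and the plot call only runs if `multiple_reg_model` exists.

```r
# Hypothetical helper: the two quantities shown in a Scale-Location plot
scale_location_data <- function(model) {
  data.frame(
    fitted = fitted(model),
    sqrt_abs_std_resid = sqrt(abs(rstandard(model)))
  )
}

# Runs only if the model from Section C has been fitted
if (exists("multiple_reg_model")) {
  plot(multiple_reg_model, which = 3)
}
```

A roughly flat trend line with an even spread of points would support constant variance, while an upward trend would suggest the residual spread grows with the fitted SalePrice.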

Residuals vs Leverage:

plot(multiple_reg_model, which = 5)
## Warning: not plotting observations with leverage one:
##   2734

Interpretation:

Lastly, in this Residuals vs Leverage plot, we can see that most points fall within acceptable leverage and Cook’s Distance regions, suggesting the majority of homes do not disproportionately affect the model. However, a few labeled points (e.g., 1467, 2212, 2838) have higher leverage or unusually large residuals. These homes are likely outliers or high-influence cases, such as extremely old or newly built homes, unusually large houses, or highly atypical neighborhoods within Ames. While the model overall is not dominated by outliers, these influential cases should be investigated to ensure they reflect real properties and not data errors.
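One way to follow up on the influential cases is to list the observations with the largest Cook's distance next to their leverage and standardized residuals. This is a minimal sketch with a hypothetical helper, `top_influential`. The neighborhood count at the end is there because a leverage of one, like the warning about observation 2734, can happen when a house is the only sale in its neighborhood, although I have not confirmed that this is the cause here.

```r
# Hypothetical helper: the n observations with the largest Cook's distance
top_influential <- function(model, n = 5) {
  infl <- data.frame(
    cooks_d   = cooks.distance(model),
    leverage  = hatvalues(model),
    std_resid = rstandard(model)
  )
  # Sort from most to least influential (missing values go last)
  infl <- infl[order(infl$cooks_d, decreasing = TRUE), ]
  head(infl, n)
}

# Runs only if the objects from the earlier chunks exist
if (exists("multiple_reg_model") && exists("houses_cleaned")) {
  print(top_influential(multiple_reg_model))
  # Smallest neighborhoods by number of sales
  print(head(sort(table(houses_cleaned$Neighborhood)), 3))
}
```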

E- Conclusion and Future Directions(1-2 paragraphs): Summarize the key findings of your analysis, discuss the implications of your results and their relevance to the research question, and suggest potential avenues for future research or further analysis

Overall, I found that about 81% of the variation in home sale prices is explained by the combination of the year the house was built, the neighborhood it is in, its overall quality, and its above-ground living area. This matched my prediction that these factors would explain a high percentage of the variation in house prices (because that is what I had researched). However, I did notice a pattern when analyzing the diagnostic plots: the model fits luxury houses less well, suggesting that extreme values may be acting as outliers, potentially skewing results and giving high-end properties a disproportionate influence on the model. This emphasizes the importance of carefully considering outliers in real estate data analysis, as they can exaggerate relationships and affect the interpretation of predictors. With that in mind, some future steps I would take for this project are to model luxury properties separately or to include additional market indicators to better account for extreme values and improve predictive accuracy, ultimately providing a more nuanced understanding of housing price dynamics.