1. Key Factors Affecting Housing Prices in Ames, Iowa

Created by Olivia Staud. Updated 4/17/25

This analysis aims to identify key factors that influence residential property sale prices in Ames, Iowa, using data from the Ames Assessor’s Office (2006-2010).

The main question: What housing characteristics have the biggest impact on property values?

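My analysis found that four housing characteristics (overall quality, above-ground living area, total number of bathrooms, and home age) account for most of the variation in sale prices.

The model has an R² of 0.814, meaning these four variables together explain 81.4% of the variation in home prices in Ames. The root mean squared error (RMSE) is $34,449, so a typical prediction misses the actual sale price by roughly that amount.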
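For reference, R² and RMSE follow the standard definitions below. Here y_i is the actual sale price of home i, ŷ_i is the model's prediction, ȳ is the mean sale price, and n = 2930 is the number of sales. These match what summary() reports as "Multiple R-squared" and what the rmse line in the Methods code computes.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
```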

2. Data Description

ames <- read.csv("ames.csv")

print(paste("# of properties:", nrow(ames)))
## [1] "# of properties: 2930"
print(paste("# of variables:", ncol(ames)))
## [1] "# of variables: 79"

The dataset contains 2,930 property sales in Ames, Iowa, from 2006 to 2010. Each sale is described by 79 variables, and the data were collected by the Ames Assessor’s Office. Each observation is a single home sale together with the property’s characteristics. To prepare the data, I standardized the column names and derived new variables to improve the model’s predictions.
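The column names are standardized with janitor's clean_names() in the Methods code. To show roughly what that conversion does, here is a hypothetical base-R helper, to_snake_case(), that approximates it for simple names. It is not janitor's implementation and skips much of what clean_names() handles, such as duplicate names, special characters and leading digits. The input names are assumed examples in the style read.csv() produces.

```r
# Hypothetical helper: a simplified approximation of janitor::clean_names()
# for plain column names. Not janitor's actual implementation.
to_snake_case <- function(x) {
  # Insert an underscore at lowercase/digit-to-uppercase boundaries ("SalePrice" -> "Sale_Price")
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)
  # Replace each run of non-alphanumeric characters with a single underscore
  x <- gsub("[^A-Za-z0-9]+", "_", x)
  # Drop leading/trailing underscores, then lowercase everything
  tolower(gsub("^_+|_+$", "", x))
}

# Assumed example names, in the style read.csv() produces
raw_names <- c("SalePrice", "Gr.Liv.Area", "Year.Built", "Bsmt.Full.Bath")
to_snake_case(raw_names)
# "sale_price" "gr_liv_area" "year_built" "bsmt_full_bath"
```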

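Key variables:

  • sale_price: sale price in dollars (the response variable)

  • overall_qual: overall material and finish quality, rated on ten levels from Very_Poor to Very_Excellent

  • gr_liv_area: above-ground living area in square feet

  • year_built: original construction year, used to derive home_age

  • full_bath, half_bath, bsmt_full_bath, bsmt_half_bath: bathroom counts, combined into total_bathrooms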

3. Methods

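library(janitor)   # clean_names()
library(ggplot2)   # plotting

options(scipen = 999)   # print large numbers in fixed rather than scientific notation

# Standardize column names to snake_case (e.g., sale_price, gr_liv_area)
ames <- clean_names(ames)

# Home age in years, relative to 2010 (the latest year in the data)
ames$home_age <- 2010 - ames$year_built
# Total bathrooms, counting each half bath as 0.5
ames$total_bathrooms <- ames$full_bath + (0.5 * ames$half_bath) +
                        ames$bsmt_full_bath + (0.5 * ames$bsmt_half_bath)

# Linear regression: overall quality as a categorical predictor, plus size, bathrooms, and age
model <- lm(sale_price ~ factor(overall_qual) + gr_liv_area + total_bathrooms + home_age, data = ames)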


summary(model)
## 
## Call:
## lm(formula = sale_price ~ factor(overall_qual) + gr_liv_area + 
##     total_bathrooms + home_age, data = ames)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -446997  -17509    -513   14628  226547 
## 
## Coefficients:
##                                      Estimate Std. Error t value
## (Intercept)                         81517.155   3218.886  25.325
## factor(overall_qual)Average         -7945.064   1823.393  -4.357
## factor(overall_qual)Below_Average  -24447.511   2751.935  -8.884
## factor(overall_qual)Excellent      148709.171   3879.863  38.328
## factor(overall_qual)Fair           -35314.466   5741.889  -6.150
## factor(overall_qual)Good            18200.697   2022.979   8.997
## factor(overall_qual)Poor           -42470.992   9803.280  -4.332
## factor(overall_qual)Very_Excellent 189894.424   6785.412  27.986
## factor(overall_qual)Very_Good       67855.266   2507.482  27.061
## factor(overall_qual)Very_Poor      -63709.969  17352.845  -3.671
## gr_liv_area                            50.373      1.811  27.819
## total_bathrooms                     11606.209   1146.744  10.121
## home_age                             -424.696     29.689 -14.305
##                                                Pr(>|t|)    
## (Intercept)                        < 0.0000000000000002 ***
## factor(overall_qual)Average              0.000013622494 ***
## factor(overall_qual)Below_Average  < 0.0000000000000002 ***
## factor(overall_qual)Excellent      < 0.0000000000000002 ***
## factor(overall_qual)Fair                 0.000000000879 ***
## factor(overall_qual)Good           < 0.0000000000000002 ***
## factor(overall_qual)Poor                 0.000015252532 ***
## factor(overall_qual)Very_Excellent < 0.0000000000000002 ***
## factor(overall_qual)Very_Good      < 0.0000000000000002 ***
## factor(overall_qual)Very_Poor                  0.000246 ***
## gr_liv_area                        < 0.0000000000000002 ***
## total_bathrooms                    < 0.0000000000000002 ***
## home_age                           < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34530 on 2917 degrees of freedom
## Multiple R-squared:  0.814,  Adjusted R-squared:  0.8132 
## F-statistic:  1064 on 12 and 2917 DF,  p-value: < 0.00000000000000022
summary_stats <- summary(model)
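The overall_qual column holds text labels, so factor() orders its levels alphabetically. The first level, Above_Average, therefore becomes the baseline that the intercept represents, which is why it has no coefficient of its own. Every quality coefficient above is measured against it. The sketch below shows this on a few example labels. It also shows one possible alternative, which is not what the model above does: list the levels from worst to best and pick Average as the reference.

```r
# The ten quality labels: the nine with coefficients in the output, plus Above_Average
qual_labels <- c("Very_Poor", "Poor", "Fair", "Below_Average", "Average",
                 "Above_Average", "Good", "Very_Good", "Excellent", "Very_Excellent")

# A few example values as they might appear in the raw column
qual <- c("Good", "Average", "Above_Average", "Excellent")

# Default: factor() sorts levels alphabetically, so Above_Average is the baseline
default_f <- factor(qual)
levels(default_f)[1]
# "Above_Average"

# Alternative: list levels worst to best, then make Average the reference level.
# Use factor(), not ordered(): lm() would apply polynomial contrasts to an ordered factor.
ordered_f <- relevel(factor(qual, levels = qual_labels), ref = "Average")
levels(ordered_f)[1]
# "Average"
```

Applied to the real data, this would be something like ames$overall_qual <- relevel(factor(ames$overall_qual, levels = qual_labels), ref = "Average") before fitting. The fitted values would not change, only how the quality coefficients are expressed.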

I performed two transformations:

  1. Created a ‘home_age’ variable by subtracting the year built from 2010, which is the latest year in the dataset.
  2. Combined all bathroom counts (full, half, and basement) into a ‘total_bathrooms’ variable, counting each half bath as 0.5.

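The two transformations are easy to check by hand on a couple of made-up homes. This minimal sketch wraps them in a hypothetical helper, add_engineered_features(), and uses the same column names as the cleaned Ames data with 2010 as the reference year. The two rows are invented for illustration only.

```r
# Hypothetical helper applying the two transformations described above
add_engineered_features <- function(df, ref_year = 2010) {
  # Age in years relative to the reference year (2010, the latest year in the data)
  df$home_age <- ref_year - df$year_built
  # Full baths count as 1 and half baths as 0.5, above ground and in the basement
  df$total_bathrooms <- df$full_bath + 0.5 * df$half_bath +
    df$bsmt_full_bath + 0.5 * df$bsmt_half_bath
  df
}

# Two invented homes, for illustration only
homes <- data.frame(
  year_built     = c(1960, 2005),
  full_bath      = c(1, 2),
  half_bath      = c(1, 1),
  bsmt_full_bath = c(0, 1),
  bsmt_half_bath = c(1, 0)
)

homes <- add_engineered_features(homes)
homes[, c("home_age", "total_bathrooms")]
# home_age: 50 and 5; total_bathrooms: 2.0 and 3.5
```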
ggplot(ames, aes(x = sale_price, y = predict(model))) +
  geom_point(alpha = 0.5, color = "pink") +
  labs(
    title = "Actual vs. Predicted Sale Prices",
    x = "Actual Price ($)",
    y = "Predicted Price ($)"
  ) 

r2 <- summary_stats$r.squared
rmse <- sqrt(mean((predict(model) - ames$sale_price)^2))

paste("R-squared:", r2)
## [1] "R-squared: 0.813978393135251"
paste("RMSE: $", rmse)
## [1] "RMSE: $ 34449.4276895676"
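The RMSE printed here (34,449) is slightly smaller than the residual standard error in the summary() output (34530), and that gap is expected. RMSE divides the residual sum of squares by n = 2930. The residual standard error divides it by the residual degrees of freedom, n - p = 2917, where p = 13 is the number of estimated coefficients. This sketch converts one into the other using only the numbers reported above.

```r
# Values reported above
rmse <- 34449.4276895676  # RMSE from the mean of squared residuals (divides by n)
n    <- 2930              # number of sales used in the fit
p    <- 13                # coefficients: intercept + 9 quality levels + 3 numeric predictors

# Residual standard error divides the residual sum of squares by n - p instead of n
rss <- rmse^2 * n
rse <- sqrt(rss / (n - p))
rse              # roughly 34526
signif(rse, 4)   # 34530, matching summary()'s rounded output
```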

Key findings:

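The regression model shows that home values in Ames are driven by quality, size, number of bathrooms, and age, with every coefficient significant at p < 0.001:

  • Overall quality is the strongest predictor. Compared with the baseline level (Above_Average), a Very_Excellent home is predicted to sell for about $189,900 more and a Very_Poor home for about $63,700 less, holding the other variables constant

  • Each additional square foot of above-ground living area adds about $50 to the predicted price

  • Each additional bathroom (with half baths counted as 0.5) adds about $11,600

  • Each additional year of age lowers the predicted price by about $425, so newer homes sell for more than older ones with similar features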
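The coefficients are easier to grasp when plugged into the model equation for one home. This sketch copies the estimates from the summary() output and evaluates the fitted equation by hand for a hypothetical home: Good quality, 1,500 square feet of above-ground living area, 2 bathrooms, and 30 years old. Both the home and the helper name predict_sale_price() are invented for illustration. With the fitted object, predict(model, newdata = ...) performs the same calculation at full precision.

```r
# Coefficient estimates copied from the summary(model) output
intercept <- 81517.155
qual_effect <- c(
  Above_Average  = 0,  # baseline level, absorbed into the intercept
  Average        = -7945.064,
  Below_Average  = -24447.511,
  Excellent      = 148709.171,
  Fair           = -35314.466,
  Good           = 18200.697,
  Poor           = -42470.992,
  Very_Excellent = 189894.424,
  Very_Good      = 67855.266,
  Very_Poor      = -63709.969
)
b_area <- 50.373     # dollars per square foot of gr_liv_area
b_bath <- 11606.209  # dollars per bathroom
b_age  <- -424.696   # dollars per year of age

# Hypothetical helper: evaluate the fitted linear equation by hand
predict_sale_price <- function(quality, gr_liv_area, total_bathrooms, home_age) {
  intercept + qual_effect[[quality]] + b_area * gr_liv_area +
    b_bath * total_bathrooms + b_age * home_age
}

# One invented home: Good quality, 1,500 sq ft, 2 bathrooms, 30 years old
predict_sale_price("Good", 1500, 2, 30)
# about 185749
```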

The model performs well across most price ranges but seems to be less accurate for more expensive homes.
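One way to check whether the model is less accurate for expensive homes is to compute the RMSE separately within price bands. The sketch below does this with base R's cut() and tapply() on a handful of invented actual/predicted pairs. With the real data you would pass ames$sale_price and predict(model) instead. The $250,000 cut-off is an arbitrary choice for illustration.

```r
# Invented actual and predicted prices, for illustration only
actual    <- c(100000, 150000, 200000, 400000, 500000)
predicted <- c(110000, 140000, 205000, 350000, 580000)

# Split sales into price bands (the $250,000 cut-off is an arbitrary choice)
band <- cut(actual, breaks = c(0, 250000, Inf),
            labels = c("under_250k", "250k_plus"))

# RMSE within each band: square the errors, average them per band, take the root
band_rmse <- tapply((predicted - actual)^2, band, function(e) sqrt(mean(e)))
band_rmse
# under_250k is about 8660; 250k_plus is about 66708
```

Dollar errors tend to grow with price, so dividing each band's RMSE by that band's mean price gives a fairer comparison of relative accuracy.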

4. Limitations


Published at: https://rpubs.com/ostaud/1299148