Created by Olivia Staud. Updated 4/17/25
This analysis aims to identify key factors that influence residential property sale prices in Ames, Iowa, using data from the Ames Assessor’s Office (2006-2010).
The main question: What housing characteristics have the biggest impact on property values?
My analysis revealed the following:
Overall Quality: the strongest predictor of home value. Compared with homes rated ‘Above Average’ (the model’s reference category) of the same size, bathroom count, and age, ‘Very Excellent’ homes sell for about $189,894 more, ‘Excellent’ homes for about $148,709 more, and ‘Good’ homes for about $18,200 more.
Living Area: Each additional square foot of above-ground living area increases home value by about $50.37. To put this into perspective, adding a 200 sq. ft. room would increase value by around $10,074.
Bathrooms: Each additional full bathroom adds about $11,606 to a home’s value (a half bathroom counts as 0.5), making bathrooms a significant factor.
Age: Each additional year of age reduces a home’s value by about $424.70.
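Because the model is linear, these per-unit effects add up. A minimal sketch of that arithmetic, using the coefficients printed in the model summary and a hypothetical pair of homes with the same overall quality (the scenario is made up for illustration):

```r
# Coefficients copied from summary(model), as printed
coefs <- c(gr_liv_area = 50.373, total_bathrooms = 11606.209, home_age = -424.696)

# Hypothetical comparison: home B vs. home A, same overall quality.
# B has 200 more sq. ft., one extra half bath (0.5), and is 10 years older.
delta <- c(gr_liv_area = 200, total_bathrooms = 0.5, home_age = 10)

# In a linear model the effects are additive: sum of coefficient x change
price_gap <- sum(coefs * delta)
price_gap
```

Here the extra space and half bathroom outweigh the ten extra years, for a predicted gap of roughly $11,631. This holds only under the model's assumptions (no interactions, and effects that stay constant across the range of the data).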
The model has an R² of 0.814, meaning these four variables together explain 81.4% of the variation in sale prices in Ames. The RMSE is $34,449, so the model’s predictions typically miss the actual sale price by roughly $34,000.
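R² compares the model's residual sum of squares with the total variation in price around its mean: R² = 1 - RSS/TSS. The sketch below checks this definition against `summary()$r.squared` on simulated data (the numbers are made up, not the Ames data); the same calculation applies to `model` and `ames$sale_price`.

```r
set.seed(1)
n <- 200
living_area <- runif(n, 500, 3000)                          # simulated sq. ft.
price <- 20000 + 50 * living_area + rnorm(n, sd = 30000)   # simulated prices
fit <- lm(price ~ living_area)

rss <- sum(residuals(fit)^2)          # variation the model leaves unexplained
tss <- sum((price - mean(price))^2)   # total variation around the mean
r2_manual <- 1 - rss / tss

all.equal(r2_manual, summary(fit)$r.squared)  # TRUE
```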
ames <- read.csv("ames.csv")
print(paste("# of properties:", nrow(ames)))
## [1] "# of properties: 2930"
print(paste("# of variables:", ncol(ames)))
## [1] "# of variables: 79"
The dataset contains 2,930 property sales in Ames, Iowa, from 2006 to 2010, described by 79 variables and collected by the Ames Assessor’s Office. Each observation is one home sale with information about the property’s characteristics. I cleaned the data by standardizing the column names and transformed some variables to improve predictions.
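Column names are standardized with `janitor::clean_names()`, which converts them to a consistent snake_case style. As a rough illustration only, the base-R function below (`simple_clean_names()`, a hypothetical helper, and the example names are made up) lowercases names and replaces runs of other characters with underscores; the real `clean_names()` does more, such as splitting camelCase words and making duplicate names unique.

```r
# Hypothetical approximation of snake_case name cleaning (not janitor's implementation)
simple_clean_names <- function(x) {
  x <- tolower(x)                   # lowercase everything
  x <- gsub("[^a-z0-9]+", "_", x)   # runs of non-alphanumeric characters become "_"
  gsub("^_+|_+$", "", x)            # trim leading/trailing underscores
}

simple_clean_names(c("Gr Liv Area", "Overall_Qual", "Year.Built"))
# "gr_liv_area" "overall_qual" "year_built"
```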
Key variables:
sale_price: Property sale price in dollars
overall_qual: Rating of overall material and finish on a 10-point scale, stored as text labels from Very_Poor to Very_Excellent
gr_liv_area: Above ground living area in square feet
year_built: Year of construction
full_bath/half_bath: Number of full and half bathrooms above ground
bsmt_full_bath/bsmt_half_bath: Number of full and half bathrooms in the basement
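Because overall_qual is stored as text labels, `factor()` treats it as categorical. If the 1-10 scale were wanted instead, a lookup like the one below could map labels to scores. The level order is an assumption: it uses the labels that appear in the model output plus Above_Average, and `qual_score()` is a hypothetical helper.

```r
# Assumed ordering of the 10 quality labels, lowest (1) to highest (10)
qual_levels <- c("Very_Poor", "Poor", "Fair", "Below_Average", "Average",
                 "Above_Average", "Good", "Very_Good", "Excellent", "Very_Excellent")

# Hypothetical helper: convert labels to integer scores 1-10 (NA for unknown labels)
qual_score <- function(labels) match(labels, qual_levels)

qual_score(c("Average", "Good", "Very_Excellent"))  # 5 7 10
```

Using `qual_score(ames$overall_qual)` as a numeric predictor would estimate a single dollar amount per quality step, a stronger assumption than the `factor()` version, which estimates a separate effect for each label.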
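library(janitor)        # clean_names()
library(ggplot2)        # plotting
options(scipen = 999)   # print numbers in fixed rather than scientific notation
# Standardize column names to lowercase snake_case
ames <- clean_names(ames)
# Derived predictors
ames$home_age <- 2010 - ames$year_built   # age in 2010, the last year of sales in the data
ames$total_bathrooms <- ames$full_bath + (0.5 * ames$half_bath) +
  ames$bsmt_full_bath + (0.5 * ames$bsmt_half_bath)   # half baths count as 0.5
# Linear model: overall quality enters as a categorical factor
model <- lm(sale_price ~ factor(overall_qual) + gr_liv_area + total_bathrooms + home_age,
            data = ames)
summary(model)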
##
## Call:
## lm(formula = sale_price ~ factor(overall_qual) + gr_liv_area +
## total_bathrooms + home_age, data = ames)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -446997  -17509    -513   14628  226547
##
## Coefficients:
##                                      Estimate Std. Error t value
## (Intercept)                         81517.155   3218.886  25.325
## factor(overall_qual)Average         -7945.064   1823.393  -4.357
## factor(overall_qual)Below_Average  -24447.511   2751.935  -8.884
## factor(overall_qual)Excellent      148709.171   3879.863  38.328
## factor(overall_qual)Fair           -35314.466   5741.889  -6.150
## factor(overall_qual)Good            18200.697   2022.979   8.997
## factor(overall_qual)Poor           -42470.992   9803.280  -4.332
## factor(overall_qual)Very_Excellent 189894.424   6785.412  27.986
## factor(overall_qual)Very_Good       67855.266   2507.482  27.061
## factor(overall_qual)Very_Poor      -63709.969  17352.845  -3.671
## gr_liv_area                            50.373      1.811  27.819
## total_bathrooms                     11606.209   1146.744  10.121
## home_age                             -424.696     29.689 -14.305
##                                                Pr(>|t|)
## (Intercept)                        < 0.0000000000000002 ***
## factor(overall_qual)Average              0.000013622494 ***
## factor(overall_qual)Below_Average  < 0.0000000000000002 ***
## factor(overall_qual)Excellent      < 0.0000000000000002 ***
## factor(overall_qual)Fair                 0.000000000879 ***
## factor(overall_qual)Good           < 0.0000000000000002 ***
## factor(overall_qual)Poor                 0.000015252532 ***
## factor(overall_qual)Very_Excellent < 0.0000000000000002 ***
## factor(overall_qual)Very_Good      < 0.0000000000000002 ***
## factor(overall_qual)Very_Poor                  0.000246 ***
## gr_liv_area                        < 0.0000000000000002 ***
## total_bathrooms                    < 0.0000000000000002 ***
## home_age                           < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 34530 on 2917 degrees of freedom
## Multiple R-squared: 0.814, Adjusted R-squared: 0.8132
## F-statistic: 1064 on 12 and 2917 DF, p-value: < 0.00000000000000022
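The coefficient table lists nine of the ten quality labels. The missing one, Above_Average, is the reference level: `factor()` sorts character labels alphabetically and `lm()` uses the first level as the baseline, so each quality coefficient is a difference from an Above_Average home with the same living area, bathrooms, and age. The sketch below shows the default level order, how `relevel()` could make Average the baseline instead, and the equivalent arithmetic using the printed coefficients.

```r
quality <- c("Average", "Above_Average", "Good", "Very_Good", "Excellent",
             "Very_Excellent", "Below_Average", "Fair", "Poor", "Very_Poor")

f <- factor(quality)    # levels are sorted alphabetically
levels(f)[1]            # "Above_Average": the baseline lm() would use

f_avg <- relevel(f, ref = "Average")   # make "Average" the baseline instead
levels(f_avg)[1]                       # "Average"

# Same comparison from the printed coefficients (both relative to Above_Average)
very_excellent <- 189894.424
average <- -7945.064
gap <- very_excellent - average   # Very_Excellent vs. Average, about $197,839
gap
```

Refitting with `relevel(factor(overall_qual), ref = "Average")` in the formula would give the same fitted values and R², with the quality coefficients measured from Average homes instead.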
summary_stats <- summary(model)
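I performed two transformations before fitting the model: home_age, the number of years between construction and 2010, and total_bathrooms, which adds full bathrooms and half bathrooms (each counted as 0.5), both above ground and in the basement.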
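# Actual vs. predicted prices; the dashed line marks perfect predictions
ggplot(ames, aes(x = sale_price, y = predict(model))) +
  geom_point(alpha = 0.5, color = "pink") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed") +
  labs(
    title = "Actual vs. Predicted Sale Prices",
    x = "Actual Price ($)",
    y = "Predicted Price ($)"
  )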
r2 <- summary_stats$r.squared
rmse <- sqrt(mean((predict(model) - ames$sale_price)^2))
paste("R-squared:", r2)
## [1] "R-squared: 0.813978393135251"
paste("RMSE: $", rmse)
## [1] "RMSE: $ 34449.4276895676"
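The RMSE computed here ($34,449) is slightly smaller than the residual standard error in the model summary ($34,530) because RMSE divides the residual sum of squares by n, while the residual standard error divides by the residual degrees of freedom, n - p (2,930 - 13 = 2,917). The sketch below converts one into the other using only the numbers printed above.

```r
rmse <- 34449.4276895676   # from the RMSE output above
n <- 2930                  # observations
p <- 13                    # estimated coefficients (intercept + 12 terms)

rss <- rmse^2 * n            # residual sum of squares implied by the RMSE
rse <- sqrt(rss / (n - p))   # residual standard error uses n - p
signif(rse, 4)               # 34530, matching summary(model)
```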
The regression model shows that home values in Ames are driven by quality, size, number of bathrooms, and age:
Overall quality is the strongest predictor
Each additional square foot of living area increases a home’s value
More bathrooms increase a home’s value
Newer homes sell for more than older ones with similar features
The model performs well across most price ranges but appears less accurate for the most expensive homes.
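One way to check whether the model is less accurate for expensive homes is to compute the RMSE separately within price bands. `rmse_by_band()` below is a hypothetical helper, demonstrated on simulated data whose errors grow with price; with the real data it could be called as `rmse_by_band(ames$sale_price, predict(model))`.

```r
# Hypothetical helper: RMSE within quantile-based bands of the actual price
rmse_by_band <- function(actual, predicted, n_bands = 4) {
  breaks <- unique(quantile(actual, probs = seq(0, 1, length.out = n_bands + 1)))
  band <- cut(actual, breaks = breaks, include.lowest = TRUE, dig.lab = 7)
  sq_err <- (predicted - actual)^2
  tapply(sq_err, band, function(e) sqrt(mean(e)))   # RMSE per band
}

# Simulated example: prediction error is 10% of the price, so it grows with price
set.seed(42)
actual <- runif(1000, 50000, 500000)
predicted <- actual + rnorm(1000, sd = 0.1 * actual)
bands <- rmse_by_band(actual, predicted)
round(bands)
```

Binning on actual price is one choice; binning on predicted price is a reasonable alternative and avoids grouping homes by the quantity being predicted.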
Published at: https://rpubs.com/ostaud/1299148