knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
library(tidyverse)
library(knitr)
library(caret)
library(corrplot)
This analysis examines the primary factors that influence residential property values in Ames, Iowa, using data from the Ames Assessor’s Office covering 2006 to 2010. I built a linear regression model, which balances simplicity with predictive accuracy. The goal is to identify which housing attributes have the most influence on sale prices so that homeowners, buyers, and real estate professionals can make decisions more easily.
The Ames Housing dataset contains information on 2,930 properties, described by 79 variables. These variables cover property dimensions, room counts, location, quality assessments, and sale-related data.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 12789 129500 160000 180796 213500 755000
## [1] 180796.1
## [1] 160000
The sale price distribution is right-skewed, with a mean of $180,796.10 and a median of $160,000. The majority of homes sell at a moderate price, though there are some homes that sell for much more, creating high-end outliers that pull the mean above the median.
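As a rough illustration of the skew check described above, the sketch below compares the mean and median of a small vector and computes a moment-based skewness. The `prices` values and the `skewness()` helper are hypothetical and exist only for this example; they are not taken from the Ames data.

```r
# Toy sale prices (hypothetical values, not drawn from the Ames data)
prices <- c(95000, 120000, 135000, 150000, 160000, 175000, 210000, 320000, 610000)

# Moment-based sample skewness; positive values indicate a longer right tail
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

mean_price   <- mean(prices)
median_price <- median(prices)
price_skew   <- skewness(prices)

mean_price > median_price  # TRUE when a few expensive homes pull the mean upward
price_skew > 0             # TRUE for a right-skewed sample
```

A mean above the median together with positive skewness is the same pattern the summary statistics show for the real sale prices.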
Now, I will clean the data.
I created a new variable called AgeofHome that records the age of each property at the time of sale. I also log-transformed the sale price to make its distribution closer to normal.
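The two transformations can be sketched as follows. This is a minimal example on a made-up three-row data frame; the column names `SalePrice`, `YearBuilt`, `YrSold`, and `LogSalePrice` are assumptions about how the data might be named, and only `AgeofHome` comes from the text above.

```r
# Hypothetical rows; column names other than AgeofHome are assumed for illustration
homes <- data.frame(
  SalePrice = c(129500, 160000, 213500),
  YearBuilt = c(1961, 1998, 2005),
  YrSold    = c(2010, 2008, 2006)
)

# Age of the home at the time of sale, in years
homes$AgeofHome <- homes$YrSold - homes$YearBuilt

# Natural log of the sale price, which compresses the long right tail
homes$LogSalePrice <- log(homes$SalePrice)

homes
```

Predictions made on the log scale can be converted back to dollars with `exp()`.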
I looked at four key predictors: above-grade living area, overall quality rating, age of the home at the time of sale, and neighborhood.
This model explains 77.87% of the variation in home sale prices using the four key predictors I chose (above-grade living area, overall quality rating, age of the home at the time of sale, and neighborhood). A larger above-grade living area is associated with a higher sale price. Homes with excellent quality ratings sell for more than homes of average quality. Each additional year of age is associated with a decrease in value. Location significantly affects home values.
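A model of this shape could be fitted along the following lines. This is only a sketch on simulated data: the column names (`GrLivArea`, `OverallQual`, `Neighborhood`, `LogSalePrice`), the neighborhood labels, and every coefficient used to generate the data are assumptions made for the example, so its R-squared will not match the 77.87% reported above.

```r
set.seed(42)  # simulated data only; none of these values come from the Ames dataset
n <- 200
homes <- data.frame(
  GrLivArea    = round(runif(n, 600, 3500)),
  OverallQual  = sample(1:10, n, replace = TRUE),
  AgeofHome    = sample(0:100, n, replace = TRUE),
  Neighborhood = factor(sample(c("North", "Central", "South"), n, replace = TRUE))
)

# Hypothetical neighborhood effects on the log-price scale
nbhd_effect <- c(Central = 0, North = 0.10, South = -0.05)
homes$LogSalePrice <- 11 + 0.0004 * homes$GrLivArea + 0.08 * homes$OverallQual -
  0.003 * homes$AgeofHome +
  unname(nbhd_effect[as.character(homes$Neighborhood)]) +
  rnorm(n, sd = 0.1)

# Linear regression of log sale price on the four predictors
fit <- lm(LogSalePrice ~ GrLivArea + OverallQual + AgeofHome + Neighborhood,
          data = homes)

# Share of variance in log price explained by the model
r_squared <- summary(fit)$r.squared

# With a logged response, 100 * (exp(b) - 1) is the approximate percent change
# in price for a one-unit increase in the predictor
pct_change <- 100 * (exp(coef(fit)[c("GrLivArea", "OverallQual", "AgeofHome")]) - 1)

r_squared
pct_change
```

Because the response is logged, the coefficients read as percent changes rather than dollar amounts, which is why the age effect appears as a small negative percentage per year.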
The residual plot shows that the model performs well across most of the price range but tends to underestimate the values of high-end properties.
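One way to check for this kind of pattern is to plot residuals against fitted values and then average the residuals within bands of fitted price. The sketch below uses simulated data with a deliberately curved price relationship, so that a straight-line fit underestimates the top band; the variables `area` and `price` and all numeric values are hypothetical.

```r
set.seed(1)  # simulated data; the curvature is built in on purpose
area  <- runif(300, 600, 4000)
price <- 40000 + 30 * area + 0.015 * area^2 + rnorm(300, sd = 15000)

fit <- lm(price ~ area)  # straight-line fit that ignores the curvature
fv  <- fitted(fit)
res <- resid(fit)        # observed minus fitted

# Residuals versus fitted values, with a reference line at zero
plot(fv, res, xlab = "Fitted sale price", ylab = "Residual (observed - fitted)")
abline(h = 0, lty = 2)

# Mean residual within quartile bands of the fitted values
band <- cut(fv, breaks = quantile(fv, probs = seq(0, 1, 0.25)),
            include.lowest = TRUE, labels = c("Q1", "Q2", "Q3", "Q4"))
mean_resid_by_band <- tapply(res, band, mean)
mean_resid_by_band
```

A positive mean residual in the top band means that observed prices sit above the fitted ones there, which is what underestimation of high-end homes looks like.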
This analysis shows that a model focused on these four variables (above-grade living area, overall quality rating, age of the home at the time of sale, and neighborhood) can effectively predict home sale prices in Ames. These variables account for much of the variation in housing prices, showing that size, quality, age, and location remain key drivers of residential real estate value.
For real estate professionals as well as homeowners, these findings suggest that investments in quality improvements and renovations that increase living space are likely to yield the highest returns, especially in neighborhoods that are in high demand.