Hello, college graduate! Are you interested in buying your first home? Would you like to stay here in Cincinnati, Ohio? At this point in time the housing market is considered a Seller’s Market, meaning there are more buyers than sellers, so sellers have the upper hand. Now more than ever you want to ensure you are paying no more than similar homes in the area.
For all of my home owners out there who are looking to sell without an agent, this study is for you as well. Read below to ensure you are an informed seller and are not over or under pricing the value of your home.
We performed a study of recently sold homes in the Cincinnati area in an attempt to model the factors that contribute to a home’s value. Our data was pulled from Zillow, an online real estate database for the United States. We only used data on homes that were recently sold to ensure we captured finalized sales price data.
We first want to acquaint you with the data used in this study. Should you like to reperform our analysis, see below.
The packages required to run the code in this study are the following:
library(readxl) # To read in the data
library(tidyverse) # For general data manipulation and regression analysis
library(knitr) # To format tables
library(DAAG)
library(ggplot2) # To visualize data
library(psych)
library(qwraps2) # To make summary tables
library(naniar) # To replace missing values
library(formattable) # To format tables into currency
library(MASS) # To calculate VIF
The following code illustrates how to load in the housing data obtained from Zillow and get it in a proper/workable format. This includes handling missing values in our data.
# Read in the data
data <- read_xlsx("zillow data.xlsx")
# Remove the column that includes a link to the listing
zillow <- data[,-1]
# Coerse 'N/A' in the data to be read as NAs
zillow <- zillow %>%
replace_with_na_all(condition = ~.x %in% common_na_strings)
# Coerse variables to their proper data type
# str(zillow)
zillow$zip <- as.factor(zillow$zip)
zillow$heating <- as.factor(zillow$heating)
zillow$cooling <- as.factor(zillow$cooling)
zillow$garage <- as.factor(zillow$garage)
zillow$basement_status <- as.factor(zillow$basement_status)
zillow$lot_size_ft <- as.numeric(zillow$lot_size_ft)
# Correct typo in a header
names(zillow)[15]<-"num_stories"
# Impute NA in Square Footage with the Average Square Footage in that Zip Code
zillow %>% filter(zip == 45202) %>%
summarise(mean(sqr_ft, na.rm = TRUE))
## # A tibble: 1 x 1
## `mean(sqr_ft, na.rm = TRUE)`
## <dbl>
## 1 2602.
# 2566.32
zillow$sqr_ft[is.na(zillow$sqr_ft)] <- 2566.32
# Impute NA in Lot Size with the Average Lot Size in that Zip Code
zillow %>% filter(zip == 45202) %>%
summarise(mean(lot_size_ft, na.rm = TRUE))
## # A tibble: 1 x 1
## `mean(lot_size_ft, na.rm = TRUE)`
## <dbl>
## 1 5066.
# 13233.12
zillow$lot_size_ft[is.na(zillow$lot_size_ft)] <- 13233.12
# Impute 0 in the Fireplace option, because we assume if it wasn't listed, they don't have it
zillow$Fireplace[is.na(zillow$Fireplace)] <- 0
# Confirm all missing values are taken care of
colSums(is.na(zillow))
## Address zip sold_price sold_date
## 0 0 0 0
## num_bed num_bath sqr_ft year_built
## 0 0 0 0
## lot_size_ft heating cooling parking_spaces
## 0 0 0 0
## garage basement_status num_stories Pool?
## 0 0 0 0
## type_and_style Fireplace
## 0 0
To enhance this analysis, we additionally obtained Median Home Values (per Square Foot) by Zip Code from Zillow’s Research website, found here. To upload this data into our workspace, please do the following:
home_val <- read_csv("Zip_MedianValuePerSqft_AllHomes.csv")
We then want to filter this dataset for only zip codes of interest, which are the zipcodes included in the zillow dataset (which are Cincinnati zip codes).
home_val_f <- filter(home_val, RegionName %in% unique(zillow$zip))
With this new information, we can calculate an average of the median prices (which are given by month) from the last 12 months for each zip code. This new variable will be shown in the preview on the next tab.
home_val_avgmed <- data.frame(ID=home_val_f[,2], Means=rowMeans(home_val_f[,17:28]))
To simplify this analysis, we can merge the two datasets together. The only important information in the Home Value dataset is the median value of homes for each zip code. The average of those median values for each zipcode was calculated and stored in the home_val_avgmed data frame.
zillow$zip <- as.character(zillow$zip)
home_val_avgmed$RegionName <- as.character(home_val_avgmed$RegionName)
zillow1 <- left_join(x = zillow, y = home_val_avgmed, by = c("zip" = "RegionName"))
# Convert the Median value per square foot to median value for similar homes.
zillow2 <- zillow1 %>% mutate(med_val = sqr_ft * Means)
zillow <- zillow2[,-19]
zillow$zip <- as.factor(zillow$zip)
Another point - For analysis purposes, it may be easier to transform the year_built variable into an age variable. In other words, how many years old is the building as of 2019?
zillow$age <- 2019 - zillow$year_built
In the background, we can make sure our data output reads well to the viewer.
#assign a class
class(zillow$sold_price) <- c("money", class(zillow$sold_price))
#S3 print method for the class
print.money <- function(x, ...) {
print.default(paste0("$", formatC(as.numeric(x), format="f", digits=0, big.mark=",")))
}
#format method, which is necessary for formating in a data.frame
format.money <- function(x, ...) {
paste0("$", formatC(as.numeric(x), format="f", digits=0, big.mark=","))
}
Data Dictionary This dataset includes the following variables:
| Variable | Type | Description |
|---|---|---|
| Zip | Factor | Zip Code in which the Home is located |
| Price | Double | Final Sales Price of the Home |
| # Bed | Double | Number of Bedrooms |
| # Bath | Double | Number of Bathrooms (full- or half-bath) |
| Square Footage | Double | Square Footage of the Home |
| Year Built | Character | Year in which the Home was first built |
| Lot Size | Double | Square Footage of the Property on which the Home sits |
| Heating | Factor | Method of Heating, such as Gas, Electric, Baseboard, None, or Other |
| Cooling | Factor | Method of Cooling, such as Central, Wall Units, None, or Other |
| Parking Spaces | Double | Number of Parking Spaces in the Garage. Zero if no garage. |
| Garage | Factor | Type of Garage: Attached, Detatched, Carport, or None |
| Basement | Factor | Status of the Basement: Finished, Partially Finished, Unfinished, or None |
| Stories | Double | Number of Floors in the Home |
| Pool | Boolean/Numeric | Does the Home have a private Pool; 1 if Yes, 0 if no |
| Style | Factor | Style of Home, such as Historical, Traditional, Transitional, Contemporary, etc. |
| Fireplace | Double | How many fireplaces are in the home, working or otherwise? |
| Average Median Home Value in Zip | Double | The Average of the Median Home Values for each Zip Code. Taken from Zillow’s monthly data |
| Years Old | Double | Number of Years since being built. [ 2019 - Year Built ] |
Below is a brief preview of the data:
| Zip | Price | # Bed | # Bath | Square Footage | Year Built | Lot Size | Heating | Cooling | Parking Spaces | Garage | Basement | Stories | Pool | Style | Fireplace | Average Median Home Value in Zip | Years Old |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 45204 | $51,000 | 4 | 2.0 | 2000 | 1930 | 4095.00 | gas | central | 1 | detached | Unfinished | 3 | 0 | other | 0 | $153,667 | 89 |
| 45205 | $20,000 | 2 | 1.0 | 1320 | 1892 | 130680.00 | gas | central | 0 | none | Partially Finished | 2 | 0 | traditional | 0 | $54,670 | 127 |
| 45205 | $16,000 | 2 | 1.0 | 1200 | 1892 | 1263.00 | gas | central | 0 | none | Partially Finished | 2 | 0 | traditional | 1 | $49,700 | 127 |
| 45238 | $124,900 | 2 | 2.0 | 1731 | 1950 | 8276.00 | gas | central | 1 | none | Finished | 1 | 0 | ranch | 1 | $151,463 | 69 |
| 45208 | $399,900 | 3 | 3.0 | 1653 | 1916 | 9147.00 | gas | central | 2 | detached | Partially Finished | 2 | 0 | traditional | 1 | $362,971 | 103 |
| 45208 | $414,250 | 3 | 2.0 | 1754 | 1959 | 9583.00 | gas | central | 1 | attached | Finished | 4 | 0 | transitional | 1 | $385,149 | 60 |
| 45225 | $80,025 | 5 | 2.0 | 2189 | 1890 | 2613.00 | gas | wall units | 1 | none | Partially Finished | 2 | 0 | traditional | 3 | $144,292 | 129 |
| 45230 | $170,000 | 3 | 2.0 | 1291 | 1960 | 6098.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | ranch | 1 | $155,458 | 59 |
| 45237 | $145,000 | 4 | 2.0 | 1779 | 1937 | 5924.00 | gas | central | 1 | none | Finished | 1 | 0 | cape cod | 1 | $127,643 | 82 |
| 45237 | $158,000 | 4 | 2.0 | 1472 | 1925 | 5488.00 | gas | central | 1 | detached | Partially Finished | 1 | 0 | cape cod | 1 | $105,616 | 94 |
| 45224 | $122,900 | 2 | 2.0 | 1051 | 1922 | 10628.00 | gas | central | 2 | none | Finished | 1 | 0 | traditional | 1 | $83,817 | 97 |
| 45237 | $195,000 | 5 | 3.0 | 2542 | 1936 | 13939.20 | gas | central | 2 | attached | Partially Finished | 3 | 0 | colonial | 1 | $182,389 | 83 |
| 45206 | $219,000 | 2 | 2.0 | 1432 | 1895 | 3659.00 | gas | central | 0 | none | Partially Finished | 2 | 0 | traditional | 0 | $206,805 | 124 |
| 45237 | $133,400 | 3 | 2.0 | 1470 | 1934 | 4443.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | cape cod | 1 | $105,473 | 85 |
| 45237 | $59,000 | 4 | 2.0 | 1680 | 1948 | 4791.00 | gas | none | 2 | detached | Partially Finished | 1 | 0 | multifamily | 0 | $120,540 | 71 |
| 45223 | $325,000 | 4 | 3.0 | 1925 | 1918 | 4791.00 | gas | central | 0 | none | Partially Finished | 3 | 0 | traditional | 0 | $160,738 | 101 |
| 45230 | $75,000 | 2 | 2.0 | 1473 | 1895 | 7710.00 | gas | central | 0 | none | Unfinished | 1 | 0 | cape cod | 0 | $177,374 | 124 |
| 45208 | $142,000 | 1 | 1.0 | 1135 | 1900 | 2047.00 | gas | central | 0 | none | Unfinished | 2 | 0 | cape cod | 0 | $249,227 | 119 |
| 45217 | $205,000 | 4 | 2.0 | 1920 | 1930 | 11325.60 | gas | central | 1 | attached | Unfinished | 2 | 0 | traditional | 1 | $133,600 | 89 |
| 45206 | $204,900 | 2 | 3.0 | 1284 | 1985 | 1655.00 | gas | central | 1 | attached | Partially Finished | 2 | 0 | transitional | 1 | $185,431 | 34 |
| 45230 | $154,900 | 3 | 1.0 | 1456 | 1951 | 17424.00 | Baseboard | central | 1 | attached | Partially Finished | 1 | 0 | other | 2 | $175,327 | 68 |
| 45211 | $198,000 | 3 | 2.0 | 1276 | 2007 | 6843.00 | gas | central | 2 | attached | Partially Finished | 1 | 0 | bungalow | 1 | $96,125 | 12 |
| 45208 | $975,000 | 5 | 7.0 | 4190 | 1941 | 15246.00 | gas | central | 3 | attached | Finished | 2 | 0 | traditional | 1 | $920,054 | 78 |
| 45208 | $835,000 | 4 | 4.0 | 3947 | 1907 | 14374.80 | Baseboard | central | 2 | detached | Unfinished | 3 | 0 | other | 1 | $866,695 | 112 |
| 45238 | $70,000 | 4 | 1.0 | 1373 | 1948 | 7710.00 | gas | central | 2 | none | Partially Finished | 1 | 0 | cape cod | 1 | $120,138 | 71 |
| 45211 | $195,000 | 4 | 4.0 | 2348 | 1976 | 14374.80 | electric | central | 2 | attached | Finished | 2 | 0 | traditional | 1 | $176,883 | 43 |
| 45239 | $138,000 | 4 | 2.0 | 1558 | 1951 | 7884.00 | gas | central | 1 | attached | Finished | 1 | 0 | cape cod | 0 | $133,988 | 68 |
| 45211 | $139,000 | 4 | 3.0 | 2270 | 1929 | 9147.00 | electric | wall units | 2 | detached | Partially Finished | 2 | 0 | tudor | 0 | $171,007 | 90 |
| 45208 | $465,000 | 3 | 3.0 | 1976 | 2010 | 3049.00 | gas | central | 1 | attached | Partially Finished | 2 | 0 | traditional | 1 | $433,897 | 9 |
| 45239 | $183,000 | 4 | 2.0 | 2263 | 1948 | 10149.00 | gas | wall units | 3 | attached | Partially Finished | 2 | 0 | multifamily | 0 | $194,618 | 71 |
| 45213 | $184,900 | 2 | 2.0 | 1308 | 1920 | 3484.00 | gas | central | 3 | none | Finished | 1 | 0 | ranch | 0 | $161,865 | 99 |
| 45208 | $378,000 | 4 | 3.0 | 2011 | 1925 | 7535.00 | gas | wall units | 3 | detached | Partially Finished | 2 | 0 | traditional | 1 | $441,582 | 94 |
| 45220 | $125,000 | 2 | 2.0 | 1632 | 1912 | 4704.00 | gas | central | 1 | none | Unfinished | 2 | 0 | victorian | 2 | $186,184 | 107 |
| 45206 | $445,250 | 5 | 4.0 | 3318 | 1905 | 5314.00 | forced air | central | 0 | none | Partially Finished | 2 | 0 | historical | 1 | $479,174 | 114 |
| 45238 | $138,900 | 3 | 3.0 | 1740 | 1964 | 13068.00 | gas | central | 2 | attached | Finished | 1 | 0 | ranch | 1 | $152,250 | 55 |
| 45211 | $108,000 | 3 | 2.0 | 1279 | 1937 | 8712.00 | gas | central | 1 | detached | Partially Finished | 1 | 0 | cape cod | 1 | $96,351 | 82 |
| 45238 | $92,000 | 3 | 1.0 | 1267 | 1954 | 8058.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | cape cod | 0 | $110,863 | 65 |
| 45209 | $230,000 | 2 | 1.0 | 1100 | 1935 | 3781.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | historical | 1 | $217,433 | 84 |
| 45211 | $75,000 | 4 | 1.0 | 1221 | 1953 | 13503.60 | gas | central | 1 | none | Unfinished | 1 | 0 | cape cod | 0 | $91,982 | 66 |
| 45205 | $121,500 | 4 | 2.0 | 1500 | 1910 | 8407.00 | gas | central | 2 | attached | Partially Finished | 2 | 0 | traditional | 1 | $62,125 | 109 |
| 45208 | $243,000 | 4 | 2.0 | 1565 | 1925 | 4573.00 | gas | central | 1 | attached | Partially Finished | 2 | 0 | tudor | 1 | $343,648 | 94 |
| 45205 | $55,000 | 3 | 1.0 | 1200 | 1915 | 4225.00 | gas | central | 0 | none | Unfinished | 2 | 0 | traditional | 0 | $49,700 | 104 |
| 45208 | $799,000 | 5 | 3.0 | 2370 | 1908 | 6577.00 | Baseboard | central | 2 | detached | Unfinished | 2 | 0 | other | 0 | $520,413 | 111 |
| 45208 | $595,000 | 4 | 4.0 | 2280 | 1986 | 11325.60 | other | central | 2 | detached | Partially Finished | 2 | 1 | other | 1 | $500,650 | 33 |
| 45226 | $285,000 | 3 | 3.0 | 1916 | 1890 | 5314.00 | gas | central | 0 | none | Partially Finished | 2 | 0 | victorian | 0 | $369,469 | 129 |
| 45208 | $545,000 | 4 | 4.0 | 1932 | 1993 | 8276.00 | gas | central | 2 | attached | Finished | 2 | 0 | traditional | 1 | $424,235 | 26 |
| 45227 | $162,630 | 3 | 1.0 | 1403 | 1940 | 24393.60 | oil | central | 1 | attached | Unfinished | 1 | 0 | cape cod | 1 | $177,830 | 79 |
| 45205 | $78,000 | 2 | 2.0 | 1198 | 1929 | 5625.00 | forced air | central | 2 | detached | Partially Finished | 1 | 0 | other | 1 | $49,617 | 90 |
| 45209 | $307,000 | 2 | 3.0 | 1600 | 1904 | 4181.00 | gas | central | 5 | detached | Finished | 1 | 0 | craftsman | 1 | $316,267 | 115 |
| 45209 | $622,000 | 5 | 5.0 | 2505 | 1913 | 8494.00 | gas | central | 2 | detached | Partially Finished | 2 | 0 | victorian | 0 | $495,155 | 106 |
| 45245 | $141,000 | 3 | 1.0 | 1572 | 1972 | 10497.00 | gas | central | 1 | attached | Partially Finished | 2 | 0 | split level | 0 | $183,138 | 47 |
| 45245 | $80,000 | 3 | 1.0 | 1050 | 1964 | 10019.00 | gas | none | 1 | attached | none | 1 | 0 | ranch | 0 | $122,325 | 55 |
| 45226 | $692,500 | 4 | 4.0 | 3483 | 1941 | 9408.00 | Baseboard | central | 2 | attached | Finished | 2 | 0 | other | 1 | $671,639 | 78 |
| 45209 | $202,000 | 2 | 2.0 | 1776 | 1923 | 9975.00 | gas | wall units | 1 | detached | Partially Finished | 2 | 0 | multifamily | 0 | $351,056 | 96 |
| 45209 | $651,815 | 2 | 1.0 | 1240 | 1912 | 5662.00 | gas | wall units | 1 | detached | Unfinished | 2 | 0 | new traditional | 0 | $245,107 | 107 |
| 45223 | $160,000 | 2 | 2.0 | 1094 | 1922 | 3310.00 | gas | central | 0 | none | Partially Finished | 1 | 0 | transitional | 1 | $91,349 | 97 |
| 45245 | $190,000 | 3 | 2.0 | 1828 | 1993 | 65775.60 | electric | central | 2 | attached | none | 1 | 0 | ranch | 1 | $212,962 | 26 |
| 45226 | $570,000 | 3 | 4.0 | 2204 | 2014 | 1306.00 | gas | central | 2 | attached | Finished | 2 | 0 | contemporary | 1 | $425,005 | 5 |
| 45245 | $173,000 | 2 | 2.0 | 1336 | 2010 | 3759.00 | gas | central | 2 | attached | none | 1 | 0 | ranch | 0 | $155,644 | 9 |
| 45245 | $173,000 | 2 | 3.0 | 1336 | 2009 | 3759.00 | gas | central | 2 | attached | none | 1 | 0 | ranch | 0 | $155,644 | 10 |
| 45209 | $380,600 | 3 | 3.0 | 1950 | 1914 | 5576.00 | gas | central | 2 | none | Unfinished | 2 | 0 | traditional | 0 | $385,450 | 105 |
| 45245 | $175,000 | 3 | 3.0 | 1570 | 1979 | 10018.00 | electric | central | 1 | attached | Finished | 1 | 1 | split level | 0 | $182,905 | 40 |
| 45245 | $154,900 | 3 | 2.0 | 1032 | 1952 | 27878.40 | gas | central | 1 | detached | none | 1 | 0 | ranch | 0 | $120,228 | 67 |
| 45223 | $265,000 | 3 | 3.0 | 1592 | 1910 | 3649.00 | gas | central | 2 | none | Partially Finished | 2 | 0 | multifamily | 2 | $132,932 | 109 |
| 45223 | $225,000 | 3 | 3.0 | 1905 | 1900 | 2613.00 | electric | central | 1 | detached | Partially Finished | 1 | 0 | traditional | 0 | $159,068 | 119 |
| 45223 | $152,500 | 3 | 2.0 | 1344 | 1914 | 2352.00 | gas | central | 1 | none | Unfinished | 2 | 0 | traditional | 0 | $112,224 | 105 |
| 45209 | $155,000 | 2 | 1.0 | 966 | 1940 | 4965.00 | Baseboard | central | 1 | attached | Partially Finished | 1 | 0 | other | 0 | $190,946 | 79 |
| 45206 | $145,000 | 2 | 3.0 | 1989 | 1993 | 10018.80 | gas | central | 2 | attached | Finished | 2 | 0 | other | 0 | $287,245 | 26 |
| 45202 | $284,000 | 3 | 2.0 | 1904 | 1890 | 21780.00 | gas | central | 0 | none | Partially Finished | 2 | 0 | historical | 0 | $437,127 | 129 |
| 45238 | $118,500 | 4 | 3.0 | 1314 | 1950 | 7623.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | cape cod | 0 | $114,975 | 69 |
| 45205 | $51,000 | 3 | 1.0 | 1208 | 1935 | 5837.00 | gas | central | 1 | attached | Partially Finished | 2 | 0 | cape cod | 1 | $50,031 | 84 |
| 45223 | $97,000 | 3 | 1.0 | 1348 | 1920 | 17424.00 | gas | central | 1 | none | Partially Finished | 1 | 0 | cape cod | 0 | $112,558 | 99 |
| 45227 | $35,000 | 3 | 1.0 | 1338 | 1911 | 5662.00 | gas | none | 0 | none | Partially Finished | 1 | 0 | cape cod | 0 | $169,592 | 108 |
| 45255 | $200,000 | 3 | 2.0 | 1584 | 2004 | 19602.00 | gas | central | 2 | attached | none | 1 | 0 | ranch | 1 | $198,264 | 15 |
| 45202 | $757,000 | 5 | 5.0 | 3678 | 1865 | 9147.00 | gas | central | 3 | detached | Partially Finished | 3 | 0 | historical | 0 | $844,408 | 154 |
| 45239 | $138,000 | 4 | 2.0 | 1915 | 1961 | 6403.00 | gas | central | 1 | attached | Finished | 2 | 0 | transitional | 0 | $164,690 | 58 |
| 45227 | $199,000 | 2 | 2.0 | 1096 | 1909 | 7274.00 | gas | central | 2 | none | Partially Finished | 2 | 0 | traditional | 0 | $138,918 | 110 |
| 45239 | $77,500 | 3 | 2.0 | 1231 | 1949 | 10890.00 | gas | central | 1 | attached | Partially Finished | 2 | 0 | cape cod | 0 | $105,866 | 70 |
| 45230 | $165,000 | 4 | 2.0 | 2173 | 1960 | 14810.40 | Baseboard | central | 2 | attached | Partially Finished | 2 | 0 | other | 1 | $261,665 | 59 |
| 45230 | $135,100 | 2 | 1.0 | 1216 | 1919 | 11325.60 | gas | central | 1 | attached | Partially Finished | 2 | 0 | traditional | 0 | $146,427 | 100 |
| 45202 | $240,000 | 2 | 2.0 | 2105 | 1880 | 2308.00 | electric, gas | central | 2 | detached | Partially Finished | 2 | 0 | historical | 1 | $483,273 | 139 |
| 45227 | $200,000 | 3 | 2.0 | 2086 | 1910 | 7448.00 | Baseboard | none | 0 | none | Partially Finished | 2 | 0 | other | 0 | $264,401 | 109 |
| 45233 | $130,000 | 3 | 2.0 | 1227 | 1959 | 7623.00 | gas | central | 1 | detached | Partially Finished | 1 | 0 | ranch | 0 | $133,027 | 60 |
| 45230 | $150,000 | 3 | 2.0 | 1580 | 1937 | 23958.00 | Baseboard | central | 1 | attached | Partially Finished | 1 | 0 | other | 1 | $190,258 | 82 |
| 45213 | $380,000 | 3 | 2.0 | 1756 | 1927 | 11761.20 | gas | central | 2 | detached | Partially Finished | 2 | 0 | transitional | 1 | $217,305 | 92 |
| 45224 | $265,000 | 3 | 4.0 | 2250 | 1948 | 23958.00 | gas | central | 2 | attached | Finished | 2 | 0 | traditional | 1 | $179,438 | 71 |
| 45224 | $197,250 | 5 | 2.0 | 2430 | 1963 | 9100.00 | gas | central | 2 | attached | Unfinished | 1 | 0 | other | 1 | $193,793 | 56 |
| 45227 | $83,000 | 2 | 1.0 | 1210 | 1915 | 4356.00 | gas | central | 1 | none | Partially Finished | 2 | 0 | traditional | 1 | $153,368 | 104 |
| 45227 | $159,900 | 2 | 2.0 | 1278 | 1938 | 4356.00 | gas | central | 1 | attached | Finished | 2 | 0 | cape cod | 1 | $161,987 | 81 |
| 45230 | $150,000 | 3 | 2.0 | 1322 | 1951 | 6316.00 | electric, gas | central | 3 | detached | Partially Finished | 1 | 0 | cape cod | 0 | $159,191 | 68 |
| 45227 | $80,000 | 2 | 1.0 | 958 | 1928 | 4791.00 | gas | central | 1 | detached | Partially Finished | 1 | 0 | ranch | 0 | $121,427 | 91 |
| 45233 | $91,000 | 2 | 1.0 | 1000 | 1960 | 7840.00 | gas | central | 1 | none | Partially Finished | 1 | 0 | ranch | 0 | $108,417 | 59 |
| 45245 | $184,900 | 3 | 3.0 | 1128 | 1970 | 20037.60 | gas | central | 2 | attached | Finished | 1 | 0 | traditional | 1 | $131,412 | 49 |
| 45216 | $103,500 | 4 | 4.0 | 2408 | 1898 | 6969.00 | gas | central | 1 | none | Partially Finished | 2 | 0 | traditional | 4 | $136,052 | 121 |
| 45204 | $159,000 | 3 | 2.0 | 1422 | 1913 | 4356.00 | gas | central | 0 | none | Partially Finished | 2 | 0 | colonial | 1 | $109,257 | 106 |
| 45220 | $675,000 | 5 | 5.0 | 4630 | 1942 | 93654.00 | Baseboard | central | 3 | attached | Partially Finished | 2 | 1 | other | 1 | $528,206 | 77 |
| 45204 | $200,000 | 2 | 1.0 | 2280 | 1900 | 8407.00 | gas | central | 1 | none | Unfinished | 2 | 0 | multifamily | 0 | $175,180 | 119 |
| 45204 | $105,000 | 3 | 2.0 | 1352 | 1900 | 1655.00 | gas | none | 0 | none | Partially Finished | 2 | 0 | transitional | 0 | $103,879 | 119 |
| 45238 | $70,000 | 2 | 2.0 | 916 | 1939 | 5488.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | cape cod | 0 | $80,150 | 80 |
| 45245 | $213,000 | 3 | 3.0 | 1680 | 2011 | 10018.00 | gas | central | 2 | attached | Unfinished | 2 | 0 | colonial | 0 | $195,720 | 8 |
| 45208 | $2,632,000 | 6 | 7.0 | 7900 | 1986 | 104544.00 | Baseboard | central | 2 | attached | Finished | 2 | 0 | other | 0 | $1,734,708 | 33 |
| 45227 | $455,000 | 2 | 3.0 | 2416 | 1929 | 5885.00 | gas | central | 1 | detached | Finished | 2 | 0 | tudor | 0 | $306,228 | 90 |
| 45205 | $99,900 | 3 | 1.0 | 1794 | 1922 | 7274.00 | Baseboard | central | 0 | none | Partially Finished | 1 | 0 | other | 1 | $74,302 | 97 |
| 45238 | $110,000 | 4 | 2.0 | 1534 | 1949 | 6098.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | cape cod | 0 | $134,225 | 70 |
| 45208 | $500,000 | 3 | 4.0 | 1788 | 1957 | 6534.00 | gas | central | 2 | attached | Finished | 2 | 0 | traditional | 1 | $392,615 | 62 |
| 45202 | $320,000 | 4 | 2.0 | 2424 | 1880 | 3920.00 | gas | central | 1 | detached | Unfinished | 3 | 0 | traditional | 1 | $556,510 | 139 |
| 45202 | $1,000,000 | 3 | 4.0 | 3848 | 1996 | 2178.00 | gas | central | 2 | attached | Finished | 3 | 0 | other | 1 | $883,437 | 23 |
| 45240 | $173,655 | 4 | 3.0 | 2250 | 1974 | 9191.16 | gas | central | 2 | attached | Partially Finished | 2 | 0 | traditional | 1 | $193,875 | 45 |
| 45249 | $187,000 | 3 | 3.0 | 1504 | 1988 | 11761.00 | electric | central | 2 | attached | Finished | 1 | 0 | other | 1 | $217,203 | 31 |
| 45202 | $562,500 | 3 | 5.0 | 3304 | 2005 | 2470.00 | gas | central | 2 | attached | Finished | 2 | 0 | contemporary | 1 | $758,543 | 14 |
| 45214 | $298,000 | 3 | 3.0 | 2162 | 2008 | 1991.00 | gas | central | 2 | attached | Finished | 3 | 0 | traditional | 1 | $142,512 | 11 |
| 45202 | $581,000 | 3 | 2.0 | 3208 | 1875 | 2250.00 | gas | central | 0 | none | Partially Finished | 2 | 0 | historical | 5 | $736,503 | 144 |
| 45206 | $357,700 | 4 | 3.0 | 1883 | 1916 | 3920.00 | gas | central | 2 | none | Unfinished | 2 | 0 | traditional | 1 | $271,937 | 103 |
| 45202 | $1,300,000 | 3 | 4.0 | 3582 | 1900 | 3005.00 | gas | central | 2 | attached | Unfinished | 3 | 0 | other | 1 | $822,368 | 119 |
| 45202 | $582,500 | 2 | 3.0 | 2226 | 1872 | 1385.00 | gas | central | 2 | attached | Unfinished | 3 | 0 | historical | 2 | $511,053 | 147 |
| 45202 | $300,000 | 4 | 4.0 | 2800 | 1952 | 7927.00 | Baseboard | central | 1 | attached | Unfinished | 3 | 0 | modern | 1 | $642,833 | 67 |
| 45212 | $80,000 | 2 | 1.0 | 1065 | 1926 | 2744.28 | gas | none | 0 | none | Unfinished | 2 | 0 | traditional | 0 | $109,784 | 93 |
| 45202 | $645,000 | 3 | 3.0 | 2000 | 2015 | 1002.00 | gas | central | 2 | attached | none | 3 | 0 | loft | 0 | $459,167 | 4 |
| 45202 | $475,000 | 2 | 3.0 | 2100 | 1881 | 1655.00 | gas | central | 2 | attached | None | 4 | 0 | contemporary | 0 | $482,125 | 138 |
| 45202 | $500,000 | 3 | 5.0 | 2291 | 2018 | 13233.12 | gas | central | 1 | attached | None | 4 | 0 | contemporary | 0 | $525,975 | 1 |
| 45202 | $530,000 | 3 | 4.0 | 2300 | 2018 | 13233.12 | gas | central | 1 | attached | none | 4 | 0 | contemporary | 0 | $528,042 | 1 |
| 45208 | $2,700,000 | 4 | 5.0 | 5080 | 1941 | 53579.00 | Baseboard | central | 1 | attached | Finished | 2 | 0 | other | 1 | $1,115,483 | 78 |
| 45215 | $419,000 | 4 | 3.0 | 2305 | 1988 | 20908.80 | gas | central | 2 | attached | Finished | 2 | 0 | other | 1 | $236,263 | 31 |
| 45211 | $148,000 | 3 | 2.0 | 1767 | 1926 | 8537.76 | Baseboard | wall units | 2 | detached | Unfinished | 1 | 0 | traditional | 1 | $133,114 | 93 |
| 45206 | $637,000 | 4 | 4.0 | 4748 | 1890 | 11935.44 | gas | central | 1 | none | Partially Finished | 2 | 0 | victorian | 6 | $685,690 | 129 |
| 45202 | $575,000 | 3 | 3.0 | 2810 | 1890 | 2178.00 | gas | central | 1 | attached | Unfinished | 3 | 0 | traditional | 0 | $645,129 | 129 |
| 45202 | $254,000 | 2 | 2.0 | 1008 | 1881 | 784.00 | gas | central | 0 | none | Finished | 3 | 0 | historical | 0 | $231,420 | 138 |
| 45202 | $270,000 | 3 | 3.0 | 2000 | 1869 | 6098.00 | gas | other | 0 | none | Finished | 3 | 0 | other | 1 | $459,167 | 150 |
| 45208 | $595,000 | 5 | 3.0 | 3190 | 1914 | 10018.00 | gas | central | 3 | detached | Unfinished | 3 | 0 | traditional | 3 | $700,471 | 105 |
| 45208 | $2,300,000 | 3 | 4.0 | 3253 | 1952 | 30492.00 | Baseboard | central | 3 | Carport | Finished | 1 | 1 | other | 1 | $714,305 | 67 |
| 45202 | $787,000 | 3 | 4.0 | 3506 | 2018 | 5140.00 | gas | central | 2 | attached | none | 3 | 0 | contemporary | 1 | $804,919 | 1 |
| 45202 | $1,890,000 | 3 | 3.0 | 3840 | 1960 | 27878.00 | Baseboard | central | 2 | attached | Partially Finished | 1 | 0 | other | 0 | $881,600 | 59 |
| 45238 | $88,000 | 4 | 2.0 | 1821 | 1948 | 10018.00 | gas | central | 1 | attached | Finished | 1 | 0 | cape cod | 1 | $159,338 | 71 |
| 45244 | $600,000 | 3 | 3.0 | 4147 | 2004 | 17424.00 | gas | central | 3 | attached | Finished | 2 | 0 | other | 1 | $548,441 | 15 |
| 45208 | $967,000 | 4 | 3.0 | 3908 | 2005 | 12196.80 | gas | central | 2 | attached | Finished | 2 | 0 | TRANSITIONAL | 1 | $858,132 | 14 |
| 45206 | $145,000 | 2 | 3.0 | 1989 | 1993 | 12196.80 | gas | central | 2 | attached | Finished | 2 | 0 | other | 0 | $287,245 | 26 |
| 45238 | $100,000 | 4 | 1.0 | 1394 | 1941 | 5227.00 | Baseboard | none | 1 | attached | Partially Finished | 1 | 0 | cape cod | 1 | $121,975 | 78 |
| 45208 | $645,000 | 4 | 4.0 | 3777 | 1990 | 8319.00 | gas | central | 2 | attached | Finished | 2 | 0 | traditional | 1 | $829,366 | 29 |
| 45243 | $568,500 | 4 | 3.0 | 2719 | 1948 | 48787.20 | gas | central | 3 | detached | Finished | 2 | 0 | traditional | 2 | $564,193 | 71 |
| 45202 | $125,000 | 2 | 2.0 | 1230 | 1870 | 4138.00 | gas | central | 2 | none | Unfinished | 2 | 0 | historical | 0 | $282,388 | 149 |
| 45202 | $371,000 | 4 | 2.0 | 2400 | 1887 | 3136.00 | forced air | central | 0 | none | Partially Finished | 3 | 0 | other | 2 | $551,000 | 132 |
| 45244 | $147,000 | 2 | 2.0 | 937 | 1921 | 27007.20 | gas | central | 2 | detached | Unfinished | 2 | 0 | cape cod | 0 | $123,918 | 98 |
| 45243 | $1,852,040 | 4 | 5.0 | 5172 | 2016 | 95832.00 | gas | central | 1 | attached | Partially Finished | 2 | 0 | TRANSITIONAL | 1 | $1,073,190 | 3 |
| 45213 | $189,370 | 4 | 2.0 | 1232 | 1930 | 8233.00 | gas | central | 1 | attached | Finished | 1 | 0 | bungalow | 0 | $152,460 | 89 |
| 45202 | $240,000 | 2 | 2.0 | 2105 | 1880 | 2308.00 | gas | central | 2 | detached | Unfinished | 2 | 0 | historical | 1 | $483,273 | 139 |
| 45245 | $184,900 | 3 | 3.0 | 1128 | 1970 | 20037.60 | gas | central | 2 | attached | Finished | 1 | 0 | traditional | 1 | $131,412 | 49 |
| 45248 | $244,000 | 3 | 2.0 | 1521 | 1985 | 17293.32 | gas | central | 2 | attached | Partially Finished | 1 | 0 | other | 1 | $160,973 | 34 |
| 45255 | $259,000 | 3 | 4.0 | 1678 | 1941 | 48917.88 | gas | central | 2 | detached | Partially Finished | 1 | 0 | traditional | 1 | $210,030 | 78 |
| 45243 | $1,230,000 | 4 | 5.0 | 6096 | 2013 | 40075.00 | other | central | 3 | attached | Partially Finished | 2 | 0 | other | 2 | $1,264,920 | 6 |
| 45245 | $170,000 | 4 | 3.0 | 1898 | 1974 | 14810.40 | electric | central | 2 | attached | Finished | 1 | 0 | other | 1 | $221,117 | 45 |
| 45249 | $467,500 | 4 | 3.0 | 3318 | 1987 | 39465.36 | gas | central | 2 | attached | Finished | 2 | 0 | other | 1 | $479,174 | 32 |
| 45202 | $435,000 | 3 | 3.0 | 2053 | 1920 | 1306.00 | gas | central | 0 | none | Partially Finished | 3 | 0 | traditional | 4 | $471,335 | 99 |
| 45231 | $94,000 | 2 | 2.0 | 792 | 1959 | 6011.28 | gas | central | 2 | Carport | Partially Finished | 1 | 0 | other | 0 | $66,990 | 60 |
| 45202 | $1,223,000 | 2 | 5.0 | 4295 | 2007 | 2090.00 | gas | central | 2 | attached | none | 4 | 0 | contemporary | 1 | $986,060 | 12 |
| 45202 | $550,000 | 3 | 3.0 | 1920 | 1986 | 5227.00 | gas | central | 2 | attached | Finished | 1 | 0 | contemporary | 1 | $440,800 | 33 |
| 45202 | $625,000 | 4 | 4.0 | 2727 | 2002 | 2265.00 | Baseboard | central | 2 | attached | Partially Finished | 2 | 0 | TRANSITIONAL | 1 | $626,074 | 17 |
| 45230 | $109,900 | 2 | 1.0 | 925 | 1954 | 5401.00 | gas | central | 1 | detached | Partially Finished | 1 | 0 | other | 0 | $111,385 | 65 |
| 45227 | $695,000 | 4 | 4.0 | 3254 | 1962 | 9234.00 | gas | central | 1 | detached | Finished | 2 | 0 | colonial | 2 | $412,445 | 57 |
| 45230 | $153,500 | 3 | 1.5 | 1548 | 1970 | 9539.00 | Baseboard | central | 1 | attached | Partially Finished | 2 | 0 | split level | 0 | $186,405 | 49 |
| 45237 | $55,000 | 2 | 1.0 | 1627 | 1936 | 7753.00 | Baseboard | none | 1 | attached | Partially Finished | 1 | 0 | other | 0 | $116,737 | 83 |
| 45204 | $66,500 | 5 | 2.0 | 1840 | 1890 | 7405.00 | Baseboard | none | 1 | detached | Partially Finished | 2 | 0 | other | 0 | $141,373 | 129 |
| 45227 | $525,000 | 4 | 4.0 | 2915 | 1951 | 8480.00 | gas | central | 2 | attached | Finished | 2 | 0 | colonial | 1 | $369,476 | 68 |
| 45208 | $949,000 | 4 | 4.0 | 4500 | 2013 | 12197.00 | electric | central | 3 | attached | Finished | 2 | 0 | traditional | 1 | $988,125 | 6 |
| 45245 | $275,000 | 4 | 3.0 | 2903 | 2002 | 13939.00 | gas | central | 2 | attached | Unfinished | 2 | 0 | transitional | 1 | $338,200 | 17 |
| 45206 | $490,924 | 4 | 3.0 | 5090 | 1865 | 12632.00 | gas | central | 0 | none | Unfinished | 3 | 0 | historical | 3 | $735,081 | 154 |
| 45208 | $280,000 | 2 | 1.0 | 1060 | 1921 | 4791.00 | gas | central | 1 | detached | Partially Finished | 1 | 0 | bungalow | 1 | $232,758 | 98 |
| 45227 | $313,000 | 3 | 2.0 | 1922 | 1948 | 4725.00 | gas | central | 1 | attached | Finished | 1 | 0 | cape cod | 1 | $243,614 | 71 |
| 45230 | $189,000 | 4 | 2.0 | 1385 | 1949 | 7579.00 | electric, gas | central | 1 | attached | Unfinished | 1 | 0 | cape cod | 1 | $166,777 | 70 |
| 45255 | $237,500 | 3 | 3.0 | 1261 | 1957 | 43996.00 | electric, gas | central | 2 | attached | Finished | 1 | 0 | traditional | 1 | $157,835 | 62 |
| 45219 | $169,530 | 5 | 3.0 | 1936 | 1912 | 6359.00 | gas | wall units | 2 | Carport | Finished | 2 | 0 | traditional | 1 | $176,821 | 107 |
| 45209 | $215,000 | 2 | 2.0 | 855 | 1940 | 5009.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | bungalow | 1 | $169,005 | 79 |
| 45226 | $627,000 | 3 | 4.0 | 2834 | 2012 | 2613.00 | gas | central | 2 | attached | Finished | 4 | 0 | traditional | 1 | $546,490 | 7 |
| 45212 | $135,000 | 2 | 1.0 | 1152 | 1925 | 2874.00 | Baseboard | central | 1 | detached | Unfinished | 2 | 0 | traditional | 1 | $118,752 | 94 |
| 45233 | $100,000 | 3 | 2.0 | 1053 | 1957 | 7405.00 | gas | central | 1 | attached | Partially Finished | 1 | 0 | other | 0 | $114,163 | 62 |
| 45220 | $560,000 | 4 | 3.0 | 3071 | 1970 | 27442.80 | electric, gas | central | 2 | attached | Partially Finished | 3 | 0 | contemporary | 1 | $350,350 | 49 |
| 45229 | $220,000 | 3 | 4.0 | 1719 | 1924 | 6250.00 | gas | central | 1 | attached | Unfinished | 2 | 0 | other | 1 | $140,099 | 95 |
| 45229 | $239,900 | 3 | 2.0 | 2080 | 1928 | 7971.00 | gas | central | 2 | detached | Unfinished | 3 | 0 | traditional | 1 | $169,520 | 91 |
| 45229 | $169,900 | 4 | 3.0 | 2372 | 1930 | 7753.00 | gas | central | 2 | attached | Partially Finished | 1 | 0 | tudor | 1 | $193,318 | 89 |
| 45216 | $189,000 | 4 | 3.0 | 2079 | 1882 | 14810.40 | Baseboard | central | 1 | detached | Unfinished | 2 | 0 | traditional | 0 | $117,464 | 137 |
| 45216 | $74,860 | 2 | 1.0 | 1119 | 1942 | 4356.00 | Baseboard | other | 1 | none | Partially Finished | 1 | 0 | other | 0 | $63,224 | 77 |
| 45216 | $101,000 | 4 | 2.0 | 1543 | 1945 | 4356.00 | Baseboard | central | 1 | attached | Finished | 1 | 0 | other | 0 | $87,180 | 74 |
| 45216 | $113,500 | 4 | 2.0 | 1772 | 1946 | 4830.00 | gas | central | 1 | attached | Finished | 1 | 0 | cape cod | 0 | $100,118 | 73 |
| 45219 | $189,000 | 3 | 2.0 | 1346 | 1916 | 2439.00 | forced air | other | 1 | none | Finished | 2 | 0 | other | 0 | $122,935 | 103 |
| 45232 | $52,000 | 3 | 2.0 | 960 | 1960 | 6098.00 | electric | other | 1 | attached | Partially Finished | 1 | 0 | other | 0 | $59,120 | 59 |
| 45232 | $184,000 | 4 | 2.0 | 2046 | 1909 | 8276.00 | gas | central | 2 | detached | Unfinished | 3 | 0 | traditional | 1 | $126,000 | 110 |
| 45232 | $139,900 | 3 | 2.0 | 1442 | 1925 | 6490.00 | electric, gas | central | 1 | attached | Partially Finished | 2 | 0 | tudor | 1 | $88,803 | 94 |
| 45232 | $120,000 | 2 | 2.0 | 832 | 1926 | 3746.00 | gas | central | 1 | none | Partially Finished | 1 | 0 | traditional | 0 | $51,237 | 93 |
| 45219 | $189,000 | 2 | 2.0 | 1234 | 1905 | 4500.00 | Baseboard | none | 1 | detached | Partially Finished | 2 | 0 | other | 0 | $112,705 | 114 |
| 45219 | $215,000 | 4 | 3.0 | 2355 | 1897 | 4704.00 | gas | central | 0 | none | Partially Finished | 2 | 0 | traditional | 1 | $215,090 | 122 |
| 45243 | $248,000 | 3 | 2.0 | 1496 | 1956 | 9600.00 | gas | central | 1 | attached | Partially Finished | 3 | 0 | other | 1 | $310,420 | 63 |
| 45243 | $493,000 | 5 | 4.0 | 3061 | 2014 | 11325.60 | electric, gas | central | 2 | attached | Finished | 2 | 0 | traditional | 1 | $635,158 | 5 |
The data captured is from the sell dates of 2017-10-25 to 2019-11-22. There are 191 observations in the dataset that range over 35 zipcodes.
The following characteristics that were captured for each sold home are repeated below, and explained in the prior tab:
## [1] "Address" "zip" "sold_price"
## [4] "sold_date" "num_bed" "num_bath"
## [7] "sqr_ft" "year_built" "lot_size_ft"
## [10] "heating" "cooling" "parking_spaces"
## [13] "garage" "basement_status" "num_stories"
## [16] "Pool?" "type_and_style" "Fireplace"
## [19] "med_val" "age"
Below are some summary statistics of the houses’ sales price and pysical size:
| Summary Statistics Table of Home Prices and Size | zillow (N = 191) |
|---|---|
| Final Sale Price | |
| min | 16000 |
| median (Q1, Q3) | 197,250.00 (136,550.00, 450,125.00) |
| mean (sd) | 344,773.82 ± 404,522.37 |
| max | 2700000 |
| Missing | 0 |
| Square Footage of Home | |
| min | 792 |
| median (Q1, Q3) | 1,776.00 (1,321.00, 2,302.50) |
| mean (sd) | 2,043.58 ± 1,058.74 |
| max | 7900 |
| Missing | 0 |
| Square Footage of Lot | |
| min | 784 |
| median (Q1, Q3) | 7,535.00 (4,399.50, 11,325.60) |
| mean (sd) | 11,809.30 ± 17,006.72 |
| max | 130680 |
| Missing | 0 |
The average price of homes in Cincinnati is $344,774, but this price ranges from $16,000 to $2,700,000.
The average size of homes in Cincinnati is 2,043 square feet, which sit on an average lot size of 11,809 square feet. (This is approximately 0.2711043 acres.)
The same data but by zip code:
| Zip Code | Number Sold in ZIP | Median Home Value | Avg Closing Price | Avg Square Footage | Avg Lot Size (Sq Ft) |
|---|---|---|---|---|---|
| 45202 | 26 | $597,482 | $593,154 | 2,602 | 5,694 |
| 45204 | 5 | $136,671 | $116,300 | 1,779 | 5,184 |
| 45205 | 7 | $55,735 | $63,057 | 1,346 | 23,330 |
| 45206 | 8 | $392,326 | $330,597 | 2,717 | 7,666 |
| 45208 | 20 | $650,724 | $867,958 | 2,963 | 16,720 |
| 45209 | 8 | $296,302 | $345,427 | 1,499 | 5,955 |
| 45211 | 6 | $127,577 | $143,833 | 1,694 | 10,186 |
| 45212 | 2 | $114,268 | $107,500 | 1,108 | 2,809 |
| 45213 | 3 | $177,210 | $251,423 | 1,432 | 7,826 |
| 45214 | 1 | $142,512 | $298,000 | 2,162 | 1,991 |
| 45215 | 1 | $236,263 | $419,000 | 2,305 | 20,909 |
| 45216 | 5 | $100,807 | $116,372 | 1,784 | 7,064 |
| 45217 | 1 | $133,600 | $205,000 | 1,920 | 11,326 |
| 45219 | 4 | $156,888 | $190,633 | 1,718 | 4,500 |
| 45220 | 3 | $354,913 | $453,333 | 3,111 | 41,934 |
| 45223 | 6 | $128,145 | $204,083 | 1,535 | 5,690 |
| 45224 | 3 | $152,349 | $195,050 | 1,910 | 14,562 |
| 45225 | 1 | $144,292 | $80,025 | 2,189 | 2,613 |
| 45226 | 4 | $503,150 | $543,625 | 2,609 | 4,660 |
| 45227 | 11 | $229,026 | $264,321 | 1,807 | 7,873 |
| 45229 | 3 | $167,646 | $209,933 | 2,057 | 7,325 |
| 45230 | 10 | $173,027 | $145,240 | 1,437 | 11,016 |
| 45231 | 1 | $66,990 | $94,000 | 792 | 6,011 |
| 45232 | 4 | $81,290 | $123,975 | 1,320 | 6,152 |
| 45233 | 3 | $118,536 | $107,000 | 1,093 | 7,623 |
| 45237 | 6 | $126,400 | $124,233 | 1,762 | 7,056 |
| 45238 | 9 | $127,264 | $101,367 | 1,454 | 7,952 |
| 45239 | 4 | $149,791 | $134,125 | 1,742 | 8,832 |
| 45240 | 1 | $193,875 | $173,655 | 2,250 | 9,191 |
| 45243 | 5 | $769,576 | $878,308 | 3,709 | 41,124 |
| 45244 | 2 | $336,180 | $373,500 | 2,542 | 22,216 |
| 45245 | 12 | $179,226 | $176,225 | 1,538 | 17,546 |
| 45248 | 1 | $160,973 | $244,000 | 1,521 | 17,293 |
| 45249 | 2 | $348,189 | $327,250 | 2,411 | 25,613 |
| 45255 | 3 | $188,710 | $232,167 | 1,508 | 37,505 |
The center, shape, and spread of the data:
As shown in the histogram and boxplots below, our data is highly skewed to the right. This is intuitive, as all values must be nonnegative and therefore have a lower bound.
Below are some boxplots to see the spread of the data. It’s interesting to note that the lot size on which homes are built are rather standard/highly concentrated around 7,535 square feet, causing a few outliers of homes that are on large plots of land.
This bar graph shows the number of homes built throughout the years.
Here we can see the average home price by zip code to identify the general price of homes by neighborhood.
Here we can look for correlation between covariates by plotting a pairplot.
The process of determining an appropriate model to represent home values in the Cincinnati is two-fold. First, we need to perform a residual analysis. Through this analysis, we can ensure that the full model - that is all the possible covariates along with the response variable - meets the following criteria for linear regression:
The relationship between the regressors and the response variable is approximately linear
Errors are independent
Errors are normally distributed
Error term has an equal/constant variance
If all of these assumptions are not met, variables in the model must be transformed and checked in a process called Model Adequacy Checking, which involves Transformation and Residual Analysis.
Let’s begin with a full model. We can create a dataset on which to fit a model, shown below. Note that we are removing Address, sold_date, year_built, and type_and_style from this dataset, as they are not covariates we want to include in our model. Address and sold_date are not generalizable, year_built is represented in the age variable, and type_and_style has too many categories and its contribution to the model may not be worth the work of including it.
zillow_slim <- zillow[,-c(1,4,8,17)]
We can start our modeling process by using all covariates available.
zillow_fit_full <- lm(sold_price ~ ., data = zillow_slim)
Observations regarding our first model:
zillow_slim[c(130,170),]
## zip sold_price num_bed num_bath sqr_ft lot_size_ft heating
## 130 45208 2300000 3 4 3253 30492 Baseboard
## 170 45219 169530 5 3 1936 6359 gas
## cooling parking_spaces garage basement_status num_stories Pool?
## 130 central 3 Carport Finished 1 1
## 170 wall units 2 Carport Finished 2 0
## Fireplace med_val age
## 130 1 714304.6 67
## 170 1 176821.3 107
Because we identified a nonconstant variance, we can utilize the box-cox procedure to determine the appropriate transformation for the response variable to linearize the model.
bc <- MASS::boxcox(sold_price ~ . , data=zillow_slim)
(lambda <- bc$x[which.max(bc$y)])
## [1] 0.1010101
zillow_slim$sold_price2 <- (zillow_slim$sold_price ^ lambda - 1) / lambda
zillow_fit_bc <- lm(sold_price2 ~ . - sold_price, data = zillow_slim)
# obtain MSRes
( MSRes=summary(zillow_fit_bc)$sigma^2 )
## [1] 1.613653
After correcting for the nonconstant variance, we can now work on improving the normality of the model. There were countless models I used to come to this final one. For a concise report, the final product is below.
zillow_fit <- lm(sold_price2 ~ zip + num_bed + num_bath + sqr_ft + lot_size_ft + heating + cooling + parking_spaces + garage + basement_status + age + I(age^2) + num_stories + `Pool?` + Fireplace + med_val + sqr_ft:num_bath, data = zillow_slim)
# obtain MSRes
( MSRes=summary(zillow_fit)$sigma^2 )
## [1] 1.498322
Create the final Dataset
# new dataset called zillow_transformed
zillow_transformed <- zillow_slim %>% dplyr::select(sold_price2, zip, num_bed, num_bath, sqr_ft, lot_size_ft, heating, cooling, parking_spaces, garage, basement_status, age, num_stories, `Pool?`, Fireplace, med_val)
Create the final full Working Model
zillow_fit_full <- lm(sold_price2 ~ zip + num_bed + num_bath + sqr_ft + lot_size_ft + heating + cooling + parking_spaces + garage + basement_status + age + I(age^2) + num_stories + `Pool?` + Fireplace + med_val + sqr_ft:num_bath, data = zillow_transformed)
We have satisfied the linear regression assumptions by transforming the variables accordingly. Now we can move into the phase of selecting the variables from this full model to determine the optimal combination/selection of regressors to best model their relationship with Final Sold Price.
When attempting to use the step function, I keep getting a message that says “attempting model selection on an essentially perfect fit is nonsense”… but when tested, we get a predicted R squared of around 50%. My hypothesis is that this is due to varying levels of zip code. Therefore I would be more beneficial to manually select variables through forward selection and backward elimination.
null= lm(sold_price2 ~ 1, data=zillow_transformed) #This is why we use zillow_transformed, so it only includes the columns we want.
full= zillow_fit_full
add1(null, full, test="F")
add1(lm(sold_price2 ~ med_val, data = zillow_transformed), full, test="F")
add1(lm(sold_price2 ~ med_val + num_bath, data = zillow_transformed), full, test="F")
add1(lm(sold_price2 ~ med_val + num_bath + zip, data = zillow_transformed), full, test="F")
add1(lm(sold_price2 ~ med_val + num_bath + zip + sqr_ft, data = zillow_transformed), full, test="F")
add1(lm(sold_price2 ~ med_val + num_bath + zip + sqr_ft + num_bath:sqr_ft, data = zillow_transformed), full, test="F")
add1(lm(sold_price2 ~ med_val + num_bath + zip + sqr_ft + num_bath:sqr_ft + cooling, data = zillow_transformed), full, test="F")
add1(lm(sold_price2 ~ med_val + num_bath + zip + sqr_ft + num_bath:sqr_ft + cooling + heating, data = zillow_transformed), full, test="F")
add1(lm(sold_price2 ~ med_val + num_bath + zip + sqr_ft + num_bath:sqr_ft + cooling + heating + `Pool?`, data = zillow_transformed), full, test="F")
add1(lm(sold_price2 ~ med_val + num_bath + zip + sqr_ft + num_bath:sqr_ft + cooling + heating + `Pool?` + age + I(age^2), data = zillow_transformed), full, test="F")
drop1(lm(sold_price2 ~ med_val + num_bath + zip + sqr_ft + num_bath:sqr_ft + cooling + heating + `Pool?` + age + I(age^2), data = zillow_transformed), test="F")
#age doesn't seem to add much to the model, because it tells us to drop it
We found the best model to be:
test_fit <- lm(sold_price2 ~ med_val + num_bath + num_bath:sqr_ft + sqr_ft + cooling + `Pool?` + zip + heating, data = zillow_transformed)
##
## Call:
## lm(formula = sold_price2 ~ med_val + num_bath + num_bath:sqr_ft +
## sqr_ft + cooling + `Pool?` + zip + heating, data = zillow_transformed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2112 -0.6239 0.0000 0.5206 4.0747
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.020e+01 9.495e-01 21.271 < 2e-16 ***
## med_val 5.929e-06 2.597e-06 2.283 0.023937 *
## num_bath 1.026e+00 1.931e-01 5.315 4.04e-07 ***
## sqr_ft 1.193e-03 4.326e-04 2.758 0.006578 **
## coolingnone -1.595e+00 4.805e-01 -3.320 0.001145 **
## coolingother -6.881e-01 7.123e-01 -0.966 0.335648
## coolingwall units 2.922e-02 5.334e-01 0.055 0.956387
## `Pool?` 1.579e+00 7.478e-01 2.111 0.036487 *
## zip45204 -1.165e+00 1.037e+00 -1.123 0.263143
## zip45205 -2.817e+00 1.039e+00 -2.711 0.007527 **
## zip45206 -8.035e-01 7.573e-01 -1.061 0.290436
## zip45208 7.523e-01 3.860e-01 1.949 0.053230 .
## zip45209 1.065e+00 6.101e-01 1.746 0.082904 .
## zip45211 -6.103e-01 1.012e+00 -0.603 0.547474
## zip45212 7.697e-02 1.154e+00 0.067 0.946904
## zip45213 1.017e+00 9.455e-01 1.075 0.284016
## zip45214 1.005e+00 1.561e+00 0.644 0.520934
## zip45215 1.622e+00 1.456e+00 1.114 0.267130
## zip45216 -1.709e+00 1.138e+00 -1.501 0.135489
## zip45217 4.110e-01 1.515e+00 0.271 0.786595
## zip45219 1.555e-01 1.038e+00 0.150 0.881138
## zip45220 -4.337e-01 1.173e+00 -0.370 0.712059
## zip45223 4.723e-01 9.371e-01 0.504 0.615041
## zip45224 -3.266e-01 1.116e+00 -0.293 0.770153
## zip45225 -2.957e+00 1.619e+00 -1.826 0.069932 .
## zip45226 2.901e-01 6.997e-01 0.415 0.679071
## zip45227 1.463e-01 7.806e-01 0.187 0.851605
## zip45229 -3.877e-01 1.124e+00 -0.345 0.730685
## zip45230 -7.899e-01 7.889e-01 -1.001 0.318390
## zip45231 -9.818e-01 1.393e+00 -0.705 0.482012
## zip45232 -4.447e-01 1.062e+00 -0.419 0.676047
## zip45233 -8.579e-01 9.765e-01 -0.879 0.381141
## zip45237 -9.627e-01 1.029e+00 -0.936 0.350980
## zip45238 -1.444e+00 8.731e-01 -1.654 0.100402
## zip45239 -1.128e+00 1.011e+00 -1.116 0.266187
## zip45240 -1.217e+00 1.504e+00 -0.809 0.419718
## zip45243 4.189e-01 6.546e-01 0.640 0.523228
## zip45244 1.583e-01 1.096e+00 0.144 0.885335
## zip45245 -1.819e-01 7.614e-01 -0.239 0.811536
## zip45248 1.133e+00 1.398e+00 0.811 0.418891
## zip45249 2.254e-01 1.065e+00 0.212 0.832631
## zip45255 5.032e-02 9.507e-01 0.053 0.957860
## heatingelectric -1.732e+00 5.575e-01 -3.106 0.002287 **
## heatingelectric, gas -3.797e-01 5.825e-01 -0.652 0.515479
## heatingforced air -3.490e-02 7.199e-01 -0.048 0.961405
## heatinggas -6.442e-01 3.162e-01 -2.037 0.043485 *
## heatingoil -3.847e-01 1.314e+00 -0.293 0.770160
## heatingother -2.087e+00 1.004e+00 -2.079 0.039429 *
## num_bath:sqr_ft -2.457e-04 6.586e-05 -3.731 0.000275 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.205 on 142 degrees of freedom
## Multiple R-squared: 0.8869, Adjusted R-squared: 0.8486
## F-statistic: 23.2 on 48 and 142 DF, p-value: < 2.2e-16
However, this model is not easy to interpret. There are too many zip code categories that make our model less clear. A solution to this is manually creating dummy variables for the zip code variable and group the nonsignificant zip codes into a single “other zipcode” column. Below, we only kept the one zip code that was highly significant: 45205.
zillow_transformed <- zillow_transformed %>%
mutate(zip_group = case_when(
zip == 45205 ~ "45205",
TRUE ~ "other"
))
zillow_transformed$zip_group <- as.factor(zillow_transformed$zip_group)
zillow_transformed$zip_group <- relevel(x = zillow_transformed$zip_group, ref="other")
We can do the same for heating and cooling types and see how it effects the model.
zillow_transformed <- zillow_transformed %>%
mutate(heating_group = case_when(
heating == "electric" ~ "electric",
TRUE ~ "other"
))
zillow_transformed$heating_group <- as.factor(zillow_transformed$heating_group)
zillow_transformed$heating_group <- relevel(x = zillow_transformed$heating_group, ref="other")
zillow_transformed <- zillow_transformed %>%
mutate(cooling_group = case_when(
cooling == "none" ~ "none",
TRUE ~ "yes"
))
zillow_transformed$cooling_group <- as.factor(zillow_transformed$cooling_group)
zillow_transformed$cooling_group <- relevel(x = zillow_transformed$cooling_group, ref="yes")
Below is the updated model with zip code, heating, and cooling in groups.
#test_fit <- lm(sold_price2 ~ zip_group + med_val + num_bath + garage, data = zillow_transformed)
test_fit <- lm(sold_price2 ~ med_val + num_bath + num_bath:sqr_ft + sqr_ft + cooling_group + `Pool?` + zip_group + heating_group, data = zillow_transformed)
##
## Call:
## lm(formula = sold_price2 ~ med_val + num_bath + num_bath:sqr_ft +
## sqr_ft + cooling_group + `Pool?` + zip_group + heating_group,
## data = zillow_transformed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6399 -0.7158 0.0042 0.6728 5.1095
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.947e+01 4.759e-01 40.913 < 2e-16 ***
## med_val 8.996e-06 8.183e-07 10.994 < 2e-16 ***
## num_bath 1.084e+00 1.667e-01 6.502 7.40e-10 ***
## sqr_ft 6.630e-04 2.849e-04 2.327 0.021072 *
## cooling_groupnone -1.441e+00 4.273e-01 -3.373 0.000908 ***
## `Pool?` 1.776e+00 6.566e-01 2.705 0.007490 **
## zip_group45205 -2.073e+00 5.122e-01 -4.048 7.63e-05 ***
## heating_groupelectric -1.010e+00 4.427e-01 -2.282 0.023624 *
## num_bath:sqr_ft -2.539e-04 5.411e-05 -4.691 5.31e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.265 on 182 degrees of freedom
## Multiple R-squared: 0.8402, Adjusted R-squared: 0.8332
## F-statistic: 119.6 on 8 and 182 DF, p-value: < 2.2e-16
All regressors are significant. The Adjusted R-squared is greater than .80, which indicates a strong model. It is also significant, with a p-value less than .01.
Ridge Regression
We know for a fact that there exists multicollinearity in this model. Below, the VIF’s for sqr_ft and its interaction term num_bath:sqr_ft show undeniable multicollinearity.
vif(test_fit)
## med_val num_bath sqr_ft cooling_group
## 5.795886 4.399345 10.797945 1.080441
## `Pool?` zip_group heating_group num_bath:sqr_ft
## 1.054497 1.105089 1.049527 14.629011
We cannot perform Ridge Regression to fix sqr_ft and num_bath:sqr_ft because our model contains more than just numeric variables. While we ideally would keep these in our model, the VIFs are too great to do so. Fortunately, square footage is a direct contributer to med_val. Let us remove them and test the performance of this final model.
test_fit <- lm(sold_price2 ~ med_val + num_bath + cooling_group + `Pool?` + zip_group + heating_group, data = zillow_transformed)
summary(test_fit)
##
## Call:
## lm(formula = sold_price2 ~ med_val + num_bath + cooling_group +
## `Pool?` + zip_group + heating_group, data = zillow_transformed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8886 -0.7910 0.0647 0.7590 4.8564
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.112e+01 2.682e-01 78.736 < 2e-16 ***
## med_val 7.586e-06 5.335e-07 14.217 < 2e-16 ***
## num_bath 5.401e-01 1.289e-01 4.190 4.32e-05 ***
## cooling_groupnone -1.688e+00 4.474e-01 -3.774 0.000217 ***
## `Pool?` 1.741e+00 6.932e-01 2.512 0.012861 *
## zip_group45205 -2.493e+00 5.305e-01 -4.699 5.10e-06 ***
## heating_groupelectric -9.717e-01 4.667e-01 -2.082 0.038733 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.337 on 184 degrees of freedom
## Multiple R-squared: 0.8197, Adjusted R-squared: 0.8139
## F-statistic: 139.5 on 6 and 184 DF, p-value: < 2.2e-16
The Final Model to be Validated:
\[ sold.price2 = 21.12 -2.49*zip.group.45205 + 0.0000076*med.val +0.54*num.bath -1.69*cooling.group.none + 1.74*pool-0.97*heating.group.electric \]
Fourteen new observations were collected to test our model. Below we read in that test set and prepare it to be used in the model.
zillow_test <- read_csv("zillow_data_new.csv")
We need to calculate the median value of homes in the zipcodes included in the test data, as well as prepare the dataset for prediction in the model.
home_val_f <- filter(home_val, RegionName %in% unique(zillow_test$zip))
home_val_avgmed <- data.frame(ID=home_val_f[,2], Means=rowMeans(home_val_f[,17:28]))
home_val_avgmed$Means <- currency(home_val_avgmed$Means, digits = 2L)
zillow_test$zip <- as.factor(zillow_test$zip)
home_val_avgmed$RegionName <- as.factor(home_val_avgmed$RegionName)
zillow_test <- left_join(x = zillow_test, y = home_val_avgmed, by = c("zip" = "RegionName"))
# Convert the Median value per square foot to median value for similar homes. Also trasnform the sold price.
zillow2 <- zillow_test %>% mutate(med_val = sqr_ft * Means,
sold_price2 = (sold_price ^ lambda - 1) / lambda)
We can also remove columns that are not needed for the model.
# Remove columns not used in model
zillow_test <- zillow2[,-c(1:2,4:5,9,18,21)]
zillow_test <- zillow_test %>% dplyr::select(sold_price2, everything())
Finally, we need to create the category dummy variables for zip code and change the garage levels.
# Add zip level to test data
zillow_test <- zillow_test %>%
mutate(zip_group = case_when(
zip == 45205 ~ "45205",
TRUE ~ "other"
))
zillow_test$zip_group <- as.factor(zillow_test$zip_group)
zillow_test$zip_group <- relevel(x = zillow_test$zip_group, ref="other")
zillow_test <- zillow_test %>%
mutate(heating_group = case_when(
heating == "electric" ~ "electric",
TRUE ~ "other"
))
zillow_test$heating_group <- as.factor(zillow_test$heating_group)
zillow_test$heating_group <- relevel(x = zillow_test$heating_group, ref="other")
zillow_test <- zillow_test %>%
mutate(cooling_group = case_when(
cooling == "none" ~ "none",
TRUE ~ "yes"
))
zillow_test$cooling_group <- as.factor(zillow_test$cooling_group)
zillow_test$cooling_group <- relevel(x = zillow_test$cooling_group, ref="yes")
We run the test data into the model and get the following predictions and confidence bands:
| Actual Price | Predicted Price | Lower Estimate | Upper Estimate |
|---|---|---|---|
| $149,900.00 | $337,414.13 | $304,452.31 | $373,549.74 |
| $75,000.00 | $119,854.92 | $107,244.13 | $133,783.34 |
| $267,500.00 | $285,695.60 | $251,870.64 | $323,550.24 |
| $895,000.00 | $722,723.48 | $658,902.47 | $792,048.37 |
| $130,000.00 | $155,121.45 | $143,798.75 | $167,239.37 |
| $205,000.00 | $155,323.09 | $143,996.43 | $167,444.44 |
| $285,000.00 | $330,644.35 | $295,956.49 | $368,944.85 |
| $278,000.00 | $230,973.95 | $214,569.62 | $248,497.17 |
| $86,000.00 | $135,734.65 | $121,179.10 | $151,843.32 |
| $235,000.00 | $174,866.76 | $133,859.02 | $226,838.21 |
| $685,000.00 | $799,558.79 | $724,281.96 | $881,796.71 |
| $208,000.00 | $167,423.04 | $155,777.09 | $179,845.86 |
| $195,000.00 | $176,839.60 | $132,792.23 | $233,607.95 |
| $124,900.00 | $140,698.09 | $129,563.50 | $152,685.56 |
| $239,000.00 | $177,530.01 | $165,481.92 | $190,360.96 |
| $335,000.00 | $276,666.69 | $259,404.91 | $294,954.26 |
| $535,000.00 | $561,683.05 | $516,304.10 | $610,616.27 |
| $118,500.00 | $150,735.36 | $139,489.06 | $162,790.26 |
Calculate the mean squared prediction error.
MSPE = sum( (zillow_test$sold_price2 - prediction[,1])^2 ) / dim(zillow_test)[1]
MSPE
## [1] 1.079118
Calculate the PRESS statistic.
PRESS = sum( (zillow_test$sold_price2 - prediction[,1])^2 )
PRESS
## [1] 19.42413
Calculate the predicted R-squared.
pred_Rsq = 1 - PRESS/sum((zillow_test$sold_price2 - mean(zillow_test$sold_price2))^2)
pred_Rsq
## [1] 0.7938061
Compare the predicted R-squared with the model’s R-squared and Adjusted R-squared to see how comparable they are.
summary(test_fit)$r.squared
## [1] 0.8197372
summary(test_fit)$adj.r
## [1] 0.813859
There are many other validation techniques and comparison charts that cannot be made automatically due to the nature of the categorical variables. The statistics above for our final model are satisfactory for our purposes.
Again, the final model chosen to predict the price of a home in Cincinnati is below:
summary(test_fit)
##
## Call:
## lm(formula = sold_price2 ~ med_val + num_bath + cooling_group +
## `Pool?` + zip_group + heating_group, data = zillow_transformed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.8886 -0.7910 0.0647 0.7590 4.8564
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.112e+01 2.682e-01 78.736 < 2e-16 ***
## med_val 7.586e-06 5.335e-07 14.217 < 2e-16 ***
## num_bath 5.401e-01 1.289e-01 4.190 4.32e-05 ***
## cooling_groupnone -1.688e+00 4.474e-01 -3.774 0.000217 ***
## `Pool?` 1.741e+00 6.932e-01 2.512 0.012861 *
## zip_group45205 -2.493e+00 5.305e-01 -4.699 5.10e-06 ***
## heating_groupelectric -9.717e-01 4.667e-01 -2.082 0.038733 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.337 on 184 degrees of freedom
## Multiple R-squared: 0.8197, Adjusted R-squared: 0.8139
## F-statistic: 139.5 on 6 and 184 DF, p-value: < 2.2e-16
We hope this helps you gain a general expectation of the price you should be willing to pay given the type of home and amenities you are looking for. Please reach out with your dream ideas and we can help you determine must-haves vs would-be-nice to find a home solution that meets your budget.
We wish you the best of luck on your house-hunting journey!
disclaimers
With the housing market constantly fluctuating, this analysis should not be extrapolated into 2020 without re-validating the model on new data and median housing prices.
The number of observations used in this analysis is limited and could affect the accuracy of this model