Hamilton County Sales January 1st, 2018 - December 31st, 2021
Introduction
The city of Cincinnati has expressed concern surrounding housing affordability in the city at large. To assess these interests and concerns, I will be investigating recent property sales in the neighborhoods in and around the Xavier University campus.
The data provided from the Hamilton County Auditor records every residential property sale in the area from January 1, 2018, to December 31, 2021. Each row represents the transference of a residential property in one of nine neighborhoods adjacent to the university campus. Each row records the details of the transaction including the following list of variables.
Dummy variable indicating the property is in the Cincinnati public school district.
norwood_schools
Dummy variable indicating the property is in the Norwood public school district.
street_address
Street address name of the property.
unit_id
Unit ID (if present) of the property
street_name
Street address name of the property.
Use
County assigned use number for the property.
401 - Multifamily apartment with 4-19 units.
402 - Multifamily apartment with 20-39 units.
403 - Multifamily apartment with more than 40 units.
510 - Single-family dwelling.
520 - Two family dwelling (duplex).
530 - Three family dwelling (tri-plex).
550 - Single condominium unit.
555 - Single landominium unit.
yr_blt
Year the building on the property was built
Day
Day of transaction.
Month
Month of transaction.
Year
Year of transaction.
Value
Value of transaction.
Neighborhood
Neighborhood of property
total_rooms
Total number of rooms.
Bedrooms
Total number of bedrooms.
full_bath
Total number of full bathrooms.
half_bath
Total number of half bathrooms.
finished_sqft
Size of the building in square feet.
Date
Year, month and day of the transaction.
Simple Trends and Analysis
Understanding the Data
Square Feet Distribution
The distribution of square feet is relatively normal with a left skewed bell curve and outliers above the mean. On average, homes in Norwood are around 1,500 square feet. Most home in Norwood do not pass the 3,000 square feet threshold.
Neighborhood Analysis
The variation in the bathroom-to-bedroom ratio across different neighborhoods may be influenced by the affluence of each area. Wealthier neighborhoods typically feature larger homes, which often include more bedrooms and bathrooms, resulting in a higher ratio. For example, more affluent neighborhoods like Walnut Hills, Mount Adams, and Hyde Park show a higher number of bedrooms and bathrooms, with a closer 1:1 ratio. In contrast, older or lower-income areas such as Avondale, Clifton, and Norwood tend to exhibit a weaker correlation, likely due to the age of the homes and the demographics of low-income families and college students. Many homes in these areas are either historic properties or converted duplexes/triplexes, which may prioritize rental functionality over the bed-to-bath ratio.
The demographics of a neighborhood also play a role in this ratio. Areas designed for families, with amenities such as schools, parks, and shopping centers, are likely to have a higher ratio. Conversely, neighborhoods catering to students or singles may have a lower ratio, reflecting different housing needs and preferences.
Does the housing market appear to exhibit seasonality?
As seen in the graph below, the most common time of year for a peak in home transactions are in the warmer months, ranging from June to October. The sale values are lowest in January and February; based on the knowledge that moving homes with nicer weather is much more ideal, this visualization makes sense. Having said that, you can notice a slight peak in sales in December, which is an interesting decision for the holidays, but is understanding when factoring in the time-off and the flexibility this period of the year gives individuals.
Directed Analysis
If you were a current real estate agent or broker in the Cincinnati area, where and when would you be most successful in securing a large payday?
The following visualizations highlight the neighborhoods with the highest home values in Cincinnati, including Mount Adams, Hyde Park, and Clifton. Within these areas, there is a clear relationship between a home’s value and its size, as well as the number of bedrooms; generally, larger homes with more bedrooms tend to have higher values. It is important to note that while the age of a home does not significantly impact its value, the condition of renovations and modernization efforts should be considered.
As seen earlier in this analysis, homes typically sell more frequently during the warmer summer months, with transaction activity peaking at the beginning of the week, particularly on Mondays and Tuesdays.
Price by Sqft and Bedrooms
Price by Age of Home
Price by Month
Price by Day of Week
Are Investment Firms Driving Up Residential Home Prices in Cincinnati for Rental Property Gains?
In recent years, government officials in Cincinnati have raised concerns about the increasing number of investment firms purchasing residential homes in the city, with the intent to convert what would traditionally be owner-occupied properties into rental units. In the analysis presented below, I examine the evidence for and against this claim. While it is important to note that not all investment firms may be captured in these visualizations, the data suggests that investment firms are purchasing residential homes at a rate comparable to that of individual buyers. Notably, in 2021, corporate investments in residential properties nearly doubled compared to the previous year. However, the rising home prices over the past four years do not show a strong enough correlation to solely attribute this increase to corporate buyers.
Source Code
---title: "Assignment 4"subtitle: "Cincinnati Housing Analysis"author: "Kacey Barta"output: html_document show_col_types: falsetoc: true # Generates an automatic table of contents.format: # Options related to formatting. html: # Options related to HTML output. code-tools: TRUE # Allow the code tools option showing in the output. embed-resources: TRUE # Embeds all components into a single HTML file. execute: # Options related to the execution of code chunks. warning: FALSE # FALSE: Code chunk sarnings are hidden by default. message: FALSE # FALSE: Code chunk messages are hidden by default. echo: FALSE # TRUE: Show all code in the output.---# Xavier Property Sales## Hamilton County Sales January 1st, 2018 - December 31st, 2021#### IntroductionThe city of Cincinnati has expressed concern surrounding housing affordability in the city at large. To assess these interests and concerns, I will be investigating recent property sales in the neighborhoods in and around the Xavier University campus.The data provided from the Hamilton County Auditor records every residential property sale in the area from January 1, 2018, to December 31, 2021. Each row represents the transference of a residential property in one of nine neighborhoods adjacent to the university campus. Each row records the details of the transaction including the following list of variables.```{r}#| label: loading in the libraries/data#| echo: falselibrary(tidyverse) # For all the tidy things!library(skimr) #scanning data quickly to check for missing / strange numbers library(lubridate) # Convenient transforming of date values.library(knitr) # Useful tools when 'knitting' (rendering) Quarto documents.library(dplyr) #Load the package into the session, making all its functions available to use. library(ggplot2) #Used for visualization and graphing datanorwoodprop <-read_csv("http://asayanalytics.com/xu_prop-csv")``````{r}#| label: Adressing Data Errors#| include: FALSEsummary(norwoodprop)# properly defining data types in vectors#convert datesnorwoodprop <- norwoodprop %>%mutate(date =dmy(paste(day, month, year, sep ="-" )))# Remove impossible dates (e.g., February 30)norwoodprop <- norwoodprop %>%mutate(date =if_else(month(date) ==2&day(date) >29, NA_Date_, date))# Homes cannot be built in the future (assuming current year is 2025)norwoodprop <- norwoodprop %>%mutate(yr_blt =if_else(yr_blt >2025, NA, yr_blt))# Sale amount cannot be negative, set as NAnorwoodprop <- norwoodprop %>%mutate(value =if_else(value <0, NA, value))# Check for missing valuescolSums(is.na(norwoodprop))# Change invalid data in unit_id and value vector to missingnorwoodprop <- norwoodprop %>%mutate(unit_id =ifelse(unit_id <=0, NA, yr_blt),value =ifelse(value <=0, NA, value))``````{r}#| label: Variable Creation#| echo: falsenorwoodprop <- norwoodprop %>%mutate(date =dmy(paste(day, month, year, sep ="-" ))) %>%select(-(day:year))#Is the property Multifamily? (T/F)norwoodprop <- norwoodprop %>%mutate(multi_fam =ifelse(use >=400& use <500, TRUE, FALSE))#Signify the property's standard deviation in relation from the mean valuenorwoodprop %>%summarize(value_mean =mean(value, na.rm =TRUE), value_sd =sd(value, na.rm =TRUE))norwoodprop <- norwoodprop %>%mutate(value_deviation =ifelse(value >mean(value) +sd(value), "Within 1 SD" ))norwoodprop<- norwoodprop %>%mutate(value_category =case_when(is.na(value) ~"Missing", # Case for missing values value >= (mean(norwoodprop$value, na.rm =TRUE) -sd(norwoodprop$value, na.rm =TRUE)) & value <= (mean(norwoodprop$value, na.rm =TRUE) +sd(norwoodprop$value, na.rm =TRUE)) ~"Within 1 SD of Mean", # Within 1 SD of the mean value > (mean(norwoodprop$value, na.rm =TRUE) +sd(norwoodprop$value, na.rm =TRUE)) ~"More than 1 SD Above Mean", # More than 1 SD above the mean value < (mean(norwoodprop$value, na.rm =TRUE) -sd(norwoodprop$value, na.rm =TRUE)) ~"More than 1 SD Below Mean"# More than 1 SD below the mean ) )```::: panel-tabset## Data Dictionary+-----------------+-------------------------------------------------------------------------------------+| Variable | Description |+=================+=====================================================================================+| Parcel_ID | Unique ID of the property |+-----------------+-------------------------------------------------------------------------------------+| Purchaser | Name of the Purchaser |+-----------------+-------------------------------------------------------------------------------------+| CPS | Dummy variable indicating the property is in the Cincinnati public school district. |+-----------------+-------------------------------------------------------------------------------------+| norwood_schools | Dummy variable indicating the property is in the Norwood public school district. |+-----------------+-------------------------------------------------------------------------------------+| street_address | Street address name of the property. |+-----------------+-------------------------------------------------------------------------------------+| unit_id | Unit ID (if present) of the property |+-----------------+-------------------------------------------------------------------------------------+| street_name | Street address name of the property. |+-----------------+-------------------------------------------------------------------------------------+| Use | County assigned use number for the property. || | || | 401 - Multifamily apartment with 4-19 units. || | || | 402 - Multifamily apartment with 20-39 units. || | || | 403 - Multifamily apartment with more than 40 units. || | || | 510 - Single-family dwelling. || | || | 520 - Two family dwelling (duplex). || | || | 530 - Three family dwelling (tri-plex). || | || | 550 - Single condominium unit. || | || | 555 - Single landominium unit. |+-----------------+-------------------------------------------------------------------------------------+| yr_blt | Year the building on the property was built |+-----------------+-------------------------------------------------------------------------------------+| Day | Day of transaction. |+-----------------+-------------------------------------------------------------------------------------+| Month | Month of transaction. |+-----------------+-------------------------------------------------------------------------------------+| Year | Year of transaction. |+-----------------+-------------------------------------------------------------------------------------+| Value | Value of transaction. |+-----------------+-------------------------------------------------------------------------------------+| Neighborhood | Neighborhood of property |+-----------------+-------------------------------------------------------------------------------------+| total_rooms | Total number of rooms. |+-----------------+-------------------------------------------------------------------------------------+| Bedrooms | Total number of bedrooms. |+-----------------+-------------------------------------------------------------------------------------+| full_bath | Total number of full bathrooms. |+-----------------+-------------------------------------------------------------------------------------+| half_bath | Total number of half bathrooms. |+-----------------+-------------------------------------------------------------------------------------+| finished_sqft | Size of the building in square feet. |+-----------------+-------------------------------------------------------------------------------------+| Date | Year, month and day of the transaction. |+-----------------+-------------------------------------------------------------------------------------+:::## Simple Trends and Analysis#### Understanding the DataSquare Feet DistributionThe distribution of square feet is relatively normal with a left skewed bell curve and outliers above the mean. On average, homes in Norwood are around 1,500 square feet. Most home in Norwood do not pass the 3,000 square feet threshold.```{r}#| echo: falsenorwoodprop %>%group_by(use) %>%filter(use =="510"| use =="550"| use =="555") %>%ggplot(aes(x = finished_sqft)) +geom_histogram(bins =100) +labs(title ="Distribution of Home Square Feet",x ="Finished Square Feet",y ="Number of Homes")```#### Neighborhood AnalysisThe variation in the bathroom-to-bedroom ratio across different neighborhoods may be influenced by the affluence of each area. Wealthier neighborhoods typically feature larger homes, which often include more bedrooms and bathrooms, resulting in a higher ratio. For example, more affluent neighborhoods like Walnut Hills, Mount Adams, and Hyde Park show a higher number of bedrooms and bathrooms, with a closer 1:1 ratio. In contrast, older or lower-income areas such as Avondale, Clifton, and Norwood tend to exhibit a weaker correlation, likely due to the age of the homes and the demographics of low-income families and college students. Many homes in these areas are either historic properties or converted duplexes/triplexes, which may prioritize rental functionality over the bed-to-bath ratio.The demographics of a neighborhood also play a role in this ratio. Areas designed for families, with amenities such as schools, parks, and shopping centers, are likely to have a higher ratio. Conversely, neighborhoods catering to students or singles may have a lower ratio, reflecting different housing needs and preferences.```{r}#| echo: falsenorwoodprop %>%ggplot(aes(x = neighborhood, y = full_bath / bedrooms)) +geom_jitter(alpha =0.4, color ="blue", width =0.2) +geom_boxplot(alpha =0.5, outlier.shape =NA) +labs(title ="Full Bath to Bedroom Ratio by Neighborhood",x ="Neighborhood",y ="Full Bath to Bedroom Ratio") ```#### Does the housing market appear to exhibit seasonality?As seen in the graph below, the most common time of year for a peak in home transactions are in the warmer months, ranging from June to October. The sale values are lowest in January and February; based on the knowledge that moving homes with nicer weather is much more ideal, this visualization makes sense. Having said that, you can notice a slight peak in sales in December, which is an interesting decision for the holidays, but is understanding when factoring in the time-off and the flexibility this period of the year gives individuals.```{r}#| echo: falsenorwoodprop %>%mutate(month =month(date, label =TRUE)) %>%group_by(month, neighborhood) %>%summarize(total_value =sum(value, na.rm =TRUE)) %>%ggplot(aes(x = month, y = total_value, fill = neighborhood)) +geom_col(position ="dodge") +labs(title ="Total Home Transaction Value by Neighborhood throughout the Year",x ="Month",y ="Total Value") +scale_y_continuous(label = scales::comma) +theme_minimal()```## Directed Analysis#### If you were a current real estate agent or broker in the Cincinnati area, where and when would you be most successful in securing a large payday?The following visualizations highlight the neighborhoods with the highest home values in Cincinnati, including Mount Adams, Hyde Park, and Clifton. Within these areas, there is a clear relationship between a home's value and its size, as well as the number of bedrooms; generally, larger homes with more bedrooms tend to have higher values. It is important to note that while the age of a home does not significantly impact its value, the condition of renovations and modernization efforts should be considered.As seen earlier in this analysis, homes typically sell more frequently during the warmer summer months, with transaction activity peaking at the beginning of the week, particularly on Mondays and Tuesdays.```{r}#| echo: falsenorwoodprop %>%group_by(neighborhood) %>%summarize(mean_price =mean(value, na.rm =TRUE)) %>%arrange(desc(mean_price)) %>%ggplot(aes(x = neighborhood, y = mean_price, fill = neighborhood)) +geom_col(position ="dodge") +labs(title ="Average Value of Home by Neighborhood",x ="Neighborhood",y ="Mean Home Value") +scale_y_continuous(labels = scales::comma) +# Format y-axis labels with commastheme_minimal() ```#### Price by Sqft and Bedrooms```{r}#| echo: falsenorwoodprop %>%ggplot(aes(x = finished_sqft, y = value, color = bedrooms)) +geom_point(alpha =0.5) +geom_smooth(method ="lm", se =FALSE) +labs(title ="Property Value by Square Footage and Bedrooms",x ="Square Footage",y ="Property Value",color ="Bedrooms") +scale_y_continuous(label = scales::comma)```#### Price by Age of Home```{r}#| echo: falsenorwoodprop %>%mutate(age =2022- yr_blt) %>%group_by(age) %>%summarize(mean_price =mean(value, na.rm =TRUE)) %>%ggplot(aes(x = age, y = mean_price)) +geom_point(alpha = .5) +geom_smooth(method ="lm", se =FALSE) +labs(title ="Homes' Age in relationship to its Value",x ="Age of House (Years)",y ="Average Price") +scale_y_continuous(label = scales::comma)```#### Price by Month```{r}norwoodprop %>%mutate(month =month(date, label =TRUE)) %>%group_by(month) %>%summarize(mean_value =mean(value, na.rm =TRUE)) %>%ggplot(aes(x = month, y = mean_value, group =1)) +geom_line() +geom_point() +labs(title ="Average Property Value by Month",x ="Month",y ="Average Property Value") +scale_y_continuous(label = scales::comma)```#### Price by Day of Week```{r}#| echo: falsenorwoodprop %>%mutate(dwk =wday(date, label =TRUE)) %>%group_by(dwk) %>%summarize(mean_value =mean(value, na.rm =TRUE)) %>%ggplot(aes(x = dwk, y = mean_value, group =1)) +geom_line() +geom_point() +labs(title ="Average Property Value by Day of the Week",x ="Day of the Week",y ="Average Property Value") +scale_y_continuous(label = scales::comma)```## Are Investment Firms Driving Up Residential Home Prices in Cincinnati for Rental Property Gains?In recent years, government officials in Cincinnati have raised concerns about the increasing number of investment firms purchasing residential homes in the city, with the intent to convert what would traditionally be owner-occupied properties into rental units. In the analysis presented below, I examine the evidence for and against this claim. While it is important to note that not all investment firms may be captured in these visualizations, the data suggests that investment firms are purchasing residential homes at a rate comparable to that of individual buyers. Notably, in 2021, corporate investments in residential properties nearly doubled compared to the previous year. However, the rising home prices over the past four years do not show a strong enough correlation to solely attribute this increase to corporate buyers.```{r}#| echo: falsenorwoodprop %>%mutate(year =year(ymd(date))) %>%filter(year >=2018, year <2022) %>%ggplot(aes(x = year, y = value)) +geom_bar(stat ="summary", fun = mean) +scale_y_continuous(labels = scales::dollar) +labs(title ="Average Value per year",x ="Year of Transaction",y ="Average Value of Property" )``````{r}#| echo: falsenorwoodprop %>%mutate(is_corporation =if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), "Corporation", "Individual"),year =year(ymd(date))) %>%# Extract year from the 'date' columnfilter(year >=2018, year <=2021) %>%# Filter data to include only transactions from 2019 onwardgroup_by(year, is_corporation) %>%# Group by year and ownership typesummarise(houses_Bought =n()) %>%# Calculate average home value per year and ownership typeggplot(aes(x = year, y = houses_Bought, fill = is_corporation)) +# Bar chart with fill based on 'is_corporation'geom_col() +# Add bars with transparencyscale_y_continuous() +# Format y-axis as dollarslabs(title ="Houses Bought vs. Year of Transaction",x ="Year of Transaction",y ="Number of Houses Bought",fill ="Ownership Type") +facet_wrap(~ is_corporation) ``````{r}norwoodprop %>%mutate(is_corporation =if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), "Corporation", "Individual")) %>%mutate(year =year(date)) %>%filter(year >=2018, year <=2021) %>%group_by(year, is_corporation) %>%summarise(houses_Bought =n()) %>%ggplot(aes(x = year, y = houses_Bought, color = is_corporation)) +geom_point(size =3) +labs(title ="Home Purchases by Year for Corporation vs. Individual",x ="Year of Transaction", y ="Number of Homes Bought", color ="Ownership Type")```