The leadership of Xavier University is interested in understanding how economic policies related to the COVID-19 pandemic have influenced residential property prices in the neighborhoods surrounding the university campus. Additionally, the city of Cincinnati has expressed concerns about housing affordability in the area. To address these issues, this analysis investigates property sales trends from 2018 to 2021, focusing on price fluctuations, neighborhood differences, and the overall impact of economic shifts on real estate.
This analysis utilizes data provided by the Hamilton County Auditor, which includes detailed records of residential property transactions in nine neighborhoods adjacent to Xavier University. The dataset captures key attributes such as sale price, property size, number of bedrooms and bathrooms, year built, and neighborhood classification. By leveraging data visualization techniques and statistical analysis, this report aims to make the insights accessible to a broad audience and assist policymakers in making informed decisions regarding housing affordability and market trends.
3.1) IS SQFT Normally Distributed
This graph shows that SQFT for single family houses is skewed slightly to the right, so it does not look normally distributed.
3.2) Bath to Bed Ratio
The class of a neighborhood may have an eeffect on the bed to bath ration. For example, in Mount Adms Higher end neighborhoods have a higher bed to bath ratio because they have more bathrooms for each bedroom. For example, in a high-end home, each bedroom may have its own bathroom.
3.3) Total Value of home transactions each month per area
There may be a bit of a spike in the housing market in the summer months. So some seasonality
4.1) Gifted a property, what would I choose?
What Neighborhood
First, I would want to purchase a house in Mount Adams because the area has the most expensive homes on average. Because of this, I want to live there because the expensive area should help support the value of the house I am gifted.
How Many Bedrooms
I would want to have 4 bedrooms in my house in Mount Adams, because in this area 4 bedroom has the highest home value
How Many Bathrooms
This column chart shows the optimal number of bathrooms for a 4 bedroom homes in mount adams. Four bathrooms are the best choice to maximize home value
Year Built
This column chart shows that homes built in 2018 have the highest average home value. This graph accounts for there being a representative sample for each year of houses being built
Month
I would sell the house in October because it is the 10th month that has the highest total accumulated sales value (in Mount Adams). This means a ton of homes are being bought in October or that the homes being bought in October are high in value
What House Would I Buy?
I would choose to buy a house in Mount Adams because it is home to some of the highest-value properties in the area. Living in a neighborhood with upscale homes would likely boost the value of my own property. In terms of specific features, I would prefer a house with 4 bedrooms and 4 bathrooms, as these properties in Mount Adams tend to have the highest home values. I would aim for a house built in 2018, as this is a year that has a lot of high end homes that were built
5.1) Corporations effect on Housing
Average Home Value vs. Year of Transaction
This column chart shows over time, Individuals are buying more expensive house and corporations are buying less expensive houses. Someone could assume that corporations are buying these houses and flipping them for more money to individuals, causing individuals to pay more for housing
Houses Bought vs. Year of Transaction
This column chart shows houses bought each year comparing corporations and individuals and the number of houses bought. Both are trending
Average Value per Year
This column chart shows that over time, property values have rose on average through the years 2018 through 2021.
Claim About Investments
Over the last 4 years, corporations are trending to purchase more homes each year, but so are individuals. Individuals are also buying more houses each year which shows that in general, more people could simply be buying more homes in these areas.
Additionally, properties are being increased by price. This may be because of other External Variables and should not be blamed just on corperations.
Corporations are in fact buying more lower end homes, and individuals have been paying more for houses. One could speculate that that corporations could be buying and reselling these homes for a higher price.
Conclusion: I think the data can both support and refute the claim that corperations are causing individuals to pay more for housing. Overall, I think there is some effect, because corperations are buying cheaper and cheaper homes while individuals are buying more expensive homes.
The echo: false option disables the printing of code (only output is displayed).
Source Code
---title: "Midterm Practice" # Name of your HTML outputsubtitle: "with an introduction to Quarto"author: "Patrick Berry" # Author nametoc: true # Generates an automatic table of contents.format: # Options related to formatting. html: # Options related to HTML output. code-tools: TRUE # Allow the code tools option showing in the output. embed-resources: TRUE # Embeds all components into a single HTML file. execute: # Options related to the execution of code chunks. warning: FALSE # FALSE: Code chunk sarnings are hidden by default. message: FALSE # FALSE: Code chunk messages are hidden by default. echo: FALSE ---# 1.1)## IntroductionThe leadership of Xavier University is interested in understanding how economic policies related to the COVID-19 pandemic have influenced residential property prices in the neighborhoods surrounding the university campus. Additionally, the city of Cincinnati has expressed concerns about housing affordability in the area. To address these issues, this analysis investigates property sales trends from 2018 to 2021, focusing on price fluctuations, neighborhood differences, and the overall impact of economic shifts on real estate.This analysis utilizes data provided by the Hamilton County Auditor, which includes detailed records of residential property transactions in nine neighborhoods adjacent to Xavier University. The dataset captures key attributes such as sale price, property size, number of bedrooms and bathrooms, year built, and neighborhood classification. By leveraging data visualization techniques and statistical analysis, this report aims to make the insights accessible to a broad audience and assist policymakers in making informed decisions regarding housing affordability and market trends.```{r}#| label: loading in the libraries#| include: FALSElibrary(tidyverse) # Collection of packages for data manipulation, visualization, and analysislibrary(skimr) # Provides a better summary of data frames than base summary()library(lubridate) # Simplifies working with date-time datalibrary(dplyr) # Part of Tidyverse; used for data manipulation (filter, select, mutate, etc.)``````{r}#| label: loading in the dataprop <-read_csv("http://asayanalytics.com/xu_prop-csv") # Read the CSV file``````{r}#|label: SQFT data error# if there is a room, has to have square feetprop <- prop %>%mutate(finished_sqft =ifelse(finished_sqft ==0& total_rooms >0, NA, finished_sqft)) ``````{r}#|label: ID data error# These were not ID's, they were datesprop[45:47,6] <-NA``````{r}#|label: total rooms data error# With my knowledge of real estate, you can not have more full bathrooms and bedrooms than total rooms in a house. I saw this in some houses in this data setprop <- prop %>%mutate(total_rooms =ifelse(full_bath + bedrooms + (half_bath /2) > total_rooms, NA, total_rooms),bedrooms =ifelse(full_bath + bedrooms + (half_bath /2) > total_rooms, NA, bedrooms),full_bath =ifelse(full_bath + bedrooms + (half_bath /2) > total_rooms, NA, full_bath),half_bath =ifelse(full_bath + bedrooms + (half_bath /2) > total_rooms, NA, half_bath) )``````{r}#|label: Date variable creationprop <- prop %>%mutate(date =ymd(paste(year, month, day, sep ="-"))) %>%select(-year, -month, -day) ``````{r}#|label: multi fam variable creationprop <- prop %>%mutate(multifamilyy =ifelse(use %in%c(401, 402, 403, 520), 1, 0) )``````{r}#|label: SD variable creation# Mutate the dataframe to create 'value_category'prop <- prop %>%mutate(value_category =case_when(is.na(value) ~"Missing", # Case for missing values value >= (mean(prop$value, na.rm =TRUE) -sd(prop$value, na.rm =TRUE)) & value <= (mean(prop$value, na.rm =TRUE) +sd(prop$value, na.rm =TRUE)) ~"Within 1 SD of Mean", # Within 1 SD of the mean value > (mean(prop$value, na.rm =TRUE) +sd(prop$value, na.rm =TRUE)) ~"More than 1 SD Above Mean", # More than 1 SD above the mean value < (mean(prop$value, na.rm =TRUE) -sd(prop$value, na.rm =TRUE)) ~"More than 1 SD Below Mean"# More than 1 SD below the mean ) )```# 3.1) IS SQFT Normally Distributed```{r}# Filter data for single-family properties and create the plotprop %>%filter(multifamilyy ==0) %>%ggplot(aes(x = finished_sqft)) +# Set the x-axis to the variable of interestgeom_histogram(bins =30, fill ="blue", color ="black", alpha =0.7) +# Specify histogram aestheticslabs(title ="Distribution of Single-Family Dwelling Sizes",x ="Finished Square Feet",y ="Frequency")```This graph shows that SQFT for single family houses is skewed slightly to the right, so it does not look normally distributed.# 3.2) Bath to Bed Ratio```{r}prop %>%group_by(neighborhood) %>%summarise(total_bathrooms =sum(full_bath, na.rm =TRUE),total_bedrooms =sum(bedrooms, na.rm =TRUE),bedtobath_Ratio = total_bathrooms / total_bedrooms ) %>%select(neighborhood, bedtobath_Ratio) %>%arrange(desc(bedtobath_Ratio)) %>%ggplot(aes(x = neighborhood, y = bedtobath_Ratio)) +geom_col() ```The class of a neighborhood may have an eeffect on the bed to bath ration. For example, in Mount Adms Higher end neighborhoods have a higher bed to bath ratio because they have more bathrooms for each bedroom. For example, in a high-end home, each bedroom may have its own bathroom.# 3.3) Total Value of home transactions each month per area```{r}prop %>%mutate(month =month(date)) %>%group_by(neighborhood, month) %>%summarise(total_sale_value =sum(value, na.rm =TRUE)) %>%ggplot(aes(x = month, y = total_sale_value)) +geom_col() +facet_wrap(~ neighborhood) +scale_y_continuous(labels = scales::dollar) +# Format y-axis as dollarsscale_x_continuous(breaks =1:12) +# Set x-axis to full numbers from 1 to 12 (months)labs(y ="Total Sale Value", x ="Month") # Label axes```There may be a bit of a spike in the housing market in the summer months. So some seasonality# 4.1) Gifted a property, what would I choose?## What Neighborhood```{r}prop %>%group_by(neighborhood) %>%summarise(avg_value =mean(value, na.rm =TRUE)) %>%ggplot(aes(x =reorder(neighborhood, avg_value), y = avg_value)) +# reorder neighborhoods by avg_valuegeom_col(fill ="skyblue") +# Create column barsscale_y_continuous(labels = scales::dollar) +# Format y-axis as dollar amountslabs(title ="Average Home Value by Neighborhood", x ="Neighborhood", y ="Average Home Value") +theme(axis.text.x =element_text(angle =45, hjust =1)) # Rotate x-axis labels for better readability```First, I would want to purchase a house in Mount Adams because the area has the most expensive homes on average. Because of this, I want to live there because the expensive area should help support the value of the house I am gifted.## How Many Bedrooms```{r}prop %>%filter(neighborhood =='Mount Adams') %>%group_by(bedrooms) %>%summarise(avg_sale_value =mean(value, na.rm =TRUE)) %>%ggplot(aes(x = bedrooms, y = avg_sale_value)) +geom_col(fill ="skyblue") +scale_y_continuous(labels = scales::dollar) +labs(title ="Average Sale Price by Number of Bedrooms in Mount Adams", x ="Number of Bedrooms", y ="Average Sale Price") +theme_minimal()```I would want to have 4 bedrooms in my house in Mount Adams, because in this area 4 bedroom has the highest home value## How Many Bathrooms```{r}prop %>%filter(neighborhood =='Mount Adams', bedrooms ==4) %>%group_by(full_bath) %>%summarise(avg_sale_value =mean(value, na.rm =TRUE)) %>%ggplot(aes(x = full_bath, y = avg_sale_value)) +geom_col(fill ="skyblue") +scale_y_continuous(labels = scales::dollar) +labs(title ="Average Sale Price by Number of Full Bathrooms for 4-Bedroom Homes in Mount Adams", x ="Number of Full Bathrooms", y ="Average Sale Price") +theme_minimal()```This column chart shows the optimal number of bathrooms for a 4 bedroom homes in mount adams. Four bathrooms are the best choice to maximize home value## Year Built```{r}prop %>%group_by(yr_blt) %>%summarise(count =n(), avg_value =mean(value, na.rm =TRUE)) %>%# Get count and avg_value for each groupfilter(count >=30) %>%# Filter for years with 30 or more propertiestop_n(5, avg_value) %>%# Filter for the top 5 years with the highest average valuesggplot(aes(x =reorder(yr_blt, avg_value), y = avg_value)) +# Reorder the years by average valuegeom_col() +# Column chart with average value per year builtscale_y_continuous(labels = scales::dollar) +labs(title ="Top 5 Years built with Highest Average Value", x ="Year Built", y ="Average Value", fill ="Average Value") +# Label for the color legendtheme_minimal()```This column chart shows that homes built in 2018 have the highest average home value. This graph accounts for there being a representative sample for each year of houses being built## Month```{r}prop %>%filter(neighborhood =='Mount Adams') %>%mutate(month =month(date)) %>%group_by(month) %>%summarise(total_sale_value =sum(value, na.rm =TRUE)) %>%ggplot(aes(x = month, y = total_sale_value)) +geom_col() +scale_y_continuous(labels = scales::dollar) +# Format y-axis as dollarsscale_x_continuous(breaks =1:12) +# Set x-axis to full numbers from 1 to 12 (months)labs(y ="Total Sale Value", x ="Month") # Label axes```I would sell the house in October because it is the 10th month that has the highest total accumulated sales value (in Mount Adams). This means a ton of homes are being bought in October or that the homes being bought in October are high in value## What House Would I Buy?I would choose to buy a house in Mount Adams because it is home to some of the highest-value properties in the area. Living in a neighborhood with upscale homes would likely boost the value of my own property. In terms of specific features, I would prefer a house with 4 bedrooms and 4 bathrooms, as these properties in Mount Adams tend to have the highest home values. I would aim for a house built in 2018, as this is a year that has a lot of high end homes that were built# 5.1) Corporations effect on Housing## Average Home Value vs. Year of Transaction```{r}prop %>%mutate(is_corporation =if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), "Corporation", "Individual"),year =year(ymd(date))) %>%# Extract year from the 'date' columnfilter(year >=2018, year <=2021) %>%# Filter data to include only transactions from 2019 onwardgroup_by(year, is_corporation) %>%# Group by year and ownership typesummarise(avg_value =mean(value, na.rm =TRUE)) %>%# Calculate average home value per year and ownership typeggplot(aes(x = year, y = avg_value, fill = is_corporation)) +# Bar chart with fill based on 'is_corporation'geom_col() +# Add bars with transparencyscale_y_continuous(labels = scales::dollar) +# Format y-axis as dollarslabs(title ="Average Home Value vs. Year of Transaction",x ="Year of Transaction",y ="Average Home Value",fill ="Ownership Type") +facet_wrap(~ is_corporation) # Create separate plots for Corporation and Individual```This column chart shows over time, Individuals are buying more expensive house and corporations are buying less expensive houses. Someone could assume that corporations are buying these houses and flipping them for more money to individuals, causing individuals to pay more for housing## Houses Bought vs. Year of Transaction```{r}prop %>%mutate(is_corporation =if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), "Corporation", "Individual"),year =year(ymd(date))) %>%# Extract year from the 'date' columnfilter(year >=2018, year <=2021) %>%# Filter data to include only transactions from 2019 onwardgroup_by(year, is_corporation) %>%# Group by year and ownership typesummarise(houses_Bought =n()) %>%# Calculate average home value per year and ownership typeggplot(aes(x = year, y = houses_Bought, fill = is_corporation)) +# Bar chart with fill based on 'is_corporation'geom_col() +# Add bars with transparencyscale_y_continuous() +# Format y-axis as dollarslabs(title ="Houses Bought vs. Year of Transaction",x ="Year of Transaction",y ="Number of Houses Bought",fill ="Ownership Type") +facet_wrap(~ is_corporation) # Create separate plots for Corporation and Individual```This column chart shows houses bought each year comparing corporations and individuals and the number of houses bought. Both are trending## Average Value per Year```{r}prop %>%mutate(year =year(ymd(date))) %>%filter(year >=2018, year <2022) %>%ggplot(aes(x = year, y = value)) +geom_bar(stat ="summary", fun = mean) +scale_y_continuous(labels = scales::dollar) +labs(title ="Average Value per year",x ="Year of Transaction",y ="Average Value of Property" )```This column chart shows that over time, property values have rose on average through the years 2018 through 2021.## Claim About Investments- Over the last 4 years, corporations are trending to purchase more homes each year, but so are individuals. Individuals are also buying more houses each year which shows that in general, more people could simply be buying more homes in these areas.- Additionally, properties are being increased by price. This may be because of other External Variables and should not be blamed just on corperations.- Corporations are in fact buying more lower end homes, and individuals have been paying more for houses. One could speculate that that corporations could be buying and reselling these homes for a higher price.- Conclusion: I think the data can both support and refute the claim that corperations are causing individuals to pay more for housing. Overall, I think there is some effect, because corperations are buying cheaper and cheaper homes while individuals are buying more expensive homes.The `echo: false` option disables the printing of code (only output is displayed).