Assignment 4 - Practice Midterm
Xavier Unievrsity - BAIS-462 - Williams College of Business
Introduction
1.1 - Greetings everyone! Thank you for taking the time to read through my analysis on housing prices in the Cincinnati area, specifically during the COVID era. The purpose of this article i to provide you all with a better understanding of how house prices can fluctuate in response to something like a global pandemic. The properties within this data set are all located within the proximity of Xavier University’s campus, and the hope is this blog post will better inform you all on pricing trends. Happy Reading!
Data Preparation: (Cleaning & Wrangling)
Before we can dive into any form of data manipulation or analysis, we need to first import the data set necessary for this assignment, and clean it. The data has been provided by the Hamilton Country Auditor and has a record of every residential property sale in the area from January 1, 2018, to December 31, 2021. You can learn more about the Hamilton County Auditor and view every property sale in the county by visiting their government page here: https://www.hamiltoncountyauditor.org/.
Simple Trends & Analysis
This next section of the assignment is to curate and explain three visualizations. These visualizations are created as a direct response to popular questions about the housing market within the vicinity of Xavier University. A visualization is made to answer the question, and each one will have a brief description about what is being told.
The first visualization that we will create is one that shows the distribution of the square footage of a house as it relates to single-family dwelling homes. That visualization is below:
From this visualization, we can tell that SQFT for single-family homes is not normally distributed. We say this because there is a heavy left skew to the data in our histogram, with a vast majority of the houses falling between 1000-3000 SQFT.
The second visualization in this assignment will show the ratio of full bathrooms to bedrooms in each neighborhood around Xavier University:
This visualization does a fantastic job at showing us the phenomena that higher income neighborhoods in the Cincinnati are have a higher Full Bed and Bath ratio than lower income neighborhoods.
Our final visualization in this section of our assignment will show the total value of home transactions in each month for each neighborhood:
As you can see from this visualization, the summer months tend to have a higher total transaction value than the rest of the year. this could be due to extended vacations in the summertime, or the warmer weather, which makes moving much easier than if it were cold. This concludes the Simple Trends & Analysis section of our project.
Directed Analysis
This portion of our midterm tasks us to gather any necessary insight to answer the following question to the best of our ability:
If you were gifted a residential property in this area and you intended to sell it for the highest price:
In what neighborhood would you want it to be located?
What features (size, rooms, bedrooms, etc.) would you want it to have?
How old would you want it to be?
What time of year or day of the week would you want to sell (or does it matter?)
To answer this question as best as possible, the following code was run:
best_neighborhood <- housing_data %>%
group_by(neighborhood) %>%
summarize(avg_sale_price = mean(value, na.rm = TRUE)) %>%
filter(avg_sale_price == max(avg_sale_price, na.rm = TRUE)) %>%
select(neighborhood)
best_SQFT <- housing_data %>%
filter(value == max(value, na.rm = TRUE)) %>%
select(finished_sqft)
best_rooms <- housing_data %>%
filter(value == max(value, na.rm = TRUE)) %>%
select(total_rooms)
best_bedrooms <- housing_data %>%
filter(value == max(value, na.rm = TRUE)) %>%
select(bedrooms)
best_fullbath <- housing_data %>%
filter(value == max(value, na.rm = TRUE)) %>%
select(full_bath)
best_halfbath <- housing_data %>%
filter(value == max(value, na.rm = TRUE)) %>%
select(half_bath)
best_age <- housing_data %>%
group_by(yr_blt) %>%
summarize(price_per_age = mean(value, na.rm = TRUE)) %>%
filter(price_per_age == max(price_per_age, na.rm = TRUE)) %>%
select(yr_blt)
housing_data <- housing_data %>%
mutate(month = month(transaction_date, label = TRUE),
day_of_week = wday(transaction_date, label = TRUE))
month_analysis <- housing_data %>%
group_by(month) %>%
summarize(month_price = mean(value, na.rm = TRUE)) %>%
filter(month_price == max(month_price, na.rm = TRUE)) %>%
select(month)
day_analysis <- housing_data %>%
group_by(day_of_week) %>%
summarize(day_price = mean(value, na.rm = TRUE)) %>%
filter(day_price == max(day_price, na.rm = TRUE)) %>%
select(day_of_week)Before delving into the process I did to answer this question, I thought it would be better to give the answer that I found first. If I were to be gifted a property in this area, I would want it to be located in the city of Mount Adams. I say this because they have the highest average sales price. I would want it to be 8,374, have 11 total rooms, 5 bedrooms, 7 full bath, and 3 half bath. All of these values were determined by examining the house that held the most value in the entire data set. I would want to house to be built 1860, making the property 165 years old. I would want to house to be sold during the summer, specifically in August on a Tuesday, as these are the respective Month and Day of the Week that return the highest average value.
Now, for my methods, I did a very repetitive process, as you can see. For every value I found in this analysis, I grouped by the respective value, and filtered to find the max of that value. If the value included multiple outputs, I found the average first, like when I determined that Mount Adams was the best neighborhood. Doing this process allowed me to conclude the descriptors that would likely generate a property with the most value.
Self-Directed Analysis
Recently, the city of Cincinnati has grown rather concerned at the rate in which corporations, more specifically investment firms, have purchased residential homes. This is done as way to make passive income by renting out the properties to people, rather than giving the people an opportunity to become homeowners. The city has injected themselves into this issue, finding it hard to continuously outbid firms at home auctions in the city, and it is getting rather expensive too. I wanted to analyze the housing data to either confirm or deny the city’s concerns on this matter, and whether or not is has become a contributing factor to the increased pricing in the area. I decided to include my code in the section to highlight my thinking process and show the practices I used to find my answers. Here is what I found:


Based on my analysis, I cannot state with confidence that LLC’s are purchasing homes at an alarming rate for the city to get involved, nor are they contributing to any form of clear price gouging. The first graph shows the total amount LLC involvement over the years as a percentage. The value has always been lower, but spiked in2021, before going back down in 2022. This makes me believe that there are still way more homeowners than companies in the market at this point, and LLC involvement only spiked in 2021, and has been going down according to data. The second graph shows us the AVG transaction value for LLC owned homes and regular homes. The value of regular homes is much greater than that of LLC owned ones, which leads me to believe that the increased price idea has to be coming from another source, and not LLC involvement.