prop <- read_csv("http://asayanalytics.com/xu_prop-csv") # Read the CSV fileMidterm Practice
with an introduction to Quarto
1.1)
Introduction
The leadership of Xavier University is interested in understanding how economic policies related to the COVID-19 pandemic have influenced residential property prices in the neighborhoods surrounding the university campus. Additionally, the city of Cincinnati has expressed concerns about housing affordability in the area. To address these issues, this analysis investigates property sales trends from 2018 to 2021, focusing on price fluctuations, neighborhood differences, and the overall impact of economic shifts on real estate.
This analysis utilizes data provided by the Hamilton County Auditor, which includes detailed records of residential property transactions in nine neighborhoods adjacent to Xavier University. The dataset captures key attributes such as sale price, property size, number of bedrooms and bathrooms, year built, and neighborhood classification. By leveraging data visualization techniques and statistical analysis, this report aims to make the insights accessible to a broad audience and assist policymakers in making informed decisions regarding housing affordability and market trends.
1.2)
2.1)
2.2 A)
prop <- prop %>%
mutate(finished_sqft = ifelse(finished_sqft == 0 & total_rooms > 0, NA, finished_sqft)) prop[45:47,6] <- NA2.2 B)
prop <- prop %>%
mutate(date = ymd(paste(year, month, day, sep = "-"))) %>%
select(-year, -month, -day) # Create the 'multifamily' column with 0 and 1 and join it back to the original dataset
prop <- prop %>%
left_join(
prop %>%
group_by(street_address, street_name) %>%
summarise(purchase_count = n(), .groups = 'drop') %>%
mutate(multifamily = as.integer(purchase_count > 1)), # Convert TRUE/FALSE to 1/0
by = c("street_address", "street_name")
)
# View the updated dataset with the 'multifamily' column
head(prop)# A tibble: 6 × 19
parcel_id purchaser cps norwood_schools street_address unit_id street_name
<chr> <chr> <lgl> <lgl> <dbl> <chr> <chr>
1 068-0002-0… LAURENT … TRUE FALSE 713 #1 E MCMILLAN…
2 041-0002-0… BRABES I… TRUE FALSE 3443 #1 SHAW AVE
3 053-0001-0… MITCHELL… TRUE FALSE 2324 #1809 MADISON RD
4 053-0001-0… OKADA KE… TRUE FALSE 2324 #1810 MADISON RD
5 055-0004-0… HUBER MA… TRUE FALSE 1720 #2 DEXTER AVE
6 086-0001-0… TDPDX LLC TRUE FALSE 536 #2 LIBERTY HI…
# ℹ 12 more variables: use <dbl>, yr_blt <dbl>, value <dbl>,
# neighborhood <chr>, total_rooms <dbl>, bedrooms <dbl>, full_bath <dbl>,
# half_bath <dbl>, finished_sqft <dbl>, date <date>, purchase_count <int>,
# multifamily <int>
#| message: FALSE
#| warning: FALSE
# Mutate the dataframe to create 'value_category'
prop <- prop %>%
mutate(
value_category = case_when(
is.na(value) ~ "Missing", # Case for missing values
value >= (mean(prop$value, na.rm = TRUE) - sd(prop$value, na.rm = TRUE)) & value <= (mean(prop$value, na.rm = TRUE) + sd(prop$value, na.rm = TRUE)) ~ "Within 1 SD of Mean", # Within 1 SD of the mean
value > (mean(prop$value, na.rm = TRUE) + sd(prop$value, na.rm = TRUE)) ~ "More than 1 SD Above Mean", # More than 1 SD above the mean
value < (mean(prop$value, na.rm = TRUE) - sd(prop$value, na.rm = TRUE)) ~ "More than 1 SD Below Mean" # More than 1 SD below the mean
)
)3.1
# Filter data for single-family properties and create the plot
prop %>%
filter(multifamily == 0) %>%
ggplot(aes(x = finished_sqft)) + # Set the x-axis to the variable of interest
geom_histogram(bins = 30, fill = "blue", color = "black", alpha = 0.7) + # Specify histogram aesthetics
labs(title = "Distribution of Single-Family Dwelling Sizes",
x = "Finished Square Feet",
y = "Frequency")This graph shows that SQFT for single family houses is skewed slightly to the right, so it does not look normally distributed
3.2
prop %>%
group_by(neighborhood) %>%
summarise(
total_bathrooms = sum(full_bath, na.rm = TRUE),
total_bedrooms = sum(bedrooms, na.rm = TRUE),
bedtobath_Ratio = total_bathrooms / total_bedrooms
) %>%
select(neighborhood, bedtobath_Ratio) %>%
arrange(desc(bedtobath_Ratio)) %>%
ggplot(aes(x = neighborhood, y = bedtobath_Ratio)) +
geom_col() Higher end neighborhoods have a higher bed to bath ratio because they have more bathrooms for each bedroom. For example, in a high-end home, each bedroom may have its own bathroom.
3.3
prop %>%
mutate(month = month(date)) %>%
group_by(neighborhood, month) %>%
summarise(total_sale_value = sum(value, na.rm = TRUE)) %>%
ggplot(aes(x = month, y = total_sale_value)) +
geom_col() +
facet_wrap(~ neighborhood) +
scale_y_continuous(labels = scales::dollar) + # Format y-axis as dollars
scale_x_continuous(breaks = 1:12) + # Set x-axis to full numbers from 1 to 12 (months)
labs(y = "Total Sale Value", x = "Month") # Label axesThere may be a bit of a spike in the housing market in the summer months. So some seasonality
4.1
What Neighborhood
prop %>%
group_by(neighborhood) %>%
summarise(avg_value = mean(value, na.rm = TRUE)) %>%
ggplot(aes(x = reorder(neighborhood, avg_value), y = avg_value)) + # reorder neighborhoods by avg_value
geom_col(fill = "skyblue") + # Create column bars
scale_y_continuous(labels = scales::dollar) + # Format y-axis as dollar amounts
labs(title = "Average Home Value by Neighborhood",
x = "Neighborhood",
y = "Average Home Value") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readabilityFirst, I would want to purchase a house in Mount Adams because the area has the most expensive homes on average. Because of this, I want to live there because the expensive area should help support the value of the house I am gifted.
prop %>%
filter(neighborhood == 'Mount Adams') %>%
group_by(bedrooms) %>%
summarise(avg_sale_value = mean(value, na.rm = TRUE)) %>%
ggplot(aes(x = bedrooms, y = avg_sale_value)) +
geom_col(fill = "skyblue") +
scale_y_continuous(labels = scales::dollar) +
labs(title = "Average Sale Price by Number of Bedrooms in Mount Adams",
x = "Number of Bedrooms", y = "Average Sale Price") +
theme_minimal()I would want to have 4 bedrooms in my house in Mount Adams, because in this area 4 bedroom has the highest home value
prop %>%
filter(neighborhood == 'Mount Adams', bedrooms == 4) %>%
group_by(full_bath) %>%
summarise(avg_sale_value = mean(value, na.rm = TRUE)) %>%
ggplot(aes(x = full_bath, y = avg_sale_value)) +
geom_col(fill = "skyblue") +
scale_y_continuous(labels = scales::dollar) +
labs(title = "Average Sale Price by Number of Full Bathrooms for
4-Bedroom Homes in Mount Adams",
x = "Number of Full Bathrooms", y = "Average Sale Price") +
theme_minimal()For 4 bedroom homes in mount adams, four bathrooms are the best choice to maximize home value
prop %>%
group_by(yr_blt) %>%
summarise(count = n(), avg_value = mean(value, na.rm = TRUE)) %>% # Get count and avg_value for each group
filter(count >= 30) %>% # Filter for years with 30 or more properties
top_n(5, avg_value) %>% # Filter for the top 5 years with the highest average values
ggplot(aes(x = reorder(yr_blt, avg_value), y = avg_value)) + # Reorder the years by average value
geom_col() + # Column chart with average value per year built
scale_y_continuous(labels = scales::dollar) +
labs(title = "Top 5 Years built with Highest Average Value",
x = "Year Built",
y = "Average Value",
fill = "Average Value") + # Label for the color legend
theme_minimal()I want to have a newer home built in 2018. homes built in 2018 have the highest average home value. This graph accounts for there being a representative sample for each year of houses being built
prop %>%
filter(neighborhood == 'Mount Adams') %>%
mutate(month = month(date)) %>%
group_by(month) %>%
summarise(total_sale_value = sum(value, na.rm = TRUE)) %>%
ggplot(aes(x = month, y = total_sale_value)) +
geom_col() +
scale_y_continuous(labels = scales::dollar) + # Format y-axis as dollars
scale_x_continuous(breaks = 1:12) + # Set x-axis to full numbers from 1 to 12 (months)
labs(y = "Total Sale Value", x = "Month") # Label axesI would sell the house in October because it is the 10th month that has the highest total accumulated sales value (in Mount Adams). This means a ton of homes are being bought in October or that the homes being bought in October are high in value
What House Would I Buy?
I would choose to buy a house in Mount Adams because it is home to some of the highest-value properties in the area. Living in a neighborhood with upscale homes would likely boost the value of my own property. In terms of specific features, I would prefer a house with 4 bedrooms and 4 bathrooms, as these properties in Mount Adams tend to have the highest home values. I would aim for a house built in 2018, as this is a year that has a lot of high end homes that were built
5.1
prop %>%
mutate(is_corporation = if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), "Corporation", "Individual"),
year = year(ymd(date))) %>% # Extract year from the 'date' column
filter(year >= 2018) %>% # Filter data to include only transactions from 2019 onward
group_by(year, is_corporation) %>% # Group by year and ownership type
summarise(avg_value = mean(value, na.rm = TRUE)) %>% # Calculate average home value per year and ownership type
ggplot(aes(x = year, y = avg_value, fill = is_corporation)) %>% + # Bar chart with fill based on 'is_corporation'
geom_col() %>% + # Add bars with transparency
scale_y_continuous(labels = scales::dollar) +
labs(title = "Average Home Value vs. Year of Transaction",
x = "Year of Transaction",
y = "Average Home Value",
fill = "Ownership Type") +
facet_wrap(~ is_corporation) # Create separate plots for Corporation and Individualprop %>%
mutate(is_corporation = if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), 'corporation', 'individual'),
year = year(ymd(date))) %>% # Extract year from the 'date' column
filter(year >=2018, year <=2021) %>%
group_by(year, is_corporation) %>% # Group by year and ownership type
summarise(total = n()) %>%
ggplot(aes(x = year, y = total)) + # Plot the percentage of corporation-owned properties
geom_col() + # Use geom_col() to display the percentages as bars
labs(title = "Number of Properties obtained by Corporations vs. Individuals per year",
x = "Year of Transaction",
y = "Properties Obtained",
fill = "Ownership Type") +
facet_wrap(~ is_corporation)prop %>%
mutate(year = year(ymd(date))) %>%
filter(year >= 2018, year < 2022) %>%
ggplot(aes(x = year, y = value)) +
geom_bar(stat = "summary", fun = mean) +
scale_y_continuous(labels = scales::dollar) +
labs(
title = "Average Value per year",
x = "Year of Transaction",
y = "Average Value of Property"
)Over the last 4 years, corporations are trending to purchase more homes each year, but so are individuals. Individuals are also buying more houses each year which shows that in general, more people could simply be buying more homes.
Additionally, properties are being increased slightly by price. This may be because of other variables like inflation, and it is not that massive of an increase.
Corporations are in fact buying more lower end homes. One could speculate that that corporations could be buying and reselling these homes for a higher price.
Conclusion: Corperations may have a minor effect on the increased price of housing.
The echo: false option disables the printing of code (only output is displayed).