Midterm Practice

with an introduction to Quarto

Author

Patrick Berry

1.1)

Introduction

The leadership of Xavier University is interested in understanding how economic policies related to the COVID-19 pandemic have influenced residential property prices in the neighborhoods surrounding the university campus. Additionally, the city of Cincinnati has expressed concerns about housing affordability in the area. To address these issues, this analysis investigates property sales trends from 2018 to 2021, focusing on price fluctuations, neighborhood differences, and the overall impact of economic shifts on real estate.

This analysis utilizes data provided by the Hamilton County Auditor, which includes detailed records of residential property transactions in nine neighborhoods adjacent to Xavier University. The dataset captures key attributes such as sale price, property size, number of bedrooms and bathrooms, year built, and neighborhood classification. By leveraging data visualization techniques and statistical analysis, this report aims to make the insights accessible to a broad audience and assist policymakers in making informed decisions regarding housing affordability and market trends.

1.2)

2.1)

prop <- read_csv("http://asayanalytics.com/xu_prop-csv")  # Read the CSV file

2.2 A)

prop <- prop %>%
  mutate(finished_sqft = ifelse(finished_sqft == 0 & total_rooms > 0, NA, finished_sqft))

prop[45:47,6] <- NA

2.2 B)

prop <- prop %>%
  mutate(date = ymd(paste(year, month, day, sep = "-"))) %>%
  select(-year, -month, -day)

# Create the 'multifamily' column with 0 and 1 and join it back to the original dataset
prop <- prop %>%
  left_join(
    prop %>%
      group_by(street_address, street_name) %>%
      summarise(purchase_count = n(), .groups = 'drop') %>% 
      mutate(multifamily = as.integer(purchase_count > 1)),  # Convert TRUE/FALSE to 1/0
    by = c("street_address", "street_name")
  )

# View the updated dataset with the 'multifamily' column
head(prop)

# A tibble: 6 × 19
  parcel_id   purchaser cps   norwood_schools street_address unit_id street_name
  <chr>       <chr>     <lgl> <lgl>                    <dbl> <chr>   <chr>      
1 068-0002-0… LAURENT … TRUE  FALSE                      713 #1      E MCMILLAN…
2 041-0002-0… BRABES I… TRUE  FALSE                     3443 #1      SHAW AVE   
3 053-0001-0… MITCHELL… TRUE  FALSE                     2324 #1809   MADISON RD 
4 053-0001-0… OKADA KE… TRUE  FALSE                     2324 #1810   MADISON RD 
5 055-0004-0… HUBER MA… TRUE  FALSE                     1720 #2      DEXTER AVE 
6 086-0001-0… TDPDX LLC TRUE  FALSE                      536 #2      LIBERTY HI…
# ℹ 12 more variables: use <dbl>, yr_blt <dbl>, value <dbl>,
#   neighborhood <chr>, total_rooms <dbl>, bedrooms <dbl>, full_bath <dbl>,
#   half_bath <dbl>, finished_sqft <dbl>, date <date>, purchase_count <int>,
#   multifamily <int>

#| message: FALSE
#| warning: FALSE
# Mutate the dataframe to create 'value_category'
prop <- prop %>%
  mutate(
    value_category = case_when(
      is.na(value) ~ "Missing",  # Case for missing values
      value >= (mean(prop$value, na.rm = TRUE) - sd(prop$value, na.rm = TRUE)) & value <= (mean(prop$value, na.rm = TRUE) + sd(prop$value, na.rm = TRUE)) ~ "Within 1 SD of Mean",  # Within 1 SD of the mean
      value > (mean(prop$value, na.rm = TRUE) + sd(prop$value, na.rm = TRUE)) ~ "More than 1 SD Above Mean",  # More than 1 SD above the mean
      value < (mean(prop$value, na.rm = TRUE) - sd(prop$value, na.rm = TRUE)) ~ "More than 1 SD Below Mean"  # More than 1 SD below the mean
    )
  )

3.1

# Filter data for single-family properties and create the plot
prop %>%
  filter(multifamily == 0) %>%
  ggplot(aes(x = finished_sqft)) +  # Set the x-axis to the variable of interest
  geom_histogram(bins = 30, fill = "blue", color = "black", alpha = 0.7) +  # Specify histogram aesthetics
  labs(title = "Distribution of Single-Family Dwelling Sizes",
       x = "Finished Square Feet",
       y = "Frequency")

This graph shows that SQFT for single family houses is skewed slightly to the right, so it does not look normally distributed

3.2

prop %>%
  group_by(neighborhood) %>%
  summarise(
    total_bathrooms = sum(full_bath, na.rm = TRUE),
    total_bedrooms = sum(bedrooms, na.rm = TRUE),
    bedtobath_Ratio = total_bathrooms / total_bedrooms
  ) %>%
  select(neighborhood, bedtobath_Ratio) %>%
  arrange(desc(bedtobath_Ratio)) %>%
  ggplot(aes(x = neighborhood, y = bedtobath_Ratio)) +
  geom_col()

Higher end neighborhoods have a higher bed to bath ratio because they have more bathrooms for each bedroom. For example, in a high-end home, each bedroom may have its own bathroom.

3.3

prop %>%
  mutate(month = month(date)) %>%
  group_by(neighborhood, month) %>%
  summarise(total_sale_value = sum(value, na.rm = TRUE)) %>%
  ggplot(aes(x = month, y = total_sale_value)) +
  geom_col() +
  facet_wrap(~ neighborhood) +
  scale_y_continuous(labels = scales::dollar) +  # Format y-axis as dollars
  scale_x_continuous(breaks = 1:12) +  # Set x-axis to full numbers from 1 to 12 (months)
  labs(y = "Total Sale Value", x = "Month")  # Label axes

There may be a bit of a spike in the housing market in the summer months. So some seasonality

4.1

What Neighborhood

prop %>%
  group_by(neighborhood) %>%
  summarise(avg_value = mean(value, na.rm = TRUE)) %>%
  ggplot(aes(x = reorder(neighborhood, avg_value), y = avg_value)) +  # reorder neighborhoods by avg_value
  geom_col(fill = "skyblue") +  # Create column bars
  scale_y_continuous(labels = scales::dollar) +  # Format y-axis as dollar amounts
  labs(title = "Average Home Value by Neighborhood", 
       x = "Neighborhood", 
       y = "Average Home Value") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # Rotate x-axis labels for better readability

First, I would want to purchase a house in Mount Adams because the area has the most expensive homes on average. Because of this, I want to live there because the expensive area should help support the value of the house I am gifted.

prop %>%
  filter(neighborhood == 'Mount Adams') %>%
  group_by(bedrooms) %>%
  summarise(avg_sale_value = mean(value, na.rm = TRUE)) %>%
  ggplot(aes(x = bedrooms, y = avg_sale_value)) +
  geom_col(fill = "skyblue") +
  scale_y_continuous(labels = scales::dollar) + 
  labs(title = "Average Sale Price by Number of Bedrooms in Mount Adams", 
       x = "Number of Bedrooms", y = "Average Sale Price") +
  theme_minimal()

I would want to have 4 bedrooms in my house in Mount Adams, because in this area 4 bedroom has the highest home value

prop %>%
  filter(neighborhood == 'Mount Adams', bedrooms == 4) %>% 
  group_by(full_bath) %>% 
  summarise(avg_sale_value = mean(value, na.rm = TRUE)) %>%
  ggplot(aes(x = full_bath, y = avg_sale_value)) +
  geom_col(fill = "skyblue") +
  scale_y_continuous(labels = scales::dollar) + 
  labs(title = "Average Sale Price by Number of Full Bathrooms for 
       4-Bedroom Homes in Mount Adams", 
       x = "Number of Full Bathrooms", y = "Average Sale Price") +
  theme_minimal()

For 4 bedroom homes in mount adams, four bathrooms are the best choice to maximize home value

prop %>%
  group_by(yr_blt) %>%
  summarise(count = n(), avg_value = mean(value, na.rm = TRUE)) %>% # Get count and avg_value for each group
  filter(count >= 30) %>%  # Filter for years with 30 or more properties

  top_n(5, avg_value) %>%  # Filter for the top 5 years with the highest average values
  ggplot(aes(x = reorder(yr_blt, avg_value), y = avg_value)) +  # Reorder the years by average value
  geom_col() +  # Column chart with average value per year built
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Top 5 Years built with Highest Average Value", 
       x = "Year Built", 
       y = "Average Value", 
       fill = "Average Value") +  # Label for the color legend
  theme_minimal()

I want to have a newer home built in 2018. homes built in 2018 have the highest average home value. This graph accounts for there being a representative sample for each year of houses being built

prop %>%
  filter(neighborhood == 'Mount Adams') %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarise(total_sale_value = sum(value, na.rm = TRUE)) %>%
  ggplot(aes(x = month, y = total_sale_value)) +
  geom_col() +
  scale_y_continuous(labels = scales::dollar) +  # Format y-axis as dollars
  scale_x_continuous(breaks = 1:12) +  # Set x-axis to full numbers from 1 to 12 (months)
  labs(y = "Total Sale Value", x = "Month")  # Label axes

I would sell the house in October because it is the 10th month that has the highest total accumulated sales value (in Mount Adams). This means a ton of homes are being bought in October or that the homes being bought in October are high in value

What House Would I Buy?

I would choose to buy a house in Mount Adams because it is home to some of the highest-value properties in the area. Living in a neighborhood with upscale homes would likely boost the value of my own property. In terms of specific features, I would prefer a house with 4 bedrooms and 4 bathrooms, as these properties in Mount Adams tend to have the highest home values. I would aim for a house built in 2018, as this is a year that has a lot of high end homes that were built

5.1

prop %>%
  mutate(is_corporation = if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), "Corporation", "Individual"),
         year = year(ymd(date))) %>% # Extract year from the 'date' column
  filter(year >= 2018) %>%  # Filter data to include only transactions from 2019 onward
  group_by(year, is_corporation) %>% # Group by year and ownership type
  summarise(avg_value = mean(value, na.rm = TRUE)) %>%   # Calculate average home value per year and ownership type
  ggplot(aes(x = year, y = avg_value, fill = is_corporation)) %>% +  # Bar chart with fill based on 'is_corporation'
  geom_col() %>% +  # Add bars with transparency
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Average Home Value vs. Year of Transaction",
       x = "Year of Transaction",
       y = "Average Home Value",
       fill = "Ownership Type") +
  facet_wrap(~ is_corporation)  # Create separate plots for Corporation and Individual

prop %>%
  mutate(is_corporation = if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), 'corporation', 'individual'),
         year = year(ymd(date))) %>%  # Extract year from the 'date' column
  filter(year >=2018, year <=2021) %>%
  group_by(year, is_corporation) %>% # Group by year and ownership type
  summarise(total = n()) %>% 
  ggplot(aes(x = year, y = total)) +  # Plot the percentage of corporation-owned properties
  geom_col() +  # Use geom_col() to display the percentages as bars
  labs(title = "Number of Properties obtained by Corporations vs. Individuals per year",
       x = "Year of Transaction",
       y = "Properties Obtained",
       fill = "Ownership Type") +
  facet_wrap(~ is_corporation)

prop %>%
  mutate(year = year(ymd(date))) %>% 
  filter(year >= 2018, year < 2022) %>% 
  ggplot(aes(x = year, y = value)) + 
  geom_bar(stat = "summary", fun = mean) +
  scale_y_continuous(labels = scales::dollar) +
  labs(
    title = "Average Value per year",
    x = "Year of Transaction",
    y = "Average Value of Property"
  )

Over the last 4 years, corporations are trending to purchase more homes each year, but so are individuals. Individuals are also buying more houses each year which shows that in general, more people could simply be buying more homes.
Additionally, properties are being increased slightly by price. This may be because of other variables like inflation, and it is not that massive of an increase.
Corporations are in fact buying more lower end homes. One could speculate that that corporations could be buying and reselling these homes for a higher price.
Conclusion: Corperations may have a minor effect on the increased price of housing.

The echo: false option disables the printing of code (only output is displayed).

--- title: "Midterm Practice" # Name of your HTML output subtitle: "with an introduction to Quarto" author: "Patrick Berry" # Author name toc: true # Generates an automatic table of contents. format: # Options related to formatting. html: # Options related to HTML output. code-tools: TRUE # Allow the code tools option showing in the output. embed-resources: TRUE # Embeds all components into a single HTML file. execute: # Options related to the execution of code chunks. warning: FALSE # FALSE: Code chunk sarnings are hidden by default. message: FALSE # FALSE: Code chunk messages are hidden by default. echo: TRUE --- # 1.1) ## Introduction The leadership of Xavier University is interested in understanding how economic policies related to the COVID-19 pandemic have influenced residential property prices in the neighborhoods surrounding the university campus. Additionally, the city of Cincinnati has expressed concerns about housing affordability in the area. To address these issues, this analysis investigates property sales trends from 2018 to 2021, focusing on price fluctuations, neighborhood differences, and the overall impact of economic shifts on real estate. This analysis utilizes data provided by the Hamilton County Auditor, which includes detailed records of residential property transactions in nine neighborhoods adjacent to Xavier University. The dataset captures key attributes such as sale price, property size, number of bedrooms and bathrooms, year built, and neighborhood classification. By leveraging data visualization techniques and statistical analysis, this report aims to make the insights accessible to a broad audience and assist policymakers in making informed decisions regarding housing affordability and market trends. # 1.2) ```{r} #| include: FALSE library(tidyverse) # Collection of packages for data manipulation, visualization, and analysis library(skimr) # Provides a better summary of data frames than base summary() library(lubridate) # Simplifies working with date-time data library(dplyr) # Part of Tidyverse; used for data manipulation (filter, select, mutate, etc.) ``` # 2.1) ```{r} #| message: FALSE #| warning: FALSE prop <- read_csv("http://asayanalytics.com/xu_prop-csv") # Read the CSV file ``` # 2.2 A) ```{r} prop <- prop %>% mutate(finished_sqft = ifelse(finished_sqft == 0 & total_rooms > 0, NA, finished_sqft)) ``` ```{r} prop[45:47,6] <- NA ``` # 2.2 B) ```{r} prop <- prop %>% mutate(date = ymd(paste(year, month, day, sep = "-"))) %>% select(-year, -month, -day) ``` ```{r} # Create the 'multifamily' column with 0 and 1 and join it back to the original dataset prop <- prop %>% left_join( prop %>% group_by(street_address, street_name) %>% summarise(purchase_count = n(), .groups = 'drop') %>% mutate(multifamily = as.integer(purchase_count > 1)), # Convert TRUE/FALSE to 1/0 by = c("street_address", "street_name") ) # View the updated dataset with the 'multifamily' column head(prop) ``` ```{r} #| message: FALSE #| warning: FALSE # Mutate the dataframe to create 'value_category' prop <- prop %>% mutate( value_category = case_when( is.na(value) ~ "Missing", # Case for missing values value >= (mean(prop$value, na.rm = TRUE) - sd(prop$value, na.rm = TRUE)) & value <= (mean(prop$value, na.rm = TRUE) + sd(prop$value, na.rm = TRUE)) ~ "Within 1 SD of Mean", # Within 1 SD of the mean value > (mean(prop$value, na.rm = TRUE) + sd(prop$value, na.rm = TRUE)) ~ "More than 1 SD Above Mean", # More than 1 SD above the mean value < (mean(prop$value, na.rm = TRUE) - sd(prop$value, na.rm = TRUE)) ~ "More than 1 SD Below Mean" # More than 1 SD below the mean ) ) ``` # 3.1 ```{r} #| message: FALSE #| warning: FALSE # Filter data for single-family properties and create the plot prop %>% filter(multifamily == 0) %>% ggplot(aes(x = finished_sqft)) + # Set the x-axis to the variable of interest geom_histogram(bins = 30, fill = "blue", color = "black", alpha = 0.7) + # Specify histogram aesthetics labs(title = "Distribution of Single-Family Dwelling Sizes", x = "Finished Square Feet", y = "Frequency") ``` This graph shows that SQFT for single family houses is skewed slightly to the right, so it does not look normally distributed # 3.2 ```{r} #| message: FALSE #| warning: FALSE prop %>% group_by(neighborhood) %>% summarise( total_bathrooms = sum(full_bath, na.rm = TRUE), total_bedrooms = sum(bedrooms, na.rm = TRUE), bedtobath_Ratio = total_bathrooms / total_bedrooms ) %>% select(neighborhood, bedtobath_Ratio) %>% arrange(desc(bedtobath_Ratio)) %>% ggplot(aes(x = neighborhood, y = bedtobath_Ratio)) + geom_col() ``` Higher end neighborhoods have a higher bed to bath ratio because they have more bathrooms for each bedroom. For example, in a high-end home, each bedroom may have its own bathroom. # 3.3 ```{r} #| message: FALSE #| warning: FALSE prop %>% mutate(month = month(date)) %>% group_by(neighborhood, month) %>% summarise(total_sale_value = sum(value, na.rm = TRUE)) %>% ggplot(aes(x = month, y = total_sale_value)) + geom_col() + facet_wrap(~ neighborhood) + scale_y_continuous(labels = scales::dollar) + # Format y-axis as dollars scale_x_continuous(breaks = 1:12) + # Set x-axis to full numbers from 1 to 12 (months) labs(y = "Total Sale Value", x = "Month") # Label axes ``` There may be a bit of a spike in the housing market in the summer months. So some seasonality # 4.1 What Neighborhood ```{r} #| message: FALSE #| warning: FALSE prop %>% group_by(neighborhood) %>% summarise(avg_value = mean(value, na.rm = TRUE)) %>% ggplot(aes(x = reorder(neighborhood, avg_value), y = avg_value)) + # reorder neighborhoods by avg_value geom_col(fill = "skyblue") + # Create column bars scale_y_continuous(labels = scales::dollar) + # Format y-axis as dollar amounts labs(title = "Average Home Value by Neighborhood", x = "Neighborhood", y = "Average Home Value") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability ``` First, I would want to purchase a house in Mount Adams because the area has the most expensive homes on average. Because of this, I want to live there because the expensive area should help support the value of the house I am gifted. ```{r} #| message: FALSE #| warning: FALSE prop %>% filter(neighborhood == 'Mount Adams') %>% group_by(bedrooms) %>% summarise(avg_sale_value = mean(value, na.rm = TRUE)) %>% ggplot(aes(x = bedrooms, y = avg_sale_value)) + geom_col(fill = "skyblue") + scale_y_continuous(labels = scales::dollar) + labs(title = "Average Sale Price by Number of Bedrooms in Mount Adams", x = "Number of Bedrooms", y = "Average Sale Price") + theme_minimal() ``` I would want to have 4 bedrooms in my house in Mount Adams, because in this area 4 bedroom has the highest home value ```{r} #| message: FALSE #| warning: FALSE prop %>% filter(neighborhood == 'Mount Adams', bedrooms == 4) %>% group_by(full_bath) %>% summarise(avg_sale_value = mean(value, na.rm = TRUE)) %>% ggplot(aes(x = full_bath, y = avg_sale_value)) + geom_col(fill = "skyblue") + scale_y_continuous(labels = scales::dollar) + labs(title = "Average Sale Price by Number of Full Bathrooms for 4-Bedroom Homes in Mount Adams", x = "Number of Full Bathrooms", y = "Average Sale Price") + theme_minimal() ``` For 4 bedroom homes in mount adams, four bathrooms are the best choice to maximize home value ```{r} #| message: FALSE #| warning: FALSE prop %>% group_by(yr_blt) %>% summarise(count = n(), avg_value = mean(value, na.rm = TRUE)) %>% # Get count and avg_value for each group filter(count >= 30) %>% # Filter for years with 30 or more properties top_n(5, avg_value) %>% # Filter for the top 5 years with the highest average values ggplot(aes(x = reorder(yr_blt, avg_value), y = avg_value)) + # Reorder the years by average value geom_col() + # Column chart with average value per year built scale_y_continuous(labels = scales::dollar) + labs(title = "Top 5 Years built with Highest Average Value", x = "Year Built", y = "Average Value", fill = "Average Value") + # Label for the color legend theme_minimal() ``` I want to have a newer home built in 2018. homes built in 2018 have the highest average home value. This graph accounts for there being a representative sample for each year of houses being built ```{r} #| message: FALSE #| warning: FALSE prop %>% filter(neighborhood == 'Mount Adams') %>% mutate(month = month(date)) %>% group_by(month) %>% summarise(total_sale_value = sum(value, na.rm = TRUE)) %>% ggplot(aes(x = month, y = total_sale_value)) + geom_col() + scale_y_continuous(labels = scales::dollar) + # Format y-axis as dollars scale_x_continuous(breaks = 1:12) + # Set x-axis to full numbers from 1 to 12 (months) labs(y = "Total Sale Value", x = "Month") # Label axes ``` I would sell the house in October because it is the 10th month that has the highest total accumulated sales value (in Mount Adams). This means a ton of homes are being bought in October or that the homes being bought in October are high in value # What House Would I Buy? I would choose to buy a house in Mount Adams because it is home to some of the highest-value properties in the area. Living in a neighborhood with upscale homes would likely boost the value of my own property. In terms of specific features, I would prefer a house with 4 bedrooms and 4 bathrooms, as these properties in Mount Adams tend to have the highest home values. I would aim for a house built in 2018, as this is a year that has a lot of high end homes that were built # 5.1 ```{r} #| message: FALSE #| warning: FALSE prop %>% mutate(is_corporation = if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), "Corporation", "Individual"), year = year(ymd(date))) %>% # Extract year from the 'date' column filter(year >= 2018) %>% # Filter data to include only transactions from 2019 onward group_by(year, is_corporation) %>% # Group by year and ownership type summarise(avg_value = mean(value, na.rm = TRUE)) %>% # Calculate average home value per year and ownership type ggplot(aes(x = year, y = avg_value, fill = is_corporation)) %>% + # Bar chart with fill based on 'is_corporation' geom_col() %>% + # Add bars with transparency scale_y_continuous(labels = scales::dollar) + labs(title = "Average Home Value vs. Year of Transaction", x = "Year of Transaction", y = "Average Home Value", fill = "Ownership Type") + facet_wrap(~ is_corporation) # Create separate plots for Corporation and Individual ``` ```{r} #| message: FALSE #| warning: FALSE prop %>% mutate(is_corporation = if_else(str_detect(purchaser, 'LLC|Group|Corp|Inc|Partners|TR|Trustee|Holding|Property|Management'), 'corporation', 'individual'), year = year(ymd(date))) %>% # Extract year from the 'date' column filter(year >=2018, year <=2021) %>% group_by(year, is_corporation) %>% # Group by year and ownership type summarise(total = n()) %>% ggplot(aes(x = year, y = total)) + # Plot the percentage of corporation-owned properties geom_col() + # Use geom_col() to display the percentages as bars labs(title = "Number of Properties obtained by Corporations vs. Individuals per year", x = "Year of Transaction", y = "Properties Obtained", fill = "Ownership Type") + facet_wrap(~ is_corporation) ``` ```{r} #| message: FALSE #| warning: FALSE prop %>% mutate(year = year(ymd(date))) %>% filter(year >= 2018, year < 2022) %>% ggplot(aes(x = year, y = value)) + geom_bar(stat = "summary", fun = mean) + scale_y_continuous(labels = scales::dollar) + labs( title = "Average Value per year", x = "Year of Transaction", y = "Average Value of Property" ) ``` - Over the last 4 years, corporations are trending to purchase more homes each year, but so are individuals. Individuals are also buying more houses each year which shows that in general, more people could simply be buying more homes. - Additionally, properties are being increased slightly by price. This may be because of other variables like inflation, and it is not that massive of an increase. - Corporations are in fact buying more lower end homes. One could speculate that that corporations could be buying and reselling these homes for a higher price. - Conclusion: Corperations may have a minor effect on the increased price of housing. The `echo: false` option disables the printing of code (only output is displayed).