Population Density and Housing Price Dynamics

Advanced Predictive Methods in Data Analytics

Author

Christian McIntire, supported by Dr. Alexandre Scarcioffolo

Introduction

In this research, I will compare and predict housing price trends in Tennessee and Ohio, which represent Southeastern and Midwestern U.S. housing markets, respectively. Both states are characteristic of their regions, and having lived in each for extended periods, I find it fascinating to contrast their housing markets and explore affordability and growth potential for young professionals navigating rising real estate costs or looking for potential investments. Housing affordability is a critical concern for younger demographics, especially graduates entering the workforce. Because housing prices have risen in recent years, a steadily increasing number of adults are choosing to live with their parents after graduating from college (Avery, 2022). Comparing these two states offers insights into regional differences in the United States and potential opportunities for future homeowners or investors. The goal of this research is to uncover how population growth and housing prices in Tennessee and Ohio compare over time and to understand what factors drive these trends. Can predictive models forecast future housing price trends in each state to guide young professionals in making informed decisions about affordability and opportunities?

This report will use historical housing price data along with population density statistics to identify trends and differences between Tennessee and Ohio. It will also apply predictive modeling techniques, such as time series analysis and machine learning, to forecast housing prices. The expected outcomes for this project are a comparative analysis of housing price dynamics in Tennessee and Ohio; insights into the drivers of housing prices and their implications for young professionals and eager investors; and predictive models to guide decisions about housing affordability, investment profitability, and long-term opportunities.

Ethical Considerations

When conducting research on county-level housing price data and state population data, it is imperative to understand the ethical considerations involved. To begin, data privacy and sensitivity are major ethical considerations in this research. While population data is often anonymized, aggregating demographic statistics can create risks such as exposing sensitive trends about vulnerable communities or encouraging discriminatory practices. The purpose of this research is to understand and predict housing trends across Tennessee and Ohio as a whole, focusing on the factors that have driven housing price increases in both states. Rather than examining trends within specific communities, this approach studies the larger picture of state- and county-level data, which reduces the risk of misinterpretation or misuse. Population growth can often highlight disparities in access to housing, infrastructure, and resources, and results from this research could positively inform urban planning or development policies. However, careful framing and proper context are needed to avoid marginalizing underrepresented populations. Population data may also exclude transient or undocumented populations, which can lead to incomplete conclusions about growth or housing demand. Authors Antony K. Cooper and Serena Coetzee also caution that “any data set is invariably a biased representation of the population” (Cooper & Coetzee, 2020). To mitigate this, I will focus on broader trends and compare larger-picture results from my prediction models between Ohio and Tennessee.

Additionally, because these findings are tailored for young professionals and eager investors, results that highlight rising housing prices might inadvertently attract speculative investment and contribute to gentrification. Increased gentrification is not the intended result of this research; rather, the goal is to inform readers about present trends and what may be possible in future months and years. This research will also provide insights into regional housing affordability, shedding light on how young professionals and low-income groups might be disproportionately affected by rising housing costs. Future research could suggest actionable strategies for addressing housing inequality and focus on affordable but sustainable growth in the busier counties of Tennessee and Ohio.

Another important consideration is that the All-Transactions House Price Index datasets, sourced from the Federal Reserve Bank of St. Louis, are not adjusted for inflation. Using these nominal values without adjustment could overstate real price growth and misrepresent affordability trends over time, which I will take into account in this research.
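One way to apply this adjustment is to deflate the nominal index by a consumer price index. The sketch below is a minimal illustration, assuming a CPI series aligned period-by-period with the HPI (for example, FRED's CPIAUCSL series); that CPI series, the function name `deflate_series()`, and the sample numbers are hypothetical additions, not part of the datasets described in this report.

```r
# Convert a nominal series into real terms, expressed in the price level
# of the period at position `base_index`.
# Assumption: `cpi` is a consumer price index aligned with `nominal`.
deflate_series <- function(nominal, cpi, base_index = 1) {
  stopifnot(length(nominal) == length(cpi), all(cpi > 0))
  nominal * cpi[base_index] / cpi
}

# Hypothetical example: the nominal HPI rises 50% while the CPI rises 25%
nominal_hpi <- c(100, 150)
cpi         <- c(200, 250)
real_hpi <- deflate_series(nominal_hpi, cpi)
print(real_hpi)  # 100 120
```

In the example, nominal prices rise 50% while the general price level rises 25%, so the real increase is 20% (150 × 200 / 250 = 120).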

I will note these limitations so that relevant stakeholders understand the context and nuances of the data and of any visualizations I create. In alignment with ethical standards outlined for real estate professionals, this research also aims to ensure that all visualizations and predictive models represent trends accurately, without exaggeration or bias (Treleaven et al., 2021).

Stakeholders

Key stakeholders include, but are not limited to:

Young Professionals and Recent College Graduates
- Concerned about affordable housing options and proximity to urban job markets in larger cities within Tennessee and Ohio
- Likely interested in actionable insights to make informed decisions on housing investments or a potential relocation

Local and State Governments
- May use these findings for urban planning, zoning decisions, and housing policies

Real Estate Investors and Developers
- Research provides insights into housing market trends
- Can inform investment strategies in both Tennessee and Ohio
- Ethical concerns include avoiding speculative development that could exacerbate affordability crises

Academics and Future Researchers
- May build on this research's predictive methodology and findings to broaden studies of population and housing dynamics

Overall, I will keep several key ethical principles in mind when conducting this research. The first is transparency: I will clearly outline the methodologies, assumptions, and limitations of my models within this document. The second is equity: I will strive to ensure that the findings do not disproportionately harm or benefit any one demographic group, and instead give any curious researcher more information on past trends and how they may extend into the future. The third is that I will prioritize recommendations that align with long-term societal benefits. Finally, I have ensured that all data I use and model is properly sourced and cited within this research.

Comprehensive Analysis and Data Exploration

Data and Design

Due to the complexity of this project and its integration of multiple metrics, I used five key datasets as the foundation of this analysis. These datasets were selected to provide both the historical depth and the variables necessary for a comparative analysis of housing price trends and population dynamics in Tennessee and Ohio. Because no single dataset covered every aspect of this research, combining several publicly available datasets allows for predictive modeling across a variety of variables, along with interactive data visualizations and insights into affordability, growth potential, and investment strategies for young professionals.

The first dataset is the All-Transactions House Price Index for Tennessee, published by the Federal Reserve Bank of St. Louis in its FRED database. This quarterly time series covers Tennessee housing prices from 1975 to 2024 and is normalized to a baseline period (the first quarter of 1980 = 100), enabling analysis of price changes over nearly five decades. It offers a comprehensive view of Tennessee's historical housing price trends, which is critical for identifying prolonged growth and market dynamics in the state.

The second dataset, also from FRED, is the All-Transactions House Price Index for Ohio. Pairing it with the Tennessee HPI allows direct comparisons of the two states' housing markets over time, highlighting similarities and differences between the regions, and supports the time series predictions and visualizations central to this report.

The third dataset, County Median Home Prices and Monthly Mortgage Payments from the National Association of Realtors, is essential for the interactive visualizations, which demonstrate the substantial variation in home prices across counties in each state. It includes county-level median home prices and estimated monthly mortgage payments, offering insights into housing affordability. While the raw dataset covers all U.S. counties, I focus on counties in Tennessee and Ohio, cleaning and organizing the data for this report. This dataset is crucial for visualizing regional housing affordability and analyzing potential economic impacts on young professionals entering the housing market and on investors seeking sustained gains.
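Estimated monthly payments like those in this dataset are typically derived from the standard fixed-rate amortization formula, M = P · r(1 + r)^n / ((1 + r)^n − 1), where P is the loan principal, r the monthly interest rate, and n the number of payments. The sketch below implements that formula; the down payment share, interest rate, and term are hypothetical inputs, and the National Association of Realtors' own estimates may use different assumptions (for example, including taxes and insurance).

```r
# Principal-and-interest payment on a fixed-rate mortgage.
# Assumptions (hypothetical, not from the NAR data): down payment share,
# annual interest rate, and loan term are chosen by the analyst.
monthly_payment <- function(price, annual_rate, years = 30, down_share = 0.2) {
  principal <- price * (1 - down_share)
  r <- annual_rate / 12          # monthly interest rate
  n <- years * 12                # number of monthly payments
  if (r == 0) return(principal / n)
  principal * r * (1 + r)^n / ((1 + r)^n - 1)
}

# Example: a $125,000 home, 20% down ($100,000 loan), 6% over 30 years
round(monthly_payment(125000, 0.06), 2)  # 599.55
```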

The fourth dataset, the Tennessee Population Dataset (1900-2023), is published by Macrotrends using data from the U.S. Census Bureau. It contains over a century of annual population estimates for Tennessee, allowing for time series analysis of population growth and its correlation with housing price trends. While it does not differentiate between counties, the state-level data offers sufficient granularity for this project’s focus on macroeconomic and demographic drivers.
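Because population and housing prices both tend to trend upward over time, correlating their raw levels can suggest a strong relationship even when none exists. A common safeguard is to correlate year-over-year growth rates instead. The sketch below assumes two annual series of equal length covering the same years; the function names and sample values are hypothetical.

```r
# Year-over-year percentage growth of an annual series
yoy_growth <- function(x) 100 * (x[-1] / x[-length(x)] - 1)

# Correlate growth rates rather than levels, since two upward-trending
# series can look highly correlated in levels even if they are unrelated.
growth_correlation <- function(population, hpi, method = "pearson") {
  stopifnot(length(population) == length(hpi))
  cor(yoy_growth(population), yoy_growth(hpi), method = method)
}

# Hypothetical values for illustration only (population in millions)
pop <- c(6.00, 6.06, 6.15, 6.20, 6.31)
hpi <- c(300, 312, 330, 338, 360)
growth_correlation(pop, hpi)
```

Setting `method = "spearman"` gives a rank-based alternative that is less sensitive to a single unusual year.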

The fifth and final dataset, like the Tennessee dataset, provides annual population estimates for Ohio from 1900 to 2023. It enables a comparative analysis of population trends between the two states, supporting insights into how demographic shifts influence housing markets. This dataset is also from Macrotrends, sourced from the U.S. Census Bureau.

While county-level historical population data would have provided further specificity, the lack of sufficiently detailed datasets made statewide population data the strongest available option for this project. Pairing state-level population trends with county-level housing prices balances historical depth with regional specificity. Statewide population trends remain effective for modeling broad economic and housing market shifts, while county-level housing data supports the detailed visualizations and exploratory sections of this report. Together, these datasets address the project’s core objectives: understanding historical and predicted housing trends, exploring the drivers of these trends, and assessing affordability and growth potential in Tennessee and Ohio. Pursuing these objectives through statistical analyses, interactive data visualizations, and predictive modeling provides a comprehensive analysis grounded in reliable public data.
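Combining these sources requires aligning their frequencies, since the FRED HPI is quarterly while the population estimates are annual. A minimal sketch, assuming hypothetical column names (`date` and `hpi` for the quarterly index, `year` and `population` for the annual estimates), averages each year's quarters and then inner-joins on year, which also drops years, such as 2024, that appear in only one source.

```r
# Collapse a quarterly HPI data frame (columns `date`, `hpi`) to annual means.
annual_hpi <- function(hpi_df) {
  hpi_df$year <- as.integer(format(as.Date(hpi_df$date), "%Y"))
  aggregate(hpi ~ year, data = hpi_df, FUN = mean)
}

# Inner join on year: only years present in both sources are kept.
combine_state_data <- function(hpi_df, pop_df) {
  merge(annual_hpi(hpi_df), pop_df, by = "year")
}

# Hypothetical sample data
hpi_q <- data.frame(
  date = c("2021-01-01", "2021-04-01", "2021-07-01", "2021-10-01", "2022-01-01"),
  hpi  = c(100, 102, 104, 106, 110)
)
pop_a <- data.frame(year = c(2021L, 2022L, 2023L), population = c(6.9, 7.0, 7.1))
combined <- combine_state_data(hpi_q, pop_a)
print(combined)
```

Averaging quarters is one reasonable choice; using each year's fourth-quarter value is another. Partial years (like 2022 in the sample, with a single quarter) are averaged over fewer quarters, so they may warrant exclusion before modeling.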

Visualization Preparation

To begin the analysis, I will create county-level choropleth maps of median home prices for Tennessee and Ohio: a two-dimensional map for each state, followed by an interactive three-dimensional version built with the rayshader package in RStudio.

Adding height to the maps makes it easier to identify disparities in housing prices across counties. For instance, counties with much higher prices stand out visually, drawing attention to regions of economic interest. Building matching three-dimensional visualizations for Tennessee and Ohio also allows a direct comparison of county-level price distributions, producing insights into regional differences. Visualizing prices at the county level can uncover patterns that a statewide analysis might miss, which is key to understanding affordability for young professionals who may prioritize specific regions within each state. The visualizations use a green gradient color palette and a simple, coherent design to keep the diagrams comparable. While the initial choropleth maps provide an overview of county-level trends, the three-dimensional visualizations serve as a bridge to predictive modeling: they emphasize counties that are noticeable outliers within each state, which could warrant further investigation in future forecasting models.
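One simple, reproducible way to decide which counties count as noticeable outliers is Tukey's rule, which flags values more than 1.5 interquartile ranges below the first quartile or above the third. The sketch below is an illustrative assumption rather than the method used in this report; the price vector is hypothetical.

```r
# Flag values outside Tukey's fences: [Q1 - k * IQR, Q3 + k * IQR].
# NA prices produce NA flags.
flag_price_outliers <- function(prices, k = 1.5) {
  q <- quantile(prices, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  prices < q[1] - k * iqr | prices > q[2] + k * iqr
}

# Hypothetical county median prices (not from the NAR data)
county_prices <- c(a = 180000, b = 195000, c = 210000,
                   d = 225000, e = 240000, f = 850000)
names(which(flag_price_outliers(county_prices)))  # "f"
```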

Before gathering data for this visualization, I was interested in median home prices across Tennessee counties. I found a dataset from the National Association of Realtors and, after cleaning it, mapped each county’s location to its median home price. After creating a 2D choropleth of Tennessee, I applied a green gradient color palette to visualize changes around the state. This simple green theme, along with plain black county borders, became the design basis for the remaining visualizations in this project. A choropleth fit this assignment because it provided a highly customizable base graph that I could build on. Next, to expand my research, I created choropleths comparing Tennessee prices with those of other states, again by county and using median county home price as the main metric. However, I knew a more complex visualization could present the data better, so I used the rayshader package to convert the static, flat choropleths into three-dimensional visualizations that show how sharply prices change across counties. The height of each county adds a further design element, and all of the graphs share identical labeling for consistency.

Choropleth Visualizations

Data Cleaning

# CSV Cleaning

library(tidyverse)
file_path <- "/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/USMedianHousingPrices.csv"

housing_data <- read_csv(file_path)

housing_data <- housing_data %>%
  mutate(
    `MedianHomePrice` = str_replace_all(`MedianHomePrice`, "[$,]", "") %>% 
      as.numeric(), # Remove dollar signs and commas, then convert to numeric
    state = str_extract(County, "(?<=,)[^,]+$") %>% # Extract everything after the last comma
      str_trim(),                                   # Trim spaces; keeps multi-word states such as "New York"
    County = str_extract(County, "^[^,]+") %>%      # Extract the county name before the comma
      str_trim() %>%                                # Trim leading and trailing spaces
      tolower()                                     # Convert to lowercase
  ) %>%
  distinct(County, state, .keep_all = TRUE)          # Remove duplicates

write_csv(housing_data, "/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/Cleaned_USMedianHousingPrices.csv")
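To check what the cleaning step produces, the same parsing logic can be sketched in base R and run on a few sample strings. This assumes raw entries in the form "County Name, State" and prices formatted like "$1,234,500"; the sample strings are illustrative, and the multi-word state is included to check that state extraction does not drop or truncate names such as "New York".

```r
# Split "County Name, State" into a lowercase county and a state name.
parse_county_field <- function(x) {
  data.frame(
    County = tolower(trimws(sub(",.*$", "", x))),  # text before the first comma
    state  = trimws(sub("^.*,", "", x)),           # text after the last comma
    stringsAsFactors = FALSE
  )
}

# Strip dollar signs and commas, then convert to numeric.
parse_price <- function(x) as.numeric(gsub("[$,]", "", x))

raw <- c("Davidson County, Tennessee", "Franklin County, Ohio", "Kings County, New York")
parsed <- parse_county_field(raw)
parsed$state               # "Tennessee" "Ohio" "New York"
parse_price("$1,234,500")  # 1234500
```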

Tennessee - 2D Choropleth Visualization

Visualization Code

library(tidyverse)
library(sf)
library(scales)

# This code is based on my visualization in DA301, adjusted to provide a cleaner output for this visualization

USMedianHousingPrices <- normalizePath("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/Cleaned_USMedianHousingPrices.csv")
shapefile <- normalizePath("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/tl_2024_us_county.shp")

# Step 1: Load and prepare the housing data
home_sales <- read_csv(USMedianHousingPrices) %>%
  mutate(
    county = str_trim(tolower(County)),      # Standardize county names
    state = str_trim(tolower(state))        # Standardize state names
  ) %>%
  distinct(county, state, .keep_all = TRUE) # Remove duplicates

home_sales <- home_sales %>%
  mutate(county = str_replace(county, " county$", ""))

# Step 2: Load and prepare the Tennessee shapefile
tennessee_geo <- st_read(shapefile, quiet = TRUE) %>%
  filter(STATEFP == "47") %>%
  mutate(
    NAME = str_trim(tolower(NAME)),
    state = "tennessee"
  ) %>%
  distinct(NAME, state, .keep_all = TRUE)

# Step 3: Merge shapefile with housing data, matching by both `county` and `state`
tennessee_geo <- tennessee_geo %>%
  left_join(home_sales, by = c("NAME" = "county", "state" = "state")) %>%
  mutate(price = ifelse(is.na(`MedianHomePrice`), 
                        median(`MedianHomePrice`, na.rm = TRUE), 
                        `MedianHomePrice`)) # Handle missing prices

# Step 4: Create the 2D ggplot choropleth map
ggTNPrices <- ggplot(data = tennessee_geo) +
  geom_sf(aes(fill = price), color = "black", linewidth = 0.3) + # Add subtle borders (linewidth replaces size for lines in ggplot2 >= 3.4.0)
  scale_fill_gradient(
    name = NULL, 
    low = "white", 
    high = "darkgreen", 
    na.value = "grey",
    limits = c(0, 1000000),                   # Set limits for the color scale
    breaks = c(0, 250000, 500000, 750000, 1000000), # Define specific breaks
    labels = scales::label_dollar(scale = 1) # Format labels with dollar signs and commas
  ) +
  labs(
    title = "County Median Home Prices Q1 2024 - Tennessee",
    subtitle = "Data Source: National Association of Realtors"
  ) +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    legend.position = "right",
    axis.text = element_blank(),         # Remove axis text
    axis.ticks = element_blank(),        # Remove axis ticks
    axis.title = element_blank(),        # Remove axis titles
    panel.grid.major = element_blank(),  # Remove major gridlines
    panel.grid.minor = element_blank(),  # Remove minor gridlines
    plot.margin = margin(t = 30, r = 30, b = 30, l = 30)  # Add space around the plot
  ) +
  coord_sf(expand = TRUE, clip = "off") # Ensure the map fits nicely within the plot

# Step 5: Render the 2D map
print(ggTNPrices)
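Because Step 3 fills any county without a matching housing record with the state median, it is worth checking how many counties were actually matched. The sketch below, a hypothetical helper using the same name normalization as the code above, lists shapefile county names with no counterpart in the housing data; the sample names are made up to show how a spelling difference such as "DeKalb" versus "De Kalb" would silently fall through to imputation.

```r
# Return normalized shapefile county names that have no match in the housing data.
unmatched_counties <- function(shape_names, housing_names) {
  normalize <- function(x) sub(" county$", "", trimws(tolower(x)))
  sort(setdiff(normalize(shape_names), normalize(housing_names)))
}

# Hypothetical names for illustration
shape_names   <- c("Davidson", "Williamson", "DeKalb", "Van Buren")
housing_names <- c("Davidson County", "Williamson County", "De Kalb County")
unmatched_counties(shape_names, housing_names)  # "dekalb" "van buren"
```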

The 2D Tennessee choropleth map shows how median home prices varied across Tennessee counties in the first quarter of 2024. The color gradient, running from white (lower prices) to dark green (higher prices), reveals disparities and trends in housing affordability across the state.

This visualization yields several key observations. The first is the dominance of the Nashville metropolitan area: the dark green shading in Davidson County (home to Nashville) and neighboring Williamson County clearly indicates that the highest median home prices are concentrated around greater Nashville (County Median Home Prices and Monthly Mortgage Payment, 2024). There are also regional disparities, as rural and less urbanized areas of Tennessee exhibit much lighter shading and significantly lower median home prices. Conversely, the darker, more expensive counties cluster around the state's urban centers, namely Chattanooga, Memphis, and Knoxville, in addition to Nashville.

The map captures subtle gradations in median home prices, making it easier to identify transitional areas where affordability declines with proximity to urban centers. Policymakers and urban planners could use this visualization to pinpoint areas of high housing cost stress and identify candidate regions for affordable housing developments. Investors, in turn, may be deterred by the high prices in Davidson and Williamson counties or drawn to their potential for further growth.

Tennessee - 3D Choropleth Visualization

* Click and Drag to rotate figure

Visualization Code

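# Tennessee 3D Choropleth

# This code takes the previous 2D visualization and uses the rayshader package to transform it into a 3D visualization.
# It reuses the USMedianHousingPrices and shapefile paths defined in the 2D code above.

library(tidyverse)
library(sf)
library(scales)
library(rayshader)
library(rgl)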

# Step 1: Load and prepare the housing data
home_sales <- read_csv(USMedianHousingPrices) %>%
  mutate(
    county = str_trim(tolower(County)),      # Standardize county names
    state = str_trim(tolower(state))        # Standardize state names
  ) %>%
  distinct(county, state, .keep_all = TRUE) # Remove duplicates

home_sales <- home_sales %>%
  mutate(county = str_replace(county, " county$", ""))

# Step 2: Load and prepare the Tennessee shapefile
tennessee_geo <- st_read(shapefile, quiet = TRUE) %>% # Suppress reading output
  filter(STATEFP == "47") %>%               # Tennessee FIPS code
  mutate(
    NAME = str_trim(tolower(NAME)),         # Standardize county names
    state = "tennessee"                     # Add state column
  ) %>%
  distinct(NAME, state, .keep_all = TRUE)   # Remove duplicates

# Step 3: Merge shapefile with housing data, matching by both `county` and `state`
tennessee_geo <- tennessee_geo %>%
  left_join(home_sales, by = c("NAME" = "county", "state" = "state")) %>%
  mutate(price = ifelse(is.na(`MedianHomePrice`), 
                        median(`MedianHomePrice`, na.rm = TRUE), 
                        `MedianHomePrice`)) # Handle missing prices

# Step 4: Create a 2D ggplot choropleth map
ggTNPrices <- ggplot(data = tennessee_geo) +
  geom_sf(aes(fill = price), color = NA) +
  scale_fill_gradient(
    name = NULL, 
    low = "white", 
    high = "darkgreen", 
    na.value = "grey",
    limits = c(0, 1000000),                   # Set limits for the color scale
    breaks = c(0, 250000, 500000, 750000, 1000000), # Define specific breaks
    labels = scales::label_dollar(scale = 1) # Format labels with dollar signs and commas
  ) +
  labs(
    title = "County Median Home Prices Q1 2024 - Tennessee",
    subtitle = "Data Source: National Association of Realtors"
  ) +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.subtitle = element_text(hjust = 0.5, size = 8, margin = margin(b = 25)),
    legend.text = element_text(size = 7),                         # Adjust legend text size
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    plot.margin = margin(t = 30, r = 20, b = 10, l = 10)
  )

# Step 5: Render the 3D map with rayshader
plot_gg(ggTNPrices,
        multicore = TRUE,
        width = 5,
        height = 5,
        scale = 200,          # Adjust scale for height exaggeration
        windowsize = c(1400, 866),
        zoom = 0.6,
        phi = 25)

rgl::rglwidget()

The 3D visualization builds on the 2D choropleth by adding a height dimension, which further emphasizes disparities between counties and highlights the drastic outlier of Williamson County, towering above the others. This draws viewers' attention to the spike in the Nashville metropolitan area.

The height of the 3D spikes adds an intuitive sense of magnitude, making it easier to grasp how much higher home prices are in the Nashville area than in other counties. Williamson County, known for luxury housing and high median incomes, stands out as the tallest peak. The visualization also gives a clearer geospatial picture of expensive housing around Nashville, contrasted with the flatter landscape of rural counties and, as the Ohio maps will show, the more even distribution of prices across Ohio counties.

Ohio - 2D Choropleth Visualization

Visualization Code

# Ohio 2D Choropleth

# This graph takes the framework for the Tennessee 2D choropleth, and adapts it to the Ohio dataset for Median home prices

library(tidyverse)
library(sf)
library(scales)

USMedianHousingPrices <- normalizePath("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/Cleaned_USMedianHousingPrices.csv")
shapefile <- normalizePath("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/tl_2024_us_county.shp")

home_sales <- read_csv(USMedianHousingPrices) %>%
  mutate(
    county = str_trim(tolower(County)),        # Standardize county names
    state = str_trim(tolower(state))          # Standardize state names
  ) %>%
  distinct(county, state, .keep_all = TRUE)    # Remove duplicates

home_sales <- home_sales %>%
  mutate(county = str_replace(county, " county$", ""))

ohio_geo <- st_read(shapefile, quiet = TRUE) %>%  # Suppress metadata output
  filter(STATEFP == "39") %>%                   # Ohio FIPS code
  mutate(
    NAME = str_trim(tolower(NAME)),             # Standardize county names
    state = "ohio"                              # Add state column
  ) %>%
  distinct(NAME, state, .keep_all = TRUE)       # Remove duplicates

ohio_geo <- ohio_geo %>%
  left_join(home_sales, by = c("NAME" = "county", "state" = "state")) %>%
  mutate(price = ifelse(is.na(`MedianHomePrice`), 
                        median(`MedianHomePrice`, na.rm = TRUE), 
                        `MedianHomePrice`)) # Handle missing prices

ggOHPrices <- ggplot(data = ohio_geo) +
  geom_sf(aes(fill = price), color = "black", linewidth = 0.3) + # Add black borders (linewidth replaces size for lines in ggplot2 >= 3.4.0)
  scale_fill_gradient(
    name = NULL, 
    low = "white", 
    high = "darkgreen", 
    na.value = "grey",
    limits = c(0, 600000),                   # Set limits for the color scale
    breaks = c(0, 200000, 400000, 600000),  # Define specific breaks
    labels = scales::label_dollar(scale = 1) # Format labels with dollar signs and commas
  ) +
  labs(
    title = "County Median Home Prices Q1 2024 - Ohio",
    subtitle = "Data Source: National Association of Realtors"
  ) +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    legend.position = "right",
    axis.text = element_blank(),         # Remove axis text
    axis.ticks = element_blank(),        # Remove axis ticks
    axis.title = element_blank(),        # Remove axis titles
    panel.grid.major = element_blank(),  # Remove major gridlines
    panel.grid.minor = element_blank(),  # Remove minor gridlines
    plot.margin = margin(t = 30, r = 20, b = 10, l = 10)  # Adjust margins
  ) +
  coord_sf(expand = TRUE, clip = "off")

print(ggOHPrices)

This 2D choropleth shows median home prices across Ohio counties in the first quarter of 2024. Its color gradient also runs from white to dark green to match the Tennessee choropleth. The main difference is that the Ohio scale tops out at a median home price of $600,000, while the Tennessee scale extends to $1,000,000. Because the two legends use different upper limits, the same shade of green represents a lower price on the Ohio map than on the Tennessee map, so colors should be compared within each map rather than across them.
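If a direct visual comparison between the two maps is preferred, both could share one set of color-scale limits. The hypothetical helper below computes a common upper limit from both states' prices, rounded up to a convenient break; the sample prices are made up.

```r
# Shared fill-scale limits for several price vectors, with the upper bound
# rounded up to the next multiple of `step`.
shared_fill_limits <- function(..., step = 250000) {
  top <- max(unlist(list(...)), na.rm = TRUE)
  c(0, ceiling(top / step) * step)
}

# Hypothetical county medians for each state
tn_prices <- c(210000, 480000, 905000)
oh_prices <- c(150000, 260000, 410000)
shared_fill_limits(tn_prices, oh_prices)  # returns c(0, 1e6)
```

Passing the result to `limits` in both `scale_fill_gradient()` calls would make identical shades mean identical prices on both maps, at the cost of compressing Ohio's variation into the lighter part of the gradient, which is the trade-off that separate limits avoid.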

The dark green shading in Franklin County, which houses Columbus, indicates the highest median home prices in the state. This aligns with Columbus being the capital and largest city in Ohio, with a growing economy, increasing job opportunities, and an expanding urban population which drives housing demand and prices up. While the differences are not as drastic as with the Tennessee choropleth, there are still regional disparities in the Ohio choropleth. Rural counties in eastern and southeastern Ohio show lighter shading, indicating lower housing prices. Counties near Cincinnati (Hamilton County) and Cleveland (Cuyahoga County) exhibit intermediate shading, reflecting moderate-to-high housing prices (County Median Home Prices and Monthly Mortgage Payment, 2024). This is likely influenced by urbanization and economic activity near large city centers.

This visualization can guide stakeholders such as policymakers, urban planners and investors in identifying high-cost areas to focus efforts on affordable housing or economic investments.

Ohio - 3D Choropleth Visualization

* Click and Drag to rotate figure

Visualization Code

# Ohio 3D Choropleth

# This code also takes the 2D choropleth and transforms it to 3D with the Rayshader Package

library(tidyverse)
library(sf)
library(rayshader)
library(scales)
library(rgl)

USMedianHousingPrices <- normalizePath("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/Cleaned_USMedianHousingPrices.csv")
shapefile <- normalizePath("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/tl_2024_us_county.shp")

home_sales <- read_csv(USMedianHousingPrices) %>%
  mutate(
    county = str_trim(tolower(County)),        # Standardize county names
    state = str_trim(tolower(state))          # Standardize state names
  ) %>%
  distinct(county, state, .keep_all = TRUE)    # Remove duplicates

home_sales <- home_sales %>%
  mutate(county = str_replace(county, " county$", ""))

ohio_geo <- st_read(shapefile, quiet = TRUE) %>%  # Suppress shapefile metadata output
  filter(STATEFP == "39") %>%                   # Ohio FIPS code
  mutate(
    NAME = str_trim(tolower(NAME)),             # Standardize county names
    state = "ohio"                              # Add state column
  ) %>%
  distinct(NAME, state, .keep_all = TRUE)       # Remove duplicates

ohio_geo <- ohio_geo %>%
  left_join(home_sales, by = c("NAME" = "county", "state" = "state")) %>%
  mutate(price = ifelse(is.na(`MedianHomePrice`), 
                        median(`MedianHomePrice`, na.rm = TRUE), 
                        `MedianHomePrice`)) # Handle missing prices

ggOHPrices <- ggplot(data = ohio_geo) +
  geom_sf(aes(fill = price), color = NA) +
  scale_fill_gradient(
    name = NULL, 
    low = "white", 
    high = "darkgreen", 
    na.value = "grey",
    limits = c(0, 600000),                   # Set limits for the color scale
    breaks = c(0, 200000, 400000, 600000), # Define specific breaks
    labels = scales::label_dollar(scale = 1) # Format labels with dollar signs and commas
  ) +
  labs(
    title = "County Median Home Prices Q1 2024 - Ohio",
    subtitle = "Data Source: National Association of Realtors"
  ) +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.subtitle = element_text(hjust = 0.5, size = 8, margin = margin(b = 25)),
    legend.text = element_text(size = 7),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    plot.margin = margin(t = 30, r = 20, b = 10, l = 10)
  )

# Rayshader package utilized for the 3D Visualization
plot_gg(ggOHPrices,
        multicore = TRUE,
        width = 5,
        height = 5,
        scale = 200, # Adjust scale for height exaggeration
        windowsize = c(1400, 866),
        zoom = 0.6,
        phi = 25)

rgl::rglwidget()

The 3D rendition of this choropleth further emphasizes the disparities in median home prices across Ohio counties by adding a height dimension. As in the Tennessee visualizations, counties with the highest home prices, such as Franklin, appear as prominent spikes rather than only darker shades of green, offering a more intuitive sense of price differences across the state.

The 3D perspective showcases the clustering of expensive housing around Columbus, Cincinnati, and Cleveland, which reinforces the influence of urbanization and economic activity on housing prices. Also, the 3D spikes illustrate the affordability challenges in Ohio’s urban centers and emphasize a possible need for interventions in high cost areas like Franklin County. Conversely, these more expensive areas offer potential for investments if the trajectory of Ohio’s housing markets remains constant.

Housing Price Index

While the 2D and 3D choropleth maps effectively visualize county-level variations in median home prices, they only provide a snapshot of the housing market at a single point in time, specifically the first quarter of 2024. To complement this static analysis and understand the broader housing trends over time, the Housing Price Index (HPI) can be analyzed for Tennessee and Ohio.

The HPI is a time-series metric that tracks changes in residential property prices relative to a baseline period. Here the baseline period is 1980, and the HPI is set to 100 in that period. Median home price data reflect absolute price levels. The HPI instead measures relative price appreciation or depreciation, which makes it a valuable tool for analyzing housing market dynamics and growth trends (Cornell, 1981). For example, an HPI value of 200 indicates a 100% increase in home prices since the baseline period. This allows price trends in different states, here Tennessee and Ohio, to be compared on a standardized basis.
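The arithmetic for reading an HPI value can be expressed as two small helpers. The block below is a minimal base-R sketch. `hpi_pct_change()` and `rebase_hpi()` are hypothetical names introduced here for illustration, and the rebasing input values are made up rather than taken from FRED.

```r
# Percent change in home prices implied by an HPI value,
# relative to the base period (which is indexed to 100).
hpi_pct_change <- function(hpi, base = 100) {
  (hpi / base - 1) * 100
}

# Re-express an HPI series so that the observation at `ref_index` equals 100.
# This puts two states on a common starting point (for example, 2012).
rebase_hpi <- function(hpi, ref_index) {
  hpi / hpi[ref_index] * 100
}

hpi_pct_change(200)     # 100: prices have doubled since the base period
hpi_pct_change(675.37)  # Tennessee's latest HPI in the summary table: roughly a 575% increase
rebase_hpi(c(150, 180, 240), ref_index = 1)  # illustrative values only
```

Rebasing does not change the growth pattern of a series. It only moves the reference point, which is useful when comparing states from a shared post-2008 starting quarter.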

By analyzing the HPI, this research can provide valuable insights into how housing prices in Tennessee and Ohio have evolved over several decades. It can also provide a basis for predictive modeling for forecasting future price fluctuations. This perspective is essential for addressing the core research question of how housing markets in these two states compare over time, and what factors influence these trends.

HPI Visualizations

Housing Price Index Trends (1975-2024)

Visualization Code

# Housing Price Index Visualizations

# I created a basic HPI line chart and adjusted certain parameters, as well as debugged with ChatGPT-4o

library(tidyverse)

TN_HPI <- "/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/TNHousingPriceIndex.csv"
OH_HPI <- "/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/OHHousingPriceIndex.csv"

tn_hpi <- read_csv(TN_HPI) %>%
  rename(Date = `DATE`, HPI = `TNSTHPI`) %>%  # Adjust column names to match the dataset
  mutate(State = "Tennessee")  # Add State column for distinction

oh_hpi <- read_csv(OH_HPI) %>%
  rename(Date = `DATE`, HPI = `OHSTHPI`) %>%  # Adjust column names to match the dataset
  mutate(State = "Ohio")  # Add State column for distinction

hpi_data <- bind_rows(tn_hpi, oh_hpi) %>%
  mutate(Date = as.Date(Date))  # Ensure Date column is in Date format

hpi_plot <- ggplot(data = hpi_data, aes(x = Date, y = HPI, color = State)) +
  geom_line(linewidth = 1.2) +  # Line plot (linewidth replaces the deprecated size aesthetic for lines in ggplot2 >= 3.4.0)
  scale_color_manual(values = c("Tennessee" = "darkblue", "Ohio" = "darkgreen")) +
  labs(
    title = "Housing Price Index Trends (1975–2024)",
    subtitle = "Tennessee vs. Ohio",
    x = NULL,
    y = "Housing Price Index (Base = 100)",
    color = "State"
  ) +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14),
    plot.subtitle = element_text(hjust = 0.5, size = 10),
    legend.position = "right",
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12)
  )

print(hpi_plot)

This chart uses data from the Federal Reserve Bank of St. Louis to compare Tennessee’s and Ohio’s Housing Price Index, with the 1980 base period set to 100. The two states diverged sharply after the dip during the 2008 financial crisis. Tennessee’s HPI grew steeply while Ohio’s rose more steadily, signaling stronger housing demand and faster growth in Tennessee after 2012. Tennessee’s HPI, shown in dark blue, accelerated further between 2018 and 2024. This was likely driven by factors such as population growth, urbanization, and economic expansion. The pattern suggests affordability challenges for residents and strong returns for long-term investors who bought before the sharp increase. Ohio’s moderate and steady growth, shown in dark green, reflects a more stable housing market with fewer affordability pressures. That makes Ohio more attractive to those seeking consistent housing prices, and it offers a lower barrier to entry into the housing market.
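The divergence described above can be quantified by computing the compound annual growth rate (CAGR) of each state’s HPI over the same window, for example from 2012 onward. The block below is a sketch that assumes a quarterly data frame with the `Date` and `HPI` columns created in the code above. `hpi_cagr()` is a hypothetical helper, and the toy series is synthetic, not actual FRED data.

```r
# Compound annual growth rate of an HPI series between two dates.
# Assumes `df` has a Date column (class Date) and a numeric HPI column,
# with exactly one observation on each of the two dates.
hpi_cagr <- function(df, from, to) {
  start <- df$HPI[df$Date == as.Date(from)]
  end   <- df$HPI[df$Date == as.Date(to)]
  stopifnot(length(start) == 1, length(end) == 1)
  years <- as.numeric(as.Date(to) - as.Date(from)) / 365.25
  (end / start)^(1 / years) - 1
}

# Synthetic quarterly series growing 2% per quarter (illustration only)
toy <- data.frame(
  Date = seq(as.Date("2012-01-01"), by = "3 months", length.out = 49),
  HPI  = 300 * 1.02^(0:48)
)
hpi_cagr(toy, "2012-01-01", "2024-01-01")  # equals 1.02^4 - 1, about 8.2% per year
```

Applied to `tn_hpi` and `oh_hpi` over the same dates, this gives one comparable growth rate per state. A single rate is easier to cite than a visual impression of the chart.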

Housing Price Index Trends - ARIMA Modeling for Tennessee

Visualization Code

# ARIMA Model for Tennessee HPI

# This ARIMA model was completed with the assistance of ChatGPT-4o, through which I found the auto.arima() function to mimic the ARIMA visualizations that we created in class and in the slides

library(forecast)
library(ggplot2)
library(lubridate)

tn_hpi <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/TNHousingPriceIndex.csv") %>%
  rename(Date = `DATE`, HPI = `TNSTHPI`) %>%
  mutate(Date = as.Date(Date))  # Convert Date to Date format

tn_hpi_ts <- ts(tn_hpi$HPI, start = c(1975, 1), frequency = 4)  # Quarterly data

arima_model <- auto.arima(tn_hpi_ts)

end_date <- ymd("2030-12-31")
quarters_to_forecast <- as.numeric((year(end_date) - year(max(tn_hpi$Date))) * 4) +
  quarter(end_date) - quarter(max(tn_hpi$Date))

forecast_hpi <- forecast(arima_model, h = quarters_to_forecast)  # Extend forecast to 2030

autoplot(forecast_hpi) +
  labs(
    title = "ARIMA Forecast of Tennessee HPI (2024–2030)",
    x = NULL,
    y = "Housing Price Index (Base = 100)",
    caption = "Source: Federal Reserve Bank of St. Louis"
  ) +
  scale_y_continuous(labels = scales::label_comma()) +  # Add commas to y-axis labels
  theme_linedraw() +                                    # Apply theme_linedraw
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, margin = margin(b = 10)),
    plot.subtitle = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.caption = element_text(hjust = 0, size = 9, margin = margin(t = 10)),
    legend.position = "none",                          # Remove legend
    axis.text = element_text(size = 10),               # Adjust axis text size
    axis.title = element_text(size = 12)               # Adjust axis title size
  )

Visualization Code

# ARIMA Model for Tennessee HPI (Recency)

# I then took the previous graph and adjusted the time frame to only see a zoomed in image

library(forecast)
library(ggplot2)
library(lubridate)

tn_hpi <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/TNHousingPriceIndex.csv") %>%
  rename(Date = `DATE`, HPI = `TNSTHPI`) %>%
  mutate(Date = as.Date(Date))  # Convert Date to Date format

tn_hpi_ts <- ts(tn_hpi$HPI, start = c(1975, 1), frequency = 4)  # Quarterly data

# Fit ARIMA model
arima_model <- auto.arima(tn_hpi_ts)

end_date <- ymd("2030-12-31")
quarters_to_forecast <- as.numeric((year(end_date) - year(max(tn_hpi$Date))) * 4) +
  quarter(end_date) - quarter(max(tn_hpi$Date))
forecast_hpi <- forecast(arima_model, h = quarters_to_forecast)  # Forecast until the end of 2030

forecast_df <- as.data.frame(forecast_hpi)
forecast_df$Date <- seq(max(tn_hpi$Date) + months(3), by = "3 months", length.out = nrow(forecast_df))

# Prepare data for zoomed-in view (last 20 years)
zoom_start <- as.numeric(time(tn_hpi_ts)[length(tn_hpi_ts)]) - 20  # Last 20 years
zoom_data <- window(tn_hpi_ts, start = zoom_start)
zoom_forecast <- forecast(arima_model, h = quarters_to_forecast)

# Plot zoomed-in view
autoplot(zoom_forecast, series = "Forecast") +
  autolayer(zoom_data, series = "Historical Data", color = "black") +  # Historical line in black
  labs(
    title = "ARIMA Forecast of Tennessee HPI (2024–2030)",
    x = NULL,
    y = "Housing Price Index (Base = 100)",
    caption = "Source: Federal Reserve Bank of St. Louis"
  ) +
  scale_y_continuous(labels = scales::label_comma()) +  # Add commas to y-axis labels
  coord_cartesian(xlim = c(zoom_start, max(time(zoom_forecast$mean)) + 1)) +  # Adjust x-axis limits
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, margin = margin(b = 10)),
    plot.subtitle = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.caption = element_text(hjust = 0, size = 9, margin = margin(t = 10)),
    legend.position = "none",                         # Remove the legend
    axis.text = element_text(size = 10),               # Adjust axis text size
    axis.title = element_text(size = 12)               # Adjust axis title size
  )

These two graphs display the ARIMA model’s forecast for Tennessee’s Housing Price Index (HPI) from 2024 to 2030, with historical trends and confidence intervals to illustrate future uncertainty.

The first graph presents a long-term view. From 1975 onward, Tennessee’s HPI grew steadily, with a significant acceleration beginning around 2015 and a sharp post-pandemic surge after 2020. The second graph zooms in on the last 20 years of data to show recent trends, along with the ARIMA model’s predictions and confidence intervals, in more detail. It highlights Tennessee’s rapid HPI increase since the mid-2000s and its transition into the forecast period between 2024 and 2030.

The solid blue line, the model’s point forecast, predicts continued upward growth in Tennessee’s HPI through 2030, though at a slightly slower pace than in recent years.

Confidence Intervals:

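The 80% confidence interval, shown in dark blue, is the range within which the model expects the future HPI to fall with 80% probability, assuming the model is correctly specified. Strictly speaking, these are prediction intervals for future values, although the charts and table label them confidence intervals.
The 95% confidence interval, shown in lighter blue, covers a wider range of possibilities and widens as the forecast horizon extends further out.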
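The mechanism behind these two bands is as follows. Under the default assumption of normally distributed errors, each bound equals the point forecast plus or minus a normal quantile times the forecast standard error. For an ARIMA model with differencing, that standard error grows with the horizon. The base-R sketch below reproduces this logic on a simulated series with `stats::arima()` and `predict()`. It uses the ARIMA(0,2,1) order that auto.arima() selected for Tennessee in the summary table. It illustrates the mechanism and is not the code behind the charts.

```r
# Simulated quarterly series standing in for an HPI (illustration only)
set.seed(42)
y <- ts(100 + cumsum(cumsum(rnorm(120, mean = 0.2))), start = c(1975, 1), frequency = 4)

fit  <- arima(y, order = c(0, 2, 1))  # twice-differenced series with one MA term
pred <- predict(fit, n.ahead = 26)    # point forecasts ($pred) and standard errors ($se)

z80 <- qnorm(0.90)   # about 1.28: leaves 10% in each tail
z95 <- qnorm(0.975)  # about 1.96: leaves 2.5% in each tail

intervals <- data.frame(
  h     = 1:26,
  point = as.numeric(pred$pred),
  lo80  = as.numeric(pred$pred - z80 * pred$se),
  hi80  = as.numeric(pred$pred + z80 * pred$se),
  lo95  = as.numeric(pred$pred - z95 * pred$se),
  hi95  = as.numeric(pred$pred + z95 * pred$se)
)
intervals$width95 <- intervals$hi95 - intervals$lo95  # grows with the horizon h
head(intervals)
```

The 95% band is always wider than the 80% band because 1.96 exceeds 1.28. Both bands widen with `h` because forecast errors accumulate across the differenced steps.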

Based on these graphs, the HPI is expected to keep rising steadily. The ARIMA model extrapolates only from the HPI’s own history. Factors such as population growth and housing demand are therefore not modeled directly, although they plausibly underlie the trend. Uncertainty also grows over time. Near-term predictions are relatively tight, but as the confidence intervals broaden after 2026, there is room for larger shifts driven by economic changes, inflation, or changes within the housing market. These forecasts, paired with the two levels of confidence intervals, can provide valuable insights for young professionals and college graduates. They can also help investors planning around affordability and market dynamics.

Housing Price Index Trends - ARIMA Modeling for Ohio

Visualization Code

# ARIMA Model for Ohio HPI

# The same process was used as for the Tennessee graph, with auto.arima() selecting the ARIMA model order automatically.

library(forecast)
library(ggplot2)
library(lubridate)

oh_hpi <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/OHHousingPriceIndex.csv") %>%
  rename(Date = `DATE`, HPI = `OHSTHPI`) %>%
  mutate(Date = as.Date(Date))  # Convert Date to Date format

oh_hpi_ts <- ts(oh_hpi$HPI, start = c(1975, 1), frequency = 4)  # Quarterly data

arima_model_oh <- auto.arima(oh_hpi_ts)

end_date <- ymd("2030-12-31")
quarters_to_forecast <- as.numeric((year(end_date) - year(max(oh_hpi$Date))) * 4) +
  quarter(end_date) - quarter(max(oh_hpi$Date))

forecast_hpi_oh <- forecast(arima_model_oh, h = quarters_to_forecast)  # Extend forecast to 2030

autoplot(forecast_hpi_oh) +
  labs(
    title = "ARIMA Forecast of Ohio HPI (2024–2030)",
    x = NULL,
    y = "Housing Price Index (Base = 100)",
    caption = "Source: Federal Reserve Bank of St. Louis"
  ) +
  scale_y_continuous(labels = scales::label_comma()) +  # Add commas to y-axis labels
  theme_linedraw() +                                    # Apply theme_linedraw
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, margin = margin(b = 10)),
    plot.subtitle = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.caption = element_text(hjust = 0, size = 9, margin = margin(t = 10)),
    legend.position = "none",                          # Remove legend
    axis.text = element_text(size = 10),               # Adjust axis text size
    axis.title = element_text(size = 12)               # Adjust axis title size
  )

Visualization Code

library(forecast)
library(ggplot2)
library(lubridate)

oh_hpi <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/OHHousingPriceIndex.csv") %>%
  rename(Date = `DATE`, HPI = `OHSTHPI`) %>%
  mutate(Date = as.Date(Date))  # Convert Date to Date format

oh_hpi_ts <- ts(oh_hpi$HPI, start = c(1975, 1), frequency = 4)  # Quarterly data

# Fit ARIMA model
arima_model_oh <- auto.arima(oh_hpi_ts)

# Forecast future values
end_date <- ymd("2030-12-31")
quarters_to_forecast <- as.numeric((year(end_date) - year(max(oh_hpi$Date))) * 4) +
  quarter(end_date) - quarter(max(oh_hpi$Date))
forecast_hpi_oh <- forecast(arima_model_oh, h = quarters_to_forecast)  # Forecast until the end of 2030

forecast_df_oh <- as.data.frame(forecast_hpi_oh)
forecast_df_oh$Date <- seq(max(oh_hpi$Date) + months(3), by = "3 months", length.out = nrow(forecast_df_oh))

zoom_start_oh <- as.numeric(time(oh_hpi_ts)[length(oh_hpi_ts)]) - 20  # Last 20 years
zoom_data_oh <- window(oh_hpi_ts, start = zoom_start_oh)
zoom_forecast_oh <- forecast(arima_model_oh, h = quarters_to_forecast)

autoplot(zoom_forecast_oh, series = "Forecast") +
  autolayer(zoom_data_oh, series = "Historical Data", color = "black") +  # Historical line in black
  labs(
    title = "ARIMA Forecast of Ohio HPI (2024–2030)",
    x = NULL,
    y = "Housing Price Index (Base = 100)",
    caption = "Source: Federal Reserve Bank of St. Louis"
  ) +
  scale_y_continuous(labels = scales::label_comma()) +  # Add commas to y-axis labels
  coord_cartesian(xlim = c(zoom_start_oh, max(time(zoom_forecast_oh$mean)) + 1)) +  # Adjust x-axis limits
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, margin = margin(b = 10)),
    plot.subtitle = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.caption = element_text(hjust = 0, size = 9, margin = margin(t = 10)),
    legend.position = "none",                         # Remove the legend
    axis.text = element_text(size = 10),               # Adjust axis text size
    axis.title = element_text(size = 12)               # Adjust axis title size
  )

The ARIMA forecasts for Ohio’s Housing Price Index (HPI) from 2024 to 2030 illustrate the anticipated trends in housing prices, with corresponding confidence intervals. Ohio’s HPI grew steadily from the late 1970s to the early 2000s but stagnated after the financial crisis of 2008. However, it rose sharply after 2015 and has sustained an upward trend since. This mirrors the broader economic recovery after the housing market collapse, as demand increased.

The ARIMA model predicts continued growth in Ohio’s HPI through 2030, though at a more moderate pace than the rapid increases of the previous decade.

Confidence Intervals:

The 80% confidence interval, shown in dark blue, is again the range within which the model expects the HPI to fall with 80% probability, assuming the model is correctly specified.
The 95% confidence interval, shown in lighter blue, covers a wider range of possibilities and widens as the forecast horizon extends further out.

Compared to Tennessee’s ARIMA forecast, Ohio’s growth trajectory is slower and less steep. This reflects structural differences between the two states in economic and population growth, as well as in housing demand. Overall, the ARIMA model for Ohio’s HPI predicts a steady upward trend in the housing market and captures uncertainty with two levels of confidence intervals. Housing prices are expected to rise under this model, but the growth rate remains moderate compared to Tennessee’s. This offers college graduates and longer-term investors a more affordable entry point into the housing market.

Summary Table of Tennessee / Ohio ARIMA Forecasts

Summary Table Code

library(gt)
library(dplyr)
library(forecast)
library(lubridate)

# I wanted to create an organized table, so I referred to the gt package to summarize the ARIMA models. This was done through the usage of ChatGPT-4o to organize and clean the data before generating the table. 

# Fit ARIMA models if not already defined
tn_hpi_ts <- ts(tn_hpi$HPI, start = c(1975, 1), frequency = 4)
oh_hpi_ts <- ts(oh_hpi$HPI, start = c(1975, 1), frequency = 4)

arima_model_tn <- auto.arima(tn_hpi_ts)
arima_model_oh <- auto.arima(oh_hpi_ts)

# Extract full ARIMA model specifications
arima_model_tn_spec <- paste0("ARIMA(", arimaorder(arima_model_tn)[1], ",",
                              arimaorder(arima_model_tn)[2], ",",
                              arimaorder(arima_model_tn)[3], ")")
arima_model_oh_spec <- paste0("ARIMA(", arimaorder(arima_model_oh)[1], ",",
                              arimaorder(arima_model_oh)[2], ",",
                              arimaorder(arima_model_oh)[3], ")")

# Forecast future HPI
quarters_to_forecast <- 28  # 28 quarters (7 years): Q3 2024 through Q2 2031, slightly past the end of 2030
forecast_tn <- forecast(arima_model_tn, h = quarters_to_forecast, level = c(80, 95))
forecast_oh <- forecast(arima_model_oh, h = quarters_to_forecast, level = c(80, 95))

# Convert forecast objects to data frames
forecast_tn_df <- as.data.frame(forecast_tn)
forecast_tn_df$Date <- seq(max(tn_hpi$Date) + months(3), by = "3 months", length.out = nrow(forecast_tn_df))

forecast_oh_df <- as.data.frame(forecast_oh)
forecast_oh_df$Date <- seq(max(oh_hpi$Date) + months(3), by = "3 months", length.out = nrow(forecast_oh_df))

forecast_summary <- bind_rows(
  forecast_tn_df %>%
    summarise(
      State = "Tennessee",
      ARIMA_Model = arima_model_tn_spec,
      Latest_Historical = tail(tn_hpi$HPI, 1),  # Most recent observation (max() would return the peak instead)
      Forecast_Start = min(Date),
      Forecast_End = max(Date),
      Mean_HPI = mean(`Point Forecast`, na.rm = TRUE),
      `80%_CI_Lower` = mean(`Lo 80`, na.rm = TRUE),
      `80%_CI_Upper` = mean(`Hi 80`, na.rm = TRUE),
      `95%_CI_Lower` = mean(`Lo 95`, na.rm = TRUE),
      `95%_CI_Upper` = mean(`Hi 95`, na.rm = TRUE)),
  forecast_oh_df %>%
    summarise(
      State = "Ohio",
      ARIMA_Model = arima_model_oh_spec,
      Latest_Historical = tail(oh_hpi$HPI, 1),  # Most recent observation
      Forecast_Start = min(Date),
      Forecast_End = max(Date),
      Mean_HPI = mean(`Point Forecast`, na.rm = TRUE),
      `80%_CI_Lower` = mean(`Lo 80`, na.rm = TRUE),
      `80%_CI_Upper` = mean(`Hi 80`, na.rm = TRUE),
      `95%_CI_Lower` = mean(`Lo 95`, na.rm = TRUE),
      `95%_CI_Upper` = mean(`Hi 95`, na.rm = TRUE)))

forecast_summary %>%
  gt() %>%
  tab_header(
    title = "Summary of Housing Price Index Forecasts",
    subtitle = "Tennessee and Ohio (Forecasting 2024–2030)") %>%
  fmt_date(columns = c(Forecast_Start, Forecast_End), date_style = 3) %>%
  fmt_number(
    columns = starts_with("Mean_HPI"),
    decimals = 2) %>%
  fmt_number(
    columns = starts_with("80%_CI") | starts_with("95%_CI"),
    decimals = 2) %>%
  cols_label(
    Latest_Historical = "Latest Historical HPI",
    Forecast_Start = "Forecast Start Date",
    Forecast_End = "Forecast End Date",
    ARIMA_Model = "ARIMA Model",
    Mean_HPI = "Mean Forecasted HPI",
    `80%_CI_Lower` = "80% CI Lower",
    `80%_CI_Upper` = "80% CI Upper",
    `95%_CI_Lower` = "95% CI Lower",
    `95%_CI_Upper` = "95% CI Upper") %>%
  tab_source_note(source_note = "Source: Federal Reserve Bank of St. Louis") %>%
  tab_options(
    table.font.size = "small",
    column_labels.font.weight = "bold",
    table.border.top.width = px(2),
    table.border.bottom.width = px(2),
    heading.align = "center")

Summary of Housing Price Index Forecasts
Tennessee and Ohio (Forecasting 2024–2030)
State	ARIMA Model	Latest Historical HPI	Forecast Start Date	Forecast End Date	Mean Forecasted HPI	80% CI Lower	80% CI Upper	95% CI Lower	95% CI Upper
Tennessee	ARIMA(0,2,1)	675.37	Mon, Jul 1, 2024	Tue, Apr 1, 2031	830.40	731.91	928.88	679.78	981.01
Ohio	ARIMA(0,2,3)	481.60	Mon, Jul 1, 2024	Tue, Apr 1, 2031	618.59	577.11	660.07	555.16	682.03
Source: Federal Reserve Bank of St. Louis
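Both selected models difference the series twice (d = 2) and include no drift term, so beyond the first few steps their point forecasts extend a straight line. For ARIMA(0,2,1), written as (1 - B)^2 y_t = (1 + theta B) e_t, every point forecast from one step ahead onward lies on a single line. This matters for reading the table. When the forecast path is linear, the mean forecasted HPI equals the forecast at the midpoint of the horizon, not the value at its end. The base-R sketch below checks both properties on a simulated series. It illustrates the model form and is not a re-fit of the actual HPI data.

```r
# Simulated twice-integrated series standing in for an HPI (illustration only)
set.seed(7)
y <- ts(200 + cumsum(cumsum(rnorm(150, mean = 0.3))), start = c(1975, 1), frequency = 4)

fit <- arima(y, order = c(0, 2, 1))                 # same order as the Tennessee model
fc  <- as.numeric(predict(fit, n.ahead = 28)$pred)  # 28 quarters, as in the summary table

# Second differences of a straight line are zero (up to floating-point error)
max(abs(diff(fc, differences = 2)))

# For a linear path with an even number of steps, the mean equals the
# average of the two middle forecasts (quarters 14 and 15 here)
c(mean_forecast = mean(fc), midpoint = mean(fc[14:15]), end_of_horizon = fc[28])
```

For the Ohio ARIMA(0,2,3) model, the same straight-line behavior begins after the first few forecast steps, so the table’s mean values should be read as mid-horizon levels.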

To see specific values within the ARIMA forecasts, I created a table with the gt package. The forecasted HPI and interval bounds in the table are averages across all 28 forecast quarters. One interesting insight is that Ohio’s mean forecasted HPI, 618.59, is approaching Tennessee’s most recently recorded HPI of 675.37. Both states’ HPI values are increasing, but Ohio’s HPI trails Tennessee’s by several years. This is notable because the two states had very similar values until about 2005, before the financial crisis. To examine these trends further, the next visualization overlays both ARIMA models on one graph, displaying both states’ HPI trends simultaneously.

Housing Price Index Trends - ARIMA Modeling Overlay

Visualization Code

library(forecast)
library(ggplot2)
library(lubridate)

# This graph was challenging to create as it required both ARIMA modeling, and then extensive revisions to see visible confidence intervals and color schemes. The base model was adjusted by ChatGPT-4o and then numerous revisions were made following the initial rendered visualization. 

tn_hpi <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/TNHousingPriceIndex.csv") %>%
  rename(Date = `DATE`, HPI = `TNSTHPI`) %>%
  mutate(Date = as.Date(Date))

tn_hpi_ts <- ts(tn_hpi$HPI, start = c(1975, 1), frequency = 4)
arima_model_tn <- auto.arima(tn_hpi_ts)
quarters_to_forecast <- as.numeric((year(ymd("2030-12-31")) - year(max(tn_hpi$Date))) * 4) +
  quarter(ymd("2030-12-31")) - quarter(max(tn_hpi$Date))
forecast_tn <- forecast(arima_model_tn, h = quarters_to_forecast, level = c(80, 95))

oh_hpi <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/OHHousingPriceIndex.csv") %>%
  rename(Date = `DATE`, HPI = `OHSTHPI`) %>%
  mutate(Date = as.Date(Date))

oh_hpi_ts <- ts(oh_hpi$HPI, start = c(1975, 1), frequency = 4)
arima_model_oh <- auto.arima(oh_hpi_ts)
forecast_oh <- forecast(arima_model_oh, h = quarters_to_forecast, level = c(80, 95))

forecast_tn_df <- as.data.frame(forecast_tn) %>%
  mutate(Date = seq(from = max(tn_hpi$Date) + months(3), 
                    by = "3 months", 
                    length.out = nrow(.)),
         State = "Tennessee")

forecast_oh_df <- as.data.frame(forecast_oh) %>%
  mutate(Date = seq(from = max(oh_hpi$Date) + months(3), 
                    by = "3 months", 
                    length.out = nrow(.)),
         State = "Ohio")

historical_tn <- tn_hpi %>%
  select(Date, HPI) %>%
  mutate(State = "Tennessee")

historical_oh <- oh_hpi %>%
  select(Date, HPI) %>%
  mutate(State = "Ohio")

forecast_combined <- bind_rows(
  forecast_tn_df %>% rename(HPI = `Point Forecast`),
  forecast_oh_df %>% rename(HPI = `Point Forecast`),
  historical_tn,
  historical_oh
)

ggplot(data = forecast_combined, aes(x = Date, y = HPI, color = State)) +
  geom_ribbon(data = forecast_tn_df, aes(x = Date, ymin = `Lo 95`, ymax = `Hi 95`, fill = "Tennessee 95%"), 
              alpha = 0.35, inherit.aes = FALSE) +  # 95% Confidence interval for TN
  geom_ribbon(data = forecast_tn_df, aes(x = Date, ymin = `Lo 80`, ymax = `Hi 80`, fill = "Tennessee 80%"), 
              alpha = 0.5, inherit.aes = FALSE) +  # 80% Confidence interval for TN
  geom_ribbon(data = forecast_oh_df, aes(x = Date, ymin = `Lo 95`, ymax = `Hi 95`, fill = "Ohio 95%"), 
              alpha = 0.35, inherit.aes = FALSE) +  # 95% Confidence interval for OH
  geom_ribbon(data = forecast_oh_df, aes(x = Date, ymin = `Lo 80`, ymax = `Hi 80`, fill = "Ohio 80%"), 
              alpha = 0.5, inherit.aes = FALSE) +  # 80% Confidence interval for OH
  # Lines are drawn after the ribbons so they stay visible on top of the shaded intervals
  geom_line(data = filter(forecast_combined, Date <= max(historical_tn$Date)),
            linewidth = 0.6) +  # Historical data (thinner line)
  geom_line(data = filter(forecast_combined, Date > max(historical_tn$Date)),
            linewidth = 0.9) +  # Point forecasts (thicker line)
  scale_color_manual(values = c("Tennessee" = "darkblue", "Ohio" = "darkgreen")) +
  scale_fill_manual(
    values = c("Tennessee 80%" = "royalblue1", "Tennessee 95%" = "lightblue3", 
               "Ohio 80%" = "palegreen4", "Ohio 95%" = "palegreen"),
    name = "Confidence Intervals") +
  scale_x_date(
    breaks = seq(as.Date("1980-01-01"), as.Date("2030-01-01"), by = "10 years"),
    labels = scales::label_date(format = "%Y")
  ) +
  labs(
    title = "ARIMA Forecast of Housing Price Index (1975–2030)",
    subtitle = "Comparison of Historical and Predicted Data for Tennessee and Ohio",
    x = NULL,
    y = "Housing Price Index (Base = 100)",
    caption = "Source: Federal Reserve Bank of St. Louis",
    color = "State"
  ) +
  scale_y_continuous(labels = scales::label_comma()) +
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, margin = margin(b = 10)),
    plot.subtitle = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.caption = element_text(hjust = 0, size = 9, margin = margin(t = 10)),
    legend.position = "right",
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12))

The ARIMA forecast of Housing Price Index (HPI) for Tennessee and Ohio highlights significant differences in housing market trends and forecasts between these two states. Tennessee has historically seen much sharper growth, especially since 2015, and accelerated further after 2020. This indicates a rapidly increasing housing demand, likely driven by economic growth factors, population influx, and urbanization. One major area that influences these trends in Tennessee is the Nashville Metropolitan Area, which was highlighted in previous choropleth visualizations. Conversely, Ohio’s HPI has a more gradual and stable trajectory, showing slower and steadier economic growth.

Turning to the forecasts, Tennessee’s ARIMA model projects a steeper upward trajectory, with a mean forecast well above Ohio’s. This suggests Tennessee’s housing prices will continue to rise faster than Ohio’s, consistent with sustained demand. The ARIMA model also forecasts growth for Ohio, but at a calmer rate. This aligns with Ohio’s historically more stable housing market, shaped by urbanization spread across several major cities, notably Columbus, Cincinnati, and Cleveland. Tennessee’s confidence intervals widen dramatically beyond 2025, while Ohio’s remain narrower, reflecting a more predictable housing market with less volatility. Investors with a higher risk tolerance would likely be drawn to Nashville’s rapid growth. Other buyers might look toward Ohio for steadier long-term growth in their housing investments.

Population Metrics

The Housing Price Index (HPI) trends for Tennessee and Ohio reveal significant differences in housing market dynamics, with Tennessee experiencing rapid appreciation within its prices, and Ohio seeing slower but steady and reliable growth over time. These variations in housing prices are often closely linked to underlying population growth patterns, as changes in population growth drive housing demand, economic development, and urban expansion.

Rapid increases in Tennessee’s HPI may reflect a surge in the state’s population, particularly in large metropolitan and urbanized areas. The Nashville Metropolitan Area is likely attracting new residents, as its economic opportunities appeal to young professionals moving to a bustling city.

In order to better understand the drivers behind these trends, it is imperative to examine population growth over time, and compare differences between Ohio’s growth with the growth of Tennessee. Analyzing long-term and more recent population growth patterns in both states can provide insights into relationships between demographic shifts and housing market behavior. The following visualizations will first give an overview on these trends, then dive into more specific correlations between these variables.
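Population and the HPI both trend upward over long periods. Correlating their levels directly therefore tends to produce high correlations even when the series are unrelated, which is the spurious-regression problem. A common precaution is to correlate period-over-period growth rates instead. The block below is a sketch that assumes two aligned annual vectors for the same state, for example with the quarterly HPI first averaged within each year. `pct_change()` and `growth_correlation()` are hypothetical helpers, and the toy inputs are constructed for illustration.

```r
# Period-over-period percent change (as a proportion)
pct_change <- function(x) diff(x) / head(x, -1)

# Correlation between population growth and HPI growth.
# Assumes `population` and `hpi` are aligned annual vectors of equal length.
growth_correlation <- function(population, hpi) {
  stopifnot(length(population) == length(hpi), length(population) > 2)
  cor(pct_change(population), pct_change(hpi))
}

# Toy example: HPI growth is constructed to move one-for-one with population growth
pop_growth <- c(0.010, 0.012, 0.008, 0.015, 0.011)
population <- 6e6 * cumprod(c(1, 1 + pop_growth))
hpi        <- 300 * cumprod(c(1, 1 + 2 * pop_growth + 0.01))
growth_correlation(population, hpi)
```

A correlation computed this way says nothing about causation, and with only a few decades of annual data it is noisy. It is, however, a more defensible starting point than correlating two upward-trending levels.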

Population Visualizations

Comparing Population Trends in Tennessee and Ohio

1900-Present

Visualization Code

library(ggplot2)
library(readr)
library(dplyr)
library(lubridate)

tn_population <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/TNPopulationData.csv") %>%
  rename(Date = `Date`, Population = `estPopCountTN`) %>%
  mutate(State = "Tennessee", Date = as.Date(Date))  # Add state column and ensure Date format

oh_population <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/OHPopulationData.csv") %>%
  rename(Date = `Date`, Population = `EstPopCountOH`) %>%
  mutate(State = "Ohio", Date = as.Date(Date))  # Add state column and ensure Date format

population_data <- bind_rows(tn_population, oh_population)

ggplot(data = population_data, aes(x = Date, y = Population, color = State)) +
  geom_line(linewidth = 1) +  # linewidth replaces the deprecated size aesthetic for lines
  scale_color_manual(values = c("Tennessee" = "darkblue", "Ohio" = "darkgreen")) +
  labs(
    title = "Population Trends in Tennessee and Ohio (1900–Present)",
    x = NULL,
    y = "Population",
    color = "State",
    caption = "Source: U.S. Census Bureau - Population Estimates"
  ) +
  scale_y_continuous(labels = scales::label_comma()) +  # Add commas to y-axis labels
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, margin = margin(b = 10)),
    plot.subtitle = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.caption = element_text(hjust = 0, size = 9, margin = margin(t = 10)),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12)
  )

Ohio, shown in green, began the 20th century with a much larger population than Tennessee, shown in blue. Ohio grew rapidly through the first half of the century before its growth leveled off around the 1970s. Since then, Ohio’s population has largely plateaued, with only moderate increases reflecting slight economic and demographic expansion. In contrast, Tennessee has grown consistently and gradually since 1900, with less variability over time. Notably, Tennessee’s growth rate has accelerated since 1950. This reflects increasing migration to the state, driven by factors such as job opportunities and economic development in cities like Nashville.

While Ohio’s population remains larger overall, Tennessee’s has grown faster in recent decades. This trend indicates a shift in regional dynamics, with Tennessee emerging as a destination state while Ohio’s population stagnates. Overall, Tennessee’s accelerating population growth aligns with its sharper increases in the HPI. Rising housing demand accompanies Tennessee’s population gains, while Ohio’s roughly constant population coincides with a more stable housing market.

1980-Present

Visualization Code

population_data_filtered <- population_data %>%
  filter(Date >= as.Date("1980-01-01"))

# Plot the population data
ggplot(data = population_data_filtered, aes(x = Date, y = Population, color = State)) +
  geom_line(linewidth = 1) +  # linewidth replaces the deprecated size aesthetic for lines
  scale_color_manual(values = c("Tennessee" = "darkblue", "Ohio" = "darkgreen")) +
  labs(
    title = "Population Trends in Tennessee and Ohio (1980–Present)",
    x = NULL,
    y = "Population",
    color = "State",
    caption = "Source: U.S. Census Bureau - Population Estimates"
  ) +
  scale_y_continuous(labels = scales::label_comma()) +  # Add commas to y-axis labels
  theme_linedraw() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, margin = margin(b = 10)),
    plot.subtitle = element_text(hjust = 0.5, size = 10, margin = margin(b = 15)),
    plot.caption = element_text(hjust = 0, size = 9, margin = margin(t = 10)),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12)
  )

Because this research aims to provide insights into current trends, I again narrowed the population growth visualization to a more recent timeline, from 1980 to the present. This view shows that Ohio’s population has remained relatively flat since 1980, with minimal growth over more than 40 years. The overall trend indicates stagnation, consistent with slower economic development and even out-migration to other states.

Tennessee, by contrast, is one of the states experiencing the opposite trend. For more than 40 years, its population has followed a consistent upward trajectory, with visible acceleration after 2000. This is likely driven by Tennessee’s growing appeal from factors such as economic expansion, job opportunities, and affordable living costs. However, the rapid population increase has also raised demand for housing and pushed up home prices in Tennessee. Some counties around Nashville now have median home prices above $1,000,000.

While Ohio’s population remains larger overall, with several large urban centers, the gap has been narrowing. Tennessee’s steady growth contrasts with Ohio’s stagnation, indicating a shift in demographic and economic dynamics. This pattern is consistent with Tennessee’s rapid housing price increases and Ohio’s comparatively affordable housing. Tennessee’s economic vitality continues to draw new residents, while Ohio’s flat trend suggests a calmer, more stable environment. This population shift underscores the broader regional patterns observed in this analysis of the housing market.

Predicting Population Growth with Random Futures

Tennessee Population Growth - Random Futures (1980–Present)

Visualization Code

library(ggplot2)
library(fable)
library(fpp3)
library(tsibble)
library(dplyr)
library(readr)

# I based this code on the course slides example "Total Short Term Visitors to Australia", with debugging assistance from ChatGPT-4o. I then customized the output until the prediction model produced the desired output.

tn_population <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/TNPopulationData.csv", show_col_types = FALSE) %>%
  rename(Date = `Date`, Population = `estPopCountTN`) %>%
  select(Date, Population) %>%
  mutate(Date = yearmonth(Date)) %>%
  drop_na(Population) %>%
  as_tsibble(index = Date)

fit <- tn_population %>%
  model(ETS(Population))

set.seed(123)
simulations <- fit %>%
  generate(h = 15, times = 100) %>%
  as_tibble() %>%
  mutate(Date = as.Date(Date))  # Ensure Date consistency

confidence_intervals <- simulations %>%
  group_by(Date) %>%
  summarise(
    Lower_80 = quantile(.sim, probs = 0.10),  # 80% confidence interval
    Upper_80 = quantile(.sim, probs = 0.90),
    Lower_95 = quantile(.sim, probs = 0.025), # 95% confidence interval
    Upper_95 = quantile(.sim, probs = 0.975)
  )

tn_population_tbl <- tn_population %>%
  as_tibble() %>%
  mutate(Source = "Historical", Date = as.Date(Date))  # Convert Date format

full_data <- bind_rows(
  tn_population_tbl,
  simulations %>%
    rename(Population = .sim) %>%
    mutate(Source = "Forecast")
)

ggplot() +
  # Historical data
  geom_line(data = full_data %>% filter(Source == "Historical"), 
            aes(x = Date, y = Population), color = "darkblue", linewidth = 1) +
  
  # Forecasted random futures
  geom_line(data = full_data %>% filter(Source == "Forecast"), 
            aes(x = Date, y = Population, group = .rep), 
            color = "steelblue", alpha = 0.3) +
  
  # 95% Confidence Interval
  geom_ribbon(data = confidence_intervals, 
              aes(x = Date, ymin = Lower_95, ymax = Upper_95, fill = "95% Confidence Interval"), 
              alpha = 0.5) +
  
  # 80% Confidence Interval
  geom_ribbon(data = confidence_intervals, 
              aes(x = Date, ymin = Lower_80, ymax = Upper_80, fill = "80% Confidence Interval"), 
              alpha = 0.4) +
  
  # Labels and titles
  labs(
    title = "Tennessee Population Growth (1980-2040)",
    x = NULL,
    y = "Population",
    caption = "Source: U.S. Census Bureau - Population Estimates",
    fill = "Confidence Intervals"  # Legend title
  ) +
  
  # Improved Y-axis formatting
  scale_y_continuous(labels = scales::label_comma()) +
  
  # Custom fill colors for confidence intervals
  scale_fill_manual(
    values = c("80% Confidence Interval" = "blue3", 
               "95% Confidence Interval" = "lawngreen")
  ) +
  
  # Coordinate settings
  coord_cartesian(xlim = c(as.Date("1980-01-01"), max(simulations$Date))) +
  
  theme_linedraw() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

This graph illustrates Tennessee’s population growth from 1980 to 2040, with forecasts and associated confidence intervals. The historical population data show the upward trend noted earlier, and the projection period from 2023 to 2040 includes 100 random future paths simulated from a fitted ETS (exponential smoothing) model. These paths are overlaid with 80% and 95% confidence intervals to capture the uncertainty in the forecast.

The historical data show a steady increase from about 4.5 million Tennessee residents in 1980 to more than 7 million in recent years, with no periods of stagnation or decline. The random-future projections suggest that the upward trend will continue, with some variability across paths to reflect unexpected events.

Confidence Intervals:

The 80% confidence interval, shown in darker blue, spans the 10th to 90th percentiles of the simulated paths, so about 80% of the simulated futures fall inside it at each point in time.
The 95% confidence interval, shown in light green, spans the 2.5th to 97.5th percentiles and therefore covers a wider range of possibilities; it widens as the forecast horizon extends, reflecting greater uncertainty further into the future.
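The widening of these bands follows from how they are built: at each forecast date, the code takes percentiles across the 100 simulated paths. The sketch below repeats that percentile step in base R, but on a hypothetical random walk with drift rather than the fitted ETS model; the starting value, drift, and noise level are illustrative assumptions, not estimates from the Census data.

```r
# Sketch only: a random walk with drift stands in for the fitted ETS model,
# and every parameter below is a hypothetical illustration value.
set.seed(123)
n_paths     <- 100     # number of simulated futures, as in generate(times = 100)
horizon     <- 15      # forecast steps, as in generate(h = 15)
start_value <- 7e6     # hypothetical starting population
drift       <- 60000   # hypothetical average annual increase
shock_sd    <- 40000   # hypothetical year-to-year noise

# One column per simulated path, one row per forecast step
shocks <- matrix(rnorm(horizon * n_paths, mean = drift, sd = shock_sd),
                 nrow = horizon, ncol = n_paths)
paths <- start_value + apply(shocks, 2, cumsum)

# Percentiles across paths at each step (same probabilities as the report's code)
q <- t(apply(paths, 1, quantile, probs = c(0.025, 0.10, 0.90, 0.975)))
width_80 <- q[, 3] - q[, 2]   # 90th minus 10th percentile
width_95 <- q[, 4] - q[, 1]   # 97.5th minus 2.5th percentile

round(data.frame(step = seq_len(horizon), width_80, width_95))
```

Because the simulated shocks accumulate, the spread of the paths grows roughly with the square root of the horizon in this toy model, which is why both bands fan out over time.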

The population is projected to reach roughly 8 to 8.5 million by 2040, with some simulated paths falling slightly above or below that range. This projection of continued population growth suggests sustained housing demand that could support further growth in Tennessee’s housing market. This aligns with the earlier analysis, which highlighted the link between population growth and rising housing prices, as reflected in the rising HPI.

Ohio Population Growth - Random Futures (1980–Present)

Visualization Code

library(ggplot2)
library(fable)
library(fpp3)
library(tsibble)
library(dplyr)
library(readr)

# This graph mirrors the previous one, with Ohio population data (again sourced from the U.S. Census Bureau) replacing the Tennessee data.

oh_population <- read_csv("/Users/cmacbook/Documents/1-Denison/Senior Fall/DA 352/1-Final Project/Data/OHPopulationData.csv", show_col_types = FALSE) %>%
  rename(Date = `Date`, Population = `EstPopCountOH`) %>%
  select(Date, Population) %>%
  mutate(Date = yearmonth(Date)) %>%
  drop_na(Population) %>%
  as_tsibble(index = Date)

fit <- oh_population %>%
  model(ETS(Population))

set.seed(123)
simulations <- fit %>%
  generate(h = 15, times = 100) %>%
  as_tibble() %>%
  mutate(Date = as.Date(Date))  # Ensure Date consistency

confidence_intervals <- simulations %>%
  group_by(Date) %>%
  summarise(
    Lower_80 = quantile(.sim, probs = 0.10),  # 80% confidence interval
    Upper_80 = quantile(.sim, probs = 0.90),
    Lower_95 = quantile(.sim, probs = 0.025), # 95% confidence interval
    Upper_95 = quantile(.sim, probs = 0.975)
  )

# Convert historical data to tibble for compatibility
oh_population_tbl <- oh_population %>%
  as_tibble() %>%
  mutate(Source = "Historical", Date = as.Date(Date))

full_data <- bind_rows(
  oh_population_tbl,
  simulations %>%
    rename(Population = .sim) %>%
    mutate(Source = "Forecast")
)

ggplot() +
  # Historical data
  geom_line(data = full_data %>% filter(Source == "Historical"), 
            aes(x = Date, y = Population), color = "darkgreen", linewidth = 1) +
  
  # Forecasted random futures
  geom_line(data = full_data %>% filter(Source == "Forecast"), 
            aes(x = Date, y = Population, group = .rep), 
            color = "springgreen4", alpha = 0.6) +
  
  # 95% Confidence Interval
  geom_ribbon(data = confidence_intervals, 
              aes(x = Date, ymin = Lower_95, ymax = Upper_95, fill = "95% Confidence Interval"), 
              alpha = 0.4) +
  
  # 80% Confidence Interval
  geom_ribbon(data = confidence_intervals, 
              aes(x = Date, ymin = Lower_80, ymax = Upper_80, fill = "80% Confidence Interval"), 
              alpha = 0.5) +
  
  # Labels and titles
  labs(
    title = "Ohio Population Growth (1980-2040)",
    x = NULL,
    y = "Population",
    caption = "Source: U.S. Census Bureau - Population Estimates",
    fill = "Confidence Intervals"  # Legend title
  ) +
  
  # Improved Y-axis formatting
  scale_y_continuous(labels = scales::label_comma()) +
  
  # Custom fill colors for confidence intervals
  scale_fill_manual(
    values = c("80% Confidence Interval" = "blue3", 
               "95% Confidence Interval" = "lawngreen")
  ) +
  
  # Coordinate settings
  coord_cartesian(xlim = c(as.Date("1980-01-01"), max(simulations$Date))) +
  
  # Theme adjustments
  theme_linedraw() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 10, face = "bold"),
    legend.text = element_text(size = 9)
  )

This visualization displays Ohio’s population growth from 1980 to 2040, also incorporating a forecast generated by 100 random future paths. While this graph also shows an increase in population over time, the increase is much more gradual than Tennessee’s. This visualization is also overlaid with 80% and 95% confidence intervals.

Ohio’s population has grown minimally since 1980, remaining between roughly 11 million and 12 million residents. Unlike Tennessee, Ohio shows noticeable stagnation, which indicates slower demographic expansion and possible economic limitations.

Confidence Intervals:

The 80% confidence interval, shown in blue for contrast, spans the 10th to 90th percentiles of the simulated paths, so about 80% of the simulated futures fall inside it at each point in time.
The 95% confidence interval, shown in lighter green, again covers a wider range of possibilities (the 2.5th to 97.5th percentiles), widening as the forecast horizon extends.

One conclusion that can be drawn from this visualization is that Ohio faces a heightened risk of continued population stagnation, visible in both the random future paths and the confidence intervals. This contrasts with Tennessee’s population projections, which trend upward. It also differs from Ohio’s HPI ARIMA forecast, which trends only upward: this population forecast allows for both increases and decreases. Despite this stagnation, however, housing prices may still rise because of limited housing supply, increased urbanization, or economic conditions that are independent of population size or growth.
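The risk of stagnation or decline can be summarized as a single number from simulated paths: the share of paths that end below their starting value. A minimal base-R sketch, assuming a hypothetical helper `share_declining()` and a hypothetical near-flat random walk in place of the fitted Ohio ETS model:

```r
# Hypothetical helper: share of simulated paths whose final value is below `start`.
# `paths` is a matrix with one row per forecast step and one column per path.
share_declining <- function(paths, start) {
  final_values <- paths[nrow(paths), ]
  mean(final_values < start)
}

# Hypothetical near-flat scenario standing in for the fitted Ohio ETS model
set.seed(123)
n_paths     <- 100      # same number of futures as the report
horizon     <- 15       # same horizon as h = 15
start_value <- 11.8e6   # hypothetical starting population
drift       <- 5000     # hypothetical small average annual change
shock_sd    <- 30000    # hypothetical year-to-year noise

shocks <- matrix(rnorm(horizon * n_paths, mean = drift, sd = shock_sd),
                 nrow = horizon, ncol = n_paths)
paths <- start_value + apply(shocks, 2, cumsum)

share_declining(paths, start_value)
```

Applied to the report’s `simulations` tibble, the same idea amounts to taking each `.rep`’s final `.sim` value and comparing it with the last observed population.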

Statistical Analysis and Interpretation

Methodology

ARIMA (AutoRegressive Integrated Moving Average) models were applied to the quarterly Housing Price Index (HPI) time series for Tennessee and Ohio to forecast prices from 2024 to 2030. This approach accounts for the influence of past values, uses differencing to achieve stationarity, and models the dependence of current values on past forecast errors.

Model Selection

To produce an accurate ARIMA forecast, the auto.arima() function was applied to both the Tennessee and Ohio series. It selects the model order by minimizing an AIC-based information criterion (by default, the small-sample-corrected AICc), balancing model fit against complexity. Both models forecast 28 quarters (7 years) ahead to extend predictions to 2030, and 80% and 95% confidence intervals were calculated to account for forecast uncertainty.
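As a rough illustration of this workflow, the sketch below uses base R’s `stats::arima()` on a hypothetical simulated quarterly index, not the FRED HPI series. It swaps `auto.arima()` for a simplified search that fixes d = 2 and picks the MA order with the lowest AIC, then builds 80% and 95% intervals as the point forecast plus or minus a normal quantile times the forecast standard error. The series, seed, and candidate orders are all assumptions.

```r
# Hypothetical quarterly index: quadratic trend plus simulated ARIMA(0,2,1) noise
set.seed(42)
n <- 80
sim <- arima.sim(model = list(order = c(0, 2, 1), ma = 0.4), n = n, sd = 0.5)
t_idx <- seq_along(sim)                     # one time index per simulated value
hpi <- ts(100 + 0.5 * t_idx + 0.02 * t_idx^2 + as.numeric(sim),
          start = c(2004, 1), frequency = 4)

# Simplified stand-in for auto.arima(): fix d = 2 and choose the MA order by AIC
candidate_q <- 1:3
aics <- sapply(candidate_q, function(q) AIC(arima(hpi, order = c(0, 2, q))))
best_q <- candidate_q[which.min(aics)]
fit <- arima(hpi, order = c(0, 2, best_q))

# Forecast 28 quarters (7 years) ahead
h <- 28
fc <- predict(fit, n.ahead = h)

# 80% and 95% intervals: point forecast +/- normal quantile * standard error
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)
intervals <- data.frame(
  time  = as.numeric(time(fc$pred)),
  point = as.numeric(fc$pred),
  lo80  = as.numeric(fc$pred - z80 * fc$se),
  hi80  = as.numeric(fc$pred + z80 * fc$se),
  lo95  = as.numeric(fc$pred - z95 * fc$se),
  hi95  = as.numeric(fc$pred + z95 * fc$se)
)

best_q
head(intervals, 3)
tail(intervals, 3)
```

With the forecast package, the corresponding steps would be `auto.arima()` followed by `forecast(fit, h = 28, level = c(80, 95))`; the base-R version is shown only so the interval arithmetic is explicit.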

Model Interpretation

Tennessee: ARIMA(0,2,1)

p = 0: No autoregressive terms were used, meaning that no lagged values of the (differenced) series enter the model directly as predictors.
d = 2: The series was differenced twice to achieve stationarity, which removed the strong upward trend in the HPI. Needing a second difference suggests that the trend itself changed over time, consistent with accelerating price growth.
q = 1: The model includes one moving average term, so each differenced value depends on the previous period’s forecast error, which captures short-term noise in the data.
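The role of d = 2 can be illustrated with a toy series. A quadratic trend, whose slope grows over time, still trends after one difference but becomes constant after two, which is why a second difference is needed when growth is accelerating. The coefficients below are hypothetical.

```r
# Hypothetical series with a quadratic (accelerating) trend and no noise
a  <- 0.5    # hypothetical curvature
b  <- 3      # hypothetical slope
c0 <- 100    # hypothetical starting level
t_idx <- 1:10
y <- c0 + b * t_idx + a * t_idx^2

d1 <- diff(y)                    # first difference: still rises linearly with t
d2 <- diff(y, differences = 2)   # second difference: constant

d1  # 4.5 5.5 6.5 ... 12.5
d2  # every value equals 2 * a = 1
```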

Ohio: ARIMA(0,2,3)

p = 0: No autoregressive terms were used, again meaning that no lagged values of the (differenced) series enter the model directly.
d = 2: The series was differenced twice to achieve stationarity, which removed the trends present.
q = 3: The model includes three moving average terms, meaning each differenced value depends on the forecast errors from the past three periods.
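In backshift notation, where B y_t = y_{t-1}, y_t is the HPI in quarter t, and ε_t is a white-noise forecast error, the two selected models can be written as follows. The θ coefficients are the fitted MA parameters, which are not reported in this section, and the form assumes no constant term, as is usual for a twice-differenced model.

```latex
\begin{aligned}
\text{Tennessee, ARIMA}(0,2,1):&\quad (1 - B)^2 y_t = (1 + \theta_1 B)\,\varepsilon_t \\
&\quad \Longleftrightarrow\; y_t - 2y_{t-1} + y_{t-2} = \varepsilon_t + \theta_1 \varepsilon_{t-1} \\
\text{Ohio, ARIMA}(0,2,3):&\quad (1 - B)^2 y_t = (1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3)\,\varepsilon_t
\end{aligned}
```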

Findings

This research provides a comparative analysis of housing prices and population growth between Tennessee and Ohio, focusing on insights relevant to young professionals, policymakers, and potential investors. The integration of visualizations, predictive models, and statistical analysis reveals regional dynamics in both states, key affordability drivers in the housing market, and potential future trends.

Housing Price Disparities

Housing prices vary sharply across Tennessee’s counties, with the Nashville Metropolitan Area (namely Williamson and Davidson Counties) standing out as the most expensive region in the state. Other high-priced areas include the counties surrounding Knoxville and Chattanooga. Median housing prices in Williamson County exceeded $1,000,000 in the first quarter of 2024, reflecting strong demand and economic growth up to that point. More rural areas of Tennessee have much lower median prices, which creates a striking contrast when Tennessee’s counties are mapped.

Ohio’s county-level price differences are less pronounced. The counties home to major cities such as Columbus (Franklin County), Cincinnati (Hamilton County), and Cleveland (Cuyahoga County) show moderate-to-high housing costs. Overall, Ohio’s housing is significantly more affordable than Tennessee’s and has fewer outliers.

Housing Price Trends Over Time

The Housing Price Index visualizations and analysis highlight several differences in the growth of the two states. Tennessee’s HPI grew rapidly after the 2008 financial crisis, with steep acceleration after 2015, likely driven by population growth as well as urbanization and economic expansion. Ohio’s HPI followed a similar pattern but grew more slowly, reflecting a more stable housing market with less volatility. Ohio’s lower prices and stability offer a lower barrier to entry, which can appeal to buyers looking for more consistent market trends.
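One way to make "grew more slowly" concrete is the compound annual growth rate (CAGR) of each index between two dates. A minimal sketch, assuming a hypothetical helper `cagr()` and made-up index values rather than the actual FRED readings:

```r
# Hypothetical helper: compound annual growth rate between two index readings
cagr <- function(start_value, end_value, years) {
  stopifnot(start_value > 0, end_value > 0, years > 0)
  (end_value / start_value)^(1 / years) - 1
}

# Made-up index values over the same 10-year window (NOT the FRED readings)
fast_market <- cagr(start_value = 300, end_value = 600, years = 10)
slow_market <- cagr(start_value = 300, end_value = 450, years = 10)

round(100 * c(fast = fast_market, slow = slow_market), 2)  # about 7.18 vs 4.14 percent per year
```

Because the HPI is quarterly, `years` can be passed as a number of quarters divided by 4 when the window does not span whole years.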

ARIMA Forecasting Models

Tennessee’s ARIMA(0,2,1) model forecasts continued growth in housing prices through 2030, and the widening confidence intervals indicate growing uncertainty after 2025. This could reflect sensitivity to economic conditions, interest rates, or supply constraints. Higher risk and faster appreciation make Tennessee attractive to speculative investors but also pose an affordability challenge.

Ohio’s ARIMA(0,2,3) model predicts moderate, steady growth through 2030, with narrower confidence intervals reflecting market stability. This predictable growth may offer safer long-term investment opportunities for young professionals, college graduates, or investors looking for a lower barrier to entry into the housing market.

Population Growth Trends

Both states’ populations have grown since 1950, but their paths have diverged: Tennessee’s growth accelerated after 2000, while Ohio’s has largely stagnated since the 1970s, with minimal change over the last 40 years. This stagnation may not hold housing prices down, however, since prices can still rise because of urbanization and limited housing supply. In Tennessee, population growth is likely to push housing prices higher as demand rises in urban centers such as Nashville and Knoxville. Tennessee’s accelerating growth attracts many young people but also corresponds to sharper housing price increases. Ohio’s flatter population growth aligns with a calmer, more stable housing market, which may attract buyers.

Integration of Visualizations

The 2D and 3D choropleth visualizations convey housing price disparities, showing Tennessee’s sharp spikes in urban centers and Ohio’s flatter landscape, with smaller peaks around Columbus, Cincinnati, and Cleveland. In Tennessee, housing prices and population have risen together (a positive correlation), consistent with sustained in-migration and strong demand in its housing market. Ohio’s slower growth suggests a more balanced market, with less pressure from people moving into the state.
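Because both population and prices trend upward over time, a correlation computed on their levels tends to be high even if the two series are unrelated from year to year. A hedged sketch of one common check, using two independent, hypothetical simulated series rather than the report’s data: compare the correlation of levels with the correlation of year-over-year changes.

```r
# Sketch with two hypothetical, independently simulated series (not the report's data)
set.seed(7)
years <- 1980:2023
n <- length(years)

# Each series trends upward, but their year-to-year changes are independent
pop_like <- cumsum(rnorm(n, mean = 1, sd = 1))
hpi_like <- cumsum(rnorm(n, mean = 1, sd = 1))

level_cor  <- cor(pop_like, hpi_like)              # inflated by the shared upward trend
change_cor <- cor(diff(pop_like), diff(hpi_like))  # compares year-over-year changes

round(c(levels = level_cor, changes = change_cor), 2)
```

If the level correlation is high but the change correlation is near zero, the co-movement may reflect shared trends rather than a direct link; a clearly positive change correlation would be stronger, though still not causal, evidence.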

Conclusions

Key Takeaways

For Young Professionals and College Graduates:

Tennessee offers economic opportunities and an exciting, rapidly growing atmosphere, but comes with higher housing costs. This presents higher growth potential along with higher risk and affordability challenges.
Ohio provides a more affordable entry point with more consistent housing trends. This combination of stability, affordability, and moderate growth makes it a safer market for college graduates or investors breaking into real estate.

For Policymakers and Investors:

Tennessee’s rapid urbanization, especially around the Nashville Metropolitan Area, requires careful and sustainable planning to counteract issues with affordability.
Ohio’s stable market offers long-term investment opportunities that bring lower risk at the cost of slower growth potential.

Future Research

Future research within this topic can involve more complex datasets, as well as county-specific median housing prices over time. Additional advanced modeling techniques can also enhance the depth and applicability of these findings. Analyzing county-specific trends over time can reveal localized housing market dynamics, and can uncover trends such as urban sprawl, gentrification, and economic stagnation within rural areas. Dynamic visualizations, such as an animated 3D choropleth graph, could allow for interactive comparisons between counties, and provide insights into affordability changes over time, while remaining visually appealing and interesting. Future research can also expand the predictive modeling to include more advanced techniques, such as gradient boosting, random forests, or neural networks, to predict housing prices based on multiple features.

Finally, future research could broaden the scope beyond these two states to compare the Southeastern and Midwestern United States as whole regions, with the goal of uncovering broader U.S. trends beyond the comparison between Tennessee and Ohio.

Sources

All-Transactions House Price Index for Ohio. (2024, November 26). FRED, Federal Reserve Bank of St. Louis.
    https://fred.stlouisfed.org/series/OHSTHPI  

All-Transactions House Price Index for Tennessee. (2024, November 26). FRED, Federal Reserve Bank of St. Louis.
    https://fred.stlouisfed.org/series/TNSTHPI  

Avery, R. (2022, January 13). Cities with the most adults still living with their parents.  
    KSJB AM 600. https://www.ksjbam.com/2022/01/13/cities-with-the-most-adults-still-living-with-their-parents/  

Cooper, A. K., & Coetzee, S. (2020). On the Ethics of Using Publicly-Available Data.  
    In Lecture Notes in Computer Science (pp. 159–171). https://doi.org/10.1007/978-3-030-45002-1_14
    
Cornell, B. (1981, July 1). Relative vs. Absolute Price Changes: An Empirical... ProQuest.
    https://www.proquest.com/docview/1297272800?pq-origsite=gscholar&fromopenview=true&sourcetype=Scholarly%20Journals&imgSeq=1

County median home prices and monthly mortgage payment. (2024, July 9).  
    https://www.nar.realtor/research-and-statistics/housing-statistics/county-median-home-prices-and-monthly-mortgage-payment  

Ohio population 1900-2023. (n.d.). MacroTrends.  
    https://www.macrotrends.net/global-metrics/states/ohio/population  

Tennessee population 1900-2023. (n.d.). MacroTrends.  
    https://www.macrotrends.net/global-metrics/states/tennessee/population  

Treleaven, P., Barnett, J., Knight, A., & Serrano, W. (2021). Real Estate Data Marketplace.  
    AI and Ethics, 1(4), 445–462. https://doi.org/10.1007/s43681-021-00053-4
