Research Question: How does a state's outdoor recreation economy affect housing prices in the U.S.?
Outdoor recreation is a major source of entertainment and helps improve people's quality of life and well-being. This positive perception may raise the value of nearby housing. If so, local governments should take the economic benefits of recreational activities into account when making investment decisions.
library(dagitty)
library(ggdag)
##
## Attaching package: 'ggdag'
## The following object is masked from 'package:stats':
##
## filter
dag <- dagitty("dag {
D -> Y
X -> Y
E -> Y
D -> E
T -> Y
D -> T
T -> E
}")
ggdag(dag) +
theme_dag()
Y: housing price
D: outdoor recreation economy
X: housing characteristics
T: tourism
E: employment opportunities
Housing characteristics and the outdoor recreation economy affect housing prices directly, but other variables are also involved. Outdoor recreational facilities attract tourists and create jobs, so the outdoor recreation economy also influences housing prices through tourism and employment opportunities.
Paths from D to Y:
D -> Y
D -> T -> Y
D -> E -> Y
D -> T -> E -> Y
D -> E <- T -> Y
E is a collider on the last path, D -> E <- T -> Y.
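The path list above can be checked mechanically. The following is a minimal sketch using `paths()` and `adjustmentSets()` from dagitty; the object names `dag_check` and `d_y_paths` are introduced here for illustration, and the graph is rebuilt so the block runs on its own.

```r
library(dagitty)

# Same graph as in this document, rebuilt so this block is self-contained
dag_check <- dagitty("dag {
  D -> Y
  X -> Y
  E -> Y
  D -> E
  T -> Y
  D -> T
  T -> E
}")

# Every path between D and Y, with a flag for whether it is open
# when nothing is conditioned on
d_y_paths <- paths(dag_check, from = "D", to = "Y")
data.frame(path = d_y_paths$paths, open = d_y_paths$open)

# Covariate sets that identify the total effect of D on Y
adjustmentSets(dag_check, exposure = "D", outcome = "Y", effect = "total")
```

In this graph T and E sit on causal paths from D to Y, so conditioning on either would remove part of the total effect of D rather than remove bias.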
$$Y = \beta_1 D + \beta_2 X + e$$
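The regression estimated later in this document adds a linear year term and state fixed effects to this equation. One way to write that fuller specification is sketched below; the indices $i$ (house), $s$ (state), and $t$ (year), and the symbols $\gamma$ and $\alpha_s$, are notation assumed here for illustration and do not appear in the original.

```latex
Y_{ist} = \beta_1 D_{st} + \beta_2' X_{i} + \gamma\, t + \alpha_s + e_{ist}
```

Here $Y_{ist}$ is the sale amount, $D_{st}$ is the state-year recreation value, $X_i$ collects the housing characteristics, and $\alpha_s$ is the state fixed effect.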
setwd("/Users/gracegao/Desktop/EEFE530/PS2")
recreationdata <- read.csv("Table.csv") ### data downloaded from https://www.bea.gov/
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks ggdag::filter(), stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
recreationdata$abbr <- state.abb[match(recreationdata$GeoName, state.name)]
colnames(recreationdata) <- gsub("^X", "", colnames(recreationdata))
recreationdata[9, "abbr"] <- "DC"
recreationdata <- subset(recreationdata, select = -GeoName)
recreation_long <- recreationdata %>%
pivot_longer(
cols = starts_with("20"),
names_to = "year",
values_to = "recreation"
)
names(housingdata)[names(housingdata) == "year_built"] <- "year"
housing_filtered <- housingdata %>%
filter(year >= 2012 & year <= 2023)
recreation_filtered <- recreation_long %>%
  mutate(year = as.integer(year)) %>% # convert before comparing, so the filter is numeric
  filter(year <= 2019)
mergeddata <- left_join(housing_filtered, recreation_filtered, by = c("abbr", "year"))
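Because the housing data is kept through 2023 while the recreation data stops at 2019, the left join leaves later sales without a recreation value. The sketch below shows one way to count such rows; the toy tables and their values are hypothetical stand-ins, not the real data.

```r
library(dplyr)

# Toy stand-ins for the housing and recreation tables (hypothetical values)
housing_toy <- data.frame(
  abbr = c("CA", "CA", "PA", "PA"),
  year = c(2012L, 2020L, 2012L, 2013L),
  sale_amount = c(500000, 650000, 200000, 210000)
)
recreation_toy <- data.frame(
  abbr = c("CA", "PA"),
  year = c(2012L, 2012L),
  recreation = c(9000000, 1500000)
)

joined_toy <- left_join(housing_toy, recreation_toy, by = c("abbr", "year"))

# Rows with no recreation match get NA; count them per year
joined_toy %>%
  group_by(year) %>%
  summarise(n_sales = n(), n_unmatched = sum(is.na(recreation)), .groups = "drop")

# anti_join lists the housing rows that found no match
unmatched_toy <- anti_join(housing_toy, recreation_toy, by = c("abbr", "year"))
```

Running the same two checks on `housing_filtered` and `recreation_filtered` would show how many sales carry a missing `recreation` value into later steps.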
library(ggplot2)
library(usmap)
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
##
## stamp
mean_recreation <- mergeddata %>%
group_by(abbr, year) %>%
summarize(mean_recreation = mean(recreation))
## `summarise()` has grouped output by 'abbr'. You can override using the
## `.groups` argument.
ggplot(mean_recreation, aes(x = year, y = mean_recreation, color = abbr)) +
geom_line() +
geom_point() +
labs(title = "Mean Recreation by State and Year", x = "Year", y = "Mean Recreation") +
theme_minimal()
states_fips <- usmap::fips_info()
mergeddata <- mergeddata %>%
left_join(states_fips, by = c("abbr"))
state_year_summary <- mergeddata %>%
  group_by(fips, year) %>%
  # name the summary column and compute it from `recreation`, one row per state-year
  summarise(mean_recreation = mean(recreation), .groups = "drop")
years = 2012:2019
plots <- list()
for (i in years) {
plot_state_map <- subset(state_year_summary, year == i)
plots[[as.character(i)]] = plot_usmap(data = plot_state_map, values = "mean_recreation", labels = FALSE) +
scale_fill_continuous(low = "lightpink", high = "red", name = "mean recreation economy ($)", labels = scales::comma) +
labs(title = as.character(i)) +
theme(legend.position = "right")
}
year_plot <- plot_grid(plotlist = plots, ncol = 4)
png(file = "MeanRecreationByYears.png", width = 1500, height = 400)
year_plot
dev.off()
## quartz_off_screen
## 2
rm(year_plot, years, i)
On average, the recreation economy increases over time across states. CA and FL have the highest mean recreation economy, while the majority of states have a recreation economy of less than $2.5 million.
library(ggplot2)
library(usmap)
library(cowplot)
mean_housing <- mergeddata %>%
group_by(abbr, year) %>%
summarize(mean_housing = mean(sale_amount))
## `summarise()` has grouped output by 'abbr'. You can override using the
## `.groups` argument.
ggplot(mean_housing, aes(x = year, y = mean_housing, color = abbr)) +
geom_line() +
geom_point() +
labs(title = "Mean Housing Price by State and Year", x = "Year", y = "Mean Housing Price") +
theme_minimal()
state_year_summary2 <- mergeddata %>%
  group_by(fips, year) %>%
  # name the summary column and compute it from `sale_amount`, one row per state-year
  summarise(mean_housing = mean(sale_amount), .groups = "drop")
years = 2012:2019
plots <- list()
for (i in years) {
plot_state_map <- subset(state_year_summary2, year == i)
plots[[as.character(i)]] = plot_usmap(data = plot_state_map, values = "mean_housing", labels = FALSE) +
scale_fill_continuous(low = "lightpink", high = "red", name = "mean housing prices ($)", labels = scales::comma) +
labs(title = as.character(i)) +
theme(legend.position = "right")
}
year_plot <- plot_grid(plotlist = plots, ncol = 4)
png(file = "MeanHousingPricesByYears.png", width = 1500, height = 400)
year_plot
dev.off()
## quartz_off_screen
## 2
rm(year_plot, years, i)
CA has significantly higher housing prices than all other states, and its prices rose sharply from 2012 to 2016. Other states' mean housing prices (except IA in some years) follow a flatter trend and are generally below $0.5 million.
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
vars <- c("sale_amount", "recreation", "year", "bedrooms_all_buildings", "number_of_bathrooms", "number_of_fireplaces", "stories_number", "land_square_footage", "AC_presence")
summary <- mergeddata %>%
select(all_of(vars)) %>%
summary()
stargazer(
mergeddata %>%
select(all_of(vars)),
type = "text",
digits = 2, # must be spelled in full: arguments after `...` are not partially matched
out = "summary.txt"
)
##
## ==================================================================================
## Statistic N Mean St. Dev. Min Max
## ----------------------------------------------------------------------------------
## sale_amount 154,245 503,406.200 1,089,720.000 100.000 135,635,876.000
## recreation 154,245 2,003,903.000 1,963,290.000 137,388 9,279,814
## year 154,245 2,015.345 2.148 2,012 2,019
## bedrooms_all_buildings 154,245 3.816 1.060 1 114
## number_of_bathrooms 154,245 3.049 1.170 0.250 205.000
## number_of_fireplaces 154,245 1.130 0.737 1 198
## stories_number 154,245 1.527 0.508 0.750 12.000
## land_square_footage 154,245 38,180.140 309,648.700 1 54,550,188
## AC_presence 154,245 0.638 0.481 0 1
## ----------------------------------------------------------------------------------
##
library(fixest)
library(kableExtra)
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
model <- feols(
  sale_amount ~ recreation + year + bedrooms_all_buildings + number_of_bathrooms +
    number_of_fireplaces + stories_number + land_square_footage + AC_presence | abbr,
  data = mergeddata,
  cluster = ~abbr # cluster standard errors by state
)
tab <- etable(model, se.below = TRUE)
tab %>%
kbl() %>%
kable_classic(font_size = 12)
| | model |
|---|---|
| Dependent Var.: | sale_amount |
| recreation | 0.0885 |
| | (0.0639) |
| year | 4,368.8 |
| | (6,972.4) |
| bedrooms_all_buildings | -20,461.2 |
| | (36,927.9) |
| number_of_bathrooms | 208,826.9 |
| | (117,675.9) |
| number_of_fireplaces | 45,540.1 |
| | (36,010.5) |
| stories_number | 6,589.3 |
| | (40,834.4) |
| land_square_footage | 0.0143 |
| | (0.0237) |
| AC_presence | 55,369.9** |
| | (14,663.7) |
| Fixed-Effects: | |
| abbr | Yes |
| S.E.: Clustered | by: abbr |
| Observations | 154,245 |
| R2 | 0.15187 |
| Within R2 | 0.05198 |
Based on the model with state fixed effects, a one-dollar increase in a state's outdoor recreation economy is associated with a $0.0885 increase in local housing prices. However, this relationship is not statistically significant at the 5% level, and the magnitude is small enough that it may not be economically meaningful. Only one housing characteristic, AC, has a statistically significant coefficient in this model: having AC in the house is associated with a sale price that is $55,369.9 higher on average. The R2 is small; omitted relevant variables likely contribute to the low explanatory power and possibly to the non-significant coefficients as well. In addition, the recreation economy might not be high on buyers' priority lists, as people generally care more about factors such as housing characteristics, safety, and school zone.
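The significance statement for `recreation` can be reproduced by hand from the table. The sketch below uses a normal approximation, which is an assumption made here for simplicity; the p-value reported by the fitted model may differ somewhat. The variable names are illustrative.

```r
# Point estimate and clustered standard error copied from the regression table
beta_hat <- 0.0885
se_hat <- 0.0639

# Test statistic for the null hypothesis that the coefficient is zero
t_stat <- beta_hat / se_hat

# Two-sided p-value and 95% interval under a normal approximation (assumption)
p_approx <- 2 * pnorm(-abs(t_stat))
ci_approx <- beta_hat + c(-1, 1) * qnorm(0.975) * se_hat

round(c(t = t_stat, p = p_approx, lower = ci_approx[1], upper = ci_approx[2]), 4)
```

The approximate interval includes zero, which is consistent with the coefficient not being significant at the 5% level.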