Problem 1

i

Research Question: How does a state’s outdoor recreation economy affect housing prices in the U.S.?

Outdoor recreation is a major source of entertainment that improves people’s quality of life and well-being. This positive perception can raise the value of nearby housing. If so, local governments should take the economic benefits of recreational activities into account when making investment decisions.

ii

library(dagitty)
library(ggdag)
## 
## Attaching package: 'ggdag'
## The following object is masked from 'package:stats':
## 
##     filter
dag <- dagitty("dag {
  D -> Y
  X -> Y
  E -> Y
  D -> E
  T -> Y
  D -> T
  T -> E
}")

ggdag(dag) +
  theme_dag()

Y: housing price
D: outdoor recreation economy
X: housing characteristics
T: tourism
E: employment opportunities

Housing characteristics and the outdoor recreation economy affect housing prices directly, but other variables are also involved. Outdoor recreational facilities attract tourists and create jobs, and tourism in turn creates jobs, so the outdoor recreation economy also influences housing prices indirectly through tourism and employment opportunities.

iii

D -> Y
D -> T -> Y
D -> E -> Y
D -> T -> E -> Y
D -> E <- T -> Y

iv

E is a collider on the path D -> E <- T -> Y, because both D and T point into it.
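The path list and the collider claim can be checked mechanically. The sketch below re-creates the DAG from part ii so that it runs on its own, and uses dagitty's `paths()` function, which reports for each path whether it is open given a conditioning set (empty by default). The object names `p` and `p_E` are introduced here for illustration.

```r
library(dagitty)

# Same DAG as in part ii, re-created so this block is self-contained
dag <- dagitty("dag {
  D -> Y
  X -> Y
  E -> Y
  D -> E
  T -> Y
  D -> T
  T -> E
}")

# All paths between D and Y, flagged open/closed with nothing conditioned on
p <- paths(dag, from = "D", to = "Y")
data.frame(path = p$paths, open = p$open)

# Conditioning on E blocks the paths that run through E as a mediator
# and opens the collider path D -> E <- T -> Y
p_E <- paths(dag, from = "D", to = "Y", Z = "E")
data.frame(path = p_E$paths, open = p_E$open)
```

With an empty conditioning set, the only closed path should be the one on which E is a collider.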

v

$Y = \beta_1 D + \beta_2 X + e$
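Because T and E sit on paths from D to Y as mediators, a regression of Y on D and X alone targets the total effect of D, while adding T and E would isolate only the direct effect. The simulation below is a minimal sketch of that distinction. The linear data-generating process and all coefficient values are made up for illustration and are not estimates from the housing data.

```r
set.seed(530)
n <- 10000

# Hypothetical linear data-generating process that follows the DAG.
# Tourism is named Tr to avoid masking R's shorthand T for TRUE.
D  <- rnorm(n)                          # outdoor recreation economy
X  <- rnorm(n)                          # housing characteristics
Tr <- 0.5 * D + rnorm(n)                # tourism (T in the DAG)
E  <- 0.4 * D + 0.3 * Tr + rnorm(n)     # employment opportunities
Y  <- 1.0 * D + 2.0 * X + 0.8 * Tr + 0.6 * E + rnorm(n)

# Total effect implied by the made-up coefficients:
# direct + via T + via E + via T then E
total_effect <- 1.0 + 0.8 * 0.5 + 0.6 * 0.4 + 0.6 * 0.3 * 0.5

fit_total  <- lm(Y ~ D + X)             # specification from part v
fit_direct <- lm(Y ~ D + X + Tr + E)    # also conditions on the mediators

c(implied_total = total_effect,
  est_total     = unname(coef(fit_total)["D"]),
  est_direct    = unname(coef(fit_direct)["D"]))
```

In this setup the first regression should recover a value close to the implied total effect, and the second a value close to the direct coefficient of 1.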

Problem 2

i

setwd("/Users/gracegao/Desktop/EEFE530/PS2")
recreationdata <- read.csv("Table.csv") ### data downloaded from https://www.bea.gov/

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks ggdag::filter(), stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)

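# Map full state names to postal abbreviations (state.name does not include DC)
recreationdata$abbr <- state.abb[match(recreationdata$GeoName, state.name)]
# Drop the "X" prefix that read.csv() adds to the year columns ("X2012" -> "2012")
colnames(recreationdata) <- gsub("^X", "", colnames(recreationdata))
# Row 9 (District of Columbia) is left as NA by the lookup, so set it by hand
recreationdata[9, "abbr"] <- "DC"
recreationdata <- subset(recreationdata, select = -GeoName)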

recreation_long <- recreationdata %>%
  pivot_longer(
    cols = starts_with("20"),
    names_to = "year",
    values_to = "recreation"
  )

# housingdata must already be in the workspace; the step that loads it is not shown here
names(housingdata)[names(housingdata) == "year_built"] <- "year"
housing_filtered <- housingdata %>%
  filter(year >= 2012 & year <= 2023)

# Convert year to integer before filtering so the comparison is numeric, not string-based
recreation_filtered <- recreation_long %>%
  mutate(year = as.integer(year)) %>%
  filter(year <= 2019)

mergeddata <- left_join(housing_filtered, recreation_filtered, by = c("abbr", "year"))
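
The reshape-and-merge steps can be checked on a toy example. The sketch below uses invented values and a hypothetical two-state table in the same wide layout (one `X<year>` column per year, as `read.csv()` produces), then counts the housing rows that find no recreation match after the left join.

```r
library(dplyr)
library(tidyr)

# Toy wide table in the same layout as the downloaded data; values are invented
rec_wide <- data.frame(
  abbr  = c("CA", "IA"),
  X2018 = c(100, 10),
  X2019 = c(110, 11)
)

rec_long <- rec_wide %>%
  pivot_longer(
    cols = starts_with("X20"),
    names_to = "year",
    names_prefix = "X",
    values_to = "recreation"
  ) %>%
  mutate(year = as.integer(year))   # join keys must have the same type

# Toy housing records; the 2020 sale has no recreation counterpart
housing <- data.frame(
  abbr = c("CA", "CA", "IA"),
  year = c(2018L, 2020L, 2019L),
  sale_amount = c(500000, 650000, 150000)
)

merged <- left_join(housing, rec_long, by = c("abbr", "year"))

# Rows kept by the left join but lacking a recreation value
unmatched <- sum(is.na(merged$recreation))
unmatched
```

Running the same `sum(is.na(...))` check on `mergeddata$recreation` would show whether any housing records fall outside the years or states covered by the recreation table.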

ii

library(ggplot2)
library(usmap)
library(cowplot)
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
## 
##     stamp
# Mean recreation value per state and year
mean_recreation <- mergeddata %>%
  group_by(abbr, year) %>%
  summarize(mean_recreation = mean(recreation), .groups = "drop")
ggplot(mean_recreation, aes(x = year, y = mean_recreation, color = abbr)) +
  geom_line() +
  geom_point() +
  labs(title = "Mean Recreation by State and Year", x = "Year", y = "Mean Recreation") +
  theme_minimal()

states_fips <- usmap::fips_info()

mergeddata <- mergeddata %>%
  left_join(states_fips, by = c("abbr"))
# Compute the state-year mean inside summarise(); the bare name mean_recreation
# would pick up the data frame defined above rather than a column of mergeddata
state_year_summary <- mergeddata %>%
  group_by(fips, year) %>%
  summarise(mean_recreation = mean(recreation), .groups = "drop")
years <- 2012:2019
plots <- list()
# One map per year, stored under the year as its list name
for (i in years) {
  plot_state_map <- subset(state_year_summary, year == i)
  plots[[as.character(i)]] <- plot_usmap(data = plot_state_map, values = "mean_recreation", labels = FALSE) +
    scale_fill_continuous(low = "lightpink", high = "red", name = "mean recreation economy ($)", labels = scales::comma) +
    labs(title = as.character(i)) +
    theme(legend.position = "right")
}

year_plot <- plot_grid(plotlist = plots, ncol = 4)
png(file = "MeanRecreationByYears.png", width = 1500, height = 400)
year_plot
dev.off()
## quartz_off_screen 
##                 2
rm(year_plot, years, i)

On average, states’ recreation economies increase over time. CA and FL have the highest mean recreation economy among all states, while the majority have a recreation economy of less than $2.5 million.
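
The upward trend can also be checked numerically rather than read off the plot. The sketch below assumes a data frame shaped like `mean_recreation` (columns `abbr`, `year`, `mean_recreation`); the toy values and the object names `toy` and `growth` are invented for illustration.

```r
# Invented state-year means in the same shape as mean_recreation
toy <- data.frame(
  abbr = rep(c("CA", "FL", "IA"), each = 3),
  year = rep(c(2012L, 2015L, 2019L), times = 3),
  mean_recreation = c(700, 800, 900, 500, 560, 600, 100, 95, 110)
)

# Percent change from each state's first to last observed year
growth <- sapply(split(toy, toy$abbr), function(d) {
  d <- d[order(d$year), ]
  100 * (d$mean_recreation[nrow(d)] / d$mean_recreation[1] - 1)
})

round(growth, 1)

# Share of states whose value rose over the period
mean(growth > 0)
```

Replacing `toy` with the real `mean_recreation` table would give the share of states whose recreation economy actually grew between 2012 and 2019.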

iii

library(ggplot2)
library(usmap)
library(cowplot)

# Mean sale price per state and year
mean_housing <- mergeddata %>%
  group_by(abbr, year) %>%
  summarize(mean_housing = mean(sale_amount), .groups = "drop")
ggplot(mean_housing, aes(x = year, y = mean_housing, color = abbr)) +
  geom_line() +
  geom_point() +
  labs(title = "Mean Housing Price by State and Year", x = "Year", y = "Mean Housing Price") +
  theme_minimal()

# Compute the state-year mean inside summarise(); the bare name mean_housing
# would pick up the data frame defined above rather than a column of mergeddata
state_year_summary2 <- mergeddata %>%
  group_by(fips, year) %>%
  summarise(mean_housing = mean(sale_amount), .groups = "drop")
years <- 2012:2019
plots <- list()
# One map per year, stored under the year as its list name
for (i in years) {
  plot_state_map <- subset(state_year_summary2, year == i)
  plots[[as.character(i)]] <- plot_usmap(data = plot_state_map, values = "mean_housing", labels = FALSE) +
    scale_fill_continuous(low = "lightpink", high = "red", name = "mean housing prices ($)", labels = scales::comma) +
    labs(title = as.character(i)) +
    theme(legend.position = "right")
}

year_plot <- plot_grid(plotlist = plots, ncol = 4)
png(file = "MeanHousingPricesByYears.png", width = 1500, height = 400)
year_plot
dev.off()
## quartz_off_screen 
##                 2
rm(year_plot, years, i)

CA has significantly higher housing prices than all other states, and its mean price rose sharply from 2012 to 2016. Other states’ mean housing prices (except IA in some years) follow a flatter trend and are generally below $0.5 million.

iv

library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
vars <- c("sale_amount", "recreation", "year", "bedrooms_all_buildings", "number_of_bathrooms", "number_of_fireplaces", "stories_number", "land_square_footage", "AC_presence")

summary <- mergeddata %>%
  select(all_of(vars)) %>%
  summary()

stargazer(
  mergeddata %>%
  select(all_of(vars)),
  type = "text",
  digits = 2,
  out = "summary.txt"
)
## 
## ==================================================================================
## Statistic                 N        Mean        St. Dev.      Min         Max      
## ----------------------------------------------------------------------------------
## sale_amount            154,245  503,406.200  1,089,720.000 100.000 135,635,876.000
## recreation             154,245 2,003,903.000 1,963,290.000 137,388    9,279,814   
## year                   154,245   2,015.345       2.148      2,012       2,019     
## bedrooms_all_buildings 154,245     3.816         1.060        1          114      
## number_of_bathrooms    154,245     3.049         1.170      0.250      205.000    
## number_of_fireplaces   154,245     1.130         0.737        1          198      
## stories_number         154,245     1.527         0.508      0.750      12.000     
## land_square_footage    154,245  38,180.140    309,648.700     1      54,550,188   
## AC_presence            154,245     0.638         0.481        0           1       
## ----------------------------------------------------------------------------------

v

library(fixest)
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
# State fixed effects (| abbr) with standard errors clustered by state
model <- feols(
  sale_amount ~ recreation + year + bedrooms_all_buildings + number_of_bathrooms +
    number_of_fireplaces + stories_number + land_square_footage + AC_presence | abbr,
  data = mergeddata,
  cluster = ~abbr
)

tab <- etable(model, se.below = TRUE)
tab %>% 
  kbl() %>%
  kable_classic(font_size = 12)
                                 model
Dependent Var.:            sale_amount

recreation                      0.0885
                              (0.0639)
year                           4,368.8
                             (6,972.4)
bedrooms_all_buildings       -20,461.2
                            (36,927.9)
number_of_bathrooms          208,826.9
                           (117,675.9)
number_of_fireplaces          45,540.1
                            (36,010.5)
stories_number                 6,589.3
                            (40,834.4)
land_square_footage             0.0143
                              (0.0237)
AC_presence                 55,369.9**
                            (14,663.7)
Fixed-Effects:          --------------
abbr                               Yes
______________________  ______________
S.E.: Clustered               by: abbr
Observations                   154,245
R2                             0.15187
Within R2                      0.05198
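
The `| abbr` term in the `feols()` call adds state fixed effects, so the coefficients are identified from variation within states. As a sketch of what that means, the base-R simulation below (invented data, hypothetical variable names) shows that including state dummies gives the same slope as regressing state-demeaned y on state-demeaned x, while pooled OLS is biased when x is correlated with the state effect.

```r
set.seed(1)
n_states <- 10
n_per    <- 50

state <- rep(seq_len(n_states), each = n_per)
alpha <- rnorm(n_states, sd = 5)[state]          # state-specific intercepts
x     <- 0.5 * alpha + rnorm(n_states * n_per)   # x correlated with the state effect
y     <- 2 * x + alpha + rnorm(n_states * n_per) # true slope is 2

# (a) Pooled OLS ignores the state effect
b_pooled <- coef(lm(y ~ x))["x"]

# (b) State dummies (least-squares dummy variables)
b_lsdv <- coef(lm(y ~ x + factor(state)))["x"]

# (c) Within transformation: subtract state means from y and x
y_dm <- y - ave(y, state)
x_dm <- x - ave(x, state)
b_within <- coef(lm(y_dm ~ x_dm - 1))["x_dm"]

c(pooled = unname(b_pooled), lsdv = unname(b_lsdv), within = unname(b_within))
```

Estimates (b) and (c) agree up to numerical precision, which is why the within R2 reported above measures fit after the state means have been removed.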

Based on the model with state fixed effects, a one-dollar increase in a state’s outdoor recreation economy is associated with a $0.0885 increase in housing sale prices. However, this relationship is not statistically significant at the 5% level, and the per-dollar magnitude is small, so it may not be economically meaningful. Only one housing characteristic, AC presence, has a statistically significant coefficient: houses with AC sell for about $55,369.9 more on average, holding the other covariates fixed. The R2 is low (0.152 overall, 0.052 within states), which suggests that relevant variables are omitted; this may also contribute to the insignificant coefficients. In addition, the recreation economy might not be high on buyers’ priority lists, as people generally care more about factors such as housing characteristics, safety, and school zones.
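
The significance and magnitude statements can be reproduced from the reported numbers. The sketch below uses only values copied from the regression and summary-statistics tables; using a one-standard-deviation change in `recreation` as the yardstick is an illustrative assumption, not part of the original analysis.

```r
# Values copied from the regression and summary-statistics tables
beta_rec  <- 0.0885      # coefficient on recreation
se_rec    <- 0.0639      # clustered standard error
sd_rec    <- 1963290     # standard deviation of recreation
mean_sale <- 503406.2    # mean sale_amount

# t-statistic; absolute values below roughly 2 are not significant at the 5% level
t_rec <- beta_rec / se_rec

# Implied change in sale price for a one-SD change in recreation
effect_1sd <- beta_rec * sd_rec

c(t_stat = t_rec,
  effect_1sd = effect_1sd,
  share_of_mean_price = effect_1sd / mean_sale)
```

The t-statistic of about 1.39 matches the lack of significance noted above. Scaled to a one-SD change, the point estimate is larger than the per-dollar figure suggests, although the wide standard error means it is imprecisely estimated.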