Module 9: Multiple Regression
Lab 9: Visualizing Economic Effects of Disasters
This lab examines 96 disaster affected Japanese municipalities over 7 years, from 2011 to 2017 (jp_matching_experiment.csv), totaling 672 city-year observations. This includes 53 coastal municipalities hit by the 2011 tsunami and 43 municipalities, just next door, damaged by the earthquake. How might this disaster have shaped these local economies? Let’s load in our data and get started.
0. Import Data
Load Data
In this dataset, each row is a city-year!
# Load Packages
library(tidyverse) # for data manipulation
library(viridis) # for colors
library(moderndive) # for easy model summaries
# Load Data
muni <- read_csv("jp_disaster_hit_cities.csv") %>%
# Tell R to treat year and pref as ordered categories
mutate(year = factor(year)) %>%
# Rescale our continuous predictors of interest into Z-scores
mutate(
income_per_capita = scale(income_per_capita),
damage_rate = scale(damage_rate),
pop_density = scale(pop_density),
unemployment = scale(unemployment),
relief_rate = scale(relief_rate),
pop_women = scale(pop_women),
pop_over_age_65 = scale(pop_over_age_65))View Data
# View first 3 rows of dataset
muni %>% head(3)| muni_code | muni | year | income_per_capita | damage_rate | pop_density | unemployment | relief_rate | pop_women | pop_over_age_65 |
|---|---|---|---|---|---|---|---|---|---|
| 04100 | Sendai | 2011 | 0.57 | 0.56 | 1.01 | 0.32 | 0.12 | 0.51 | -1.59 |
| 04202 | Ishinomaki | 2011 | -0.39 | 2.40 | -0.31 | 0.85 | 0.20 | 0.41 | 0.06 |
| 04203 | Shiogama | 2011 | -0.17 | 0.53 | 1.27 | 1.59 | -0.24 | 1.63 | 0.26 |
Codebook
In this dataset, our variables mean:
muni: municipality name.year: year of observation.
Outcome Variable
income_per_capita: income per capita in millions of yen per capita. Measure of economic recovery.
Explanatory Variable
damage_rate: rate of buildings damaged or destroyed by earthquake and tsunami, per million residents.
Control Variables
pop_density: population in 1000s of residents per square kilometerunemployment: unemployment rate per 1,000 residents.relief_rate: spending on disaster relief in 1000s of yen per capita.pop_women: % residents who are womenpop_over_age_65: % residents over age 65
In this lab, you’re going to make a model m that predicts income_per_capita for each disaster-hit municipality over time. Please test the effect of damage_rate, controlling for pop_density, unemployment, pop_women, pop_over_age_65, relief_rate, and year. Then, use get_regression_table() and save the output data.frame as tab.
# Run a regression model
m <- muni %>%
lm(formula = income_per_capita ~ damage_rate + pop_density +
unemployment + relief_rate + pop_women + pop_over_age_65 +
year)
# Get regression table results
tab <- m %>%
get_regression_table()# View Table
tab| term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| intercept | -0.133 | 0.048 | -2.777 | 0.006 | -0.226 | -0.039 |
| damage_rate | -0.071 | 0.023 | -3.068 | 0.002 | -0.117 | -0.026 |
| pop_density | 0.519 | 0.022 | 23.452 | 0.000 | 0.476 | 0.563 |
| unemployment | -0.085 | 0.019 | -4.574 | 0.000 | -0.121 | -0.048 |
| relief_rate | -0.189 | 0.023 | -8.162 | 0.000 | -0.234 | -0.143 |
| pop_women | -0.100 | 0.021 | -4.687 | 0.000 | -0.141 | -0.058 |
| pop_over_age_65 | -0.320 | 0.024 | -13.117 | 0.000 | -0.368 | -0.272 |
| year: 2012 | -0.146 | 0.068 | -2.158 | 0.031 | -0.279 | -0.013 |
| year: 2013 | 0.034 | 0.068 | 0.500 | 0.617 | -0.099 | 0.166 |
| year: 2014 | 0.092 | 0.067 | 1.366 | 0.172 | -0.040 | 0.225 |
| year: 2015 | 0.181 | 0.067 | 2.682 | 0.007 | 0.048 | 0.313 |
| year: 2016 | 0.318 | 0.067 | 4.718 | 0.000 | 0.186 | 0.451 |
| year: 2017 | 0.449 | 0.067 | 6.650 | 0.000 | 0.316 | 0.581 |
We can use the lower_ci and upper_ci in moderndive’s get_regression_table() function to build a 95% confidence interval around our estimates. In other words, the range where we are 95% sure the true effect (beta) of that variable lies.
tab %>%
ggplot(mapping = aes(x = term, y = estimate,
ymin = lower_ci, ymax = upper_ci)) +
geom_linerange() +
geom_point() +
# Usually need to flip x and y axes for these plots,
# otherwise they're pretty rough.
coord_flip()It’s pretty cool, but I think we can improve it! Upgrades needed:
Filtering
Better Variable Labels
Aesthetic Choices
Task 1. Filtering
The intercept has a different meaning than our beta coefficients. The intercept is the predicted income per capita if all our explanatory variables equal 0. But beta coefficients are the predicted increase in income per capita if an explanatory variable increases by 1, holding all else constant. Apples and oranges.
Question
filter() out the intercept from our tab dataframe.
Answer
tab1 <- tab %>%
filter(term != "intercept")# Tada! The Intercept row is gone.
tab1| term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| damage_rate | -0.071 | 0.023 | -3.068 | 0.002 | -0.117 | -0.026 |
| pop_density | 0.519 | 0.022 | 23.452 | 0.000 | 0.476 | 0.563 |
| unemployment | -0.085 | 0.019 | -4.574 | 0.000 | -0.121 | -0.048 |
| relief_rate | -0.189 | 0.023 | -8.162 | 0.000 | -0.234 | -0.143 |
| pop_women | -0.100 | 0.021 | -4.687 | 0.000 | -0.141 | -0.058 |
| pop_over_age_65 | -0.320 | 0.024 | -13.117 | 0.000 | -0.368 | -0.272 |
| year: 2012 | -0.146 | 0.068 | -2.158 | 0.031 | -0.279 | -0.013 |
| year: 2013 | 0.034 | 0.068 | 0.500 | 0.617 | -0.099 | 0.166 |
| year: 2014 | 0.092 | 0.067 | 1.366 | 0.172 | -0.040 | 0.225 |
| year: 2015 | 0.181 | 0.067 | 2.682 | 0.007 | 0.048 | 0.313 |
| year: 2016 | 0.318 | 0.067 | 4.718 | 0.000 | 0.186 | 0.451 |
| year: 2017 | 0.449 | 0.067 | 6.650 | 0.000 | 0.316 | 0.581 |
Also, we know that we can’t easily compare the units of beta coefficients. So, we should separate out our numeric and categorical predictors in this plot.
Question
filter() to just numeric variables using term, saving that as tab2. Now visualize it!
Answer
tab2 <- tab1 %>%
filter(term %in% c("damage_rate", "pop_density",
"unemployment", "relief_rate",
"pop_women", "pop_over_age_65"))tab2 %>%
ggplot(mapping = aes(x = term, y = estimate,
ymin = lower_ci, ymax = upper_ci)) +
geom_linerange() +
geom_point() +
coord_flip()Extra: How does comparing effect sizes work again? (optional)
That’s why, when we imported our data above, we rescaled all our numeric variables to be measured in Z-scores. This means that a beta coefficient now shows: as X increases by 1 standard deviation, income increases by BETA standard deviations. But categorical effects can’t be rescaled. The year 2012 (beta = -0.146) just means that compared to the baseline year (2011), we project that income changes by -0.146 standard deviations.
Task 2. Better Variable Labels
We also might want to give more presentable labels. Underscores (_) are helpful in R, but don’t look very professional in a presentation. We can use recode_factor() from the dplyr / tidyverse package for this. See the example below.
tab3 <- tab2 %>%
mutate(term = term %>% recode_factor(
# You can rearrange these rows to rearrange the order of your variables!
"pop_over_age_65" = "% Over Age 65",
"pop_women" = "% Women",
# Insert other terms here!
# FYI: ggplot displays them in the reverse order
))Question
recode_factor(), and save your data as tab3, then visualize the new data.frame! How does your plot change?
Answer
tab3 <- tab2 %>%
mutate(term = term %>% recode_factor(
# You can rearrange these rows to rearrange the order of your variables!
"pop_over_age_65" = "% Over Age 65",
"pop_women" = "% Women",
"relief_rate" = "Relief Spending",
"unemployment" = "Unemployment",
"pop_density" = "Pop Density",
# FYI: ggplot displays them in the reverse order
"damage_rate" = "Damage"))tab3 %>%
ggplot(mapping = aes(x = term, y = estimate,
ymin = lower_ci, ymax = upper_ci)) +
geom_linerange() +
geom_point() +
coord_flip()
Task 3. Aesthetic Choices
Finally, several adjustments could make a big difference for this chart. These include good axis labels, colors, themes, and, most importantly, a visual benchmark that reminds the reader where 0 (no effect) is on the chart. Our visual benchmark can be added using: geom_hline(yintercept = 0, linetype = "dashed", color = "black").
Question
Answer
tab3 %>%
ggplot(mapping = aes(x = term, y = estimate,
ymin = lower_ci, ymax = upper_ci,
color = estimate)) +
geom_linerange(size = 2) +
geom_point(size = 5) +
coord_flip() +
# Add a visual benchmark - a dashed line at zero
geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
# Add a color scheme
scale_color_viridis(option = "plasma", begin = 0.2, end = 0.8) +
# Add labels
labs(x = "Variable",
y = "Standardized Effect on Income per capita
with 95% Confidence Intervals",
color = "Beta") +
# theme
theme_bw()You’re all done! These dot-and-whisker plots are very effective at conveying regression results quickly. We can see that the effect of disaster damage on these towns’ future income per capita is quite strong, negative, and significant. We know that it is significant because the 95% confidence interval never crosses zero.