Module 9: Multiple Regression

Lab 9: Visualizing Economic Effects of Disasters

Click here to run the code for this lab yourself on RStudio.Cloud!

This lab examines 96 disaster affected Japanese municipalities over 7 years, from 2011 to 2017 (jp_matching_experiment.csv), totaling 672 city-year observations. This includes 53 coastal municipalities hit by the 2011 tsunami and 43 municipalities, just next door, damaged by the earthquake. How might this disaster have shaped these local economies? Let’s load in our data and get started.

0. Import Data

Load Data

In this dataset, each row is a city-year!

# Load Packages
library(tidyverse) # for data manipulation
library(viridis) # for colors
library(moderndive) # for easy model summaries

# Load Data
muni <- read_csv("jp_disaster_hit_cities.csv") %>%
  # Tell R to treat year and pref as ordered categories
  mutate(year = factor(year)) %>%
  # Rescale our continuous predictors of interest into Z-scores
  mutate(
    income_per_capita = scale(income_per_capita),
    damage_rate = scale(damage_rate),
    pop_density = scale(pop_density),
    unemployment = scale(unemployment),
    relief_rate = scale(relief_rate),
    pop_women = scale(pop_women), 
    pop_over_age_65 = scale(pop_over_age_65))

View Data

# View first 3 rows of dataset
muni %>% head(3)

muni_code	muni	year	income_per_capita	damage_rate	pop_density	unemployment	relief_rate	pop_women	pop_over_age_65
04100	Sendai	2011	0.57	0.56	1.01	0.32	0.12	0.51	-1.59
04202	Ishinomaki	2011	-0.39	2.40	-0.31	0.85	0.20	0.41	0.06
04203	Shiogama	2011	-0.17	0.53	1.27	1.59	-0.24	1.63	0.26

Codebook

In this dataset, our variables mean:

muni: municipality name.
year: year of observation.

Outcome Variable

income_per_capita: income per capita in millions of yen per capita. Measure of economic recovery.

Explanatory Variable

damage_rate: rate of buildings damaged or destroyed by earthquake and tsunami, per million residents.

Control Variables

pop_density: population in 1000s of residents per square kilometer
unemployment: unemployment rate per 1,000 residents.
relief_rate: spending on disaster relief in 1000s of yen per capita.
pop_women: % residents who are women
pop_over_age_65: % residents over age 65

Note: Our numeric variables have all been rescaled, and are written in units of standard deviations from the mean (Z-scores).

In this lab, you’re going to make a model m that predicts income_per_capita for each disaster-hit municipality over time. Please test the effect of damage_rate, controlling for pop_density, unemployment, pop_women, pop_over_age_65, relief_rate, and year. Then, use get_regression_table() and save the output data.frame as tab.

# Run a regression model
m <- muni %>%
  lm(formula = income_per_capita ~ damage_rate + pop_density + 
       unemployment + relief_rate + pop_women + pop_over_age_65 +
       year)

# Get regression table results
tab <- m %>%
  get_regression_table()

# View Table
tab

term	estimate	std_error	statistic	p_value	lower_ci	upper_ci
intercept	-0.133	0.048	-2.777	0.006	-0.226	-0.039
damage_rate	-0.071	0.023	-3.068	0.002	-0.117	-0.026
pop_density	0.519	0.022	23.452	0.000	0.476	0.563
unemployment	-0.085	0.019	-4.574	0.000	-0.121	-0.048
relief_rate	-0.189	0.023	-8.162	0.000	-0.234	-0.143
pop_women	-0.100	0.021	-4.687	0.000	-0.141	-0.058
pop_over_age_65	-0.320	0.024	-13.117	0.000	-0.368	-0.272
year: 2012	-0.146	0.068	-2.158	0.031	-0.279	-0.013
year: 2013	0.034	0.068	0.500	0.617	-0.099	0.166
year: 2014	0.092	0.067	1.366	0.172	-0.040	0.225
year: 2015	0.181	0.067	2.682	0.007	0.048	0.313
year: 2016	0.318	0.067	4.718	0.000	0.186	0.451
year: 2017	0.449	0.067	6.650	0.000	0.316	0.581

We can use the lower_ci and upper_ci in moderndive’s get_regression_table() function to build a 95% confidence interval around our estimates. In other words, the range where we are 95% sure the true effect (beta) of that variable lies.

tab %>%
  ggplot(mapping = aes(x = term, y = estimate, 
                       ymin = lower_ci, ymax = upper_ci)) +
  geom_linerange() +
  geom_point() +
  # Usually need to flip x and y axes for these plots,
  # otherwise they're pretty rough.
  coord_flip()

It’s pretty cool, but I think we can improve it! Upgrades needed:

Filtering
Better Variable Labels
Aesthetic Choices

Task 1. Filtering

The intercept has a different meaning than our beta coefficients. The intercept is the predicted income per capita if all our explanatory variables equal 0. But beta coefficients are the predicted increase in income per capita if an explanatory variable increases by 1, holding all else constant. Apples and oranges.

Question

Let’s filter() out the intercept from our tab dataframe.

Answer

tab1 <- tab %>%
  filter(term != "intercept")

# Tada! The Intercept row is gone.
tab1

term	estimate	std_error	statistic	p_value	lower_ci	upper_ci
damage_rate	-0.071	0.023	-3.068	0.002	-0.117	-0.026
pop_density	0.519	0.022	23.452	0.000	0.476	0.563
unemployment	-0.085	0.019	-4.574	0.000	-0.121	-0.048
relief_rate	-0.189	0.023	-8.162	0.000	-0.234	-0.143
pop_women	-0.100	0.021	-4.687	0.000	-0.141	-0.058
pop_over_age_65	-0.320	0.024	-13.117	0.000	-0.368	-0.272
year: 2012	-0.146	0.068	-2.158	0.031	-0.279	-0.013
year: 2013	0.034	0.068	0.500	0.617	-0.099	0.166
year: 2014	0.092	0.067	1.366	0.172	-0.040	0.225
year: 2015	0.181	0.067	2.682	0.007	0.048	0.313
year: 2016	0.318	0.067	4.718	0.000	0.186	0.451
year: 2017	0.449	0.067	6.650	0.000	0.316	0.581

Also, we know that we can’t easily compare the units of beta coefficients. So, we should separate out our numeric and categorical predictors in this plot.

Question

filter() to just numeric variables using term, saving that as tab2. Now visualize it!

Answer

tab2 <- tab1 %>%
  filter(term %in% c("damage_rate", "pop_density",
                     "unemployment", "relief_rate", 
                     "pop_women", "pop_over_age_65"))

tab2 %>%
  ggplot(mapping = aes(x = term, y = estimate, 
                       ymin = lower_ci, ymax = upper_ci)) +
  geom_linerange() +
  geom_point() +
  coord_flip()

Extra: How does comparing effect sizes work again? (optional)

That’s why, when we imported our data above, we rescaled all our numeric variables to be measured in Z-scores. This means that a beta coefficient now shows: as X increases by 1 standard deviation, income increases by BETA standard deviations. But categorical effects can’t be rescaled. The year 2012 (beta = -0.146) just means that compared to the baseline year (2011), we project that income changes by -0.146 standard deviations.

Task 2. Better Variable Labels

We also might want to give more presentable labels. Underscores (_) are helpful in R, but don’t look very professional in a presentation. We can use recode_factor() from the dplyr / tidyverse package for this. See the example below.

tab3 <- tab2 %>%
  mutate(term = term %>% recode_factor(
     # You can rearrange these rows to rearrange the order of your variables!
    "pop_over_age_65" = "% Over Age 65",
    "pop_women" = "% Women",
    # Insert other terms here!
    # FYI: ggplot displays them in the reverse order
    ))

Question

Rename the variable labels using recode_factor(), and save your data as tab3, then visualize the new data.frame! How does your plot change?

Answer

tab3 <- tab2 %>%
  mutate(term = term %>% recode_factor(
    # You can rearrange these rows to rearrange the order of your variables!
    "pop_over_age_65" = "% Over Age 65",
    "pop_women" = "% Women",
    "relief_rate" = "Relief Spending",
    "unemployment" = "Unemployment",
    "pop_density" = "Pop Density",
    # FYI: ggplot displays them in the reverse order
    "damage_rate" = "Damage"))

tab3 %>%
  ggplot(mapping = aes(x = term, y = estimate, 
                       ymin = lower_ci, ymax = upper_ci)) +
  geom_linerange() +
  geom_point() +
  coord_flip()

Task 3. Aesthetic Choices

Finally, several adjustments could make a big difference for this chart. These include good axis labels, colors, themes, and, most importantly, a visual benchmark that reminds the reader where 0 (no effect) is on the chart. Our visual benchmark can be added using: geom_hline(yintercept = 0, linetype = "dashed", color = "black").

Question

Can you remake the visual below?

Answer

tab3 %>%
  ggplot(mapping = aes(x = term, y = estimate, 
                       ymin = lower_ci, ymax = upper_ci,
                       color = estimate)) +
  geom_linerange(size = 2) +
  geom_point(size = 5) +
  coord_flip() +
  # Add a visual benchmark - a dashed line at zero
  geom_hline(yintercept = 0, linetype = "dashed", color = "black") +
  # Add a color scheme
  scale_color_viridis(option = "plasma", begin = 0.2, end = 0.8) +
  # Add labels
  labs(x = "Variable",
       y = "Standardized Effect on Income per capita
with 95% Confidence Intervals",
       color = "Beta") +
  # theme
  theme_bw()

You’re all done! These dot-and-whisker plots are very effective at conveying regression results quickly. We can see that the effect of disaster damage on these towns’ future income per capita is quite strong, negative, and significant. We know that it is significant because the 95% confidence interval never crosses zero.