This document summarizes steps and code for collaborating on our analysis of disaster recovery committees in New York City neighborhoods after Hurricane Sandy. Please follow along below to get a sense of the dataset and analysis strategies!
0. Setup
0.1. Load Packages
You’ll need the following packages:
library(tidyverse) # data wrangling
library(broom) # tidying model output
library(moderndive) # for familiar functions
library(viridis) # color palettes
library(GGally) # correlation matrices
library(texreg) # making tables
library(lmtest) # for hypothesis testing
library(simulate) # for simulation
0.2. Load Dataset
You’ll be working with this dataset (raw_data/co_dataset.rds). It describes disaster recovery committees in New York City neighborhoods, where each row represents a committee active after Hurricane Sandy.
# Import data
codat <- read_rds("raw_data/co_dataset.rds") %>%
  select(id:health_care)
# Let's check out its contents
codat %>% glimpse()
## Rows: 47
## Columns: 23
## $ id <dbl> 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ committee <chr> "Gravesend and Bensonhurst", "Broad Channel", …
## $ region <chr> "New York City", "New York City", "New York Ci…
## $ ihard <dbl> 0.07531114, 0.05871634, 0.08314851, 0.09293561…
## $ isoft <dbl> 0.03486258, 0.03203687, 0.04511576, 0.04141895…
## $ isc <dbl> 0.008408815, 0.009475107, 0.009246055, 0.00767…
## $ iv <dbl> 0.02481181, 0.02483757, 0.02189349, 0.02353204…
## $ ih <dbl> 0.03468467, 0.03828717, 0.04234415, 0.02768607…
## $ ie <dbl> 0.02583709, 0.01693471, 0.01791286, 0.02460168…
## $ ilocal <dbl> 0.011079070, 0.005563093, 0.007801588, 0.01120…
## $ inonlocal <dbl> 0.002919036, 0.004145049, 0.004208752, 0.00429…
## $ idis <dbl> 0.03134647, 0.04884068, 0.04285741, 0.02614388…
## $ women <dbl> 33.33333, 18.18182, 33.33333, 50.00000, 24.137…
## $ business <dbl> 22.22222, 36.36364, 11.11111, 0.00000, 20.6896…
## $ social_org <dbl> 66.666667, 36.363636, 55.555556, 100.000000, 6…
## $ religious_org <dbl> 0.000000, 27.272727, 11.111111, 30.000000, 3.4…
## $ community_participation <dbl> 11.111111, 9.090909, 11.111111, 10.000000, 3.4…
## $ govt <dbl> 0.000000, 0.000000, 0.000000, 0.000000, 6.8965…
## $ emergency <dbl> 0.000000, 18.181818, 11.111111, 10.000000, 3.4…
## $ expert <dbl> 11.111111, 0.000000, 11.111111, 10.000000, 3.4…
## $ influential_citizen <dbl> 22.222222, 18.181818, 0.000000, 10.000000, 3.4…
## $ elected_official <dbl> 0.000000, 0.000000, 0.000000, 0.000000, 0.0000…
## $ health_care <dbl> 0.000000, 0.000000, 0.000000, 10.000000, 6.896…
0.3. Codebook
What do these variables mean?
- id: unique id number per committee
- committee: name of neighborhoods represented
- region: region of New York represented
Below is a list of weighted word frequency indices. Each gives the share (a proportion from 0 to 1) of total words in a committee’s recovery plan that reference a given concept.
- ihard: % about ‘hard’ policy tools (infrastructure), like bridges, seawalls, potholes, etc.
- isoft: % about ‘soft’ policy tools (community development), like social assistance.
- isc: % about social capital concepts, like trust, neighbors, friends, etc.
- iv: % about social vulnerability, like poverty, inequity, etc.
- ih: % about housing policy, like rent.
- ie: % about economic policy.
- ilocal: % about local actors, like neighborhoods, city council, etc.
- inonlocal: % about nonlocal actors, like state representatives, mayor, etc.
- idis: % about academic disaster resilience concepts, signifying expertise (e.g., resilience, recovery, mitigation, adaptation).
Below is a list of committee traits that might affect the frequency of concepts in these recovery plans.
- women: % of committee members who are women.
- business: % of committee members representing businesses.
- social_org: % of members representing social organizations (nonprofits, community groups, neighborhood associations, etc.).
- religious_org: % of members representing religious groups.
- community_participation: % of members representing social OR religious groups.
- govt: % of members representing local, state, or federal government.
- expert: % of members on committee for their expertise in an area (e.g., engineering, construction, or academics).
- influential_citizen: % of members who are otherwise an influential local citizen, like an author, activist, community organizer, etc.
- elected_official: % of members on committee who are elected officials, at any level of government.
- health_care: % of members on committee representing health care systems, clinics, etc.
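The codebook implies two scales: the word frequency indices are proportions between 0 and 1, while the membership variables are percentages between 0 and 100. A minimal base R sketch of a range check follows. It uses a small made-up data frame (`toy`, with invented values) in place of `codat`, and the helper `in_range()` is a hypothetical name introduced only for this illustration.

```r
# Hypothetical stand-in for codat; these values are invented for illustration
toy <- data.frame(
  ihard    = c(0.075, 0.059, 0.083),
  isoft    = c(0.035, 0.032, 0.045),
  women    = c(33.3, 18.2, 50.0),
  business = c(22.2, 36.4, 0.0)
)

index_vars  <- c("ihard", "isoft")     # proportions, expected in [0, 1]
member_vars <- c("women", "business")  # percentages, expected in [0, 100]

# TRUE if every non-missing value in the chosen columns lies within [lo, hi]
in_range <- function(data, vars, lo, hi) {
  values <- unlist(data[vars])
  all(values >= lo & values <= hi, na.rm = TRUE)
}

index_ok  <- in_range(toy, index_vars, 0, 1)
member_ok <- in_range(toy, member_vars, 0, 100)
print(c(index_ok = index_ok, member_ok = member_ok))
```

To run the same check on the real data, swap `toy` for `codat` and extend the two name vectors to cover all of the index and membership columns.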
0.4. Summary of Tasks
In this project, we want to do 3 things:
- Describe types of recovery plans that committees wrote (e.g., how much each was oriented towards soft, hard, social capital, vulnerability-focused policy, etc.).
- Describe membership patterns on committees (e.g., whose interests were most frequently represented by membership?).
- Describe the relationship between membership patterns and types of recovery plans committees developed.
We will use the following resources to do so:
- RStudio Cloud Project (sandy)
- Overleaf Manuscript (sandy)
- Google Sheets (edgelist_sandy_2022)
To get you started, I suggest the following steps:
1. Descriptives
# Import data
codat <- read_rds("raw_data/co_dataset.rds")
codat %>%
  select(`Hard` = ihard,
         `Soft` = isoft,
         `Social Capital` = isc,
         `Vulnerability` = iv,
         `Housing` = ih,
         `Economy` = ie,
         `Local` = ilocal,
         `Non-Local` = inonlocal,
         `Disaster Expert` = idis) %>%
  ggcorr(low = "red", mid = "white", high = "blue", label = TRUE)
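The correlation matrix shows how the indices move together. A table of each index's mean and range speaks more directly to the first task (describing the types of plans committees wrote). Here is a rough base R sketch on simulated stand-in data: the `toy` values are random draws, not real results, and `summarize_index()` is a hypothetical helper name.

```r
set.seed(1)
# Simulated stand-in for codat: 47 committees, 3 of the 9 indices
toy <- data.frame(
  ihard = runif(47, 0.05, 0.10),
  isoft = runif(47, 0.03, 0.05),
  isc   = runif(47, 0.005, 0.010)
)

# One row per index: mean, standard deviation, minimum, and maximum
summarize_index <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}
index_summary <- t(sapply(toy, summarize_index))

# Sort so the most-discussed concept comes first
index_summary <- index_summary[order(-index_summary[, "mean"]), ]
print(round(index_summary, 3))
```

The same pattern applied to the membership columns (women, business, govt, and so on) would give a first look at the second task.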
2. Models
For example, try out the following models, then try adding more variables. Keep in mind that with only 47 committees, each added predictor costs a degree of freedom.
library(tidyverse)
library(broom)
library(moderndive)
library(GGally)
library(texreg) # for screenreg()
library(lmtest) # for lrtest()
# Import data
codat <- read_rds("raw_data/co_dataset.rds")
# How much MORE do they write about soft/other policy than hard policy?
m1 <- codat %>%
  lm(formula = isoft - ihard ~
       business + community_participation + women + govt)
m2 <- codat %>%
  lm(formula = isc - ihard ~
       business + community_participation + women + govt)
m3 <- codat %>%
  lm(formula = iv - ihard ~
       business + community_participation + women + govt)
m4 <- codat %>%
  lm(formula = isoft + isc - ihard ~
       business + community_participation + women + govt)
m5 <- codat %>%
  lm(formula = isoft + isc + iv - ihard ~
       business + community_participation + women + govt)
texreg::screenreg(list(m1, m2, m3, m4, m5),
                  stars = c(0.001, 0.01, 0.05, 0.10))
##
## =============================================================================
## Model 1 Model 2 Model 3 Model 4 Model 5
## -----------------------------------------------------------------------------
## (Intercept) -0.04 *** -0.06 *** -0.05 *** -0.03 *** -0.01
## (0.01) (0.01) (0.01) (0.01) (0.01)
## business 0.00 . 0.00 -0.00 0.00 . 0.00
## (0.00) (0.00) (0.00) (0.00) (0.00)
## community_participation 0.00 -0.00 0.00 0.00 0.00
## (0.00) (0.00) (0.00) (0.00) (0.00)
## women 0.00 -0.00 -0.00 0.00 0.00
## (0.00) (0.00) (0.00) (0.00) (0.00)
## govt -0.00 ** -0.00 ** -0.00 * -0.00 ** -0.00 **
## (0.00) (0.00) (0.00) (0.00) (0.00)
## -----------------------------------------------------------------------------
## R^2 0.34 0.19 0.14 0.34 0.26
## Adj. R^2 0.27 0.12 0.05 0.28 0.19
## Num. obs. 47 47 47 47 47
## =============================================================================
## *** p < 0.001; ** p < 0.01; * p < 0.05; . p < 0.1
remove(m1,m2,m3,m4,m5)
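Most coefficients in the table print as 0.00 because the outcomes are proportions (0 to 1) while the predictors are percentages (0 to 100), so a one-point change in membership shifts the outcome by a very small amount. One option, sketched here on the assumption that rescaling is acceptable for presentation, is to multiply the outcome by 100 so that it is also in percentage points. The fit is unchanged and each coefficient becomes 100 times larger. The data below are simulated stand-ins, not the real committees.

```r
set.seed(2)
# Simulated stand-in for codat (values are random, for illustration only)
toy <- data.frame(
  ihard    = runif(47, 0.05, 0.10),
  isoft    = runif(47, 0.03, 0.05),
  business = runif(47, 0, 40),
  govt     = runif(47, 0, 10)
)

# Original scale: outcome is a difference of proportions
m_raw <- lm(isoft - ihard ~ business + govt, data = toy)

# Rescaled: outcome in percentage points, same units as the predictors
m_pct <- lm(I(100 * (isoft - ihard)) ~ business + govt, data = toy)

# Coefficients are 100 times larger; R-squared is the same
print(cbind(raw = coef(m_raw), pct = coef(m_pct)))
```

Alternatively, screenreg() accepts a digits argument, so the original models can be printed with more decimal places instead of being refit.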
Then, try adding more terms. Does adding a predictor, or using a different set of predictors, significantly improve the model's log-likelihood?
m1 <- codat %>%
  lm(formula = isoft - ihard ~
       business + social_org + religious_org + women + govt)
# Simplify to community participation
m2 <- codat %>%
  lm(formula = isoft - ihard ~
       business + community_participation + women + govt)
# Unclear at first glance, but lrtest() can help guide what to include or leave out!
lrtest(m1,m2)
## Likelihood ratio test
##
## Model 1: isoft - ihard ~ business + social_org + religious_org + women +
## govt
## Model 2: isoft - ihard ~ business + community_participation + women +
## govt
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 7 150.96
## 2 6 150.86 -1 0.1884 0.6642
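The likelihood ratio statistic is twice the difference in log-likelihoods, compared against a chi-squared distribution whose degrees of freedom equal the difference in parameter counts. The sketch below reproduces the p-value from the statistic reported above, using base R only. With p ≈ 0.66, the fuller model (separate social_org and religious_org terms) does not fit significantly better than the simpler one. The variable names are illustrative.

```r
# Values copied from the lrtest() output above
loglik_full    <- 150.96  # Model 1, #Df = 7
loglik_reduced <- 150.86  # Model 2, #Df = 6
chisq_reported <- 0.1884
df_diff        <- 7 - 6

# Statistic from the rounded log-likelihoods (close to, not exactly, 0.1884)
chisq_approx <- 2 * (loglik_full - loglik_reduced)

# Upper-tail chi-squared probability for the reported statistic
p_value <- pchisq(chisq_reported, df = df_diff, lower.tail = FALSE)
print(round(c(chisq_approx = chisq_approx, p_value = p_value), 4))
```

The small gap between the hand-computed statistic and the reported 0.1884 comes from the log-likelihoods being rounded to two decimals in the printed output.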