Project 1
setwd("~/Downloads/DATA110")
Introduction:
This data set lists all requests for major disaster declarations and emergency declarations that have been denied. It is composed of historical data that was manually entered into FEMA’s National Emergency Management Information Systems after it launched in 1998. The variables I will use are declarationRequestDate and requestStatusDate (both limited to 2014), stateAbbreviation (limited to the Region 1 states), and requestedIncidentTypes. I plan to explore what types of incidents Region 1 states requested declarations for during 2014 and how long each request stayed open before it was ultimately turned down.
Source: FEMA’s National Emergency Management Information Systems: https://www.fema.gov/openfema-data-page/declaration-denials-v1
In this data set, the states are split into regions 1-10:
Region 1: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont.
Region 2: New Jersey, New York, Puerto Rico, and Virgin Islands.
Region 3: Delaware, Maryland, Pennsylvania, Virginia, D.C., and West Virginia.
Region 4: Alabama, Florida, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, and Tennessee.
Region 5: Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin.
Region 6: Arkansas, Louisiana, New Mexico, Oklahoma, and Texas.
Region 7: Iowa, Kansas, Missouri, and Nebraska.
Region 8: Colorado, Montana, North Dakota, South Dakota, Utah, and Wyoming.
Region 9: Arizona, California, Hawaii, Nevada, Guam, American Samoa, Commonwealth of Northern Mariana Islands, Republic of Marshall Islands, and Federated States of Micronesia.
Region 10: Alaska, Idaho, Oregon, and Washington.
Source: https://www.fema.gov/about/regions
Set working directory to be able to access files
library(readr)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
Attaching package: 'lubridate'
The following objects are masked from 'package:base':
date, intersect, setdiff, union
Have R load the necessary libraries
I included the lubridate package ( https://cran.r-project.org/web/packages/lubridate/index.html ) to make cleaning up and working with the multiple dates in the data set easier.
data <- read_csv("DeclarationDenials.csv")
Rows: 1277 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): stateAbbreviation, state, declarationRequestType, incidentName, re...
dbl (8): declarationRequestNumber, region, tribalRequest, ihProgramRequeste...
dttm (5): declarationRequestDate, requestedIncidentBeginDate, requestedIncid...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Load my data set with read_csv
data <- data |>
  mutate(
    declarationRequestDate = as.Date(declarationRequestDate),
    requestStatusDate = as.Date(requestStatusDate)
  )
Converting the date columns to an easier format
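As a small illustration of what this conversion does, the sketch below applies as.Date() to a single date-time value. The timestamp is made up for the example and is not a row from the FEMA file.

```r
# Hypothetical timestamp for illustration only (not taken from the FEMA data)
stamp <- as.POSIXct("2014-03-05 00:00:00", tz = "UTC")

# as.Date() keeps the calendar date and drops the time of day
request_date <- as.Date(stamp)

class(stamp)          # date-time class: "POSIXct" "POSIXt"
class(request_date)   # "Date"
format(request_date)  # "2014-03-05"
```

Working with plain dates means that subtracting two of them gives a difference measured in whole days.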
region1_states <- c("CT", "ME", "MA", "NH", "RI", "VT")
region1_2014 <- data |>
  filter(
    stateAbbreviation %in% region1_states,
    year(declarationRequestDate) == 2014,
    year(requestStatusDate) == 2014
  )
Filtering for only Region 1 states in only the year 2014
I switched to %in% after first trying plain ‘in’ and getting an error: https://www.statology.org/in-operator-in-r/
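A minimal sketch of how %in% behaves, using a short made-up vector of state codes rather than the real column. Plain in fails here because R reserves that word for for loops, while %in% checks each element on the left against the set on the right.

```r
region1_states <- c("CT", "ME", "MA", "NH", "RI", "VT")

# Hypothetical state codes for illustration only
states <- c("MA", "TX", "VT", "NY")

# %in% returns one TRUE/FALSE per element of `states`
in_region1 <- states %in% region1_states
in_region1           # TRUE FALSE TRUE FALSE

# Logical indexing keeps only the matching codes
states[in_region1]   # "MA" "VT"
```

filter() uses the same TRUE/FALSE vector to decide which rows to keep.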
region1_2014 <- region1_2014 |>
  mutate(days_open = as.numeric(requestStatusDate - declarationRequestDate))
Creating a new variable to calculate the number of days between the request date and request status date
head(region1_2014$days_open)
[1]  3 19 57 19
Now I can view how many days each Region 1 request stayed open in 2014
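To show what the days_open calculation does on a single pair of dates, here is a small sketch. Both dates are invented for the example and do not come from the data set.

```r
# Hypothetical dates for illustration only (not rows from the FEMA file)
request_date <- as.Date("2014-01-10")
status_date  <- as.Date("2014-01-29")

# Subtracting two Date values gives a "difftime" object measured in days
span <- status_date - request_date

# as.numeric() strips the units and leaves a plain number
days_open <- as.numeric(span)
days_open   # 19
```

Converting to a plain number is what lets days_open be used directly as the y variable in a plot or as the response in lm().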
Linear Regression Model
region1_all <- data |>
  filter(stateAbbreviation %in% c("CT", "ME", "MA", "NH", "RI", "VT"))
Changed to use all years, not just 2014, for the linear regression model so that it has enough observations to estimate the coefficients
region1_all <- region1_all |>
  mutate(days_open = as.numeric(requestStatusDate - declarationRequestDate))
model <- lm(days_open ~ stateAbbreviation + requestedIncidentTypes, data = region1_all)
summary(model)
Call:
lm(formula = days_open ~ stateAbbreviation + requestedIncidentTypes,
data = region1_all)
Residuals:
    Min      1Q  Median      3Q     Max
-98.163 -18.689  -1.365   9.476 126.837
Coefficients:
                                        Estimate Std. Error t value Pr(>|t|)
(Intercept)                              11.7302    39.6762   0.296 0.768554
stateAbbreviationMA                      -4.2672    15.3183  -0.279 0.781565
stateAbbreviationME                     -11.7302    13.9802  -0.839 0.404879
stateAbbreviationNH                      61.5506    17.5165   3.514 0.000863 ***
stateAbbreviationRI                      -0.5653    15.2893  -0.037 0.970633
stateAbbreviationVT                       8.6808    18.7857   0.462 0.645742
requestedIncidentTypesDrought           101.0000    52.5121   1.923 0.059347 .
requestedIncidentTypesFire                5.5396    40.6810   0.136 0.892157
requestedIncidentTypesFishing Losses     53.7685    42.1887   1.274 0.207578
requestedIncidentTypesFlood               7.5242    38.5718   0.195 0.846021
requestedIncidentTypesHuman Cause        -7.4630    54.6194  -0.137 0.891791
requestedIncidentTypesHurricane          67.1860    47.3755   1.418 0.161495
requestedIncidentTypesOther             -18.2808    48.6237  -0.376 0.708314
requestedIncidentTypesSevere Ice Storm   17.6349    46.0109   0.383 0.702916
requestedIncidentTypesSevere Storm       20.1733    39.8247   0.507 0.614390
requestedIncidentTypesSnowstorm          24.8821    40.1419   0.620 0.537782
requestedIncidentTypesTornado            37.2698    54.3412   0.686 0.495542
requestedIncidentTypesToxic Substances   14.1860    47.3755   0.299 0.765676
requestedIncidentTypesTsunami           -12.4110    55.6964  -0.223 0.824448
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 37.13 on 58 degrees of freedom
Multiple R-squared: 0.3791, Adjusted R-squared: 0.1864
F-statistic: 1.967 on 18 and 58 DF, p-value: 0.02723
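One way to sanity-check the summary is to recompute the adjusted R-squared from the other numbers it reports. The sketch below uses the standard adjusted R-squared formula with values copied from the output above. The variable names are my own and are not part of the model object.

```r
# Values copied from the summary output above
r_squared   <- 0.3791  # Multiple R-squared
residual_df <- 58      # residual degrees of freedom
predictors  <- 18      # numerator degrees of freedom of the F-statistic

# Observations used in the fit: residual df + predictors + intercept
n <- residual_df + predictors + 1
n   # 77

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / residual_df
round(adj_r_squared, 4)   # 0.1864
```

The drop from 0.3791 to 0.1864 reflects how many coefficients the model estimates relative to the 77 observations used.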
ggplot(region1_2014, aes(x = requestedIncidentTypes, y = days_open, fill = requestedIncidentTypes)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#F5279C", "#27EEF5", "#CEFA2D")) +
  labs(title = "Declaration Days Open by Incident Types (Region 1 in 2014)",
       x = "Requested Incident Types",
       y = "Days Between Request and Status",
       caption = "Data Source: FEMA's National Emergency Management Information Systems") +
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right")
Boxplot visualization to show Declaration Days Open by Incident Types (Region 1 in 2014)
Essay:
To clean up the data set, I loaded the lubridate package because I wanted to work with the dates marking when a declaration request started (declarationRequestDate) and ended (requestStatusDate) for my visualization. I then used ‘mutate’ and ‘as.Date’ to convert the date-time columns into plain dates that are easier for R to compare and subtract. After that, I created a new data set (region1_2014) containing only the Region 1 states, keeping only requests whose request date and status date both fell in 2014. Finally, I used ‘mutate’ to create the ‘days_open’ variable, which takes the difference between the status date and the request date and converts it with ‘as.numeric’ into a number of days.
The visualization shows how long certain major disaster declaration requests stayed open before they were denied. Although it covers only the Region 1 states in 2014, it suggests that among severe storm, severe ice storm, and flood requests, flood requests took the longest to resolve, which could mean they are treated as less important. It could be that FEMA needs more time to determine whether the damage is valid and not just personal water damage, but that could still reflect floods being a lower priority. Severe ice storms come next. This was interesting because the Region 1 states are far north (though not as far as Region 10 and some states in Region 5), so I would expect ice storms to be more common there and the response time to be quicker. Even though this subset of declaration denials had only 4 observations, it was still interesting to see. I wish I could have included all declaration denials of all incident types from 2010 to 2023. That would make my visualization richer and more interesting, but I think that much data would make any visualization very cluttered and keep me from giving good insights on the time span of different incident type declarations. It would also take a lot more time and could lead to a rushed and confusing visualization.