Project 1

Author

Tikki Dibonge

Introduction:

This data set lists all requests for major disaster declarations and emergency declarations that have been denied. It is composed of historical data that was manually entered into FEMA’s National Emergency Management Information System (NEMIS) after it launched in 1998. The variables I will use are: declarationRequestDate (restricted to 2014), requestStatusDate (restricted to 2014), stateAbbreviation (restricted to the Region 1 states), and requestedIncidentTypes. I plan to explore what types of incidents Region 1 states requested declarations for during 2014 and how long each request stayed open before it was ultimately “turned down”/denied.

Source: FEMA’s National Emergency Management Information System: https://www.fema.gov/openfema-data-page/declaration-denials-v1

In this data set, the states are split into regions 1-10:

Region 1: Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, and Vermont.
Region 2: New Jersey, New York, Puerto Rico, and Virgin Islands.
Region 3: Delaware, Maryland, Pennsylvania, Virginia, D.C., and West Virginia.
Region 4: Alabama, Florida, Georgia, Kentucky, Mississippi, North Carolina, South Carolina, and Tennessee.
Region 5: Illinois, Indiana, Michigan, Minnesota, Ohio, and Wisconsin.
Region 6: Arkansas, Louisiana, New Mexico, Oklahoma, and Texas.
Region 7: Iowa, Kansas, Missouri, and Nebraska.
Region 8: Colorado, Montana, North Dakota, South Dakota, Utah, and Wyoming.
Region 9: Arizona, California, Hawaii, Nevada, Guam, American Samoa, Commonwealth of Northern Mariana Islands, Republic of Marshall Islands, and Federated States of Micronesia.
Region 10: Alaska, Idaho, Oregon, and Washington.

Source: https://www.fema.gov/about/regions

setwd("~/Downloads/DATA110")

Set working directory to be able to access files

library(readr)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(ggplot2)
library(lubridate)


Attaching package: 'lubridate'

The following objects are masked from 'package:base':

    date, intersect, setdiff, union

Have R load the necessary libraries

I included the lubridate package ( https://cran.r-project.org/web/packages/lubridate/index.html ) to make cleaning up and working with the multiple date columns in the data set easier for me.

data <- read_csv("DeclarationDenials.csv")

Rows: 1277 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): stateAbbreviation, state, declarationRequestType, incidentName, re...
dbl  (8): declarationRequestNumber, region, tribalRequest, ihProgramRequeste...
dttm (5): declarationRequestDate, requestedIncidentBeginDate, requestedIncid...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Load my data set with read_csv() from the readr package

data <- data |>
  mutate(
    declarationRequestDate = as.Date(declarationRequestDate),
    requestStatusDate = as.Date(requestStatusDate)
  )

Converting the date-time columns to plain dates, which are easier to work with

region1_states <- c("CT", "ME", "MA", "NH", "RI", "VT")

region1_2014 <- data |>
  filter(
    stateAbbreviation %in% region1_states,
    year(declarationRequestDate) == 2014,
    year(requestStatusDate) == 2014
  )

Filtering for only Region 1 states in only the year 2014

I used %in% after plain ‘in’ gave me an error: https://www.statology.org/in-operator-in-r/
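A small sketch of what %in% does in the filter above: it checks each element on the left against the vector on the right and returns one TRUE/FALSE per element. The `states` vector here is made up for illustration and is not taken from the FEMA file.

```r
region1_states <- c("CT", "ME", "MA", "NH", "RI", "VT")

# Hypothetical state abbreviations, only for illustration
states <- c("MA", "TX", "VT", "NY")

# One logical value per element of `states`
is_region1 <- states %in% region1_states
is_region1

# Keep only the Region 1 entries, which is what filter() does row by row
states[is_region1]
```

The column specification printed by read_csv also lists a numeric `region` column, so filtering on `region == 1` might be an equivalent shortcut. That is an assumption that would need to be checked against the data.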

region1_2014 <- region1_2014 |>
  mutate(days_open = as.numeric(requestStatusDate - declarationRequestDate))

Creating a new variable to calculate the number of days between the request date and request status date
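As a quick check of how the subtraction works, here is a minimal sketch with two made-up dates (they are not rows from the data set). Subtracting one Date from another gives a difftime in days, and as.numeric turns it into a plain number.

```r
# Hypothetical dates, not taken from the FEMA data
request_date <- as.Date("2014-03-01")
status_date  <- as.Date("2014-03-20")

# Date minus Date returns a difftime measured in days
gap <- status_date - request_date

# Convert to a plain number of days; stating the units makes the intent explicit
days_open <- as.numeric(gap, units = "days")
days_open
```

For these two dates the result is 19 days.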

head(region1_2014$days_open)

[1]  3 19 57 19

Now I can view how many days each declaration request was open in 2014 for Region 1

Linear Regression Model

region1_all <- data |>
  filter(stateAbbreviation %in% region1_states)

Changed to use all the years, not just 2014, for the linear regression model, because the 2014 subset has only 4 observations, which is too few to estimate the model
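A rough sketch of why four rows are not enough for this kind of model. The days_open values are the four printed above, but the state and type labels are placeholders ("A", "X", and so on), since the actual 2014 rows are not shown here.

```r
# Placeholder labels; only the days_open values come from the output above
toy <- data.frame(
  state     = c("A", "B", "C", "D"),
  type      = c("X", "Y", "Y", "Z"),
  days_open = c(3, 19, 57, 19)
)

fit <- lm(days_open ~ state + type, data = toy)

# No residual degrees of freedom are left, so standard errors cannot be estimated
df.residual(fit)

# Coefficients that cannot be separated from the others come back as NA
coef(fit)
```

With more predictor levels than the data can support, the fit just reproduces the four points, so using all years gives the model something to estimate from.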

region1_all <- region1_all |>
  mutate(days_open = as.numeric(requestStatusDate - declarationRequestDate))

model <- lm(days_open ~ stateAbbreviation + requestedIncidentTypes, data = region1_all)

summary(model)


Call:
lm(formula = days_open ~ stateAbbreviation + requestedIncidentTypes, 
    data = region1_all)

Residuals:
    Min      1Q  Median      3Q     Max 
-98.163 -18.689  -1.365   9.476 126.837 

Coefficients:
                                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)                             11.7302    39.6762   0.296 0.768554    
stateAbbreviationMA                     -4.2672    15.3183  -0.279 0.781565    
stateAbbreviationME                    -11.7302    13.9802  -0.839 0.404879    
stateAbbreviationNH                     61.5506    17.5165   3.514 0.000863 ***
stateAbbreviationRI                     -0.5653    15.2893  -0.037 0.970633    
stateAbbreviationVT                      8.6808    18.7857   0.462 0.645742    
requestedIncidentTypesDrought          101.0000    52.5121   1.923 0.059347 .  
requestedIncidentTypesFire               5.5396    40.6810   0.136 0.892157    
requestedIncidentTypesFishing Losses    53.7685    42.1887   1.274 0.207578    
requestedIncidentTypesFlood              7.5242    38.5718   0.195 0.846021    
requestedIncidentTypesHuman Cause       -7.4630    54.6194  -0.137 0.891791    
requestedIncidentTypesHurricane         67.1860    47.3755   1.418 0.161495    
requestedIncidentTypesOther            -18.2808    48.6237  -0.376 0.708314    
requestedIncidentTypesSevere Ice Storm  17.6349    46.0109   0.383 0.702916    
requestedIncidentTypesSevere Storm      20.1733    39.8247   0.507 0.614390    
requestedIncidentTypesSnowstorm         24.8821    40.1419   0.620 0.537782    
requestedIncidentTypesTornado           37.2698    54.3412   0.686 0.495542    
requestedIncidentTypesToxic Substances  14.1860    47.3755   0.299 0.765676    
requestedIncidentTypesTsunami          -12.4110    55.6964  -0.223 0.824448    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 37.13 on 58 degrees of freedom
Multiple R-squared:  0.3791,    Adjusted R-squared:  0.1864 
F-statistic: 1.967 on 18 and 58 DF,  p-value: 0.02723
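One way to read the coefficient table above, as a hedged sketch: a fitted value is the intercept plus the coefficients for the matching state and incident type. CT does not appear among the state rows, so it is presumably the reference level (R's default for a character predictor is the first level alphabetically), and the same applies to whichever incident type is missing from the table. The example below adds up the estimates for New Hampshire and Severe Storm, copied from the output.

```r
# Estimates copied from the summary(model) output above
intercept    <- 11.7302
state_nh     <- 61.5506   # stateAbbreviationNH
severe_storm <- 20.1733   # requestedIncidentTypesSevere Storm

# Fitted days open for a New Hampshire severe storm request
predicted_days <- intercept + state_nh + severe_storm
predicted_days
```

This comes to about 93.45 days. Only the NH coefficient is marked significant in the table, so the other terms in this sum carry a lot of uncertainty.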

ggplot(region1_2014, aes(x = requestedIncidentTypes, y = days_open, fill = requestedIncidentTypes)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#F5279C", "#27EEF5", "#CEFA2D")) +
  labs(title = "Declaration Days Open by Incident Types (Region 1 in 2014)",
       x = "Requested Incident Types",
       y = "Days Between Request and Status",
       caption = "Data Source: FEMA's National Emergency Management Information System") + 
  theme_minimal(base_size = 13) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "right")

Boxplot visualization to show Declaration Days Open by Incident Types (Region 1 in 2014)
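With so few observations per incident type, a box can collapse into a single line. One option, sketched here on made-up numbers rather than the real 2014 rows, is to draw the raw points on top of the boxplot so every request stays visible. The data frame `toy_2014` and its values are hypothetical.

```r
library(ggplot2)

# Hypothetical values, only to illustrate the layering
toy_2014 <- data.frame(
  requestedIncidentTypes = c("Flood", "Severe Ice Storm", "Severe Storm", "Severe Storm"),
  days_open = c(40, 25, 5, 12)
)

p <- ggplot(toy_2014, aes(x = requestedIncidentTypes, y = days_open)) +
  geom_boxplot() +
  # Small horizontal jitter only, so the plotted day counts are not distorted
  geom_jitter(width = 0.1, height = 0, size = 2) +
  labs(x = "Requested Incident Types", y = "Days Between Request and Status")

p
```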

Essay:

To clean up the data set, I used the lubridate package, whose year() function let me filter by year, because I wanted to calculate the time span between when a declaration was requested (declarationRequestDate) and when its status was decided (requestStatusDate) for my visualization. I used the ‘mutate’ and ‘as.Date’ commands to convert the date-time columns to plain dates that are easier for R to work with. After that, I created a new data set (region1_2014) containing only the requests from Region 1 states whose request date and status date both fall in 2014. I also used the mutate command to create the ‘days_open’ variable, which takes the difference between the status date and the request date and converts it with the ‘as.numeric’ command into a number of days.

The visualization shows how long FEMA took to respond to certain major disaster declaration requests that were ultimately denied. Although it is based only on Region 1 states in 2014, it suggests that among severe storms, severe ice storms, and floods, flood requests stayed open the longest, which could mean they are deemed less important. It could also be that FEMA needs more time to determine whether the damage is valid and not just personal water damage, but that could still play into floods being treated as less important. Severe ice storms come next. This was interesting because the states in Region 1 are far north (though not as far as Region 10 and some states in Region 5), so I would expect ice storms to be a little more common there and the response time to be quicker. Even though the 2014 subset of declaration denials had only 4 observations, it was still interesting to see.

What I wish I could have included is all the declaration denials of all incident types from 2010-2023. That would make my visualization a lot richer and more interesting. On the other hand, I think that much data would make any kind of visualization very cluttered and would not allow me to give good insights into the time spans of different incident type declarations. It would also take a lot more time and could lead to a rushed and confusing visualization.