Reproducible Research Course Project - Descriptive analysis on damages and harmfulness of weather events recorded in the US between 1950 and 2011

Synopsis

This project analyses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database between 1950 and 2011 to classify and describe extreme weather events, their economic damages and their harm to population health, expressed in casualties and injuries. The classification of events was tidied to match NOAA guidelines, and costs were adjusted for inflation using the US CPI. The most damaging and most harmful events per state are shown, and a small function that returns the most damaging and most harmful events per state is provided for further use. Overall, after adjusting for inflation, floods caused the most economic damage in the period considered, and tornadoes were the most harmful to population health (note, however, that records prior to 1993 mostly include tornadoes). If only events recorded after the full set of categories came into use are considered, from 1993 to 2011, floods remain the most damaging events, but excessive heat caused the most casualties, followed by tornadoes.

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

The report is written as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. To that end, a function to extract the most harmful events per state is provided.

Data Processing

Data is downloaded from the provided URLs. The EVTYPE variable was tidied iteratively; the process is described here step by step, as the plain code does not give full insight into it.

1. Common spelling mistakes, abbreviations and numbers were cleaned up.
2. A standardized table of events was created.
3. EVTYPE was joined with the standardized table of events.
4. People affected and economic damages were calculated.
5. The weight of miscoded events was calculated.
6. “Great events” objects (the miscoded events with the most harm or damage) were created, examined and classified, and their EVTYPE was then standardized by adding the proper expression in step 1.
7. storm_data was reloaded from the backup object and the tidying was iterated.
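
The steps above can be tried on a toy table. This is only a minimal sketch, not the code used in the report: the raw labels, the four-row standard table and the harm counts are invented for the example.

```r
library(dplyr)
library(stringr)
library(tibble)

# Invented raw records: labels and harm counts are illustrative only
toy <- tribble(
  ~EVTYPE,      ~harm,
  "TSTM WIND",  10,
  "Flooding",   30,
  "HAIL 175",   5,
  "TORNADO F2", 5
)

# Invented four-row stand-in for the full standard table of events
standard <- tibble(
  ev_name = c("THUNDERSTORM WIND", "FLOOD", "HAIL", "TORNADO")
)

toy_tidy <- toy %>%
  mutate(
    EVTYPE_tidy = str_to_upper(EVTYPE),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "TSTM", "THUNDERSTORM"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "FLOODING", "FLOOD"),
    EVTYPE_tidy = str_remove_all(EVTYPE_tidy, "[0-9]"),
    EVTYPE_tidy = str_squish(EVTYPE_tidy),
    # flag labels that match the standard table
    coded = EVTYPE_tidy %in% standard$ev_name
  )

# share of harm carried by labels that still fail to match
miscoded_share <- toy_tidy %>%
  summarise(share = sum(harm[!coded]) / sum(harm)) %>%
  pull(share)
```

In this toy case `TORNADO F2` ends up as `TORNADO F`, so it remains miscoded; under the iterative approach, a label like this would get its own replacement rule before the tidying is rerun.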

These steps greatly reduced the dispersion of EVTYPE, but did not yield a full standard classification for each and every event. I believe the steps taken are sufficient, as more than 99% of both harm and damage is represented by a standardized event. On a side note, if I had to classify every event thoroughly in a standardized way, I would first try to understand why NOAA does not collect them as closed options. As correction at the time of collection does not seem possible, I would keep a thorough and open encoding table in which every event is classified against its standard NOAA equivalent.
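
An open encoding table of this kind could look roughly like the following. This is a sketch under assumptions: `encoding_table` is a hypothetical name, the three mapping rows are examples only, and a real table would need one row per distinct raw label.

```r
library(dplyr)
library(tibble)

# Hypothetical encoding table: one row per raw label, mapped by hand
encoding_table <- tribble(
  ~EVTYPE,            ~ev_standard,
  "TSTM WIND",        "THUNDERSTORM WIND",
  "HEAT WAVE",        "EXCESSIVE HEAT",
  "WILD/FOREST FIRE", "WILDFIRE"
)

# Invented raw records
raw <- tibble(EVTYPE = c("HEAT WAVE", "TSTM WIND", "MUD SLIDE"))

# Join, then list the labels that still lack an entry in the table
classified <- left_join(raw, encoding_table, by = "EVTYPE")
unmapped <- classified %>%
  filter(is.na(ev_standard)) %>%
  pull(EVTYPE)
```

Labels that come back in `unmapped` would be reviewed and added to the table, so the mapping stays explicit and auditable rather than hidden in a chain of string replacements.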

When it comes to population health, I opted to order harmfulness first by number of casualties (fatalities), and then by number of injured people. That means, for instance, that an event with one casualty and zero injuries is deemed more harmful than one with zero casualties and one thousand injuries.
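
A minimal sketch of this ordering rule, using two invented event types, shows that injuries only act as a tie-breaker:

```r
library(dplyr)
library(tibble)

# Invented totals for two event types
harm <- tribble(
  ~event, ~fatalities, ~injuries,
  "A",    0,           1000,
  "B",    1,           0
)

# Fatalities decide the order; injuries only break ties
ranked <- harm %>%
  arrange(desc(fatalities), desc(injuries))

ranked$event
```

Event `B` comes first despite having far fewer people affected in total, which is the trade-off this ordering accepts.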

Regarding damage, I converted the figures using the exponent codes, following the reference quoted in the code, and adjusted them for inflation using the US CPI index downloaded from the Bureau of Labor Statistics.
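
The inflation adjustment rescales each nominal amount by the ratio between the latest CPI value and the CPI value of the month of the event. A minimal sketch follows; `adjust_for_inflation` is a hypothetical helper and the index values are made up for illustration.

```r
# Rescale a nominal amount to the price level of a reference period.
# cpi_then: index value for the month of the event
# cpi_now:  index value for the reference (latest) month
# Both values used below are invented, not real CPI figures.
adjust_for_inflation <- function(nominal, cpi_then, cpi_now) {
  nominal / cpi_then * cpi_now
}

adjusted <- adjust_for_inflation(nominal = 1e6, cpi_then = 100, cpi_now = 250)
```

With these invented values, one million dollars of damage recorded when the index stood at 100 corresponds to 2.5 million dollars at an index of 250.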

Finally, I created “most harmful” and “most damage” objects grouped by EVTYPE, objects for drawing maps, and a small function to provide information classified by state and year.

#load libraries
library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

#clean environment
rm(list = ls())

#create data folder
if(!dir.exists("./data")) {dir.create("./data")}

#download compressed file (read_csv reads the .bz2 directly)
if(!file.exists("./data/repdata_data_StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "./data/repdata_data_StormData.csv.bz2",
                method = "curl")}

#read raw data
storm_data <- read_csv("./data/repdata_data_StormData.csv.bz2")

## Rows: 902297 Columns: 37

## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

#keep a copy for failures during analysis
storm_data_bkp <- storm_data

storm_data <- storm_data_bkp

#convert dates into appropriate format
storm_data <- storm_data %>% 
  mutate(BGN_DATE = as.Date(mdy_hms(BGN_DATE)))

#preliminary exploration
#cleaning EVTYPE: preliminary simplification of case, strange characters and common abbreviations, then joining with the standard event table
summary_by_event <- storm_data %>% 
  group_by(EVTYPE) %>% 
  summarise(
    n = n()
  ) %>% 
  arrange(desc(n))


summary_by_event <- summary_by_event %>% 
  mutate(
    EVTYPE_tidy = str_to_upper(EVTYPE)) 

summary_by_event <- summary_by_event %>% 
  mutate(
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "TSTM", "THUNDERSTORM"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "WINDS", "WIND"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "URBAN/SML STREAM FLD", "FLASH FLOOD"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "FLOODING", "FLOOD"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, " FLD", " FLOOD"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "FLOODS", "FLOOD"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "WILD/FOREST", "WILD"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "FIRES", "FIRE"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "MPH", ""),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "SEVERE", ""),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "ASTRONOMICAL HIGH TIDE", "HIGH SURF"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "ABNORMAL", "")) %>% 
  mutate(
    #drop digits and stray punctuation in one pass
    EVTYPE_tidy = str_remove_all(EVTYPE_tidy, "[0-9.():-]"),
    #collapse repeated whitespace and trim both ends
    EVTYPE_tidy = str_squish(EVTYPE_tidy))

#The contents of this mutate come from reviewing events with great harm and/or damage and correcting them in more detail
summary_by_event <- summary_by_event %>% 
  mutate(
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "WILD FIRE", "WILDFIRE"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "^TYPHOON", "HURRICANE/TYPHOON"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "^STORM SURGE/TIDE", "STORM TIDE"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "^STORM SURGE", "STORM TIDE"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, " OPAL", ""),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, " ERIN", ""),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "RIVER FLOOD","FLOOD"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "HURRICANE$", "HURRICANE/TYPHOON"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "HEAVY RAIN/ WEATHER", "HEAVY RAIN"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "EXTREME COLD$", "EXTREME COLD/WIND CHILL"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "FLASH FLOOD/FLOOD", "FLOOD"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "THUNDERSTORM$", "THUNDERSTORM WIND"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "TORNADOES, THUNDERSTORM WIND, HAIL", "THUNDERSTORM WIND"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "FLOOD/FLASH FLOOD", "FLOOD"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "HEAT WAVE", "EXCESSIVE HEAT"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "^FOG$", "DENSE FOG"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "EXTREME HEAT", "EXCESSIVE HEAT"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "CURRENTS", "CURRENT"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, " GORDON", ""),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "HEAVY SURF/HIGH SURF", "HIGH SURF"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "HEAVY SURF", "HIGH SURF"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "THUNDERSTORM WIND/HAIL", "THUNDERSTORM WIND"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "GLAZE", "FREEZING FOG"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "LANDSLIDE", "DEBRIS FLOW"),
    EVTYPE_tidy = str_replace_all(EVTYPE_tidy, "^WIND$", "HIGH WIND")
        ) %>% 
  mutate(
    EVTYPE_tidy = str_squish(EVTYPE_tidy))%>%
  mutate(
    EVTYPE_tidy = str_trim(EVTYPE_tidy, "both"))

    

summary_by_event_tidy <- summary_by_event %>% 
  group_by(EVTYPE_tidy) %>% 
  summarise(
    n = sum(n)
)


#create standard evtype table  
evtype <- tibble(
  ev_id = str_c("7.", 1:48),
  ev_name = str_to_upper(c("Astronomical Low Tide","Avalanche","Blizzard","Coastal Flood","Cold/Wind Chill","Debris Flow","Dense Fog","Dense Smoke","Drought","Dust Devil","Dust Storm","Excessive Heat","Extreme Cold/Wind Chill","Flash Flood","Flood","Freezing Fog","Frost/Freeze","Funnel Cloud","Hail","Heat","Heavy Rain","Heavy Snow","High Surf","High Wind","Hurricane/Typhoon","Ice Storm","Lakeshore Flood","LakeEffect Snow","Lightning","Marine Hail","Marine High Wind","Marine Strong Wind","Marine Thunderstorm Wind","Rip Current","Seiche","Sleet","Storm Tide","Strong Wind","Thunderstorm Wind","Tornado","Tropical Depression","Tropical Storm","Tsunami","Volcanic Ash","Waterspout","Wildfire","Winter Storm","Winter Weather")
))

#join with summary
summary_by_event <- left_join(summary_by_event, evtype, by = c("EVTYPE_tidy" = "ev_name"))

#check for standard events not appearing
events_not_appearing <- anti_join(evtype, summary_by_event, by = c("ev_name" = "EVTYPE_tidy"))

#joining with storm_data
storm_data <- storm_data %>% 
  left_join(summary_by_event)

## Joining, by = "EVTYPE"

#registering correct codification
storm_data <- storm_data %>% 
  mutate(
    EVTYPE_tidy_bin = if_else(is.na(ev_id), "OTHER_MISCODED", "EVTYPE_coded"),
  )


#HARM
#fatalities and injuries, rank per event
storm_data <- storm_data %>% 
  mutate(
    total_people_affected = FATALITIES + INJURIES
  ) %>%
  mutate(
    rank_fatalities = rank(desc(FATALITIES)),
    rank_injuries = rank(desc(INJURIES))) %>%
  arrange(rank_fatalities, rank_injuries) %>% 
  mutate(
    rank_harmful = row_number()
  )

#harmful miscoded
weight_harmful_miscoded <- storm_data %>% 
  group_by(EVTYPE_tidy_bin) %>% 
  summarise(
    people_affected = sum(total_people_affected)
  ) %>% 
  mutate(
    percent_people = people_affected / sum(people_affected)
  )

great_events_harmful <- storm_data %>% 
  filter(EVTYPE_tidy_bin == "OTHER_MISCODED") %>% 
  arrange(desc(total_people_affected))

great_events_harmful <- great_events_harmful %>% 
  group_by(EVTYPE_tidy) %>% 
  summarise(
    total_harm = sum(total_people_affected, na.rm = TRUE),
    number_events = n()
  ) %>% 
  arrange(desc(total_harm))


#DAMAGES AND ECONOMIC COSTS

#converting damages to proper figures
##here advice from https://rstudio-pubs-static.s3.amazonaws.com/58957_37b6723ee52b455990e149edde45e5b6.html was followed

storm_data <- storm_data %>% 
  mutate(
    property_damage = case_when(
      str_detect(PROPDMGEXP, "[kK]") ~ PROPDMG * 1000,
      str_detect(PROPDMGEXP, "[mM]") ~ PROPDMG * 1000000,
      str_detect(PROPDMGEXP, "[bB]") ~ PROPDMG * 1000000000,
      str_detect(PROPDMGEXP, "[hH]") ~ PROPDMG * 100,
      str_detect(PROPDMGEXP, "\\+") ~ PROPDMG * 1,
      str_detect(PROPDMGEXP, "[0-8]") ~ PROPDMG * 10,
      #any other or missing exponent counts as zero damage
      TRUE ~ PROPDMG * 0
    ),
    crop_damage = case_when(
      str_detect(CROPDMGEXP, "[kK]") ~ CROPDMG * 1000,
      str_detect(CROPDMGEXP, "[mM]") ~ CROPDMG * 1000000,
      str_detect(CROPDMGEXP, "[bB]") ~ CROPDMG * 1000000000,
      str_detect(CROPDMGEXP, "[hH]") ~ CROPDMG * 100,
      str_detect(CROPDMGEXP, "\\+") ~ CROPDMG * 1,
      str_detect(CROPDMGEXP, "[0-8]") ~ CROPDMG * 10,
      #any other or missing exponent counts as zero damage
      TRUE ~ CROPDMG * 0
    ),
    total_damage = case_when(
      is.na(property_damage) ~ crop_damage,
      is.na(crop_damage) ~ property_damage,
      !is.na(property_damage) & !is.na(crop_damage) ~ property_damage + crop_damage
    )
  )


#Data for adjusting for US inflation (values don't appear to be indexed)
#CPI seasonally adjusted index was used
## Reference URL: https://www.bls.gov/cpi/ and https://download.bls.gov/pub/time.series/cu/cu.series

cpi_usa <- read_table("https://download.bls.gov/pub/time.series/cu/cu.data.2.Summaries")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   series_id = col_character(),
##   year = col_double(),
##   period = col_character(),
##   value = col_double(),
##   footnote_codes = col_logical()
## )

cpi_usa <- cpi_usa %>% 
  filter(series_id == "CUSR0000SA0")

cpi_usa <- cpi_usa %>% 
  mutate(
    month = as.numeric(str_remove(period,"M")),
    period_f = as.Date(ym(str_c(year, month, sep = "-")))
  )

current_year_figure <- max(cpi_usa$period_f)

current_cpi_val <- as.numeric(cpi_usa[cpi_usa$period_f == current_year_figure,4])

#converting values adjusting for inflation

storm_data <- storm_data %>% 
  mutate(
    year = year(BGN_DATE),
    month = month(BGN_DATE),
    period_f = as.Date(ym(str_c(year, month, sep = "-"))))

storm_data <- left_join(storm_data, cpi_usa)

## Joining, by = c("year", "month", "period_f")

storm_data <- storm_data %>% 
  mutate(
    curr_val = current_cpi_val)


storm_data <- storm_data %>% 
  mutate(
    property_damage_adj = property_damage / value * curr_val,
    crop_damage_adj = crop_damage / value * curr_val,
    total_damage_adj = total_damage / value * curr_val
  )

storm_data <- storm_data %>% 
  select(-footnote_codes, -value, -period, -series_id, -curr_val)


storm_data <- storm_data %>% 
  mutate(
    total_damage_log = log10(total_damage_adj+1)
  )


#check for events with great impact in damages
#info from this chunk of script was used to refine the list of adjusted expressions on evtype above

weight_damage_miscoded <- storm_data %>% 
  group_by(EVTYPE_tidy_bin) %>% 
  summarise(
    damage_total = sum(total_damage_adj)
  ) %>% 
  mutate(
    percent = damage_total / sum(damage_total)
  )

great_events <- storm_data %>% 
  filter(EVTYPE_tidy_bin == "OTHER_MISCODED") %>% 
  arrange(desc(total_damage_adj))


great_events <- great_events %>% 
  group_by(EVTYPE_tidy) %>% 
  summarise(
    total_dmg = sum(total_damage_adj, na.rm = TRUE),
    number_events = n()
  ) %>% 
  arrange(desc(total_dmg))


#general results

#simplifying codification
storm_data <- storm_data %>% 
  mutate(
    EVTYPE_tidy = if_else(is.na(ev_id), "OTHER_MISCODED", EVTYPE_tidy)
  )

Results

number_of_recorded_events_per_year <- storm_data %>% 
  group_by(year, EVTYPE_tidy) %>% 
  summarise() %>% 
  group_by(year) %>% 
  summarise(
    q = n()
  )

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

#general classification
most_harmful_overall <- storm_data %>%
  group_by(EVTYPE_tidy) %>%
  summarise(
    total_casualties = sum(FATALITIES),
    total_injuries = sum(INJURIES)
  ) %>% 
  arrange(desc(total_casualties), desc(total_injuries))



#function most harmful

most_harm <- function(db=storm_data,ytd=1993,state="all"){
  
  #keep events from the cutoff year onwards
  db <- db %>% 
    filter(year >= ytd)
  
  #optionally keep a single state, as coded in STATE (e.g. "CA")
  if(state != "all"){
    db <- db %>% 
      filter(STATE == state)
  }
  
  most_harmful <- db %>%
    group_by(EVTYPE_tidy) %>%
    summarise(
      total_casualties = sum(FATALITIES),
      total_injuries = sum(INJURIES)
    ) %>% 
    arrange(desc(total_casualties), desc(total_injuries))
  
  most_harmful_graph <- most_harmful %>% 
    pivot_longer(cols = c("total_casualties", "total_injuries"), names_to = "harm")
  
  #convert events to ordered factors
  most_harmful_graph$EVTYPE_tidy <- factor(most_harmful_graph$EVTYPE_tidy, levels = most_harmful$EVTYPE_tidy)
  
  most_harmful_graph$harm <- factor(most_harmful_graph$harm, levels = c("total_injuries", "total_casualties"))
  
  #slice top 10 events (two rows per event after pivoting)
  most_harmful_graph_slice <- head(most_harmful_graph, 20)
  
  graph_harm <- ggplot(most_harmful_graph_slice, aes(x = EVTYPE_tidy, y = value, fill = harm)) + 
    geom_bar(stat = "identity", position = "dodge") +
    scale_x_discrete(limits=rev) +
    labs(y = "Number of people")+
    scale_fill_manual(values =  c("#56B4E9","#E69F00"))+
    coord_flip() +
    ggtitle(str_c("Most harmful events from ",ytd," to 2011, in ",state," US", sep = ""))
  
  graph_harm
  
}

  
#economic damages  
most_economic_overall <- storm_data %>%
  group_by(EVTYPE_tidy) %>%
  summarise(
    total_damage = sum(total_damage_adj),
    total_crop = sum(crop_damage_adj),
    total_property = sum(property_damage_adj)
  ) %>% 
  arrange(desc(total_damage))


#function damages

most_damages <- function(db=storm_data,ytd=1993,state="all"){
  
  #keep events from the cutoff year onwards
  db <- db %>% 
    filter(year >= ytd)
  
  #optionally keep a single state, as coded in STATE (e.g. "CA")
  if(state != "all"){
    db <- db %>% 
      filter(STATE == state)
  }
  
  most_economic <- db %>%
    group_by(EVTYPE_tidy) %>%
    summarise(
      total_damage = sum(total_damage_adj),
      total_crop = sum(crop_damage_adj),
      total_property = sum(property_damage_adj)
    ) %>% 
    arrange(desc(total_damage))
  
  #damages
  most_economic_graph <- most_economic %>% 
    pivot_longer(cols = c("total_crop", "total_property"), names_to = "damages")
  
  #convert events to ordered factors
  most_economic_graph$EVTYPE_tidy <- factor(most_economic_graph$EVTYPE_tidy, levels = most_economic$EVTYPE_tidy)
  
  #slice top 10 events (two rows per event after pivoting)
  most_economic_graph_slice <- head(most_economic_graph, 20)
  
  graph_damages <- ggplot(most_economic_graph_slice, aes(x = EVTYPE_tidy, y = value, fill = damages)) + 
    geom_bar(stat = "identity", position = "dodge") +
    scale_x_discrete(limits=rev) +
    labs(y = "Current US Dollars adjusted by inflation") +
    coord_flip()+
    ggtitle(str_c("Most damaging events from ",ytd," to 2011, in ",state," US", sep = ""))
  
  graph_damages
}


#US maps with most harmful and most damaging
library("usmap")

most_harmful_by_state <- storm_data %>%
  group_by(EVTYPE_tidy, STATE) %>%
  summarise(
    total_casualties = sum(FATALITIES),
    total_injuries = sum(INJURIES)
  ) %>% 
  arrange(desc(total_casualties), desc(total_injuries)) %>% 
  ungroup() %>% 
  group_by(STATE) %>% 
  slice_head(n = 1) %>% 
  ungroup()

## `summarise()` has grouped output by 'EVTYPE_tidy'. You can override using the
## `.groups` argument.

most_harmful_by_state <- most_harmful_by_state %>% 
  ungroup() %>% 
  mutate(
    state = STATE,
    values = EVTYPE_tidy
  ) %>% 
  select(
    state,values
  )


map_harmful <- plot_usmap(data = most_harmful_by_state)+
  labs(title = "Most harmful type of event per US State, as per casualties from 1950 to 2011") +
  theme(legend.position="bottom")

#damages
most_economic_by_state <- storm_data %>%
  group_by(EVTYPE_tidy,STATE) %>%
  summarise(
    total_damage = sum(total_damage_adj)
  ) %>% 
  arrange(desc(total_damage)) %>% 
  ungroup() %>% 
  group_by(STATE) %>% 
  slice_head(n = 1) %>% 
  ungroup()

## `summarise()` has grouped output by 'EVTYPE_tidy'. You can override using the
## `.groups` argument.

most_economic_by_state <-most_economic_by_state %>% 
  ungroup() %>% 
  mutate(
    state = STATE,
    values = EVTYPE_tidy
  ) %>% 
  select(
    state,values
  )

map_damages <- plot_usmap(data = most_economic_by_state)+
  labs(title = "Most damaging type of event per US State, as per total damages in adjusted USD from 1950 to 2011") +
  theme(legend.position="bottom")

Overall, from 1950 to 2011 the most harmful events were tornadoes, which caused 5633 casualties and 91346 injuries. They were followed by excessive heat and flash floods in terms of harmfulness.

Nevertheless, as most kinds of events were only recorded from 1993 onwards, the following graph presents this information for that time frame, where it can be seen that excessive heat caused the most casualties, followed by tornadoes (which still caused a large number of injuries) and floods.

most_harm()

When it comes to economic damages, from 1950 to 2011 the most damaging events were floods, which caused costs of 242 billion USD (adjusted for inflation). They were followed by tornadoes and hurricanes.

Nevertheless, as most kinds of events were only recorded from 1993 onwards, the following graph presents this information for that time frame. Again, floods were the most damaging events, followed by hurricanes and storm tides. Note, however, the impact of droughts when it comes to crop damages.

most_damages()

The following maps display the most harmful and most damaging kind of event per state.

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

grid.arrange(map_harmful, map_damages)

As an extra, both the harmfulness and damages plots are created by functions that take as parameters a cutoff year to begin from and a state to filter by. For instance, the most harmful events in CA since 1950 are obtained with the following code (and the result matches the classification shown in the map above).

most_harm(ytd = 1950, state = "CA")
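
The filtering logic behind these parameters can be tried on a small invented table. The sketch below is illustrative only: `top_fatal` is a hypothetical helper that returns a summary table instead of a plot, and the records are made up.

```r
library(dplyr)
library(tibble)

# Invented records, for illustration only
toy <- tribble(
  ~year, ~STATE, ~EVTYPE_tidy,     ~FATALITIES,
  1990,  "CA",   "FLOOD",          3,
  1995,  "CA",   "WILDFIRE",       5,
  1996,  "CA",   "EXCESSIVE HEAT", 2,
  1997,  "TX",   "TORNADO",        9
)

# Hypothetical helper: cutoff year plus optional state filter
top_fatal <- function(db, ytd = 1993, state = "all") {
  db <- filter(db, year >= ytd)
  if (state != "all") {
    # .env$state makes explicit that the function argument is meant,
    # not a column of db
    db <- filter(db, STATE == .env$state)
  }
  db %>%
    group_by(EVTYPE_tidy) %>%
    summarise(total_casualties = sum(FATALITIES), .groups = "drop") %>%
    arrange(desc(total_casualties))
}

ca_since_1993 <- top_fatal(toy, state = "CA")
all_since_1950 <- top_fatal(toy, ytd = 1950)
```

With the default cutoff, the 1990 flood is excluded from the CA summary; lowering `ytd` to 1950 brings it back, and leaving `state` at `"all"` keeps every state.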

pguillemi

2022/05/29
