Synopsis

This analysis summarizes the effect of severe weather events on population health and on the economy. To this end, data from the NOAA Storm Database spanning 1950 to 2011 were analyzed. Tornadoes caused by far the most fatalities (5633), followed by excessive heat (1903). Tornadoes were also the leading cause of injuries, with over 90000, while every other event type stayed below 7000. The economic impact was most severe for floods (138 billion USD), followed by hurricanes/typhoons (29 billion USD) and tornadoes (17 billion USD).

Data Processing

The complete dataset was imported into R, and unnecessary columns were removed to reduce its size. The impact on population health (fatalities and injuries) was summed by event type and restricted to the 10 event types with the highest totals. Bar plots were created for both fatalities and injuries by event type. To analyze the economic impact, property and crop damage were summed per event type. Again, the result was restricted to the 10 event types with the highest totals, and a bar chart of the economic impact per event type was created. For details on the data processing, please see the commented R code below.
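
The summarise-and-rank step described above is the same for fatalities, injuries, and damage. As a minimal sketch on made-up data, it could be written once as a function. The `top_events()` helper and the toy tibble are illustrative assumptions and not part of the analysis code.

```r
library(dplyr)

# Toy data standing in for the storm dataset (all values are made up)
toy <- tibble(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  INJURIES   = c(10, 2, 5, 1, 0, 3)
)

# Hypothetical helper: sum one numeric column per event type and
# keep the n event types with the largest totals
top_events <- function(data, value, n = 10) {
  data %>%
    group_by(EVTYPE) %>%
    summarise(total = sum({{ value }}, na.rm = TRUE), .groups = "drop") %>%
    arrange(desc(total)) %>%
    slice_head(n = n)
}

top_fat <- top_events(toy, FATALITIES, n = 2)
top_inj <- top_events(toy, INJURIES, n = 2)
```

Passing the column through `{{ value }}` lets the same function serve any numeric column, which would avoid repeating the pipeline for each outcome.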

library(readr)
## Warning: package 'readr' was built under R version 4.5.2
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

# loading the data
df1 <- read_csv("repdata_data_StormData.csv.bz2")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# selecting the columns needed for this analysis
df2 <- df1 %>% select(EVTYPE, FATALITIES:CROPDMGEXP)

##----
## Q1 Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
##----

# grouping the dataset by EVTYPE, summing the injuries per EVTYPE, and ungrouping again; sorting the
# result by total injuries in descending order and keeping the 10 EVTYPEs with the most injuries
df2_inj <- df2 %>% group_by(EVTYPE) %>% summarise(tot_inj = sum(INJURIES, na.rm = TRUE), .groups = "drop") %>% 
  arrange(desc(tot_inj)) %>% slice_head(n = 10)

# creating a plot for the 10 EVTYPEs with the most injuries
p_inj <- ggplot(df2_inj, aes(x = reorder(EVTYPE, -tot_inj), y = tot_inj, fill = tot_inj)) + geom_col() +
  geom_text(
    aes(label = round(tot_inj, 0)),
    vjust = -0.3,
    size = 3) +
  labs(
    x = "Event Type",
    y = "Total Injuries",
    title = "Total Injuries by Event Type (Top 10 Event Types)"
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

# grouping the dataset by EVTYPE, summing the fatalities per EVTYPE, and ungrouping again; sorting the
# result by total fatalities in descending order and keeping the 10 EVTYPEs with the most fatalities
df2_fat <- df2 %>% group_by(EVTYPE) %>% summarise(tot_fat = sum(FATALITIES, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(tot_fat)) %>% slice_head(n = 10)

# creating a plot for the 10 EVTYPEs with the most fatalities
p_fat <- ggplot(df2_fat, aes(x = reorder(EVTYPE, -tot_fat), y = tot_fat, fill = tot_fat)) + geom_col() +
  geom_text(
    aes(label = round(tot_fat, 0)),
    vjust = -0.3,
    size = 3) +
  labs(
    x = "Event Type",
    y = "Total Fatalities",
    title = "Total Fatalities by Event Type (Top 10 Event Types)"
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )


##----
## Q2 Across the United States, which types of events have the greatest economic consequences?
##----

# Finding out which exponents are present in the data for property and crop damage
p_exp <- df2 %>% distinct(PROPDMGEXP) %>% arrange(PROPDMGEXP)
c_exp <- df2 %>% distinct(CROPDMGEXP) %>% arrange(CROPDMGEXP)

# creating a new column where these exponents are translated to a multiplier for property damage;
# the exponent column is character, so digits are compared as strings; "?" and missing exponents
# get a multiplier of 1 so that these rows do not yield NA damage values
df2_f <- df2 %>% mutate(PROP_MULTI = case_when(
  PROPDMGEXP == "+" ~ 1,
  PROPDMGEXP == "-" ~ 1,
  PROPDMGEXP == "?" ~ 1,
  is.na(PROPDMGEXP) ~ 1,
  PROPDMGEXP == "0" ~ 1,
  PROPDMGEXP == "1" ~ 10,
  PROPDMGEXP == "2" ~ 10^2,
  PROPDMGEXP == "3" ~ 10^3,
  PROPDMGEXP == "4" ~ 10^4,
  PROPDMGEXP == "5" ~ 10^5,
  PROPDMGEXP == "6" ~ 10^6,
  PROPDMGEXP == "7" ~ 10^7,
  PROPDMGEXP == "8" ~ 10^8,
  PROPDMGEXP == "K" ~ 10^3,
  PROPDMGEXP == "M" ~ 10^6,
  PROPDMGEXP == "B" ~ 10^9,
  PROPDMGEXP == "m" ~ 10^6,
  PROPDMGEXP == "h" ~ 10^2,
  PROPDMGEXP == "H" ~ 10^2))

# creating a new column where these exponents are translated to a multiplier for crop damage;
# missing exponents are detected with is.na(), because a comparison with NA never evaluates to TRUE
df2_f <- df2_f %>% mutate(CROP_MULTI = case_when(
  CROPDMGEXP == "0" ~ 1,
  CROPDMGEXP == "2" ~ 10^2,
  CROPDMGEXP == "K" ~ 10^3,
  CROPDMGEXP == "k" ~ 10^3,
  CROPDMGEXP == "M" ~ 10^6,
  CROPDMGEXP == "B" ~ 10^9,
  CROPDMGEXP == "m" ~ 10^6,
  CROPDMGEXP == "?" ~ 1,
  is.na(CROPDMGEXP) ~ 1
))

# adding new columns where property damage, crop damage, and their sum are calculated in billion USD
df2_f <- df2_f %>%
  mutate(PROPDMG_B = PROPDMG*PROP_MULTI/10^9) %>%
  mutate(CROPDMG_B = CROPDMG*CROP_MULTI/10^9) %>%
  mutate(DAMAGE_B = PROPDMG_B + CROPDMG_B)

# creating a new dataframe where the total damage is calculated and summed up by EVTYPE
df2_dmg <- df2_f %>% group_by(EVTYPE) %>% summarise(tot_dmg = sum(DAMAGE_B, na.rm = TRUE), .groups = "drop") %>% 
  arrange(desc(tot_dmg)) %>% slice_head(n = 10)

# creating the plot for the total damage by EVTYPE
p_dmg <- ggplot(df2_dmg, aes(x = reorder(EVTYPE, -tot_dmg), y = tot_dmg, fill = tot_dmg)) + geom_col() +
  geom_text(
    aes(label = round(tot_dmg, 1)),
    vjust = -0.3,
    size = 3
  ) +
  labs(
    x = "Event Type",
    y = "Damage (B$)",
    title = "Total Damage in Billion USD by Event Type (both property and crop damage)"
  ) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )
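
One point worth keeping in mind when translating exponent codes: a comparison with `NA` returns `NA` rather than `TRUE`, so missing exponents have to be caught with `is.na()` or a default value. The following is a minimal sketch on made-up codes. The lookup vector `exp_lookup` and the fallback multiplier of 1 are assumptions for illustration, not the mapping used in the analysis.

```r
# Hypothetical lookup table from exponent code to multiplier
exp_lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Made-up exponent codes, including a missing value and an unknown symbol
codes <- c("K", "m", NA, "B", "?")

# Comparing with NA does not return TRUE, so such a condition never matches
NA == NA      # NA
is.na(NA)     # TRUE

# Look up the upper-cased code; missing or unknown codes come back as NA
multi <- unname(exp_lookup[toupper(codes)])

# Assumed fallback: treat missing or unknown codes as a multiplier of 1
multi[is.na(multi)] <- 1
```

A named vector keeps the mapping in one place and handles upper and lower case together, at the cost of making the fallback for unlisted codes an explicit decision.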

Results

The event types responsible for the most fatalities are shown in the graph below. Tornadoes are the predominant cause of fatalities, followed by excessive heat and flash floods.

p_fat

The event types responsible for the most injuries are shown in the graph below. Tornadoes are the predominant cause of injuries, followed by thunderstorm wind (TSTM WIND) and floods.

p_inj

The total economic damage is shown in the figure below. Floods are responsible for 138 billion USD, followed by hurricanes/typhoons (29 billion USD) and tornadoes (17 billion USD).

p_dmg