Project Title:

Analysis of the Economic and Health Impact of Major US Storm Events

Data Processing

We begin by loading the tidyverse (dplyr, the pipe, ggplot2, etc.) and then read the bzip2-compressed CSV file into R, decompressing it on the fly with bzfile().

library(tidyverse)
# bzfile() decompresses the archive while reading, so no separate unzip step is needed
storm_dat <- read.csv(bzfile("stormdat.csv.bz2"),
                      header = TRUE,
                      sep = ",",
                      stringsAsFactors = FALSE)
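Because bzfile() opens a decompressing connection, read.csv() never needs an extracted copy of the file on disk. The following is a minimal, self-contained sketch of that round trip. It assumes a hypothetical two-row toy data frame written to a temporary file, not the real storm data.

```r
# Hypothetical toy data standing in for the storm records
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

# Write the toy data as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through a bzfile() connection, as done for stormdat.csv.bz2
toy_back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
str(toy_back)

unlink(tmp)  # remove the temporary file
```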

With the data now stored in a data frame, we can inspect its overall structure: 902,297 observations of 37 variables.

str(storm_dat)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

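The PROPDMGEXP column contains a letter or digit indicating the power of ten by which the corresponding PROPDMG value must be multiplied. For example, "K" stands for thousands (10^3), so the first record, with a PROPDMG of 25 and a PROPDMGEXP of "K", represents 25 × 10^3 = $25,000 of damage. We therefore convert these codes to numeric multipliers so that all of the damage can be expressed in dollars in a single column for summarizing and plotting.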
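As a small worked example of this conversion, the sketch below applies a cut-down lookup table to a few hypothetical damage values and codes (invented for illustration, not taken from the NOAA file). As in the analysis, it assumes that empty or unrecognised codes mean a multiplier of 1.

```r
# Hypothetical toy values; the real analysis uses the PROPDMG/PROPDMGEXP columns
propdmg  <- c(25, 2.5, 1.5, 10)
exp_code <- c("K", "k", "M", "")

# Cut-down lookup table: exponent code -> multiplier (a power of ten)
key <- c("H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Upper-case the codes, look them up, and default unmatched codes to 1
mult <- unname(key[toupper(exp_code)])
mult[is.na(mult)] <- 1

dollars <- propdmg * mult  # i.e. 25000, 2500, 1500000 and 10 dollars
dollars
```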

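# Normalise case so that, e.g., "k" and "K" map to the same multiplier
storm_dat$PROPDMGEXP <- toupper(storm_dat$PROPDMGEXP)

# Lookup table from exponent code to numeric multiplier (a power of ten).
# Empty strings and unrecognised symbols such as "?" are not listed: they
# produce NA in the lookup below and are then assigned a multiplier of 1.
prop_exp_key <- c("-" = 10^0,
                  "+" = 10^0,
                  "0" = 10^0,
                  "1" = 10^1,
                  "2" = 10^2,
                  "3" = 10^3,
                  "4" = 10^4,
                  "5" = 10^5,
                  "6" = 10^6,
                  "7" = 10^7,
                  "8" = 10^8,
                  "9" = 10^9,
                  "H" = 10^2,
                  "K" = 10^3,
                  "M" = 10^6,
                  "B" = 10^9)

# Replace each code with its multiplier; codes with no match default to 1
storm_dat$PROPDMGEXP <- unname(prop_exp_key[storm_dat$PROPDMGEXP])
storm_dat$PROPDMGEXP[is.na(storm_dat$PROPDMGEXP)] <- 10^0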
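Because unmatched codes silently fall back to a multiplier of 1, it can be worth listing which codes actually take that path. The sketch below uses a hypothetical helper, unmatched_codes(), and toy codes. On the real data it would have to be run on the character codes before PROPDMGEXP is overwritten with numbers (for example, right after the toupper() step).

```r
# Hypothetical helper: which exponent codes are missing from the lookup table
# and would therefore be treated as a multiplier of 1?
unmatched_codes <- function(codes, key) {
  codes <- toupper(codes)
  sort(unique(codes[!codes %in% names(key)]))
}

# Toy lookup table and codes, for illustration only
key <- c("K" = 10^3, "M" = 10^6, "B" = 10^9)
unmatched <- unmatched_codes(c("K", "m", "?", "", "B", "?"), key)
unmatched
```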

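We can now subset and summarise the data with dplyr's select(), group_by(), and summarise(), chained together with the pipe (%>%). We select only the columns of interest, group by event type (EVTYPE), and compute, for each event type, the total number of fatalities and the total property damage in dollars (PROPDMG multiplied by its PROPDMGEXP multiplier). I decided to focus on fatalities and on property damage.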
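For readers less familiar with dplyr, the same per-event-type totals can be computed in base R with tapply(). This is a sketch on hypothetical toy rows, not the author's pipeline. It assumes PROPDMGEXP already holds the numeric multipliers produced by the lookup step.

```r
# Hypothetical toy rows standing in for storm_dat (multipliers already applied)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD"),
  FATALITIES = c(3, 0, 1, 2),
  PROPDMG    = c(25, 10, 2.5, 1),
  PROPDMGEXP = c(1e3, 1e6, 1e3, 1e6),
  stringsAsFactors = FALSE
)

# Sum within each event type (tapply orders groups alphabetically)
fat  <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
prop <- tapply(toy$PROPDMG * toy$PROPDMGEXP, toy$EVTYPE, sum)

totals <- data.frame(EVTYPE = names(fat),
                     fat    = as.vector(fat),
                     prop   = as.vector(prop),
                     stringsAsFactors = FALSE)
totals
```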

sub <- storm_dat %>%
  select(EVTYPE, FATALITIES, PROPDMG, PROPDMGEXP) %>%
  group_by(EVTYPE) %>%
  summarise(fat = sum(FATALITIES),              # total fatalities per event type
            prop = sum(PROPDMG * PROPDMGEXP))   # total property damage in dollars

We then extract the fatality totals and the property damage totals into two new data frames for plotting, each keeping only the top 10 event types by that measure.

# Top 10 event types by total fatalities and by total property damage
fatal <- sub %>% select(EVTYPE, fat) %>% filter(fat > 1) %>%
  arrange(desc(fat)) %>% slice(1:10)
property <- sub %>% select(EVTYPE, prop) %>% filter(prop > 1) %>%
  arrange(desc(prop)) %>% slice(1:10)
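The arrange(desc(...)) followed by slice(1:10) pattern amounts to sorting in decreasing order and keeping the first n rows. The base-R sketch below uses a hypothetical helper, top_n_by(), and made-up totals. Ties keep their original row order, because order() is stable.

```r
# Hypothetical per-type totals, not real NOAA figures
totals <- data.frame(EVTYPE = c("A", "B", "C", "D"),
                     fat    = c(5, 0, 12, 7),
                     stringsAsFactors = FALSE)

# Hypothetical helper: sort by one column in decreasing order, keep the top n rows
top_n_by <- function(df, col, n = 10) {
  ordered <- df[order(df[[col]], decreasing = TRUE), , drop = FALSE]
  head(ordered, n)
}

res <- top_n_by(totals, "fat", n = 2)
res
```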

Results

Q1 Which types of events are most harmful with respect to population health?

The chart below shows the ten storm event types with the most fatalities, which we use here as our measure of harm to population health. Tornadoes cause the highest number of fatalities, followed by excessive heat and flash flooding in second and third place. Flooding causes only the 7th most deaths, but, as the next chart shows, it is the most costly event type in terms of property damage.

# Wrapping the assignment in parentheses both stores and prints the plot
(g1 <- ggplot(fatal, aes(x = reorder(EVTYPE, -fat), y = fat)) +
    geom_bar(stat = "identity", aes(fill = EVTYPE)) +
    xlab("Event Type") + ylab("Total number of fatalities") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    theme(legend.position = "none") +
    ggtitle("Top 10 storm event types by number of fatalities"))

Q2 Which types of events have the greatest economic consequences?

When we instead measure economic impact by property damage, flooding causes the most damage, followed by hurricanes/typhoons and tornadoes in second and third place.

(g2 <- ggplot(property, aes(x = reorder(EVTYPE, -prop), y = prop)) +
    geom_bar(stat = "identity", aes(fill = EVTYPE)) +
    xlab("Event Type") + ylab("Total property damage in dollars") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    theme(legend.position = "none") +
    ggtitle("Top 10 storm event types by property damage"))

Conclusions

Tornadoes cause the most fatalities and the 3rd most property damage. Flooding results in the most property damage, but only the 7th most fatalities.
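Statements such as "the 3rd most property damage" compare an event type's position in two separate rankings. One way to compute both positions side by side is sketched below with rank(). The totals are made up for illustration and are not the NOAA figures.

```r
# Hypothetical totals for three event types (illustrative numbers only)
totals <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HEAT"),
                     fat    = c(100, 10, 50),
                     prop   = c(5e9, 9e9, 1e6),
                     stringsAsFactors = FALSE)

# Rank 1 = largest total; ties.method = "min" gives tied types the same rank
totals$fat_rank  <- rank(-totals$fat,  ties.method = "min")
totals$prop_rank <- rank(-totals$prop, ties.method = "min")
totals
```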

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. We examined NOAA storm data to determine which types of storm events caused the most fatalities and property damage in the US. This required us to sum data across 61 years and to combine the property damage values (PROPDMG) with a separate column (PROPDMGEXP) that encodes their order of magnitude. Overall, tornadoes appear to be the most dangerous to human life, while flooding causes the most property damage.