Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

The purpose of this report is to present some basic plots of the information in the storm database maintained by the U.S. National Oceanic and Atmospheric Administration (NOAA). More information can be found on their website.

Preliminaries

To suppress the warnings and messages produced by the code chunks, and to save computation time by caching their results, we set the following global options.

knitr::opts_chunk$set(warning = FALSE, message = FALSE, cache = TRUE)

Loading data and formatting

First, we load the tidyverse package to manage data frames and plots, lubridate to work with dates, and gridExtra to arrange plots side by side.

library(tidyverse)
library(lubridate)
library(gridExtra)

The database can be downloaded from the URL used in the code below. We then show the dimensions of the data set and a quick view of the first 3 observations.

fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
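# Download only once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists("StormData.csv.bz2")) {
        download.file(fileURL, destfile = "StormData.csv.bz2", mode = "wb")
}

stormData <- read.csv("StormData.csv.bz2", header = TRUE, stringsAsFactors = FALSE)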
dim(stormData)
## [1] 902297     37
head(stormData, 3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3

Also, we get the structure of this data set.

str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

We convert the BGN_DATE and END_DATE columns to the Date class.

stormData$BGN_DATE <- as.Date(stormData$BGN_DATE, format = "%m/%d/%Y")
stormData$END_DATE <- as.Date(stormData$END_DATE, format = "%m/%d/%Y")
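
As a small check of the conversion above, the following sketch applies the same format string to one timestamp copied from the data and to an empty string, which is how missing END_DATE values appear in the raw file. The vector names `rawDates` and `parsedDates` are only illustrative.

```r
# Illustrative input: one BGN_DATE value from the data and one empty END_DATE value
rawDates <- c("4/18/1950 0:00:00", "")

# as.Date() parses the leading month/day/year and ignores the trailing time part;
# a string that does not match the format becomes NA
parsedDates <- as.Date(rawDates, format = "%m/%d/%Y")

print(parsedDates)
```

The empty strings therefore end up as missing dates, which seems reasonable here because only BGN_DATE is used in the rest of the report.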

Subsetting data

According to the database details provided by NOAA, only Tornado, Thunderstorm Wind, and Hail events were recorded before 1996. Therefore, we only consider observations from 1996 onward. Also, since this report describes the effects of the different event types, we flag in the ecoImpact column the events that caused some damage, either economic (property or crop damage) or human (fatalities or injuries).

stormData2 <- stormData %>% filter(year(BGN_DATE)>=1996) %>%
        mutate(ecoImpact = (PROPDMG > 0 | CROPDMG > 0 | FATALITIES > 0 | INJURIES > 0))
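
To make the two steps above concrete, here is a minimal sketch on three made-up rows. It uses base R only, so it runs without extra packages. The data frame `toy` and its values are assumptions for illustration and are not taken from the storm data.

```r
# Made-up rows for illustration only; column names follow the storm data
toy <- data.frame(
  BGN_DATE   = as.Date(c("1995-06-01", "1996-01-15", "2001-08-20")),
  PROPDMG    = c(10, 0, 0),
  CROPDMG    = c(0, 0, 0),
  FATALITIES = c(0, 0, 2),
  INJURIES   = c(0, 0, 0)
)

# Keep events from 1996 onward (base R equivalent of filter(year(BGN_DATE) >= 1996))
toy2 <- toy[as.integer(format(toy$BGN_DATE, "%Y")) >= 1996, ]

# TRUE when the event caused any property, crop, or human damage
toy2$ecoImpact <- with(toy2, PROPDMG > 0 | CROPDMG > 0 | FATALITIES > 0 | INJURIES > 0)

print(toy2)
```

The 1995 row is dropped by the year filter. Of the two remaining rows, only the one with fatalities is flagged.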

Results

Human health impact

Let’s start with a quick overview of how often each event type occurs. Since there are more than 985 different event types, we show only the 10 most frequent ones. The left plot counts all the events, and the right plot counts only the events flagged in ecoImpact. In both plots the fill color represents the total number of fatalities and injuries.

p <- stormData2 %>%
        group_by(EVTYPE) %>%
        summarise(n = n(), damage = sum(FATALITIES, INJURIES)) %>%
        slice_max(n, n = 10) %>%
        ggplot() +
        geom_bar(aes(x = reorder(EVTYPE, -n), y = n, fill = damage), stat = "identity") +
        coord_flip() +
        theme(legend.position = "none") +
        scale_fill_gradient(low = "#add8e6", high = "#f08080")
q <- stormData2 %>%
        filter(ecoImpact) %>%
        group_by(EVTYPE) %>%
        summarise(n = n(), damage = sum(FATALITIES, INJURIES)) %>%
        slice_max(n, n = 10) %>%
        ggplot() +
        geom_bar(aes(x = reorder(EVTYPE, -n), y = n, fill = damage), stat = "identity") +
        coord_flip() +
        scale_fill_gradient(name = "Human Damage", low = "#add8e6", high = "#f08080")
grid.arrange(p, q, ncol = 2)

Although HAIL was the event type with the highest number of occurrences overall, thunderstorm wind (TSTM WIND) was the most frequent event type with recorded damage, and HAIL appears to be only the fourth most frequent in that group. However, both plots make clear that TORNADO events were the most harmful to human health.

stormData2 %>% filter(ecoImpact) %>% group_by(EVTYPE) %>% summarise(n=n(), damage=sum(FATALITIES, INJURIES))
## # A tibble: 222 x 3
##    EVTYPE                       n damage
##    <chr>                    <int>  <dbl>
##  1 "   HIGH SURF ADVISORY"      1      0
##  2 " FLASH FLOOD"               1      0
##  3 " TSTM WIND"                 2      0
##  4 " TSTM WIND (G45)"           1      0
##  5 "AGRICULTURAL FREEZE"        3      0
##  6 "ASTRONOMICAL HIGH TIDE"     8      0
##  7 "ASTRONOMICAL LOW TIDE"      2      0
##  8 "AVALANCHE"                264    379
##  9 "Beach Erosion"              1      0
## 10 "BLACK ICE"                  1     25
## # ... with 212 more rows
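
The table above shows labels with leading spaces and mixed letter case, so some event types may be split across several groups. The report does not clean the labels. The following is only a sketch of one possible normalization, using a few labels copied from the output. The names `evtype` and `evtypeClean` are illustrative.

```r
# A few labels copied from the output above
evtype <- c(" TSTM WIND", "   HIGH SURF ADVISORY", "Beach Erosion", "AVALANCHE")

# Strip surrounding whitespace and use one letter case so that variants
# of the same label fall into the same group
evtypeClean <- toupper(trimws(evtype))

print(evtypeClean)
```

Applying such a step before group_by(EVTYPE) would change the counts, so the figures in this report would have to be recomputed.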

Economic impact

To see how the number of events with economic impact has changed over the years, we show the following time series.

p <- stormData2 %>%
        filter(ecoImpact) %>%
        group_by(BGN_DATE) %>%
        summarise(n = n()) %>%
        ggplot() +
        geom_line(aes(x = BGN_DATE, y = n))
q <- stormData2 %>%
        filter(ecoImpact) %>%
        group_by(BGN_DATE) %>%
        summarise(n = n()) %>%
        ggplot() +
        geom_bar(aes(x = month(BGN_DATE), y = n, fill = year(BGN_DATE)), stat = "identity") +
        scale_x_continuous(breaks = 1:12, name = "Month") +
        theme(legend.position = "none")
grid.arrange(p, q, ncol = 2)

The left plot suggests some seasonality in the number of events, and the right plot shows that events are most frequent in the middle of the year (May, June, and July).
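
The monthly aggregation behind the right plot can be sketched on a handful of made-up dates. This is a base R illustration of the idea, not the code used for the figure. The vector `eventDates` and its values are assumptions.

```r
# Made-up event dates for illustration only
eventDates <- as.Date(c("1996-05-03", "1996-06-11", "1997-06-25",
                        "1998-01-09", "1999-06-30"))

# Month number of each event, pooled over years (same idea as month(BGN_DATE))
eventMonth <- as.integer(format(eventDates, "%m"))

# Count events per calendar month; months with no events still appear with 0
monthlyCounts <- table(factor(eventMonth, levels = 1:12))

print(monthlyCounts)
```

Pooling over years in this way hides year-to-year variation, which is why the report also shows the daily time series.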

To compute the actual economic impact, we must scale the damage figures by the exponent codes stored in the CROPDMGEXP and PROPDMGEXP columns.

stormData2$PROPDMGEXP %>% unique()
## [1] "K" ""  "M" "B" "0"
stormData2$CROPDMGEXP %>% unique()
## [1] "K" ""  "M" "B"

We interpret those codes as the following multipliers:

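  • [blank] or 0 -> \(\times 1\)
  • K -> \(\times 10^3\)
  • M -> \(\times 10^6\)
  • B -> \(\times 10^9\)

multiplier <- function(x) {
        # Map each exponent code to its numeric factor; unrecognised codes stay at 0
        res <- numeric(length(x))
        res[x=="K"] <- 1e3
        res[x=="M"] <- 1e6
        res[x=="B"] <- 1e9
        # A blank code or "0" means no scaling (factor 1)
        res[x=="" | x=="0"] <- 1
        return(res)
}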
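
As a worked example of this arithmetic, the sketch below scales the PROPDMG values of the first three observations shown earlier (25, 2.5, and 25, all with code K). It uses a lookup-vector variant that maps both a blank code and 0 to 1, as in the list above. The names `expFactor` and `multiplierLookup` are hypothetical and are not part of the report's code.

```r
# Hypothetical lookup table: names are exponent codes, values are factors
expFactor <- c("0" = 1, K = 1e3, M = 1e6, B = 1e9)

multiplierLookup <- function(x) {
  # Look up each code by name; codes that are not in the table give NA
  res <- unname(expFactor[x])
  # The empty code cannot be matched by name, so it is handled separately
  res[x == ""] <- 1
  res
}

# PROPDMG / PROPDMGEXP and CROPDMG / CROPDMGEXP of the first three observations
propDmg    <- c(25, 2.5, 25)
propDmgExp <- c("K", "K", "K")
cropDmg    <- c(0, 0, 0)
cropDmgExp <- c("", "", "")

totalProp <- propDmg * multiplierLookup(propDmgExp)
totalCrop <- cropDmg * multiplierLookup(cropDmgExp)

print(totalProp)
print(totalCrop)
```

Unlike a default of 0, returning NA for an unknown code makes unexpected values visible instead of silently removing their damage from the totals. Which behavior is preferable depends on how much the codes are trusted.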

topEconomicImpact <- stormData2 %>% 
        mutate(totalPropDMG = PROPDMG*multiplier(PROPDMGEXP), totalCropDMG = CROPDMG*multiplier(CROPDMGEXP)) %>% 
        filter(ecoImpact) %>% 
        group_by(EVTYPE) %>% 
        summarise(totalDMG = sum(totalPropDMG + totalCropDMG)) %>%
        ungroup() %>% 
        slice_max(totalDMG, n=10)

topEconomicImpact %>% 
        ggplot() + 
        geom_bar(aes(x=reorder(EVTYPE, -totalDMG), y=totalDMG), stat = "identity") + 
        coord_flip()

topEconomicImpact %>% 
        mutate(DAMAGE.Economic = scales::dollar(totalDMG)) %>%
        select(EVTYPE, DAMAGE.Economic) %>% 
        print()
## # A tibble: 10 x 2
##    EVTYPE            DAMAGE.Economic 
##    <chr>             <chr>           
##  1 FLOOD             $148,919,611,950
##  2 HURRICANE/TYPHOON $71,913,712,800 
##  3 STORM SURGE       $43,193,541,000 
##  4 TORNADO           $24,900,370,720 
##  5 HAIL              $17,071,172,870 
##  6 FLASH FLOOD       $16,557,105,610 
##  7 HURRICANE         $14,554,229,010 
##  8 DROUGHT           $14,413,667,000 
##  9 TROPICAL STORM    $8,320,186,550  
## 10 HIGH WIND         $5,881,421,660

Finally, we can observe that FLOOD is the event type that causes the most economic damage. It is also among the most frequent event types with considerable human damage.

Conclusions

To conclude this report, we leave the following scatter plot, in which the economic and human damage can be observed together.

stormData2 %>%
        filter(ecoImpact) %>% 
        mutate(totalPropDMG = PROPDMG*multiplier(PROPDMGEXP), totalCropDMG = CROPDMG*multiplier(CROPDMGEXP)) %>% 
        group_by(EVTYPE) %>% 
        summarise(totalEcoDMG = sum(totalPropDMG + totalCropDMG), 
                  totalHumDMG = sum(FATALITIES + INJURIES), 
                  freq = n()) %>%
        ggplot() +
        geom_point(aes(x=log10(totalEcoDMG), y=log10(totalHumDMG), color=freq), alpha = 0.3, size = 2)

As future work, we note that there might be a linear relationship, on the log-log scale, between economic damage and human damage for those event types where both are nonzero.
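
One way to explore that idea is sketched below, assuming per-event-type totals with the same column names as in the scatter plot code. The values in `toyTotals` are made up and chosen to lie exactly on a line, so the example only shows the mechanics and says nothing about the storm data.

```r
# Made-up per-event-type totals for illustration only (not taken from the storm data)
toyTotals <- data.frame(
  totalEcoDMG = c(1e6, 1e7, 1e8, 0, 1e9),
  totalHumDMG = c(10, 100, 1000, 5, 0)
)

# log10(0) is -Inf, so keep only rows where both kinds of damage are positive
nonZero <- subset(toyTotals, totalEcoDMG > 0 & totalHumDMG > 0)

# Ordinary least squares fit on the log-log scale
fit <- lm(log10(totalHumDMG) ~ log10(totalEcoDMG), data = nonZero)

print(coef(fit))
```

On the real data, the residuals and the influence of a few very large event types would need to be checked before reading anything into the slope.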