NOAA Storm Database - Most harmful and ruinous weather events between 1950 and 2011

Synopsis

This report aims to answer which types of weather events are the most harmful to people's health and which are the most ruinous from an economic point of view. It is based on the US National Oceanic and Atmospheric Administration's (NOAA) storm database, which records information about major weather events in the US between 1950 and 2011. We focus on the estimates of fatalities, injuries, property damage and crop damage.

Data Processing

First of all, we load the packages we are going to use in this report.

library(dplyr)
library(ggplot2)

We downloaded the data from this link and loaded it into R with the read.csv function. It is stored as a data frame in an object called data.

data <- read.csv("repdata_data_StormData.csv")

Since we want to find out which kinds of events are the most harmful and ruinous, we first look at which variable contains the names of the event types.

event_types <- length(unique(data$EVTYPE))

The variable is EVTYPE and there are 985 types of weather events in this dataset.

Next we create a new data frame called harmfulness from the original dataset. We group the rows by EVTYPE and then sum the numeric columns FATALITIES and INJURIES (dropping any NA values) into a single column called HARM. Finally, we sort by HARM in descending order so that the head function returns the most harmful types of events.

harmfulness <- data %>%
        group_by(EVTYPE) %>%
        summarise(HARM = sum(FATALITIES, INJURIES, na.rm = TRUE)) %>%
        arrange(desc(HARM))
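To make the grouping step concrete, here is a minimal base R sketch on a made-up data frame (the object toy and its values are hypothetical, not taken from the NOAA data). It should give the same per-type totals as the dplyr pipeline above: fatalities and injuries are added per row, ignoring NA, and then summed per event type.

```r
# Hypothetical toy data, not taken from the storm database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 1, NA, 3),
  INJURIES   = c(10, 4, 5, 1, 0)
)

# Per-row harm: fatalities plus injuries, treating NA as missing (ignored)
toy$HARM_ROW <- rowSums(toy[, c("FATALITIES", "INJURIES")], na.rm = TRUE)

# Total harm per event type, sorted from most to least harmful
harm_by_type <- sort(tapply(toy$HARM_ROW, toy$EVTYPE, sum), decreasing = TRUE)
harm_by_type
```

In this toy case TORNADO comes first with 18, followed by FLOOD with 5 and HEAT with 3.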

To get the estimated economic losses for each type of event, we need a little extra manipulation of the available data. Two kinds of economic damage are reported: property damage and crop damage. Both have estimates that must be multiplied by the multipliers encoded in the columns PROPDMGEXP and CROPDMGEXP. First, let's check the class of these columns and their unique values.

class(data$PROPDMGEXP)
## [1] "character"
unique(data$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
class(data$CROPDMGEXP)
## [1] "character"
unique(data$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

Basically, these values can be summarized as follows:

blank/empty character = 0

(?) = 0

(-) = 0

(+) = 1

numeric 0:8 = 10

H,h = hundreds = 100

K,k = thousands = 1,000

M,m = millions = 1,000,000

B,b = billions = 1,000,000,000
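The mapping above can also be sketched as a small base R function. The name exp_to_multiplier is hypothetical and is not used elsewhere in this report. One assumption is worth flagging: this sketch sends any code that is not in the list to 0, whereas a case_when call without a default returns NA for unmatched values.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers,
# following the list above. Codes not in the list also map to 0
# (an assumption of this sketch).
exp_to_multiplier <- function(exp_code) {
  code <- toupper(exp_code)               # treat "k" and "K" alike
  mult <- rep(0, length(code))            # blank, "?", "-" -> 0
  mult[code == "+"] <- 1
  mult[code %in% as.character(0:8)] <- 10
  mult[code == "H"] <- 1e2
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  mult
}

# Toy values, not taken from the dataset
toy_dmg  <- c(25, 2.5, 10, 3)
toy_exp  <- c("K", "m", "", "5")
toy_loss <- toy_dmg * exp_to_multiplier(toy_exp)
toy_loss
```

With these toy values, an estimate of 25 with code "K" becomes 25,000, 2.5 with "m" becomes 2,500,000, 10 with a blank code becomes 0, and 3 with "5" becomes 30.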

The following code produces a data frame called damage. It first translates the codes in PROPDMGEXP and CROPDMGEXP into numeric multipliers, and then multiplies the estimates in the columns PROPDMG and CROPDMG by their respective multipliers.

damage <- data %>%
        mutate(PROP_MULT = case_when(
                        PROPDMGEXP %in% c("", "-", "?") ~ 0,
                        PROPDMGEXP %in% c("+") ~ 1,
                        PROPDMGEXP %in% c("0", "1", "2", "3", "4", 
                                        "5", "6", "7", "8") ~ 10,
                        PROPDMGEXP %in% c("h", "H") ~ 100,
                        PROPDMGEXP %in% c("k", "K") ~ 1000,
                        PROPDMGEXP %in% c("m", "M") ~ 1000000,
                        PROPDMGEXP %in% c("b", "B") ~ 1000000000
                )
        ) %>%
        mutate(CROP_MULT = case_when(
                        CROPDMGEXP %in% c("", "-", "?") ~ 0,
                        CROPDMGEXP %in% c("+") ~ 1,
                        CROPDMGEXP %in% c("0", "1", "2", "3", "4", 
                                        "5", "6", "7", "8") ~ 10,
                        CROPDMGEXP %in% c("h", "H") ~ 100,
                        CROPDMGEXP %in% c("k", "K") ~ 1000,
                        CROPDMGEXP %in% c("m", "M") ~ 1000000,
                        CROPDMGEXP %in% c("b", "B") ~ 1000000000
                )
        )

damage <- damage %>%
        mutate(PROP_LOSS = PROPDMG * PROP_MULT) %>%
        mutate(CROP_LOSS = CROPDMG * CROP_MULT)

damage <- damage %>%
        group_by(EVTYPE) %>%
        summarise(DMG = sum(PROP_LOSS, CROP_LOSS, na.rm = TRUE)) %>%
        arrange(desc(DMG))

Like the harmfulness data frame, damage is grouped by EVTYPE and then sorted in descending order by the newly created variable DMG, which is the sum of property and crop losses.

Results

Let's look at what the processed data show and draw some plots.

The next code returns the ten most harmful types of event.

head_HARM <- head(harmfulness, 10) %>%
        mutate(EVTYPE = factor(EVTYPE, 
                               levels = EVTYPE[order(HARM)]))
head_HARM
## # A tibble: 10 × 2
##    EVTYPE             HARM
##    <fct>             <dbl>
##  1 TORNADO           96979
##  2 EXCESSIVE HEAT     8428
##  3 TSTM WIND          7461
##  4 FLOOD              7259
##  5 LIGHTNING          6046
##  6 HEAT               3037
##  7 FLASH FLOOD        2755
##  8 ICE STORM          2064
##  9 THUNDERSTORM WIND  1621
## 10 WINTER STORM       1527

Next are the ten most economically ruinous types of event (losses expressed in billions of dollars).

head_DMG <- head(damage, 10) %>%
        mutate(EVTYPE = factor(EVTYPE, 
                               levels = EVTYPE[order(DMG)])) %>%
        mutate(DMG = DMG/1e+09)
head_DMG
## # A tibble: 10 × 2
##    EVTYPE               DMG
##    <fct>              <dbl>
##  1 FLOOD             150.  
##  2 HURRICANE/TYPHOON  71.9 
##  3 TORNADO            57.4 
##  4 STORM SURGE        43.3 
##  5 HAIL               18.8 
##  6 FLASH FLOOD        17.6 
##  7 DROUGHT            15.0 
##  8 HURRICANE          14.6 
##  9 RIVER FLOOD        10.1 
## 10 ICE STORM           8.97

We can see in both listings that one type of event far exceeds the one in second position. Because of this, we plot the results zoomed in on the other nine types of event, without removing the outlier's line from the plot.

ggplot(head_HARM, aes(y = EVTYPE)) + 
        
        geom_segment(aes(
                x = 0,
                xend = HARM,
                yend = EVTYPE),
                linewidth = 2,
                color = "darkred") +
        
        geom_point(aes(x = HARM), size = 3, color = "steelblue") +
        
        coord_cartesian(xlim = c(0, 10000)) +
        
        labs(x = "fatalities/injuries",
             y = "types of event",
             title = "Top 10 harmful types of event 1950-2011")
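The use of coord_cartesian in the plot above is what keeps the outlier on the chart while zooming. As a small sketch on hypothetical toy data (the objects toy, p_zoom and p_clip are illustrative names only), the code below contrasts it with setting limits on the scale, which, as far as we understand ggplot2's default behaviour, censors out-of-range values instead of merely hiding them.

```r
library(ggplot2)

# Hypothetical toy data: one value far above the others
toy <- data.frame(event = c("A", "B", "C"), harm = c(96979, 8428, 7461))

base_plot <- ggplot(toy, aes(x = harm, y = event)) + geom_point()

# Zoom only: every row is kept, the outlier just falls outside the view
p_zoom <- base_plot + coord_cartesian(xlim = c(0, 10000))

# Scale limits: values outside the range are censored before drawing
p_clip <- base_plot + scale_x_continuous(limits = c(0, 10000))

# Count the points that still carry a usable x value after the plot is built
n_zoom <- sum(!is.na(ggplot_build(p_zoom)$data[[1]]$x))
n_clip <- sum(!is.na(ggplot_build(p_clip)$data[[1]]$x))
c(zoom = n_zoom, clip = n_clip)
```

With a segment layer, the same censoring would remove the outlier's line entirely, which is why zooming through the coordinate system is the safer choice here.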

When it comes to harmfulness, tornadoes greatly surpass the other types of weather events, with an estimated 96,979 fatalities and injuries over the years considered. In terms of economic damage, on the other hand, floods are the most ruinous events, accounting for about 150.3 billion dollars in losses.

ggplot(head_DMG, aes(y = EVTYPE)) + 
        
        geom_segment(aes(
                x = 0,
                xend = DMG,
                yend = EVTYPE),
                linewidth = 2,
                color = "darkred") +
        
        geom_point(aes(x = DMG), size = 3, color = "steelblue") +
        
        coord_cartesian(xlim = c(0, 75)) +
        
        labs(x = "property & crop losses in $ billions",
             y = "types of event",
             title = "Top 10 ruinous types of event 1950-2011")