Tornadoes kill, floods & hurricanes destroy

A preliminary look at NOAA data on fatalities, injuries, and economic damages

Synopsis

A simple exploratory analysis is performed, aimed at classifying weather-related events in order of number of fatalities, number of related injuries, and property and crop damage caused.

Some data manipulation, carefully reported and reproducible, allows the use of ggplot() functions to visualize the quantitative information of interest in the form of barplots.

A laconic comment preceeds most r code blocks.

A brief illustration of the results follow the plots included in the present report.

Data processing

Start by calling the necessary libraries.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggplot2)

First, we read the data and do some preliminary inspection.

rawData <- read.csv("/Users/francescoaldo/Desktop/COURSES MATERIAL/R DATA SCIENCE - Coursera/c5_w4_ProgAss/repdata-data-StormData.csv", header = TRUE)

StormData <- rawData 
dim(StormData)

## [1] 902297     37

names(StormData)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

After unzipping and reading in the comma separated data, we keep only the variables we are interested in¹.

selectedVars <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", 
                   "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
StormData <- StormData[, selectedVars] 
dim(StormData)

## [1] 902297      8

str(StormData)

## 'data.frame':    902297 obs. of  8 variables:
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...

From a 902,297 x 37 data set we are now down to 902,297 x 8. Let’s take a look at the numeric variables.

summary(StormData[, c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")])

##    FATALITIES          INJURIES            PROPDMG           CROPDMG       
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00   1st Qu.:  0.000  
##  Median :  0.0000   Median :   0.0000   Median :   0.00   Median :  0.000  
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06   Mean   :  1.527  
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50   3rd Qu.:  0.000  
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00   Max.   :990.000

The ‘PROPDMGEXP’ and ‘CROPDMGEXP’ variables contain the exponents (or orders of magnitude), coded as charater inputs, for the ‘PROPDMG’ and ‘CROPDMG’ variables respectively. A double call to the unique() function tells us what unique values they assume, and we then “translate” those to numbers based on the documentation.

unique(StormData$PROPDMGEXP)

##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

unique(StormData$CROPDMGEXP)

## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

StormData$PROPDMGEXP <- toupper(StormData$PROPDMGEXP)
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("B")] <- "9"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("M")] <- "6"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("K")] <- "3"
StormData$PROPDMGEXP[StormData$PROPDMGEXP %in% c("H")] <- "2"

StormData$CROPDMGEXP <- toupper(StormData$CROPDMGEXP) 
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("", "+", "-", "?")] <- "0" 
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("B")] <- "9" 
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("M")] <- "6" 
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("K")] <- "3" 
StormData$CROPDMGEXP[StormData$CROPDMGEXP %in% c("H")] <- "2"

Now we use the numeric exponents above to create ‘total’ variables.

StormData$PROPDMGTOTAL <- StormData$PROPDMG*(10^as.numeric(StormData$PROPDMGEXP)) 
StormData$CROPDMGTOTAL <- StormData$CROPDMG*(10^as.numeric(StormData$CROPDMGEXP)) 
StormData$DMGTOTAL <- StormData$PROPDMGTOTAL + StormData$CROPDMGTOTAL

We use the ‘dplyr’ package pipeline operator to compute the total number of injuries, fatalities, and damages by type of event.

Totals <- StormData %>%
                 group_by(EVTYPE) %>%
                 summarize(SUMFATALITIES = sum(FATALITIES), 
                           SUMINJURIES = sum(INJURIES), 
                           SUMPROPDMG = sum(PROPDMGTOTAL), 
                           SUMCROPDMG = sum(CROPDMGTOTAL), 
                           TOTALDMG = sum(DMGTOTAL) 
                           ) 

head(Totals, 5)

## # A tibble: 5 x 6
##   EVTYPE                SUMFATALITIES SUMINJURIES SUMPROPDMG SUMCROPDMG TOTALDMG
##   <chr>                         <dbl>       <dbl>      <dbl>      <dbl>    <dbl>
## 1 "   HIGH SURF ADVISO…             0           0     200000          0   200000
## 2 " COASTAL FLOOD"                  0           0          0          0        0
## 3 " FLASH FLOOD"                    0           0      50000          0    50000
## 4 " LIGHTNING"                      0           0          0          0        0
## 5 " TSTM WIND"                      0           0    8100000          0  8100000

Let’s get a look at the ordering, now. First, the worst: fatalities. We’ll show the “top ten”.

TotalFatalities <- arrange(Totals, desc(SUMFATALITIES)) 
TopTenFatalities <- head(TotalFatalities, 10)

Similarly, for injuries:

TotalInjuries <- arrange(Totals, desc(SUMINJURIES)) 
TopTenInjuries <- head(TotalInjuries, 10)

Finally, property, crop, and total damage.

TotalPropertyDMG <- arrange(Totals, desc(SUMPROPDMG)) 
TopTenPropertyDMG <- head(TotalPropertyDMG, 10) 

TotalCropDMG <- arrange(Totals, desc(SUMCROPDMG)) 
TopTenCropDMG <- head(TotalCropDMG, 10) 

TotalDMG <- arrange(Totals, desc(TOTALDMG)) 
TopTenTotalDMG <- head(TotalDMG, 10)

Results

Fatalities

TopTenFatalities$EVTYPE <- with(TopTenFatalities, 
                                reorder(EVTYPE, -SUMFATALITIES)) 
plotFatalities <- ggplot(TopTenFatalities, 
                         aes(EVTYPE, SUMFATALITIES, label = SUMFATALITIES)) 
plotFatalities <- plotFatalities + 
    geom_bar(stat = "identity", color = "black", 
             fill = "lightblue", width = 0.75) + 
    geom_text(nudge_y = 200, color ="darkorange2") + 
    xlab("Type of Weather Event") + 
    ylab("Total Number of Fatalities") + 
    ggtitle("Most Fatal Events") + 
    theme_light() + 
    theme(plot.title = element_text(hjust = 0.5), 
          axis.text.x = element_text(size = 7, angle = 20)) 
plotFatalities

Most weather-event related fatalities are due to tornadoes, who account for thrice the number of deaths due to excessive heat.

Injuries

TopTenInjuries$EVTYPE <- with(TopTenInjuries, 
                              reorder(EVTYPE, -SUMINJURIES)) 
plotInjuries <- ggplot(TopTenInjuries, 
                         aes(EVTYPE, SUMINJURIES, label = SUMINJURIES)) 
plotInjuries <- plotInjuries + 
    geom_bar(stat = "identity", color = "black", 
             fill = "lightblue", width = 0.75) + 
    geom_text(nudge_y = 2750, color ="darkorange2") + 
    scale_y_continuous(limits = c(0, 100000)) + 
    xlab("Type of Weather Event") + 
    ylab("Total Number of Injuries") + 
    ggtitle("Events which Caused Most Injuries") + 
    theme_light() + 
    theme(plot.title = element_text(hjust = 0.5), 
          axis.text.x = element_text(size = 7, angle = 20)) 
plotInjuries

Tornadoes also “dominate” the appalling injuries statistics, with more than 90,000 events related to them: they represent more than the other nine head-listers combined.

Damages

TopTenTotalDMG$EVTYPE <- with(TopTenTotalDMG, reorder(EVTYPE, -TOTALDMG))
DamageDataLong <- TopTenTotalDMG %>%
    gather(key = "Type", value = "TOTALDAMAGE", c("SUMPROPDMG", "SUMCROPDMG")) %>%
    select(EVTYPE, Type, TOTALDAMAGE)
DamageDataLong$Type[DamageDataLong$Type %in% c("SUMPROPDMG")] <- "Property damage" 
DamageDataLong$Type[DamageDataLong$Type %in% c("SUMCROPDMG")] <- "Crop damage"

plotDMG <- ggplot(DamageDataLong, 
                  aes(x = EVTYPE, y = TOTALDAMAGE, fill = Type))
plotDMG <- plotDMG + 
    geom_bar(stat = "identity", position = "stack", 
             color = "black", width = 0.75) + 
    scale_y_continuous(labels = scales::label_comma(scale = 1/1000000000)) + 
    xlab("Event Type") + 
    ylab("Total Damage in Billion USD$") + 
    ggtitle("Events Causing Most Damage") + 
    theme_light() + 
    theme(plot.title = element_text(hjust = 0.5), legend.position = "bottom", 
          axis.text.x = element_text(size = 7, angle = 20)) 
plotDMG

Most of the economics damages is in the form in damage to the property (buildings, factories, vehicles, etc.), with flood representing double the total damage by hurricanes (2nd place).

As was to be expected, drought, river floods and ice storms are pretty damaging for agriculture; the split of total damages between property and crop is about 50-50 for the latter two, while is almost all in crop damage for drought.

Reproducible Research: Peer Assessment 2