knitr::opts_chunk$set(cache = TRUE)

Severe Weather Events

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal of this project is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The first objective of this analysis is to find out which events cause the most deaths and injuries in the U.S. population. The second objective is to provide data on which events cause the most property damage in the United States.

Data Processing

Loading Data

We first load all the necessary libraries

library(plyr)
library(dplyr)
library(ggplot2)
library(stringr)
library(reshape2)
library(scales)

Then we load the data

storm=read.csv("repdata_data_StormData.csv.bz2")

Data processing of events that cause more deaths and injuries

We create object health that will store the data grouped by the event type.

health=storm %>% group_by(EVTYPE) %>% summarise(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES))
str(health)
## tibble [985 × 3] (S3: tbl_df/tbl/data.frame)
##  $ EVTYPE    : chr [1:985] "   HIGH SURF ADVISORY" " COASTAL FLOOD" " FLASH FLOOD" " LIGHTNING" ...
##  $ FATALITIES: num [1:985] 0 0 0 0 0 0 0 0 0 0 ...
##  $ INJURIES  : num [1:985] 0 0 0 0 0 0 0 0 0 0 ...

Since the object has too many event types (985) and many mispelled event names, we are going to clean the data to group it again and make it more compact.

health$EVTYPE=str_squish(health$EVTYPE) #eliminate left and right whitespaces"
health=health[c(health$FATALITIES!=0 | health$INJURIES!=0),] #eliminate zeros
health$EVTYPE=toupper(health$EVTYPE) #all events to uppercase
health$EVTYPE=sub("TYPHOON","HURRICANE",health$EVTYPE) #changing typhoon to hurricane                                                                   #(Documentation considers them the same)

We create a vector with the event types presentend in the documentation (pg.6)

event=c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", 
        "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", 
        "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", 
        "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", 
        "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", 
        "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", 
        "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", 
        "Strong Wind", "Thunderstorm Wind",         "Tornado", "Tropical Depression", 
        "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", 
        "Winter Storm", "Winter Weather")
event=toupper(event) #All to uppercase
event=sub("HURRICANE \\(TYPHOON\\)","HURRICANE",event) #Typhoon to Hurricane for simplicity 
event=sub("\\/.*","",event) #Eliminating slash(/) and everything after for easier matching with                              #health data frame

Now that we have all events that the documentation validates, we are going to substitute all recognizable event mispellings in the health data frame for the correct event names. Then we group it by event type again.

for (i in 1:length(event)){health$EVTYPE=sub(paste0(event[i],".*"),event[i],health$EVTYPE)}
health=health %>% group_by(EVTYPE) %>% summarise(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES))
str(health)
## tibble [137 × 3] (S3: tbl_df/tbl/data.frame)
##  $ EVTYPE    : chr [1:137] "AVALANCE" "AVALANCHE" "BLACK ICE" "BLIZZARD" ...
##  $ FATALITIES: num [1:137] 1 224 1 101 2 0 6 3 1 158 ...
##  $ INJURIES  : num [1:137] 0 170 24 805 14 2 7 2 0 60 ...

After all this cleansing we are left with a data frame that has 137 unique event types. Some of the events in the data are not included in the documentations and others continue to be mispelled.

summary(health[!health$EVTYPE %in% event,])#Summary of events that are not included in the docs or                                         #or are mispelled
##     EVTYPE            FATALITIES        INJURIES      
##  Length:99          Min.   :  0.00   Min.   :   0.00  
##  Class :character   1st Qu.:  1.00   1st Qu.:   0.00  
##  Mode  :character   Median :  1.00   Median :   2.00  
##                     Mean   : 10.67   Mean   :  99.83  
##                     3rd Qu.:  4.50   3rd Qu.:  19.00  
##                     Max.   :504.00   Max.   :6957.00
summary(health[health$EVTYPE %in% event,])#Summary of events that are included in the docs
##     EVTYPE            FATALITIES         INJURIES       
##  Length:38          Min.   :   0.00   Min.   :    0.00  
##  Class :character   1st Qu.:  14.25   1st Qu.:   47.25  
##  Mode  :character   Median :  93.50   Median :  362.50  
##                     Mean   : 370.76   Mean   : 3438.03  
##                     3rd Qu.: 222.25   3rd Qu.: 1359.00  
##                     Max.   :5658.00   Max.   :91364.00

Although when we look at the mean and median of mispelled events and events not included in the docs, their value is low compared to the events included in the docs.This means that poorly written events won’t have a major influence on the plot.

We create a plot1 data frame were the events, in the health data frame, that are included in the docs will be stored.

plot1 = data.frame(EVTYPE = event) 
plot1 = join(plot1,health,by="EVTYPE") #Passing correct events to plot1 data frame
plot1 = plot1 %>% replace(is.na(.),0) #Zeroing events without data
plot1 = plot1%>% mutate(TOTALHEALTH=FATALITIES+INJURIES)%>%arrange(desc(TOTALHEALTH)) #order
mplot1=melt(plot1,id.vars="EVTYPE") #melting values into one variable for plot

Data processing of events that cause more property damage

We create object econ that will store the data grouped by the event type and property damage unit.

econ =storm %>% group_by(EVTYPE,PROPDMGEXP,CROPDMGEXP) %>% 
    summarise(PROPDMG=sum(PROPDMG),CROPDMG=sum(CROPDMG))
## `summarise()` has grouped output by 'EVTYPE', 'PROPDMGEXP'. You can override
## using the `.groups` argument.
dim(econ)
## [1] 1694    5

Since the object has too many event types (1694) and many mispelled event names, we are going to clean the data to group it again and make it more compact.

econ$EVTYPE=str_squish(econ$EVTYPE)  #eliminate left and right whitespaces"
econ =econ[econ$PROPDMG != 0|econ$CROPDMG != 0, ] #eliminate zeros
econ[,1:3]=apply(econ[,1:3],2,toupper) #Events and units to uppercase
econ$EVTYPE=sub("TYPHOON","HURRICANE",econ$EVTYPE) #changing typhoon to hurricane

#Homogenize dollar units for property damage to thousands (B=Billions,M=Millions,K=Thousands, H=Hundreds).
for (i in 1:length(econ$PROPDMGEXP)){
    if (econ$PROPDMGEXP[i]=="B"){econ$PROPDMG[i]=econ$PROPDMG[i]*10^9}
    else if(econ$PROPDMGEXP[i]=="M"){econ$PROPDMG[i]=econ$PROPDMG[i]*10^6}
    else if(econ$PROPDMGEXP[i]=="K"){econ$PROPDMG[i]=econ$PROPDMG[i]*10^3}
    else if(econ$PROPDMGEXP[i]=="H"){econ$PROPDMG[i]=econ$PROPDMG[i]*10^2}     
    else if(!is.na(as.numeric(econ$PROPDMGEXP[i]))){
        econ$PROPDMG[i]=econ$PROPDMG[i]*10^as.numeric(econ$PROPDMGEXP[i])}}
## Warning: NAs introducidos por coerción
## Warning: NAs introducidos por coerción
## Warning: NAs introducidos por coerción
## Warning: NAs introducidos por coerción
## Warning: NAs introducidos por coerción
## Warning: NAs introducidos por coerción
for (i in 1:length(econ$CROPDMGEXP)){
    if (econ$CROPDMGEXP[i]=="B"){econ$CROPDMG[i]=econ$CROPDMG[i]*10^9}
    else if(econ$CROPDMGEXP[i]=="M"){econ$CROPDMG[i]=econ$CROPDMG[i]*10^6}
    else if(econ$CROPDMGEXP[i]=="K"){econ$CROPDMG[i]=econ$CROPDMG[i]*10^3}
    else if(!is.na(as.numeric(econ$CROPDMGEXP[i]))){
        econ$CROPDMG[i]=econ$CROPDMG[i]*10^as.numeric(econ$CROPDMGEXP[i])}}
## Warning: NAs introducidos por coerción
## Warning: NAs introducidos por coerción
## Warning: NAs introducidos por coerción
## Warning: NAs introducidos por coerción

We are going to substitute all recognizable event mispellings in the econ data frame for the correct event names. Then we group it by event type and property damage unit again.

for (i in 1:length(event)){econ$EVTYPE=sub(paste0(event[i],".*"),event[i],econ$EVTYPE)}
econ = econ %>% group_by(EVTYPE) %>% summarise(PROPDMG=sum(PROPDMG),CROPDMG=sum(CROPDMG))
str(econ,max.level=2)
## tibble [232 × 3] (S3: tbl_df/tbl/data.frame)
##  $ EVTYPE : chr [1:232] "?" "AGRICULTURAL FREEZE" "APACHE COUNTY" "ASTRONOMICAL HIGH TIDE" ...
##  $ PROPDMG: num [1:232] 5000 0 5000 9425000 320000 ...
##  $ CROPDMG: num [1:232] 0 28820000 0 0 0 ...

After all this cleansing we are left with a data frame that has 232 unique event types. Some of the events in the data are not included in the documentations and others continue to be mispelled.

summary(econ[!econ$EVTYPE %in% event,])#Summary of events that are not included in the docs or                                         #or are mispelled
##     EVTYPE           PROPDMGEXP           PROPDMG       
##  Length:172         Length:172         Min.   :      0  
##  Class :character   Class :character   1st Qu.:     13  
##  Mode  :character   Mode  :character   Median :     94  
##                                        Mean   :  89265  
##                                        3rd Qu.:   1000  
##                                        Max.   :5235240
summary(econ[econ$EVTYPE %in% event,])#Summary of events that are included in the docs
##     EVTYPE           PROPDMGEXP           PROPDMG         
##  Length:47          Length:47          Min.   :        4  
##  Class :character   Class :character   1st Qu.:     1517  
##  Mode  :character   Mode  :character   Median :    40115  
##                                        Mean   :  8765213  
##                                        3rd Qu.:  3587963  
##                                        Max.   :144957524

Although when we look at the mean and median of mispelled events and events not included in the docs, their value is low compared to the events included in the docs.This means that poorly written events won’t have a major influence on the plot.

We create a plot2 data frame were the events, in the econ data frame, that are included in the docs will be stored.

plot2 = data.frame(EVTYPE = event)
plot2 = join(plot2,econ,by="EVTYPE") #Passing correct events to plot1 data frame
plot2 = plot2 %>% replace(is.na(.),0) #Zeroing events without data
plot2 = plot2%>% mutate(TOTALDMG=PROPDMG+CROPDMG)%>%arrange(desc(TOTALDMG)) #order
mplot2=melt(plot2,id.vars="EVTYPE") #melting values into one variable for plot

Results

We use ggplot, on the x-axis will be the fatalities, in the y-axis are the event types, and we use colors to separate the events that cause more injuries. The fatalities and injuries are in a logarithmic style because there is high dispersion and a notable outlier.

ggplot(mplot1[c(1:20,49:69),],aes(value,reorder(EVTYPE,value),fill=variable,width=0.5))+
    geom_col()+
    labs(x="People Affected",y="Event Type",
         title="Fatalities and Injuries by Event Type",
         fill="Damage Type")+
    theme_minimal()+
    theme(legend.position="bottom")

Data shows that tornadoes are the most harmful type of event, causing the most deaths and injuries by a large margin compared to excessive heat, which is the second most damaging event . There is also a clear direct correlation between the fatalities and injuries.

On the other hand, the event with the greatest economic consequences is flooding, closely followed by hurricanes.

ggplot(mplot2[c(1:20,49:69),],aes(value,reorder(EVTYPE,value),fill=variable,width=0.5))+
    geom_col()+
    scale_x_continuous(labels = scales::comma)+
    labs(x="Property Damage($)",y="Event Type",
         title="Property and Crop Damage by Event Type",
         fill="Damage Type")+
    theme_minimal()+
    theme(legend.position="bottom")