Synopsis

Storms and other weather events can cause significant damage and harm public health. Using the National Weather Service Storm Preparation Data, we examined property and crop damage, as well as injuries and fatalities, attributed to specific weather events. After summarizing the weather events into 7 distinct groups, we observed that the vast majority of fatalities and injuries were associated with Severe Local Storms, whereas crop and property damage was attributed to a wider variety of weather events. These data suggest that efforts to protect human health could be more successful if they focused on Severe Local Storms that produce tornadoes, high winds, hail, and/or large amounts of rain. These data also suggest that, because more types of events cause damage, weather-related crop and property damage would be more difficult to prevent.

Data Import & Initial Processing

Packages - These are the R packages needed to complete the analysis:

library(dplyr)
library(tidyr)
library(ggplot2)
library(kableExtra)

Data was imported as a bzip2 file from:

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "stormdata.csv.bz2")

The stormdata.csv.bz2 file was imported and decompressed using the R command below:

storm <- read.csv(bzfile("stormdata.csv.bz2"), header=TRUE)

srows <- format(as.numeric(nrow(storm)),big.mark=",")
scols <- ncol(storm)

The imported data frame has 902,297 observations and 37 variables.
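Because the archive is downloaded every time the analysis runs, it may be worth skipping the download when the file is already on disk. The following is a minimal sketch of that idea, not part of the original analysis: `fetch_once` is a hypothetical helper name, the URL in the demonstration is a placeholder that is never contacted, and the two rows written to the temporary file are made up purely to show the bzip2 round trip.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is missing.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")  # binary mode keeps the bz2 archive intact
  }
  invisible(dest)
}

# Round-trip check on a tiny bzip2-compressed CSV (made-up rows, not storm data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), INJURIES = c(15, 0)),
          con, row.names = FALSE)
close(con)

# The file already exists, so no download is attempted
fetch_once("https://example.invalid/not-used.csv.bz2", tmp)

demo <- read.csv(bzfile(tmp), header = TRUE)
nrow(demo)
```

For the real data, the call would take the URL and file name shown above, for example `fetch_once(url, "stormdata.csv.bz2")`.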

Since we are only interested in a subset of variables for this analysis, a smaller data frame is created:

storm1 <- storm %>%
        select(EVTYPE, FATALITIES, INJURIES, PROPDMG,PROPDMGEXP, CROPDMG, CROPDMGEXP)

str(storm1)   # shows structure of dataframe     
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...

Data Cleaning

events <- nrow(data.frame(table(storm1$EVTYPE)))

The goal of this analysis is to evaluate the relationship between weather events (variable = EVTYPE) and both population health and property/crop damage. Interrogating the EVTYPE variable shows 985 distinct event types. On closer examination, these are a broad mixture of weather events (e.g. thunderstorm) and weather conditions (e.g. wind, rain, hail). Extracting harm and damage information without first summarizing these event types would yield results that are difficult to interpret.

Summarizing EVTYPE variable

After conducting a Google search for weather event types, I chose to use the National Weather Service’s definition of severe weather to categorize the weather events into 7 groups.

The NWS divides severe weather alerts into a few types of hazardous weather/hydrologic events:

  • Severe local storms (SLS) – Short-fused, small-scale hazardous weather or hydrologic events produced by thunderstorms, including large hail, damaging winds, tornadoes, and flash floods.
  • Winter storms (WS) – Weather hazards associated with freezing or frozen precipitation (freezing rain, sleet, snow) or combined effects of winter precipitation and strong winds.
  • Fire weather (FW) – Weather conditions leading to an increased risk of wildfires.
  • Flooding (FLD) – Hazardous hydrologic events resulting in temporary inundation of land areas not normally covered by water, often caused by excessive rainfall.
  • Coastal/lakeshore hazards (CLH) – Hydrological hazards that may affect property, marine or leisure activities in areas near ocean and lake waters including high surf and coastal or lakeshore flooding, as well as rip currents.
  • Marine hazards (MH) – Hazardous events that may affect marine travel, fishing and shipping interests along large bodies of water, including hazardous seas and freezing spray.
  • Other hazards (OTH) – Weather hazards not directly associated with any of the above including extreme heat or cold, dense fog, high winds, and river or lakeshore flooding.

The above definitions provide sufficient information to perform a regular-expression search on the EVTYPE variable and replace each individual value with one of the above summary categories.

Note that it was important to perform the “Winter Storms” categorization first, because many of these events involve weather conditions that overlap with other event types (e.g. wind occurs in both thunderstorms and blizzards). Each event is overwritten with its category code as soon as it matches, so running the “Winter Storms” search first, with search terms that do not contain these common weather conditions, avoids mis-categorizing Winter Storms as Severe Local Storms.

storm1 <- storm1 %>% mutate(EVTYPE = as.character(EVTYPE))

# Winter Storms
storm1$EVTYPE[grepl("blizzard|snow|ice|sleet|freez|cold|wint|chill|avalanche|frost|icy", storm1$EVTYPE,ignore.case = TRUE)] <- "WS"

# Severe Local Storms
storm1$EVTYPE[grepl("Lightning|hail|wind|rain|thunder|tornado|funnel|microburst|tropical|precip|summary", storm1$EVTYPE,ignore.case = TRUE)] <- "SLS"

# Coastal/Lakeshore Hazards
storm1$EVTYPE[grepl("coast|beach|tide|stream|surf|seiche|surge", storm1$EVTYPE,ignore.case = TRUE)] <- "CLH"

# Flooding
storm1$EVTYPE[grepl("flood|fldg", storm1$EVTYPE,ignore.case = TRUE)] <- "FLD"

# Marine Hazards
storm1$EVTYPE[grepl("spray|hurricane|fog|rip|waterspout|typhoon|tsunami", storm1$EVTYPE,ignore.case = TRUE)] <- "MH"

# Fire Weather
storm1$EVTYPE[grepl("heat|drought|fire|warm|dust|smoke|dry", storm1$EVTYPE,ignore.case = TRUE)] <- "FW"

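# Other: anything not already assigned one of the six category codes.
# Exact matching is used so that an uncategorized event name that merely
# contains a code as a substring (e.g. "ws") is not left unchanged.
storm1$EVTYPE[!(storm1$EVTYPE %in% c("WS", "SLS", "CLH", "FLD", "MH", "FW"))] <- "OTH"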
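The effect of search order can be seen on a toy example. This is only a sketch under stated assumptions: `classify` is a hypothetical helper that reproduces the "first matching category wins" behaviour of the sequential overwrites, the patterns are shortened versions of the ones used in the analysis, and the four event names are chosen for illustration.

```r
# Toy event names (illustrative only)
ev <- c("BLIZZARD/HIGH WIND", "THUNDERSTORM WIND", "COASTAL FLOOD", "VOLCANIC ASH")

# Hypothetical helper: apply the rules in order; the first match wins
classify <- function(x, rules) {
  out  <- rep("OTH", length(x))         # default when no pattern matches
  done <- rep(FALSE, length(x))
  for (code in names(rules)) {
    hit <- !done & grepl(rules[[code]], x, ignore.case = TRUE)
    out[hit] <- code
    done <- done | hit                  # earlier rules take precedence
  }
  out
}

# Shortened patterns, winter storms first as in the analysis
rules <- c(WS = "blizzard|snow|ice", SLS = "wind|hail|thunder",
           CLH = "coast|surf", FLD = "flood")

winter_first <- classify(ev, rules)
# "WS" "SLS" "CLH" "OTH"

# Same rules with Severe Local Storms searched first
storms_first <- classify(ev, rules[c("SLS", "WS", "CLH", "FLD")])
# "SLS" "SLS" "CLH" "OTH"
```

With Severe Local Storms searched first, the blizzard is captured by the `wind` term, which is the mis-categorization the chosen ordering is meant to avoid.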

events1 <- storm1%>%
        group_by(EVTYPE)%>%
        summarise(Count=n())%>%
        arrange(desc(Count))

Frequency Table of Categorized Weather Event Types

knitr::kable(events1, "html")%>%
  kable_styling(bootstrap_options = "striped", full_width = F, position = "left")

EVTYPE    Count
SLS      748864
FLD       81752
WS        47329
FW        10404
MH         6772
CLH        6124
OTH        1052

Cleanup and Transformation of PROPDMGEXP and CROPDMGEXP

The downloaded csv file contains 4 variables that are used to calculate property damage (PROPDMG & PROPDMGEXP) and crop damage (CROPDMG & CROPDMGEXP). Damage estimates in PROPDMG and CROPDMG are rounded to 3 significant digits; PROPDMGEXP and CROPDMGEXP are alphabetical characters signifying the magnitude of the damage estimate.

Replacement criteria:

  • k or K: value replaced with “1000”
  • m or M: value replaced with “1000000”
  • b or B: value replaced with “1000000000”
  • h or H: although not mentioned in the data set documentation as a legitimate entry, I assumed that this was intentional and reflects “hundreds”. Value changed to “100”
  • All other values or blanks changed to “0”

Damage estimates were then multiplied by the corresponding magnitude value to give total damage.

storm2 <- storm1 %>%
        # Convert each magnitude code to a numeric multiplier;
        # blanks and all other codes ("?", "-", "+", digits) become 0
        mutate(PROPDMGEXP = case_when(PROPDMGEXP %in% c("h", "H") ~ 100,
                                      PROPDMGEXP %in% c("k", "K") ~ 1000,
                                      PROPDMGEXP %in% c("m", "M") ~ 1000000,
                                      PROPDMGEXP %in% c("b", "B") ~ 1000000000,
                                      TRUE ~ 0),
               CROPDMGEXP = case_when(CROPDMGEXP %in% c("h", "H") ~ 100,
                                      CROPDMGEXP %in% c("k", "K") ~ 1000,
                                      CROPDMGEXP %in% c("m", "M") ~ 1000000,
                                      CROPDMGEXP %in% c("b", "B") ~ 1000000000,
                                      TRUE ~ 0),
               # Total damage = estimate x multiplier
               TOTPROP = PROPDMG * PROPDMGEXP,
               TOTCROP = CROPDMG * CROPDMGEXP,
               EVTYPE = as.factor(EVTYPE)) %>%
        select(-PROPDMG, -PROPDMGEXP, -CROPDMG, -CROPDMGEXP)
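As a worked example of the replacement criteria, the same mapping can be written as a small lookup table. This is a sketch rather than the code used in the analysis: `mult` and `exp_to_multiplier` are hypothetical names, and the damage values and codes below are made up for illustration.

```r
# Lookup table of magnitude codes -> multipliers (per the replacement criteria)
mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: case-insensitive lookup, 0 for anything unrecognised
exp_to_multiplier <- function(code) {
  m <- unname(mult[toupper(as.character(code))])
  m[is.na(m)] <- 0      # blanks and other codes contribute no damage
  m
}

# Made-up rows for illustration
dmg   <- c(25, 2.5, 10, 3)
code  <- c("K", "m", "", "?")
total <- dmg * exp_to_multiplier(code)
total
# 25000 2500000 0 0
```

An estimate of 25 with code "K" therefore becomes $25,000, 2.5 with "m" becomes $2,500,000, and rows with a blank or unrecognised code contribute nothing to the totals.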

Results

Analysis of Human Casualties

pophealth <- storm2%>%
                group_by(EVTYPE)%>%
                summarise(Fatalities = sum(FATALITIES),
                          Injuries = sum(INJURIES))%>%
                gather("Harm","sum",2:3)

harm.type <- pophealth%>% group_by(Harm)%>%summarise(Total=sum(sum))

totfatal <- format(as.numeric(harm.type[1,2]),big.mark=",")
totinj <- format(as.numeric(harm.type[2,2]),big.mark=",")

ggplot(pophealth, aes(x=EVTYPE, y=sum, fill=Harm))+
        geom_col(position = "dodge")+
        labs(title = "Fig 1. Weather Events vs Injuries/Fatalities",
             y = "Number of Injuries or Fatalities",
             x = "Weather Event Types")

Overall, there were 15,145 fatalities and 140,528 injuries associated with all weather events. Figure 1 shows that the vast majority of injuries and fatalities were associated with Severe Local Storms (SLS). This is most likely because Severe Local Storms generate a variety of individual weather conditions (e.g. wind or hail) that are each dangerous on their own, and can also generate multiple conditions at once (e.g. wind and hail) that may have a synergistic effect.
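One way to put a number on "vast majority" is to express each category as a percentage of the total. The sketch below shows the calculation on made-up counts; the figures are not results from the storm data, and the column name `Share` is an assumption introduced for illustration.

```r
library(dplyr)

# Made-up counts for illustration; not results from the storm data
toy <- data.frame(EVTYPE = c("FLD", "SLS", "WS"),
                  FATALITIES = c(15, 80, 5))

# Percentage of all fatalities attributed to each category, largest first
shares <- toy %>%
  mutate(Share = 100 * FATALITIES / sum(FATALITIES)) %>%
  arrange(desc(Share))
shares
```

Applied to the real summary, the same two steps could be run on the per-category fatality and injury totals before they are reshaped for plotting.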

Analysis of Crop and Property Damage

damage <- storm2%>%
        group_by(EVTYPE)%>%
        summarise(Crop = sum(TOTCROP)/1000000000,          #divided by billion to make results more readable
                  Property = sum(TOTPROP)/1000000000)%>%
        gather("Damage","sum",2:3)


dam.type <- damage%>%group_by(Damage)%>%summarise(Total=sum(sum)) 

totcrop <- format(round(as.numeric(dam.type[1,2]),1))
totprop <- format(round(as.numeric(dam.type[2,2]),1))

ggplot(damage, aes(x=EVTYPE, y=sum, fill=Damage))+
        geom_col(position = "dodge")+
        labs(title = "Fig 2. Weather Events vs. Property/Crop Damage",
             y= "Total Damage ($, x 1,000,000,000)",
             x = "Weather Event Types")
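The reshaping step in this analysis uses `gather()`, which still works but has been superseded by `pivot_longer()` since tidyr 1.0.0. A minimal sketch of the equivalent call follows, using made-up totals rather than the storm data; the column names mirror those used in the analysis.

```r
library(dplyr)
library(tidyr)

# Made-up totals (in $ billions) for illustration only
toy <- data.frame(EVTYPE = c("FLD", "SLS"),
                  Crop = c(1, 2),
                  Property = c(10, 20))

# Equivalent of gather("Damage", "sum", 2:3)
long <- toy %>%
  pivot_longer(cols = c(Crop, Property),
               names_to = "Damage", values_to = "sum")
long
```

The rows come out in a different order than with `gather()` (grouped by event type rather than by damage type), which does not affect the grouped totals or the plot.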

Crop and property damage tell a different story, as damage was spread across a wider variety of weather events. The overall value of damage was quite large: $427.3 B in property damage and $49.1 B in crop damage. Flooding caused the most overall damage (Figure 2, >$150B), followed by Severe Local Storms, Marine Hazards, and Coastal/Lakeshore Hazards.