Reviewing the Health and Economic Effects of Different Storm Types

Synopsis:

This report examines the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks major storms and weather events in the United States, including estimates of fatalities, injuries, and property and crop damage. In this analysis, we found that over the period from 1950 to 2011, tornadoes caused the most fatalities and injuries, floods caused the most property damage, and droughts caused the most crop damage.

Data Processing

The first step is to download the data and load necessary R packages.

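if (!file.exists("data")) {dir.create("data")}
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download only if the file is not already present; mode = "wb" keeps the compressed file intact on Windows.
if (!file.exists("./data/StormData.csv.bz2")) {
  download.file(fileURL, destfile = "./data/StormData.csv.bz2", mode = "wb")
}
# read.csv() reads the bzip2-compressed file directly; no manual decompression is needed.
stormData <- read.csv("./data/StormData.csv.bz2")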

# Missing values are a common problem, so we check what proportion of all values (cells) in the data set are missing, i.e. coded as NA.
mean(is.na(stormData))
## [1] 0.05229737
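Because mean(is.na(stormData)) pools every cell of the data frame, it does not show which columns hold the missing values. A per-column breakdown is a quick follow-up check. The sketch below runs it on a small hypothetical data frame (toy and its values are made up for illustration, not taken from the NOAA file).

```r
# Hypothetical toy data standing in for stormData
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 0, 2),
  SCALE      = c(3, NA, NA, 1)
)

# Overall share of missing cells (what mean(is.na(stormData)) reports)
overallNA <- mean(is.na(toy))

# Share of missing values within each column
naByColumn <- colMeans(is.na(toy))
print(naByColumn)
```

On the real data, the same call would be colMeans(is.na(stormData)); a column with a value of 0 has no missing entries.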
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringr)
library(ggplot2) 
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Next, we subset the data to look at the storm types and the outcomes we are interested in for this analysis: fatalities, injuries, property damage and crop damage.
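Note: The property and crop damage amounts must be converted to dollars using the magnitude codes in PROPDMGEXP and CROPDMGEXP, based on this information from NOAA:
“Estimates should be rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include ‘K’ for thousands, ‘M’ for millions, and ‘B’ for billions.”
As the output below shows, the codes also appear in lowercase and include digits and symbols. The code below converts the letter codes K, M and B (and H, treated as hundreds, for property damage) in either case, and leaves amounts with any other code unscaled.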

stormSub <- select(stormData, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

unique(stormData$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
unique(stormData$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
newStorm <- stormSub %>% 
  mutate(PropDam = case_when(PROPDMGEXP == "H" ~ PROPDMG*100
                             ,PROPDMGEXP == "h" ~ PROPDMG*100
                             ,PROPDMGEXP == "K" ~ PROPDMG*1000
                             ,PROPDMGEXP == "k" ~ PROPDMG*1000
                             ,PROPDMGEXP == "M" ~ PROPDMG*1000000
                             ,PROPDMGEXP == "m" ~ PROPDMG*1000000
                             ,PROPDMGEXP == "B" ~ PROPDMG*1000000000
                             ,PROPDMGEXP == "b" ~ PROPDMG*1000000000
                             , TRUE ~ PROPDMG))

newStorm <- newStorm %>% 
  mutate(CropDam = case_when(CROPDMGEXP == "K" ~ CROPDMG*1000
                             ,CROPDMGEXP == "k" ~ CROPDMG*1000
                             ,CROPDMGEXP == "M" ~ CROPDMG*1000000
                             ,CROPDMGEXP == "m" ~ CROPDMG*1000000
                             ,CROPDMGEXP == "B" ~ CROPDMG*1000000000
                             ,CROPDMGEXP == "b" ~ CROPDMG*1000000000
                             , TRUE ~ CROPDMG))
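The two case_when() calls above can also be expressed as a single case-insensitive lookup. The sketch below is an alternative, not the code used in this analysis; expMultiplier() is a hypothetical helper, and like the code above it treats any code other than H, K, M or B as a multiplier of 1.

```r
# Hypothetical helper: map NOAA magnitude codes to numeric multipliers, ignoring case.
# Any code other than H, K, M or B (e.g. "", "+", "?", digits) gets a multiplier
# of 1, matching the TRUE ~ PROPDMG / TRUE ~ CROPDMG fallbacks above.
expMultiplier <- function(codes) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(codes)])  # unmatched codes give NA here
  mult[is.na(mult)] <- 1
  mult
}

# Made-up damage values and codes for illustration
damage  <- c(1.55, 25, 3, 10, 2)
codes   <- c("B", "k", "", "?", "m")
dollars <- damage * expMultiplier(codes)
print(dollars)
```

With dplyr, it could be applied as mutate(PropDam = PROPDMG * expMultiplier(PROPDMGEXP), CropDam = CROPDMG * expMultiplier(CROPDMGEXP)). Unlike the crop-damage case_when() above, this helper would also scale an H crop code, but the unique() output shows no such code in CROPDMGEXP.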

  
cleanStorm <- select(newStorm, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PropDam, CropDam)
colnames(cleanStorm) <- c("Date", "Type", "Fatalities", "Injuries", "PropDam", "CropDam")

Because the storm types are recorded in all caps and the same kind of event often appears under several different labels (for example, every label containing “flood” or the abbreviation “fld” can be treated as a flood), we converted the storm types to lowercase and then renamed and regrouped common types of storms.

#This changes the storm types from all caps to lowercase.
cleanStorm[[2]] <- tolower(cleanStorm[[2]])

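# Rename and group the common storm types. Order matters: once a type has been relabeled (e.g. "thunderstorm wind" becomes "wind"), later patterns such as "thunderstorm" no longer match it.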
cleanStorm$Type[grepl("flood", cleanStorm$Type)] <- "flood"
cleanStorm$Type[grepl("fld", cleanStorm$Type)] <- "flood"
cleanStorm$Type[grepl("wind", cleanStorm$Type)] <- "wind"
cleanStorm$Type[grepl("funnel", cleanStorm$Type)] <- "funnel"
cleanStorm$Type[grepl("hurricane", cleanStorm$Type)] <- "hurricane"
cleanStorm$Type[grepl("thunderstorm", cleanStorm$Type)] <- "thunderstorm"
cleanStorm$Type[grepl("snow", cleanStorm$Type)] <- "snow"
cleanStorm$Type[grepl("heat", cleanStorm$Type)] <- "heat"
cleanStorm$Type[grepl("high temperature", cleanStorm$Type)] <- "heat"
cleanStorm$Type[grepl("warm", cleanStorm$Type)] <- "heat"
cleanStorm$Type[grepl("fire", cleanStorm$Type)] <- "fire"
cleanStorm$Type[grepl("tornado", cleanStorm$Type)] <- "tornado"
cleanStorm$Type[grepl("hail", cleanStorm$Type)] <- "hail"
cleanStorm$Type[grepl("rain", cleanStorm$Type)] <- "rain"
cleanStorm$Type[grepl("lightning", cleanStorm$Type)] <- "lightning"
cleanStorm$Type[grepl("ice", cleanStorm$Type)] <- "ice"
cleanStorm$Type[grepl("cold", cleanStorm$Type)] <- "cold"
cleanStorm$Type[grepl("winter", cleanStorm$Type)] <- "winter"
cleanStorm$Type[grepl("blizzard", cleanStorm$Type)] <- "blizzard"
cleanStorm$Type[grepl("surf", cleanStorm$Type)] <- "surf"
cleanStorm$Type[grepl("swells", cleanStorm$Type)] <- "surf"
cleanStorm$Type[grepl("seas", cleanStorm$Type)] <- "surf"
cleanStorm$Type[grepl("surge", cleanStorm$Type)] <- "surf"
cleanStorm$Type[grepl("spout", cleanStorm$Type)] <- "spout"
cleanStorm$Type[grepl("dry microburst", cleanStorm$Type)] <- "dry microburst"
cleanStorm$Type[grepl("landslide", cleanStorm$Type)] <- "landslide"

We then summed the fatalities, injuries, property damage and crop damage for each storm type, and built a new data frame containing the five “worst” storm types for each outcome.

stormSum <- summarise(group_by(cleanStorm, Type), Fatalities = sum(Fatalities), Injuries = sum(Injuries), PropDam = sum(PropDam), CropDam = sum(CropDam))

# For each outcome, keep storm types with non-zero totals, sort them from highest to lowest, and take the top five.
Fatal <- data.frame(subset(stormSum, Fatalities > 0))
Fatal <- arrange(Fatal, desc(Fatalities))
F5 <- Fatal[1:5, c(1,2)]
colnames(F5) <- c("StormType", "Count")
F5$Outcome <- "Fatalities"

Injurious <- data.frame(subset(stormSum, Injuries > 0))
Injurious <- arrange(Injurious, desc(Injuries))
I5 <- Injurious[1:5, c(1,3)]
colnames(I5) <- c("StormType", "Count")
I5$Outcome <- "Injuries"

PDamage <- data.frame(subset(stormSum, PropDam > 0))
PDamage <- arrange(PDamage, desc(PropDam))
P5 <- PDamage[1:5, c(1,4)]
colnames(P5) <- c("StormType", "Count")
P5$Outcome <- "Property Damage"

CDamage <- data.frame(subset(stormSum, CropDam > 0))
CDamage <- arrange(CDamage, desc(CropDam))
C5 <- CDamage[1:5, c(1,5)]
colnames(C5) <- c("StormType", "Count")
C5$Outcome <- "Crop Damage"

WorstFive <- rbind(F5, I5, P5, C5)
colnames(WorstFive) <- c("StormType", "Count", "OutcomeType")
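The four blocks above repeat the same filter, sort and subset steps. Below is a minimal sketch of a hypothetical helper, topN(), that performs those steps for one outcome column, shown on made-up totals. It uses head(), so an outcome with fewer than five non-zero storm types returns fewer rows instead of rows of NA.

```r
# Hypothetical helper: top-n storm types for one outcome column of a summary
topN <- function(summary, column, outcome, n = 5) {
  df <- summary[summary[[column]] > 0, c("Type", column)]  # drop zero totals
  df <- df[order(df[[column]], decreasing = TRUE), ]       # highest first
  df <- head(df, n)
  data.frame(StormType = df$Type, Count = df[[column]],
             OutcomeType = outcome, stringsAsFactors = FALSE)
}

# Made-up totals with the same column names as stormSum
toySum <- data.frame(
  Type       = c("tornado", "flood", "heat", "hail", "wind", "fog", "ice"),
  Fatalities = c(50, 10, 30, 0, 20, 5, 1),
  Injuries   = c(900, 100, 200, 10, 300, 0, 0),
  stringsAsFactors = FALSE
)

worst <- rbind(topN(toySum, "Fatalities", "Fatalities"),
               topN(toySum, "Injuries", "Injuries"))
print(worst)
```

With the real summary it would be called as rbind(topN(stormSum, "Fatalities", "Fatalities"), topN(stormSum, "Injuries", "Injuries"), topN(stormSum, "PropDam", "Property Damage"), topN(stormSum, "CropDam", "Crop Damage")), producing the same StormType, Count and OutcomeType columns as WorstFive.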

Results

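The plots below show the five storm types that caused the most fatalities, injuries, crop damage and property damage during the assessment period. Each panel has its own y-axis scale: fatalities and injuries are counts of people, while crop and property damage are in U.S. dollars.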

ggplot(data = WorstFive, aes(x = as.factor(StormType), y = Count, fill = StormType)) +
  geom_col() +
  facet_wrap(~ OutcomeType, scales = "free") +
  labs(title = "Worst Storm Outcomes: 1950-2011",
       x = "Storm Type",
       y = "Sum of Negative Outcomes") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_fill_brewer(palette = "Paired")

The following code identifies the storm type with the highest total for each outcome.

MostFatal <- Fatal[which.max(Fatal$Fatalities),1]

MostInjurious <- Injurious[which.max(Injurious$Injuries),1]

MostProperty <- PDamage[which.max(PDamage$PropDam),1]

MostCrops <- CDamage[which.max(CDamage$CropDam),1]
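Because which.max() does not require sorted input, the four lookups can also be done in one pass over the summary columns. The sketch below uses a hypothetical toy summary with the same column names as stormSum; the numbers are invented.

```r
# Hypothetical toy summary with the same columns as stormSum
toySum <- data.frame(
  Type       = c("tornado", "flood", "drought"),
  Fatalities = c(50, 10, 0),
  Injuries   = c(900, 80, 4),
  PropDam    = c(5e9, 9e9, 1e6),
  CropDam    = c(1e8, 5e8, 9e8),
  stringsAsFactors = FALSE
)

# For every outcome column, return the Type with the largest total.
# which.max() returns the first maximum, so ties go to the earlier row.
mostHarmful <- vapply(
  toySum[, c("Fatalities", "Injuries", "PropDam", "CropDam")],
  function(col) toySum$Type[which.max(col)],
  character(1)
)
print(mostHarmful)
```

On the real data, replacing toySum with stormSum should identify the same storm types as MostFatal, MostInjurious, MostProperty and MostCrops, since removing zero-valued rows does not change which row holds the maximum.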

Question One: Across the United States, which types of events are most harmful with respect to population health?

Tornadoes caused the most fatalities from 1950 to 2011, and they also caused the most injuries over that period.

Question Two: Across the United States, which types of events have the greatest economic consequences?

Droughts caused the most crop damage from 1950 to 2011, while floods caused the most property damage over that period.

Conclusion

Our analysis of the NOAA data shows that, from 1950 to 2011, tornadoes caused the most fatalities and injuries, droughts caused the most crop damage, and floods caused the most property damage. Further analysis of the NOAA data would benefit from standardized coding of storm types and of the magnitude codes (PROPDMGEXP and CROPDMGEXP) used for property and crop damage.