Weather events are responsible for significant economic damage and human casualties in the United States. The analysis below uses collected data to show the cumulative damage in both dollars and human lives.

The Data

The U.S. National Oceanic and Atmospheric Administration (NOAA) makes its storm database available to the public. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Reports are made approximately 60 to 90 days after the event.

From the National Weather Service FAQ:

Where does the data come from?

NCDC receives Storm Data from the National Weather Service. The National Weather service receives their information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, skywarn spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.

How accurate is the data?

Storm Data Disclaimer:

…Some information appearing in Storm Data may be provided by or gathered from sources outside the National Weather Service (NWS), such as the media, law enforcement and/or other government agencies, private companies, individuals, etc…

The data is inconsistent at best and requires a great deal of tidying.

setwd("~/Assignments/RepData_Storm")

library(knitr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(lubridate)
library(ggplot2)
library(stringi)
library(gridExtra)
library(RColorBrewer)

Data Processing

Download the data file:

if(!file.exists("StormData.csv.bz2")){
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2")}

Read the data into storm and keep only the columns related to date, event type, economic damage and human harm:

Columns 2, 8 and 23 through 28

#get the data and hold onto it for dear life
if(!file.exists("temp.RDS")){
temp<-read.csv("StormData.csv.bz2", header=TRUE, na.strings = "", strip.white = TRUE)
saveRDS(temp, "temp.RDS")
}
#rm("temp")

storm<-readRDS("temp.RDS")

storm<-storm[,c(2,8,23:28)]
saveRDS(storm, "storm.RDS")
storm<-readRDS("storm.RDS")
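Selecting columns by position breaks silently if the file layout changes. As a minimal sketch, the same subset can be taken by name. The column names are the ones used in the rename step of this report. The toy data frame `raw` and its values are invented for illustration.

```r
# Toy stand-in for the raw data: column names come from the rename step
# in this report, the values are invented for illustration only.
raw <- data.frame(
  STATE__    = c(1, 1),
  BGN_DATE   = c("4/18/1950 0:00:00", "2/9/1994 0:00:00"),
  EVTYPE     = c("TORNADO", "FREEZING RAIN"),
  FATALITIES = c(0, 0),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 0),
  PROPDMGEXP = c("K", NA),
  CROPDMG    = c(0, 0),
  CROPDMGEXP = c(NA, NA),
  stringsAsFactors = FALSE
)

# The eight columns the analysis needs, listed by name
keep <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if an expected column is missing
stopifnot(all(keep %in% names(raw)))

subset_by_name <- raw[, keep]
names(subset_by_name)
```

The `stopifnot()` guard is a design choice: it turns a changed file layout into an immediate error instead of a wrong analysis.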

Save the data in a form that can be easily read and rename the columns to more intuitive, human-readable names.

storm$year<-year(as.Date(storm$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"))
storm$year<-as.numeric(storm$year)

storm<-storm[storm$year >=1994,]

storm<-select(storm, year=year, event=EVTYPE, fatalities = FATALITIES, injuries = INJURIES, prop=PROPDMG, pexp=PROPDMGEXP, crop=CROPDMG, cexp=CROPDMGEXP)
saveRDS(object = storm, "storm.RDS")
storm$pexp<-tolower(as.character(storm$pexp))
storm$cexp<-tolower(as.character(storm$cexp))

Filter the main data set:

Reporting increases dramatically in 1994 so we’ll use the data from 1994 to 2011 for our analysis.

storm$event<-as.character(storm$event)
storm<-filter(storm, year>=1994)
str(storm)
## 'data.frame':    702131 obs. of  8 variables:
##  $ year      : num  1995 1995 1994 1995 1995 ...
##  $ event     : chr  "FREEZING RAIN" "SNOW" "ICE STORM/FLASH FLOOD" "SNOW/ICE" ...
##  $ fatalities: num  0 0 0 0 0 2 0 0 0 0 ...
##  $ injuries  : num  0 0 2 0 0 0 0 0 0 0 ...
##  $ prop      : num  0 0 0 0 0 0.1 50 5 500 0 ...
##  $ pexp      : chr  NA NA NA NA ...
##  $ crop      : num  0 0 0 0 0 10 0 500 0 0 ...
##  $ cexp      : chr  NA NA NA NA ...
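The year extraction and the 1994 cutoff can be checked on a few values. The sketch below uses base R only (`format()` instead of lubridate's `year()`). The sample date strings are invented, written in the same layout as the format string used for BGN_DATE above.

```r
# Invented sample values in the "%m/%d/%Y %H:%M:%S" layout
dates <- c("4/18/1950 0:00:00", "2/9/1994 0:00:00", "11/28/2011 0:00:00")

# Parse to Date, then pull out the year as a number
parsed <- as.Date(dates, format = "%m/%d/%Y %H:%M:%S")
years  <- as.numeric(format(parsed, "%Y"))

# Keep only years from 1994 onward, as the report does
years[years >= 1994]
```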

Transform the numerical data:

The damage columns, crop and prop, must be multiplied by exponents that are currently stored as character codes (h, k, m, b) in cexp and pexp. To compare like with like, we convert both columns to the same unit of measure.

storm$cexp<-gsub("h", 2, storm$cexp)
storm$cexp<-gsub("k", 3, storm$cexp)
storm$cexp<-gsub("m", 6, storm$cexp)
storm$cexp<-gsub("b", 9, storm$cexp)
storm$pexp<-gsub("h", 2, storm$pexp)
storm$pexp<-gsub("k", 3, storm$pexp)
storm$pexp<-gsub("m", 6, storm$pexp)
storm$pexp<-gsub("b", 9, storm$pexp)

storm$cexp<-as.numeric(storm$cexp)
## Warning: NAs introduced by coercion
storm$pexp<-as.numeric(storm$pexp)
## Warning: NAs introduced by coercion
storm$cfact<-10^storm$cexp
storm$pfact<-10^storm$pexp
storm$crop<-storm$cfact*storm$crop
storm$prop<-storm$pfact*storm$prop
storm<-select(storm, -c(pfact, cfact))

storm$event<-toupper(trimws(storm$event))
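The code-to-exponent mapping above can also be written as a lookup table. The sketch below is an alternative, not the method used in this report. The names `exp_codes` and `to_dollars` are hypothetical. It also assumes that a missing or unrecognised code means a multiplier of 1, whereas the code above leaves those rows as NA, so they drop out of the later sums.

```r
# Hypothetical lookup table: exponent code -> power of ten
exp_codes <- c(h = 2, k = 3, m = 6, b = 9)

# Convert a damage amount and its exponent code to dollars.
# Assumption (differs from the report): a missing or unrecognised
# code is treated as 10^0, so the amount is kept as-is.
to_dollars <- function(amount, code) {
  power <- exp_codes[tolower(as.character(code))]
  power[is.na(power)] <- 0
  unname(amount * 10^power)
}

# 0.1 billion, 50 thousand, 5 million, and an amount with no code
to_dollars(c(0.1, 50, 5, 0), c("B", "K", "M", NA))
```

Whether a missing code should mean "no multiplier" or "unknown" is a judgment call about the data, so the choice is worth stating explicitly either way.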

We’ll remove any rows where the event is missing, since the event type is the most important element, and divide the dollar amounts by \(10^6\) so that the values are in millions of dollars, which is friendlier for plotting.

storm<-select(storm, -c(pexp, cexp))
storm<-filter(storm, !is.na(event) & event !="")
storm<-as.data.frame(storm)

storm$crop<-as.numeric(storm$crop/(10^6))
storm$prop<-as.numeric(storm$prop/(10^6))
storm$event<-as.factor(storm$event)
str(storm)
## 'data.frame':    702131 obs. of  6 variables:
##  $ year      : num  1995 1995 1994 1995 1995 ...
##  $ event     : Factor w/ 837 levels "?","ABNORMAL WARMTH",..: 158 521 355 539 539 339 658 658 703 191 ...
##  $ fatalities: num  0 0 0 0 0 2 0 0 0 0 ...
##  $ injuries  : num  0 0 2 0 0 0 0 0 0 0 ...
##  $ prop      : num  NA NA NA NA NA 1e+02 5e-02 5e+00 5e-01 NA ...
##  $ crop      : num  NA NA NA NA NA 10 NA 0.5 NA NA ...

Find the top ten events that cause the most economic or human harm:

For each variable, Property Damage (prop), Crop Damage (crop), Fatalities (fatalities) and Injuries (injuries), use dplyr and piping to create a separate table that sums the harm by event type. Arrange each table in descending order by amount of damage or number of persons and then extract the first ten rows.

property damage:

#using dplyr and piping, create a separate table for each variable and sum the damage.  Arrange the table in descending order by amount of damage or deaths and then extract the first ten rows
prop<-select(storm, event, prop)%>%
        filter(!is.na(prop))%>%
        group_by(event)%>%
        summarise(damage=sum(prop, na.rm=T))%>%
        arrange(desc(damage))

prop$event<-as.character(prop$event)
prop<-prop[1:10,]
prop$event<-as.factor(prop$event)

prop<-transform(prop, event=reorder(event, desc(damage)))

prop$ID<-seq.int(nrow(prop))
kable(prop)
|event             |     damage| ID|
|:-----------------|----------:|--:|
|FLOOD             | 144179.609|  1|
|HURRICANE/TYPHOON |  69305.840|  2|
|STORM SURGE       |  43193.536|  3|
|TORNADO           |  25630.588|  4|
|FLASH FLOOD       |  16398.306|  5|
|HAIL              |  15338.044|  6|
|HURRICANE         |  11862.819|  7|
|TROPICAL STORM    |   7703.386|  8|
|HIGH WIND         |   5266.939|  9|
|WILDFIRE          |   4765.114| 10|

crop damage:

crop<-select(storm, event, crop)%>%
        filter(!is.na(crop))%>%
        group_by(event)%>%
        summarise(damage=sum(crop, na.rm=T))%>%
        arrange(desc(damage))
        
crop$event<-as.character(crop$event)
crop<-crop[1:10,]
crop$event<-as.factor(crop$event)
crop<-transform(crop, event=reorder(event, desc(damage)))

kable(crop)
|event             |     damage|
|:-----------------|----------:|
|DROUGHT           | 13922.0660|
|FLOOD             |  5506.9424|
|ICE STORM         |  5022.1135|
|HAIL              |  2982.6991|
|HURRICANE         |  2741.4100|
|HURRICANE/TYPHOON |  2607.8728|
|FLASH FLOOD       |  1402.6615|
|EXTREME COLD      |  1312.9730|
|FROST/FREEZE      |  1094.1860|
|HEAVY RAIN        |   733.3998|

death:

death<-select(storm, event, fatalities)%>%
        filter(fatalities>0)%>%
        group_by(event)%>%
        summarise(deaths=sum(fatalities, na.rm=T))%>%
        arrange(desc(deaths))

death$event<-as.character(death$event)
death<-death[1:10,]
death$event<-as.factor(death$event)
death<-transform(death, event=reorder(event, desc(deaths)))

kable(death)
|event          | deaths|
|:--------------|------:|
|EXCESSIVE HEAT |   1903|
|TORNADO        |   1593|
|FLASH FLOOD    |    951|
|HEAT           |    930|
|LIGHTNING      |    794|
|FLOOD          |    450|
|RIP CURRENT    |    368|
|HIGH WIND      |    242|
|TSTM WIND      |    241|
|AVALANCHE      |    224|

injury:

injury<-select(storm, event, injuries)%>%
        filter(injuries>0)%>%
        group_by(event)%>%
        summarise(injured=sum(injuries, na.rm=T))%>%
        arrange(desc(injured))

injury$event<-as.character(injury$event)
injury<-injury[1:10,]
injury<-transform(injury, event=reorder(event, desc(injured)))


kable(injury)
|event             | injured|
|:-----------------|-------:|
|TORNADO           |   22571|
|FLOOD             |    6778|
|EXCESSIVE HEAT    |    6525|
|LIGHTNING         |    5116|
|TSTM WIND         |    3631|
|HEAT              |    2095|
|ICE STORM         |    1971|
|FLASH FLOOD       |    1754|
|THUNDERSTORM WIND |    1476|
|WINTER STORM      |    1298|
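The four summaries above repeat the same steps: sum by event, sort in descending order, keep the top rows. A minimal sketch of a reusable helper follows. It is written in base R so that it runs without extra packages, rather than with the dplyr pipeline used in this report. The name `top_events` and the toy data are hypothetical.

```r
# Sum the column named by `value` for each event and return the n largest
# totals. `top_events` is a hypothetical helper, not part of the report.
top_events <- function(data, value, n = 10) {
  totals <- aggregate(data[[value]],
                      by = list(event = as.character(data$event)),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- value                      # aggregate() names it "x"
  totals <- totals[order(-totals[[value]]), ]    # largest total first
  rownames(totals) <- NULL
  head(totals, n)
}

# Invented toy data, for illustration only
toy <- data.frame(
  event      = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  fatalities = c(1, 3, 2, 0, 4),
  stringsAsFactors = FALSE
)

top_events(toy, "fatalities", n = 2)
```

Applied to the real data, a call such as `top_events(storm, "prop")` would be expected to give the same ranking as the property damage pipeline above, assuming `storm` has been prepared as in this report.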

Figure 1 shows economic damage:

figProp<-ggplot(prop, aes(event, damage, fill=event))+
        geom_bar(stat="identity")+
        ggtitle("Property Damage")+
        theme_classic()+
        theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title.x=element_blank(), axis.title.y=element_blank()) +
        theme(legend.position="none")


figCrop<-ggplot(crop, aes(event, damage, fill=event))+
        geom_bar(stat="identity")+
        ggtitle("Crop Damage")+
        theme_classic()+
        theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title.x=element_blank(), axis.title.y=element_blank()) +
        theme(legend.position="none")

grid.arrange(figCrop, figProp, ncol=2, top="Economic Damage", left="USD (millions)")

Figure 2 compares deaths and injuries:

figDeath<-ggplot(death, aes(event, deaths, fill=event))+
        geom_bar(stat="identity")+
        ggtitle("Fatalities")+
        theme_classic()+
        theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title.x=element_blank(), axis.title.y=element_blank()) +
        theme(legend.position="none")

figInjured<-ggplot(injury, aes(event, injured, fill=event))+
        geom_bar(stat="identity")+
        ggtitle("Injuries")+
        theme_classic()+
        theme(axis.text.x = element_text(angle = 90, hjust = 1), axis.title.x=element_blank(), axis.title.y=element_blank()) +
        theme(legend.position="none")

grid.arrange(figDeath, figInjured, ncol=2, top="Human Harm", left="Number of Persons")

Conclusion:

Economic Damage

Water, either too much (flood) or too little (drought), accounts for the bulk of economic damage. Drought wreaks the most damage in agriculture, while property is most severely affected by flood.

Human Harm

While not as televisually dramatic as other types of events such as hurricanes and floods, excessive heat poses the greatest risk of death of all weather events in the United States (1,903 deaths), followed closely by tornadoes (1,593). Tornadoes, however, cause by far the most injuries (22,571).