Synopsis

This analysis identifies the top 10 storm event types by human casualties and by economic impact. For the purpose of this analysis, economic impact is defined as property damage plus crop damage, and health impact as deaths plus injuries. The data is provided by the National Oceanic and Atmospheric Administration (NOAA) and covers events from 1950 onward. Through the process below, I will show that tornadoes are the most costly storm event by both human and economic impact.
Note: The cost data is not adjusted for inflation.

Background

This document was prepared for the Reproducible Research course offered by Johns Hopkins University on the Coursera platform. The course can be accessed at https://class.coursera.org/repdata-008. The purpose of this project was to get familiar with publishing R markdown files to RPubs. The project has a defined rubric, which was designed by the course instructors, and it will be graded by peers within the course. The data set was obtained from the instructor-provided link and, accordingly, I take no responsibility for the validity of the data used to perform this analysis.

Step 1: Loading and Processing the Data

The first step is to load the data. The source data is a .csv file compressed with the bzip2 algorithm. Per the assignment specification, the analysis must start from the raw .csv file.

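# setInternet2() exists only on Windows builds of R, so call it only if it is available
if (exists("setInternet2")) setInternet2(use=TRUE)
myURL<-"http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode="wb" downloads in binary mode so the compressed archive is not corrupted
download.file(myURL,destfile="Data.csv.bz2",mode="wb")
# bzfile() opens a connection that decompresses the file as it is read
Data<- read.csv(bzfile("Data.csv.bz2"))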
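
Downloading the archive on every run is slow, so one option is to fetch it only when it is missing. The sketch below is a minimal illustration and not part of the original analysis: load_bz2_csv() is a hypothetical helper name, and the toy data frame (made-up values) stands in for the real storm file so that the example runs offline.

```r
# Hypothetical helper: download the archive only if it is not already on disk,
# then read the bzip2-compressed CSV through a bzfile() connection.
load_bz2_csv <- function(path, url = NULL) {
  if (!file.exists(path)) {
    if (is.null(url)) stop("File not found and no URL supplied: ", path)
    download.file(url, destfile = path, mode = "wb")
  }
  read.csv(bzfile(path), stringsAsFactors = FALSE)
}

# Toy stand-in for the storm data (made-up values), written as a .csv.bz2 file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# The file already exists, so no download is attempted
toy_back <- load_bz2_csv(toy_path)
```

In the real analysis the equivalent call would be load_bz2_csv("Data.csv.bz2", myURL), assuming the helper is defined as above.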

We are tasked with determining which storm types cause the greatest economic damage and the greatest health impact. Before doing that, we need to determine the possible storm outcomes. We’ll use levels(factor()) to create a list of unique storm outcomes.

Outcomes<-levels(factor(Data$EVTYPE))
head(Outcomes,10)
##  [1] "   HIGH SURF ADVISORY" " COASTAL FLOOD"       
##  [3] " FLASH FLOOD"          " LIGHTNING"           
##  [5] " TSTM WIND"            " TSTM WIND (G45)"     
##  [7] " WATERSPOUT"           " WIND"                
##  [9] "?"                     "ABNORMAL WARMTH"
length(Outcomes)
## [1] 985
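
To see why the count is inflated, the following sketch applies the same levels(factor()) call to a handful of made-up labels. The vector raw is an assumption for illustration only and is not taken from the storm data.

```r
# Made-up event labels that differ only in case or leading whitespace
raw <- c(" TSTM WIND", "TSTM WIND", "Snow", "SNOW", "snow")

# levels(factor(x)) returns the sorted distinct values of x
distinct_raw <- levels(factor(raw))
same_as_unique <- identical(distinct_raw, sort(unique(raw)))

# Every spelling variant counts as its own value
n_distinct_raw <- length(distinct_raw)
```

Here all five spellings are kept as separate values, even though they describe only two kinds of event.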

Our list shows several values that are repeated with different capitalization. Because R is case sensitive, “snow” and “SNOW” are two different values. We’ll eliminate some of the repetition by converting all values to upper case with the toupper() function. Several values are also padded with whitespace, so we’ll use the str_trim() function from the stringr package to remove it. Note that the values are sorted alphabetically, so once the leading whitespace is removed, the values will be re-sorted.

library(stringr)
Data$EVTYPE<- str_trim(toupper(Data$EVTYPE))
Outcomes<-levels(factor(Data$EVTYPE))
head(Outcomes,10)
##  [1] "?"                      "ABNORMAL WARMTH"       
##  [3] "ABNORMALLY DRY"         "ABNORMALLY WET"        
##  [5] "ACCUMULATED SNOWFALL"   "AGRICULTURAL FREEZE"   
##  [7] "APACHE COUNTY"          "ASTRONOMICAL HIGH TIDE"
##  [9] "ASTRONOMICAL LOW TIDE"  "AVALANCE"
length(Outcomes)
## [1] 890
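
The same normalization can be sketched without stringr by using base R's trimws(), which also strips leading and trailing whitespace by default. This is an alternative shown on made-up labels, not the method used in the analysis above; split() is used here only to show which raw spellings collapse into each cleaned value.

```r
# Made-up labels with mixed case and stray whitespace
raw <- c(" TSTM WIND", "TSTM WIND", "Snow", "SNOW", "snow ")

# Upper-case first, then strip leading and trailing whitespace
clean <- trimws(toupper(raw))

# Group the raw spellings under the cleaned value they map to
collapsed <- split(raw, clean)

n_before <- length(unique(raw))    # distinct values before cleaning
n_after  <- length(unique(clean))  # distinct values after cleaning
```

Inspecting collapsed is a quick way to spot labels that still need manual merging, such as misspellings that case and whitespace cleanup cannot fix.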

We find that our list of possible storms has decreased from 985 to 890. The next step is to get total health and economic impacts for each event type. To do this, we’ll add the Fatalities and Injuries columns to get the Health impact and, similarly, add property damage and crop damage to get the Economic impact.

Data$Economic<-Data$PROPDMG+Data$CROPDMG
Data$Health<- Data$FATALITIES + Data$INJURIES
Summary<- aggregate(cbind(Health,Economic)~EVTYPE,data=Data,FUN=sum)
head(Summary,10)
##                    EVTYPE Health Economic
## 1                       ?      0     5.00
## 2         ABNORMAL WARMTH      0     0.00
## 3          ABNORMALLY DRY      0     0.00
## 4          ABNORMALLY WET      0     0.00
## 5    ACCUMULATED SNOWFALL      0     0.00
## 6     AGRICULTURAL FREEZE      0    28.82
## 7           APACHE COUNTY      0     5.00
## 8  ASTRONOMICAL HIGH TIDE      0   933.50
## 9   ASTRONOMICAL LOW TIDE      0   320.00
## 10               AVALANCE      1     0.00
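
The aggregation step can be checked by hand on a small example. The sketch below uses a made-up data frame with the same column names as the storm data and cross-checks one column with tapply(). Note that the formula interface of aggregate() drops rows with a missing value in any of the listed columns by default.

```r
# Made-up records using the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 0, 0),
  INJURIES   = c(10, 0, 5, 0),
  PROPDMG    = c(25, 5, 2.5, 0),
  CROPDMG    = c(0, 1, 0, 3)
)

# Per-record impacts, as in the analysis above
toy$Health   <- toy$FATALITIES + toy$INJURIES
toy$Economic <- toy$PROPDMG + toy$CROPDMG

# Sum both impact columns within each event type
toy_summary <- aggregate(cbind(Health, Economic) ~ EVTYPE, data = toy, FUN = sum)

# Cross-check the Health totals with tapply()
health_check <- tapply(toy$Health, toy$EVTYPE, sum)
```

The two TORNADO rows are combined into a single row of toy_summary, which is the behavior the full summary table relies on.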

With our summary table in hand, we can sort the table by each of the top columns to identify the top 10 storm events by economic costs and human health impact…

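# Order on the column vector itself (not a one-column data frame) and keep EVTYPE plus the ranked column
Max.Econ  <-head(Summary[order(Summary$Economic,decreasing=TRUE),c("EVTYPE","Economic")],10)
Max.Health<-head(Summary[order(Summary$Health,decreasing=TRUE),c("EVTYPE","Health")],10)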
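
Because the two rankings differ only in the column used, the sort-then-head pattern can be wrapped in a small function. The sketch below is illustrative: top_events() is a hypothetical helper name, and toy_summary contains made-up totals rather than values from the storm data.

```r
# Made-up summary table with the same layout as Summary
toy_summary <- data.frame(
  EVTYPE   = c("FLOOD", "HAIL", "TORNADO", "LIGHTNING"),
  Health   = c(1, 0, 17, 4),
  Economic = c(6, 3, 27.5, 0.5)
)

# Hypothetical helper: rank rows by one column (largest first) and keep the top n
top_events <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE), c("EVTYPE", column)]
  head(ranked, n)
}

top_econ   <- top_events(toy_summary, "Economic", n = 2)
top_health <- top_events(toy_summary, "Health", n = 2)
```

With the real table, top_events(Summary, "Economic") and top_events(Summary, "Health") would be expected to give the same rows as Max.Econ and Max.Health.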

…and then create barcharts to visualize the top storm events.

library(lattice)  # provides barchart()
barchart(EVTYPE~Economic,data=Max.Econ,horizontal=TRUE,auto.key=list(title="EVTYPE"),
         main="Top 10 Storm Events by Economic Impact")

barchart(EVTYPE~Health,data=Max.Health,horizontal=TRUE,auto.key=list(title="EVTYPE"),
         main="Top 10 Storm Events by Human Casualties")
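
By default the bars follow the alphabetical order of the event names rather than the size of the impact. One possible refinement, sketched here on made-up totals and not used in the charts above, is to reorder the event labels by value with reorder() before plotting, so that the largest bar should appear at the top of a horizontal chart.

```r
library(lattice)

# Made-up totals standing in for Max.Econ
toy_top <- data.frame(
  EVTYPE   = c("FLOOD", "HAIL", "TORNADO"),
  Economic = c(6, 3, 27.5)
)

# Turn EVTYPE into a factor whose levels are sorted by ascending Economic value
toy_top$EVTYPE <- reorder(toy_top$EVTYPE, toy_top$Economic)
bar_order <- levels(toy_top$EVTYPE)

# Build the chart object; print(p) draws it
p <- barchart(EVTYPE ~ Economic, data = toy_top,
              main = "Toy example: bars ordered by value")
```

The same reorder() call could be applied to Max.Econ$EVTYPE and Max.Health$EVTYPE before the barchart() calls.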

Step 2: Results

The visualizations show the top 10 storm events by both economic and casualty costs. For human casualties, tornadoes clearly outpace every other type of storm: the number of casualties since the inception of the dataset is nearly 10,000 deaths and injuries. Tornadoes are also clearly the leader in economic impact, with $3,000,000K in damage, which is $3 billion.