Synopsis

In this report, I identify the types of storm events that have had the greatest impact on human health and the economy. To do this, I use data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage, from 1950 to 2011. From these data, I found the five event types with the largest total fatality count, largest total injury count, largest total property damage, and largest total crop damage.

Data Processing

Loading the Data

The data for this analysis were downloaded from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database via a direct link.

The data come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. Once the file was decompressed and saved in the working directory of my R project, the following code was used to read the data in as a data frame:

stormData <- read.csv("repdata-data-StormData.csv")
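The decompression step can also be skipped, because `read.csv` opens its input with `file()`, which detects bzip2 compression when reading. The sketch below shows this on a small, made-up file written to a temporary path; the file contents and the name `demo` are illustrative assumptions, not part of the original analysis.

```r
# Write a tiny, hypothetical CSV to a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HAIL,0"), con)
close(con)

# read.csv reads the compressed file directly, with no manual unzip step
demo <- read.csv(tmp, stringsAsFactors = FALSE)
print(demo)
```

Reading the compressed file directly keeps the analysis reproducible from the original download alone.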

Load the libraries used in the analysis:

library(dplyr)
library(ggplot2)

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Processing to answer question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this question, I will consider population fatalities and injuries separately. I will find the five storm events which have the highest total fatality counts and total injury counts, respectively.

The following code determines which storm events these are, and stores their total fatality/injury counts in descending order in two separate tables whose variables are then renamed. These tables are used later in the results section.

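# convert to tbl for use with dplyr
# (tbl_df() is deprecated in current dplyr releases; as_tibble() is its replacement)
storm2 <- tbl_df(stormData)
storm2 <- group_by(storm2, EVTYPE) # one group per storm event type
storm2 <- summarize(storm2, fatalSum = sum(FATALITIES), injSum = sum(INJURIES))

# sort event types by each total, largest first
fatalset <- arrange(storm2, desc(fatalSum))
injuryset <- arrange(storm2, desc(injSum))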

fatalset <- fatalset[1:5,1:2] # keep only top 5 storm events
names(fatalset) <- c("StormEvent","TotalFatalities") # rename variables
injuryset <- injuryset[1:5,c(1,3)] # keep only top 5 storm events
names(injuryset) <- c("StormEvent", "TotalInjuries") # rename variables
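The grouped totals can be cross-checked without dplyr. The following is a minimal base-R sketch on a made-up data frame; `toy` and its values are hypothetical stand-ins for `stormData`, and the same idea applies to the `INJURIES` column.

```r
# Hypothetical stand-in for stormData with only the columns needed here
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(5, 2, 1, 0, 3),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, then keep the two largest totals
totals <- sapply(split(toy$FATALITIES, toy$EVTYPE), sum)
top <- head(sort(totals, decreasing = TRUE), 2)
print(top)
```

If the names and values in `top` match the first rows of the dplyr result, the grouping and sorting steps agree.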

Processing to answer question 2: Across the United States, which types of events have the greatest economic consequences?

To answer this question, property damage and crop damage will be considered separately. The five storm events with the highest total property damage and highest total crop damage are found.

First, the values for property and crop damage need to be converted from their raw format (a number in one column and a unit code in another) to plain dollar amounts in a single numeric column. To do this, I keep only observations whose “PROPDMGEXP” and “CROPDMGEXP” values are blank, B, M, or K, and remove all others. This removes 376 observations out of 902297 and keeps the analysis consistent with the documentation provided by NOAA, in which K, M, and B denote thousands, millions, and billions of dollars. The remaining values convert directly to numbers.
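Before filtering, it can help to count how many rows such a rule would drop. The sketch below does this on a made-up data frame; the values in `toy` (including the stray codes "m", "5", and "?") are hypothetical examples of entries outside the allowed set.

```r
# Hypothetical unit codes, including some that fall outside the allowed set
toy <- data.frame(
  PROPDMGEXP = c("K", "", "m", "5", "B"),
  CROPDMGEXP = c("", "K", "", "", "?"),
  stringsAsFactors = FALSE
)

allowed <- c("", "K", "M", "B")

# A row is kept only if both unit codes are in the allowed set
keep <- toy$PROPDMGEXP %in% allowed & toy$CROPDMGEXP %in% allowed
n_dropped <- sum(!keep)
print(n_dropped)
```

Note that the comparison is case-sensitive, so a lowercase "m" is dropped just like a digit or a symbol.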

stormEcon <- tbl_df(stormData) # convert to tbl for use with dplyr

# keep only rows whose unit codes are blank, B, M, or K
stormEcon <- filter(stormEcon, PROPDMGEXP %in% c("", "B", "M", "K"))
stormEcon <- filter(stormEcon, CROPDMGEXP %in% c("", "B", "M", "K"))

# convert to numeric values only
propB <- which(stormEcon$PROPDMGEXP == "B")
propM <- which(stormEcon$PROPDMGEXP == "M")
propK <- which(stormEcon$PROPDMGEXP == "K")

cropB <- which(stormEcon$CROPDMGEXP == "B")
cropM <- which(stormEcon$CROPDMGEXP == "M")
cropK <- which(stormEcon$CROPDMGEXP == "K")

stormEcon$PROPDMG[propB] <- stormEcon$PROPDMG[propB]*1e9 
stormEcon$PROPDMG[propM] <- stormEcon$PROPDMG[propM]*1e6
stormEcon$PROPDMG[propK] <- stormEcon$PROPDMG[propK]*1e3

stormEcon$CROPDMG[cropB] <- stormEcon$CROPDMG[cropB]*1e9
stormEcon$CROPDMG[cropM] <- stormEcon$CROPDMG[cropM]*1e6
stormEcon$CROPDMG[cropK] <- stormEcon$CROPDMG[cropK]*1e3
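The same conversion can be written as a single lookup instead of one assignment per unit code. This is a sketch of an alternative, not the code used for the results in this report; `toy`, `codes`, `factors`, and `PropDollars` are hypothetical names.

```r
# Hypothetical damage figures with their unit codes
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 1, 7),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Unit codes and the multiplier each one stands for (blank means no scaling)
codes   <- c("", "K", "M", "B")
factors <- c(1, 1e3, 1e6, 1e9)

# match() finds each row's code position; indexing factors gives its multiplier
mult <- factors[match(as.character(toy$PROPDMGEXP), codes)]
toy$PropDollars <- toy$PROPDMG * mult
print(toy)
```

Any code missing from `codes` would produce `NA` here, which makes unexpected values visible rather than silently leaving them unscaled.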

Next, I total the property damage and crop damage for each storm event, sort the totals in descending order, and store the five largest in two separate tables whose variables are then renamed. These tables are used later in the results section.

stormEcon <- group_by(stormEcon, EVTYPE)
stormEcon <- summarize(stormEcon, propSum = sum(PROPDMG), cropSum = sum(CROPDMG))

propset <- arrange(stormEcon, desc(propSum)) 
cropset <- arrange(stormEcon, desc(cropSum))

propset <- propset[1:5,1:2] # keep only top 5 storm events
names(propset) <- c("StormEvent","TotalPropertyDamage($)") # rename variables
cropset <- cropset[1:5,c(1,3)] # keep only top 5 storm events
names(cropset) <- c("StormEvent", "TotalCropDamage($)") # rename variables

Results

Events that are most harmful with respect to population health

Fatalities
fatalset
## Source: local data frame [5 x 2]
## 
##       StormEvent TotalFatalities
##           (fctr)           (dbl)
## 1        TORNADO            5633
## 2 EXCESSIVE HEAT            1903
## 3    FLASH FLOOD             978
## 4           HEAT             937
## 5      LIGHTNING             816

As we can see from the table, the five storm events with highest total fatality count are Tornado, Excessive Heat, Flash Flood, Heat, and Lightning. Tornadoes had almost three times as many fatalities as Excessive Heat.

Injuries
injuryset
## Source: local data frame [5 x 2]
## 
##       StormEvent TotalInjuries
##           (fctr)         (dbl)
## 1        TORNADO         91346
## 2      TSTM WIND          6957
## 3          FLOOD          6789
## 4 EXCESSIVE HEAT          6525
## 5      LIGHTNING          5230

As we can see from the table, the five storm events with highest total injury count are Tornado, TSTM Wind, Flood, Excessive Heat, and Lightning. Tornadoes had over 13 times as many injuries as TSTM Wind.

Events with the greatest economic consequence

Property Damage
propset
## Source: local data frame [5 x 2]
## 
##          StormEvent TotalPropertyDamage($)
##              (fctr)                  (dbl)
## 1             FLOOD           144657709807
## 2 HURRICANE/TYPHOON            69305840000
## 3           TORNADO            56925485483
## 4       STORM SURGE            43323536000
## 5       FLASH FLOOD            16140811717

As we can see from the table, the five storm events with highest total property damage are Flood, Hurricane/Typhoon, Tornado, Storm Surge, and Flash Flood. Floods have caused more than two times the amount of property damage that Hurricanes/Typhoons have.

Crop Damage
cropset
## Source: local data frame [5 x 2]
## 
##    StormEvent TotalCropDamage($)
##        (fctr)              (dbl)
## 1     DROUGHT        13972566000
## 2       FLOOD         5661968450
## 3 RIVER FLOOD         5029459000
## 4   ICE STORM         5022110000
## 5        HAIL         3000537453

As we can see from the table, the five storm events with highest total crop damage are Drought, Flood, River Flood, Ice Storm, and Hail. Droughts have caused more than two times the amount of crop damage that Floods have.
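The ratios quoted in the results can be recomputed from the totals printed in the four tables above. A minimal sketch follows; the vector name `ratios` and its element names are illustrative.

```r
# Top total divided by second-place total, taken from the tables above
ratios <- c(
  fatalities = 5633 / 1903,                 # Tornado vs. Excessive Heat
  injuries   = 91346 / 6957,                # Tornado vs. TSTM Wind
  property   = 144657709807 / 69305840000,  # Flood vs. Hurricane/Typhoon
  crop       = 13972566000 / 5661968450     # Drought vs. Flood
)
print(round(ratios, 2))
```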

Plots

Since the project requires at least one plot, here I plot the worst storm events in terms of total fatality count.

ggplot(fatalset, aes(x = StormEvent, y = TotalFatalities)) +
  geom_bar(stat = "identity", fill = "red") +
  # keep the bars in the table's descending order rather than alphabetical order
  scale_x_discrete(limits = as.character(fatalset$StormEvent)) +
  labs(x = "Storm Event", y = "Total Fatalities",
       title = "Worst Storms by Fatality Count")
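The `scale_x_discrete(limits = ...)` call matters because a discrete axis otherwise follows factor level order, which defaults to alphabetical. The base-R sketch below illustrates the difference using the event names from the fatalities table; the variable names are hypothetical.

```r
# Event names in the order they appear in the fatalities table
events <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING")

# Default factor levels are sorted alphabetically
default_levels <- levels(factor(events))

# Supplying levels explicitly preserves the table's descending order
ordered_levels <- levels(factor(events, levels = events))

print(default_levels)
print(ordered_levels)
```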