In this report, I aim to determine the worst storm events in terms of human health and economic impact. To accomplish this, I use data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States from 1950 to 2011, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. From these data, I identified the five storm event types with the largest total fatality count, total injury count, total property damage, and total crop damage.
The data for this analysis were downloaded from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database from the direct link here.
The data come as a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. After decompressing the file and saving it in the working directory of my R project, I read it into a data frame with the following code:
stormData <- read.csv("repdata-data-StormData.csv")
library(dplyr) # provides tbl_df, group_by, summarize, arrange, and filter used below
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
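As a side note on the loading step, R can read bzip2-compressed text through a `bzfile()` connection, so decompressing by hand is optional. The sketch below demonstrates this on a small made-up data frame written to a temporary file. The names `toy`, `path`, and `toyBack` are illustrative assumptions, and the real storm file is not touched.

```r
# Made-up rows standing in for the storm data (illustration only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the toy data as a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through a bzfile() connection, with no manual decompression
toyBack <- read.csv(bzfile(path))
print(toyBack)
```

Under the same assumption, passing a `bzfile()` connection for the downloaded archive to `read.csv()` should load the storm data without a separate decompression step.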
To determine which storm events are most harmful to human health, I consider fatalities and injuries separately. I find the five storm event types with the highest total fatality counts and the five with the highest total injury counts.
The following code totals fatalities and injuries for each event type, sorts each total in descending order, and stores the top five of each in two separate tables whose variables are then renamed. These tables are used later in the results section.
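storm2 <- tbl_df(stormData) # convert to tbl for use with dplyr
storm2 <- group_by(storm2, EVTYPE) # one group per storm event type
storm2 <- summarize(storm2, fatalSum = sum(FATALITIES), injSum = sum(INJURIES)) # totals per event type
fatalset <- arrange(storm2, desc(fatalSum)) # largest total fatalities first
injuryset <- arrange(storm2, desc(injSum)) # largest total injuries first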
fatalset <- fatalset[1:5,1:2] # keep only top 5 storm events
names(fatalset) <- c("StormEvent","TotalFatalities") # rename variables
injuryset <- injuryset[1:5,c(1,3)] # keep only top 5 storm events
names(injuryset) <- c("StormEvent", "TotalInjuries") # rename variables
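The group, sum, sort, and truncate pattern used above can be checked by hand on a tiny example. The sketch below uses base R instead of dplyr and a made-up `toy` data frame. Both choices are assumptions made for illustration and are not the code that produced this report's tables.

```r
# Made-up rows for illustration; column names follow the storm data
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(5, 2, 1, 0, 4, 3)
)

# Total fatalities per event type (the group_by + summarize step)
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort by total, largest first (the arrange(desc()) step)
ranked <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# Keep only the top two event types (the report keeps the top five)
topTwo <- head(ranked, 2)
print(topTwo)
```

Here TORNADO totals 9 and HEAT totals 6, so those two rows are kept and FLOOD, with a total of 0, is dropped.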
To determine which storm events have the greatest economic impact, I consider property damage and crop damage separately. I find the five storm event types with the highest total property damage and the five with the highest total crop damage.
First, the values for property and crop damage need to be converted from their raw format (a magnitude in one column and a unit code in a separate column) to plain numeric values in a single column. To do this, I keep only observations whose “PROPDMGEXP” and “CROPDMGEXP” values are exactly blank, B, M, or K, and drop observations with any other code. This removes 376 observations out of 902297 and is consistent with the documentation provided by NOAA, in which K, M, and B denote thousands, millions, and billions of dollars. Restricting the data to these codes makes the numeric conversion straightforward.
stormEcon <- tbl_df(stormData) # convert to tbl for use with dplyr
validExp <- c("", "B", "M", "K") # unit codes to keep
stormEcon <- filter(stormEcon, PROPDMGEXP %in% validExp)
stormEcon <- filter(stormEcon, CROPDMGEXP %in% validExp)
# convert to numeric values only
propB <- which(stormEcon$PROPDMGEXP == "B")
propM <- which(stormEcon$PROPDMGEXP == "M")
propK <- which(stormEcon$PROPDMGEXP == "K")
cropB <- which(stormEcon$CROPDMGEXP == "B")
cropM <- which(stormEcon$CROPDMGEXP == "M")
cropK <- which(stormEcon$CROPDMGEXP == "K")
stormEcon$PROPDMG[propB] <- stormEcon$PROPDMG[propB]*1e9
stormEcon$PROPDMG[propM] <- stormEcon$PROPDMG[propM]*1e6
stormEcon$PROPDMG[propK] <- stormEcon$PROPDMG[propK]*1e3
stormEcon$CROPDMG[cropB] <- stormEcon$CROPDMG[cropB]*1e9
stormEcon$CROPDMG[cropM] <- stormEcon$CROPDMG[cropM]*1e6
stormEcon$CROPDMG[cropK] <- stormEcon$CROPDMG[cropK]*1e3
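The same conversion can also be written as a lookup from unit code to multiplier. The sketch below shows the idea on made-up rows. The names `toy`, `unitValue`, and `mult` are assumptions introduced for illustration, and this is an alternative formulation, not the code used for the results in this report.

```r
# Made-up rows for illustration; column names follow the storm data
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Multiplier for each unit code that survives the filter
unitValue <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier by name; a blank code matches no name and gives NA
mult <- unname(unitValue[toy$PROPDMGEXP])
mult[is.na(mult)] <- 1 # blank code: leave the value unchanged

# Convert to plain dollar amounts
toy$PROPDMG <- toy$PROPDMG * mult
print(toy$PROPDMG)
```

For these rows, 25 with code K becomes 25,000 dollars, 2.5 with code M becomes 2,500,000 dollars, and the value with a blank code stays at 10.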
Next, I total property damage and crop damage for each storm event type, sort each total in descending order, and store the top five of each in two separate tables whose variables are then renamed. These tables are used later in the results section.
stormEcon <- group_by(stormEcon, EVTYPE)
stormEcon <- summarize(stormEcon, propSum = sum(PROPDMG), cropSum = sum(CROPDMG))
propset <- arrange(stormEcon, desc(propSum))
cropset <- arrange(stormEcon, desc(cropSum))
propset <- propset[1:5,1:2] # keep only top 5 storm events
names(propset) <- c("StormEvent","TotalPropertyDamage($)") # rename variables
cropset <- cropset[1:5,c(1,3)] # keep only top 5 storm events
names(cropset) <- c("StormEvent", "TotalCropDamage($)") # rename variables
fatalset
## Source: local data frame [5 x 2]
##
##       StormEvent TotalFatalities
##           (fctr)           (dbl)
## 1        TORNADO            5633
## 2 EXCESSIVE HEAT            1903
## 3    FLASH FLOOD             978
## 4           HEAT             937
## 5      LIGHTNING             816
As we can see from the table, the five storm events with highest total fatality count are Tornado, Excessive Heat, Flash Flood, Heat, and Lightning. Tornadoes had almost three times as many fatalities as Excessive Heat.
injuryset
## Source: local data frame [5 x 2]
##
##       StormEvent TotalInjuries
##           (fctr)         (dbl)
## 1        TORNADO         91346
## 2      TSTM WIND          6957
## 3          FLOOD          6789
## 4 EXCESSIVE HEAT          6525
## 5      LIGHTNING          5230
As we can see from the table, the five storm events with highest total injury count are Tornado, TSTM Wind, Flood, Excessive Heat, and Lightning. Tornadoes had over 13 times as many injuries as TSTM Wind.
propset
## Source: local data frame [5 x 2]
##
##          StormEvent TotalPropertyDamage($)
##              (fctr)                  (dbl)
## 1             FLOOD           144657709807
## 2 HURRICANE/TYPHOON            69305840000
## 3           TORNADO            56925485483
## 4       STORM SURGE            43323536000
## 5       FLASH FLOOD            16140811717
As we can see from the table, the five storm events with the highest total property damage are Flood, Hurricane/Typhoon, Tornado, Storm Surge, and Flash Flood. Floods caused more than twice as much property damage as Hurricanes/Typhoons.
cropset
## Source: local data frame [5 x 2]
##
##    StormEvent TotalCropDamage($)
##        (fctr)              (dbl)
## 1     DROUGHT        13972566000
## 2       FLOOD         5661968450
## 3 RIVER FLOOD         5029459000
## 4   ICE STORM         5022110000
## 5        HAIL         3000537453
As we can see from the table, the five storm events with the highest total crop damage are Drought, Flood, River Flood, Ice Storm, and Hail. Droughts caused more than twice as much crop damage as Floods.
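The "times as many" comparisons quoted with each table can be reproduced from the printed totals. The short check below divides the first-ranked total by the second-ranked total in each table. The vector name `ratios` is an illustrative assumption, and the numbers are copied from the tables above.

```r
# Ratio of the top-ranked total to the second-ranked total in each table
ratios <- c(
  fatalities = 5633 / 1903,                 # Tornado vs. Excessive Heat
  injuries   = 91346 / 6957,                # Tornado vs. TSTM Wind
  property   = 144657709807 / 69305840000,  # Flood vs. Hurricane/Typhoon
  crop       = 13972566000 / 5661968450     # Drought vs. Flood
)
print(round(ratios, 2))
```

The fatality ratio falls just under 3, the injury ratio just over 13, and both damage ratios above 2, in line with the statements accompanying each table.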
Since the project requires at least one plot, here I plot the worst storm events in terms of total fatality count.
library(ggplot2) # plotting package needed for ggplot()
ggplot(fatalset, aes(x = StormEvent, y = TotalFatalities)) +
geom_bar(stat = "identity", fill = "red") +
labs(x = "Storm Event", y = "Total Fatalities", title = "Worst Storms by Fatality Count") +
scale_x_discrete(limits = as.character(fatalset$StormEvent)) # keep bars in descending order