The purpose of this report is to determine which types of severe weather events are most harmful to population health and which are most economically costly. It uses data from the NOAA Storm Database covering 1950 through November 2011. Over this period, “Tornadoes, TSTM Wind, Hail” events had the highest mean number of fatalities per recorded event, while “Heat Wave” events had the highest mean number of injuries. “Tornadoes, TSTM Wind, Hail” events also had the highest mean property damage per recorded event.
The data set was downloaded from the Reproducible Research course website on January 3, 2017 at 9:55 AM EST. The raw data file was a CSV file compressed using the bzip2 algorithm.
storms <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")
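As a small illustration of why no separate decompression step is needed, the sketch below writes a made-up two-row table to a bzip2-compressed CSV and reads it back with read.csv. The toy data and the temporary file are assumptions for demonstration only and are not part of the storm data.

```r
# Toy data; the rows are made up for illustration.
toy <- data.frame(EVTYPE = c("HAIL", "TORNADO"), FATALITIES = c(0, 3))

# Write the data frame to a bzip2-compressed CSV in a temporary location.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path as a file connection, which detects the
# bzip2 compression and decompresses while reading.
toy_back <- read.csv(path)
print(toy_back)
```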
Both fatalities and injuries are reported in the NOAA database. We compute the mean number of fatalities and the mean number of injuries per recorded event for each event type.
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = storms, mean)
injuries <- aggregate(INJURIES ~ EVTYPE, data = storms, mean)
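To make the grouping step concrete, here is a minimal sketch on made-up records. It computes the mean by event type in the same way and also counts the reports behind each mean. The toy values and the REPORTS column name are assumptions for illustration.

```r
# Made-up records: three event types with differing numbers of reports.
toy <- data.frame(
  EVTYPE = c("HAIL", "HAIL", "HAIL", "HEAT WAVE", "TORNADO", "TORNADO"),
  FATALITIES = c(0, 0, 0, 4, 1, 3)
)

# Mean fatalities per event type, using the formula interface of aggregate().
toy_mean <- aggregate(FATALITIES ~ EVTYPE, data = toy, mean)

# Number of reports behind each mean, for context.
toy_n <- aggregate(FATALITIES ~ EVTYPE, data = toy, length)
names(toy_n)[2] <- "REPORTS"

# Combine both summaries into one table keyed by event type.
toy_summary <- merge(toy_mean, toy_n, by = "EVTYPE")
print(toy_summary)
```

In this toy table the event type with a single report has the largest mean, so a count column can help when reading a ranking by mean.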
The output of the following code shows that most event types had a mean of 0 for fatalities or injuries: 817 of the 985 event types for fatalities and 827 of the 985 for injuries.
dim(fatalities)
## [1] 985 2
sum(fatalities$FATALITIES == 0)
## [1] 817
dim(injuries)
## [1] 985 2
sum(injuries$INJURIES == 0)
## [1] 827
Since we are interested in events that cause the most harm to human health, we will plot the events with the top 20 mean values for fatalities.
topfatal <- fatalities[order(-fatalities$FATALITIES)[1:20],]
par(mar = c(11,5,4,4))
barplot(topfatal$FATALITIES, names.arg = topfatal$EVTYPE, las = 2, cex.names = 0.65, ylab = "Average Number of Fatalities")
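The top-N selection used here can be checked on a small made-up table. The sketch below picks the two largest means in two equivalent ways. The toy values and the names top2 and top2_alt are assumptions for illustration.

```r
# Made-up per-type means.
toy <- data.frame(
  EVTYPE = c("A", "B", "C", "D"),
  FATALITIES = c(0.5, 3, 0, 1.2)
)

# Sort indices by descending mean and keep the first two rows.
top2 <- toy[order(-toy$FATALITIES)[1:2], ]

# Equivalent selection using head() on the sorted table.
top2_alt <- head(toy[order(toy$FATALITIES, decreasing = TRUE), ], 2)

print(top2)
```

The head() form returns only the rows that exist when the table has fewer rows than requested, whereas indexing with 1:n pads the result with NA rows.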
Similarly, we will plot the events by type with the top 20 mean values for injuries.
topinjure <- injuries[order(-injuries$INJURIES)[1:20],]
par(mar = c(11,5,4,4))
barplot(topinjure$INJURIES, names.arg = topinjure$EVTYPE, las = 2, cex.names = 0.65, ylab = "Average Number of Injuries")
From these plots, we see that “Tornadoes, TSTM Wind, Hail” events had the highest mean number of fatalities per recorded event, while “Heat Wave” events had the highest mean number of injuries.
We will use the property damage data to determine which events are most economically costly. These data are stored in two columns: PROPDMG contains a number, and PROPDMGEXP contains an exponent given either as a digit or as a letter (for example, B denotes billions of dollars). The following code combines this information into one column.
# Translate each exponent code into a power of ten. Letter codes are matched
# case-insensitively and single digits are used as the exponent directly.
# Assumption: blank or unrecognized codes (such as "+", "-", "?") mean 10^0.
expcode <- toupper(as.character(storms$PROPDMGEXP))
storms$newexp <- unname(c(B = 9, M = 6, K = 3, H = 2)[expcode])
isdigit <- grepl("^[0-9]$", expcode)
storms$newexp[isdigit] <- as.numeric(expcode[isdigit])
storms$newexp[is.na(storms$newexp)] <- 0
# Property damage in dollars.
storms$damage <- storms$PROPDMG * 10^storms$newexp
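As a worked check of the exponent conversion, the sketch below applies it to a handful of made-up values. The helper name exp_to_power is hypothetical, and treating blank or unrecognized codes as a power of 0 is an assumption rather than something stated in the source data description.

```r
# Hypothetical helper: convert exponent codes to powers of ten.
exp_to_power <- function(code) {
  code <- toupper(as.character(code))
  # Letter codes: billions, millions, thousands, hundreds.
  power <- unname(c(B = 9, M = 6, K = 3, H = 2)[code])
  # Single digits are taken as the exponent itself.
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  # Assumption: anything else (blank, "+", "-", "?") means 10^0.
  power[is.na(power)] <- 0
  power
}

# Made-up values covering letters in both cases, a digit, and a blank.
toy <- data.frame(
  PROPDMG = c(1.6, 25, 500, 3, 7, 2),
  PROPDMGEXP = c("B", "m", "K", "h", "2", "")
)
toy$damage <- toy$PROPDMG * 10^exp_to_power(toy$PROPDMGEXP)
print(toy)
```

For these rows the dollar amounts work out to 1.6e9, 2.5e7, 5e5, 300, 700, and 2.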
Next, we will determine the average property damage by event type.
avgdamage <- aggregate(damage ~ EVTYPE, data = storms, mean)
Since we are interested in events that cause the most damage, we will plot the top 20 average values for property damage.
topdamage <- avgdamage[order(-avgdamage$damage)[1:20],]
par(mar = c(11,11,4,4))
barplot(topdamage$damage, names.arg = topdamage$EVTYPE, las = 2, cex.axis = 0.65, cex.names = 0.65, cex.lab = 0.75, ylab = "Average Property Damage (dollars)")
From the graph, we see that, on average, the event type “Tornadoes, TSTM Wind, Hail” caused the most property damage.