library(dplyr) #library() stops with an error if dplyr is missing, whereas require() only returns FALSE
This analysis explores data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. In particular, this analysis works to identify the types of weather events that
The data span about 60 years (1950 to 2011); more on the contents of the dataset can be found here.
The compressed (bzip2) CSV file is downloaded and stored in the data folder, unless it is already there.
if(!(file.exists("data"))) {dir.create("./data")}
if(!(file.exists("./data/stormdata.csv.bz2"))) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
"./data/stormdata.csv.bz2", method = "curl")
}
If the storm data frame is not already cached, the data is read into a data frame from the compressed CSV file, and the resulting object is cached via R Markdown’s code chunk cache parameter. If the storm data frame is already cached, it is loaded from the cache instead.
stormData <- read.csv("./data/stormdata.csv.bz2") #reads in compressed csv file, result is cached
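For readers running this code outside knitr, the read-once-then-reuse behaviour described above can be approximated by hand. The sketch below is only an illustration and not the mechanism used in this report: the helper name `readCached` and the `.rds` cache file are hypothetical, and a small temporary CSV with made-up rows stands in for the storm data.

```r
# Hypothetical helper: parse a CSV once, then reuse a serialized copy on later calls.
readCached <- function(csvPath, cachePath) {
  if (file.exists(cachePath)) {
    return(readRDS(cachePath))   # cache hit: skip the slow CSV parse
  }
  df <- read.csv(csvPath)        # cache miss: parse the CSV
  saveRDS(df, cachePath)         # store the parsed data frame for next time
  df
}

# Toy stand-in for the storm data, written to a temporary file.
csvPath <- tempfile(fileext = ".csv")
cachePath <- tempfile(fileext = ".rds")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1)),
          csvPath, row.names = FALSE)

first <- readCached(csvPath, cachePath)   # parses the CSV and writes the cache
second <- readCached(csvPath, cachePath)  # loads the cached copy
```

Unlike knitr's chunk cache, a hand-rolled cache like this one is not invalidated when the code changes, so the cache file would have to be deleted manually after editing the read step.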
A subset of the original data frame with only a few of the columns/variables is taken for this analysis:
Further, only observations that correspond to events in the 50 U.S. states plus D.C. are taken.
validStates <- c(state.abb, "DC") #the 50 state abbreviations built into R, plus D.C.; does not depend on row order in the file
stormData <- stormData %>% filter(STATE %in% validStates) %>%
select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
Each damage value (PROPDMG and CROPDMG) has a corresponding exponent value (PROPDMGEXP and CROPDMGEXP). The damage values are supposed to be multiplied by their corresponding exponent values before they are compared to each other. However, the exponent values are coded and must be converted to their proper numeric values before multiplying. I obtained these code to numeric conversions from another individual’s analysis.
#Converts PROPDMGEXP codes to their numeric values. A single lookup is used so that a value
#that has already been converted (e.g. "" to 1) cannot be matched again by a later rule ("1" to 10).
propCodes <- c("K" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9, "h" = 1e2, "H" = 1e2,
               "0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5, "6" = 1e6,
               "7" = 1e7, "8" = 1e8, "+" = 0, "-" = 0, "?" = 0)
propExp <- as.character(stormData$PROPDMGEXP)
stormData$PROPDMGEXP <- ifelse(propExp == "", 1, unname(propCodes[propExp])) #a blank code means a multiplier of 1
#Converts CROPDMGEXP codes to their numeric values.
cropCodes <- c("K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9, "0" = 1, "2" = 1e2, "?" = 0)
cropExp <- as.character(stormData$CROPDMGEXP)
stormData$CROPDMGEXP <- ifelse(cropExp == "", 1, unname(cropCodes[cropExp])) #a blank code means a multiplier of 1
#Multiplies the damage values by their respective exponent values so that they can be compared properly, then removes the exponent columns, which are no longer needed.
stormData$PROPDMG <- stormData$PROPDMG * as.numeric(stormData$PROPDMGEXP)
stormData$CROPDMG <- stormData$CROPDMG * as.numeric(stormData$CROPDMGEXP)
stormData <- stormData[,-c(5,7)]
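As a worked illustration of the conversion, a recorded value of 25 with the exponent code "K" represents 25 * 1000 = 25,000 dollars. The sketch below applies a small subset of the code-to-multiplier mappings listed above to a toy data frame. The names `toyDamage` and `multipliers` are hypothetical, and the rows are made up for illustration.

```r
# Made-up rows: a damage value and its exponent code.
toyDamage <- data.frame(PROPDMG = c(25, 2.5, 10, 7),
                        PROPDMGEXP = c("K", "M", "", "B"),
                        stringsAsFactors = FALSE)

# Subset of the code-to-multiplier mappings used in this analysis.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# A blank code means the value is already in dollars (multiplier of 1).
mult <- ifelse(toyDamage$PROPDMGEXP == "", 1, unname(multipliers[toyDamage$PROPDMGEXP]))
toyDamage$PROPDMG <- toyDamage$PROPDMG * mult
toyDamage$PROPDMG   # damage in dollars: 25000, 2500000, 10, 7e9
```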
Now that the property and crop damage values can be compared properly, I will add the two damage values together so that there is a total damage value for each observation. I will store these values in a new column called DAMAGE and delete the two individual value columns (PROPDMG and CROPDMG).
stormData$DAMAGE <- stormData$PROPDMG + stormData$CROPDMG
stormData <- stormData[,-c(4,5)]
Our data frame now only has four variables, which we will use in the analysis ahead:
There are two measures available in this data for evaluating the harm that weather events do to population health: fatalities and injuries. This analysis looks at both. As a measure of harm, it uses the sum of each measure by event type, which should indicate the net harm done by an event type over time.
First, this analysis looks at the top 8 most deadly event types using a type’s net number of fatalities.
fatalitiesByType <- sort(tapply(stormData$FATALITIES, stormData$EVTYPE, sum), decreasing = TRUE)[1:8]
barplot(fatalitiesByType, cex.names = .45, col = 1:8,
xlab = "Event Type", ylab = "Net Number of Fatalities",
main = "Most deadly event types by net number of fatalities")
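The ranking step can be seen on a small scale below. This is a toy sketch with invented numbers, not output from the storm data, and the names `toyEvents`, `totals` and `top2` are hypothetical: `tapply` sums the counts within each event type, `sort` orders the totals from largest to smallest, and `head` keeps the top entries.

```r
# Made-up event records.
toyEvents <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
                        FATALITIES = c(5, 2, 7, 4, 1, 0))

totals <- tapply(toyEvents$FATALITIES, toyEvents$EVTYPE, sum)  # one sum per event type
top2 <- head(sort(totals, decreasing = TRUE), 2)               # two largest totals
top2   # TORNADO = 12, HEAT = 4
```

Using `head(x, n)` rather than `x[1:n]` is a small design choice in this sketch: when fewer than `n` event types exist, `head` returns what is available instead of padding the result with `NA`.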
Now, this analysis looks at the top 8 event types with the largest number of injuries.
injuriesByType <- sort(tapply(stormData$INJURIES, stormData$EVTYPE, sum), decreasing = TRUE)[1:8]
barplot(injuriesByType, cex.names = .45, col = 1:8,
xlab = "Event Type", ylab = "Net Number of Injuries",
main = "Most harmful event types by net number of injuries")
Besides identifying the event types that cause the greatest harm to the population, as measured by fatalities and injuries, this analysis looks at the event types that cause the greatest economic damage. As with harm to the population, economic damage is measured as the sum of the damage for each event type.
damageByType <- sort(tapply(stormData$DAMAGE, stormData$EVTYPE, sum), decreasing = TRUE)[1:8]
damageByType <- damageByType / 1000000000 #converts damage into billions of dollars
barplot(damageByType, cex.names = .45, col = 1:8,
xlab = "Event Type", ylab = "Net Economic Damage (in billions of dollars)", main = "Most costly event types by damage (property + crop)")
Before briefly describing the results, it should be noted that treating the sums of fatalities, injuries, and economic damage by event type as accurate indicators assumes that events of the types analyzed here are consistently recorded and included in this data; this analysis assumes that to be the case.
The first plot (most harmful event types by fatalities) indicates that tornadoes, excessive heat and flash floods are the deadliest event types in the U.S., in that order. The second plot (most harmful event types by injuries) shows that tornadoes, “TSTM WIND” and floods are the weather events that cause the most injuries, in that order. Thus, the data indicates that tornadoes cause the most harm to the U.S. population among weather event types, as measured by both fatalities and injuries.
The third and last plot (most costly event types by property + crop damage) indicates that floods cause the greatest economic damage of all weather events in the U.S. Hurricanes and tornadoes are the second and third greatest causes of economic damage among weather events, respectively.