Synopsis

Severe weather events can cause significant harm to population health and substantial economic damage. We use data from the NOAA Storm Database to answer two questions. First, which types of events are most harmful with respect to population health? Second, which types of events have the greatest economic consequences? We answer these questions by finding the top 10 event types for each of three measures: the number of fatalities, the number of injuries, and the property damage. We found that tornadoes were the most harmful to population health, and that tornadoes and thunderstorm winds had the greatest economic consequences.

Data Processing

We load the data, assuming the compressed file has already been downloaded into the working directory.

data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), header = TRUE)

Since we don’t need all of the variables, we will reduce the data to what we need.

storm_data <- data[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]

Next we rename a few event types so that different spellings of the same kind of event are counted together under one name that is easier to understand.

storm_data$EVTYPE <- gsub("^HEAT$", "EXCESSIVE HEAT", storm_data$EVTYPE)
storm_data$EVTYPE <- gsub("^TSTM WIND$", "THUNDERSTORM WIND", storm_data$EVTYPE)
storm_data$EVTYPE <- gsub("^THUNDERSTORM WIND$", "THUNDERSTORM WINDS", storm_data$EVTYPE)
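To illustrate what these substitutions do, here is a small sketch on a toy vector. The event names below are a hypothetical sample chosen for illustration, not values taken from the data set. Because each pattern is anchored with ^ and $, only exact matches are renamed, and the second and third substitutions together fold both "TSTM WIND" and "THUNDERSTORM WIND" into "THUNDERSTORM WINDS".

```r
# Toy event names (hypothetical sample, not taken from the data set)
events <- c("HEAT", "HEAT WAVE", "TSTM WIND", "TSTM WIND/HAIL",
            "THUNDERSTORM WIND", "THUNDERSTORM WINDS")

# The same three substitutions as in the report, applied in the same order
events <- gsub("^HEAT$", "EXCESSIVE HEAT", events)
events <- gsub("^TSTM WIND$", "THUNDERSTORM WIND", events)
events <- gsub("^THUNDERSTORM WIND$", "THUNDERSTORM WINDS", events)

# Names that only partly match a pattern, such as "HEAT WAVE", are left alone
print(events)
```

Compound names such as "TSTM WIND/HAIL" are not merged by these patterns, so they remain separate event types in the totals.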

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Two variables serve as indicators of harm to population health: fatalities and injuries. For each one, we look at which event types account for the largest totals.

First we find which events are the top 10 causes of fatalities.

# Total fatalities per event type
fatalities_data <- aggregate(storm_data$FATALITIES, by=list(storm_data$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(fatalities_data) <- c("event.type", "fatality.total")
# Sort in descending order and keep the 10 largest totals
fatalities_ordered <- fatalities_data[order(-fatalities_data$fatality.total),]
top_fatalities <- fatalities_ordered[1:10,]
# Ordered factor so the plot keeps the bars in descending order
top_fatalities$event.type <- factor(top_fatalities$event.type, levels=top_fatalities$event.type, ordered = TRUE)

Next we find which events are the top 10 causes of injuries.

# Total injuries per event type
injuries_data <- aggregate(storm_data$INJURIES, by=list(storm_data$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(injuries_data) <- c("event.type", "injury.total")
# Sort in descending order and keep the 10 largest totals
injuries_ordered <- injuries_data[order(-injuries_data$injury.total),]
top_injuries <- injuries_ordered[1:10,]
# Ordered factor so the plot keeps the bars in descending order
top_injuries$event.type <- factor(top_injuries$event.type, levels=top_injuries$event.type, ordered = TRUE)
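The fatality and injury steps follow the same pattern: total a column per event type, sort in descending order, keep the largest totals, and turn the event type into an ordered factor. As a minimal sketch, that pattern could be wrapped in one function. The name top_events, its arguments, and the toy data frame are hypothetical and introduced only for illustration; the report itself does not use them.

```r
# Hypothetical helper: total one numeric column per event type and
# return the n event types with the largest totals
top_events <- function(df, value_col, n = 10) {
  totals <- aggregate(df[[value_col]], by = list(df$EVTYPE), FUN = sum, na.rm = TRUE)
  colnames(totals) <- c("event.type", "total")
  totals <- totals[order(-totals$total), ]
  top <- head(totals, n)  # head() also copes with fewer than n event types
  # Ordered factor so a bar chart keeps the descending order
  top$event.type <- factor(top$event.type, levels = top$event.type, ordered = TRUE)
  top
}

# Toy data (made-up numbers) with the same column names as storm_data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(5, 1, 3, 0, 2),
  INJURIES   = c(40, 3, 10, 1, 0)
)

top_fatal <- top_events(toy, "FATALITIES", n = 2)
print(top_fatal)
```

With the real data, a call such as top_events(storm_data, "INJURIES") would be expected to give the same ranking as the step-by-step code, although the total column would have a generic name.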

Across the United States, which types of events have the greatest economic consequences?

We use property damage (the PROPDMG variable) as the indicator of economic consequences, and find the 10 event types with the largest property damage totals.

# Total property damage per event type
propdmg_data <- aggregate(storm_data$PROPDMG, by=list(storm_data$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(propdmg_data) <- c("event.type", "prop.dmg.total")
# Sort in descending order and keep the 10 largest totals
propdmg_ordered <- propdmg_data[order(-propdmg_data$prop.dmg.total),]
top_propdmg <- propdmg_ordered[1:10,]
# Ordered factor so the plot keeps the bars in descending order
top_propdmg$event.type <- factor(top_propdmg$event.type, levels=top_propdmg$event.type, ordered = TRUE)
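The totals above sum PROPDMG as recorded. In the NOAA data set, a companion column, PROPDMGEXP, holds a magnitude code for each PROPDMG value (for example K for thousands, M for millions, B for billions), and storm_data as built above does not keep that column. The following is a rough sketch of how dollar amounts could be derived if it were kept. The toy rows are made-up values, and treating every code other than K, M, and B as a multiplier of 1 is a simplifying assumption, since the raw column contains other codes as well.

```r
# Toy rows (hypothetical values) with a damage figure and its magnitude code
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 1.5, 500, 2),
  PROPDMGEXP = c("K", "M", "", "B"),
  stringsAsFactors = FALSE
)

# Assumed mapping: K = 1e3, M = 1e6, B = 1e9
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes missing from the mapping come back as NA
scale <- unname(multiplier[toupper(toy$PROPDMGEXP)])
scale[is.na(scale)] <- 1  # assumption: unknown or empty codes are left unscaled

toy$prop.dmg.dollars <- toy$PROPDMG * scale

# Total scaled damage per event type
dollars_by_event <- aggregate(prop.dmg.dollars ~ EVTYPE, data = toy, FUN = sum)
print(dollars_by_event)
```

Scaling in this way can change the ranking of event types, because a few records with a B code can outweigh many records with a K code.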

Results

Now we will graph the top 10 causes of fatalities, injuries, and property damage.

library(ggplot2)
ggplot(data=top_fatalities, aes(x=event.type, y=fatality.total)) +
  geom_bar(stat="identity", fill = "#CC79A7", colour = "Black") +
  xlab("Event Type") + ylab("Total Fatalities") +
  ggtitle("Fatalities By Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

We can see that tornadoes are the leading cause of fatalities, with excessive heat second. After that, the remaining event types are close together in the number of fatalities they cause.

ggplot(data=top_injuries, aes(x=event.type, y=injury.total)) +
  geom_bar(stat="identity", fill = "#E69F00", colour = "Black") +
  xlab("Event Type") + ylab("Total Injuries") +
  ggtitle("Injuries By Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

We can see that tornadoes are by far the leading cause of injuries, with no other type of event coming close.

ggplot(data=top_propdmg, aes(x=event.type, y=prop.dmg.total)) +
  geom_bar(stat="identity", fill = "#009E73", colour = "Black") +
  xlab("Event Type") + ylab("Total Property Damage") +
  ggtitle("Property Damage By Event Type") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Tornadoes cause the most property damage, followed by thunderstorm winds and then flash floods. Property damage is more evenly spread among the event types than either fatalities or injuries.