Severe weather threatens both public health and the economy. In this analysis, we use storm data collected between 1950 and 2011 to examine both kinds of impact. The data come from NOAA’s storm database, which tracks many characteristics of storms and weather events in the United States.
Specifically, we seek to determine which weather events are most harmful to human populations and which have the greatest economic impact.
This is the final project for Coursera’s Reproducible Research class.
The data for this assignment come as a comma-separated-value file compressed with the bzip2 algorithm to reduce its size. It is available on the course website; the download URL appears in the code below.
Documentation is available through:
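holdStormData <- "./repdata-data-StormData.csv.bz2"
# Download the compressed file only if it is not already present.
if (!file.exists(holdStormData)) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # mode = "wb" keeps the binary archive intact on Windows.
  download.file(url = url, destfile = holdStormData, mode = "wb")
}
# read.csv reads the bzip2-compressed file directly; reuse the path variable.
stormData <- read.csv(holdStormData)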
# Let's load the packages we will need.
library(ggplot2)
library(plyr)
# Let's get a feel for our data
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
# We will not use every variable in the set. Now that we have seen them all, we can subset the ones we need.
subStormData <- stormData[ , c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
Now, we can focus on health and economic concerns.
First, we impose some order on the weather-related fatalities and injuries.
aggregateFatalities <- aggregate(FATALITIES ~ EVTYPE, data = subStormData, FUN = sum)
fatalities <- aggregateFatalities[order(-aggregateFatalities$FATALITIES),][1:10,]
fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)
aggregateInjuries <- aggregate(INJURIES ~ EVTYPE, data = subStormData, FUN = sum)
injuries <- aggregateInjuries[order(-aggregateInjuries$INJURIES),][1:10,]
injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)
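The aggregate, sort, and re-level pattern used above can be seen on a small scale in the sketch below. The `toy` data frame and its values are invented for illustration and are not drawn from the NOAA data; `head()` is used here instead of `[1:10, ]` only because it does not pad the result with `NA` rows when fewer rows are available.

```r
# Toy data standing in for subStormData (values are invented for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type.
byEvent <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in descending order and keep the top 3 (the report keeps the top 10).
top <- head(byEvent[order(-byEvent$FATALITIES), ], 3)

# Set the factor levels in sorted order so ggplot2 draws the bars
# from largest to smallest rather than alphabetically.
top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE)

print(top)
```

Without the final `factor()` call, ggplot2 would order the bars alphabetically by event name, which hides the ranking.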
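To process the data on economic damage, we must convert the reported figures to dollars. Each damage column is paired with an exponent column that encodes a multiplier: h = hundred, k = thousand, m = million, b = billion. The code below converts blank, K, M, and B; rows with any other code are left as NA and are dropped when the values are aggregated.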
subStormData$PROPEXP[subStormData$PROPDMGEXP == ""] <- 1
subStormData$PROPEXP[subStormData$PROPDMGEXP == "K"] <- 1000
subStormData$PROPEXP[subStormData$PROPDMGEXP == "M"] <- 1000000
subStormData$PROPEXP[subStormData$PROPDMGEXP == "B"] <- 1000000000
subStormData$PROPDMGVAL <- subStormData$PROPDMG * subStormData$PROPEXP
subStormData$CROPEXP[subStormData$CROPDMGEXP == ""] <- 1
subStormData$CROPEXP[subStormData$CROPDMGEXP == "K"] <- 1000
subStormData$CROPEXP[subStormData$CROPDMGEXP == "M"] <- 1000000
subStormData$CROPEXP[subStormData$CROPDMGEXP == "B"] <- 1000000000
subStormData$CROPDMGVAL <- subStormData$CROPDMG * subStormData$CROPEXP
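As an alternative to the line-by-line assignments, the conversion could be expressed as a single lookup. The sketch below is not the method used in this report: `expToMultiplier` is a hypothetical helper, and it assumes that only blank, H, K, M, and B (in either case) are valid codes. Any other code still yields `NA`.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: only blank, H, K, M and B (any case) are treated as valid.
expToMultiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(code))  # also handles factor columns
  mult <- unname(lookup[code])         # NA for codes not in the lookup
  mult[which(code == "")] <- 1         # blank: the figure is already in dollars
  mult
}

# Toy codes (invented for illustration), including one unrecognized value.
codes <- c("K", "m", "", "B", "h", "?")
mult <- expToMultiplier(codes)
print(data.frame(code = codes, multiplier = mult))
```

Applied to the real data, this would look like `subStormData$PROPDMG * expToMultiplier(subStormData$PROPDMGEXP)`. Because it also accepts lower-case codes and H, it would keep rows that the code above drops, so the totals could differ from those reported here.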
Next, with both damage figures expressed in dollars, we can total property and crop damage by event type and combine them to determine which events have the greatest overall economic impact.
totalPropertyDamage <- aggregate(PROPDMGVAL ~ EVTYPE, data = subStormData, FUN = sum)
totalCropDamage <- aggregate(CROPDMGVAL ~ EVTYPE, data = subStormData, FUN = sum)
# Merge the property and crop damage totals so we can compute total damages.
propertyandcropdamage <- merge(totalPropertyDamage, totalCropDamage, by = "EVTYPE", all = TRUE)
propertyandcropdamage <- mutate(propertyandcropdamage, TOTALDMGVAL = PROPDMGVAL + CROPDMGVAL)
totalDamage <- propertyandcropdamage[order(-propertyandcropdamage$TOTALDMGVAL), ][1:10, ]
totalDamage$EVTYPE <- factor(totalDamage$EVTYPE, levels = totalDamage$EVTYPE)
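One detail of the full outer merge is worth illustrating: if an event type appears in only one of the two aggregates, the other column is `NA` after the merge, and plain addition then gives an `NA` total. The sketch below uses invented toy totals and shows one possible way to handle this, under the assumption that a missing total means no recorded damage of that kind. It is not the approach taken in this report.

```r
# Toy per-event totals (invented values): DROUGHT has no property figure
# and HAIL has no crop figure.
prop <- data.frame(EVTYPE = c("FLOOD", "HAIL"), PROPDMGVAL = c(100, 20))
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), CROPDMGVAL = c(5, 50))

# Full outer merge keeps event types found in either table.
both <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# Plain addition propagates NA, so one-sided events get no total.
naive <- both$PROPDMGVAL + both$CROPDMGVAL

# Assumption: a missing total means no recorded damage, so treat NA as 0.
both$TOTALDMGVAL <- rowSums(both[, c("PROPDMGVAL", "CROPDMGVAL")], na.rm = TRUE)

print(both)
```

Whether treating `NA` as zero is appropriate depends on why the value is missing, so this choice should be stated explicitly if it is adopted.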
We now have processed data which we can present in graphical form to shed light on the questions driving this analysis.
Across the United States, which types of events are most harmful with respect to population health?
ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) +
geom_bar(stat = "identity", fill = "black") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("EVENTS") + ylab("FATALITIES") +
ggtitle("Number of Fatalities by Top 10 Weather Events")
Here, for comparison, we have plotted the 10 weather event types with the most fatalities over the span of the data collection. Tornados have killed by far the greatest number of people.
ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) +
geom_bar(stat = "identity", fill = "purple") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("EVENTS") + ylab("INJURIES") +
ggtitle("Number of Injuries by Top 10 Weather Events")
We have also plotted the 10 weather event types that caused the most injuries over the span of the data collection. Tornados have injured by far the greatest number of people.
Across the United States, which types of events have the greatest economic consequences?
ggplot(totalDamage, aes(x = EVTYPE, y = TOTALDMGVAL)) +
geom_bar(stat = "identity", fill = "green") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
xlab("EVENTS") + ylab("DAMAGES IN US DOLLARS") +
ggtitle("Property & Crop Damages by Top 10 Weather Events")
Finally, we have plotted the 10 weather event types with the greatest combined property and crop damage over the span of the data collection. Floods have caused by far the most combined damage.
Let’s summarize our key findings:
1. Tornados cause the most fatalities.
2. Tornados cause the most injuries.
3. Flooding causes the most combined property and crop damage.