1. Synopsis

Severe weather is a serious concern for many reasons; in this analysis, we use data collected between 1950 and 2011 to examine its impact on public health and the economy. The data come from NOAA’s storm database, which tracks many characteristics of storms and weather events in the United States.

Specifically, we seek to determine which types of weather events are the most harmful to human health and which have the greatest economic consequences.

This represents the final project in Coursera’s Reproducible Research class.

2. Data Processing

The data for this assignment come in the form of a comma-separated-value file compressed with the bzip2 algorithm to reduce its size. It is available from the course website at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.

Documentation is available through:

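# Download the compressed data file only if it is not already present.
holdStormData <- "./repdata-data-StormData.csv.bz2"
if (!file.exists(holdStormData)) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # mode = "wb" keeps the binary file intact on Windows.
  download.file(url = url, destfile = holdStormData, mode = "wb")
}
# read.csv() reads the bzip2-compressed file directly; no manual decompression is needed.
stormData <- read.csv(holdStormData)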
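The download step never decompresses the file by hand: read.csv() opens its input through R's file() connection, which recognizes bzip2 compression when reading. A minimal self-contained check of that behavior, using a small made-up data frame and a temporary file (both hypothetical, not part of the storm data), might look like this:

```r
# Toy data frame (invented values) written through a bzip2 connection.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 0))
tmp <- tempfile(fileext = ".csv.bz2")

# write.csv() opens the unopened bzfile() connection, writes, and closes it.
write.csv(toy, bzfile(tmp), row.names = FALSE)

# Plain read.csv() detects the bzip2 compression and decompresses on the fly.
back <- read.csv(tmp)
print(back)

unlink(tmp)  # remove the temporary file
```

The same mechanism is what lets the analysis keep the roughly compressed file on disk and read it in one call.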

# Let's load the packages we will need.

library(ggplot2)
library(plyr)

# Let's get a feel for our data

head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
# We will not use every column. Now that we have seen all the variables, keep only those relevant to health and economic impact.

subStormData <- stormData[ , c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]

Now, we can focus on health and economic concerns.

Health

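First, we total the fatalities and injuries for each event type (EVTYPE), sort the totals in decreasing order, and keep the ten highest. Setting the factor levels in that sorted order makes ggplot2 draw the bars from largest to smallest rather than alphabetically.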

aggregateFatalities <- aggregate(FATALITIES ~ EVTYPE, data = subStormData, FUN = sum)
fatalities <- aggregateFatalities[order(-aggregateFatalities$FATALITIES),][1:10,]
fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)

aggregateInjuries <- aggregate(INJURIES ~ EVTYPE, data = subStormData, FUN = sum)
injuries <- aggregateInjuries[order(-aggregateInjuries$INJURIES),][1:10,]
injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)
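The aggregate/order/factor pattern above can be seen on a toy data frame. The event names and counts below are invented purely for illustration and stand in for subStormData; a top 3 replaces the top 10 to keep the output short.

```r
# Hypothetical toy data standing in for subStormData.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HEAT", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 5, 3, 4, 0, 0)
)

# Sum fatalities within each event type (result is sorted alphabetically by EVTYPE).
agg <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort by decreasing total and keep the first 3 rows.
top <- agg[order(-agg$FATALITIES), ][1:3, ]

# Freeze the sorted order into the factor levels so ggplot2 plots bars in this order.
top$EVTYPE <- factor(top$EVTYPE, levels = top$EVTYPE)
print(top)
print(levels(top$EVTYPE))
```

Without the factor() step, ggplot2 would place a character EVTYPE column on the x axis in alphabetical order, losing the ranking.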

Economics

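To express economic damage in dollars, we must convert the recorded values. PROPDMG and CROPDMG hold base figures, and the companion columns PROPDMGEXP and CROPDMGEXP hold a magnitude code for each value: H = hundred, K = thousand, M = million, B = billion. Multiplying each base figure by its multiplier gives the damage in dollars.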

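# Convert each exponent code to a numeric multiplier. toupper() makes the match
# case-insensitive; any other code stays NA, and aggregate() drops those rows
# when the totals are computed below.
propCode <- toupper(subStormData$PROPDMGEXP)
subStormData$PROPEXP <- NA_real_
subStormData$PROPEXP[propCode == ""] <- 1
subStormData$PROPEXP[propCode == "H"] <- 100
subStormData$PROPEXP[propCode == "K"] <- 1000
subStormData$PROPEXP[propCode == "M"] <- 1000000
subStormData$PROPEXP[propCode == "B"] <- 1000000000
subStormData$PROPDMGVAL <- subStormData$PROPDMG * subStormData$PROPEXP

cropCode <- toupper(subStormData$CROPDMGEXP)
subStormData$CROPEXP <- NA_real_
subStormData$CROPEXP[cropCode == ""] <- 1
subStormData$CROPEXP[cropCode == "H"] <- 100
subStormData$CROPEXP[cropCode == "K"] <- 1000
subStormData$CROPEXP[cropCode == "M"] <- 1000000
subStormData$CROPEXP[cropCode == "B"] <- 1000000000
subStormData$CROPDMGVAL <- subStormData$CROPDMG * subStormData$CROPEXP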
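The exponent conversion can also be written as a single lookup. The sketch below uses a hypothetical helper, expToMultiplier(), that is not part of the original analysis. Like the conversion code, it treats a blank code as a multiplier of 1 and leaves unrecognized codes as NA; it also accepts lower-case codes and the H (hundred) code.

```r
# Hypothetical helper: map magnitude codes to dollar multipliers via a named vector.
expToMultiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(code))  # case-insensitive matching
  mult <- unname(lookup[code])         # codes not in the table (including "") give NA
  mult[code == ""] <- 1                # a blank code is treated as a multiplier of 1
  mult
}

codes <- c("", "K", "m", "B", "h", "+")
print(expToMultiplier(codes))
print(25 * expToMultiplier("K"))       # 25 with code K is 25,000 dollars
```

Applied to the full column, this would read `subStormData$PROPDMG * expToMultiplier(subStormData$PROPDMGEXP)`; keeping the code table in one named vector means property and crop damage cannot drift apart in how they are converted.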

Now that property and crop damage are both expressed in dollars, we can total each by event type and combine them to determine which events have the greatest overall economic impact.

totalPropertyDamage <- aggregate(PROPDMGVAL ~ EVTYPE, data = subStormData, FUN = sum)
totalCropDamage <- aggregate(CROPDMGVAL ~ EVTYPE, data = subStormData, FUN = sum)

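# Merge the property and crop totals by event type, then add them to get total damage.
propertyandcropdamage <- merge(totalPropertyDamage, totalCropDamage, all = TRUE)
# all = TRUE keeps event types found in only one table; count the missing side
# as zero so that their total is not NA.
propertyandcropdamage$PROPDMGVAL[is.na(propertyandcropdamage$PROPDMGVAL)] <- 0
propertyandcropdamage$CROPDMGVAL[is.na(propertyandcropdamage$CROPDMGVAL)] <- 0
propertyandcropdamage <- mutate(propertyandcropdamage, TOTALDMGVAL = PROPDMGVAL + CROPDMGVAL)
# Keep the ten event types with the largest total damage, in decreasing order.
totalDamage <- propertyandcropdamage[order(-propertyandcropdamage$TOTALDMGVAL), ][1:10, ]
totalDamage$EVTYPE <- factor(totalDamage$EVTYPE, levels = totalDamage$EVTYPE)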
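The `merge(..., all = TRUE)` call is a full outer join on the shared EVTYPE column. A toy example with invented totals (hypothetical numbers, not from the storm data) shows what happens to an event type that appears in only one of the two tables:

```r
# Hypothetical per-event totals, for illustration only.
prop <- data.frame(EVTYPE = c("FLOOD", "HAIL"), PROPDMGVAL = c(100, 20))
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), CROPDMGVAL = c(10, 50))

# Full outer join on EVTYPE: DROUGHT and HAIL are kept, with NA on the missing side.
both <- merge(prop, crop, all = TRUE)
print(both)

# A plain sum propagates the NA; rowSums(na.rm = TRUE) counts the missing side as zero.
both$TOTAL_NAIVE <- both$PROPDMGVAL + both$CROPDMGVAL
both$TOTAL <- rowSums(both[, c("PROPDMGVAL", "CROPDMGVAL")], na.rm = TRUE)
print(both)
```

With the real data, any event type recorded with only property or only crop damage behaves the same way, so how the missing side is handled decides whether its total is NA and therefore whether it can appear in the top 10.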

3. Results

We can now plot the processed data to answer the two questions driving this analysis.

Health

Across the United States, which types of events are most harmful with respect to population health?

ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) + 
  geom_bar(stat = "identity", fill = "black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENTS") + ylab("FATALITIES") +
  ggtitle("Number of Fatalities by Top 10 Weather Events")

This plot compares the ten weather event types with the most fatalities over the full period of data collection. Tornadoes have killed by far the greatest number of people.

ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) + 
  geom_bar(stat = "identity", fill = "purple") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  xlab("EVENTS") + ylab("INJURIES") + 
  ggtitle("Number of Injuries by Top 10 Weather Events")

Likewise, this plot compares the ten event types that caused the most injuries over the same period. Tornadoes have injured by far the greatest number of people.

Economics

Across the United States, which types of events have the greatest economic consequences?

ggplot(totalDamage, aes(x = EVTYPE, y = TOTALDMGVAL)) +
  geom_bar(stat = "identity", fill = "green") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("EVENTS") + ylab("DAMAGES IN US DOLLARS") +
  ggtitle("Property & Crop Damages by Top 10 Weather Events")

Finally, this plot compares the ten event types with the highest combined property and crop damage over the same period. Floods have caused by far the most combined damage.

Let’s summarize our key findings:
1. Tornadoes cause the most fatalities.
2. Tornadoes cause the most injuries.
3. Flooding causes the most combined property and crop damage.