Synopsis

The following data analysis explores the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database in order to answer two questions.

  1. Which types of events are most harmful with respect to population health?
  2. Which types of events have the greatest economic impacts?

Based on the analysis, the major finding is that tornadoes have the greatest impact on overall population health. Additionally, the results show that floods and drought cause the most economic damage across the United States.

Data Processing

Load required R libraries:

library(ggplot2)
library(grid)
library(gridExtra)
library(dplyr)

We first download and read the NOAA Storm data. (The documentation can be found in the National Weather Service Storm Data Documentation.)

download.file(url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
              destfile="repdata-data-StormData.csv.bz2", method="curl")
rawData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
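
The download above runs every time the analysis is re-executed. A minimal sketch of one way to skip it when the file is already on disk is shown here. The helper name `fetch_if_missing` and its `downloader` parameter are hypothetical and not part of the original analysis; the stand-in downloader exists only so the example runs without network access.

```r
# Hypothetical helper: download only when destfile is not already present.
# `downloader` defaults to download.file; extra arguments (for example
# method = "curl") are forwarded through `...`.
fetch_if_missing <- function(url, destfile, downloader = download.file, ...) {
  if (!file.exists(destfile)) {
    downloader(url, destfile, ...)
  }
  invisible(destfile)
}

# Offline demonstration with a stand-in downloader that writes a tiny CSV.
fake_download <- function(url, destfile) {
  writeLines(c("EVTYPE,FATALITIES", "TORNADO,0"), destfile)
}
tmp <- tempfile(fileext = ".csv")
fetch_if_missing("https://example.invalid/data.csv", tmp, fake_download)
demo <- read.csv(tmp)
```

With the real data, the call would presumably take the URL and destination file name used above, followed by the same `read.csv(bzfile(...))` step.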

After reading the NOAA Storm data, we check its dimensions and display the first few rows of the dataset.

dim(rawData)
## [1] 902297     37
head(rawData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

We then keep only the attributes used in this analysis:

  EVTYPE - Event type
  INJURIES - Count of injuries
  FATALITIES - Count of weather-related fatalities
  PROPDMG, PROPDMGEXP - Property damage amount and its magnitude code (together giving USD)
  CROPDMG, CROPDMGEXP - Crop damage amount and its magnitude code (together giving USD)

stormDf <- rawData[, c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 
                       'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]

Because some attributes, such as PROPDMGEXP and CROPDMGEXP, contain coded symbols of magnitude, we convert these symbols to powers of ten and use them to estimate the damage in dollars.

Calculate Property Damage:

unique(stormDf$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
#Convert symbols of magnitude
stormDf$PROPDMGEXP <- as.character(stormDf$PROPDMGEXP)
stormDf$PROPDMGEXP <- gsub('B', 9, stormDf$PROPDMGEXP, ignore.case = TRUE)
stormDf$PROPDMGEXP <- gsub('M', 6, stormDf$PROPDMGEXP, ignore.case = TRUE)
stormDf$PROPDMGEXP <- gsub('K', 3, stormDf$PROPDMGEXP, ignore.case = TRUE)
stormDf$PROPDMGEXP <- gsub('H', 2, stormDf$PROPDMGEXP, ignore.case = TRUE)
stormDf$PROPDMGEXP <- gsub('\\-|\\+|\\?', 0, stormDf$PROPDMGEXP)
stormDf$PROPDMGEXP <- as.numeric(stormDf$PROPDMGEXP)
stormDf$PROPDMGEXP[is.na(stormDf$PROPDMGEXP)] <- 0

#Calculate property damage in USD
stormDf$PROPDMGVAL <- stormDf$PROPDMG * 10^stormDf$PROPDMGEXP
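
As a quick sanity check of the formula, the first row printed earlier has PROPDMG = 25.0 and PROPDMGEXP = K. The snippet below works through that single case by hand; the variable names are illustrative only.

```r
# First row of the data shown earlier: PROPDMG = 25.0, PROPDMGEXP = "K"
propdmg  <- 25.0
exponent <- 3                         # "K" is mapped to 3 by the conversion
propdmg_val <- propdmg * 10^exponent  # 25 * 1000
propdmg_val                           # 25000
```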

Calculate Crop Damage:

unique(stormDf$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
#Convert symbols of magnitude
stormDf$CROPDMGEXP <- as.character(stormDf$CROPDMGEXP)
stormDf$CROPDMGEXP <- gsub('B', 9, stormDf$CROPDMGEXP, ignore.case = TRUE)
stormDf$CROPDMGEXP <- gsub('M', 6, stormDf$CROPDMGEXP, ignore.case = TRUE)
stormDf$CROPDMGEXP <- gsub('K', 3, stormDf$CROPDMGEXP, ignore.case = TRUE)
stormDf$CROPDMGEXP <- gsub('H', 2, stormDf$CROPDMGEXP, ignore.case = TRUE)
stormDf$CROPDMGEXP <- gsub('\\-|\\+|\\?', 0, stormDf$CROPDMGEXP)
stormDf$CROPDMGEXP <- as.numeric(stormDf$CROPDMGEXP)
stormDf$CROPDMGEXP[is.na(stormDf$CROPDMGEXP)] <- 0

#Calculate crop damage in USD
stormDf$CROPDMGVAL <- stormDf$CROPDMG * 10^stormDf$CROPDMGEXP
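
The property and crop conversions apply the same mapping twice, so they could be factored into one function. The sketch below is one possible way to do that, assuming the same interpretation as the code above: H, K, M and B (in either case) become 2, 3, 6 and 9, single digits are kept as exponents, and everything else becomes 0. The name `exp_to_power` is hypothetical.

```r
# Hypothetical helper: map magnitude symbols to powers of ten.
exp_to_power <- function(x) {
  x <- toupper(as.character(x))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[x])            # NA for anything not in the table
  is_digit <- grepl("^[0-9]$", x)
  power[is_digit] <- as.numeric(x[is_digit])
  power[is.na(power)] <- 0              # "", "-", "+", "?" and other codes
  power
}

powers <- exp_to_power(c("K", "m", "", "+", "5", "B", "h", "?", "-", "0"))
damage <- 2.5 * 10^exp_to_power("K")    # 2.5 with code K, as in the second row printed earlier
```

A lookup table keeps the mapping in one place, which makes it easier to revisit how the ambiguous codes (digits, `-`, `+`, `?`) are treated.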

Results

1. Which types of events are most harmful with respect to population health?

We will create the fatalitiesDf and injureDf data frames to find the most harmful events over the period covered by the data.

fatalitiesDf represents the total deaths for each event type:

fatalitiesDf <- aggregate(FATALITIES ~ EVTYPE, data=stormDf, sum)
fatalitiesDf <- arrange(fatalitiesDf, desc(FATALITIES))

injureDf represents the total injuries for each event type:

injureDf <- aggregate(INJURIES ~ EVTYPE, data=stormDf, sum)
injureDf <- arrange(injureDf, desc(INJURIES))
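
To make the aggregate-then-sort step concrete, here is a small self-contained illustration on made-up data. The `toy` data frame and its values are invented for the example, and base R's `order()` is used in place of `arrange()` so that the snippet runs without dplyr.

```r
# Made-up records: several rows per event type
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)

# Sum FATALITIES within each EVTYPE, then sort from largest to smallest
toyTotals <- aggregate(FATALITIES ~ EVTYPE, data = toy, sum)
toyTotals <- toyTotals[order(-toyTotals$FATALITIES), ]
toyTotals
```

The result has one row per event type, with TORNADO (5) first, followed by HEAT (4) and FLOOD (1).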

We will plot the top 10 events with the highest number of fatalities and injuries.

# Bar charts of precomputed totals (stat="identity" plots the values as given)
p1 <- ggplot(fatalitiesDf[1:10, ], 
             aes(x=reorder(EVTYPE, -FATALITIES), y=FATALITIES)) + 
    geom_bar(stat="identity") + 
    labs(x="Event Type", y="Total Number of Fatalities", title="Fatalities") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

p2 <- ggplot(injureDf[1:10, ], 
             aes(x=reorder(EVTYPE, -INJURIES), y=INJURIES)) + 
    geom_bar(stat="identity") + 
    labs(x="Event Type", y="Total Number of Injuries", title="Injuries") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

# `top` sets the overall title (named `main` in gridExtra versions before 2.0.0)
grid.arrange(p1, p2, ncol=2, 
        top="Top 10 Most Harmful Events with Respect to Population Health")

2. Which types of events have the greatest economic impacts?

We will create propDf and cropDf data frames to identify the greatest economic impacts.

propDf represents the total property damage for each event type:

propDf <- aggregate(PROPDMGVAL ~ EVTYPE, data=stormDf, sum)
propDf <- arrange(propDf, desc(PROPDMGVAL))

cropDf represents the total crop damage for each event type:

cropDf <- aggregate(CROPDMGVAL ~ EVTYPE, data=stormDf, sum)
cropDf <- arrange(cropDf, desc(CROPDMGVAL))
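
Property and crop damage are ranked separately above. If a single combined ranking were wanted, one possible approach is to add the two values per record before aggregating. The sketch below shows this on made-up numbers; the `toyDamage` data frame and the `TOTALDMGVAL` column are hypothetical and do not appear in the original analysis.

```r
# Made-up damage values in USD
toyDamage <- data.frame(
  EVTYPE     = c("FLOOD", "DROUGHT", "FLOOD", "HAIL"),
  PROPDMGVAL = c(5e6, 1e5, 2e6, 3e5),
  CROPDMGVAL = c(1e6, 4e6, 0,   2e5)
)

# Combine both kinds of damage per record, then total by event type
toyDamage$TOTALDMGVAL <- toyDamage$PROPDMGVAL + toyDamage$CROPDMGVAL
toyTotalDf <- aggregate(TOTALDMGVAL ~ EVTYPE, data = toyDamage, sum)
toyTotalDf <- toyTotalDf[order(-toyTotalDf$TOTALDMGVAL), ]
toyTotalDf
```

On these invented numbers FLOOD ranks first (8e6), then DROUGHT (4.1e6), then HAIL (5e5); the real data may of course rank differently.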

Finally, we will plot the top 10 events with the greatest economic impact.

# Bar charts of precomputed totals (stat="identity" plots the values as given)
p3 <- ggplot(propDf[1:10, ], 
             aes(x=reorder(EVTYPE, -PROPDMGVAL), y=PROPDMGVAL)) + 
    geom_bar(stat="identity") + 
    labs(x="Event Type", y="Total Property Damage (USD)", title="Property Damage") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

p4 <- ggplot(cropDf[1:10, ], 
             aes(x=reorder(EVTYPE, -CROPDMGVAL), y=CROPDMGVAL)) + 
    geom_bar(stat="identity") + 
    labs(x="Event Type", y="Total Crop Damage (USD)", title="Crop Damage") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

# `top` sets the overall title (named `main` in gridExtra versions before 2.0.0)
grid.arrange(p3, p4, ncol=2, top="Top 10 Events with the Greatest Economic Impacts")

Conclusion

Ultimately, tornadoes have the greatest impact on population health across the United States, causing the highest number of both fatalities and injuries. In addition, floods and drought have caused the greatest economic damage in the United States.