Synopsis

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The database is used to determine which types of events are most harmful with respect to population health and which types of events have the greatest economic consequences. All data processing and plotting is done in R.

Data Processing

Load the necessary R packages (assuming that they are installed on the local machine).

library(ggplot2)
library(plyr)

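Download the compressed storm data file if it is not already present, then read in the raw data. There is no need to unzip it first, because read.csv() decompresses .bz2 files as it reads them.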

stormFile <- "./repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists(stormFile)) {
  # mode = "wb" downloads in binary mode so the .bz2 file is not corrupted on Windows
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = stormFile, mode = "wb")
}
stormData <- read.csv(stormFile)
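The claim that read.csv() reads the compressed file directly can be checked with a minimal sketch. It assumes nothing beyond base R: a small hypothetical data frame is written through a bzfile() connection to a temporary file and read back with a plain read.csv() call, which detects the bzip2 compression on its own.

```r
# Minimal sketch: read.csv() opens .bz2 files transparently.
# The toy data frame below is hypothetical, not taken from the storm data.
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))

# Write the toy data as a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, file = con, row.names = FALSE)
close(con)

# Read it back without decompressing it by hand
roundTrip <- read.csv(path)
print(roundTrip)
```

The same mechanism is what lets the full StormData.csv.bz2 file be read in one step.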

After reading in the data, check its dimensions and its first few rows.

dim(stormData)
## [1] 902297     37
head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Only the columns describing the event type, its effect on population health (fatalities and injuries) and its economic consequences (property and crop damage, plus their magnitude codes) are needed, so I subset the data to those seven columns.

neccStormData <- stormData[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Since harm to population health covers both injuries and fatalities, I create two new data tables: one for fatalities (dfFatalities) and one for injuries (dfInjuries). For each table, I sum the fatalities or injuries per event type, counting only records with more than 0 fatalities or injuries, respectively. Then I sort each table in descending order.

# New Table for Fatalities
dfFatalities <- aggregate(FATALITIES ~ EVTYPE, data = neccStormData, FUN=sum, 
                          subset = as.numeric(FATALITIES) > 0)
dfFatalities$FATALITIES <- as.numeric(dfFatalities$FATALITIES)
topFatalities <- dfFatalities[order(dfFatalities$FATALITIES, decreasing = TRUE), ]

# New Table for Injuries
dfInjuries <- aggregate(INJURIES ~ EVTYPE, data = neccStormData, FUN=sum,
                        subset = as.numeric(INJURIES) > 0)
dfInjuries$INJURIES <- as.numeric(dfInjuries$INJURIES)
topInjuries <- dfInjuries[order(dfInjuries$INJURIES, decreasing = TRUE), ]
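To make the aggregate-then-sort pattern concrete, here is a small sketch on hypothetical toy rows standing in for neccStormData. It uses the same formula interface of aggregate() with a subset argument, followed by order(..., decreasing = TRUE).

```r
# Hypothetical toy rows standing in for neccStormData
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 3, 1, 0)
)

# Sum FATALITIES per EVTYPE, using only rows with at least one fatality
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum,
                    subset = FATALITIES > 0)

# Sort from the most to the fewest fatalities
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
print(totals)
```

The subset does not change any event type's total; it only drops event types whose total would be zero (HAIL here), which keeps the resulting tables shorter.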

A new field (propDamage) must be created that multiplies the property damage base amount (PROPDMG) by the multiplier encoded in PROPDMGEXP (H = hundreds, K = thousands, M = millions, B = billions) to get the actual cost of the property damage in dollars. Likewise, a new field (cropDamage) must be created that multiplies the crop damage base amount (CROPDMG) by the multiplier encoded in CROPDMGEXP to get the actual cost of the crop damage. Records with any other code are assigned a damage of 0.

# Convert magnitude codes to multipliers: H = 10^2, K = 10^3, M = 10^6, B = 10^9.
# Codes are upper-cased first so lowercase entries (e.g. "k", "m") are converted
# too; any other code (blank, "?", "+", digits) leaves the damage at 0.
multipliers <- c(H = 10^2, K = 10^3, M = 10^6, B = 10^9)

toDollars <- function(amount, code) {
  mult <- unname(multipliers[toupper(as.character(code))])
  mult[is.na(mult)] <- 0   # unknown or blank codes map to NA, then to 0
  amount * mult
}

# Property damage in dollars
neccStormData$propDamage <- toDollars(neccStormData$PROPDMG, neccStormData$PROPDMGEXP)

# Crop damage in dollars
neccStormData$cropDamage <- toDollars(neccStormData$CROPDMG, neccStormData$CROPDMGEXP)
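Comparisons with == in R are case-sensitive, which matters here because the raw magnitude codes are not consistently capitalized (lowercase "k" and "m" appear alongside "K" and "M"). A small sketch on hypothetical codes and amounts shows how an exact match on "K" differs from a case-insensitive one.

```r
# Hypothetical sample of magnitude codes and base amounts
codes  <- c("K", "k", "M", "m", "", "?")
amount <- c(2.5, 2.5, 1, 1, 10, 10)

# Exact match: only the uppercase "K" row is converted to thousands
exactMatch <- ifelse(codes == "K", amount * 10^3, 0)

# Case-insensitive match: both "K" and "k" rows are converted
caseInsensitive <- ifelse(toupper(codes) == "K", amount * 10^3, 0)

print(exactMatch)        # values: 2500 0 0 0 0 0
print(caseInsensitive)   # values: 2500 2500 0 0 0 0
```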

Now I create two new data tables, built the same way as the injury and fatality tables above: one for crop damage (dfCropDamage) and one for property damage (dfPropDamage).

# New Table for Crop Damage
dfCropDamage <- aggregate(cropDamage ~ EVTYPE, data = neccStormData, FUN=sum, 
                          subset = as.numeric(cropDamage) > 0)
dfCropDamage$cropDamage <- as.numeric(dfCropDamage$cropDamage)
topCropDamage <- dfCropDamage[order(dfCropDamage$cropDamage, decreasing = TRUE), ]

# New Table for Property Damage
dfPropDamage <- aggregate(propDamage ~ EVTYPE, data = neccStormData, FUN=sum,
                          subset = as.numeric(propDamage) > 0)
dfPropDamage$propDamage <- as.numeric(dfPropDamage$propDamage)
topPropDamage <- dfPropDamage[order(dfPropDamage$propDamage, decreasing = TRUE), ]

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

As the two plots below show, TORNADO is the event type that caused both the most injuries and the most fatalities.

ggplot(data = head(topInjuries, 10), aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
    geom_bar(fill="blue", stat="identity")  + 
    coord_flip() + 
    ylab("Total number of injuries") + xlab("Event type") +
    theme(legend.position="none") +
    ggtitle("Top 10 Weather Events Causing the Most Injuries in the U.S.")

ggplot(data = head(topFatalities, 10), aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
    geom_bar(fill="red", stat="identity") + 
    coord_flip() +
    ylab("Total number of fatalities") + xlab("Event type") +
    theme(legend.position="none") +
    ggtitle("Top 10 Weather Events Causing the Most Fatalities in the U.S.")
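The bars in these plots are sorted by reorder(), which turns EVTYPE into a factor whose levels follow the increasing totals. ggplot2 places discrete levels in level order, and after coord_flip() the first level sits at the bottom, so the largest total appears at the top. A minimal sketch with hypothetical totals shows the level order reorder() produces.

```r
# Hypothetical event totals, listed in no particular order
events <- c("FLOOD", "TORNADO", "HAIL")
totals <- c(1000, 5000, 200)

# reorder() returns a factor whose levels are sorted by increasing total
ranked <- reorder(events, totals)
print(levels(ranked))   # "HAIL" "FLOOD" "TORNADO"
```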

Across the United States, which types of events have the greatest economic consequences?

As the two plots below show, DROUGHT is the event type that caused the most crop damage, and FLOOD is the event type that caused the most property damage.

ggplot(data = head(topCropDamage, 10), aes(x = reorder(EVTYPE, cropDamage), y = cropDamage)) +
    geom_bar(fill="blue", stat="identity")  + 
    coord_flip() + 
    ylab("Crop Damage Costs ($)") + xlab("Event type") +
    theme(legend.position="none") +
    ggtitle("Top 10 Weather Events Causing the Most Crop Damage in the U.S.")

ggplot(data = head(topPropDamage, 10), aes(x = reorder(EVTYPE, propDamage), y = propDamage)) +
    geom_bar(fill="red", stat="identity") + 
    coord_flip() +
    ylab("Property Damage Costs ($)") + xlab("Event type") +
    theme(legend.position="none") +
    ggtitle("Top 10 Weather Events Causing the Most Property Damage in the U.S.")