Riley Matsuda

Synopsis

Natural disasters are among the most destructive events known to mankind. The worst of them can cause widespread injury and death, and billions of dollars in infrastructure damage. The U.S. National Oceanic and Atmospheric Administration (NOAA) records each such event in its storm database. This report uses NOAA data from 1950 to 2011 to determine which types of events are, on average, the most costly to population health (injuries and fatalities) and to the economy (property and crop damage). We found that tornadoes were both the deadliest event type (about 25 deaths per occurrence) and the type that caused the highest average property damage (about 1.5 billion USD per occurrence). Heat waves led to the highest injury rates (about 70 injuries per occurrence), while ice storms caused the greatest crop damage (about 250 million USD per occurrence).

Data Processing: Loading and Processing the Raw Data

This analysis uses data from the U.S. National Oceanic and Atmospheric Administration's storm database, covering the years 1950 through 2011. A copy can be downloaded from the URL in the code below (note that it is a ~50 MB bz2-compressed file).

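# Download the compressed file once; read.csv reads .bz2 files directly.
if (!file.exists("stormdata.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  "stormdata.csv.bz2", mode = "wb")
}
data <- read.csv("stormdata.csv.bz2")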

We then perform a few checks on the dataset to see how it is organized.

dim(data)
## [1] 902297     37
names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Using the dplyr package, we select the columns that measure impacts on public health (injuries and fatalities) and on the economy (property and crop damage). We also keep the event type, since we will be comparing the impacts of each type of disaster.

library(dplyr)
data <- select(data, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, 
               CROPDMG, CROPDMGEXP)
head(data)
dim(data)
## [1] 902297      7

We can then use aggregate to find the types of events that cause, on average, the most injuries and the most fatalities per event. We will focus on the top 10 events that have the highest average injury and fatality rates.

injuries <- aggregate(data$INJURIES, list(data$EVTYPE), mean)
names(injuries) <- c("event_type", "count")
fatalities <- aggregate(data$FATALITIES, list(data$EVTYPE), mean)
names(fatalities) <- c("event_type", "count")
dim(injuries)
## [1] 985   2
injuries <- injuries[order(-injuries$count), ]
fatalities <- fatalities[order(-fatalities$count), ]

topinj <- injuries[1:10, ]
topfatal <- fatalities[1:10, ]
topinj
topfatal

From the dimension of injuries, we can see that there are 985 different types of events listed. Therefore, the top 10 most dangerous events correspond roughly to the top 1%.
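Because these rankings are built from per-type averages, an event type with only one or two records can rank highly on the strength of a single severe occurrence. As a minimal sketch of how the record count could be reported next to each mean, the following uses a small toy data frame. The values and the names toy, n_records, and summary_tbl are illustrative assumptions, not part of the NOAA data or the analysis above.

```r
# Toy data (hypothetical values, not taken from the NOAA database).
toy <- data.frame(
  EVTYPE   = c("TORNADO", "TORNADO", "TORNADO", "HEAT", "RARE EVENT"),
  INJURIES = c(10, 0, 5, 30, 40)
)

# Mean injuries per event type, mirroring the aggregate() calls above.
avg <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = mean)

# Number of records behind each mean.
n <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = length)
names(n)[2] <- "n_records"

# Combine the two tables and rank by mean injuries, highest first.
summary_tbl <- merge(avg, n, by = "EVTYPE")
summary_tbl <- summary_tbl[order(-summary_tbl$INJURIES), ]
summary_tbl
```

In this toy example the top-ranked type has a single record, which is the kind of case the extra column makes visible.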

We will now perform a similar analysis on the economic impacts of each type. We must first take the additional step of analyzing the PROPDMGEXP and CROPDMGEXP attributes, which tell us whether the reported damages are in thousands (k/K), millions (m/M), or billions (B) of dollars. We will assume that any events with an average below the thousands of dollars range will not fall in the top 10, and will thus ignore the other symbols in PROPDMGEXP and CROPDMGEXP.

The following converts the property and crop damage of each record to U.S. dollars, using the multiplier indicated by PROPDMGEXP and CROPDMGEXP.

prop_k <- toupper(data$PROPDMGEXP) == "K"
prop_m <- toupper(data$PROPDMGEXP) == "M"
prop_b <- toupper(data$PROPDMGEXP) == "B"
data <- mutate(data, PROPDMGADJ = PROPDMG*prop_k*10^3 + 
                        PROPDMG*prop_m*10^6 + PROPDMG*prop_b*10^9)

crop_k <- toupper(data$CROPDMGEXP) == "K"
crop_m <- toupper(data$CROPDMGEXP) == "M"
crop_b <- toupper(data$CROPDMGEXP) == "B"
# Every term uses CROPDMG, so the crop multipliers apply to crop amounts only
data <- mutate(data, CROPDMGADJ = CROPDMG*crop_k*10^3 + 
                        CROPDMG*crop_m*10^6 + CROPDMG*crop_b*10^9)
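The same conversion can also be written as a lookup from exponent code to multiplier. The sketch below is one possible formulation, not the method used in this report; the names multiplier and to_dollars are hypothetical helpers introduced for illustration. Codes other than K, M, or B (including blanks) contribute zero, matching the assumption stated earlier.

```r
# Multipliers for the three exponent codes used in this report.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its exponent code to dollars.
# Any code other than K/M/B (in either case) is treated as a multiplier of 0.
to_dollars <- function(amount, exp_code) {
  m <- unname(multiplier[toupper(as.character(exp_code))])
  m[is.na(m)] <- 0
  amount * m
}

# Example with made-up amounts: 25 K, 2.5 m, 1 B, and a blank code.
to_dollars(c(25, 2.5, 1, 7), c("K", "m", "B", ""))
```

A lookup like this keeps the amount and its multiplier in a single expression, so the same function can be applied to the property columns and the crop columns alike.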

Now that the damages have been adjusted, we will aggregate the damages by event type and find the 10 events that caused the highest average damages.

prop <- aggregate(data$PROPDMGADJ, list(data$EVTYPE), mean)
names(prop) <- c("event_type", "damage")
crop <- aggregate(data$CROPDMGADJ, list(data$EVTYPE), mean)
names(crop) <- c("event_type", "damage")

prop <- prop[order(-prop$damage), ]
crop <- crop[order(-crop$damage), ]

topprop <- prop[1:10, ]
topcrop <- crop[1:10, ]
topprop
topcrop

The first few rows of topprop and topcrop indicate that the most destructive events caused average property damages in the billions, and average crop damages in the hundreds of millions. We will adjust the damages to units of millions of U.S. dollars.

topprop <- mutate(topprop, damage = damage/10^6)
topcrop <- mutate(topcrop, damage = damage/10^6)
head(topprop, 2)
head(topcrop, 2)

Results

We will now plot the events with the highest average injury and fatality rates using the ggplot2 plotting system. The event types are set as factors to prevent ggplot from sorting the labels alphabetically by the event type.

library(ggplot2)
topinj$event_type <- factor(topinj$event_type, levels = topinj$event_type)
topfatal$event_type <- factor(topfatal$event_type, levels = topfatal$event_type)
g_inj <- ggplot(topinj, aes(event_type, count, fill = event_type))
g_inj <- g_inj + geom_bar(stat = "identity")
g_inj <- g_inj + scale_fill_brewer(palette = "RdYlGn")
g_inj <- g_inj + labs(fill = "Event Type") + ylab("Injuries Per Event")
g_inj <- g_inj + ggtitle("Highest Injury Rate Per Event") + 
        theme(plot.title = element_text(hjust = 0.5))
g_inj <- g_inj + theme(axis.text.x = element_blank(), axis.title.x = element_blank(),
                       axis.ticks.x = element_blank())

g_fatal <- ggplot(topfatal, aes(event_type, count, fill = event_type))
g_fatal <- g_fatal + geom_bar(stat = "identity")
g_fatal <- g_fatal + scale_fill_brewer(palette = "RdYlGn")
g_fatal <- g_fatal + labs(fill = "Event Type") + ylab("Fatalities Per Event")
g_fatal <- g_fatal + ggtitle("Highest Fatality Rate Per Event") + 
        theme(plot.title = element_text(hjust = 0.5))
g_fatal <- g_fatal + theme(axis.text.x = element_blank(), axis.title.x = element_blank(),
                       axis.ticks.x = element_blank())
library(gridExtra)
grid.arrange(g_inj, g_fatal, nrow = 2, ncol = 1)

We can see that heat waves had the highest average injury rate (about 70 per event), while tornadoes had by far the highest fatality rate (about 25 per event). Note that even the events displayed in these charts with the "lowest" injury and fatality rates still ranked 10th most dangerous among all types of events.
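The factor conversion used before plotting can be seen in isolation with a small example. This is only an illustrative sketch; the vector ranked and its labels are hypothetical and stand in for an event_type column that has already been sorted by value.

```r
# Hypothetical labels, already ranked from highest to lowest value.
ranked <- c("TORNADO", "HEAT", "ICE STORM")

# By default, factor levels are sorted alphabetically.
levels(factor(ranked))

# Passing the ranked vector as the levels preserves the ranking;
# ggplot2 draws discrete axes and legends in level order.
levels(factor(ranked, levels = ranked))
```

With explicit levels, the bars and legend entries follow the ranking rather than the alphabet.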

We now plot the average property and crop damage in the same way as the injury and fatality rates.

topprop$event_type <- factor(topprop$event_type, levels = topprop$event_type)
topcrop$event_type <- factor(topcrop$event_type, levels = topcrop$event_type)
g_prop <- ggplot(topprop, aes(event_type, damage, fill = event_type))
g_prop <- g_prop + geom_bar(stat = "identity")
g_prop <- g_prop + scale_fill_brewer(palette = "RdYlGn")
g_prop <- g_prop + labs(fill = "Event Type") + ylab("Damage Per Event (Mil. USD)")
g_prop <- g_prop + ggtitle("Greatest Property Damage Per Event") + 
        theme(plot.title = element_text(hjust = 0.5))
g_prop <- g_prop + theme(axis.text.x = element_blank(), axis.title.x = element_blank(),
                       axis.ticks.x = element_blank())

g_crop <- ggplot(topcrop, aes(event_type, damage, fill = event_type))
g_crop <- g_crop + geom_bar(stat = "identity")
g_crop <- g_crop + scale_fill_brewer(palette = "RdYlGn")
g_crop <- g_crop + labs(fill = "Event Type") + ylab("Damage Per Event (Mil. USD)")
g_crop <- g_crop + ggtitle("Greatest Crop Damage Per Event") + 
        theme(plot.title = element_text(hjust = 0.5))
g_crop <- g_crop + theme(axis.text.x = element_blank(), axis.title.x = element_blank(),
                       axis.ticks.x = element_blank())

grid.arrange(g_prop, g_crop, nrow = 2, ncol = 1)

Tornadoes again appear at the top of the list: they caused the greatest average property damage per occurrence (about 1.5 billion USD). Heavy rains came in a close second, with roughly 1.25 billion USD in damage per occurrence. For crop damage, however, ice storms were the most costly events, causing about 250 million USD in damage per occurrence. The single event Tropical Storm Jerry came in a close second, causing roughly 200 million USD in crop damage.