Synopsis

In this report we aim to determine which types of weather events caused the most fatalities and injuries, as well as the most property damage, in the United States between 1950 and 2011. To answer these questions, we obtained storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA), which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. From these data, we found that tornadoes caused both the most fatalities and the most injuries of any event type. In contrast, floods caused the most property damage.

Data Processing

Loading and reading the data

We first download the zip file and then unzip the file into a designated folder in order to obtain the raw data in .csv format. We then set the designated folder as the working directory and read the file into a raw data frame called “data”.
setwd("~/R/storm")
data <- read.csv("repdata-data-StormData.csv")
We review the first rows of the raw data table to get an idea of which columns are relevant for this study.
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
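
As a quick cross-check of the column positions used in the analysis sections, the sketch below looks up the columns of interest by name. The vector `storm_cols` is simply the header row copied from the output above; the names `storm_cols`, `wanted`, and `col_pos` are illustrative and not part of the original analysis. On the real data frame, `match(wanted, names(data))` would presumably give the same positions.

```r
# Column names copied, in order, from the head(data) output above.
storm_cols <- c("STATE__", "BGN_DATE", "BGN_TIME", "TIME_ZONE", "COUNTY",
                "COUNTYNAME", "STATE", "EVTYPE", "BGN_RANGE", "BGN_AZI",
                "BGN_LOCATI", "END_DATE", "END_TIME", "COUNTY_END",
                "COUNTYENDN", "END_RANGE", "END_AZI", "END_LOCATI", "LENGTH",
                "WIDTH", "F", "MAG", "FATALITIES", "INJURIES", "PROPDMG",
                "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "WFO", "STATEOFFIC",
                "ZONENAMES", "LATITUDE", "LONGITUDE", "LATITUDE_E",
                "LONGITUDE_", "REMARKS", "REFNUM")

# Columns this study relies on (illustrative helper vector).
wanted <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP")

# Position of each wanted column in the header.
col_pos <- match(wanted, storm_cols)
names(col_pos) <- wanted
print(col_pos)
```

Selecting columns by name rather than by position would make the later steps robust to any change in column order.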

Analyze Fatalities and Injuries versus Events

In this case, the columns of interest are columns 8, 23, and 24, which hold the event type, the number of fatalities, and the number of injuries, respectively. We extract each column individually so that we can calculate the total fatalities and injuries for each event type.
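# Extract event type (column 8), fatalities (column 23), and injuries (column 24)
event <- data[,8]
fatal <- data[,23]
injure <- data[,24]
# Total fatalities and total injuries per event type
fatalsum <- aggregate(fatal ~ event, FUN = sum)
injuresum <- aggregate(injure ~ event, FUN = sum)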
We then sort the resulting data tables in descending order so that the events that caused the most fatalities and injuries are at the top of each list.
fatalsum <- fatalsum[order(-fatalsum$fatal, fatalsum$event),]
injuresum <- injuresum[order(-injuresum$injure, injuresum$event),]
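
The aggregate-then-sort steps above can be sketched on a small toy table. The values below are made up purely for illustration, and the names `toy`, `totals`, `by_fatal`, and `by_injury` are hypothetical. This variant selects columns by name and sums both measures in a single `aggregate` call, which is an alternative to the position-based approach used in this report, not a description of it.

```r
# Toy stand-in for the storm data; the numbers are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in one call.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort descending by each measure, breaking ties alphabetically by event type.
by_fatal  <- totals[order(-totals$FATALITIES, totals$EVTYPE), ]
by_injury <- totals[order(-totals$INJURIES, totals$EVTYPE), ]

print(by_fatal)
print(by_injury)
```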

Analyze Property Damages versus Events

The columns of interest are columns 8, 25, and 26, which hold the event type, the damage value, and the magnitude of that value, respectively. The alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions. We subset these columns from the raw data and arrange them by magnitude for further processing.
propdmg <- data[,c(8,25,26)]
library(plyr)
propdmg <- arrange(propdmg, PROPDMGEXP, EVTYPE, PROPDMG)
We subset the rows whose magnitudes are thousands, millions, and billions into separate tables. We then obtain the actual values in dollars by multiplying each value by its respective magnitude. Finally, we combine the three tables into one. This table excludes rows without a value or with a magnitude other than “K”, “M”, or “B”, because we do not consider them useful for this analysis.
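# Split rows by magnitude code
K <- subset(propdmg, PROPDMGEXP == "K")
M <- subset(propdmg, PROPDMGEXP == "M")
B <- subset(propdmg, PROPDMGEXP == "B")
# Convert each value to dollars
K$PROPDMG <- 1000*K$PROPDMG
M$PROPDMG <- 1000000*M$PROPDMG
B$PROPDMG <- 1000000000*B$PROPDMG
# Recombine; rows with any other magnitude code are dropped
propdmgsum <- rbind(K, M, B)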
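
The same conversion can also be written with a named lookup vector instead of three subsets. This is a minimal sketch on invented toy data, not the method used in this report; `toy`, `multiplier`, `DOLLARS`, and `clean` are hypothetical names. As in the code above, only the exact codes “K”, “M”, and “B” are converted, and every other row is dropped.

```r
# Toy stand-in for the property damage columns; values are invented.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HAIL", "FLOOD", "WIND"),
  PROPDMG    = c(2.5, 1, 10, 3, 7),
  PROPDMGEXP = c("K", "M", "", "B", "?"),
  stringsAsFactors = FALSE
)

# Dollar multiplier for each recognised magnitude code.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up the multiplier by code; unrecognised or empty codes yield NA.
# as.character() guards against the column being stored as a factor.
toy$DOLLARS <- toy$PROPDMG * unname(multiplier[as.character(toy$PROPDMGEXP)])

# Keep only rows with a recognised magnitude, then total per event type.
clean <- toy[!is.na(toy$DOLLARS), ]
damage_totals <- aggregate(DOLLARS ~ EVTYPE, data = clean, FUN = sum)
print(damage_totals)
```

A lookup vector keeps the mapping in one place, so handling an additional magnitude code would only require adding an entry to `multiplier`.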
From the new data table, we extract each column individually so that we can calculate the total property damage for each event type.
event <- propdmgsum[,1]
damage <- propdmgsum[,2]
damagesum <- aggregate(damage ~ event, FUN = sum)
We then sort the resulting data table in descending order so that the events that caused the most property damage are at the top of the list.
damagesum <- damagesum[order(-damagesum$damage, damagesum$event),]

Results

The following bar plot shows the ten event types with the highest total fatalities. From this plot, we determine that tornadoes caused the most fatalities, followed by excessive heat and flash floods.
barplot(fatalsum[1:10,2], main="Total Fatalities vs Event (TOP TEN)", ylab="Fatalities", names.arg=fatalsum[1:10,1],las=2,cex.names=0.5,cex.axis=0.5)

The following bar plot shows the ten event types with the highest total injuries. From this plot, we determine that tornadoes caused the most injuries by a wide margin. The other events in this top ten list pale in comparison.
barplot(injuresum[1:10,2], main="Total Injuries vs Event (TOP TEN)", ylab="Injuries", names.arg=injuresum[1:10,1],las=2,cex.names=0.5,cex.axis=0.5)

The following bar plot shows the ten event types with the highest total property damage. From this plot, we determine that floods caused the greatest property damage, followed by hurricanes/typhoons and tornadoes.
barplot(damagesum[1:10,2], main="Total Property Damage vs Event (TOP TEN)", ylab="Property Damage", names.arg=damagesum[1:10,1],las=2,cex.names=0.5,cex.axis=0.5)
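
Since the three plots differ only in their input table, title, and axis label, they could be produced by one small helper. The sketch below is illustrative only: `top_events_barplot` and `toy_totals` are hypothetical names, and the toy values are invented. It assumes a table that is already sorted in descending order, with event types in the first column and totals in the second, as `fatalsum`, `injuresum`, and `damagesum` are.

```r
# Draw a bar plot of the first n rows of an already-sorted totals table.
# Column 1 is assumed to hold event types, column 2 the totals.
top_events_barplot <- function(totals, n = 10, main = "", ylab = "") {
  top <- head(totals, n)
  barplot(top[[2]],
          names.arg = as.character(top[[1]]),
          main = main, ylab = ylab,
          las = 2, cex.names = 0.5, cex.axis = 0.5)
  invisible(top)  # return the plotted rows for inspection
}

# Toy totals, invented for illustration and already sorted descending.
toy_totals <- data.frame(event = c("TORNADO", "HEAT", "FLOOD"),
                         total = c(5, 4, 1),
                         stringsAsFactors = FALSE)

shown <- top_events_barplot(toy_totals, n = 2,
                            main = "Total Fatalities vs Event (toy data)",
                            ylab = "Fatalities")
```

With the real tables, a call such as `top_events_barplot(fatalsum, main = "Total Fatalities vs Event (TOP TEN)", ylab = "Fatalities")` would be expected to reproduce the first plot.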