Synopsis
In this report we aim to determine which types of weather events cause the most fatalities, the most injuries, and the most property damage in the United States between 1950 and 2011. To answer these questions, we obtained storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA), which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. From these data, we found that tornadoes cause the highest total number of both fatalities and injuries of all event types. In contrast, floods cause the most property damage.
Data Processing
Loading and reading the data
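The section does not show the command that creates `data`. A minimal loading sketch follows; it assumes the NOAA storm data has been downloaded as a bzip2-compressed CSV, and both the helper name `load_storm_data` and the file name `StormData.csv.bz2` are hypothetical. `read.csv()` opens a file path through `file()`, which detects bzip2 compression, so the archive does not need to be unpacked first.

```r
# Hypothetical helper: read the NOAA storm CSV into a data frame.
# read.csv() opens the path via file(), which detects bzip2 compression,
# so a .csv.bz2 file can be read without unpacking it first.
load_storm_data <- function(path) {
  storm <- read.csv(path, stringsAsFactors = FALSE)
  # Fail early if a column used later in this report is absent
  needed <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP")
  absent <- setdiff(needed, names(storm))
  if (length(absent) > 0) {
    stop("missing columns: ", paste(absent, collapse = ", "))
  }
  storm
}

# Usage (the file name is an assumption):
# data <- load_storm_data("StormData.csv.bz2")
```

Checking for the required columns up front makes a wrong or truncated download fail with a clear message instead of a positional-index error later on.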
We first review the raw data table to get an idea of which columns are relevant to this study.
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0
## 2        0     2.5          K       0
## 3        2    25.0          K       0
## 4        2     2.5          K       0
## 5        2     2.5          K       0
## 6        6     2.5          K       0
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
Analyze Fatalities and Injuries versus Events
In this case, the columns of interest are columns 8, 23, and 24, which hold the event type (EVTYPE), the number of fatalities (FATALITIES), and the number of injuries (INJURIES), respectively. We extract each column individually so that we can calculate the total fatalities and injuries for each event type.
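# Extract the event type, fatality count, and injury count
# (columns 8, 23, and 24 of the raw table)
event <- data$EVTYPE
fatal <- data$FATALITIES
injure <- data$INJURIES
# Total fatalities and injuries per event type
fatalsum <- aggregate(fatal ~ event, FUN = sum)
injuresum <- aggregate(injure ~ event, FUN = sum)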
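To make the aggregation step concrete, the sketch below runs `aggregate()` on a few made-up rows (hypothetical values, not NOAA records). It passes the table through the `data =` argument and puts `cbind()` on the left-hand side of the formula so both outcomes are totalled in one call, a variant of the report's two separate calls.

```r
# Made-up storm records for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# One row per event type; cbind() sums both columns within each group
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(by_event)
```

The groups come back in alphabetical order (FLOOD, HEAT, TORNADO), with the two TORNADO rows collapsed into totals of 5 fatalities and 15 injuries, which is why a separate sort by total is still needed.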
We then sort the resulting tables in descending order so that the events that cause the most fatalities and injuries appear at the top of the list.
# Sort by total (descending); ties are broken alphabetically by event name
fatalsum <- fatalsum[order(-fatalsum$fatal, fatalsum$event),]
injuresum <- injuresum[order(-injuresum$injure, injuresum$event),]
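The sort uses `order()` with two keys: negating the total puts larger totals first, and the event name breaks ties alphabetically. A toy check with made-up totals (hypothetical values, including a deliberate tie):

```r
# Made-up totals with a tie between FLOOD and HEAT
totals <- data.frame(
  event = c("HEAT", "FLOOD", "TORNADO", "WIND"),
  fatal = c(4, 4, 5, 1),
  stringsAsFactors = FALSE
)

# Descending by total, then ascending by name within ties
ranked <- totals[order(-totals$fatal, totals$event), ]
print(ranked)
```

TORNADO comes first, and FLOOD is placed before HEAT because the two share a total of 4.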
Analyze Property Damages versus Events
The columns of interest here are columns 8, 25, and 26, which hold the event type (EVTYPE), the damage value (PROPDMG), and its magnitude (PROPDMGEXP), respectively. The magnitude is coded with alphabetical characters: “K” for thousands, “M” for millions, and “B” for billions. We subset these columns from the raw data and arrange them by magnitude for further processing.
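library(plyr)
# Keep event type, damage value, and magnitude code (columns 8, 25, and 26)
propdmg <- data[, c("EVTYPE", "PROPDMG", "PROPDMGEXP")]
# Sort by magnitude code, then event type, then value
propdmg <- arrange(propdmg, PROPDMGEXP, EVTYPE, PROPDMG)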
We subset the rows whose magnitude is thousands, millions, or billions into separate tables, then convert each value to dollars by multiplying it by its magnitude. Finally, we combine the three tables into one. Rows with any other magnitude code, including a blank one, are excluded because we consider them not useful for this analysis.
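# Split rows by magnitude code (matching is case-sensitive)
K <- subset(propdmg, PROPDMGEXP == "K")
M <- subset(propdmg, PROPDMGEXP == "M")
B <- subset(propdmg, PROPDMGEXP == "B")
# Convert each value to dollars
K$PROPDMG <- 1e3 * K$PROPDMG
M$PROPDMG <- 1e6 * M$PROPDMG
B$PROPDMG <- 1e9 * B$PROPDMG
# Recombine; rows with any other code are dropped
propdmgsum <- rbind(K, M, B)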
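The three subsets can also be converted in a single pass with a named lookup vector. The sketch below is an alternative technique, not the report's code; the names `multiplier` and `to_dollars` are hypothetical, and it keeps exactly the same K/M/B codes. The `as.character()` call matters: if PROPDMGEXP was read as a factor, indexing a named vector with it would use the factor's integer codes instead of its labels.

```r
# Multipliers for the magnitude codes the report keeps
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Convert PROPDMG to dollars. Rows whose PROPDMGEXP is not exactly "K", "M",
# or "B" (including blanks) are dropped, mirroring the subsets in the report.
to_dollars <- function(df) {
  code <- as.character(df$PROPDMGEXP)  # guards against factor columns
  keep <- code %in% names(multiplier)
  out <- df[keep, ]
  out$PROPDMG <- out$PROPDMG * unname(multiplier[code[keep]])
  out
}

# Made-up rows for illustration only
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HAIL", "FLOOD"),
  PROPDMG    = c(2.5, 25, 10, 1),
  PROPDMGEXP = c("M", "K", "", "B"),
  stringsAsFactors = FALSE
)
print(to_dollars(toy))
```

Unlike `rbind(K, M, B)`, this keeps the original row order, which does not affect the per-event totals. Wrapping `code` in `toupper()` would also keep lower-case variants if the column contains any, but that would deliberately change the report's filter and its totals.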
From the combined table, we extract the event type and damage columns so that we can calculate the total property damage for each event type.
event <- propdmgsum$EVTYPE
damage <- propdmgsum$PROPDMG
# Total property damage (in dollars) per event type
damagesum <- aggregate(damage ~ event, FUN = sum)
We then sort the resulting table in descending order so that the events that cause the most property damage appear at the top of the list.
damagesum <- damagesum[order(-damagesum$damage, damagesum$event),]
Results
The following bar plot shows the ten event types with the highest total fatalities. Tornadoes cause the most fatalities, followed by excessive heat and flash floods.
barplot(fatalsum[1:10,2], main="Total Fatalities vs Event (TOP TEN)", ylab="Fatalities", names.arg=fatalsum[1:10,1],las=2,cex.names=0.5,cex.axis=0.5)

The following bar plot shows the ten event types with the highest total injuries. Tornadoes cause by far the most injuries; the other events in this top ten pale in comparison.
barplot(injuresum[1:10,2], main="Total Injuries vs Event (TOP TEN)", ylab="Injuries", names.arg=injuresum[1:10,1],las=2,cex.names=0.5,cex.axis=0.5)

The following bar plot shows the ten event types with the highest total property damage. Floods cause the most property damage, followed by hurricanes/typhoons and tornadoes.
barplot(damagesum[1:10,2], main="Total Property Damage vs Event (TOP TEN)", ylab="Property Damage (USD)", names.arg=damagesum[1:10,1],las=2,cex.names=0.5,cex.axis=0.5)
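The three plotting calls differ only in the table, title, and axis label. A sketch of a small wrapper that removes the repetition while keeping the report's `barplot()` settings; the name `plot_top` and its `scale` argument are hypothetical additions, with `scale` dividing the totals so that dollar amounts can be shown in billions.

```r
# Hypothetical helper: bar plot of the first n rows of a table sorted in
# descending order, where column 1 holds event names and column 2 totals.
# `scale` divides the totals, e.g. 1e9 to plot dollars in billions.
plot_top <- function(tbl, n = 10, main, ylab, scale = 1) {
  top <- head(tbl, n)
  barplot(top[[2]] / scale, names.arg = as.character(top[[1]]),
          main = main, ylab = ylab,
          las = 2, cex.names = 0.5, cex.axis = 0.5)
  invisible(top)  # return the plotted rows for inspection
}

# Usage with the report's tables (names taken from the code above):
# plot_top(fatalsum, main = "Total Fatalities vs Event (TOP TEN)", ylab = "Fatalities")
# plot_top(damagesum, main = "Total Property Damage vs Event (TOP TEN)",
#          ylab = "Property Damage (billions of USD)", scale = 1e9)
```

Using `head(tbl, n)` instead of `tbl[1:n, ]` also avoids rows of NA if a table ever has fewer than `n` event types.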
