This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This report shows that, according to the NOAA data for 1950 to 2011, Flood, Hurricane/Typhoon, Tornado and Storm Surge caused the most property and crop damage, while Tornado, TSTM Wind, Flood, Lightning and Excessive Heat caused the most injuries and fatalities in the United States.
library(lubridate)
library(plyr)
library(reshape2)
library(data.table)
library(ggplot2)
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
infile <- "storm.csv.bz2"
download.file(fileUrl, destfile=infile, method="curl")
Once the file is downloaded, we load it and convert the date columns to date-time format.
DT <- as.data.table(read.csv(bzfile(infile)))
DT$BGN_DATE <- mdy_hms(DT$BGN_DATE)
DT$END_DATE <- mdy_hms(DT$END_DATE)
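As a minimal illustration of this conversion, the sketch below parses a few timestamp strings with base R instead of lubridate. The sample strings, and the assumption that the raw columns use a month/day/year hour:minute:second layout, are illustrative and not taken from the dataset. `sample.dates`, `parsed` and `years` are hypothetical names.

```r
# Hypothetical sample values in month/day/year hour:minute:second layout.
# The empty string stands in for a record with no date.
sample.dates <- c("4/18/1950 0:00:00", "11/15/2011 0:00:00", "")

# Base-R counterpart of mdy_hms(): strings that do not match the format become NA.
parsed <- as.POSIXct(sample.dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the year to confirm that the parse worked.
years <- as.integer(format(parsed, "%Y"))
print(years)
```

Empty date strings become `NA` rather than stopping the script, so it may be worth counting the missing values after conversion.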
From the full dataset we take the subset of columns relating to injuries and fatalities. For each combination of event type and outcome (injuries or fatalities) we calculate the total, mean and median, and keep the 20 combinations with the highest totals.
data.set1 <- DT[, list(BGN_DATE, END_DATE, EVTYPE, FATALITIES, INJURIES)]
data.set1.melted <- melt(data.set1, id = c("BGN_DATE", "END_DATE", "EVTYPE"))
data.set1.topevents <- head(data.set1.melted[
    , list(sum = sum(value),
           mean = mean(value),
           median = median(value)),
    by = list(EVTYPE, variable)][order(-sum, -mean)], 20)
The 20 combinations of event type and outcome with the highest totals are listed below.
data.set1.topevents
##                 EVTYPE   variable   sum         mean median
##  1:            TORNADO   INJURIES 91346  1.506067401      0
##  2:          TSTM WIND   INJURIES  6957  0.031631354      0
##  3:              FLOOD   INJURIES  6789  0.268064440      0
##  4:     EXCESSIVE HEAT   INJURIES  6525  3.888557807      0
##  5:            TORNADO FATALITIES  5633  0.092874101      0
##  6:          LIGHTNING   INJURIES  5230  0.331979180      0
##  7:               HEAT   INJURIES  2100  2.737940026      0
##  8:          ICE STORM   INJURIES  1975  0.984546361      0
##  9:     EXCESSIVE HEAT FATALITIES  1903  1.134088200      0
## 10:        FLASH FLOOD   INJURIES  1777  0.032739466      0
## 11:  THUNDERSTORM WIND   INJURIES  1488  0.018022601      0
## 12:               HAIL   INJURIES  1361  0.004714873      0
## 13:       WINTER STORM   INJURIES  1321  0.115542727      0
## 14:  HURRICANE/TYPHOON   INJURIES  1275 14.488636364      0
## 15:          HIGH WIND   INJURIES  1137  0.056253711      0
## 16:         HEAVY SNOW   INJURIES  1021  0.064998727      0
## 17:        FLASH FLOOD FATALITIES   978  0.018018682      0
## 18:               HEAT FATALITIES   937  1.221642764      0
## 19:           WILDFIRE   INJURIES   911  0.329952916      0
## 20: THUNDERSTORM WINDS   INJURIES   908  0.043563786      0
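The median is 0 in every row, which means at least half of the records for each event type report no injuries or fatalities. The totals are therefore driven by a minority of severe events. The sketch below reproduces that pattern on made-up numbers using base R rather than data.table. The `toy` data frame and the `summarise.one` helper are hypothetical and exist only for illustration.

```r
# Toy data: made-up injury counts, not values from the NOAA file.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "TORNADO", "TORNADO", "TORNADO", "HAIL", "HAIL", "HAIL"),
  INJURIES = c(0, 0, 0, 12, 0, 0, 1)
)

# Hypothetical helper returning the three statistics used in the report.
summarise.one <- function(v) c(sum = sum(v), mean = mean(v), median = median(v))

# One row per event type, one column per statistic.
toy.summary <- t(sapply(split(toy$INJURIES, toy$EVTYPE), summarise.one))
print(toy.summary)
```

In this toy case a single severe tornado accounts for the whole total while the median stays at zero, which is why the report ranks by sum rather than by median.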
The following chart compares the injuries and fatalities attributed to these event types.
ggplot(data.set1.topevents, aes(x = reorder(EVTYPE, sum), y = sum, fill = variable)) +
    geom_bar(width = 1, stat = "identity") +
    coord_flip() +
    labs(title = "Top 20 Event Types for Injuries and Fatalities",
         y = "Total Injuries + Fatalities", x = "Event Type")
Next, we extract the subset of columns relating to crop and property damage.
data.set2 <- DT[, list(BGN_DATE, END_DATE, EVTYPE, PROPDMG,
                       PROPDMGEXP, CROPDMG, CROPDMGEXP)]
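In the loaded data, each property and crop damage figure comes with a magnitude code (for example K, M or B) in the PROPDMGEXP and CROPDMGEXP columns. These codes must be converted to numeric multipliers and applied to PROPDMG and CROPDMG to obtain the damage amounts.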
# Magnitude codes found in PROPDMGEXP/CROPDMGEXP and the multiplier used for each.
# Note: lowercase "h" is mapped to 1 here, whereas "H" is mapped to 1e2.
exp.codes <- c("k", "K", "M", "", "B", "m", "+", "?", "h", "H", "-",
               "0", "1", "2", "3", "4", "5", "6", "7", "8", "9")
exp.mult <- c(1e3, 1e3, 1e6, 1, 1e9, 1e6, 1, 1, 1, 1e2, 1,
              1, 10, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9)
# Replace each code with its multiplier.
data.set2$PROPDMGEXP <- mapvalues(data.set2$PROPDMGEXP, exp.codes,
                                  exp.mult, warn_missing = FALSE)
data.set2$CROPDMGEXP <- mapvalues(data.set2$CROPDMGEXP, exp.codes,
                                  exp.mult, warn_missing = FALSE)
# Damage amount = reported figure * multiplier.
data.set2$totalpropdmg <- data.set2$PROPDMG *
    as.numeric(as.character(data.set2$PROPDMGEXP))
data.set2$totalcropdmg <- data.set2$CROPDMG *
    as.numeric(as.character(data.set2$CROPDMGEXP))
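To make the conversion concrete, the sketch below applies the same idea to a toy data frame using a base-R named-vector lookup instead of `mapvalues`. It covers only the K, M and B codes and treats any other code as a multiplier of 1, which is a simplifying assumption. The `multiplier` vector and the `toy` values are hypothetical and not rows from the NOAA file.

```r
# Hypothetical multiplier lookup covering only three codes.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy records: made-up values, not rows from the NOAA file.
toy <- data.frame(DMG = c(25, 2.5, 1.2, 500),
                  EXP = c("K", "m", "B", ""),
                  stringsAsFactors = FALSE)

# Look up each code case-insensitively; codes missing from the lookup
# (here the empty string) come back as NA and fall back to a multiplier of 1.
mult <- unname(multiplier[toupper(toy$EXP)])
mult[is.na(mult)] <- 1

# Damage amount = reported figure * multiplier.
toy$total <- toy$DMG * mult
print(toy)
```

A named vector keeps each code next to its multiplier, which makes an inconsistent entry easier to spot than in two parallel vectors.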
With the crop and property damage amounts computed, we calculate the total, mean and median damage for each combination of event type and damage category (crop or property), and keep the 20 combinations with the highest totals.
data.set2.melted <- melt(data.set2[, list(BGN_DATE, END_DATE, EVTYPE,
                                          totalcropdmg, totalpropdmg)],
                         id = c("BGN_DATE", "END_DATE", "EVTYPE"))
data.set2.topevents <- head(
    data.set2.melted[, list(sum = sum(value), mean = mean(value),
                            median = median(value)), by = list(EVTYPE, variable)]
    [order(-sum, -mean)], 20)
The 20 combinations of weather event type and damage category with the highest totals are listed below.
data.set2.topevents
##                EVTYPE     variable          sum         mean  median
##  1:             FLOOD totalpropdmg 144657709807   5711826.18       0
##  2: HURRICANE/TYPHOON totalpropdmg  69305840000 787566363.64 6765000
##  3:           TORNADO totalpropdmg  56947380676    938920.08    2500
##  4:       STORM SURGE totalpropdmg  43323536000 165990559.39   37500
##  5:       FLASH FLOOD totalpropdmg  16822673978    309941.12       0
##  6:              HAIL totalpropdmg  15735267513     54511.23       0
##  7:           DROUGHT totalcropdmg  13972566000   5615983.12       0
##  8:         HURRICANE totalpropdmg  11868319010  68208729.94  500000
##  9:    TROPICAL STORM totalpropdmg   7703890550  11165058.77    5000
## 10:      WINTER STORM totalpropdmg   6688497251    585016.82       0
## 11:             FLOOD totalcropdmg   5661968450    223563.47       0
## 12:         HIGH WIND totalpropdmg   5270046295    260738.49       0
## 13:       RIVER FLOOD totalpropdmg   5118945500  29589280.35    5000
## 14:       RIVER FLOOD totalcropdmg   5029459000  29072017.34       0
## 15:         ICE STORM totalcropdmg   5022113500   2503546.11       0
## 16:          WILDFIRE totalpropdmg   4765114000   1725865.27       0
## 17:  STORM SURGE/TIDE totalpropdmg   4641188000  31359378.38       0
## 18:         TSTM WIND totalpropdmg   4484928495     20391.60       0
## 19:         ICE STORM totalpropdmg   3944927860   1966564.24       0
## 20: THUNDERSTORM WIND totalpropdmg   3483122472     42187.45     700
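Several event types appear twice in this table, once for property damage and once for crop damage. A minimal sketch of combining the two categories into one total per event type is shown below, using three event types whose sums are copied from the table above. The `damage` data frame and the `combined` vector are hypothetical names, and base-R `tapply` is used here in place of data.table.

```r
# Sums copied from the summary table above for three event types (in dollars).
damage <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "RIVER FLOOD", "RIVER FLOOD",
               "ICE STORM", "ICE STORM"),
  variable = c("totalpropdmg", "totalcropdmg", "totalpropdmg", "totalcropdmg",
               "totalpropdmg", "totalcropdmg"),
  sum      = c(144657709807, 5661968450, 5118945500, 5029459000,
               3944927860, 5022113500)
)

# Add property and crop damage per event type, then rank by the combined total.
combined <- sort(tapply(damage$sum, damage$EVTYPE, sum), decreasing = TRUE)
print(combined)
```

Ranking on the combined total treats an event type with substantial damage in both categories, such as River Flood, as one entry rather than two separate rows.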
The following chart compares the property and crop damage attributed to these event types.
ggplot(data.set2.topevents, aes(x = reorder(EVTYPE, sum), y = sum, fill = variable)) +
    geom_bar(width = 1, stat = "identity") +
    coord_flip() +
    labs(title = "Top 20 Event Types for Property and Crop Damage",
         y = "Total Property + Crop Damage", x = "Event Type")
Based on the NOAA data for 1950 to 2011 and the analysis and charts above, we can conclude the following: