Published By: Eric Lim B G, Published Date: 21-Oct-14
Storms and other severe weather events cause both public health and economic problems for communities and municipalities across the United States. This analysis makes use of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify the events that are most harmful to population health (fatalities and injuries) and those with the greatest economic consequences (property and crop damage).
The database covers weather events recorded from 1950 to November 2011; seven variables per event were selected for this analysis.
The results show that while tornadoes are the most hazardous to human health, with 5633 reported fatalities and 91346 injuries, floods pose the greatest economic consequences, with total damages in excess of $160 billion.
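The storm data file (*.bz2) was obtained from U.S. NOAA and placed in the “data” folder under the R working directory. The file is then decompressed and loaded into a data frame, from which the relevant variables are extracted for analysis.
The 7 variables extracted are EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage and its magnitude code), and CROPDMG and CROPDMGEXP (crop damage and its magnitude code).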
library("R.utils")
# Unzip U.S. NOAA (*.bz2) data file; remove=FALSE keeps the original archive
bunzip2(filename="data/repdata-data-StormData.csv.bz2",
        destname="data/repdata-data-StormData.csv",
        overwrite=TRUE, remove=FALSE)
# Load data from CSV file
stormdata <- read.csv(file="data/repdata-data-StormData.csv",header=TRUE,sep=",",
strip.white=TRUE,na.strings=c("NA",""))
# Extract variables of interest
studydata <- data.frame(EVTYPE=stormdata$EVTYPE,
FATALITIES=stormdata$FATALITIES,INJURIES=stormdata$INJURIES,
PROPDMG=stormdata$PROPDMG,PROPDMGEXP=stormdata$PROPDMGEXP,
CROPDMG=stormdata$CROPDMG,CROPDMGEXP=stormdata$CROPDMGEXP)
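The explicit bunzip2() step may not be strictly necessary. Base R file connections transparently decompress bzip2 files that are opened for reading, so read.csv() can be pointed at the .bz2 archive directly. The sketch below demonstrates this on a tiny made-up file written to a temporary location. The file contents and the REMARKS column are illustrative assumptions, not the real NOAA data.

```r
# Write a tiny made-up CSV through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES,REMARKS",
             "TORNADO,1,14,illustrative text",
             "HAIL,0,0,"), con)
close(con)

# read.csv() opens the compressed file directly; no separate unzip step
demo <- read.csv(tmp, header = TRUE, strip.white = TRUE,
                 na.strings = c("NA", ""), stringsAsFactors = FALSE)

# Keep only the variables of interest, selected by name
vars <- c("EVTYPE", "FATALITIES", "INJURIES")
demo <- demo[, vars]
str(demo)

unlink(tmp)  # remove the temporary file
```

Reading the archive directly avoids keeping an uncompressed copy of the file on disk.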
The structure and summary of the extracted data are previewed to assess data format and quality.
str(studydata)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 18 levels "-","?","+","0",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 8 levels "?","0","2","B",..: NA NA NA NA NA NA NA NA NA NA ...
summary(studydata)
## EVTYPE FATALITIES INJURIES
## HAIL :288661 Min. : 0.0000 Min. : 0.0000
## TSTM WIND :219940 1st Qu.: 0.0000 1st Qu.: 0.0000
## THUNDERSTORM WIND: 82563 Median : 0.0000 Median : 0.0000
## TORNADO : 60652 Mean : 0.0168 Mean : 0.1557
## FLASH FLOOD : 54277 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## FLOOD : 25326 Max. :583.0000 Max. :1700.0000
## (Other) :170878
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 K :424665 Min. : 0.000 K :281832
## 1st Qu.: 0.00 M : 11330 1st Qu.: 0.000 M : 1994
## Median : 0.00 0 : 216 Median : 0.000 k : 21
## Mean : 12.06 B : 40 Mean : 1.527 0 : 19
## 3rd Qu.: 0.50 5 : 28 3rd Qu.: 0.000 B : 9
## Max. :5000.00 (Other): 84 Max. :990.000 (Other): 9
## NA's :465934 NA's :618413
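summary() shows only the most frequent exponent codes and groups the rest under (Other). Before deciding how to convert the codes, it can help to list every one of them. A frequency table with useNA = "ifany" does this and includes the missing values. The vector below is a made-up stand-in for studydata$PROPDMGEXP; on the real data, the column itself would be passed instead.

```r
# Made-up stand-in for studydata$PROPDMGEXP (values are illustrative only)
codes <- c("K", "K", "M", "k", "0", "5", "?", "+", "B", "", NA)

# Mimic na.strings = c("NA", "") from read.csv(): blanks become NA
codes[!is.na(codes) & codes == ""] <- NA

# Count every code after case-folding, keeping NA as its own category
counts <- table(toupper(codes), useNA = "ifany")
counts
```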
Page 12 of the Storm Data Documentation provided by the National Weather Service states that PROPDMGEXP and CROPDMGEXP are the magnitude (exponent) codes for PROPDMG and CROPDMG respectively (e.g. K = thousands, M = millions, B = billions). Property and crop damages are therefore multiplied by the corresponding power of 10 to obtain their actual values.
# Convert, trim and format string variables
studydata$EVTYPE <- toupper(trim(as.character(studydata$EVTYPE)))
studydata$PROPDMGEXP <- toupper(trim(as.character(studydata$PROPDMGEXP)))
studydata$CROPDMGEXP <- toupper(trim(as.character(studydata$CROPDMGEXP)))
# Convert exponent codes of property damages into powers of 10
studydata$PROPDMGEXP[(studydata$PROPDMGEXP=="+")|(studydata$PROPDMGEXP=="-")|
(studydata$PROPDMGEXP=="?")|(studydata$PROPDMGEXP==0)|
(is.na(studydata$PROPDMGEXP))] <- 0
studydata$PROPDMGEXP[(studydata$PROPDMGEXP=="H")] <- 2
studydata$PROPDMGEXP[(studydata$PROPDMGEXP=="K")] <- 3
studydata$PROPDMGEXP[(studydata$PROPDMGEXP=="M")] <- 6
studydata$PROPDMGEXP[(studydata$PROPDMGEXP=="B")] <- 9
# Convert exponent codes of crop damages into powers of 10
studydata$CROPDMGEXP[(studydata$CROPDMGEXP=="+")|(studydata$CROPDMGEXP=="-")|
(studydata$CROPDMGEXP=="?")|(studydata$CROPDMGEXP==0)|
(is.na(studydata$CROPDMGEXP))] <- 0
studydata$CROPDMGEXP[(studydata$CROPDMGEXP=="H")] <- 2
studydata$CROPDMGEXP[(studydata$CROPDMGEXP=="K")] <- 3
studydata$CROPDMGEXP[(studydata$CROPDMGEXP=="M")] <- 6
studydata$CROPDMGEXP[(studydata$CROPDMGEXP=="B")] <- 9
# Implement the power multiplier for property damages
studydata$PROPDMGEXP <- as.numeric(studydata$PROPDMGEXP)
studydata$PROPDMG <- (10^studydata$PROPDMGEXP)*studydata$PROPDMG
# Implement the power multiplier for crop damages
studydata$CROPDMGEXP <- as.numeric(studydata$CROPDMGEXP)
studydata$CROPDMG <- (10^studydata$CROPDMGEXP)*studydata$CROPDMG
# Compute and incorporate total damages into data frame
TOTALDMG <- studydata$PROPDMG+studydata$CROPDMG
studydata <- cbind(studydata,TOTALDMG)
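The chained assignments above can also be written as a single lookup. The sketch below uses a hypothetical helper, exp_to_power(), under these assumptions:

- H/K/M/B map to 2/3/6/9 through a named vector.
- Single digits are kept as their own power, matching the as.numeric() step above.
- Blanks, symbols and NA are treated as a power of 0.

It is an alternative formulation, not the author's code.

```r
# Hypothetical helper: convert damage exponent codes into powers of 10
exp_to_power <- function(code) {
  code <- toupper(trimws(as.character(code)))
  # Letter codes: hundreds, thousands, millions, billions
  power <- unname(c(H = 2, K = 3, M = 6, B = 9)[code])
  # Single-digit codes are used as the power itself
  digits <- !is.na(code) & grepl("^[0-9]$", code)
  power[digits] <- as.numeric(code[digits])
  # Blanks, "+", "-", "?" and NA carry no multiplier
  power[is.na(power)] <- 0
  power
}

powers <- exp_to_power(c("K", "m", "B", "5", "?", NA))
powers  # values: 3 6 9 5 0 0

# Damage in dollars = recorded value * 10^power
damage <- c(25, 2.5, 5) * 10^exp_to_power(c("K", "M", "?"))
damage  # values: 25000 2500000 5
```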
The summary above shows duplicate or inconsistent event classifications (e.g. “TSTM WIND” and “THUNDERSTORM WIND” describe the same type of event). The top 20 weather event types by fatalities, injuries and total damages are therefore inspected to identify similar event types for consolidation.
head(sort(tapply(studydata$FATALITIES,studydata$EVTYPE,sum),decreasing=TRUE),20)
## TORNADO EXCESSIVE HEAT FLASH FLOOD
## 5633 1903 978
## HEAT LIGHTNING TSTM WIND
## 937 816 504
## FLOOD RIP CURRENT HIGH WIND
## 470 368 248
## AVALANCHE WINTER STORM RIP CURRENTS
## 224 206 204
## HEAT WAVE EXTREME COLD THUNDERSTORM WIND
## 172 162 133
## HEAVY SNOW EXTREME COLD/WIND CHILL HIGH SURF
## 127 125 104
## STRONG WIND BLIZZARD
## 103 101
head(sort(tapply(studydata$INJURIES,studydata$EVTYPE,sum),decreasing=TRUE),20)
## TORNADO TSTM WIND FLOOD
## 91346 6957 6789
## EXCESSIVE HEAT LIGHTNING HEAT
## 6525 5230 2100
## ICE STORM FLASH FLOOD THUNDERSTORM WIND
## 1975 1777 1488
## HAIL WINTER STORM HURRICANE/TYPHOON
## 1361 1321 1275
## HIGH WIND HEAVY SNOW WILDFIRE
## 1137 1021 911
## THUNDERSTORM WINDS BLIZZARD FOG
## 908 805 734
## WILD/FOREST FIRE DUST STORM
## 545 440
head(sort(tapply(studydata$TOTALDMG,studydata$EVTYPE,sum),decreasing=TRUE),20)
## FLOOD HURRICANE/TYPHOON
## 150319678257 71913712800
## TORNADO STORM SURGE
## 57362333947 43323541000
## HAIL FLASH FLOOD
## 18761221986 18244041079
## DROUGHT HURRICANE
## 15018672000 14610229010
## RIVER FLOOD ICE STORM
## 10148404500 8967041360
## TROPICAL STORM WINTER STORM
## 8382236550 6715441251
## HIGH WIND WILDFIRE
## 5908617595 5060586800
## TSTM WIND STORM SURGE/TIDE
## 5047065845 4642038000
## THUNDERSTORM WIND HURRICANE OPAL
## 3897965522 3191846000
## WILD/FOREST FIRE HEAVY RAIN/SEVERE WEATHER
## 3108626330 2500000000
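The three tapply() calls above differ only in the column being summed. The sketch below captures that pattern in a small helper, top_by_event() (a hypothetical name). The toy data frame reuses the column names of studydata with made-up values.

```r
# Hypothetical helper: total of `metric` per event type, largest first
top_by_event <- function(data, metric, n = 20) {
  totals <- tapply(data[[metric]], data$EVTYPE, sum)
  head(sort(totals, decreasing = TRUE), n)
}

# Toy data with the same column names as studydata (values made up)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
                  FATALITIES = c(3, 0, 2, 1),
                  INJURIES = c(10, 1, 5, 0),
                  stringsAsFactors = FALSE)

res <- top_by_event(toy, "FATALITIES", n = 2)
res  # TORNADO 5, FLOOD 1
```

On the real data, the same call with "INJURIES" or "TOTALDMG" would reproduce the other two listings.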
Similar weather event types are then consolidated to reduce fragmentation. (Note: some human judgement is exercised in identifying the duplicates.)
studydata$EVTYPE[(studydata$EVTYPE=="TSTM WIND")] <- "THUNDERSTORM WIND"
studydata$EVTYPE[(studydata$EVTYPE=="RIP CURRENTS")] <- "RIP CURRENT"
studydata$EVTYPE[(studydata$EVTYPE=="STORM SURGE")] <- "STORM SURGE/TIDE"
studydata$EVTYPE[(studydata$EVTYPE=="WILD FIRE")] <- "WILD/FOREST FIRE"
studydata$EVTYPE[(studydata$EVTYPE=="HURRICANE OPAL")] <- "HURRICANE/TYPHOON"
studydata$EVTYPE[(studydata$EVTYPE=="HURRICANE")] <- "HURRICANE/TYPHOON"
studydata$EVTYPE[(studydata$EVTYPE=="RIVER FLOOD")] <- "FLOOD"
studydata$EVTYPE[(studydata$EVTYPE=="ICE STORM")] <- "WINTER STORM"
studydata$EVTYPE[(studydata$EVTYPE=="EXCESSIVE HEAT")] <- "HEAT"
studydata$EVTYPE[(studydata$EVTYPE=="HEAT WAVE")] <- "HEAT"
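The ten assignments above can also be kept as a single recode table, which makes the judgement calls easy to review in one place. The sketch below is a possible refactoring under stated assumptions: recode_map and consolidate() are hypothetical names, and the mappings are copied from the assignments above. No target label is also a source label, so a single lookup pass gives the same result as the sequential assignments.

```r
# Hypothetical recode table: raw EVTYPE label -> consolidated label
recode_map <- c("TSTM WIND"      = "THUNDERSTORM WIND",
                "RIP CURRENTS"   = "RIP CURRENT",
                "STORM SURGE"    = "STORM SURGE/TIDE",
                "WILD FIRE"      = "WILD/FOREST FIRE",
                "HURRICANE OPAL" = "HURRICANE/TYPHOON",
                "HURRICANE"      = "HURRICANE/TYPHOON",
                "RIVER FLOOD"    = "FLOOD",
                "ICE STORM"      = "WINTER STORM",
                "EXCESSIVE HEAT" = "HEAT",
                "HEAT WAVE"      = "HEAT")

# Replace any label found in the table; leave all others unchanged
consolidate <- function(evtype, map) {
  hit <- evtype %in% names(map)
  evtype[hit] <- unname(map[evtype[hit]])
  evtype
}

merged <- consolidate(c("TSTM WIND", "HAIL", "HEAT WAVE", "RIVER FLOOD"),
                      recode_map)
merged  # "THUNDERSTORM WIND" "HAIL" "HEAT" "FLOOD"
```

Applied to the real data, this would be studydata$EVTYPE <- consolidate(studydata$EVTYPE, recode_map).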
Data processing is now complete, and the data are ready for analysis.
The top 15 weather event types that caused the most fatalities in the United States from 1950 to November 2011 are plotted.
library(ggplot2)
# Aggregate fatalities count by weather event type, sort and extract top 15
fatalities <- aggregate(FATALITIES~EVTYPE,studydata,"sum")
fatalities <- fatalities[order(fatalities$FATALITIES,decreasing=TRUE),][1:15,]
# Plot horizontal bar chart of top fatalities by weather event type
# (reorder by the metric itself so the largest bar is drawn at the top)
ggplot(transform(fatalities, EVTYPE=reorder(EVTYPE, FATALITIES)),
       aes(EVTYPE, FATALITIES, fill=FATALITIES)) +
  coord_flip() + xlab("Event Type") + ylab("Fatalities") +
  ggtitle("Top 15 Fatalities By Weather Event Type") +
  geom_bar(stat="identity") + theme(legend.position = "none") +
  geom_text(aes(label=FATALITIES), hjust=1, vjust=0.5,
            colour="red", size=3)
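With coord_flip(), the last factor level is drawn at the top. Ranking the levels in ascending order of the metric therefore places the largest bar on top. reorder() ranks levels by the scores it is given. Passing the metric itself works for any row order, whereas passing order(metric) only gives the intended ranking when the rows are already sorted in decreasing order, as they are in this report. The base-R sketch below shows the difference on three rows that are deliberately left unsorted.

```r
# Rows deliberately NOT sorted by fatalities
df <- data.frame(EVTYPE = c("HEAT", "TORNADO", "FLASH FLOOD"),
                 FATALITIES = c(3012, 5633, 978),
                 stringsAsFactors = FALSE)

# Ranking by the metric itself: levels run from smallest to largest
by_value <- levels(reorder(df$EVTYPE, df$FATALITIES))
by_value  # "FLASH FLOOD" "HEAT" "TORNADO"

# Ranking by order(metric): the scores are row positions, so the result
# depends on how the rows happen to be arranged
by_order <- levels(reorder(df$EVTYPE, order(df$FATALITIES)))
by_order  # "TORNADO" "FLASH FLOOD" "HEAT" (TORNADO wrongly ranked lowest)
```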
Tornadoes (TORNADO) pose the greatest risk to human life in the United States, with 5633 fatalities recorded from 1950 to November 2011. This exceeds the combined fatalities of the second (HEAT = 3012) and third (FLASH FLOOD = 978) deadliest weather events.
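The consolidated HEAT figure and the comparison can be checked against the pre-consolidation top-20 fatality listing, where EXCESSIVE HEAT, HEAT and HEAT WAVE appear separately before being merged into HEAT:

```r
# Pre-consolidation fatality counts taken from the top-20 listing
heat_parts <- c("EXCESSIVE HEAT" = 1903, "HEAT" = 937, "HEAT WAVE" = 172)
heat_total <- sum(heat_parts)        # 3012, the consolidated HEAT figure
tornado <- 5633
flash_flood <- 978
tornado > heat_total + flash_flood   # TRUE: 5633 vs 3990
```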
The top 15 weather event types that caused the most injuries in the United States from 1950 to November 2011 are plotted.
# Aggregate injuries count by weather event type, sort and extract top 15
injuries <- aggregate(INJURIES~EVTYPE,studydata,"sum")
injuries <- injuries[order(injuries$INJURIES,decreasing=TRUE),][1:15,]
# Plot horizontal bar chart of top injuries by weather event type
# (reorder by the metric itself so the largest bar is drawn at the top)
ggplot(transform(injuries, EVTYPE=reorder(EVTYPE, INJURIES)),
       aes(EVTYPE, INJURIES, fill=INJURIES)) +
  coord_flip() + xlab("Event Type") + ylab("Injuries") +
  ggtitle("Top 15 Injuries By Weather Event Type") +
  geom_bar(stat="identity") + theme(legend.position = "none") +
  geom_text(aes(label=INJURIES), hjust=1, vjust=0.5,
            colour="red", size=3)
Tornadoes (TORNADO) also cause the most injuries (91346) in the United States. This is more than five times the combined injuries of the second (HEAT = 9004) and third (THUNDERSTORM WIND = 8445) most hazardous weather events.
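The THUNDERSTORM WIND total combines TSTM WIND and THUNDERSTORM WIND from the earlier top-20 injury listing. The ratio can be checked directly. The HEAT total of 9004 is taken from the text, because HEAT WAVE falls outside that listing.

```r
# Injury counts: TSTM WIND and THUNDERSTORM WIND from the top-20 listing;
# HEAT and TORNADO totals as stated in the text
tstm_total <- 6957 + 1488            # 8445, the consolidated THUNDERSTORM WIND
heat_total <- 9004
tornado <- 91346
ratio <- tornado / (heat_total + tstm_total)
floor(ratio)                         # 5, i.e. more than five times
```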
The top 15 weather event types with the greatest economic impact in the United States from 1950 to November 2011 are plotted.
# Aggregate damages by weather event type, sort and extract top 15
damages <- aggregate(TOTALDMG~EVTYPE,studydata,"sum")
damages <- damages[order(damages$TOTALDMG,decreasing=TRUE),][1:15,]
# Plot horizontal bar chart of top damages (in US$ million) by weather event type
# (reorder by the metric itself so the largest bar is drawn at the top)
ggplot(transform(damages, EVTYPE=reorder(EVTYPE, TOTALDMG)),
       aes(EVTYPE, round(TOTALDMG/1000000), fill=TOTALDMG)) +
  coord_flip() + xlab("Event Type") + ylab("Damages (US$ million)") +
  ggtitle("Top 15 Damages By Weather Event Type") +
  geom_bar(stat="identity") + theme(legend.position = "none") +
  geom_text(aes(label=round(TOTALDMG/1000000)), hjust=1, vjust=0.5,
            colour="red", size=3)
In terms of economic impact, however, tornadoes rank third, with total damages of about $57 billion. Floods (FLOOD, over $160 billion) and hurricanes/typhoons (HURRICANE/TYPHOON, over $89 billion) are the two weather events with the greatest economic consequences.
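The two headline figures follow from summing the merged categories in the pre-consolidation top-20 damage listing: FLOOD plus RIVER FLOOD, and HURRICANE/TYPHOON plus HURRICANE plus HURRICANE OPAL. The same listing confirms that the merged STORM SURGE/TIDE total stays below TORNADO.

```r
# Total damages (US$) taken from the pre-consolidation top-20 listing
flood_total <- 150319678257 + 10148404500                  # FLOOD + RIVER FLOOD
hurricane_total <- 71913712800 + 14610229010 + 3191846000  # + HURRICANE, HURRICANE OPAL
surge_total <- 43323541000 + 4642038000                    # STORM SURGE + STORM SURGE/TIDE
tornado <- 57362333947

# Express each total in US$ billion
round(c(flood = flood_total, hurricane = hurricane_total,
        tornado = tornado, surge = surge_total) / 1e9, 1)
# flood 160.5, hurricane 89.7, tornado 57.4, surge 48.0
```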