This report examines a set of raw data sourced from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database covering over 900,000 events from 1950 to 2011. Analysis of this data shows the most frequently occurring type of event has been Thunderstorm Wind, accounting for more than a third of the records (36%). However, these incidents are less severe in terms of impact on public health and economic consequences. Tornado events have caused more injuries (91,346) and fatalities (5,633) than any other event. The most intense events in terms of impact on health have been those categorised as Excessive Heat with each incident averaging 3.56 injuries. It is the cold that kills though, as measured by the ratio of deaths to injuries with Cold/Wind Chill events resulting in 7.92 fatalities for every non-fatal injury recorded. The most damaging events from a financial perspective were Floods, costing an estimated $160.6bn (consisting of $149.9bn in damage to property and $10.7bn through crop loss). Although realtively rare, individual Storm Surge/Tide events were the most expensive with each averaging $117m in damages.
The source of the data used in the analysis is a link provided in the Coursera Reproducible Research Assignment Storm Data. It is stated that this is a version of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. It has not been possible to independently validate this.
# necessary libraries
library(plyr)
library(ggplot2)
library(RColorBrewer)
# download data and read few rows
temp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp)
data <- read.table(bzfile(temp, "repdata-data-StormData.csv"),sep=",",header=T,nrows=100)
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
An early look at this data shows not all of the 37 variables are required. A filter is applied before loading the complete dataset. Only the following variables are included:
BGN_DATE date the event began
STATE US state in which the event occurred
EVTYPE event type
FATALITIES number of fatalities attributed to the event
INJURIES number of non-fatal injuries attributed to the event
PROPDMG estimated $ value of damages done to property
PROPDMGEXP unit in which PROPDMG is expressed (e.g. K=1000s, M=1000000s)
CROPDMG estimated $ value of damages done to crops
CROPDMGEXP unit in which CROPDMG is expressed (e.g. K=1000s, M=1000000s)
req_f <- c("NULL","character",rep("NULL",4),"factor","character",rep("NULL",14),
rep("numeric",3),"character","numeric","character",rep("NULL",9))
df <- read.table(bzfile(temp, "repdata-data-StormData.csv"),sep=",",colClasses=req_f,header=T)
unlink(temp)
df$BGN_DATE <- as.POSIXct(strptime(df$BGN_DATE,"%m/%d/%Y %H:%M:%S"))
df$yr <- as.factor(format(df$BGN_DATE,"%Y"))
The data for Event Type is very “raw” with over 800 unique values, Many of these are minor variations (e.g. spelling mistakes, spacing, capitalization). To simplify the analysis, all spaces and the / character have been removed and everything has been capitalized. A bespoke table has been created (igr_evtype.csv) to map the most popular and obvious Event Types to the list given in the National Weather Service documentation. This mapping is reproduced at the end of this report in Appendix A.
df$EVTYPE <- toupper(df$EVTYPE)
df$EVTYPE <- gsub("/","",df$EVTYPE)
df$EVTYPE <- gsub(" ","",df$EVTYPE)
new_type_labels <- read.csv("igr_evtype.csv",header=T)
df <- join(df,new_type_labels,type="left")
## Joining by: EVTYPE
The columns (PROPDMGEXP and CROPDMGEXP) representing the scale of the estimated costs of the damage done by the events contain a relatively small number of unexplained values. Again after capitalizing everything, the following rules have been used:
0 or blank -> U-> x1
H -> x100
K -> x1000
M -> x1000000
B -> x1000000000
Anything else is treated as NA,
units <- c("U","H","K","M","B")
df$PROPDMGEXP <- toupper(df$PROPDMGEXP)
df$PROPDMGEXP <- ifelse(df$PROPDMGEXP=="0"|df$PROPDMGEXP=="","U",df$PROPDMGEXP)
df$PROPDMGEXP <- ifelse(!(df$PROPDMGEXP %in% units),"NA",df$PROPDMGEXP)
df$CROPDMGEXP <- toupper(df$CROPDMGEXP)
df$CROPDMGEXP <- ifelse(df$CROPDMGEXP=="0"|df$CROPDMGEXP=="","U",df$CROPDMGEXP)
df$CROPDMGEXP <- ifelse(!(df$CROPDMGEXP %in% units),"NA",df$CROPDMGEXP)
unit_values <- c(1,100,1000,1000000,1000000000)
df$prop_damages <- df$PROPDMG * unit_values[match(df$PROPDMGEXP,units)]
df$crop_damages <- df$CROPDMG * unit_values[match(df$CROPDMGEXP,units)]
grpd_health <- ddply(df,"NWS_Category", function(x) c(incidents=nrow(x),
fatalities= sum(x$FATALITIES),injuries=sum(x$INJURIES)))
grpd_health$injperinc <- grpd_health$injuries / grpd_health$incidents
grpd_health$deathratio <- grpd_health$fatalities / grpd_health$injuries
Event Categories - Top 5 by Number of Injuries
head(grpd_health[order(-grpd_health$injuries),],5)
## NWS_Category incidents fatalities injuries injperinc deathratio
## 34 Tornado 60652 5633 91346 1.50607 0.06167
## 33 Thunderstorm Wind 324390 706 9458 0.02916 0.07465
## 14 Flood 25871 478 6793 0.26257 0.07037
## 11 Excessive Heat 1832 1903 6525 3.56168 0.29165
## 26 Lightning 15755 816 5230 0.33196 0.15602
The most frequent category explains 35.9516% of the total incidents
Event Categories - Top 5 by Injuries per Incident
head(grpd_health[order(-grpd_health$injperinc),],5)
## NWS_Category incidents fatalities injuries injperinc deathratio
## 11 Excessive Heat 1832 1903 6525 3.5617 0.29165
## 18 Heat 893 948 2117 2.3707 0.44780
## 34 Tornado 60652 5633 91346 1.5061 0.06167
## 10 Dust Storm 428 22 440 1.0280 0.05000
## 24 Ice Storm 2006 89 1975 0.9845 0.04506
Event Categories - Top 5 by Fatality:Injury ratio
head(grpd_health[order(-grpd_health$deathratio),],5)
## NWS_Category incidents fatalities injuries injperinc
## 5 Cold/Wind Chill 539 95 12 0.022263
## 4 Coastal Flood 841 6 2 0.002378
## 23 Hurricane (Typhoon) 174 61 46 0.264368
## 2 Avalanche 386 224 170 0.440415
## 12 Extreme Cold/Wind Chill 1869 304 260 0.139112
## deathratio
## 5 7.917
## 4 3.000
## 23 1.326
## 2 1.318
## 12 1.169
This chart plots each category
cutpoints <- quantile(grpd_health$deathratio, seq(0, 1, length = 6), na.rm = TRUE)
grpd_health$quintiles <- cut(grpd_health$deathratio, cutpoints)
gh <- ggplot(grpd_health,aes(log(injuries),injperinc))
gh <- gh + geom_point(aes(color=quintiles))
gh <- gh + scale_color_manual("Fatality:Injury ratio",values=brewer.pal(5, "Set2"))
gh <- gh + xlab("log(Total Number of Injuries)")
gh <- gh + ylab("Average Injuries per Incident")
gh <- gh + ggtitle("US Severe Weather Event Categories 1950-2011")
print(gh)
grpd_cost <- ddply(df[complete.cases(df),],"NWS_Category", function(x) c(incidents=nrow(x),prop_damage= sum(x$prop_damages),crop_damage=sum(x$crop_damages)))
grpd_cost$total_damage <- grpd_cost$prop_damage + grpd_cost$crop_damage
grpd_cost$dmgperinc <- grpd_cost$total_damage / grpd_cost$incidents
Event Categories - Top 5 by Total Cost of Damage Caused
head(grpd_cost[order(-grpd_cost$total_damage),],5)
## NWS_Category incidents prop_damage crop_damage total_damage
## 14 Flood 25868 1.499e+11 1.070e+10 1.606e+11
## 36 Unclassified 8684 8.106e+10 4.263e+09 8.532e+10
## 34 Tornado 60641 5.694e+10 4.150e+08 5.735e+10
## 31 Storm Surge/Tide 409 4.796e+10 8.550e+05 4.797e+10
## 17 Hail 288641 1.573e+10 3.001e+09 1.873e+10
## dmgperinc
## 14 6208613
## 36 9824707
## 34 945765
## 31 117275254
## 17 64901
Event Categories - Top 5 by Cost of Damage Caused to Property
head(grpd_cost[order(-grpd_cost$prop_damage),],5)
## NWS_Category incidents prop_damage crop_damage total_damage
## 14 Flood 25868 1.499e+11 1.070e+10 1.606e+11
## 36 Unclassified 8684 8.106e+10 4.263e+09 8.532e+10
## 34 Tornado 60641 5.694e+10 4.150e+08 5.735e+10
## 31 Storm Surge/Tide 409 4.796e+10 8.550e+05 4.797e+10
## 13 Flash Flood 55577 1.662e+10 1.531e+09 1.815e+10
## dmgperinc
## 14 6208613
## 36 9824707
## 34 945765
## 31 117275254
## 13 326634
Event Categories - Top 5 by Cost of Damage Caused to Crops
head(grpd_cost[order(-grpd_cost$crop_damage),],5)
## NWS_Category incidents prop_damage crop_damage total_damage dmgperinc
## 8 Drought 2488 1.046e+09 1.397e+10 1.502e+10 6036444
## 14 Flood 25868 1.499e+11 1.070e+10 1.606e+11 6208613
## 24 Ice Storm 2006 3.945e+09 5.022e+09 8.967e+09 4470110
## 36 Unclassified 8684 8.106e+10 4.263e+09 8.532e+10 9824707
## 17 Hail 288641 1.573e+10 3.001e+09 1.873e+10 64901
Event Categories - Top 5 by Total Cost of Damage per Incident
head(grpd_cost[order(-grpd_cost$dmgperinc),],5)
## NWS_Category incidents prop_damage crop_damage total_damage
## 31 Storm Surge/Tide 409 4.796e+10 8.550e+05 4.797e+10
## 23 Hurricane (Typhoon) 174 1.187e+10 2.742e+09 1.461e+10
## 35 Tropical Storm 690 7.704e+09 6.783e+08 8.382e+09
## 36 Unclassified 8684 8.106e+10 4.263e+09 8.532e+10
## 14 Flood 25868 1.499e+11 1.070e+10 1.606e+11
## dmgperinc
## 31 117275254
## 23 83966833
## 35 12148169
## 36 9824707
## 14 6208613
After capitalizing and removing spaces and the / character…
new_type_labels[new_type_labels$NWS_Category!="Unclassified",]
## EVTYPE NWS_Category
## 9 ASTRONOMICALLOWTIDE Astronomical Low Tide
## 11 AVALANCHE Avalanche
## 20 BLIZZARD Blizzard
## 40 COASTALFLOOD Coastal Flood
## 41 COASTALFLOODING Coastal Flood
## 57 COLDWINDCHILL Cold/Wind Chill
## 67 DENSEFOG Dense Fog
## 73 DROUGHT Drought
## 92 DUSTDEVIL Dust Devil
## 94 DUSTSTORM Dust Storm
## 104 EXCESSIVEHEAT Excessive Heat
## 113 EXTREMECOLD Extreme Cold/Wind Chill
## 114 EXTREMECOLDWINDCHILL Extreme Cold/Wind Chill
## 118 EXTREMEWINDCHILL Extreme Cold/Wind Chill
## 125 FLASHFLOOD Flash Flood
## 130 FLASHFLOODING Flash Flood
## 139 FLOOD Flood
## 142 FLOODFLASHFLOOD Flash Flood
## 145 FLOODING Flood
## 153 FOG Dense Fog
## 170 FROSTFREEZE Frost/Freeze
## 172 FUNNELCLOUD Funnel Cloud
## 194 HAIL Hail
## 224 HEAT Heat
## 234 HEAVYRAIN Heavy Rain
## 254 HEAVYSNOW Heavy Snow
## 283 HEAVYSURFHIGHSURF High Surf
## 288 HIGHSURF High Surf
## 297 HIGHWIND High Wind
## 310 HIGHWINDS High Wind
## 337 HURRICANE Hurricane (Typhoon)
## 362 ICESTORM Ice Storm
## 370 LAKE-EFFECTSNOW Lake-Effect Snow
## 373 LANDSLIDE Debris Flow
## 387 LIGHTNING Lightning
## 413 MARINEHAIL Marine Hail
## 414 MARINEHIGHWIND Marine High Wind
## 417 MARINETHUNDERSTORMWIND Marine Thunderstorm Wind
## 418 MARINETSTMWIND Marine Thunderstorm Wind
## 490 RECORDWARMTH Excessive Heat
## 495 RIPCURRENT Rip Current
## 496 RIPCURRENTS Rip Current
## 499 RIVERFLOOD Flood
## 532 SNOW Heavy Snow
## 565 STORMSURGE Storm Surge/Tide
## 566 STORMSURGETIDE Storm Surge/Tide
## 570 STRONGWIND Strong Wind
## 572 STRONGWINDS Strong Wind
## 654 THUNDERSTORMWIND Thunderstorm Wind
## 676 THUNDERSTORMWINDS Thunderstorm Wind
## 713 TORNADO Tornado
## 727 TROPICALSTORM Tropical Storm
## 735 TSTMWIND Thunderstorm Wind
## 751 TSTMWINDHAIL Thunderstorm Wind
## 763 UNSEASONABLYWARM Heat
## 780 URBANFLOOD Flood
## 805 WATERSPOUT Waterspout
## 819 WILDFIRE Wildfire
## 821 WILDFORESTFIRE Wildfire
## 823 WIND Strong Wind
## 834 WINTERSTORM Winter Storm
## 838 WINTERWEATHER Winter Weather
## 839 WINTERWEATHERMIX Winter Weather
All other entries were bracketed as Unclassifed