An extract National Oceanic and Atmospheric Administration (NOAA) Storm Database has been supplied for this assignement covering the time period between 1950 and November 2011. Data has been collected and grouped by type (EVTYPE) with some revisions over time in notation as it has been periodically updated. Documenttaion is incomplete the following is a partial sample :
The data set has been analysed to understand which events have had the highest cumulative national impact, firstly on health and secondly economic consequences. Health impacts have been ranked and reported based on NOAA event types linked to fatalities. Economic consequences are ranked and reported based on total cost in terms of property and crop damage.
Download raw “tar”" file and cache, load as data table and report dimensions and number of event types.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "StormData.csv.bz2")
## read.csv can read and extract from tar files with extensions bx2
stormDF <- read.csv("StormData.csv.bz2")
stormDT <- as.data.table(stormDF)
## dimensions and number of EVTYPE
dim(stormDT)
## [1] 902297 37
length(unique(stormDT$EVTYPE))
## [1] 985
Subset to only those observations with a defined EVTYPE and numeric fatalities, injuries or economic damage including zero values. In addition the majority of 37 columns are not needed for the analysis. This reduces size of the data table for analysis.
stormDT <- stormDT[(EVTYPE != "?" &
(INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0)), ]
## keep only 7 of 37 columns needed for analysis
RemoveCol <- colnames(stormDT[, !c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG"
, "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
stormDT[, c(RemoveCol) := NULL]
dim(stormDT)
## [1] 254632 7
A number of different vintages of data entry are present in the data set. I am indebted to the work done by flying disk and Eddie Song in deciphering a key to transalte to these for numerical work github post. This work is used as the basis to tidy and convert to numeric exponents most of the wrnagling is required in the time period from 1994 to 1995, property and crop cost variables are then created.
cols <- c("PROPDMGEXP", "CROPDMGEXP")
stormDT[, (cols) := c(lapply(.SD, toupper)), .SDcols = cols]
source <- sort(unique(as.character(stormDT$PROPDMGEXP)))
source <- c(source,"?") ## additional variable only in CROPDMGEXP
translation <- c(0,0,1,10,10,10,10,10,10,10,10^9,10^2,10^3,10^6)
translation <- c(translation, 0)
key <- data.frame(source, translation)
stormDT$PROPDMGEXP <- key[match(stormDT$PROPDMGEXP,key$source), 'translation']
stormDT$CROPDMGEXP <- key[match(stormDT$CROPDMGEXP,key$source), 'translation']
stormDT$DamProp <- stormDT$PROPDMG*stormDT$PROPDMGEXP
stormDT$DamCrop <- stormDT$CROPDMG*stormDT$CROPDMGEXP
Group and sum data by event type, then rank based on cumulative fatalities. Injuries are tracked as a secondary variable. There may be some overlap in event types over the time period but the documentation is insuffiecent to aggregate with confidence.
harmed <- aggregate(cbind(FATALITIES,INJURIES)~EVTYPE,stormDT,sum)
harmed <- harmed[order(harmed$FATALITIES, harmed$INJURIES, decreasing = TRUE), ]
head(harmed, 20)
## EVTYPE FATALITIES INJURIES
## 406 TORNADO 5633 91346
## 60 EXCESSIVE HEAT 1903 6525
## 72 FLASH FLOOD 978 1777
## 150 HEAT 937 2100
## 257 LIGHTNING 816 5230
## 422 TSTM WIND 504 6957
## 85 FLOOD 470 6789
## 305 RIP CURRENT 368 232
## 199 HIGH WIND 248 1137
## 10 AVALANCHE 224 170
## 480 WINTER STORM 206 1321
## 306 RIP CURRENTS 204 297
## 152 HEAT WAVE 172 309
## 66 EXTREME COLD 160 231
## 363 THUNDERSTORM WIND 133 1488
## 169 HEAVY SNOW 127 1021
## 67 EXTREME COLD/WIND CHILL 125 24
## 352 STRONG WIND 103 280
## 13 BLIZZARD 101 805
## 194 HIGH SURF 101 152
A bar chart of the 10 highest event types on health using the base R plotting routines.
## use only top 10 events for plot, lock in ranking order for chart i.e not alphabeteic
harmed.plot <- harmed[1:10, ]
harmed.plot$EVTYPE <- factor(harmed.plot$EVTYPE, levels = harmed.plot$EVTYPE)
par(las=2 , mar=c(5,4,4,2)) # make label text perpendicular to axis
barplot(cbind(FATALITIES,INJURIES)~EVTYPE, data = harmed.plot,
beside = TRUE, xlab = "", cex.names = 0.75, col = c("blue","red"))
legend("top",c("Fatalities","Injuries"), cex = 1.0, fill = c("blue","red"))
title(main = "Top 10 US Storm Events 1950-2012")
mtext("cumulative ranked by fatality, source : NOAA", las =1, side =3, cex = 0.8)
Group and sum data by event type, then rank based on cumulative total cost.
Damage <-aggregate(cbind(DamProp,DamCrop)~EVTYPE,stormDT,sum)
Damage$TotalCost <- Damage$DamProp+Damage$DamCrop
Damage <- Damage[order(Damage$TotalCost, decreasing = TRUE), ]
head(Damage, 20)
## EVTYPE DamProp DamCrop TotalCost
## 85 FLOOD 144657709800 5661968450 150319678250
## 223 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 406 TORNADO 56937162897 414954710 57352117607
## 349 STORM SURGE 43323536000 5000 43323541000
## 133 HAIL 15732269877 3025954650 18758224527
## 72 FLASH FLOOD 16140815011 1421317100 17562132111
## 48 DROUGHT 1046106000 13972566000 15018672000
## 214 HURRICANE 11868319010 2741910000 14610229010
## 309 RIVER FLOOD 5118945500 5029459000 10148404500
## 237 ICE STORM 3944928310 5022113500 8967041810
## 416 TROPICAL STORM 7703890550 678346000 8382236550
## 480 WINTER STORM 6688497260 26944000 6715441260
## 199 HIGH WIND 5270046280 638571300 5908617580
## 470 WILDFIRE 4765114000 295472800 5060586800
## 422 TSTM WIND 4484928990 554007350 5038936340
## 350 STORM SURGE/TIDE 4641188000 850000 4642038000
## 363 THUNDERSTORM WIND 3483122560 414843050 3897965610
## 220 HURRICANE OPAL 3172846000 19000000 3191846000
## 468 WILD/FOREST FIRE 3001829500 106796830 3108626330
## 162 HEAVY RAIN/SEVERE WEATHER 2500000000 0 2500000000
## use top 10 events for chart, lock in ranking order for plotting i.e not alphabeteic
Damage.plot <- Damage[1:10, ]
Damage.plot$EVTYPE <- factor(Damage.plot$EVTYPE, levels = Damage.plot$EVTYPE)
par(las=2 , mar=c(8,4,4,2)) # make label text perpendicular to axis
barplot(cbind(DamProp,DamCrop,TotalCost)~EVTYPE, data = Damage.plot,
beside = TRUE, xlab = "", cex.axis =0.75, cex.names = 0.65, col = c("blue","red","green"))
legend("top",c("Property","Crops","Total $US"), cex = 0.8, fill = c("blue","red","green"))
title(main = "Top 10 US Storm Events 1950-2012")
mtext("ranked by economic consequence, source : NOAA", las =1, side =3, cex = 0.8)