1. Synposis

An extract National Oceanic and Atmospheric Administration (NOAA) Storm Database has been supplied for this assignement covering the time period between 1950 and November 2011. Data has been collected and grouped by type (EVTYPE) with some revisions over time in notation as it has been periodically updated. Documenttaion is incomplete the following is a partial sample :

NWS Documentation

The data set has been analysed to understand which events have had the highest cumulative national impact, firstly on health and secondly economic consequences. Health impacts have been ranked and reported based on NOAA event types linked to fatalities. Economic consequences are ranked and reported based on total cost in terms of property and crop damage.

2. Data Processing

2.1 Data Loading

Download raw “tar”" file and cache, load as data table and report dimensions and number of event types.

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "StormData.csv.bz2")
## read.csv can read and extract from tar files with extensions bx2
stormDF <- read.csv("StormData.csv.bz2")
stormDT <- as.data.table(stormDF)
## dimensions and number of EVTYPE
dim(stormDT)
## [1] 902297     37
length(unique(stormDT$EVTYPE))
## [1] 985

2.2 Subset and Tidy

Subset to only those observations with a defined EVTYPE and numeric fatalities, injuries or economic damage including zero values. In addition the majority of 37 columns are not needed for the analysis. This reduces size of the data table for analysis.

stormDT <- stormDT[(EVTYPE != "?" & 
            (INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0)), ]
## keep only 7 of 37 columns needed for analysis
RemoveCol <- colnames(stormDT[, !c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG"
                                   , "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
stormDT[, c(RemoveCol) := NULL]
dim(stormDT)
## [1] 254632      7

2.3 Exponent Wrangling PROPDMGEXP & CROPDMGEXP

A number of different vintages of data entry are present in the data set. I am indebted to the work done by flying disk and Eddie Song in deciphering a key to transalte to these for numerical work github post. This work is used as the basis to tidy and convert to numeric exponents most of the wrnagling is required in the time period from 1994 to 1995, property and crop cost variables are then created.

cols <- c("PROPDMGEXP", "CROPDMGEXP")
stormDT[,  (cols) := c(lapply(.SD, toupper)), .SDcols = cols]
source <- sort(unique(as.character(stormDT$PROPDMGEXP)))
source <- c(source,"?") ## additional variable only in CROPDMGEXP
translation <- c(0,0,1,10,10,10,10,10,10,10,10^9,10^2,10^3,10^6)
translation <- c(translation, 0)
key <- data.frame(source, translation)
stormDT$PROPDMGEXP <- key[match(stormDT$PROPDMGEXP,key$source), 'translation']
stormDT$CROPDMGEXP <- key[match(stormDT$CROPDMGEXP,key$source), 'translation']
stormDT$DamProp <- stormDT$PROPDMG*stormDT$PROPDMGEXP
stormDT$DamCrop <- stormDT$CROPDMG*stormDT$CROPDMGEXP

3. Analysis and Results

3.1 Events most harmful to US population health

Group and sum data by event type, then rank based on cumulative fatalities. Injuries are tracked as a secondary variable. There may be some overlap in event types over the time period but the documentation is insuffiecent to aggregate with confidence.

harmed <- aggregate(cbind(FATALITIES,INJURIES)~EVTYPE,stormDT,sum)
harmed <- harmed[order(harmed$FATALITIES, harmed$INJURIES, decreasing = TRUE), ]
head(harmed, 20) 
##                      EVTYPE FATALITIES INJURIES
## 406                 TORNADO       5633    91346
## 60           EXCESSIVE HEAT       1903     6525
## 72              FLASH FLOOD        978     1777
## 150                    HEAT        937     2100
## 257               LIGHTNING        816     5230
## 422               TSTM WIND        504     6957
## 85                    FLOOD        470     6789
## 305             RIP CURRENT        368      232
## 199               HIGH WIND        248     1137
## 10                AVALANCHE        224      170
## 480            WINTER STORM        206     1321
## 306            RIP CURRENTS        204      297
## 152               HEAT WAVE        172      309
## 66             EXTREME COLD        160      231
## 363       THUNDERSTORM WIND        133     1488
## 169              HEAVY SNOW        127     1021
## 67  EXTREME COLD/WIND CHILL        125       24
## 352             STRONG WIND        103      280
## 13                 BLIZZARD        101      805
## 194               HIGH SURF        101      152

A bar chart of the 10 highest event types on health using the base R plotting routines.

## use only top 10 events for plot, lock in ranking order for chart i.e not alphabeteic
harmed.plot <- harmed[1:10, ]
harmed.plot$EVTYPE <- factor(harmed.plot$EVTYPE, levels = harmed.plot$EVTYPE)
par(las=2 , mar=c(5,4,4,2)) # make label text perpendicular to axis
barplot(cbind(FATALITIES,INJURIES)~EVTYPE, data = harmed.plot, 
        beside = TRUE, xlab = "", cex.names = 0.75, col = c("blue","red"))
legend("top",c("Fatalities","Injuries"), cex = 1.0, fill = c("blue","red"))
title(main = "Top 10 US Storm Events 1950-2012")
mtext("cumulative ranked by fatality, source : NOAA", las =1, side =3, cex = 0.8)

3.2 Events with the greatest economic consequences to US

Group and sum data by event type, then rank based on cumulative total cost.

Damage <-aggregate(cbind(DamProp,DamCrop)~EVTYPE,stormDT,sum)
Damage$TotalCost <- Damage$DamProp+Damage$DamCrop
Damage <- Damage[order(Damage$TotalCost, decreasing = TRUE), ]
head(Damage, 20)
##                        EVTYPE      DamProp     DamCrop    TotalCost
## 85                      FLOOD 144657709800  5661968450 150319678250
## 223         HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 406                   TORNADO  56937162897   414954710  57352117607
## 349               STORM SURGE  43323536000        5000  43323541000
## 133                      HAIL  15732269877  3025954650  18758224527
## 72                FLASH FLOOD  16140815011  1421317100  17562132111
## 48                    DROUGHT   1046106000 13972566000  15018672000
## 214                 HURRICANE  11868319010  2741910000  14610229010
## 309               RIVER FLOOD   5118945500  5029459000  10148404500
## 237                 ICE STORM   3944928310  5022113500   8967041810
## 416            TROPICAL STORM   7703890550   678346000   8382236550
## 480              WINTER STORM   6688497260    26944000   6715441260
## 199                 HIGH WIND   5270046280   638571300   5908617580
## 470                  WILDFIRE   4765114000   295472800   5060586800
## 422                 TSTM WIND   4484928990   554007350   5038936340
## 350          STORM SURGE/TIDE   4641188000      850000   4642038000
## 363         THUNDERSTORM WIND   3483122560   414843050   3897965610
## 220            HURRICANE OPAL   3172846000    19000000   3191846000
## 468          WILD/FOREST FIRE   3001829500   106796830   3108626330
## 162 HEAVY RAIN/SEVERE WEATHER   2500000000           0   2500000000
## use top 10  events for chart, lock in ranking order for plotting i.e not alphabeteic
Damage.plot <- Damage[1:10, ]
Damage.plot$EVTYPE <- factor(Damage.plot$EVTYPE, levels = Damage.plot$EVTYPE)
par(las=2 , mar=c(8,4,4,2)) # make label text perpendicular to axis
barplot(cbind(DamProp,DamCrop,TotalCost)~EVTYPE, data = Damage.plot, 
        beside = TRUE, xlab = "", cex.axis =0.75, cex.names = 0.65, col = c("blue","red","green"))
legend("top",c("Property","Crops","Total $US"), cex = 0.8, fill = c("blue","red","green"))
title(main = "Top 10 US Storm Events 1950-2012")
mtext("ranked by economic consequence, source : NOAA", las =1, side =3, cex = 0.8)