Analysis of the most Costly Storms in the USA

Synopsis

The five most cumulatively costly severe weather events in terms of property and crop damage are Hurricanes, Typhoons, Floods, Tornadoes, and Storm Surges. The five most cumulatively harmful severe weather events in terms of human health outcomes–injuries and fatalities–are Tornadoes, Excessive Heat, Thunderstorm Winds, Floods, and Lightning. Tornadoes are by far the most dangerous for health outcomes, and Hurricanes/Typhoons also are significantly more economically costly than other severe weather events.

Data Processing

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "stormdata.csv.bz2", method = "curl")
stormData<-read.csv(bzfile("stormdata.csv.bz2"), header=TRUE)

Let’s take a look at the basic structure of the data frame

str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

We can see that the data set is rather large: 902297x37, and the variables of interest to us are EVTYPE (for storm type), CROPDMG for crop damage, PROPDMG for property damage, and INJURIES, and FATALITIES. There then appear to be exponents PROPDMGEXP and CROPDMGEXP for the economic damage. We’ll go through these variables and see what the proportion of missing values is for each one, as well as the distribution of values using “summary()”:

summary(stormData$EVTYPE)
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD 
##                     3392                     2761                     2719 
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT 
##                     2488                     2006                     1678 
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE 
##                     1533                     1457                     1342 
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL 
##                     1293                     1104                     1028 
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF 
##                     1002                      767                      725 
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD 
##                      690                      682                      655 
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD 
##                      650                      636                      624 
##                LANDSLIDE                     SNOW          COLD/WIND CHILL 
##                      600                      587                      539 
##                      FOG              RIP CURRENT              MARINE HAIL 
##                      538                      470                      442 
##               DUST STORM                AVALANCHE                     WIND 
##                      427                      386                      340 
##             RIP CURRENTS              STORM SURGE            FREEZING RAIN 
##                      304                      261                      250 
##              URBAN FLOOD     HEAVY SURF/HIGH SURF        EXTREME WINDCHILL 
##                      249                      228                      204 
##             STRONG WINDS           DRY MICROBURST    ASTRONOMICAL LOW TIDE 
##                      196                      186                      174 
##                HURRICANE              RIVER FLOOD               LIGHT SNOW 
##                      174                      173                      154 
##         STORM SURGE/TIDE            RECORD WARMTH         COASTAL FLOODING 
##                      148                      146                      143 
##               DUST DEVIL         MARINE HIGH WIND        UNSEASONABLY WARM 
##                      141                      135                      126 
##                 FLOODING   ASTRONOMICAL HIGH TIDE        MODERATE SNOWFALL 
##                      120                      103                      101 
##           URBAN FLOODING               WINTRY MIX        HURRICANE/TYPHOON 
##                       98                       90                       88 
##            FUNNEL CLOUDS               HEAVY SURF              RECORD HEAT 
##                       87                       84                       81 
##                   FREEZE                HEAT WAVE                     COLD 
##                       74                       74                       72 
##              RECORD COLD                      ICE  THUNDERSTORM WINDS HAIL 
##                       64                       61                       61 
##      TROPICAL DEPRESSION                    SLEET         UNSEASONABLY DRY 
##                       60                       59                       56 
##                    FROST              GUSTY WINDS      THUNDERSTORM WINDSS 
##                       53                       53                       51 
##       MARINE STRONG WIND                    OTHER               SMALL HAIL 
##                       48                       48                       47 
##                   FUNNEL             FREEZING FOG             THUNDERSTORM 
##                       46                       45                       45 
##       Temperature record          TSTM WIND (G45)         Coastal Flooding 
##                       43                       39                       38 
##              WATERSPOUTS    MONTHLY PRECIPITATION                    WINDS 
##                       37                       36                       36 
##                  (Other) 
##                     2940
summary(stormData$CROPDMG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.527   0.000 990.000
summary(stormData$PROPDMG)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    0.00   12.06    0.50 5000.00
summary(stormData$INJURIES)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1557    0.0000 1700.0000
summary(stormData$FATALITIES)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   0.0000   0.0000   0.0000   0.0168   0.0000 583.0000
summary(stormData$PROPDMGEXP)
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
summary(stormData$CROPDMGEXP)
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

Good news is there are no NA values to deal with here! We can see, though, that there are a lot of “0” values in all the economic and health outcome variables. This indicates that the majority of the storms in this data set are relatively non-damaging.

There are some confusing symbols in the “exp” sections however. For the purposes of this analysis, I will exclude infrequent and ambiguous signs, namely “-”, “?”, and “+” (i.e. count them as 0s). I will count “h” or “H” as a 2s exponent, “B” “k” or “K” as a 3s exponent, “m” or “M” as a 6s exponent, and “b” or “B” as a 9s exponent (this is for hundreds, thousands, etc.). I’ll do this by defining a simple function and using apply on the the EXP columns

getExp<-function(char){
    if (char=="h" | char=="H"){
        char = 2
    }
    else if (char=="k" | char=="K"){
        char = 3
    }
    else if (char=="m" | char=="M"){
        char = 6
    }
    else if (char=="b" | char=="B"){
        char = 9
    }
    else if (char=="-" | char == "?" | char == "+" | char == ""){
        char = 0
    }
    else if (class(char)=="numeric" | class(char)=="integer"){
        char
    }
    else{
        #just in case there's odd/empty strings that didn't show up in summary
        char = 0
    }
    char
}
stormData$CROPDMGEXP<-sapply(stormData$CROPDMGEXP, getExp)
stormData$PROPDMGEXP<-sapply(stormData$PROPDMGEXP, getExp)

Now I’m going to make a new column for the true property/crop damages. To do this, I’ll use dplyr to “mutate” on a couple new columns that are the products of taking CROP/PROPDMG to the EXP number

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
stormData_tbl<-tbl_df(stormData)
stormData_tbl<-stormData_tbl%>%
    mutate(PROPDMG_ACTUAL = (PROPDMG)*(10^PROPDMGEXP))%>%
    mutate(CROPDMG_ACTUAL = (CROPDMG)*(10^PROPDMGEXP))

Results

The goal of this section is to answer which storm types cause the most economic and human health damage, respectively. I will pay attention only to storms that cause a damage that is quantified>0. I’ll start out by looking at property and crop damage individually, then sum them to get a summary of total economic damage.

propDmg<-stormData_tbl%>%
    #select the data we care about
    select(EVTYPE, PROPDMG_ACTUAL)%>%
    filter(PROPDMG_ACTUAL>0)%>%
    #group by event and sum the total damage value over events
    group_by(EVTYPE)%>%
    summarize(totalPropDmg = sum(PROPDMG_ACTUAL))%>%
    arrange(desc(totalPropDmg))
head(propDmg, n=10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE totalPropDmg
## 1              FLOOD 144657709807
## 2  HURRICANE/TYPHOON  69305840000
## 3            TORNADO  56937160779
## 4        STORM SURGE  43323536000
## 5        FLASH FLOOD  16140812067
## 6               HAIL  15732267543
## 7          HURRICANE  11868319010
## 8     TROPICAL STORM   7703890550
## 9       WINTER STORM   6688497251
## 10         HIGH WIND   5270046295
cropDmg<-stormData_tbl%>%
    #do the same as above
    select(EVTYPE, CROPDMG_ACTUAL)%>%
    filter(CROPDMG_ACTUAL>0)%>%
    #group by event and sum the total damage value over events
    group_by(EVTYPE)%>%
    summarize(totalCropDmg = sum(CROPDMG_ACTUAL))%>%
    arrange(desc(totalCropDmg))
head(cropDmg, n=10)
## Source: local data frame [10 x 2]
## 
##                       EVTYPE totalCropDmg
## 1                  HURRICANE 802881916053
## 2          HURRICANE/TYPHOON 732768451330
## 3                      FLOOD  87251978368
## 4                FLASH FLOOD  38822140319
## 5                    TORNADO  28269878512
## 6                       HAIL  15314323923
## 7  HURRICANE OPAL/HIGH WINDS  10000000000
## 8                  TSTM WIND   7684657862
## 9                  HIGH WIND   7174066263
## 10                  WILDFIRE   7173809478
econDmg<-stormData_tbl%>%
    #do the same as above
    select(EVTYPE, CROPDMG_ACTUAL, PROPDMG_ACTUAL)%>%
    filter(CROPDMG_ACTUAL>0 | PROPDMG_ACTUAL>0)%>%
    #group by event and sum the total damage value over events
    group_by(EVTYPE)%>%
    summarize(totalEconDmg = sum(CROPDMG_ACTUAL, PROPDMG_ACTUAL))%>%
    arrange(desc(totalEconDmg))
head(econDmg, n=10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE totalEconDmg
## 1          HURRICANE 814750235063
## 2  HURRICANE/TYPHOON 802074291330
## 3              FLOOD 231909688175
## 4            TORNADO  85207039291
## 5        FLASH FLOOD  54962952386
## 6        STORM SURGE  43328536000
## 7               HAIL  31046591466
## 8          HIGH WIND  12444112558
## 9          TSTM WIND  12169586357
## 10          WILDFIRE  11938923478
library(ggplot2)
#we'll compare the 10 most damaging event types
econDmg10<-head(econDmg, n=10)
ggplot(econDmg10, aes(x=EVTYPE, y = totalEconDmg))+geom_bar(stat="identity")

Above we can see the top 10 most costly weather for crop and property damage respectively, then the top 10 with the most cumulative damage. Finally, the barplot plots all the significantly economically damaging events.

I refer the reader to the executive summary for a synopsis of this data. It is also important to note that we have not explored the distribution of the damage by event (i.e. which events cause the most severe damage in one go). This would be useful but a bit beyond the scope of this analysis. This data just gives us a good idea of the worst events.

Now let’s use dplyr again to get a summary of the human health outcomes in injuries and fatalities of these severe weather events

injuryDmg<-stormData_tbl%>%
    select(EVTYPE, INJURIES)%>%
    filter(INJURIES>0)%>%
    #group by event and sum the total damage value over events
    group_by(EVTYPE)%>%
    summarize(totalInjuryDmg = sum(INJURIES))%>%
    arrange(desc(totalInjuryDmg))
head(injuryDmg, n=10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE totalInjuryDmg
## 1            TORNADO          91346
## 2          TSTM WIND           6957
## 3              FLOOD           6789
## 4     EXCESSIVE HEAT           6525
## 5          LIGHTNING           5230
## 6               HEAT           2100
## 7          ICE STORM           1975
## 8        FLASH FLOOD           1777
## 9  THUNDERSTORM WIND           1488
## 10              HAIL           1361
fatalDmg<-stormData_tbl%>%
    select(EVTYPE, FATALITIES)%>%
    filter(FATALITIES>0)%>%
    #group by event and sum the total damage value over events
    group_by(EVTYPE)%>%
    summarize(totalFatalDmg = sum(FATALITIES))%>%
    arrange(desc(totalFatalDmg))
head(fatalDmg, n=10)
## Source: local data frame [10 x 2]
## 
##            EVTYPE totalFatalDmg
## 1         TORNADO          5633
## 2  EXCESSIVE HEAT          1903
## 3     FLASH FLOOD           978
## 4            HEAT           937
## 5       LIGHTNING           816
## 6       TSTM WIND           504
## 7           FLOOD           470
## 8     RIP CURRENT           368
## 9       HIGH WIND           248
## 10      AVALANCHE           224
healthDmg<-stormData_tbl%>%
    #do the same as above
    select(EVTYPE, INJURIES, FATALITIES)%>%
    filter(INJURIES>0 | FATALITIES>0)%>%
    #group by event and sum the total damage value over events
    group_by(EVTYPE)%>%
    summarize(totalhealthDmg = sum(INJURIES, FATALITIES))%>%
    arrange(desc(totalhealthDmg))
head(healthDmg, n=10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE totalhealthDmg
## 1            TORNADO          96979
## 2     EXCESSIVE HEAT           8428
## 3          TSTM WIND           7461
## 4              FLOOD           7259
## 5          LIGHTNING           6046
## 6               HEAT           3037
## 7        FLASH FLOOD           2755
## 8          ICE STORM           2064
## 9  THUNDERSTORM WIND           1621
## 10      WINTER STORM           1527
library(ggplot2)
#we'll compare the 10 most damaging event types
healthDmg10<-head(healthDmg, n=10)
ggplot(healthDmg10, aes(x=EVTYPE, y = totalhealthDmg))+geom_bar(stat="identity")

The data here is produced identically to the economic data. Again, I refer the reader to the synopsis for an executive summary of this data.