Synopsis

This report analyzes data from the U.S. National Oceanic and Atmospheric Administration’s storm database to determine which types of events are most harmful in terms of human health and economic impact. It concludes that the events most harmful to human health are tornadoes, heat, and flooding, and that the most costly events are hurricanes, tornadoes, and flooding.

Data Processing

Data for this analysis is retrieved from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The code below downloads the compressed data file and reads it into R:

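# make sure the target directory exists; stringsAsFactors=TRUE keeps the factor columns the later steps rely on (R >= 4.0 defaults to FALSE)
dir.create("Data", showWarnings=FALSE)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","Data/StormData.csv.bz2",method="curl")
sd <- read.csv(bzfile("Data/StormData.csv.bz2"), stringsAsFactors=TRUE)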

# make a copy of storm data containing only the fields we will work with in the analysis
pd <- data.frame(sd[,c("EVTYPE","FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")])

# the original data frame could be discarded here to save memory,
# but it is kept because the supplementary analysis below still uses it
# sd <- NULL

The economic damage is encoded as a base value plus an encoded exponent. We convert these to plain numeric values to simplify quantifying the impact of each event. We also sum the property and crop damages as a measure of total economic impact, and sum fatalities and injuries as a measure of total health impact.

transform_exponents <- function(level)
{
    level[level==""] <- "0"
    level[level=="-"] <- "0"
    level[level=="?"] <- "0"
    level[level=="+"] <- "0"
    level[level=="B"] <- "9"
    level[level=="h"] <- "2"
    level[level=="H"] <- "2"
    level[level=="K"] <- "3"
    level[level=="k"] <- "3"
    level[level=="m"] <- "6"
    level[level=="M"] <- "6"
    level
}

# map every exponent code to a numeric string (for example "K" becomes "3" and "" becomes "0")
levels(pd$PROPDMGEXP) <- transform_exponents(levels(pd$PROPDMGEXP))
levels(pd$CROPDMGEXP) <- transform_exponents(levels(pd$CROPDMGEXP))

# convert the exponent to a number
pd$PROPDMGEXP <- as.numeric(levels(pd$PROPDMGEXP))[pd$PROPDMGEXP]

# Transform the damage by multiplying it by the specified power.  We can then work with the PROPDMG directly
pd$PROPDMG <- pd$PROPDMG*10^pd$PROPDMGEXP

# get a numeric crop damage value also
pd$CROPDMGEXP <- as.numeric(levels(pd$CROPDMGEXP))[pd$CROPDMGEXP]
pd$CROPDMG <- pd$CROPDMG*10^pd$CROPDMGEXP

# create a total damage variable
pd$TOTDMG <- pd$PROPDMG + pd$CROPDMG
summary(pd$TOTDMG)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 5.29e+05 1.00e+03 1.15e+11
# create a total casualties variable (fatalities + injuries)
pd$TOTCAS <- pd$FATALITIES + pd$INJURIES
summary(pd$TOTCAS)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    0.0000    0.0000    0.0000    0.1725    0.0000 1742.0000
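
The base-and-exponent conversion described above can also be sketched without relying on factor levels. This is a minimal illustration, not the method used in this report: `exp_lookup`, `decode_exponent`, and the `toy` data frame are hypothetical names, and the toy values are made up. It assumes the same mapping as `transform_exponents` (letters map to powers of ten, single digits are used as-is, everything else becomes 0).

```r
# Hypothetical lookup table mirroring transform_exponents()
exp_lookup <- c("B" = 9, "M" = 6, "m" = 6, "K" = 3, "k" = 3, "H" = 2, "h" = 2)

decode_exponent <- function(code) {
    code <- as.character(code)
    out <- unname(exp_lookup[code])          # NA for codes not in the table
    digits <- grepl("^[0-9]$", code)
    out[digits] <- as.numeric(code[digits])  # single digits are used as-is
    out[is.na(out)] <- 0                     # "", "-", "?", "+" give 10^0
    out
}

# Made-up rows covering each kind of code
toy <- data.frame(DMG = c(115, 32.5, 25, 5, 2),
                  EXP = c("M", "K", "", "3", "?"),
                  stringsAsFactors = FALSE)
toy$DOLLARS <- toy$DMG * 10^decode_exponent(toy$EXP)
print(toy)
```

Because the lookup works on character values, a sketch like this does not depend on whether the exponent columns were read as factors.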

Results

There are over 900 event types in the database, most of which are not pertinent to the question of which types of atmospheric events cause the greatest health and economic impacts. We will analyze only the top 10 event types for health impact and the top 10 event types for economic impact.

library(dplyr)

# summarize the total casualties and economic damage by event type
pdsumcas <- pd %>% group_by(EVTYPE) %>% summarize(SUMTOTCAS=sum(TOTCAS)) %>% arrange(desc(SUMTOTCAS))
pdsumdmg <- pd %>% group_by(EVTYPE) %>% summarize(SUMTOTDMG=sum(TOTDMG)) %>% arrange(desc(SUMTOTDMG))

# get the top 10 events for casualties and economic damage
pdsumcas10 <- pdsumcas[1:10,]
pdsumdmg10 <- pdsumdmg[1:10,]

# order the factors so that the plots come out ordered on the event type magnitude
pdsumcas10$EVTYPE <- factor(pdsumcas10$EVTYPE, levels = pdsumcas10$EVTYPE, ordered=TRUE)
pdsumdmg10$EVTYPE <- factor(pdsumdmg10$EVTYPE, levels = pdsumdmg10$EVTYPE, ordered=TRUE)

Now that we have the data summarized by top 10 total damages and casualties we can construct plots of the results.

# Create bar plots for casualties and damage
library(ggplot2)

ggplot(pdsumcas10, aes(EVTYPE, y=SUMTOTCAS)) + geom_bar(stat="identity") + coord_flip() + 
    labs(title="Top 10 Event Types Causing Injuries and Fatalities") +
    labs(y="Sum of Injuries and Fatalities") + 
    labs(x="Event Type")

The chart demonstrates that tornadoes are the most dangerous atmospheric events in the United States by far. The next most dangerous events are heat and flooding.

ggplot(pdsumdmg10, aes(EVTYPE, y=SUMTOTDMG)) + geom_bar(stat="identity") + coord_flip() + 
    labs(title="Top 10 Event Types Causing Economic Damage") +
    labs(y="Sum of Property and Crop Damages $") + 
    labs(x="Event Type")

The chart displays floods as the most economically damaging weather event, but this is likely due to a single erroneous data entry (see the supplementary analysis). This report therefore concludes that hurricanes and typhoons are the most damaging events. This is particularly true if we sum the “HURRICANE/TYPHOON” category with the related categories “STORM SURGE” and “HURRICANE.” The next most damaging events are tornadoes and floods (once the erroneous entry is corrected).
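
One way the related categories could be summed is sketched below. The `toy` data frame is a made-up stand-in for `pdsumdmg`, with dollar figures invented purely for illustration; `related` and the "HURRICANE (combined)" label are hypothetical names, and base R `aggregate` is used here only to keep the sketch self-contained.

```r
# Made-up totals; these are NOT figures from the storm database
toy <- data.frame(
    EVTYPE = c("HURRICANE/TYPHOON", "STORM SURGE", "HURRICANE", "TORNADO", "FLOOD"),
    SUMTOTDMG = c(70, 40, 15, 55, 35),
    stringsAsFactors = FALSE
)

# Categories treated as one hurricane-related group (an assumption of this sketch)
related <- c("HURRICANE/TYPHOON", "STORM SURGE", "HURRICANE")
toy$GROUP <- ifelse(toy$EVTYPE %in% related, "HURRICANE (combined)", toy$EVTYPE)

# Sum damage within each group and sort from largest to smallest
combined <- aggregate(SUMTOTDMG ~ GROUP, data = toy, FUN = sum)
combined <- combined[order(-combined$SUMTOTDMG), ]
print(combined)
```

The same relabel-then-sum pattern would apply to the real summary table, with the choice of which labels belong together left as a judgment call.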

Supplementary Information

Exploratory Data Analysis

# Let's check and see how the exponents are encoded
summary(sd$PROPDMGEXP)
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
summary(sd$CROPDMGEXP)
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
# Check the values in the storm types
summary(sd$EVTYPE)
##                     HAIL                TSTM WIND        THUNDERSTORM WIND 
##                   288661                   219940                    82563 
##                  TORNADO              FLASH FLOOD                    FLOOD 
##                    60652                    54277                    25326 
##       THUNDERSTORM WINDS                HIGH WIND                LIGHTNING 
##                    20843                    20212                    15754 
##               HEAVY SNOW               HEAVY RAIN             WINTER STORM 
##                    15708                    11723                    11433 
##           WINTER WEATHER             FUNNEL CLOUD         MARINE TSTM WIND 
##                     7026                     6839                     6175 
## MARINE THUNDERSTORM WIND               WATERSPOUT              STRONG WIND 
##                     5812                     3796                     3566 
##     URBAN/SML STREAM FLD                 WILDFIRE                 BLIZZARD 
##                     3392                     2761                     2719 
##                  DROUGHT                ICE STORM           EXCESSIVE HEAT 
##                     2488                     2006                     1678 
##               HIGH WINDS         WILD/FOREST FIRE             FROST/FREEZE 
##                     1533                     1457                     1342 
##                DENSE FOG       WINTER WEATHER/MIX           TSTM WIND/HAIL 
##                     1293                     1104                     1028 
##  EXTREME COLD/WIND CHILL                     HEAT                HIGH SURF 
##                     1002                      767                      725 
##           TROPICAL STORM           FLASH FLOODING             EXTREME COLD 
##                      690                      682                      655 
##            COASTAL FLOOD         LAKE-EFFECT SNOW        FLOOD/FLASH FLOOD 
##                      650                      636                      624 
##                LANDSLIDE                     SNOW          COLD/WIND CHILL 
##                      600                      587                      539 
##                      FOG              RIP CURRENT              MARINE HAIL 
##                      538                      470                      442 
##               DUST STORM                AVALANCHE                     WIND 
##                      427                      386                      340 
##             RIP CURRENTS              STORM SURGE            FREEZING RAIN 
##                      304                      261                      250 
##              URBAN FLOOD     HEAVY SURF/HIGH SURF        EXTREME WINDCHILL 
##                      249                      228                      204 
##             STRONG WINDS           DRY MICROBURST    ASTRONOMICAL LOW TIDE 
##                      196                      186                      174 
##                HURRICANE              RIVER FLOOD               LIGHT SNOW 
##                      174                      173                      154 
##         STORM SURGE/TIDE            RECORD WARMTH         COASTAL FLOODING 
##                      148                      146                      143 
##               DUST DEVIL         MARINE HIGH WIND        UNSEASONABLY WARM 
##                      141                      135                      126 
##                 FLOODING   ASTRONOMICAL HIGH TIDE        MODERATE SNOWFALL 
##                      120                      103                      101 
##           URBAN FLOODING               WINTRY MIX        HURRICANE/TYPHOON 
##                       98                       90                       88 
##            FUNNEL CLOUDS               HEAVY SURF              RECORD HEAT 
##                       87                       84                       81 
##                   FREEZE                HEAT WAVE                     COLD 
##                       74                       74                       72 
##              RECORD COLD                      ICE  THUNDERSTORM WINDS HAIL 
##                       64                       61                       61 
##      TROPICAL DEPRESSION                    SLEET         UNSEASONABLY DRY 
##                       60                       59                       56 
##                    FROST              GUSTY WINDS      THUNDERSTORM WINDSS 
##                       53                       53                       51 
##       MARINE STRONG WIND                    OTHER               SMALL HAIL 
##                       48                       48                       47 
##                   FUNNEL             FREEZING FOG             THUNDERSTORM 
##                       46                       45                       45 
##       Temperature record          TSTM WIND (G45)         Coastal Flooding 
##                       43                       39                       38 
##              WATERSPOUTS    MONTHLY PRECIPITATION                    WINDS 
##                       37                       36                       36 
##                  (Other) 
##                     2940
# check the flood record that is recorded as $115B in damages to see if it is an outlier
pd[pd$PROPDMG==max(pd$PROPDMG),]
##        EVTYPE FATALITIES INJURIES  PROPDMG PROPDMGEXP  CROPDMG CROPDMGEXP
## 605953  FLOOD          0        0 1.15e+11          9 32500000          6
##              TOTDMG TOTCAS
## 605953 115032500000      0
sd[605953,]
##        STATE__         BGN_DATE    BGN_TIME TIME_ZONE COUNTY COUNTYNAME
## 605953       6 1/1/2006 0:00:00 12:00:00 AM       PST     55       NAPA
##        STATE EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI         END_DATE
## 605953    CA  FLOOD         0         COUNTYWIDE 1/1/2006 0:00:00
##           END_TIME COUNTY_END COUNTYENDN END_RANGE END_AZI END_LOCATI
## 605953 07:00:00 AM          0         NA         0         COUNTYWIDE
##        LENGTH WIDTH  F MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 605953      0     0 NA   0          0        0     115          B    32.5
##        CROPDMGEXP WFO          STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 605953          M MTR CALIFORNIA, Western               3828     12218
##        LATITUDE_E LONGITUDE_
## 605953       3828      12218
##                                                                                                                                                                                                                                                                                                                                                                                               REMARKS
## 605953 Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.
##        REFNUM
## 605953 605943

Record 605953 notes in its remarks that one component of the damage is at least $70M. Most likely the total property damage is $115M, and the PROPDMGEXP value is a mistaken entry that should be M (millions) instead of B (billions).
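
A quick arithmetic check shows how much this single exponent matters. It uses only the values printed for record 605953 above; the variable names are illustrative, and reading the entry as millions is the assumption argued here, not a confirmed correction.

```r
# Values copied from the printed record: PROPDMG 115 (B), CROPDMG 32.5 (M)
propdmg <- 115
cropdmg <- 32.5

# Total damage as recorded, with property damage in billions
as_recorded <- propdmg * 10^9 + cropdmg * 10^6

# Total damage if the property exponent were M (millions) instead
if_millions <- propdmg * 10^6 + cropdmg * 10^6

# Amount by which the flood total would shrink under that assumption
overstatement <- as_recorded - if_millions
print(c(as_recorded = as_recorded, if_millions = if_millions, overstatement = overstatement))
```

Under this assumption the flood total would drop by roughly $115 billion, which is why this one record can change the ranking of event types.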

# check that we don't have an outlier driving the TORNADO event casualties
max(pd$TOTCAS)
## [1] 1742

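Total tornado casualties exceed 90,000, while the largest single entry in the entire data set accounts for 1,742 casualties, so no individual tornado record can exceed that. This indicates that the tornado total is driven by many entries rather than by a single outlier.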
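
The reasoning above can be made concrete with a rough bound. This sketch uses only the two figures quoted in this section, treating "90k+" as a lower bound of 90,000 on the tornado total; the variable names are illustrative.

```r
# Largest single-record casualty count, from max(pd$TOTCAS) above
largest_entry <- 1742

# Lower bound on total tornado casualties ("90k+" in the text)
tornado_total_lower <- 90000

# Upper bound on the share of the tornado total any one record can contribute
max_share <- largest_entry / tornado_total_lower
print(round(100 * max_share, 2))  # percentage
```

Since even the largest record contributes under 2% of the total, removing any one entry would not change tornadoes' position in the ranking.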

System Environment for this Analysis

sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_1.0.1 dplyr_0.4.1  
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1   codetools_0.2-11 colorspace_1.2-6 DBI_0.3.1       
##  [5] digest_0.6.8     evaluate_0.7     formatR_1.2      grid_3.1.2      
##  [9] gtable_0.1.2     htmltools_0.2.6  knitr_1.10       labeling_0.3    
## [13] lazyeval_0.1.10  magrittr_1.5     MASS_7.3-40      munsell_0.4.2   
## [17] parallel_3.1.2   plyr_1.8.2       proto_0.3-10     Rcpp_0.11.5     
## [21] reshape2_1.4.1   rmarkdown_0.5.1  scales_0.2.4     stringr_0.6.2   
## [25] tools_3.1.2      yaml_2.1.13