Synopsis

Using storm data provided by the U.S. National Oceanic and Atmospheric Administration (NOAA), this paper attempts to determine which types of natural disaster are the most costly in terms of human life and economic impact. After processing the data (as outlined below), we conclude that tornadoes cause the most human casualties, while floods cause the most monetary damage (to property and crops).

Data Processing

Read Data

First, download and read the data from the website:

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, "NOAAStormData.csv.bz2")
sd <- read.csv("NOAAStormData.csv.bz2")
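
Because the compressed file is large, it may be worth skipping the download when a local copy already exists. The following is only a minimal sketch: `fetch_once` is a hypothetical helper that is not part of the original analysis, and its `downloader` argument is an assumption added so the helper can be tried without network access.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
# `downloader` defaults to download.file; any function taking (url, dest) works.
fetch_once <- function(url, dest, downloader = download.file) {
  if (!file.exists(dest)) {
    downloader(url, dest)
  }
  invisible(dest)  # return the local path without printing it
}
```

With the real data, this could be called as `fetch_once(url, "NOAAStormData.csv.bz2")` before `read.csv`.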

Examine the dataframe:

head(sd)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

For the purposes of this analysis, we require only the columns EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
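
The analysis in this paper keeps the full dataframe, but the required columns could be selected up front to reduce memory use. A minimal sketch on a toy dataframe follows; the values in `toy` are made up for illustration, and `toy`, `keep`, and `toy_small` are hypothetical names.

```r
# Toy stand-in for the storm dataframe (values are invented for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(0, 1),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25, 2.5),
  PROPDMGEXP = c("K", "M"),
  CROPDMG    = c(0, 1),
  CROPDMGEXP = c("", "K"),
  REMARKS    = c("", ""),  # example of a column the analysis does not use
  stringsAsFactors = FALSE
)

# Columns needed for the health and economic analysis.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Select the columns by name; all other columns are dropped.
toy_small <- toy[, keep]
```

Applied to the real data, the same idea would be `sd[, keep]`.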

Health Impact

Now, we can start to examine how natural disasters impact human lives in terms of fatalities and injuries.

First, we determine the total number of fatalities for each type of natural disaster (EVTYPE):

sdFatalities <- aggregate(sd$FATALITIES, by = list(Category = sd$EVTYPE), FUN = sum)
head(sdFatalities[order(-sdFatalities[,2]), ])
##           Category    x
## 834        TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153    FLASH FLOOD  978
## 275           HEAT  937
## 464      LIGHTNING  816
## 856      TSTM WIND  504

And we do the same with injuries:

sdInjuries <- aggregate(sd$INJURIES, by = list(Category = sd$EVTYPE), FUN = sum)
head(sdInjuries[order(-sdInjuries[, 2]), ])
##           Category     x
## 834        TORNADO 91346
## 856      TSTM WIND  6957
## 170          FLOOD  6789
## 130 EXCESSIVE HEAT  6525
## 464      LIGHTNING  5230
## 275           HEAT  2100

We then merge both tables by event type and add the two counts to find the total number of casualties (fatalities plus injuries) caused by each natural disaster. To keep things simple, we also sort this dataframe from the highest total to the lowest.

sdCasualties <- merge(sdFatalities, sdInjuries, by = "Category")
sdCasualties$Total <- sdCasualties$x.x + sdCasualties$x.y
sdCasualties <- sdCasualties[order(-sdCasualties[, 4]), ]
head(sdCasualties, 10)
##              Category  x.x   x.y Total
## 834           TORNADO 5633 91346 96979
## 130    EXCESSIVE HEAT 1903  6525  8428
## 856         TSTM WIND  504  6957  7461
## 170             FLOOD  470  6789  7259
## 464         LIGHTNING  816  5230  6046
## 275              HEAT  937  2100  3037
## 153       FLASH FLOOD  978  1777  2755
## 427         ICE STORM   89  1975  2064
## 760 THUNDERSTORM WIND  133  1488  1621
## 972      WINTER STORM  206  1321  1527
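
The aggregate-then-merge steps above can also be sketched as a single call using the formula interface of `aggregate`. This is an alternative under stated assumptions, not the method used in this paper: `toy` and `casualties` are hypothetical names with invented values, and the formula interface omits rows containing NA by default.

```r
# Toy data with invented values, standing in for the storm dataframe.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(1, 2, 0, 4),
  INJURIES   = c(10, 5, 3, 0)
)

# Sum both columns per event type in one pass.
casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Total casualties per event type, sorted from highest to lowest.
casualties$Total <- casualties$FATALITIES + casualties$INJURIES
casualties <- casualties[order(-casualties$Total), ]
```

The result has named columns (FATALITIES, INJURIES) rather than x.x and x.y, which can make later steps easier to read.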

Economic Impact

Parsing the economic impact of natural disasters requires a little more work. The columns “PROPDMGEXP” and “CROPDMGEXP” hold symbols that signify the multipliers to apply to the values of “PROPDMG” and “CROPDMG”. According to this link, the symbols can be translated as follows:

    H,h = 100
    K,k = 1,000
    M,m = 1,000,000
    B,b = 1,000,000,000
    + = 1
    - = 0
    ? = 0
    Blank/Empty = 0
    Numeric (0…8) = 10

So, first we translate the symbols into numeric multipliers. We can then multiply PROPDMG and CROPDMG by the translated values of PROPDMGEXP and CROPDMGEXP to calculate the actual damage amounts:

symbol <- sort(unique(as.character(sd$PROPDMGEXP)))
symbol
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
exponent <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)
translation <- data.frame(symbol, exponent)
translation
##    symbol exponent
## 1            0e+00
## 2       -    0e+00
## 3       ?    0e+00
## 4       +    1e+00
## 5       0    1e+01
## 6       1    1e+01
## 7       2    1e+01
## 8       3    1e+01
## 9       4    1e+01
## 10      5    1e+01
## 11      6    1e+01
## 12      7    1e+01
## 13      8    1e+01
## 14      B    1e+09
## 15      h    1e+02
## 16      H    1e+02
## 17      K    1e+03
## 18      m    1e+06
## 19      M    1e+06
sd$PropMult <- translation$exponent[match(sd$PROPDMGEXP, translation$symbol)]
sd$CropMult <- translation$exponent[match(sd$CROPDMGEXP, translation$symbol)]
sd$PROPDMGACT <- sd$PROPDMG * sd$PropMult
sd$CROPDMGACT <- sd$CROPDMG * sd$CropMult
sdPropDMG <- aggregate(sd$PROPDMGACT, by = list(Category = sd$EVTYPE), FUN = sum)
sdCropDMG <- aggregate(sd$CROPDMGACT, by = list(Category = sd$EVTYPE), FUN = sum)
head(sdPropDMG[order(-sdPropDMG[, 2]), ])
##              Category            x
## 170             FLOOD 144657709800
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56937162897
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16140815011
## 244              HAIL  15732269877
head(sdCropDMG[order(-sdCropDMG[, 2]), ])
##              Category           x
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
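
The translation table above pairs `symbol` and `exponent` by position, so it depends on the sort order of the symbols, and `match` returns NA for any symbol that appears in CROPDMGEXP but not in PROPDMGEXP. A more defensive variant can be sketched as a function that handles both letter cases explicitly. `exp_to_multiplier` is a hypothetical helper, not part of the original analysis, and it assumes that any symbol outside the list above should count as 0.

```r
# Hypothetical helper: translate damage-exponent symbols into numeric multipliers.
exp_to_multiplier <- function(sym) {
  sym <- toupper(as.character(sym))          # treat "k" and "K" alike
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  mult <- rep(0, length(sym))                # blank, "-", "?" and unknown symbols
  mult[sym %in% "+"] <- 1
  mult[sym %in% as.character(0:8)] <- 10     # numeric symbols 0 to 8
  is_letter <- sym %in% names(letter_map)
  mult[is_letter] <- unname(letter_map[sym[is_letter]])
  mult
}

# Worked example: a PROPDMG of 25.0 with symbol "K" is 25 * 1,000 dollars.
25.0 * exp_to_multiplier("K")
```

On the real data this could be used as `sd$PROPDMG * exp_to_multiplier(sd$PROPDMGEXP)`, and likewise for the crop columns. Totals computed this way may differ from those shown in this paper wherever the original lookup produced NA.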

From here, we repeat what we did with the casualties to determine the total cost of each type of natural disaster:

sdTotalDamage <- merge(sdPropDMG, sdCropDMG, by = "Category")
sdTotalDamage$Total <- sdTotalDamage$x.x + sdTotalDamage$x.y
sdTotalDamage <- sdTotalDamage[order(-sdTotalDamage[, 4]), ]
head(sdTotalDamage, 10)
##              Category          x.x         x.y        Total
## 170             FLOOD 144657709800  5661968450 150319678250
## 411 HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 834           TORNADO  56937162897   414954710  57352117607
## 670       STORM SURGE  43323536000        5000  43323541000
## 153       FLASH FLOOD  16140815011  1421317100  17562132111
## 95            DROUGHT   1046106000 13972566000  15018672000
## 402         HURRICANE  11868319010  2741910000  14610229010
## 590       RIVER FLOOD   5118945500  5029459000  10148404500
## 427         ICE STORM   3944928310  5022113500   8967041810
## 848    TROPICAL STORM   7703890550   678346000   8382236550

Results

Here are the top 10 disasters with the highest total casualties:

library(ggplot2)

g <- ggplot(sdCasualties[1:10, ], aes(x = reorder(Category, -Total), y = Total)) +
     geom_bar(stat = "identity") +
     theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
     ggtitle("Top 10 Events with Highest Casualties") + 
     labs(x = "Event Type", y = "Total Casualties")
g

As we can see, tornadoes cause the most casualties by a wide margin. Surprisingly, excessive heat comes in second.

With regard to economic cost:

g <- ggplot(sdTotalDamage[1:10, ], aes(x = reorder(Category, -Total), y = Total)) +
     geom_bar(stat = "identity") +
     theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
     ggtitle("Top 10 Events with Highest Economic Cost") +
     labs(x = "Event Type", y = "Total Cost ($USD)")
g

In terms of economic impact, floods are the most damaging, while tornadoes come in third behind hurricanes/typhoons.