Using data provided by the U.S. National Oceanic and Atmospheric Administration (NOAA), this paper attempts to determine which types of disasters are the most costly in terms of human life and economic impact. After processing the data (as outlined below), we conclude that tornadoes are the most damaging in terms of human casualties, while floods are the most expensive in terms of monetary damage (to property and crops).
First, download and read the data from the website:
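url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download only if the file is not already present; mode = "wb" keeps the binary file intact on Windows
if (!file.exists("NOAAStormData.csv.bz2")) download.file(url, "NOAAStormData.csv.bz2", mode = "wb")
# read.csv() reads the bzip2-compressed file directly
sd <- read.csv("NOAAStormData.csv.bz2")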
Examine the dataframe:
head(sd)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
For the purpose of this analysis, we only require the columns EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
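Dropping the unused columns early keeps the working data frame small. A minimal sketch of that subsetting, run on a made-up two-row data frame (`toy` and `keep` are illustrative names; the values mirror the first two rows of the `head()` output). With the real data, the same indexing would be `sd[, keep]`:

```r
# Columns needed for the casualty and damage analysis
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Hypothetical two-row stand-in for the full StormData frame
toy <- data.frame(
  STATE = c("AL", "AL"), EVTYPE = c("TORNADO", "TORNADO"),
  FATALITIES = c(0, 0), INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "K"),
  CROPDMG = c(0, 0), CROPDMGEXP = c("", "")
)

# Keep only the needed columns, in the order listed in `keep`
toySmall <- toy[, keep]
str(toySmall)
```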
Now, we can start to examine how natural disasters impact human lives in terms of fatalities and injuries.
First, we sum the fatalities for each event type (EVTYPE):
sdFatalities <- aggregate(sd$FATALITIES, by = list(Category = sd$EVTYPE), FUN = sum)
head(sdFatalities[order(-sdFatalities[,2]), ])
## Category x
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
And we do the same with injuries:
sdInjuries <- aggregate(sd$INJURIES, by = list(Category = sd$EVTYPE), FUN = sum)
head(sdInjuries[order(-sdInjuries[, 2]), ])
## Category x
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
We then merge the two tables and add their columns to get the total number of casualties (fatalities plus injuries) for each event type. In the merged table, x.x holds fatalities and x.y holds injuries. To keep things simple, we also sort the result from highest to lowest total.
sdCasualties <- merge(sdFatalities, sdInjuries, by = "Category")
sdCasualties$Total <- sdCasualties$x.x + sdCasualties$x.y
sdCasualties <- sdCasualties[order(-sdCasualties[, 4]), ]
head(sdCasualties, 10)
## Category x.x x.y Total
## 834 TORNADO 5633 91346 96979
## 130 EXCESSIVE HEAT 1903 6525 8428
## 856 TSTM WIND 504 6957 7461
## 170 FLOOD 470 6789 7259
## 464 LIGHTNING 816 5230 6046
## 275 HEAT 937 2100 3037
## 153 FLASH FLOOD 978 1777 2755
## 427 ICE STORM 89 1975 2064
## 760 THUNDERSTORM WIND 133 1488 1621
## 972 WINTER STORM 206 1321 1527
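As an alternative to merging two separate aggregates, the formula interface of `aggregate()` can sum several columns at once with `cbind()`, which also keeps the original column names instead of `x.x` and `x.y`. A sketch on a small made-up data frame (`toy` and `casualties` are illustrative names, and the numbers are invented):

```r
# Hypothetical stand-in for sd with a handful of records
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 0),
  INJURIES = c(15, 2, 3, 4)
)

# One aggregate() call sums both columns per event type and keeps their names
casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
casualties$Total <- casualties$FATALITIES + casualties$INJURIES

# Sort from highest to lowest total, as done for sdCasualties
casualties <- casualties[order(-casualties$Total), ]
casualties
```

One difference worth knowing: the formula interface drops rows containing NA by default (`na.action = na.omit`), whereas the `by =` form used above passes them on to `sum()`. The same pattern would also work for the property and crop damage columns.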
Parsing the economic impact of natural disasters requires a little more work. The columns PROPDMGEXP and CROPDMGEXP hold codes for the multipliers to apply to the values in PROPDMG and CROPDMG. According to this link, the codes translate as follows: H, h = 100; K, k = 1,000; M, m = 1,000,000; B, b = 1,000,000,000; + = 1; - = 0; ? = 0; blank/empty = 0; numeric (0 through 8) = 10. For example, the first record in the head() output above has PROPDMG = 25.0 and PROPDMGEXP = K, i.e. 25.0 × 1,000 = $25,000 in property damage.
So, we first pair each code with its numeric multiplier. We can then replace the original PROPDMGEXP and CROPDMGEXP codes with these multipliers and compute the actual dollar amounts of PROPDMG and CROPDMG:
symbol <- sort(unique(as.character(sd$PROPDMGEXP)))
symbol
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
exponent <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)
translation <- data.frame(symbol, exponent)
translation
## symbol exponent
## 1 0e+00
## 2 - 0e+00
## 3 ? 0e+00
## 4 + 1e+00
## 5 0 1e+01
## 6 1 1e+01
## 7 2 1e+01
## 8 3 1e+01
## 9 4 1e+01
## 10 5 1e+01
## 11 6 1e+01
## 12 7 1e+01
## 13 8 1e+01
## 14 B 1e+09
## 15 h 1e+02
## 16 H 1e+02
## 17 K 1e+03
## 18 m 1e+06
## 19 M 1e+06
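The `exponent` vector above is paired with `symbol` purely by position, so it is only correct if `sort()` returns the codes in exactly the order printed; R's collation order for punctuation and for upper- versus lower-case letters depends on the locale. A sketch of a name-based lookup that avoids this, using a hypothetical helper `code_multiplier()` and the translation rules listed earlier. It also folds case with `toupper()`, on the assumption (stated in those rules) that lowercase and uppercase letters mean the same multiplier:

```r
# Hypothetical helper: translate PROPDMGEXP/CROPDMGEXP codes into multipliers
# by looking each code up by name, so the result does not depend on sort order.
code_multiplier <- function(codes) {
  # Fold case so "h"/"H", "k"/"K", "m"/"M", "b"/"B" share one entry
  codes <- toupper(as.character(codes))
  # Multipliers from the translation rules above
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "+" = 1, "-" = 0, "?" = 0)
  # Numeric codes 0 through 8 all map to 10
  lookup <- c(lookup, setNames(rep(10, 9), as.character(0:8)))
  out <- unname(lookup[match(codes, names(lookup))])
  # A blank code translates to 0 (an empty string cannot be a vector name)
  out[which(codes == "")] <- 0
  # Any other unrecognised code stays NA so it can be spotted later
  out
}

code_multiplier(c("K", "k", "m", "", "5", "+", "B"))
```

With the real data this would be used as `sd$PropMult <- code_multiplier(sd$PROPDMGEXP)`, and likewise for `CROPDMGEXP`.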
sd$PropMult <- translation$exponent[match(sd$PROPDMGEXP, translation$symbol)]
sd$CropMult <- translation$exponent[match(sd$CROPDMGEXP, translation$symbol)]
sd$PROPDMGACT <- sd$PROPDMG * sd$PropMult
sd$CROPDMGACT <- sd$CROPDMG * sd$CropMult
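Because `translation` is built from the codes in PROPDMGEXP only, `match()` returns NA for any CROPDMGEXP code that never appears in PROPDMGEXP. Since `sum()` propagates NA, `aggregate()` would then report NA for every event type containing such a row, and `order()` places NA last, so those event types would quietly fall out of the rankings. A quick check for unmatched codes, sketched on made-up records (`toy`, `toyTranslation`, `missing_codes`, and `totals` are illustrative names):

```r
# Hypothetical records; "k" stands in for a code missing from the lookup table
toy <- data.frame(
  EVTYPE = c("EVENT A", "EVENT A", "EVENT B"),
  CROPDMG = c(10, 5, 2),
  CROPDMGEXP = c("K", "k", "M")
)
# Lookup built from a different column, as translation is built from PROPDMGEXP
toyTranslation <- data.frame(symbol = c("K", "M"), exponent = c(1e3, 1e6))

toy$CropMult <- toyTranslation$exponent[match(toy$CROPDMGEXP, toyTranslation$symbol)]
toy$CROPDMGACT <- toy$CROPDMG * toy$CropMult

# Codes that found no multiplier
missing_codes <- unique(toy$CROPDMGEXP[is.na(toy$CropMult)])
missing_codes

# The NA propagates through sum(), so EVENT A's total becomes NA
totals <- aggregate(toy$CROPDMGACT, by = list(Category = toy$EVTYPE), FUN = sum)
totals
```

With the real data, the equivalent check would be `unique(sd$CROPDMGEXP[is.na(sd$CropMult)])`; if it returns anything, extending `translation` (or folding case, e.g. with `toupper()`) before aggregating would avoid NA totals.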
sdPropDMG <- aggregate(sd$PROPDMGACT, by = list(Category = sd$EVTYPE), FUN = sum)
sdCropDMG <- aggregate(sd$CROPDMGACT, by = list(Category = sd$EVTYPE), FUN = sum)
head(sdPropDMG[order(-sdPropDMG[, 2]), ])
## Category x
## 170 FLOOD 144657709800
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56937162897
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16140815011
## 244 HAIL 15732269877
head(sdCropDMG[order(-sdCropDMG[, 2]), ])
## Category x
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
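From here, we repeat what we did with the casualties to determine the total cost (property plus crop damage) of each type of natural disaster. In the merged table, x.x holds property damage and x.y holds crop damage, both in dollars: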
sdTotalDamage <- merge(sdPropDMG, sdCropDMG, by = "Category")
sdTotalDamage$Total <- sdTotalDamage$x.x + sdTotalDamage$x.y
sdTotalDamage <- sdTotalDamage[order(-sdTotalDamage[, 4]), ]
head(sdTotalDamage, 10)
## Category x.x x.y Total
## 170 FLOOD 144657709800 5661968450 150319678250
## 411 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 834 TORNADO 56937162897 414954710 57352117607
## 670 STORM SURGE 43323536000 5000 43323541000
## 153 FLASH FLOOD 16140815011 1421317100 17562132111
## 95 DROUGHT 1046106000 13972566000 15018672000
## 402 HURRICANE 11868319010 2741910000 14610229010
## 590 RIVER FLOOD 5118945500 5029459000 10148404500
## 427 ICE STORM 3944928310 5022113500 8967041810
## 848 TROPICAL STORM 7703890550 678346000 8382236550
Here are the top 10 disasters with the highest total casualties:
library(ggplot2)
g <- ggplot(sdCasualties[1:10, ], aes(x = reorder(Category, -Total), y = Total)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Casualties") +
  labs(x = "Event Type", y = "Total Casualties")
g
As we can see, tornadoes cause by far the most casualties: 96,979, more than eleven times the 8,428 caused by excessive heat. Surprisingly, excessive heat comes in second.
Regarding economic cost:
g <- ggplot(sdTotalDamage[1:10, ], aes(x = reorder(Category, -Total), y = Total)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Economic Cost") +
  labs(x = "Event Type", y = "Total Cost (USD)")
g
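In terms of economic impact, floods are the most damaging (about $150.3 billion in total), followed by hurricanes/typhoons (about $71.9 billion), with tornadoes third (about $57.4 billion).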
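Dollar totals in the hundreds of billions are hard to read on a plot axis. One option is to plot them in billions instead. A sketch using the top three totals printed above (`toyDamage`, `TotalBn`, and `g2` are illustrative names); `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`:

```r
library(ggplot2)

# Top three total-damage values from sdTotalDamage, in dollars
toyDamage <- data.frame(
  Category = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  Total = c(150319678250, 71913712800, 57352117607)
)

# Convert dollars to billions so the axis labels stay readable
toyDamage$TotalBn <- toyDamage$Total / 1e9

g2 <- ggplot(toyDamage, aes(x = reorder(Category, -TotalBn), y = TotalBn)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Total Economic Cost by Event Type") +
  labs(x = "Event Type", y = "Total Cost (billions of USD)")
g2
```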