This report examines the NOAA Storm Database to identify which event types have the greatest impact on population health and which have the largest economic consequences.
This analysis uses the EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG, PROPDMGEXP and CROPDMGEXP variables to rank event types by their damage to population health and by their economic damage.
In the analysis, we find tornadoes, excessive heat, thunderstorm winds, floods and lightning to be the most damaging to population health, in that order. We find floods, hurricanes/typhoons, storm surges, droughts and hurricanes to be the most damaging economically, in that order.
These results loosely agree with a study conducted by University of South Carolina researchers: http://www.prb.org/Publications/Articles/2011/disasters-by-type.aspx. Although the variable names and disaster type groupings are different, similar events show up within our top 10 and theirs.
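#load the packages used below: dplyr for %>%, group_by and summarise; ggplot2 for the plot
library(dplyr)
library(ggplot2)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "Stormdata.csv.bz2")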
StormData <- read.csv("Stormdata.csv.bz2")
StormData <- StormData %>% group_by(EVTYPE)
#check if data loaded properly
head(StormData)
## Source: local data frame [6 x 37]
## Groups: EVTYPE [1]
##
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## <dbl> <fctr> <fctr> <fctr> <dbl> <fctr> <fctr>
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## # ... with 30 more variables: EVTYPE <fctr>, BGN_RANGE <dbl>,
## # BGN_AZI <fctr>, BGN_LOCATI <fctr>, END_DATE <fctr>, END_TIME <fctr>,
## # COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>, END_AZI <fctr>,
## # END_LOCATI <fctr>, LENGTH <dbl>, WIDTH <dbl>, F <int>, MAG <dbl>,
## # FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <fctr>,
## # CROPDMG <dbl>, CROPDMGEXP <fctr>, WFO <fctr>, STATEOFFIC <fctr>,
## # ZONENAMES <fctr>, LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>,
## # LONGITUDE_ <dbl>, REMARKS <fctr>, REFNUM <dbl>
The dataset contains two variables pertaining to harm to the human population: injuries and fatalities. Since we need to collapse both of these into one number for ranking, I’ve chosen to weight each INJURY as 1/2 of a FATALITY. The tentative logic here is that minor injuries (like scratches) would not have been reported, so the injuries that do show up here are most likely serious ones, such as broken bones.
#isolate the variables we need
HumanDamage <- subset(StormData, select = c(EVTYPE, INJURIES, FATALITIES))
#group, then aggregate data
HumanDamage <- group_by(HumanDamage, EVTYPE)
HumanDamage <- summarise(HumanDamage, INJURIES = sum(INJURIES), FATALITIES = sum(FATALITIES))
#drop event types with no injuries and no fatalities
HumanDamage <- subset(HumanDamage, INJURIES > 0 | FATALITIES > 0)
#combine into a single score: each injury counts as half a fatality
HumanDamage$DAMAGE <- (HumanDamage$FATALITIES + (HumanDamage$INJURIES / 2))
#sort descending
HumanDamage <- HumanDamage[order(-HumanDamage$DAMAGE),]
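Because the 1/2 weight is a judgment call, it can be useful to see how the ranking reacts to it. The following is a minimal base R sketch, not part of the original analysis: `rank_by_harm` and its `injury_weight` parameter are hypothetical names, and the `toy` data frame is made up purely for illustration.

```r
# Hypothetical helper: total injuries and fatalities per event type, then
# score each type as FATALITIES + injury_weight * INJURIES.
rank_by_harm <- function(df, injury_weight = 0.5) {
  totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = df, FUN = sum)
  totals$DAMAGE <- totals$FATALITIES + injury_weight * totals$INJURIES
  # sort descending by the combined score
  totals[order(-totals$DAMAGE), ]
}

# Made-up data for illustration only (not taken from the NOAA database)
toy <- data.frame(
  EVTYPE     = c("A", "A", "B", "C"),
  INJURIES   = c(10, 0, 2, 0),
  FATALITIES = c(0, 1, 4, 0),
  stringsAsFactors = FALSE
)

half  <- rank_by_harm(toy)                       # the weight used in this report
tenth <- rank_by_harm(toy, injury_weight = 0.1)  # a much lower injury weight
```

In this toy example, event type A ranks first under the 1/2 weight, while B ranks first when injuries are weighted at 1/10, so the choice of weight can change the order when two event types differ mainly in their injury counts.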
There are four variables pertaining to economic damage: CROPDMG, PROPDMG, and their respective …EXP variables. The …EXP variables give the magnitude of the associated value, encoded as letters (K = 1,000, M = 1,000,000, B = 1,000,000,000). There’s no need to weight things here, but a bit of conversion is required before we can calculate the final damage done.
#isolate the variables we need
EconDamage <- subset(StormData, select = c(EVTYPE, CROPDMG, CROPDMGEXP, PROPDMG, PROPDMGEXP))
#the ...EXP columns are factors; convert them to character so they can be compared and reassigned below
EconDamage$CROPDMGEXP <- as.character(EconDamage$CROPDMGEXP)
EconDamage$PROPDMGEXP <- as.character(EconDamage$PROPDMGEXP)
#the ...EXP variables indicate the magnitude of the value in the associated column. They're represented
#on a scale of 1000 (K) to 1000000 (M) to 1000000000 (B). Convert these letters to powers of 10 (3, 6, 9); blanks become 0.
EconDamage[(EconDamage$PROPDMGEXP == ""), ]$PROPDMGEXP <- 0
EconDamage[(EconDamage$PROPDMGEXP == "K" | EconDamage$PROPDMGEXP == "k"), ]$PROPDMGEXP <- 3
EconDamage[(EconDamage$PROPDMGEXP == "M" | EconDamage$PROPDMGEXP == "m"), ]$PROPDMGEXP <- 6
EconDamage[(EconDamage$PROPDMGEXP == "B" | EconDamage$PROPDMGEXP == "b"), ]$PROPDMGEXP <- 9
EconDamage[(EconDamage$CROPDMGEXP == ""), ]$CROPDMGEXP <- 0
EconDamage[(EconDamage$CROPDMGEXP == "K" | EconDamage$CROPDMGEXP == "k"), ]$CROPDMGEXP <- 3
EconDamage[(EconDamage$CROPDMGEXP == "M" | EconDamage$CROPDMGEXP == "m"), ]$CROPDMGEXP <- 6
EconDamage[(EconDamage$CROPDMGEXP == "B" | EconDamage$CROPDMGEXP == "b"), ]$CROPDMGEXP <- 9
#compute the damage inflicted by multiplying property and crop damage by 10 ^ the associated exponent
EconDamage$PROPDMG <- EconDamage$PROPDMG * 10^as.numeric(EconDamage$PROPDMGEXP)
## Warning: NAs introduced by coercion
EconDamage$CROPDMG <- EconDamage$CROPDMG * 10^as.numeric(EconDamage$CROPDMGEXP)
## Warning: NAs introduced by coercion
# compute combined economic damage (property damage + crops damage)
EconDamage$DAMAGE <- (EconDamage$PROPDMG + EconDamage$CROPDMG)
#group, then aggregate data
EconDamage <- group_by(EconDamage, EVTYPE)
EconDamage<- summarise(EconDamage, CROPDMG = sum(CROPDMG), PROPDMG = sum(PROPDMG), DAMAGE = sum(DAMAGE))
#drop cases of zero damage
EconDamage<- subset(EconDamage, DAMAGE > 0)
#sort descending
EconDamage <- EconDamage[order(-EconDamage$DAMAGE),]
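The "NAs introduced by coercion" warnings above indicate that some ...EXP values are neither blank, K, M, B nor a digit. In R, `sum()` returns NA when any input is NA, so an event type with even one such row would end up with an NA total and be dropped by the `DAMAGE > 0` filter. The following is a minimal sketch of one way to avoid that with a lookup table. `exp_to_power` is a hypothetical helper, the `toy` data is made up, and treating unrecognised symbols as "no scaling" is an assumption, not something the dataset documentation confirms here.

```r
# Hypothetical helper: map an ...EXP symbol to a power of 10.
exp_to_power <- function(sym) {
  sym <- toupper(as.character(sym))
  lookup <- c(K = 3, M = 6, B = 9)
  power <- unname(lookup[sym])            # NA for anything that is not K, M or B
  is_digit <- grepl("^[0-9]$", sym)       # single digits are used as the exponent itself
  power[is_digit] <- as.numeric(sym[is_digit])
  power[is.na(power)] <- 0                # ASSUMPTION: blank or unknown symbols mean no scaling
  power
}

# Made-up data for illustration only (not taken from the NOAA database)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 10, 1, 3),
  PROPDMGEXP = c("M", "k", "+", ""),
  stringsAsFactors = FALSE
)

# scale each value, then total per event type and sort descending
toy$PROP_USD <- toy$PROPDMG * 10^exp_to_power(toy$PROPDMGEXP)
totals <- aggregate(PROP_USD ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$PROP_USD), ]
```

An alternative under the same caveat would be to keep the original conversion and pass `na.rm = TRUE` to `sum()`, which discards the affected rows instead of guessing their scale. Either choice could change which event types appear in the ranking.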
Which events have the greatest impact on population health?
#print top 10 most damaging events ranked by damage
head(HumanDamage, 10)
## # A tibble: 10 × 4
## EVTYPE INJURIES FATALITIES DAMAGE
## <fctr> <dbl> <dbl> <dbl>
## 1 TORNADO 91346 5633 51306.0
## 2 EXCESSIVE HEAT 6525 1903 5165.5
## 3 TSTM WIND 6957 504 3982.5
## 4 FLOOD 6789 470 3864.5
## 5 LIGHTNING 5230 816 3431.0
## 6 HEAT 2100 937 1987.0
## 7 FLASH FLOOD 1777 978 1866.5
## 8 ICE STORM 1975 89 1076.5
## 9 THUNDERSTORM WIND 1488 133 877.0
## 10 WINTER STORM 1321 206 866.5
Which events have the largest economic consequences?
#print top 10 most damaging events ranked by damage
head(EconDamage, 10)
## # A tibble: 10 × 4
## EVTYPE CROPDMG PROPDMG DAMAGE
## <fctr> <dbl> <dbl> <dbl>
## 1 FLOOD 5661968450 144657709807 150319678257
## 2 HURRICANE/TYPHOON 2607872800 69305840000 71913712800
## 3 STORM SURGE 5000 43323536000 43323541000
## 4 DROUGHT 13972566000 1046106000 15018672000
## 5 HURRICANE 2741910000 11868319010 14610229010
## 6 RIVER FLOOD 5029459000 5118945500 10148404500
## 7 ICE STORM 5022113500 3944927860 8967041360
## 8 TROPICAL STORM 678346000 7703890550 8382236550
## 9 WINTER STORM 26944000 6688497251 6715441251
## 10 WILDFIRE 295472800 4765114000 5060586800
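#plot top 5 events by economic damage caused, with bars ordered from most to least damaging
top5 <- head(EconDamage, 5)
ggplot(top5, aes(x = reorder(EVTYPE, -DAMAGE), y = DAMAGE)) + geom_bar(stat = "identity") + labs(x = "Event type", y = "Total damage (USD)")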