This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
We are asking the questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Synopsis: Tornadoes are the most harmful in terms of both fatalities and injuries. Floods do the most property damage.
We first download the Storm Data as a bz2 file (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2) into our current directory. read.table can read the compressed file directly, so no separate decompression step is needed.
storm <- read.table("repdata-data-StormData.csv.bz2", sep = ",", header=TRUE, na.strings = "")
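The download step can be scripted so the analysis is reproducible. The sketch below is one possible approach, not the author's code: `fetch_if_missing` is a hypothetical helper, and the demonstration runs on a tiny stand-in file so that it needs no network access. It also shows that R can read a bzip2-compressed CSV straight from its path, which is why the read.table call above works on the .bz2 file.

```r
# Hypothetical helper: download only when the file is not already on disk.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")  # binary mode keeps the archive intact
  }
  dest
}

# For the real data one would call (not run here):
# fetch_if_missing("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#                  "repdata-data-StormData.csv.bz2")

# Stand-in file with made-up rows, written through a bzip2 connection.
demo_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demo_path, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HEAT"), FATALITIES = c(2, 1)),
          con, row.names = FALSE)
close(con)

# The file already exists, so the placeholder URL is never contacted.
path <- fetch_if_missing("https://example.invalid/unused.csv.bz2", demo_path)

# Compressed files are decompressed transparently when read.
demo <- read.csv(path, na.strings = "")
```

Guarding the download with file.exists avoids fetching the archive again on every run.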
For the first question, which event types are most harmful to population health, we look at the FATALITIES and INJURIES values.
library(dplyr)
dim(storm)
## [1] 902297 37
#Remove rows not in US states and DC
data(state) #built in list of states
storm_us <- storm[which(storm$STATE %in% c(state.abb, "DC")),]
dim(storm_us)
## [1] 883623 37
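The filter works because the built-in `state.abb` vector holds the 50 state abbreviations and does not include DC, which is why "DC" is appended by hand. A minimal sketch on invented rows (the codes PR and GU are used purely as examples of non-state values) shows how to list what would be dropped before dropping it:

```r
# Toy stand-in for the STATE column; values are illustrative only.
toy <- data.frame(STATE = c("TX", "PR", "DC", "GU", "AL"),
                  stringsAsFactors = FALSE)

keep <- c(state.abb, "DC")          # 50 states plus the District of Columbia

# Codes that the filter will remove
dropped <- setdiff(unique(toy$STATE), keep)

# Logical indexing is enough here; which() is optional
toy_us <- toy[toy$STATE %in% keep, , drop = FALSE]
```

Inspecting `dropped` on the real data would show which territories and marine zones account for the rows removed.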
storm_unhealthy <- storm_us %>% filter(FATALITIES > 0 | INJURIES > 0) %>% select(EVTYPE, STATE, FATALITIES, INJURIES) %>% arrange(desc(FATALITIES), desc(INJURIES))
dim(storm_unhealthy)
## [1] 21709 4
head(storm_unhealthy, 20)
##            EVTYPE STATE FATALITIES INJURIES
## 1            HEAT    IL        583        0
## 2         TORNADO    MO        158     1150
## 3         TORNADO    MI        116      785
## 4         TORNADO    TX        114      597
## 5  EXCESSIVE HEAT    IL         99        0
## 6         TORNADO    MA         90     1228
## 7         TORNADO    KS         75      270
## 8  EXCESSIVE HEAT    PA         74      135
## 9  EXCESSIVE HEAT    PA         67        0
## 10        TORNADO    MS         57      504
## 11   EXTREME HEAT    WI         57        0
## 12        TORNADO    AR         50      325
## 13 EXCESSIVE HEAT    TX         49        0
## 14 EXCESSIVE HEAT    CA         46       18
## 15        TORNADO    AL         44      800
## 16        TORNADO    TX         42     1700
## 17 EXCESSIVE HEAT    MO         42      397
## 18 EXCESSIVE HEAT    NY         42        0
## 19        TORNADO    MS         38      270
## 20        TORNADO    MO         37      176
#Compare fatalities to injuries
plot(log(storm_unhealthy$FATALITIES+1), log(storm_unhealthy$INJURIES+1), main="Log of Fatalities vs Injuries")
#Look at the top 100 of each
head(sort(storm_unhealthy$FATALITIES, decreasing = TRUE), 100)
## [1] 583 158 116 114 99 90 75 74 67 57 57 50 49 46 44 42 42
## [18] 42 38 37 36 34 33 33 33 32 32 31 31 31 30 30 30 29
## [35] 29 29 27 27 27 26 25 25 25 25 25 24 24 24 24 23 23
## [52] 23 22 22 22 22 22 22 21 21 21 20 20 20 20 20 20 20
## [69] 19 18 18 17 17 17 17 17 17 17 16 16 16 16 16 16 16
## [86] 16 16 16 16 15 15 15 15 15 14 14 14 14 14 14
head(sort(storm_unhealthy$INJURIES, decreasing = TRUE), 100)
## [1] 1700 1568 1228 1150 1150 800 800 785 780 750 700 600 597 560
## [15] 550 519 504 500 500 500 500 500 500 500 463 450 450 450
## [29] 437 411 410 397 385 350 350 350 350 342 325 306 300 300
## [43] 300 300 300 293 280 280 275 270 270 270 266 258 257 257
## [57] 252 252 250 250 250 246 241 240 234 230 225 225 224 223
## [71] 216 215 210 207 200 200 200 200 200 200 200 200 200 200
## [85] 200 200 200 200 200 200 200 200 200 200 195 192 192 190
## [99] 185 185
The vast majority of events have very few fatalities or injuries. Now let's see what the EVTYPE is for the top 200 rows of storm_unhealthy.
sort(table(factor(head(storm_unhealthy$EVTYPE, 200))), decreasing = TRUE)
##
##                    TORNADO             EXCESSIVE HEAT
##                        123                         34
##                       HEAT                  HEAT WAVE
##                          8                          6
##                FLASH FLOOD                      FLOOD
##                          4                          3
##               EXTREME HEAT                  LANDSLIDE
##                          2                          2
##                   WILDFIRE              COLD AND SNOW
##                          2                          1
##                 DUST STORM    EXTREME COLD/WIND CHILL
##                          1                          1
##                        FOG                  HIGH SURF
##                          1                          1
##                  HURRICANE          HURRICANE/TYPHOON
##                          1                          1
##      RECORD/EXCESSIVE HEAT                STORM SURGE
##                          1                          1
##           STORM SURGE/TIDE TORNADOES, TSTM WIND, HAIL
##                          1                          1
##             TROPICAL STORM                  TSTM WIND
##                          1                          1
##          UNSEASONABLY WARM  UNSEASONABLY WARM AND DRY
##                          1                          1
##              WINTER STORMS
##                          1
There are some common themes related to storms (including hurricanes), heat waves, and floods. Let's try to group these into a new EVGROUP factor.
evgroup <- function(ev){
  if(grepl("WARM|HEAT|DRY", ev))
    "HEAT RELATED"
  else if(grepl("TORNADO|FUNNEL", ev))
    "TORNADO"
  else if(grepl("LIGHTNING", ev))
    "LIGHTNING"
  else if(grepl("FIRE", ev))
    "WILDFIRE"
  else if(grepl("WINTER|WINTRY|ICE|ICY|BLIZZARD|COLD|FREEZE|FROST|LOW|FREEZING|SNOW|AVALANCHE|CHILL|HYPOTHERMIA|HYPERTHERMIA|SLEET|HAIL", ev))
    "COLD AND SNOW RELATED"
  else if(grepl("FLOOD|TIDE|SURGE|SURF|MARINE|SEAS|WATER|FLD|RIP|DROWNING|WAVE|TSUNAMI", ev))
    "FLOOD AND SEA RELATED"
  else if(grepl("DUST|FOG", ev))
    "DUST OR FOG RELATED"
  else if(grepl("STORM|HURRICANE|WIND|TYPHOON|RAIN|PRECIP", ev))
    "STORM AND WIND RELATED"
  else
    "OTHER"
}
storm_unhealthy$EVGROUP <- factor(sapply(toupper(storm_unhealthy$EVTYPE), evgroup))
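Because evgroup returns on the first matching pattern, the order of the checks decides where ambiguous labels land (HEAT WAVE matches both HEAT and WAVE, for instance). The sketch below is an alternative, vectorized formulation of the same rules, not the author's code: `rules` and `evgroup_vec` are names introduced here for illustration. It calls grepl once per pattern over the whole vector instead of once per row, and is handy for spot-checking a few labels.

```r
# Patterns copied from evgroup(); earlier entries take priority.
rules <- c(
  "HEAT RELATED"           = "WARM|HEAT|DRY",
  "TORNADO"                = "TORNADO|FUNNEL",
  "LIGHTNING"              = "LIGHTNING",
  "WILDFIRE"               = "FIRE",
  "COLD AND SNOW RELATED"  = "WINTER|WINTRY|ICE|ICY|BLIZZARD|COLD|FREEZE|FROST|LOW|FREEZING|SNOW|AVALANCHE|CHILL|HYPOTHERMIA|HYPERTHERMIA|SLEET|HAIL",
  "FLOOD AND SEA RELATED"  = "FLOOD|TIDE|SURGE|SURF|MARINE|SEAS|WATER|FLD|RIP|DROWNING|WAVE|TSUNAMI",
  "DUST OR FOG RELATED"    = "DUST|FOG",
  "STORM AND WIND RELATED" = "STORM|HURRICANE|WIND|TYPHOON|RAIN|PRECIP"
)

# Hypothetical vectorized equivalent of evgroup().
evgroup_vec <- function(ev) {
  ev  <- toupper(as.character(ev))
  out <- rep("OTHER", length(ev))
  # Apply the lowest-priority rule first so that higher-priority
  # rules overwrite it, which reproduces first-match-wins.
  for (i in rev(seq_along(rules))) {
    out[grepl(rules[[i]], ev)] <- names(rules)[i]
  }
  out
}

# Spot checks on a few illustrative labels
labels <- c("HEAT WAVE", "TSTM WIND", "FLASH FLOOD", "DROUGHT",
            "HYPERTHERMIA/EXPOSURE")
checks <- setNames(evgroup_vec(labels), labels)
```

With the patterns as written, a label such as HYPERTHERMIA/EXPOSURE falls into COLD AND SNOW RELATED because HYPERTHERMIA is listed in that pattern, and DROUGHT falls through to OTHER. Spot checks like these make it easier to decide whether a pattern needs revisiting.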
First, let's look at fatality and injury counts by these new groups to get a sense of their variance.
library(ggplot2);library(Hmisc);library(gridExtra);
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: splines
## Loading required package: Formula
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:dplyr':
##
## combine, src, summarize
##
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
#boxplot for fatalities by new group
qplot(EVGROUP, log10(FATALITIES), data=storm_unhealthy, fill=EVGROUP, geom=c("boxplot"), main = "Log10 FATALITIES Box Plot by Group Type")
#boxplot for injuries by new group
qplot(EVGROUP, log10(INJURIES), data=storm_unhealthy, fill=EVGROUP, geom=c("boxplot"), main = "Log10 INJURIES Box Plot by Group Type")
Note that HEAT RELATED has the single largest loss-of-life event (583 fatalities), followed by TORNADO (158 fatalities).
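One caveat when reading these plots: storm_unhealthy keeps rows where either FATALITIES or INJURIES is positive, so many rows have a zero in one of the two columns, and log10(0) is -Inf. Plotting functions generally drop such non-finite values, so each box plot reflects only the events with a nonzero count. A minimal sketch on made-up numbers of making that filtering explicit:

```r
# Toy stand-in for storm_unhealthy; the values are illustrative only.
toy <- data.frame(
  EVGROUP    = c("TORNADO", "TORNADO", "HEAT RELATED", "HEAT RELATED"),
  FATALITIES = c(0, 10, 100, 0),
  INJURIES   = c(5, 0, 0, 20)
)

# Zeros turn into -Inf on the log scale
log_all <- log10(toy$FATALITIES)

# Keep only events with at least one fatality before taking the log
fatal_only <- toy[toy$FATALITIES > 0, ]
fatal_only$LOG_FATALITIES <- log10(fatal_only$FATALITIES)
```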
For the second question, about economic consequences, we look at PROPDMG and CROPDMG and adjust for their units (the PROPDMGEXP and CROPDMGEXP codes K, M, and B for thousands, millions, and billions).
storm_costly <- storm_us %>% filter(PROPDMG > 0 | CROPDMG > 0) %>% select(EVTYPE, STATE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
#Adjust for units
k <- which(toupper(storm_costly$PROPDMGEXP) == "K")
storm_costly$PROPDMG[k] = (storm_costly$PROPDMG[k] * 1e3)
m <- which(toupper(storm_costly$PROPDMGEXP) == "M")
storm_costly$PROPDMG[m] = (storm_costly$PROPDMG[m] * 1e6)
b <- which(toupper(storm_costly$PROPDMGEXP) == "B")
storm_costly$PROPDMG[b] = (storm_costly$PROPDMG[b] * 1e9)
k <- which(toupper(storm_costly$CROPDMGEXP) == "K")
storm_costly$CROPDMG[k] = (storm_costly$CROPDMG[k] * 1e3)
m <- which(toupper(storm_costly$CROPDMGEXP) == "M")
storm_costly$CROPDMG[m] = (storm_costly$CROPDMG[m] * 1e6)
b <- which(toupper(storm_costly$CROPDMGEXP) == "B")
storm_costly$CROPDMG[b] = (storm_costly$CROPDMG[b] * 1e9)
storm_costly$EVGROUP <- factor(sapply(toupper(storm_costly$EVTYPE), evgroup))
dim(storm_costly)
## [1] 243911 7
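The six unit adjustments above follow one pattern: map an exponent code to a multiplier and scale the amount. The sketch below is a compact alternative under the same assumption the code above makes, namely that any code other than K, M, or B (including a missing one) leaves the amount unscaled. `unit_multiplier` is a hypothetical helper, and the toy rows are invented.

```r
# Hypothetical helper: exponent code -> numeric multiplier.
unit_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(exp_code))])
  mult[is.na(mult)] <- 1   # unknown or missing codes: leave the amount as-is
  mult
}

# Toy rows; values are illustrative only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", NA),
  stringsAsFactors = FALSE
)

# Writing to a new column keeps the raw amounts available for checking
toy$PROP_USD <- toy$PROPDMG * unit_multiplier(toy$PROPDMGEXP)
```

The same helper would apply to CROPDMG and CROPDMGEXP. Storing the result in a separate column also avoids scaling a value twice if the chunk is rerun.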
We tabulate our total fatalities and injuries.
storm_unhealthy %>% group_by(EVGROUP) %>% summarise(TotalFatalities=sum(FATALITIES), TotalInjuries=sum(INJURIES)) %>% arrange(desc(TotalFatalities), desc(TotalInjuries))
## Source: local data frame [9 x 3]
##
##                  EVGROUP TotalFatalities TotalInjuries
## 1                TORNADO            5661         91410
## 2           HEAT RELATED            3181          9272
## 3  FLOOD AND SEA RELATED            2219          9471
## 4 STORM AND WIND RELATED            1378         12889
## 5  COLD AND SNOW RELATED            1378          8125
## 6              LIGHTNING             807          5226
## 7    DUST OR FOG RELATED             104          1559
## 8               WILDFIRE              90          1607
## 9                  OTHER              49           276
Tornadoes are the most harmful to population health in terms of both fatalities (5,661) and injuries (91,410). Heat-related events rank second for fatalities, and storm- and wind-related events rank second for injuries.
We can tabulate our total property and crop damage.
storm_costly %>% group_by(EVGROUP) %>% summarise(TotalProperty=sum(PROPDMG), TotalCrop=sum(CROPDMG)) %>% arrange(desc(TotalProperty), desc(TotalCrop))
## Source: local data frame [9 x 3]
##
##                  EVGROUP TotalProperty   TotalCrop
## 1  FLOOD AND SEA RELATED  215346005032 12332670200
## 2 STORM AND WIND RELATED  110333489447  8300863738
## 3                TORNADO   58592827629   417460520
## 4  COLD AND SNOW RELATED   28737600284 11863838773
## 5               WILDFIRE    8489501500   402116630
## 6                  OTHER    1370338650 14140418950
## 7              LIGHTNING     937999447    12097090
## 8    DUST OR FOG RELATED      29147130     3600000
## 9           HEAT RELATED      27058350   904494280
Flood and sea related events do the most property damage (about $215 billion), while OTHER (which includes drought) does the most crop damage (about $14 billion).
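Since the second question asks about overall economic consequences, it can help to rank the groups by property and crop damage combined. The sketch below re-enters the totals printed in the table above and adds them; the `TotalDamage` and `TotalBillions` columns are introduced here for illustration and are not part of the original analysis.

```r
# Totals copied from the table above (US dollars).
damage <- data.frame(
  EVGROUP = c("FLOOD AND SEA RELATED", "STORM AND WIND RELATED", "TORNADO",
              "COLD AND SNOW RELATED", "WILDFIRE", "OTHER", "LIGHTNING",
              "DUST OR FOG RELATED", "HEAT RELATED"),
  TotalProperty = c(215346005032, 110333489447, 58592827629, 28737600284,
                    8489501500, 1370338650, 937999447, 29147130, 27058350),
  TotalCrop = c(12332670200, 8300863738, 417460520, 11863838773,
                402116630, 14140418950, 12097090, 3600000, 904494280),
  stringsAsFactors = FALSE
)

# Combined damage, ranked from largest to smallest
damage$TotalDamage <- damage$TotalProperty + damage$TotalCrop
damage <- damage[order(damage$TotalDamage, decreasing = TRUE), ]

# Same figure in billions of dollars, for easier reading
damage$TotalBillions <- round(damage$TotalDamage / 1e9, 2)
```

On these figures, flood and sea related events also rank first on combined damage, and OTHER moves ahead of WILDFIRE once crop losses are included.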