Human health and the economy are adversely affected by severe weather events. This paper analyzes the NOAA storm database to identify which events cause the greatest health and economic harm. Once identified, these high-impact events can be analyzed for characteristics that could inform risk prevention schemes.
The NOAA data is loaded conditionally from the cloud.
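# Packages used below: dplyr (select, mutate, arrange), ggplot2 (plots), gridExtra (grid.arrange)
library(dplyr)
library(ggplot2)
library(gridExtra)
if(!exists("NOAA")){
  NOAAFile <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(NOAAFile,"NOAAFile.csv.bz2", method="curl", quiet=TRUE)
  NOAA <- read.csv("NOAAFile.csv.bz2",stringsAsFactors=FALSE)
}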
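The check above skips the download only while NOAA is still in the workspace. As a hedged sketch of one possible refinement, the helper below also skips the download when the file is already on disk. The name `fetch_once` is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper: download a file only if it is not already on disk.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed file intact on all platforms
    download.file(url, dest, mode = "wb", quiet = TRUE)
  }
  dest
}
```

It could be used as `NOAA <- read.csv(fetch_once(NOAAFile, "NOAAFile.csv.bz2"), stringsAsFactors = FALSE)`.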
The NOAA data is often inconsistent. For instance, the event type field (NOAA$EVTYPE) has 985 unique values, and among them the words THUNDER and TSTM occur 116 times. To improve the analysis, related EVTYPE values are grouped into a new EVENTS variable as follows. (See the appendix for the mapping of the top events.)
# These are the predetermined columns of focus for health and economic analysis
summary(select(NOAA,EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))
##     EVTYPE            FATALITIES          INJURIES
##  Length:902297      Min.   :  0.0000   Min.   :   0.0000
##  Class :character   1st Qu.:  0.0000   1st Qu.:   0.0000
##  Mode  :character   Median :  0.0000   Median :   0.0000
##                     Mean   :  0.0168   Mean   :   0.1557
##                     3rd Qu.:  0.0000   3rd Qu.:   0.0000
##                     Max.   :583.0000   Max.   :1700.0000
##     PROPDMG         PROPDMGEXP           CROPDMG         CROPDMGEXP
##  Min.   :   0.00   Length:902297      Min.   :  0.000   Length:902297
##  1st Qu.:   0.00   Class :character   1st Qu.:  0.000   Class :character
##  Median :   0.00   Mode  :character   Median :  0.000   Mode  :character
##  Mean   :  12.06                      Mean   :  1.527
##  3rd Qu.:   0.50                      3rd Qu.:  0.000
##  Max.   :5000.00                      Max.   :990.000
# The following aggregate types are used to improve the statistical analysis
EVENTS = as.character(c("TEMPERATURE", "THUNDERSTORM", "TORNADO", "HURRICANE",
"FLOOD", "PERCIP", "DROUGHT","LIGHTNING"))
# Aggregate specific EVTYPEs into these new types. Otherwise use the existing EVTYPEs
NOAA$EVENTS <- NOAA$EVTYPE
NOAA[grep("COLD|HEAT|WARM", NOAA$EVTYPE),]$EVENTS <-EVENTS[1]
NOAA[grep("TSTM|THUNDER", NOAA$EVTYPE),]$EVENTS <-EVENTS[2]
NOAA[grep("TORNADO", NOAA$EVTYPE),]$EVENTS <-EVENTS[3]
NOAA[grep("HURRICANE", NOAA$EVTYPE),]$EVENTS <-EVENTS[4]
# FLOOD only: RAIN|SNOW|ICE|HAIL matches are reassigned to PERCIP on the next line regardless
NOAA[grep("FLOOD", NOAA$EVTYPE),]$EVENTS <-EVENTS[5]
NOAA[grep("RAIN|SNOW|ICE|HAIL", NOAA$EVTYPE),]$EVENTS <-EVENTS[6]
NOAA[grep("DROUGHT", NOAA$EVTYPE),]$EVENTS <-EVENTS[7]
NOAA[grep("LIGHTNING", NOAA$EVTYPE),]$EVENTS <-EVENTS[8]
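Because the rules above are applied in sequence, an EVTYPE that matches several patterns ends up with the label of the last matching rule. The following minimal sketch illustrates this on a small sample vector. The sample vector and the reduced rule set are assumptions chosen for illustration, not the full mapping.

```r
# Illustrative sample of raw labels (a hand-picked subset, not the NOAA file)
evtype <- c("TSTM WIND", "THUNDERSTORM WINDS", "TORNADO F1", "FLASH FLOOD",
            "FLASH FLOODING/THUNDERSTORM WI", "HEAVY RAIN", "HIGH SURF")

# Reduced rule set, in the same order as the rules above
rules <- c(THUNDERSTORM = "TSTM|THUNDER",
           TORNADO      = "TORNADO",
           FLOOD        = "FLOOD",
           PERCIP       = "RAIN|SNOW|ICE|HAIL")

# Start from the raw labels and overwrite every match, rule by rule
events <- evtype
for (label in names(rules)) {
  events[grepl(rules[[label]], evtype)] <- label
}

# Raw label next to its group; unmatched labels keep their raw value
data.frame(evtype, events)
```

Here "FLASH FLOODING/THUNDERSTORM WI" matches both the THUNDERSTORM and FLOOD patterns and is filed under FLOOD, because the FLOOD rule runs later.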
The NOAA damage cost estimates add their own complications. The base amounts are in CROPDMG and PROPDMG, while CROPDMGEXP and PROPDMGEXP hold their respective magnitude codes (K, M and B for thousands, millions and billions). To assess the economic impact properly, the total damage cost is computed in a new COST field with the multipliers applied.
# Multipliers for the exponent codes. Row 1 (multiplier 1) is the fallback that
# nomatch=1 selects for blank or unrecognized codes.
m <- c(1, 1000, 1000, 1000000,1000000, 1000000000, 1000000000)
l <- c("", "k", "K", "m", "M", "b", "B")
multipler <- data.frame(m,l)
NOAA$COST <- NOAA$PROPDMG*multipler[match(NOAA$PROPDMGEXP, multipler$l, nomatch=1), 1] +
NOAA$CROPDMG*multipler[match(NOAA$CROPDMGEXP, multipler$l, nomatch=1), 1]
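As a worked illustration of the conversion, the sketch below applies magnitude codes to a few made-up amounts. The helper name `exp_to_multiplier` and the sample values are hypothetical. Treating blank or unrecognized codes as a multiplier of 1 is an assumption of this sketch.

```r
# Hypothetical helper: map a magnitude code to its numeric multiplier.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # case-insensitive lookup
  mult[is.na(mult)] <- 1                 # assumption: unknown or blank code means 1
  mult
}

# Made-up base amounts and codes, for illustration only
dmg  <- c(25, 2.5, 1, 10)
code <- c("K", "m", "B", "")

cost <- dmg * exp_to_multiplier(code)
cost  # 25000, 2500000, 1e9, 10
```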
With the data cleaned as above, the top health and economic events are determined by summing fatalities and costs for each event group, then sorting and taking the top of each list. Fatalities are used rather than injuries because deaths matter more to most people.
NOAA.HE <- select(NOAA, EVTYPE, EVENTS,INJURIES,FATALITIES,COST)
NOAA.HE <- aggregate(cbind(INJURIES,FATALITIES,COST) ~ EVENTS, data = NOAA.HE, FUN = sum)
NOAA.HE <- mutate(NOAA.HE, ECONPERCENT = (COST/ sum(NOAA.HE$COST))*100)
NOAA.HE <- mutate(NOAA.HE, FATPERCENT = (FATALITIES/ sum(NOAA.HE$FATALITIES))*100)
topPicks <- 10
# A simple bar graph is easier for most readers. Likewise, putting the graphs side by side
# allows the reader to see similarities between health and cost.
g1 <- ggplot(tail(arrange(NOAA.HE, FATALITIES), n=topPicks),
aes(x=reorder(EVENTS,FATALITIES), y= FATALITIES))
g1 <- g1 + geom_bar(stat='identity') + coord_flip() +
xlab("Event Type") + ggtitle("Health")
g2 <- ggplot(tail(arrange(NOAA.HE, COST), n=topPicks),
aes(x=reorder(EVENTS,COST), y= COST))
g2 <- g2 + geom_bar(stat='identity') + coord_flip() + xlab("") + ggtitle("Economic")
grid.arrange(g1,g2,ncol=2, top="Weather Events (1950-2011)")
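The aggregate-and-rank step above can be illustrated in base R on a toy data frame. The records below are made up for illustration and the helper `top_n_share` is hypothetical. The sketch shows how a group's percentage share, and the combined share of the top n groups, could be computed.

```r
# Made-up toy records, for illustration only
toy <- data.frame(
  EVENTS     = c("TORNADO", "FLOOD", "TORNADO", "LIGHTNING", "FLOOD", "DROUGHT"),
  FATALITIES = c(50, 20, 30, 5, 10, 0),
  COST       = c(100, 400, 200, 10, 250, 40)
)

# Sum fatalities and cost within each event group
agg <- aggregate(cbind(FATALITIES, COST) ~ EVENTS, data = toy, FUN = sum)

# Each group's share of the overall totals, in percent
agg$FATPERCENT  <- agg$FATALITIES / sum(agg$FATALITIES) * 100
agg$ECONPERCENT <- agg$COST / sum(agg$COST) * 100

# Hypothetical helper: combined share (percent) of the n largest values
top_n_share <- function(x, n) {
  sum(head(sort(x, decreasing = TRUE), n)) / sum(x) * 100
}

top_fatal <- as.character(agg$EVENTS[which.max(agg$FATALITIES)])
top_cost  <- as.character(agg$EVENTS[which.max(agg$COST)])
share2    <- top_n_share(agg$FATALITIES, 2)
```

In this toy example TORNADO leads on fatalities and FLOOD on cost, and the top two groups together cover 110 of the 115 fatalities.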
Together, the top 10 events in these graphs account for at least 91 percent of the totals across all event categories. Here are the answers to the basic questions:
1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
TORNADO, in its various EVTYPE forms, is clearly the deadliest weather-related event, accounting for 37 percent of deaths.
2) Across the United States, which types of events have the greatest economic consequences?
FLOOD, in its various EVTYPE forms, accounts for 38 percent of the costs associated with severe weather.
Because EVTYPE data entry is inconsistent, several EVTYPEs were grouped together under EVENTS. The EVTYPE values that make up the top health and economic EVENTS are listed below.
EVTYPE values in the top health event:
unique(NOAA[NOAA$EVENTS==tail(arrange(NOAA.HE, FATALITIES), n=1)$EVENTS,]$EVTYPE)
## [1] "TORNADO" "TORNADO F0" "TORNADOS"
## [4] "WATERSPOUT/TORNADO" "WATERSPOUT TORNADO" "WATERSPOUT-TORNADO"
## [7] "COLD AIR TORNADO" "WATERSPOUT/ TORNADO" "TORNADO F3"
## [10] "TORNADO F1" "TORNADO/WATERSPOUT" "TORNADO F2"
## [13] "TORNADOES" "TORNADO DEBRIS"
EVTYPE values in the top economic event:
unique(NOAA[NOAA$EVENTS==tail(arrange(NOAA.HE, COST), n=1)$EVENTS,]$EVTYPE)
## [1] "FLASH FLOOD" "FLASH FLOODING"
## [3] "FLOODING" "FLOOD"
## [5] "FLASH FLOODING/THUNDERSTORM WI" "BREAKUP FLOODING"
## [7] "RIVER FLOOD" "COASTAL FLOOD"
## [9] "FLOOD WATCH/" "FLASH FLOODS"
## [11] "HEAVY SURF COASTAL FLOODING" "URBAN FLOODING"
## [13] "URBAN/SMALL FLOODING" "LOCAL FLOOD"
## [15] "FLOOD/FLASH FLOOD" "FLASH FLOOD WINDS"
## [17] "URBAN/SMALL STREAM FLOODING" "STREAM FLOODING"
## [19] "FLASH FLOOD/" "SMALL STREAM URBAN FLOOD"
## [21] "URBAN FLOOD" "COASTAL FLOODING"
## [23] "HIGH WINDS/FLOODING" "URBAN/SMALL STREAM FLOOD"
## [25] "MINOR FLOODING" "URBAN/SMALL STREAM FLOOD"
## [27] "URBAN AND SMALL STREAM FLOOD" "SMALL STREAM FLOODING"
## [29] "FLOODS" "SMALL STREAM AND URBAN FLOODIN"
## [31] "SMALL STREAM/URBAN FLOOD" "SMALL STREAM AND URBAN FLOOD"
## [33] "RURAL FLOOD" "THUNDERSTORM WINDS URBAN FLOOD"
## [35] "MAJOR FLOOD" "STREET FLOOD"
## [37] "SMALL STREAM FLOOD" "LAKE FLOOD"
## [39] "URBAN AND SMALL STREAM FLOODIN" "RIVER AND STREAM FLOOD"
## [41] "MINOR FLOOD" "HIGH WINDS/COASTAL FLOOD"
## [43] "RIVER FLOODING" "FLOOD/RIVER FLOOD"
## [45] "MUD SLIDES URBAN FLOODING" "THUNDERSTORM WINDS/FLASH FLOOD"
## [47] "LOCAL FLASH FLOOD" "FLOOD/FLASH FLOODING"
## [49] "COASTAL/TIDAL FLOOD" "FLASH FLOOD/FLOOD"
## [51] "FLASH FLOOD/ STREET" "FLOOD FLASH"
## [53] "FLOOD FLOOD/FLASH" "TIDAL FLOOD"
## [55] "FLOOD/FLASH" "THUNDERSTORM WINDS/FLOODING"
## [57] "HIGHWAY FLOODING" "FLASH FLOOD/ FLOOD"
## [59] "BEACH EROSION/COASTAL FLOOD" "FLASH FLOODING/FLOOD"
## [61] "BEACH FLOOD" "THUNDERSTORM WINDS/ FLOOD"
## [63] "FLOOD/FLASHFLOOD" "URBAN SMALL STREAM FLOOD"
## [65] "URBAN FLOOD LANDSLIDE" "URBAN FLOODS"
## [67] "FLASH FLOOD/LANDSLIDE" "LANDSLIDE/URBAN FLOOD"
## [69] "FLASH FLOOD LANDSLIDES" "COASTALFLOOD"
## [71] "STREET FLOODING" "TIDAL FLOODING"
## [73] " COASTAL FLOOD" "COASTAL FLOODING/EROSION"
## [75] "URBAN/STREET FLOODING" "COASTAL FLOODING/EROSION"
## [77] "FLOOD/FLASH/FLOOD" " FLASH FLOOD"
## [79] "CSTL FLOODING/EROSION" "LAKESHORE FLOOD"