Analysis of the NOAA Storm Data
Synopsis
The purpose of this analysis is to explore the NOAA Storm Database and address the following two questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
It turns out that TORNADO, EXCESSIVE HEAT, and TSTM WIND are probably the most harmful event types with respect to population health, while TORNADO, FLASH FLOOD, and TSTM WIND rank highest for economic damage.
sessionInfo()
Data Processing
Environment
The analysis is conducted using R and RStudio running on Mac OS X 10.9.
Loading Data
This report analyzes the NOAA storm data, which can be downloaded here: Storm Data.
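# stringsAsFactors = TRUE keeps the text columns as factors on R 4.0 and later,
# which the levels() calls below rely on
noaa <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), stringsAsFactors = TRUE)
noaa <- noaa[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
str(noaa)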
Cleaning Data
The two exponent columns, PROPDMGEXP and CROPDMGEXP, must be processed before the actual analysis.
levels(noaa$PROPDMGEXP)
levels(noaa$CROPDMGEXP)
Here we replace the exponents with their corresponding numeric values and multiply PROPDMG and CROPDMG by them, respectively.
"", "+", "-", and "?" are treated as 1e0 for simplicity.
expValue <- function(x) {
  # Default multiplier is 1, which also covers "", "+", "-", and "?"
  y <- rep(1, length(x))
  y[x == "0"] <- 1e0
  y[x == "1"] <- 1e1
  y[x == "2"] <- 1e2
  y[x == "3"] <- 1e3
  y[x == "4"] <- 1e4
  y[x == "5"] <- 1e5
  y[x == "6"] <- 1e6
  y[x == "7"] <- 1e7
  y[x == "8"] <- 1e8
  y[x == "9"] <- 1e9
  y[x == "h" | x == "H"] <- 100
  y[x == "k" | x == "K"] <- 1000
  y[x == "m" | x == "M"] <- 1000000
  y[x == "b" | x == "B"] <- 1000000000
  # Return the multipliers explicitly; without this final line the function
  # returns the value of its last assignment (1e9) for every row
  y
}
noaa$PROPDMG <- noaa$PROPDMG * expValue(noaa$PROPDMGEXP)
noaa$CROPDMG <- noaa$CROPDMG * expValue(noaa$CROPDMGEXP)
str(noaa)
'data.frame': 902297 obs. of 7 variables:
$ EVTYPE    : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 …
$ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 …
$ INJURIES  : num 15 0 2 2 2 6 1 0 14 0 …
$ PROPDMG   : num 2.5e+10 2.5e+09 2.5e+10 2.5e+09 2.5e+09 …
$ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 …
$ CROPDMG   : num 0 0 0 0 0 0 0 0 0 0 …
$ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 …
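The exponent replacement described above can also be expressed as a lookup table. The following is a minimal sketch on a toy data frame, not the code used to produce the results in this report. The names `exp_lookup`, `exp_multiplier`, and `toy` are hypothetical, and the sketch assumes the same convention as above: any code that is not listed maps to 1.

```r
# Hypothetical lookup table: exponent code -> multiplier.
exp_lookup <- c(
  setNames(10^(0:9), as.character(0:9)),
  h = 1e2, k = 1e3, m = 1e6, b = 1e9
)

# Map a vector of exponent codes to numeric multipliers.
# Codes missing from the table ("", "+", "-", "?") fall back to 1.
exp_multiplier <- function(x) {
  codes <- tolower(as.character(x))  # as.character() also handles factor input
  mult <- unname(exp_lookup[codes])  # unmatched codes come back as NA
  mult[is.na(mult)] <- 1
  mult
}

# Toy values for illustration only; not taken from the NOAA data.
toy <- data.frame(DMG = c(25, 2.5, 1, 3), EXP = c("K", "m", "", "?"))
toy$DMG * exp_multiplier(toy$EXP)
```

Keeping the mapping in one named vector makes it easy to inspect and extend, and the function always returns a vector of the same length as its input.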
Results
Population Health
This section attempts to address the following question:
Across the United States, which types of events are most harmful with respect to population health?
tmp <- tapply(noaa$FATALITIES + noaa$INJURIES, noaa$EVTYPE, sum)
ph <- data.frame(EVTYPE = names(tmp), COUNT = tmp)
ph <- ph[order(ph$COUNT, decreasing = TRUE), ]
ph$EVTYPE <- factor(as.character(ph$EVTYPE), levels = rev(as.character(ph$EVTYPE)), ordered = TRUE)
rownames(ph) <- NULL
head(ph, 10)
EVTYPE COUNT
1 TORNADO 96979
2 EXCESSIVE HEAT 8428
3 TSTM WIND 7461
4 FLOOD 7259
5 LIGHTNING 6046
6 HEAT 3037
7 FLASH FLOOD 2755
8 ICE STORM 2064
9 THUNDERSTORM WIND 1621
10 WINTER STORM 152
library(ggplot2)
ggplot(data = ph[1:10, ], aes(x = EVTYPE, y = COUNT)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Event Types", y = "Fatalities and Injuries") +
  labs(title = "Fatalities and Injuries Caused by Top 10 Harmful Event Types")
From the plot, we can see that the three most harmful event types with respect to population health are probably:
TORNADO, EXCESSIVE HEAT, TSTM WIND.
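To make the aggregation step concrete, here is a small sketch of the same sum-by-group-and-sort pattern on made-up rows. The data frame `toy` and its values are invented for illustration and are unrelated to the NOAA totals above.

```r
# Made-up rows for illustration only; not taken from the NOAA data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 4, 1),
  INJURIES   = c(10, 2, 5, 0, 0)
)

# Sum fatalities and injuries within each event type.
totals <- tapply(toy$FATALITIES + toy$INJURIES, toy$EVTYPE, sum)

# Sort so that the most harmful event type comes first.
ranked <- sort(totals, decreasing = TRUE)
ranked
```

Here `tapply()` returns one total per event type, named by EVTYPE, so sorting the result directly gives the ranking without building an intermediate data frame.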
Economic Consequences
This section attempts to address the following question:
Across the United States, which types of events have the greatest economic consequences?
tmp <- tapply(noaa$PROPDMG + noaa$CROPDMG, noaa$EVTYPE, sum)
ec <- data.frame(EVTYPE = names(tmp), COUNT = tmp)
ec <- ec[order(ec$COUNT, decreasing = TRUE), ]
ec$EVTYPE <- factor(as.character(ec$EVTYPE), levels = rev(as.character(ec$EVTYPE)), ordered = TRUE)
rownames(ec) <- NULL
head(ec, 10)
EVTYPE COUNT
1 TORNADO 3.312e+15
2 FLASH FLOOD 1.599e+15
3 TSTM WIND 1.445e+15
4 HAIL 1.268e+15
5 FLOOD 1.068e+15
6 THUNDERSTORM WIND 9.436e+14
7 LIGHTNING 6.069e+14
8 THUNDERSTORM WINDS 4.650e+14
9 HIGH WIND 3.420e+14
10 WINTER STORM 1.347e+14
ggplot(data = ec[1:10, ], aes(x = EVTYPE, y = COUNT)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Event Types", y = "Property and Crop Damage") +
  labs(title = "Property and Crop Damage Caused by Top 10 Harmful Event Types")
From the plot, we can see that the three event types causing the most severe economic damage are probably:
TORNADO, FLASH FLOOD, TSTM WIND.