Analysis of the NOAA Storm Data

Synopsis

The purpose of this analysis is to explore the NOAA Storm Database and address the following two questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?

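It turns out that TORNADO, EXCESSIVE HEAT, and TSTM WIND are probably the most harmful event types with respect to population health, while TORNADO, FLASH FLOOD, and TSTM WIND show the largest combined property and crop damage totals in the results below.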

sessionInfo()

Data Processing

Environment

The analysis is conducted using R and RStudio running on Mac OS X 10.9.

Loading Data

This report analyzes the NOAA storm data, which can be downloaded here: Storm Data.

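# stringsAsFactors = TRUE keeps the text columns as factors, which levels() below relies on
# (R 4.0.0 and later default to FALSE).
noaa <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), stringsAsFactors = TRUE)
# Keep only the columns needed for the health and damage questions.
noaa <- noaa[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
str(noaa)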

Cleaning Data

We can see that two columns, PROPDMGEXP and CROPDMGEXP, must be processed before the actual analysis.

levels(noaa$PROPDMGEXP)

levels(noaa$CROPDMGEXP)

Here we replace the exponent codes with their corresponding numeric values and multiply PROPDMG and CROPDMG by them, respectively.

The codes "", "+", "-", and "?" are treated as 1e0 for simplicity.

expValue <- function(x) {
  # Default multiplier of 1 covers "", "+", "-", and "?".
  y <- rep(1, length(x))
  y[x == "0"] <- 1e0
  y[x == "1"] <- 1e1
  y[x == "2"] <- 1e2
  y[x == "3"] <- 1e3
  y[x == "4"] <- 1e4
  y[x == "5"] <- 1e5
  y[x == "6"] <- 1e6
  y[x == "7"] <- 1e7
  y[x == "8"] <- 1e8
  y[x == "9"] <- 1e9
  y[x == "h" | x == "H"] <- 1e2
  y[x == "k" | x == "K"] <- 1e3
  y[x == "m" | x == "M"] <- 1e6
  y[x == "b" | x == "B"] <- 1e9
  # Return the whole vector. Without this line the function returns the value
  # of its last assignment, i.e. 1e9 for every row.
  y
}
noaa$PROPDMG <- noaa$PROPDMG * expValue(noaa$PROPDMGEXP)
noaa$CROPDMG <- noaa$CROPDMG * expValue(noaa$CROPDMGEXP)
str(noaa)

The output below is from the recorded run, in which expValue() had no final y and therefore returned 1e9 for every row. The PROPDMG values shown here, and the damage totals in the Economic Consequences section, carry that uniform factor rather than the per-row exponent.

## 'data.frame': 902297 obs. of 7 variables:
##  $ EVTYPE    : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num 15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num 2.5e+10 2.5e+09 2.5e+10 2.5e+09 2.5e+09 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num 0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
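The exponent mapping described above can also be written as a named lookup vector, which keeps the code-to-multiplier table in one place. This is only a sketch under stated assumptions: `exp_lookup`, `exp_multiplier`, and the toy input `demo_exp` are hypothetical names introduced here for illustration and are not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): map exponent
# codes to numeric multipliers with a named lookup vector.
exp_lookup <- c(
  "0" = 1e0, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8, "9" = 1e9,
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

exp_multiplier <- function(x) {
  # Upper-case the codes so "k" and "K" share one entry.
  codes <- toupper(as.character(x))
  m <- unname(exp_lookup[codes])
  # Codes with no entry ("", "+", "-", "?") index to NA; treat them as 1.
  m[is.na(m)] <- 1
  m
}

# Toy input, made up for illustration.
demo_exp <- factor(c("K", "m", "", "?", "5", "B"))
demo_multiplier <- exp_multiplier(demo_exp)
print(demo_multiplier)
```

Under these assumptions the helper would be applied the same way as expValue(), for example `noaa$PROPDMG * exp_multiplier(noaa$PROPDMGEXP)`.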

Results

Population Health

This section attempts to address the following question:

Across the United States, which types of events are most harmful with respect to population health?

tmp <- tapply(noaa$FATALITIES + noaa$INJURIES, noaa$EVTYPE, sum)
ph <- data.frame(EVTYPE = names(tmp), COUNT = tmp)
ph <- ph[order(ph$COUNT, decreasing = TRUE), ]
ph$EVTYPE <- factor(as.character(ph$EVTYPE), levels = rev(as.character(ph$EVTYPE)), ordered = TRUE)
rownames(ph) <- NULL
head(ph, 10)

##               EVTYPE COUNT
## 1            TORNADO 96979
## 2     EXCESSIVE HEAT  8428
## 3          TSTM WIND  7461
## 4              FLOOD  7259
## 5          LIGHTNING  6046
## 6               HEAT  3037
## 7        FLASH FLOOD  2755
## 8          ICE STORM  2064
## 9  THUNDERSTORM WIND  1621
## 10      WINTER STORM   152
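The aggregation above (sum per event type, then sort in decreasing order) can be checked on a small example. The sketch below uses a made-up data frame `toy`, not the NOAA data, so the totals are easy to verify by hand. The names `toy`, `totals`, and `toy_ph` are hypothetical.

```r
# Toy data, made up for illustration; not taken from the NOAA file.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 1),
  INJURIES   = c(10, 4, 5, 0, 2)
)

# Sum fatalities plus injuries within each event type.
totals <- tapply(toy$FATALITIES + toy$INJURIES, toy$EVTYPE, sum)

# Convert the named result to a data frame and sort, largest first.
toy_ph <- data.frame(EVTYPE = names(totals), COUNT = as.vector(totals))
toy_ph <- toy_ph[order(toy_ph$COUNT, decreasing = TRUE), ]
rownames(toy_ph) <- NULL
print(toy_ph)
```

With this toy input, TORNADO totals 18, FLOOD 7, and HEAT 3, so the sorted frame lists them in that order.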

library(ggplot2)
ggplot(data = ph[1:10, ], aes(x = EVTYPE, y = COUNT)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Event Types", y = "Fatalities and Injuries") +
  labs(title = "Fatalities and Injuries Caused by Top 10 Harmful Event Types")

From the plot, we can see that the three most harmful event types with respect to population health are probably:

TORNADO, EXCESSIVE HEAT, TSTM WIND.
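The plotting code relies on the factor level order of EVTYPE: ggplot2 places discrete values along the axis in level order, and after coord_flip() the first level sits at the bottom, so reversing the levels of a frame sorted in decreasing order should put the largest bar at the top. A minimal sketch of that reordering step, using a made-up frame `ranked` (a hypothetical name) and no plotting:

```r
# Toy ranking, made up for illustration; already sorted largest first.
ranked <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HEAT"),
  COUNT  = c(18, 7, 3)
)

# Reverse the level order so the largest value becomes the last level.
ranked$EVTYPE <- factor(
  as.character(ranked$EVTYPE),
  levels = rev(as.character(ranked$EVTYPE)),
  ordered = TRUE
)
print(levels(ranked$EVTYPE))
```

Here the levels come out as HEAT, FLOOD, TORNADO, the reverse of the row order.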

Economic Consequences

This section attempts to address the following question:

Across the United States, which types of events have the greatest economic consequences?

tmp <- tapply(noaa$PROPDMG + noaa$CROPDMG, noaa$EVTYPE, sum)
ec <- data.frame(EVTYPE = names(tmp), COUNT = tmp)
ec <- ec[order(ec$COUNT, decreasing = TRUE), ]
ec$EVTYPE <- factor(as.character(ec$EVTYPE), levels = rev(as.character(ec$EVTYPE)), ordered = TRUE)
rownames(ec) <- NULL
head(ec, 10)

##                EVTYPE     COUNT
## 1             TORNADO 3.312e+15
## 2         FLASH FLOOD 1.599e+15
## 3           TSTM WIND 1.445e+15
## 4                HAIL 1.268e+15
## 5               FLOOD 1.068e+15
## 6   THUNDERSTORM WIND 9.436e+14
## 7           LIGHTNING 6.069e+14
## 8  THUNDERSTORM WINDS 4.650e+14
## 9           HIGH WIND 3.420e+14
## 10       WINTER STORM 1.347e+14

ggplot(data = ec[1:10, ], aes(x = EVTYPE, y = COUNT)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Event Types", y = "Property and Crop Damage") +
  labs(title = "Property and Crop Damage Caused by Top 10 Harmful Event Types")

From the plot, the three event types causing the most severe economic damage are probably:

TORNADO, FLASH FLOOD, TSTM WIND.