Synopsis

This analysis explores severe weather events in the United States using data from the NOAA storm database. Two specific questions are explored:
1. What are the most harmful events for population health?
2. Which are the events with greatest economic consequences?
The relevant data are aggregated by event type: fatalities and injuries for harm to population health, and property damage and crop damage for economic impact.
Based on the analysis, the three most harmful events for population health are:
1. Tornado
2. Excessive heat
3. TSTM Wind
Based on the analysis, the three events with the greatest economic consequences are:
1. Flood
2. Hurricane/Typhoon
3. Storm surge

Loading and Processing the Raw Data

While this project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, the data for this assignment comes from the course web site.

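fUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# setInternet2() was only needed for https downloads on older Windows builds of R;
# current R versions handle https directly, so the call is omitted here
if (!file.exists("Stormdata.csv.bz2")) {
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(fUrl, "./Stormdata.csv.bz2", mode = "wb")
}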

Reading the file

stormdata <- read.csv("./Stormdata.csv.bz2", stringsAsFactors = FALSE)

Results

1. Most harmful events for population health

The first question to be explored is:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

For this question, we’ll limit our exploration to the relevant variables: EVTYPE, FATALITIES and INJURIES. Since both fatalities and injuries are harmful to population health, we’ll compute the combined total of these two variables for each type of weather event, sort the totals in descending order and look at the ten most harmful event types.

# Limit exploration to the relevant variables
harmful <- stormdata[,c("EVTYPE","FATALITIES","INJURIES")]
harmful <- aggregate(harmful$FATALITIES + harmful$INJURIES,
    by=list(harmful$EVTYPE),FUN=sum)
names(harmful) <- c("EVTYPE","HARMCOUNT")
# Sort in descending order
harmful <- harmful[order(harmful$HARMCOUNT,decreasing = TRUE),]
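To make the aggregate-then-sort step concrete, here is a minimal sketch on a small made-up data frame. The values and the names toy and toy_harm are hypothetical and are not taken from the storm database; only the column names follow the source.

```r
# Toy data (hypothetical values, not from the NOAA storm database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 1),
  INJURIES   = c(10, 4, 5, 0, 2),
  stringsAsFactors = FALSE
)

# Add fatalities and injuries row by row, then total them per event type
toy_harm <- aggregate(toy$FATALITIES + toy$INJURIES,
                      by = list(toy$EVTYPE), FUN = sum)
names(toy_harm) <- c("EVTYPE", "HARMCOUNT")

# Sort so that the most harmful event type comes first
toy_harm <- toy_harm[order(toy_harm$HARMCOUNT, decreasing = TRUE), ]
print(toy_harm)
```

In this toy input TORNADO totals 18 (12 + 6), FLOOD 7 and HEAT 3, so TORNADO ends up in the first row.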

Let us plot the top 10 event types which are harmful to population health.

# Subsetting to top 10 values for plotting
harmful.top10 <- harmful[1:10,]
par(mar=c(6,6,4,2))
barplot(harmful.top10$HARMCOUNT,names.arg = harmful.top10$EVTYPE,
    cex.axis = 0.6,cex.names = 0.5, las = 2)

[Figure: bar plot of the top 10 event types by combined fatalities and injuries (chunk top10harmful)]

As the plot above shows, the top 10 most harmful event types with respect to population health are:
1. TORNADO
2. EXCESSIVE HEAT
3. TSTM WIND
4. FLOOD
5. LIGHTNING
6. HEAT
7. FLASH FLOOD
8. ICE STORM
9. THUNDERSTORM WIND
10. WINTER STORM

2. Events with maximum economic impact

The next question to be explored is:

Across the United States, which types of events have the greatest economic consequences?

For this question, we’ll limit our exploration to the relevant variables: EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

economic <- stormdata[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

To get the dollar values of the damage, the -DMG variables must be multiplied by the multipliers encoded in the corresponding exponent variables, i.e. the -DMGEXP values. Let’s look at what the data in the -DMGEXP variables look like.

#Property damage exponent values
propdmgexp <- economic$PROPDMGEXP
unique(propdmgexp)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
#Crop damage exponent values
cropdmgexp <- economic$CROPDMGEXP
unique(cropdmgexp)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

The values above show that the exponents must be recoded from letter codes to the corresponding numeric multipliers, e.g. 10^3 (1000) for K and 10^6 for M. Digits are read as powers of ten (e.g. 5 becomes 10^5), and unknown values such as ?, + or - are recoded as 1.

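library(car) # provides Recode()
# Note: 'm' and '1' occur in PROPDMGEXP but have no rule below; 'm' is left
# unchanged by Recode() and turns into NA in as.numeric(), hence the warning
propdmgexp <- as.numeric(Recode(as.character(propdmgexp),"'K'=10^3;
    'M'=10^6;''=1;'B'=10^9;'+'=1;'0'=1;'5'=10^5;'6'=10^6;
    '?'=1;'4'=10^4;'2'=10^2;'3'=10^3;'h'=10^2;'7'=10^7;
    'H'=10^2;'-'=1;'8'=10^8"))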
## Warning: NAs introduced by coercion
cropdmgexp <- as.numeric(Recode(as.character(cropdmgexp),"''=1;
    'M'=10^6;'K'=10^3;'m'=10^6;'B'=10^9;'?'=1;'0'=1;
    'k'=10^3;'2'=10^2"))
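The recoding described above can also be sketched in base R with a lookup table, which makes it easier to cover every code returned by unique(). The helper name exp_to_multiplier is hypothetical, and reading '1' as 10^1 is an assumption that extends the digit convention already used for '2' to '8'; treat this as one possible sketch rather than the method used in this analysis.

```r
# Hypothetical helper: map exponent codes to numeric multipliers
exp_to_multiplier <- function(codes) {
  # Treat lower- and upper-case letters alike ("k"/"K", "m"/"M", "h"/"H")
  codes <- toupper(as.character(codes))
  lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

  # Default multiplier of 1 covers "", "?", "+", "-" and "0"
  mult <- rep(1, length(codes))

  # Letter codes come from the lookup table
  is_letter <- codes %in% names(lookup)
  mult[is_letter] <- unname(lookup[codes[is_letter]])

  # Digits 1-9 are read as powers of ten (assumption for "1")
  is_digit <- codes %in% as.character(1:9)
  mult[is_digit] <- 10^as.numeric(codes[is_digit])

  mult
}

example_mult <- exp_to_multiplier(c("K", "m", "", "?", "5", "1", "B"))
print(example_mult)
```

Because every code falls through to a default of 1 unless it is explicitly matched, no value is left as NA, so as.numeric() is not needed and no coercion warning arises.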

Let’s now calculate the total economic damage as the sum of the property damage and the crop damage, each multiplied by its respective multiplier. As before, we’ll sort the totals in descending order and look at the top ten event types for economic impact.

economic$totaldmg <- economic$PROPDMG * propdmgexp + economic$CROPDMG * cropdmgexp
economic <- aggregate(economic$totaldmg, by=list(economic$EVTYPE),FUN=sum)
names(economic) <- c("EVTYPE","ECONCOUNT")
# Sort in descending order
economic <- economic[order(economic$ECONCOUNT,decreasing = TRUE),]
# Subsetting to top 10 values for plotting
economic.top10 <- economic[1:10,]
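As a small worked example of the same arithmetic, the sketch below uses made-up rows and hypothetical multiplier vectors (toy, prop_mult, crop_mult and totals are illustrative names, not objects from this analysis). It also shows what happens when a multiplier is NA, for instance because an exponent code was not recoded: sum() returns NA for that event type, and order() places it last.

```r
# Toy rows (hypothetical values, not from the NOAA storm database)
toy <- data.frame(
  EVTYPE  = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG = c(25, 2.5, 10, 3),
  CROPDMG = c(5, 0, 1, 0),
  stringsAsFactors = FALSE
)
# NA stands in for an exponent code that was not recoded
prop_mult <- c(1e3, 1e6, 1e3, NA)
crop_mult <- c(1e3, 1, 1e3, 1)

# Dollar damage per row, e.g. 25 * 1e3 + 5 * 1e3 = 30000 for the first row
toy$totaldmg <- toy$PROPDMG * prop_mult + toy$CROPDMG * crop_mult

# Total per event type; a single NA row makes the whole group total NA
totals <- aggregate(toy$totaldmg, by = list(toy$EVTYPE), FUN = sum)
names(totals) <- c("EVTYPE", "ECONCOUNT")

# order() puts NA totals at the end, regardless of their true size
totals <- totals[order(totals$ECONCOUNT, decreasing = TRUE), ]
print(totals)
```

Here FLOOD totals 2,530,000 while HAIL becomes NA and drops to the bottom, so any event type with an unrecoded exponent can silently fall out of a top-10 ranking.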

Let us now plot these top 10 event types which have the greatest economic consequences.

par(mar=c(6,6,4,2))
barplot(economic.top10$ECONCOUNT,names.arg = economic.top10$EVTYPE,
        cex.axis = 0.5,cex.names = 0.5, las = 2)

[Figure: bar plot of the top 10 event types by total property and crop damage (chunk top10economic)]

As the plot above shows, the top 10 event types with the greatest economic consequences are:
1. FLOOD
2. HURRICANE/TYPHOON
3. STORM SURGE
4. FLASH FLOOD
5. DROUGHT
6. HURRICANE
7. RIVER FLOOD
8. ICE STORM
9. TROPICAL STORM
10. WINTER STORM

# Clean-up the workspace
rm(fUrl,stormdata,harmful,harmful.top10,economic,
    propdmgexp,cropdmgexp,economic.top10)