Synopsis: Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. Using the NOAA Storm Database from 1950 to 2011, I analyze which types of events are most harmful to population health and which have the greatest economic consequences. TORNADO is the most harmful event type for public health, in terms of both fatalities and injuries. For economic damage, FLOOD causes the greatest property damage and DROUGHT causes the greatest crop damage. FLOOD also causes the greatest combined property and crop damage.

Questions to answer: 1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? 2. Across the United States, which types of events have the greatest economic consequences?

Data Processing:

Load packages

library(R.utils)
library(utils)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Download and read file

  1. Load data from NOAA Storm Database
  2. Subset the data to the variables related to population health and economic damage, name the result “subdata”, and save it in RDS format as a cache.
if (!file.exists("./repdata-data-StormData.csv")) {
     fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
     # save as .csv.bz2 so that bunzip2() writes ./repdata-data-StormData.csv
     download.file(url=fileURL, destfile="./repdata-data-StormData.csv.bz2", method="curl")
     bunzip2("./repdata-data-StormData.csv.bz2")
}

if (!file.exists("./subdata0.rds")){
    data <- read.csv("./repdata-data-StormData.csv")
    subdata <- subset(data, select=c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",    "CROPDMG", "CROPDMGEXP"))
    saveRDS(subdata, "./subdata0.rds")
}

subdata <- readRDS("./subdata0.rds")
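
As a side note on the download step, the explicit decompression can be skipped, because read.csv() opens a path through file(), which detects bzip2 compression when reading. The following is a minimal sketch on a small toy file, not on the real storm data; the data frame `toy` and the temporary path `bz_path` are made up for illustration.

```r
# Toy stand-in for the storm file (hypothetical values, not NOAA data)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
bz_path <- tempfile(fileext = ".csv.bz2")   # hypothetical temporary path

# Write a bzip2-compressed CSV through a bzfile() connection
con <- bzfile(bz_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read the compressed file directly; no bunzip2() call is needed
toy_read <- read.csv(bz_path, stringsAsFactors = FALSE)
print(toy_read)
```

Reading the compressed file directly avoids keeping an uncompressed copy on disk, at the cost of decompressing on every read, which is why caching the subset as RDS is still useful.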

Clean data:

  1. The variables in “subdata” related to harm to population health are “FATALITIES” and “INJURIES”. The variables in “subdata” related to economic cost are “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP”.
  2. First, I analyze harm to population health; the corresponding data are assigned to “totharm”.
  3. Sum FATALITIES and INJURIES for each individual event type.
totharm <- subset(subdata, select=c("EVTYPE", "FATALITIES", "INJURIES"))
totFATALITIES <- aggregate(FATALITIES~EVTYPE, data=totharm,  sum)
totINJURIES <- aggregate(INJURIES~EVTYPE, data=totharm,  sum)
totharm <- merge(totFATALITIES, totINJURIES)

Check the names of event types.

head(unique(totharm$EVTYPE))
## [1]    HIGH SURF ADVISORY  COASTAL FLOOD         FLASH FLOOD         
## [4]  LIGHTNING             TSTM WIND             TSTM WIND (G45)     
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

Consolidate these names so that harm can be summed by event type.

# consolidate event names (patterns are case-sensitive and applied in order)
totharm$EVTYPE <- gsub(".*FLOOD.*", "FLOOD",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*HIGH WINDS.*", "HIGH WINDS",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*HAIL.*", "HAIL",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*SNOW.*", "SNOW",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*LIGHTNING.*", "LIGHTNING",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*THUNDERSTORM WIND.*", "THUNDERSTORM WINDS",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*TORNADO.*", "TORNADO",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*TSTM WIND.*", "TSTM WIND",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*WATERSPOUT.*", "WATERSPOUT",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*WIND.*", "WIND",totharm$EVTYPE)
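
Because the substitutions run in sequence and `.*WIND.*` comes last, the intermediate labels HIGH WINDS, THUNDERSTORM WINDS and TSTM WIND all end up as WIND, which is why only WIND appears in the summary tables. The sketch below illustrates this on a handful of toy event names (the vector `ev` is illustrative and not drawn from the database).

```r
# Toy event names for illustration only
ev <- c("TSTM WIND (G45)", "THUNDERSTORM WINDS", "HIGH WINDS",
        "FLASH FLOOD", "COASTAL FLOOD")

ev <- gsub(".*FLOOD.*", "FLOOD", ev)          # any name containing FLOOD
ev <- gsub(".*TSTM WIND.*", "TSTM WIND", ev)  # intermediate label
ev <- gsub(".*WIND.*", "WIND", ev)            # broadest pattern, applied last

print(table(ev))
```

Note that the patterns are case-sensitive, so a name written in lower or mixed case would not be consolidated by this approach.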

# sum the total harm caused by each event type
totINJURIES <- aggregate(INJURIES~EVTYPE, data=totharm,  sum)     # number of injuries caused by each event type
totFATALITIES <- aggregate(FATALITIES~EVTYPE, data=totharm,  sum)     # number of fatalities caused by each event type
tot_FatInjHarm <- merge(totFATALITIES, totINJURIES)     # merge totFATALITIES and totINJURIES into a single data frame
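
The two aggregate() calls followed by merge() can also be written as a single call by putting both variables on the left-hand side of the formula with cbind(). This is a minimal sketch on toy data (the data frame `toy` and its values are made up), not a replacement for the code above.

```r
# Toy data for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "WIND"),
  FATALITIES = c(2, 1, 0, 4),
  INJURIES   = c(10, 5, 3, 0)
)

# One row per event type, with both totals computed in one pass
both <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(both)
```

One caveat: the formula interface drops rows with missing values, so with cbind() a row that is NA in either column is excluded from both sums, whereas the two separate calls would each drop only their own NA rows.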

Check the names of top events on INJURIES and FATALITIES.

head(arrange(tot_FatInjHarm, desc(INJURIES)))
##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5636    91407
## 2           WIND       1416    11398
## 3          FLOOD       1523     8603
## 4 EXCESSIVE HEAT       1903     6525
## 5      LIGHTNING        817     5232
## 6           HEAT        937     2100
head(arrange(tot_FatInjHarm, desc(FATALITIES)))
##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5636    91407
## 2 EXCESSIVE HEAT       1903     6525
## 3          FLOOD       1523     8603
## 4           WIND       1416    11398
## 5           HEAT        937     2100
## 6      LIGHTNING        817     5232

The most harmful events are “TORNADO”, “WIND”, “FLOOD” and “EXCESSIVE HEAT”. Plot them for comparison.

tot_FatInjHarm_top <- filter(tot_FatInjHarm, EVTYPE %in% c("TORNADO", "WIND", "FLOOD", "EXCESSIVE HEAT"))

For the second question, which types of events have the greatest economic consequences, I subset the damage-related variables into an object named “totdamage”. The variables “PROPDMGEXP” and “CROPDMGEXP” are the multipliers of “PROPDMG” and “CROPDMG” respectively: “K” means thousand, “M” means million, and “B” means billion.

totdamage <- subset(subdata, select=c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
totdamage <- subset(totdamage,
                    PROPDMGEXP %in% c("B", "h", "H", "K", "m", "M") |
                    CROPDMGEXP %in% c("B", "k", "K", "m", "M"),
                    select=c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))

I only convert amounts whose multiplier is an uppercase “K”, “M” or “B”, and express them in millions of dollars. Amounts with any other multiplier are counted as zero.

# in the unit of million dollars
totdamage$CROPDMGvalue <- totdamage$CROPDMG*0
totdamage[which(totdamage$CROPDMGEXP=="B"), "CROPDMGvalue"] <- totdamage[which(totdamage$CROPDMGEXP=="B"), "CROPDMG"]*1000
totdamage[which(totdamage$CROPDMGEXP=="M"), "CROPDMGvalue"] <- totdamage[which(totdamage$CROPDMGEXP=="M"), "CROPDMG"]
totdamage[which(totdamage$CROPDMGEXP=="K"), "CROPDMGvalue"] <- totdamage[which(totdamage$CROPDMGEXP=="K"), "CROPDMG"]*0.001

totdamage$PROPDMGvalue <- totdamage$PROPDMG*0
totdamage[which(totdamage$PROPDMGEXP=="B"), "PROPDMGvalue"] <- totdamage[which(totdamage$PROPDMGEXP=="B"), "PROPDMG"]*1000
totdamage[which(totdamage$PROPDMGEXP=="M"), "PROPDMGvalue"] <- totdamage[which(totdamage$PROPDMGEXP=="M"), "PROPDMG"]
totdamage[which(totdamage$PROPDMGEXP=="K"), "PROPDMGvalue"] <- totdamage[which(totdamage$PROPDMGEXP=="K"), "PROPDMG"]*0.001
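
The same conversion can be sketched more compactly with a named lookup vector. The example below uses toy values and a hypothetical lookup named `to_millions`; like the code above, it treats any multiplier other than uppercase K, M or B as zero.

```r
# Toy data for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10, 5),
  PROPDMGEXP = c("K", "M", "B", "m", ""),
  stringsAsFactors = FALSE
)

# Factor that converts each multiplier to millions of dollars
to_millions <- c(K = 0.001, M = 1, B = 1000)

# as.character() guards against the column being a factor;
# multipliers missing from the lookup come back as NA
factor_m <- unname(to_millions[as.character(toy$PROPDMGEXP)])
factor_m[is.na(factor_m)] <- 0   # unrecognized multipliers count as zero

toy$PROPDMGvalue <- toy$PROPDMG * factor_m
print(toy)
```

For instance, 25 with multiplier K is 25,000 dollars, or 0.025 million, and 1.2 with multiplier B is 1,200 million.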

Consolidate the event names so that damages can be summed by event type.

totdamage$EVTYPE  <- gsub(".*FLOOD.*", "FLOOD",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*HIGH WINDS.*", "HIGH WINDS",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*HAIL.*", "HAIL",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*SNOW.*", "SNOW",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*LIGHTNING.*", "LIGHTNING",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*THUNDERSTORM WIND.*", "THUNDERSTORM WINDS",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*TORNADO.*", "TORNADO",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*TSTM WIND.*", "TSTM WIND",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*WATERSPOUT.*", "WATERSPOUT",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*WIND.*", "WIND",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*HURRICANE.*", "HURRICANE",totdamage$EVTYPE )
totdamage$EVTYPE  <- gsub(".*STORM.*", "STORM",totdamage$EVTYPE )

Sum the PROPDMG and CROPDMG values for each individual event type.

totPROPDMGvalue <- aggregate(PROPDMGvalue~EVTYPE, data=totdamage,  sum)
totCROPDMGvalue <- aggregate(CROPDMGvalue~EVTYPE, data=totdamage,  sum)

tot_PROPCROPDMG <- merge(totPROPDMGvalue, totCROPDMGvalue)

Check the event types that cause the most damage.

head(arrange(tot_PROPCROPDMG, desc(PROPDMGvalue)))
##      EVTYPE PROPDMGvalue CROPDMGvalue
## 1     FLOOD    167379.14   12352.0791
## 2 HURRICANE     84636.18    5495.2928
## 3     STORM     67528.05    5766.6135
## 4   TORNADO     56981.60     414.9614
## 5      HAIL     17615.09    3113.7958
## 6      WIND     16084.38    1979.1151
head(arrange(tot_PROPCROPDMG, desc(CROPDMGvalue)))
##      EVTYPE PROPDMGvalue CROPDMGvalue
## 1   DROUGHT     1046.106    13972.566
## 2     FLOOD   167379.144    12352.079
## 3     STORM    67528.052     5766.614
## 4 HURRICANE    84636.180     5495.293
## 5      HAIL    17615.091     3113.796
## 6      WIND    16084.377     1979.115

Pick “DROUGHT”, “FLOOD”, “STORM”, “HURRICANE” and “TORNADO” as the top sources of damage.

tot_PROPCROPDMG_top <- filter(tot_PROPCROPDMG, EVTYPE== "DROUGHT" | EVTYPE== "FLOOD"| EVTYPE== "STORM"| EVTYPE== "HURRICANE"| EVTYPE== "TORNADO")
tot_PROPCROPDMG_top$totdamage <- tot_PROPCROPDMG_top$PROPDMGvalue+tot_PROPCROPDMG_top$CROPDMGvalue
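
The hand-picked list can also be derived programmatically. As a minimal sketch, the five event types above coincide with the union of the top four by property damage and the top four by crop damage. The data frame `dmg` below is retyped from the two tables shown earlier (rounded to two decimals), and `top_n` is an assumed cutoff, not something fixed by the analysis.

```r
# Values in millions of dollars, copied from the tables above (rounded)
dmg <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE", "STORM", "TORNADO", "HAIL", "WIND", "DROUGHT"),
  PROPDMGvalue = c(167379.14, 84636.18, 67528.05, 56981.60, 17615.09, 16084.38, 1046.11),
  CROPDMGvalue = c(12352.08, 5495.29, 5766.61, 414.96, 3113.80, 1979.12, 13972.57),
  stringsAsFactors = FALSE
)

top_n <- 4  # assumed cutoff per damage column

# Event types ranked by each damage column, keeping the first top_n
top_prop <- dmg$EVTYPE[order(dmg$PROPDMGvalue, decreasing = TRUE)][seq_len(top_n)]
top_crop <- dmg$EVTYPE[order(dmg$CROPDMGvalue, decreasing = TRUE)][seq_len(top_n)]
keep <- union(top_prop, top_crop)

dmg_top <- dmg[dmg$EVTYPE %in% keep, ]
dmg_top$totdamage <- dmg_top$PROPDMGvalue + dmg_top$CROPDMGvalue
print(dmg_top[order(dmg_top$totdamage, decreasing = TRUE), ])
```

Ranking by combined damage alone would give a different list: HAIL would replace DROUGHT, because DROUGHT is large only in crop damage.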

RESULTS:

The event types most harmful to population health are plotted below.

par(mfrow=c(1,2))
barplot(tot_FatInjHarm_top$INJURIES, col="blue", xlab="Event Type", ylab="Population", main="INJURIES", names.arg=tot_FatInjHarm_top$EVTYPE, cex.names=0.4)
barplot(tot_FatInjHarm_top$FATALITIES, col="red", xlab="Event Type", ylab="Population", main="FATALITIES", names.arg=tot_FatInjHarm_top$EVTYPE, cex.names=0.4)

For both INJURIES and FATALITIES, TORNADO is the most harmful event type for population health.

For economic damage, the totals for PROPDMG, CROPDMG, and their sum are listed below, in millions of dollars.

arrange(tot_PROPCROPDMG_top, desc(totdamage))
##      EVTYPE PROPDMGvalue CROPDMGvalue totdamage
## 1     FLOOD   167379.144   12352.0791 179731.22
## 2 HURRICANE    84636.180    5495.2928  90131.47
## 3     STORM    67528.052    5766.6135  73294.67
## 4   TORNADO    56981.598     414.9614  57396.56
## 5   DROUGHT     1046.106   13972.5660  15018.67

FLOOD causes the greatest property damage. DROUGHT causes the greatest crop damage. FLOOD also causes the greatest combined property and crop damage.