Storm Data analysis for Peer Assessment 2

Synopsis

In this report I explore data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. I try to determine which types of severe weather events are most harmful to the population and to property in the affected area. The report consists of two parts. First, I obtain the data and transform it into the form most convenient for analysis. Second, I show what consequences the various types of events lead to.

Data Processing

First, we get the data from its source. This can be done with the following commands:

require(plyr)
require(car)
require(ggplot2)
require(reshape2)
## Use this if you don't have the data in your working directory:
## url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
## download.file(url, paste0(getwd(), '/StormData.bz2'), method = 'auto')
## Adjust method = '' depending on your OS.
stormData <- read.csv("StormData.bz2")
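
The commented-out download step can also be wrapped in a small guard so that the archive is fetched only when it is missing. This is a minimal sketch: `fetch_if_missing` is a hypothetical helper name that does not appear in the original analysis, and the last two calls are left commented out because they need network access and the full file.

```r
# Hypothetical helper (not part of the original report): download the archive
# only when it is not already present at 'destfile'.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed archive intact on Windows
    download.file(url, destfile, mode = "wb")
  }
  invisible(destfile)
}

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# fetch_if_missing(url, "StormData.bz2")
# stormData <- read.csv("StormData.bz2")
```

Because read.csv() reads bz2-compressed files directly, no separate decompression step is needed.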

Below is a quick summary of the data:

str(stormData)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

For the purposes of this analysis, we need only some of the columns:

cols <- which(names(stormData) %in% c("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES", 
    "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
stormData <- stormData[, cols]

Since the amounts of property and crop damage are recorded in different units (stored in PROPDMGEXP and CROPDMGEXP respectively), we need to convert all damage to a common unit:

stormData$FATALITIES <- as.numeric(stormData$FATALITIES)
stormData$EVTYPE <- as.character(stormData$EVTYPE)
stormData$INJURIES <- as.numeric(stormData$INJURIES)
stormData$PROPDMG <- as.numeric(stormData$PROPDMG)
stormData$CROPDMG <- as.numeric(stormData$CROPDMG)
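# Map each exponent code to a multiplier. The two positional arguments after the
# rules ask recode() for a non-factor, numeric result.
# Note: PROPDMGEXP has no rule for '' (no exponent), so as.numeric() turns it into NA.
stormData$PROPDMGEXP <- as.numeric(recode(stormData$PROPDMGEXP, "'0'=1;'1'=10;'2'=100;'3'=1000;'4'=10000;'5'=100000;'6'=1000000;'7'=10000000;'8'=100000000;'B'=1000000000;'h'=100;'H'=100;'K'=1000;'m'=1000000;'M'=1000000;'-'=0;'?'=0;'+'=0;", 
    FALSE, TRUE))
stormData$CROPDMGEXP <- as.numeric(recode(stormData$CROPDMGEXP, "'0'=1;'2'=100;'B'=1000000000;'k'=1000;'K'=1000;'m'=1000000;'M'=1000000;''=0;'?'=0;", 
    FALSE, TRUE))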
stormData$CROPDMG.U <- stormData$CROPDMG * stormData$CROPDMGEXP
stormData$PROPDMG.U <- stormData$PROPDMG * stormData$PROPDMGEXP
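
The conversion above can also be expressed as a lookup, which may be easier to check by eye. The following is only a sketch and not the code used for the results in this report: `exp_multiplier` is a hypothetical helper, and treating every unrecognised code (including the empty string) as a multiplier of 0 is an assumption that mirrors the CROPDMGEXP rules.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
# Letters: H = 1e2, K = 1e3, M = 1e6, B = 1e9 (case-insensitive).
# Single digits n: 10^n. Anything else ('', '-', '?', '+'): 0 (assumption).
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(code))
  out <- unname(lookup[code])                      # NA where no letter matches
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- 10^as.numeric(code[is_digit])   # digit codes become 10^n
  out[is.na(out)] <- 0                             # unrecognised codes
  out
}

# Illustrative values only, not taken from the NOAA file
damage <- c(25, 2.5, 1.2)
codes  <- c("K", "M", "B")
damage * exp_multiplier(codes)
```

The design choice to make here is what an unrecognised code means: mapping it to 0 discards the recorded amount, whereas mapping it to 1 would keep the amount as plain dollars.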

Finally, we aggregate the data by event type, summing the damage (to property and crops) and the harm to people (fatalities and injuries):

stormData.byType <- aggregate(. ~ EVTYPE, data = stormData, sum, na.rm = TRUE)
# Two variables, ECONDMG and SOCDMG, represent total economic and social
# damage by event type
stormData.byType <- mutate(stormData.byType, ECONDMG = PROPDMG.U + CROPDMG.U)
stormData.byType <- mutate(stormData.byType, SOCDMG = FATALITIES + INJURIES)
# Top 7 event types by each measure
topEcon <- stormData.byType[order(stormData.byType$ECONDMG, decreasing = TRUE), ][1:7, ]
topSoc <- stormData.byType[order(stormData.byType$SOCDMG, decreasing = TRUE), ][1:7, ]
# Combine both rankings and drop event types that appear in both
top <- rbind(topSoc, topEcon)
top <- top[!duplicated(top), ]
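
To make the aggregate-then-rank step easier to follow, here is a minimal sketch on a toy data frame. The values are made up for illustration and do not come from the NOAA file, and `top_n_by` is a hypothetical helper name.

```r
# Toy data with made-up values, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0, 3),
  INJURIES   = c(10, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)

# Sum both columns within each event type
byType <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
byType$SOCDMG <- byType$FATALITIES + byType$INJURIES

# Hypothetical helper: keep the n rows with the largest value in 'column'
top_n_by <- function(df, column, n) {
  head(df[order(df[[column]], decreasing = TRUE), ], n)
}

top_n_by(byType, "SOCDMG", 2)
```

One detail worth keeping in mind: the formula interface of aggregate() uses na.action = na.omit by default, so any row with an NA in one of the variables is dropped before the sums are taken.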

Results

Events that are most harmful in the economic sense

In the data, economic consequences are captured by two variables: property damage and crop damage. The figure below shows bar plots of the respective damage for the top-ranked event types (the top 7 by economic damage combined with the top 7 by casualties):

# Melt data for easy plot building
moltenTop <- melt(top, id.vars = c("EVTYPE"), measure.vars = c("FATALITIES", 
    "INJURIES", "CROPDMG.U", "PROPDMG.U", "ECONDMG", "SOCDMG"))
Econ <- moltenTop[moltenTop$variable %in% c("PROPDMG.U", "CROPDMG.U", "ECONDMG"), ]
Econ <- Econ[order(Econ$variable, Econ$value, decreasing = TRUE), ]
# The remaining levels are CROPDMG.U, PROPDMG.U, ECONDMG, in that order
Econ$variable <- factor(Econ$variable, labels = c("Crop", "Property", "Total"))
ggplot(Econ, aes(x = EVTYPE, y = value, fill = EVTYPE)) +
    facet_grid(variable ~ ., scales = "free_y") +
    geom_bar(stat = "identity") +
    labs(title = "Damage to property and crops in $", x = "type of event", y = "Dollar amount") +
    guides(fill = "none") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

[Figure: bar plots of crop, property, and total damage in $ by type of event]

The figure shows that the type of event most harmful to property is flood, followed by hurricane. It should also be noted that the various types of events differ in how destructive they are to property and to crops.

Events that are most harmful in the social sense

We have data on the number of fatalities and injuries that follow each type of severe weather event. The following plots show the casualties for the same set of top-ranked event types:

Soc <- moltenTop[moltenTop$variable %in% c("FATALITIES", "INJURIES", "SOCDMG"), ]
Soc <- Soc[order(Soc$variable, Soc$value, decreasing = TRUE), ]
# The remaining levels are FATALITIES, INJURIES, SOCDMG, in that order
Soc$variable <- factor(Soc$variable, labels = c("Lethality", "Injuries", "Total"))
ggplot(Soc, aes(x = EVTYPE, y = value, fill = EVTYPE)) +
    facet_grid(variable ~ ., scales = "free_y") +
    geom_bar(stat = "identity") +
    labs(title = "Number of casualties in the population", x = "type of event", y = "Casualty amount") +
    guides(fill = "none") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

[Figure: bar plots of fatalities, injuries, and total casualties by type of event]

From these figures it is apparent that most of the harm to the population comes from tornado events.

Conclusion

The results show that the most harmful events in the economic sense are:

  1. Flood;
  2. Hurricane;
  3. Tornado.

The events that cause the most harm to the population are:

  1. Tornado;
  2. Flood;
  3. TSTM Wind Gust.