Reproducible Research Peer Assignment 2

Tornadoes, thunderstorm wind, floods and excessive heat are the most harmful events to human health, while floods, hurricanes, tornadoes, storm surge and hail have the greatest economic consequences.

by Leandro Jimenez (Dec, 2015)

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb]

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Questions

Your data analysis must address the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?

Consider writing your report as if it were to be read by a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events. However, there is no need to make any specific recommendations in your report.

Synopsis

In this assignment, we analyze data on natural events from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We first read the data and clean up the event types with reference to the code book. We then aggregate fatalities, injuries, property damage and crop damage by event type using the aggregate function. Tables and figures summarize which events are most harmful to human health and which cause the most damage to property and crops. The results show that tornadoes, thunderstorm wind, floods and excessive heat are the most harmful events to human health, while floods, hurricanes, tornadoes, storm surge and hail have the greatest economic consequences.

Loading and preprocessing

Load the required packages and read the compressed data file

require(data.table)
## Loading required package: data.table
require(gridExtra)
## Loading required package: gridExtra
require(ggplot2)
## Loading required package: ggplot2
require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:data.table':
## 
##     between, last
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(plyr)
## Loading required package: plyr
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
## 
## Attaching package: 'plyr'
## 
## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
# read.csv2() assumes a decimal comma, so values such as "1.00" are read as character
csv <- bzfile("./repdata_data_StormData.csv.bz2","repdata_data_StormData.csv")
stormdata <- read.csv2(csv, sep = ",", stringsAsFactors = FALSE)
unlink(csv)
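
The read step above uses read.csv2, whose default decimal mark is a comma. With a comma as field separator as well, values written with a decimal point come in as character. As a minimal sketch of an alternative, read.csv uses a decimal point and can read a bzip2-compressed file directly. The names storm_file, read_storm and stormdata_num are illustrative and not part of the original analysis, and the file is assumed to sit in the working directory.

```r
# Minimal sketch: read the bzip2-compressed CSV with a decimal point.
# Assumption: the archive sits in the working directory under this name.
storm_file <- "repdata_data_StormData.csv.bz2"

# Hypothetical helper, not part of the original analysis.
read_storm <- function(path) {
    # read.csv() uses dec = "." and reads compressed files transparently,
    # so columns holding values like "25.00" are parsed as numeric.
    read.csv(path, stringsAsFactors = FALSE)
}

# Only run on the real data when the file is actually present.
if (file.exists(storm_file)) {
    stormdata_num <- read_storm(storm_file)
    str(stormdata_num[, c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")])
}
```

With this variant the later as.numeric conversions would become unnecessary, although they remain harmless.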

Take a quick look at the data with str, summary and head

str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : chr  "1.00" "1.00" "1.00" "1.00" ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : chr  "97.00" "3.00" "57.00" "89.00" ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : chr  "14.00" "2.00" "0.10" "0.00" ...
##  $ WIDTH     : chr  "100.00" "150.00" "123.00" "100.00" ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ FATALITIES: chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ INJURIES  : chr  "15.00" "0.00" "2.00" "2.00" ...
##  $ PROPDMG   : chr  "25.00" "2.50" "25.00" "2.50" ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : chr  "0.00" "0.00" "0.00" "0.00" ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : chr  "3040.00" "3042.00" "3340.00" "3458.00" ...
##  $ LONGITUDE : chr  "8812.00" "8755.00" "8742.00" "8626.00" ...
##  $ LATITUDE_E: chr  "3051.00" "0.00" "0.00" "0.00" ...
##  $ LONGITUDE_: chr  "8806.00" "0.00" "0.00" "0.00" ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : chr  "1.00" "2.00" "3.00" "4.00" ...
summary(stormdata)
##    STATE__            BGN_DATE           BGN_TIME        
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   TIME_ZONE            COUNTY           COUNTYNAME       
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##     STATE              EVTYPE           BGN_RANGE        
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##    BGN_AZI           BGN_LOCATI          END_DATE        
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##    END_TIME          COUNTY_END        COUNTYENDN      END_RANGE        
##  Length:902297      Length:902297      Mode:logical   Length:902297     
##  Class :character   Class :character   NA's:902297    Class :character  
##  Mode  :character   Mode  :character                  Mode  :character  
##                                                                         
##                                                                         
##                                                                         
##                                                                         
##    END_AZI           END_LOCATI           LENGTH         
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##     WIDTH                 F              MAG             FATALITIES       
##  Length:902297      Min.   :0.0      Length:902297      Length:902297     
##  Class :character   1st Qu.:0.0      Class :character   Class :character  
##  Mode  :character   Median :1.0      Mode  :character   Mode  :character  
##                     Mean   :0.9                                           
##                     3rd Qu.:1.0                                           
##                     Max.   :5.0                                           
##                     NA's   :843563                                        
##    INJURIES           PROPDMG           PROPDMGEXP       
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##    CROPDMG           CROPDMGEXP            WFO           
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   STATEOFFIC         ZONENAMES           LATITUDE        
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##   LONGITUDE          LATITUDE_E         LONGITUDE_       
##  Length:902297      Length:902297      Length:902297     
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
##                                                          
##    REMARKS             REFNUM         
##  Length:902297      Length:902297     
##  Class :character   Class :character  
##  Mode  :character   Mode  :character  
##                                       
##                                       
##                                       
## 
head(stormdata,n=3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1    1.00 4/18/1950 0:00:00     0130       CST  97.00     MOBILE    AL
## 2    1.00 4/18/1950 0:00:00     0145       CST   3.00    BALDWIN    AL
## 3    1.00 2/20/1951 0:00:00     1600       CST  57.00    FAYETTE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO      0.00                                            0.00
## 2 TORNADO      0.00                                            0.00
## 3 TORNADO      0.00                                            0.00
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH  WIDTH F  MAG FATALITIES
## 1         NA      0.00                     14.00 100.00 3 0.00       0.00
## 2         NA      0.00                      2.00 150.00 2 0.00       0.00
## 3         NA      0.00                      0.10 123.00 2 0.00       0.00
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1    15.00   25.00          K    0.00                                    
## 2     0.00    2.50          K    0.00                                    
## 3     2.00   25.00          K    0.00                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1  3040.00   8812.00    3051.00    8806.00           1.00
## 2  3042.00   8755.00       0.00       0.00           2.00
## 3  3340.00   8742.00       0.00       0.00           3.00

Data processing

Convert fatalities and injuries to numeric. Then convert the event types (EVTYPE) to upper case, store them as a factor, and show some of the distinct values.

stormdata$FATALITIES <- as.numeric(stormdata$FATALITIES)
stormdata$INJURIES <- as.numeric(stormdata$INJURIES)
##EVTYPE
stormdata$EVTYPE <- toupper(stormdata$EVTYPE)
eventtype <- sort(unique(stormdata$EVTYPE))
stormdata$EVTYPE <- as.factor(stormdata$EVTYPE)
## Show some event types
eventtype[1:30]
##  [1] "?"                              "ABNORMALLY DRY"                
##  [3] "ABNORMALLY WET"                 "ABNORMAL WARMTH"               
##  [5] "ACCUMULATED SNOWFALL"           "AGRICULTURAL FREEZE"           
##  [7] "APACHE COUNTY"                  "ASTRONOMICAL HIGH TIDE"        
##  [9] "ASTRONOMICAL LOW TIDE"          "AVALANCE"                      
## [11] "AVALANCHE"                      "BEACH EROSIN"                  
## [13] "BEACH EROSION"                  "BEACH EROSION/COASTAL FLOOD"   
## [15] "BEACH FLOOD"                    "BELOW NORMAL PRECIPITATION"    
## [17] "BITTER WIND CHILL"              "BITTER WIND CHILL TEMPERATURES"
## [19] "BLACK ICE"                      "BLIZZARD"                      
## [21] "BLIZZARD AND EXTREME WIND CHIL" "BLIZZARD AND HEAVY SNOW"       
## [23] "BLIZZARD/FREEZING RAIN"         "BLIZZARD/HEAVY SNOW"           
## [25] "BLIZZARD/HIGH WIND"             "BLIZZARD SUMMARY"              
## [27] "BLIZZARD WEATHER"               "BLIZZARD/WINTER STORM"         
## [29] "BLOWING DUST"                   "BLOWING SNOW"
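
The list above contains misspelled variants of the same event, such as "AVALANCE" next to "AVALANCHE" and "BEACH EROSIN" next to "BEACH EROSION". A light normalisation step could merge such labels before aggregating. The sketch below is only an illustration: normalize_evtype is a hypothetical helper, and its corrections table covers just the two misspellings visible above, not an official NOAA mapping.

```r
# Hypothetical helper: light normalisation of event-type labels.
# The corrections table is illustrative, not an official NOAA mapping.
normalize_evtype <- function(x) {
    x <- toupper(trimws(x))          # drop outer whitespace, unify case
    x <- gsub("\\s+", " ", x)        # collapse repeated inner whitespace
    corrections <- c("AVALANCE"     = "AVALANCHE",
                     "BEACH EROSIN" = "BEACH EROSION")
    hit <- x %in% names(corrections)
    x[hit] <- unname(corrections[x[hit]])
    x
}

normalize_evtype(c(" avalance", "Beach  Erosin", "TORNADO"))
```

Applied to stormdata$EVTYPE before the conversion to a factor, such a step would fold the corrected labels into a single level each.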

Results

All lethal events

fatalities <- as.data.table(subset(aggregate(FATALITIES ~ EVTYPE, data = stormdata, 
    FUN = "sum"), FATALITIES > 0))
fatalities <- fatalities[order(-FATALITIES), ]

# keep the five event types with the most fatalities

top5 <- fatalities[1:5, ]
library(ggplot2)
ggplot(data = top5, aes(EVTYPE, FATALITIES, fill = FATALITIES)) + geom_bar(stat = "identity") + xlab("Event") + ylab("Fatalities") + ggtitle("Fatalities caused by Events (top 5) ") + theme(legend.position = "right")

All events with injuries

injuries <- as.data.table(subset(aggregate(INJURIES ~ EVTYPE, data = stormdata, 
    FUN = "sum"), INJURIES > 0))
injuries <- injuries[order(-INJURIES), ]
# keep the five event types with the most injuries
top5i <- injuries[1:5, ]
ggplot(data = top5i, aes(EVTYPE, INJURIES, fill = INJURIES)) + geom_bar(stat = "identity") + xlab("Event") + ylab("Injuries") + ggtitle("Injuries caused by Events (top 5) ") +  theme(legend.position = "right")
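
Fatalities and injuries are aggregated separately above. As a possible variation, both outcomes can be summed per event type in one aggregate call, which gives a single table for comparing them. The sketch runs on a small made-up data frame (toy) so that it is self-contained. The values are invented for illustration and are not taken from the storm database.

```r
# Toy data standing in for stormdata (values are made up for illustration).
toy <- data.frame(
    EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HAIL"),
    FATALITIES = c(2, 1, 0, 4, 0),
    INJURIES   = c(10, 5, 3, 0, 0)
)

# Sum both outcomes per event type in a single aggregate() call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Keep events with at least one casualty; sort by fatalities, then injuries.
health <- health[health$FATALITIES + health$INJURIES > 0, ]
health <- health[order(-health$FATALITIES, -health$INJURIES), ]
health
```

On the real data, replacing toy with stormdata would give one ranking table from which both top-5 lists can be read.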

Economic impact

Check and clean the damage exponents, then calculate the cost of the damage

### check 
unique(stormdata$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
stormdata$PROPDMGEXP <- toupper(stormdata$PROPDMGEXP)
unique(stormdata$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "+" "0" "5" "6" "?" "4" "2" "3" "H" "7" "-" "1" "8"
table(stormdata$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      H      K      M 
##      4      5      1     40      7 424665  11337
#### clean
calcExp <- function(x, exp = "") {
    switch(exp, `-` = x * -1, `?` = x, `+` = x, `1` = x, `2` = x * (10^2), `3` = x * 
        (10^3), `4` = x * (10^4), `5` = x * (10^5), `6` = x * (10^6), `7` = x * 
        (10^7), `8` = x * (10^8), H = x * 100, K = x * 1000, M = x * 1e+06, 
        B = x * 1e+09, x)
}

applyCalcExp <- function(vx, vexp) {
    if (length(vx) != length(vexp)) 
        stop("Not same size")
    result <- rep(0, length(vx))
    for (i in 1:length(vx)) {
        result[i] <- calcExp(vx[i], vexp[i])
    }
    result
}
### calculate the cost
stormdata$EconomicCosts <- applyCalcExp(as.numeric(stormdata$PROPDMG), stormdata$PROPDMGEXP)
summary(stormdata$EconomicCosts)
##       Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
## -1.500e+01  0.000e+00  0.000e+00  4.746e+05  5.000e+02  1.150e+11
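
The loop in applyCalcExp calls calcExp once for each of the roughly 900,000 rows. A vectorised lookup is a possible alternative. The sketch below copies the multipliers from calcExp unchanged, including the treatment of "-" as a sign flip, so it is meant to reproduce the same numbers rather than a different reading of the exponent codes. The names exp_multiplier and applyExpVec are hypothetical.

```r
# Multipliers copied from calcExp(); any other code (including "" and "0")
# leaves the value unchanged, like the default branch of the switch().
exp_multiplier <- c("-" = -1, "?" = 1, "+" = 1, "1" = 1,
                    "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
                    "6" = 1e6, "7" = 1e7, "8" = 1e8,
                    "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Hypothetical vectorised counterpart of applyCalcExp().
applyExpVec <- function(vx, vexp) {
    stopifnot(length(vx) == length(vexp))
    # match() returns NA for codes that are not in the table
    mult <- unname(exp_multiplier[match(vexp, names(exp_multiplier))])
    mult[is.na(mult)] <- 1           # unknown or empty code: keep vx as is
    vx * mult
}

applyExpVec(c(25, 2.5, 15, 3, 7), c("K", "M", "-", "", "0"))
```

Because the function takes any pair of value and exponent vectors, the same call could in principle be used for CROPDMG and CROPDMGEXP, provided CROPDMGEXP is upper-cased first as was done for PROPDMGEXP.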

Cost per event type

costs <- as.data.table(subset(aggregate(EconomicCosts ~ EVTYPE, data = stormdata, 
    FUN = "sum"), EconomicCosts > 0))
costs <- costs[order(-EconomicCosts), ]

library(scales)
top25c <- costs[1:25, ]
ggplot(data = top25c, aes(EVTYPE, EconomicCosts, fill = EconomicCosts)) + geom_bar(stat = "identity") + scale_y_continuous(labels = comma) + xlab("Event") + ylab("Economic costs in $") +  ggtitle("Economic costs caused by Events (top 25) ") + coord_flip() + theme(legend.position = "right")

Conclusions

As the previous plot suggests, storms, tornadoes and floods often accompany hurricanes. For this reason, we can consider hurricanes the biggest threat to the US economy, as Katrina demonstrated in 2005. It’s worth noting that the #1 factor for crop damage is actually drought, an event that shouldn’t be underestimated, especially in the warmest regions of the US.

Hurricanes, tornadoes, storms and floods are the key events that threaten the safety and economy of the US.