The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

Severe Weather Events with Harmful Outcomes

synopsis

This project is an exploration of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Two questions are addressed through the code below, which produces the tables, figures, and other summaries that make up the analysis. This project is reproducible: you can download the raw data and run this code against it to compare the charts you produce with the ones published here. Not all R packages used are listed up front, but each must be loaded to support the analysis.

Two topics are addressed here:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

As a mock-up, this report addresses a government or municipal manager who uses this data, alongside other sources of evidence, to prepare a presentation in support of planning for severe weather events. Here, event types are prioritized in charts ordered from most destructive to least. Data is presented clearly, leaving inference and recommendation to the reader’s discretion.

data processing

# Start from raw data. Download bz2 data as needed
if(!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
  destfile = "stormData.csv.bz2", method = "curl")
}

# LOAD RAW DATA
noaastorm <- read.csv(bzfile("stormData.csv.bz2"), sep = ",", header = TRUE)

# EXPLORE DS: noaastorm dimensions, var names & structure
dim(noaastorm)
## [1] 902297     37
names(noaastorm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(noaastorm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
# Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

# SUBSET noaastorm most harmful elements, to 'noaaharm' subset
noaaharm <- noaastorm[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
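The aggregations that follow group on EVTYPE exactly as it appears in the raw file. If the labels differ only by case or surrounding whitespace (an assumption worth checking with `unique(noaaharm$EVTYPE)`), such variants would be counted as separate event types. The sketch below shows one possible normalization step on a made-up toy data frame. The `clean_evtype` helper is hypothetical and is not applied anywhere in this report.

```r
# Hypothetical helper (not part of the original analysis):
# upper-case the labels and trim surrounding whitespace.
clean_evtype <- function(x) toupper(trimws(x))

# Toy data standing in for noaaharm; the values are made up.
toy <- data.frame(
  EVTYPE     = c("TORNADO", " Tornado", "flood", "FLOOD "),
  FATALITIES = c(2, 1, 0, 3)
)

# Normalize the labels, then total fatalities per cleaned event type.
toy$EVTYPE <- clean_evtype(toy$EVTYPE)
toy_totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, sum)
print(toy_totals)
```

Applying a step like this to the real data would change the totals behind the charts, so it is offered only as an optional check.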

results

Harmful Events: Fatalities, Injuries, Property Damage

## RESULTS

# import ggplot2 lib for visibility/understanding
library(ggplot2)

# Prepare FATALITIES data for plot: 

# total the fatalities for each event type
fatalities <- aggregate(FATALITIES ~ EVTYPE, data=noaaharm, sum)

# keep the 30 event types with the most fatalities to see the big picture at a glance
fatalities <- fatalities[order(-fatalities$FATALITIES), ][1:30, ]

# fix the factor levels in sorted order so the plot runs most to least
fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)

#  RESULTS

# draw the plot of fatalities for the top 30 event types
# note the first 10 or 11 events are most fatal
ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) + 
    geom_bar(stat = "identity", fill = "yellow") + 
    theme(axis.text.x = element_text(angle = 60, hjust = 1)) + 
    xlab("Weather Event Type") + ylab("Number of Fatalities") + ggtitle("FATALITIES: Weather Events most Harmful with respect to Population Health")

# Prepare INJURIES data for plot:

# total the injuries for each event type
injuries <- aggregate(INJURIES ~ EVTYPE, data=noaaharm, sum)

# keep the 30 event types with the most injuries to see the big picture at a glance
injuries <- injuries[order(-injuries$INJURIES), ][1:30, ]

# fix the factor levels in sorted order so the plot runs most to least
injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)

#  RESULTS

# draw the plot of injuries for the top 30 event types
# verify that first 6 or 7 severe weather events result in the most injuries
ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) + 
    geom_bar(stat = "identity", fill = "orange") + 
    theme(axis.text.x = element_text(angle = 80, hjust = 1)) + 
    xlab("Weather Event Type") + ylab("Number of Injuries") + ggtitle("INJURIES: Weather Events most Harmful with respect to Population Health")
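The fatalities and injuries sections repeat the same three steps: total a column per event type, keep the largest totals, and fix the factor levels so the plot keeps that order. As a minimal sketch, those steps could be wrapped in one function. The `top_events` helper and the toy data are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: total one numeric column per event type and keep
# the n largest totals, ordered most to least.
top_events <- function(data, value_col, n = 30) {
  totals <- aggregate(data[[value_col]],
                      by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col
  totals <- totals[order(-totals[[value_col]]), ]
  totals <- head(totals, n)
  # Fix factor levels in sorted order so ggplot2 keeps it on the x axis.
  totals$EVTYPE <- factor(totals$EVTYPE, levels = totals$EVTYPE)
  totals
}

# Toy data standing in for noaaharm; the values are made up.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

top_fatal <- top_events(toy, "FATALITIES", n = 2)
print(top_fatal)
```

With the real data, `top_events(noaaharm, "FATALITIES")` and `top_events(noaaharm, "INJURIES")` would be expected to produce the same data frames as the step-by-step code above, under the assumption that the columns are named as in `noaaharm`.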

# Question 2: Across the United States, which types of events have the greatest economic consequences?  Plot total damages for the costliest event types

# Prepare ECONOMIC data for plot: 

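# Convert units to calculate Property Damage.
# Both columns start at 0. Only PROPDMGNUM is filled in below, so
# CROPDMGNUM stays 0 and the totals reflect property damage only.
noaaharm$PROPDMGNUM = 0
noaaharm$CROPDMGNUM = 0

# thousands(K), millions(M), billions(B)
# rows with any other PROPDMGEXP value keep PROPDMGNUM = 0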
noaaharm[noaaharm$PROPDMGEXP == "K", ]$PROPDMGNUM = noaaharm[noaaharm$PROPDMGEXP == "K", ]$PROPDMG * 10^3
noaaharm[noaaharm$PROPDMGEXP == "M", ]$PROPDMGNUM = noaaharm[noaaharm$PROPDMGEXP == "M", ]$PROPDMG * 10^6
noaaharm[noaaharm$PROPDMGEXP == "B", ]$PROPDMGNUM = noaaharm[noaaharm$PROPDMGEXP == "B", ]$PROPDMG * 10^9

# Economic consequences from property damage
damages <- aggregate(PROPDMGNUM + CROPDMGNUM ~ EVTYPE, data=noaaharm, sum)
names(damages) = c("EVTYPE", "TOTALDAMAGE")

# grab 30 events to see the economics big picture at a glance
damages <- damages[order(-damages$TOTALDAMAGE), ][1:30, ]

# sort most to least
damages$EVTYPE <- factor(damages$EVTYPE, levels = damages$EVTYPE)

#  RESULTS

# draw the plot for property damage across 30 events
# verify that the first 10 to 12 severe weather events result in the most property damage of economic consequence
ggplot(damages, aes(x = EVTYPE, y = TOTALDAMAGE)) + 
    geom_bar(stat = "identity", fill = "red") + 
    theme(axis.text.x = element_text(angle = 60, hjust = 1)) + 
    xlab("Weather Event Type") + ylab("Damages in dollars") + ggtitle("PROPERTY DAMAGE: Weather Events of Economic Consequence")
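The conversion above handles the upper-case codes K, M, and B for property damage and leaves CROPDMGNUM at 0. As a hedged sketch, the same conversion could be written as a lookup that is also applied to crop damage. The `exp_multiplier` table, the `to_dollars` helper, the case-insensitive matching, and the toy data are all assumptions for illustration and are not part of the original analysis.

```r
# Assumed multipliers for the exponent codes handled in this report.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to
# dollars. Matching is case-insensitive (an extension of the report's
# code). Codes not in the lookup (blank, digits, symbols) yield 0,
# mirroring the report's behaviour of leaving unmatched rows at 0.
to_dollars <- function(value, exp_code) {
  mult <- exp_multiplier[toupper(exp_code)]
  mult[is.na(mult)] <- 0
  unname(value * mult)
}

# Toy data standing in for noaaharm; the values are made up.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "M", "k", ""),
  CROPDMG    = c(0, 5, 2, 0),
  CROPDMGEXP = c("", "K", "M", "")
)

toy$PROPDMGNUM  <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
toy$CROPDMGNUM  <- to_dollars(toy$CROPDMG, toy$CROPDMGEXP)
toy$TOTALDAMAGE <- toy$PROPDMGNUM + toy$CROPDMGNUM

damages_toy <- aggregate(TOTALDAMAGE ~ EVTYPE, data = toy, sum)
print(damages_toy)
```

Including crop damage or lower-case codes in the real analysis would change the totals and the ranking in the chart, so the chart title would need to be revisited if this approach were adopted.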
