Programming Assignment 2 - Reproducible Research

Health and Economic effects of Severe Weather in the United States

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

Key questions to address:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

The data for this project were obtained as a comma-separated-value file, compressed with the bzip2 algorithm to reduce its size, available at:

https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

The documentation can be found at:

https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
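
Because the file is bzip2-compressed, it can be read without decompressing it by hand. The sketch below is one possible way to do this, not the procedure used in this report: `read_storm_data` is a hypothetical helper, and the demonstration runs on a tiny made-up file so that it works offline.

```r
# Hypothetical helper: download the file once (if a URL is given and the
# file is not yet present), then read the compressed CSV directly.
read_storm_data <- function(path, url = NULL) {
  if (!file.exists(path) && !is.null(url)) {
    download.file(url, destfile = path, mode = "wb")
  }
  # read.csv() opens the path with file(), which detects bzip2 compression.
  read.csv(path)
}

# Self-contained demonstration on a tiny bzip2-compressed file (made-up rows).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "HAIL,0"), con)
close(con)

demo <- read_storm_data(tmp)
print(demo)
```

For the real data, the call would take the local file name and the URL given above as its two arguments.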

LIBRARIES & WORKING DIRECTORY

options(editor = "internal") 
library(plyr)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
Import data + check data
data <- read.csv("repdata_data_StormData.csv", header = TRUE, sep = ",")
data <- data[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
str(data)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
missing_vals <- function(x) sum(is.na(x))
colwise(missing_vals)(data)
##   EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1      0          0        0       0          0       0          0
Mutate data + Clean data

Two variables must be converted to actual dollar amounts: property damage (PROPDMG) and crop damage (CROPDMG). Each exponent code (PROPDMGEXP, CROPDMGEXP) is mapped to a numeric multiplier, the damage value is multiplied by it, and the result is expressed in billions of dollars. The event type data (EVTYPE) are also cleaned by grouping comparable values into broader categories.
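
As a worked illustration of the exponent conversion, the sketch below uses made-up values rather than the storm data. It assumes only the letter codes K, M, and B and treats every other code as a multiplier of 1, which is a simplification of the full mapping applied in this report.

```r
# Toy values, not taken from the storm data.
dmg      <- c(25, 2.5, 1.2, 3)
exp_code <- c("K", "m", "B", "")

# Named lookup vector: exponent code -> multiplier (simplified assumption).
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code case-insensitively; unknown or empty codes give NA.
mult <- unname(multiplier[toupper(exp_code)])
mult[is.na(mult)] <- 1

dollars  <- dmg * mult      # 25000, 2500000, 1.2e9, 3
billions <- dollars / 1e9   # same scale as PROPDMGTOTAL and CROPDMGTOTAL
print(data.frame(dmg, exp_code, dollars, billions))
```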

data$PROPDMGEXP <- mapvalues(data$PROPDMGEXP, from = c("K", "M","", "B", "m", "+", "0", "5", "6", "?", "4", "2", "3", "h", "7", "H", "-", "1", "8"), to = c(10^3, 10^6, 1, 10^9, 10^6, 0,1,10^5, 10^6, 0, 10^4, 10^2, 10^3, 10^2, 10^7, 10^2, 0, 10, 10^8))
data$PROPDMGEXP <- as.numeric(as.character(data$PROPDMGEXP))
data$PROPDMGTOTAL <- (data$PROPDMG * data$PROPDMGEXP) / 1000000000

data$CROPDMGEXP <- mapvalues(data$CROPDMGEXP, from = c("","M", "K", "m", "B", "?", "0", "k","2"), to = c(1,10^6, 10^3, 10^6, 10^9, 0, 1, 10^3, 10^2))
data$CROPDMGEXP <- as.numeric(as.character(data$CROPDMGEXP))
data$CROPDMGTOTAL <- (data$CROPDMG * data$CROPDMGEXP) / 1000000000

data$DAMAGETOTAL <- data$PROPDMGTOTAL + data$CROPDMGTOTAL
detach(package:plyr)

# Assign each event type to a group; conditions are checked in order and the first match wins
data_TYPE <- data %>%
  mutate(evtypegrp = case_when(
    grepl("LIGHTNING|LIGNTNING", EVTYPE) ~ "LIGHTNING",
    grepl("HAIL", EVTYPE) ~ "HAIL",
    grepl("RAIN|FLOOD|WET|FLD", EVTYPE) ~ "RAIN",
    grepl("SNOW|WINTER|WINTRY|BLIZZARD|SLEET|COLD|ICE|FREEZE|AVALANCHE|ICY", EVTYPE) ~ "WINTER",
    grepl("TORNADO|FUNNEL", EVTYPE) ~ "TORNADO",
    grepl("WIND|HURRICANE", EVTYPE) ~ "WINDS",
    grepl("STORM|THUNDER|TSTM|TROPICAL +STORM", EVTYPE) ~ "STORM",
    grepl("FIRE", EVTYPE) ~ "FIRE",
    grepl("FOG|VISIBILITY|DARK|DUST", EVTYPE) ~ "FOG",
    grepl("WAVE|SURF|SURGE|TIDE|TSUNAMI|CURRENT|SWELL", EVTYPE) ~ "WAVE",
    grepl("HEAT|HIGH +TEMP|RECORD +TEMP|WARM|DRY", EVTYPE) ~ "HEAT",
    grepl("VOLCAN", EVTYPE) ~ "VOLCANO",
    grepl("DROUGHT", EVTYPE) ~ "DROUGHT",
    TRUE ~ "OTHER"
  ))
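
The order of the patterns matters, because an event type is assigned to the first group whose pattern it matches. The sketch below illustrates this precedence in base R on a few example labels. `classify_event` and the shortened `rules` vector are hypothetical and cover only a subset of the groups above.

```r
# Ordered rules (a shortened, illustrative subset): earlier entries take priority.
rules <- c(
  WINTER  = "SNOW|WINTER|BLIZZARD|ICE",
  TORNADO = "TORNADO|FUNNEL",
  WINDS   = "WIND|HURRICANE",
  STORM   = "STORM|THUNDER|TSTM"
)

# Hypothetical helper: first matching rule wins, otherwise the default group.
classify_event <- function(evtype, rules, default = "OTHER") {
  group <- rep(default, length(evtype))
  # Apply rules from last to first so that earlier rules overwrite later ones.
  for (name in rev(names(rules))) {
    group[grepl(rules[[name]], evtype)] <- name
  }
  group
}

events <- c("TSTM WIND", "WINTER STORM", "THUNDERSTORM", "TORNADO", "SMOKE")
groups <- classify_event(events, rules)
print(data.frame(events, groups))
```

Here "TSTM WIND" matches both the WINDS and STORM patterns and ends up in WINDS, and "WINTER STORM" ends up in WINTER, because those rules come first.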

Results

Create summary dataframe

Create a summary dataframe containing five totals per event group: total damage, property damage, and crop damage (all in billions of dollars), plus fatalities and injuries.

data_summary <- data_TYPE %>%
  group_by(evtypegrp)%>%
  summarize(damage = sum(DAMAGETOTAL), property = sum(PROPDMGTOTAL), crops = sum(CROPDMGTOTAL), fatalities = sum(FATALITIES), injuries = sum(INJURIES))
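
To make the aggregation step concrete, the following sketch computes group totals and ranks them on a small made-up table. It uses base R's `aggregate` instead of dplyr so that it runs without extra packages. The numbers are invented for illustration.

```r
# Toy data, not taken from the storm database.
toy <- data.frame(
  evtypegrp  = c("TORNADO", "RAIN", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Sum both outcome columns within each event group.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ evtypegrp, data = toy, FUN = sum)

# Rank the groups by total fatalities, largest first.
ranked <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
print(ranked)
```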

1. Across the United States, which types of events are most harmful with respect to population health?

Create plots with top 10 event types that affect population health.

fatalities <- head(data_summary[order(data_summary$fatalities, decreasing = TRUE),], 10)
injuries <- head(data_summary[order(data_summary$injuries, decreasing = TRUE),], 10)

plot1 <- ggplot(fatalities, aes(evtypegrp,fatalities, fill = fatalities))
plot1 + geom_bar(stat = "identity") + xlab("Event Type") + ylab("Number of Fatalities") + ggtitle("Total Fatalities By Event Type") +
  theme(axis.text.x = element_text(angle = 90)) + expand_limits(y = c(0,6000))

plot2 <- ggplot(injuries, aes(evtypegrp,injuries, fill = injuries)) 
plot2 + geom_bar(stat = "identity") + xlab("Event Type") + ylab("Number of Injuries") + ggtitle("Total Injuries By Event Type") +
  theme(axis.text.x = element_text(angle = 90)) + expand_limits(y = c(0,6000))

Tornadoes have the highest impact on population health, in terms of both fatalities and injuries.

2. Across the United States, which types of events have the greatest economic consequences?

Create a plot of the top 10 event types by total economic damage.

damage <-head(data_summary[order(data_summary$damage, decreasing = TRUE),],10)
property <- damage %>% mutate(damage_type = "Property", damage_amount = property)
crops <- damage %>% mutate(damage_type = "Crops", damage_amount = crops)
damage_10 <- rbind(property, crops)

plot3 <- ggplot(damage_10, aes(evtypegrp, damage_amount, fill = factor(damage_type)))
plot3 + geom_bar(stat = "identity") + ylab("Economic damage 1950 - 2011 (billions of USD)") + xlab("Event") + scale_fill_discrete(name = "Damage") +
  ggtitle("Total Economic Damage by Event") + theme(axis.text = element_text(size = 6))

In terms of economic damage, the RAIN group (which includes floods) has the highest impact on property, while drought has the highest impact on crops.
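
The stacked bar chart relies on a long-format table with one row per event group and damage type, and the largest contributor per damage type can also be read off numerically rather than from the plot. The sketch below shows both steps on made-up numbers. The values are illustrative only and are not results from the storm data.

```r
# Toy wide table: one row per event group (invented values, billions of USD).
wide <- data.frame(
  evtypegrp = c("RAIN", "DROUGHT", "WINDS"),
  property  = c(10, 1, 6),
  crops     = c(2, 5, 1),
  stringsAsFactors = FALSE
)

# Reshape to long format: one row per (event group, damage type).
long <- rbind(
  data.frame(evtypegrp = wide$evtypegrp, damage_type = "Property",
             damage_amount = wide$property, stringsAsFactors = FALSE),
  data.frame(evtypegrp = wide$evtypegrp, damage_type = "Crops",
             damage_amount = wide$crops, stringsAsFactors = FALSE)
)

# For each damage type, find the event group with the largest amount.
top_by_type <- sapply(
  split(long, long$damage_type),
  function(d) d$evtypegrp[which.max(d$damage_amount)]
)
print(top_by_type)
```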