Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

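This report focuses on two questions:

  • Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  • Across the United States, which types of events have the greatest economic consequences?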

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site; the URL used in this report appears in the download code under Data Processing.

Some documentation of the database is also available. It describes how some of the variables are constructed and defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
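
The uneven coverage over time can be checked by counting records per year. The sketch below is illustrative only: events_per_year is a hypothetical helper, and it assumes the begin dates are strings in the month/day/year format used by the BGN_DATE column of the raw file (a column not otherwise used in this report). The sample dates are made up.

```r
# Hypothetical helper: count events per calendar year.
# Assumes dates are strings such as "4/18/1950 0:00:00" (month/day/year);
# as.Date() reads the date part and ignores the trailing time.
events_per_year <- function(dates) {
    parsed <- as.Date(as.character(dates), format = "%m/%d/%Y")
    table(format(parsed, "%Y"))
}

# Made-up sample, not taken from the database
sample_dates <- c("4/18/1950 0:00:00", "6/8/1950 0:00:00", "11/15/2011 0:00:00")
counts <- events_per_year(sample_dates)
print(counts)
```

On the full data this could be called as events_per_year(storm_org$BGN_DATE) once storm_org has been loaded as described under Data Processing.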

Libraries

Built under R version 3.2.3

  • knitr: An alternative to Sweave with a more flexible design and new features. V 1.12.3
  • dplyr: Provides a flexible grammar of data manipulation. V 0.4.3
  • ggplot2: An implementation of the grammar of graphics in R. V 2.0.0
  • gridExtra: Provides a number of user-level functions to work with “grid” graphics. V 2.2.1
library(knitr)
library(dplyr)
library(ggplot2)
library(gridExtra)
opts_chunk$set(echo = TRUE, results = 'hold', warning = FALSE, message = FALSE)

Data Processing

1. Set working directory and load the data.

To reproduce the analysis, change the working directory below to one on your machine. The data are downloaded only if the file is not already in the working directory. The original data are stored in the variable storm_org.

setwd('C:/Users/HP/Documents/0_Sandra_Yojana/Data_Science/5 Reproducible Research')

#Download the data if necessary
zipfile <- "repdata-data-StormData.csv.bz2"
if (!file.exists(zipfile)) {
    url <-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(url, destfile = zipfile)
}

## This line will likely take a few seconds. Be patient!
storm_org <- read.csv(bzfile(zipfile))

2. Select and transform the variables needed to analyze both public health and economic problems.

We use the following variables in the analysis:

  • EVTYPE: The type of event.
  • FATALITIES: Number of deaths directly related to the event.
  • INJURIES: Number of injuries directly related to the event.
  • PROPDMG: Base amount of property damage.
  • PROPDMGEXP: Base multiplier for property damage.
  • CROPDMG: Base amount of crop damage.
  • CROPDMGEXP: Base multiplier for crop damage.

To calculate the damage in dollars, we create two new numeric variables:

  • PROPEXP: Numeric multiplier for property damage, derived from PROPDMGEXP.
  • CROPEXP: Numeric multiplier for crop damage, derived from CROPDMGEXP.
storm <- subset(storm_org, select = c(EVTYPE,FATALITIES:CROPDMGEXP))
## Normalize event type names so that variants in case, spacing, or punctuation map to the same type of event.
## The "+" sits outside the bracket so that runs of blanks and punctuation collapse to a single space.
storm <- mutate(storm, EVTYPE = gsub("[[:blank:][:punct:]]+", " ", toupper(EVTYPE)))
## Create new variables PROPEXP and CROPEXP to compute property and crop damage from PROPDMGEXP and CROPDMGEXP, respectively.

storm$PROPEXP <- 0
storm$CROPEXP <- 0
storm[which(storm$PROPDMGEXP == "K"),]$PROPEXP <- 1000
storm[which(storm$PROPDMGEXP == "m"),]$PROPEXP <- 1000000
storm[which(storm$PROPDMGEXP == "M"),]$PROPEXP <- 1000000
storm[which(storm$PROPDMGEXP == "B"),]$PROPEXP <- 1000000000
storm[which(storm$CROPDMGEXP == "K"),]$CROPEXP <- 1000
storm[which(storm$CROPDMGEXP == "k"),]$CROPEXP <- 1000
storm[which(storm$CROPDMGEXP == "m"),]$CROPEXP <- 1000000
storm[which(storm$CROPDMGEXP == "M"),]$CROPEXP <- 1000000
storm[which(storm$CROPDMGEXP == "B"),]$CROPEXP <- 1000000000
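
The same mapping from exponent codes to multipliers can be sketched with a named lookup vector. This is an alternative illustration, not the code used in this report: exp_multiplier is a hypothetical helper, and unlike the assignments above it treats upper and lower case alike for both variables. Any other code maps to 0, as above.

```r
# Hypothetical helper: map exponent codes to numeric multipliers.
# K = thousands, M = millions, B = billions; anything else becomes 0.
exp_multiplier <- function(code) {
    lookup <- c(K = 1e3, M = 1e6, B = 1e9)
    mult <- lookup[toupper(as.character(code))]
    mult[is.na(mult)] <- 0   # unknown or empty codes contribute nothing
    unname(mult)
}

# Made-up codes covering each case
codes <- c("K", "k", "m", "M", "B", "", "5")
multipliers <- exp_multiplier(codes)
print(multipliers)

# Worked example: a base amount of 25 with code "K" is 25 * 1000 dollars
damage <- 25 * exp_multiplier("K")
print(damage)
```

With such a helper, the damage in dollars for a record is the base amount times the multiplier, for example PROPDMG * exp_multiplier(PROPDMGEXP).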

Results

To filter, group, and arrange the data we use functions from the dplyr package. For each question we keep only the records with nonzero values, group them by event type, and sum the values within each group.
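
As a minimal sketch of this filter, group, summarize, and arrange pattern, the example below runs the same dplyr verbs on a small made-up data frame. The toy values are assumptions for illustration only and do not come from the storm database.

```r
library(dplyr)

# Made-up records, for illustration only
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
    FATALITIES = c(2, 0, 1, 0),
    INJURIES   = c(10, 3, 0, 0),
    stringsAsFactors = FALSE
)

toy_cas <- toy %>%
    filter(FATALITIES + INJURIES > 0) %>%   # drop records without casualties (HAIL)
    group_by(EVTYPE) %>%                    # one group per event type
    summarize(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)) %>%
    arrange(desc(FATALITIES), desc(INJURIES))  # most fatalities first

print(toy_cas)
```

The two TORNADO records collapse into a single row with their totals, and the HAIL record is removed by the filter before grouping.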

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

## Keep the events with casualties, group by event type, and sort the most harmful event types first
storm_cas <- storm %>%
            filter(FATALITIES + INJURIES > 0) %>%
            group_by(EVTYPE) %>%
            summarize(FATALITIES = sum(FATALITIES),INJURIES = sum(INJURIES)) %>%
            arrange(desc(FATALITIES),desc(INJURIES))   
## Plot the 10 event types with the most fatalities
fatalities <- ggplot(head(storm_cas, 10), aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
     labs(title = "Event Types with the Most Fatalities") + xlab("") +
     ylab("Number of Fatalities") + geom_bar(color = "red", stat = "identity") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1))

## Plot the 10 event types with the most injuries.
## storm_cas is sorted by fatalities, so re-sort by injuries before taking the top 10.
injuries <- ggplot(head(arrange(storm_cas, desc(INJURIES)), 10), aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
     labs(title = "Event Types with the Most Injuries") +
     xlab("Type of Event") + ylab("Number of Injuries") + geom_bar(color = "orange", stat = "identity") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1))

## Display the two previous plots side by side
grid.arrange(fatalities,injuries,nrow=1,ncol=2)

2. Across the United States, which types of events have the greatest economic consequences?

storm_eco <- storm %>%
            filter(PROPDMG + CROPDMG > 0) %>%
            group_by(EVTYPE) %>%
            summarize(PROPERTY_DAMAGE = sum(PROPDMG*PROPEXP),CROP_DAMAGE = sum(CROPDMG*CROPEXP)) %>%
            arrange(desc(PROPERTY_DAMAGE + CROP_DAMAGE)) 
## Plot the 10 event types with the most property damage.
## storm_eco is sorted by total damage, so re-sort by property damage before taking the top 10.
property <- ggplot(head(arrange(storm_eco, desc(PROPERTY_DAMAGE)), 10), aes(x = reorder(EVTYPE, -PROPERTY_DAMAGE), y = PROPERTY_DAMAGE/10^9)) +
     labs(title = "Event Types with the Most Property Damage") + xlab("") +
     ylab("Billions of dollars") + geom_bar(color = "red", stat = "identity") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1))

## Plot the 10 event types with the most crop damage.
## storm_eco is sorted by total damage, so re-sort by crop damage before taking the top 10.
crop <- ggplot(head(arrange(storm_eco, desc(CROP_DAMAGE)), 10), aes(x = reorder(EVTYPE, -CROP_DAMAGE), y = CROP_DAMAGE/10^9)) +
     labs(title = "Event Types with the Most Crop Damage") +
     xlab("Type of Event") + ylab("Billions of dollars") + geom_bar(color = "orange", stat = "identity") +
     theme(axis.text.x = element_text(angle = 90, hjust = 1))

## Display the two previous plots side by side
grid.arrange(property,crop,nrow=1,ncol=2)