Coursera Reproducible Research Peer Assessment 2

MOST IMPACTFUL WEATHER EVENTS ACROSS THE USA

INTRODUCTION

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

DATA

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

QUESTIONS

The data analysis addresses the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Across the United States, which types of events have the greatest economic consequences?

The analysis was conducted in R, and the markdown report was generated with knitr.

PREPROCESSING DATA

To analyze the data, we performed the following preprocessing steps:

-Set the working directory to conduct the analysis

setwd("C:/Users/Eirr/Desktop/stormdata")

-Load the libraries

library(ggplot2)
library(reshape2)
library(Hmisc)

## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: splines
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units

library(knitr)

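-Load the bz2-compressed data file and normalize the exponent columns PROPDMGEXP and CROPDMGEXP, so that lowercase and uppercase codes (k/K, m/M) are treated as the same value and blank codes become "0". EVTYPE is kept as a character column, which aggregate() and ggplot2 group by in the same way as a factor.


stormdata <- read.csv("StormData.csv.bz2", as.is = TRUE)

# within() returns a modified copy that must be assigned back;
# with() would silently discard these assignments
stormdata <- within(stormdata, {
    PROPDMGEXP <- toupper(PROPDMGEXP)
    PROPDMGEXP[PROPDMGEXP == ""] <- "0"
    CROPDMGEXP <- toupper(CROPDMGEXP)
    CROPDMGEXP[CROPDMGEXP == ""] <- "0"
})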
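
A point worth checking when normalizing columns this way: assignments made inside with() are evaluated in a temporary environment and discarded, whereas within() returns a modified copy that has to be assigned back. A minimal sketch on a toy data frame (the `toy` object and its values are hypothetical, not taken from the storm data):

```r
# Toy data frame standing in for the storm data (hypothetical values)
toy <- data.frame(PROPDMGEXP = c("k", "M", ""), stringsAsFactors = FALSE)

# Assignments inside with() happen in a temporary environment and are lost
invisible(with(toy, {
    PROPDMGEXP <- toupper(PROPDMGEXP)
}))
after.with <- toy$PROPDMGEXP     # unchanged: "k" "M" ""

# within() returns a modified copy, which is assigned back to toy
toy <- within(toy, {
    PROPDMGEXP <- toupper(PROPDMGEXP)
    PROPDMGEXP[PROPDMGEXP == ""] <- "0"
})
after.within <- toy$PROPDMGEXP   # normalized: "K" "M" "0"
```

Running table() on the real exponent columns after this step is a quick way to confirm that only uppercase letters and digits remain.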

-Subset the data to keep only the columns needed for the analysis

subsetdata <- subset(stormdata, select = c("EVTYPE", "INJURIES", "PROPDMG", 
    "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "FATALITIES"))

RESULTS

Which types of events are most harmful with respect to population health?

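To answer this question we considered the variables FATALITIES and INJURIES and plotted the 20 event types with the highest fatality totals across the USA. The bars show total fatalities, and the fill color shows the injury bracket for each event type.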

healthdamage.count <- 20
healthdamage <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, subsetdata, 
    sum, na.rm = TRUE)
healthdamage.top <- healthdamage[order(-healthdamage$FATALITIES)[1:healthdamage.count], 
    ]

healthdamage.top$INJURIES <- cut2(healthdamage.top$INJURIES, g = 7)

ggplot(healthdamage.top, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES, 
    fill = INJURIES)) + geom_bar(stat = "identity") + scale_fill_brewer(palette = 11) + 
    guides(fill = guide_legend(reverse = T)) + theme(axis.text.x = element_text(angle = 90, 
    hjust = 1)) + xlab(NULL) + ggtitle(paste("Top 20 most harmful weather events in the United States")) + 
    labs(colour = "pink")

Figure: Top 20 most harmful weather events in the United States (fatalities by event type, bars shaded by injury bracket)

The plot shows that TORNADO is the most harmful weather event in terms of fatalities.
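
The aggregate-then-rank step behind this plot can be sketched on a small toy data frame. The values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the NOAA data:

```r
# Toy event records (hypothetical values)
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(5, 1, 3, 4, 0),
    INJURIES   = c(20, 2, 10, 0, 1),
    stringsAsFactors = FALSE
)

# Sum both outcome columns per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)

# Rank by fatalities, largest first, and keep the top 2
top <- totals[order(-totals$FATALITIES)[1:2], ]
top$EVTYPE       # "TORNADO" "HEAT"
top$FATALITIES   # 8 4
```

Ranking by a different column (for example INJURIES) only requires changing the column passed to order().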

A second plot shows how total injuries are distributed across event types: the 10 event types with the most injuries, with all remaining types grouped as "Other".

top.inj.count <- 10
top.inj.order <- order(-healthdamage$INJURIES)
data.health.inj <- healthdamage[top.inj.order[1:top.inj.count], ]
# Sum injuries over all remaining event types; the parentheses are needed
# because `:` binds more tightly than `+`
other.inj <- sum(healthdamage$INJURIES[top.inj.order[(top.inj.count + 1):nrow(healthdamage)]], 
    na.rm = TRUE)
# Append the remainder as a single "Other" row (fatalities are not used in this plot)
data.health.inj <- rbind(data.health.inj, list("Other", 0, other.inj))


ggplot(data.health.inj, aes(x = factor(1), y = INJURIES, fill = reorder(EVTYPE, 
    -INJURIES))) + geom_bar(stat = "identity") + coord_polar(theta = "y") + 
    theme(legend.title = element_blank()) + xlab(NULL) + ylab(NULL) + ggtitle("Injuries from the weather events in the United States") + 
    scale_fill_brewer(palette = "Spectral")

Figure: Injuries from the weather events in the United States (pie chart, top 10 event types plus Other)

Together, the two plots above show that TORNADO is the event type most harmful to population health in the USA, in both fatalities and injuries.
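
When selecting the tail of an ordering for an "Other" group, operator precedence matters: in R, `:` binds more tightly than `+`, so `n + 1:m` is read as `n + (1:m)`. A short sketch with hypothetical values (`ord` and `n` are illustrative names only):

```r
ord <- c(3, 1, 2, 5, 4)   # hypothetical ordering of five rows
n   <- 2                  # size of the "top" group

# Without parentheses the sequence is shifted and runs past the end
shifted <- n + 1:length(ord)      # 3 4 5 6 7
# With parentheses we get the intended tail positions
tail.idx <- (n + 1):length(ord)   # 3 4 5

ord[shifted]    # 2 5 4 NA NA  (out-of-range positions give NA)
ord[tail.idx]   # 2 5 4
```

With na.rm = TRUE the out-of-range NA values are dropped from a sum, so both forms give the same total, but the parenthesized form states the intent directly.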

Which types of events have the greatest economic consequences?

To answer this question we considered PROPDMG and CROPDMG.

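The damage amounts are recorded together with separate exponent codes (PROPDMGEXP and CROPDMGEXP), so they must be converted to a common unit before total damage can be computed. The following function maps each code to its multiplier; unrecognized codes map to 0: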


decode.units <- function(d) {
    switch(d, H = 100, K = 1000, M = 1e+06, B = 1e+09, `0` = 1, `1` = 10, `2` = 100, 
        `3` = 1000, `4` = 10000, `5` = 1e+05, `6` = 1e+06, `7` = 1e+07, `8` = 1e+08, 
        `9` = 1e+09, 0)
}
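
As an alternative sketch (not the function used in this report), the same mapping can be written as a named lookup vector, which decodes a whole column in one step instead of calling switch() once per row. The names `unit.multiplier` and `decode.units.vec` are hypothetical, and blank codes are assumed to have been recoded to "0" beforehand:

```r
# Named vector with the same code-to-multiplier mapping as the switch():
# letters H, K, M, B plus digits "0".."9" meaning 10^digit
unit.multiplier <- c(H = 100, K = 1e3, M = 1e6, B = 1e9,
                     setNames(10^(0:9), 0:9))

# Hypothetical vectorized decoder: looks up every code at once
decode.units.vec <- function(codes) {
    m <- unit.multiplier[toupper(codes)]
    m[is.na(m)] <- 0      # unknown codes ("+", "-", "?", ...) contribute nothing
    unname(m)
}

multipliers <- decode.units.vec(c("K", "m", "0", "?", "B"))
# multipliers: 1000, 1e6, 1, 0, 1e9

# Example: 2.5 with code "K" and 10 with code "M" become 2500 and 1e7
damage <- c(2.5, 10) * decode.units.vec(c("K", "M"))
```

Because the lookup is a single vector index, it avoids the per-row function call that sapply() makes on a data set of this size.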

Finally, we compute the total economic damage (property plus crop) for each event type and plot the top 25:

top.damage.count <- 25
stormdata$DAMAGE <- stormdata$PROPDMG * sapply(stormdata$PROPDMGEXP, decode.units) + 
    stormdata$CROPDMG * sapply(stormdata$CROPDMGEXP, decode.units)
data.damage <- aggregate(DAMAGE ~ EVTYPE, stormdata, sum, na.rm = T)
data.damage.top <- data.damage[order(-data.damage$DAMAGE)[1:top.damage.count], 
    ]

ggplot(data.damage.top, aes(x = reorder(EVTYPE, -DAMAGE), y = DAMAGE)) + geom_bar(stat = "identity", 
    fill = "grey") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab(NULL) + ylab("Damage, $") + ggtitle(paste("Top", top.damage.count, 
    "events which have the greatest economic consequences in the United States"))

Figure: Top 25 events which have the greatest economic consequences in the United States (total damage in dollars by event type)

The plot shows that FLOOD is the weather event with the greatest economic impact across the USA.