Storm data gathered across the United States over the last 60 years provide information about which weather events are most harmful to population health and about their economic consequences for the nation.

Tornado is the worst atmospheric phenomenon because of the lives it takes across the United States and the economic damage it inflicts on property and crops.

The second worst is Flood.

We can’t assume an equivalence between fatalities and injuries, nor any proportionality, but as a first approach we can sum both of them.

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

-Storm Data [47Mb]

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

-National Weather Service Storm Data Documentation

-National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Objectives

The questions we are trying to answer in this document are:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Getting the Information

This document uses the following libraries:

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

First we set the file URL and the destination file, then download the compressed data into that file. The curl method is used for Mac systems.

fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "StormData.csv.bz2", method = "curl")
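
The download step can be made repeatable so that re-running the analysis does not fetch the 47 MB file again. The following is a minimal sketch, not part of the original analysis: `download_if_missing` and its `fetch` argument are hypothetical names introduced here, and the platform-default download method is assumed to work in place of curl.

```r
# Hypothetical helper: download a file only when it is not already on disk.
# `fetch` defaults to utils::download.file and can be swapped for testing.
download_if_missing <- function(url, destfile, fetch = utils::download.file) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the bzip2 archive intact on Windows as well
    fetch(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Intended use (not run here):
# download_if_missing(fileURL, "StormData.csv.bz2")
```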

The information is stored in a data frame named StormData.

StormData <- read.csv(bzfile("StormData.csv.bz2"))
names(StormData) <- toupper(names(StormData))
StormData$BGN_DATE <- as.Date(StormData$BGN_DATE, format = "%m/%d/%Y")
StormData$EVTYPE <- toupper(StormData$EVTYPE)
StormData$PROPDMGEXP <- toupper(StormData$PROPDMGEXP)
StormData$CROPDMGEXP <- toupper(StormData$CROPDMGEXP)
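
To illustrate what the cleaning steps above do, here is a small sketch on made-up values. It assumes the raw BGN_DATE strings carry a trailing time component such as "0:00:00"; the sample values are illustrative and not taken from the file.

```r
# Assumed shape of the raw date strings (illustrative values only)
raw_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# as.Date() parses the part matched by the format and ignores the rest
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")
years <- as.integer(format(parsed, "%Y"))

# Upper-casing merges event types that differ only in letter case
raw_types <- c("Tornado", "TORNADO", "Flash Flood")
clean_types <- unique(toupper(raw_types))
```

After this step `parsed` holds proper Date values, so events can be filtered or counted by year, and `clean_types` contains each event type only once regardless of case.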

We want to study two basic components of this information: 1. The most harmful events across the U.S. with respect to population health. 2. The events with the greatest economic consequences across the U.S.

Harmful Events

To find the most harmful events across the U.S. with respect to population health, the variables of interest in the data set are the fatalities and injuries recorded for each event type (“EVTYPE”).

Data Processing

Get the top 10 event types in injuries and fatalities, and in the sum of both.

harmful_StormData <- group_by(StormData, EVTYPE)
harmful_StormData <- summarise(harmful_StormData, FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))
harmful_StormData <- merge(harmful_StormData, count(StormData, EVTYPE))
fatalities_StormData <- arrange(harmful_StormData, desc(FATALITIES))
top_fatalities_StormData <- fatalities_StormData[1:10,]
injuries_StormData <- arrange(harmful_StormData, desc(INJURIES))
top_injuries_StormData <- injuries_StormData[1:10,]
sumharmful_StormData <- arrange(harmful_StormData, desc(FATALITIES + INJURIES))
top_sumharmful_StormData <- sumharmful_StormData[1:10,]
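
The same grouping and ranking logic can be seen on a toy data frame. This is a sketch with invented numbers, written as a single dplyr pipeline; the names `toy` and `toy_summary` and the HARM column are introduced here for illustration, and `n()` replaces the separate merge with `count()`.

```r
library(dplyr)

# Invented records, only to show the shape of the computation
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 0, 2, 1),
  INJURIES   = c(10, 5, 2, 0, 1),
  stringsAsFactors = FALSE
)

toy_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES),
            INJURIES   = sum(INJURIES),
            n          = n()) %>%          # number of records per event type
  mutate(HARM = FATALITIES + INJURIES) %>% # first-approach sum of both measures
  arrange(desc(HARM)) %>%
  head(10)                                 # keep at most the top 10 rows
```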

Economic Consequences

The other side of storm events is their economic cost.

The data set contains two different kinds of cost information: property damage and crop damage. The sum of both gives the total economic cost of the events.

Data Processing

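The first task is to get a total amount for both property and crop damage. As there are two columns for every record, one for the value and another for the exponent applied to that value, we must combine them for every record. Only the exponent codes K (thousands), M (millions), and B (billions) are converted; records with any other code contribute zero to the total.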

StormData$PROPTOTAL <- with(StormData, PROPDMG * ((PROPDMGEXP == 'K') * 10^3 + (PROPDMGEXP == 'M') * 10^6 + (PROPDMGEXP == 'B') * 10^9))

StormData$CROPTOTAL <- with(StormData, CROPDMG * ((CROPDMGEXP == 'K') * 10^3 + (CROPDMGEXP == 'M') * 10^6 + (CROPDMGEXP == 'B') * 10^9))
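
The exponent conversion can also be written with a lookup table, which makes the mapping easier to read and extend. This is a sketch under the same assumption as the code above (only K, M, and B are recognised, anything else counts as zero); `exp_multiplier` and `to_dollars` are hypothetical names introduced here.

```r
# Multipliers for the recognised exponent codes
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: combine a damage value with its exponent code
to_dollars <- function(value, exp_code) {
  mult <- exp_multiplier[match(toupper(exp_code), names(exp_multiplier))]
  mult[is.na(mult)] <- 0  # unrecognised or empty codes contribute nothing
  value * unname(mult)
}

example_totals <- to_dollars(c(25, 2.5, 1, 10), c("K", "m", "B", ""))
```

Applied to the data set, the call would look like `to_dollars(StormData$PROPDMG, StormData$PROPDMGEXP)`.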

damage_StormData <- group_by(StormData, EVTYPE)
damage_StormData <- summarise(damage_StormData, PROPTOTAL = sum(PROPTOTAL), CROPTOTAL = sum(CROPTOTAL))
damage_StormData <- merge(damage_StormData, count(StormData, EVTYPE))
damage_StormData <- arrange(damage_StormData, desc(PROPTOTAL + CROPTOTAL))
top_damage_StormData <- damage_StormData[1:10,]

Results

The following plot shows the events that caused the most injuries.

plot1 <- ggplot(top_injuries_StormData, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) + 
        geom_bar(stat = "identity") +
        coord_flip() +
        labs(x = "Event Type", y = "Value") +
        labs(title = "Top 10 events which have the most injuries") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0));
print(plot1);

As we can see, Tornado is the event that causes the most injuries. TSTM Wind and Flood come next.

Although injuries are important, fatalities are worse. In the next plot we can see that Tornado is still the most harmful event, but the next ones are Excessive Heat and Flash Flood. Together these are the deadliest events in the data set.

plot2 <- ggplot(top_fatalities_StormData, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) + 
        geom_bar(stat = "identity") +
        coord_flip() +
        labs(x = "Event Type", y = "Value") +
        labs(title = "Top 10 events which have the most fatalities") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0));
print(plot2);

To aggregate both measures, we can list the 10 event types most harmful to the population, ranked by the sum of injuries and fatalities:

top_sumharmful_StormData
##               EVTYPE FATALITIES INJURIES      n
## 1            TORNADO       5633    91346  60652
## 2     EXCESSIVE HEAT       1903     6525   1678
## 3          TSTM WIND        504     6957 219942
## 4              FLOOD        470     6789  25327
## 5          LIGHTNING        816     5230  15754
## 6               HEAT        937     2100    767
## 7        FLASH FLOOD        978     1777  54277
## 8          ICE STORM         89     1975   2006
## 9  THUNDERSTORM WIND        133     1488  82564
## 10      WINTER STORM        206     1321  11433

Regarding the economic aspect of the storm events, the next plot shows property damage and crop damage combined:

plot3 <- ggplot(top_damage_StormData, aes(x = reorder(EVTYPE, (PROPTOTAL + CROPTOTAL)), y = (PROPTOTAL + CROPTOTAL)/10^9)) + 
        geom_bar(stat = "identity") +
        coord_flip() +
        labs(x = "Event Type", y = "Value (Billions)") +
        labs(title = "Top 10 events which have caused the greatest economic cost") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0));
print(plot3);

Flood causes the greatest economic damage, followed by Hurricane/Typhoon and, again, Tornado.

With all this information we can say that Tornado is the worst event because of its cost in human lives and also its economic cost. Flood can be considered the second worst, first because of its harmful effect on health and then because of its economic cost.