Storm data gathered across the United States over the last 60 years provide information about which events are most harmful to population health and about their economic consequences for the nation.
Tornadoes are the worst atmospheric phenomenon overall, because of the lives they take across the United States and the economic damage they inflict on property and crops.
Floods are the second worst.
We cannot assume an equivalence between fatalities and injuries, nor any proportionality, but as a first approach we sum the two.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
-Storm Data [47Mb]
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
-National Weather Service Storm Data Documentation
-National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
The questions we are trying to answer in this document are:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
This document uses the following libraries:
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
First we set the file URL, then download the compressed file into the working directory. The curl method is used because the analysis was run on a Mac system.
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "StormData.csv.bz2", method = "curl")
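Re-running the document downloads the 47 MB file again each time. The following is a minimal sketch of a guard that skips the download when the file is already present. The helper name `download_if_missing` and its `fetch` argument are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: fetch `url` into `destfile` only if the file is missing.
# `fetch` defaults to download.file and can be replaced, e.g. for testing.
download_if_missing <- function(url, destfile, fetch = download.file) {
  if (!file.exists(destfile)) {
    fetch(url, destfile)
  }
  invisible(destfile)
}
```

It could be called as `download_if_missing(fileURL, "StormData.csv.bz2")`. To keep the curl method, a wrapper such as `function(u, d) download.file(u, d, method = "curl")` can be passed as `fetch`.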
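The data are read into a data frame named StormData. Column names, event types, and damage exponents are converted to upper case, and BGN_DATE is parsed as a date.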
StormData <- read.csv(bzfile("StormData.csv.bz2"))
names(StormData) <- toupper(names(StormData))
StormData$BGN_DATE <- as.Date(StormData$BGN_DATE, format = "%m/%d/%Y")
StormData$EVTYPE <- toupper(StormData$EVTYPE)
StormData$PROPDMGEXP <- toupper(StormData$PROPDMGEXP)
StormData$CROPDMGEXP <- toupper(StormData$CROPDMGEXP)
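The effect of these cleaning steps can be checked on a few sample values. The sketch below assumes that raw BGN_DATE values carry a trailing time component, as in "4/18/1950 0:00:00"; the sample vectors are illustrative and are not read from the file.

```r
# Sample values, assumed to resemble the raw BGN_DATE and EVTYPE columns.
raw_dates  <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")
raw_events <- c("Tornado", "tstm wind", "TSTM WIND")

# as.Date() reads only as far as the format requires,
# so the trailing time component is ignored.
parsed_dates <- as.Date(raw_dates, format = "%m/%d/%Y")

# Upper-casing merges labels that differ only by letter case.
clean_events <- toupper(raw_events)
unique(clean_events)
```

Upper-casing only merges labels that differ by case. Abbreviations and spelling variants remain separate event types.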
We want to study two basic aspects of these data: 1. The events most harmful to population health across the U.S. 2. The events with the greatest economic consequences across the U.S.
To find the events most harmful to population health, the variables of interest in the data set are the fatalities and injuries recorded for each event type (“EVTYPE”).
We get the top 10 event types by injuries, by fatalities, and by the sum of both.
harmful_StormData <- group_by(StormData, EVTYPE)
harmful_StormData <- summarise(harmful_StormData, FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))
harmful_StormData <- merge(harmful_StormData, count(StormData, EVTYPE))
fatalities_StormData <- arrange(harmful_StormData, desc(FATALITIES))
top_fatalities_StormData <- fatalities_StormData[1:10,]
injuries_StormData <- arrange(harmful_StormData, desc(INJURIES))
top_injuries_StormData <- injuries_StormData[1:10,]
sumharmful_StormData <- arrange(harmful_StormData, desc(FATALITIES + INJURIES))
top_sumharmful_StormData <- sumharmful_StormData[1:10,]
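The group, summarise, and sort pattern used above can be tried on a small data frame. This is a minimal sketch with invented records (not NOAA data). It computes the event count with `n()` inside `summarise()` instead of merging with `count()`, which is a variation on the code above rather than the author's method.

```r
library(dplyr)

# Toy records, invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("A", "B", "A", "C", "B", "A"),
  FATALITIES = c(1, 0, 2, 5, 1, 0),
  INJURIES   = c(10, 3, 0, 1, 4, 2),
  stringsAsFactors = FALSE
)

# Total fatalities, injuries, and number of records per event type,
# ordered by the combined count of fatalities and injuries.
toy_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES), n = n()) %>%
  arrange(desc(FATALITIES + INJURIES))

# Keep the two highest-ranked event types.
top_two <- head(toy_summary, 2)
top_two
```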
The other aspect of storm events is their economic cost.
The data set holds two kinds of damage information: property damage and crop damage. The sum of both gives the total economic cost of an event.
The first task is to compute a total amount for each of them. Every record has two columns per damage type, one for the value and another for the exponent applied to that value, so we must combine them into a single figure. Records whose exponent is not K, M, or B contribute zero to the total.
StormData$PROPTOTAL <- with(StormData, PROPDMG * ((PROPDMGEXP == 'K') * 10^3 + (PROPDMGEXP == 'M') * 10^6 + (PROPDMGEXP == 'B') * 10^9))
StormData$CROPTOTAL <- with(StormData, CROPDMG * ((CROPDMGEXP == 'K') * 10^3 + (CROPDMGEXP == 'M') * 10^6 + (CROPDMGEXP == 'B') * 10^9))
damage_StormData <- group_by(StormData, EVTYPE)
damage_StormData <- summarise(damage_StormData, PROPTOTAL = sum(PROPTOTAL), CROPTOTAL = sum(CROPTOTAL))
damage_StormData <- merge(damage_StormData, count(StormData, EVTYPE))
damage_StormData <- arrange(damage_StormData, desc(PROPTOTAL + CROPTOTAL))
top_damage_StormData <- damage_StormData[1:10,]
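The exponent conversion above can also be written with a named lookup vector. The following is a minimal sketch, not the code used in this analysis. The names `exp_multiplier` and `to_dollars` are hypothetical and, as in the expressions above, any exponent other than K, M, or B is treated as a multiplier of zero.

```r
# Multiplier for each recognised exponent code.
exp_multiplier <- c(K = 10^3, M = 10^6, B = 10^9)

# Hypothetical helper: combine a damage value with its exponent code.
to_dollars <- function(value, exponent) {
  m <- unname(exp_multiplier[toupper(as.character(exponent))])
  m[is.na(m)] <- 0  # unrecognised or empty codes count as zero
  value * m
}

# 25K, 2.5M, 1B, and a record with no exponent.
to_dollars(c(25, 2.5, 1, 10), c("K", "m", "B", ""))
```

A lookup vector keeps the recognised codes in one place, which makes it easier to see which records are dropped to zero.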
The following plot shows the event types that caused the most injuries.
plot1 <- ggplot(top_injuries_StormData, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Event Type", y = "Value") +
  labs(title = "Top 10 events which have the most injuries") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0))
print(plot1)
As we can see, Tornado causes the most injuries, followed by TSTM Wind and Flood.
Injuries are important, but fatalities are worse. The next plot shows that Tornado is still the most harmful event, followed by Excessive Heat and Flash Flood. These are the deadliest events in the data set.
plot2 <- ggplot(top_fatalities_StormData, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Event Type", y = "Value") +
  labs(title = "Top 10 events which have the most fatalities") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0))
print(plot2)
To get an aggregated view of both, we can check the top 10 events most harmful to the population, where injuries are added to fatalities:
top_sumharmful_StormData
##               EVTYPE FATALITIES INJURIES      n
## 1            TORNADO       5633    91346  60652
## 2     EXCESSIVE HEAT       1903     6525   1678
## 3          TSTM WIND        504     6957 219942
## 4              FLOOD        470     6789  25327
## 5          LIGHTNING        816     5230  15754
## 6               HEAT        937     2100    767
## 7        FLASH FLOOD        978     1777  54277
## 8          ICE STORM         89     1975   2006
## 9  THUNDERSTORM WIND        133     1488  82564
## 10      WINTER STORM        206     1321  11433
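The table lists TSTM WIND and THUNDERSTORM WIND as separate event types. The sketch below shows how sensitive the ranking is to such label variants. It re-enters the ten rows printed above and merges those two labels, on the assumption that "TSTM" abbreviates "THUNDERSTORM". It works only on these ten rows, so variants outside the top 10 are not captured, and it is not part of the original analysis.

```r
# The ten rows printed above, re-entered by hand.
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND", "WINTER STORM"),
  FATALITIES = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  INJURIES   = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)

# Assumption: "TSTM WIND" is an abbreviation of "THUNDERSTORM WIND".
top10$EVTYPE <- sub("^TSTM WIND$", "THUNDERSTORM WIND", top10$EVTYPE)

# Re-aggregate by the merged label and rank by fatalities plus injuries.
merged <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = top10, FUN = sum)
merged$TOTAL <- merged$FATALITIES + merged$INJURIES
merged <- merged[order(-merged$TOTAL), ]
head(merged, 3)
```

Under that assumption the merged THUNDERSTORM WIND row totals 9082, which places it between TORNADO and EXCESSIVE HEAT within these ten rows.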
For the economic side of storm events, the next plot shows property damage and crop damage combined:
plot3 <- ggplot(top_damage_StormData, aes(x = reorder(EVTYPE, (PROPTOTAL + CROPTOTAL)), y = (PROPTOTAL + CROPTOTAL)/10^9)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Event Type", y = "Value (Billions)") +
  labs(title = "Top 10 events which have caused the greatest economic cost") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0))
print(plot3)
Flood is the worst economic disaster, followed by Hurricane/Typhoon and, again, Tornado.
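The plot above shows only the combined total. To see property and crop damage as separate components, the data can be reshaped to long format and drawn as stacked bars. The following is a minimal sketch with made-up figures (not NOAA values). The names `toy`, `long`, `KIND`, and `DAMAGE` are hypothetical.

```r
library(ggplot2)

# Made-up totals in dollars, for illustration only.
toy <- data.frame(
  EVTYPE    = c("EVENT A", "EVENT B", "EVENT C"),
  PROPTOTAL = c(5e9, 2e9, 1e9),
  CROPTOTAL = c(1e9, 3e9, 5e8),
  stringsAsFactors = FALSE
)

# Long format: one row per event type and damage kind.
long <- rbind(
  data.frame(EVTYPE = toy$EVTYPE, KIND = "Property", DAMAGE = toy$PROPTOTAL),
  data.frame(EVTYPE = toy$EVTYPE, KIND = "Crop", DAMAGE = toy$CROPTOTAL)
)

# Stacked bars, ordered by the combined damage of each event type.
plot_split <- ggplot(long, aes(x = reorder(EVTYPE, DAMAGE, FUN = sum),
                               y = DAMAGE / 10^9, fill = KIND)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  labs(x = "Event Type", y = "Value (Billions)", fill = "Damage")
print(plot_split)
```

Applied to the real totals, `toy` would be replaced by top_damage_StormData, which already contains the PROPTOTAL and CROPTOTAL columns.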
With all this information we can say that Tornado is the worst event, because of its cost in human lives and also its economic cost. Flood can be considered the second worst, again weighing its harm to the population first and its economic cost second.