Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about the health and economic impact of severe weather events. The code for the entire analysis is shown, and the results are presented as tables, figures, and other summaries.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.
The first step is to download the file:
# download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","../StormData.csv.bz2")
The second step is to read the file into a data frame:
tormen <- read.csv(bzfile("C:/Users/javier/Data Science Specialization/Reproducible Research/Proyecto2/StormData.csv.bz2"))
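Reading a bzip2-compressed CSV can be tried out on a small file before loading the full database. The sketch below is only an illustration: the temporary file and the two-row data frame are made up for the example and are not part of the NOAA data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# Read it back through a bzfile() connection, as done for the storm data
small <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
small
```

Setting stringsAsFactors = FALSE keeps text columns as character vectors, which avoids surprises when those columns are later converted to numbers.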
An important part of this analysis is the set of distinct event types, so we first determine how many distinct values the EVTYPE field contains.
In the original data, the total number of distinct event types is
length(unique(tormen$EVTYPE))
## [1] 985
We inspect the list of these events
eventos<- tormen$EVTYPE
We can see values such as “FLASH FLOOD/LANDSLIDE” and “FLASH FLOOD LANDSLIDE” that refer to the same event. To fix this, we convert all letters to lowercase and replace punctuation characters and blanks with spaces:
eventos <- tolower(eventos)
event_tipo <- gsub("[[:blank:][:punct:]+]", " ", eventos)
length(unique(event_tipo))
## [1] 874
tormen$EVTYPE<- event_tipo
Further analysis could be done to clean the event types.
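As one possible direction for that further cleaning, the sketch below collapses repeated spaces, trims the ends, and expands the "tstm" abbreviation. The helper normalize_evtype and the sample values are hypothetical and are not applied in the analysis that follows, so the counts and tables in this report are unaffected.

```r
# Hypothetical helper, not part of the original analysis
normalize_evtype <- function(x) {
  x <- tolower(as.character(x))
  x <- gsub("[[:punct:]]", " ", x)       # punctuation becomes a space
  x <- gsub("[[:space:]]+", " ", x)      # collapse runs of whitespace
  x <- trimws(x)                         # drop leading and trailing spaces
  x <- sub("^tstm", "thunderstorm", x)   # expand a common abbreviation
  x
}

# Illustrative values only
sample_types <- c("FLASH FLOOD/LANDSLIDE", "FLASH FLOOD LANDSLIDE",
                  " TSTM WIND", "THUNDERSTORM  WIND")
unique(normalize_evtype(sample_types))
```

On these four sample values the helper leaves two distinct event types, "flash flood landslide" and "thunderstorm wind".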
library(plyr)
## Warning: package 'plyr' was built under R version 3.4.1
casualties <- ddply(tormen, .(EVTYPE), summarize,
fatalities = sum(FATALITIES),
injuries = sum(INJURIES))
## Find events that caused most death and injury (TOP10):
## Fatalities
fatal_events <- head(casualties[order(casualties$fatalities, decreasing = TRUE), ], 10)
head(fatal_events[, c("EVTYPE", "fatalities")],10)
## EVTYPE fatalities
## 741 tornado 5633
## 116 excessive heat 1903
## 138 flash flood 978
## 240 heat 937
## 410 lightning 816
## 762 tstm wind 504
## 154 flood 470
## 515 rip current 368
## 314 high wind 248
## 19 avalanche 224
## Injuries
injury_events <- head(casualties[order(casualties$injuries, decreasing = TRUE), ], 10)
head(injury_events[, c("EVTYPE", "injuries")],10)
## EVTYPE injuries
## 741 tornado 91346
## 762 tstm wind 6957
## 154 flood 6789
## 116 excessive heat 6525
## 410 lightning 5230
## 240 heat 2100
## 382 ice storm 1975
## 138 flash flood 1777
## 671 thunderstorm wind 1488
## 209 hail 1361
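The group-and-rank step used for the two tables above can also be sketched in base R, without plyr. The toy data frame below is invented for the illustration; only the column names follow the storm database.

```r
# Toy data (made up) with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("tornado", "heat", "tornado", "flood", "heat"),
  FATALITIES = c(3, 2, 4, 1, 3),
  INJURIES   = c(10, 0, 20, 2, 1)
)

# Sum both columns per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Keep the two event types with the most fatalities
top_fatal <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], 2)
top_fatal
```

Here aggregate() plays the role of ddply(..., summarize, ...), and order() with head() selects the top rows.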
The first step is to calculate the actual damage. It is stored in two fields: the amount in dollars (“..DMG”) and its exponent (“..DMGEXP”). The exponent is coded as h -> hundred, k -> thousand, m -> million, b -> billion, so all values have to be converted to a single scale.
The next step is to calculate the damage in dollars:
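# Convert a damage exponent code into a power of ten
transfor <- function(e) {
  # Coerce to character first: on a factor, as.numeric() returns the
  # level code, not the digit stored in the data
  e <- as.character(e)
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (e %in% c('', '-', '?', '+'))
    return(0)
  else if (!is.na(suppressWarnings(as.numeric(e))))
    return(as.numeric(e))
  else {
    stop("Invalid exponent")
  }
}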
exp_dmg <- sapply(tormen$PROPDMGEXP, FUN=transfor)
tormen$prop_dmg <- tormen$PROPDMG * (10 ** exp_dmg)
exp_crop <- sapply(tormen$CROPDMGEXP, FUN=transfor)
tormen$crop_dmg <- tormen$CROPDMG * (10 ** exp_crop)
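The same conversion can be written in vectorized form, which avoids calling a function once per row. This is a sketch under stated assumptions: exp_lookup and to_dollars are hypothetical names, the letter mapping mirrors the one described above, single digits are treated as literal exponents, and any other code leaves the value unscaled.

```r
# Assumed mapping from exponent letter to power of ten (mirrors the text above)
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

# Hypothetical vectorized helper, not part of the original analysis
to_dollars <- function(value, exp_code) {
  code  <- tolower(as.character(exp_code))
  power <- unname(exp_lookup[match(code, names(exp_lookup))])  # NA if not a letter code
  digit <- grepl("^[0-9]$", code)
  power[digit] <- as.numeric(code[digit])   # digits are used as literal exponents
  power[is.na(power)] <- 0                  # "", "-", "?", "+" leave the value unscaled
  value * 10^power
}

# Worked example: 25 K, 2.5 M, 1 with no exponent, 10 with exponent 2
to_dollars(c(25, 2.5, 1, 10), c("K", "M", "", "2"))
```

For the worked example, 25 with "K" becomes 25 * 10^3 = 25000 and 2.5 with "M" becomes 2.5 * 10^6 = 2500000, while the value with an empty exponent stays at 1 and 10 with exponent "2" becomes 1000.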
After applying the function, we obtain the actual loss by event type:
library(plyr)
econ_loss <- ddply(tormen, .(EVTYPE), summarize,
prop_dmg = sum(prop_dmg),
crop_dmg = sum(crop_dmg))
##top 10 events:
##properties
Ord_dmg_prop <- arrange(econ_loss, desc(prop_dmg))
prop<-head(Ord_dmg_prop[, c("EVTYPE", "prop_dmg")],10)
##crops
Ord_dmg_crop <- arrange(econ_loss, desc(crop_dmg))
crop<-head(Ord_dmg_crop[, c("EVTYPE", "crop_dmg")],10)
The following plots show the most severe weather event types in terms of health impact in the US (1950-2011):
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.1
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.4.1
p1 <- ggplot(data=fatal_events,
aes(x=reorder(EVTYPE, fatalities), y=fatalities, fill=fatalities)) +
geom_bar(stat="identity") +
coord_flip() +
ylab("Number of Deaths") +
xlab("Event type") +
theme(legend.position="none")
p2 <- ggplot(data=injury_events,
aes(x=reorder(EVTYPE, injuries), y=injuries, fill=injuries)) +
geom_bar(stat="identity") +
coord_flip() +
ylab("Number of injuries") +
xlab("Event type") +
theme(legend.position="none")
grid.arrange(p1, p2,ncol=2, top="The most damaging climatic events in US (1950-2011)")
Tornadoes cause the most deaths and injuries of all event types: 5,633 deaths and 91,346 injuries in the US between 1950 and 2011. The other event types that are most dangerous to population health are excessive heat, flash floods, and thunderstorm (tstm) winds.
The following plots show the most severe event types in terms of economic impact in the US (1950-2011):
library(ggplot2)
library(gridExtra)
c1 <- ggplot(data=prop,
aes(x=reorder(EVTYPE, prop_dmg), y=log10(prop_dmg), fill=prop_dmg )) +
geom_bar(stat="identity") +
coord_flip() +
xlab("Event type") +
ylab("Property damage in dollars (log-scale)") +
theme(legend.position="none")
c2 <- ggplot(data=crop,
aes(x=reorder(EVTYPE, crop_dmg), y=crop_dmg, fill=crop_dmg)) +
geom_bar(stat="identity") +
coord_flip() +
xlab("Event type") +
ylab("Crop damage in dollars") +
theme(legend.position="none")
grid.arrange(c1, c2,ncol=2, top="The Most expensive climate events to US (1950-2011)")
Flash floods and thunderstorm winds were the weather events with the highest economic cost in property damage among all event types, both with more than 10 billion dollars. For crop damage, the costliest events were drought and flood.