library(reshape2)
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.2.3
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
Storms and other severe weather events can cause both public health and economic problems for States. Many severe events can result in fatalities, injuries, and property damage. This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
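The basic goal of this report is to identify which types of weather events are most harmful to population health across the United States, and which types produce the largest economic losses.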
From the NOAA storm database we obtained data on major storms and weather events in the United States, including when and where they occurred, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011.
The raw data come as a “.csv.bz2” archive, which read.csv() can read directly. The result was stored in a data frame called “data”.
if(!file.exists("dataNOAA")){
dir.create("dataNOAA")}
fileUrl<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,destfile="./dataNOAA/data.bz2")
data<-read.csv("./dataNOAA/data.bz2")
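read.csv() can take the compressed archive as is because R's file connections detect bzip2 compression when a path is opened for reading. The following is a minimal sketch of that behaviour on a small hypothetical table; the toy data frame and the temporary file are illustrative and are not part of the NOAA data.

```r
# Toy data standing in for the storm table (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the table through a bzip2 connection, mimicking a .csv.bz2 archive.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which decompresses transparently.
toy_back <- read.csv(tmp)
```

Reading the file back gives the same two rows and columns that were written, with no explicit decompression step.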
Two summary data frames were generated by selecting columns from the data frame “data”: ndata2 and ndatab.
ndata2: contains the columns EVTYPE (event type) and FATALITIES. Event types were sorted by total fatalities in descending order, and only those whose cumulative share of all fatalities stays below 80% were kept.
ndata<-data[,c(7,8,23)]
ndata2<-aggregate(data=ndata,FATALITIES~EVTYPE,sum)
ndata2<- transform(ndata2, EVTYPE = reorder(EVTYPE, FATALITIES))
ndata2<-ndata2[order(-ndata2$FATALITIES),]
suma<-sum(ndata2$FATALITIES)
ndata2$p<-ndata2$FATALITIES/suma
ndata2$s<-cumsum(ndata2$p)
ndata2<-subset(ndata2,ndata2$s<0.8)
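The cumulative-share selection used above can be illustrated on a small example. This is a sketch with hypothetical totals (the event names A to D and their counts are made up), following the same sort, share and cumsum steps.

```r
# Hypothetical fatality totals per event type (not the NOAA figures).
toy <- data.frame(EVTYPE = c("A", "B", "C", "D"),
                  FATALITIES = c(50, 25, 15, 10))

# Sort from the largest total to the smallest.
toy <- toy[order(-toy$FATALITIES), ]

# Share of the grand total and running (cumulative) share.
toy$p <- toy$FATALITIES / sum(toy$FATALITIES)
toy$s <- cumsum(toy$p)

# Keep the rows whose cumulative share is still below 80%.
top <- subset(toy, s < 0.8)
```

Here the cumulative shares are 0.50, 0.75, 0.90 and 1.00, so only A and B are kept. Because the condition is a strict `s < 0.8`, the event that pushes the running share past 80% is dropped, and the selected events account for somewhat less than 80% of the total.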
ndatab: contains the columns EVTYPE (event type) and ECONOMIC_losses (economic losses, in millions of USD). To build this data frame, the exponent codes in PROPDMGEXP and CROPDMGEXP were converted to numeric multipliers (K = 1,000; M = 1,000,000; B = 1,000,000,000; any other code = 1) and multiplied by PROPDMG and CROPDMG respectively. Property and crop damage were then added together into ECONOMIC_losses, the event types were sorted by total losses, and only those whose cumulative share of all losses stays below 80% were kept.
ndatab<-data[,c(8,25,26,27,28)]
ndatab$PROPDMGEXP_factor <- sapply(data$PROPDMGEXP, function(x) {
if(x=="K") 1000 else
if(x=="M") 1000000 else
if(x=="B") 1000000000 else
1})
ndatab$CROPDMGEXP_factor <- sapply(data$CROPDMGEXP, function(x) {
if(x=="K") 1000 else
if(x=="M") 1000000 else
if(x=="B") 1000000000 else
1})
ndatab$PROPDMG_USDM<-ndatab$PROPDMGEXP_factor*ndatab$PROPDMG/1000000
ndatab$CROPDMG_USDM<-ndatab$CROPDMGEXP_factor*ndatab$CROPDMG/1000000
ndatab$PROPDMGEXP<-NULL
ndatab$CROPDMGEXP<-NULL
ndatab$CROPDMG<-NULL
ndatab$CROPDMGEXP_factor<-NULL
ndatab$PROPDMG<-NULL
ndatab$PROPDMGEXP_factor<-NULL
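# Use the full column names: `$` partial matching works when reading but not when assigning NULL
ndatab$ECONOMIC_losses<-ndatab$PROPDMG_USDM+ndatab$CROPDMG_USDM
ndatab$PROPDMG_USDM<-NULL
ndatab$CROPDMG_USDM<-NULL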
ndatab<-aggregate(data=ndatab,ECONOMIC_losses~EVTYPE,sum)
ndatab<- transform(ndatab, EVTYPE = reorder(EVTYPE,ECONOMIC_losses))
ndatab<-ndatab[order(-ndatab$ECONOMIC_losses),]
suma2<-sum(ndatab$ECONOMIC_losses)
ndatab$p<-ndatab$ECONOMIC_losses/suma2
ndatab$s<-cumsum(ndatab$p)
ndatab<-subset(ndatab,ndatab$s<0.8)
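The exponent-to-multiplier conversion can also be written as a vectorized lookup instead of an element-by-element sapply(). The sketch below assumes the same mapping as the analysis (K, M, B, everything else 1); the helper name `exp_to_factor` and the toy damage values are hypothetical.

```r
# Multipliers for the exponent codes handled in the analysis above.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map exponent codes to numeric multipliers.
# Codes other than K, M or B (including blanks) fall back to 1.
exp_to_factor <- function(code) {
  code <- as.character(code)
  out <- unname(multipliers[code])
  out[is.na(out)] <- 1
  out
}

# Toy damage figures (hypothetical values).
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "M", "B", ""))

# Damage in millions of USD.
toy$PROPDMG_USDM <- toy$PROPDMG * exp_to_factor(toy$PROPDMGEXP) / 1e6
```

A named-vector lookup processes the whole column in one step, which matters on a table with several hundred thousand rows. Like the code above, it treats lowercase or numeric exponent codes as a multiplier of 1.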
The following graph shows the weather events that account for the largest numbers of fatalities.
g1<-ggplot(data=ndata2, aes(x=EVTYPE, y=FATALITIES) )+
geom_bar(stat="identity")+coord_flip()+ggtitle("Main weather events measured in fatalities")+xlab("Event Type")+ylab("Fatalities")
print(g1)
Tornadoes are by far the deadliest event type, followed by excessive heat and flash floods.
The distribution of fatalities due to tornadoes, excessive heat and flash floods varies across states. The largest fatality numbers due to tornadoes are in AL, TX, MS, MO, AR and TN. The largest fatality numbers due to flash floods are in TX, MO and MS. The largest fatality numbers due to excessive heat are in PA, IL and TX.
ndata3<-aggregate(data=ndata,FATALITIES~EVTYPE+STATE,sum)
ndata3<-inner_join(ndata3,ndata2,by="EVTYPE")
## Warning in inner_join_impl(x, y, by$x, by$y): joining factors with
## different levels, coercing to character vector
ndata3$FATALITIES.y<-NULL
ndata3$p<-NULL
ndata3$s<-NULL
ndata3$FATALITIES<-ndata3$FATALITIES.x
ndata3$FATALITIES.x<-NULL
ndata3$EVTYPE<-as.factor(ndata3$EVTYPE)
ndata3<-subset(ndata3,ndata3$EVTYPE=="TORNADO" | ndata3$EVTYPE=="EXCESSIVE HEAT" | ndata3$EVTYPE=="FLASH FLOOD")
g2<-ggplot(data=ndata3, aes(STATE,FATALITIES) )+
geom_bar(stat="identity")+coord_flip()+facet_wrap(~EVTYPE)
print(g2)
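The state-level comparison can also be read off numerically rather than from the plot. The sketch below uses hypothetical counts (not the NOAA figures) to show one way to total fatalities by event type and state and to pick the leading state for each event type.

```r
# Hypothetical fatality counts by event type and state.
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "FLASH FLOOD", "FLASH FLOOD"),
  STATE = c("AL", "TX", "AL", "TX", "MO"),
  FATALITIES = c(10, 4, 5, 7, 2)
)

# Total fatalities for each event type and state combination.
by_state <- aggregate(FATALITIES ~ EVTYPE + STATE, data = toy, FUN = sum)

# Order within each event type from the largest total to the smallest.
by_state <- by_state[order(by_state$EVTYPE, -by_state$FATALITIES), ]

# The first row of each event type is the state with the most fatalities.
top_state <- by_state[!duplicated(by_state$EVTYPE), ]
```

With these toy values the leading state is TX for flash floods (7 fatalities) and AL for tornadoes (15 fatalities).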
The following graph shows the weather events that produced the largest economic losses.
g3<-ggplot(data=ndatab, aes(x=EVTYPE, y=ECONOMIC_losses) )+
geom_bar(stat="identity")+coord_flip()+ggtitle("Main weather events measured in ECONOMIC losses")+xlab("Event Type")+ylab("ECONOMIC losses / USD Millions")
print(g3)
Across the United States, tornadoes, excessive heat and flash floods were the most harmful weather events with respect to population health (measured in fatalities) between 1950 and 2011.
Floods, hurricanes and tornadoes are the weather events that produced the largest economic losses between 1950 and 2011.