The goal of this study is to determine which types of natural event in the United States are most harmful to population health and which have the greatest economic impact. To answer both questions, we use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.
To read the data, we first check whether the file has already been downloaded to our working directory. If it has not, we download it from the website provided. We then load the data into a variable and set the cache option to TRUE because the file is large.
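# Download the compressed data set only if it is not already present
if (!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "stormData.csv.bz2", mode = "wb")  # binary mode keeps the archive intact on Windows
}
# read.csv reads the bzip2-compressed file directly
stormData <- read.csv("stormData.csv.bz2")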
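As a small illustration of why no separate decompression step is needed, the sketch below writes a toy table through a bzip2 connection and reads it straight back with read.csv. The temporary file and the toy values are assumptions made for the example, not part of the storm data.

```r
# Toy round trip: write a small CSV through a bzip2 connection, then read it back.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("EVENT A", "EVENT B"), INJURIES = c(3, 0))

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv detects the compression from the file contents,
# so the file does not have to be unpacked by hand.
toyBack <- read.csv(tmp, stringsAsFactors = FALSE)
print(toyBack)
```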
library(ggplot2)
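We first sum the injuries and the fatalities caused by each event type into separate variables. We then keep the three event types with the highest totals in each category and combine them into a single data frame for plotting.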
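# Total injuries and fatalities per event type
evInjSum <- aggregate(INJURIES ~ EVTYPE, data = stormData, FUN = sum)
evFatSum <- aggregate(FATALITIES ~ EVTYPE, data = stormData, FUN = sum)
# Keep the three event types with the most injuries and label them
evInjSum <- evInjSum[order(evInjSum[, 2], decreasing = TRUE)[1:3], ]
evInjSum[, 3] <- "injury"
colnames(evInjSum)[2:3] <- c("sum", "Casualty")
# Same for fatalities
evFatSum <- evFatSum[order(evFatSum[, 2], decreasing = TRUE)[1:3], ]
evFatSum[, 3] <- "fatality"
colnames(evFatSum)[2:3] <- c("sum", "Casualty")
# Stack both tables so they can be plotted as facets
evSum <- rbind(evInjSum, evFatSum)
evSumPlot <- evSum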
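The aggregate-then-rank pattern used above can be seen on a toy data frame. This is only a minimal sketch: the event names and counts are invented for illustration, and head() is used here in place of the [1:3] indexing.

```r
# Invented toy records; these are not values from the storm database.
toy <- data.frame(
  EVTYPE = c("EVENT A", "EVENT B", "EVENT A", "EVENT C", "EVENT D", "EVENT B"),
  INJURIES = c(5, 2, 7, 1, 0, 4)
)

# One row per event type, holding the summed injuries.
totals <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)

# Sort by the total in decreasing order and keep the first three rows.
top3 <- head(totals[order(totals$INJURIES, decreasing = TRUE), ], 3)
print(top3)
```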
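We discard the rows that have no information about economic damage. We then interpret the exponent codes (H or h for hundreds, K or k for thousands, M or m for millions, B or b for billions) and, for each row, add the scaled property damage (PROPDMG) and crop damage (CROPDMG) into a new variable. Any other exponent code contributes nothing to the total.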
# Keep rows with a non-zero property or crop damage figure
stormEc <- stormData[(!is.na(stormData$PROPDMG) & stormData$PROPDMG != 0) |
                     (!is.na(stormData$CROPDMG) & stormData$CROPDMG != 0), ]
for (i in 1:nrow(stormEc)) {
  aux <- 0
  # Property damage scaled by its exponent code
  if (as.character(stormEc[i, ]$PROPDMGEXP) == "B" || as.character(stormEc[i, ]$PROPDMGEXP) == "b") {
    aux <- stormEc[i, ]$PROPDMG * 10^(9)
  }
  if (as.character(stormEc[i, ]$PROPDMGEXP) == "M" || as.character(stormEc[i, ]$PROPDMGEXP) == "m") {
    aux <- stormEc[i, ]$PROPDMG * 10^(6)
  }
  if (as.character(stormEc[i, ]$PROPDMGEXP) == "K" || as.character(stormEc[i, ]$PROPDMGEXP) == "k") {
    aux <- stormEc[i, ]$PROPDMG * 10^(3)
  }
  if (as.character(stormEc[i, ]$PROPDMGEXP) == "H" || as.character(stormEc[i, ]$PROPDMGEXP) == "h") {
    aux <- stormEc[i, ]$PROPDMG * 10^(2)
  }
  # Crop damage scaled by its exponent code and added to the property damage
  if (as.character(stormEc[i, ]$CROPDMGEXP) == "B" || as.character(stormEc[i, ]$CROPDMGEXP) == "b") {
    aux <- aux + stormEc[i, ]$CROPDMG * 10^(9)
  }
  if (as.character(stormEc[i, ]$CROPDMGEXP) == "M" || as.character(stormEc[i, ]$CROPDMGEXP) == "m") {
    aux <- aux + stormEc[i, ]$CROPDMG * 10^(6)
  }
  if (as.character(stormEc[i, ]$CROPDMGEXP) == "K" || as.character(stormEc[i, ]$CROPDMGEXP) == "k") {
    aux <- aux + stormEc[i, ]$CROPDMG * 10^(3)
  }
  if (as.character(stormEc[i, ]$CROPDMGEXP) == "H" || as.character(stormEc[i, ]$CROPDMGEXP) == "h") {
    aux <- aux + stormEc[i, ]$CROPDMG * 10^(2)
  }
  # Store the combined damage in a new 38th column
  stormEc[i, 38] <- aux
}
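Because the row-by-row loop can be slow on a table of this size, the same calculation could also be written in vectorised form. The sketch below is one possible way to do it, shown on invented toy rows. The helper name expMultiplier is hypothetical, and it assumes, as the loop does, that any code other than H, K, M or B (in either case) contributes nothing.

```r
# Hypothetical helper: map an exponent code to its multiplier.
# Codes other than H, K, M, B (either case) map to 0, as in the loop above.
expMultiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 0
  mult
}

# Invented toy rows; these are not values from the storm database.
toy <- data.frame(
  PROPDMG = c(2.5, 10, 1, 3),
  PROPDMGEXP = c("K", "m", "", "B"),
  CROPDMG = c(0, 5, 4, 0),
  CROPDMGEXP = c("", "K", "h", "?"),
  stringsAsFactors = FALSE
)

# Whole-column arithmetic replaces the explicit loop over rows.
toy$ECDMG <- toy$PROPDMG * expMultiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * expMultiplier(toy$CROPDMGEXP)
print(toy$ECDMG)
```

The named-vector lookup keeps the code-to-multiplier table in one place, so a new code would need a single new entry instead of another pair of if blocks.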
As before, we sum the total economic damage by event type and keep the three highest totals.
colnames(stormEc)[38]<-"ECDMG"
evEcSum<-aggregate(ECDMG ~ EVTYPE, data = stormEc,FUN=sum)
evEcSum<-evEcSum[order(evEcSum[,2],decreasing = TRUE)[1:3],]
We plot the three events that produced the most injuries and the three that produced the most fatalities in separate bar plots, followed by the three events with the greatest economic impact.
g <- ggplot(data=evSumPlot,aes(x=EVTYPE,y=sum)) + geom_bar(stat="identity") + facet_grid(Casualty ~ .)
plot(g)
g <- ggplot(data=evEcSum,aes(x=EVTYPE,y=ECDMG)) + geom_bar(stat="identity")
plot(g)
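If the bars should appear in order of magnitude, with the axis in more readable units, one option is to reorder the event factor and rescale the values inside the plot call. The sketch below assumes ggplot2 is installed and uses invented toy totals; the axis labels are illustrative choices, not part of the original analysis.

```r
library(ggplot2)

# Invented toy totals, in dollars; these are not results from the storm data.
toyDamage <- data.frame(
  EVTYPE = c("EVENT A", "EVENT B", "EVENT C"),
  ECDMG = c(9e9, 2e9, 5e9)
)

# reorder() sorts the event levels by descending damage, so the tallest bar comes first.
# Dividing by 1e9 expresses the y axis in billions of dollars.
g <- ggplot(toyDamage, aes(x = reorder(EVTYPE, -ECDMG), y = ECDMG / 1e9)) +
  geom_col() +
  labs(x = "Event type", y = "Economic damage (billions of dollars)")
print(g)
```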
The bar plots show clearly that the natural event causing the greatest economic damage is flood, with damages of 150319678250 dollars (about 150.3 billion), followed by hurricane and tornado.
By the same method, the natural event most harmful to population health is the tornado, which has caused 91346 injuries and 5633 fatalities, followed by TSTM WIND (thunderstorm wind) and excessive heat.