Each year, weather events in the US leave severe economic and human consequences in their wake. This report shows the 10 event types that are most harmful to population health and the 10 with the greatest economic consequences, using the National Oceanic and Atmospheric Administration's (NOAA) storm database.
A directory was created to store the downloaded data.
dir.create("./wheather data")
## Warning in dir.create("./wheather data"): '.\wheather data' already exists
URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(URL, destfile = "./wheather data/data.csv")
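The warning above appears because the directory already existed when the code was run again. A minimal sketch of a guarded version is given below; `fetch_once` is a hypothetical helper name, not part of the original analysis, and it assumes that skipping the download is acceptable whenever the file is already on disk.

```r
# Hypothetical helper: create the directory only when it is missing,
# and download only when the file is not already on disk.
fetch_once <- function(url, dest_dir, dest_file) {
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  dest_path <- file.path(dest_dir, dest_file)
  if (!file.exists(dest_path)) {
    # Binary mode, since the remote file is a bzip2 archive
    download.file(url, destfile = dest_path, mode = "wb")
  }
  dest_path
}

# Usage with the names from this report (not run here, needs network access):
# path <- fetch_once(URL, "./wheather data", "data.csv")
# data <- read.csv(path)
```

With this guard, running the report a second time neither raises the warning nor downloads the file again.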
Then the data was loaded into R and processed for analysis. The chosen variables were EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG.
data <- read.csv("./wheather data/data.csv")
## Choosing the relevant variables for the analysis.
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
proc.data <- data[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]
table(complete.cases(proc.data)) ## there are no missing values in the chosen variables
##
## TRUE
## 902297
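The column list above also contains PROPDMGEXP and CROPDMGEXP, which are not used in this analysis. If dollar amounts were needed, the damage figures could be scaled by these columns. The sketch below is only an illustration on toy data: it assumes that the codes K, M and B stand for thousands, millions and billions, treats every other code as a multiplier of 1, and uses the hypothetical names `exp_to_multiplier` and `PROPDMG_USD`.

```r
# Assumed mapping from magnitude code to multiplier; codes other than
# K, M and B (in any case) are treated as 1 in this sketch.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows, not taken from the storm database
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Scale each damage figure by its magnitude code
toy$PROPDMG_USD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```

The same function could be applied to CROPDMG and CROPDMGEXP under the same assumption.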
## Fatalities and injuries were merged in a single column
library(tidyr)
proc.data <- gather(proc.data, HARMFULTYPE, HARMNUMBER, FATALITIES:INJURIES)
## Also Property and crop damage were merged
proc.data <- gather(proc.data, ECONOMICTYPE, ECONOMICCOST, PROPDMG:CROPDMG)
Two data sets were then created: the first contains the data ordered by decreasing number of human harms, and the second contains the data ordered by decreasing economic cost. Each data set keeps only its own variables, and unique() is applied to remove the duplicated rows produced by the two gather() calls above.
human_cost <- proc.data[order(proc.data$HARMNUMBER, decreasing = TRUE),]
human_cost <- human_cost[, c("EVTYPE","HARMFULTYPE", "HARMNUMBER")]
human_cost <- unique(human_cost)
economic_cost <- proc.data[order(proc.data$ECONOMICCOST, decreasing = TRUE),]
economic_cost <- economic_cost[, c("EVTYPE", "ECONOMICTYPE", "ECONOMICCOST")]
economic_cost <- unique(economic_cost)
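One point to check in the steps above is that unique() cannot tell a row repeated by the second gather() call from two real records that happen to have the same event type and count. The toy example below, with made-up numbers, is a sketch of that effect and of one possible alternative: reshaping only the human columns so that no unique() step is needed. The object names are hypothetical.

```r
library(tidyr)

# Toy data: two separate tornado records with identical counts.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO"),
  FATALITIES = c(1, 1),
  INJURIES   = c(5, 5),
  PROPDMG    = c(2.5, 2.5),
  CROPDMG    = c(0, 0),
  stringsAsFactors = FALSE
)

# Same two gather() steps as in the report: every record becomes 4 rows.
long <- gather(toy, HARMFULTYPE, HARMNUMBER, FATALITIES:INJURIES)
long <- gather(long, ECONOMICTYPE, ECONOMICCOST, PROPDMG:CROPDMG)
n_long <- nrow(long)

# unique() removes the repeats, but also merges the two real records.
human_unique <- unique(long[, c("EVTYPE", "HARMFULTYPE", "HARMNUMBER")])
fatalities_after_unique <- sum(
  human_unique$HARMNUMBER[human_unique$HARMFULTYPE == "FATALITIES"]
)

# Alternative: reshape only the human columns, so no unique() is needed.
human_long <- gather(toy[, c("EVTYPE", "FATALITIES", "INJURIES")],
                     HARMFULTYPE, HARMNUMBER, FATALITIES:INJURIES)
fatalities_direct <- sum(
  human_long$HARMNUMBER[human_long$HARMFULTYPE == "FATALITIES"]
)
```

In this toy case the two records hold 2 fatalities in total, which is the value of `fatalities_direct`, while `fatalities_after_unique` is 1. Whether this matters for the full data set depends on how many real records share the same event type and count.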
The 10 most harmful events in the US are shown here. For each of the 10 events, the number of injuries is greater than the number of fatalities. Tornadoes have the largest number of both injuries and fatalities among all the events.
## choosing the 10 most harmful events
human_cost_top10 <- aggregate(human_cost$HARMNUMBER, by = list(human_cost$EVTYPE, human_cost$HARMFULTYPE), FUN = sum)
names(human_cost_top10) <- c("EVTYPE", "HARMTYPE", "TOTAL")
human_cost_top10 <- human_cost_top10[order(human_cost_top10$TOTAL, decreasing = TRUE), ]
## taking the event names from the first 10 rows of the sorted totals
names_top_10 <- human_cost_top10[1:10,"EVTYPE"]
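Because the totals are computed per event and per harm type, the first 10 rows can hold the same event twice (once for injuries and once for fatalities), so the selection may cover fewer than 10 distinct events. The sketch below uses made-up totals and hypothetical object names to show the effect, together with one possible way of ranking events by their combined total.

```r
# Toy totals per event and harm type (made-up numbers)
totals <- data.frame(
  EVTYPE   = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HEAT", "HEAT"),
  HARMTYPE = rep(c("FATALITIES", "INJURIES"), times = 3),
  TOTAL    = c(50, 900, 10, 60, 30, 20),
  stringsAsFactors = FALSE
)

# Taking the first rows of the sorted table can repeat an event name.
sorted <- totals[order(totals$TOTAL, decreasing = TRUE), ]
first_three <- sorted[1:3, "EVTYPE"]

# Summing over harm types first gives exactly one row per event.
by_event <- aggregate(totals$TOTAL, by = list(EVTYPE = totals$EVTYPE), FUN = sum)
names(by_event) <- c("EVTYPE", "TOTAL")
by_event <- by_event[order(by_event$TOTAL, decreasing = TRUE), ]
top_events <- as.character(head(by_event$EVTYPE, 3))
```

Here `first_three` contains TORNADO twice and only 2 distinct events, while `top_events` contains 3 distinct events.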
## plotting the top 10 events
library(ggplot2)
hrm.events <- ggplot(human_cost[human_cost$EVTYPE %in% names_top_10 ,], aes(EVTYPE, HARMNUMBER, col=HARMFULTYPE))
hrm.events + geom_boxplot() + facet_grid(.~HARMFULTYPE) + theme(axis.text.x = element_text(angle = 90))
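The boxplot above shows the distribution of the counts per record. If the aim is to compare the total number of injuries and fatalities per event, a bar chart of the aggregated totals may be easier to read. The following is only a sketch on made-up totals; `toy_totals` and `p` are hypothetical names, and the same call could be applied to the aggregated table from the report.

```r
library(ggplot2)

# Toy totals per event and harm type (made-up numbers)
toy_totals <- data.frame(
  EVTYPE   = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HEAT", "HEAT"),
  HARMTYPE = rep(c("FATALITIES", "INJURIES"), times = 3),
  TOTAL    = c(50, 900, 10, 60, 30, 20),
  stringsAsFactors = FALSE
)

# One bar per event, ordered by decreasing total, one panel per harm type
p <- ggplot(toy_totals,
            aes(x = reorder(EVTYPE, -TOTAL), y = TOTAL, fill = HARMTYPE)) +
  geom_col() +
  facet_grid(. ~ HARMTYPE) +
  labs(x = "Event type", y = "Total count") +
  theme(axis.text.x = element_text(angle = 90))

print(p)
```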
As above, the 10 events with the greatest economic consequences are shown here. There is no large difference between the crop and property consequences. However, in some states certain events led to much greater property costs, as is the case for tornadoes, thunderstorm winds, high winds, floods and flash floods.
## choosing the 10 events with the greatest economic cost
economic_cost_top10 <- aggregate(economic_cost$ECONOMICCOST, by = list(economic_cost$EVTYPE, economic_cost$ECONOMICTYPE), FUN = sum)
names(economic_cost_top10) <- c("EVTYPE", "ECONOMICTYPE", "TOTAL")
economic_cost_top10 <- economic_cost_top10[order(economic_cost_top10$TOTAL, decreasing = TRUE), ]
## taking the event names from the first 10 rows of the sorted totals
names_top_10_economic <- economic_cost_top10[1:10,"EVTYPE"]
## plotting the top 10 events
library(ggplot2)
econ.events <- ggplot(economic_cost[economic_cost$EVTYPE %in% names_top_10_economic ,], aes(EVTYPE, ECONOMICCOST, col=ECONOMICTYPE))
econ.events + geom_boxplot() + facet_grid(.~ECONOMICTYPE) + theme(axis.text.x = element_text(angle = 90))