This project analyzes the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA). We estimate the potential of each type of meteorological event to cause harm to people (deaths and injuries) as well as monetary damage to property. The objective is to determine which meteorological disasters (hail, hurricanes, floods, typhoons, tornadoes, etc.) cause the greatest harm to population health and which cause the greatest economic impact. The analysis shows that tornadoes cause the most deaths and injuries in the United States, while floods cause the largest monetary losses.
The libraries used throughout this project are the following:
library(ggplot2)
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(utils)
After downloading the data, this is the code used to read it:
storm <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),header = TRUE)
data.table::setDT(storm)
names(storm) <- tolower(names(storm))
storm[,zonenames:=NULL]
storm[,remarks:=NULL]
storm
I deleted some variables that are not going to be used and converted the column names to lowercase for convenience.
Let's create a new variable containing the sum of fatalities and injuries, and save its maximum value:
storm[,harm:=injuries+fatalities]
top_harm <- max(storm$harm)
top_harm
## [1] 1742
Subsetting the data to find the event type of the single record that was most harmful to population health:
harm_event <- as.character(storm[harm==top_harm]$evtype)
harm_event
## [1] "TORNADO"
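The step above picks the event type of the single worst record. That is not necessarily the event type with the largest total harm, so it can help to check both. The following is a minimal sketch on hypothetical toy data (the values and the names `toy`, `worst_record`, `totals`, and `worst_total` are illustrative assumptions, not taken from the NOAA database):

```r
library(data.table)

# Hypothetical toy data, NOT real NOAA values
toy <- data.table(
  evtype     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HEAT"),
  fatalities = c(40, 2, 15, 20, 25),
  injuries   = c(10, 3, 5, 10, 5)
)
toy[, harm := injuries + fatalities]

# Event type of the single most harmful record
worst_record <- toy[which.max(harm), evtype]

# Event type with the largest harm summed over all of its records
totals <- toy[, .(total_harm = sum(harm)), by = evtype][order(-total_harm)]
worst_total <- totals[1, evtype]

worst_record
worst_total
```

In this toy example the two answers differ, which is why comparing totals per event type is a useful complement to looking at the single largest record.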
In addition, let's create a plot to see how harmful tornadoes are compared with the rest of the top 10 most harmful events:
top_harmful_events <- data.table::setorder(storm[,sum(harm),by=evtype], -"V1")[c(1:10),]$evtype
data3 <- data.table::setDT(reshape2::melt(storm[evtype %in% top_harmful_events,.(fatalities=sum(fatalities),
injuries=sum(injuries)),by=evtype], id.vars = "evtype"))
# library(ggplot2)
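# fill = variable separates fatalities from injuries, so position = "dodge" has groups to place side by side
ggplot(data3, aes(x = evtype, y = value, fill = variable)) +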
geom_bar(stat = "identity", position = "dodge") +
labs(x = "Evtype", y = "Quantity", title = "Total Fatalities & Injuries by Event") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
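The plotting code above reshapes the per-event totals from wide to long format before passing them to ggplot. As a minimal sketch of what that reshaping produces, here is the same idea on hypothetical toy data. It uses data.table's own `melt` method rather than `reshape2::melt` (a swapped-in call, assumed equivalent for this purpose), and the names `wide` and `long` are illustrative:

```r
library(data.table)

# Hypothetical per-event totals in wide format: one column per measure
wide <- data.table(
  evtype     = c("TORNADO", "FLOOD"),
  fatalities = c(10, 4),
  injuries   = c(90, 6)
)

# Long format: one row per (evtype, measure) pair
long <- melt(wide, id.vars = "evtype",
             variable.name = "variable", value.name = "value")
long
```

Each event type now appears once per measure, and the `variable` column records whether a row holds fatalities or injuries, which is the column a grouped bar chart can use to distinguish the two.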
Let's also look at how tornado-caused injuries and fatalities are distributed across all tornado observations:
boxplot(storm$harm[storm$evtype=="TORNADO"]
, main ="Total Fatalities & Injuries by Tornadoes"
,xlab="TORNADO", ylab="Fatalities & Injuries")
Let's also break the tornado totals down into injuries and fatalities:
chart_data <- c(sum(storm[evtype=="TORNADO"]$injuries), sum(storm[evtype=="TORNADO"]$fatalities))
chart_labels <- c("Injuries", "Fatalities")
piepercent<- round(100*chart_data/sum(chart_data), 1)
chart_labels <- paste0(chart_labels," (",piepercent,"%)")
pie(chart_data, labels = chart_labels, main = "Fatalities & Injuries caused by Tornadoes",col = rainbow(length(chart_data)))
The maximum property damage estimate, in billions of dollars:
top_prop_dmg <- max(storm[propdmgexp == "B"]$propdmg)
top_prop_dmg
## [1] 115
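The value above only considers records whose exponent is "B". To compare damage across all records, the amount and its exponent can be combined into a single dollar figure. This is a minimal sketch on hypothetical toy data, assuming that "K", "M", and "B" stand for thousands, millions, and billions. Treating every other exponent code as missing is a simplifying assumption, and the names `toy`, `multiplier`, `prop_usd`, and `dollars` are illustrative:

```r
library(data.table)

# Hypothetical toy records, NOT real NOAA values
toy <- data.table(
  evtype     = c("FLOOD", "FLOOD", "HAIL", "HAIL", "WIND"),
  propdmg    = c(2, 500, 750, 1.5, 25),
  propdmgexp = c("B", "M", "K", "M", "?")
)

# Assumed multipliers: K = thousand, M = million, B = billion
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Codes that are not names of `multiplier` (here "?") yield NA
toy[, prop_usd := propdmg * unname(multiplier[toupper(propdmgexp)])]

# Total property damage in dollars per event type, largest first
dollars <- toy[, .(total_usd = sum(prop_usd, na.rm = TRUE)), by = evtype][order(-total_usd)]
dollars
```

Summing dollar amounts per event type in this way would complement the single-record maximum, since it also accounts for many smaller events of the same type.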
Now let's look at crop damage. Listing the records whose exponent is "B" (billions) shows that the largest crop damage estimates are much lower than the largest property damage estimate.
storm[,c("evtype","cropdmg","cropdmgexp")][cropdmgexp == "B"]
## evtype cropdmg cropdmgexp
## 1: HEAT 0.40 B
## 2: RIVER FLOOD 5.00 B
## 3: DROUGHT 0.50 B
## 4: FREEZE 0.20 B
## 5: ICE STORM 5.00 B
## 6: HURRICANE/TYPHOON 1.51 B
## 7: DROUGHT 1.00 B
## 8: DROUGHT 0.00 B
## 9: DROUGHT 0.00 B
So the largest single damage estimate is 115 billion dollars, and it is property damage.
Therefore, the type of event with the greatest economic consequences is:
# Reuse top_prop_dmg instead of hard-coding the value 115
dmg_event <- as.character(storm[propdmgexp == "B" & propdmg == top_prop_dmg]$evtype)
dmg_event
## [1] "FLOOD"