In this project, we examine the Storm dataset issued by the U.S. National Oceanic and Atmospheric Administration (NOAA). Our focus is to find out which event types are the most dangerous in terms of human health and damage in US dollars. In both cases, tornados cause the most harm. They account for over 66% of all injuries and fatalities and for almost 28% of the economic damage. Over 95% of the economic damage is caused by only 11 event types.
First, we download our data from Storm Data and read it using read.csv(). Then we take a look at the dimensions of our dataset.
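# Download the compressed file only if it is not already present
if(!file.exists("stormdata.csv.bz2")){
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormdata.csv.bz2", mode = "wb")
}
# read.csv() reads the bz2-compressed file directly
stormdata <- read.csv("stormdata.csv.bz2")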
dim(stormdata)
## [1] 902297 37
There are 902297 observations of 37 variables. A detailed description of the dataset can be found here.
Before we start exploring, we load the libraries that we will use.
library(dplyr)
library(ggplot2)
The event type is stored in the variable EVTYPE. To measure the effect on population health, we consider the variables FATALITIES and INJURIES. First, we aggregate the data by event type.
health_damage <- aggregate(cbind(INJURIES,FATALITIES)~EVTYPE, stormdata, sum)
Some event types have caused both injuries and deaths, some caused only one of the two, and quite a few caused no harm to public health at all. We use the sum of injuries and fatalities (stored in the variable TOTAL) to get an idea of which storm events are the most harmful.
health_damage$TOTAL <- health_damage$INJURIES + health_damage$FATALITIES
health_damage <- arrange(health_damage, desc(TOTAL))
health_damage <- filter(health_damage, TOTAL>0)
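The aggregate, sum, sort and filter steps above can also be written as a single dplyr pipeline. The following is a minimal sketch on a toy data frame; `toy` and its values are hypothetical and are not taken from the NOAA file.

```r
library(dplyr)

# Toy stand-in for stormdata (hypothetical values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 0, 3)
)

toy_health <- toy %>%
  group_by(EVTYPE) %>%                        # one group per event type
  summarise(INJURIES   = sum(INJURIES),
            FATALITIES = sum(FATALITIES)) %>% # totals per event type
  mutate(TOTAL = INJURIES + FATALITIES) %>%   # combined harm
  arrange(desc(TOTAL)) %>%                    # most harmful first
  filter(TOTAL > 0)                           # drop types with no harm

toy_health
```

In this toy example HAIL is dropped because its total is zero, and TORNADO ends up in the first row.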
There are 220 event types that caused harm to human health.
Let’s examine what the variable TOTAL looks like.
summary(health_damage$TOTAL)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.75 5.00 707.60 44.25 96979.00
The mean is much higher than the median, which means that a few event types cause the majority of the injuries and fatalities. Let’s take a look at the most dangerous ones, which we define as those with a TOTAL above the mean.
health_harmful<- filter(health_damage, TOTAL > mean(TOTAL))
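Why a mean far above the median points to a few dominant event types can be seen on a small toy vector. This is only an illustration; `toy_total` and its values are hypothetical.

```r
# Toy totals per event type (hypothetical values): one type dominates
toy_total <- c(1, 2, 5, 40, 900)

median(toy_total)  # middle value, unaffected by the single large total
mean(toy_total)    # pulled upward by the single large total

# Number of event types above the mean, analogous to nrow(health_harmful)
n_above_mean <- sum(toy_total > mean(toy_total))
n_above_mean
```

On the real data, `nrow(health_harmful)` gives the number of event types above the mean.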
About 18 event types lie above the mean. Let’s take a look at them in a bar plot.
ggplot(health_harmful, aes(x=reorder(EVTYPE,-TOTAL),y=TOTAL )) +
geom_bar(stat="identity", color="lightblue", fill="lightblue") +
theme(axis.text.x=element_text(angle=45, hjust=0.9)) +
labs(x="Event Type", y="Injuries and Fatalities", title="Injuries and Fatalities per Event Type")
The event type TORNADO far exceeds all other event types in causing injuries and fatalities: it accounts for 66.3853236% of all harm to human health.
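The report does not show how this percentage was computed. One plausible way, assuming it is the share of one event type in the summed TOTAL column, is sketched below on toy data; `toy_totals` and its values are hypothetical.

```r
# Toy totals per event type (hypothetical values, sorted in descending order)
toy_totals <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HEAT"),
  TOTAL  = c(60, 30, 10)
)

# Share of each event type in the overall total, in percent
toy_totals$SHARE <- 100 * toy_totals$TOTAL / sum(toy_totals$TOTAL)
toy_totals
```

Under the same assumption, the figure for the real data would be `100 * health_damage$TOTAL[1] / sum(health_damage$TOTAL)`, since health_damage is sorted by TOTAL in descending order.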
As for the economic consequences, we look at the damage in dollars. The dataset has two variables measuring economic damage: PROPDMG measures property damage and CROPDMG measures damage in the agricultural sector. We perform the same kind of analysis on the financial damage as in the previous case.
First, we aggregate our data, then we calculate the sum of PROPDMG and CROPDMG as TOTAL and sort the data in descending order.
financial_damage <- aggregate(cbind(PROPDMG,CROPDMG)~EVTYPE, stormdata, sum)
financial_damage$TOTAL <- financial_damage$PROPDMG + financial_damage$CROPDMG
financial_damage <- arrange(financial_damage, desc(TOTAL))
financial_damage <- filter(financial_damage, TOTAL>0)
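The sums above use PROPDMG and CROPDMG as recorded. The dataset also carries the columns PROPDMGEXP and CROPDMGEXP, which hold magnitude codes for these values. The sketch below shows one way the recorded values could be scaled to dollars, assuming that K, M and B stand for thousands, millions and billions. The helper `exp_multiplier`, the toy data, and the choice to treat every other code as a multiplier of 1 are assumptions made for illustration.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: only K, M and B (any case) are scaled; every other code counts as 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1   # unknown or empty codes: leave the value unscaled
  unname(mult)
}

# Toy rows (hypothetical values)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", "")
)

# Damage in dollars = recorded value times its multiplier
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy
```

The same helper could be applied to CROPDMG and CROPDMGEXP before aggregating by EVTYPE.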
There are 431 event types that caused financial damage.
Let’s examine what the variable TOTAL looks like.
summary(financial_damage$TOTAL)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 10 59 28451 601 3312277
Again, the mean is much higher than the median, which means that the majority of the economic damage is caused by a few event types. Let’s take a look at the ones with a TOTAL above the mean.
financial_harmful<- filter(financial_damage, TOTAL > mean(TOTAL))
About 21 event types lie above the mean. Let’s take a look at them in a bar plot.
ggplot(financial_harmful, aes(x=reorder(EVTYPE,-TOTAL),y=TOTAL )) +
geom_bar(stat="identity", color="lightblue", fill="lightblue") +
theme(axis.text.x=element_text(angle=45, hjust=0.9)) +
labs(x="Event Type", y="Economic damage in dollars", title="Property and Crop Damage per Event Type") +
geom_hline(yintercept = 100000, color="orange")
Again, the event type TORNADO far exceeds all other event types in causing property and crop damage: it accounts for 27.9552136% of all damage. In this case, though, a few more event types are worth considering. As a rough cutoff, we take a TOTAL of 100000, marked by the orange line in the plot:
financial_harmful_top<- filter(financial_damage, TOTAL > 100000)
There are 11 event types above this cutoff. Together they account for 95.452616% of the total financial damage.
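How such a combined share can be obtained is sketched below on toy data, assuming it is the summed TOTAL of the event types above the cutoff divided by the overall sum. The names `toy_totals`, `toy_top` and the cutoff of 10 are hypothetical.

```r
# Toy totals per event type (hypothetical values)
toy_totals <- data.frame(
  EVTYPE = c("A", "B", "C", "D"),
  TOTAL  = c(50, 30, 15, 5)
)

# Sort in descending order and add the cumulative share in percent
toy_totals <- toy_totals[order(-toy_totals$TOTAL), ]
toy_totals$CUM_SHARE <- 100 * cumsum(toy_totals$TOTAL) / sum(toy_totals$TOTAL)

# Event types above a cutoff (here 10) and their combined share
toy_top   <- toy_totals[toy_totals$TOTAL > 10, ]
top_share <- 100 * sum(toy_top$TOTAL) / sum(toy_totals$TOTAL)
top_share
```

Under the same assumption, the figure for the real data would be `100 * sum(financial_harmful_top$TOTAL) / sum(financial_damage$TOTAL)`.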
In this analysis, we examined which types of storm events cause the most harm to human health and the most financial damage. In both cases, tornados are the most dangerous.