This analysis focuses on the fatalities, injuries and economic damage caused by weather events, grouped by event type. For each event type, I calculate the average and total fatalities, injuries and economic damage. I then explore which event types are the most harmful with respect to population health, and which event types have had the greatest economic consequences, meaning both a high average and a high total economic damage.
We load the required packages and set the chunk options so that the code is shown throughout this file.
library(knitr)
# Show the code of every chunk and set the default figure width
opts_chunk$set(echo=TRUE, fig.width=11)
library(reshape2)
library(lattice)
library(ggplot2)
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# library(plyr)
We read the raw data, then select the variables that describe the event type, fatalities, injuries, property damage and crop damage, and save them in dat.
data <- read.csv("repdata-data-StormData.csv")
names(data) <- tolower(names(data))
dat <- select(data, evtype, fatalities, injuries, propdmg, cropdmg)
We first look at the total and mean fatalities for each event type, to find out which type of weather event has caused the largest total and the largest mean fatalities.
dat1 <- select(dat, evtype, fatalities) %>%
  group_by(evtype) %>%
  summarize(average_fatalities = mean(fatalities), total_fatalities = sum(fatalities))
filter(dat1,total_fatalities == max(total_fatalities))
## Source: local data frame [1 x 3]
##
##   evtype  average_fatalities total_fatalities
##   (fctr)               (dbl)            (dbl)
## 1 TORNADO          0.0928741             5633
filter(dat1,average_fatalities == max(average_fatalities))
## Source: local data frame [1 x 3]
##
##   evtype                     average_fatalities total_fatalities
##   (fctr)                                  (dbl)            (dbl)
## 1 TORNADOES, TSTM WIND, HAIL                 25               25
TORNADO has the highest total fatalities at 5633, but its average is only about 0.09 fatalities per event, which means that a tornado can be a major disaster but usually does not kill many people. TORNADOES, TSTM WIND, HAIL has the highest average fatalities, but its average and total are both 25, so it was recorded only once. We therefore look at the events that have both a relatively high total and a relatively high average, requiring total fatalities above 50 and average fatalities above 0.5:
filter(dat1, total_fatalities>50, average_fatalities>.5) %>% arrange(desc(total_fatalities))
## Source: local data frame [8 x 3]
##
##   evtype            average_fatalities total_fatalities
##   (fctr)                         (dbl)            (dbl)
## 1 EXCESSIVE HEAT             1.1340882             1903
## 2 HEAT                       1.2216428              937
## 3 RIP CURRENT                0.7829787              368
## 4 AVALANCHE                  0.5803109              224
## 5 RIP CURRENTS               0.6710526              204
## 6 HEAT WAVE                  2.3243243              172
## 7 EXTREME HEAT               4.3636364               96
## 8 HURRICANE/TYPHOON          0.7272727               64
From the table above, heat-related event types appear to be the most harmful with respect to fatalities among the events that pass both thresholds.
We now turn our focus to the injuries caused by these events and select the event types that have both high total injuries (more than 1/500 of the largest total) and high mean injuries (more than 1).
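The averages above are sensitive to event types that appear only a few times in the records. As a minimal sketch, the following adds a record count with `n()` next to the mean and total so that such cases become visible. The data frame `toy` and its values are made up for illustration and are not taken from the storm data; the column name `n_records` is likewise an assumption.

```r
library(dplyr)

# Hypothetical toy records; the values are invented for illustration only.
toy <- data.frame(
  evtype     = c("A", "A", "A", "A", "B"),
  fatalities = c(0, 1, 0, 3, 25),
  stringsAsFactors = FALSE
)

# Same kind of summary as in the analysis, plus the number of records per type.
toy_summary <- toy %>%
  group_by(evtype) %>%
  summarize(
    n_records          = n(),
    average_fatalities = mean(fatalities),
    total_fatalities   = sum(fatalities)
  )

toy_summary
```

In this toy case, type B has the highest mean but rests on a single record, so a filter on `n_records` (with a cut-off chosen by the analyst) could complement the thresholds on the total and the mean.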
dat1 <- select(dat, evtype, injuries) %>%
  group_by(evtype) %>%
  summarize(mean_injury = mean(injuries), total_injury = sum(injuries)) %>%
  filter(total_injury > max(total_injury)/500, mean_injury > 1) %>%
  arrange(desc(total_injury))
dat1
## Source: local data frame [8 x 3]
##
##   evtype            mean_injury total_injury
##   (fctr)                  (dbl)        (dbl)
## 1 TORNADO              1.506067        91346
## 2 EXCESSIVE HEAT       3.888558         6525
## 3 HEAT                 2.737940         2100
## 4 HURRICANE/TYPHOON   14.488636         1275
## 5 FOG                  1.364312          734
## 6 DUST STORM           1.030445          440
## 7 HEAT WAVE            4.175676          309
## 8 GLAZE                6.750000          216
From the result, TORNADO caused the largest total injuries and HURRICANE/TYPHOON the largest average injuries; heat-related event types have also caused many injuries. Next we sum the fatalities and injuries over all event types whose names contain HEAT, and do the same for TORNADO.
dat1 <- select(dat, evtype, fatalities, injuries) %>%
  group_by(evtype) %>%
  summarize(total_fatalities = sum(fatalities), total_injuries = sum(injuries))
filter(dat1, grepl("HEAT", as.character(evtype))) %>%
  summarise(heat_total_injuries = sum(total_injuries), heat_total_fatal = sum(total_fatalities)) %>% as.data.frame
##   heat_total_injuries heat_total_fatal
## 1                9154             3138
filter(dat1, grepl("TORNADO", as.character(evtype))) %>%
  summarise(tornado_total_injuries = sum(total_injuries), tornado_total_fatal = sum(total_fatalities)) %>% as.data.frame
##   tornado_total_injuries tornado_total_fatal
## 1                  91407                5661
In conclusion, tornado-related events, despite their relatively small average fatalities, are the most harmful with respect to population health, for two reasons: 1. they caused a total of 5661 recorded deaths, about 1.8 times the 3138 deaths caused by heat-related events; 2. they caused a total of 91407 injuries, roughly ten times the 9154 injuries caused by heat-related events.
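The HEAT and TORNADO totals were obtained by matching event names with `grepl`, one keyword at a time. As a rough sketch of the same idea in a single pass, the following assigns each event type to a keyword group and then totals each group. The data frame `toy`, its numbers, and the `event_group` column are hypothetical and introduced only for illustration.

```r
library(dplyr)

# Hypothetical toy records; names and values are invented for illustration only.
toy <- data.frame(
  evtype     = c("TORNADO", "TORNADO F2", "HEAT", "EXCESSIVE HEAT", "HAIL"),
  fatalities = c(10, 2, 4, 6, 0),
  injuries   = c(100, 20, 5, 7, 1),
  stringsAsFactors = FALSE
)

# Assign a keyword group to each record, then total fatalities and injuries per group.
grouped <- toy %>%
  mutate(event_group = ifelse(grepl("TORNADO", evtype), "TORNADO",
                       ifelse(grepl("HEAT", evtype), "HEAT", "OTHER"))) %>%
  group_by(event_group) %>%
  summarize(total_fatalities = sum(fatalities), total_injuries = sum(injuries))

grouped
```

Because the TORNADO test is applied first, a name that contains both keywords would be counted under TORNADO; the order of the tests is a design choice that should match the question being asked.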