The data for this study come from the National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The analysis aims to guide a strategy that concentrates our efforts on high-impact events. By the end of this study, you will have a better understanding of which weather events affect population health most, either through a high impact per occurrence or through frequent occurrence.
The references below describe the NOAA storm data used here, including the measures captured and the methods used to capture them.
- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ
Weather events can severely affect population health, either through a single low-probability but severe event or through frequent events of mild severity. This study starts with an overview and then digs deeper to identify the focus areas for a strategy that can substantially reduce the injuries that weather events cause.
library(knitr)
library(ggplot2)
library(dplyr)
library(lubridate)
if (!file.exists("repdata_data_StormData.csv.bz2")) {  # download only once
  fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  filepath <- file.path(getwd(), "repdata_data_StormData.csv.bz2")
  download.file(fileurl, destfile = filepath, mode = "wb")  # binary mode keeps the .bz2 intact on Windows
}
stormsdata <- as_tibble(read.csv("repdata_data_StormData.csv.bz2"))  # read.csv() decompresses .bz2 directly; tbl_df() is deprecated
evtype_stormsdata <- stormsdata %>%
  group_by(EVTYPE) %>%  # one row per event type
  summarize(suminjuries = sum(INJURIES, na.rm = TRUE),
            avginjuries = mean(INJURIES, na.rm = TRUE)) %>%
  arrange(desc(suminjuries)) %>%
  mutate(percentage = suminjuries * 100 / sum(suminjuries, na.rm = TRUE))  # share of all injuries (%)
evtype_stormsdata[1:11, ]
## # A tibble: 11 x 4
## EVTYPE suminjuries avginjuries percentage
## <fct> <dbl> <dbl> <dbl>
## 1 TORNADO 91346 1.51 65.0
## 2 TSTM WIND 6957 0.0316 4.95
## 3 FLOOD 6789 0.268 4.83
## 4 EXCESSIVE HEAT 6525 3.89 4.64
## 5 LIGHTNING 5230 0.332 3.72
## 6 HEAT 2100 2.74 1.49
## 7 ICE STORM 1975 0.985 1.41
## 8 FLASH FLOOD 1777 0.0327 1.26
## 9 THUNDERSTORM WIND 1488 0.0180 1.06
## 10 HAIL 1361 0.00471 0.968
## 11 WINTER STORM 1321 0.116 0.940
barplot(evtype_stormsdata$percentage[1:11], names.arg = as.character(evtype_stormsdata$EVTYPE[1:11]),
        cex.names = 0.8, main = "Top 11 events contributing 90% of total injuries",
        xlab = "Event type", ylab = "Percentage of total injuries", ylim = c(0, 100))
Figure 1: Top 11 event types, which together cause about 90% of total injuries (percentage of total injuries caused by each event type)
The table above shows that 11 of the 975 event types cause more than 90% of all injuries, with tornadoes alone accounting for 65%.
The combined share of the top 11 event types listed above is:
sum(evtype_stormsdata$percentage[1:11])
## [1] 90.28023
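The 90% threshold can also be located without fixing the number of event types in advance: rank the totals, take a cumulative sum, and find the first rank that reaches the target. The sketch below uses base R only; `n_types_for_share()` is a hypothetical helper and `toy_totals` is made-up data, not NOAA figures.

```r
# Hypothetical helper: smallest number of top-ranked event types whose
# combined share of total injuries reaches `target` percent.
n_types_for_share <- function(injuries, target = 90) {
  injuries <- sort(injuries, decreasing = TRUE)   # largest totals first
  cum_pct <- cumsum(injuries) * 100 / sum(injuries)
  unname(which(cum_pct >= target)[1])             # first rank reaching target
}

# Made-up totals, not NOAA data: 60 + 20 = 80 (< 90), 60 + 20 + 15 = 95 (>= 90)
toy_totals <- c(a = 20, b = 60, c = 5, d = 15)
n_types_for_share(toy_totals)  # 3
```

Applied to `evtype_stormsdata$suminjuries`, it should return 11, since the top 10 event types account for about 89.3% of injuries and the top 11 for about 90.3%.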
sort_stormsdata <- stormsdata %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%  # columns 8, 23 and 24 of the raw file
  arrange(desc(INJURIES)) %>%
  mutate(EVTYPE = as.character(EVTYPE))
topinjuries <- filter(sort_stormsdata, INJURIES > 500)  # single events with more than 500 injuries
topinjuries
## # A tibble: 17 x 3
## EVTYPE FATALITIES INJURIES
## <chr> <dbl> <dbl>
## 1 TORNADO 42 1700
## 2 ICE STORM 1 1568
## 3 TORNADO 90 1228
## 4 TORNADO 36 1150
## 5 TORNADO 158 1150
## 6 FLOOD 2 800
## 7 TORNADO 44 800
## 8 TORNADO 116 785
## 9 HURRICANE/TYPHOON 7 780
## 10 FLOOD 0 750
## 11 TORNADO 20 700
## 12 FLOOD 11 600
## 13 TORNADO 114 597
## 14 TORNADO 17 560
## 15 FLOOD 0 550
## 16 EXCESSIVE HEAT 2 519
## 17 TORNADO 57 504
Table 2: The 17 single events with more than 500 injuries each
Looking at injuries from another angle, the impact of a single occurrence, tornadoes still lead: the single event with the most injuries was a tornado (1,700 injuries). An ice storm ranks second as a single occurrence (1,568 injuries). Tornadoes appear 10 times among these 17 events, and floods come second with 4 appearances. These two event types therefore dominate when they occur at high severity.
## Results
tbl_topinjuries <- as.data.frame(table(topinjuries$EVTYPE, dnn = "EVTYPE")) %>%  # dnn belongs to table(), not as.data.frame()
  arrange(desc(Freq))
tbl_topinjuries
## EVTYPE Freq
## 1 TORNADO 10
## 2 FLOOD 4
## 3 EXCESSIVE HEAT 1
## 4 HURRICANE/TYPHOON 1
## 5 ICE STORM 1
Table 3: Number of appearances of each event type among the 17 single events with the most injuries
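The two views used so far, the worst single occurrence and the total over all occurrences, can be computed side by side for each event type. A minimal base R sketch on made-up rows (not NOAA data); the column names mirror `EVTYPE` and `INJURIES`, while `events`, `severity` and `ranked` are illustrative names.

```r
# Made-up example rows, not NOAA data
events <- data.frame(
  EVTYPE   = c("TORNADO", "TORNADO", "ICE STORM", "HAIL", "HAIL", "HAIL"),
  INJURIES = c(90, 10, 80, 3, 2, 1),
  stringsAsFactors = FALSE
)

# Worst single occurrence and total injuries per event type
max_inj <- tapply(events$INJURIES, events$EVTYPE, max)
tot_inj <- tapply(events$INJURIES, events$EVTYPE, sum)

severity <- data.frame(
  EVTYPE         = names(max_inj),
  max_injuries   = as.vector(max_inj),
  total_injuries = as.vector(tot_inj[names(max_inj)]),
  stringsAsFactors = FALSE
)

# Rank by the severity of a single occurrence
ranked <- severity[order(-severity$max_injuries), ]
ranked
```

On the real data, the same idea in dplyr would group `sort_stormsdata` by `EVTYPE` and summarize `max(INJURIES)` alongside `sum(INJURIES)`.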
occurance <- as.data.frame(table(sort_stormsdata$EVTYPE, dnn = "EVTYPE")) %>%  # recorded occurrences per event type
  arrange(desc(Freq))
occurance[1:10, ]
## EVTYPE Freq
## 1 HAIL 288661
## 2 TSTM WIND 219940
## 3 THUNDERSTORM WIND 82563
## 4 TORNADO 60652
## 5 FLASH FLOOD 54277
## 6 FLOOD 25326
## 7 THUNDERSTORM WINDS 20843
## 8 HIGH WIND 20212
## 9 LIGHTNING 15754
## 10 HEAVY SNOW 15708
Table 4: The 10 most frequently recurring event types
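Table 4 lists thunderstorm wind under several labels (TSTM WIND, THUNDERSTORM WIND, THUNDERSTORM WINDS), which splits its occurrence count. One way to merge such labels before counting is a lookup table. This is a sketch: `label_map` and `normalize_evtype()` are hypothetical names, and the two mappings shown are illustrative assumptions, not the official NWS event-type list.

```r
# Hypothetical lookup merging synonymous EVTYPE labels (illustrative, not the
# official NWS event-type list).
label_map <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND"
)

normalize_evtype <- function(evtype) {
  evtype <- toupper(trimws(as.character(evtype)))  # tidy case and spacing
  mapped <- label_map[evtype]                       # NA where no mapping exists
  unname(ifelse(is.na(mapped), evtype, mapped))
}

# Made-up labels, not NOAA data
toy <- c("TSTM WIND", "HAIL", "THUNDERSTORM WINDS", "THUNDERSTORM WIND", "hail ")
counts <- sort(table(normalize_evtype(toy)), decreasing = TRUE)
counts  # THUNDERSTORM WIND 3, HAIL 2
```

To apply it, the counting step could use `table(normalize_evtype(stormsdata$EVTYPE), dnn = "EVTYPE")`; a complete cleanup would need a much larger mapping.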
As the table above shows, tornadoes, thunderstorm winds (recorded under several labels, such as TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS), and floods are among the 10 most frequently recurring event types. They also have high injury totals, as found in the first section of this study.
# Economic Impact
## Results
eco_stormsdata <- stormsdata %>% group_by(EVTYPE) %>%  # NOTE: PROPDMGEXP/CROPDMGEXP multipliers are not applied,
  summarize(prodexp.BUSD = round(sum(PROPDMG, na.rm = TRUE) / 1e6, 2),  # so sum / 1e6 is billions of USD only
            cropexp.BUSD = round(sum(CROPDMG, na.rm = TRUE) / 1e6, 2),  # for values recorded in thousands ("K")
            totaleconexp.BUSD = round((sum(PROPDMG, na.rm = TRUE) + sum(CROPDMG, na.rm = TRUE)) / 1e6, 2)) %>%
  arrange(desc(totaleconexp.BUSD))
top5 <- eco_stormsdata[1:5, ]  # five event types with the largest combined damage
pielabel <- paste0(top5$EVTYPE, ": ", top5$totaleconexp.BUSD, " BUSD")
pie(top5$totaleconexp.BUSD, labels = pielabel, col = rainbow(5), radius = 1)
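The NWS Storm Data documentation describes damage estimates as a number followed by a magnitude letter: "K" for thousands, "M" for millions, and "B" for billions. In this dataset the letter is stored separately in `PROPDMGEXP` and `CROPDMGEXP`, so summing `PROPDMG` and `CROPDMG` directly mixes magnitudes. The sketch below converts each value to dollars before summing. `damage_usd()` is a hypothetical helper, and treating every other code (including blanks) as a multiplier of 1 is an assumption, not something the documentation specifies.

```r
# Hypothetical helper: convert a damage value (PROPDMG or CROPDMG) to US dollars
# using its magnitude code (PROPDMGEXP or CROPDMGEXP).
# Assumption: only K, M and B (any case) are scaled; all other codes,
# blanks and NA are treated as a multiplier of 1.
damage_usd <- function(value, exp_code) {
  code <- toupper(trimws(as.character(exp_code)))
  code[is.na(code)] <- ""
  multiplier <- ifelse(code == "K", 1e3,
                ifelse(code == "M", 1e6,
                ifelse(code == "B", 1e9, 1)))
  value * multiplier
}

# Made-up values, not NOAA data
damage_usd(c(1.55, 25, 3, 7), c("B", "K", "m", ""))  # 1.55 billion, 25 thousand, 3 million, 7
```

With it, the summary above could use, for example, `sum(damage_usd(PROPDMG, PROPDMGEXP), na.rm = TRUE) / 1e9` inside `summarize()` to obtain billions of USD; the ranking of event types may change once magnitudes are applied.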
# Conclusion
In conclusion, tornadoes and floods have the highest impact on both population health and the economy. They cause the most harm both in single severe occurrences and in aggregate over all occurrences during the study period.