Synopsis

The data for this study comes from the National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. This analysis is meant to guide a strategy that concentrates our efforts on high-impact events. By the end of this study, you will have a better understanding of which events affect population health the most, whether through a single high-severity occurrence or through frequent ones.

The links below point to the documentation for the NOAA storm database and explain the measures captured and the methodologies used to capture them.

National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

Weather events can severely impact population health, whether through a single low-probability but severe event or through frequent events of mild severity. This study starts with an overview and then digs deeper into the focus areas on which to build a strategy that can substantially reduce the injuries that weather events cause.

Data processing

library(knitr)
library(ggplot2)
library(dplyr)
library(lubridate)

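# Download the compressed storm data once; later runs reuse the local copy
if (!file.exists("repdata_data_StormData.csv.bz2")) {
        fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
        filepath <- file.path(getwd(), "repdata_data_StormData.csv.bz2")
        download.file(fileurl, destfile = filepath)
}

# read.csv() reads the .bz2 file directly; as_tibble() replaces the deprecated tbl_df()
stormsdata <- as_tibble(read.csv("repdata_data_StormData.csv.bz2"))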

Impact of Events on Population Health

Results

Top Contributors to Injuries Over All Time

# Total and mean injuries per event type, ranked by total injuries
evtype_stormsdata <- stormsdata %>%
        group_by(EVTYPE) %>%
        summarize(suminjuries = sum(INJURIES, na.rm = TRUE),
                  avginjuries = mean(INJURIES, na.rm = TRUE)) %>%
        arrange(desc(suminjuries)) %>%
        # share of all injuries attributable to each event type
        mutate(percentage = suminjuries * 100 / sum(suminjuries, na.rm = TRUE))

evtype_stormsdata[1:11, ]

## # A tibble: 11 x 4
##    EVTYPE            suminjuries avginjuries percentage
##    <fct>                   <dbl>       <dbl>      <dbl>
##  1 TORNADO                 91346     1.51        65.0  
##  2 TSTM WIND                6957     0.0316       4.95 
##  3 FLOOD                    6789     0.268        4.83 
##  4 EXCESSIVE HEAT           6525     3.89         4.64 
##  5 LIGHTNING                5230     0.332        3.72 
##  6 HEAT                     2100     2.74         1.49 
##  7 ICE STORM                1975     0.985        1.41 
##  8 FLASH FLOOD              1777     0.0327       1.26 
##  9 THUNDERSTORM WIND        1488     0.0180       1.06 
## 10 HAIL                     1361     0.00471      0.968
## 11 WINTER STORM             1321     0.116        0.940

barplot(evtype_stormsdata[1:11, ]$percentage, names.arg = evtype_stormsdata[1:11, ]$EVTYPE, cex.names = 0.8, ylim = c(0, 100),
        main = "Top 11 contributing events to 90% of total injuries", xlab = "Event type", ylab = "Percentage of total injuries")

Figure 1: Top 11 events causing 90% of total injuries (percentage of total injuries caused by each event type)

The table above shows that 11 of the 975 event types cause more than 90% of total injuries, with tornadoes alone accounting for 65%.

The combined percentage of the top 11 event types listed above is:

sum(evtype_stormsdata$percentage[1:11])

## [1] 90.28023
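Rather than hard-coding the first 11 rows, the number of event types needed to reach a given share of injuries can be read off the cumulative percentage. The sketch below is a minimal, self-contained illustration: the helper name `n_to_reach` is hypothetical, and the input simply reuses the rounded percentages printed in the table above instead of the full `evtype_stormsdata` column.

```r
# Hypothetical helper: how many top-ranked event types are needed before their
# cumulative share of injuries reaches `threshold` percent.
# Assumes `pct` is already sorted in decreasing order (as after arrange(desc(...))).
n_to_reach <- function(pct, threshold = 90) {
  cum_pct <- cumsum(pct)
  unname(which(cum_pct >= threshold)[1])  # NA if the threshold is never reached
}

# Rounded percentages of the top 11 event types, copied from the table above
top_pct <- c(65.0, 4.95, 4.83, 4.64, 3.72, 1.49, 1.41, 1.26, 1.06, 0.968, 0.940)
names(top_pct) <- c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
                    "HEAT", "ICE STORM", "FLASH FLOOD", "THUNDERSTORM WIND", "HAIL",
                    "WINTER STORM")

n_to_reach(top_pct, 90)
```

With these values the cumulative share first reaches 90% at the 11th event type, consistent with the 90.28% computed above. On the full data, `n_to_reach(evtype_stormsdata$percentage, 90)` would derive the same cut-off instead of assuming it.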

Top Injuries per Event Occurrence

# Keep event type, fatalities and injuries for every individual event record,
# selecting by name rather than by column position
sort_stormsdata <- stormsdata %>%
        select(EVTYPE, FATALITIES, INJURIES) %>%
        arrange(desc(INJURIES)) %>%
        mutate(EVTYPE = as.character(EVTYPE))

# Single events that injured more than 500 people
topinjuries <- filter(sort_stormsdata, INJURIES > 500)
topinjuries

## # A tibble: 17 x 3
##    EVTYPE            FATALITIES INJURIES
##    <chr>                  <dbl>    <dbl>
##  1 TORNADO                   42     1700
##  2 ICE STORM                  1     1568
##  3 TORNADO                   90     1228
##  4 TORNADO                   36     1150
##  5 TORNADO                  158     1150
##  6 FLOOD                      2      800
##  7 TORNADO                   44      800
##  8 TORNADO                  116      785
##  9 HURRICANE/TYPHOON          7      780
## 10 FLOOD                      0      750
## 11 TORNADO                   20      700
## 12 FLOOD                     11      600
## 13 TORNADO                  114      597
## 14 TORNADO                   17      560
## 15 FLOOD                      0      550
## 16 EXCESSIVE HEAT             2      519
## 17 TORNADO                   57      504

Table 2: Top 17 injury counts per single event occurrence (events with more than 500 injuries)

Looking at injuries from another angle, the highest impact of a single occurrence, tornadoes still lead: the worst single tornado caused 1,700 injuries. A single ice storm comes second with 1,568 injuries. Tornadoes appear 10 times among the 17 events with more than 500 injuries, and floods come second with 4 appearances. These two event types therefore dominate the high-severity events.

Results

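# Count how often each event type appears among the severe events above (Var1 = event type);
# the former dnn argument was ignored by as.data.frame(), so it is dropped
tbl_topinjuries <- as.data.frame(table(topinjuries$EVTYPE))
tbl_topinjuries <- tbl_topinjuries %>% arrange(desc(Freq))
tbl_topinjuries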

##                Var1 Freq
## 1           TORNADO   10
## 2             FLOOD    4
## 3    EXCESSIVE HEAT    1
## 4 HURRICANE/TYPHOON    1
## 5         ICE STORM    1

Table 3: Number of appearances of each event type among the 17 single events with the most injuries
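Tables 2 and 3 answer two related questions: how often each event type appears among the most severe single events, and how bad its worst single event was. A minimal base-R sketch computing both side by side is below; it assumes nothing beyond the 17 rows copied from Table 2, and `tapply(..., max)` is simply one way to get the per-type maximum, not the method used above.

```r
# The 17 single events with more than 500 injuries, copied from Table 2
evtype <- c("TORNADO", "ICE STORM", "TORNADO", "TORNADO", "TORNADO", "FLOOD",
            "TORNADO", "TORNADO", "HURRICANE/TYPHOON", "FLOOD", "TORNADO", "FLOOD",
            "TORNADO", "TORNADO", "FLOOD", "EXCESSIVE HEAT", "TORNADO")
injuries <- c(1700, 1568, 1228, 1150, 1150, 800, 800, 785, 780, 750, 700, 600,
              597, 560, 550, 519, 504)

# Appearances of each event type among these severe events (same counts as Table 3)
appearances <- sort(table(evtype), decreasing = TRUE)

# Worst single-event injury count per event type
worst <- sort(tapply(injuries, evtype, max), decreasing = TRUE)

appearances
worst
```

Reading the two results together separates frequency of severe occurrences (tornadoes and floods) from peak severity, where a single ice storm ranks just behind the worst tornado.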

Probability & Consistency of impact

Results

occurance <- as.data.frame(table(sort_stormsdata$EVTYPE,dnn = "Occurance"))
occurance <- occurance %>% arrange(desc(Freq))
occurance[1:10,]

##             Occurance   Freq
## 1                HAIL 288661
## 2           TSTM WIND 219940
## 3   THUNDERSTORM WIND  82563
## 4             TORNADO  60652
## 5         FLASH FLOOD  54277
## 6               FLOOD  25326
## 7  THUNDERSTORM WINDS  20843
## 8           HIGH WIND  20212
## 9           LIGHTNING  15754
## 10         HEAVY SNOW  15708

Table 4: Most frequently recurring event types
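Table 4 also shows thunderstorm wind recorded under three labels (TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS), which splits its frequency and injury totals across rows. A minimal sketch of merging such variants before counting follows; the hypothetical `normalize_evtype()` helper maps only these three labels and is an assumption, not a complete cleanup of EVTYPE. The input reuses the counts from Table 4.

```r
# Hypothetical helper: collapse the thunderstorm wind spelling variants seen in Table 4
# into a single label. Only these three variants are handled here.
normalize_evtype <- function(evtype) {
  ev <- toupper(trimws(as.character(evtype)))
  ev[ev %in% c("TSTM WIND", "THUNDERSTORM WINDS")] <- "THUNDERSTORM WIND"
  ev
}

# Occurrence counts copied from Table 4
counts <- c("HAIL" = 288661, "TSTM WIND" = 219940, "THUNDERSTORM WIND" = 82563,
            "TORNADO" = 60652, "FLASH FLOOD" = 54277, "FLOOD" = 25326,
            "THUNDERSTORM WINDS" = 20843, "HIGH WIND" = 20212,
            "LIGHTNING" = 15754, "HEAVY SNOW" = 15708)

# Re-aggregate the counts under the normalized labels
merged <- sort(tapply(counts, normalize_evtype(names(counts)), sum), decreasing = TRUE)
merged
```

With the Table 4 counts, the merged thunderstorm wind label (323,346 records) ranks above hail. On the full data, the same helper could be applied with `mutate(EVTYPE = normalize_evtype(EVTYPE))` before `group_by(EVTYPE)`, so every summary counts the variants together.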

As Table 4 shows, tornadoes, thunderstorm winds, and floods are among the 10 most frequently recurring event types, and they also cause high numbers of injuries, as found in the first section of this study.

Economic Impact

Results

# Raw PROPDMG/CROPDMG sums per event type. Their PROPDMGEXP/CROPDMGEXP magnitude codes
# (K, M, B) are not applied here, so the .BUSD columns do not represent billions of USD.
eco_stormsdata <- stormsdata %>%
        group_by(EVTYPE) %>%
        summarize(propexp.BUSD = round(sum(PROPDMG, na.rm = TRUE) / 1e6, 2),
                  cropexp.BUSD = round(sum(CROPDMG, na.rm = TRUE) / 1e6, 2),
                  totaleconexp.BUSD = round((sum(PROPDMG, na.rm = TRUE) + sum(CROPDMG, na.rm = TRUE)) / 1e6, 2)) %>%
        arrange(desc(totaleconexp.BUSD))
top5 <- eco_stormsdata[1:5, ]
pie(top5$totaleconexp.BUSD, labels = paste(top5$EVTYPE, top5$totaleconexp.BUSD, sep = ": "), col = rainbow(5), radius = 1)
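In the NOAA data each PROPDMG and CROPDMG value is paired with a magnitude code in PROPDMGEXP or CROPDMGEXP; the Storm Data documentation uses "K" for thousands, "M" for millions, and "B" for billions. A minimal sketch of converting a value and its code to dollars is below. The helper `damage_usd()` is hypothetical, treating every other code (including blanks) as a multiplier of 1 is an assumption, and the toy rows are illustrative values, not NOAA records.

```r
# Hypothetical helper: convert a damage value plus its magnitude code to US dollars.
# Assumption: only K/M/B (any case) are meaningful; any other code, blank or NA counts as 1.
damage_usd <- function(value, exp_code) {
  code <- toupper(as.character(exp_code))
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)[code]
  multiplier[is.na(multiplier)] <- 1
  unname(value * multiplier)
}

# Toy records (illustrative values, not taken from the NOAA data)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(2.5, 500, 25, 10),
  PROPDMGEXP = c("B", "K", "M", ""),
  stringsAsFactors = FALSE
)
toy$prop_usd <- damage_usd(toy$PROPDMG, toy$PROPDMGEXP)

# Total property damage per event type, in billions of USD
round(tapply(toy$prop_usd, toy$EVTYPE, sum) / 1e9, 4)
```

On the full data, the same conversion could be applied inside `mutate()` before `group_by(EVTYPE)`, after which dividing by 1e9 would give totals that are actually in billions of dollars.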

Conclusion

In conclusion, tornadoes and floods have the highest impact on both population health and the economy. They cause the most harm in a single occurrence, and they also accumulate the largest aggregate impact across all occurrences over the study period.

Storm Focus Strategy: High-Impact Events on Population Health and the Economy

Shakeeb Chehade

6/14/2020