STORM DATA ANALYSIS - REPRODUCIBLE RESEARCH ASSIGNMENT BY MOSES LUKE

OVERVIEW

Synopsis: An exploratory analysis of the NOAA Storm Database (1950-2011) examining the health and economic outcomes of severe weather events.

Goals:
1. Identify the events that are most harmful to population health.
2. Identify the events that have the greatest economic consequences.

DATA PROCESSING

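Let us download the bz2-compressed data file and read it into a single data frame, df. read.csv() reads the archive directly through bzfile(), so no separate unzip step is needed.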

library(tidyverse)
## Registered S3 methods overwritten by 'ggplot2':
##   method         from 
##   [.quosures     rlang
##   c.quosures     rlang
##   print.quosures rlang
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.1     v purrr   0.3.3
## v tibble  2.1.1     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Download the bz2-compressed CSV and read it straight into a data frame
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData.csv.bz2", method = "curl")
df <- read.csv(bzfile("StormData.csv.bz2"), header = TRUE, stringsAsFactors = FALSE)

Preview of the first six rows of the data:

head(df)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

RESULTS

Question 1

Which types of events are most harmful to population health?

Fatalities and injuries both measure harm to population health, so the two are added together into a single column, healthDam (health damage).

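# Tally each distinct (EVTYPE, FATALITIES, INJURIES) combination, add
# fatalities and injuries into healthDam, then count, per event type,
# how many of those combinations have healthDam > 1 (and, separately, <= 1).
# Note: n is a count of combinations, not a sum of casualties.
df.healthDM <- df %>%
  count(EVTYPE, FATALITIES, INJURIES, sort = TRUE) %>%
  mutate(healthDam = FATALITIES + INJURIES) %>%
  count(EVTYPE, healthDam > 1, sort = TRUE)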
head(df.healthDM)
## # A tibble: 6 x 3
##   EVTYPE         `healthDam > 1`     n
##   <chr>          <lgl>           <int>
## 1 TORNADO        TRUE              551
## 2 EXCESSIVE HEAT TRUE              115
## 3 TSTM WIND      TRUE               70
## 4 FLASH FLOOD    TRUE               53
## 5 WINTER STORM   TRUE               53
## 6 LIGHTNING      TRUE               50

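The table above ranks event types by n, the number of distinct fatality/injury combinations whose combined total is greater than one. By this measure, tornadoes are by far the most harmful event type (551), followed by excessive heat (115) and TSTM wind (70).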
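To rank event types by the actual number of people affected, the casualties can be summed per event type instead of counted. A minimal sketch, assuming dplyr is available (it is attached by tidyverse above): group by EVTYPE, sum FATALITIES and INJURIES, and sort. The function name total_health_damage and the toy records are hypothetical, for illustration only; on the real data you would call total_health_damage(df).

```r
library(dplyr)

# Hypothetical helper: total fatalities + injuries per event type,
# ranked from most to least harmful.
total_health_damage <- function(storm) {
  storm %>%
    group_by(EVTYPE) %>%
    summarise(
      fatalities = sum(FATALITIES, na.rm = TRUE),
      injuries   = sum(INJURIES, na.rm = TRUE)
    ) %>%
    mutate(healthDam = fatalities + injuries) %>%
    arrange(desc(healthDam))
}

# Hypothetical toy records, only to show the shape of the result.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "LIGHTNING", "HAIL"),
  FATALITIES = c(1, 0, 2, 0),
  INJURIES   = c(15, 2, 0, 0),
  stringsAsFactors = FALSE
)
health_rank <- total_health_damage(toy)
print(health_rank)
```

Summing rather than counting means a single record with many casualties weighs more than several records with one casualty each, which is what a "total health damage" ranking implies.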

Chart of Harmful Effect to Health

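# Top 10 rows of df.healthDM as bars, tallest first (legend dropped: x axis already names each event)
ggplot(df.healthDM[1:10, ], aes(reorder(EVTYPE, -n), n, fill = EVTYPE)) + geom_col(show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Total Health Damage") + labs(x = "EVENT TYPE", y = "Total Health Damage")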

Question 2

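Which types of events have the greatest economic consequences? Economic costs are recorded in two pairs of columns: property damage (PROPDMG, with its exponent PROPDMGEXP) and crop damage (CROPDMG, with its exponent CROPDMGEXP). The exponent columns hold magnitude codes, such as the K (thousands) values in the preview above.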

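# Same approach as for health: n counts, per event type, the distinct
# (PROPDMG, CROPDMG) combinations whose sum is nonzero.
# Note: PROPDMGEXP and CROPDMGEXP are not applied, so EconDam is not in dollars.
df.EconDam <- df %>%
  count(EVTYPE, PROPDMG, CROPDMG, sort = TRUE) %>%
  mutate(EconDam = PROPDMG + CROPDMG) %>%
  count(EVTYPE, EconDam > 0, sort = TRUE)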
head(df.EconDam)
## # A tibble: 6 x 3
##   EVTYPE            `EconDam > 0`     n
##   <chr>             <lgl>         <int>
## 1 HAIL              TRUE           1182
## 2 FLOOD             TRUE           1144
## 3 TSTM WIND         TRUE           1101
## 4 FLASH FLOOD       TRUE           1058
## 5 TORNADO           TRUE            976
## 6 THUNDERSTORM WIND TRUE            594

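In this table, hail ranks first, followed by flood and TSTM wind. Note that n counts distinct damage-value combinations with nonzero damage, not dollar amounts. Let us chart the top 10 event types: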

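# Top 10 rows of df.EconDam as bars, tallest first (legend dropped: x axis already names each event)
ggplot(df.EconDam[1:10, ], aes(reorder(EVTYPE, -n), n, fill = EVTYPE)) + geom_col(show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Total Economic Damage") + labs(x = "EVENT TYPE", y = "Total Economic Damage")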
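Because PROPDMGEXP and CROPDMGEXP scale the damage values, a dollar-based ranking has to apply them first. The sketch below assumes the codes H, K, M and B (case-insensitive) mean hundreds, thousands, millions and billions, and treats every other code (blank, digits, "+", "-", "?") as a multiplier of 1; that fallback is a simplifying assumption, not an official NOAA rule. The helper names exp_multiplier and total_economic_damage and the toy records are hypothetical; on the real data you would call total_economic_damage(df).

```r
library(dplyr)

# Hypothetical helper: map a damage-exponent code to a numeric multiplier.
# Assumption: H = 1e2, K = 1e3, M = 1e6, B = 1e9; anything else = 1.
exp_multiplier <- function(code) {
  code <- toupper(trimws(code))
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]
  unname(ifelse(is.na(mult), 1, mult))
}

# Hypothetical helper: property + crop damage in dollars per event type,
# ranked from most to least costly.
total_economic_damage <- function(storm) {
  storm %>%
    mutate(
      prop_usd = PROPDMG * exp_multiplier(PROPDMGEXP),
      crop_usd = CROPDMG * exp_multiplier(CROPDMGEXP)
    ) %>%
    group_by(EVTYPE) %>%
    summarise(
      property = sum(prop_usd, na.rm = TRUE),
      crop     = sum(crop_usd, na.rm = TRUE)
    ) %>%
    mutate(EconDam = property + crop) %>%
    arrange(desc(EconDam))
}

# Hypothetical toy records, only to show the shape of the result.
toy <- data.frame(
  EVTYPE     = c("HAIL", "FLOOD", "FLOOD", "DROUGHT"),
  PROPDMG    = c(25, 2.5, 1, 0),
  PROPDMGEXP = c("K", "M", "B", ""),
  CROPDMG    = c(0, 0, 0, 5),
  CROPDMGEXP = c("", "", "", "M"),
  stringsAsFactors = FALSE
)
econ_rank <- total_economic_damage(toy)
print(econ_rank)
```

Keeping property and crop totals as separate columns makes it possible to compare crop-only and combined rankings from the same result.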

Economic Impact Assessment

Although drought has the largest impact on crops, flooding produces the largest overall weather-related impact on the economy. Because the full cost associated with crop destruction is outside the scope of this analysis, further research is required to determine the full economic impact of any one of these weather-related events.
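The comparison above (crops versus the economy overall) can be checked by ranking event types twice, once on crop damage alone and once on combined damage. A minimal sketch, assuming the damage has already been converted to dollars in hypothetical columns prop_usd and crop_usd; the function leading_events and the toy figures are illustrative only and do not reproduce the real NOAA totals.

```r
library(dplyr)

# Hypothetical helper: leading event type for crop damage alone and for
# combined property + crop damage (inputs assumed to be in dollars).
leading_events <- function(storm_usd) {
  by_type <- storm_usd %>%
    group_by(EVTYPE) %>%
    summarise(
      crop  = sum(crop_usd, na.rm = TRUE),
      total = sum(prop_usd + crop_usd, na.rm = TRUE)
    )
  c(crop  = by_type$EVTYPE[which.max(by_type$crop)],
    total = by_type$EVTYPE[which.max(by_type$total)])
}

# Hypothetical toy records with damage already in dollars.
toy <- data.frame(
  EVTYPE   = c("DROUGHT", "FLOOD", "FLOOD", "HAIL"),
  prop_usd = c(0, 3e9, 1e9, 2e6),
  crop_usd = c(8e8, 1e8, 0, 5e7),
  stringsAsFactors = FALSE
)
leaders <- leading_events(toy)
print(leaders)
```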