Synopsis: An exploratory analysis of the NOAA Storm Database (1950-2011) that examines the outcomes of severe weather events.
Goals: 1. Identify the event types that are most harmful to population health. 2. Identify the event types that have the greatest economic consequences.
Let us download the bzip2-compressed CSV file and read it directly into a data frame.
library(tidyverse)
## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
## -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.1 v purrr 0.3.3
## v tibble 2.1.1 v dplyr 0.8.3
## v tidyr 1.0.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
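# Download the bzip2-compressed CSV
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "tmp.bz2", method = "curl")
# bzfile() lets read.csv() read the compressed file without a separate unzip step
df <- read.csv(bzfile("tmp.bz2"), header = TRUE, sep = ",", stringsAsFactors = FALSE)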
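Because the archive is large, it can help to skip the download when the file is already on disk. The following is a minimal sketch of that idea, not part of the original analysis; the helper name `load_storm_data` and its `dest_file` argument are hypothetical.

```r
# Hypothetical helper: download the archive only when it is not already on disk.
load_storm_data <- function(url, dest_file = "tmp.bz2") {
  if (!file.exists(dest_file)) {
    # mode = "wb" keeps the binary archive intact on Windows
    download.file(url, destfile = dest_file, mode = "wb")
  }
  # read.csv() decompresses through the bzfile() connection
  read.csv(bzfile(dest_file), header = TRUE, stringsAsFactors = FALSE)
}

# Example call (commented out to avoid a large download):
# df <- load_storm_data("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
```

On a second run the function goes straight to reading the local file, which keeps repeated knitting of the report fast.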
Preview the first rows of the downloaded data:
head(df)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
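Question 1: Which types of events are most harmful to population health?
Fatalities and injuries are the two columns that describe harm to health. The code below tabulates the distinct FATALITIES and INJURIES combinations recorded for each event type, adds the two values into a healthDam column, and then counts, per event type, how many of those combinations have a combined value greater than 1.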
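# Tabulate each distinct EVTYPE / FATALITIES / INJURIES combination
df.healthDM <- df %>%
  count(EVTYPE, FATALITIES, INJURIES, sort = TRUE) %>%
  # Combined fatalities and injuries for each combination
  mutate(healthDam = FATALITIES + INJURIES) %>%
  # Per event type, count the combinations whose combined harm exceeds 1
  count(EVTYPE, healthDam > 1, sort = TRUE)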
head(df.healthDM)
## # A tibble: 6 x 3
## EVTYPE `healthDam > 1` n
## <chr> <lgl> <int>
## 1 TORNADO TRUE 551
## 2 EXCESSIVE HEAT TRUE 115
## 3 TSTM WIND TRUE 70
## 4 FLASH FLOOD TRUE 53
## 5 WINTER STORM TRUE 53
## 6 LIGHTNING TRUE 50
The output above shows the event types that are most harmful to population health by this measure, with TORNADO far ahead of the rest.
ggplot(df.healthDM[1:10, ], aes(EVTYPE, n, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Total Health Damage") + labs(x = "EVENT TYPE", y = "Total Health Damage")
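The measure above counts distinct fatality and injury combinations. An alternative is to sum fatalities and injuries per event type, which gives a total number of people affected. The sketch below shows that approach on a small made-up data frame (`toy` and its values are assumptions for illustration, not figures from the storm data).

```r
library(dplyr)

# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "EXCESSIVE HEAT", "LIGHTNING", "LIGHTNING"),
  FATALITIES = c(1, 0, 2, 1, 0),
  INJURIES   = c(10, 3, 5, 0, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type, largest total first
health_totals <- toy %>%
  group_by(EVTYPE) %>%
  summarise(healthDam = sum(FATALITIES + INJURIES, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(healthDam))

print(health_totals)
```

Applied to the full data frame, the same pipeline would run on `df` in place of `toy`; the resulting ranking may differ from the count-based ranking shown above.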
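Question 2: Which types of events have the greatest economic consequences?
We group events by economic cost. The data records property damage (PROPDMG) with its magnitude code (PROPDMGEXP), and crop damage (CROPDMG) with its magnitude code (CROPDMGEXP). The code below uses only the PROPDMG and CROPDMG values.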
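# Tabulate each distinct EVTYPE / PROPDMG / CROPDMG combination
df.EconDam <- df %>%
  count(EVTYPE, PROPDMG, CROPDMG, sort = TRUE) %>%
  # Combined property and crop damage values (magnitude codes not applied)
  mutate(EconDam = PROPDMG + CROPDMG) %>%
  # Per event type, count the combinations with nonzero combined damage
  count(EVTYPE, EconDam > 0, sort = TRUE)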
head(df.EconDam)
## # A tibble: 6 x 3
## EVTYPE `EconDam > 0` n
## <chr> <lgl> <int>
## 1 HAIL TRUE 1182
## 2 FLOOD TRUE 1144
## 3 TSTM WIND TRUE 1101
## 4 FLASH FLOOD TRUE 1058
## 5 TORNADO TRUE 976
## 6 THUNDERSTORM WIND TRUE 594
By this count, HAIL ranks first, followed by FLOOD and TSTM WIND. Let's chart the first ten rows.
ggplot(df.EconDam[1:10, ], aes(EVTYPE, n, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events with Highest Total Economic Damage") + labs(x = "EVENT TYPE", y = "Total Economic Damage")
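The PROPDMGEXP and CROPDMGEXP columns scale the damage values, so a dollar total needs them applied before summing. The sketch below assumes K means thousands, M millions, and B billions, and treats every other code (blank, digits, symbols) as a multiplier of 1, which is a simplification. The helper `exp_multiplier` and the `toy` data are hypothetical and use invented values.

```r
library(dplyr)

# Assumed multipliers: K = thousand, M = million, B = billion.
# Any other code is treated as 1 here, which is a simplifying assumption.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy stand-in for the storm data; the values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "DROUGHT"),
  PROPDMG    = c(2.5, 1, 25, 0),
  PROPDMGEXP = c("M", "B", "K", ""),
  CROPDMG    = c(0, 5, 10, 1.5),
  CROPDMGEXP = c("", "M", "K", "B"),
  stringsAsFactors = FALSE
)

# Convert both damage columns to dollars, then total them per event type
econ_totals <- toy %>%
  mutate(EconDam = PROPDMG * exp_multiplier(PROPDMGEXP) +
                   CROPDMG * exp_multiplier(CROPDMGEXP)) %>%
  group_by(EVTYPE) %>%
  summarise(EconDam = sum(EconDam, na.rm = TRUE)) %>%
  ungroup() %>%
  arrange(desc(EconDam))

print(econ_totals)
```

How to handle the codes outside K, M, and B is a judgment call; checking `table(df$PROPDMGEXP)` on the real data first shows how many rows that choice affects.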
While drought has the largest impact on crops, flooding produces the largest overall weather-related impact on the economy. Because the full cost associated with crop destruction is outside the scope of this analysis, further research is required to determine the full economic impact of any one of these weather-related events.