This analysis investigates the NOAA Storm Database to determine which types of extreme weather events are most harmful to human wellbeing and which have the greatest economic consequences. The source data span 1950 to 2011; however, a much broader set of event types began to be recorded in 1993. We took 1993 as the starting point of the analysis, both for consistency and to take advantage of the more comprehensive data.
In terms of human impact, tornadoes, followed by excessive heat, were responsible for the greatest number of combined fatalities and injuries. Considering fatalities alone, however, heat-related events caused the most deaths.
For financial impact, flooding had the greatest economic consequences, causing more loss than hurricanes/typhoons and storm surges combined. For crop damage alone the picture was different, with drought the leading cause of financial loss.
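Download the storm data from the URL and read the bz2-compressed CSV into a data frame.
# Download the compressed file once; bzfile() decompresses it on the fly while reading
url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("StormData.csv.bz2")) {
  download.file(url, destfile = "StormData.csv.bz2")
}
data <- read.csv(bzfile("StormData.csv.bz2"))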
Convert BGN_DATE to a Date so that the number of events can be tracked over time.
data$BGN_DATE <- as.Date(data$BGN_DATE, format="%m/%d/%Y %H:%M:%S")
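As a quick check of the format string, here is a minimal sketch that parses two made-up strings in the same "month/day/year hour:minute:second" layout as BGN_DATE (the sample values are hypothetical, not taken from the dataset) and then extracts the year from the resulting Date.

```r
# Hypothetical sample values in the same layout as BGN_DATE
raw_dates <- c("4/18/1950 0:00:00", "11/15/1993 0:00:00")

# The time part is matched by %H:%M:%S and then dropped by as.Date()
parsed <- as.Date(raw_dates, format = "%m/%d/%Y %H:%M:%S")
print(parsed)

# Once parsed, the year can be read directly from the Date
years <- as.numeric(format(parsed, "%Y"))
print(years)
```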
Create columns giving the dollar value of property and crop damage by applying the “K”, “M”, and “B” multiplier codes (thousands, millions, billions) stored in PROPDMGEXP and CROPDMGEXP. Add a column showing the total damage.
suppressPackageStartupMessages(library(dplyr))
suppressWarnings(library(tidyr))
# Multipliers for the damage exponent codes. Any other code (blank, symbols,
# digits, lower-case letters) does not match and is treated as zero damage below.
map <- data.frame(abbrev = c("K","M","B"), val = c(1e3, 1e6, 1e9))
data$PROPDMGVAL <- map$val[match(data$PROPDMGEXP, map$abbrev)] * data$PROPDMG
data$CROPDMGVAL <- map$val[match(data$CROPDMGEXP, map$abbrev)] * data$CROPDMG
# Unmatched codes produce NA; count those as zero
data$PROPDMGVAL[is.na(data$PROPDMGVAL)] <- 0
data$CROPDMGVAL[is.na(data$CROPDMGVAL)] <- 0
data$TOTALDMGVAL <- data$PROPDMGVAL + data$CROPDMGVAL
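To make the lookup concrete, the sketch below applies the same match()-based mapping to a few hypothetical damage records. Because match() is case-sensitive, a lower-case "m" gets no multiplier; the upper-casing variant shown afterwards is an assumption for illustration only and is not what the analysis above does.

```r
# Hypothetical toy records; in the analysis the codes come from PROPDMGEXP / CROPDMGEXP
dmg <- c(25, 2.5, 1.2, 10, 3)
exp_code <- c("K", "M", "B", "m", "")

codes <- c("K", "M", "B")
vals <- c(1e3, 1e6, 1e9)

# Case-sensitive lookup, as in the analysis: "m" and "" become NA, then 0
strict <- vals[match(exp_code, codes)] * dmg
strict[is.na(strict)] <- 0

# Hypothetical variant: upper-case the codes first so "m" is read as millions
relaxed <- vals[match(toupper(exp_code), codes)] * dmg
relaxed[is.na(relaxed)] <- 0

print(strict)
print(relaxed)
```

The difference between the two vectors shows how sensitive the dollar totals are to the handling of non-standard codes.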
Determine the five most common event types.
Events<-count(data,EVTYPE)
Events<-Events[order(-Events$n),]
head(Events,5)
## Source: local data frame [5 x 2]
##
## EVTYPE n
## (fctr) (int)
## 1 HAIL 288661
## 2 TSTM WIND 219940
## 3 THUNDERSTORM WIND 82563
## 4 TORNADO 60652
## 5 FLASH FLOOD 54277
Several of the top event types are the same category recorded under different spellings (for example “TSTM WIND” and “THUNDERSTORM WIND”). Merge these variants.
data$EVTYPE<-gsub("TSTM","THUNDERSTORM",data$EVTYPE)
data$EVTYPE<-gsub("THUNDERSTORM WINDS","THUNDERSTORM WIND",data$EVTYPE)
data$EVTYPE<-as.factor(data$EVTYPE)
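The two gsub() calls are order-dependent. This minimal sketch runs them on a hypothetical set of labels to show that expanding "TSTM" first lets the second substitution also catch "TSTM WINDS".

```r
# Hypothetical raw labels covering the spelling variants being merged
labels <- c("TSTM WIND", "THUNDERSTORM WINDS", "TSTM WINDS",
            "THUNDERSTORM WIND", "HAIL")

# Step 1: expand the abbreviation wherever it appears in a label
labels <- gsub("TSTM", "THUNDERSTORM", labels)

# Step 2: fold the plural form into the singular one
labels <- gsub("THUNDERSTORM WINDS", "THUNDERSTORM WIND", labels)

print(table(labels))
```

Running the steps in the reverse order would leave "TSTM WINDS" as "THUNDERSTORM WINDS". Note also that gsub() replaces substrings, so compound labels such as "THUNDERSTORM WINDS/HAIL" are rewritten as well.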
View the results of the recategorization.
Events<-count(data,EVTYPE)
Events<-Events[order(-Events$n),]
head(Events,5)
## Source: local data frame [5 x 2]
##
## EVTYPE n
## (fctr) (int)
## 1 THUNDERSTORM WIND 323352
## 2 HAIL 288661
## 3 TORNADO 60652
## 4 FLASH FLOOD 54277
## 5 FLOOD 25326
Count the number of events recorded each year.
Counts<-count(data,BGN_DATE)
Counts<-Counts[order(-Counts$n),]
AnnualCounts<-separate(Counts,BGN_DATE,c("y","m","d"))
CountsPerYear<-AnnualCounts %>% group_by(y) %>% summarise(sum=sum(n))
CountsPerYear$y <-as.numeric(CountsPerYear$y)
suppressWarnings(library(ggplot2))
plot1<-ggplot(data=CountsPerYear,aes(y,sum))
plot1 + geom_line() + labs(title ="Event counts per Year", x = "Year", y = "Number of Events")
The chart shows a dramatic increase in the number of events recorded per year. This suggests we should also check whether the set of recorded event types changed over time.
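The per-year counts above are obtained by splitting each date into year, month, and day columns. A shorter base R sketch, assuming the dates are already Date objects (the dates below are hypothetical), reads the year directly with format() and tabulates it.

```r
# Hypothetical event dates standing in for data$BGN_DATE after conversion to Date
dates <- as.Date(c("1992-05-01", "1993-06-10", "1993-07-04",
                   "1994-01-15", "1994-02-20", "1994-03-03"))

# Pull the year straight out of each Date instead of splitting into y/m/d columns
year <- as.numeric(format(dates, "%Y"))

# Tabulate events per year into a data frame shaped like CountsPerYear
counts <- table(year)
per_year <- data.frame(y = as.numeric(names(counts)), sum = as.vector(counts))
print(per_year)
```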
data<-separate(data,BGN_DATE,c("y","m","d"),remove=FALSE)
data$y <-as.numeric(data$y)
EventTypesPerYear<-data %>% group_by(y) %>% summarise(Unique_Events = n_distinct(EVTYPE))
EventTypesPerYear[40:50,]
## Source: local data frame [11 x 2]
##
## y Unique_Events
## (dbl) (int)
## 1 1989 3
## 2 1990 3
## 3 1991 3
## 4 1992 3
## 5 1993 158
## 6 1994 264
## 7 1995 380
## 8 1996 228
## 9 1997 170
## 10 1998 125
## 11 1999 121
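The distinct-types-per-year count in the table above can be reproduced in base R with tapply(). This is a minimal sketch on hypothetical toy records, not the dplyr code used in the analysis; the last line keeps records from 1993 onward, mirroring the cutoff chosen for the analysis.

```r
# Hypothetical toy records: a single label before 1993, more variety afterwards
toy <- data.frame(
  y = c(1992, 1992, 1993, 1993, 1993, 1994),
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "HAIL", "FLOOD", "HAIL"),
  stringsAsFactors = FALSE
)

# Number of distinct event labels recorded in each year
types_per_year <- tapply(toy$EVTYPE, toy$y, function(x) length(unique(x)))
print(types_per_year)

# Keep only records from 1993 onward
recent <- toy[toy$y >= 1993, ]
print(recent)
```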
Note from the table above that the number of distinct event types jumps from 3 in 1992 to 158 in 1993. Both the number of events per year and the set of recorded event types change dramatically over time. To ensure the analysis covers a period in which data collection captured all relevant types of events, focus on events in 1993 or later:
data<-filter(data, y>=1993)
fatal<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(FATALITIES)) %>% arrange(desc(sum))
head(fatal, 5)
## Source: local data frame [5 x 2]
##
## EVTYPE sum
## (fctr) (dbl)
## 1 EXCESSIVE HEAT 1903
## 2 TORNADO 1621
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
The most fatalities during the period 1993 to 2011 were due to excessive heat, followed by tornadoes. Note that the fourth entry, HEAT, is also heat-related; together the two heat categories account for 2,840 deaths (1,903 + 937).
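Pooling the heat categories can be sketched by pattern-matching the labels. The figures below are copied from the top-five fatality table; treating every label that contains "HEAT" as heat-related is a simplifying assumption, and any heat-related labels outside the top five would be missed by this toy version.

```r
# Top-5 fatality totals as reported in the table above
fatal5 <- data.frame(
  EVTYPE = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  sum = c(1903, 1621, 978, 937, 816),
  stringsAsFactors = FALSE
)

# Assumption: any label containing "HEAT" counts as heat-related
is_heat <- grepl("HEAT", fatal5$EVTYPE)
heat_deaths <- sum(fatal5$sum[is_heat])
print(heat_deaths)  # 2840
```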
injured<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(INJURIES)) %>% arrange(desc(sum))
head(injured, 5)
## Source: local data frame [5 x 2]
##
## EVTYPE sum
## (fctr) (dbl)
## 1 TORNADO 23310
## 2 FLOOD 6789
## 3 EXCESSIVE HEAT 6525
## 4 THUNDERSTORM WIND 6027
## 5 LIGHTNING 5230
Most injuries were caused by tornadoes (23,310), followed by floods (6,789).
totalhealth<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(FATALITIES+INJURIES)) %>% arrange(desc(sum))
head(totalhealth, 5)
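# Plot the five event types with the most combined fatalities and injuries, largest first
f<-ggplot(totalhealth[1:5,],aes(reorder(EVTYPE,-sum),sum))
f+ geom_bar(stat = "identity")+ labs(title ="Weather Related Fatalities and Injuries by Event Type", x = "Event Type", y = "Count")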
When fatalities and injuries are combined, tornadoes were the primary cause (1,621 deaths plus 23,310 injuries, or 24,931 in total), followed by excessive heat (1,903 deaths plus 6,525 injuries, or 8,428).
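The combined figures can be cross-checked from the two top-five tables printed earlier. This sketch merges them on event type; it only covers types that appear in both lists, which is why the analysis itself recomputes the totals from the full data.

```r
# Values copied from the top-5 fatality and injury tables above
fatal5 <- data.frame(
  EVTYPE = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  fatalities = c(1903, 1621, 978, 937, 816),
  stringsAsFactors = FALSE
)
injured5 <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "THUNDERSTORM WIND", "LIGHTNING"),
  injuries = c(23310, 6789, 6525, 6027, 5230),
  stringsAsFactors = FALSE
)

# Inner join keeps only event types present in both lists
both <- merge(fatal5, injured5, by = "EVTYPE")
both$total <- both$fatalities + both$injuries

# Sort by combined harm, largest first
both <- both[order(-both$total), ]
print(both)
```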
propsum<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(PROPDMGVAL)) %>% arrange(desc(sum))
head(propsum, 5)
## Source: local data frame [5 x 2]
##
## EVTYPE sum
## (fctr) (dbl)
## 1 FLOOD 144657709800
## 2 HURRICANE/TYPHOON 69305840000
## 3 STORM SURGE 43323536000
## 4 TORNADO 26327461910
## 5 FLASH FLOOD 16140811510
During the period 1993 to 2011, floods caused the most property damage (about $144.7 billion), followed by hurricanes/typhoons (about $69.3 billion).
cropsum<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(CROPDMGVAL)) %>% arrange(desc(sum))
head(cropsum, 5)
## Source: local data frame [5 x 2]
##
## EVTYPE sum
## (fctr) (dbl)
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025537450
Droughts caused about $14.0 billion in crop damage, more than double the roughly $5.7 billion caused by floods.
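The "more than double" comparison is simple arithmetic on the crop-damage table above. The sketch also checks the stronger statement that drought exceeds FLOOD and RIVER FLOOD taken together; pooling those two labels is an illustrative choice, not part of the analysis.

```r
# Crop damage totals (USD) copied from the table above
crop5 <- c("DROUGHT" = 13972566000, "FLOOD" = 5661968450,
           "RIVER FLOOD" = 5029459000, "ICE STORM" = 5022113500,
           "HAIL" = 3025537450)

# Ratio of drought to flood crop damage
ratio <- crop5[["DROUGHT"]] / crop5[["FLOOD"]]
print(round(ratio, 2))  # 2.47

# Illustrative check: drought versus the two flood labels combined
combined_flood <- crop5[["FLOOD"]] + crop5[["RIVER FLOOD"]]
print(crop5[["DROUGHT"]] > combined_flood)
```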
totalsum<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(TOTALDMGVAL)) %>% arrange(desc(sum))
head(totalsum, 5)
## Source: local data frame [5 x 2]
##
## EVTYPE sum
## (fctr) (dbl)
## 1 FLOOD 150319678250
## 2 HURRICANE/TYPHOON 71913712800
## 3 STORM SURGE 43323541000
## 4 TORNADO 26742415020
## 5 HAIL 18752904170
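# Plot the five event types with the largest total economic loss, largest first
f<-ggplot(totalsum[1:5,],aes(reorder(EVTYPE,-sum),sum))
f+ geom_bar(stat = "identity")+ labs(title ="Economic Loss by Weather Event Type", x = "Event Type", y = "Value Lost (USD)")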
Overall, flooding was responsible for the greatest economic consequences during the period 1993 to 2011 (about $150.3 billion in combined property and crop damage), more than hurricanes/typhoons and storm surges combined (about $115.2 billion).
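The final comparison can be checked directly from the total-damage table. This minimal sketch, using only the values printed above, converts the totals to billions of dollars and tests whether flood losses exceed hurricanes/typhoons and storm surges together.

```r
# Total damage (USD) copied from the table above
total5 <- c("FLOOD" = 150319678250, "HURRICANE/TYPHOON" = 71913712800,
            "STORM SURGE" = 43323541000, "TORNADO" = 26742415020,
            "HAIL" = 18752904170)

# Express the totals in billions of dollars for easier reading
billions <- round(total5 / 1e9, 1)
print(billions)

# Flood losses versus hurricanes/typhoons plus storm surges
combined <- total5[["HURRICANE/TYPHOON"]] + total5[["STORM SURGE"]]
flood_exceeds <- total5[["FLOOD"]] > combined
print(flood_exceeds)
```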