Synopsis

This analysis investigates the NOAA Storm Database to determine which types of extreme weather events are most harmful to human wellbeing and which have the greatest economic consequences. The source data run from 1950 to 2011; however, a broader set of event types was recorded only from 1993 onward. We therefore took 1993 as the starting point of the analysis, both for consistency and to take advantage of the more comprehensive data.

In terms of human impact, tornadoes, followed by excessive heat, were responsible for the greatest combined number of fatalities and injuries. Considering fatalities alone, however, heat-related events caused the most deaths.

For financial impact, flooding had the greatest economic consequences and caused more loss than hurricanes, typhoons, and storm surges combined. Looking just at crop damage, the situation was slightly different with drought being the leading cause of financial loss.

Data Processing

Download the compressed storm data from the URL and read the bz2 file into a data frame.

url <-"http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url,destfile="StormData.csv.bz2")
data <- read.csv(bzfile("StormData.csv.bz2"))

Convert to date to allow us to see the number of events over time.

data$BGN_DATE <- as.Date(data$BGN_DATE, format="%m/%d/%Y %H:%M:%S")
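
As a quick check of this conversion, the sketch below applies the same format string to two made-up timestamps (illustrative values, not rows taken from the database; the names `raw`, `dates`, and `years` are hypothetical). It suggests that `as.Date` keeps only the calendar date, and that the year can then be pulled out with `format`.

```r
# Illustrative timestamps in the same layout as BGN_DATE (made-up values)
raw <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# as.Date parses the date portion and discards the time of day
dates <- as.Date(raw, format = "%m/%d/%Y %H:%M:%S")

# Extract the four-digit year as a number
years <- as.numeric(format(dates, "%Y"))
print(dates)
print(years)
```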

Create columns giving the dollar value of property and crop damage by applying the “K” (thousand), “M” (million), and “B” (billion) multiplier codes. Rows with any other code are assigned zero. Add a column showing total damage.

suppressWarnings(library(dplyr))
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
suppressWarnings(library(tidyr))
data<-mutate(data,PROPDMGVAL=0)
data<-mutate(data,CROPDMGVAL=0)
data<-mutate(data,TOTALDMGVAL=0)
map <- data.frame(abbrev = c("K","M","B"), val = c(1000,1000000,1000000000))
data$PROPDMGVAL<-map$val[match(data$PROPDMGEXP,map$abbrev)] * data$PROPDMG
data$CROPDMGVAL<-map$val[match(data$CROPDMGEXP,map$abbrev)] * data$CROPDMG
data$PROPDMGVAL[is.na(data$PROPDMGVAL)]<-0
data$CROPDMGVAL[is.na(data$CROPDMGVAL)]<-0
data$TOTALDMGVAL<-data$PROPDMGVAL + data$CROPDMGVAL
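
To make the lookup step concrete, here is a minimal sketch on a toy data frame (the rows and the names `toy`, `DMG`, `EXP`, and `VAL` are assumptions for illustration only). It shows how `match` yields `NA` for any code outside the lookup table, which is why the analysis resets `NA` values to zero afterwards.

```r
# Toy rows: damage magnitude plus its multiplier code (made-up values)
toy <- data.frame(DMG = c(2.5, 10, 1, 3),
                  EXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Lookup table of multiplier codes, as in the analysis
map <- data.frame(abbrev = c("K", "M", "B"), val = c(1e3, 1e6, 1e9))

# match() returns NA for codes not in the table, so the product is NA there
toy$VAL <- map$val[match(toy$EXP, map$abbrev)] * toy$DMG

# Treat unrecognised codes as zero damage, mirroring the step above
toy$VAL[is.na(toy$VAL)] <- 0
print(toy)
```

Because `match` is case-sensitive, a lowercase code would also fall through to zero under this approach.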

Determine the most common event types

Events<-count(data,EVTYPE)
Events<-Events[order(-Events$n),]
head(Events,5)
## Source: local data frame [5 x 2]
## 
##              EVTYPE      n
##              (fctr)  (int)
## 1              HAIL 288661
## 2         TSTM WIND 219940
## 3 THUNDERSTORM WIND  82563
## 4           TORNADO  60652
## 5       FLASH FLOOD  54277

Some of the top event types are the same category with spelling variations. Merge these categories.

data$EVTYPE<-gsub("TSTM","THUNDERSTORM",data$EVTYPE)
data$EVTYPE<-gsub("THUNDERSTORM WINDS","THUNDERSTORM WIND",data$EVTYPE)
data$EVTYPE<-as.factor(data$EVTYPE)
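
The effect of the two substitutions can be seen on a handful of made-up labels (the vector `labels` is hypothetical and only mimics the spelling variants seen in the counts above):

```r
# Made-up labels showing the spelling variants being merged
labels <- c("TSTM WIND", "THUNDERSTORM WINDS", "THUNDERSTORM WIND", "HAIL")

# Expand the abbreviation, then collapse the plural form
labels <- gsub("TSTM", "THUNDERSTORM", labels)
labels <- gsub("THUNDERSTORM WINDS", "THUNDERSTORM WIND", labels)

# Tabulate the merged categories
print(table(labels))
```

Since `gsub` replaces substrings wherever they occur, longer compound labels that contain these strings would be rewritten too, but would still count as separate categories.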

View the results of the recategorization.

Events<-count(data,EVTYPE)
Events<-Events[order(-Events$n),]
head(Events,5)
## Source: local data frame [5 x 2]
## 
##              EVTYPE      n
##              (fctr)  (int)
## 1 THUNDERSTORM WIND 323352
## 2              HAIL 288661
## 3           TORNADO  60652
## 4       FLASH FLOOD  54277
## 5             FLOOD  25326

Look at the counts over time

Counts<-count(data,BGN_DATE)
Counts<-Counts[order(-Counts$n),]
AnnualCounts<-separate(Counts,BGN_DATE,c("y","m","d"))
CountsPerYear<-AnnualCounts %>% group_by(y) %>% summarise(sum=sum(n))
CountsPerYear$y <-as.numeric(CountsPerYear$y)
suppressWarnings(library(ggplot2))
plot1<-ggplot(data=CountsPerYear,aes(y,sum))
plot1 + geom_line() + labs(title ="Event counts per Year", x = "Year", y = "Number of Events")

The chart shows a dramatic increase in the number of events recorded per year. This suggests we should also check whether the set of event types changes over time.

data<-separate(data,BGN_DATE,c("y","m","d"),remove=FALSE)
data$y <-as.numeric(data$y)
EventTypesPerYear<-data %>% group_by(y) %>% summarise(Unique_Events = n_distinct(EVTYPE))  
EventTypesPerYear[40:50,]
## Source: local data frame [11 x 2]
## 
##        y Unique_Events
##    (dbl)         (int)
## 1   1989             3
## 2   1990             3
## 3   1991             3
## 4   1992             3
## 5   1993           158
## 6   1994           264
## 7   1995           380
## 8   1996           228
## 9   1997           170
## 10  1998           125
## 11  1999           121

Note from the table above the sharp increase in event types in 1993. Both the number of events per year and the set of recorded event types change dramatically over the years. To try to ensure we are considering a period when data collection captured all relevant types of events, focus the analysis on only those events in 1993 or later:

data<-filter(data, y>=1993)
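
The same check and filter can be sketched in base R on a toy data frame (made-up rows; `toy`, `types_per_year`, and `recent` are illustrative names, and `tapply` is used here in place of the dplyr calls above):

```r
# Toy records: year and event type (made-up values)
toy <- data.frame(y = c(1992, 1992, 1992, 1993, 1993, 1993),
                  EVTYPE = c("TORNADO", "HAIL", "HAIL", "FLOOD", "HEAT", "HAIL"))

# Number of distinct event types recorded in each year
types_per_year <- tapply(toy$EVTYPE, toy$y, function(x) length(unique(x)))
print(types_per_year)

# Keep only the records from 1993 onward
recent <- toy[toy$y >= 1993, ]
print(nrow(recent))
```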

Results

Part 1: Which types of events are most harmful with respect to population health?

Fatalities

fatal<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(FATALITIES)) %>% arrange(desc(sum))
head(fatal, 5)
## Source: local data frame [5 x 2]
## 
##           EVTYPE   sum
##           (fctr) (dbl)
## 1 EXCESSIVE HEAT  1903
## 2        TORNADO  1621
## 3    FLASH FLOOD   978
## 4           HEAT   937
## 5      LIGHTNING   816

Top fatalities during the period 1993 to 2011 were due to excessive heat, followed by tornadoes. Note that #4 in the list is also a heat-related issue.
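
The group, sum, and sort pattern used for this ranking can be illustrated with base R on a few made-up rows (the data frame `toy` and its values are assumptions; `aggregate` stands in for the dplyr pipeline):

```r
# Toy records: event type and fatalities per event (made-up values)
toy <- data.frame(EVTYPE = c("HEAT", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(3, 5, 4, 1))

# Sum fatalities within each event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from the largest total to the smallest
totals <- totals[order(-totals$FATALITIES), ]
print(totals)
```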

Injuries

injured<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(INJURIES)) %>% arrange(desc(sum))
head(injured, 5)
## Source: local data frame [5 x 2]
## 
##              EVTYPE   sum
##              (fctr) (dbl)
## 1           TORNADO 23310
## 2             FLOOD  6789
## 3    EXCESSIVE HEAT  6525
## 4 THUNDERSTORM WIND  6027
## 5         LIGHTNING  5230

Top injuries were due to tornadoes followed by floods.

Fatalities and Injuries

totalhealth<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(FATALITIES+INJURIES)) %>% arrange(desc(sum))
head(totalhealth, 5)
f<-ggplot(totalhealth[1:5,],aes(EVTYPE,sum))
f+ geom_bar(stat = "identity")+ labs(title ="Weather Related Fatalities and Injuries by Event Type", x = "Event Type", y = "Count")

When both fatalities and injuries are considered, tornadoes were the primary cause followed by excessive heat.
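
The combined figures for the two leading event types can be reproduced from the fatalities and injuries tables reported earlier. The sketch below covers only tornadoes and excessive heat, since those are the only types for which both top-five tables give a value (the vector names are illustrative).

```r
# Figures copied from the fatalities and injuries tables above
fatalities <- c(TORNADO = 1621, `EXCESSIVE HEAT` = 1903)
injuries   <- c(TORNADO = 23310, `EXCESSIVE HEAT` = 6525)

# Combined casualties per event type
combined <- fatalities + injuries
print(combined)
```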

Part 2: Which types of events have the greatest economic consequences?

Property Damage

propsum<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(PROPDMGVAL)) %>% arrange(desc(sum))
head(propsum, 5)
## Source: local data frame [5 x 2]
## 
##              EVTYPE          sum
##              (fctr)        (dbl)
## 1             FLOOD 144657709800
## 2 HURRICANE/TYPHOON  69305840000
## 3       STORM SURGE  43323536000
## 4           TORNADO  26327461910
## 5       FLASH FLOOD  16140811510

During the period 1993 to 2011, floods caused the most property damage, followed by hurricanes/typhoons.

Crop Damage

cropsum<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(CROPDMGVAL)) %>% arrange(desc(sum))
head(cropsum, 5)
## Source: local data frame [5 x 2]
## 
##        EVTYPE         sum
##        (fctr)       (dbl)
## 1     DROUGHT 13972566000
## 2       FLOOD  5661968450
## 3 RIVER FLOOD  5029459000
## 4   ICE STORM  5022113500
## 5        HAIL  3025537450

Droughts are responsible for more than double the crop-related damage caused by floods.
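
A quick arithmetic check of this comparison, using the two totals from the crop damage table (variable names are illustrative):

```r
# Crop damage totals copied from the table above (dollars)
drought <- 13972566000
flood   <- 5661968450

# Ratio of drought losses to flood losses
ratio <- drought / flood
print(round(ratio, 2))
```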

Total economic loss

totalsum<-data%>% group_by(EVTYPE) %>% summarise(sum=sum(TOTALDMGVAL)) %>% arrange(desc(sum))
head(totalsum, 5)
## Source: local data frame [5 x 2]
## 
##              EVTYPE          sum
##              (fctr)        (dbl)
## 1             FLOOD 150319678250
## 2 HURRICANE/TYPHOON  71913712800
## 3       STORM SURGE  43323541000
## 4           TORNADO  26742415020
## 5              HAIL  18752904170
f<-ggplot(totalsum[1:5,],aes(EVTYPE,sum))
f+ geom_bar(stat = "identity")+ labs(title ="Economic loss by weather event type", x = "Event Type", y = "Value Lost")

Overall, flooding was responsible for the greatest economic consequences during the period 1993 to 2011, and caused more loss than hurricanes, typhoons, and storm surges combined.
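
The comparison can be verified directly from the totals in the table above (variable names are illustrative):

```r
# Total damage figures copied from the table above (dollars)
flood             <- 150319678250
hurricane_typhoon <- 71913712800
storm_surge       <- 43323541000

# Sum of the second- and third-ranked event types
combined <- hurricane_typhoon + storm_surge
print(combined)

# TRUE if flood losses exceed the combined figure
print(flood > combined)
```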