Fatalities, injuries and property damage caused by severe weather in the US between Jan 1950 and Nov 2011

Martin Livingstone 2016-10-24

Synopsis

Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database has been analyzed to identify (a) the weather events that are most harmful to the population’s health (as indicated by the number of fatalities and injuries), and (b) the weather events that have the greatest economic consequences (as indicated by the value of property and crop damage). The data covers Jan 1950 to Nov 2011. Over this period there were 15,145 fatalities and 140,528 injuries, and $476.4 billion of property and crop damage. The analysis found that Tornados caused by far the most fatalities (5,633) and injuries (91,346). Floods caused the greatest damage ($150.3 billion).


Data Processing

Required libraries

library(ggplot2)
library(gridExtra)
library(dplyr)
library(scales)
library(reshape2)

Read in the data

Download the NOAA storm data via the link provided by Coursera, read in the data from the bzip2-compressed CSV file, then inspect it.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2")
dat <- read.csv("StormData.csv.bz2")
# When developing the R Markdown file it can be useful to save the data to a local RDS file and read from that instead (it's much quicker than reading the CSV file every time)
# saveRDS(dat,"storm_data.RDS")
# dat <- readRDS('storm_data.RDS')
dim(dat)
## [1] 902297     37
colnames(dat)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
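
The caching idea in the commented-out `saveRDS`/`readRDS` lines can be wrapped in a small helper. The sketch below is only illustrative: `read_cached` is a hypothetical function that is not part of this report, and a toy data frame stands in for the storm data so that nothing is downloaded.

```r
# Hypothetical helper (not part of the original report): return the cached
# object if the RDS file exists, otherwise build it with loader() and cache it.
read_cached <- function(cache_path, loader) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))
  }
  obj <- loader()
  saveRDS(obj, cache_path)
  obj
}

# Toy demonstration using a temporary file instead of "storm_data.RDS"
cache <- tempfile(fileext = ".RDS")
first  <- read_cached(cache, function() data.frame(x = 1:3))
# The second call reads from the cache, so this loader is never run
second <- read_cached(cache, function() stop("loader should not run again"))
```

In the report itself the loader would presumably be `function() read.csv("StormData.csv.bz2")`, so the slow CSV parse only happens on the first run.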

The dataset contains 37 columns but we are only interested in the following columns:

colnames(dat)[c(8,23:28)]
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
## [6] "CROPDMG"    "CROPDMGEXP"

For the columns PROPDMGEXP and CROPDMGEXP, alphabetical characters signify the magnitude of the damage given in PROPDMG and CROPDMG: “K” for thousands, “M” for millions, and “B” for billions.
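
As a minimal sketch of how such exponent codes turn into dollar amounts, the example below uses a named lookup vector on made-up values. The numbers are hypothetical and not taken from the dataset. Codes outside K, M and B fall back to a multiplier of 1, which mirrors the nested `ifelse` used later in this report.

```r
# Toy values standing in for PROPDMG / PROPDMGEXP (hypothetical, not real data)
dmg      <- c(25, 2.5, 1.2, 10)
exp_code <- c("K", "M", "B", "")

# Named lookup: exponent code -> multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Indexing by name gives NA for codes that are not in the lookup;
# those fall back to a multiplier of 1
m <- unname(multiplier[exp_code])
m[is.na(m)] <- 1

dollars <- dmg * m
```

A lookup vector keeps the mapping in one place, which may be easier to extend than a chain of `ifelse` calls if further codes need handling.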

What period does the data cover?

min(as.Date(dat$BGN_DATE,"%m/%d/%Y"))
## [1] "1950-01-03"
max(as.Date(dat$BGN_DATE,"%m/%d/%Y"))
## [1] "2011-11-30"

The data covers the period 1950-01-03 to 2011-11-30.
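
The same check can be done in one step with `range()`. The sketch below assumes the BGN_DATE strings look like `1/3/1950 0:00:00` (an assumption about the raw format, with made-up sample values); `as.Date` reads only as much of each string as the format needs and ignores the trailing time.

```r
# Sample strings in the assumed BGN_DATE layout (illustrative values only)
bgn <- c("4/18/1950 0:00:00", "1/3/1950 0:00:00", "11/30/2011 0:00:00")

# Parse month/day/year; the trailing time component is ignored
dates <- as.Date(bgn, "%m/%d/%Y")

# Earliest and latest date in a single call
period <- range(dates)
```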


Process the data

Extract the relevant data relating to injuries and fatalities and create a corresponding tidy dataset. Rows where there are no injuries or fatalities can be excluded.

i <- dat %>% filter(FATALITIES + INJURIES > 0) %>% select(EVTYPE,FATALITIES,INJURIES)
mi <- melt(i,id=c("EVTYPE"))
smi <- mi %>% group_by(EVTYPE,variable) %>% summarize(ival=sum(value)) %>% arrange(desc(ival))
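
To make the reshaping step concrete, here is a small sketch on made-up data. It uses base R (`aggregate`) rather than the `melt` and `dplyr` pipeline above, so it is a swapped-in technique that should give the same shape of result: one row per event type and measure, sorted with the largest total first.

```r
# Toy data in the same layout as the selected columns (values are invented)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 0, 1, 3),
  INJURIES   = c(10, 1, 4, 0)
)

# Long format: one row per (event, measure) pair, similar to melt() output
long <- data.frame(
  EVTYPE   = rep(toy$EVTYPE, times = 2),
  variable = rep(c("FATALITIES", "INJURIES"), each = nrow(toy)),
  value    = c(toy$FATALITIES, toy$INJURIES)
)

# Sum per event type and measure, then sort with the largest total first
totals <- aggregate(value ~ EVTYPE + variable, data = long, FUN = sum)
totals <- totals[order(-totals$value), ]
```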

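Extract the relevant data relating to economic impact (property and crop damage) and create a corresponding tidy dataset. Damage values are converted to dollars using the exponent columns; any exponent code other than “K”, “M” or “B” is treated as a multiplier of 1. Rows with no property or crop damage are excluded.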

e <- dat %>% 
        filter(PROPDMG + CROPDMG > 0) %>%
        mutate(propd=PROPDMG*
                        ifelse(PROPDMGEXP=="B",1e9,
                        ifelse(PROPDMGEXP=='M',1e6,
                        ifelse(PROPDMGEXP=='K',1e3,1)))) %>%
        mutate(cropd=CROPDMG*
                        ifelse(CROPDMGEXP=="B",1e9,
                        ifelse(CROPDMGEXP=='M',1e6,
                        ifelse(CROPDMGEXP=='K',1e3,1)))) %>%
        select(EVTYPE, propd, cropd)
me <- melt(e, id=c("EVTYPE"))
sme <- me %>% group_by(EVTYPE,variable) %>% summarize(eval=sum(value)) %>% arrange(desc(eval))

We now have two tidy datasets - one for fatalities and injuries (smi), the other for property and crop damage (sme). These datasets are used to answer the following questions:

  • Across the US, which types of events are most harmful with respect to population health?
  • Across the US, which types of events have the greatest economic consequences?

Results

Across the US, which types of events are most harmful with respect to population health?

fatalities <- sum(subset(smi,variable=="FATALITIES")$ival)
injuries <- sum(subset(smi,variable=="INJURIES")$ival)

Over the period 1950-01-03 to 2011-11-30, storms and severe weather events caused a total of 15,145 fatalities and 140,528 injuries.

To show the weather events that are most harmful to the population’s health we plot, by weather event, the number of fatalities and injuries caused by each (top 20 only).

f <- ggplot(subset(smi,variable=="FATALITIES")[1:20,], aes(x = reorder(EVTYPE, ival), y = ival))
        f + geom_bar(stat = "identity") + coord_flip() + 
        labs(title="US Fatalities due to Weather Events between Jan 1950 and Nov 2011") +
        labs(y = "Fatalities", x = "Weather Event") +
        scale_y_continuous(breaks=seq(0,6000,500),labels=comma) + theme_bw()

i <- ggplot(subset(smi,variable=="INJURIES")[1:20,], aes(x = reorder(EVTYPE, ival), y = ival))
        i + geom_bar(stat = "identity") + coord_flip() + 
        labs(title="US Injuries due to Weather Events between Jan 1950 and Nov 2011") +
        labs(y = "Injuries", x = "Weather Event") +
        scale_y_continuous(breaks=seq(0,1e5,1e4),labels=comma) + theme_bw()

As shown in the graphs above, Tornados are by far the most harmful weather events with respect to human health - they caused both the most fatalities (5,633) and the most injuries (91,346).
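
The plots rely on `reorder(EVTYPE, ival)` to sort the bars. As a small illustration with made-up counts, `reorder` returns a factor whose levels are sorted in ascending order of the second argument; with `coord_flip()` the first level is drawn at the bottom, so the largest value ends up at the top of the chart.

```r
# Invented counts, for illustration only
ev <- c("TORNADO", "HEAT", "FLOOD")
n  <- c(50, 5, 20)

# Factor levels are ordered by ascending n
f <- reorder(ev, n)
bar_order <- levels(f)
```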

Across the US, which types of events have the greatest economic consequences?

propd <- sum(subset(sme,variable=="propd")$eval)
cropd <- sum(subset(sme,variable=="cropd")$eval)

Over the period 1950-01-03 to 2011-11-30, storms and severe weather events caused $427.3 billion of property damage and $49.1 billion of crop damage ($476.4 billion in total).
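
As a quick arithmetic check, and as one possible way to format such figures for prose, the sketch below adds the two rounded totals quoted above and prints them with base R. The variable names are hypothetical, and the inputs are the rounded values from the text rather than the exact sums held in `propd` and `cropd`.

```r
# Rounded totals quoted in the text, in dollars
prop_total <- 427.3e9
crop_total <- 49.1e9

# Property plus crop damage, formatted in billions
total_label <- sprintf("$%.1f billion", (prop_total + crop_total) / 1e9)

# Thousands separators for the casualty counts
fatal_label  <- format(15145, big.mark = ",")
injury_label <- format(140528, big.mark = ",")
```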

To show the weather events that have the greatest economic consequences we plot, by weather event, the total damage (property damage + crop damage) caused by each (top 20 only); the greater the damage the greater the economic consequence.

totd <- aggregate(sme$eval,list(sme$EVTYPE),sum)
colnames(totd)<-c("EVTYPE","damage")
totd <- head((totd[order(-totd$damage),]),n=20)
d <- ggplot(totd, aes(x = reorder(EVTYPE, damage), y = damage/1e9))
        d + geom_bar(stat = "identity") + coord_flip() + 
        labs(title="US Damage due to Weather Events between Jan 1950 and Nov 2011") +
        labs(y = "Damage $ Billions", x = "Weather Event") +
        scale_y_continuous(breaks=seq(0,160,10),labels=comma) + theme_bw()

As shown in the graph above, Floods caused the greatest economic damage ($150.3 billion).


Summary

An analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database shows that between Jan 1950 and Nov 2011 severe weather caused 15,145 fatalities, 140,528 injuries, and a total of $476.4 billion of property and crop damage.

The weather event that caused the most harm to the population’s health was Tornados (5,633 fatalities and 91,346 injuries).

The weather event that had the greatest economic consequences was Floods ($150.3 billion).