Synopsis

Data from the US National Oceanic and Atmospheric Administration’s (NOAA) storm database, which describes major weather events in the United States from 1950 to 2011, were analysed to identify the types of event with the greatest health-related and economic impacts. Using the total numbers of reported fatalities and injuries as a measure of harm to human health, tornadoes were found to cause by far the largest numbers of both fatalities and injuries: 5633 fatalities and 91346 injuries over the 61-year period. The next most harmful type of event, excessive heat, caused 1903 fatalities and 6525 injuries. Using the total reported damage to property and crops as a measure of economic impact, floods had the greatest consequences. They caused about 150,300 million dollars’ worth of damage over the recording period, roughly twice the cost of the next most expensive type of event (hurricanes/typhoons, about 71,900 million dollars). Tornadoes ranked third, with about 57,400 million dollars’ worth of damage.

Data Processing

The data were downloaded from the internet as a bzip2-compressed (.bz2) CSV file. The file was decompressed and read into R in one step through a bzfile() connection.

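# Download the compressed data file only if it is not already present
if (!file.exists("stormData.csv.bz2")) {
  download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormData.csv.bz2")
}
# read.csv() decompresses the file on the fly through the bzfile() connection
data <- read.csv(bzfile("stormData.csv.bz2"))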

The numbers of fatalities and injuries were then summed by event type (EVTYPE), both separately and combined. Using the mean instead of the sum was considered during exploratory analysis, but the mean gave too much weight to rare, one-off events. The resulting data frame was sorted in decreasing order of total casualties so that the most harmful types of event appear first.
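The toy example below illustrates the reason for preferring the sum over the mean. The records and numbers are made up for illustration and do not come from the NOAA data. One event type has a single severe occurrence, and the other has several moderate occurrences. The mean ranks the one-off event first, while the sum ranks the frequently recorded type first. It uses base R only (tapply()), not the dplyr code of the analysis.

```r
# Made-up toy records (not from the NOAA data): one event type with a single
# severe occurrence and one with several moderate occurrences.
toy <- data.frame(
  EVTYPE     = c("RARE EVENT", rep("COMMON EVENT", 5)),
  FATALITIES = c(20, 5, 4, 6, 5, 5),
  INJURIES   = c(30, 20, 25, 15, 20, 20),
  stringsAsFactors = FALSE
)
toy$harm <- toy$FATALITIES + toy$INJURIES

# Harm per event type, summarised by the mean and by the sum
mean_harm  <- tapply(toy$harm, toy$EVTYPE, mean)  # COMMON EVENT 25, RARE EVENT 50
total_harm <- tapply(toy$harm, toy$EVTYPE, sum)   # COMMON EVENT 125, RARE EVENT 50

# The mean ranks the one-off event first; the sum ranks the common type first
names(mean_harm)[which.max(mean_harm)]    # "RARE EVENT"
names(total_harm)[which.max(total_harm)]  # "COMMON EVENT"
```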

library(dplyr, quietly = TRUE, warn.conflicts = FALSE)  # group_by(), summarize(), mutate()
# Total fatalities and injuries for each event type
health <- summarize(group_by(data, EVTYPE),
                    fatalities = sum(FATALITIES),
                    injuries   = sum(INJURIES),
                    total      = sum(FATALITIES) + sum(INJURIES))
names(health)[1] <- "eventtype"
# Sort so that the most harmful event types come first
health <- health[order(health$total, decreasing = TRUE), ]

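A similar analysis was performed using the property and crop damage figures to estimate the economic impact of each type of event. Each figure is recorded as a number (PROPDMG, CROPDMG) together with a magnitude code (PROPDMGEXP, CROPDMGEXP). Examples are B for billions, K for thousands and H for hundreds. The two parts had to be combined into a single value in millions of dollars before summation. Numeric codes were interpreted as tens and "+" as ones. Values with any other code, including M and a blank code, were left unchanged, that is, treated as already being in millions.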

# Convert property damage to millions of dollars. Values whose magnitude code is
# not matched below (e.g. "M", "m" or blank) are left unchanged, i.e. read as millions.
data <- mutate(data, PROPDMG2 = ifelse(PROPDMGEXP %in% c("B", "b"), PROPDMG * 1000, PROPDMG))    # billions
data <- mutate(data, PROPDMG2 = ifelse(PROPDMGEXP %in% c("K", "k"), PROPDMG / 1000, PROPDMG2))   # thousands
data <- mutate(data, PROPDMG2 = ifelse(PROPDMGEXP %in% c("H", "h"), PROPDMG / 10000, PROPDMG2))  # hundreds
data <- mutate(data, PROPDMG2 = ifelse(PROPDMGEXP %in% as.character(0:8),                        # digits read as tens
                                       PROPDMG / 100000, PROPDMG2))
data <- mutate(data, PROPDMG2 = ifelse(PROPDMGEXP == "+", PROPDMG / 1000000, PROPDMG2))          # "+" read as ones
# Same conversion for crop damage
data <- mutate(data, CROPDMG2 = ifelse(CROPDMGEXP %in% c("B", "b"), CROPDMG * 1000, CROPDMG))
data <- mutate(data, CROPDMG2 = ifelse(CROPDMGEXP %in% c("K", "k"), CROPDMG / 1000, CROPDMG2))
data <- mutate(data, CROPDMG2 = ifelse(CROPDMGEXP %in% c("H", "h"), CROPDMG / 10000, CROPDMG2))
data <- mutate(data, CROPDMG2 = ifelse(CROPDMGEXP %in% as.character(0:8),
                                       CROPDMG / 100000, CROPDMG2))
data <- mutate(data, CROPDMG2 = ifelse(CROPDMGEXP == "+", CROPDMG / 1000000, CROPDMG2))
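# Total property and crop damage (millions of dollars) for each event type
econ <- summarize(group_by(data, EVTYPE),
                  propertydamage = sum(PROPDMG2),
                  cropdamage     = sum(CROPDMG2),
                  total          = sum(PROPDMG2) + sum(CROPDMG2))
names(econ)[1] <- "eventtype"
# Sort so that the most costly event types come first
econ <- econ[order(econ$total, decreasing = TRUE), ]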
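The chain of ifelse() calls used for the magnitude codes could also be written as a lookup table of multipliers. The sketch below is an alternative and was not used to produce the results. The names `multipliers` and `to_millions()` are hypothetical. The sketch assumes the same reading of the codes as the processing code: digits as tens, "+" as ones, and unmatched codes such as "M" or blank as millions. It uses base R only.

```r
# Hypothetical lookup table: multiplier that converts a damage value with a
# given magnitude code into millions of dollars (same rules as the code above).
multipliers <- c(B = 1e3, K = 1e-3, H = 1e-4, "+" = 1e-6,
                 setNames(rep(1e-5, 9), as.character(0:8)))  # digits read as tens

to_millions <- function(value, code) {
  code <- toupper(as.character(code))  # handles factors and lower-case codes
  m <- multipliers[code]               # NA for codes not in the table (incl. blank)
  m[is.na(m)] <- 1                     # e.g. "M" or blank: already in millions
  unname(value * m)
}

to_millions(c(25, 2.5, 150, 3, 7), c("K", "b", "H", "M", ""))
```

Applied to the full data, for example `to_millions(data$PROPDMG, data$PROPDMGEXP)`, this should match PROPDMG2 up to floating-point rounding. Keeping the rules in one table makes them easier to inspect and to change.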

Results

The ten types of event with the greatest health-related and economic consequences are shown below. Damage figures are in millions of US dollars.

## [1] "Most harmful to human health:"
##            eventtype fatalities injuries total
## 1            TORNADO       5633    91346 96979
## 2     EXCESSIVE HEAT       1903     6525  8428
## 3          TSTM WIND        504     6957  7461
## 4              FLOOD        470     6789  7259
## 5          LIGHTNING        816     5230  6046
## 6               HEAT        937     2100  3037
## 7        FLASH FLOOD        978     1777  2755
## 8          ICE STORM         89     1975  2064
## 9  THUNDERSTORM WIND        133     1488  1621
## 10      WINTER STORM        206     1321  1527
## [1] "Greatest economic impact:"
##            eventtype propertydamage cropdamage      total
## 1              FLOOD     144664.710  5661.9685 150326.678
## 2  HURRICANE/TYPHOON      69305.840  2607.8728  71913.713
## 3            TORNADO      56940.163   414.9547  57355.118
## 4        STORM SURGE      43323.536     0.0050  43323.541
## 5               HAIL      15789.270  3028.9547  18818.225
## 6        FLASH FLOOD      16347.815  1421.3171  17769.132
## 7            DROUGHT       1046.106 13972.5660  15018.672
## 8          HURRICANE      11868.319  2741.9100  14610.229
## 9        RIVER FLOOD       5118.945  5029.4590  10148.405
## 10         ICE STORM       3944.928  5022.1135   8967.042

From these tables, tornadoes have by far the largest impact on human health. Floods cause the greatest economic damage, followed by hurricanes/typhoons and tornadoes. Heat, floods, flash floods, lightning and thunderstorm winds also feature prominently. However, as both tables show, some broad types of event have been recorded under several labels. For example, thunderstorm winds appear as both TSTM WIND and THUNDERSTORM WIND, and hurricanes as both HURRICANE/TYPHOON and HURRICANE. This splits their totals and understates their apparent impact, so a re-analysis that merges such labels may be worthwhile. Bar plots of the top fifteen event types in each category are shown below.

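# Top fifteen event types by total fatalities and injuries
barplot(health$total[1:15],
        names.arg = tolower(health$eventtype[1:15]),
        cex.names = 0.4,
        ylab = "Fatalities + injuries",
        main = "Bar plot of total fatalities and injuries")

# Top fifteen event types by total property and crop damage
barplot(econ$total[1:15],
        names.arg = tolower(econ$eventtype[1:15]),
        cex.names = 0.4,
        ylab = "Damage (millions of US$)",
        main = "Bar plot of total property and crop damage")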
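The re-analysis suggested above, which merges labels that describe the same broad kind of event, could start from a mapping like the one below. This is a minimal sketch in base R. The function name `group_event()` and its regular-expression rules are hypothetical choices, not an established NOAA classification. For illustration, the mapping is applied only to the ten rows of the health table printed above rather than to the full data. Merging EXCESSIVE HEAT and HEAT, for instance, gives 11465.

```r
# Hypothetical grouping rules: these regular expressions are illustrative
# assumptions, not an official NOAA classification of event types.
group_event <- function(evtype) {
  ev <- toupper(gsub("^\\s+|\\s+$", "", as.character(evtype)))  # trim stray spaces
  ifelse(grepl("TSTM|THUNDERSTORM", ev), "THUNDERSTORM WIND",
  ifelse(grepl("HURRICANE|TYPHOON", ev), "HURRICANE",
  ifelse(grepl("HEAT", ev), "HEAT",
  ifelse(grepl("FLOOD", ev), "FLOOD", ev))))  # anything else keeps its label
}

# The ten rows of the health table printed above (total = fatalities + injuries)
top10 <- data.frame(
  eventtype = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
                "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND", "WINTER STORM"),
  total = c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527),
  stringsAsFactors = FALSE
)
top10$group <- group_event(top10$eventtype)

# Re-sum the totals by broad group and sort in decreasing order
regrouped <- aggregate(total ~ group, data = top10, FUN = sum)
regrouped[order(regrouped$total, decreasing = TRUE), ]
```

On the full data, the mapping would be applied to EVTYPE before grouping, for example `data$EVGROUP <- group_event(data$EVTYPE)`, followed by the same summaries. Because only the ten printed rows are used here, the regrouped figures are lower bounds, not new results.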