Analysis of U.S. Storms and Severe Weather Events

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

In this report we analyse the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which covers events from 1950 through November 2011, to answer the following questions:

1. Across the United States, which types of events are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

With respect to population health, this project aggregates fatalities and injuries by event type, first separately and then combined. I decided that both measures were needed to identify the most harmful events and weighted them equally in the combined total. The totals are sorted and the top event types displayed.

With respect to economic consequences, this project converts the property (PROPDMG) and crop (CROPDMG) damage figures into dollars before aggregating them. Each figure is stored as a number plus a magnitude code (PROPDMGEXP, CROPDMGEXP) that indicates hundreds, thousands, millions or billions. The totals are sorted and the top event types displayed.

Data Processing

Reading in the Data

We start by downloading the data source. It is a comma-separated values (CSV) file compressed with the bzip2 algorithm to reduce its size: approx. 535 MB uncompressed and approx. 47 MB compressed. The data set has 902,297 rows and 37 variables.

Note the file can also be downloaded from the following link:

Storms Data

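Set the working directory and load the storm data:

setwd("c:/R Programming/Rep_Research/Assignment2")  # adjust to the folder holding the download
# read.csv(): CSV defaults (double quotes only, no comment character); the .bz2 file is read directly
storm.data <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE)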
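The load step relies on R decompressing the .bz2 file on the fly. A minimal, self-contained way to check that behaviour with base R alone is sketched below. It writes a tiny hypothetical sample to a compressed temporary file and reads it back. The rows are made up and include an apostrophe and a "#" in a free-text column. The names sample.data, path and round.trip are illustrative only.

```r
# Hypothetical three-row sample standing in for the real storm file
sample.data <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1),
  REMARKS    = c("Roof torn off farmer's barn", "Hail #2 of the day", "River rose 3 ft"),
  stringsAsFactors = FALSE
)

# Write it as a bzip2-compressed CSV in a temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(sample.data, con, row.names = FALSE)
close(con)

# read.csv() detects the bzip2 compression and decompresses while reading
round.trip <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
print(dim(round.trip))
```

Because read.csv() uses only double quotes for quoting and has no comment character, the apostrophe and the "#" in REMARKS come back unchanged.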

Transforming the Data

Aggregate Fatalities and Injuries separately by Event Type (and sort)

fatalities <- aggregate(FATALITIES ~ EVTYPE, data=storm.data, FUN = sum)
sorted.fatalities <- fatalities[order(-fatalities$FATALITIES), ]

injuries <- aggregate(INJURIES ~ EVTYPE, data=storm.data, FUN = sum)
sorted.injuries <- injuries[order(-injuries$INJURIES), ]
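The aggregate-then-order pattern used above can be seen on a handful of made-up rows. These are not NOAA data, and the names events, totals and sorted.totals are illustrative. aggregate() returns one row per event type, and negating the column inside order() sorts from largest to smallest.

```r
# Made-up event records (not NOAA data); several rows share an event type
events <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "HAIL", "FLOOD", "TORNADO"),
  FATALITIES = c(0, 3, 1, 2, 4),
  stringsAsFactors = FALSE
)

# Sum fatalities within each event type (one output row per EVTYPE)
totals <- aggregate(FATALITIES ~ EVTYPE, data = events, FUN = sum)

# Negating the column makes order() sort from largest to smallest
sorted.totals <- totals[order(-totals$FATALITIES), ]
print(sorted.totals)
```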

Aggregate combined Fatalities + Injuries by Event Type

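# Sum fatalities and injuries into an explicitly named column: the name aggregate()
# gives a cbind() left-hand side is not guaranteed to be "V1" across R versions
storm.data$HARM <- storm.data$FATALITIES + storm.data$INJURIES
combined <- aggregate(HARM ~ EVTYPE, data = storm.data, FUN = sum)
combined.injuries.fatalities <- subset(combined, HARM > 1700)  # keep only the top events for the plot

Rename the columns so the plot labels are more meaningful

names(combined.injuries.fatalities)[names(combined.injuries.fatalities)=="HARM"] <- "Total"
names(combined.injuries.fatalities)[names(combined.injuries.fatalities)=="EVTYPE"] <- "Event_Type"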
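The equal weighting chosen for the combined measure is simply a sum of fatalities and injuries, and it can rank event types differently from fatalities alone. The sketch below shows this on made-up counts for three hypothetical event types (A, B, C). The names harm, by.fatalities and by.total are illustrative.

```r
# Made-up counts for three hypothetical event types (not NOAA data)
harm <- data.frame(
  EVTYPE     = c("A", "B", "C"),
  FATALITIES = c(10, 2, 5),
  INJURIES   = c(5, 30, 4),
  stringsAsFactors = FALSE
)

# Equal weighting: one fatality counts the same as one injury
harm$TOTAL <- harm$FATALITIES + harm$INJURIES

# Rank event types (descending) by fatalities alone and by the combined total
by.fatalities <- harm$EVTYPE[order(-harm$FATALITIES)]
by.total      <- harm$EVTYPE[order(-harm$TOTAL)]

print(by.fatalities)  # "A" "C" "B"
print(by.total)       # "B" "A" "C"
```

Here B has the fewest fatalities but the most injuries, so it moves from last to first once both measures count equally.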

Convert property damage (PROPDMG) into dollars ($) using the magnitude code in PROPDMGEXP. This report only processes the codes "H", "K", "M" and "B", matched case-sensitively. I wasn't able to determine the meaning of the other values.

storm.data$PROPDMG = ifelse(as.character(storm.data$PROPDMGEXP)=="H", storm.data$PROPDMG*100, storm.data$PROPDMG)
storm.data$PROPDMG = ifelse(as.character(storm.data$PROPDMGEXP)=="K", storm.data$PROPDMG*1000, storm.data$PROPDMG)
storm.data$PROPDMG = ifelse(as.character(storm.data$PROPDMGEXP)=="M", storm.data$PROPDMG*1000000, storm.data$PROPDMG)
storm.data$PROPDMG = ifelse(as.character(storm.data$PROPDMGEXP)=="B", storm.data$PROPDMG*1000000000, storm.data$PROPDMG)
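The four ifelse() lines can also be expressed as a single lookup in a named vector of multipliers. The sketch below is an alternative, not the method used in this report. The helper name to.dollars and its ignore.case argument are hypothetical. By default it matches codes case-sensitively, as the lines above do. Setting ignore.case = TRUE rests on the unverified assumption that lower-case codes mean the same as upper-case ones.

```r
# Multipliers for the magnitude codes handled in this report
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert a damage value plus its magnitude code to dollars.
# Codes outside H/K/M/B (including blanks) leave the value unchanged,
# mirroring the ifelse() chain above.
to.dollars <- function(value, exp.code, ignore.case = FALSE) {
  code <- as.character(exp.code)
  if (ignore.case) code <- toupper(code)  # assumption: "m" means the same as "M"
  factor.used <- ifelse(code %in% names(multiplier), multiplier[code], 1)
  value * factor.used
}

print(to.dollars(c(1.5, 2, 3, 4), c("K", "M", "", "m")))
print(to.dollars(c(1.5, 2, 3, 4), c("K", "M", "", "m"), ignore.case = TRUE))
```

Applied to the raw column, for example storm.data$PROPDMG <- to.dollars(storm.data$PROPDMG, storm.data$PROPDMGEXP), it would replace the four ifelse() lines. Running it in addition to them would convert the values twice.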

Convert crop damage (CROPDMG) into dollars ($) using the magnitude code in CROPDMGEXP. This report only processes the codes "H", "K", "M" and "B", matched case-sensitively. I wasn't able to determine the meaning of the other values.

storm.data$CROPDMG = ifelse(as.character(storm.data$CROPDMGEXP)=="H", storm.data$CROPDMG*100, storm.data$CROPDMG)
storm.data$CROPDMG = ifelse(as.character(storm.data$CROPDMGEXP)=="K", storm.data$CROPDMG*1000, storm.data$CROPDMG)
storm.data$CROPDMG = ifelse(as.character(storm.data$CROPDMGEXP)=="M", storm.data$CROPDMG*1000000, storm.data$CROPDMG)
storm.data$CROPDMG = ifelse(as.character(storm.data$CROPDMGEXP)=="B", storm.data$CROPDMG*1000000000, storm.data$CROPDMG)
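Because only "H", "K", "M" and "B" are converted, it may help to see which other codes remain and how often they occur. The sketch below counts them on a made-up vector of codes. The values in exp.codes are hypothetical. On the real data one would pass storm.data$PROPDMGEXP or storm.data$CROPDMGEXP instead.

```r
# Made-up magnitude codes; the real PROPDMGEXP/CROPDMGEXP values may differ
exp.codes <- c("K", "M", "", "K", "B", "+", "5", "?", "h", "K")

# Codes the conversion above actually handles (case-sensitive)
handled <- c("H", "K", "M", "B")

# Count each code that is left unconverted
unhandled <- table(exp.codes[!exp.codes %in% handled])
print(unhandled)
```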

Aggregate and sort the PROPDMG and CROPDMG columns separately by EVTYPE

prop.cost.by.evtype <- aggregate(PROPDMG ~ EVTYPE, data = storm.data, FUN=sum)
sorted.prop.cost <- prop.cost.by.evtype[order(-prop.cost.by.evtype$PROPDMG), ]

crop.cost.by.evtype <- aggregate(CROPDMG ~ EVTYPE, data = storm.data, FUN=sum)
sorted.crop.cost <- crop.cost.by.evtype[order(-crop.cost.by.evtype$CROPDMG), ]

# Sum property and crop damage into an explicitly named column (see the HARM note above)
storm.data$TOTALDMG <- storm.data$PROPDMG + storm.data$CROPDMG
cost.by.evtype <- aggregate(TOTALDMG ~ EVTYPE, data = storm.data, FUN = sum)
sorted.cost.by.evtype <- cost.by.evtype[order(-cost.by.evtype$TOTALDMG), ]

Rescale and rename the economic columns so the plot and table labels are more meaningful

sorted.prop.cost$PROPDMG <- sorted.prop.cost$PROPDMG / 1000000  ## convert to millions of dollars
sorted.crop.cost$CROPDMG <- sorted.crop.cost$CROPDMG / 1000000  ## convert to millions of dollars
names(sorted.prop.cost)[names(sorted.prop.cost)=="PROPDMG"] <- "Property_Damage(Millions_$)"
names(sorted.crop.cost)[names(sorted.crop.cost)=="CROPDMG"] <- "Crop_Damage(Millions_$)"
names(sorted.prop.cost)[names(sorted.prop.cost)=="EVTYPE"] <- "Event_Type"
names(sorted.crop.cost)[names(sorted.crop.cost)=="EVTYPE"] <- "Event_Type"
names(sorted.cost.by.evtype)[names(sorted.cost.by.evtype)=="TOTALDMG"] <- "Total_Cost_Billions"
names(sorted.cost.by.evtype)[names(sorted.cost.by.evtype)=="EVTYPE"] <- "Event_Type"
sorted.cost.by.evtype$Total_Cost_Billions <- sorted.cost.by.evtype$Total_Cost_Billions / 1000000000  ## convert to billions of dollars

Results

Events most harmful to Population Health

In terms of fatalities, the event type most harmful to population health is Tornado, followed by Excessive Heat.

The six event types causing the most fatalities, in descending order, are:

print(head(sorted.fatalities))
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504

The six event types causing the most injuries, in descending order, are listed below. Once again Tornado ranks highest:

print(head(sorted.injuries))
##             EVTYPE INJURIES
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230
## 275           HEAT     2100

library(ggplot2)

# Bar chart of the combined (injuries + fatalities) totals kept above
ggplot(data=combined.injuries.fatalities, aes(x=Event_Type, y=Total, fill=Event_Type)) +
  geom_bar(colour="black", stat="identity") +
  guides(fill="none") +
  ggtitle("Combined Top (Injuries + Fatalities) by Event Type \n ") +
  theme(axis.text.x  = element_text(angle=90, vjust=0.5, size=10))

Figure: Combined Top (Injuries + Fatalities) by Event Type
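ggplot2 draws a character or factor x-axis in alphabetical (level) order, so the bars above are not ranked by size. One option is to reorder the factor levels by total before plotting. The sketch below does this with base R's reorder() on made-up totals. The data frame plot.totals is illustrative and stands in for combined.injuries.fatalities. Passing the reordered column as x in aes() would then draw the bars from largest to smallest.

```r
# Made-up totals standing in for combined.injuries.fatalities
plot.totals <- data.frame(
  Event_Type = c("FLOOD", "TORNADO", "HEAT"),
  Total      = c(300, 900, 100),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the second argument;
# negating Total puts the largest total first
plot.totals$Event_Type <- reorder(plot.totals$Event_Type, -plot.totals$Total)
print(levels(plot.totals$Event_Type))  # "TORNADO" "FLOOD" "HEAT"
```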

The plot above shows that Tornado is by far the most harmful event type to population health.

Greatest Economic Consequence

The analysis identifies Flood as the event type with the greatest economic consequence. It caused approx. $144.66B in property damage, or approx. $150.32B once its crop damage (approx. $5.66B) is added.

The six event types causing the greatest property damage, in descending order, with costs in millions of dollars:

print(head(sorted.prop.cost))
##            Event_Type Property_Damage(Millions_$)
## 170             FLOOD                   144657.71
## 411 HURRICANE/TYPHOON                    69305.84
## 834           TORNADO                    56925.66
## 670       STORM SURGE                    43323.54
## 153       FLASH FLOOD                    16140.81
## 244              HAIL                    15727.37

Drought is the event type causing the greatest crop damage (approx. $13.97B).

The six event types causing the greatest crop damage, in descending order, with costs in millions of dollars:

print(head(sorted.crop.cost))
##      Event_Type Crop_Damage(Millions_$)
## 95      DROUGHT               13972.566
## 170       FLOOD                5661.968
## 590 RIVER FLOOD                5029.459
## 427   ICE STORM                5022.114
## 244        HAIL                3025.538
## 402   HURRICANE                2741.910

library(ggplot2)

# Keep only the six highest-cost event types so the plot is more readable
high.cost.events <- head(sorted.cost.by.evtype)

ggplot(data=high.cost.events, aes(x=Event_Type, y=Total_Cost_Billions, fill=Event_Type)) +
  geom_bar(colour="black", stat="identity") +
  guides(fill="none") +
  ggtitle("Combined Property & Crop Damage by Event Type \n ") +
  theme(axis.text.x  = element_text(angle=90, vjust=0.5, size=10))

Figure: Combined Property & Crop Damage by Event Type

The plot above shows that Flood has the greatest economic consequence.