NOAA provides a database of US severe weather events since 1950 along with their recorded details such as place and time, approximate economic impact, and reported fatalities and injuries. This paper documents cleaning and processing of the available data so that we can use simple diagrams to answer two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

The original data used for this project is available for download at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.

The document outlining NOAA's data and categorization procedures can be found at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.

The data snapshot is dated late 2011.

Data Retrieval

# download and unzip weather data

fileURL<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL,destfile="./storm.csv.bz2",method="curl")
storm<-read.csv(bzfile("storm.csv.bz2"))
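
Re-running the report repeats the download each time. A minimal sketch of one way to avoid that, assuming a hypothetical helper named `fetch_once` (not part of the original analysis) that skips the download when the archive is already on disk:

```r
# Hypothetical helper (not in the original analysis): download the archive
# only when no local copy exists, then return the local path invisibly.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Possible usage with the URL defined above:
# fetch_once(fileURL, "./storm.csv.bz2")
# storm <- read.csv(bzfile("./storm.csv.bz2"))
```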

Preprocessing

This section reduces the raw data to a smaller data frame containing only the columns relevant to the two questions being asked.

# load required libraries for processing

suppressMessages(library(ggplot2))
suppressMessages(library(lubridate))
suppressMessages(library(tidyr))
suppressMessages(library(RColorBrewer))

# parse the begin-date strings and keep only the year of each event

storm$BGN_DATE<-year(as.POSIXct(as.character(storm$BGN_DATE), format="%m/%d/%Y %H:%M:%S"))
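
As a small illustration of the date step, the sketch below parses two made-up date strings in the same month/day/year layout and extracts the year. It uses base R's `format()` in place of lubridate's `year()` so that it runs without extra packages; the sample strings and the `tz = "UTC"` setting are assumptions for the example only.

```r
# Made-up sample strings in the "%m/%d/%Y %H:%M:%S" layout used above.
raw_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse to POSIXct; the time zone is fixed only to keep the example reproducible.
parsed <- as.POSIXct(raw_dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the calendar year as an integer (lubridate::year(parsed) is equivalent).
years <- as.integer(format(parsed, "%Y"))
```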

# process economic impact data as numeric dollar amounts

storm$PROPDMG[storm$PROPDMGEXP == "K"]<-storm$PROPDMG[storm$PROPDMGEXP == "K"]*1000
storm$PROPDMG[storm$PROPDMGEXP == "M"]<-storm$PROPDMG[storm$PROPDMGEXP == "M"]*1000000
storm$PROPDMG[storm$PROPDMGEXP == "B"]<-storm$PROPDMG[storm$PROPDMGEXP == "B"]*1000000000
storm$CROPDMG[storm$CROPDMGEXP == "K"]<-storm$CROPDMG[storm$CROPDMGEXP == "K"]*1000
storm$CROPDMG[storm$CROPDMGEXP == "M"]<-storm$CROPDMG[storm$CROPDMGEXP == "M"]*1000000
storm$CROPDMG[storm$CROPDMGEXP == "B"]<-storm$CROPDMG[storm$CROPDMGEXP == "B"]*1000000000
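
The six assignments above can also be expressed as a single lookup of multipliers. The sketch below works on made-up values rather than the storm data. It makes one assumption that differs from the code above: lower-case codes such as "k" are folded to upper case and scaled, whereas the original leaves them unscaled, so applying it to the real data could change the reported totals.

```r
# Toy stand-ins for the PROPDMG and PROPDMGEXP columns (values are made up).
dmg      <- c(25, 2.5, 1, 10, 3)
exp_code <- c("K", "M", "B", "", "k")

# Named lookup of multipliers; any code not listed here falls back to 1.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- unname(multiplier[toupper(exp_code)])  # assumption: fold case before lookup
mult[is.na(mult)] <- 1

# Damage expressed in dollars.
dollars <- dmg * mult
```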

# create event-type sum columns for incidents (death/injury) and economic data (property/crop), and bind them into a new smaller data frame for easier handling

inj<-tapply(storm$INJURIES,storm$EVTYPE,FUN=sum)
inj<-as.data.frame(inj)
fat<-tapply(storm$FATALITIES,storm$EVTYPE,FUN=sum)
fat<-as.data.frame(fat)
crop<-tapply(storm$CROPDMG,storm$EVTYPE,FUN=sum)
crop<-as.data.frame(crop)
prop<-tapply(storm$PROPDMG,storm$EVTYPE,FUN=sum)
prop<-as.data.frame(prop)

danger<-data.frame(cbind(fat$fat,inj$inj,crop$crop,prop$prop),row.names = rownames(fat))
names(danger)[1]<-"Fatalities"
names(danger)[2]<-"Injuries"
names(danger)[3]<-"Crop"
names(danger)[4]<-"Property"
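
The same per-event-type totals can be produced in one call with base R's `aggregate()`. This is an alternative sketch on a made-up four-row data frame, not the method used above; the name `danger_toy` and all values are illustrative.

```r
# Made-up miniature of the storm data with the five relevant columns.
toy <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(0, 3, 1, 2),
  INJURIES   = c(2, 10, 0, 1),
  CROPDMG    = c(1000, 0, 500, 20000),
  PROPDMG    = c(5000, 100000, 0, 75000)
)

# Sum all four measures within each event type in a single step.
sums <- aggregate(cbind(FATALITIES, INJURIES, CROPDMG, PROPDMG) ~ EVTYPE,
                  data = toy, FUN = sum)

# Rebuild the same layout as 'danger': event types as row names.
danger_toy <- data.frame(
  Fatalities = sums$FATALITIES,
  Injuries   = sums$INJURIES,
  Crop       = sums$CROPDMG,
  Property   = sums$PROPDMG,
  row.names  = as.character(sums$EVTYPE)
)
```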

Data Transformation

There are too many similar event categories due to entry error; for example “HIGHT HEAT” and “HEAT/DROUGHT” are separate event types. Below we document the use of character matching to collect rows from the ‘danger’ data frame and overwrite them with broader categories such as “Heat/Drought”. Because matching is done on substrings, an event type whose name matches more than one pattern (for example, one containing both “RAIN” and “SNOW”) is counted in every matching classification, in that case both “Precipitation/Flood” and “Winter Weather”. Care was taken that named storms, such as Hurricane Katrina, are classified as “Windstorms”. Note that the NOAA data records flooding that results from a major windstorm as an event separate from that storm.

# gather rows for similar events via character matching with grepl

AVALANCHE <- danger[grepl("AVALANCHE", rownames(danger), ignore.case = TRUE) ,]
HEAT.DROUGHT <- danger[grepl("HEAT", rownames(danger), ignore.case = TRUE)|grepl("WARM", rownames(danger), ignore.case = TRUE)|grepl("DRY", rownames(danger), ignore.case = TRUE) ,]
LAND<-danger[grepl("MUD", rownames(danger), ignore.case = TRUE)|grepl("LAND", rownames(danger), ignore.case = TRUE) ,]
FIRE<- danger[grepl("FIRE", rownames(danger), ignore.case = TRUE) ,]
DUST<- danger[grepl("DUST", rownames(danger), ignore.case = TRUE) ,]
FOG<- danger[grepl("FOG", rownames(danger), ignore.case = TRUE) ,]
FLOOD.PRECIP <- danger[grepl("FLOOD", rownames(danger), ignore.case = TRUE)|grepl("RAIN", rownames(danger), ignore.case = TRUE)|grepl("FL ", rownames(danger), ignore.case = TRUE)|grepl("THUNDERSTORM", rownames(danger), ignore.case = TRUE)|grepl("TSTM", rownames(danger), ignore.case = TRUE) ,]
HAIL.SLEET<-danger[grepl("HAIL", rownames(danger), ignore.case = TRUE)|grepl("SLEET", rownames(danger), ignore.case = TRUE) ,]
SEA<- danger[grepl("SEA", rownames(danger), ignore.case = TRUE)|grepl("OCEAN", rownames(danger), ignore.case = TRUE)|grepl("MARINE", rownames(danger), ignore.case = TRUE)|grepl("SURF", rownames(danger), ignore.case = TRUE)|grepl("WAVE", rownames(danger), ignore.case = TRUE)|grepl("TSUNAMI", rownames(danger), ignore.case = TRUE)|grepl("RIP", rownames(danger), ignore.case = TRUE)  ,]
WIND.STORM <- danger[grepl("WIND", rownames(danger), ignore.case = TRUE)|grepl("HURRICANE", rownames(danger), ignore.case = TRUE)|grepl("TYPHOON", rownames(danger), ignore.case = TRUE)|grepl("TROPICAL", rownames(danger), ignore.case = TRUE)|grepl("TORNADO", rownames(danger), ignore.case = TRUE) ,]
WINTER.WEATHER <- danger[grepl("WINT", rownames(danger), ignore.case = TRUE) | grepl("SNOW", rownames(danger), ignore.case = TRUE) | grepl("BLIZZARD", rownames(danger), ignore.case = TRUE)|grepl("COLD", rownames(danger), ignore.case = TRUE)|grepl("WINDCHILL", rownames(danger), ignore.case = TRUE)|grepl("FROST", rownames(danger), ignore.case = TRUE)|grepl("GLAZE", rownames(danger), ignore.case = TRUE)|grepl("BLACK ICE", rownames(danger), ignore.case = TRUE)|grepl("EXPOSURE", rownames(danger), ignore.case = TRUE)|grepl("ICE", rownames(danger), ignore.case = TRUE),]
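
Because each classification is built independently, one event type can land in several of them. The sketch below checks this for three of the pattern sets above, written as single regular expressions. The four event-type names are illustrative inputs chosen for the example, not a claim about which names appear in the data.

```r
# Illustrative event-type names (assumed for the example).
event_types <- c("TSTM WIND", "HEAVY RAIN/SNOW", "HURRICANE", "COLD/WIND CHILL")

# Three of the pattern sets from above, joined with "|" into one regex each.
patterns <- c(
  "Precipitation/Flood" = "FLOOD|RAIN|FL |THUNDERSTORM|TSTM",
  "Windstorm/Hurricane" = "WIND|HURRICANE|TYPHOON|TROPICAL|TORNADO",
  "Winter Weather"      = "WINT|SNOW|BLIZZARD|COLD|WINDCHILL|FROST|GLAZE|BLACK ICE|EXPOSURE|ICE"
)

# Logical matrix: one row per event type, one column per classification.
membership <- sapply(patterns, grepl, x = event_types, ignore.case = TRUE)
rownames(membership) <- event_types

# Number of classifications each event type falls into.
n_classes <- rowSums(membership)
```

Any row with `n_classes` greater than 1 contributes its totals to more than one consolidated class, which matters when totals from different classes are added together.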

# overwrite 'danger' dataframe with column sums for all the consolidated event classifications

danger<- rbind(colSums(AVALANCHE),colSums(HEAT.DROUGHT),colSums(LAND),colSums(FIRE),colSums(DUST),colSums(FOG),colSums(FLOOD.PRECIP),colSums(HAIL.SLEET),colSums(SEA),colSums(WIND.STORM),colSums(WINTER.WEATHER))
danger<-as.data.frame(danger)
danger$Event.Class <- c("Avalanche","Heat/Drought","Landslide","Fire","Dust","Fog","Precipitation/Flood","Hail/Sleet","Ocean Hazard/Wave","Windstorm/Hurricane","Winter Weather")

To plot the result as a stacked bar chart, it must first be reshaped (melted) into the long format that the plotting package expects.

# melt 'danger' into format usable by ggplot for barplot stacking by death and injury totals

danger_inc_barplot<- pivot_longer(danger, c(Fatalities,Injuries), names_to = "Incident", values_to = "sum")

# melt 'danger' into format usable by ggplot for barplot stacking by property and crop damage estimates

danger_eco_barplot<- pivot_longer(danger, c(Property,Crop), names_to = "Impact", values_to = "sum")
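
To make the reshaping concrete, the sketch below applies the same `pivot_longer()` call to a made-up two-row frame. Each input row becomes one row per pivoted column, so the fill variable for the stacked bars ends up in a single column.

```r
library(tidyr)

# Made-up two-class frame with the same column names as 'danger'.
toy <- data.frame(
  Fatalities  = c(5, 1),
  Injuries    = c(20, 3),
  Event.Class = c("Fire", "Fog")
)

# Wide to long: the Fatalities and Injuries columns collapse into
# an 'Incident' label column and a 'sum' value column.
long <- pivot_longer(toy, c(Fatalities, Injuries),
                     names_to = "Incident", values_to = "sum")
```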

Results

danger
##    Fatalities Injuries        Crop     Property         Event.Class
## 1         224      171           0      8721800           Avalanche
## 2        3181     9272   904494280     27058350        Heat/Drought
## 3          44       55    20017000    327309100           Landslide
## 4          90     1608   403281630   8501628500                Fire
## 5          24      483     3600000      6338130                Dust
## 6          81     1077           0     25011500                 Fog
## 7        2392    18453 14460436902 183357594602 Precipitation/Flood
## 8          47     1467  3113796290  17617091077          Hail/Sleet
## 9        1063     1414    47322500    279282290   Ocean Hazard/Wave
## 10       7286   104621  8765270877 167674216674 Windstorm/Hurricane
## 11       1115     6637  7957252950  12695072263      Winter Weather
  1. Total incidents (death or injury) by weather event classification are displayed below.
# plot the categorical data filled by incident type

plot(ggplot(danger_inc_barplot, aes(x = reorder(Event.Class, -sum), y = sum, fill = Incident)) + 
  geom_bar(stat = "identity")+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + ggtitle("Injury and Death from Weather Events") + ylab("Total Incidents Since 1950") + xlab("")+scale_fill_brewer(palette="Dark2"))

  2. Total estimated economic impact in dollars (to crops or personal property) by weather event classification is displayed below.
# plot the categorical data filled by economic impact type
plot(ggplot(danger_eco_barplot, aes(x = reorder(Event.Class, -sum), y = sum, fill = Impact)) + 
  geom_bar(stat = "identity")+ theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + ggtitle("Economic Impact from Weather Events") + ylab("Total Cost Since 1950, Dollars") + xlab("")+scale_fill_brewer(palette="Accent"))

The data indicate that the United States has historically been disproportionately affected by weather events in the Windstorm classification (which includes named hurricanes, typhoons, and tropical storms) and the Precipitation/Flood classification. NOAA documents many flood and precipitation events as separate from, but related to, named windstorms, which partially explains the prominence of these two classifications in the bar plots. Combined, Windstorm and Precipitation/Flood events since 1950 account for approximately 123,000 injuries, 10,000 fatalities, and $374 billion in economic impact as measured by crop and property damage. Because an event type that matches the patterns of both classifications is counted in each, these combined figures may include some double counting. Crisis response costs are not available in this data set, but may be available from resources such as FEMA.gov.
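
The combined figures can be re-derived from the two relevant rows of the table printed in Results. The sketch below copies those rows by hand and sums them; it is only an arithmetic check on the printed values.

```r
# Rows 7 and 10 of the 'danger' table, copied from the printed output.
totals <- data.frame(
  Fatalities = c(2392, 7286),
  Injuries   = c(18453, 104621),
  Crop       = c(14460436902, 8765270877),
  Property   = c(183357594602, 167674216674),
  row.names  = c("Precipitation/Flood", "Windstorm/Hurricane")
)

# Column-wise totals across the two classifications.
combined <- colSums(totals)

# Crop plus property damage in dollars, and the same figure in billions.
economic          <- combined[["Crop"]] + combined[["Property"]]
economic_billions <- economic / 1e9
```

The sums come to 9,678 fatalities, 123,074 injuries, and about 374.3 billion dollars of crop and property damage.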