NOAA provides a database of US severe weather events since 1950 along with their recorded details such as place and time, approximate economic impact, and reported fatalities and injuries. This paper documents cleaning and processing of the available data so that we can use simple diagrams to answer two questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The original data used for this project is available for download at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
The document outlining NOAA's data and categorization procedures can be found at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.
The data snapshot is dated late 2011.
# download and unzip weather data
fileURL<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL,destfile="./storm.csv.bz2",method="curl")
storm<-read.csv(bzfile("storm.csv.bz2"))
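Re-running the report repeats the download every time. As a minimal sketch (not part of the original analysis), the download can be skipped when the file is already present. The helper name `fetch_once` and its `fetch` parameter are hypothetical and introduced only for illustration.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is missing.
# `fetch` defaults to download.file; it is a parameter so the helper can be
# exercised without network access.
fetch_once <- function(url, dest, fetch = download.file) {
  if (!file.exists(dest)) {
    # binary mode keeps the compressed file intact on all platforms
    fetch(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# usage (not run here):
# fetch_once(fileURL, "storm.csv.bz2")
# storm <- read.csv(bzfile("storm.csv.bz2"))
```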
This section documents how the raw data are reduced to a smaller data frame that holds, for each event type, only the totals relevant to the two questions being asked.
# load required libraries for processing
suppressMessages(library(ggplot2))
suppressMessages(library(lubridate))
suppressMessages(library(tidyr))
suppressMessages(library(RColorBrewer))
# parse the begin date and keep only its calendar year
storm$BGN_DATE<-year(as.POSIXct(as.character(storm$BGN_DATE), format="%m/%d/%Y %H:%M:%S"))
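# convert damage figures to dollar amounts using the K/M/B exponent columns
# (rows with any other exponent code, including lowercase letters, are left unscaled)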
storm$PROPDMG[storm$PROPDMGEXP == "K"]<-storm$PROPDMG[storm$PROPDMGEXP == "K"]*1000
storm$PROPDMG[storm$PROPDMGEXP == "M"]<-storm$PROPDMG[storm$PROPDMGEXP == "M"]*1000000
storm$PROPDMG[storm$PROPDMGEXP == "B"]<-storm$PROPDMG[storm$PROPDMGEXP == "B"]*1000000000
storm$CROPDMG[storm$CROPDMGEXP == "K"]<-storm$CROPDMG[storm$CROPDMGEXP == "K"]*1000
storm$CROPDMG[storm$CROPDMGEXP == "M"]<-storm$CROPDMG[storm$CROPDMGEXP == "M"]*1000000
storm$CROPDMG[storm$CROPDMGEXP == "B"]<-storm$CROPDMG[storm$CROPDMGEXP == "B"]*1000000000
# create event-type sum columns for incidents (death/injury) and economic data (property/crop), and bind them into a new smaller data frame for easier handling
inj<-tapply(storm$INJURIES,storm$EVTYPE,FUN=sum)
inj<-as.data.frame(inj)
fat<-tapply(storm$FATALITIES,storm$EVTYPE,FUN=sum)
fat<-as.data.frame(fat)
crop<-tapply(storm$CROPDMG,storm$EVTYPE,FUN=sum)
crop<-as.data.frame(crop)
prop<-tapply(storm$PROPDMG,storm$EVTYPE,FUN=sum)
prop<-as.data.frame(prop)
danger<-data.frame(cbind(fat$fat,inj$inj,crop$crop,prop$prop),row.names = rownames(fat))
names(danger)[1]<-"Fatalities"
names(danger)[2]<-"Injuries"
names(danger)[3]<-"Crop"
names(danger)[4]<-"Property"
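The exponent handling above repeats one assignment per letter. A variation, shown here as a minimal sketch on made-up values rather than as the method used in this report, is to look the multiplier up in a named vector. This sketch assumes that lowercase codes should be treated like their uppercase counterparts, which the code above does not do, so totals computed this way could differ from the table reported later.

```r
# Toy data standing in for PROPDMG / PROPDMGEXP (values are made up)
dmg      <- c(25, 2.5, 1, 10, 3)
exp_code <- c("K", "M", "B", "k", "")

# Named lookup vector: exponent letter -> multiplier (assumed mapping)
mult <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code after upper-casing; codes not in the table give NA,
# which is replaced with 1 so the amount is left unscaled
m <- unname(mult[toupper(exp_code)])
m[is.na(m)] <- 1

dollars <- dmg * m
dollars
```

Keeping the mapping in one place makes it easier to see, and to change, which exponent codes are recognized.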
Entry errors and inconsistent naming produce many near-duplicate event categories; for example, “HIGHT HEAT” and “HEAT/DROUGHT” are recorded as separate event types. Below we use character matching to collect rows from the ‘danger’ data frame and combine them into broader classes such as “Heat/Drought”. Because a row is added to every class whose pattern it matches, the classes can overlap: a category such as “THUNDERSTORM WIND”, which matches both the thunderstorm and the wind patterns, is counted in both the “Precipitation/Flood” and “Windstorm/Hurricane” classifications. Named storms, such as Hurricane Katrina, are classified under “Windstorm/Hurricane”. Note that the NOAA data record flooding caused by a major windstorm as an event separate from the storm itself.
# gather rows for similar events via character matching with grepl
# (each pattern is a regular expression; "|" separates the alternatives)
AVALANCHE <- danger[grepl("AVALANCHE", rownames(danger), ignore.case = TRUE), ]
HEAT.DROUGHT <- danger[grepl("HEAT|WARM|DRY", rownames(danger), ignore.case = TRUE), ]
LAND <- danger[grepl("MUD|LAND", rownames(danger), ignore.case = TRUE), ]
FIRE <- danger[grepl("FIRE", rownames(danger), ignore.case = TRUE), ]
DUST <- danger[grepl("DUST", rownames(danger), ignore.case = TRUE), ]
FOG <- danger[grepl("FOG", rownames(danger), ignore.case = TRUE), ]
FLOOD.PRECIP <- danger[grepl("FLOOD|RAIN|FL |THUNDERSTORM|TSTM", rownames(danger), ignore.case = TRUE), ]
HAIL.SLEET <- danger[grepl("HAIL|SLEET", rownames(danger), ignore.case = TRUE), ]
SEA <- danger[grepl("SEA|OCEAN|MARINE|SURF|WAVE|TSUNAMI|RIP", rownames(danger), ignore.case = TRUE), ]
WIND.STORM <- danger[grepl("WIND|HURRICANE|TYPHOON|TROPICAL|TORNADO", rownames(danger), ignore.case = TRUE), ]
WINTER.WEATHER <- danger[grepl("WINT|SNOW|BLIZZARD|COLD|WINDCHILL|FROST|GLAZE|BLACK ICE|EXPOSURE|ICE", rownames(danger), ignore.case = TRUE), ]
# overwrite 'danger' dataframe with column sums for all the consolidated event classifications
danger<- rbind(colSums(AVALANCHE),colSums(HEAT.DROUGHT),colSums(LAND),colSums(FIRE),colSums(DUST),colSums(FOG),colSums(FLOOD.PRECIP),colSums(HAIL.SLEET),colSums(SEA),colSums(WIND.STORM),colSums(WINTER.WEATHER))
danger<-as.data.frame(danger)
danger$Event.Class <- c("Avalanche","Heat/Drought","Landslide","Fire","Dust","Fog","Precipitation/Flood","Hail/Sleet","Ocean Hazard/Wave","Windstorm/Hurricane","Winter Weather")
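Since the classes are built by independent pattern matches, it can be useful to check which event types land in more than one class and which land in none. The following is a minimal sketch of such a check. The event names are a small illustrative vector rather than the full set of row names, and only three of the class patterns are included.

```r
# Illustrative event-type names (a toy vector, not the full data set)
ev <- c("TSTM WIND", "FLASH FLOOD", "HIGH WIND", "HEAVY SNOW",
        "ICE STORM", "LIGHTNING")

# A subset of the class patterns, one regular expression per class
patterns <- c(
  "Precipitation/Flood" = "FLOOD|RAIN|THUNDERSTORM|TSTM",
  "Windstorm/Hurricane" = "WIND|HURRICANE|TYPHOON|TROPICAL|TORNADO",
  "Winter Weather"      = "WINT|SNOW|BLIZZARD|COLD|ICE"
)

# Logical matrix: rows are event types, columns are classes
hits <- sapply(patterns, grepl, x = ev, ignore.case = TRUE)
rownames(hits) <- ev

# Number of classes each event type falls into
n_classes <- rowSums(hits)
n_classes[n_classes > 1]   # event types counted more than once
n_classes[n_classes == 0]  # event types not captured by any of these classes
```

In this toy vector, "TSTM WIND" matches two patterns and "LIGHTNING" matches none of the three shown. Running the same check against `rownames(danger)` with all eleven patterns, before `danger` is overwritten, would show how much the real classes overlap.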
A stacked bar chart requires one row per bar segment, so the final result is reshaped (melted) from wide to long format before it is passed to the plotting package.
# melt 'danger' into format usable by ggplot for barplot stacking by death and injury totals
danger_inc_barplot<- pivot_longer(danger, c(Fatalities,Injuries), names_to = "Incident", values_to = "sum")
# melt 'danger' into format usable by ggplot for barplot stacking by property and crop damage estimates
danger_eco_barplot<- pivot_longer(danger, c(Property,Crop), names_to = "Impact", values_to = "sum")
danger
##    Fatalities Injuries        Crop     Property         Event.Class
## 1         224      171           0      8721800           Avalanche
## 2        3181     9272   904494280     27058350        Heat/Drought
## 3          44       55    20017000    327309100           Landslide
## 4          90     1608   403281630   8501628500                Fire
## 5          24      483     3600000      6338130                Dust
## 6          81     1077           0     25011500                 Fog
## 7        2392    18453 14460436902 183357594602 Precipitation/Flood
## 8          47     1467  3113796290  17617091077          Hail/Sleet
## 9        1063     1414    47322500    279282290   Ocean Hazard/Wave
## 10       7286   104621  8765270877 167674216674 Windstorm/Hurricane
## 11       1115     6637  7957252950  12695072263      Winter Weather
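To make the reshaping step concrete, the following minimal sketch applies `pivot_longer` to two rows copied from the printed table, keeping only the health columns. The names `wide` and `long` are introduced here for illustration.

```r
library(tidyr)

# Two rows copied from the printed 'danger' table (health columns only)
wide <- data.frame(
  Fatalities  = c(224, 81),
  Injuries    = c(171, 1077),
  Event.Class = c("Avalanche", "Fog")
)

# Each (Event.Class, Incident) pair becomes its own row
long <- pivot_longer(wide, c(Fatalities, Injuries),
                     names_to = "Incident", values_to = "sum")
long
```

The long form has four rows, one for each bar segment, which is the layout that the `fill` aesthetic needs for stacking.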
# plot the categorical data filled by incident type
plot(ggplot(danger_inc_barplot, aes(x = reorder(Event.Class, -sum), y = sum, fill = Incident)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Injury and Death from Weather Events") +
  ylab("Total Incidents Since 1950") +
  xlab("") +
  scale_fill_brewer(palette = "Dark2"))
# plot the categorical data filled by economic impact type
plot(ggplot(danger_eco_barplot, aes(x = reorder(Event.Class, -sum), y = sum, fill = Impact)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Economic Impact from Weather Events") +
  ylab("Total Cost Since 1950, Dollars") +
  xlab("") +
  scale_fill_brewer(palette = "Accent"))
The data indicate that the United States has historically been disproportionately affected by weather events in the Windstorm classification (which includes named hurricanes, typhoons, and tropical storms) and the Precipitation/Flood classification. NOAA documents many flood and precipitation events as separate from, but related to, named windstorms, which partially explains the prominence of these two classifications in the bar plots. Combined, the two classifications account for approximately 123,000 injuries, 10,000 fatalities, and $374 billion in economic impact since 1950, as measured by crop and property damage. Because event types that match the patterns of both classifications are counted in each, these combined figures may include some double counting. Crisis response costs are not available in this data set, but may be available from resources such as FEMA.gov.
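The combined totals can be recomputed directly from the printed table. The short check below copies the two relevant rows by hand. The variable names are introduced here for illustration.

```r
# Values for the two classes, copied from the printed 'danger' table
flood <- c(Fatalities = 2392, Injuries = 18453,
           Crop = 14460436902, Property = 183357594602)
wind  <- c(Fatalities = 7286, Injuries = 104621,
           Crop = 8765270877, Property = 167674216674)

combined <- flood + wind
combined[["Injuries"]]    # 123074
combined[["Fatalities"]]  # 9678

# Crop plus property damage, expressed in billions of dollars
total_billion <- (combined[["Crop"]] + combined[["Property"]]) / 1e9
total_billion             # about 374.26
```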