Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
In this report, the NOAA Storm Database is explored to answer some basic questions about severe weather events. The results are displayed as bar plots of the 10 event types with the most fatalities, the most injuries, and the greatest economic damage. The analysis indicates that tornadoes caused the most fatalities and injuries, while floods caused the greatest economic damage in dollar terms.
The data for this assignment comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file is available from the course web site, and the download URL appears in the data loading code below.
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.
See Section 3
The data was downloaded, stored locally, and then loaded into R using the following code.
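# Set the working directory (machine-specific path) and fetch the compressed CSV.
setwd("/Users/sandraezidiegwu/Documents/Data Science/5_REPDATA/RepResearch/Storm Data Project/")
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData", method = "curl")
# R detects the bzip2 compression and decompresses the file while reading it.
storm.data <- read.table("StormData", header = TRUE, sep = ",")
# dplyr provides select() and mutate(), which are used in the later steps.
library(dplyr)
library(ggplot2)
# Preview the first six records.
head(storm.data)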
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0
## 2        0     2.5          K       0
## 3        2    25.0          K       0
## 4        2     2.5          K       0
## 5        2     2.5          K       0
## 6        6     2.5          K       0
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
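The loading step above passes a bzip2-compressed file straight to R's table reader. The following is a minimal sketch of why that works, assuming a made-up three-line CSV written to a temporary file. The file name, the toy records, and the use of read.csv() instead of read.table() are all choices made for this illustration, not part of the original analysis.

```r
# Write a tiny CSV through a bzip2 connection, then read it back directly.
# The temporary file and its contents are invented for this illustration.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "HAIL,0,0"), con)
close(con)

# read.csv() opens the path with file(), which recognises the compression
# from the file's leading bytes, so no manual decompression step is needed.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy)

# Remove the temporary file again.
unlink(tmp)
```

Because the compression is detected from the file contents rather than from the file name, a destination name without the .bz2 extension, such as "StormData", can still be read.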
Because most of the 37 columns are not needed for this analysis, the data was reduced to the seven relevant columns with the code below.
cut.data <- select(storm.data, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(cut.data)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0
## 2 TORNADO          0        0     2.5          K       0
## 3 TORNADO          0        2    25.0          K       0
## 4 TORNADO          0        2     2.5          K       0
## 5 TORNADO          0        2     2.5          K       0
## 6 TORNADO          0        6     2.5          K       0
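The PROPDMGEXP and CROPDMGEXP columns use alphabetical characters to signify magnitude: “K” for thousands, “M” for millions, and “B” for billions. The code below converts those codes to numeric multipliers. Records with a damage value of 0 are first assigned “K” so that they are kept, while any other exponent code becomes NA and the affected records are dropped when NA values are removed later. The code then adds the property and crop damage to create a new column of total damage values.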
# Records with zero damage get "K" so they are not dropped as NA later.
cut.data[cut.data$PROPDMG == 0,]$PROPDMGEXP <- "K"
cut.data[cut.data$CROPDMG == 0,]$CROPDMGEXP <- "K"
# Keep only the K, M, and B codes; every other code becomes NA.
cut.data$PROPDMGEXP <- factor(cut.data$PROPDMGEXP, levels = c("K", "M", "B"))
cut.data$CROPDMGEXP <- factor(cut.data$CROPDMGEXP, levels = c("K", "M", "B"))
# Translate each code into its numeric multiplier.
cut.data$PROPEXP[cut.data$PROPDMGEXP == "K"] <- 1000
cut.data$CROPEXP[cut.data$CROPDMGEXP == "K"] <- 1000
cut.data$PROPEXP[cut.data$PROPDMGEXP == "M"] <- 1e+06
cut.data$CROPEXP[cut.data$CROPDMGEXP == "M"] <- 1e+06
cut.data$PROPEXP[cut.data$PROPDMGEXP == "B"] <- 1e+09
cut.data$CROPEXP[cut.data$CROPDMGEXP == "B"] <- 1e+09
# Compute dollar amounts for property, for crops, and for their sum.
cut.data <- mutate(cut.data, PROP.DMG = PROPDMG*PROPEXP)
cut.data <- mutate(cut.data, CROP.DMG = CROPDMG*CROPEXP)
cut.data <- mutate(cut.data, TOTDMG = PROP.DMG + CROP.DMG)
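The exponent conversion above can also be sketched with a named lookup vector, which keeps the code-to-multiplier mapping in one place. This is a variation for illustration rather than the method used in this report: the toy records are invented, the `multiplier` vector is a hypothetical name, and treating a lowercase code such as "m" as equivalent to "M" is an assumption that the original code does not make.

```r
# Toy records; the values below are invented for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 0),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent code -> numeric multiplier.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Upper-case the codes first so that "m" is treated like "M" (an assumption).
code <- toupper(toy$PROPDMGEXP)

# Indexing a named vector by name yields NA for codes that are not listed.
toy$PROPEXP <- unname(multiplier[code])

# A record with zero damage needs no multiplier, so it counts as zero dollars.
toy$PROP.DMG <- ifelse(toy$PROPDMG == 0, 0, toy$PROPDMG * toy$PROPEXP)
print(toy)
```

With this layout, supporting a further code would only require one more entry in the lookup vector. Any change to which codes are accepted would alter the totals reported here, so the sketch is not a drop-in replacement.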
Many event type labels were inconsistent (mixed case, stray whitespace, abbreviations, and misspellings), so the gsub function was used to merge the variants for further analysis. All rows containing NA values were also removed. The code below refines the data into a new data frame, ‘ref.data’.
# Keep the event type, the two health columns, and the total damage.
ref.data <- select(cut.data, EVTYPE, FATALITIES, INJURIES, TOTDMG)
# Upper-case the labels and trim leading and trailing whitespace.
ref.data$EVTYPE <- toupper(ref.data$EVTYPE)
ref.data$EVTYPE <- factor(ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("^\\s+|\\s+$", "", ref.data$EVTYPE)
# Expand the TSTM abbreviation and correct misspelled THUNDERSTORM variants.
ref.data$EVTYPE <- gsub("TSTMW", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("TSTM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSTORMW", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSTORMS", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSNOW$", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSNOW SHOWER", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDEERSTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERESTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUDERSTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSTORM$", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNERSTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSTORMW", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("TUNDERSTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERTSORM", "THUNDERSTORM", ref.data$EVTYPE)
# Merge closely related labels under a single name.
ref.data$EVTYPE <- gsub("^THUNDERSTORM WIND$", "THUNDERSTORM WINDS", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("^HEAT$", "EXCESSIVE HEAT", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("FLOOD/$", "FLASH FLOODS", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("^FLASH FLOOD$", "FLASH FLOODS", ref.data$EVTYPE)
# Treat "?" as missing, then drop every row that contains an NA.
ref.data[ref.data == "?"] <- NA
ref.data <- na.omit(ref.data)
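The effect of the label clean-up above is easier to see on a handful of values. The sketch below applies a small subset of the substitutions to an invented vector of labels. The helper name `clean_evtype` is hypothetical, and trimws() is used here in place of the whitespace regular expression.

```r
# Hypothetical helper that applies a few of the substitutions from the report.
clean_evtype <- function(x) {
  x <- toupper(trimws(x))                               # trim, then upper-case
  x <- gsub("TSTM", "THUNDERSTORM", x)                  # expand the abbreviation
  x <- gsub("^THUNDERSTORM WIND$", "THUNDERSTORM WINDS", x)  # single spelling
  x <- gsub("^HEAT$", "EXCESSIVE HEAT", x)              # merge HEAT into one label
  x
}

# Invented labels showing case, whitespace, and abbreviation variants.
raw <- c(" tstm wind", "TSTM WIND", "Thunderstorm Winds", "HEAT", "Excessive Heat")
cleaned <- clean_evtype(raw)

# Five raw spellings collapse into two event types.
print(table(cleaned))
```

The anchors `^` and `$` matter here: without them, the pattern for THUNDERSTORM WIND would also match inside THUNDERSTORM WINDS and append a second S.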
Fatalities and injuries were taken as the measures of harm to population health, while total damage (property plus crop) was taken as the measure of economic consequences. For this analysis, the values in each of those columns were therefore summed by event type, and the 10 largest totals were selected. The code is shown below.
fatal.type <- aggregate(ref.data$FATALITIES, by = list(ref.data$EVTYPE), FUN = sum)
names(fatal.type) <- c("EVENT TYPE", "TOTAL FATALITY")
top.fatal <- fatal.type[order(-fatal.type$`TOTAL FATALITY`),][1:10,]
top.fatal
##             EVENT TYPE TOTAL FATALITY
## 744            TORNADO           5630
## 106     EXCESSIVE HEAT           2840
## 140       FLASH FLOODS            980
## 404          LIGHTNING            816
## 705 THUNDERSTORM WINDS            703
## 142              FLOOD            470
## 509        RIP CURRENT            368
## 306          HIGH WIND            246
## 10           AVALANCHE            224
## 850       WINTER STORM            206
injury.type <- aggregate(ref.data$INJURIES, by = list(ref.data$EVTYPE), FUN = sum)
names(injury.type) <- c("EVENT TYPE", "TOTAL INJURY")
top.injury <- injury.type[order(-injury.type$`TOTAL INJURY`),][1:10,]
top.injury
##             EVENT TYPE TOTAL INJURY
## 744            TORNADO        91286
## 705 THUNDERSTORM WINDS         9392
## 106     EXCESSIVE HEAT         8625
## 142              FLOOD         6789
## 404          LIGHTNING         5228
## 373          ICE STORM         1975
## 140       FLASH FLOODS         1777
## 199               HAIL         1358
## 850       WINTER STORM         1321
## 358  HURRICANE/TYPHOON         1275
event.dmg <- aggregate(ref.data$TOTDMG, by = list(ref.data$EVTYPE), FUN = sum)
names(event.dmg) <- c("EVENT TYPE", "TOTAL DAMAGE")
top.dmg <- event.dmg[order(-event.dmg$`TOTAL DAMAGE`),][1:10,]
top.dmg
##             EVENT TYPE TOTAL DAMAGE
## 142              FLOOD 150319678250
## 358  HURRICANE/TYPHOON  71913712800
## 744            TORNADO  57290440590
## 583        STORM SURGE  43323541000
## 199               HAIL  18727698170
## 140       FLASH FLOODS  17570307110
## 74             DROUGHT  15018672000
## 349          HURRICANE  14610229010
## 705 THUNDERSTORM WINDS  10865796580
## 514        RIVER FLOOD  10148404500
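The three rankings above follow the same pattern: sum one column by event type, sort in descending order, and keep the first rows. As a sketch of how that pattern might be factored out, the hypothetical helper `top_events` below wraps those steps. The function name, its arguments, and the toy data are all invented for illustration.

```r
# Hypothetical helper: total one numeric column per event type, keep the top n.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(data$EVTYPE), FUN = sum)
  names(totals) <- c("EVENT TYPE", "TOTAL")
  ranked <- totals[order(-totals$TOTAL), ]   # largest total first
  head(ranked, n)                            # at most n rows
}

# Toy data, invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(3, 1, 2, 0, 2),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

Using head() instead of indexing with 1:10 returns only the rows that exist when there are fewer than n event types, rather than padding the result with NA rows.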
par(mfrow = c(1,2), mar = c(12, 4, 3, 2), mgp = c(3,1,0), cex = 0.8)
barplot(top.fatal$`TOTAL FATALITY`, names.arg = top.fatal$`EVENT TYPE`, las = 3, col = 'magenta', main = "Highest Fatality by Events", ylab = "# of Fatalities")
barplot(top.injury$`TOTAL INJURY`, names.arg = top.injury$`EVENT TYPE`, las = 3, col = 'red', main = "Highest Injury by Events", ylab = "# of Injuries")
From the graphs shown above, tornadoes caused by far the highest number of both fatalities and injuries. They were followed by Excessive Heat for fatalities and by Thunderstorm Winds for injuries.
par(mfrow = c(1,1), mar = c(12, 4, 3, 2), mgp = c(3,1,0), cex = 0.8)
barplot(top.dmg$`TOTAL DAMAGE`/(1e+9), names.arg = top.dmg$`EVENT TYPE`, las = 3, col = 'green', main = "Highest Economic Damage by Events", ylab = "Damage Cost ($ billions)")
As shown above, floods caused the most economic damage at about $150B, followed by hurricanes/typhoons at ~$72B.