1. Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

In this report, the NOAA Storm Database is explored to answer some basic questions about severe weather events. The results are displayed as bar plots showing the top 10 event types by fatalities, injuries, and economic damage. The analysis indicates that tornadoes produced the most fatalities and injuries, while floods incurred the highest economic damage in dollar terms.

2. Data Processing

2.1. Introduction

The data for this assignment comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

2.2. Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

2.2.1. Analysis Criteria

  • Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  • Across the United States, which types of events have the greatest economic consequences?

See Section 3

2.3. Analysis Process

2.3.1. Loading the data

The data was downloaded and stored locally. Then it was loaded in R using the following code.

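# Load the packages used in the later steps.
library(dplyr)
library(ggplot2)

# Work in the project directory (user-specific path).
setwd("/Users/sandraezidiegwu/Documents/Data Science/5_REPDATA/RepResearch/Storm Data Project/")
# Download the bzip2-compressed CSV; read.table() decompresses it while reading.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData", method = "curl")
storm.data <- read.table("StormData", header = TRUE, sep = ",")
head(storm.data)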
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
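
As a minimal sketch on made-up toy data, the snippet below shows how a bzip2-compressed CSV can be read without decompressing it first. The temporary file, the toy columns, and the object name storm.toy are assumptions for illustration only. read.csv() is used here rather than read.table() because its defaults (double-quote quoting only, no comment character) are set for CSV input, which may matter for free-text columns such as REMARKS.

```r
# Toy stand-in for the storm file: a small CSV written through a bzip2 connection.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1),
  REMARKS    = c("Barn's roof lost", "Road #4 closed", ""),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv.bz2")  # hypothetical local file
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# R detects the compression when the file is opened for reading,
# so the path can be passed to read.csv() directly.
storm.toy <- read.csv(path, stringsAsFactors = FALSE)
str(storm.toy)
```

The same call pattern should apply to the downloaded storm file, although the totals reported in this document were produced with read.table() as shown above.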

2.3.2. Data Extraction

Most of the 37 columns are not needed for this analysis, so the data was reduced to the event type, fatality, injury, and damage columns with the code below.

cut.data <- select(storm.data, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(cut.data)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

2.3.3. Damage Re-arrangement

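The PROPDMGEXP and CROPDMGEXP columns use letters to signify magnitude: “K” for thousands, “M” for millions, and “B” for billions. The code below converts those letters to numeric multipliers, applies them to the property and crop damage figures, and adds the two to create a new column, TOTDMG, holding the total damage in dollars. Rows with zero damage are assigned “K” so that they are retained; any other exponent code becomes NA, and those rows are dropped in the refinement step.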

cut.data[cut.data$PROPDMG == 0,]$PROPDMGEXP <- "K"
cut.data[cut.data$CROPDMG == 0,]$CROPDMGEXP <- "K"
cut.data$PROPDMGEXP <- factor(cut.data$PROPDMGEXP, levels = c("K", "M", "B"))
cut.data$CROPDMGEXP <- factor(cut.data$CROPDMGEXP, levels = c("K", "M", "B"))

cut.data$PROPEXP[cut.data$PROPDMGEXP == "K"] <- 1000
cut.data$CROPEXP[cut.data$CROPDMGEXP == "K"] <- 1000
cut.data$PROPEXP[cut.data$PROPDMGEXP == "M"] <- 1e+06
cut.data$CROPEXP[cut.data$CROPDMGEXP == "M"] <- 1e+06
cut.data$PROPEXP[cut.data$PROPDMGEXP == "B"] <- 1e+09
cut.data$CROPEXP[cut.data$CROPDMGEXP == "B"] <- 1e+09

cut.data <- mutate(cut.data, PROP.DMG = PROPDMG*PROPEXP)
cut.data <- mutate(cut.data, CROP.DMG = CROPDMG*CROPEXP)
cut.data <- mutate(cut.data, TOTDMG = PROP.DMG + CROP.DMG)
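
The letter-to-multiplier conversion can also be sketched with a named lookup vector. This is an illustrative alternative, not the code used for the results in this report: the helper exp_multiplier() and the toy data frame dmg are hypothetical, and unlike the code above this version also accepts lower-case codes such as "m". Codes outside K/M/B still come back as NA.

```r
# Hypothetical helper: map exponent letters to numeric multipliers.
# Unknown codes (for example "?" or "") have no entry and yield NA.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  unname(lookup[toupper(as.character(code))])
}

# Toy data, made up for illustration.
dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 5),
  PROPDMGEXP = c("K", "m", "B", "?"),
  stringsAsFactors = FALSE
)
dmg$PROP.DMG <- dmg$PROPDMG * exp_multiplier(dmg$PROPDMGEXP)
dmg
```

Keeping the mapping in one vector makes it straightforward to see which codes are recognised and to extend the list if other codes need to be handled.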

2.3.4. Data Refinement

Many event type labels were inconsistent (mixed case, stray whitespace, abbreviations, and misspellings), so they were converted to upper case and consolidated with the gsub function. Entries recorded as “?” were set to NA, and all rows containing NA values were then removed. The code below refines the data into a new data frame, ‘ref.data’.

ref.data <- select(cut.data, EVTYPE, FATALITIES, INJURIES, TOTDMG)

ref.data$EVTYPE <- toupper(ref.data$EVTYPE)
ref.data$EVTYPE <- factor(ref.data$EVTYPE)

ref.data$EVTYPE <- gsub("^\\s+|\\s+$", "", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("TSTMW", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("TSTM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSTORMW", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSTORMS", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSNOW$", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSNOW SHOWER", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDEERSTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERESTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUDERSTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSTORM$", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNERSTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERSTORMW", "THUNDERSTORM WIND", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("TUNDERSTORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("THUNDERTSORM", "THUNDERSTORM", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("^THUNDERSTORM WIND$", "THUNDERSTORM WINDS", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("^HEAT$", "EXCESSIVE HEAT", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("FLOOD/$", "FLASH FLOODS", ref.data$EVTYPE)
ref.data$EVTYPE <- gsub("^FLASH FLOOD$", "FLASH FLOODS", ref.data$EVTYPE)

ref.data[ref.data == "?"] <- NA
ref.data <- na.omit(ref.data)
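
The consolidation step can be sketched more compactly by holding the patterns in a named vector and applying them in order. The function normalise_evtype() and its short rule list are hypothetical and cover only a few of the substitutions above, so this is an outline of the technique rather than a replacement for the full list.

```r
# Hypothetical helper: upper-case, trim, then apply regex rules in order.
# Names are the patterns, values are the replacements.
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))
  rules <- c(
    "TSTM"                = "THUNDERSTORM",
    "^THUNDERSTORM WIND$" = "THUNDERSTORM WINDS",
    "^HEAT$"              = "EXCESSIVE HEAT",
    "^FLASH FLOOD$"       = "FLASH FLOODS"
  )
  for (pattern in names(rules)) {
    x <- gsub(pattern, rules[[pattern]], x)
  }
  x
}

# Toy labels, made up for illustration.
cleaned <- normalise_evtype(c(" tstm wind", "Heat", "FLASH FLOOD", "TORNADO"))
cleaned
```

Because the rules are applied sequentially, their order matters: the abbreviation is expanded first so that the anchored THUNDERSTORM WIND rule can then match the expanded label.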

2.3.5. Top Incident Totals by Event Type

Fatalities and injuries were taken as the measures of harm to population health, and total damage (property plus crop) as the measure of economic consequences. For this analysis, each of those columns was summed by event type and the ten largest totals were selected. The code is shown below.

fatal.type <- aggregate(ref.data$FATALITIES, by = list(ref.data$EVTYPE), FUN = sum)
names(fatal.type) <- c("EVENT TYPE", "TOTAL FATALITY")
top.fatal <- fatal.type[order(-fatal.type$`TOTAL FATALITY`),][1:10,]
top.fatal
##             EVENT TYPE TOTAL FATALITY
## 744            TORNADO           5630
## 106     EXCESSIVE HEAT           2840
## 140       FLASH FLOODS            980
## 404          LIGHTNING            816
## 705 THUNDERSTORM WINDS            703
## 142              FLOOD            470
## 509        RIP CURRENT            368
## 306          HIGH WIND            246
## 10           AVALANCHE            224
## 850       WINTER STORM            206
injury.type <- aggregate(ref.data$INJURIES, by = list(ref.data$EVTYPE), FUN = sum)
names(injury.type) <- c("EVENT TYPE", "TOTAL INJURY")
top.injury <- injury.type[order(-injury.type$`TOTAL INJURY`),][1:10,]
top.injury
##             EVENT TYPE TOTAL INJURY
## 744            TORNADO        91286
## 705 THUNDERSTORM WINDS         9392
## 106     EXCESSIVE HEAT         8625
## 142              FLOOD         6789
## 404          LIGHTNING         5228
## 373          ICE STORM         1975
## 140       FLASH FLOODS         1777
## 199               HAIL         1358
## 850       WINTER STORM         1321
## 358  HURRICANE/TYPHOON         1275
event.dmg <- aggregate(ref.data$TOTDMG, by = list(ref.data$EVTYPE), FUN = sum)
names(event.dmg) <- c("EVENT TYPE", "TOTAL DAMAGE")
top.dmg <- event.dmg[order(-event.dmg$`TOTAL DAMAGE`),][1:10,]
top.dmg
##             EVENT TYPE TOTAL DAMAGE
## 142              FLOOD 150319678250
## 358  HURRICANE/TYPHOON  71913712800
## 744            TORNADO  57290440590
## 583        STORM SURGE  43323541000
## 199               HAIL  18727698170
## 140       FLASH FLOODS  17570307110
## 74             DROUGHT  15018672000
## 349          HURRICANE  14610229010
## 705 THUNDERSTORM WINDS  10865796580
## 514        RIVER FLOOD  10148404500
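
The three aggregations above follow the same pattern: sum a column by event type, sort in decreasing order, and keep the first rows. A minimal sketch of that pattern as one reusable function is given below. The name top_by_event() and the toy data frame are hypothetical, and the output column names differ from those used above.

```r
# Hypothetical helper: total a numeric column per EVTYPE and return the n largest.
top_by_event <- function(df, value_col, n = 10) {
  totals <- vapply(split(df[[value_col]], df$EVTYPE), sum, numeric(1))
  top <- head(sort(totals, decreasing = TRUE), n)
  data.frame(EVTYPE = names(top), TOTAL = as.vector(top),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Toy data, made up for illustration.
toy.events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(3, 1, 2, 0),
  stringsAsFactors = FALSE
)
top.toy <- top_by_event(toy.events, "FATALITIES", n = 2)
top.toy
```

With a helper of this kind, the fatality, injury, and damage rankings would differ only in the column name passed in.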

3. Results and Conclusions

par(mfrow = c(1,2), mar = c(12, 4, 3, 2), mgp = c(3,1,0), cex = 0.8)
barplot(top.fatal$`TOTAL FATALITY`, names.arg = top.fatal$`EVENT TYPE`, las = 3, col = 'magenta', main = "Highest Fatality by Events", ylab = "# of Fatalities")
barplot(top.injury$`TOTAL INJURY`, names.arg = top.injury$`EVENT TYPE`, las = 3, col = 'red', main = "Highest Injury by Events", ylab = "# of Injuries")

From the plots above, tornadoes caused by far the highest number of both fatalities (5,630) and injuries (91,286). They were followed by Excessive Heat for fatalities and Thunderstorm Winds for injuries.

par(mfrow = c(1,1), mar = c(12, 4, 3, 2), mgp = c(3,1,0), cex = 0.8)
barplot(top.dmg$`TOTAL DAMAGE`/(1e+9), names.arg = top.dmg$`EVENT TYPE`, las = 3, col = 'green', main = "Highest Economic Damage by Events", ylab = "Damage Cost ($ billions)")

As shown above, Floods caused the most economic damage at about $150B, followed by Hurricane/Typhoons at about $72B.
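
As a quick arithmetic check, the dollar totals for the top two event types in the damage table of Section 2.3.5 can be converted to billions. The vector name top_dmg_usd is an assumption for this sketch; the figures are copied from that table.

```r
# Totals in dollars, copied from the damage table in Section 2.3.5.
top_dmg_usd <- c(FLOOD = 150319678250, "HURRICANE/TYPHOON" = 71913712800)

# Convert to billions of dollars, rounded to one decimal place.
top_dmg_bn <- round(top_dmg_usd / 1e9, 1)
top_dmg_bn
```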