Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

Working Environment Setup

The code below loads the packages used for data processing and plotting, then reads the raw data.

# Don't forget to set the working directory
# Load all the necessary packages
library(dplyr)
library(lubridate)
library(ggplot2)

# Read the data
data <- read.csv("StormData.csv.bz2")

Health Impact

Looking through the data, both INJURIES and FATALITIES contribute to the health impact. The basic idea here is to extract a data frame with three columns: EVTYPE, COUNT and TYPE.

# For injuries
inj <- aggregate(data = data, INJURIES ~ EVTYPE, sum, na.rm = TRUE)
inj[, 3] <- "INJURIES"
names(inj)[2:3] <- c("COUNT", "TYPE")
# For fatalities
fatal <- aggregate(data = data, FATALITIES ~ EVTYPE, sum, na.rm = TRUE)
fatal[, 3] <- "FATALITIES"
names(fatal)[2:3] <- c("COUNT", "TYPE")
# Combine
harm <- rbind(inj, fatal)
harm[, 3] <- as.factor(harm[, 3])
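To make the shape of the resulting data frame concrete, here is a minimal sketch of the same aggregate-then-stack pattern on a small made-up data frame. The values in `toy` and the helper `to_long` are invented for illustration and are not part of the NOAA data or the original analysis.

```r
# Toy stand-in for the storm data; values are invented for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HEAT"),
  INJURIES   = c(10, 2, 5, 0, 3),
  FATALITIES = c(1, 0, 2, 1, 4)
)

# Hypothetical helper: sum one measure per event type and label the rows
to_long <- function(df, measure) {
  out <- aggregate(df[[measure]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(out)[2] <- "COUNT"
  out$TYPE <- measure
  out
}

# Stack both measures into one long data frame: EVTYPE, COUNT, TYPE
toy.harm <- rbind(to_long(toy, "INJURIES"), to_long(toy, "FATALITIES"))
toy.harm$TYPE <- as.factor(toy.harm$TYPE)
print(toy.harm)
```

Each event type appears once per TYPE, which is the layout ggplot2 expects when TYPE is mapped to the fill colour of a stacked bar.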

There are too many event types to discuss individually, so only the most harmful ones, here the top 10, are addressed. The following code ranks the event types by the sum of INJURIES and FATALITIES in descending order and keeps the matching rows of the harm data frame.

total.harm <- aggregate(data = data, INJURIES + FATALITIES ~ EVTYPE, sum, na.rm = TRUE)
names(total.harm)[2] <- "TOTAL"
# Find out the specific types of events. Set top 10.
tharm.type <- arrange(total.harm, desc(TOTAL))[1:10, 1]
top.harm <- filter(harm, EVTYPE %in% tharm.type)
# Adjust the order
top.harm$EVTYPE <- factor(top.harm$EVTYPE, levels = as.character(tharm.type), ordered = TRUE)
top.harm <- top.harm[order(top.harm[, 1]), ]
top.harm
##               EVTYPE COUNT       TYPE
## 8            TORNADO 91346   INJURIES
## 18           TORNADO  5633 FATALITIES
## 1     EXCESSIVE HEAT  6525   INJURIES
## 11    EXCESSIVE HEAT  1903 FATALITIES
## 9          TSTM WIND  6957   INJURIES
## 19         TSTM WIND   504 FATALITIES
## 3              FLOOD  6789   INJURIES
## 13             FLOOD   470 FATALITIES
## 6          LIGHTNING  5230   INJURIES
## 16         LIGHTNING   816 FATALITIES
## 4               HEAT  2100   INJURIES
## 14              HEAT   937 FATALITIES
## 2        FLASH FLOOD  1777   INJURIES
## 12       FLASH FLOOD   978 FATALITIES
## 5          ICE STORM  1975   INJURIES
## 15         ICE STORM    89 FATALITIES
## 7  THUNDERSTORM WIND  1488   INJURIES
## 17 THUNDERSTORM WIND   133 FATALITIES
## 10      WINTER STORM  1321   INJURIES
## 20      WINTER STORM   206 FATALITIES
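The ranking step can be sketched in base R on toy data as follows. The numbers are invented, and a top 3 is used instead of a top 10 to keep the example short.

```r
# Toy long-format data; values are invented for illustration
toy.harm <- data.frame(
  EVTYPE = rep(c("FLOOD", "HEAT", "TORNADO", "HAIL"), times = 2),
  COUNT  = c(2, 3, 15, 1, 1, 4, 3, 0),
  TYPE   = rep(c("INJURIES", "FATALITIES"), each = 4)
)

# Total per event type, largest first
totals <- tapply(toy.harm$COUNT, toy.harm$EVTYPE, sum)
totals <- totals[order(totals, decreasing = TRUE)]
top.n  <- names(totals)[1:3]   # top 3 here; the report uses 10

# Keep only the top events and order the factor levels by total
toy.top <- toy.harm[toy.harm$EVTYPE %in% top.n, ]
toy.top$EVTYPE <- factor(toy.top$EVTYPE, levels = top.n, ordered = TRUE)
toy.top <- toy.top[order(toy.top$EVTYPE), ]
print(toy.top)
```

Setting the factor levels in ranked order is what matters for plotting later: ggplot2 draws a discrete axis in level order, so the bars appear from most to least harmful.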

Economic Impact

Again, looking through the dataset, there are two types of economic impact: PROPDMG, which is property damage, and CROPDMG, which is crop damage. The actual value of each type of damage is obtained by multiplying it by the multiplier encoded in PROPDMGEXP and CROPDMGEXP, respectively. The idea here is to construct a data frame named converter that maps each exponent code to its multiplier.

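# Exponent codes come from PROPDMGEXP; a code found only in CROPDMGEXP maps to NA
unit <- sort(as.character(unique(data$PROPDMGEXP)))
# Multipliers are paired with the sorted codes by position
multiplier <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)
converter <- data.frame(unit, multiplier)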

damage <- select(data, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Substitute each exponent code with its multiplier
damage$PROPDMGEXP <- converter[match(damage$PROPDMGEXP, converter$unit), 2]
damage$CROPDMGEXP <- converter[match(damage$CROPDMGEXP, converter$unit), 2]
# Calculate the actual values by multiplying each amount by its multiplier
damage <- mutate(damage,
                 PROPDMG.VAL = PROPDMG * PROPDMGEXP,
                 CROPDMG.VAL = CROPDMG * CROPDMGEXP)
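The converter pairs codes and multipliers by position, so it depends on the sort order of the codes, and any code missing from the table comes back from match() as NA. As an alternative, the sketch below uses a named lookup vector in which each code is written next to its multiplier. The names `exp.lookup` and `to_multiplier` are hypothetical, the multipliers mirror the ones used above, and the lowercase "k" entry is an added assumption that is not part of the original table.

```r
# Hypothetical lookup: names are exponent codes, values are multipliers.
# The lowercase "k" entry is an assumption, not part of the original table.
exp.lookup <- c(
  "-" = 0, "?" = 0, "+" = 1,
  setNames(rep(10, 9), as.character(0:8)),
  "B" = 1e9, "h" = 1e2, "H" = 1e2,
  "k" = 1e3, "K" = 1e3, "m" = 1e6, "M" = 1e6
)

# Convert codes to multipliers. Blank codes get `blank` (0, as in the table
# above); unknown codes stay NA so they can be inspected rather than guessed.
to_multiplier <- function(codes, blank = 0) {
  codes <- as.character(codes)
  out <- unname(exp.lookup[codes])
  out[codes == ""] <- blank
  out
}

# Toy amounts and codes, invented for illustration
toy.exp <- c("K", "M", "", "B", "5", "x")
toy.dmg <- c(25, 2.5, 10, 1, 3, 7)
toy.val <- toy.dmg * to_multiplier(toy.exp)
print(toy.val)
```

The blank code has to be handled separately because an empty string never matches an element name when indexing a named vector.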

As with the health impact, the damage values for each event type are aggregated separately for property and crops using the code in the following chunk.

# Extract the loss for each type of damage
prop <- aggregate(data = damage, PROPDMG.VAL ~ EVTYPE, sum, na.rm = TRUE)
prop[, 3] <- "PROPERTIES"
names(prop)[2:3] <- c("VALUES", "DAMAGE.TYPE")
crop <- aggregate(data = damage, CROPDMG.VAL ~ EVTYPE, sum, na.rm = TRUE)
crop[, 3] <- "CROPS"
names(crop)[2:3] <- c("VALUES", "DAMAGE.TYPE")

The top 10 events by economic impact are then determined from the total damage value of property and crops combined.

# Find out the top 10 events with economic consequences
total.dmg <- aggregate(data = damage, CROPDMG.VAL + PROPDMG.VAL ~ EVTYPE, sum, 
                       na.rm = TRUE)

names(total.dmg)[2] <- "TOTAL.DAMAGE"
# Find the top 10 types of events that we are looking for
tdmg.type <- arrange(total.dmg, desc(TOTAL.DAMAGE))[1:10, 1]

# Combine prop and crop, adjust the order
# Reuse the data frame "damage" here
damage <- rbind(prop, crop)
damage[, 3] <- as.factor(damage[, 3])
# Keep only the rows for the top 10 event types
top.dmg <- filter(damage, EVTYPE %in% tdmg.type)
# Adjust the order
top.dmg$EVTYPE <- factor(top.dmg$EVTYPE, levels = as.character(tdmg.type), ordered = TRUE)
top.dmg <- top.dmg[order(top.dmg[, 1]), ]
top.dmg
##               EVTYPE       VALUES DAMAGE.TYPE
## 3              FLOOD 144657709800  PROPERTIES
## 13             FLOOD   5661968450       CROPS
## 6  HURRICANE/TYPHOON  69305840000  PROPERTIES
## 16 HURRICANE/TYPHOON   2607872800       CROPS
## 10           TORNADO  56937162897  PROPERTIES
## 20           TORNADO    414954710       CROPS
## 9        STORM SURGE  43323536000  PROPERTIES
## 19       STORM SURGE         5000       CROPS
## 4               HAIL  15732269877  PROPERTIES
## 14              HAIL   3025537650       CROPS
## 2        FLASH FLOOD  16140815011  PROPERTIES
## 12       FLASH FLOOD   1421317100       CROPS
## 1            DROUGHT   1046106000  PROPERTIES
## 11           DROUGHT  13972566000       CROPS
## 5          HURRICANE  11868319010  PROPERTIES
## 15         HURRICANE   2741910000       CROPS
## 8        RIVER FLOOD   5118945500  PROPERTIES
## 18       RIVER FLOOD   5029459000       CROPS
## 7          ICE STORM   3944928310  PROPERTIES
## 17         ICE STORM   5022113500       CROPS
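One subtlety of computing the total with a formula such as CROPDMG.VAL + PROPDMG.VAL ~ EVTYPE: by default, the formula method of aggregate() drops any row whose left-hand side evaluates to NA before the summary function is called, so a row with a missing crop value also loses its property value, and passing na.rm = TRUE to sum does not bring it back. The sketch below shows the effect on invented toy data and one possible alternative, summing per row with rowSums() first. The column names are hypothetical.

```r
# Toy data with one missing crop value; values are invented for illustration
toy <- data.frame(
  EVTYPE = c("A", "A", "B"),
  PROP   = c(1, 2, 3),
  CROP   = c(10, NA, 30)
)

# Formula with a sum on the left-hand side: row 2 is dropped entirely
with.formula <- aggregate(PROP + CROP ~ EVTYPE, data = toy, FUN = sum,
                          na.rm = TRUE)
print(with.formula)

# Alternative: build the row total first, treating a missing value as 0
toy$TOTAL <- rowSums(toy[, c("PROP", "CROP")], na.rm = TRUE)
per.row <- aggregate(TOTAL ~ EVTYPE, data = toy, FUN = sum)
print(per.row)
```

Which behaviour is appropriate depends on whether a missing value should count as zero damage or exclude the record; the sketch only makes the difference visible.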

Results and Analysis

The figure below shows the 10 most harmful event types, with injuries and fatalities stacked.
Tornadoes are clearly the most harmful event type. For every event type in the top 10, injuries outnumber fatalities.

p1 <- ggplot(data = top.harm, aes(x = EVTYPE, y = COUNT, fill = TYPE))
p1 + geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Harmful Events") +
  labs(x = "EVENT", y = "Total Numbers of Fatalities and Injuries")
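The relation between fatalities and injuries can also be checked numerically. The sketch below copies the counts from the top.harm table printed in the Health Impact section and computes the share of fatalities among all casualties; the data frame and column names `harm.tab`, `TOTAL` and `FATAL.PCT` are introduced here for illustration only.

```r
# Counts copied from the top.harm table printed in the Health Impact section
harm.tab <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
                 "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
                 "WINTER STORM"),
  INJURIES   = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  FATALITIES = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206)
)

# Share of fatalities among all casualties, in percent
harm.tab$TOTAL <- harm.tab$INJURIES + harm.tab$FATALITIES
harm.tab$FATAL.PCT <- round(100 * harm.tab$FATALITIES / harm.tab$TOTAL, 1)

# Show the event types with the highest fatality share first
print(harm.tab[order(-harm.tab$FATAL.PCT), c("EVTYPE", "TOTAL", "FATAL.PCT")])
```

The share varies considerably between event types: it is below 10% for tornadoes, floods and ice storms, but above 30% for heat and flash floods.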

The figure below shows the 10 event types with the greatest economic damage, split into property and crop damage. Floods cause the highest total damage, followed by hurricanes/typhoons, tornadoes and so on. Property damage accounts for the bulk of the total loss for most of these events; drought, river flood and ice storm are the exceptions, as shown in the plot.

p2 <- ggplot(data = top.dmg, aes(x = EVTYPE, y = VALUES, fill = DAMAGE.TYPE))
p2 + geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Economic Loss Events") +
  labs(x = "EVENT", y = "Total Values of Loss")

Summary

According to the analysis above, tornadoes are the most harmful events in terms of injuries and fatalities, while floods have the greatest economic impact.