Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The code below loads the packages needed for data processing and reads the data.
# Don't forget to set the working directory
# Load all the necessary packages
library(dplyr)
library(lubridate)
library(ggplot2)
# Read the data
data <- read.csv("StormData.csv.bz2")
Looking through the data, both INJURIES and FATALITIES contribute to the health impact. The basic idea here is to build a data frame with three columns: EVTYPE, COUNT, and TYPE.
# For injuries
inj <- aggregate(data = data, INJURIES ~ EVTYPE, sum, na.rm = TRUE)
inj[, 3] <- "INJURIES"
names(inj)[2:3] <- c("COUNT", "TYPE")
# For fatalities
fatal <- aggregate(data = data, FATALITIES ~ EVTYPE, sum, na.rm = TRUE)
fatal[, 3] <- "FATALITIES"
names(fatal)[2:3] <- c("COUNT", "TYPE")
# Combine
harm <- rbind(inj, fatal)
harm[, 3] <- as.factor(harm[, 3])
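The injuries and fatalities blocks above repeat the same three steps (sum per event type, rename, label). As a minimal sketch, those steps could be wrapped in one helper. The function name `sum_by_event` and the toy data frame are hypothetical and not part of the original analysis; only base R is used.

```r
# Hypothetical helper (not from the original analysis): sum one numeric
# column per event type and attach a label column.
sum_by_event <- function(df, column, label) {
  totals <- aggregate(df[[column]], by = list(EVTYPE = df$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- "COUNT"
  totals$TYPE <- label
  totals
}

# Toy data standing in for the storm database (values are made up).
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "FLOOD"),
  INJURIES = c(10, 2, 5, 1),
  FATALITIES = c(1, 0, 2, 3)
)

# Stack both summaries into the long EVTYPE / COUNT / TYPE layout.
harm_toy <- rbind(
  sum_by_event(toy, "INJURIES", "INJURIES"),
  sum_by_event(toy, "FATALITIES", "FATALITIES")
)
harm_toy$TYPE <- as.factor(harm_toy$TYPE)
print(harm_toy)
```

The long layout (one row per event and harm type) is what lets a single plot stack injuries and fatalities by the TYPE column.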
There are too many event types to discuss individually, so only the most harmful ones, here the top 10, are addressed. The following code ranks event types in descending order of the sum of INJURIES and FATALITIES and keeps the rows of the harm data frame that belong to the top 10.
total.harm <- aggregate(data = data, INJURIES + FATALITIES ~ EVTYPE, sum, na.rm = TRUE)
names(total.harm)[2] <- "TOTAL"
# Find out the specific types of events. Set top 10.
tharm.type <- arrange(total.harm, desc(TOTAL))[1:10, 1]
top.harm <- filter(harm, EVTYPE %in% tharm.type)
# Adjust the order
top.harm$EVTYPE <- factor(top.harm$EVTYPE, levels = as.character(tharm.type), ordered = TRUE)
top.harm <- top.harm[order(top.harm[, 1]), ]
top.harm
##               EVTYPE COUNT       TYPE
## 8            TORNADO 91346   INJURIES
## 18           TORNADO  5633 FATALITIES
## 1     EXCESSIVE HEAT  6525   INJURIES
## 11    EXCESSIVE HEAT  1903 FATALITIES
## 9          TSTM WIND  6957   INJURIES
## 19         TSTM WIND   504 FATALITIES
## 3              FLOOD  6789   INJURIES
## 13             FLOOD   470 FATALITIES
## 6          LIGHTNING  5230   INJURIES
## 16         LIGHTNING   816 FATALITIES
## 4               HEAT  2100   INJURIES
## 14              HEAT   937 FATALITIES
## 2        FLASH FLOOD  1777   INJURIES
## 12       FLASH FLOOD   978 FATALITIES
## 5          ICE STORM  1975   INJURIES
## 15         ICE STORM    89 FATALITIES
## 7  THUNDERSTORM WIND  1488   INJURIES
## 17 THUNDERSTORM WIND   133 FATALITIES
## 10      WINTER STORM  1321   INJURIES
## 20      WINTER STORM   206 FATALITIES
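The ranking step above (total per event, take the largest, then reorder the long table through factor levels) can be sketched on a toy data frame. This is an illustration in base R under made-up numbers; `harm_toy`, `n_top`, `top_types`, and `top_toy` are hypothetical names.

```r
# Toy totals per event (made-up numbers) in the EVTYPE / COUNT / TYPE layout.
harm_toy <- data.frame(
  EVTYPE = c("A", "B", "C", "A", "B", "C"),
  COUNT = c(5, 50, 20, 1, 4, 30),
  TYPE = rep(c("INJURIES", "FATALITIES"), each = 3)
)

n_top <- 2  # the analysis above uses 10

# Total harm per event, then the names of the n_top largest totals.
totals <- tapply(harm_toy$COUNT, harm_toy$EVTYPE, sum)
top_types <- names(sort(totals, decreasing = TRUE))[seq_len(n_top)]

# Keep only those events and order the rows by rank via the factor levels.
top_toy <- harm_toy[harm_toy$EVTYPE %in% top_types, ]
top_toy$EVTYPE <- factor(top_toy$EVTYPE, levels = top_types)
top_toy <- top_toy[order(top_toy$EVTYPE), ]
print(top_toy)
```

Setting the factor levels in rank order matters beyond sorting the printed table: ggplot2 draws a discrete x axis in level order, so the bars appear from most to least harmful.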
Again, looking through the dataset, there are two types of economic impact: PROPDMG, which is property damage, and CROPDMG, which is crop damage. The actual value of each type of damage needs to be calculated with the help of the exponent codes in PROPDMGEXP and CROPDMGEXP. The idea here is to construct a data frame named converter that maps each exponent code to a multiplier, which is then used to calculate the actual damage value for each type of event.
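# Exponent codes are taken from PROPDMGEXP; the multipliers below follow their
# sorted order, which depends on the locale's collation. Any CROPDMGEXP code
# absent from this set maps to NA in the match() step below.
unit <- sort(as.character(unique(data$PROPDMGEXP)))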
multiplier <- c(0,0,0,1,10,10,10,10,10,10,10,10,10,10^9,10^2,10^2,10^3,10^6,10^6)
converter <- data.frame(unit, multiplier)
damage <- select(data, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Substitute each exponent code with its corresponding multiplier
damage$PROPDMGEXP <- converter[match(damage$PROPDMGEXP, converter$unit), 2]
damage$CROPDMGEXP <- converter[match(damage$CROPDMGEXP, converter$unit), 2]
# Calculate the actual values by multiplying each amount by its multiplier
damage <- mutate(damage, PROPDMG.VAL = PROPDMG * PROPDMGEXP,
                 CROPDMG.VAL = CROPDMG * CROPDMGEXP)
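The converter above pairs a sorted vector of codes with a positional vector of multipliers, so the pairing relies on sort order. A more explicit alternative is a named lookup vector. The sketch below is an assumption-laden illustration, not the original method: the multiplier values reflect one reading of how the codes line up with the multiplier vector above, the lowercase "k" entry is added for illustration, and `exp_multiplier` and `to_multiplier` are hypothetical names.

```r
# Explicit code -> multiplier table. Values follow one reading of the
# multiplier vector above; the lowercase "k" entry is an added assumption.
exp_multiplier <- c(
  "-" = 0, "?" = 0, "+" = 1,
  "0" = 10, "1" = 10, "2" = 10, "3" = 10, "4" = 10,
  "5" = 10, "6" = 10, "7" = 10, "8" = 10,
  "h" = 1e2, "H" = 1e2, "k" = 1e3, "K" = 1e3,
  "m" = 1e6, "M" = 1e6, "B" = 1e9
)

# Hypothetical helper: map exponent codes to multipliers without relying on
# sort order. Blank codes map to 0; unknown codes become NA.
to_multiplier <- function(code) {
  code <- as.character(code)
  out <- unname(exp_multiplier[match(code, names(exp_multiplier))])
  out[!is.na(code) & code == ""] <- 0
  out
}

# Toy rows (made-up values) to show the conversion.
toy <- data.frame(
  PROPDMG = c(2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "M", "", "B")
)
toy$PROPDMG.VAL <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Because unknown codes surface as NA rather than being silently matched by position, a quick `table(is.na(...))` on the result would show how many rows lack a usable multiplier.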
As with the health data above, the total damage value of each damage type per event is obtained with the code in the following chunk.
# Extract the loss for each type of damage
prop <- aggregate(data = damage, PROPDMG.VAL ~ EVTYPE, sum, na.rm = TRUE)
prop[, 3] <- "PROPERTIES"
names(prop)[2:3] <- c("VALUES", "DAMAGE.TYPE")
crop <- aggregate(data = damage, CROPDMG.VAL ~ EVTYPE, sum, na.rm = TRUE)
crop[, 3] <- "CROPS"
names(crop)[2:3] <- c("VALUES", "DAMAGE.TYPE")
Likewise, the top 10 events by economic impact are determined by the total damage value of property and crops combined.
# Find out the top 10 events with economic consequences
total.dmg <- aggregate(data = damage, CROPDMG.VAL + PROPDMG.VAL ~ EVTYPE, sum,
na.rm = TRUE)
names(total.dmg)[2] <- "TOTAL.DAMAGE"
# Find the top 10 types of events that we are looking for
tdmg.type <- arrange(total.dmg, desc(TOTAL.DAMAGE))[1:10, 1]
# Combine prop and crop, adjust the order
# Reuse the data frame "damage" here
damage <- rbind(prop, crop)
damage[, 3] <- as.factor(damage[, 3])
# Using the type factor to find out the needed dataset
top.dmg <- filter(damage, EVTYPE %in% tdmg.type)
# Adjust the order
top.dmg$EVTYPE <- factor(top.dmg$EVTYPE, levels = as.character(tdmg.type), ordered = TRUE)
top.dmg <- top.dmg[order(top.dmg[, 1]), ]
top.dmg
##               EVTYPE       VALUES DAMAGE.TYPE
## 3              FLOOD 144657709800  PROPERTIES
## 13             FLOOD   5661968450       CROPS
## 6  HURRICANE/TYPHOON  69305840000  PROPERTIES
## 16 HURRICANE/TYPHOON   2607872800       CROPS
## 10           TORNADO  56937162897  PROPERTIES
## 20           TORNADO    414954710       CROPS
## 9        STORM SURGE  43323536000  PROPERTIES
## 19       STORM SURGE         5000       CROPS
## 4               HAIL  15732269877  PROPERTIES
## 14              HAIL   3025537650       CROPS
## 2        FLASH FLOOD  16140815011  PROPERTIES
## 12       FLASH FLOOD   1421317100       CROPS
## 1            DROUGHT   1046106000  PROPERTIES
## 11           DROUGHT  13972566000       CROPS
## 5          HURRICANE  11868319010  PROPERTIES
## 15         HURRICANE   2741910000       CROPS
## 8        RIVER FLOOD   5118945500  PROPERTIES
## 18       RIVER FLOOD   5029459000       CROPS
## 7          ICE STORM   3944928310  PROPERTIES
## 17         ICE STORM   5022113500       CROPS
The figure below shows the top 10 most harmful events, with injuries and fatalities stacked.
Tornadoes are by far the most harmful event, with 91,346 injuries and 5,633 fatalities. For every event in the top 10, fatalities make up a minority of the casualties.
p1 <- ggplot(data = top.harm, aes(x = EVTYPE, y = COUNT, fill = TYPE))
p1 + geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle=90,
vjust=0.5, hjust=1)) + ggtitle("Top 10 Harmful Events") + labs(x = "EVENT", y =
"Total Numbers of Fatalities and Injuries")
The figure below shows the top 10 events by economic damage, split into property and crop damage. Floods cause the highest total damage, followed by hurricanes/typhoons, tornadoes, and so on. Property damage accounts for most of the loss for the majority of these events. The exceptions are drought and ice storm, where crop damage exceeds property damage, and river flood, where the two are nearly equal.
p2 <- ggplot(data = top.dmg, aes(x = EVTYPE, y = VALUES, fill = DAMAGE.TYPE))
p2 + geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle=90,
vjust=0.5, hjust=1)) + ggtitle("Top 10 Economic Loss Events") + labs(x = "EVENT",
y = "Total Values of Loss")
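The split between property and crop damage can also be checked numerically rather than read off the plot. The sketch below uses three rows copied from the top.dmg table printed earlier and computes the crop share of total damage in base R; the column names `TOTAL`, `CROP.SHARE`, and `TOTAL.BILLION` are hypothetical.

```r
# Values copied from the top.dmg table above (three of the ten events).
dmg <- data.frame(
  EVTYPE = c("FLOOD", "DROUGHT", "ICE STORM"),
  PROPERTIES = c(144657709800, 1046106000, 3944928310),
  CROPS = c(5661968450, 13972566000, 5022113500)
)

# Total loss, crop share of the total, and the total in billions.
dmg$TOTAL <- dmg$PROPERTIES + dmg$CROPS
dmg$CROP.SHARE <- round(dmg$CROPS / dmg$TOTAL, 3)
dmg$TOTAL.BILLION <- round(dmg$TOTAL / 1e9, 1)
print(dmg[, c("EVTYPE", "TOTAL.BILLION", "CROP.SHARE")])
```

A crop share above 0.5 marks an event where crops account for more of the loss than property. Expressing totals in billions may also make the y axis of the plot easier to read than raw values.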
According to the analysis above, tornadoes are the most harmful events in terms of injuries and fatalities, while floods have the greatest economic impact.