The basic goal of this assignment is to explore the NOAA Storm Database which contains events from 1950 to Nov 2011 and answer some basic questions about severe weather events. Will try to analyze & present my findings in the following areas:
Across the United States, which types of events are most harmful with respect to population health? Across the United States, which types of events have the greatest economic consequences? Introduction Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm Database. This Database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The stormData for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:
Storm Data[47Mb]
There is also some documentation of the Database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation National Climatic stormData Center Storm Events FAQ The events in the Database start in the year 1950 and end in November 2011. In the earlier years of the Database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The analysis was performed on Storm Events Database, provided by National Climatic Data Center. The data is from a comma-separated-value file available here. There is also some documentation of the data available here.
The first step is to read the data into a data frame.
setwd("C:/Users/xzhenning/Documents/R/spring 2019/")
storm <- read.csv("C:./data/repdata_data_StormData.csv.bz2")
Before the analysis, the data need some preprocessing. Event types don’t have a specific format. For instance, there are events with types Frost/Freeze
, FROST/FREEZE
and FROST\\FREEZE
which obviously refer to the same type of event.
# number of unique event types
length(unique(storm$EVTYPE))
## [1] 985
# translate all letters to lowercase
event_types <- tolower(storm$EVTYPE)
# replace all punct. characters with a space
event_types <- gsub("[[:blank:][:punct:]+]", " ", event_types)
length(unique(event_types))
## [1] 874
# update the data frame
storm$EVTYPE <- event_types
No further data preprocessing was performed although the event type field can be processed further to merge event types such as tstm wind
and thunderstorm wind
.
After performing data cleaning, as expected, the number of unique event types reduce significantly. For further analysis, the cleaned event types are used.
To find the event types that are most harmful to population health, the number of casualties are aggregated by the event type.
library(plyr)
casualties <- ddply(storm, .(EVTYPE), summarize,
fatalities = sum(FATALITIES),
injuries = sum(INJURIES))
# Find events that caused most death and injury
fatal_events <- head(casualties[order(casualties$fatalities, decreasing = T), ], 10)
injury_events <- head(casualties[order(casualties$injuries, decreasing = T), ], 10)
Top 10 events that caused largest number of deaths are
fatal_events[, c("EVTYPE", "fatalities")]
## EVTYPE fatalities
## 741 tornado 5633
## 116 excessive heat 1903
## 138 flash flood 978
## 240 heat 937
## 410 lightning 816
## 762 tstm wind 504
## 154 flood 470
## 515 rip current 368
## 314 high wind 248
## 19 avalanche 224
Top 10 events that caused most number of injuries are
injury_events[, c("EVTYPE", "injuries")]
## EVTYPE injuries
## 741 tornado 91346
## 762 tstm wind 6957
## 154 flood 6789
## 116 excessive heat 6525
## 410 lightning 5230
## 240 heat 2100
## 382 ice storm 1975
## 138 flash flood 1777
## 671 thunderstorm wind 1488
## 209 hail 1361
To analyze the impact of weather events on the economy, available property damage and crop damage reportings/estimates were used.
Note: in the raw data, the property damage is represented with two fields, a number PROPDMG
in dollars and the exponent PROPDMGEXP
. Similarly, the crop damage is represented using two fields, CROPDMG
and CROPDMGEXP
. The first step in the analysis is to calculate the property and crop damage for each event.
exp_transform <- function(e) {
# h -> hundred, k -> thousand, m -> million, b -> billion
if (e %in% c('h', 'H'))
return(2)
else if (e %in% c('k', 'K'))
return(3)
else if (e %in% c('m', 'M'))
return(6)
else if (e %in% c('b', 'B'))
return(9)
else if (!is.na(as.numeric(e))) # if a digit
return(as.numeric(e))
else if (e %in% c('', '-', '?', '+'))
return(0)
else {
stop("Invalid exponent value.")
}
}
prop_dmg_exp <- sapply(storm$PROPDMGEXP, FUN=exp_transform)
storm$prop_dmg <- storm$PROPDMG * (10 ** prop_dmg_exp)
crop_dmg_exp <- sapply(storm$CROPDMGEXP, FUN=exp_transform)
storm$crop_dmg <- storm$CROPDMG * (10 ** crop_dmg_exp)
# Compute the economic loss by event type
library(plyr)
econ_loss <- ddply(storm, .(EVTYPE), summarize,
prop_dmg = sum(prop_dmg),
crop_dmg = sum(crop_dmg))
# filter out events that caused no economic loss
econ_loss <- econ_loss[(econ_loss$prop_dmg > 0 | econ_loss$crop_dmg > 0), ]
prop_dmg_events <- head(econ_loss[order(econ_loss$prop_dmg, decreasing = T), ], 10)
crop_dmg_events <- head(econ_loss[order(econ_loss$crop_dmg, decreasing = T), ], 10)
Top 10 events that caused most property damage (in dollars) are as follows
prop_dmg_events[, c("EVTYPE", "prop_dmg")]
## EVTYPE prop_dmg
## 138 flash flood 6.820237e+13
## 697 thunderstorm winds 2.086532e+13
## 741 tornado 1.078951e+12
## 209 hail 3.157558e+11
## 410 lightning 1.729433e+11
## 154 flood 1.446577e+11
## 366 hurricane typhoon 6.930584e+10
## 166 flooding 5.920826e+10
## 585 storm surge 4.332354e+10
## 270 heavy snow 1.793259e+10
Similarly, the events that caused biggest crop damage are
crop_dmg_events[, c("EVTYPE", "crop_dmg")]
## EVTYPE crop_dmg
## 84 drought 13972566000
## 154 flood 5661968450
## 519 river flood 5029459000
## 382 ice storm 5022113500
## 209 hail 3025974480
## 357 hurricane 2741910000
## 366 hurricane typhoon 2607872800
## 138 flash flood 1421317100
## 125 extreme cold 1312973000
## 185 frost freeze 1094186000
The following plots show top dangerous weather event types.
library(ggplot2)
## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
library(gridExtra)
# Set the levels in order
p1 <- ggplot(data=fatal_events,
aes(x=reorder(EVTYPE, fatalities), y=fatalities, fill=fatalities)) +
geom_bar(stat="identity") +
coord_flip() +
ylab("Total number of fatalities") +
xlab("Event type") +
ggtitle ("Top 20 events for total fatalities") +
theme(axis.text.x=element_text(angle=45,hjust=1))
p2 <- ggplot(data=injury_events,
aes(x=reorder(EVTYPE, injuries), y=injuries, fill=injuries)) +
geom_bar(stat="identity") +
coord_flip() +
ylab("Total number of injuries") +
xlab("Event type") +
ggtitle ("Top 20 events for total injuries") +
theme(axis.text.x=element_text(angle=45,hjust=1))
p1
p2
Tornadoes cause most number of deaths and injuries among all event types. There are more than 5,000 deaths and more than 10,000 injuries in the last 60 years in US, due to tornadoes. The other event types that are most dangerous with respect to population health are excessive heat and flash floods.
The following plot shows the most severe weather event types with respect to economic cost that they have costed since 1950s.
library(ggplot2)
library(gridExtra)
# Set the levels in order
p1 <- ggplot(data=prop_dmg_events,
aes(x=reorder(EVTYPE, prop_dmg), y=log10(prop_dmg), fill=prop_dmg )) +
geom_bar(stat="identity") +
coord_flip() +
xlab("Event type") +
ylab("Property damage in dollars (log-scale)") +
ggtitle ("Top 20 events for total property damage") +
theme(axis.text.x=element_text(angle=45,hjust=1))
p2 <- ggplot(data=crop_dmg_events,
aes(x=reorder(EVTYPE, crop_dmg), y=crop_dmg, fill=crop_dmg)) +
geom_bar(stat="identity") +
coord_flip() +
xlab("Event type") +
ylab("Crop damage in dollars") +
ggtitle ("Top 20 events for total crop damage") +
theme(axis.text.x=element_text(angle=45,hjust=1))
p1
p2
The data shows that flash floods and thunderstorm winds cost the largest property damages among weather-related natural diseasters.
The most severe weather event in terms of crop damage is drought. In the last 50 years, drought has caused more than 10 billion dollars in loss. Other severe crop-damage-causing event types include floods and hails.
Note:Property damages are converted to logarithmic scales due to large range of values.