Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis of the weather event database shows that tornadoes are the most fatal and injurious weather events, and that excessive heat is the second most fatal. The most property damage (in US dollars) was caused by water and wind, and the most crop damage (in US dollars) was caused by drought, followed by water and wind.
The data for this assignment come in the form of a comma-separated-value file, compressed via the bzip2 algorithm to reduce its size. The file was downloaded from (http://www.ncdc.noaa.gov/stormevents/ftp.jsp) and is provided by the National Climatic Data Center.
There is also some documentation of the database available at (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf).
(The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete. Source: Coursera.)
The first step is to fetch the data and read it into a data frame.
data <- read.csv("repdata_data_StormData.csv.bz2")
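No explicit decompression step is needed here, because read.csv() opens the file through a connection that detects bzip2 compression. A minimal sketch of that behavior follows; the toy file and its two columns are made up for illustration and are not the storm data.

```r
# Write a tiny CSV through a bzip2 connection (toy data, not the NOAA file)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file directly
toy <- read.csv(tmp, stringsAsFactors = FALSE)
```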
Before the data are analyzed, some pre-processing is required. Event types don’t have a specific format. For instance, there are events with types Frost/Freeze, FROST/FREEZE and FROST\\FREEZE that refer to the same type of event. Additionally, some of the formatting of the data is inconsistent and must be standardized.
# translate all letters to lowercase
event_types <- tolower(data$EVTYPE)
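# replace each blank or punctuation character with a single space
# (the "+" in the original class was a literal plus, already covered by [:punct:])
event_types <- gsub("[[:blank:][:punct:]]", " ", event_types)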
length(unique(event_types))
## [1] 874
# update the data frame
data$EVTYPE <- event_types
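To illustrate the effect of this cleaning step, the sketch below applies the same two transformations to the three Frost/Freeze spellings mentioned earlier. The vector name `variants` is introduced only for this example.

```r
# The three spellings of the same event type (the last contains one backslash)
variants <- c("Frost/Freeze", "FROST/FREEZE", "FROST\\FREEZE")

# Same steps as above: lowercase, then replace blanks/punctuation with a space
cleaned <- tolower(variants)
cleaned <- gsub("[[:blank:][:punct:]]", " ", cleaned)

# All three spellings collapse to a single event type
unique(cleaned)
```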
No further pre-processing of the data was conducted, although the event types field could later be re-processed to merge similar events, for example ‘tstm’ and ‘thunderstorm wind’.
All further analysis was conducted on cleaned event types.
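Merging similar event types was not part of this analysis, but a rough sketch of how it might be done is given here. The helper `merge_event_types` and its three rules are hypothetical and cover only the thunderstorm wind example; a real merge would need a much longer, manually checked list of rules.

```r
# Hypothetical helper: fold a few spelling variants into one event type
merge_event_types <- function(x) {
  x <- gsub(" +", " ", trimws(x))       # trim and collapse repeated spaces
  x <- sub("^tstm", "thunderstorm", x)  # expand the 'tstm' abbreviation
  x <- sub("winds$", "wind", x)         # treat 'winds' and 'wind' alike
  x
}

merged <- merge_event_types(c("tstm wind", "thunderstorm  wind ",
                              "thunderstorm winds"))
```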
To find the impact of weather events on population health, weather events that are either fatal or injurious are used, and casualties are aggregated by event type.
library(plyr)
dim(data)
## [1] 902297 37
nmissing <- function(x) sum(is.na(x))
colwise(nmissing)(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 0 0 0 0 0 0 0 0
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 0 0 0 0 902297
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES
## 1 0 0 0 0 0 843563 0 0 0
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE
## 1 0 0 0 0 0 0 0 47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 0 40 0 0 0
# aggregate fatalities and injuries by event type
casualties <- ddply(data, .(EVTYPE), summarize,
                    fatalities = sum(FATALITIES),
                    injuries = sum(INJURIES))
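To make the aggregation concrete, the sketch below applies the same group-and-sum idea to a made-up four-row data frame. It uses base R's aggregate() rather than ddply() so that it runs without plyr; this is a stand-in for illustration, not the method used in the analysis.

```r
# Toy data (invented values): two tornado records, one hail, one flood
toy <- data.frame(EVTYPE = c("tornado", "tornado", "hail", "flood"),
                  FATALITIES = c(2, 1, 0, 1),
                  INJURIES = c(10, 5, 1, 0),
                  stringsAsFactors = FALSE)

# Sum fatalities and injuries within each event type
toy_casualties <- aggregate(toy[, c("FATALITIES", "INJURIES")],
                            by = list(EVTYPE = toy$EVTYPE),
                            FUN = sum)

# Rank event types by total fatalities, highest first
toy_ranked <- toy_casualties[order(toy_casualties$FATALITIES,
                                   decreasing = TRUE), ]
```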
The 10 most injurious types of weather events are:
injury_events <- head(casualties[order(casualties$injuries, decreasing = T),], 10)
injury_events[, c("EVTYPE", "injuries")]
## EVTYPE injuries
## 741 tornado 91346
## 762 tstm wind 6957
## 154 flood 6789
## 116 excessive heat 6525
## 410 lightning 5230
## 240 heat 2100
## 382 ice storm 1975
## 138 flash flood 1777
## 671 thunderstorm wind 1488
## 209 hail 1361
and the 10 most fatal types of weather events are:
fatal_events <- head(casualties[order(casualties$fatalities, decreasing = T),], 10)
fatal_events[, c("EVTYPE", "fatalities")]
## EVTYPE fatalities
## 741 tornado 5633
## 116 excessive heat 1903
## 138 flash flood 978
## 240 heat 937
## 410 lightning 816
## 762 tstm wind 504
## 154 flood 470
## 515 rip current 368
## 314 high wind 248
## 19 avalanche 224
The economic impacts of weather events are assessed by the available property and crop damage. In the raw data, property damage is represented with two fields, a number PROPDMG in dollars and the exponent PROPDMGEXP. Similarly, crop damage is represented using two fields, CROPDMG and CROPDMGEXP. The first step in the analysis is to calculate the property and crop damage for each event.
exp_transform <- function(e) {
  # map an exponent code to a power of ten:
  # h -> hundred, k -> thousand, m -> million, b -> billion
  e <- as.character(e)  # guard against factor input
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (e %in% c('', '-', '?', '+'))  # blank or symbol: exponent 0
    return(0)
  else if (grepl("^[0-9]+$", e))  # a digit is used directly as the exponent
    return(as.numeric(e))
  else {
    stop("Invalid exponent value.")
  }
}
prop_dmg_exp <- sapply(data$PROPDMGEXP, FUN=exp_transform)
data$prop_dmg <- data$PROPDMG * (10 ** prop_dmg_exp)
crop_dmg_exp <- sapply(data$CROPDMGEXP, FUN=exp_transform)
data$crop_dmg <- data$CROPDMG * (10 ** crop_dmg_exp)
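As a worked example of the calculation, a record with PROPDMG of 25 and PROPDMGEXP of "K" is valued at 25 * 10^3 = 25000 dollars. The sketch below reproduces the same mapping with a vectorized lookup, which avoids calling a function once per row. The names `exp_lookup` and `exp_transform_vec` are hypothetical and not part of the original analysis.

```r
# Letter codes and the power of ten each one stands for
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

# Hypothetical vectorized variant of exp_transform()
exp_transform_vec <- function(e) {
  e <- tolower(as.character(e))
  out <- rep(0, length(e))                 # '', '-', '?', '+' stay at 0
  is_digit <- grepl("^[0-9]$", e)
  out[is_digit] <- as.numeric(e[is_digit]) # digits are used as the exponent
  is_letter <- e %in% names(exp_lookup)
  out[is_letter] <- unname(exp_lookup[e[is_letter]])
  unknown <- !(is_digit | is_letter | e %in% c("", "-", "?", "+"))
  if (any(unknown)) stop("Invalid exponent value.")
  out
}

# Toy values: 25 with exponent "K" and 1.5 with exponent "m"
toy_dmg <- c(25, 1.5) * 10^exp_transform_vec(c("K", "m"))
```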
library(plyr)
eco_loss <- ddply(data, .(EVTYPE), summarize,
                  prop_dmg = sum(prop_dmg),
                  crop_dmg = sum(crop_dmg))
# filter out events that caused no economic loss
eco_loss <- eco_loss[(eco_loss$prop_dmg > 0 | eco_loss$crop_dmg > 0), ]
prop_dmg_events <- head(eco_loss[order(eco_loss$prop_dmg, decreasing = T), ], 10)
crop_dmg_events <- head(eco_loss[order(eco_loss$crop_dmg, decreasing = T), ], 10)
The 10 types of weather event that have caused the most property damage are:
# events that caused most property damage (in dollars)
prop_dmg_events[, c("EVTYPE", "prop_dmg")]
## EVTYPE prop_dmg
## 138 flash flood 6.820237e+13
## 697 thunderstorm winds 2.086532e+13
## 741 tornado 1.078951e+12
## 209 hail 3.157558e+11
## 410 lightning 1.729433e+11
## 154 flood 1.446577e+11
## 366 hurricane typhoon 6.930584e+10
## 166 flooding 5.920826e+10
## 585 storm surge 4.332354e+10
## 270 heavy snow 1.793259e+10
and the 10 types of weather events that have caused the most damage to crops are:
# events that caused most crop damage
crop_dmg_events[, c("EVTYPE", "crop_dmg")]
## EVTYPE crop_dmg
## 84 drought 13972566000
## 154 flood 5661968450
## 519 river flood 5029459000
## 382 ice storm 5022113500
## 209 hail 3025974480
## 357 hurricane 2741910000
## 366 hurricane typhoon 2607872800
## 138 flash flood 1421317100
## 125 extreme cold 1312973000
## 185 frost freeze 1094186000
Fatalities and injuries are often related, and both are readouts of the health effects on the population, so it is pertinent to examine the relationship between these two types of health impacts.
library(ggplot2)
qplot(log(injuries), log(fatalities), data = casualties,
      xlab = "injuries (log scale)", ylab = "fatalities (log scale)",
      main = "injuries and fatalities from the same type of event")
As this figure illustrates, types of weather events that cause many injuries also tend to cause many fatalities. The relationship is not universal, however: there are many examples of a type either causing fatalities without injuries (e.g., avalanches), or injuries without fatalities (e.g., small hail). Nevertheless, it is safe to conclude that the most fatal and most injurious types of weather events are similar.
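One caveat with the log-log plot is that event types with zero injuries or zero fatalities map to log(0), which is -Inf in R, so they cannot be drawn at a finite position. The sketch below shows the issue on made-up counts and one possible workaround, log1p(), which computes log(1 + x). This is an assumption about how the plot could be adapted, not what the figure above does.

```r
# Toy counts (invented): one type with no injuries, one with no fatalities
toy_counts <- data.frame(EVTYPE = c("a", "b", "c"),
                         injuries = c(0, 10, 100),
                         fatalities = c(5, 0, 20))

# Plain log turns zero counts into -Inf
raw_log <- log(toy_counts$injuries)

# log1p keeps zero counts finite (log1p(0) is 0)
toy_counts$log_injuries <- log1p(toy_counts$injuries)
toy_counts$log_fatalities <- log1p(toy_counts$fatalities)
```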
Based on these data, tornadoes are the most dangerous weather events. They are responsible for 5633 deaths and 91346 injuries, by far the most of any type of weather event.
Other dangerous types of weather events include excessive heat, responsible for 1903 deaths (second most) and 6525 injuries (fourth most), and flash floods, responsible for 978 deaths (third most) and 1777 injuries (eighth most).
The property damage inflicted by adverse weather events is assessed in the US dollars required to repair the damage, and crop damage is assessed in the value of lost product. Both of these may be skewed towards recent events due to inflation.
It is important to realize that weather events impacting crops may have little impact on property damage, but property-damaging events may also adversely impact crops.
ggplot(data = prop_dmg_events,
       aes(x = reorder(EVTYPE, prop_dmg), y = prop_dmg, fill = prop_dmg)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  xlab("event type") +
  ylab("$ of property damage") +
  theme(legend.position = "none")
As this figure illustrates, property was most damaged by flash floods (>$68 trillion) and thunderstorm winds (>$20 trillion), which may be derived from the same events. Most of the types of weather events that caused the most property damage between 1950 and 2011 involved water in some form (flash flood, hail, flood, flooding, storm surge, or heavy snow).
ggplot(data = crop_dmg_events,
       aes(x = reorder(EVTYPE, crop_dmg), y = crop_dmg, fill = crop_dmg)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  xlab("Event type") +
  ylab("$ of crop damage") +
  theme(legend.position = "none")
Between 1950 and 2011, drought, a lack of water, caused the most crop damage (more than $13.9 billion). Excess water also had severe adverse effects on crops, with flood (~$5.7 billion) and river flood (~$5 billion) ranking second and third.