Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

The analysis of the weather event database revealed that tornadoes are the most fatal and injurious weather events, and that excessive heat is the second most fatal. Measured in US dollars, the most damaging weather events for property were those involving water and wind, while for crops the most damaging was drought, followed by water and wind events.

Data Processing

The data for this assignment come in the form of a comma-separated-value (CSV) file, compressed with the bzip2 algorithm to reduce its size. The file was downloaded from http://www.ncdc.noaa.gov/stormevents/ftp.jsp and is provided by the National Climatic Data Center.

There is also some documentation of the database available at (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf).

(From Coursera: The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.)

The first step is to read the compressed file into a data frame. read.csv can read the .bz2 file directly, so no separate decompression step is needed.

data <- read.csv("repdata_data_StormData.csv.bz2")
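The read above assumes the compressed file is already in the working directory. A minimal sketch of the fetch step follows. Both the URL (taken to be the course-hosted copy of the data) and the helper name fetch_if_missing are assumptions, not something stated in this report.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is missing,
# so repeated runs of the analysis do not re-download the file.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Assumed location of the data file; adjust if the file is hosted elsewhere.
storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata_data_StormData.csv.bz2"

# Left commented out so that the sketch does not trigger a large download:
# fetch_if_missing(storm_url, storm_file)
# data <- read.csv(storm_file)
```

Using binary mode ("wb") matters on Windows, where a text-mode download can corrupt a compressed file.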

Before the data are analyzed, some pre-processing is required. Event types don’t have a specific format. For instance, there are events with types Frost/Freeze, FROST/FREEZE and FROST\\FREEZE that refer to the same type of event. Additionally, some of the formatting of the data is inconsistent and must be standardized.

# translate all letters to lowercase
event_types <- tolower(data$EVTYPE)
# replace each blank or punctuation character with a space
# (the + sits inside the brackets, so it is a literal plus sign, not a quantifier)
event_types <- gsub("[[:blank:][:punct:]+]", " ", event_types)
length(unique(event_types))
## [1] 874
# update the data frame
data$EVTYPE <- event_types
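As a quick check of the cleaning step, the sketch below applies the same two transformations to the three spellings mentioned earlier and confirms that they collapse to a single value. The input vector is illustrative and is not read from the data file.

```r
# Three spellings of the same event type (illustrative input; the last one
# contains a single backslash, which must be escaped in R source)
raw_types <- c("Frost/Freeze", "FROST/FREEZE", "FROST\\FREEZE")

# Same cleaning as above: lowercase, then blanks and punctuation to spaces
cleaned <- tolower(raw_types)
cleaned <- gsub("[[:blank:][:punct:]+]", " ", cleaned)

unique(cleaned)
```

All three inputs become "frost freeze", so they are counted as one event type in the aggregations that follow.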

No further pre-processing of the data was conducted, although the event types field could later be re-processed to merge similar events, for example ‘tstm wind’ and ‘thunderstorm wind’.

All further analysis was conducted on cleaned event types.
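The merging of similar event types mentioned above was not part of this analysis. As a rough sketch of how it could be approached, the hypothetical helper below expands the 'tstm' abbreviation and folds the plural 'winds' into 'wind'. The name merge_event_types and its two substitution rules are assumptions for illustration only. A real merge would need many more rules, checked against the NOAA documentation.

```r
# Hypothetical helper: map a few known spelling variants onto one label.
merge_event_types <- function(x) {
  # expand the abbreviation "tstm" when it appears as a whole word
  x <- gsub("\\btstm\\b", "thunderstorm", x, perl = TRUE)
  # fold the plural "winds" into the singular form
  x <- gsub("\\bwinds\\b", "wind", x, perl = TRUE)
  # collapse repeated spaces and trim the ends
  x <- gsub(" +", " ", x)
  trimws(x)
}

merge_event_types(c("tstm wind", "thunderstorm winds", "thunderstorm wind"))
```

Applying such a helper before aggregation would change the totals and rankings reported below, which are based on the unmerged event types.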

Population Health is Affected by Dangerous Weather Events

To find the impact of weather events on population health, the fatalities and injuries recorded for each event are aggregated by event type.

library(plyr)
dim(data)
## [1] 902297     37
nmissing <- function(x) sum(is.na(x))
colwise(nmissing)(data)
##   STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1       0        0        0         0      0          0     0      0
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0       0          0        0        0          0     902297
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH      F MAG FATALITIES INJURIES
## 1         0       0          0      0     0 843563   0          0        0
##   PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE
## 1       0          0       0          0   0          0         0       47
##   LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1         0         40          0       0      0
# total fatalities and injuries for each event type
casualties <- ddply(data, .(EVTYPE), summarize,
                    fatalities = sum(FATALITIES),
                    injuries = sum(INJURIES))
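For readers without plyr, the same kind of aggregation can be sketched in base R. The example below uses a small made-up data frame (the event names and counts are hypothetical), keeps only rows with at least one casualty, and sums both columns by event type with aggregate.

```r
# Made-up records; column names follow the storm data, values do not
toy <- data.frame(
  EVTYPE     = c("type a", "type a", "type b", "type c"),
  FATALITIES = c(1, 2, 0, 0),
  INJURIES   = c(10, 0, 5, 0)
)

# Keep only events that were fatal or injurious
harmful <- toy[toy$FATALITIES > 0 | toy$INJURIES > 0, ]

# Sum fatalities and injuries within each event type
toy_casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                            data = harmful, FUN = sum)
toy_casualties
```

Dropping the zero-casualty rows first does not change the sums. It only removes event types, such as "type c" here, that would otherwise appear with totals of zero.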

The 10 most injurious types of weather events are:

injury_events <- head(casualties[order(casualties$injuries, decreasing = T),], 10)
injury_events[, c("EVTYPE", "injuries")]
##                EVTYPE injuries
## 741           tornado    91346
## 762         tstm wind     6957
## 154             flood     6789
## 116    excessive heat     6525
## 410         lightning     5230
## 240              heat     2100
## 382         ice storm     1975
## 138       flash flood     1777
## 671 thunderstorm wind     1488
## 209              hail     1361

and the 10 most fatal types of weather events are:

fatal_events <- head(casualties[order(casualties$fatalities, decreasing = T),], 10)
fatal_events[, c("EVTYPE", "fatalities")]
##             EVTYPE fatalities
## 741        tornado       5633
## 116 excessive heat       1903
## 138    flash flood        978
## 240           heat        937
## 410      lightning        816
## 762      tstm wind        504
## 154          flood        470
## 515    rip current        368
## 314      high wind        248
## 19       avalanche        224

Weather events also have economic impacts

The economic impacts of weather events are assessed using the available property and crop damage estimates. In the raw data, property damage is represented by two fields: a number, PROPDMG, in dollars, and an exponent, PROPDMGEXP. Similarly, crop damage is represented by the two fields CROPDMG and CROPDMGEXP. The first step in the analysis is to calculate the property and crop damage for each event as the number multiplied by 10 raised to the exponent.
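To make the two-field encoding concrete, here is a small vectorised sketch of the decoding. The helper name decode_exponent and the sample values are assumptions for illustration. Unlike a strict decoder, it silently maps any unrecognised code to an exponent of 0 rather than raising an error.

```r
# Hypothetical vectorised decoder for the exponent codes
decode_exponent <- function(e) {
  e <- tolower(as.character(e))            # also safe if the column is a factor
  letter_codes <- c(h = 2, k = 3, m = 6, b = 9)
  out <- unname(letter_codes[e])           # NA for anything that is not h/k/m/b
  is_digit <- grepl("^[0-9]$", e)
  out[is_digit] <- as.numeric(e[is_digit]) # a single digit is used as the exponent
  out[is.na(out)] <- 0                     # '', '-', '?', '+' and unknown codes
  out
}

# Illustrative values, not taken from the data file
dmg     <- c(25, 2.5, 1, 10)
dmg_exp <- c("K", "M", "B", "")
dmg * 10^decode_exponent(dmg_exp)
```

With these inputs, 25 with code "K" decodes to 25,000 dollars, 2.5 with "M" to 2.5 million, 1 with "B" to 1 billion, and 10 with an empty code stays at 10.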

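exp_transform <- function(e) {
  # coerce to character first: if the column was read as a factor,
  # as.numeric() would return the level code instead of the digit
  e <- as.character(e)
  # h -> hundred, k -> thousand, m -> million, b -> billion
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (grepl("^[0-9]+$", e)) # if a digit, use it as the exponent
    return(as.numeric(e))
  else if (e %in% c('', '-', '?', '+')) # no usable exponent: multiply by 1
    return(0)
  else {
    stop("Invalid exponent value.")
  }
}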

prop_dmg_exp <- sapply(data$PROPDMGEXP, FUN=exp_transform)
data$prop_dmg <- data$PROPDMG * (10 ** prop_dmg_exp)
crop_dmg_exp <- sapply(data$CROPDMGEXP, FUN=exp_transform)
data$crop_dmg <- data$CROPDMG * (10 ** crop_dmg_exp)

library(plyr)
eco_loss <- ddply(data, .(EVTYPE), summarize,
                   prop_dmg = sum(prop_dmg),
                   crop_dmg = sum(crop_dmg))

# filter out events that caused no economic loss
eco_loss <- eco_loss[(eco_loss$prop_dmg > 0 | eco_loss$crop_dmg > 0), ]
prop_dmg_events <- head(eco_loss[order(eco_loss$prop_dmg, decreasing = T), ], 10)
crop_dmg_events <- head(eco_loss[order(eco_loss$crop_dmg, decreasing = T), ], 10)

The 10 types of weather event that have caused the most property damage are:

# events that caused most property damage (in dollars)
prop_dmg_events[, c("EVTYPE", "prop_dmg")]
##                 EVTYPE     prop_dmg
## 138        flash flood 6.820237e+13
## 697 thunderstorm winds 2.086532e+13
## 741            tornado 1.078951e+12
## 209               hail 3.157558e+11
## 410          lightning 1.729433e+11
## 154              flood 1.446577e+11
## 366  hurricane typhoon 6.930584e+10
## 166           flooding 5.920826e+10
## 585        storm surge 4.332354e+10
## 270         heavy snow 1.793259e+10

and the 10 types of weather events that have caused the most damage to crops are:

# events that caused most crop damage
crop_dmg_events[, c("EVTYPE", "crop_dmg")]
##                EVTYPE    crop_dmg
## 84            drought 13972566000
## 154             flood  5661968450
## 519       river flood  5029459000
## 382         ice storm  5022113500
## 209              hail  3025974480
## 357         hurricane  2741910000
## 366 hurricane typhoon  2607872800
## 138       flash flood  1421317100
## 125      extreme cold  1312973000
## 185      frost freeze  1094186000

Results

The danger of adverse weather events

Fatalities and injuries are often related, and both are measures of the health effects on the population, so it is pertinent to examine the relationship between these two types of health impact.

library(ggplot2)
qplot(log(injuries),log(fatalities),data=casualties,
      xlab="injuries (log scale)",ylab="fatalities (log scale)",
      main="injuries and fatalities from the same type of event")

As illustrated by this figure, types of weather event that cause many injuries also tend to cause many fatalities. The relationship is not universal, however: some event types cause fatalities without injuries (e.g., avalanches), and others cause injuries without fatalities (e.g., small hail). Even so, it is safe to conclude that the most fatal and the most injurious types of weather event are largely the same.
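One caveat when plotting counts on a log scale is that log(0) is -Inf, so event types with zero injuries or zero fatalities cannot be placed on the axes. The sketch below uses made-up counts to show how log1p, which computes log(1 + x), keeps such rows finite. Switching to log1p is offered as a suggestion and is not what the figure above uses.

```r
# Made-up counts: one type with both outcomes, one with no injuries,
# one with no fatalities
toy <- data.frame(
  EVTYPE     = c("type a", "type b", "type c"),
  injuries   = c(120, 0, 15),
  fatalities = c(30, 8, 0)
)

# Rows that have a finite position under each transformation
plottable_log   <- is.finite(log(toy$injuries))   & is.finite(log(toy$fatalities))
plottable_log1p <- is.finite(log1p(toy$injuries)) & is.finite(log1p(toy$fatalities))

sum(plottable_log)    # only "type a" survives the plain log
sum(plottable_log1p)  # all three rows remain plottable
```

The trade-off is that log1p compresses small counts slightly differently from log, so axis labels should state which transformation was applied.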

Based on these data, tornadoes are by far the most dangerous type of weather event, responsible for 5633 deaths and 91346 injuries.

Other dangerous types of weather event include excessive heat, responsible for 1903 deaths (second most) and 6525 injuries (fourth most), and flash floods, responsible for 978 deaths (third most) and 1777 injuries (eighth most).

The economic impacts of adverse weather events

The property damage inflicted by adverse weather events is assessed in the US dollars required to repair the damage, and crop damage is assessed as the dollar value of lost product. Both may be skewed towards recent events due to inflation.

It is important to realize that weather events impacting crops may have little impact on property damage, but property-damaging events may also adversely impact crops.

ggplot(data=prop_dmg_events,
       aes(x=reorder(EVTYPE, prop_dmg), y=(prop_dmg), fill=prop_dmg )) +
  geom_bar(stat="identity") +
  coord_flip() +
  xlab("event type") +
  ylab("$ of property damage") +
  theme(legend.position="none")

As this figure illustrates, property was most damaged by flash floods (>$68 trillion) and thunderstorm winds (>$20 trillion), which may be derived from the same events. Totals of this size are implausibly large for storm damage and may reflect how the exponent codes were decoded, so the dollar figures should be read with caution. Most of the types of weather event that caused the most property damage between 1950 and 2011 involved water in some form (flash flood, hail, flood, flooding, storm surge, or heavy snow).

ggplot(data=crop_dmg_events,
       aes(x=reorder(EVTYPE, crop_dmg), y=crop_dmg, fill=crop_dmg)) +
  geom_bar(stat="identity") +
  coord_flip() + 
  xlab("Event type") +
  ylab("$ of crop damage") + 
  theme(legend.position="none")

Between 1950 and 2011, drought, a lack of water, caused the most crop damage (more than $13.9 billion), while excess water in the form of floods (~$5.7 billion) and river floods (~$5.0 billion) also had severe adverse effects on crops.