Local severe weather events have nation-wide effects

U.S. National Oceanic and Atmospheric Administration (NOAA) publishes database which describes characteristics of major storms and weather events in the United States covering timespan from 1950 to 2011. Present report investigates the nation-wide economic consequences and public health effects of adverse weather events recorded in the database.

Overall,tornadoes are the cause of majority of weather-related deaths and injuries in addition to the significant economic impact. Economically, floods are the biggest cause of loss.

Typically weather events are relatively local even though their impact can be nation wide, depending on the area they occur. Therefore it is noted, that the preparedness for severe weather events should be based on information of local weather conditions.

Preparing the working environment

library(data.table)
library(ggplot2)
library(plyr)
library(maps)
setwd('/Users/majulass/Documents/2014/reproducible-research/peer-assesment-02')
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "storm-dataset.csv.bz2", method = "curl")
weather.df <- read.csv(bzfile("./storm-dataset.csv.bz2"))

Utility functions

# Process multipliers in exponent field.

process.multipliers <- function(multiplier,number){
  switch(tolower(as.character(multiplier)), 
  h={
    100 * number 
  },
  k={
    1000 * number
  },
  m={
   1000000 * number    
  },
  b={
    1000000000 * number
  },
  {
   number
  }
  )


  }

# Function for calculating total impact on public health
calculate.health.cost <- function(injuries,fatalities){
  injuries + fatalities
  }

# Utility function for creating Title Case strings.
# From: http://stackoverflow.com/a/15778614
topropper <- function(x) {
  # Makes Proper Capitalization out of a string or collection of strings. 
  sapply(x, function(strn)
   { s <- strsplit(strn, "\\s")[[1]]
       paste0(toupper(substring(s, 1,1)), 
             tolower(substring(s, 2)),
             collapse=" ")}, USE.NAMES=FALSE)
}

# Utility function for wrapping long titles in ggplot plots.
wrapper <- function(x, ...) paste(strwrap(x, ...), collapse = "\n")

Data processing

The original, unprocessed dataset contains 902297 rows. The dataset is prepared for analysis by renaming the variables following the principles of tidy data.

# Renaming the field based on information found at http://ire.org/nicar/database-library/databases/storm-events/
weather.names<-c(
  'state.code',
  'event.began.date',
  'event.began.time',
  'timezone',
  'county.code',
  'county.names',
  'state',
  'event.type',
  'event.start.point',
  'event.begin.azimuth',
  'event.start.location',
  'event.end.date',
  'event.end.time',
  'event.end.county',
  'event.coundn.end',
  'event.start.point',
  'event.end.azimuth',
  'event.end.location',
  'tornado.path.length',
  'tordano.path.width',
  'tornado.intensity.scale',
  'hail.magnitude',
  'fatalities',
  'injuries',
  'property.damage',
  'property.damage.exp',
  'crop.damage',
  'crop.damage.exp',
  'wfo.office.id',
  'event.us.state',
  'event.zones',
  'event.start.latitude',
  'event.start.longitude',
  'event.end.latitude',
  'event.end.longitude',
  'remarks',
  'ncdc.reference')

names(weather.df)<-weather.names

By observing the dataset visually it can be seen that event though the dataset is published by established govermental organization, the dataset is rather messy; compared to the codebook it contains non-standard and missing values which can distort the result of analyses based on the dataset. Therefore a strategy for cleaning the dataset is needed. For serious academic research or policy-making, the chosen strategy should try to minimize the loss of observations resulted from cleaning. Eg. observations with non-standard or unknown values should be imputed with replacement values.

For the purposes of this assingment more straightforward strategy is sufficient – the ambiquous observations are removed and only the major errors are corrected.

The following table lists top 50 weather event codes in the dataset:

event.type.counts<-as.data.frame(sort(table(weather.df$event.type),decreasing=TRUE))[c(1:50),]
event.type.counts <- as.data.frame(event.type.counts)
kable(as.data.frame(event.type.counts), 
      format="html",
      caption="Table 1: Top 50 weather events in unprocessed NOAA data.")
Table 1: Top 50 weather events in unprocessed NOAA data.
event.type.counts
HAIL 288661
TSTM WIND 219940
THUNDERSTORM WIND 82563
TORNADO 60652
FLASH FLOOD 54277
FLOOD 25326
THUNDERSTORM WINDS 20843
HIGH WIND 20212
LIGHTNING 15754
HEAVY SNOW 15708
HEAVY RAIN 11723
WINTER STORM 11433
WINTER WEATHER 7026
FUNNEL CLOUD 6839
MARINE TSTM WIND 6175
MARINE THUNDERSTORM WIND 5812
WATERSPOUT 3796
STRONG WIND 3566
URBAN/SML STREAM FLD 3392
WILDFIRE 2761
BLIZZARD 2719
DROUGHT 2488
ICE STORM 2006
EXCESSIVE HEAT 1678
HIGH WINDS 1533
WILD/FOREST FIRE 1457
FROST/FREEZE 1342
DENSE FOG 1293
WINTER WEATHER/MIX 1104
TSTM WIND/HAIL 1028
EXTREME COLD/WIND CHILL 1002
HEAT 767
HIGH SURF 725
TROPICAL STORM 690
FLASH FLOODING 682
EXTREME COLD 655
COASTAL FLOOD 650
LAKE-EFFECT SNOW 636
FLOOD/FLASH FLOOD 624
LANDSLIDE 600
SNOW 587
COLD/WIND CHILL 539
FOG 538
RIP CURRENT 470
MARINE HAIL 442
DUST STORM 427
AVALANCHE 386
WIND 340
RIP CURRENTS 304
STORM SURGE 261

Based on the table, at least non-standard event codes related to thunderstorm winds (TSTM WIND, THUNDERSTORM WINDS) should be mapped to official values given in National Weather Service Instruction (NWSI) 10-1605, hereafter NWSI-10-1065. For the purposes of this assingment the effect of mapping other values is negligible taking into account the total number of observations with valid event type code.

weather.df$event.type <- mapvalues(weather.df$event.type, from = c("TSTM WIND", "THUNDERSTORM WINDS"), to = c("THUNDERSTORM WIND", "THUNDERSTORM WIND"))

Next, the dataset is filtered using permitted weather event types and exponent values.

weather.dt <- data.table(weather.df)

permitted.event.types <- c(
  "ASTRONOMICAL LOW TIDE",
    "AVALANCHE",
    "BLIZZARD",
    "COASTAL FLOOD",
    "COLD/WIND CHILL",
    "DEBRIS FLOW",
    "DENSE FOG",
    "DENSE SMOKE",
    "DROUGHT",
    "DUST DEVIL",
    "DUST STORM",
    "EXCESSIVE HEAT",
    "EXTREME COLD/WIND CHILL",
    "FLASH FLOOD",
    "FLOOD",
    "FROST/FREEZE",
    "FUNNEL CLOUD",
    "FREEZING FOG",
    "HAIL",
    "HEAT",
    "HEAVY RAIN",
    "HEAVY SNOW",
    "HIGH SURF",
    "HIGH WIND",
    "HURRICANE (TYPHOON)",
    "ICE STORM",
    "LAKE-EFFECT SNOW",
    "LAKESHORE FLOOD",
    "LIGHTNING",
    "MARINE HAIL",
    "MARINE HIGH WIND",
    "MARINE STRONG WIND",
    "MARINE THUNDERSTORM WIND",
    "RIP CURRENT",
    "SEICHE",
    "SLEET",
    "STORM SURGE/TIDE",
    "STRONG WIND",
    "THUNDERSTORM WIND",
    "TORNADO",
    "TROPICAL DEPRESSION",
    "TROPICAL STORM",
    "TSUNAMI",
    "VOLCANIC ASH",
    "WATERSPOUT",
    "WILDFIRE",
    "WINTER STORM",
    "WINTER WEATHER")

In the original data, the dollar impact values of weather events are given in a format in which the numeric value is given separatery of the exponent of the number. In the NWSI-10-1065 the permitted exponents are:

Only observations with permitted exponent values are included in the analysis, to remove potentially ambiquous observations.

multipliers=c('h','k','m','b','')
weather.dt$property.damage.exp <- tolower(weather.dt$property.damage.exp)
weather.dt$crop.damage.exp <- tolower(weather.dt$crop.damage.exp)
weather.dt.subset <- weather.dt[weather.dt$event.type %in% permitted.event.types,]
weather.dt.subset.subset <- weather.dt.subset[weather.dt.subset$property.damage.exp %in% multipliers & weather.dt.subset$crop.damage.exp %in% multipliers ,]

The original dataset had 26239 observations with ambiquous weather event type and 341 observations with and exponent value with unknown meaning. When rows with ambiquous values are removed, the resulting final dataset used in the analysis contains 875737 rows.

In the last data processing step, the exponent values are transformed to actual dollar amounts. After the last processing step, the dataset is ready for analysis, in which the following variables are used:

In later steps, derived variable state.name is included to the analysis to enable geographic investigation of the data.

weather.dt.subset.subset$property.damage.dollars <- mapply(
  process.multipliers, 
  weather.dt.subset.subset$property.damage.exp,
  weather.dt.subset.subset$property.damage)

weather.dt.subset.subset$crop.damage.dollars <- mapply(
  process.multipliers,
  weather.dt.subset.subset$crop.damage.exp,
  weather.dt.subset.subset$crop.damage)

weather.dt.subset.subset$total.damage.dollars <- weather.dt.subset.subset$property.damage.dollars + 
  weather.dt.subset.subset$crop.damage.dollars

weather.dt.subset.subset$population.health.cost <-mapply(
  calculate.health.cost,
  weather.dt.subset.subset$injuries,
  weather.dt.subset.subset$fatalities)

Calculating the total impact to the US economy.

# Prepare dataset for summing over event type

setkey(weather.dt.subset.subset, event.type)

## Group damages by event type, US wide.

# Damages to economy

total.us.crop.damages <- weather.dt.subset.subset[,sum(crop.damage.dollars),by = list(event.type)]
total.us.crop.damages$damage.type <- 'Crop'

total.us.property.damages <- weather.dt.subset.subset[,sum(property.damage.dollars),by = list(event.type)]
total.us.property.damages$damage.type <- 'Property'

total.us.economic.damages <- rbind(total.us.crop.damages,total.us.property.damages)
total.us.economic.damages <- total.us.economic.damages[order(total.us.economic.damages$event.type),]
total.us.economic.damages$event.name <- lapply(total.us.economic.damages$event.type,as.character)
total.us.economic.damages$event.name <- topropper(total.us.economic.damages$event.name)

Calculating the total damage to public health.

# Damages to public health

total.us.fatalities <- weather.dt.subset.subset[,sum(fatalities),by = list(event.type)]
total.us.fatalities$damage.type <- 'Fatalities'

total.us.injuries <- weather.dt.subset.subset[,sum(injuries),by = list(event.type)]
total.us.injuries$damage.type <- 'Injuries'

total.us.public.health.effect <- rbind(total.us.fatalities,total.us.injuries)

total.us.public.health.effect <- total.us.public.health.effect[order(total.us.public.health.effect$event.type),]
total.us.public.health.effect$event.name <- lapply(total.us.public.health.effect$event.type,as.character)
total.us.public.health.effect$event.name <- topropper(total.us.public.health.effect$event.name)

Results

Nation-wide effect on public health

Overall, tornadoes are the cause of majority of weather-related deaths and injuries (Figure 1). Other significant causes are thunderstorm winds, flood and extensive heat, but in relation to the effect of tornadoes, their scale is minuscule.

dollar.impact.labels<- c('<1M','50B','100B')
dollar.impact.breaks<- c(1000000,50000000000,100000000000)


ggplot(
    data = total.us.public.health.effect,
    aes(
        x = event.name,
        y = V1,
        group = damage.type,
        fill = damage.type)) +       
    geom_bar(stat="identity") + 
    xlab("Event") +
    ylab("Number of casualties") +
    ggtitle(wrapper("Figure 1. Casualties of weather-related events from 1950 to 2011.", width=80)) +
    guides(fill=guide_legend(title="Category of casualty")) +
    coord_flip() +
    theme_minimal()

plot of chunk total-public-health-effect

Total impact on US economy

Nation-wide, floods and tornadoes have the greatest impact on the economy (Figure 2). When damage to the agriculture is inspected separately, droughts are the number one cause of economic loss.

ggplot(
    data = total.us.economic.damages,
    aes(
        x = event.name,
        y = V1,
        group = damage.type,
        fill = damage.type)) +       
    geom_bar(stat="identity") + 
    scale_y_continuous(labels=dollar.impact.labels, breaks=dollar.impact.breaks) + 
    xlab("Event") +
    ylab("Economic impact") +
    ggtitle(wrapper("Figure 2. Total economic damage of weather-related events from 1950 to 2011.", width=80)) +
    guides(fill=guide_legend(title="Category of damage")) +
    coord_flip() +
    theme_minimal()

plot of chunk total-economic-damage

Geographic distribution of weather event impact

Even though the nation-wide aggregation of the weather event statistics gives a good overview of the total economic and public health impact of weather events, it should be noted that for preparing for severe weather events, more localized information is needed.

As the geographic area of the US is vast, the possible weather conditions vary highly – there's no such thing as the weather of the USA (Tables 2 and 3). Nation-wide aggregation also hides the fact that the weather events can be extremely local; for example, floods typically affect relatively small areas but can cause serious damage. Also, tornadoes which are the major cause of deaths and injuries related to weather events nation-wide, are occuring typically in particular geographical area, known as Tornado Alley.

## Derive state names from the more detailed area descriptions.

weather.dt.subset.subset$state.name <- NA
weather.dt.subset.subset$state.name <- lapply(weather.dt.subset.subset$event.us.state,as.character)
weather.dt.subset.subset$state.name <- sub(',.*','',weather.dt.subset.subset$state.name)


weather.dt.subset.geographic <- weather.dt.subset.subset[weather.dt.subset.subset$state.name !="",]


setkey(weather.dt.subset.subset, event.type, event.us.state)

## State-part level damages

area.property.damages<-weather.dt.subset.geographic[,sum(total.damage.dollars),by = list(event.us.state,event.type,state.name)]
area.property.damages <- area.property.damages[area.property.damages$V1>1000000,]
area.property.damages<-area.property.damages[order(area.property.damages$event.us.state, -area.property.damages$V1), ]

# Detailed look on state level
new.hampshire.property.damages <- area.property.damages[area.property.damages$state.name == "NEW HAMPSHIRE",-3, with=FALSE]
new.hampshire.property.damages$event.name <- lapply(new.hampshire.property.damages$event.type,as.character)
new.hampshire.property.damages$event.name <- topropper(new.hampshire.property.damages$event.name)
new.hampshire.property.damages <- new.hampshire.property.damages[,-2, with=FALSE]
new.hampshire.property.damages <- new.hampshire.property.damages[,c(1,3,2),with=FALSE]

montana.property.damages <- area.property.damages[area.property.damages$state.name == "MONTANA",-3, with=FALSE]
montana.property.damages$event.name <- lapply(montana.property.damages$event.type,as.character)
montana.property.damages$event.name <- topropper(montana.property.damages$event.name)
montana.property.damages <- montana.property.damages[,-2, with=FALSE]
montana.property.damages <- montana.property.damages[,c(1,3,2),with=FALSE]

setnames(new.hampshire.property.damages,
         c('event.us.state','event.name','V1'),
         c('US state','Event','Economic impact (in USD)')
         )
setnames(montana.property.damages,
         c('event.us.state','event.name','V1'),
         c('US state','Event','Economic impact (in USD)')
         )
kable(montana.property.damages,
      format = "html",
      caption="Table 2: Economic effect of weather-related events in Montana")
Table 2: Economic effect of weather-related events in Montana
US state Event Economic impact (in USD)
MONTANA, Central Hail 60056600
MONTANA, Central Tornado 4301000
MONTANA, Central Flood 3218400
MONTANA, Central Winter Storm 3215000
MONTANA, Central High Wind 2165000
MONTANA, East Hail 43283100
MONTANA, East Thunderstorm Wind 22990900
MONTANA, East Wildfire 4075500
MONTANA, East Flash Flood 3447000
MONTANA, East Winter Storm 3310000
MONTANA, East Tornado 1312000
MONTANA, South Tornado 63094000
MONTANA, South Flood 20430000
MONTANA, South Hail 16606000
MONTANA, South Flash Flood 4030000
MONTANA, South Winter Storm 3000000
MONTANA, South Wildfire 1700000
MONTANA, South Blizzard 1500000
MONTANA, South Heavy Snow 1214000
MONTANA, West Flash Flood 7048600
MONTANA, West Flood 3061000
MONTANA, West Thunderstorm Wind 2864000
MONTANA, West Hail 2206000
MONTANA, West High Wind 1252800
MONTANA, West Heavy Snow 1042000
kable(new.hampshire.property.damages,
      format = "html",
      caption="Table 3: Economic effect of weather-related events in New Hampshire")
Table 3: Economic effect of weather-related events in New Hampshire
US state Event Economic impact (in USD)
NEW HAMPSHIRE, North and Central Flood 57595270
NEW HAMPSHIRE, North and Central Ice Storm 27629000
NEW HAMPSHIRE, North and Central Flash Flood 21868000
NEW HAMPSHIRE, North and Central Lightning 8895000
NEW HAMPSHIRE, North and Central Coastal Flood 5304500
NEW HAMPSHIRE, Southern Ice Storm 37305000
NEW HAMPSHIRE, Southern Flash Flood 14975000
NEW HAMPSHIRE, Southern Heavy Snow 7433000
NEW HAMPSHIRE, Southern High Wind 4645000
NEW HAMPSHIRE, Southern Flood 3918000
NEW HAMPSHIRE, Southern Thunderstorm Wind 1551000
NEW HAMPSHIRE, Southern Lightning 1131500
## Group health costs damages by state, statewide
statewide.area.health.cost <- weather.dt.subset.subset[,sum(population.health.cost),by = list(state.name,event.type)]
statewide.area.health.cost <- statewide.area.health.cost[statewide.area.health.cost$V1>10,]
statewide.area.health.cost <- statewide.area.health.cost[order(statewide.area.health.cost$state.name, -statewide.area.health.cost$V1), ]
statewide.area.health.cost$cost.breaks<-cut(statewide.area.health.cost$V1, 
                      breaks=c(0, 50, 250, 500, 1000, 2000, 10000), 
                      labels=c("<50", "50-250", "250-500", "500-1000", "1000-2000", "2000<"),
                      include.lowest=TRUE)


tornado.health.cost.state <- weather.dt.subset.subset[weather.dt.subset.subset$event.type=="TORNADO",sum(population.health.cost),by = list(state.name)]

tornado.health.cost.state$cost.breaks<-cut(tornado.health.cost.state$V1, 
                      breaks=c(0, 50, 250, 500, 1000, 2000, 10000), 
                      labels=c("<50", "50-250", "250-500", "500-1000", "1000-2000", "2000<"),
                      include.lowest=TRUE)


setnames(tornado.health.cost.state, 'state.name','region')
tornado.health.cost.state$region<-tolower(tornado.health.cost.state$region)

all_states <- map_data("state")
map.state <- data.table(map_data('state'))
setkey(map.state,region)


tornado.health.map.df <- merge(map.state,tornado.health.cost.state, by="region")
ggplot(tornado.health.map.df, aes(x=long, y=lat, group=group, fill=cost.breaks)) + 
  geom_polygon() +
  coord_map() +
  scale_fill_brewer(type="seq",palette="Reds") +
  labs(
    fill = "Number of casualties", 
    title = "Figure 3. Geographic distribution of tornado casualties from 1950 to 2011",
    x="",
    y="") + 
  scale_y_continuous(breaks=c()) +
  scale_x_continuous(breaks=c()) +
  theme(panel.border =  element_blank()) +
  theme_minimal()

plot of chunk tornado-geographic-area-impact