Severe Weather Events in the United States

Craig Lewis

Assignment 2 - Reproducible Research

Synopsis

This report addresses the human and monetary impacts of severe weather events in the United States. The data is described, in part, by the National Weather Service Storm Data documentation and the National Climatic Data Center Storm Events FAQ. The input data appears to contain errors, and the event types are coded inconsistently, which makes a thorough analysis problematic. To address this, only the top 10 event types in terms of impact are considered in this analysis.

Based on this view of the data, tornados are the most destructive events, both in terms of harm to population health (deaths and injuries) and in terms of monetary damages. Assuming that the differently coded thunderstorm wind entries refer to the same event type, thunderstorm winds come second for injuries and monetary damages.

Frankly the data is in such poor shape that any analysis seems specious. One might seek out other alternatives before publishing research in this area.

Data Processing

The file is read as CSV directly from the compressed (bz2) source file. The first processing step filters out records whose state code is not one of the 50 US states; the filter uses R's built-in state.abb vector, which does not include the District of Columbia. One challenge with this data is that the event types are quite messy: there are 985 different event types in the file, many of which appear to be in error. No attempt was made during input to clean the event types, as the author lacks the domain expertise to know which types could be combined.
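
As a small illustration of the state filter, the sketch below applies the same `%in% state.abb` test to a few made-up codes. The vector `codes` is hypothetical and not taken from the storm file. It also shows how "DC" could be added explicitly if the District of Columbia were meant to be kept, since `state.abb` holds only the 50 state abbreviations.

```r
# Hypothetical sample of STATE codes, including some non-state entries
codes <- c("TX", "DC", "PR", "AL", "GU")

# state.abb holds the 50 state abbreviations only
keep_states <- codes %in% state.abb

# Adding "DC" explicitly would keep the District of Columbia as well
keep_states_dc <- codes %in% c(state.abb, "DC")

print(codes[keep_states])
print(codes[keep_states_dc])
```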

For the analysis, only the event types with the 10 largest totals for each of the following measures are considered (the corresponding variable from the input file follows each description):

  1. Deaths - FATALITIES
  2. Injuries - INJURIES
  3. Property damage - PROPDMG (summed as recorded, without applying the PROPDMGEXP magnitude code)

To obtain these, each variable is summed over each event type. The totals are then sorted in decreasing order, separately for each of the three variables, and the top 10 are selected.

    # Load libraries
    library(ggplot2)

    # Read the CSV directly from the bz2-compressed source file
    storm <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))

    # List of event types - these are quite messy
    # evtypes <- unique(storm$EVTYPE)

    # Keep only records whose STATE is one of the 50 US state abbreviations
    us_storm <- storm[storm$STATE %in% state.abb, ]

    # Total fatalities per event type; keep the 10 largest
    death_by_evt <- tapply(us_storm$FATALITIES, us_storm$EVTYPE, sum)
    death_by_evt <- sort(death_by_evt, decreasing = TRUE)
    fbe <- data.frame(events = names(death_by_evt)[1:10], fatalities = death_by_evt[1:10])

    # Total injuries per event type; keep the 10 largest
    inj_by_evt <- tapply(us_storm$INJURIES, us_storm$EVTYPE, sum)
    inj_by_evt <- sort(inj_by_evt, decreasing = TRUE)
    ibe <- data.frame(events = names(inj_by_evt)[1:10], inj = inj_by_evt[1:10])

    # Total PROPDMG per event type; keep the 10 largest
    dmg_by_evt <- tapply(us_storm$PROPDMG, us_storm$EVTYPE, sum)
    dmg_by_evt <- sort(dmg_by_evt, decreasing = TRUE)
    dbe <- data.frame(events = names(dmg_by_evt)[1:10], damage = dmg_by_evt[1:10])
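
The sum, sort and select steps are the same for all three measures, so they could be wrapped in one helper. The sketch below is one possible way to do that on a toy data frame. The function name `top_events`, its arguments and the `toy` data are hypothetical and are not part of the original analysis.

```r
# Hypothetical toy data standing in for the storm table
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(5, 2, 3, 1, 4, 0)
)

# Sum a measure by event type, sort descending, keep the n largest totals
top_events <- function(data, measure, n = 10) {
  totals <- tapply(data[[measure]], data$EVTYPE, sum)
  totals <- sort(totals, decreasing = TRUE)
  totals <- head(totals, n)
  data.frame(events = names(totals), total = as.vector(totals),
             row.names = NULL)
}

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

With the real data, a call such as `top_events(us_storm, "INJURIES")` would presumably play the role of the `ibe` table, with the value column named `total` rather than `inj`.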

Results

Harm to Population Health

To determine the harm to the US population, two factors are considered.

  1. Number of fatalities per event type (recall only the first 10 events are selected).

  2. Number of injuries per event type (top 10 events)

In both cases tornados are the most harmful events.

In terms of deaths due to these types of events, excessive heat is second, followed by heat. The remainder can be read off of the graph.

Fatalities by Event

    # Total fatalities for the top 10 event types
    p <- ggplot(fbe, aes(x = events, y = fatalities)) +
        geom_col() +
        labs(x = "Event Type", y = "Number of Deaths", title = "Deaths from Events")
    p + theme(axis.text = element_text(size = 9), axis.text.x = element_text(angle = 90))
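
Reading the ranking off a bar chart is easier when the bars are drawn in order of height. ggplot2 orders a text x axis alphabetically by default. The sketch below shows one way to order the bars by value using `reorder()`. The `demo` data frame holds made-up numbers, not results from the storm data.

```r
library(ggplot2)

# Hypothetical totals; not taken from the storm data
demo <- data.frame(
  events     = c("FLOOD", "HEAT", "TORNADO"),
  fatalities = c(10, 30, 50)
)

# reorder() sets the factor levels by descending fatalities,
# so the tallest bar is drawn first
demo$events <- reorder(demo$events, -demo$fatalities)

p <- ggplot(demo, aes(x = events, y = fatalities)) +
  geom_col() +
  labs(x = "Event Type", y = "Number of Deaths") +
  theme(axis.text.x = element_text(angle = 90))
print(p)
```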

Injuries by Event

In terms of injuries, the problems with the input data become apparent once we look past tornados. Examining the list of events (recall these are the top 10), one notes that both “TSTM WIND”, presumably thunderstorm wind, and “THUNDERSTORM WIND” appear. TSTM WIND on its own is already the second most injurious event type, so adding THUNDERSTORM WIND only reinforces the conclusion that thunderstorm wind ranks second for injuries. The remainder of the events can be read from the graph.

    # Total injuries for the top 10 event types
    p <- ggplot(ibe, aes(x = events, y = inj)) +
        geom_col() +
        labs(x = "Event Type", y = "Number of Injuries", title = "Injuries from Events")
    p + theme(axis.text = element_text(size = 9), axis.text.x = element_text(angle = 90))
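
If the thunderstorm wind labels are assumed to describe the same event type, they could be merged before summing. The sketch below shows one way to do that on made-up data. The `raw` data frame and the list of variant spellings are assumptions for illustration, not an official mapping, and this step was not part of the analysis above.

```r
# Hypothetical injury totals per raw event label
raw <- data.frame(
  EVTYPE   = c("TORNADO", "TSTM WIND", "THUNDERSTORM WIND",
               "thunderstorm winds", "FLOOD"),
  INJURIES = c(100, 40, 15, 5, 30)
)

# Normalize case and whitespace, then map assumed variants to one label
label <- toupper(trimws(raw$EVTYPE))
label <- sub("^TSTM WIND$", "THUNDERSTORM WIND", label)
label <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", label)

# Sum injuries by the merged label and sort in decreasing order
merged <- sort(tapply(raw$INJURIES, label, sum), decreasing = TRUE)
print(merged)
```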

Greatest Economic Consequences

Monetary Damages by Event

In terms of monetary damages, tornados are again the most significant event type. As with the injuries by event type described above, we see the challenges with thunderstorm winds (the TSTM WIND and THUNDERSTORM WINDS event types). Assuming these are in fact the same event, thunderstorm winds would again be the second largest event type in terms of monetary damages. The remaining event types can be read, in order, from the graph.

    # Total PROPDMG for the top 10 event types
    p <- ggplot(dbe, aes(x = events, y = damage)) +
        geom_col() +
        labs(x = "Event Type", y = "Monetary Damages", title = "Damage from Events")
    p + theme(axis.text = element_text(size = 9), axis.text.x = element_text(angle = 90))
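
PROPDMG on its own is only the numeric part of each damage estimate. In the Storm Data file a companion column, PROPDMGEXP, carries a magnitude code. The sketch below shows one way the pair might be converted to dollars, assuming the codes K, M and B stand for thousands, millions and billions. The `toy` data frame and the choice to treat any other code as a multiplier of 1 are assumptions of this sketch, and this conversion was not applied in the analysis above.

```r
# Hypothetical rows standing in for the storm table
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(2.5, 1, 500, 10),
  PROPDMGEXP = c("M", "B", "K", "")
)

# Lookup of magnitude codes; any other code is treated as 1 (an assumption)
mult <- c(K = 1e3, M = 1e6, B = 1e9)
row_mult <- unname(mult[toupper(as.character(toy$PROPDMGEXP))])
row_mult[is.na(row_mult)] <- 1

# Damage in dollars, then total per event type in decreasing order
toy$dollars <- toy$PROPDMG * row_mult
dmg <- sort(tapply(toy$dollars, toy$EVTYPE, sum), decreasing = TRUE)
print(dmg)
```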