SYNOPSIS

In this project I explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

I found that tornadoes caused the most fatalities and injuries. They also caused the most economic damage.

In the next sections, I discuss the data analysis process, including data processing and the results obtained. All code is included for reproducibility. In the last section, I take a closer look at tornadoes, with plots of their effects over the years.

DATA

The NOAA storm events database can be found at: http://www.ncdc.noaa.gov/stormevents/ftp.jsp. However, for this project, I used the dataset provided for the Reproducible Research class on Coursera. This dataset includes data only up to November 2011.

DATA PROCESSING

As a first step in processing the data, I downloaded the dataset and created a summary table of the effects of the different storm event types. I quantified “most harmful” as the sum of fatalities and injuries per event type. Similarly, I quantified “greatest economic consequences” as the sum of property damage and crop damage.
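The two scores can be illustrated on a handful of made-up records. This sketch uses base R aggregate() rather than plyr's ddply(), and the records data frame and its values are hypothetical:

```r
# Hypothetical toy records standing in for the storm data (made-up numbers).
records <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 0, 1),
  INJURIES   = c(10, 3, 1, 0),
  PROPDMG    = c(250, 40, 5, 100),
  CROPDMG    = c(0, 0, 20, 30)
)

# Sum each measure per event type (a base R counterpart of the ddply call).
byType <- aggregate(cbind(FATALITIES, INJURIES, PROPDMG, CROPDMG) ~ EVTYPE,
                    data = records, FUN = sum)

# Combined scores: harm = fatalities + injuries, damage = property + crop.
byType$harm   <- byType$FATALITIES + byType$INJURIES
byType$damage <- byType$PROPDMG + byType$CROPDMG
byType
```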

library(plyr)
# download file (create the data directory first if it does not exist)
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
downloadedFile <- "./data/StormData.csv.bz2"
if (!dir.exists("./data")) {
  dir.create("./data")
}
if (!file.exists(downloadedFile)) {
  download.file(fileUrl, destfile=downloadedFile, mode="wb")
}
# load data (read.csv reads the bzip2-compressed file directly)
stormData <- read.csv(downloadedFile)
# summarize by event type: number of records, fatalities, injuries and damage
summaryData <- ddply(stormData, "EVTYPE", summarize, N=length(CROPDMG),
                     fatalities = sum(FATALITIES),
                     injuries = sum(INJURIES), 
                     prop.dmg = sum(PROPDMG),
                     crop.dmg = sum(CROPDMG))
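read.csv() is given the .bz2 file directly, without unpacking it first. That works because R's file() connections detect gzip, bzip2 and xz compression when opened for reading. A self-contained check on a tiny temporary file (the column values are made up):

```r
# Write a tiny CSV to a temporary bzip2-compressed file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which recognizes the compression,
# so no separate decompression step is needed.
back <- read.csv(tmp)
str(back)
```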

A first look at the summary data reveals 985 distinct event types (EVTYPE labels) in the original dataset. However, many labels appear to describe the same type of event, differing only in capitalization, spelling or wording. So I cleaned EVTYPE by first converting all labels to upper case and then selectively mapping variant labels to a common label. Cleaning the data is 80% of the work!
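The long run of grep() assignments below applies one rule after another. A minimal sketch of the same idea as a rule table, assuming the rules are listed in the order they should run; relabel and rules are hypothetical names, and only four of the report's rules are included:

```r
# Hypothetical subset of the relabeling rules, kept in execution order.
rules <- data.frame(
  pattern     = c("HEAVY SNOW", "HIGH WIND", "SNOW", "^TORNADO"),
  replacement = c("HEAVY SNOW", "HIGH WINDS", "SNOW", "TORNADOES"),
  stringsAsFactors = FALSE
)

# Upper-case the labels, then apply each rule in turn. As with the grep()
# lines, a later rule overwrites a label set by an earlier one if it matches.
relabel <- function(labels, rules) {
  labels <- toupper(labels)
  for (i in seq_len(nrow(rules))) {
    hit <- grepl(rules$pattern[i], labels)
    labels[hit] <- rules$replacement[i]
  }
  labels
}

relabel(c("Heavy Snow", "HIGH WIND/SEAS", "tornado F0", "LAKE EFFECT SNOW"), rules)
```

Because the broad SNOW rule runs after HEAVY SNOW, "Heavy Snow" ends up as SNOW. The grep("SNOW", ...) line in the cleaning code has the same effect, which is why SNOW rather than HEAVY SNOW appears in the economic-consequences table.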

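# normalize case, then map variant labels to a common label
# (order matters: a later pattern can override a label set by an earlier one)
summaryData$EVTYPE <- toupper(summaryData$EVTYPE)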
summaryData$EVTYPE[grep("AVALANCE", summaryData$EVTYPE)] <- "AVALANCHE"
summaryData$EVTYPE[grep("BEACH EROSIN", summaryData$EVTYPE)] <- "BEACH EROSION"
summaryData$EVTYPE[grep("BEACH EROSION/COASTAL FLOOD", summaryData$EVTYPE)] <- "BEACH EROSION"
summaryData$EVTYPE[grep("BITTER WIND CHILL TEMPERATURES", summaryData$EVTYPE)] <- "BITTER WIND CHILL"
summaryData$EVTYPE[grep("BLIZZARD AND EXTREME WIND CHIL", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD AND HEAVY SNOW", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD SUMMARY", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD WEATHER", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD/FREEZING RAIN", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD/HEAVY SNOW", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD/HIGH WIND", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD/WINTER STORM", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLOW-OUT TIDES", summaryData$EVTYPE)] <- "BLOW-OUT TIDE"
summaryData$EVTYPE[grep("BLOWING SNOW- EXTREME WIND CHI", summaryData$EVTYPE)] <- "BLOWING SNOW"
summaryData$EVTYPE[grep("BLOWING SNOW & EXTREME WIND CH", summaryData$EVTYPE)] <- "BLOWING SNOW"
summaryData$EVTYPE[grep("BLOWING SNOW/EXTREME WIND CHIL", summaryData$EVTYPE)] <- "BLOWING SNOW"
summaryData$EVTYPE[grep("BRUSH FIRES", summaryData$EVTYPE)] <- "BRUSH FIRE"
summaryData$EVTYPE[grep("COASTAL  FLOODING/EROSION", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTAL EROSION", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTAL FLOODING", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTAL FLOODING/EROSION", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTAL/TIDAL FLOOD", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTALFLOOD", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTALSTORM", summaryData$EVTYPE)] <- "COASTAL STORM"
summaryData$EVTYPE[grep("COLD AIR FUNNELS", summaryData$EVTYPE)] <- "COLD AIR FUNNEL"
summaryData$EVTYPE[grep("COLD AIR TORNADO", summaryData$EVTYPE)] <- "COLD AIR FUNNEL"
summaryData$EVTYPE[grep("^COLD$", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD TEMPERATURES", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD WAVE", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD WEATHER", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD WIND CHILL TEMPERATURES", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD/WIND CHILL", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD/WINDS", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COOL SPELL", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COOL AND WET", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep(" COASTAL FLOOD", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("CSTL FLOODING/EROSION", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("DAM BREAK", summaryData$EVTYPE)] <- "DAM FAILURE"
summaryData$EVTYPE[grep("DOWNBURST", summaryData$EVTYPE)] <- "DOWNBURST WINDS"
summaryData$EVTYPE[grep("DROUGHT/EXCESSIVE HEAT", summaryData$EVTYPE)] <- "DROUGHT"
summaryData$EVTYPE[grep("^DRY", summaryData$EVTYPE)] <- "DRY CONDITIONS"
summaryData$EVTYPE[grep("DUST DEVEL", summaryData$EVTYPE)] <- "DUST DEVIL"
summaryData$EVTYPE[grep("DUST DEVIL WATERSPOUT", summaryData$EVTYPE)] <- "DUST DEVIL"
summaryData$EVTYPE[grep("DUST STORM/HIGH WINDS", summaryData$EVTYPE)] <- "DUST STORM"
summaryData$EVTYPE[grep("DUSTSTORM", summaryData$EVTYPE)] <- "DUST STORM"
summaryData$EVTYPE[grep("EARLY SNOW", summaryData$EVTYPE)] <- "EARLY SNOWFALL"
stormData[stormData$EVTYPE=="EXCESSIVE",] # inspect the EXCESSIVE record to see what kind of event it is
##        STATE__         BGN_DATE BGN_TIME TIME_ZONE COUNTY
## 245455      51 7/1/1995 0:00:00     0000       EST      0
##                COUNTYNAME STATE    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI
## 245455 VAZ001>002 005>007    VA EXCESSIVE         0                   
##                 END_DATE END_TIME COUNTY_END COUNTYENDN END_RANGE END_AZI
## 245455 7/31/1995 0:00:00                   0         NA         0        
##        END_LOCATI LENGTH WIDTH  F MAG FATALITIES INJURIES PROPDMG
## 245455                 0     0 NA   0          0        0       0
##        PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC
## 245455                  0                          
##                                                                                ZONENAMES
## 245455 LEE - LEE - WISE - LEE - WISE - DICKENSON - BUCHANAN - SCOTT - RUSSELL - TAZEWELL
##        LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## 245455        0         0          0          0
##        REMARKS
## 245455 Hot, dry weather began early in the month over the Mountain Empire, and continued unabated well into August.  Some of the tobacco crop, especially the burley variety, withered under the dry weather.  In general, less than one inch of rain fell in a month where average rainfall of three to four inches is common.  The Tri-Cities Airport (located in Bristol, TN) recorded over 15 maximum temperatures above 90 degrees. 
##        REFNUM
## 245455 245443
summaryData$EVTYPE[grep("^EXCESSIVE$", summaryData$EVTYPE)] <- "EXCESSIVE HEAT"
summaryData$EVTYPE[grep("EXCESSIVE HEAT/DROUGHT", summaryData$EVTYPE)] <- "EXCESSIVE HEAT"
summaryData$EVTYPE[grep("EXCESSIVE PRECIPITATION", summaryData$EVTYPE)] <- "EXCESSIVE RAIN"
summaryData$EVTYPE[grep("EXCESSIVE RAINFALL", summaryData$EVTYPE)] <- "EXCESSIVE RAIN"
summaryData$EVTYPE[grep("EXCESSIVELY DRY", summaryData$EVTYPE)] <- "EXCESSIVE HEAT"
summaryData$EVTYPE[grep("EXTREME COLD/WIND CHILL", summaryData$EVTYPE)] <- "EXTREME COLD"
summaryData$EVTYPE[grep("EXTREME WIND CHILL", summaryData$EVTYPE)] <- "EXTREME COLD"
summaryData$EVTYPE[grep("EXTREME WINDCHILL", summaryData$EVTYPE)] <- "EXTREME COLD"
summaryData$EVTYPE[grep("EXTREME/RECORD COLD", summaryData$EVTYPE)] <- "EXTREME COLD"
summaryData$EVTYPE[grep("FLASH FLOOD", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("FLASH FLOOODING", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("FLOOD FLASH", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("FLOOD FLOOD/FLASH", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("FLOOD/FLASH", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("^FLOOD", summaryData$EVTYPE)] <- "FLOOD"
summaryData$EVTYPE[grep("FREEZING DRIZZLE AND FREEZING", summaryData$EVTYPE)] <- "FREEZING DRIZZLE"
summaryData$EVTYPE[grep("^FREEZING RAIN", summaryData$EVTYPE)] <- "FREEZING RAIN"
summaryData$EVTYPE[grep("^FROST", summaryData$EVTYPE)] <- "FROST"
summaryData$EVTYPE[grep("^FUNNEL", summaryData$EVTYPE)] <- "FUNNEL CLOUD"
summaryData$EVTYPE[grep("^GLAZE", summaryData$EVTYPE)] <- "GLAZE"
summaryData$EVTYPE[grep("GRADIENT WINDS", summaryData$EVTYPE)] <- "GRADIENT WIND"
summaryData$EVTYPE[grep("GUSTNADO", summaryData$EVTYPE)] <- "GUSTNADO"
summaryData$EVTYPE[grep("GUSTY", summaryData$EVTYPE)] <- "GUSTY WINDS"
summaryData$EVTYPE[grep("HAIL", summaryData$EVTYPE)] <- "HAILSTORM"
summaryData$EVTYPE[grep("^HEAT", summaryData$EVTYPE)] <- "HEAT"
summaryData$EVTYPE[grep("HEAVY PREC", summaryData$EVTYPE)] <- "HEAVY RAIN"
summaryData$EVTYPE[grep("HEAVY RAIN", summaryData$EVTYPE)] <- "HEAVY RAIN"
summaryData$EVTYPE[grep("HEAVY SHOWER", summaryData$EVTYPE)] <- "HEAVY RAIN"
summaryData$EVTYPE[grep("HEAVY SNOW", summaryData$EVTYPE)] <- "HEAVY SNOW"
summaryData$EVTYPE[grep("HEAVY SURF", summaryData$EVTYPE)] <- "HEAVY SURF"
summaryData$EVTYPE[grep("HEAVY WET SNOW", summaryData$EVTYPE)] <- "HEAVY SNOW"
summaryData$EVTYPE[grep("HEAVY SWELLS", summaryData$EVTYPE)] <- "HIGH SWELLS"
summaryData$EVTYPE[grep("HIGH  SWELLS", summaryData$EVTYPE)] <- "HIGH SWELLS"
summaryData$EVTYPE[grep("HIGH SURF", summaryData$EVTYPE)] <- "HIGH SURF"
summaryData$EVTYPE[grep("HIGH TEMPERATURE RECORD", summaryData$EVTYPE)] <- "HEAT"
summaryData$EVTYPE[grep("HIGH WIND", summaryData$EVTYPE)] <- "HIGH WINDS"
summaryData$EVTYPE[grep("^HOT", summaryData$EVTYPE)] <- "HEAT"
summaryData$EVTYPE[grep("^HURRICANE", summaryData$EVTYPE)] <- "HURRICANE"
summaryData$EVTYPE[grep("HVY RAIN", summaryData$EVTYPE)] <- "HEAVY RAIN"
summaryData$EVTYPE[grep("HYPOTHERMIA", summaryData$EVTYPE)] <- "HYPOTHERMIA"
summaryData$EVTYPE[grep("^ICE", summaryData$EVTYPE)] <- "ICE STORM"
summaryData$EVTYPE[grep("LAKE-EFFECT SNOW", summaryData$EVTYPE)] <- "LAKE EFFECT SNOW"
summaryData$EVTYPE[grep("LAKE FLOOD", summaryData$EVTYPE)] <- "LAKESHORE FLOOD"
summaryData$EVTYPE[grep("LANDSLIDE", summaryData$EVTYPE)] <- "LANDSLIDES"
summaryData$EVTYPE[grep("LATE-SEASON SNOWFALL", summaryData$EVTYPE)] <- "LATE SEASON SNOWFALL"
summaryData$EVTYPE[grep("LATE SEASON SNOW", summaryData$EVTYPE)] <- "LATE SEASON SNOWFALL"
summaryData$EVTYPE[grep("LATE SNOW", summaryData$EVTYPE)] <- "LATE SEASON SNOWFALL"
summaryData$EVTYPE[grep("LIGHT SNOW", summaryData$EVTYPE)] <- "LIGHT SNOWFALL"
summaryData$EVTYPE[grep("LIGHTING", summaryData$EVTYPE)] <- "LIGHTNING"
summaryData$EVTYPE[grep("LIGHTNING", summaryData$EVTYPE)] <- "LIGHTNING"
summaryData$EVTYPE[grep("LIGNTNING", summaryData$EVTYPE)] <- "LIGHTNING"
summaryData$EVTYPE[grep("LOW TEMPERATURE", summaryData$EVTYPE)] <- "LOW TEMPERATURE"
summaryData$EVTYPE[grep("MICROBURST", summaryData$EVTYPE)] <- "MICROBURST"
summaryData$EVTYPE[grep("^MILD", summaryData$EVTYPE)] <- "MILD PATTERN"
summaryData$EVTYPE[grep("^MINOR FLOOD", summaryData$EVTYPE)] <- "MINOR FLOOD"
summaryData$EVTYPE[grep("^MIXED PRECIP", summaryData$EVTYPE)] <- "MIXED PRECIPITATION"
summaryData$EVTYPE[grep("^MUD", summaryData$EVTYPE)] <- "MUDSLIDE"
summaryData$EVTYPE[grep("^NON-", summaryData$EVTYPE)] <- "WINDS"
summaryData$EVTYPE[grep("NON TSTM WIND", summaryData$EVTYPE)] <- "WINDS"
summaryData$EVTYPE[grep("NORMAL PRECIPITATION", summaryData$EVTYPE)] <- "RAIN"
summaryData$EVTYPE[grep("PROLONG COLD", summaryData$EVTYPE)] <- "PROLONG COLD"
summaryData$EVTYPE[grep("^RAIN", summaryData$EVTYPE)] <- "RAIN"
summaryData$EVTYPE[grep("RECORD  COLD", summaryData$EVTYPE)] <- "RECORD COLD"
summaryData$EVTYPE[grep("RECORD COLD", summaryData$EVTYPE)] <- "RECORD COLD"
summaryData$EVTYPE[grep("RECORD COOL", summaryData$EVTYPE)] <- "RECORD COLD"
summaryData$EVTYPE[grep("RECORD DRY", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("RECORD HEAT", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("RECORD HIGH", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("RECORD MAY SNOW", summaryData$EVTYPE)] <- "RECORD SNOW"
summaryData$EVTYPE[grep("RECORD PRECIPITATION", summaryData$EVTYPE)] <- "RECORD RAINFALL"
summaryData$EVTYPE[grep("RECORD SNOW", summaryData$EVTYPE)] <- "RECORD SNOWFALL"
summaryData$EVTYPE[grep("RECORD TEMPERATURE", summaryData$EVTYPE)] <- "RECORD TEMPERATURES"
summaryData$EVTYPE[grep("RECORD WARM", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("RECORD/EXCESSIVE HEAT", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("REMNANT OF FLOYD", summaryData$EVTYPE)] <- "HURRICANE"
summaryData$EVTYPE[grep("RIP CURRENT", summaryData$EVTYPE)] <- "RIP CURRENT"
summaryData$EVTYPE[grep("RIVER", summaryData$EVTYPE)] <- "RIVER FLOOD"
summaryData$EVTYPE[grep("SEVERE THUNDERSTORM", summaryData$EVTYPE)] <- "SEVERE THUNDERSTORM"
summaryData$EVTYPE[grep("SLEET", summaryData$EVTYPE)] <- "SLEET"
summaryData$EVTYPE[grep("SMALL STREAM", summaryData$EVTYPE)] <- "SMALL STREAM FLOOD"
summaryData$EVTYPE[grep("SML STREAM", summaryData$EVTYPE)] <- "SMALL STREAM FLOOD"
# broad pattern: also relabels labels set above, e.g. HEAVY SNOW and LAKE EFFECT SNOW
summaryData$EVTYPE[grep("SNOW", summaryData$EVTYPE)] <- "SNOW"
summaryData$EVTYPE[grep("STORM FORCE WINDS", summaryData$EVTYPE)] <- "STORM SURGE"
summaryData$EVTYPE[grep("STORM SURGE", summaryData$EVTYPE)] <- "STORM SURGE"
summaryData$EVTYPE[grep("STREET FLOOD", summaryData$EVTYPE)] <- "STREET FLOODING"
summaryData$EVTYPE[grep("STRONG WIND", summaryData$EVTYPE)] <- "STRONG WIND"
summaryData$EVTYPE[grep("SUMMARY", summaryData$EVTYPE)] <- "SUMMARY" #NO EFFECT
summaryData$EVTYPE[grep("THUDERSTORM", summaryData$EVTYPE)] <- "THUNDERSTORM"
summaryData$EVTYPE[grep("THUNDEERSTORM", summaryData$EVTYPE)] <- "THUNDERSTORM"
summaryData$EVTYPE[grep("THUNDERESTORM", summaryData$EVTYPE)] <- "THUNDERSTORM"
summaryData$EVTYPE[grep("THUNDERSTORM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("THUNDERSTROM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("THUNDERTORM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("THUNDERTSORM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("TIDAL FLOOD", summaryData$EVTYPE)] <- "TIDAL FLOODING"
summaryData$EVTYPE[grep("^TORNADO", summaryData$EVTYPE)] <- "TORNADOES"
summaryData$EVTYPE[grep("TORNDAO", summaryData$EVTYPE)] <- "TORNADOES"
summaryData$EVTYPE[grep("TORRENTIAL", summaryData$EVTYPE)] <- "TORRENTIAL RAINFALL"
summaryData$EVTYPE[grep("TROPICAL STORM", summaryData$EVTYPE)] <- "TROPICAL STORM"
summaryData$EVTYPE[grep("TSTM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("TUNDERSTORM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("UNSEASONABLE COLD", summaryData$EVTYPE)] <- "UNSEASONABLY COLD"
summaryData$EVTYPE[grep("UNSEASONABLY COOL", summaryData$EVTYPE)] <- "UNSEASONABLY COLD"
summaryData$EVTYPE[grep("UNSEASONABLY WARM", summaryData$EVTYPE)] <- "UNSEASONABLY WARM"
summaryData$EVTYPE[grep("UNUSUAL WARM", summaryData$EVTYPE)] <- "UNSEASONABLY WARM"
summaryData$EVTYPE[grep("UNUSUALLY WARM", summaryData$EVTYPE)] <- "UNSEASONABLY WARM"
summaryData$EVTYPE[grep("UNUSUALLY COLD", summaryData$EVTYPE)] <- "UNSEASONABLY COLD"
summaryData$EVTYPE[grep("URBAN FLOOD", summaryData$EVTYPE)] <- "URBAN FLOODING"
summaryData$EVTYPE[grep("URBAN", summaryData$EVTYPE)] <- "URBAN FLOODING"
summaryData$EVTYPE[grep("VOLCANIC", summaryData$EVTYPE)] <- "VOLCANIC ASHFALL"
summaryData$EVTYPE[grep("WALL CLOUD", summaryData$EVTYPE)] <- "WALL CLOUD"
summaryData$EVTYPE[grep("WATERSPOUT", summaryData$EVTYPE)] <- "WATERSPOUTS"
summaryData$EVTYPE[grep("WAYTERSPOUT", summaryData$EVTYPE)] <- "WATERSPOUTS"
summaryData$EVTYPE[grep("WET MICOBURST", summaryData$EVTYPE)] <- "MICROBURST"
summaryData$EVTYPE[grep("^WILD", summaryData$EVTYPE)] <- "WILDFIRES"
summaryData$EVTYPE[grep("^WIND$", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("WIND AND WAVE", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("WIND GUSTS", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("WIND STORM", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("^WINDS$", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("WINTER MIX", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("WINTER STORMS", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("WINTER WEATHER", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("WINTERY", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("WINTRY", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("^WND$", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("^ WIND$", summaryData$EVTYPE)] <- "WIND DAMAGE"

# Summarize again so that rows sharing a cleaned label are combined
summaryData <- ddply(summaryData, "EVTYPE", summarize, N=sum(N), 
                     fatalities=sum(fatalities), injuries=sum(injuries), 
                     prop.dmg=sum(prop.dmg), crop.dmg=sum(crop.dmg))
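Summarizing a second time matters because relabeling leaves several rows with the same EVTYPE. A small illustration with hypothetical numbers, using base R aggregate() in place of ddply():

```r
# Hypothetical per-label summary after relabeling: two rows now share a label.
relabeled <- data.frame(
  EVTYPE     = c("TORNADOES", "TORNADOES", "HAILSTORM"),
  N          = c(100, 5, 40),
  fatalities = c(3, 1, 0),
  injuries   = c(20, 2, 1)
)

# Summing again collapses duplicate labels into one row per event type.
merged <- aggregate(cbind(N, fatalities, injuries) ~ EVTYPE,
                    data = relabeled, FUN = sum)
merged
```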

RESULTS

After cleaning the data, I can now examine the results. The storm event database reports fatalities and injuries per event. To determine the most harmful event types, I looked at the top 10 based on their combined number of fatalities and injuries. Similarly, to determine the event types with the greatest economic consequences, I examined those with the highest combined property and crop damage.
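The ranking step, done below with plyr's arrange() and desc(), can also be written in base R with order(). The helper topByCombined is a hypothetical name for this sketch, and the three rows are copied from the first results table, listed out of order on purpose:

```r
# Hypothetical helper (not part of the report's code): rank a summary table
# by the sum of two numeric columns and keep the first n rows.
topByCombined <- function(df, a, b, n = 10) {
  score <- df[[a]] + df[[b]]
  head(df[order(score, decreasing = TRUE), , drop = FALSE], n)
}

# Three rows from the harm table, deliberately out of order.
example <- data.frame(
  EVTYPE     = c("EXCESSIVE HEAT", "TORNADOES", "THUNDERSTORM WINDS"),
  fatalities = c(1903, 5633, 725),
  injuries   = c(6525, 91364, 9446),
  stringsAsFactors = FALSE
)
topByCombined(example, "fatalities", "injuries", n = 2)
```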

TYPES OF EVENTS MOST HARMFUL TO POPULATION HEALTH

The table below shows the top 10 most harmful events by their combined number of fatalities and injuries:

TOP 10 EVENTS MOST HARMFUL

head(arrange(summaryData, desc(fatalities+injuries)),10)
##                EVTYPE      N fatalities injuries   prop.dmg  crop.dmg
## 1           TORNADOES  60686       5633    91364 3214532.36 100026.72
## 2  THUNDERSTORM WINDS 335673        725     9446 2668174.86 194904.73
## 3      EXCESSIVE HEAT   1681       1903     6525    1460.00    494.40
## 4               FLOOD  25463        478     6791  906919.38 171561.68
## 5           LIGHTNING  15777        817     5232  603682.28   3585.61
## 6                HEAT    858       1118     2494    1767.75    968.00
## 7         FLASH FLOOD  55677       1035     1802 1474373.90 186484.21
## 8           ICE STORM   2093         96     2113   74308.17   1693.95
## 9          HIGH WINDS  21939        297     1522  382258.77  21062.81
## 10          WILDFIRES   4231         90     1606  125148.29   9065.74

TYPES OF EVENTS WITH GREATEST ECONOMIC CONSEQUENCES

Similarly, the table below shows the top 10 event types with the greatest economic consequences by their combined property damage and crop damage:

TOP 10 EVENTS MOST COSTLY

head(arrange(summaryData, desc(prop.dmg+crop.dmg)),10)
##                EVTYPE      N fatalities injuries  prop.dmg  crop.dmg
## 1           TORNADOES  60686       5633    91364 3214532.4 100026.72
## 2  THUNDERSTORM WINDS 335673        725     9446 2668174.9 194904.73
## 3         FLASH FLOOD  55677       1035     1802 1474373.9 186484.21
## 4           HAILSTORM 290399         45     1467  699300.4 585956.66
## 5               FLOOD  25463        478     6791  906919.4 171561.68
## 6           LIGHTNING  15777        817     5232  603682.3   3585.61
## 7          HIGH WINDS  21939        297     1522  382258.8  21062.81
## 8                SNOW  17650        161     1122  151641.3   2195.72
## 9        WINTER STORM  11433        206     1321  132720.6   1978.99
## 10          WILDFIRES   4231         90     1606  125148.3   9065.74
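The damage totals in these tables sum PROPDMG and CROPDMG as recorded. The dataset also has PROPDMGEXP and CROPDMGEXP columns (they appear in the record printed earlier), and NOAA's Storm Data documentation describes alphabetic magnitude codes: K for thousands, M for millions and B for billions. The tables above were not computed with these multipliers. A minimal sketch of converting to dollars, assuming that mapping and treating every other code as a multiplier of 1 (a simplification); expToMultiplier and the dmg records are hypothetical:

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumes K = 1e3, M = 1e6, B = 1e9; any other code is treated as 1.
expToMultiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  mult <- rep(1, length(code))
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  mult
}

# Hypothetical records: similar PROPDMG values, very different amounts.
dmg <- data.frame(PROPDMG = c(25, 25, 2.5), PROPDMGEXP = c("K", "m", "B"))
dmg$prop.dollars <- dmg$PROPDMG * expToMultiplier(dmg$PROPDMGEXP)
dmg
```

Applied to stormData before the ddply() call (for example, PROPDMG * expToMultiplier(PROPDMGEXP)), this would express damage in dollars, which may change the ordering of the tables above.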

Interestingly, tornadoes cause the most damage overall, but if we look only at crop damage, hail (HAILSTORM) is the most costly. The table below shows the top 10 event types ordered by crop damage in descending order:

TOP 10 EVENTS MOST COSTLY TO CROPS

head(arrange(summaryData, desc(crop.dmg)),10)
##                EVTYPE      N fatalities injuries   prop.dmg  crop.dmg
## 1           HAILSTORM 290399         45     1467  699300.38 585956.66
## 2  THUNDERSTORM WINDS 335673        725     9446 2668174.86 194904.73
## 3         FLASH FLOOD  55677       1035     1802 1474373.90 186484.21
## 4               FLOOD  25463        478     6791  906919.38 171561.68
## 5           TORNADOES  60686       5633    91364 3214532.36 100026.72
## 6             DROUGHT   2501          2        4    4099.05  33904.40
## 7          HIGH WINDS  21939        297     1522  382258.77  21062.81
## 8          HEAVY RAIN  11820         98      255   55673.19  12050.30
## 9           HURRICANE    287        133     1328   23757.15  10802.79
## 10          WILDFIRES   4231         90     1606  125148.29   9065.74

A CLOSER LOOK AT TORNADOES

I wanted to see the effect of tornadoes over the years, so I extracted the TORNADO events from the original dataset and summarized the statistics of interest by year. The following figures show the number of events, the fatalities and injuries, and the damage caused by tornadoes from 1950 to 2011.
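The year can also be obtained with as.Date(), sketched here on strings in the BGN_DATE format. The first string is the BGN_DATE of the record printed earlier; the other two are made-up examples:

```r
# Example BGN_DATE strings ("month/day/year hour:min:sec").
bgn <- c("7/1/1995 0:00:00", "4/18/1950 0:00:00", "11/30/2011 0:00:00")

# as.Date() parses the month/day/year part; strptime-based parsing ignores
# the trailing time. format(..., "%Y") extracts the year as text.
year <- as.integer(format(as.Date(bgn, format = "%m/%d/%Y"), "%Y"))
year
```

Storing the year as a number rather than text lets it serve directly as a numeric plot axis or regression variable.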

# Extract all TORNADO events (raw labels starting with TORN)
tornadoData <- stormData[grep("^TORN", stormData$EVTYPE),]
# Determine the year of the event, stored as a number for plotting
tornadoData <- mutate(tornadoData, YEAR = 
                        as.numeric(format(strptime(as.character(BGN_DATE), 
                                                   "%m/%d/%Y %T"), format="%Y")))
# Summarize fatalities and damage by year
summaryTornado <- ddply(tornadoData, "YEAR", summarize, N=length(CROPDMG), 
                     fatalities = sum(FATALITIES),
                     injuries = sum(INJURIES), 
                     prop.dmg = sum(PROPDMG),
                     crop.dmg = sum(CROPDMG))
summaryTornado
##    YEAR    N fatalities injuries  prop.dmg crop.dmg
## 1  1950  223         70      659  16999.15     0.00
## 2  1951  269         34      524  10560.99     0.00
## 3  1952  272        230     1915  16679.74     0.00
## 4  1953  492        519     5131  19182.20     0.00
## 5  1954  609         36      715  23367.82     0.00
## 6  1955  632        129      926  27715.63     0.00
## 7  1956  567         83     1355  27002.35     0.00
## 8  1957  930        193     1976  44568.89     0.00
## 9  1958  608         67      535  26597.11     0.00
## 10 1959  630         58      734  25015.54     0.00
## 11 1960  645         46      737  28314.24     0.00
## 12 1961  772         52     1087  39528.73     0.00
## 13 1962  673         30      551  22245.73     0.00
## 14 1963  493         31      538  24793.08     0.00
## 15 1964  760         73     1148  38618.32     0.00
## 16 1965  995        301     5197  46716.54     0.00
## 17 1966  606         98     2030  27079.36     0.00
## 18 1967  966        114     2144  40056.72     0.00
## 19 1968  715        131     2522  27762.41     0.00
## 20 1969  650         66     1311  33354.23     0.00
## 21 1970  700         73     1355  30800.98     0.00
## 22 1971  963        159     2723  44778.05     0.00
## 23 1972  775         27      976  36287.57     0.00
## 24 1973 1199         89     2406  75056.56     0.00
## 25 1974 1123        366     6824  57905.28     0.00
## 26 1975  962         60     1457  52498.79     0.00
## 27 1976  935         44     1195  56249.69     0.00
## 28 1977  922         43      771  54811.73     0.00
## 29 1978  875         53      919  49135.66     0.00
## 30 1979  918         84     3014  54633.74     0.00
## 31 1980  972         28     1157  64904.35     0.00
## 32 1981  830         24      798  46608.31     0.00
## 33 1982 1181         64     1276  83065.50     0.00
## 34 1983  995         34      756  64058.09     0.00
## 35 1984 1020        122     2499  63241.39     0.00
## 36 1985  773         94     1299  39667.21     0.00
## 37 1986  849         15      536  50897.34     0.00
## 38 1987  695         59     1018  33212.95     0.00
## 39 1988  773         32      688  50289.65     0.00
## 40 1989  921         50     1270  49263.55     0.00
## 41 1990 1264         53     1177  69560.57     0.00
## 42 1991 1208         39      864  60888.93     0.00
## 43 1992 1404         39     1323  70526.40     0.00
## 44 1993  615         53      739  64256.30  3181.30
## 45 1994  942         48      806  84865.25  5289.70
## 46 1995 1211         34     1116  53033.11  1429.72
## 47 1996 1239         26      705  62085.73  4017.70
## 48 1997 1180         68     1033  57907.75  1636.72
## 49 1998 1529        130     1874  91168.44  7064.67
## 50 1999 1519         94     1842  69604.61  3774.50
## 51 2000 1169         41      882  46487.56  1878.10
## 52 2001 1351         40      743  66361.19  3914.00
## 53 2002 1041         55      968  52653.55   799.01
## 54 2003 1534         54     1087  59887.74  2221.20
## 55 2004 1947         35      396  73341.16  7136.50
## 56 2005 1343         38      537  59885.81  9977.10
## 57 2006 1264         67      992  66879.08  2691.00
## 58 2007 1238         81      659  60048.70  3270.00
## 59 2008 1891        129     1690 106413.26 14214.00
## 60 2009 1272         21      397  71409.00  4492.00
## 61 2010 1446         45      699  88280.14  4668.00
## 62 2011 2192        587     6163 155464.51 18374.00
# Plot number of events, fatalities and damage by year
plot(summaryTornado$YEAR, summaryTornado$N, type="l",
     main = "Figure 1. Number of tornado events per year",
     xlab = "Year", ylab = "Number of tornado events")

plot(summaryTornado$YEAR, summaryTornado$fatalities, type="l", col="red",
     main = "Figure 2. Fatalities and injuries caused by tornadoes per year",
     xlab = "Year", ylab = "Number of fatalities/injuries", ylim = c(0,7000))
lines(summaryTornado$YEAR, summaryTornado$injuries, col="blue")
legend("top", legend=c("Fatalities", "Injuries"), text.col=c("red", "blue"),
       lty = 1, col = c("red","blue"))

plot(summaryTornado$YEAR, summaryTornado$prop.dmg, type="l", col="purple",
     main="Figure 3. Damage caused by tornadoes",
     xlab = "Year", ylab = "Amount of damage to property/crops", ylim = c(0,160000))
lines(summaryTornado$YEAR, summaryTornado$crop.dmg, col="green")
legend("top", legend=c("Property damage", "Crop damage"), text.col=c("purple", "green"),
       lty = 1, col = c("purple","green"))

From Figure 1, the number of tornado events seems to increase over the years, although part of this rise may reflect changes in how events were recorded. As Figure 2 shows, tornadoes cause far more injuries than fatalities. Similarly, they cause more property damage than crop damage (Figure 3). The summaryTornado table shows that crop damage is reported as 0 from 1950 to 1992, which may be why tornadoes do not rank first for crop damage. It’s interesting to note that, even though the data for 2011 are incomplete, there is a spike in all the metrics for that year. In any case, we need to be prepared for tornadoes.
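Two of these observations can be checked numerically. The sketch below fits a straight line to one year per decade copied from the summaryTornado table (an illustrative subset, and a crude check rather than a significance test), then finds the first year with non-zero crop damage among rows 1991 to 1994 of the same table:

```r
# One row per decade, copied from the summaryTornado table.
decades <- data.frame(
  YEAR = c(1950, 1960, 1970, 1980, 1990, 2000, 2010),
  N    = c(223, 645, 700, 972, 1264, 1169, 1446)
)
fit <- lm(N ~ YEAR, data = decades)
coef(fit)[["YEAR"]]   # average change in the number of events per year

# Rows 1991-1994 of the same table: first year with non-zero crop damage.
crop <- data.frame(YEAR = 1991:1994, crop.dmg = c(0, 0, 3181.30, 5289.70))
firstCropYear <- min(crop$YEAR[crop$crop.dmg > 0])
firstCropYear
```

On the full table, lm(N ~ YEAR, data = summaryTornado) gives the same kind of estimate as long as YEAR is numeric (for example after as.numeric(summaryTornado$YEAR)).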