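In this project I explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer two questions: which types of storm events are most harmful in terms of fatalities and injuries, and which have the greatest economic consequences?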
I found that tornadoes caused the most fatalities and injuries. They also caused the most economic damage.
In the next sections, I discuss the data analysis process, including data processing and the results obtained. All code is included for reproducibility. In the last section, I take a closer look at tornadoes with plots of their effects over the years.
The NOAA storm events database can be found at: http://www.ncdc.noaa.gov/stormevents/ftp.jsp. However, for this project, I used the dataset provided for the Reproducible Research class in Coursera. This dataset only includes data up to November 2011.
As a first step in processing the data, I downloaded the dataset and created a summary table of the effects of the different storm events. I quantified the “most harmful” by adding the number of fatalities and injuries per event type. Similarly, I quantified the “greatest economic consequences” by adding the damage to property and to crops.
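# download file (create the target directory first if it does not exist)
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
downloadedFile <- "./data/StormData.csv.bz2"
if (!file.exists(downloadedFile)) {
    dir.create(dirname(downloadedFile), showWarnings=FALSE, recursive=TRUE)
    download.file(fileUrl, destfile=downloadedFile, mode="wb")
}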
# load data
stormData <- read.csv(downloadedFile)
library(plyr)
# summarize data by event type, fatalities and damage
summaryData <- ddply(stormData, "EVTYPE", summarize, N=length(CROPDMG),
fatalities = sum(FATALITIES),
injuries = sum(INJURIES),
prop.dmg = sum(PROPDMG),
crop.dmg = sum(CROPDMG))
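The summary above adds the raw PROPDMG and CROPDMG values as they appear in the file. The dataset also has PROPDMGEXP and CROPDMGEXP columns, which are not used here. The following is only a minimal sketch of how such magnitude codes could be applied before summing. It assumes that K, M and B stand for thousands, millions and billions and that any other code means no scaling; the helper name `exp_multiplier` and the toy values are hypothetical and not part of the analysis.

```r
# Hypothetical helper: convert a magnitude code to a numeric multiplier.
# Assumption: K = thousands, M = millions, B = billions; anything else = 1.
exp_multiplier <- function(code) {
    code <- toupper(as.character(code))
    mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
    mult[is.na(mult)] <- 1   # unknown or empty codes leave the value unscaled
    unname(mult)
}

# Toy rows standing in for PROPDMG / PROPDMGEXP (illustrative values only)
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "M", "", "b"),
                  stringsAsFactors = FALSE)
toy$prop.dollars <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy$prop.dollars
```

Scaling the damage this way before summing could change the ranking of event types, so results from this sketch are not comparable with the tables in this report.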
A first look at the summary data reveals that there are 985 distinct event types (EVTYPE labels) in the original dataset. However, some labels appear to describe the same type of event. So, I cleaned EVTYPE by first converting all labels to upper case and then selectively mapping variants to a common label. Cleaning the data is 80% of the work!
summaryData$EVTYPE <- toupper(summaryData$EVTYPE)
summaryData$EVTYPE[grep("AVALANCE", summaryData$EVTYPE)] <- "AVALANCHE"
summaryData$EVTYPE[grep("BEACH EROSIN", summaryData$EVTYPE)] <- "BEACH EROSION"
summaryData$EVTYPE[grep("BEACH EROSION/COASTAL FLOOD", summaryData$EVTYPE)] <- "BEACH EROSION"
summaryData$EVTYPE[grep("BITTER WIND CHILL TEMPERATURES", summaryData$EVTYPE)] <- "BITTER WIND CHILL"
summaryData$EVTYPE[grep("BLIZZARD AND EXTREME WIND CHIL", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD AND HEAVY SNOW", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD SUMMARY", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD WEATHER", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD/FREEZING RAIN", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD/HEAVY SNOW", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD/HIGH WIND", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLIZZARD/WINTER STORM", summaryData$EVTYPE)] <- "BLIZZARD"
summaryData$EVTYPE[grep("BLOW-OUT TIDES", summaryData$EVTYPE)] <- "BLOW-OUT TIDE"
summaryData$EVTYPE[grep("BLOWING SNOW- EXTREME WIND CHI", summaryData$EVTYPE)] <- "BLOWING SNOW"
summaryData$EVTYPE[grep("BLOWING SNOW & EXTREME WIND CH", summaryData$EVTYPE)] <- "BLOWING SNOW"
summaryData$EVTYPE[grep("BLOWING SNOW/EXTREME WIND CHIL", summaryData$EVTYPE)] <- "BLOWING SNOW"
summaryData$EVTYPE[grep("BRUSH FIRES", summaryData$EVTYPE)] <- "BRUSH FIRE"
summaryData$EVTYPE[grep("COASTAL FLOODING/EROSION", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTAL EROSION", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTAL FLOODING", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTAL/TIDAL FLOOD", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTALFLOOD", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("COASTALSTORM", summaryData$EVTYPE)] <- "COASTAL STORM"
summaryData$EVTYPE[grep("COLD AIR FUNNELS", summaryData$EVTYPE)] <- "COLD AIR FUNNEL"
summaryData$EVTYPE[grep("COLD AIR TORNADO", summaryData$EVTYPE)] <- "COLD AIR FUNNEL"
summaryData$EVTYPE[grep("^COLD$", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD TEMPERATURES", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD WAVE", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD WEATHER", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD WIND CHILL TEMPERATURES", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD/WIND CHILL", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COLD/WINDS", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COOL SPELL", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep("COOL AND WET", summaryData$EVTYPE)] <- "COLD TEMPERATURE"
summaryData$EVTYPE[grep(" COASTAL FLOOD", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("CSTL FLOODING/EROSION", summaryData$EVTYPE)] <- "COASTAL FLOOD"
summaryData$EVTYPE[grep("DAM BREAK", summaryData$EVTYPE)] <- "DAM FAILURE"
summaryData$EVTYPE[grep("DOWNBURST", summaryData$EVTYPE)] <- "DOWNBURST WINDS"
summaryData$EVTYPE[grep("DROUGHT/EXCESSIVE HEAT", summaryData$EVTYPE)] <- "DROUGHT"
summaryData$EVTYPE[grep("^DRY", summaryData$EVTYPE)] <- "DRY CONDITIONS"
summaryData$EVTYPE[grep("DUST DEVEL", summaryData$EVTYPE)] <- "DUST DEVIL"
summaryData$EVTYPE[grep("DUST DEVIL WATERSPOUT", summaryData$EVTYPE)] <- "DUST DEVIL"
summaryData$EVTYPE[grep("DUST STORM/HIGH WINDS", summaryData$EVTYPE)] <- "DUST STORM"
summaryData$EVTYPE[grep("DUSTSTORM", summaryData$EVTYPE)] <- "DUST STORM"
summaryData$EVTYPE[grep("EARLY SNOW", summaryData$EVTYPE)] <- "EARLY SNOWFALL"
stormData[stormData$EVTYPE=="EXCESSIVE",] # FIND OUT WHAT KIND OF EVENT IT IS
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY
## 245455 51 7/1/1995 0:00:00 0000 EST 0
## COUNTYNAME STATE EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI
## 245455 VAZ001>002 005>007 VA EXCESSIVE 0
## END_DATE END_TIME COUNTY_END COUNTYENDN END_RANGE END_AZI
## 245455 7/31/1995 0:00:00 0 NA 0
## END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 245455 0 0 NA 0 0 0 0
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC
## 245455 0
## ZONENAMES
## 245455 LEE - LEE - WISE - LEE - WISE - DICKENSON - BUCHANAN - SCOTT - RUSSELL - TAZEWELL
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## 245455 0 0 0 0
## REMARKS
## 245455 Hot, dry weather began early in the month over the Mountain Empire, and continued unabated well into August. Some of the tobacco crop, especially the burley variety, withered under the dry weather. In general, less than one inch of rain fell in a month where average rainfall of three to four inches is common. The Tri-Cities Airport (located in Bristol, TN) recorded over 15 maximum temperatures above 90 degrees.
## REFNUM
## 245455 245443
summaryData$EVTYPE[grep("^EXCESSIVE$", summaryData$EVTYPE)] <- "EXCESSIVE HEAT"
summaryData$EVTYPE[grep("EXCESSIVE HEAT/DROUGHT", summaryData$EVTYPE)] <- "EXCESSIVE HEAT"
summaryData$EVTYPE[grep("EXCESSIVE PRECIPITATION", summaryData$EVTYPE)] <- "EXCESSIVE RAIN"
summaryData$EVTYPE[grep("EXCESSIVE RAINFALL", summaryData$EVTYPE)] <- "EXCESSIVE RAIN"
summaryData$EVTYPE[grep("EXCESSIVELY DRY", summaryData$EVTYPE)] <- "EXCESSIVE HEAT"
summaryData$EVTYPE[grep("EXTREME COLD/WIND CHILL", summaryData$EVTYPE)] <- "EXTREME COLD"
summaryData$EVTYPE[grep("EXTREME WIND CHILL", summaryData$EVTYPE)] <- "EXTREME COLD"
summaryData$EVTYPE[grep("EXTREME WINDCHILL", summaryData$EVTYPE)] <- "EXTREME COLD"
summaryData$EVTYPE[grep("EXTREME/RECORD COLD", summaryData$EVTYPE)] <- "EXTREME COLD"
summaryData$EVTYPE[grep("FLASH FLOOD", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("FLASH FLOOODING", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("FLOOD FLASH", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("FLOOD FLOOD/FLASH", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("FLOOD/FLASH", summaryData$EVTYPE)] <- "FLASH FLOOD"
summaryData$EVTYPE[grep("^FLOOD", summaryData$EVTYPE)] <- "FLOOD"
summaryData$EVTYPE[grep("FREEZING DRIZZLE AND FREEZING", summaryData$EVTYPE)] <- "FREEZING DRIZZLE"
summaryData$EVTYPE[grep("^FREEZING RAIN", summaryData$EVTYPE)] <- "FREEZING RAIN"
summaryData$EVTYPE[grep("^FROST", summaryData$EVTYPE)] <- "FROST"
summaryData$EVTYPE[grep("^FUNNEL", summaryData$EVTYPE)] <- "FUNNEL CLOUD"
summaryData$EVTYPE[grep("^GLAZE", summaryData$EVTYPE)] <- "GLAZE"
summaryData$EVTYPE[grep("GRADIENT WINDS", summaryData$EVTYPE)] <- "GRADIENT WIND"
summaryData$EVTYPE[grep("GUSTNADO", summaryData$EVTYPE)] <- "GUSTNADO"
summaryData$EVTYPE[grep("GUSTY", summaryData$EVTYPE)] <- "GUSTY WINDS"
summaryData$EVTYPE[grep("HAIL", summaryData$EVTYPE)] <- "HAILSTORM"
summaryData$EVTYPE[grep("^HEAT", summaryData$EVTYPE)] <- "HEAT"
summaryData$EVTYPE[grep("HEAVY PREC", summaryData$EVTYPE)] <- "HEAVY RAIN"
summaryData$EVTYPE[grep("HEAVY RAIN", summaryData$EVTYPE)] <- "HEAVY RAIN"
summaryData$EVTYPE[grep("HEAVY SHOWER", summaryData$EVTYPE)] <- "HEAVY RAIN"
summaryData$EVTYPE[grep("HEAVY SNOW", summaryData$EVTYPE)] <- "HEAVY SNOW"
summaryData$EVTYPE[grep("HEAVY SURF", summaryData$EVTYPE)] <- "HEAVY SURF"
summaryData$EVTYPE[grep("HEAVY WET SNOW", summaryData$EVTYPE)] <- "HEAVY SNOW"
summaryData$EVTYPE[grep("HEAVY SWELLS", summaryData$EVTYPE)] <- "HIGH SWELLS"
summaryData$EVTYPE[grep("HIGH SWELLS", summaryData$EVTYPE)] <- "HIGH SWELLS"
summaryData$EVTYPE[grep("HIGH SURF", summaryData$EVTYPE)] <- "HIGH SURF"
summaryData$EVTYPE[grep("HIGH TEMPERATURE RECORD", summaryData$EVTYPE)] <- "HEAT"
summaryData$EVTYPE[grep("HIGH WIND", summaryData$EVTYPE)] <- "HIGH WINDS"
summaryData$EVTYPE[grep("^HOT", summaryData$EVTYPE)] <- "HEAT"
summaryData$EVTYPE[grep("^HURRICANE", summaryData$EVTYPE)] <- "HURRICANE"
summaryData$EVTYPE[grep("HVY RAIN", summaryData$EVTYPE)] <- "HEAVY RAIN"
summaryData$EVTYPE[grep("HYPOTHERMIA", summaryData$EVTYPE)] <- "HYPOTHERMIA"
summaryData$EVTYPE[grep("^ICE", summaryData$EVTYPE)] <- "ICE STORM"
summaryData$EVTYPE[grep("LAKE-EFFECT SNOW", summaryData$EVTYPE)] <- "LAKE EFFECT SNOW"
summaryData$EVTYPE[grep("LAKE FLOOD", summaryData$EVTYPE)] <- "LAKESHORE FLOOD"
summaryData$EVTYPE[grep("LANDSLIDE", summaryData$EVTYPE)] <- "LANDSLIDES"
summaryData$EVTYPE[grep("LATE-SEASON SNOWFALL", summaryData$EVTYPE)] <- "LATE SEASON SNOWFALL"
summaryData$EVTYPE[grep("LATE SEASON SNOW", summaryData$EVTYPE)] <- "LATE SEASON SNOWFALL"
summaryData$EVTYPE[grep("LATE SNOW", summaryData$EVTYPE)] <- "LATE SEASON SNOWFALL"
summaryData$EVTYPE[grep("LIGHT SNOW", summaryData$EVTYPE)] <- "LIGHT SNOWFALL"
summaryData$EVTYPE[grep("LIGHTING", summaryData$EVTYPE)] <- "LIGHTNING"
summaryData$EVTYPE[grep("LIGHTNING", summaryData$EVTYPE)] <- "LIGHTNING"
summaryData$EVTYPE[grep("LIGNTNING", summaryData$EVTYPE)] <- "LIGHTNING"
summaryData$EVTYPE[grep("LOW TEMPERATURE", summaryData$EVTYPE)] <- "LOW TEMPERATURE"
summaryData$EVTYPE[grep("MICROBURST", summaryData$EVTYPE)] <- "MICROBURST"
summaryData$EVTYPE[grep("^MILD", summaryData$EVTYPE)] <- "MILD PATTERN"
summaryData$EVTYPE[grep("^MINOR FLOOD", summaryData$EVTYPE)] <- "MINOR FLOOD"
summaryData$EVTYPE[grep("^MIXED PRECIP", summaryData$EVTYPE)] <- "MIXED PRECIPITATION"
summaryData$EVTYPE[grep("^MUD", summaryData$EVTYPE)] <- "MUDSLIDE"
summaryData$EVTYPE[grep("^NON-", summaryData$EVTYPE)] <- "WINDS"
summaryData$EVTYPE[grep("NON TSTM WIND", summaryData$EVTYPE)] <- "WINDS"
summaryData$EVTYPE[grep("NORMAL PRECIPITATION", summaryData$EVTYPE)] <- "RAIN"
summaryData$EVTYPE[grep("PROLONG COLD", summaryData$EVTYPE)] <- "PROLONG COLD"
summaryData$EVTYPE[grep("^RAIN", summaryData$EVTYPE)] <- "RAIN"
summaryData$EVTYPE[grep("RECORD COLD", summaryData$EVTYPE)] <- "RECORD COLD"
summaryData$EVTYPE[grep("RECORD COOL", summaryData$EVTYPE)] <- "RECORD COLD"
summaryData$EVTYPE[grep("RECORD DRY", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("RECORD HEAT", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("RECORD HIGH", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("RECORD MAY SNOW", summaryData$EVTYPE)] <- "RECORD SNOW"
summaryData$EVTYPE[grep("RECORD PRECIPITATION", summaryData$EVTYPE)] <- "RECORD RAINFALL"
summaryData$EVTYPE[grep("RECORD SNOW", summaryData$EVTYPE)] <- "RECORD SNOWFALL"
summaryData$EVTYPE[grep("RECORD TEMPERATURE", summaryData$EVTYPE)] <- "RECORD TEMPERATURES"
summaryData$EVTYPE[grep("RECORD WARM", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("RECORD/EXCESSIVE HEAT", summaryData$EVTYPE)] <- "RECORD HEAT"
summaryData$EVTYPE[grep("REMNANT OF FLOYD", summaryData$EVTYPE)] <- "HURRICANE"
summaryData$EVTYPE[grep("RIP CURRENT", summaryData$EVTYPE)] <- "RIP CURRENT"
summaryData$EVTYPE[grep("RIVER", summaryData$EVTYPE)] <- "RIVER FLOOD"
summaryData$EVTYPE[grep("SEVERE THUNDERSTORM", summaryData$EVTYPE)] <- "SEVERE THUNDERSTORM"
summaryData$EVTYPE[grep("SLEET", summaryData$EVTYPE)] <- "SLEET"
summaryData$EVTYPE[grep("SMALL STREAM", summaryData$EVTYPE)] <- "SMALL STREAM FLOOD"
summaryData$EVTYPE[grep("SML STREAM", summaryData$EVTYPE)] <- "SMALL STREAM FLOOD"
summaryData$EVTYPE[grep("SNOW", summaryData$EVTYPE)] <- "SNOW"
summaryData$EVTYPE[grep("STORM FORCE WINDS", summaryData$EVTYPE)] <- "STORM SURGE"
summaryData$EVTYPE[grep("STORM SURGE", summaryData$EVTYPE)] <- "STORM SURGE"
summaryData$EVTYPE[grep("STREET FLOOD", summaryData$EVTYPE)] <- "STREET FLOODING"
summaryData$EVTYPE[grep("STRONG WIND", summaryData$EVTYPE)] <- "STRONG WIND"
summaryData$EVTYPE[grep("SUMMARY", summaryData$EVTYPE)] <- "SUMMARY" #NO EFFECT
summaryData$EVTYPE[grep("THUDERSTORM", summaryData$EVTYPE)] <- "THUNDERSTORM"
summaryData$EVTYPE[grep("THUNDEERSTORM", summaryData$EVTYPE)] <- "THUNDERSTORM"
summaryData$EVTYPE[grep("THUNDERESTORM", summaryData$EVTYPE)] <- "THUNDERSTORM"
summaryData$EVTYPE[grep("THUNDERSTORM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("THUNDERSTROM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("THUNDERTORM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("THUNDERTSORM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("TIDAL FLOOD", summaryData$EVTYPE)] <- "TIDAL FLOODING"
summaryData$EVTYPE[grep("^TORNADO", summaryData$EVTYPE)] <- "TORNADOES"
summaryData$EVTYPE[grep("TORNDAO", summaryData$EVTYPE)] <- "TORNADOES"
summaryData$EVTYPE[grep("TORRENTIAL", summaryData$EVTYPE)] <- "TORRENTIAL RAINFALL"
summaryData$EVTYPE[grep("TROPICAL STORM", summaryData$EVTYPE)] <- "TROPICAL STORM"
summaryData$EVTYPE[grep("TSTM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("TUNDERSTORM", summaryData$EVTYPE)] <- "THUNDERSTORM WINDS"
summaryData$EVTYPE[grep("UNSEASONABLE COLD", summaryData$EVTYPE)] <- "UNSEASONABLY COLD"
summaryData$EVTYPE[grep("UNSEASONABLY COOL", summaryData$EVTYPE)] <- "UNSEASONABLY COLD"
summaryData$EVTYPE[grep("UNSEASONABLY WARM", summaryData$EVTYPE)] <- "UNSEASONABLY WARM"
summaryData$EVTYPE[grep("UNUSUAL WARM", summaryData$EVTYPE)] <- "UNSEASONABLY WARM"
summaryData$EVTYPE[grep("UNUSUALLY WARM", summaryData$EVTYPE)] <- "UNSEASONABLY WARM"
summaryData$EVTYPE[grep("UNUSUALLY COLD", summaryData$EVTYPE)] <- "UNSEASONABLY COLD"
summaryData$EVTYPE[grep("URBAN FLOOD", summaryData$EVTYPE)] <- "URBAN FLOODING"
summaryData$EVTYPE[grep("URBAN", summaryData$EVTYPE)] <- "URBAN FLOODING"
summaryData$EVTYPE[grep("VOLCANIC", summaryData$EVTYPE)] <- "VOLCANIC ASHFALL"
summaryData$EVTYPE[grep("WALL CLOUD", summaryData$EVTYPE)] <- "WALL CLOUD"
summaryData$EVTYPE[grep("WATERSPOUT", summaryData$EVTYPE)] <- "WATERSPOUTS"
summaryData$EVTYPE[grep("WAYTERSPOUT", summaryData$EVTYPE)] <- "WATERSPOUTS"
summaryData$EVTYPE[grep("WET MICOBURST", summaryData$EVTYPE)] <- "MICROBURST"
summaryData$EVTYPE[grep("^WILD", summaryData$EVTYPE)] <- "WILDFIRES"
summaryData$EVTYPE[grep("^WIND$", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("WIND AND WAVE", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("WIND GUSTS", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("WIND STORM", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("^WINDS$", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("WINTER MIX", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("WINTER STORMS", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("WINTER WEATHER", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("WINTERY", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("WINTRY", summaryData$EVTYPE)] <- "WINTER WEATHER"
summaryData$EVTYPE[grep("^WND$", summaryData$EVTYPE)] <- "WIND DAMAGE"
summaryData$EVTYPE[grep("^ WIND$", summaryData$EVTYPE)] <- "WIND DAMAGE"
# Create summary again
summaryData <- ddply(summaryData, "EVTYPE", summarize, N=sum(N),
fatalities=sum(fatalities), injuries=sum(injuries),
prop.dmg=sum(prop.dmg), crop.dmg=sum(crop.dmg))
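The long run of grep assignments above could also be driven by a small lookup table. The sketch below is one possible arrangement, not the code used for this report: the names `rules` and `recode_evtype` are hypothetical, and only a few of the patterns are shown. As with the original statements, the order of the rules matters, because later patterns see the labels written by earlier ones.

```r
# Hypothetical lookup table: regex pattern -> common label.
# Rules are applied in order, top to bottom.
rules <- c("AVALANCE" = "AVALANCHE",
           "^TORNADO" = "TORNADOES",
           "TORNDAO"  = "TORNADOES",
           "TSTM"     = "THUNDERSTORM WINDS")

recode_evtype <- function(labels, rules) {
    labels <- toupper(labels)
    for (pattern in names(rules)) {
        # overwrite every label that matches the current pattern
        labels[grepl(pattern, labels)] <- rules[[pattern]]
    }
    labels
}

cleaned <- recode_evtype(c("Tornado F0", "TORNDAO", "tstm wind",
                           "AVALANCE", "HAIL"), rules)
cleaned
```

Labels that match no pattern, such as "HAIL" here, pass through unchanged apart from the upper-case conversion.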
After cleaning the data, we can now examine the results. The storm event database reports fatalities and injuries per event. To determine the most harmful events, I looked at the top 10 events based on their combined number of fatalities and injuries. Similarly, to determine the events with the greatest economic consequences, I examined the events with the highest combined property damage and crop damage.
The table below shows the top 10 most harmful events by their combined number of fatalities and injuries:
TOP 10 EVENTS MOST HARMFUL
head(arrange(summaryData, desc(fatalities+injuries)),10)
## EVTYPE N fatalities injuries prop.dmg crop.dmg
## 1 TORNADOES 60686 5633 91364 3214532.36 100026.72
## 2 THUNDERSTORM WINDS 335673 725 9446 2668174.86 194904.73
## 3 EXCESSIVE HEAT 1681 1903 6525 1460.00 494.40
## 4 FLOOD 25463 478 6791 906919.38 171561.68
## 5 LIGHTNING 15777 817 5232 603682.28 3585.61
## 6 HEAT 858 1118 2494 1767.75 968.00
## 7 FLASH FLOOD 55677 1035 1802 1474373.90 186484.21
## 8 ICE STORM 2093 96 2113 74308.17 1693.95
## 9 HIGH WINDS 21939 297 1522 382258.77 21062.81
## 10 WILDFIRES 4231 90 1606 125148.29 9065.74
Similarly, the table below shows the top 10 events with greatest economic consequences by their combined property damage and crop damage:
TOP 10 EVENTS MOST COSTLY
head(arrange(summaryData, desc(prop.dmg+crop.dmg)),10)
## EVTYPE N fatalities injuries prop.dmg crop.dmg
## 1 TORNADOES 60686 5633 91364 3214532.4 100026.72
## 2 THUNDERSTORM WINDS 335673 725 9446 2668174.9 194904.73
## 3 FLASH FLOOD 55677 1035 1802 1474373.9 186484.21
## 4 HAILSTORM 290399 45 1467 699300.4 585956.66
## 5 FLOOD 25463 478 6791 906919.4 171561.68
## 6 LIGHTNING 15777 817 5232 603682.3 3585.61
## 7 HIGH WINDS 21939 297 1522 382258.8 21062.81
## 8 SNOW 17650 161 1122 151641.3 2195.72
## 9 WINTER STORM 11433 206 1321 132720.6 1978.99
## 10 WILDFIRES 4231 90 1606 125148.3 9065.74
An interesting thing to note is that tornadoes cause the most damage overall, but hail is the most costly event when we look only at crop damage. The table below shows the top 10 events ordered by crop damage in descending order:
TOP 10 EVENTS MOST COSTLY TO CROPS
head(arrange(summaryData, desc(crop.dmg)),10)
## EVTYPE N fatalities injuries prop.dmg crop.dmg
## 1 HAILSTORM 290399 45 1467 699300.38 585956.66
## 2 THUNDERSTORM WINDS 335673 725 9446 2668174.86 194904.73
## 3 FLASH FLOOD 55677 1035 1802 1474373.90 186484.21
## 4 FLOOD 25463 478 6791 906919.38 171561.68
## 5 TORNADOES 60686 5633 91364 3214532.36 100026.72
## 6 DROUGHT 2501 2 4 4099.05 33904.40
## 7 HIGH WINDS 21939 297 1522 382258.77 21062.81
## 8 HEAVY RAIN 11820 98 255 55673.19 12050.30
## 9 HURRICANE 287 133 1328 23757.15 10802.79
## 10 WILDFIRES 4231 90 1606 125148.29 9065.74
I wanted to see the effect of tornadoes over the years. I extracted from the original dataset the data corresponding to TORNADO events and summarized the statistics of interest by the year of the events. The following figures show the number of events, fatalities and injuries, and the damage caused by tornadoes from 1950 to 2011.
# Extract all TORNADO events
tornadoData <- stormData[grep("^TORN", stormData$EVTYPE),]
# Determine year of the event
tornadoData <- mutate(tornadoData, YEAR =
    # store the year as an integer so it can be used directly as a plot axis
    as.integer(format(strptime(as.character(tornadoData$BGN_DATE),
                               "%m/%d/%Y %H:%M:%S"), format="%Y")))
# Summarize fatalities and damage by year
summaryTornado <- ddply(tornadoData, "YEAR", summarize, N=length(CROPDMG),
fatalities = sum(FATALITIES),
injuries = sum(INJURIES),
prop.dmg = sum(PROPDMG),
crop.dmg = sum(CROPDMG))
summaryTornado
## YEAR N fatalities injuries prop.dmg crop.dmg
## 1 1950 223 70 659 16999.15 0.00
## 2 1951 269 34 524 10560.99 0.00
## 3 1952 272 230 1915 16679.74 0.00
## 4 1953 492 519 5131 19182.20 0.00
## 5 1954 609 36 715 23367.82 0.00
## 6 1955 632 129 926 27715.63 0.00
## 7 1956 567 83 1355 27002.35 0.00
## 8 1957 930 193 1976 44568.89 0.00
## 9 1958 608 67 535 26597.11 0.00
## 10 1959 630 58 734 25015.54 0.00
## 11 1960 645 46 737 28314.24 0.00
## 12 1961 772 52 1087 39528.73 0.00
## 13 1962 673 30 551 22245.73 0.00
## 14 1963 493 31 538 24793.08 0.00
## 15 1964 760 73 1148 38618.32 0.00
## 16 1965 995 301 5197 46716.54 0.00
## 17 1966 606 98 2030 27079.36 0.00
## 18 1967 966 114 2144 40056.72 0.00
## 19 1968 715 131 2522 27762.41 0.00
## 20 1969 650 66 1311 33354.23 0.00
## 21 1970 700 73 1355 30800.98 0.00
## 22 1971 963 159 2723 44778.05 0.00
## 23 1972 775 27 976 36287.57 0.00
## 24 1973 1199 89 2406 75056.56 0.00
## 25 1974 1123 366 6824 57905.28 0.00
## 26 1975 962 60 1457 52498.79 0.00
## 27 1976 935 44 1195 56249.69 0.00
## 28 1977 922 43 771 54811.73 0.00
## 29 1978 875 53 919 49135.66 0.00
## 30 1979 918 84 3014 54633.74 0.00
## 31 1980 972 28 1157 64904.35 0.00
## 32 1981 830 24 798 46608.31 0.00
## 33 1982 1181 64 1276 83065.50 0.00
## 34 1983 995 34 756 64058.09 0.00
## 35 1984 1020 122 2499 63241.39 0.00
## 36 1985 773 94 1299 39667.21 0.00
## 37 1986 849 15 536 50897.34 0.00
## 38 1987 695 59 1018 33212.95 0.00
## 39 1988 773 32 688 50289.65 0.00
## 40 1989 921 50 1270 49263.55 0.00
## 41 1990 1264 53 1177 69560.57 0.00
## 42 1991 1208 39 864 60888.93 0.00
## 43 1992 1404 39 1323 70526.40 0.00
## 44 1993 615 53 739 64256.30 3181.30
## 45 1994 942 48 806 84865.25 5289.70
## 46 1995 1211 34 1116 53033.11 1429.72
## 47 1996 1239 26 705 62085.73 4017.70
## 48 1997 1180 68 1033 57907.75 1636.72
## 49 1998 1529 130 1874 91168.44 7064.67
## 50 1999 1519 94 1842 69604.61 3774.50
## 51 2000 1169 41 882 46487.56 1878.10
## 52 2001 1351 40 743 66361.19 3914.00
## 53 2002 1041 55 968 52653.55 799.01
## 54 2003 1534 54 1087 59887.74 2221.20
## 55 2004 1947 35 396 73341.16 7136.50
## 56 2005 1343 38 537 59885.81 9977.10
## 57 2006 1264 67 992 66879.08 2691.00
## 58 2007 1238 81 659 60048.70 3270.00
## 59 2008 1891 129 1690 106413.26 14214.00
## 60 2009 1272 21 397 71409.00 4492.00
## 61 2010 1446 45 699 88280.14 4668.00
## 62 2011 2192 587 6163 155464.51 18374.00
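As a small illustration of the year extraction used for this table, the sketch below parses two date strings in the same layout as BGN_DATE and END_DATE. The two strings are taken from the EXCESSIVE record printed earlier; the variable names are hypothetical.

```r
# Date strings in the "month/day/year hour:minute:second" layout of the dataset
bgn <- c("7/1/1995 0:00:00", "7/31/1995 0:00:00")

# Parse the strings, then keep only the four-digit year as an integer
parsed <- strptime(bgn, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
years <- as.integer(format(parsed, "%Y"))
years
```

A string that does not match the layout is parsed as NA, so counting the NA values in the result is a quick check that every record was assigned a year.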
# Plot number of events, fatalities and damage by year
plot(summaryTornado$YEAR, summaryTornado$N, type="l",
main = "Figure 1. Number of tornado events per year",
xlab = "Year", ylab = "Number of tornado events")
plot(summaryTornado$YEAR, summaryTornado$fatalities, type="l", col="red",
main = "Figure 2. Fatalities and injuries caused by tornadoes per year",
xlab = "Year", ylab = "Number of fatalities/injuries", ylim = c(0,7000))
lines(summaryTornado$YEAR, summaryTornado$injuries, col="blue")
legend("top", legend=c("Fatalities", "Injuries"), text.col=c("red", "blue"),
lty = 1, col = c("red","blue"))
plot(summaryTornado$YEAR, summaryTornado$prop.dmg, type="l", col="purple",
main="Figure 3. Damage caused by tornadoes",
xlab = "Year", ylab = "Amount of damage to property/crops", ylim = c(0,160000))
lines(summaryTornado$YEAR, summaryTornado$crop.dmg, col="green")
legend("top", legend=c("Property damage", "Crop damage"), text.col=c("purple", "green"),
lty = 1, col = c("purple","green"))
From Figure 1, I noticed that the number of tornado events seems to be increasing over the years. Tornadoes cause a larger number of injuries than fatalities, as shown in Figure 2. Similarly, tornadoes cause more property damage than crop damage (Figure 3). We see from the summaryTornado table that crop damage is reported as $0 from 1950 to 1992. This may be why tornadoes were not the top event for crop damage. It’s interesting to note that, even though we don’t have complete data for 2011, there is a spike in all the metrics for that year. In any case, we need to be prepared for tornadoes.
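One simple way to check the impression of an upward trend in the number of events is to compare the average yearly count in the first and last decades of the data. The sketch below copies the N column from the summaryTornado table for 1950 to 1959 and for 2002 to 2011; the variable names are hypothetical, and this is a rough comparison rather than a formal trend test.

```r
# Yearly tornado counts copied from the summaryTornado table
n_first_decade <- c(223, 269, 272, 492, 609, 632, 567, 930, 608, 630)        # 1950-1959
n_last_decade  <- c(1041, 1534, 1947, 1343, 1264, 1238, 1891, 1272, 1446, 2192) # 2002-2011

# Average number of recorded events per year in each decade
mean_first <- mean(n_first_decade)
mean_last  <- mean(n_last_decade)

# Ratio of the two averages; a value above 1 means more recorded events per year
mean_last / mean_first
```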