Severe weather events in the United States sometimes lead to severe, even catastrophic, consequences for population health and the economy. These events cause injuries and fatalities and often severely damage property and crops. Preventing such outcomes is a vital concern for any modern society affected by them. A proper understanding of the impacts of these weather events is needed so that attention can be focused on the events with the greatest consequences for the population and the economy. This analysis of weather event types, by population health impact and economic consequence, is based on publicly available data from the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database. It aims to answer two basic questions: which types of weather events are most harmful with respect to population health, and which have the greatest economic consequences. The results of the analysis indicate that tornadoes are the most harmful to population health and that flooding events have the greatest economic consequences.
Data processing, including the subsets of the data set defined for specific focus and the additional data elements derived from the raw data, is described below:
# Load required libraries
library(ggplot2)
library(scales)
# Set the working directory
setwd("~/Documents/R_Programming/ReproducibleResearch/RepData_PeerAssessment2")
# Download the raw data file to the local machine
raw.data.url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
raw.data.file <- "./data/repdata-data-StormData.csv.bz2"
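if(!file.exists(raw.data.file)) { # only download the file if it's not locally present
  # Create the target directory first so download.file() has somewhere to write
  dir.create(dirname(raw.data.file), showWarnings = FALSE, recursive = TRUE)
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(raw.data.url, raw.data.file, mode = "wb")
  print("File download complete and now beginning to read the data")
} else {
  print("File already exists, beginning to read the data")
}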
## [1] "File already exists, beginning to read the data"
# Read the local raw data file into R
stormdata <- read.table(file = raw.data.file,
header = TRUE,
sep = ",",
stringsAsFactors = FALSE)
cat("The raw dataset has", nrow(stormdata), "rows and", ncol(stormdata), "columns.")
## The raw dataset has 902297 rows and 37 columns.
print("The first few rows of stormdata:")
## [1] "The first few rows of stormdata:"
head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0
## 2        0     2.5          K       0
## 3        2    25.0          K       0
## 4        2     2.5          K       0
## 5        2     2.5          K       0
## 6        6     2.5          K       0
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
print("Summary Analysis of stormdata")
## [1] "Summary Analysis of stormdata"
summary(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31 Class :character Class :character Class :character
## Median : 75 Mode :character Mode :character Mode :character
## Mean :101
## 3rd Qu.:131
## Max. :873
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0 Class :character Class :character Class :character
## Median : 0 Mode :character Mode :character Mode :character
## Mean : 1
## 3rd Qu.: 1
## Max. :3749
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0
## Mode :character Median :0 Median : 0
## Mean :0 Mean : 1
## 3rd Qu.:0 3rd Qu.: 0
## Max. :0 Max. :925
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0 Min. : 0
## Class :character Class :character 1st Qu.: 0.0 1st Qu.: 0
## Mode :character Mode :character Median : 0.0 Median : 0
## Mean : 0.2 Mean : 8
## 3rd Qu.: 0.0 3rd Qu.: 0
## Max. :2315.0 Max. :4400
##
## F MAG FATALITIES INJURIES
## Min. :0 Min. : 0 Min. : 0 Min. : 0.0
## 1st Qu.:0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.0
## Median :1 Median : 50 Median : 0 Median : 0.0
## Mean :1 Mean : 47 Mean : 0 Mean : 0.2
## 3rd Qu.:1 3rd Qu.: 75 3rd Qu.: 0 3rd Qu.: 0.0
## Max. :5 Max. :22000 Max. :583 Max. :1700.0
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0 Length:902297 Min. : 0.0 Length:902297
## 1st Qu.: 0 Class :character 1st Qu.: 0.0 Class :character
## Median : 0 Mode :character Median : 0.0 Mode :character
## Mean : 12 Mean : 1.5
## 3rd Qu.: 0 3rd Qu.: 0.0
## Max. :5000 Max. :990.0
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
# Add Column for "Population.Harm" by adding the values in the FATALITIES and
# INJURIES columns
stormdata$Population.Harm <- stormdata$FATALITIES + stormdata$INJURIES
# Subset the storm data to examine the human cost defined as Population.Harm > 0
humancost <- subset(stormdata, Population.Harm > 0)
# Calculate the mean fatalities per event across the humancost subset
meanfatalities <- mean(humancost$FATALITIES)
# Calculate the mean injuries per event across the humancost subset
meaninjuries <- mean(humancost$INJURIES)
# Create a Fatality.Index based on FATALITIES / mean(FATALITIES)
humancost$Fatality.Index <- humancost$FATALITIES / meanfatalities
# Create an Injury.Index based on INJURIES / mean(INJURIES)
humancost$Injury.Index <- humancost$INJURIES / meaninjuries
# Create a Population.Harm.Index for each weather event based on the following formula:
# (FATALITIES / mean(FATALITIES)) + (INJURIES / mean(INJURIES))
humancost$Population.Harm.Index <- humancost$Fatality.Index +
humancost$Injury.Index
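To make the index concrete, here is a minimal sketch on made-up numbers. The `toy` data frame and its values are hypothetical and are not drawn from the NOAA data. It applies the same mean-scaling as above; each component index averages 1 by construction, so the combined index averages 2 across the subset.

```r
# Toy data (hypothetical values, not taken from the NOAA dataset)
toy <- data.frame(FATALITIES = c(0, 2, 4),
                  INJURIES   = c(10, 0, 20))

# Same construction as above: scale each count by its column mean, then sum
toy$Fatality.Index <- toy$FATALITIES / mean(toy$FATALITIES)
toy$Injury.Index   <- toy$INJURIES / mean(toy$INJURIES)
toy$Population.Harm.Index <- toy$Fatality.Index + toy$Injury.Index

print(toy)
```

Because both components are divided by their own mean, a fatality and an injury are not weighted equally: the rarer outcome (in the real data, fatalities) contributes more per occurrence.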
NOTE: This grouping could be done on the full stormdata dataset, but I chose to perform it at the subset level to keep the manual mapping manageable for the unique EVTYPE values in the subset.
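For comparison with a manual mapping, a pattern-based grouping might look like the sketch below. The `group.by.pattern` helper and its four patterns are hypothetical and deliberately incomplete; they are not the mapping used in this analysis. Patterns are tried in order and the first match wins, which is also where this approach can disagree with a hand-built mapping (for example, "WATERSPOUT/TORNADO" would land in Tornado here, and "NON-TSTM WIND" in Thunderstorm).

```r
# Hypothetical helper, not part of the original analysis
group.by.pattern <- function(evtype) {
  evtype <- toupper(trimws(evtype))
  # Ordered patterns (illustrative only): the first match wins
  patterns <- c("Tornado"        = "TORN|FUNNEL",
                "Thunderstorm"   = "THUNDER|TSTM",
                "Flooding"       = "FLOOD|FLD|SURGE",
                "Winter Weather" = "SNOW|BLIZZARD|ICE|WINT")
  grouping <- rep(NA_character_, length(evtype))
  for (label in names(patterns)) {
    # Only fill entries that no earlier pattern has claimed
    hit <- is.na(grouping) & grepl(patterns[[label]], evtype)
    grouping[hit] <- label
  }
  grouping
}

examples <- c("TORNADO F2", " TSTM WIND", "Flash Flood", "ICE STORM", "DROUGHT")
print(data.frame(EVTYPE = examples, Grouping = group.by.pattern(examples)))
```

Entries that match no pattern stay `NA`, which makes unmapped event types easy to list and review.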
# Create the EVTYPE.Grouping (event type grouping) to "normalize/clean" the various EVTYPE
# entries
## Make all entries uppercase to increase consistency
humancost$EVTYPE <- toupper(humancost$EVTYPE)
## "Map" the events into groupings
### Note: I realize this is a brute-force method rather than a regular-expression
### approach, but after examining the data I decided to do the mapping manually
### because I want to use this same dataset for a side project not related to
### the Coursera courses
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("AVALANCE",
"AVALANCHE")] <- "Avalanche"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("BLACK ICE",
"BLIZZARD",
"BLOWING SNOW",
"COLD AND SNOW",
"EXCESSIVE SNOW",
"FALLING SNOW/ICE",
"FREEZING RAIN",
"FREEZING RAIN/SNOW",
"FREEZING SPRAY",
"FROST",
"GLAZE",
"GLAZE/ICE STORM",
"HEAVY SNOW",
"HEAVY SNOW AND HIGH WINDS",
"HEAVY SNOW SHOWER",
"HEAVY SNOW/BLIZZARD/AVALANCHE",
"HEAVY SNOW/ICE",
"HEAVY WIND/HEAVY SNOW",
"HIGH WINDS/SNOW",
"ICE",
"ICE ON ROAD",
"ICE ROADS",
"ICE STORM",
"ICE STORM/FLAH FLOOD",
"ICY ROADS",
"LIGHT SNOW",
"MIXED PRECIP",
"RAIN/SNOW",
"SLEET",
"SNOW",
"SNOW AND ICE",
"SNOW SQUALL",
"SNOW SQUALLS",
"SNOW/ BITTER COLD",
"SNOW/HIGH WINDS",
"THUNDERSNOW",
"WINTER STORM",
"WINTER STORM HIGH WINDS",
"WINTER STORMS",
"WINTER WEATHER",
"WINTER WEATHER MIX",
"WINTER WEATHER/MIX",
"WINTRY MIX",
"ICE STORM/FLASH FLOOD",
"SNOW/ICE",
"HIGH WIND/HEAVY SNOW",
"BLIZZARD/WINTER STORM",
"FROST/FREEZE",
"FROST\\FREEZE",
"GLAZE ICE",
"GROUND BLIZZARD",
"HEAVY LAKE SNOW",
"HEAVY MIX",
"HEAVY SNOW AND STRONG WINDS",
"HEAVY SNOW SQUALLS",
"HEAVY SNOW-SQUALLS",
"HEAVY SNOW/BLIZZARD",
"HEAVY SNOW/FREEZING RAIN",
"HEAVY SNOW/HIGH WINDS & FLOOD",
"HEAVY SNOW/SQUALLS",
"HEAVY SNOW/WIND",
"HEAVY SNOW/WINTER STORM",
"HEAVY SNOWPACK",
"ICE AND SNOW",
"ICE FLOES",
"ICE JAM",
"ICE/STRONG WINDS",
"LAKE EFFECT SNOW",
"LAKE-EFFECT SNOW",
"LATE SEASON SNOW",
"LIGHT FREEZING RAIN",
"LIGHT SNOWFALL",
"MIXED PRECIPITATION",
"RECORD SNOW",
"SLEET/ICE STORM",
"SNOW ACCUMULATION",
"SNOW AND HEAVY SNOW",
"SNOW AND ICE STORM",
"SNOW FREEZING RAIN",
"SNOW/ ICE",
"SNOW/BLOWING SNOW",
"SNOW/COLD",
"SNOW/FREEZING RAIN",
"SNOW/HEAVY SNOW",
"SNOW/ICE STORM",
"SNOW/SLEET",
"SNOW/SLEET/FREEZING RAIN",
"")] <- "Winter Weather"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("BRUSH FIRE",
"WILD FIRES",
"WILD/FOREST FIRE",
"WILDFIRE",
"DENSE SMOKE",
"FOREST FIRES",
"GRASS FIRES",
"WILD/FOREST FIRES",
"WILDFIRES")] <- "Wildfire"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("COASTAL FLOOD",
"COASTAL FLOODING",
"COASTAL FLOODING/EROSION",
"FLASH FLOOD",
"FLASH FLOOD/FLOOD",
"FLASH FLOODING",
"FLASH FLOODING/FLOOD",
"FLASH FLOODS",
"FLOOD",
"FLOOD & HEAVY RAIN",
"FLOOD/FLASH FLOOD",
"FLOOD/RIVER FLOOD",
"FLOODING",
"MINOR FLOODING",
"RAPIDLY RISING WATER",
"RIVER FLOOD",
"RIVER FLOODING",
"STORM SURGE",
"STORM SURGE/TIDE",
"TIDAL FLOODING",
"FLOODING/COASTAL FLOODING",
"URBAN AND SMALL STREAM FLOODIN",
"URBAN/SML STREAM FLD",
" FLASH FLOOD",
"ASTRONOMICAL HIGH TIDE",
"BEACH EROSION",
"BREAKUP FLOODING",
"COASTAL FLOODING/EROSION",
"COASTAL EROSION",
"COASTAL SURGE",
"DAM BREAK",
"EROSION/CSTL FLOOD",
"FLASH FLOOD - HEAVY RAIN",
"FLASH FLOOD FROM ICE JAMS",
"FLASH FLOOD WINDS",
"FLASH FLOOD/",
"FLASH FLOOD/ STREET",
"FLASH FLOODING/THUNDERSTORM WI",
"FLOOD FLASH",
"FLOOD/FLASH",
"FLOOD/FLASH/FLOOD",
"FLOOD/FLASHFLOOD",
"FLOOD/RAIN/WINDS",
"FLOODING/HEAVY RAIN",
"FLOODS",
"HEAVY SURF COASTAL FLOODING",
"HIGH TIDES",
"ICE JAM FLOOD (MINOR",
"ICE JAM FLOODING",
"LAKE FLOOD",
"LAKESHORE FLOOD",
"MAJOR FLOOD",
"RIVER AND STREAM FLOOD",
"RURAL FLOOD",
"SEICHE",
"SMALL STREAM FLOOD",
"SNOWMELT FLOODING",
"URBAN AND SMALL",
"URBAN FLOOD",
"URBAN FLOODING",
"URBAN FLOODS",
"URBAN SMALL",
"URBAN/SMALL STREAM",
"URBAN/SMALL STREAM FLOOD")] <- "Flooding"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("COASTAL STORM",
"COASTALSTORM")] <- "Coastal Storm"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("COLD",
"COLD TEMPERATURE",
"COLD WAVE",
"COLD WEATHER",
"COLD/WIND CHILL",
"COLD/WINDS",
"EXTENDED COLD",
"EXTREME COLD",
"EXTREME COLD/WIND CHILL",
"EXTREME WINDCHILL",
"FREEZE",
"FREEZING DRIZZLE",
"LOW TEMPERATURE",
"RECORD COLD",
"UNSEASONABLY COLD",
"AGRICULTURAL FREEZE",
"COLD AND WET CONDITIONS",
"COOL AND WET",
"DAMAGING FREEZE",
"EARLY FROST",
"EXTREME WIND CHILL",
"FREEZING RAIN/SLEET",
"HARD FREEZE",
"UNSEASONABLE COLD")] <- "Cold Weather"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("DENSE FOG",
"FOG",
"FOG AND COLD TEMPERATURES",
"FREEZING FOG")] <- "Fog"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("DROUGHT",
"DROUGHT/EXCESSIVE HEAT")] <- "Drought"
humancost$EVTYPE.Grouping[humancost$EVTYPE == "DROWNING"] <- "Drowning"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("DRY MICROBURST",
"DRY MICROBURST WINDS",
"DRY MIRCOBURST WINDS",
"DOWNBURST",
"DUST DEVIL WATERSPOUT",
"DUST STORM/HIGH WINDS",
"MICROBURST",
"MICROBURST WINDS",
"WET MICROBURST")] <- "Microburst"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("DUST DEVIL",
"DUST STORM",
"WHIRLWIND",
"BLOWING DUST")] <- "Dust Devil/Whirlwind"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("EXCESSIVE HEAT",
"EXTREME HEAT",
"HEAT",
"HEAT WAVE",
"HEAT WAVE DROUGHT",
"HEAT WAVES",
"RECORD HEAT",
"RECORD/EXCESSIVE HEAT",
"UNSEASONABLY WARM",
"UNSEASONABLY WARM AND DRY",
"WARM WEATHER")] <- "Heat"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("EXCESSIVE RAINFALL",
"HEAVY RAIN",
"HEAVY RAINS",
"RAIN/WIND",
"TORRENTIAL RAINFALL",
"HEAVY RAIN/LIGHTNING",
"EXCESSIVE WETNESS",
"HEAVY PRECIPITATION",
"HEAVY RAIN AND FLOOD",
"HEAVY RAIN/HIGH SURF",
"HEAVY RAIN/LIGHTNING",
"HEAVY RAIN/SEVERE WEATHER",
"HEAVY RAIN/SMALL STREAM URBAN",
"HEAVY RAIN/SNOW",
"HEAVY RAINS/FLOODING",
"HEAVY SHOWER",
"HVY RAIN",
"RAIN",
"RAINSTORM",
"RECORD RAINFALL",
"UNSEASONAL RAIN")] <- "Excessive Rainfall"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("FUNNEL CLOUD",
"TORNADO",
"TORNADO F2",
"TORNADO F3",
"TORNADOES, TSTM WIND, HAIL",
"TORNADO F0",
"FUNNEL",
"COLD AIR TORNADO",
"TORNADO F1",
"TORNADOES",
"TORNDAO")] <- "Tornado"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("HAIL",
"SMALL HAIL",
"HAIL 0.75",
"HAIL 075",
"HAIL 100",
"HAIL 125",
"HAIL 150",
"HAIL 175",
"HAIL 200",
"HAIL 275",
"HAIL 450",
"HAIL 75",
"HAIL DAMAGE",
"HAIL/WIND",
"HAIL/WINDS",
"HAILSTORM",
"MARINE HAIL")] <- "Hail"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("HAZARDOUS SURF",
"HEAVY SURF",
"HEAVY SURF AND WIND",
"HEAVY SURF/HIGH WIND",
"HIGH SURF",
"ROUGH SURF",
"HEAVY SURF/HIGH SURF",
" HIGH SURF ADVISORY")] <- "Hazardous Surf"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("HEAVY SEAS",
"HIGH",
"HIGH SEAS",
"HIGH SWELLS",
"HIGH WATER",
"HIGH WAVES",
"HIGH WIND/SEAS",
"HIGH WIND AND SEAS",
"ROUGH SEAS")] <- "Heavy/High Seas"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("HIGH WIND",
"HIGH WINDS 48",
"HIGH WINDS",
"HIGH WINDS/COLD",
"STRONG WIND",
"STRONG WINDS",
"WIND",
"WIND STORM",
"WINDS",
"HIGH WIND 48",
"GRADIENT WIND",
"GUSTNADO",
"GUSTY WIND/HAIL",
"GUSTY WIND/HVY RAIN",
"GUSTY WIND/RAIN",
"HIGH WIND (G40)",
"HIGH WIND DAMAGE",
"HIGH WIND/BLIZZARD",
"HIGH WINDS HEAVY RAINS",
"HIGH WINDS/",
"HIGH WINDS/COASTAL FLOOD",
"HIGH WINDS/HEAVY RAIN",
"GUSTY WIND",
"GUSTY WINDS",
"SEVERE TURBULENCE",
"STORM FORCE WINDS",
"WIND AND WAVE",
"WIND DAMAGE",
"WIND/HAIL")] <- "High Winds"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("HURRICANE",
"HURRICANE EDOUARD",
"HURRICANE EMILY",
"HURRICANE ERIN",
"HURRICANE FELIX",
"HURRICANE OPAL",
"HURRICANE OPAL/HIGH WINDS",
"HURRICANE-GENERATED SWELLS",
"HURRICANE/TYPHOON",
"TYPHOON",
"HURRICANE GORDON")] <- "Hurricane/Typhoon"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("HYPERTHERMIA/EXPOSURE",
"HYPOTHERMIA",
"HYPOTHERMIA/EXPOSURE")] <- "Hypothermia/Exposure"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("LANDSLIDE",
"LANDSLIDES",
"FLASH FLOOD LANDSLIDES",
"FLASH FLOOD/LANDSLIDE",
"ROCK SLIDE")] <- "Landslide"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("LIGHTNING",
"LIGHTNING AND THUNDERSTORM WIND",
"LIGHTNING INJURY",
"LIGHTNING.",
"LIGHTING",
"LIGHTNING AND HEAVY RAIN",
"LIGHTNING AND THUNDERSTORM WIN",
"LIGHTNING WAUSEON",
"LIGHTNING FIRE",
"LIGHTNING THUNDERSTORM WINDS",
"LIGHTNING/HEAVY RAIN",
"LIGNTNING")] <- "Lightning"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("MARINE ACCIDENT",
"MARINE MISHAP")] <- "Marine Accident"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("MARINE HIGH WIND",
"MARINE STRONG WIND",
"MARINE THUNDERSTORM WIND",
"MARINE TSTM WIND")] <- "Marine Storm"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("MUDSLIDE",
"MUDSLIDES",
"MUD SLIDE",
"MUD SLIDES",
"MUD SLIDES URBAN FLOODING")] <- "Mudslide"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("NON TSTM WIND",
"NON-SEVERE WIND DAMAGE",
"NON-TSTM WIND")] <- "Non-Thunderstorm Wind"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("OTHER",
"?",
"APACHE COUNTY",
"ASTRONOMICAL LOW TIDE",
"LANDSLUMP",
"LANDSPOUT",
"VOLCANIC ASH")] <- "Other"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("RIP CURRENT",
"RIP CURRENTS",
"RIP CURRENTS/HEAVY SURF")] <- "Rip Current"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("THUNDERSTORM", "THUNDERSTORM WINDS",
"THUNDERSTORM WIND",
"THUNDERSTORM WIND (G48)",
"THUNDERSTORM WIND G52",
"THUNDERSTORM WINDS",
"THUNDERSTORM WINDS 13",
"THUNDERSTORM WINDS/HAIL",
"THUNDERSTORM WINDSS",
"THUNDERSTORMS WINDS",
"TSTM WIND", "TSTM WIND (G35)",
"TSTM WIND (G40)", "TSTM WIND (G45)",
"TSTM WIND/HAIL",
"THUNDERSTORM WINS",
"THUNDERSTORM WINDS LIGHTNING",
"THUNDERTORM WINDS",
"THUNDERSTORMW",
"THUNDERSTORM WIND (G40)",
"THUNDERSTORM WINDS HAIL",
" TSTM WIND",
" TSTM WIND (G45)",
"SEVERE THUNDERSTORM",
"SEVERE THUNDERSTORM WINDS",
"SEVERE THUNDERSTORMS",
"THUDERSTORM WINDS",
"THUNDERESTORM WINDS",
"THUNDERSTORM DAMAGE TO",
"THUNDERSTORM HAIL",
"THUNDERSTORM WIND 60 MPH",
"THUNDERSTORM WIND 65 MPH",
"THUNDERSTORM WIND 65MPH",
"THUNDERSTORM WIND 98 MPH",
"THUNDERSTORM WIND G50",
"THUNDERSTORM WIND G55",
"THUNDERSTORM WIND G60",
"THUNDERSTORM WIND TREES",
"THUNDERSTORM WIND.",
"THUNDERSTORM WIND/ TREE",
"THUNDERSTORM WIND/ TREES",
"THUNDERSTORM WIND/AWNING",
"THUNDERSTORM WIND/HAIL",
"THUNDERSTORM WIND/LIGHTNING",
"THUNDERSTORM WINDS 63 MPH",
"THUNDERSTORM WINDS AND",
"THUNDERSTORM WINDS G60",
"THUNDERSTORM WINDS HAIL",
"THUNDERSTORM WINDS.",
"THUNDERSTORM WINDS/ FLOOD",
"THUNDERSTORM WINDS/FLOODING",
"THUNDERSTORM WINDS/FUNNEL CLOU",
"THUNDERSTORM WINDS53",
"THUNDERSTORM WINDSHAIL",
"THUNDERSTORMS",
"THUNDERSTORMS WIND",
"THUNDERSTORMWINDS",
"THUNDERSTROM WIND",
"THUNERSTORM WINDS",
"TSTM WIND (G45)",
"TSTM WIND (41)",
"TSTM WIND 40",
"TSTM WIND 45",
"TSTM WIND 55",
"TSTM WIND 65)",
"TSTM WIND AND LIGHTNING",
"TSTM WIND DAMAGE",
"TSTM WIND G45",
"TSTM WIND G58",
"TSTM WINDS",
"TSTMW",
"TUNDERSTORM WIND")] <- "Thunderstorm"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("TROPICAL STORM",
"TROPICAL STORM GORDON",
"TROPICAL DEPRESSION",
"TROPICAL STORM ALBERTO",
"TROPICAL STORM DEAN",
"TROPICAL STORM JERRY")] <- "Tropical Storm"
humancost$EVTYPE.Grouping[humancost$EVTYPE == "TSUNAMI"] <- "Tsunami"
humancost$EVTYPE.Grouping[humancost$EVTYPE %in% c("WATERSPOUT",
"WATERSPOUT TORNADO",
"WATERSPOUT/TORNADO",
"WATERSPOUT-",
"WATERSPOUT-TORNADO",
"WATERSPOUT/ TORNADO")] <- "Waterspout"
humancost$EVTYPE.Grouping[humancost$EVTYPE == "ROGUE WAVE"] <- "Rogue Wave"
# Explore the EVTYPE data vs "most harmful with respect to population health"
## Keep only the events whose Population.Harm.Index is above the 99th percentile
highhumancost <- subset(humancost,
Population.Harm.Index >
quantile(Population.Harm.Index, 0.99))
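As a small illustration of the percentile filter, the sketch below applies the same `subset()` and `quantile()` pattern to a hypothetical five-row data frame. It uses the 80th percentile instead of the 99th so that the effect is visible on so few rows; the data values are invented.

```r
# Hypothetical toy data, not taken from the NOAA dataset
toy <- data.frame(EVTYPE.Grouping = c("Heat", "Heat", "Tornado", "Tornado", "Flooding"),
                  Population.Harm.Index = c(1, 2, 3, 100, 4),
                  stringsAsFactors = FALSE)

# Cutoff at the 80th percentile (the analysis itself uses 0.99)
cutoff <- quantile(toy$Population.Harm.Index, 0.80)

# Keep only rows strictly above the cutoff
top <- subset(toy, Population.Harm.Index > cutoff)
print(top)
```

Only rows strictly above the cutoff survive, so any totals computed afterwards describe the most extreme events and not the full set of events.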
# Subset the storm data to examine the economic cost
econcost <- subset(stormdata, (PROPDMG > 0 | CROPDMG > 0))
NOTE: This grouping could be done on the larger stormdata dataset but I chose to perform the grouping at the subset level to make the manual mapping process more manageable for the unique EVTYPE values in the subset.
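Since the same mapping is applied to more than one subset, one way to avoid maintaining two copies is to keep it in a single lookup and apply it with a helper. The sketch below is an assumption-laden illustration: `evtype.lookup` is a heavily abbreviated, hypothetical list and `map.evtype` is a hypothetical helper, neither of which appears in the analysis itself.

```r
# Hypothetical, abbreviated lookup: grouping label -> raw EVTYPE values
evtype.lookup <- list("Avalanche" = c("AVALANCE", "AVALANCHE"),
                      "Tsunami"   = "TSUNAMI",
                      "Drought"   = c("DROUGHT", "DROUGHT/EXCESSIVE HEAT"))

# Hypothetical helper: returns the grouping for each EVTYPE, NA when unmapped
map.evtype <- function(evtype, lookup) {
  # Flatten the list into a named vector: names are raw values, values are labels
  flat <- setNames(rep(names(lookup), lengths(lookup)),
                   unlist(lookup, use.names = FALSE))
  unname(flat[toupper(evtype)])
}

print(map.evtype(c("Avalance", "TSUNAMI", "HAIL"), evtype.lookup))
```

With a complete lookup, each subset could then be grouped in one line, for example `econcost$EVTYPE.Grouping <- map.evtype(econcost$EVTYPE, evtype.lookup)`. One caveat of this design: an empty-string EVTYPE cannot be matched by name in R, so it would need separate handling.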
# Create the EVTYPE.Grouping (event type grouping) to "normalize/clean" the various EVTYPE
# entries
## Make all entries uppercase to increase consistency
econcost$EVTYPE <- toupper(econcost$EVTYPE)
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("AVALANCE",
"AVALANCHE")] <- "Avalanche"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("BLACK ICE",
"BLIZZARD",
"BLOWING SNOW",
"COLD AND SNOW",
"EXCESSIVE SNOW",
"FALLING SNOW/ICE",
"FREEZING RAIN",
"FREEZING RAIN/SNOW",
"FREEZING SPRAY",
"FROST",
"GLAZE",
"GLAZE/ICE STORM",
"HEAVY SNOW",
"HEAVY SNOW AND HIGH WINDS",
"HEAVY SNOW SHOWER",
"HEAVY SNOW/BLIZZARD/AVALANCHE",
"HEAVY SNOW/ICE",
"HEAVY WIND/HEAVY SNOW",
"HIGH WINDS/SNOW",
"ICE",
"ICE ON ROAD",
"ICE ROADS",
"ICE STORM",
"ICE STORM/FLAH FLOOD",
"ICY ROADS",
"LIGHT SNOW",
"MIXED PRECIP",
"RAIN/SNOW",
"SLEET",
"SNOW",
"SNOW AND ICE",
"SNOW SQUALL",
"SNOW SQUALLS",
"SNOW/ BITTER COLD",
"SNOW/HIGH WINDS",
"THUNDERSNOW",
"WINTER STORM",
"WINTER STORM HIGH WINDS",
"WINTER STORMS",
"WINTER WEATHER",
"WINTER WEATHER MIX",
"WINTER WEATHER/MIX",
"WINTRY MIX",
"ICE STORM/FLASH FLOOD",
"SNOW/ICE",
"HIGH WIND/HEAVY SNOW",
"BLIZZARD/WINTER STORM",
"FROST/FREEZE",
"FROST\\FREEZE",
"GLAZE ICE",
"GROUND BLIZZARD",
"HEAVY LAKE SNOW",
"HEAVY MIX",
"HEAVY SNOW AND STRONG WINDS",
"HEAVY SNOW SQUALLS",
"HEAVY SNOW-SQUALLS",
"HEAVY SNOW/BLIZZARD",
"HEAVY SNOW/FREEZING RAIN",
"HEAVY SNOW/HIGH WINDS & FLOOD",
"HEAVY SNOW/SQUALLS",
"HEAVY SNOW/WIND",
"HEAVY SNOW/WINTER STORM",
"HEAVY SNOWPACK",
"ICE AND SNOW",
"ICE FLOES",
"ICE JAM",
"ICE/STRONG WINDS",
"LAKE EFFECT SNOW",
"LAKE-EFFECT SNOW",
"LATE SEASON SNOW",
"LIGHT FREEZING RAIN",
"LIGHT SNOWFALL",
"MIXED PRECIPITATION",
"RECORD SNOW",
"SLEET/ICE STORM",
"SNOW ACCUMULATION",
"SNOW AND HEAVY SNOW",
"SNOW AND ICE STORM",
"SNOW FREEZING RAIN",
"SNOW/ ICE",
"SNOW/BLOWING SNOW",
"SNOW/COLD",
"SNOW/FREEZING RAIN",
"SNOW/HEAVY SNOW",
"SNOW/ICE STORM",
"SNOW/SLEET",
"SNOW/SLEET/FREEZING RAIN",
"")] <- "Winter Weather"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("BRUSH FIRE",
"WILD FIRES",
"WILD/FOREST FIRE",
"WILDFIRE",
"DENSE SMOKE",
"FOREST FIRES",
"GRASS FIRES",
"WILD/FOREST FIRES",
"WILDFIRES")] <- "Wildfire"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("COASTAL FLOOD",
"COASTAL FLOODING",
"COASTAL FLOODING/EROSION",
"FLASH FLOOD",
"FLASH FLOOD/FLOOD",
"FLASH FLOODING",
"FLASH FLOODING/FLOOD",
"FLASH FLOODS",
"FLOOD",
"FLOOD & HEAVY RAIN",
"FLOOD/FLASH FLOOD",
"FLOOD/RIVER FLOOD",
"FLOODING",
"MINOR FLOODING",
"RAPIDLY RISING WATER",
"RIVER FLOOD",
"RIVER FLOODING",
"STORM SURGE",
"STORM SURGE/TIDE",
"TIDAL FLOODING",
"FLOODING/COASTAL FLOODING",
"URBAN AND SMALL STREAM FLOODIN",
"URBAN/SML STREAM FLD",
" FLASH FLOOD",
"ASTRONOMICAL HIGH TIDE",
"BEACH EROSION",
"BREAKUP FLOODING",
"COASTAL FLOODING/EROSION",
"COASTAL EROSION",
"COASTAL SURGE",
"DAM BREAK",
"EROSION/CSTL FLOOD",
"FLASH FLOOD - HEAVY RAIN",
"FLASH FLOOD FROM ICE JAMS",
"FLASH FLOOD WINDS",
"FLASH FLOOD/",
"FLASH FLOOD/ STREET",
"FLASH FLOODING/THUNDERSTORM WI",
"FLOOD FLASH",
"FLOOD/FLASH",
"FLOOD/FLASH/FLOOD",
"FLOOD/FLASHFLOOD",
"FLOOD/RAIN/WINDS",
"FLOODING/HEAVY RAIN",
"FLOODS",
"HEAVY SURF COASTAL FLOODING",
"HIGH TIDES",
"ICE JAM FLOOD (MINOR",
"ICE JAM FLOODING",
"LAKE FLOOD",
"LAKESHORE FLOOD",
"MAJOR FLOOD",
"RIVER AND STREAM FLOOD",
"RURAL FLOOD",
"SEICHE",
"SMALL STREAM FLOOD",
"SNOWMELT FLOODING",
"URBAN AND SMALL",
"URBAN FLOOD",
"URBAN FLOODING",
"URBAN FLOODS",
"URBAN SMALL",
"URBAN/SMALL STREAM",
"URBAN/SMALL STREAM FLOOD",
"HEAVY SWELLS")] <- "Flooding"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("COASTAL STORM",
"COASTALSTORM")] <- "Coastal Storm"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("COLD",
"COLD TEMPERATURE",
"COLD WAVE",
"COLD WEATHER",
"COLD/WIND CHILL",
"COLD/WINDS",
"EXTENDED COLD",
"EXTREME COLD",
"EXTREME COLD/WIND CHILL",
"EXTREME WINDCHILL",
"FREEZE",
"FREEZING DRIZZLE",
"LOW TEMPERATURE",
"RECORD COLD",
"UNSEASONABLY COLD",
"AGRICULTURAL FREEZE",
"COLD AND WET CONDITIONS",
"COOL AND WET",
"DAMAGING FREEZE",
"EARLY FROST",
"EXTREME WIND CHILL",
"FREEZING RAIN/SLEET",
"HARD FREEZE",
"UNSEASONABLE COLD")] <- "Cold Weather"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("DENSE FOG",
"FOG",
"FOG AND COLD TEMPERATURES",
"FREEZING FOG")] <- "Fog"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("DROUGHT",
"DROUGHT/EXCESSIVE HEAT")] <- "Drought"
econcost$EVTYPE.Grouping[econcost$EVTYPE == "DROWNING"] <- "Drowning"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("DRY MICROBURST",
"DRY MICROBURST WINDS",
"DRY MIRCOBURST WINDS",
"DOWNBURST",
"DUST DEVIL WATERSPOUT",
"DUST STORM/HIGH WINDS",
"MICROBURST",
"MICROBURST WINDS",
"WET MICROBURST")] <- "Microburst"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("DUST DEVIL",
"DUST STORM",
"WHIRLWIND",
"BLOWING DUST")] <- "Dust Devil/Whirlwind"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("EXCESSIVE HEAT",
"EXTREME HEAT",
"HEAT",
"HEAT WAVE",
"HEAT WAVE DROUGHT",
"HEAT WAVES",
"RECORD HEAT",
"RECORD/EXCESSIVE HEAT",
"UNSEASONABLY WARM",
"UNSEASONABLY WARM AND DRY",
"WARM WEATHER")] <- "Heat"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("EXCESSIVE RAINFALL",
"HEAVY RAIN",
"HEAVY RAINS",
"RAIN/WIND",
"TORRENTIAL RAINFALL",
"HEAVY RAIN/LIGHTNING",
"EXCESSIVE WETNESS",
"HEAVY PRECIPITATION",
"HEAVY RAIN AND FLOOD",
"HEAVY RAIN/HIGH SURF",
"HEAVY RAIN/LIGHTNING",
"HEAVY RAIN/SEVERE WEATHER",
"HEAVY RAIN/SMALL STREAM URBAN",
"HEAVY RAIN/SNOW",
"HEAVY RAINS/FLOODING",
"HEAVY SHOWER",
"HVY RAIN",
"RAIN",
"RAINSTORM",
"RECORD RAINFALL",
"UNSEASONAL RAIN")] <- "Excessive Rainfall"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("FUNNEL CLOUD",
"TORNADO",
"TORNADO F2",
"TORNADO F3",
"TORNADOES, TSTM WIND, HAIL",
"TORNADO F0",
"FUNNEL",
"COLD AIR TORNADO",
"TORNADO F1",
"TORNADOES",
"TORNDAO")] <- "Tornado"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("HAIL",
"SMALL HAIL",
"HAIL 0.75",
"HAIL 075",
"HAIL 100",
"HAIL 125",
"HAIL 150",
"HAIL 175",
"HAIL 200",
"HAIL 275",
"HAIL 450",
"HAIL 75",
"HAIL DAMAGE",
"HAIL/WIND",
"HAIL/WINDS",
"HAILSTORM",
"MARINE HAIL")] <- "Hail"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("HAZARDOUS SURF",
"HEAVY SURF",
"HEAVY SURF AND WIND",
"HEAVY SURF/HIGH WIND",
"HIGH SURF",
"ROUGH SURF",
"HEAVY SURF/HIGH SURF",
" HIGH SURF ADVISORY")] <- "Hazardous Surf"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("HEAVY SEAS",
"HIGH",
"HIGH SEAS",
"HIGH SWELLS",
"HIGH WATER",
"HIGH WAVES",
"HIGH WIND/SEAS",
"HIGH WIND AND SEAS",
"ROUGH SEAS")] <- "Heavy/High Seas"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("HIGH WIND",
"HIGH WINDS 48",
"HIGH WINDS",
"HIGH WINDS/COLD",
"STRONG WIND",
"STRONG WINDS",
"WIND",
"WIND STORM",
"WINDS",
"HIGH WIND 48",
"GRADIENT WIND",
"GUSTNADO",
"GUSTY WIND/HAIL",
"GUSTY WIND/HVY RAIN",
"GUSTY WIND/RAIN",
"HIGH WIND (G40)",
"HIGH WIND DAMAGE",
"HIGH WIND/BLIZZARD",
"HIGH WINDS HEAVY RAINS",
"HIGH WINDS/",
"HIGH WINDS/COASTAL FLOOD",
"HIGH WINDS/HEAVY RAIN",
"GUSTY WIND",
"GUSTY WINDS",
"SEVERE TURBULENCE",
"STORM FORCE WINDS",
"WIND AND WAVE",
"WIND DAMAGE",
"WIND/HAIL",
"HIGH WINDS")] <- "High Winds"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("HURRICANE",
"HURRICANE EDOUARD",
"HURRICANE EMILY",
"HURRICANE ERIN",
"HURRICANE FELIX",
"HURRICANE OPAL",
"HURRICANE OPAL/HIGH WINDS",
"HURRICANE-GENERATED SWELLS",
"HURRICANE/TYPHOON",
"TYPHOON",
"HURRICANE GORDON")] <- "Hurricane/Typhoon"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("HYPERTHERMIA/EXPOSURE",
"HYPOTHERMIA",
"HYPOTHERMIA/EXPOSURE")] <- "Hypothermia/Exposure"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("LANDSLIDE",
"LANDSLIDES",
"FLASH FLOOD LANDSLIDES",
"FLASH FLOOD/LANDSLIDE",
"ROCK SLIDE")] <- "Landslide"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("LIGHTNING",
"LIGHTNING AND THUNDERSTORM WIND",
"LIGHTNING INJURY",
"LIGHTNING.",
"LIGHTING",
"LIGHTNING AND HEAVY RAIN",
"LIGHTNING AND THUNDERSTORM WIN",
"LIGHTNING WAUSEON",
"LIGHTNING FIRE",
"LIGHTNING THUNDERSTORM WINDS",
"LIGHTNING/HEAVY RAIN",
"LIGNTNING")] <- "Lightning"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("MARINE ACCIDENT",
"MARINE MISHAP")] <- "Marine Accident"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("MARINE HIGH WIND",
"MARINE STRONG WIND",
"MARINE THUNDERSTORM WIND",
"MARINE TSTM WIND")] <- "Marine Storm"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("MUDSLIDE",
"MUDSLIDES",
"MUD SLIDE",
"MUD SLIDES",
"MUD SLIDES URBAN FLOODING")] <- "Mudslide"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("NON TSTM WIND",
"NON-SEVERE WIND DAMAGE",
"NON-TSTM WIND")] <- "Non-Thunderstorm Wind"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("OTHER",
"?",
"APACHE COUNTY",
"ASTRONOMICAL LOW TIDE",
"LANDSLUMP",
"LANDSPOUT",
"VOLCANIC ASH")] <- "Other"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("RIP CURRENT",
"RIP CURRENTS",
"RIP CURRENTS/HEAVY SURF")] <- "Rip Current"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("THUNDERSTORM", "THUNDERSTORM WINDS",
"THUNDERSTORM WIND",
"THUNDERSTORM WIND (G48)",
"THUNDERSTORM WIND G52",
"THUNDERSTORM WINDS",
"THUNDERSTORM WINDS 13",
"THUNDERSTORM WINDS/HAIL",
"THUNDERSTORM WINDSS",
"THUNDERSTORMS WINDS",
"TSTM WIND", "TSTM WIND (G35)",
"TSTM WIND (G40)", "TSTM WIND (G45)",
"TSTM WIND/HAIL",
"THUNDERSTORM WINS",
"THUNDERSTORM WINDS LIGHTNING",
"THUNDERTORM WINDS",
"THUNDERSTORMW",
"THUNDERSTORM WIND (G40)",
"THUNDERSTORM WINDS HAIL",
" TSTM WIND",
" TSTM WIND (G45)",
"SEVERE THUNDERSTORM",
"SEVERE THUNDERSTORM WINDS",
"SEVERE THUNDERSTORMS",
"THUDERSTORM WINDS",
"THUNDERESTORM WINDS",
"THUNDERSTORM DAMAGE TO",
"THUNDERSTORM HAIL",
"THUNDERSTORM WIND 60 MPH",
"THUNDERSTORM WIND 65 MPH",
"THUNDERSTORM WIND 65MPH",
"THUNDERSTORM WIND 98 MPH",
"THUNDERSTORM WIND G50",
"THUNDERSTORM WIND G55",
"THUNDERSTORM WIND G60",
"THUNDERSTORM WIND TREES",
"THUNDERSTORM WIND.",
"THUNDERSTORM WIND/ TREE",
"THUNDERSTORM WIND/ TREES",
"THUNDERSTORM WIND/AWNING",
"THUNDERSTORM WIND/HAIL",
"THUNDERSTORM WIND/LIGHTNING",
"THUNDERSTORM WINDS 63 MPH",
"THUNDERSTORM WINDS AND",
"THUNDERSTORM WINDS G60",
"THUNDERSTORM WINDS HAIL",
"THUNDERSTORM WINDS.",
"THUNDERSTORM WINDS/ FLOOD",
"THUNDERSTORM WINDS/FLOODING",
"THUNDERSTORM WINDS/FUNNEL CLOU",
"THUNDERSTORM WINDS53",
"THUNDERSTORM WINDSHAIL",
"THUNDERSTORMS",
"THUNDERSTORMS WIND",
"THUNDERSTORMWINDS",
"THUNDERSTROM WIND",
"THUNERSTORM WINDS",
"TSTM WIND (G45)",
"TSTM WIND (41)",
"TSTM WIND 40",
"TSTM WIND 45",
"TSTM WIND 55",
"TSTM WIND 65)",
"TSTM WIND AND LIGHTNING",
"TSTM WIND DAMAGE",
"TSTM WIND G45",
"TSTM WIND G58",
"TSTM WINDS",
"TSTMW",
"TUNDERSTORM WIND",
"THUNDEERSTORM WINDS")] <- "Thunderstorm"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("TROPICAL STORM",
"TROPICAL STORM GORDON",
"TROPICAL DEPRESSION",
"TROPICAL STORM ALBERTO",
"TROPICAL STORM DEAN",
"TROPICAL STORM JERRY")] <- "Tropical Storm"
econcost$EVTYPE.Grouping[econcost$EVTYPE == "TSUNAMI"] <- "Tsunami"
econcost$EVTYPE.Grouping[econcost$EVTYPE %in% c("WATERSPOUT",
"WATERSPOUT TORNADO",
"WATERSPOUT/TORNADO",
"WATERSPOUT-",
"WATERSPOUT-TORNADO",
"WATERSPOUT/ TORNADO")] <- "Waterspout"
econcost$EVTYPE.Grouping[econcost$EVTYPE == "ROGUE WAVE"] <- "Rogue Wave"
# Clean up the PROPDMGEXP field to obtain the proper multiplier for PROPDMG
econcost$PROPDMGEXP <- toupper(econcost$PROPDMGEXP)
## Note: interpreting digits in the PROPDMGEXP column as the number of zeros
## to append to the value in PROPDMG
### No or invalid multiplier specified
econcost$PROPDMGEXP[econcost$PROPDMGEXP %in% c("0", "", "+", "-")] <- 1
### Hundreds
econcost$PROPDMGEXP[econcost$PROPDMGEXP %in% c("H", "2")] <- 100
### Thousands
econcost$PROPDMGEXP[econcost$PROPDMGEXP %in% c("K", "3")] <- 1000
### Ten Thousands
econcost$PROPDMGEXP[econcost$PROPDMGEXP == "4"] <- 10000
### Hundred Thousands
econcost$PROPDMGEXP[econcost$PROPDMGEXP == "5"] <- 100000
### Millions
econcost$PROPDMGEXP[econcost$PROPDMGEXP %in% c("M", "6")] <- 1000000
### Ten Millions
econcost$PROPDMGEXP[econcost$PROPDMGEXP == "7"] <- 10000000
### Billions
econcost$PROPDMGEXP[econcost$PROPDMGEXP == "B"] <- 1000000000
# Add Column for Extended Property Damage based on PROPDMG and PROPDMGEXP column
# data
econcost$Extended.Property.Damage <- as.numeric(econcost$PROPDMG) *
as.numeric(econcost$PROPDMGEXP)
# Clean up the CROPDMGEXP field to obtain the proper multiplier for CROPDMG
econcost$CROPDMGEXP <- toupper(econcost$CROPDMGEXP)
## Note: interpreting digits in the CROPDMGEXP column as the number of zeros
## to append to the value in CROPDMG
### No or invalid multiplier specified
econcost$CROPDMGEXP[econcost$CROPDMGEXP %in% c("0", "", "?")] <- 1
### Thousands
econcost$CROPDMGEXP[econcost$CROPDMGEXP == "K"] <- 1000
### Millions
econcost$CROPDMGEXP[econcost$CROPDMGEXP == "M"] <- 1000000
### Billions
econcost$CROPDMGEXP[econcost$CROPDMGEXP == "B"] <- 1000000000
# Add Column for Extended Crop Damage based on CROPDMG and CROPDMGEXP column
# data
econcost$Extended.Crop.Damage <- as.numeric(econcost$CROPDMG) *
as.numeric(econcost$CROPDMGEXP)
# Add Column for Total.Economic.Damage
econcost$Total.Economic.Damage <- econcost$Extended.Property.Damage +
econcost$Extended.Crop.Damage
# Add Column for Total.Economic.Damage.in.Billions
econcost$Total.Economic.Damage.in.Billions <- econcost$Total.Economic.Damage / 1000000000
# Explore the EVTYPE data vs "most harmful with respect to economic damage"
## Keep only the events whose total economic damage is above the 99th percentile
higheconcost <- subset(econcost,
Total.Economic.Damage.in.Billions >
quantile(Total.Economic.Damage.in.Billions, 0.99))
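The exponent handling can also be expressed as a single lookup instead of a sequence of in-place replacements. The sketch below is one possible alternative, not the method used above: `exp.multiplier` and `to.multiplier` are hypothetical names, and the fallback of 1 for any unrecognized code is an assumption that goes slightly beyond the explicit cases handled in the analysis.

```r
# Hypothetical lookup from exponent code to multiplier, following the
# interpretation used above (a digit is the number of zeros to append)
exp.multiplier <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
                    "0" = 1,   "2" = 1e2, "3" = 1e3, "4" = 1e4,
                    "5" = 1e5, "6" = 1e6, "7" = 1e7)

# Hypothetical helper: blank or unrecognized codes fall back to 1 (assumption)
to.multiplier <- function(code) {
  m <- unname(exp.multiplier[toupper(code)])
  m[is.na(m)] <- 1
  m
}

damage <- c(25, 2.5, 10, 1, 3)
code   <- c("K", "m", "", "+", "B")
print(damage * to.multiplier(code))
```

Keeping the codes as character and computing a separate numeric multiplier avoids overwriting the original column, so the raw codes remain available for checking.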
The results of the analysis appear below and answer the two questions posed at the beginning:
# Build the plot data for High Human Cost
hhc.plot.data <- aggregate(highhumancost$Population.Harm.Index,
by = list(highhumancost$EVTYPE.Grouping),
FUN = sum)
names(hhc.plot.data) <- c("EVTYPE.Grouping", "Total.Population.Harm.Index")
# Build the plot for the High Human Cost
hhc.plot <- ggplot(hhc.plot.data,
aes(x = reorder(EVTYPE.Grouping, -Total.Population.Harm.Index),
y = Total.Population.Harm.Index,
ymax = max(Total.Population.Harm.Index) + 1000)) +
geom_bar(stat = "identity",
fill = "blue") +
geom_text(aes(label = round(Total.Population.Harm.Index)),
vjust = -0.5) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event Type Grouping") +
ylab("Total Population Harm Index") +
ggtitle("Weather Events Most Harmful to Population Health \nbetween 1950 and Nov. 2011\n") +
theme(axis.text=element_text(size=14, face="bold"),
axis.title=element_text(size=14, face="bold"),
plot.title = element_text(lineheight=1,
face="bold",
color="black",
size=18))
Table: Total Population Harm Index by Weather Event Grouping
hhc.plot.data
##      EVTYPE.Grouping Total.Population.Harm.Index
## 1 Excessive Rainfall                       28.76
## 2           Flooding                      882.68
## 3               Heat                     3085.76
## 4  Hurricane/Typhoon                      220.56
## 5            Tornado                     9286.43
## 6     Tropical Storm                       64.51
## 7            Tsunami                       66.46
## 8           Wildfire                       62.07
## 9     Winter Weather                      401.61
print(hhc.plot)
Tornadoes are the most harmful to population health based on the NOAA Storm Database dataset from 1950 through November 2011.
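The per-grouping totals come from `aggregate()` with `FUN = sum`. The sketch below reproduces that step on a hypothetical four-row data frame (values invented) and shows one behavior worth keeping in mind: rows whose grouping is `NA`, meaning an EVTYPE that was never mapped, are dropped from the result.

```r
# Hypothetical toy data; the NA grouping stands for an unmapped EVTYPE
toy <- data.frame(EVTYPE.Grouping = c("Tornado", "Heat", "Tornado", NA),
                  Population.Harm.Index = c(5, 2, 3, 10),
                  stringsAsFactors = FALSE)

# Sum the index within each grouping, as in the plot-data step
totals <- aggregate(toy$Population.Harm.Index,
                    by = list(toy$EVTYPE.Grouping),
                    FUN = sum)
names(totals) <- c("EVTYPE.Grouping", "Total.Population.Harm.Index")

# Largest total first
totals <- totals[order(-totals$Total.Population.Harm.Index), ]
print(totals)
```

A quick check such as `sum(is.na(toy$EVTYPE.Grouping))` before aggregating shows how many events would be left out of the totals.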
# Build the plot data for High Economic Cost
hec.plot.data <- aggregate(higheconcost$Total.Economic.Damage.in.Billions,
by = list(higheconcost$EVTYPE.Grouping),
FUN = sum)
names(hec.plot.data) <- c("EVTYPE.Grouping", "Total.Economic.Damage.in.Billions")
hec.plot <- ggplot(hec.plot.data,
aes(x = reorder(EVTYPE.Grouping, -Total.Economic.Damage.in.Billions),
y = Total.Economic.Damage.in.Billions,
ymax = max(Total.Economic.Damage.in.Billions) + 15)) +
geom_bar(stat = "identity",
fill = "red") +
geom_text(aes(label = round(Total.Economic.Damage.in.Billions)),
vjust = -0.5) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Event Type Grouping") +
ylab("Total Economic Damage in Billions USD") +
ggtitle("Weather Events with Greatest Economic \nConsequences between 1950 and Nov. 2011\n") +
theme(axis.text=element_text(size=14, face="bold"),
axis.title=element_text(size=14, face="bold"),
plot.title = element_text(lineheight=1,
face="bold",
color="black",
size=18))
Table: Total Economic Damage by Weather Event Grouping
hec.plot.data
##       EVTYPE.Grouping Total.Economic.Damage.in.Billions
## 1        Cold Weather                           2.21812
## 2             Drought                          14.79117
## 3  Excessive Rainfall                           3.77264
## 4            Flooding                         216.71963
## 5                Hail                          14.38426
## 6      Hazardous Surf                           0.04792
## 7                Heat                           0.89257
## 8          High Winds                           5.41130
## 9   Hurricane/Typhoon                          90.58742
## 10          Landslide                           0.24180
## 11          Lightning                           0.02900
## 12       Thunderstorm                           6.95576
## 13            Tornado                          45.73866
## 14     Tropical Storm                           8.08619
## 15            Tsunami                           0.12182
## 16         Waterspout                           0.05000
## 17           Wildfire                           8.14838
## 18     Winter Weather                          17.02068
print(hec.plot)
Flooding events lead to the greatest economic consequences based on the NOAA Storm Database dataset from 1950 through November 2011.