Based on the NOAA Storm Database, Tropical Depression, Strong Wind, Excessive Heat, Flood and Hail are the most harmful event types with respect to population health (fatalities and injuries). Tropical Depression, Strong Wind, Hail, Flash Flood, Flood and High Wind have the greatest economic consequences (property and crop damages).
Data is downloaded directly from the URL, decompressed and loaded into R. No preprocessing is done outside this document. The following processing steps were carried out:
- Removing some summary observations
- Grouping event types to match the 48 official categories, using approximate string matching with the Jaro-Winkler distance
- Computing the health impact, which is the sum of fatalities and injuries
- Computing the economic impact, which is the sum of property and crop damages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(stringdist)
## downloading and reading data
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
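## download.file returns a status code, so its result is not stored; mode = "wb" keeps the archive binary
download.file(fileurl, destfile="repdata%2Fdata%2FStormData.csv.bz2", mode = "wb")
## read.csv decompresses the bz2 connection on the fly
data <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"))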
## Removing summary observations
data <- subset(x=data, subset=!grepl("^Summary",EVTYPE, ignore.case = TRUE))
## Grouping EVTYPE to match the 48 official event types
evtype <- c("Astronomical Low Tide", "Avalanche","Blizzard","Coastal Flood", "Cold/Wind Chill","Debris Flow","Dense Fog","Dense Smoke","Drought","Dust Devil", "Dust Storm","Excessive Heat","Extreme Cold/Wind Chill","Flash Flood","Flood", "Freezing Fog", "Frost/Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf","High Wind","Hurricane/Typhoon","Ice Storm", "Lakeshore Flood","Lake-Effect Snow","Lightning","Marine Hail","Marine High Wind", "Marine Strong Wind","Marine Thunderstorm Wind","Rip Current","Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
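## using approximate matching (amatch) with Jaro-Winkler (JW) distance to group EVTYPE
## JW distances lie in [0, 1], so maxDist = 5 rejects no candidate as too distant:
## every EVTYPE is assigned to its nearest category
index <- amatch(data$EVTYPE, evtype, nomatch = 0, matchNA = TRUE,
method = "jw", maxDist = 5)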
newtype <- evtype[index]
data <- cbind(data, as.data.frame(newtype))
## health impact : assuming that health impact is fatalities and injuries
data <- mutate(data, healthimpact=FATALITIES + INJURIES)
## economic impact : assuming that economic impact is the sum of property and crop damages
data <- mutate(data, ecoimpact=PROPDMG + CROPDMG)
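The matching call above compares strings exactly as stored, and `amatch` is case-sensitive. The following is a minimal sketch of how upper-case labels behave with and without lowercasing both sides. The toy vector, the three-entry table and the 0.1 threshold are made up for illustration and are not values used in this analysis.

```r
library(stringdist)

# Toy inputs, for illustration only (not taken from the dataset)
raw_types  <- c("TORNADO", "HAIL")
categories <- c("Tornado", "Hail", "Flood")

# Case-sensitive comparison: only the first letter agrees, so no category
# falls within the (assumed) 0.1 threshold and amatch returns NA
raw_idx <- amatch(raw_types, categories, method = "jw", maxDist = 0.1)

# Lowercasing both sides turns these labels into exact matches (distance 0)
norm_idx <- amatch(tolower(raw_types), tolower(categories),
                   method = "jw", maxDist = 0.1)

categories[norm_idx]
```

Whether normalising case is appropriate for the full dataset is a judgment call; the sketch only shows the mechanism.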
As the figure and table below show, Tropical Depression is the most harmful event type with respect to population health, measured as the sum of fatalities and injuries. Strong Wind, Excessive Heat, Flood and Hail complete the top five.
## Grouping by event type
byevt <- group_by(data, newtype)
## Largest health impact
health <- summarize(byevt, sum(healthimpact))
## `summarise()` ungrouping output (override with `.groups` argument)
colnames(health) <- c("Type", "Health.impact")
with (health, plot(Type, Health.impact, main="Health impact by type of event", ylab = "Health impact", xlab = "Type of event" ))
maxhealth <- arrange(health, desc(Health.impact))
head(maxhealth, 5)
## # A tibble: 5 x 2
## Type Health.impact
## <fct> <dbl>
## 1 Tropical Depression 96980
## 2 Strong Wind 9649
## 3 Excessive Heat 9028
## 4 Flood 8065
## 5 Hail 7428
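Calling `plot` with a factor on the x axis draws one box per category, which can be hard to read with 48 categories. One possible alternative is a bar chart restricted to the top five, sketched here in base R. `health_toy` and `top5` are hypothetical names, and the values are invented for illustration rather than taken from the results above.

```r
# Toy stand-in for the summarised health table (values are illustrative only)
health_toy <- data.frame(
  Type = c("Flood", "Hail", "Heat", "Tornado", "Lightning", "Avalanche"),
  Health.impact = c(80, 70, 30, 95, 50, 10)
)

# Sort by impact, largest first, and keep the five largest rows
ord  <- order(health_toy$Health.impact, decreasing = TRUE)
top5 <- head(health_toy[ord, ], 5)

# One bar per event type; las = 2 turns the labels so they do not overlap
barplot(top5$Health.impact, names.arg = as.character(top5$Type), las = 2,
        main = "Health impact by type of event (top 5)",
        ylab = "Health impact")
```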
The figure and table below show that Tropical Depression, Strong Wind, Hail, Flash Flood, Flood and High Wind have the greatest economic impact (crop and property damages).
eco <- summarize(byevt, sum(ecoimpact))
## `summarise()` ungrouping output (override with `.groups` argument)
colnames(eco) <- c("Type", "Economic.impact")
with (eco, plot(Type, Economic.impact, main="Economic impact by type of event", ylab = "Economic impact", xlab = "Type of event" ))
maxeco <- arrange(eco, desc(Economic.impact))
head(maxeco)
## # A tibble: 6 x 2
## Type Economic.impact
## <fct> <dbl>
## 1 Tropical Depression 3312553.
## 2 Strong Wind 2463691.
## 3 Hail 1875227.
## 4 Flash Flood 1684209.
## 5 Flood 1088256.
## 6 High Wind 814813.
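The economic totals above add `PROPDMG` and `CROPDMG` as stored. The sketch below assumes the dataset also carries magnitude columns named `PROPDMGEXP` and `CROPDMGEXP` with codes such as K, M and B; that is an assumption, not something established in this document. Under it, the amounts could be converted to dollars before summing, as shown here. `to_dollars` and `exp_multiplier` are hypothetical helpers, and the toy rows are invented.

```r
# Hypothetical lookup: magnitude code -> multiplier (assumed K/M/B convention)
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale a damage value by its magnitude code
to_dollars <- function(value, code) {
  mult <- exp_multiplier[toupper(as.character(code))]
  mult[is.na(mult)] <- 1   # blank or unknown codes: leave the value unscaled
  unname(value * mult)
}

# Toy rows, for illustration only
toy <- data.frame(PROPDMG = c(25, 2.5, 10), PROPDMGEXP = c("K", "M", ""),
                  CROPDMG = c(0, 1, 5),     CROPDMGEXP = c("", "k", "B"))

toy$ecoimpact <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP) +
                 to_dollars(toy$CROPDMG, toy$CROPDMGEXP)
```

Treating unknown codes as a multiplier of 1 is one of several defensible choices; dropping such rows would be another.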