Synopsis

In this analysis I identify the most harmful types of weather events, based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (see the link in the script).

I would like to answer two questions:

Across the United States, which types of events are most harmful with respect to population health? Per event, the most harmful types in terms of both fatalities and injuries are heat (excessive heat and heat events) and tsunamis. Taking the frequency of the different events into account, however, the most harmful type is the tornado.

Across the United States, which types of events have the greatest economic consequences? In terms of economic damage, the most harmful types are thunderstorm wind and tornado.

Data Processing

I’ve removed event types with fewer than 20 occurrences to avoid the problems of small sample sizes. The remaining event types are also regrouped to eliminate typos and variant spellings (it’s a long list so I’ve hidden it, please check the script).

knitr::opts_chunk$set(echo = TRUE)
library(R.utils)
library(dplyr)
library(stringr)
setwd("c:/R/Coursera - R Programming/RepData_PeerAssessment2")
## create the download folder only if it does not exist yet
if (!dir.exists("zip")) dir.create("zip")
url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url,"zip/repdata_data_StormData.csv.bz2","curl")
bunzip2("zip/repdata_data_StormData.csv.bz2", "repdata_data_StormData.csv", 
        remove = FALSE, skip = TRUE)
## [1] "repdata_data_StormData.csv"
## attr(,"temporary")
## [1] FALSE
df<-read.csv("repdata_data_StormData.csv")

## there are many typos and event types with very few records, hence I remove
## event types with fewer than 20 occurrences
df <- df %>% 
  group_by(EVTYPE) %>% 
  mutate(freq = n()) %>% 
  ungroup() %>% 
  filter(freq > 19) %>%
  select(-freq)
## some typos and variant spellings remain, so I map the raw labels to cleaned categories
EventDict<-data.frame("EVTYPE"=c("ASTRONOMICAL HIGH TIDE","ASTRONOMICAL LOW TIDE",
  "AVALANCHE","BLIZZARD","COASTAL FLOOD","Coastal Flooding","COASTAL FLOODING",
  "COLD","COLD/WIND CHILL","DENSE FOG","DROUGHT","DRY MICROBURST","DUST DEVIL",
  "DUST STORM","EXCESSIVE HEAT","EXCESSIVE SNOW","EXTREME COLD","EXTREME COLD/WIND CHILL",
  "EXTREME HEAT","EXTREME WINDCHILL","FLASH FLOOD","FLASH FLOOD/FLOOD","FLASH FLOODING",
  "FLASH FLOODS","FLOOD","FLOOD/FLASH FLOOD","FLOODING","FOG","FREEZE","FREEZING DRIZZLE",
  "FREEZING FOG","FREEZING RAIN","FROST","FROST/FREEZE","FUNNEL","FUNNEL CLOUD",
  "FUNNEL CLOUDS","GLAZE","GUSTY WIND","GUSTY WINDS","HAIL","HAIL 75","HEAT","HEAT WAVE",
  "HEAVY LAKE SNOW","HEAVY RAIN","HEAVY RAINS","HEAVY SNOW","HEAVY SNOW SQUALLS",
  "HEAVY SURF","HEAVY SURF/HIGH SURF","HIGH SURF","HIGH WIND","HIGH WINDS","HURRICANE",
  "HURRICANE/TYPHOON","ICE","ICE STORM","ICY ROADS","LAKE EFFECT SNOW","LAKE-EFFECT SNOW",
  "LAKESHORE FLOOD","LANDSLIDE","LIGHT FREEZING RAIN","Light Snow","LIGHT SNOW",
  "LIGHTNING","MARINE HAIL","MARINE HIGH WIND", "MARINE STRONG WIND", "MARINE THUNDERSTORM WIND",
  "MARINE TSTM WIND", "MIXED PRECIPITATION","MODERATE SNOWFALL","MONTHLY PRECIPITATION",
  "OTHER","RECORD COLD","RECORD HEAT","RECORD WARMTH","RIP CURRENT","RIP CURRENTS",
  "RIVER FLOOD","RIVER FLOODING", "SEICHE", "SEVERE THUNDERSTORMS", "SLEET",
  "SMALL HAIL", "SNOW", "Snow", "SNOW AND ICE", "STORM SURGE","STORM SURGE/TIDE",
  "STRONG WIND","STRONG WINDS", "Temperature record", "THUNDERSTORM", "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS", "THUNDERSTORM WINDS HAIL","THUNDERSTORM WINDS/HAIL","THUNDERSTORM WINDSS",
  "TIDAL FLOODING", "TORNADO","TROPICAL DEPRESSION","TROPICAL STORM", "TSTM WIND",
  "TSTM WIND (G45)","TSTM WIND/HAIL", "TSUNAMI","UNSEASONABLY COLD","UNSEASONABLY DRY",
  "UNSEASONABLY WARM","URBAN FLOOD","URBAN FLOODING", "URBAN/SMALL STREAM FLOOD",
  "URBAN/SML STREAM FLD", "VOLCANIC ASH", "WATERSPOUT", "WATERSPOUTS","WILD/FOREST FIRE",
  "WILDFIRE", "WIND", "WIND DAMAGE","WINDS","WINTER STORM", "WINTER WEATHER", "WINTER WEATHER/MIX",
  "WINTRY MIX"),
"EVTYPE2"=c("High Surf","Astronomical Low Tide","Avalanche","Blizzard", "Coastal Flood",
  "Coastal Flood","Coastal Flood","Cold/Wind Chill","Cold/Wind Chill","Dense Fog","Drought",
  "Downbursts", "Dust Devil", "Dust Storm", "Excessive Heat", "Heavy Snow", "Extreme Cold/Wind Chill",
  "Extreme Cold/Wind Chill","Excessive Heat", "Extreme Cold/Wind Chill","Flash Flood",
  "Flash Flood","Flash Flood","Flash Flood","Flood","Flood","Flood","Dense Fog",
  "Frost/Freeze","Winter Weather","Freezing Fog", "Winter Weather", "Frost/Freeze",
  "Frost/Freeze", "Funnel Cloud", "Funnel Cloud", "Funnel Cloud", "Freezing Fog",
  "Strong Wind","Strong Wind","Hail", "Hail", "Heat", "Heat", "Heavy Snow", "Heavy Rain",
  "Heavy Rain", "Heavy Snow", "Heavy Snow", "High Surf","High Surf","High Surf",
  "High Wind","High Wind","Hurricane/Typhoon","Hurricane/Typhoon","Ice Storm",
  "Ice Storm","Winter Weather", "Lake-Effect Snow", "Lake-Effect Snow", "Lakeshore Flood",
  "Debris Flow","Winter Weather", "Winter Storm", "Winter Storm",  "Lightning", "Marine Hail",
  "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Marine Thunderstorm Wind",
  "Other","Winter Weather", "Other","Other","Cold/Wind Chill","Excessive Heat", "Excessive Heat",
  "Rip Current","Rip Current","Flood","Flood","Seiche", "Thunderstorm Wind","Sleet",
  "Hail", "Heavy Snow", "Heavy Snow", "Heavy Snow", "Storm Tide", "Storm Tide",
  "Strong Wind","Strong Wind","Excessive Heat", "Thunderstorm Wind","Thunderstorm Wind",
  "Thunderstorm Wind","Thunderstorm Wind","Thunderstorm Wind","Thunderstorm Wind",
  "Flood","Tornado","Tropical Depression","Tropical Storm", "Thunderstorm Wind",
  "Thunderstorm Wind","Thunderstorm Wind","Tsunami","Cold/Wind Chill","Downbursts",
  "Excessive Heat", "Flood","Flood","Flood","Flood","Volcanic Ash", "Waterspout",
  "Waterspout", "Wildfire", "Wildfire", "High Wind","High Wind","High Wind","Winter Storm",
  "Winter Weather", "Winter Weather", "Winter Weather"))
df<-merge(df, EventDict, all.x = TRUE)
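
Because merge() with all.x = TRUE keeps every row of df, any event type that is missing from EventDict ends up with NA in EVTYPE2. The following is a minimal sketch of how such unmapped labels could be listed. It uses made-up toy data: toy, toy_dict and the label "SLEET STORM" are illustrative assumptions, not values taken from the storm file.

```r
# Toy data: hypothetical raw labels standing in for df$EVTYPE
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "TORNADO", "HAIL 75", "SLEET STORM"),
  stringsAsFactors = FALSE
)

# Toy dictionary: deliberately has no entry for "SLEET STORM"
toy_dict <- data.frame(
  EVTYPE  = c("TSTM WIND", "THUNDERSTORM WIND", "TORNADO", "HAIL 75"),
  EVTYPE2 = c("Thunderstorm Wind", "Thunderstorm Wind", "Tornado", "Hail"),
  stringsAsFactors = FALSE
)

# Left join on the shared EVTYPE column; unmatched rows get NA in EVTYPE2
merged <- merge(toy, toy_dict, all.x = TRUE)

# Raw labels that were not mapped to any cleaned category
unmapped <- unique(merged$EVTYPE[is.na(merged$EVTYPE2)])
print(unmapped)
```

Running the same check on the real df after the merge would show whether any event type with 20 or more records still lacks a cleaned category.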

Results

Across the United States, which types of events are most harmful with respect to population health?

The first barplot shows the mean number of injuries and fatalities per event, grouped by event type. (E.g. 1 injury and 0.5 fatalities mean that, on average, one person is injured and 0.5 people die per event of that type.) The second barplot shows totals instead of means, i.e. the total casualties attributed to each event type.

harmfulness<-df %>%
  group_by(EVTYPE2) %>%
  summarise_at(vars(FATALITIES, INJURIES),
               list(name = mean)) 
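## build a numeric matrix for barplot(): measures in rows, event types in columns
## (transposing the whole table, text column included, would give a character matrix)
event_names <- as.character(harmfulness$EVTYPE2)
harmfulness <- t(as.matrix(harmfulness[, -1]))
colnames(harmfulness) <- event_names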

harmfulnesssum<-df %>%
  group_by(EVTYPE2) %>%
  summarise_at(vars(FATALITIES, INJURIES),
               list(name = sum)) 
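## same reshaping for the totals: numeric matrix with event types in columns
event_names_sum <- as.character(harmfulnesssum$EVTYPE2)
harmfulnesssum <- t(as.matrix(harmfulnesssum[, -1]))
colnames(harmfulnesssum) <- event_names_sum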

par(mfrow=c(2,1))
barplot(harmfulness, 
        main="Mean of injuries and fatalities per event",
        cex.main=0.8,
        cex.axis=0.5,
        col=c("red","blue"),
        names.arg=colnames(harmfulness),
        las=2, cex.names=0.5)
legend("topleft", 
  legend = c("Injuries","Fatalities"), 
  col = c("blue","red"), 
  pch = c(19,19), 
  bty = "n", 
  pt.cex = 0.5, 
  cex = 0.5, 
  text.col = "black", 
  horiz = FALSE,
  inset = c(0.1, 0.1))
barplot(harmfulnesssum,
        main="Total sum of injuries and fatalities in 1950-2011",
        cex.main=0.8,
        cex.axis=0.5,
        col=c("red","blue"),
        names.arg=colnames(harmfulnesssum),
        las=2, cex.names=0.5)

Per event, the most harmful types in terms of both fatalities and injuries are heat (excessive heat and heat events) and tsunamis. Taking the frequency of the different events into account, however, the most harmful type is the tornado.
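
The difference between the two rankings comes from averaging versus summing. A small sketch with invented numbers (the toy data frame below is an assumption for illustration only, not an extract of the storm data) shows how a rare but severe event type can lead on the per-event mean while a frequent one leads on the total:

```r
# Toy data: four tornado records and one tsunami record (made-up counts)
toy <- data.frame(
  EVTYPE2    = c("Tornado", "Tornado", "Tornado", "Tornado", "Tsunami"),
  FATALITIES = c(1, 0, 2, 1, 3),
  stringsAsFactors = FALSE
)

# Mean fatalities per recorded event, by event type
per_event <- tapply(toy$FATALITIES, toy$EVTYPE2, mean)

# Total fatalities over all records, by event type
total <- tapply(toy$FATALITIES, toy$EVTYPE2, sum)

# Event type that leads each ranking
top_per_event <- names(which.max(per_event))
top_total     <- names(which.max(total))
print(c(per_event = top_per_event, total = top_total))
```

In this toy case the single tsunami record has the highest mean, while the four tornado records add up to the highest total.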

Across the United States, which types of events have the greatest economic consequences?

The dataset contains two kinds of economic damage: property damage and crop damage. The chart below shows their sum per event type.

EcoCons<-df %>%
  group_by(EVTYPE2) %>%
  mutate(EcoCons = PROPDMG + CROPDMG) %>%
  summarise_at(vars(EcoCons),
               list(name = sum)) 

barplot(EcoCons$name,
        main="Total sum of damages per event type in 1950-2011",
        cex.main=0.8,
        cex.axis=0.5,
        names.arg=EcoCons$EVTYPE2,
        las=2, cex.names=0.5)

In terms of economic damage, the most harmful types are thunderstorm wind and tornado.
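
The totals above add PROPDMG and CROPDMG as they are stored. In the storm database these figures come with magnitude codes in the PROPDMGEXP and CROPDMGEXP columns, so a dollar-scaled variant of the sum could be sketched as below. This is only a sketch under assumptions: exp_to_multiplier is a hypothetical helper, the codes K, M and B are read as thousand, million and billion, and every other code (including an empty one) is treated as a multiplier of 1. The toy rows are invented.

```r
# Hypothetical helper: turn a magnitude code into a numeric multiplier.
# Assumption: K/M/B (any case) mean 1e3/1e6/1e9; anything else counts as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows with made-up damage figures
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c("", "K", "k"),
  stringsAsFactors = FALSE
)

# Property plus crop damage, each scaled by its own magnitude code
toy$damage_usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
print(toy$damage_usd)
```

Applied to df, the same scaling could replace PROPDMG + CROPDMG inside the mutate() step before summing per EVTYPE2.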