Synopsis

In accordance with “Storm data” (1950-2011) Tornado, thunderstorm wind, excessive heat, flood, lightnin, heat, flash flood, ice storm, high wind, and wildfire are in TOP10 wether ewents wich are most harmful with respect to population health across the United States.
Flood, hurricane (typhoon), tornado, storm surge/tide, hail, flash flood, drought, thunderstorm wind, ice storm, and wildfire are in TOP10 wether ewents wich have the greatest economic consequences.

Introduction

My data analysis addresses the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Data describing

“Storm Data” is an official publication of the National Oceanic and Atmospheric Administration (NOAA).

Data Processing

Loading packages we are going to work with

## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Downloading data if it’s needed

if( !file.exists("./data") ) {dir.create("./data")}
if ( !file.exists("./data/StormData.csv.bz2") ) 
        {
        ## downloading file into 'working_directory/data'
        library(downloader)
        url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
        download(url, "./data/StormData.csv.bz2")
        unlink(url)
        rm(url) ## removing spent variable
        }

Reading file into data frame

StormData <- read.csv("./data/StormData.csv.bz2", 
                     header = TRUE,
                     na.strings="", 
                     stringsAsFactors = FALSE) ## very importent
colnames(StormData) <- tolower(colnames(StormData)) ##!!
dim(StormData)
## [1] 902297     37

Data Subsetting and transformation

StormData1 <- StormData %>%
        select(evtype, bgn_date,  
               fatalities, injuries,
               propdmg,propdmgexp,
               cropdmg,cropdmgexp
               ) %>%
        mutate(evtype = tolower(evtype)) %>%
        arrange(desc(fatalities),desc(injuries))

unique(StormData1$propdmgexp)
##  [1] NA  "B" "M" "K" "0" "5" "-" "m" "H" "7" "+" "6" "?" "4" "2" "3" "h"
## [18] "1" "8"
unique(StormData1$cropdmgexp)
## [1] NA  "K" "M" "B" "0" "m" "?" "k" "2"

For the purposes of this analysis, I have assumed that
. characters {+,-,?, 0 2 3 4 5 6 7 8 9 } in columns ‘propdmgexp’ and ‘propdmgexp’ are errors, so I will replace these characters with NA; . “[kK]” “[mM]” “[bB]” mean “kilo”, “mega”or“millins”, “billions”, so I will replace these charachters with “103“,”106”, “10^9”; . “[hH]” mean maybe ‘hundreds’ so I will replace these with “10^2”.

Fixing ‘propdmgexp’ data

StormData1$propdmgexp <- gsub(pattern="[0-9]", replacement="NA", StormData1$propdmgexp)
StormData1$propdmgexp <- gsub(pattern="[h,H]", replacement="1E2", StormData1$propdmgexp)
StormData1$propdmgexp <- gsub(pattern="[k,K]", replacement="1E3", StormData1$propdmgexp)
StormData1$propdmgexp <- gsub(pattern="[m,M]", replacement="1E6", StormData1$propdmgexp)
StormData1$propdmgexp <- gsub(pattern="[b,B]", replacement="1E9", StormData1$propdmgexp)
StormData1$propdmgexp <- as.numeric(StormData1$propdmgexp)
## Warning: NAs introduced by coercion
StormData1$propdmgexp[is.na(StormData1$propdmgexp)] <- 0

Fixing ‘cropdmgexp’ data

StormData1$cropdmgexp <- gsub(pattern="[0-9]", replacement="NA", StormData1$cropdmgexp)
StormData1$cropdmgexp <- gsub(pattern="[h,H]", replacement="1E2", StormData1$cropdmgexp)
StormData1$cropdmgexp <- gsub(pattern="[k,K]", replacement="1E3", StormData1$cropdmgexp)
StormData1$cropdmgexp <- gsub(pattern="[m,M]", replacement="1E6", StormData1$cropdmgexp)
StormData1$cropdmgexp <- gsub(pattern="[b,B]", replacement="1E9", StormData1$cropdmgexp)
StormData1$cropdmgexp <- as.numeric(StormData1$cropdmgexp)
## Warning: NAs introduced by coercion
StormData1$cropdmgexp[is.na(StormData1$cropdmgexp)] <- 0
StormData1 <- StormData1 %>%
        mutate(propdmg = propdmg*propdmgexp,
               cropdmg = cropdmg*cropdmgexp, 
               dmg = propdmg+cropdmg,
               victims = fatalities+injuries) %>%
        select(-c(propdmgexp,cropdmgexp)) %>%
        filter(!(is.na(fatalities) & is.na(injuries) & 
                        is.na(propdmg) & is.na(cropdmg)))

StormData1$bgn_date <- strptime(StormData1$bgn_date, format = "%m/%d/%Y %H:%M:%S")
StormData1 <- mutate(StormData1,evyear = year(bgn_date)) %>%
        select(-bgn_date)

In “Storm Data” are only following 48 types of events: “astronomical low tide”, “avalanche”, “blizzard”, “coastal flood”,“cold/wind chill”,“debris flow”, “dense fog”, “dense smoke”,“drought”,“dust devil”, “dust storm”, “excessive heat”, “extreme cold/wind chill”, “flash flood”, “flood”, “frost/freeze”,“funnel cloud”, “freezing fog”, “hail”,“heat”,“heavy rain”,“heavy snow”, “high surf”,“high wind”, “hurricane (typhoon)”, “ice storm”, “lake-effect snow”, “lakeshore flood”, “lightning”, “marine hail”, “marine high wind”, “marine strong wind”, “marine thunderstorm wind”, “rip current”, “seiche”, “sleet”,“storm surge/tide”, “strong wind”, “thunderstorm wind”, “tornado”, “tropical depression”,“tropical storm”,“tsunami”, “volcanic ash”,“waterspout”, “wildfire”, “winter storm”,“winter weather”.

StormData1 has now some more types of events than in “Storm Data”:

length(unique(StormData1$evtype))
## [1] 898

Basic fixing and agregation values in ‘evtype’ column

StormData1$evtype = gsub(pattern="[?)(1-9]", replacement="", StormData1$evtype)
StormData1$evtype[grepl("astronomical low tide", StormData1$evtype)] <- "astronomical low tide"
StormData1$evtype[grepl("(avalanch?e)|(landslide)", StormData1$evtype)] <- "avalanche"
StormData1$evtype[grepl("blizzards?", StormData1$evtype)] <- "blizzard"
# StormData1$evtype[grepl("astronomical high tide", 
#                         StormData1$evtype)] <- "coastal flood"
StormData1$evtype[grepl("coastal flood", 
                        StormData1$evtype)] <- "coastal flood"
# ?"cold/wind chill"
StormData1$evtype[grepl("debris flow", StormData1$evtype)] <- "debris flow"
StormData1$evtype[grepl("(dense )?fog", StormData1$evtype)] <- "dense fog"
StormData1$evtype[grepl("dense smoke", StormData1$evtype)] <- "dense smoke"
StormData1$evtype[grepl("drought", StormData1$evtype)] <- "drought"
StormData1$evtype[grepl("dust devil", StormData1$evtype)] <- "dust devil"
StormData1$evtype[grepl("dust storm", StormData1$evtype)] <- "dust storm"
StormData1$evtype[grepl("((excessive|extreme) heat)|(hyperthermia/exposure)", 
                        StormData1$evtype)] <- "excessive heat"
StormData1$evtype[grepl("(extreme cold)|(winds?.chill)|(extreme windchill)", 
                        StormData1$evtype)] <- "extreme cold/wind chill"

StormData1$evtype[grepl("(flash.floods?)|(floods?.flash)", 
                        StormData1$evtype)] <- "flash flood"
StormData1$evtype[grepl("flooding|(river(.+)flood)|(flood/rain/winds)", 
                        StormData1$evtype)] <- "flood"
StormData1$evtype[grepl("urban(.+)floods?", StormData1$evtype)] <- "flood"
StormData1$evtype[grepl("(agricultural freeze)|(frost/freeze)"
                        , StormData1$evtype)] <- "frost/freeze"
StormData1$evtype[grepl("funnel cloud", StormData1$evtype)] <- "funnel cloud"
StormData1$evtype[grepl("freezing fog", StormData1$evtype)] <- "freezing fog"

# ?"hail" contained in "marine hail"
# ?"heat" contained in "excessive heat"
StormData1$evtype[grepl("(heat wave)|(unseasonably warm)", 
                        StormData1$evtype)] <- "heat"
StormData1$evtype[grepl("heavy rains?", StormData1$evtype)] <- "heavy rain"
StormData1$evtype[grepl("heavy snow", StormData1$evtype)] <- "heavy snow"
StormData1$evtype[grepl("high surf", StormData1$evtype)] <- "high surf"
StormData1$evtype[grepl("high winds?", StormData1$evtype)] <- "high wind"
StormData1$evtype[grepl("hurricanes?|typhoons?", StormData1$evtype)] <- "hurricane (typhoon)"

StormData1$evtype[grepl("ice storm", StormData1$evtype)] <- "ice storm"
StormData1$evtype[grepl("lake.effect snow", StormData1$evtype)] <- "lake-effect snow"
StormData1$evtype[grepl("lakeshore flood", StormData1$evtype)] <- "lakeshore flood"
StormData1$evtype[grepl("heavy rains?", StormData1$evtype)] <- "heavy rain"
StormData1$evtype[grepl("lightn?ing", StormData1$evtype)] <- "lightning"

StormData1$evtype[grepl("marine hail", StormData1$evtype)] <- "marine hail"
StormData1$evtype[grepl("marine high winds?", StormData1$evtype)] <- "marine high wind"
StormData1$evtype[grepl("marine strong winds?", StormData1$evtype)] <- "marine strong wind"
StormData1$evtype[grepl("marine thunderstorms? winds?", StormData1$evtype)] <- "marine thunderstorm wind"

StormData1$evtype[grepl("rip currents?", StormData1$evtype)] <- "rip currents"
StormData1$evtype[grepl("seiche", StormData1$evtype)] <- "seiche"
StormData1$evtype[grepl("sleet", StormData1$evtype)] <- "sleet"
StormData1$evtype[grepl("storm surge", StormData1$evtype)] <- "storm surge/tide"
StormData1$evtype[grepl("strong winds", StormData1$evtype)] <- "strong wind"

StormData1$evtype[grepl("tstm winds?", StormData1$evtype)] <- "thunderstorm wind"
StormData1$evtype[grepl("severe thunderstorms?", StormData1$evtype)] <- "thunderstorm wind"
StormData1$evtype[grepl("thunderstorms winds", 
                        StormData1$evtype)] <- "thunderstorm wind"
StormData1$evtype[grepl("tornado", StormData1$evtype)] <- "tornado"
StormData1$evtype[grepl("tropical depression", StormData1$evtype)] <- "tropical depression"
StormData1$evtype[grepl("tropical storms?", StormData1$evtype)] <- "tropical storm"
StormData1$evtype[grepl("tsunami", StormData1$evtype)] <- "tsunami"

StormData1$evtype[grepl("volcanic ash", StormData1$evtype)] <- "volcanic ash"
StormData1$evtype[grepl("waterspout?", StormData1$evtype)] <- "waterspout"
StormData1$evtype[grepl("(wild)?fires?", StormData1$evtype)] <- "wildfire"
StormData1$evtype[grepl("winter storms?", StormData1$evtype)] <- "winter storm"
StormData1$evtype[grepl("winter weather|wintry", StormData1$evtype)] <- "winter weather"

After fixing StormData1 has foolowing number of event types:

length(unique(StormData1$evtype))
## [1] 487

Calculation total losses in USA vs time

losses_by_year <- StormData1 %>%
        group_by(evyear) %>%
        summarize(total_fatalities = sum(fatalities, na.rm = TRUE),
                  total_injuries = sum(injuries, na.rm = TRUE),
                  total_victims = sum(victims, na.rm = TRUE),
                  total_propdmg = sum(propdmg, na.rm = TRUE),
                  total_cropdmg = sum(cropdmg, na.rm = TRUE),
                  total_dmg = sum(dmg, na.rm = TRUE)
        )
par(mfrow = c(2,2), mar=c(4, 4, 1, 1))
with (losses_by_year,
{
        plot(total_fatalities ~ evyear,
             type = "b", lty = "dashed",
             xlab = "Year", ylab = "Fatalities")
        plot(total_injuries ~ evyear,
             type = "b", lty = "dashed",
             xlab = "Year", ylab = "Injuries")
        plot(total_propdmg ~ evyear,log = "y",
             type = "b", lty = "dashed",
             xlab = "Year", ylab = "Property damage (in $USA)")
        plot(total_cropdmg ~ evyear,log = "y", xlim = c(1993,2011),
             type = "b", lty = "dashed",
             xlab = "Year", ylab = "Crop damage (in $USA)")
}
)
## Warning in xy.coords(x, y, xlabel, ylabel, log): 43 y values <= 0 omitted
## from logarithmic plot

Fig.1. Victims and damages in USA (1950-2011) caused by the weather events (detailed)

Estimates of crop damage exist in “Storm Data” only for time period from 1993 to 2011 year.

model1 <- lm(total_victims ~ evyear, losses_by_year)
par(mfrow = c(1,2), mar=c(4, 4, 1, 1))
with (losses_by_year,
{
        plot(x=evyear, y=total_victims,
             xlab = "Year", ylab = "Victims")
        abline(model1, lwd = 2, col = "red")
        plot(x=evyear, y=total_dmg, log="y", 
             xlab = "Year", ylab = "Damage (in $USA)")
}  
)

Fig.2. Victims and damages in USA (1950-2011) caused by the weather events (in general)

Red line on Fig.1 mean linear aproximation. It seems total damages have exponential depends from time.

Calculation data frame with total fatalities, injuries, and damage in all time period from 4/18/1950 to 11/29/2011 for each type of wether events

losses_by_evtype <- StormData1 %>%
        group_by(evtype)%>% #, state
        summarize(total_fatalities = sum(fatalities, na.rm = TRUE),
                  total_injuries = sum(injuries, na.rm = TRUE),
                  total_victims = sum(victims, na.rm = TRUE),
                  total_propdmg = sum(propdmg, na.rm = TRUE),
                  total_cropdmg = sum(cropdmg, na.rm = TRUE),
                  total_dmg = sum(dmg, na.rm = TRUE)
        ) %>%
        arrange(desc(total_victims), desc(total_dmg))

Results

1. Across the United States, which types of events are most harmful with respect to population health?

TopFatEv <- arrange(losses_by_evtype, desc(total_fatalities))[1:10,1]
print(TopFatEv)
## Source: local data frame [10 x 1]
## 
##                     evtype
## 1                  tornado
## 2           excessive heat
## 3                     heat
## 4              flash flood
## 5                lightning
## 6        thunderstorm wind
## 7             rip currents
## 8                    flood
## 9  extreme cold/wind chill
## 10               high wind
TopInjEv <- arrange(losses_by_evtype, desc(total_injuries))[1:10,1]
print(TopInjEv)
## Source: local data frame [10 x 1]
## 
##               evtype
## 1            tornado
## 2  thunderstorm wind
## 3              flood
## 4     excessive heat
## 5          lightning
## 6               heat
## 7          ice storm
## 8        flash flood
## 9           wildfire
## 10         high wind

Following 10 types of wether events are most harmful with respect to population health across the United States (1950-2011):

TopVictEv <- arrange(losses_by_evtype, desc(total_victims))[1:10,1]
print(TopVictEv)
## Source: local data frame [10 x 1]
## 
##               evtype
## 1            tornado
## 2  thunderstorm wind
## 3     excessive heat
## 4              flood
## 5          lightning
## 6               heat
## 7        flash flood
## 8          ice storm
## 9          high wind
## 10          wildfire

2. Across the United States, which types of events have the greatest economic consequences?

Following 10 types of events have the greatest economic consequences across the United States (1950-2011):

TopDmgEv <- arrange(losses_by_evtype, desc(total_dmg))[1:10,1]
print(TopDmgEv)
## Source: local data frame [10 x 1]
## 
##                 evtype
## 1                flood
## 2  hurricane (typhoon)
## 3              tornado
## 4     storm surge/tide
## 5                 hail
## 6          flash flood
## 7              drought
## 8    thunderstorm wind
## 9            ice storm
## 10            wildfire
TopEv <- data.frame(rank = c(1:10), TopVictEv, TopDmgEv)
colnames(TopEv) = c("rank", "evtypeVict", "evtypeDmg")
print(TopEv)
##    rank        evtypeVict           evtypeDmg
## 1     1           tornado               flood
## 2     2 thunderstorm wind hurricane (typhoon)
## 3     3    excessive heat             tornado
## 4     4             flood    storm surge/tide
## 5     5         lightning                hail
## 6     6              heat         flash flood
## 7     7       flash flood             drought
## 8     8         ice storm   thunderstorm wind
## 9     9         high wind           ice storm
## 10   10          wildfire            wildfire

Below is a list with most dangerous for peple and economics types of wether event:

TopDangerEv <- unique(rbind(TopVictEv,TopDmgEv))
print(TopDangerEv)
## Source: local data frame [14 x 1]
## 
##                 evtype
## 1              tornado
## 2    thunderstorm wind
## 3       excessive heat
## 4                flood
## 5            lightning
## 6                 heat
## 7          flash flood
## 8            ice storm
## 9            high wind
## 10            wildfire
## 11 hurricane (typhoon)
## 12    storm surge/tide
## 13                hail
## 14             drought