Analysis of Weather Events and the Impact on Population Heath and Economic Consequences

Data Proccessing

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation National Climatic Data Center Storm Events FAQ The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 3.4.4

## -- Attaching packages ------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --

## v ggplot2 2.2.1     v purrr   0.2.4
## v tibble  1.4.2     v dplyr   0.7.4
## v tidyr   0.8.0     v stringr 1.3.0
## v readr   1.1.1     v forcats 0.3.0

## Warning: package 'ggplot2' was built under R version 3.4.4

## Warning: package 'tibble' was built under R version 3.4.4

## Warning: package 'tidyr' was built under R version 3.4.4

## Warning: package 'readr' was built under R version 3.4.4

## Warning: package 'purrr' was built under R version 3.4.4

## Warning: package 'stringr' was built under R version 3.4.4

## Warning: package 'forcats' was built under R version 3.4.4

## -- Conflicts ---------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(data.table)

## Warning: package 'data.table' was built under R version 3.4.4

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following object is masked from 'package:purrr':
## 
##     transpose

library(lubridate)

## Warning: package 'lubridate' was built under R version 3.4.4

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:data.table':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday,
##     week, yday, year

## The following object is masked from 'package:base':
## 
##     date

library(anytime)

## Warning: package 'anytime' was built under R version 3.4.4

library(R.utils)

## Warning: package 'R.utils' was built under R version 3.4.4

## Loading required package: R.oo

## Warning: package 'R.oo' was built under R version 3.4.4

## Loading required package: R.methodsS3

## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.

## R.oo v1.22.0 (2018-04-21) successfully loaded. See ?R.oo for help.

## 
## Attaching package: 'R.oo'

## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods

## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save

## R.utils v2.7.0 successfully loaded. See ?R.utils for help.

## 
## Attaching package: 'R.utils'

## The following object is masked from 'package:tidyr':
## 
##     extract

## The following object is masked from 'package:utils':
## 
##     timestamp

## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings

if(!file.exists(".data")){dir.create("./data")}

## Warning in dir.create("./data"): '.\data' already exists

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "./data/stormdata.csv.bz2")
bunzip2("./data/stormdata.csv.bz2", "dataset.csv", remove = FALSE, skip = TRUE)

## [1] "dataset.csv"
## attr(,"temporary")
## [1] FALSE

dataset <- read_csv("./data/dataset.csv")

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   STATE__ = col_double(),
##   COUNTY = col_double(),
##   BGN_RANGE = col_double(),
##   COUNTY_END = col_double(),
##   END_RANGE = col_double(),
##   LENGTH = col_double(),
##   WIDTH = col_double(),
##   F = col_integer(),
##   MAG = col_double(),
##   FATALITIES = col_double(),
##   INJURIES = col_double(),
##   PROPDMG = col_double(),
##   CROPDMG = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double(),
##   LATITUDE_E = col_double(),
##   LONGITUDE_ = col_double(),
##   REFNUM = col_double()
## )

## See spec(...) for full column specifications.

time_merge <- function(time, date, timezone) {
        t <- strptime(time, format = "%H%M")
        t <- format(t, format="%H:%M:%S")
        t <- paste(t, timezone)
        x <- mdy_hms(date)
        x <- paste(x, t)
        new_date <- as.POSIXlt(x,format="%Y-%m-%d %H:%M:%S")
        return(new_date)
}

start_dates <- time_merge(dataset$BGN_TIME, dataset$BGN_DATE, dataset$TIME_ZONE)
end_dates <- time_merge(dataset$END_TIME, dataset$END_DATE, dataset$TIME_ZONE)


dataset.dates <- cbind(start_dates, end_dates, dataset)
remove(start_dates)
remove(end_dates)
dataset.dates <- dataset.dates %>% 
        mutate(Year = year((dataset.dates$start_dates)))

## Warning: package 'bindrcpp' was built under R version 3.4.4

glimpse(dataset.dates)

## Observations: 902,297
## Variables: 40
## $ start_dates <dttm> 1950-04-18 01:30:00, 1950-04-18 01:45:00, 1951-02...
## $ end_dates   <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N...
## $ STATE__     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,...
## $ BGN_DATE    <chr> "4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/19...
## $ BGN_TIME    <chr> "0130", "0145", "1600", "0900", "1500", "2000", "0...
## $ TIME_ZONE   <chr> "CST", "CST", "CST", "CST", "CST", "CST", "CST", "...
## $ COUNTY      <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73,...
## $ COUNTYNAME  <chr> "MOBILE", "BALDWIN", "FAYETTE", "MADISON", "CULLMA...
## $ STATE       <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "A...
## $ EVTYPE      <chr> "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TORNA...
## $ BGN_RANGE   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ BGN_AZI     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ BGN_LOCATI  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ END_DATE    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ END_TIME    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ COUNTY_END  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ COUNTYENDN  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ END_RANGE   <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ END_AZI     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ END_LOCATI  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ LENGTH      <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3,...
## $ WIDTH       <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 40...
## $ F           <int> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1,...
## $ MAG         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ FATALITIES  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0,...
## $ INJURIES    <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 5...
## $ PROPDMG     <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25...
## $ PROPDMGEXP  <chr> "K", "K", "K", "K", "K", "K", "K", "K", "K", "K", ...
## $ CROPDMG     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## $ CROPDMGEXP  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ WFO         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ STATEOFFIC  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ ZONENAMES   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ LATITUDE    <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 33...
## $ LONGITUDE   <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 87...
## $ LATITUDE_E  <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404,...
## $ LONGITUDE_  <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640,...
## $ REMARKS     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA...
## $ REFNUM      <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,...
## $ Year        <dbl> 1950, 1950, 1951, 1951, 1951, 1951, 1951, 1952, 19...

To determine which EVTYPEs were most damaging to population health I explore the impact on population health I explore fatalities and injuries

deaths <- dataset.dates %>% 
        mutate(EVTYPE = (stringr::str_to_upper(EVTYPE))) %>% 
        group_by(EVTYPE, Year) %>%
        filter(!is.na(Year)) %>%
        summarise(Deaths = sum(FATALITIES, na.rm = T), Injuries = sum(INJURIES, na.rm = T)) 

deaths.final <-  deaths %>% 
        mutate(Cause = case_when(str_detect(EVTYPE, ("HEAT|WARM")) ~ "HEAT",
                                str_detect(EVTYPE, ("FLOOD|STORM")) ~ "STORM",
                                str_detect(EVTYPE, ("TORNADO|WIND")) ~ "WIND",
                                str_detect(EVTYPE, ("COLD"))~ "COLD")) 

deaths.final$Cause[is.na(deaths.final$Cause)] <- "OTHER"

deaths.final %>% 
        group_by(Cause) %>% 
        summarise(Deaths = sum(Deaths))

## # A tibble: 5 x 2
##   Cause Deaths
##   <chr>  <dbl>
## 1 COLD     78.
## 2 HEAT   1142.
## 3 OTHER   359.
## 4 STORM   349.
## 5 WIND   4485.

deaths_aggr <- aggregate(Deaths ~ Cause, data = deaths.final, mean)

Let’s plot by average deaths in years recorded.

ggplot(deaths_aggr, aes(x = reorder(Cause, -Deaths), y = Deaths)) + geom_bar(stat = "identity") +
        labs(x = "Cause", y = "Average Deaths in Years Recorded")

It looks like Heat and Wind are the two biggest contributers to death, is that the same for economics?

To determine which EVTYPEs were most damaging to economic health I look at PROPDMG and CROPDMG.

dmg <- dataset.dates %>% 
        mutate(EVTYPE = (stringr::str_to_upper(EVTYPE))) %>% 
        group_by(EVTYPE, Year) %>%
        filter(!is.na(Year)) %>%
        summarise(Damage = sum(PROPDMG, na.rm = T) + sum(CROPDMG, na.rm = T))

dmg.final <- dmg %>% 
        mutate(Cause = case_when(str_detect(EVTYPE, ("HEAT|WARM")) ~ "HEAT",
                                 str_detect(EVTYPE, ("FLOOD|STORM")) ~ "STORM",
                                 str_detect(EVTYPE, ("TORNADO|WIND")) ~ "WIND",
                                 str_detect(EVTYPE, ("COLD"))~ "COLD")) 

dmg.final$Cause[is.na(dmg.final$Cause)] <- "OTHER"


dmg_aggr <- aggregate(Damage ~ Cause, data = dmg.final, mean)

ggplot(dmg_aggr, aes(x = reorder(Cause, -Damage), y = Damage)) + geom_bar(stat = "identity") +
        labs(x = "Cause", y = "Average Damage in Years Recorded")

Once again Wind is major cause, however Heat is now the lowest.

Analysis of Weather Events and the Impact on Population Heath and Economic Consequences

Michael Arthur

November 4, 2018

Synopsis

Data Proccessing

Results