Synopsis

In this report, we assess the destruction caused by weather events between 1950 and 2011. Our goal is to determine which event type causes the most damage, as measured by both health impact (injuries and deaths) and financial impact (property and crop damage). From these data, we determined that, across all of the documented events, tornadoes are the most harmful with respect to both health and economic consequences.

Loading and Processing the Raw Data

Reading in the data

We loaded the necessary libraries and read in the data from the included bzip2-compressed file. The data is in .csv format, and the specific variables that we need do not require any tidying.

library(data.table)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(ggpubr)

data_file <- "repdata_data_StormData.csv.bz2"

data <- read.csv(data_file)
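
As a side note on why no explicit decompression step appears above: base R's read.csv() opens bzip2-compressed files transparently. The following is a minimal sketch on a made-up two-row file; the temporary file and the toy values are assumptions for illustration and are not part of the storm dataset.

```r
# Write a tiny CSV through a bzip2 connection (toy data, not the real file).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(
        data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
        con,
        row.names = FALSE
)
close(con)

# read.csv() detects the compression when given the file name directly.
toy <- read.csv(tmp)
```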

After reading in the data, we check its dimensions and the first few rows.

dim(data)
## [1] 902297     37
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

Processing the data

We first want to answer the question: which types of events are most harmful with respect to population health? To assess this, we total deaths and injuries by weather event and keep the 10 event types with the highest totals.

data_deaths <- data %>%
        group_by(event = EVTYPE) %>%
        summarize(deaths = sum(FATALITIES)) %>%
        arrange(desc(deaths)) %>%
        top_n(10, wt = deaths)
## `summarise()` ungrouping output (override with `.groups` argument)
data_injuries <- data %>%
        group_by(event = EVTYPE) %>%
        summarize(injuries = sum(INJURIES)) %>%
        arrange(desc(injuries)) %>%
        top_n(10, wt = injuries)
## `summarise()` ungrouping output (override with `.groups` argument)
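
The two summaries above follow the same group, sum, and rank pattern. As a minimal sketch on made-up rows, both totals can also be computed in one pass and ranked with slice_max(), which the dplyr documentation lists as the successor to top_n(). The toy data frame and the name toy_health are assumptions for illustration only.

```r
library(dplyr)

# Toy rows standing in for the storm data (values are made up).
toy <- data.frame(
        EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HAIL"),
        FATALITIES = c(3, 2, 4, 1, 0),
        INJURIES = c(10, 5, 2, 3, 1)
)

# Sum deaths and injuries per event type, then keep the 3 deadliest.
toy_health <- toy %>%
        group_by(event = EVTYPE) %>%
        summarize(deaths = sum(FATALITIES),
                injuries = sum(INJURIES),
                .groups = "drop") %>%
        slice_max(deaths, n = 3)
```

Like top_n(), slice_max() keeps ties by default, so the result can hold more than n rows when totals are equal.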

Next, we want to answer the question: which types of events have the greatest economic consequences? To assess this, we total crop damage and property damage by weather event.

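# Sum the recorded PROPDMG and CROPDMG values per event type and divide by 1000.
# Note: the PROPDMGEXP and CROPDMGEXP columns are not applied in this summary.
data_damage <- data %>%
        group_by(event = EVTYPE) %>%
        summarize(property_damage = sum(PROPDMG)/1000,
                crop_damage = sum(CROPDMG)/1000,
                total_damage = (property_damage + crop_damage)) %>%
        arrange(desc(total_damage)) %>%
        top_n(10, wt = total_damage)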
## `summarise()` ungrouping output (override with `.groups` argument)
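
The summary above adds up the recorded PROPDMG and CROPDMG values directly. The dataset also carries PROPDMGEXP and CROPDMGEXP columns (the first rows printed earlier show K), which in the NWS storm data documentation mark the magnitude of the recorded figure: K for thousands, M for millions, B for billions. The following is a minimal sketch, on made-up rows, of how such codes could be applied before summing. The helper name exp_multiplier is hypothetical, and treating any other code as a multiplier of 1 is an assumption, not something this report establishes.

```r
library(dplyr)

# Hypothetical helper: turn an exponent code into a numeric multiplier.
# Assumption: codes other than K, M, B (including blanks) count as 1.
exp_multiplier <- function(code) {
        code <- toupper(trimws(code))
        mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
        unname(ifelse(is.na(mult), 1, mult))
}

# Toy rows (values are made up, not taken from the dataset).
toy <- data.frame(
        EVTYPE = c("TORNADO", "TORNADO", "FLOOD"),
        PROPDMG = c(25, 2.5, 1.2),
        PROPDMGEXP = c("K", "M", "B"),
        stringsAsFactors = FALSE
)

# Scale each record to dollars, then total per event type.
toy_damage <- toy %>%
        mutate(property_usd = PROPDMG * exp_multiplier(PROPDMGEXP)) %>%
        group_by(event = EVTYPE) %>%
        summarize(property_usd = sum(property_usd), .groups = "drop") %>%
        arrange(desc(property_usd))
```

In this toy example a single record coded B outweighs the two tornado records, which shows how much the ranking can depend on whether the exponent columns are applied.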

Results

Events most harmful to population health

In order to show the events with the greatest health risks, we will list the top 10 in terms of both deaths and injuries.

data_deaths
## # A tibble: 10 x 2
##    event          deaths
##    <chr>           <dbl>
##  1 TORNADO          5633
##  2 EXCESSIVE HEAT   1903
##  3 FLASH FLOOD       978
##  4 HEAT              937
##  5 LIGHTNING         816
##  6 TSTM WIND         504
##  7 FLOOD             470
##  8 RIP CURRENT       368
##  9 HIGH WIND         248
## 10 AVALANCHE         224
data_injuries
## # A tibble: 10 x 2
##    event             injuries
##    <chr>                <dbl>
##  1 TORNADO              91346
##  2 TSTM WIND             6957
##  3 FLOOD                 6789
##  4 EXCESSIVE HEAT        6525
##  5 LIGHTNING             5230
##  6 HEAT                  2100
##  7 ICE STORM             1975
##  8 FLASH FLOOD           1777
##  9 THUNDERSTORM WIND     1488
## 10 HAIL                  1361

We also produce bar charts showing that tornadoes clearly pose the greatest risk to the population’s health.

death_plot <- ggplot(data_deaths, aes(event, deaths)) +
        geom_col(aes(fill = event)) +
        ylab("Total Deaths") +
        xlab("Event Type") +
        theme(axis.text.x = element_text(angle = 90, size = 6),
                legend.position = "none",
                axis.title.x = element_text(color = "blue", size = 12),
                axis.title.y = element_text(color = "blue", size = 12))

injuries_plot <- ggplot(data_injuries, aes(event, injuries)) +
        geom_col(aes(fill = event)) +
        ylab("Total Injuries") +
        xlab("Event Type") +
        theme(axis.text.x = element_text(angle = 90, size = 6),
                legend.position = "none",
                axis.title.x = element_text(color = "blue", size = 12),
                axis.title.y = element_text(color = "blue", size = 12))

ggarrange(death_plot, injuries_plot, ncol = 2, 
        labels = c("Total Deaths per Event Type", 
                "Total Injuries per Event Type"),
        font.label = list(color = "blue"))
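
By default ggplot2 places a character x variable in alphabetical order, so the tallest bar is not necessarily first. The following is a minimal sketch of ordering bars by their totals with reorder(), using three of the death counts from the table above. The names toy_deaths and ordered_plot are assumptions for illustration.

```r
library(ggplot2)

# Three rows copied from the deaths table above.
toy_deaths <- data.frame(
        event = c("FLOOD", "TORNADO", "HEAT"),
        deaths = c(470, 5633, 937)
)

# Reorder the event labels so the largest total comes first.
toy_deaths$event <- reorder(toy_deaths$event, -toy_deaths$deaths)

ordered_plot <- ggplot(toy_deaths, aes(event, deaths)) +
        geom_col(aes(fill = event)) +
        ylab("Total Deaths") +
        xlab("Event Type") +
        theme(legend.position = "none")
```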

Events with greatest economic consequences

We do the same for economic risk. Here are the top 10 event types, with the summed damage values divided by 1,000:

data_damage
## # A tibble: 10 x 4
##    event              property_damage crop_damage total_damage
##    <chr>                        <dbl>       <dbl>        <dbl>
##  1 TORNADO                      3212.      100.          3312.
##  2 FLASH FLOOD                  1420.      179.          1599.
##  3 TSTM WIND                    1336.      109.          1445.
##  4 HAIL                          689.      580.          1268.
##  5 FLOOD                         900.      168.          1068.
##  6 THUNDERSTORM WIND             877.       66.8          944.
##  7 LIGHTNING                     603.        3.58         607.
##  8 THUNDERSTORM WINDS            446.       18.7          465.
##  9 HIGH WIND                     325.       17.3          342.
## 10 WINTER STORM                  133.        1.98         135.
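
The table lists TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS as separate event types. The following is a minimal sketch, on made-up rows, of how such label variants could be merged before summarizing. The helper normalize_event is hypothetical, and the decision that these three labels describe the same kind of event is an assumption rather than something this report verifies.

```r
library(dplyr)

# Toy rows (values are made up) using the three label variants seen above.
toy <- data.frame(
        EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND",
                "THUNDERSTORM WINDS", "TORNADO"),
        INJURIES = c(3, 2, 1, 10),
        stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the thunderstorm wind variants to one label.
normalize_event <- function(x) {
        x <- toupper(trimws(x))
        x <- sub("^TSTM", "THUNDERSTORM", x)
        sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
}

toy_injuries <- toy %>%
        group_by(event = normalize_event(EVTYPE)) %>%
        summarize(injuries = sum(INJURIES), .groups = "drop") %>%
        arrange(desc(injuries))
```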

And here is the bar chart showing that tornadoes also rank first in combined property and crop damage.

ggplot(data_damage, aes(event, total_damage)) +
        geom_col(aes(fill = event)) +
        ylab("Total Damage (thousands)") +
        xlab("Event Type") +
        ggtitle("Total Damage per Event Type") +
        theme(axis.text.x = element_text(angle = 90, size = 6),
                legend.position = "none",
                plot.title = element_text(hjust = 0.5, color = "blue",
                        size = 16, face = "bold"),
                axis.title.x = element_text(color = "blue",
                        size = 12, face = "bold"),
                axis.title.y = element_text(color = "blue",
                        size = 12, face = "bold"))