Synopsis

In this analysis we examined which weather events cause the most harm to people and property, to help officials judge the severity of such events and prepare appropriate responses. For this we used the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which records major weather events and their impact. The dataset covers 1950 to 2011. We grouped the data by event type and summarised which events caused the most economic damage (crop and property damage) and the most harm to human health (fatalities and injuries).

The data is available online and a description can be found here.

The data show that floods, hurricanes/typhoons and tornados cause the most economic damage, while tornados, excessive heat and thunderstorm winds result in the most injuries and deaths.

Data Processing

To process and visualise the data we used the following libraries.

library(data.table)
library(tidyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

The data was downloaded and loaded into the R workspace.

# Download data
if(!file.exists("StormData.csv.bz2")){
    URL<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(URL, destfile = "StormData.csv.bz2")
}

# Load the data as a data.table, reading the file downloaded above
if(!exists("StormData")){
    StormDataDF <- read.csv("StormData.csv.bz2")
    StormData <- as.data.table(StormDataDF)
    rm(StormDataDF)
}

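To save memory we kept only the columns of interest. We also recoded the exponent columns (PROPDMGEXP and CROPDMGEXP) from letter codes to numeric powers of ten to simplify the calculation of economic damage; any code not explicitly listed is treated as an exponent of 0.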

StormDataTidy <- StormData %>%
    select(EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP) %>%
    mutate(PROPDMGEXP = case_match(PROPDMGEXP, c("", "-", "+", "?") ~ 0,
                       "B" ~ 9,
                    c("h", "H") ~ 2,
                    "K" ~ 3,
                    c("m", "M") ~ 6,
                    .default = 0)) %>%
    mutate(CROPDMGEXP = case_match(CROPDMGEXP, c("?") ~ 0,
                       "B" ~ 9,
                       "K" ~ 3,
                       c("m", "M") ~ 6,
                       .default = 0))
    
str(StormDataTidy)
## Classes 'data.table' and 'data.frame':   902297 obs. of  7 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: num  3 3 3 3 3 3 3 3 3 3 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: num  0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, ".internal.selfref")=<externalptr>
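
As a rough illustration of what the recoding achieves, the sketch below decodes a few exponent codes and converts them to dollar amounts. It uses a base R lookup vector as a stand-in for the `case_match` call above, so it is not the code used in the analysis. The toy values and the names `exp_lookup`, `decode_exp` and `toy` are assumptions made for this example.

```r
# Hypothetical lookup table mirroring the PROPDMGEXP recoding above
exp_lookup <- c(B = 9, h = 2, H = 2, K = 3, m = 6, M = 6)

# Decode exponent codes; any code not in the lookup falls back to 0
decode_exp <- function(code) {
    out <- exp_lookup[code]
    out[is.na(out)] <- 0
    unname(out)
}

# Toy records (made up for illustration, not taken from the database)
toy <- data.frame(DMG = c(25, 2.5, 1.2, 7),
                  EXP = c("K", "m", "B", "?"),
                  stringsAsFactors = FALSE)

# Damage in dollars = reported value times ten to the decoded exponent
toy$DOLLARS <- toy$DMG * 10^decode_exp(toy$EXP)
toy
```

For instance, a record with a value of 25 and the code "K" becomes 25 * 10^3 = 25,000 dollars, while the unrecognised code "?" leaves the value unchanged.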

To calculate the total number of injuries and deaths caused by each type of weather event we created a new summary table, keeping only the ten event types with the greatest impact. We then reshaped the table into long format, with one row per event type and category (fatalities or injuries), to make plotting easier.

# Summarise the injury and fatality numbers by EVTYPE
TotalInjury <- StormDataTidy %>%
    group_by(EVTYPE) %>%
    summarise(TOTAL = sum(FATALITIES, INJURIES),
          FATALITIES = sum(FATALITIES),
          INJURIES = sum(INJURIES)) %>%
    slice_max(n = 10, order_by = TOTAL)
TotalInjury
## # A tibble: 10 × 4
##    EVTYPE            TOTAL FATALITIES INJURIES
##    <chr>             <dbl>      <dbl>    <dbl>
##  1 TORNADO           96979       5633    91346
##  2 EXCESSIVE HEAT     8428       1903     6525
##  3 TSTM WIND          7461        504     6957
##  4 FLOOD              7259        470     6789
##  5 LIGHTNING          6046        816     5230
##  6 HEAT               3037        937     2100
##  7 FLASH FLOOD        2755        978     1777
##  8 ICE STORM          2064         89     1975
##  9 THUNDERSTORM WIND  1621        133     1488
## 10 WINTER STORM       1527        206     1321
TotalInjury <- TotalInjury %>% pivot_longer(cols = c(FATALITIES, INJURIES), names_to = "CATEGORY", values_to = "COUNT")
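
To make the reshaping step concrete, here is a minimal sketch that applies `pivot_longer` to two rows copied from the summary above. The object names `wide` and `long` are hypothetical and used only for this illustration.

```r
library(tidyr)

# Two rows copied from the summary above, in wide format
wide <- data.frame(EVTYPE = c("TORNADO", "EXCESSIVE HEAT"),
                   TOTAL = c(96979, 8428),
                   FATALITIES = c(5633, 1903),
                   INJURIES = c(91346, 6525))

# One row per event type and category; TOTAL is repeated on each row
long <- pivot_longer(wide, cols = c(FATALITIES, INJURIES),
                     names_to = "CATEGORY", values_to = "COUNT")
long
```

The long table has one COUNT value per event type and category, which is the shape ggplot2 needs to stack fatalities and injuries in a single bar.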

To calculate the monetary damage we created a second summary table, keeping the ten event types that caused the most damage. We then reshaped it into long format, with one row per event type and category (crop or property damage), to make plotting easier.

# Summarise the damage data by EVTYPE
TotalDamage <- StormDataTidy %>%
    group_by(EVTYPE) %>%
    mutate(CROPTOTAL = CROPDMG * 10^CROPDMGEXP,
           PROPTOTAL = PROPDMG * 10^PROPDMGEXP) %>%
    summarise(TOTAL = sum(PROPTOTAL, CROPTOTAL),
          CROPDMG = sum(CROPTOTAL),
          PROPDMG = sum(PROPTOTAL)) %>%
    slice_max(n = 10, order_by = TOTAL)
TotalDamage
## # A tibble: 10 × 4
##    EVTYPE                    TOTAL     CROPDMG       PROPDMG
##    <chr>                     <dbl>       <dbl>         <dbl>
##  1 FLOOD             150319678257   5661968450 144657709807 
##  2 HURRICANE/TYPHOON  71913712800   2607872800  69305840000 
##  3 TORNADO            57352114049.   414953270  56937160779.
##  4 STORM SURGE        43323541000         5000  43323536000 
##  5 HAIL               18757805433.  3025537890  15732267543.
##  6 FLASH FLOOD        17562129167.  1421317100  16140812067.
##  7 DROUGHT            15018672000  13972566000   1046106000 
##  8 HURRICANE          14610229010   2741910000  11868319010 
##  9 RIVER FLOOD        10148404500   5029459000   5118945500 
## 10 ICE STORM           8967041360   5022113500   3944927860
TotalDamage <- TotalDamage %>% pivot_longer(cols = c(CROPDMG, PROPDMG), names_to = "CATEGORY", values_to = "COUNT")
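
The damage aggregation can be checked by hand on a small table. The sketch below runs the same kind of pipeline on made-up records; the values and the names `toy` and `toy_damage` are assumptions for illustration, not data from the storm database.

```r
library(dplyr)

# Toy records with already-decoded exponents (made up for illustration)
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL", "DROUGHT"),
                  PROPDMG = c(5, 2, 10, 0),
                  PROPDMGEXP = c(6, 3, 3, 0),
                  CROPDMG = c(1, 0, 5, 3),
                  CROPDMGEXP = c(3, 0, 3, 6))

toy_damage <- toy %>%
    # Convert each record to dollars before aggregating
    mutate(PROPTOTAL = PROPDMG * 10^PROPDMGEXP,
           CROPTOTAL = CROPDMG * 10^CROPDMGEXP) %>%
    group_by(EVTYPE) %>%
    summarise(TOTAL = sum(PROPTOTAL, CROPTOTAL),
              CROPDMG = sum(CROPTOTAL),
              PROPDMG = sum(PROPTOTAL)) %>%
    # Keep the two most costly event types
    slice_max(n = 2, order_by = TOTAL)
toy_damage
```

Here FLOOD totals 5,003,000 dollars (5,002,000 property plus 1,000 crop) and DROUGHT 3,000,000 dollars, so these two rows are kept and HAIL (15,000 dollars) is dropped.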

Results

To visualise the data we plotted both newly created tables, TotalInjury and TotalDamage, as bar plots using ggplot2.

First we looked at the impact on health.

ggplot(TotalInjury, aes(reorder(EVTYPE, -TOTAL), COUNT, fill = CATEGORY)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle=45, hjust=1)) +
    ylab("Casualties") +
    xlab("Event Type") +
    scale_fill_discrete(labels=c("Fatalities","Injuries")) +
    labs(title = "Most dangerous weather events to human health")

The graph shows the number of casualties, both fatalities and injuries, by event type. Tornados were by far the most dangerous event type between 1950 and 2011, with 96,979 injuries and deaths, followed by excessive heat (8,428) and thunderstorm winds (7,461, recorded as TSTM WIND). Injuries outnumbered fatalities for all ten event types.

Next we looked at the monetary damage.

ggplot(TotalDamage, aes(reorder(EVTYPE, -COUNT), COUNT, fill = CATEGORY)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle=45, hjust=1)) +
    ylab("Damage (Dollars)") +
    xlab("Event Type") +
    scale_fill_discrete(labels=c("Crop Damage","Property Damage")) +
    labs(title = "Most costly weather events")

The graph shows that floods were the leading cause of economic damage from 1950 to 2011, at $150,319,678,257 (about $150 billion). They were followed by hurricanes/typhoons and tornados.

Conclusion

In this analysis we showed that, historically, tornados have been the most dangerous weather events to human life. They are also a major cause of economic damage, though they fall behind floods and hurricanes/typhoons. These results could help officials judge the severity of weather events appropriately.