Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The analysis will address the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Synopsis

A total of 902,297 events have been recorded over the years, categorized into 977 different event types. Floods are the leading cause of injuries, while flash floods are the leading cause of fatalities. Property damage accounts for most of the economic consequences of these events, with floods and hurricanes causing most of that damage. Crop damage is caused mostly by flash floods.

Data Processing

The storm data is downloaded from the NOAA storm database if it is not already present in local storage, and is then loaded into the storm variable.

## URL to download the storm data
download_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

## location and name to store the file
storm_folder <- "storm"
storm_file <- "storm_data.csv.bz2"
storm_path <- paste(storm_folder, storm_file, sep = "/")

if (!file.exists(storm_path)) {
    ## if storm folder does not exist create
    if (!file.exists(storm_folder)) dir.create(storm_folder)
    ## download the storm data and store in the created folder
    download.file(download_url, storm_path)
}

## Load the storm data downloaded
library(readr)
storm <- read_csv(storm_path)
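
As a hedged sketch, the column types could also be declared explicitly when loading. read_csv guesses types from a limited number of leading rows (its guess_max argument), so a sparsely filled column such as an exponent code might be guessed wrongly. The temporary file, the single data row, and the names tmp_csv and storm_small below are made up for illustration; only the column names come from the storm data.

```r
library(readr)

## Write a tiny made-up CSV so the example runs without the real download
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,FATALITIES,PROPDMG,PROPDMGEXP,REMARKS",
             "FLOOD,1,2.5,K,illustrative row"),
           tmp_csv)

## Read only the listed columns, each with an explicit type
storm_small <- read_csv(
    tmp_csv,
    col_types = cols_only(
        EVTYPE     = col_character(),
        FATALITIES = col_double(),
        PROPDMG    = col_double(),
        PROPDMGEXP = col_character()
    )
)
```

Columns not named in cols_only (REMARKS here) are skipped, which also keeps the loaded data smaller.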

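A function is created to standardise the cost of both crop damage and property damage. It converts each damage figure to millions of dollars using its exponent code (H for hundreds, K for thousands, M for millions, B for billions); any other code leaves the figure unscaled before the conversion to millions.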

## Function to standardise the cost to millions of dollars
standardise_cost <- function(cost, unit) {
    unit <- ifelse(is.na(unit), "", toupper(unit))
    factor <- if (unit == "H") {
                1E2
              }
              else if (unit == "K") {
                  1E3
              } 
              else if (unit == "M") {
                  1E6
              }
              else if (unit == "B") {
                  1E9
              }
              else {
                  1
              }
    cost * factor / 1E6
}
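
The function above handles one value at a time, which is why it is later applied with mapply. A vectorised variant is sketched below as an alternative, not as the method used in this analysis. The name standardise_cost_vec and the sample inputs are hypothetical; the unit codes and the fallback multiplier of 1 mirror the function above.

```r
## Hypothetical vectorised variant of standardise_cost
standardise_cost_vec <- function(cost, unit) {
    ## Multiplier for each recognised unit code
    multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
    ## Look up every unit by name; missing or unknown codes give NA
    multiplier <- unname(multipliers[toupper(unit)])
    ## Fall back to 1 for missing or unknown codes, as the scalar function does
    multiplier[is.na(multiplier)] <- 1
    ## Express the result in millions
    cost * multiplier / 1e6
}

## Made-up inputs: thousands, lower-case millions, missing unit, billions
example_costs <- standardise_cost_vec(c(25, 2.5, 10, 3), c("K", "m", NA, "B"))
```

With a function of this shape, the mutate step could call it directly on whole columns instead of going through mapply.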

The relevant columns are selected from the main data: event type, injuries, fatalities, property damage, and crop damage. The data is then summarised to give the total of each of these measures per event type, as shown below.

## Keep only the columns needed to answer the study questions
## Change all the column names to lower case
## Standardise the costs to millions of dollars
library(dplyr)
storm_lean <- storm %>%
    select(EVTYPE, FATALITIES:CROPDMGEXP) %>%
    rename_all(tolower) %>%
    mutate(propdmg = mapply(standardise_cost, propdmg, propdmgexp),
           cropdmg = mapply(standardise_cost, cropdmg, cropdmgexp),
           evtype = as.factor(evtype)) %>%
    group_by(evtype) %>%
    summarise(injuries = sum(injuries, na.rm = TRUE),
              fatalities = sum(fatalities, na.rm = TRUE),
              prop_dmg = sum(propdmg, na.rm = TRUE),
              crop_dmg = sum(cropdmg, na.rm = TRUE))
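
To make the grouping step concrete, here is a minimal sketch of the same group_by and summarise pattern on a toy table. The rows and the names toy_storm and toy_totals are invented for illustration and are not taken from the storm data.

```r
library(dplyr)

## Made-up rows standing in for the storm data
toy_storm <- data.frame(
    evtype     = c("FLOOD", "FLOOD", "HAIL"),
    injuries   = c(2, NA, 1),
    fatalities = c(0, 1, 0),
    stringsAsFactors = FALSE
)

## One row per event type; na.rm = TRUE skips the missing injury count
toy_totals <- toy_storm %>%
    group_by(evtype) %>%
    summarise(injuries = sum(injuries, na.rm = TRUE),
              fatalities = sum(fatalities, na.rm = TRUE))
```

Here the two FLOOD rows collapse into one row with 2 injuries and 1 fatality, and HAIL keeps its single row.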

Results

A total of 902,297 events have been recorded over the years. The events are categorized into 977 different event types.

## Total number of events recorded (rows) and number of columns
dim(storm)
## [1] 902297     37
## Event categories
length(unique(storm$EVTYPE))
## [1] 977

Types of events most harmful to population health in the United States

Floods are the leading cause of injuries, while flash floods are the leading cause of fatalities.

library(dplyr)
library(tidyr)
library(ggplot2)

storm_lean %>%
    arrange(-fatalities, -injuries) %>%
    ## Note: top_n(10) without `wt` ranks by the last column (crop_dmg)
    top_n(10) %>%
    pivot_longer(fatalities:injuries, names_to = "event") %>%
    ggplot(aes(x = reorder(evtype, -value), y = value, fill=event)) +
        geom_col(position="dodge") +
        labs(x = "Event Type", y = "Count") +
        ggtitle("Top 10 Events that cause most of the injuries and fatalities") +
        theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) +
        scale_fill_manual(values = c("salmon","pink"))
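
When top_n() is called without a wt argument, dplyr orders by the last column of the table, so the ranking variable is implicit. A hedged sketch of an explicit ranking with slice_max() follows. The toy table, the name top_harm, and the choice to rank by injuries plus fatalities are assumptions made for illustration, not the ranking used for the plot above.

```r
library(dplyr)

## Toy stand-in for storm_lean; the numbers are made up
toy_lean <- data.frame(
    evtype     = c("A", "B", "C", "D"),
    injuries   = c(10, 500, 40, 0),
    fatalities = c(5, 20, 90, 1),
    crop_dmg   = c(900, 1, 2, 800),
    stringsAsFactors = FALSE
)

## Rank explicitly by total casualties and keep the two largest
top_harm <- toy_lean %>%
    mutate(casualties = injuries + fatalities) %>%
    slice_max(casualties, n = 2, with_ties = FALSE)
```

In this toy table the two rows kept are B and C, even though A and D have the largest crop_dmg values.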

Types of events with the greatest economic consequences across the United States

Property damage accounts for most of the economic consequences of these events, with floods and hurricanes causing most of that damage. Crop damage is caused mostly by flash floods.

library(dplyr)
library(tidyr)
library(ggplot2)

storm_lean %>%
    arrange(-prop_dmg, -crop_dmg) %>%
    ## Note: top_n(10) without `wt` ranks by the last column (crop_dmg)
    top_n(10) %>%
    pivot_longer(prop_dmg:crop_dmg, names_to = "event") %>%
    ggplot(aes(x = reorder(evtype, -value), y = value, fill=event)) +
        geom_col(position="dodge") +
        labs(x = "Event Type", y = "Damage (millions of dollars)") +
        ggtitle("Top 10 Events that cause most of the economic consequences") +
        theme(axis.text.x = element_text(angle = 45, vjust = 0.7)) +
        scale_fill_manual(values = c("green","seagreen"))