Synopsis

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

This analysis tries to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

Loading the data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# The file is bzip2-compressed; read.csv() decompresses it transparently.
download.file(url, "storm_data.csv.bz2", mode = "wb")
data <- read.csv("storm_data.csv.bz2", stringsAsFactors = FALSE)

Take a subset with the required variables

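# Column 8 is EVTYPE; columns 23:28 are FATALITIES, INJURIES, PROPDMG,
# PROPDMGEXP, CROPDMG, and CROPDMGEXP.
data <- data[, c(8, 23:28)]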
  1. Question: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
injuries <- data %>%
        group_by(EVTYPE) %>%
        summarize(total_injuries = sum(INJURIES)) %>%
        arrange(desc(total_injuries))

injuries_plot <- ggplot(injuries[1:3, ], aes(x = reorder(EVTYPE, -total_injuries), 
        y = total_injuries)) +
        geom_bar(stat = "identity") +
        xlab("Type of Event") + 
        ylab("Total Injuries") +
        ggtitle("Most Harmful Types of Events")
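
The summary above ranks event types by injuries alone, although the subset also keeps the FATALITIES column. As a rough sketch only, the snippet below shows one way both measures could be summed per event type. The toy data frame, its values, and the combined `total_casualties` column are assumptions made for illustration and are not part of the original analysis. It uses base R so that it runs without extra packages.

```r
# Toy stand-in for the subsetted storm data (values are made up for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 5, 0),
  INJURIES   = c(10, 3, 7, 1, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries separately for each event type.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical combined measure: fatalities plus injuries.
health$total_casualties <- health$FATALITIES + health$INJURIES

# Order event types from the largest to the smallest combined total.
health <- health[order(-health$total_casualties), ]
print(health)
```

Whether fatalities and injuries should be weighted equally is a judgment call; reporting the two columns side by side avoids having to choose a weighting.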
  2. Question: Across the United States, which types of events have the greatest economic consequences?
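# Keep rows whose exponent code is K, M, or B; columns 1, 4, and 5 are
# EVTYPE, PROPDMG, and PROPDMGEXP. Rows with any other code are dropped.
properties <- subset(data, PROPDMGEXP %in% c("K", "M", "B"))[, c(1, 4:5)]
# Replace each code with its dollar multiplier (thousand, million, billion).
properties$PROPDMGEXP[properties$PROPDMGEXP == "K"] <- 1000
properties$PROPDMGEXP[properties$PROPDMGEXP == "M"] <- 1000000
properties$PROPDMGEXP[properties$PROPDMGEXP == "B"] <- 1000000000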

property_damage <- properties %>%
        mutate(PROPDMGVAL = as.numeric(PROPDMGEXP) * PROPDMG) %>%
        group_by(EVTYPE) %>%
        summarize(total_val = sum(PROPDMGVAL) / 1000000000) %>%
        arrange(desc(total_val))

property_damage_plot <- ggplot(property_damage[1:3, ], aes(x = reorder(EVTYPE, -total_val), 
        y = total_val)) +
        geom_bar(stat = "identity") +
        xlab("Type of Event") + 
        ylab("Property Damage (billions of USD)") +
        ggtitle("Types of Events with the Greatest Economic Consequences")
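
The conversion above multiplies each PROPDMG value by the multiplier encoded in PROPDMGEXP and then totals the result per event type in billions of dollars. The sketch below reproduces that arithmetic on a small made-up data frame, using a named lookup vector instead of repeated assignments. The toy values and the name `multiplier` are assumptions for illustration, and base R is used so that the snippet is self-contained.

```r
# Toy rows mimicking PROPDMG / PROPDMGEXP (values are made up for illustration).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "HAIL"),
  PROPDMG    = c(2.5, 500, 1, 25, 3),
  PROPDMGEXP = c("B", "M", "M", "K", "?"),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent code -> multiplier in dollars.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Keep only rows with a recognised exponent code, as the analysis above does.
toy <- toy[toy$PROPDMGEXP %in% names(multiplier), ]

# Damage in dollars = PROPDMG times the multiplier for its code.
toy$PROPDMGVAL <- toy$PROPDMG * unname(multiplier[toy$PROPDMGEXP])

# Total damage per event type, expressed in billions of dollars.
totals <- tapply(toy$PROPDMGVAL, toy$EVTYPE, sum) / 1e9
totals <- sort(totals, decreasing = TRUE)
print(totals)
```

Note that the row with the unrecognised code `"?"` is silently dropped, which mirrors how the filter on K, M, and B excludes every other exponent code in the real data.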

Results

  1. Answer: Measured by total injuries, the most harmful type of event across the United States is the tornado.
print(injuries_plot)

  2. Answer: Measured by total property damage, the type of event with the greatest economic consequences is the flood.
print(property_damage_plot)