Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

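  1. Loading libraries
library(dplyr)    # %>%, select(), group_by(), summarise(), filter(), arrange()
library(ggplot2)  # ggplot(), geom_col(), coord_flip(), labs(), theme_bw()
## 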
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
  1. Downloading and reading the data
linkURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(linkURL, destfile = "repdata_data_StormData.csv.bz2")
stormdb <- read.csv("repdata_data_StormData.csv.bz2", header = TRUE, sep = ",")
file.remove("repdata_data_StormData.csv.bz2")
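If the report is re-run often, downloading the archive every time can be slow. A minimal sketch of one alternative, assuming the file is kept on disk instead of being removed: the helper name `fetch_if_missing` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original report): download a file only
# when it is not already present, and return its local path.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the compressed file in binary mode, which matters on Windows.
    download.file(url, destfile = destfile, mode = "wb")
  }
  destfile
}

# Possible usage with the objects defined above (left commented out so the
# sketch does not trigger a download on its own):
# stormdb <- read.csv(fetch_if_missing(linkURL, "repdata_data_StormData.csv.bz2"))
```

With this approach the `file.remove()` call would be dropped, trading disk space for faster repeat runs.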
  1. Selecting only the columns needed for the analysis
storm.selected <- stormdb %>% 
        select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
  1. Converting PROPDMGEXP codes to numeric multipliers
sort(unique(storm.selected$PROPDMGEXP))
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K" "m" "M"
# Recode "1" first. The rows set to 10^0 in the next statement are stored as the
# string "1", so testing for "1" afterwards would wrongly turn them into 10.
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP == "1"] <- 10^1
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP %in% c("", "-", "?", "+", "0")] <- 10^0
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP %in% c("2", "h", "H")] <- 10^2
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP %in% c("3", "k", "K")] <- 10^3
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP == "4"] <- 10^4
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP == "5"] <- 10^5
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP %in% c("6", "m", "M")] <- 10^6
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP == "7"] <- 10^7
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP == "8"] <- 10^8
storm.selected$PROPDMGEXP[storm.selected$PROPDMGEXP == "B"] <- 10^9
storm.selected$PROPDMGEXP <- as.numeric(storm.selected$PROPDMGEXP)
sort(unique(storm.selected$PROPDMGEXP))
##  [1] 1e+00 1e+01 1e+02 1e+03 1e+04 1e+05 1e+06 1e+07 1e+08 1e+09
  1. Converting CROPDMGEXP codes to numeric multipliers
sort(unique(storm.selected$CROPDMGEXP))
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
storm.selected$CROPDMGEXP[(storm.selected$CROPDMGEXP == "") | (storm.selected$CROPDMGEXP == "?") | (storm.selected$CROPDMGEXP == "0")] <- 10^0
storm.selected$CROPDMGEXP[(storm.selected$CROPDMGEXP == "2")] <- 10^2
storm.selected$CROPDMGEXP[(storm.selected$CROPDMGEXP == "k") | (storm.selected$CROPDMGEXP == "K")] <- 10^3
storm.selected$CROPDMGEXP[(storm.selected$CROPDMGEXP == "m") | (storm.selected$CROPDMGEXP == "M")] <- 10^6
storm.selected$CROPDMGEXP[(storm.selected$CROPDMGEXP == "B")] <- 10^9
storm.selected$CROPDMGEXP <- as.numeric(storm.selected$CROPDMGEXP)
sort(unique(storm.selected$CROPDMGEXP))
## [1] 1e+00 1e+02 1e+03 1e+06 1e+09
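The exponent recoding in the two steps above can also be expressed as a single-pass lookup, which does not depend on the order in which codes are replaced. The following is only a sketch under the same assumptions as the recoding above (blank and symbol codes count as a multiplier of 1, digits as powers of ten); the names `exp_lookup` and `exp_to_multiplier` are hypothetical and not part of the original analysis.

```r
# Hypothetical lookup table for the letter codes (assumed meaning:
# hundred, thousand, million, billion).
exp_lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Hypothetical helper: translate exponent codes into numeric multipliers.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  # Everything not recognised below ("", "-", "?", "+") keeps a multiplier of 1.
  out <- rep(1, length(code))
  # Single digits are read as powers of ten.
  is_digit <- grepl("^[0-9]$", code)
  out[is_digit] <- 10^as.numeric(code[is_digit])
  # Letter codes are looked up in the table, ignoring case.
  is_letter <- code %in% names(exp_lookup)
  out[is_letter] <- unname(exp_lookup[code[is_letter]])
  out
}

# Example: a reported damage of 25 with code "K"
25 * exp_to_multiplier("K")
```

Under these assumptions a damage value of 25 with code "K" becomes 25000. Because every multiplier is computed from the original code in one pass, a value that has already been converted is never matched again by a later rule.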
  1. Creating the PROP, CROP, and total damage (DMG) variables
storm.selected$PROP <- storm.selected$PROPDMG * storm.selected$PROPDMGEXP
storm.selected$CROP <- storm.selected$CROPDMG * storm.selected$CROPDMGEXP
storm.selected$DMG <- storm.selected$PROP + storm.selected$CROP
  1. Creating data for plotting
storm.plotting <- storm.selected %>% 
        group_by(EVTYPE) %>% 
        summarise(Fatalities = sum(FATALITIES), 
                  Injuries = sum(INJURIES), 
                  People = Fatalities + Injuries) %>% 
        filter(People > 0) %>% 
        arrange(desc(People))
storm.plotting2 <- storm.selected %>% 
        group_by(EVTYPE) %>% 
        summarise(TotalDamage = sum(DMG)) %>% 
        filter(TotalDamage > 0) %>% 
        arrange(desc(TotalDamage))
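The per-event aggregation in the last step can be cross-checked without dplyr. Below is a small base R sketch of the same logic (sum per event type, drop zero totals, sort descending). The data frame `toy` is invented purely for illustration and is not taken from the NOAA file.

```r
# Toy data, made up for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 0, 5, 0)
)

# Sum fatalities and injuries per event type.
toy.health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy.health$People <- toy.health$FATALITIES + toy.health$INJURIES

# Keep event types that harmed at least one person, worst first.
toy.health <- toy.health[toy.health$People > 0, ]
toy.health <- toy.health[order(-toy.health$People), ]
toy.health
```

In this toy example TORNADO ends up first with 20 people affected, FLOOD second with 1, and HAIL is dropped because nobody was harmed.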

Answering questions

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

We measure harm to population health as the sum of FATALITIES and INJURIES for each event type, and plot the five event types with the highest totals.

g <- ggplot(storm.plotting[1:5, ], 
            aes(x = reorder(EVTYPE, -People), 
                y= People, 
                fill = EVTYPE))
g + 
        geom_col(show.legend = FALSE, width = 0.8, color="black") + 
        coord_flip() + 
        labs(x = "Storm Event Type", 
             y = "People affected") +
        theme_bw()

most_harmful <- storm.plotting[1, ]$EVTYPE

The most harmful event is TORNADO.

2. Across the United States, which types of events have the greatest economic consequences?

g2 <- ggplot(storm.plotting2[1:5, ], 
            aes(x = reorder(EVTYPE, -TotalDamage), 
                y= TotalDamage, 
                fill = EVTYPE))
g2 + 
        geom_col(show.legend = FALSE, width = 0.8, color="black") + 
        coord_flip() + 
        labs(x = "Storm Event Type", 
             y = "Economic Loss (USD)") +
        theme_bw()

most_economic <- storm.plotting2[1, ]$EVTYPE

The event type with the greatest economic consequences, measured as combined property and crop damage, is FLOOD.