Synopsis

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. You can download the file from the course web site: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.


Data Processing

Load libraries

library(rmarkdown)
library(dplyr)
library(ggplot2)
library(readr)
library(R.utils)  # provides bunzip2(), used below to decompress the download

Load Dataset

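# Download the compressed data set once
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "repdata-data-StormData.csv.bz2")
}

# Decompress it, keeping the original archive (bunzip2() comes from R.utils)
if (!file.exists("repdata-data-StormData.csv")) {
  R.utils::bunzip2("repdata-data-StormData.csv.bz2", overwrite = TRUE, remove = FALSE)
}

# Read the CSV only if it is not already in the workspace
if (!exists("rawData")) {
  rawData = read_csv("repdata-data-StormData.csv")
}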
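The decompression step above writes an uncompressed copy of the CSV to disk. As an alternative sketch, base R's read.csv() can read a bzip2-compressed file directly, and readr's read_csv() documents the same behavior for .bz2 paths. The example below uses a small made-up data frame (`toy`, a hypothetical stand-in, not the NOAA records) so that it runs without the download.

```r
# Hypothetical toy data standing in for the storm file (not NOAA records).
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, 2, 0)
)

# Write the toy data through a bzip2 connection to mimic a .csv.bz2 file.
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression,
# so no separate decompression step is needed.
storm <- read.csv(bz_path, stringsAsFactors = FALSE)
print(storm)
```

Reading the archive directly avoids keeping two copies of a large file, at the cost of decompressing it again on every read.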

Results

Damaged data

Based on the National Weather Service Codebook [https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf], the following variables are used in this analysis:

EVTYPE: event type (e.g. tornado, flood, etc.)
FATALITIES: a measure of harm to human health
INJURIES: a measure of harm to human health
PROPDMG: property damage, and hence economic damage, in USD (scaled by PROPDMGEXP)
PROPDMGEXP: magnitude of the property damage (e.g. thousands, millions USD, etc.)
CROPDMG: crop damage, and hence economic damage, in USD (scaled by CROPDMGEXP)
CROPDMGEXP: magnitude of the crop damage (e.g. thousands, millions USD, etc.)

damaged_data = rawData %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
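knitr::kable(head(damaged_data, 5))

| EVTYPE  | FATALITIES | INJURIES | PROPDMG | PROPDMGEXP | CROPDMG | CROPDMGEXP |
|:--------|-----------:|---------:|--------:|:-----------|--------:|:-----------|
| TORNADO |          0 |       15 |    25.0 | K          |       0 | NA         |
| TORNADO |          0 |        0 |     2.5 | K          |       0 | NA         |
| TORNADO |          0 |        2 |    25.0 | K          |       0 | NA         |
| TORNADO |          0 |        2 |     2.5 | K          |       0 | NA         |
| TORNADO |          0 |        2 |     2.5 | K          |       0 | NA         |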

Question 1

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Transform Data

# Total fatalities per event type, largest first
fatalities = damaged_data %>%
  group_by(EVTYPE) %>%
  summarise(sum_fatalities = sum(FATALITIES)) %>%
  arrange(desc(sum_fatalities))

# Total injuries per event type, largest first
injuries = damaged_data %>%
  group_by(EVTYPE) %>%
  summarise(sum_injuries = sum(INJURIES)) %>%
  arrange(desc(sum_injuries))
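The two summaries above can also be computed in one pass. The sketch below uses base R's aggregate() on a hypothetical toy data frame; the column names follow the report, but the values are invented for illustration.

```r
# Hypothetical toy records (invented values, same column names as the report).
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(1, 0, 3, 0, 2),
  INJURIES = c(10, 5, 0, 4, 1)
)

# Sum both health measures per event type in a single call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities, largest first.
health <- health[order(health$FATALITIES, decreasing = TRUE), ]
print(health)
```

Keeping both measures in one table makes it easier to see event types that rank high on one measure but not the other.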

Glimpse Data

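knitr::kable(head(fatalities, 5))

| EVTYPE         | sum_fatalities |
|:---------------|---------------:|
| TORNADO        |           5633 |
| EXCESSIVE HEAT |           1903 |
| FLASH FLOOD    |            978 |
| HEAT           |            937 |
| LIGHTNING      |            816 |

knitr::kable(head(injuries, 5))

| EVTYPE         | sum_injuries |
|:---------------|-------------:|
| TORNADO        |        91346 |
| TSTM WIND      |         6957 |
| FLOOD          |         6789 |
| EXCESSIVE HEAT |         6525 |
| LIGHTNING      |         5230 |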
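EVTYPE is grouped exactly as recorded, so labels that differ only in letter case or spacing are counted as separate event types. A minimal sketch of normalizing labels before grouping is shown below. The labels are hypothetical examples, and whether related categories should be merged any further is an analytic choice this sketch does not make.

```r
# Hypothetical raw labels with inconsistent case and spacing.
raw_labels <- c("TORNADO", " tornado ", "River Flooding", "RIVER  FLOODING", "Heat")

# Trim surrounding whitespace, upper-case, and collapse repeated spaces.
clean <- toupper(trimws(raw_labels))
clean <- gsub("[[:space:]]+", " ", clean)

print(unique(clean))
```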

Plot

par(mfrow=c(1,2))
barplot(head(fatalities,5)$sum_fatalities, 
        las=2,
        names.arg = head(fatalities,5)$EVTYPE,
        main = "Top 5 Causes for Fatalities", 
        ylab = "count", 
        col = rainbow(5),
        cex.names=0.5)
barplot(head(injuries,5)$sum_injuries,
        las=2, names.arg = head(injuries,5)$EVTYPE, 
        main = "Top 5 Causes for Injuries", 
        ylab = "count", 
        col = rainbow(5),
        cex.names=0.5)
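With las = 2 the rotated event labels can run off the bottom of the figure. The sketch below reuses the top-five counts from the tables in this report, widens the bottom margin, and restores the previous graphics settings afterwards. The margin values are an assumption and may need tuning for a given output size.

```r
# Top-five counts copied from the tables in this report.
fatal_top <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903, `FLASH FLOOD` = 978,
               HEAT = 937, LIGHTNING = 816)
injury_top <- c(TORNADO = 91346, `TSTM WIND` = 6957, FLOOD = 6789,
                `EXCESSIVE HEAT` = 6525, LIGHTNING = 5230)

# Two panels side by side, with extra bottom margin for the rotated labels.
# par() returns the previous settings so they can be restored later.
op <- par(mfrow = c(1, 2), mar = c(8, 4, 3, 1))

mids_fatal <- barplot(fatal_top, las = 2, cex.names = 0.6,
                      main = "Top 5 Causes for Fatalities", ylab = "count")
mids_injury <- barplot(injury_top, las = 2, cex.names = 0.6,
                       main = "Top 5 Causes for Injuries", ylab = "count")

# Restore the settings that were active before this plot.
par(op)
```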

Question 2

Across the United States, which types of events have the greatest economic consequences?

Transform Data

prop_data = damaged_data %>% select(EVTYPE, PROPDMG, PROPDMGEXP)


unique(prop_data$PROPDMGEXP)
##  [1] "K" "M" NA  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
# Convert magnitude codes to powers of ten: K = 3, M = 6, B = 9, H = 2
prop_data$PROPDMGEXP = gsub("K|k", "3", prop_data$PROPDMGEXP)
prop_data$PROPDMGEXP = gsub("M|m", "6", prop_data$PROPDMGEXP)
prop_data$PROPDMGEXP = gsub("B", "9", prop_data$PROPDMGEXP)
prop_data$PROPDMGEXP = gsub("H|h", "2", prop_data$PROPDMGEXP)
# Treat "-", "+" and "?" as a power of 0; digits are used as the power directly
prop_data$PROPDMGEXP = gsub("\\-|\\+|\\?","0",prop_data$PROPDMGEXP)
prop_data$PROPDMGEXP = as.numeric(prop_data$PROPDMGEXP)


# Scale each damage figure by its power of ten, then total per event type
prop_data = prop_data %>%
  mutate(property_damage = PROPDMG * 10^PROPDMGEXP) %>%
  group_by(EVTYPE) %>%
  summarise(sum_prop = sum(property_damage)) %>%
  arrange(desc(sum_prop))
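The substitution steps above can also be written as a lookup table. One point to keep in mind: sum() returns NA for any group that contains an NA product, so the way missing exponents are handled affects which event types can appear in the ranking. The sketch below is an alternative for comparison, not the code that produced the tables in this report. The helper name `exp_to_power` is hypothetical, and mapping missing or unrecognized codes to a power of 0 is an assumption.

```r
# Hypothetical helper: map a magnitude code to a power of ten.
exp_to_power <- function(code) {
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  code <- toupper(as.character(code))

  # Letter codes come from the lookup; anything else is NA for now.
  power <- unname(lookup[code])

  # Single digits are used as the power directly, as in the report.
  is_digit <- !is.na(code) & grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])

  # Assumption: NA, "+", "-", "?" and other codes mean no scaling (10^0).
  power[is.na(power)] <- 0
  power
}

# Hypothetical toy records (invented values).
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG = c(2.5, 1, 10, 5),
  PROPDMGEXP = c("K", "m", NA, "2")
)

toy$property_damage <- toy$PROPDMG * 10^exp_to_power(toy$PROPDMGEXP)
totals <- aggregate(property_damage ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```

Because every code resolves to a number, no group total becomes NA, so totals computed this way would differ from those shown in this report.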
crop_data = damaged_data %>% select(EVTYPE, CROPDMG, CROPDMGEXP)

unique(crop_data$CROPDMGEXP)
## [1] NA  "M" "K" "m" "B" "?" "0" "k" "2"
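# Convert magnitude codes to powers of ten: K = 3, M = 6, B = 9
crop_data$CROPDMGEXP = gsub("K|k", "3", crop_data$CROPDMGEXP)
crop_data$CROPDMGEXP = gsub("M|m", "6", crop_data$CROPDMGEXP)
crop_data$CROPDMGEXP = gsub("B", "9", crop_data$CROPDMGEXP)
# "|\\?" has an empty first alternative, so besides "?" it also matches the empty string
crop_data$CROPDMGEXP = gsub("|\\?","0",crop_data$CROPDMGEXP)
crop_data$CROPDMGEXP = as.numeric(crop_data$CROPDMGEXP)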

# Scale each damage figure by its power of ten, then total per event type
crop_data = crop_data %>%
  mutate(crop_damage = CROPDMG * 10^CROPDMGEXP) %>%
  group_by(EVTYPE) %>%
  summarise(sum_crop = sum(crop_damage)) %>%
  arrange(desc(sum_crop))
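The pattern "|\\?" used for the crop exponents has an empty first alternative, so it can match the empty string as well as a literal question mark. The sketch below runs on hypothetical codes and prints the result of that pattern next to a fixed-string replacement that only touches "?". It is an illustration for comparison, not the code that produced the tables in this report.

```r
# Hypothetical exponent codes after the letter substitutions.
codes <- c("6", "3", "?", NA)

# Pattern from the report: empty alternative OR a literal "?".
with_empty_alt <- gsub("|\\?", "0", codes)

# Fixed-string replacement: only a literal "?" is replaced.
literal_only <- gsub("?", "0", codes, fixed = TRUE)

print(data.frame(code = codes, with_empty_alt, literal_only))
```

Comparing the two columns shows whether the empty alternative changes codes that contain no question mark, which in turn changes the power of ten applied to CROPDMG.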

Glimpse Data

knitr::kable(head(prop_data, 5))

| EVTYPE                     |   sum_prop |
|:---------------------------|-----------:|
| TORNADOES, TSTM WIND, HAIL | 1600000000 |
| WILD FIRES                 |  624100000 |
| HAILSTORM                  |  241000000 |
| HIGH WINDS/COLD            |  110500000 |
| River Flooding             |  106155000 |

knitr::kable(head(crop_data, 5))

| EVTYPE                  | sum_crop |
|:------------------------|---------:|
| EXCESSIVE WETNESS       | 1.42e+62 |
| COLD AND WET CONDITIONS | 6.60e+61 |
| Early Frost             | 4.20e+61 |
| Damaging Freeze         | 3.41e+61 |
| Freeze                  | 1.05e+61 |
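The tables above rank property and crop damage separately. If a single ranking of economic impact is wanted, the two summaries can be joined on EVTYPE and added. The sketch below uses hypothetical totals (not values from the storm database) and treats an event type missing from one table as zero damage in that category, which is an assumption.

```r
# Hypothetical per-event totals (invented values, report column names).
prop_totals <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "DROUGHT"),
  sum_prop = c(5e6, 2e6, 1e5)
)
crop_totals <- data.frame(
  EVTYPE = c("DROUGHT", "FLOOD", "FROST"),
  sum_crop = c(4e6, 1e6, 3e5)
)

# Full outer join so event types present in only one table are kept.
economic <- merge(prop_totals, crop_totals, by = "EVTYPE", all = TRUE)

# Assumption: a missing total means no recorded damage of that kind.
economic[is.na(economic)] <- 0

# Combined damage, largest first.
economic$total <- economic$sum_prop + economic$sum_crop
economic <- economic[order(economic$total, decreasing = TRUE), ]
print(economic)
```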

Plot

par(mfrow = c(1,2))

barplot(head(prop_data, 5)$sum_prop,
        las = 2,
        names.arg = head(prop_data, 5)$EVTYPE,
        main = "Top 5 Causes for Property Damage",
        ylab = "Property Damage",
        col = rainbow(5),
        cex.names = 0.5,
        cex.axis = 0.5)

barplot(head(crop_data, 5)$sum_crop,
        las = 2,
        names.arg = head(crop_data, 5)$EVTYPE,
        main = "Top 5 Causes for Crop Damage",
        ylab = "Crop Damage",
        col = rainbow(5),
        cex.names = 0.5,
        cex.axis = 0.5)

Conclusion


According to the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, the event type most harmful to population health is ‘Tornado’, which caused 5,633 fatalities and 91,346 injuries over the period covered by the database. In this analysis, the event types with the greatest economic consequences are ‘Tornadoes, Thunderstorm wind, and Hail’ for property damage and ‘Excessive wetness’ for crop damage.