Devansh Saxena

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. The events in the database start in the year 1950 and end in November 2011.

1.) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

2.) Across the United States, which types of events have the greatest economic consequences?

Introduction

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

The analysis was performed on the Storm Events Database, provided by the National Climatic Data Center. The data come from a bzip2-compressed comma-separated-value file, which the code below downloads if it is not already present. Some documentation of the data is also available.

The first step is to read the data into a data frame.

rm(list=ls())

library(knitr)
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
if(!file.exists("./StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="StormData.csv.bz2")
}

rawData <- read.csv("StormData.csv.bz2")

Making Data Tidy

stormData <- rawData[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
stormData <- stormData[!grepl("summary",stormData$EVTYPE,ignore.case=TRUE),]
unique(stormData[grepl("TSTM", stormData$EVTYPE),]$EVTYPE)
##  [1] TSTM WIND                  TORNADOES, TSTM WIND, HAIL
##  [3] TSTM WIND 51               TSTM WIND 50              
##  [5] TSTM WIND 52               TSTM WIND 55              
##  [7] TSTM WIND G58              TSTM WIND DAMAGE          
##  [9] TSTM WINDS                 TSTMW                     
## [11] TSTM WIND 65)              TSTM WIND/HAIL            
## [13] TSTM WIND (G45)            TSTM HEAVY RAIN           
## [15] TSTM WIND 40               TSTM WIND 45              
## [17] TSTM WIND (41)             TSTM WIND (G40)           
## [19] TSTM WND                    TSTM WIND                
## [21] TSTM WIND AND LIGHTNING     TSTM WIND (G45)          
## [23] TSTM WIND  (G45)           TSTM WIND (G35)           
## [25] TSTM                       TSTM WIND G45             
## [27] NON-TSTM WIND              NON TSTM WIND             
## [29] MARINE TSTM WIND          
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND
# Collapse the plain TSTM spellings into the standard THUNDERSTORM WIND label
tstmVariants <- c("TSTM", "TSTMW", "TSTM WIND", " TSTM WIND", "TSTM WINDS", "TSTM WND")
stormData$EVTYPE[stormData$EVTYPE %in% tstmVariants] <- "THUNDERSTORM WIND"
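The recoding above targets exact spellings. As a minimal sketch of a more general approach, the same variants could be matched with one pattern after trimming whitespace. The `normalizeTstm` helper and the sample labels are hypothetical and not part of the original analysis; note that the helper also trims and uppercases every other label it is given.

```r
# Hypothetical helper: collapse the plain "TSTM" wind spellings into
# "THUNDERSTORM WIND". Accepts a character vector or a factor and
# returns a character vector.
normalizeTstm <- function(evtype) {
  cleaned <- toupper(trimws(as.character(evtype)))
  # Matches only: TSTM, TSTMW, TSTM WIND, TSTM WINDS, TSTM WND
  isTstm <- grepl("^TSTM(W| WINDS?| WND)?$", cleaned)
  cleaned[isTstm] <- "THUNDERSTORM WIND"
  cleaned
}

# Sample labels taken from the unique() output above
labels <- c("TSTM WIND", " TSTM WIND", "TSTMW", "TSTM WND",
            "MARINE TSTM WIND", "TSTM WIND/HAIL")
normalized <- normalizeTstm(labels)
print(normalized)
```

Labels such as MARINE TSTM WIND and TSTM WIND/HAIL are left alone on purpose, mirroring the exact-match recoding, since they may describe distinct event types.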
unique(stormData$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
stormData$PROPEXP[stormData$PROPDMGEXP == "H"] <- 100
stormData$PROPEXP[stormData$PROPDMGEXP == "h"] <- 100
stormData$PROPEXP[stormData$PROPDMGEXP == "K"] <- 1000
stormData$PROPEXP[stormData$PROPDMGEXP == "M"] <- 1000000
stormData$PROPEXP[stormData$PROPDMGEXP == "m"] <- 1000000
stormData$PROPEXP[stormData$PROPDMGEXP == "B"] <- 1000000000
stormData$PROPEXP[stormData$PROPDMGEXP == "8"] <- 100000000
stormData$PROPEXP[stormData$PROPDMGEXP == "7"] <- 10000000
stormData$PROPEXP[stormData$PROPDMGEXP == "6"] <- 1000000
stormData$PROPEXP[stormData$PROPDMGEXP == "5"] <- 100000
stormData$PROPEXP[stormData$PROPDMGEXP == "4"] <- 10000
stormData$PROPEXP[stormData$PROPDMGEXP == "3"] <- 1000
stormData$PROPEXP[stormData$PROPDMGEXP == "2"] <- 100
stormData$PROPEXP[stormData$PROPDMGEXP == "1"] <- 10
stormData$PROPEXP[stormData$PROPDMGEXP == "0"] <- 1
stormData$PROPEXP[stormData$PROPDMGEXP == "-"] <- 0
stormData$PROPEXP[stormData$PROPDMGEXP == "?"] <- 0
stormData$PROPEXP[stormData$PROPDMGEXP == "+"] <- 0
# Multiply by the numeric multiplier in PROPEXP; as.numeric() on the PROPDMGEXP factor would return level codes
stormData$PROPDMGCOST <- stormData$PROPDMG * stormData$PROPEXP
stormData$CROPEXP[stormData$CROPDMGEXP == "H"] <- 100
stormData$CROPEXP[stormData$CROPDMGEXP == "h"] <- 100
stormData$CROPEXP[stormData$CROPDMGEXP == "K"] <- 1000
stormData$CROPEXP[stormData$CROPDMGEXP == "M"] <- 1000000
stormData$CROPEXP[stormData$CROPDMGEXP == "m"] <- 1000000
stormData$CROPEXP[stormData$CROPDMGEXP == "B"] <- 1000000000
stormData$CROPEXP[stormData$CROPDMGEXP == "8"] <- 100000000
stormData$CROPEXP[stormData$CROPDMGEXP == "7"] <- 10000000
stormData$CROPEXP[stormData$CROPDMGEXP == "6"] <- 1000000
stormData$CROPEXP[stormData$CROPDMGEXP == "5"] <- 100000
stormData$CROPEXP[stormData$CROPDMGEXP == "4"] <- 10000
stormData$CROPEXP[stormData$CROPDMGEXP == "3"] <- 1000
stormData$CROPEXP[stormData$CROPDMGEXP == "2"] <- 100
stormData$CROPEXP[stormData$CROPDMGEXP == "1"] <- 10
stormData$CROPEXP[stormData$CROPDMGEXP == "0"] <- 1
stormData$CROPEXP[stormData$CROPDMGEXP == "-"] <- 0
stormData$CROPEXP[stormData$CROPDMGEXP == "?"] <- 0
stormData$CROPEXP[stormData$CROPDMGEXP == "+"] <- 0
# Multiply by the numeric multiplier in CROPEXP; as.numeric() on the CROPDMGEXP factor would return level codes
stormData$CROPDMGCOST <- stormData$CROPDMG * stormData$CROPEXP
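The exponent columns are factors, and `as.numeric()` on a factor returns the integer position of each level rather than the value its symbol stands for. The following is a minimal sketch, on a toy data frame with hypothetical values, of mapping exponent symbols to multipliers through a named lookup vector. The names `toy`, `multiplier`, `MULT`, `COST`, and `CODES` are assumptions made for illustration.

```r
# Toy data with hypothetical values; the exponent column is a factor,
# like the PROPDMGEXP and CROPDMGEXP columns shown above.
toy <- data.frame(
  DMG = c(2.5, 10, 3, 1),
  EXP = factor(c("K", "M", "B", ""))
)

# Named lookup vector: exponent symbol -> multiplier (a subset of the mapping above)
multiplier <- c(H = 1e2, h = 1e2, K = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Index by the symbol itself; symbols missing from the lookup (here "") give NA
toy$MULT <- unname(multiplier[as.character(toy$EXP)])
toy$COST <- toy$DMG * toy$MULT

# For contrast: as.numeric() on the factor yields level positions, not multipliers
toy$CODES <- as.numeric(toy$EXP)

print(toy)
```

Rows with an unmapped exponent end up with an `NA` cost, which `aggregate()` with a formula drops by default; whether to treat them as zero instead is a design choice left open here.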

Results

Tornadoes cause the most deaths and injuries of any event type: more than 5,600 deaths and more than 91,000 injuries in the United States between 1950 and 2011. Excessive heat and flash floods follow in fatalities, while thunderstorm wind and floods follow in injuries.

fatal <- aggregate(FATALITIES ~ EVTYPE, data=stormData, sum)
fatal <- fatal[fatal$FATALITIES>0,]
fatal <- fatal[order(fatal$FATALITIES, decreasing=TRUE),]
fatal <- fatal[1:8,]
head(fatal)
##                EVTYPE FATALITIES
## 766           TORNADO       5633
## 128    EXCESSIVE HEAT       1903
## 151       FLASH FLOOD        978
## 273              HEAT        937
## 462         LIGHTNING        816
## 692 THUNDERSTORM WIND        637

As the table shows, tornadoes and excessive heat cause the most fatalities. Turning to injuries:

injury <- aggregate(INJURIES ~ EVTYPE, data=stormData, sum)
injury <- injury[injury$INJURIES>0,]
injury <- injury[order(injury$INJURIES, decreasing=TRUE),]
injury <- injury[1:5,]
head(injury)
##                EVTYPE INJURIES
## 766           TORNADO    91346
## 692 THUNDERSTORM WIND     8445
## 168             FLOOD     6789
## 128    EXCESSIVE HEAT     6525
## 462         LIGHTNING     5230
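The fatalities and injuries tables follow the same pattern: aggregate by event type, drop zero totals, sort in decreasing order, and keep the top rows. A minimal sketch of that pattern as one function is shown here; the `topEvents` name and the toy data are hypothetical.

```r
# Hypothetical helper: total one numeric column per event type and
# return the n event types with the largest positive totals.
topEvents <- function(data, column, n = 5) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column                      # aggregate() names the value column "x"
  totals <- totals[totals[[column]] > 0, ]        # drop event types with no impact
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  head(totals, n)
}

# Toy data with hypothetical counts
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 2, 4, 1, 0)
)
top <- topEvents(toy, "FATALITIES", n = 2)
print(top)
```

With the real data, a call such as `topEvents(stormData, "INJURIES", 5)` would be expected to follow the same steps as the injury code above.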

Now we plot the top five event types for both fatalities and injuries to compare their impact on the population.

barplot(fatal[1:5, 2], col=terrain.colors(5), legend.text=fatal[1:5, 1], ylab = "# of Fatalities", main = "Fatalities from Natural Events", cex.names=0.75)

barplot(injury[1:5, 2], col=terrain.colors(5), legend.text=injury[1:5, 1] , ylab = "# of Injuries", main = "Injuries from Natural Events", cex.names=0.75)

Across the United States, which types of events have the greatest economic consequences?

We take a similar approach here. The damage cost columns were already computed during the tidying step. One could argue that this computation belongs here instead, since not every analysis would need it, but in this case we assume we knew in advance that the costs would be needed.

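Aggregate property damage by event type. This sums the raw PROPDMG values, before any PROPDMGEXP multiplier is applied, so the totals are not dollar amounts.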

property <- aggregate(PROPDMG ~ EVTYPE, data=stormData, sum)
property <- property[property$PROPDMG >0,]
property <- property[order(property$PROPDMG, decreasing=TRUE),]
head(property)
##                EVTYPE   PROPDMG
## 766           TORNADO 3212258.2
## 692 THUNDERSTORM WIND 2213026.8
## 151       FLASH FLOOD 1420124.6
## 168             FLOOD  899938.5
## 242              HAIL  688693.4
## 462         LIGHTNING  603351.8
crop <- aggregate(CROPDMGCOST ~ EVTYPE, data=stormData, sum)
crop <- crop[crop$CROPDMGCOST >0,]
crop <- crop[order(crop$CROPDMGCOST, decreasing=TRUE),]
head(crop)
##                EVTYPE CROPDMGCOST
## 242              HAIL   4061556.6
## 151       FLASH FLOOD   1256889.9
## 692 THUNDERSTORM WIND   1233545.6
## 168             FLOOD   1187264.0
## 766           TORNADO    700120.5
## 93            DROUGHT    262189.6
barplot(property[1:5, 2], col=terrain.colors(5), legend.text=property[1:5, 1], ylab = "Summed PROPDMG (unscaled)", main = "Property Damage from Natural Events", cex.names=0.75)
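Property and crop damage are reported separately above. As a minimal sketch of how the two could be combined into one economic total per event type, consider the following toy example. The dollar values are hypothetical, and `TOTALCOST` and `economic` are assumed names that do not appear in the original analysis.

```r
# Toy data with hypothetical dollar costs; NA marks a cost that could not be computed
toy <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
  PROPDMGCOST = c(5e6, 1e6, 2e6, NA),
  CROPDMGCOST = c(1e6, 3e6, NA, 9e6)
)

# Treat a missing cost as zero so one NA does not discard the other component
toy$TOTALCOST <- rowSums(toy[, c("PROPDMGCOST", "CROPDMGCOST")], na.rm = TRUE)

# Total per event type, largest first
economic <- aggregate(TOTALCOST ~ EVTYPE, data = toy, sum)
economic <- economic[order(economic$TOTALCOST, decreasing = TRUE), ]
print(economic)
```

Counting a missing cost as zero is a simplifying assumption; it understates the total wherever damage occurred but the exponent could not be interpreted.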