PA2 - U.S. NOAA Storm Events Impact Analysis (1950 - 2011)

Data Analysis of the Storm Events’ Impact

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Where does the data come from?

NCDC (the National Climatic Data Center) receives Storm Data from the National Weather Service. The National Weather Service receives its information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, SKYWARN spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.

How accurate is the data?

Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
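The claim that earlier years contain fewer recorded events can be checked once the data is loaded by counting events per year. This is a minimal sketch, assuming `BGN_DATE` values look like `"4/18/1950 0:00:00"` (month/day/year followed by a time); the helper name `events_per_year` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: count recorded events per calendar year.
# Assumes BGN_DATE strings start with month/day/year, e.g. "4/18/1950 0:00:00";
# as.Date() parses the date part and ignores the trailing time.
events_per_year <- function(bgn_date) {
    dates <- as.Date(bgn_date, format = "%m/%d/%Y")
    table(format(dates, "%Y"))
}

# Usage on the full data set (after stormData has been loaded):
# counts <- events_per_year(stormData$BGN_DATE)
# plot(as.integer(names(counts)), as.vector(counts), type = "l",
#      xlab = "Year", ylab = "Recorded events")
```

A rising curve over the decades would be consistent with the statement that more recent years are more complete.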

# Loading Libraries.    

suppressWarnings(library(knitr))
# Code is shown for every chunk via opts_chunk$set(echo=TRUE) further below.
# Install R.utils first if it is missing: install.packages("R.utils")
suppressWarnings(library(R.utils))
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.24.0 (2020-08-26 16:11:58 UTC) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## The following object is masked from 'package:R.methodsS3':
## 
##     throw
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## The following objects are masked from 'package:base':
## 
##     attach, detach, load, save
## R.utils v2.10.1 (2020-08-26 22:50:31 UTC) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
## 
##     timestamp
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, nullfile, parse,
##     warnings
suppressWarnings(library(scales))
suppressWarnings(library(plyr))
suppressWarnings(library(ggplot2))
library(cdcfluview)
## Warning: package 'cdcfluview' was built under R version 4.0.3
library(mosaic)
## Warning: package 'mosaic' was built under R version 4.0.3
## Registered S3 method overwritten by 'mosaic':
##   method                           from   
##   fortify.SpatialPolygonsDataFrame ggplot2
## 
## The 'mosaic' package masks several functions from core packages in order to add 
## additional features.  The original behavior of these functions should not be affected by this.
## 
## Attaching package: 'mosaic'
## The following objects are masked from 'package:dplyr':
## 
##     count, do, tally
## The following object is masked from 'package:Matrix':
## 
##     mean
## The following object is masked from 'package:ggplot2':
## 
##     stat
## The following object is masked from 'package:plyr':
## 
##     count
## The following object is masked from 'package:scales':
## 
##     rescale
## The following object is masked from 'package:R.utils':
## 
##     resample
## The following objects are masked from 'package:stats':
## 
##     binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
##     quantile, sd, t.test, var
## The following objects are masked from 'package:base':
## 
##     max, mean, min, prod, range, sample, sum
library(dplyr)
opts_chunk$set(echo=TRUE)

Loading Data

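#  Download the data from the URL provided with the instructions.
if (!file.exists("data/data.csv")) {
    # download.file() fails if the target folder does not exist yet.
    if (!dir.exists("data")) dir.create("data")
    fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    # mode = "wb" keeps the compressed file intact on Windows.
    download.file(fileurl, destfile = "data/data.bz2", mode = "wb")
    # bunzip2() is provided by R.utils.
    bunzip2("data/data.bz2", destname = "data/data.csv")
}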
#  Read data.
stormData <- read.csv("data/data.csv")
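A possible simplification, sketched here as an alternative rather than the author's method: `read.csv()` opens files through `file()`, which recognises bzip2-compressed input when reading, so the downloaded `.bz2` file can be read without decompressing it first. The function name `load_storm_data` and its default path are hypothetical.

```r
# Hypothetical helper: download the compressed CSV once, then read it directly.
load_storm_data <- function(url, dest = file.path("data", "StormData.csv.bz2")) {
    # download.file() does not create missing folders, so create it here.
    dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
    if (!file.exists(dest)) {
        download.file(url, destfile = dest, mode = "wb")  # binary mode for Windows
    }
    # read.csv() decompresses bzip2 input transparently when reading.
    read.csv(dest)
}

# Usage:
# stormData <- load_storm_data(
#     "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
```

This skips the intermediate `data.csv`; the trade-off is that every read has to decompress the file again.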

# Checking the information

# Dimensions: number of observations (rows) and variables (columns)
dim(stormData)
## [1] 902297     37
# Columns/Variables Names
names(stormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The destructive power of storm surge and large battering waves can cause loss of life and injuries. Storm surge can travel several miles inland, and in estuaries and bayous, salt water intrusion endangers public health and the environment. The analysis below identifies the weather event types that cause the most fatalities and injuries.

Data Transformation to retrieve the most harmful events

#  Keep only the columns needed for the health analysis:
#  EVTYPE (event type), FATALITIES and INJURIES.
classifiedStormdata <- stormData[, c("EVTYPE", "FATALITIES", "INJURIES")]
classifiedStormdata$EVTYPE <- as.character(classifiedStormdata$EVTYPE)
#  Sum fatalities and injuries for each event type.
aggdata <- aggregate(classifiedStormdata[, 2:3], 
                     by = list(classifiedStormdata$EVTYPE), 
                     FUN = sum, 
                     na.rm = TRUE)
names(aggdata)[1] <- "EVTYPE"
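The same per-event totals can also be produced with the formula interface of `aggregate()`, which names the output columns automatically. The sketch below runs it on a made-up toy data frame (not NOAA values) so the result is easy to check by hand. One difference worth knowing: the formula method drops rows that contain `NA` (its default `na.action` is `na.omit`) instead of passing `na.rm` to `sum`.

```r
# Toy data standing in for classifiedStormdata (values are made up).
toy <- data.frame(
    EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
    FATALITIES = c(2, 0, 3, 1),
    INJURIES   = c(10, 1, 5, 0)
)

# Sum both casualty columns per event type; groups come out sorted by EVTYPE.
byEvent <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(byEvent)
```

On the real data this would read `aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = stormData, FUN = sum)`, with no separate renaming step.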

Top Ten Fatalities Results

library(ggplot2)
library(scales)
# Top 10 FATALITIES
sortedDataFatalities <- aggdata[order(-aggdata[,2]), ]
topTenfatalities <- head(sortedDataFatalities, n=10)
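The fatalities and injuries rankings both follow the same sort-then-`head()` pattern. A small hypothetical helper, `top_n_rows`, could capture it once; the toy data below is made up for illustration.

```r
# Hypothetical helper: the n rows of df with the largest values in column col.
top_n_rows <- function(df, col, n = 10) {
    ranked <- df[order(df[[col]], decreasing = TRUE), , drop = FALSE]
    head(ranked, n = n)
}

# Toy data (made-up values, not NOAA results).
toy <- data.frame(EVTYPE     = c("HAIL", "TORNADO", "FLOOD", "LIGHTNING"),
                  FATALITIES = c(1, 50, 7, 20))
top_n_rows(toy, "FATALITIES", n = 2)
```

With the real aggregate this would be `top_n_rows(aggdata, "FATALITIES")` or `top_n_rows(aggdata, "INJURIES")`; `decreasing = TRUE` states the intent more explicitly than negating the column.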

# First six rows of the Top Ten Fatalities (head() shows 6 rows by default)
head(topTenfatalities)
##             EVTYPE FATALITIES INJURIES
## 834        TORNADO       5633    91346
## 130 EXCESSIVE HEAT       1903     6525
## 153    FLASH FLOOD        978     1777
## 275           HEAT        937     2100
## 464      LIGHTNING        816     5230
## 856      TSTM WIND        504     6957
# Plot the Top Ten Fatalities, ordering bars from most to least fatalities.
ggplot(data=topTenfatalities, 
       aes(x=reorder(EVTYPE, -FATALITIES), 
           y=log10(FATALITIES))) + 
    geom_bar(stat = "identity", fill="#0072B2", colour="black") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(title = "The most harmful events causing FATALITIES",
         y = expression(log[10](FATALITIES)), x = "U.S. NOAA Event Type")
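Sorting the data frame does not by itself change the bar order, because ggplot2 places a character x variable in alphabetical order. Turning `EVTYPE` into a factor whose levels follow the plotted value does. This base-R illustration uses toy values (not NOAA results) and shows the level order that `reorder()` produces.

```r
# Toy data (made-up values).
toy <- data.frame(EVTYPE     = c("HAIL", "TORNADO", "FLOOD"),
                  FATALITIES = c(1, 50, 7))

# reorder() sorts the levels by the second argument in increasing order;
# negating FATALITIES puts the largest value first.
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$FATALITIES)
levels(toy$EVTYPE)
```

Inside a plot the same idea is written as `aes(x = reorder(EVTYPE, -FATALITIES), y = log10(FATALITIES))`.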

Top Ten Injuries Results

# Top 10 INJURIES
sortedDatainjuries <- aggdata[order(-aggdata[,3]), ]
topTeninjuries <- head(sortedDatainjuries, n=10)

# First six rows of the Top Ten Injuries (head() shows 6 rows by default)
head(topTeninjuries)
# Plot the Top Ten Injuries, ordering bars from most to least injuries.
ggplot(data=topTeninjuries, 
       aes(x=reorder(EVTYPE, -INJURIES), y=log10(INJURIES))) + 
    geom_bar(stat = "identity", 
             fill="#009E73", 
             colour="black") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    labs(title = "The most harmful events causing INJURIES",
         y = expression(log[10](INJURIES)), x = "U.S. NOAA Event Type")

# Event types that appear in both top-ten lists
intersect(topTenfatalities[,1], topTeninjuries[,1])
## [1] "TORNADO"        "EXCESSIVE HEAT" "FLASH FLOOD"    "HEAT"          
## [5] "LIGHTNING"      "TSTM WIND"      "FLOOD"
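`intersect()` shows the overlap; `setdiff()` gives the complementary view, listing the event types that rank in only one of the two lists. The vectors below are toy lists chosen for illustration, not the actual top-ten results.

```r
# Toy top lists (illustrative only).
fatal  <- c("TORNADO", "HEAT", "FLOOD")
injury <- c("TORNADO", "FLOOD", "HAIL")

both       <- intersect(fatal, injury)  # in both lists, in the order of `fatal`
onlyFatal  <- setdiff(fatal, injury)    # ranks high for fatalities only
onlyInjury <- setdiff(injury, fatal)    # ranks high for injuries only
```

On the real data: `setdiff(topTenfatalities$EVTYPE, topTeninjuries$EVTYPE)` and the reverse.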

Across the United States, which types of events have the greatest economic consequences?

Storm surge is an abnormal rise of water generated by a storm’s winds. The tremendous power of storm surge and large battering waves can destroy buildings, erode beaches and dunes, and damage roads and bridges along the coast. In this section, we analyze the data to identify which event types have the greatest economic consequences, combining property and crop damage.

Transforming the data to calculate the economic loss

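# Keep the event type plus the property/crop damage values and their exponent codes.
eventEconomicImpact <- stormData[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
head(eventEconomicImpact)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO    25.0          K       0           
## 2 TORNADO     2.5          K       0           
## 3 TORNADO    25.0          K       0           
## 4 TORNADO     2.5          K       0           
## 5 TORNADO     2.5          K       0           
## 6 TORNADO     2.5          K       0
# Distinct exponent codes used for property and crop damage
unique(eventEconomicImpact$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(eventEconomicImpact$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
# Converting exponents to uppercase for consistency.
eventEconomicImpact$PROPDMGEXP <- toupper(eventEconomicImpact$PROPDMGEXP)
eventEconomicImpact$CROPDMGEXP <- toupper(eventEconomicImpact$CROPDMGEXP)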
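# Given a value and its (uppercase) exponent code, return the value in dollars.
# H/K/M/B = hundreds/thousands/millions/billions, digits 2-8 = powers of ten,
# "-" negates the value, and "?", "+", "1" or any other code leave it unchanged.
expval <- function(x, exp = "") {
    switch(exp,
           `-` = x * -1,
           `?` = x, `+` = x, `1` = x,
           `2` = x * 1e2, `3` = x * 1e3, `4` = x * 1e4, `5` = x * 1e5,
           `6` = x * 1e6, `7` = x * 1e7, `8` = x * 1e8,
           H = x * 100, K = x * 1000, M = x * 1e6, B = x * 1e9,
           x)  # default: empty or unrecognised code
}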
# Sum property and crop damage for each combination of event type and exponent codes.
pad <- aggregate(eventEconomicImpact[,c(2,4)], 
                 by=list(eventEconomicImpact$EVTYPE, 
                         eventEconomicImpact$PROPDMGEXP, 
                         eventEconomicImpact$CROPDMGEXP), 
                 FUN=sum, 
                 na.rm=TRUE)
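`expval()` handles one value at a time, which is why the totals are computed in a loop. A vectorised alternative, sketched here under the assumption that the same code-to-multiplier mapping is wanted, stores the multipliers in a named lookup vector. The names `expMultiplier` and `expval_vec` are hypothetical; unlike `expval()`, this version also upper-cases the codes itself.

```r
# Hypothetical lookup table with the same multipliers expval() applies.
expMultiplier <- c("-" = -1, "?" = 1, "+" = 1, "1" = 1,
                   "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
                   "6" = 1e6, "7" = 1e7, "8" = 1e8,
                   H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Vectorised conversion: x and exp are vectors of equal length.
expval_vec <- function(x, exp) {
    m <- unname(expMultiplier[toupper(exp)])
    m[is.na(m)] <- 1   # empty or unknown codes (e.g. "", "0") keep the raw value
    x * m
}
```

With it, the loop over `pfinance` could become `expval_vec(pfinance$PROPDMG, pfinance$Group.2) + expval_vec(pfinance$CROPDMG, pfinance$Group.3)`.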

# Drop groups where both property and crop damage are 0; they add nothing to the totals.
pfinance <- pad[pad$PROPDMG != 0 | pad$CROPDMG != 0, ]
# For each group, convert property and crop damage to dollars and add them.
nr <- nrow(pfinance)
total <- numeric(nr)
for (i in seq_len(nr)) {
    total[i] <- expval(pfinance[i, 4], pfinance[i, 2]) +
                expval(pfinance[i, 5], pfinance[i, 3])
}
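The loop converts aggregated group sums rather than individual records. A short derivation shows why this gives the same per-event totals as converting every record first. The notation is introduced here for illustration: for record $i$, $P_i$ and $C_i$ are PROPDMG and CROPDMG, $p_i$ and $c_i$ are the uppercase exponent codes, and $m(\cdot)$ is the multiplier `expval()` applies, so that `expval(x, e)` equals $m(e)\,x$ for every code $e$. Let $G_{v,p,c}$ be the records with event type $v$, property code $p$ and crop code $c$, which are exactly the groups `aggregate()` forms in `pad`.

$$
D(v) \;=\; \sum_{i:\,\mathrm{EVTYPE}_i = v} \bigl( m(p_i)\,P_i + m(c_i)\,C_i \bigr)
\;=\; \sum_{(p,\,c)} \Bigl( m(p) \sum_{i \in G_{v,p,c}} P_i \;+\; m(c) \sum_{i \in G_{v,p,c}} C_i \Bigr)
$$

The inner sums are the PROPDMG and CROPDMG columns of `pad`, each loop iteration computes one term of the outer sum, and the later `aggregate()` over EVTYPE performs the outer sum. The step is valid because every branch of `expval()` multiplies $x$ by a constant; groups removed by the zero filter have both inner sums equal to 0 and therefore contribute nothing to $D(v)$.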

length(total)
## [1] 881
pfinance$TOTALDMG <- total
names(pfinance)[1] <- "EVTYPE"
head(pfinance)
##                 EVTYPE Group.2 Group.3 PROPDMG CROPDMG TOTALDMG
## 131        FLASH FLOOD                     132       0      132
## 135     FLASH FLOODING                       4       0        4
## 139              FLOOD                       7       0        7
## 195               HAIL                      54       0       54
## 253 HEAVY SNOW SQUALLS                      10       0       10
## 294         HIGH WINDS                       3       0        3
damage <- aggregate(pfinance$TOTALDMG, by=list(pfinance$EVTYPE), FUN=sum, na.rm=T)
names(damage) <- c("EVTYPE", "TOTALDMG")
head(damage)
##                  EVTYPE TOTALDMG
## 1    HIGH SURF ADVISORY   200000
## 2           FLASH FLOOD    50000
## 3             TSTM WIND  8100000
## 4       TSTM WIND (G45)     8000
## 5                     ?     5000
## 6   AGRICULTURAL FREEZE 28820000
financedamage <- damage[order(-damage[,2]), ]
head(financedamage)
##                EVTYPE     TOTALDMG
## 72              FLOOD 150319678257
## 197 HURRICANE/TYPHOON  71913712800
## 354           TORNADO  57350833958
## 299       STORM SURGE  43323541000
## 116              HAIL  18755905408
## 59        FLASH FLOOD  18243991079
topteneconomiconseq <- head(financedamage, n=10)

Top Ten Economic Results

# Plot the ten costliest event types in billions of dollars, largest first.
ggplot(data=topteneconomiconseq, 
       aes(x=reorder(EVTYPE, -TOTALDMG), 
           y=TOTALDMG/1e+9)) + 
    geom_bar(stat = "identity", fill="#D55E00", colour="black") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    ggtitle("Top Ten events with the greatest economic consequences") + 
    labs(y="Economic Damage (billions of US dollars)", x="U.S. NOAA Event Type")