Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The National Climatic Data Center (NCDC) receives Storm Data from the National Weather Service (NWS). The NWS receives its information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, SKYWARN spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.
Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
# Loading Libraries.
suppressWarnings(library(knitr))
#echo = TRUE # Make code visible
#install.packages("R.utils")
#install.packages("ggeasy")
suppressWarnings(library(R.utils))
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.8.1 (2020-08-26 16:20:06 UTC) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.24.0 (2020-08-26 16:11:58 UTC) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
## The following object is masked from 'package:R.methodsS3':
##
## throw
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
## The following objects are masked from 'package:base':
##
## attach, detach, load, save
## R.utils v2.10.1 (2020-08-26 22:50:31 UTC) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
##
## timestamp
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, nullfile, parse,
## warnings
suppressWarnings(library(scales))
suppressWarnings(library(plyr))
suppressWarnings(library(ggplot2))
library(cdcfluview)
## Warning: package 'cdcfluview' was built under R version 4.0.3
library(mosaic)
## Warning: package 'mosaic' was built under R version 4.0.3
## Registered S3 method overwritten by 'mosaic':
## method from
## fortify.SpatialPolygonsDataFrame ggplot2
##
## The 'mosaic' package masks several functions from core packages in order to add
## additional features. The original behavior of these functions should not be affected by this.
##
## Attaching package: 'mosaic'
## The following objects are masked from 'package:dplyr':
##
## count, do, tally
## The following object is masked from 'package:Matrix':
##
## mean
## The following object is masked from 'package:ggplot2':
##
## stat
## The following object is masked from 'package:plyr':
##
## count
## The following object is masked from 'package:scales':
##
## rescale
## The following object is masked from 'package:R.utils':
##
## resample
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var
## The following objects are masked from 'package:base':
##
## max, mean, min, prod, range, sample, sum
library(dplyr)
opts_chunk$set(echo=TRUE)
# Download the data from the URL provided with the instructions.
if (!file.exists("data/data.csv")) {
  # Create the target directory first; download.file() fails if it is missing.
  dir.create("data", showWarnings = FALSE)
  fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileurl, destfile = "data/data.bz2", method = "curl")
  bunzip2("data/data.bz2", destname = "data/data.csv")
}
# Read data.
stormData <- read.csv("data/data.csv")
# Checking the information
# Dimension: Total of observations and Variables
dim(stormData)
## [1] 902297 37
# Columns/Variables Names
names(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
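The column listing above includes BGN_DATE, which can be used to check how the number of recorded events changes over the years. The following is a minimal sketch on a toy data frame. It assumes the date strings follow a month/day/year pattern such as "4/18/1950 0:00:00"; that format is an assumption and is not shown in this report. The names `toy` and `eventsPerYear` are illustrative only.

```r
# Toy stand-in for stormData; the BGN_DATE format below is an assumption.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "11/28/2011 0:00:00", "11/30/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date part only; the trailing time is ignored by as.Date().
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Count the recorded events per year.
eventsPerYear <- table(toy$YEAR)
print(eventsPerYear)
```

Applied to the full data set, a table like this would make it possible to see directly whether the early years contain fewer records.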
The destructive power of storm surge and large battering waves can cause fatalities and injuries. Storm surge can travel several miles inland. In estuaries and bayous, saltwater intrusion endangers public health and the environment. The analysis below identifies the weather events that cause the most fatalities and injuries.
# Create a data frame with the columns needed for the health analysis:
# - Event Type (8)
# - Fatalities (23)
# - Injuries (24)
classifiedStormdata <- stormData[, c(8, 23, 24)]
classifiedStormdata[, 1] <- as.character(classifiedStormdata[, 1])
# Sum fatalities and injuries by event type.
aggdata <- aggregate(classifiedStormdata[, 2:3],
                     by = list(classifiedStormdata$EVTYPE),
                     FUN = sum,
                     na.rm = TRUE)
names(aggdata)[1] <- "EVTYPE"
library(ggplot2)
library(scales)
# Top 10 FATALITIES
sortedDataFatalities <- aggdata[order(-aggdata[,2]), ]
topTenfatalities <- head(sortedDataFatalities, n=10)
# First six rows of the Top Ten Fatalities (head() returns six rows by default)
head(topTenfatalities)
## EVTYPE FATALITIES INJURIES
## 834 TORNADO 5633 91346
## 130 EXCESSIVE HEAT 1903 6525
## 153 FLASH FLOOD 978 1777
## 275 HEAT 937 2100
## 464 LIGHTNING 816 5230
## 856 TSTM WIND 504 6957
# Plot the Top Ten Fatalities.
ggplot(data=topTenfatalities,
aes(x=EVTYPE,
y=log10(FATALITIES))) +
geom_bar(stat = "identity", fill="#0072B2", colour="black") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "The most harmful events causing FATALITIES")+
labs(y=expression(log[10](FATALITIES)), x="U.S. NOAA Event Type")
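Because EVTYPE is a character column, ggplot2 places the bars in alphabetical order. The following is a sketch of ordering them by fatality count instead, using the six rows printed above and `reorder()`. The data frame name `topSix` is illustrative, and `geom_col()` is used here as a shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# The six rows shown in the listing above.
topSix <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD",
             "HEAT", "LIGHTNING", "TSTM WIND"),
  FATALITIES = c(5633, 1903, 978, 937, 816, 504),
  stringsAsFactors = FALSE
)

# reorder() turns EVTYPE into a factor whose levels follow descending FATALITIES.
topSix$EVTYPE <- reorder(topSix$EVTYPE, -topSix$FATALITIES)

p <- ggplot(topSix, aes(x = EVTYPE, y = log10(FATALITIES))) +
  geom_col(fill = "#0072B2", colour = "black") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(y = expression(log[10](FATALITIES)), x = "U.S. NOAA Event Type")
print(p)
```

With the levels ordered this way, the tallest bar appears first and the ranking can be read from left to right.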
# Top 10 INJURIES
sortedDatainjuries <- aggdata[order(-aggdata[,3]), ]
topTeninjuries <- head(sortedDatainjuries, n=10)
# First six rows of the Top Ten Injuries
head(topTeninjuries)
# Plot the graphic with the Top Ten Injuries
ggplot(data=topTeninjuries,
aes(x=EVTYPE, y=log10(INJURIES))) +
geom_bar(stat = "identity",
fill="#009E73",
colour="black") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
labs(title = "The most harmful events causing INJURIES")+
labs(y=expression(log[10](INJURIES)), x="U.S. NOAA Event Type")
intersect(topTenfatalities[,1], topTeninjuries[,1])
## [1] "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT"
## [5] "LIGHTNING" "TSTM WIND" "FLOOD"
Storm surge is an abnormal rise of water generated by a storm’s winds. The tremendous power of storm surge and large battering waves can destroy buildings, erode beaches and dunes, and damage roads and bridges along the coast. In this section, we analyze the data to determine which events have the greatest economic consequences.
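The listing that follows shows the damage columns and the distinct exponent codes, but the code that produced it is not included in this report. The following is a minimal sketch of how such a subset might be built, assuming the columns are selected by the names listed earlier (EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP). `toyStorm` and `toyImpact` are small illustrative stand-ins for stormData and eventEconomicImpact, and their values are made up.

```r
# Small stand-in for stormData with only the columns needed here (values are made up).
toyStorm <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25.0, 2.5, 7, 54),
  PROPDMGEXP = c("K", "K", "m", ""),
  CROPDMG    = c(0, 0, 1, 0),
  CROPDMGEXP = c("", "", "k", ""),
  REFNUM     = 1:4,
  stringsAsFactors = FALSE
)

# Keep the event type plus the damage amounts and their exponent codes.
toyImpact <- toyStorm[, c("EVTYPE", "PROPDMG", "PROPDMGEXP",
                          "CROPDMG", "CROPDMGEXP")]
head(toyImpact)

# Distinct exponent codes, in order of first appearance.
unique(toyImpact$PROPDMGEXP)
unique(toyImpact$CROPDMGEXP)
```

Selecting by name rather than by position keeps the subset correct even if the column order of the input changes.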
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
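# Convert exponents to uppercase for consistency.
eventEconomicImpact$PROPDMGEXP <- toupper(eventEconomicImpact$PROPDMGEXP)
eventEconomicImpact$CROPDMGEXP <- toupper(eventEconomicImpact$CROPDMGEXP)
# Given a value and an exponent code, return the actual dollar value.
# Unrecognised codes (including "" and "0") leave the value unchanged.
expval <- function(x, exp = "") {
  switch(exp,
         `-` = x * -1, `?` = x, `+` = x, `1` = x,
         `2` = x * (10^2), `3` = x * (10^3), `4` = x * (10^4), `5` = x * (10^5),
         `6` = x * (10^6), `7` = x * (10^7), `8` = x * (10^8),
         H = x * 100, K = x * 1000, M = x * 1e+06, B = x * 1e+09,
         x)
}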
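The same conversion can also be expressed as a vectorized lookup, which avoids calling `switch()` once per row. The sketch below mirrors the multipliers in expval, under the assumption that any code missing from the table leaves the value unchanged. `expMultiplier` and `toDollars` are illustrative names and are not part of the original analysis.

```r
# Multiplier for each exponent code, mirroring the cases in expval.
expMultiplier <- c("-" = -1, "?" = 1, "+" = 1, "1" = 1,
                   "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
                   "6" = 1e6, "7" = 1e7, "8" = 1e8,
                   "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)

toDollars <- function(value, exp) {
  # Look up each code by name; unknown codes (and "") come back as NA.
  mult <- unname(expMultiplier[as.character(exp)])
  # Treat unknown codes as a multiplier of 1, like the default branch of expval.
  mult[is.na(mult)] <- 1
  value * mult
}

# Worked examples: 25 K, 2.5 M, 1.5 B and a value with an empty code.
toDollars(c(25, 2.5, 1.5, 132), c("K", "M", "B", ""))
```

Here 25 with code K becomes 25,000, 2.5 with code M becomes 2,500,000, 1.5 with code B becomes 1,500,000,000, and 132 with an empty code stays 132.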
# Aggregating based on event type and exponent designators
pad <- aggregate(eventEconomicImpact[, c(2, 4)],
                 by = list(eventEconomicImpact$EVTYPE,
                           eventEconomicImpact$PROPDMGEXP,
                           eventEconomicImpact$CROPDMGEXP),
                 FUN = sum,
                 na.rm = TRUE)
# Remove rows where both damage values are 0 - they won't be relevant for the analysis.
pfinance <- pad[apply(pad[c(4,5)], 1, function(row) any(row != 0)), ]
# For each row, calculate the total dollar impact combining property and crop damage.
nr <- nrow(pfinance)
total <- numeric(nr)
for (i in seq_len(nr)) {
  val <- expval(pfinance[i, 4], pfinance[i, 2])
  val <- val + expval(pfinance[i, 5], pfinance[i, 3])
  total[i] <- val
}
length(total)
## [1] 881
pfinance$TOTALDMG <- total
names(pfinance)[1] <- "EVTYPE"
head(pfinance)
## EVTYPE Group.2 Group.3 PROPDMG CROPDMG TOTALDMG
## 131 FLASH FLOOD 132 0 132
## 135 FLASH FLOODING 4 0 4
## 139 FLOOD 7 0 7
## 195 HAIL 54 0 54
## 253 HEAVY SNOW SQUALLS 10 0 10
## 294 HIGH WINDS 3 0 3
damage <- aggregate(pfinance$TOTALDMG, by=list(pfinance$EVTYPE), FUN=sum, na.rm=T)
names(damage) <- c("EVTYPE", "TOTALDMG")
head(damage)
## EVTYPE TOTALDMG
## 1 HIGH SURF ADVISORY 200000
## 2 FLASH FLOOD 50000
## 3 TSTM WIND 8100000
## 4 TSTM WIND (G45) 8000
## 5 ? 5000
## 6 AGRICULTURAL FREEZE 28820000
financedamage <- damage[order(-damage[,2]), ]
head(financedamage)
## EVTYPE TOTALDMG
## 72 FLOOD 150319678257
## 197 HURRICANE/TYPHOON 71913712800
## 354 TORNADO 57350833958
## 299 STORM SURGE 43323541000
## 116 HAIL 18755905408
## 59 FLASH FLOOD 18243991079
topteneconomiconseq <- head(financedamage, n=10)
ggplot(data=topteneconomiconseq,
aes(x=EVTYPE,
y=TOTALDMG/1e+9)) +
geom_bar(stat = "identity", fill="#D55E00", colour="black") +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
ggtitle("Top Ten events with the greatest economic consequences") +
labs(y="Economic Damage (in Billions of Dollars)", x="U.S. NOAA Event Type")
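The plot divides TOTALDMG by 1e+9 to express damage in billions of dollars. The following is a short worked example of that scaling, using the three largest totals from the financedamage listing above. `topDamage` and the `BILLIONS` column are illustrative names.

```r
# Three largest totals, copied from the financedamage listing above.
topDamage <- data.frame(
  EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  TOTALDMG = c(150319678257, 71913712800, 57350833958),
  stringsAsFactors = FALSE
)

# Scale dollars to billions of dollars, rounded to two decimals.
topDamage$BILLIONS <- round(topDamage$TOTALDMG / 1e9, 2)
print(topDamage[, c("EVTYPE", "BILLIONS")])
```

On this scale FLOOD corresponds to about 150.32 billion dollars, HURRICANE/TYPHOON to about 71.91, and TORNADO to about 57.35.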