_____________________________________________________________________________________
A. Synopsis
Severe weather events affect public health, impose economic stress, and on a personal level may lead to loss of life and property damage.
The National Oceanic & Atmospheric Administration (NOAA), along with the National Weather Service (NWS) and Department of Commerce, maintains a database that tracks parameters of major storms and weather events in the United States, among them loss of life, personal injury, and damage to property.
This report uses the NOAA database to consider the impact of weather-related events on both personal harm and property damage. Graphical representations of the data highlight the 10 worst weather event types for fatalities, injuries, property damage and crop damage. The results suggest that most fatalities and injuries arise from tornados, while the biggest monetary impacts result from flooding and drought.
_____________________________________________________________________________________
B. Data Processing
The NOAA Storm Data [47Mb] events database is a comma-separated-value file compressed via the bzip2 algorithm and can be downloaded from the following web-link:
The database contains information for the period beginning in 1950 and ending in November 2011.
A .pdf document description of the weather data collected is available from the National Weather Service (NWS) from the following web-link:
There is also a FAQ made available by NOAA at the following web-link:
B.1 Downloading the data
Set working directory
Important:
Set correct file path below, before the “\Project2”
Create work folder to extract and write data to
if(!file.exists("./Project2")){dir.create("./Project2")}
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
Download raw data file as data.csv.bz2
download.file(fileUrl,destfile="./Project2/data.csv.bz2")
Decompress and read the raw .csv data from the NOAA Storm Database, in the folder it was downloaded to, into the variable noaa_raw.
This line of code will likely take a couple of minutes. Be patient!
library(R.utils)
## Warning: package 'R.utils' was built under R version 3.4.1
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.7.1 (2016-02-15) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.21.0 (2016-10-30) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
## R.utils v2.5.0 (2016-11-07) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
## The following object is masked from 'package:utils':
##
## timestamp
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
noaa_raw <- read.csv(bzfile("./Project2/data.csv.bz2"))
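Because the download and the read both take a while, it can help to skip the download when a local copy already exists. The following is a minimal sketch, not part of the original analysis: the helper name read_storm_data is hypothetical, and it assumes the archive is a bzip2-compressed .csv as described above.

```r
# Hypothetical helper (not in the original report): download the archive
# only when no local copy exists, then read it through a bzip2 connection.
read_storm_data <- function(url, dest, ...) {
  if (!file.exists(dest)) {
    # Create the target folder if needed; mode = "wb" keeps the binary
    # archive intact on Windows.
    dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
    download.file(url, destfile = dest, mode = "wb")
  }
  # read.csv() opens and closes the connection itself.
  read.csv(bzfile(dest), ...)
}

# Possible usage with the names defined in this section:
# noaa_raw <- read_storm_data(fileUrl, "./Project2/data.csv.bz2")
```

Re-running the report then only repeats the read step, which is the slower of the two on most connections.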
————————————————————————————-
B.2 Exploratory Data Analysis of Raw Data File
The raw data file was explored using the R calls below. Only the output of summary() is shown.
object.size(noaa_raw)
dim(noaa_raw)
class(noaa_raw)
str(noaa_raw)
head(noaa_raw)
tail(noaa_raw)
sapply(noaa_raw, class)
summary(noaa_raw)
## STATE__ BGN_DATE BGN_TIME
## Min. : 1.0 5/25/2011 0:00:00: 1202 12:00:00 AM: 10163
## 1st Qu.:19.0 4/27/2011 0:00:00: 1193 06:00:00 PM: 7350
## Median :30.0 6/9/2011 0:00:00 : 1030 04:00:00 PM: 7261
## Mean :31.2 5/30/2004 0:00:00: 1016 05:00:00 PM: 6891
## 3rd Qu.:45.0 4/4/2011 0:00:00 : 1009 12:00:00 PM: 6703
## Max. :95.0 4/2/2006 0:00:00 : 981 03:00:00 PM: 6700
## (Other) :895866 (Other) :857229
## TIME_ZONE COUNTY COUNTYNAME STATE
## CST :547493 Min. : 0.0 JEFFERSON : 7840 TX : 83728
## EST :245558 1st Qu.: 31.0 WASHINGTON: 7603 KS : 53440
## MST : 68390 Median : 75.0 JACKSON : 6660 OK : 46802
## PST : 28302 Mean :100.6 FRANKLIN : 6256 MO : 35648
## AST : 6360 3rd Qu.:131.0 LINCOLN : 5937 IA : 31069
## HST : 2563 Max. :873.0 MADISON : 5632 NE : 30271
## (Other): 3631 (Other) :862369 (Other):621339
## EVTYPE BGN_RANGE BGN_AZI
## HAIL :288661 Min. : 0.000 :547332
## TSTM WIND :219940 1st Qu.: 0.000 N : 86752
## THUNDERSTORM WIND: 82563 Median : 0.000 W : 38446
## TORNADO : 60652 Mean : 1.484 S : 37558
## FLASH FLOOD : 54277 3rd Qu.: 1.000 E : 33178
## FLOOD : 25326 Max. :3749.000 NW : 24041
## (Other) :170878 (Other):134990
## BGN_LOCATI END_DATE END_TIME
## :287743 :243411 :238978
## COUNTYWIDE : 19680 4/27/2011 0:00:00: 1214 06:00:00 PM: 9802
## Countywide : 993 5/25/2011 0:00:00: 1196 05:00:00 PM: 8314
## SPRINGFIELD : 843 6/9/2011 0:00:00 : 1021 04:00:00 PM: 8104
## SOUTH PORTION: 810 4/4/2011 0:00:00 : 1007 12:00:00 PM: 7483
## NORTH PORTION: 784 5/30/2004 0:00:00: 998 11:59:00 PM: 7184
## (Other) :591444 (Other) :653450 (Other) :622432
## COUNTY_END COUNTYENDN END_RANGE END_AZI
## Min. :0 Mode:logical Min. : 0.0000 :724837
## 1st Qu.:0 NA's:902297 1st Qu.: 0.0000 N : 28082
## Median :0 Median : 0.0000 S : 22510
## Mean :0 Mean : 0.9862 W : 20119
## 3rd Qu.:0 3rd Qu.: 0.0000 E : 20047
## Max. :0 Max. :925.0000 NE : 14606
## (Other): 72096
## END_LOCATI LENGTH WIDTH
## :499225 Min. : 0.0000 Min. : 0.000
## COUNTYWIDE : 19731 1st Qu.: 0.0000 1st Qu.: 0.000
## SOUTH PORTION : 833 Median : 0.0000 Median : 0.000
## NORTH PORTION : 780 Mean : 0.2301 Mean : 7.503
## CENTRAL PORTION: 617 3rd Qu.: 0.0000 3rd Qu.: 0.000
## SPRINGFIELD : 575 Max. :2315.0000 Max. :4400.000
## (Other) :380536
## F MAG FATALITIES INJURIES
## Min. :0.0 Min. : 0.0 Min. : 0.0000 Min. : 0.0000
## 1st Qu.:0.0 1st Qu.: 0.0 1st Qu.: 0.0000 1st Qu.: 0.0000
## Median :1.0 Median : 50.0 Median : 0.0000 Median : 0.0000
## Mean :0.9 Mean : 46.9 Mean : 0.0168 Mean : 0.1557
## 3rd Qu.:1.0 3rd Qu.: 75.0 3rd Qu.: 0.0000 3rd Qu.: 0.0000
## Max. :5.0 Max. :22000.0 Max. :583.0000 Max. :1700.0000
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0.00 :465934 Min. : 0.000 :618413
## 1st Qu.: 0.00 K :424665 1st Qu.: 0.000 K :281832
## Median : 0.00 M : 11330 Median : 0.000 M : 1994
## Mean : 12.06 0 : 216 Mean : 1.527 k : 21
## 3rd Qu.: 0.50 B : 40 3rd Qu.: 0.000 0 : 19
## Max. :5000.00 5 : 28 Max. :990.000 B : 9
## (Other): 84 (Other): 9
## WFO STATEOFFIC
## :142069 :248769
## OUN : 17393 TEXAS, North : 12193
## JAN : 13889 ARKANSAS, Central and North Central: 11738
## LWX : 13174 IOWA, Central : 11345
## PHI : 12551 KANSAS, Southwest : 11212
## TSA : 12483 GEORGIA, North and Central : 11120
## (Other):690738 (Other) :595920
## ZONENAMES
## :594029
## :205988
## GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M : 639
## GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA : 592
## JEFFERSON - JEFFERSON : 303
## MADISON - MADISON : 302
## (Other) :100444
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## Min. : 0 Min. :-14451 Min. : 0 Min. :-14455
## 1st Qu.:2802 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0
## Median :3540 Median : 8707 Median : 0 Median : 0
## Mean :2875 Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.:4019 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. :9706 Max. : 17124 Max. :9706 Max. :106220
## NA's :47 NA's :40
## REMARKS REFNUM
## :287433 Min. : 1
## : 24013 1st Qu.:225575
## Trees down.\n : 1110 Median :451149
## Several trees were blown down.\n : 568 Mean :451149
## Trees were downed.\n : 446 3rd Qu.:676723
## Large trees and power lines were blown down.\n: 432 Max. :902297
## (Other) :588295
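The summary above lists TSTM WIND and THUNDERSTORM WIND as separate EVTYPE values. The analysis in this report uses EVTYPE as recorded, but if one assumes such labels describe the same kind of event, they could be merged before aggregating. The sketch below is illustrative only: the function name normalize_evtype and the single TSTM rule are assumptions, not a complete cleaning scheme.

```r
# Hypothetical cleaning step (an assumption, not used in this report):
# upper-case the labels, trim and collapse whitespace, and expand the
# "TSTM" abbreviation at the start of a label.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("[[:space:]]+", " ", x)
  sub("^TSTM\\b", "THUNDERSTORM", x, perl = TRUE)
}

# Toy input, not taken from the database
ev <- c("TSTM WIND", " Thunderstorm  Wind", "HAIL")
table(normalize_evtype(ev))
```

Merging labels this way changes the event totals, so any such rule should be checked against the NWS documentation before it is applied.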
————————————————————————————-
B.3 Subsetting and manipulating columns for data needed in this analysis
noaa_health_subset <- noaa_raw[ ,c("EVTYPE", "FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
Page 12 of the STORM DATA PREPARATION document indicates that damage estimates are rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, e.g., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include: “K” for thousands, “M” for millions, and “B” for billions.
Add Property Damage actual dollar column (mutate() comes from the dplyr package)
library(dplyr)
noaa_health_subset2<- mutate(noaa_health_subset,
PROPDMG_DV = ifelse(PROPDMGEXP == "H"|PROPDMGEXP == "h", PROPDMG*100,
ifelse(PROPDMGEXP == "K"|PROPDMGEXP == "k", PROPDMG*1000,
ifelse(PROPDMGEXP == "M"|PROPDMGEXP == "m", PROPDMG*1000000,
ifelse(PROPDMGEXP == "B"|PROPDMGEXP == "b", PROPDMG*1000000000,0)))))
Add Crop Damage actual dollar column
noaa_health_subset3<- mutate(noaa_health_subset2,
CROPDMG_DV = ifelse(CROPDMGEXP == "H"|CROPDMGEXP == "h", CROPDMG*100,
ifelse(CROPDMGEXP == "K"|CROPDMGEXP == "k", CROPDMG*1000,
ifelse(CROPDMGEXP == "M"|CROPDMGEXP == "m", CROPDMG*1000000,
ifelse(CROPDMGEXP == "B"|CROPDMGEXP == "b", CROPDMG*1000000000,0)))))
Check new entries
head(noaa_health_subset3)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
## PROPDMG_DV CROPDMG_DV
## 1 25000 0
## 2 2500 0
## 3 25000 0
## 4 2500 0
## 5 2500 0
## 6 2500 0
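The nested ifelse() calls above can also be expressed as a named lookup vector, which keeps the code-to-multiplier mapping in one place. This is a sketch under the same assumptions as the code above: H, K, M and B (in either case) are the only codes converted, and every other code, including blanks and digits, yields 0. The names exp_multiplier and damage_dollars are hypothetical.

```r
# Multipliers for the magnitude codes. Codes not listed here
# (blank, digits, symbols) map to 0, matching the ifelse chain above.
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

damage_dollars <- function(value, exp_code) {
  # as.character() also handles factor columns; toupper() covers "k", "m", ...
  mult <- exp_multiplier[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 0          # unknown or blank codes
  unname(value * mult)
}

# Toy values: 25K, 2.5k, 1.55B and a blank code
damage_dollars(c(25, 2.5, 1.55, 3), c("K", "k", "B", ""))
```

With such a helper, the two new columns could be computed as damage_dollars(PROPDMG, PROPDMGEXP) and damage_dollars(CROPDMG, CROPDMGEXP).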
_____________________________________________________________________________________
C.1 Results
Q1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Create subset for fatalities
fatalities_subset <- aggregate(FATALITIES~EVTYPE,noaa_health_subset3, sum )
fatalities_subset2 <- fatalities_subset[order(-fatalities_subset$FATALITIES),][1:10,]
Create subset for injuries
injuries_subset <- aggregate(INJURIES~EVTYPE,noaa_health_subset3, sum )
injuries_subset2 <- injuries_subset[order(-injuries_subset$INJURIES),][1:10,]
Create panel with 2 bar plots showing the distribution of fatalities and injuries as a result of severe weather events
par(mfrow=c(1,2), mar = c(11, 5, 3, 2), mgp = c(3, 1, 0), cex = 0.7)
with(fatalities_subset2, barplot(FATALITIES, names.arg = EVTYPE, las=3, main = "Fatalities due to Weather Events", ylab = "Fatalities", cex.lab = 1.5))
with(injuries_subset2, barplot(INJURIES, names.arg = EVTYPE, las=3, main = "Injuries due to Weather Events", ylab = "Injuries", cex.lab = 1.5 ))
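The aggregate-then-sort pattern used for fatalities and injuries recurs for each measure in this report, so it could be wrapped in a small function. The sketch below is an illustration, not the code that produced the plots; the name top_events is hypothetical, and it assumes the data frame has an EVTYPE column and a numeric column to total.

```r
# Hypothetical helper: total one numeric column per event type and
# return the n event types with the largest totals.
top_events <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  head(totals[order(-totals[[column]]), ], n)
}

# Toy data, not taken from the database
toy <- data.frame(EVTYPE = c("A", "B", "A", "C"),
                  FATALITIES = c(1, 5, 2, 0))
top_events(toy, "FATALITIES", n = 2)
```

Applied to the real data, top_events(noaa_health_subset3, "FATALITIES") should correspond to fatalities_subset2.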

Q2. Across the United States, which types of events have the greatest economic consequences?
Create subset for property damage
propertydamage_subset <- aggregate(PROPDMG_DV~EVTYPE,noaa_health_subset3, sum )
propertydamage_subset2 <- propertydamage_subset[order(-propertydamage_subset$PROPDMG_DV),][1:10,]
Create subset for crop damage
cropdamage_subset <- aggregate(CROPDMG_DV~EVTYPE,noaa_health_subset3, sum )
cropdamage_subset2 <- cropdamage_subset[order(-cropdamage_subset$CROPDMG_DV),][1:10,]
Create panel with 2 bar plots showing the distribution of property and crop damage as a result of severe weather events
par(mfrow=c(1,2), mar = c(11, 5, 3, 2), mgp = c(3, 1, 0), cex = 0.7)
with(propertydamage_subset2, barplot(PROPDMG_DV/(10^9), las = 3, names.arg = EVTYPE, main = "Property $ Damage due to Weather Events", ylab = "Property Damage ($ billion)" ))
with(cropdamage_subset2, barplot(CROPDMG_DV/(10^9), las = 3, names.arg = EVTYPE, main = "Crop $ Damage due to Weather Events", ylab = "Crop Damage ($ billion)" ))

Close the screen device
dev.off()
## null device
## 1
_____________________________________________________________________________________
D.1 Conclusions
From the graphical presentations in the Results section above, tornados caused by far the greatest number of fatalities and injuries. Heat exposure and floods also play a significant role.
The largest dollar impacts of severe weather, measured as property damage and crop damage, result from flooding and drought, respectively. Tornados, hurricanes and storm surges also contribute significantly to property damage, while flooding, ice storms and hail contribute heavily to crop losses.