This study aims to identify the types of weather events that are most harmful to population health or have the greatest economic consequences. The report is based on data for a broad variety of events recorded between 1950 and 2011. The analysis explores the total number of reported fatalities and injuries attributed to each event type, as well as the value of the resulting property and crop damage.
The data were obtained from the National Oceanic and Atmospheric Administration’s (NOAA) National Weather Service. According to the National Weather Service Storm Data Documentation, the Storm Data file (47 MB) documents “…the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce”.
## load needed libraries
library(dplyr)
library(lattice)
library(ggplot2)
library(grid)
library(R.utils)
options(scipen = 1)
## set variables
## (setInternet2() was only relevant on old Windows builds of R and is no longer needed)
urlData <-
"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
fileName <- "repdata-data-StormData.csv.bz2"
destName <- "repdata-data-StormData.csv"
## Download the file if it isn't yet downloaded
if (!file.exists(fileName)) {
  download.file(url = urlData, destfile = fileName, mode = "wb")
}
if (!file.exists(destName)) {
  bunzip2(fileName, destName, overwrite = TRUE, remove = FALSE)
}
## read the file if it isn't yet read
## (text columns are kept as factors, matching the summaries in this report)
if (!exists("stormData")) {
  stormData <- read.csv(destName, stringsAsFactors = TRUE)
}
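Decompressing to disk is optional: `read.csv()` opens its file through a connection that detects bzip2 compression, so a `.bz2` archive can be read directly. The sketch below shows this on a small temporary file; the file and its toy values are made up for illustration and are not part of the storm database.

```r
## write a tiny bzip2-compressed CSV to a temporary file (toy data)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(3, 0))
tmpFile <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmpFile, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

## read.csv() decompresses transparently, so no bunzip2() step is needed
toyRead <- read.csv(tmpFile, stringsAsFactors = FALSE)
toyRead
```

Reading the archive directly saves disk space, at the cost of decompressing the file again on every read.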
## check the amount of data
dim(stormData)
## [1] 902297 37
head(stormData, 3)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
We can see that only a few variables are relevant to this analysis, so we subset the dataset to keep just those columns.
## lower-case the column names before subsetting
names(stormData) <- tolower(names(stormData))
healthdata <- select(stormData, evtype, fatalities, injuries)
damagedata <- select(stormData, evtype, propdmg, propdmgexp, cropdmg, cropdmgexp)
To evaluate the danger of each event type to population health, we sum the deaths and injuries caused by each event type and then select the 10 worst.
totalcases <- aggregate(.~ evtype, data = healthdata, FUN = sum)
summary(totalcases)
## evtype fatalities injuries
## HIGH SURF ADVISORY: 1 Min. : 0.00 Min. : 0.0
## COASTAL FLOOD : 1 1st Qu.: 0.00 1st Qu.: 0.0
## FLASH FLOOD : 1 Median : 0.00 Median : 0.0
## LIGHTNING : 1 Mean : 15.38 Mean : 142.7
## TSTM WIND : 1 3rd Qu.: 0.00 3rd Qu.: 0.0
## TSTM WIND (G45) : 1 Max. :5633.00 Max. :91346.0
## (Other) :979
fatalmost <- totalcases[order(-totalcases$fatalities),][1:10,]
injurmost <- totalcases[order(-totalcases$injuries),][1:10,]
fatalmost[,c("evtype", "fatalities")]
## evtype fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
injurmost[,c("evtype", "injuries")]
## evtype injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
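The rankings above list `TSTM WIND` and `THUNDERSTORM WIND` as separate event types, so label variants split what is arguably one category. A minimal sketch of merging such variants before aggregating is shown below on toy data. The helper `normalizeEvtype` and its replacement rules are hypothetical and cover only the variants visible in this report.

```r
## hypothetical helper: collapse a few spelling variants of event labels
normalizeEvtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("^TSTM", "THUNDERSTORM", x)                      # TSTM WIND -> THUNDERSTORM WIND
  x <- gsub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # plural -> singular
  x
}

## toy data, not taken from the storm database
toyHealth <- data.frame(
  evtype     = c("TSTM WIND", "Thunderstorm Wind", "THUNDERSTORM WINDS", "TORNADO"),
  fatalities = c(2, 1, 4, 5),
  injuries   = c(10, 0, 3, 20)
)
toyHealth$evtype <- normalizeEvtype(toyHealth$evtype)

## same aggregation as in the report, now on the cleaned labels
toyTotals <- aggregate(. ~ evtype, data = toyHealth, FUN = sum)
toyTotals[order(-toyTotals$fatalities), ]
```

In this toy example the three thunderstorm wind rows collapse into one event type that outranks the tornado row on fatalities, which illustrates how label cleaning can change a ranking.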
The following graph compares the number of fatalities and injuries caused by the 10 weather events most harmful to population health.
par(mar = c(6, 4, 4, 2), mfrow = c(1, 2))
barplot(
  fatalmost$fatalities,
  names.arg = fatalmost$evtype,
  las = 2,
  cex.names = 0.5,
  ylim = c(0, 6000),
  main = "Fatalities caused by \n Severe Weather Events \n in U.S. (1950 - 2011)",
  ylab = "Number of Fatalities"
)
barplot(
  injurmost$injuries / 1000,
  names.arg = injurmost$evtype,
  las = 2,
  cex.names = 0.5,
  ylim = c(0, 100),
  main = "Injuries caused by \n Severe Weather Events \n in U.S. (1950 - 2011)",
  ylab = "Number of Injuries (in thousands)"
)
To evaluate the damage each event type causes to the economy, we compute the property and crop damage caused by each event type and then select the 10 worst.
## convert damage values to dollars using their exponent codes (H, K, M, B)
adjustvalues <- function(dmg,exp){
if (is.na(dmg) || is.null(dmg)) {return(0)}
if (is.na(exp) || is.null(exp)) {return(ifelse(is.numeric(dmg),dmg,0))}
if (toupper(exp)=='B') {return(dmg*10^9)}
if (toupper(exp)=='M') {return(dmg*10^6)}
if (toupper(exp)=='K') {return(dmg*10^3)}
if (toupper(exp)=='H') {return(dmg*10^2)}
if (exp=='0') {return(dmg)}
return(dmg)
}
damagedata$losses <- adjustvalues(damagedata$propdmg, damagedata$propdmgexp) + adjustvalues(damagedata$cropdmg, damagedata$cropdmgexp)
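`adjustvalues` tests its arguments with `if`, which evaluates a single value, yet it is called on whole columns. Older versions of R then use only the first element of each column (here `K` for property damage and an empty exponent for crop damage, as the first rows printed earlier show), while R 4.2.0 and later stop with an error instead. A vectorized variant could look like the sketch below. The name `toDollars` is hypothetical, unrecognized exponent codes are assumed to mean a multiplier of 1 as in the original function, and totals computed this way will differ from the figures printed in this report.

```r
## hypothetical vectorized counterpart of adjustvalues()
toDollars <- function(dmg, exp) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- multipliers[toupper(as.character(exp))]  # NA for unknown or empty codes
  mult[is.na(mult)] <- 1                            # assumption: unknown code -> x1
  dmg[is.na(dmg)] <- 0                              # missing damage counts as 0
  unname(dmg * mult)
}

## toy data, not taken from the storm database
toyDamage <- data.frame(
  propdmg    = c(25, 2.5, 1, 3),
  propdmgexp = c("K", "M", "", "b"),
  cropdmg    = c(0, 1, 5, NA),
  cropdmgexp = c("", "k", "?", "M")
)
toyDamage$losses <- toDollars(toyDamage$propdmg, toyDamage$propdmgexp) +
  toDollars(toyDamage$cropdmg, toyDamage$cropdmgexp)
toyDamage$losses
```

Looking the multiplier up in a named vector applies each row's own exponent code, so a value recorded in millions is no longer scaled by the code of the first row.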
## compute the sum of losses by event type
totalloss <- aggregate(losses ~ evtype, data = damagedata, FUN = sum)
summary(filter(totalloss, losses > 0))
## evtype losses
## HIGH SURF ADVISORY: 1 Min. : 1
## FLASH FLOOD : 1 1st Qu.: 5000
## TSTM WIND : 1 Median : 50000
## TSTM WIND (G45) : 1 Mean : 25257257
## ? : 1 3rd Qu.: 501008
## AGRICULTURAL FREEZE : 1 Max. :3212358179
## (Other) :425
damagemost <- totalloss[order(-totalloss$losses),][1:10,]
damagemost
## evtype losses
## 834 TORNADO 3212358179
## 153 FLASH FLOOD 1420303790
## 856 TSTM WIND 1336074813
## 170 FLOOD 900106518
## 760 THUNDERSTORM WIND 876910961
## 244 HAIL 689272976
## 464 LIGHTNING 603355361
## 786 THUNDERSTORM WINDS 446311865
## 359 HIGH WIND 324748843
## 972 WINTER STORM 132722569
The following graph compares the economic losses caused by the 10 costliest weather events.
par(mar = c(6, 4, 4, 2))
barplot(
  damagemost$losses / 1000000,
  names.arg = damagemost$evtype,
  las = 2,
  cex.names = 0.5,
  main = "Damages caused by \n Severe Weather Events \n in U.S. (1950 - 2011)",
  ylim = c(0, 3500),
  ylab = "Damages (in Million US$)"
)
According to the totals computed above, tornadoes are the most harmful weather events for population health and also cause the greatest economic losses.