Storms and other severe weather events can wreak havoc on communities and municipalities. Preventing fatalities, injuries and property damage is a key concern for municipal managers. In this report we describe which types of severe weather events across the United States are most harmful to population health, and which have the greatest economic consequences.
For this analysis we used data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States from 1950 to 2011.
The study found that tornadoes are the biggest threat to human life overall, while floods cause the most economic damage in total over the period covered by the data.
library(plyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
The raw data set is never modified after loading, so the time-consuming extraction and read does not have to be repeated from scratch on every run.
stormdata.raw <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),header = TRUE)
dim(stormdata.raw)
## [1] 902297 37
str(stormdata.raw)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
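One way to avoid re-reading the compressed file on every run is to keep a parsed copy on disk. The following is a minimal sketch, not the mechanism used in this report: the helper name `load_cached_csv` and the `.rds` cache location are assumptions made for illustration.

```r
# Hypothetical helper: parse a CSV once, then reuse a cached .rds copy.
load_cached_csv <- function(csv_path, cache_path = paste0(csv_path, ".rds")) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))  # fast path: reuse the parsed data frame
  }
  data <- read.csv(csv_path, header = TRUE)  # slow path: parse the raw file
  saveRDS(data, cache_path)                  # store the parsed copy for next time
  data
}

# Small demonstration on a temporary file
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
          csv, row.names = FALSE)
first  <- load_cached_csv(csv)  # parses the CSV and writes the cache
second <- load_cached_csv(csv)  # served from the cache
```

For the storm data this would be called as `load_cached_csv("repdata_data_StormData.csv.bz2")`. The cache file has to be deleted by hand if the raw file ever changes, because the sketch does not check whether the cache is stale.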
Strip the unnecessary columns from the dataset, and add a new calculated column: the total economic cost of damage to property and crops, using the exponent values from the two exponent columns:
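# create a function to calculate the damage values using the exponent columns
# (vectorized: value and exp are whole columns)
CalcExponent <- function(value, exp)
{
  # the exponent columns are factors, so compare on their upper-cased labels
  exp <- toupper(as.character(exp))
  # map the letter codes to powers of ten; every other code (blank, digits,
  # "-", "?", "+") is treated as a multiplier of 1
  power <- ifelse(exp == "H", 2,
           ifelse(exp == "K", 3,
           ifelse(exp == "M", 6,
           ifelse(exp == "B", 9, 0))))
  return(value * 10^power)
}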
# create a dataset containing only the relevant columns, including the total calculated damage
columns <- c( "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
stormdata.selected <- stormdata.raw[, columns]
stormdata.selected <- mutate(stormdata.selected, DAMAGE = as.numeric(CalcExponent(PROPDMG, PROPDMGEXP) +
CalcExponent(CROPDMG, CROPDMGEXP)))
head(stormdata.selected)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP DAMAGE
## 1 TORNADO 0 15 25.0 K 0 25000
## 2 TORNADO 0 0 2.5 K 0 2500
## 3 TORNADO 0 2 25.0 K 0 25000
## 4 TORNADO 0 2 2.5 K 0 2500
## 5 TORNADO 0 2 2.5 K 0 2500
## 6 TORNADO 0 6 2.5 K 0 2500
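The exponent arithmetic above can be checked on a few hand-picked values. The sketch below uses a named lookup vector instead of nested `ifelse` calls; `exponent_power` and `scale_damage` are hypothetical names introduced for illustration and are not part of the analysis. As in this report, any code other than H, K, M or B is assumed to mean a multiplier of 1.

```r
# Powers of ten for the letter codes handled in this report
exponent_power <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: same mapping as CalcExponent, written as a lookup
scale_damage <- function(value, code) {
  power <- exponent_power[toupper(as.character(code))]
  power[is.na(power)] <- 0  # unrecognized codes: multiplier of 1 (assumption)
  unname(value * 10^power)
}

# 25 K, 2.5 k, 1.2 B and a value with a blank exponent
damage <- scale_damage(c(25, 2.5, 1.2, 5), c("K", "k", "B", ""))
damage
```

The first two values reproduce the 25000 and 2500 shown in the `DAMAGE` column above.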
Summarize the data per event type and rank the top 10 event types for each of the indicators:
# prepare the top 10 fatalities contributors
stormdata.fatalities <- aggregate(FATALITIES ~ EVTYPE, data = stormdata.selected, FUN="sum")
stormdata.fatalities.top10 <- stormdata.fatalities[order(-stormdata.fatalities$FATALITIES), ][1:10, ]
stormdata.fatalities.top10
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
# prepare the top 10 injuries contributors
stormdata.injuries <- aggregate(INJURIES ~ EVTYPE, data = stormdata.selected, FUN="sum")
stormdata.injuries.top10 <- stormdata.injuries[order(-stormdata.injuries$INJURIES), ][1:10, ]
stormdata.injuries.top10
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
# prepare the top 10 economic contributors
stormdata.damage <- aggregate(DAMAGE ~ EVTYPE, data = stormdata.selected, FUN="sum")
stormdata.damage.top10 <- stormdata.damage[order(-stormdata.damage$DAMAGE), ][1:10, ]
stormdata.damage.top10
## EVTYPE DAMAGE
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57352114049
## 670 STORM SURGE 43323541000
## 244 HAIL 18758222016
## 153 FLASH FLOOD 17562129167
## 95 DROUGHT 15018672000
## 402 HURRICANE 14610229010
## 590 RIVER FLOOD 10148404500
## 427 ICE STORM 8967041360
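The three summaries above follow the same aggregate, sort, truncate pattern, which could be factored into one helper. This is a sketch under the assumption that the data frame has an `EVTYPE` column; the function name `top_events` and the toy data are hypothetical and only illustrate the pattern.

```r
# Hypothetical helper: total a measure per event type and keep the n largest
top_events <- function(data, measure, n = 10) {
  totals <- aggregate(data[[measure]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- measure                  # restore the measure's column name
  head(totals[order(-totals[[measure]]), ], n) # sort descending, keep the first n
}

# Toy data standing in for stormdata.selected
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(3, 1, 2, 0, 1)
)
top2 <- top_events(toy, "FATALITIES", n = 2)
top2
```

With the real data, `top_events(stormdata.selected, "FATALITIES")` would be expected to match the fatalities ranking above, and likewise for `"INJURIES"` and `"DAMAGE"`.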
The following bar charts display the top 10 contributors with regard to fatalities and injuries, respectively:
par(mfrow = c(1,1), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(stormdata.fatalities.top10$FATALITIES,
names.arg = stormdata.fatalities.top10$EVTYPE,
main = "Top 10 Causes of Fatalities",
ylab = "Number of Fatalities",
las = 3)
barplot(stormdata.injuries.top10$INJURIES,
names.arg = stormdata.injuries.top10$EVTYPE,
main = "Top 10 Causes of Injuries",
ylab = "Number of Injuries",
las = 3)
From the above it is clear that tornadoes are by far the most harmful event type for population health, with 5,633 fatalities and 91,346 injuries recorded.
The following bar chart displays the top 10 contributors with regards to economic consequences:
par(mfrow = c(1,1), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(stormdata.damage.top10$DAMAGE / 1000000,
names.arg = stormdata.damage.top10$EVTYPE,
main = "Top 10 Causes of Economic Damage",
ylab = "Damage (Million USD)",
las = 3)
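From the above it is clear that floods (the FLOOD event type, counted separately from flash floods and river floods) are the single biggest contributor to economic damage, at roughly 150 billion USD.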