This analysis shows the impacts over public health and economy of severe weather events happened in the USA during the period 1950-2011. The source of data is the NOAA Storm Database. The outcome of the analysis concerning the impact on public health definitely assigns to tornadoes the first cause of casualties, followed by heat. From an economic standpoint it appears that flood and rough waters events have the highest impact. From a policy perspective, apart from the countermeasures common to all types of events, the investments related to protection of people and properties from tornadoes, heat and water related events (i.e., flood, rough waters) should be addressed with higher priority with respect to others.
The NOAA Storm Database contains a record of the events concerning severe weather occurred in the USA. It can be consulted and downloaded in CVS format at http://www.ncdc.noaa.gov/stormevents/. It is kept updated by NOAA and currently contains events up to Oct. 2014.
The specific version used for this analysis contains events occurred in the USA from Jan. 1950 to Nov. 2011. It can be downloaded in a BZ2 compressed CSV file at the Coursera Reproducible Research link https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
#if file is not present in the current home, download it
datafile <- "repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists(paste0("~/", datafile))) {
url <- "http://d396qusza40orc.cloudfront.net/"
download.file(paste0(url, datafile), paste0("~/", datafile))
}
The dataset weather_events is created by reading the datafile. The bzfile connection is used, because the original file is BZ2 compressed.
weather_events <- read.csv(bzfile(paste0("~/", datafile)))
The structure of the dataset is as follows:
str(weather_events)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
For the purpose of this analysis, the variables to be considered are:
The meaning of the exponent variables (PROPDMGEXP and CROPDMGEXP) is as in the following example: if PROPDMG=2.5 and PROPDMGEXP=“K” (for kilo, i.e, 1000), the amount of the damage resulted in 2.5 * 1000 = 2,500 USD.
The extraction of unique event types might help in understanding the correctness of data (for editorial reasons, only the first 120 types, out of 985, are listed here)
head(unique(weather_events$EVTYPE), n=60L)
## [1] TORNADO TSTM WIND
## [3] HAIL FREEZING RAIN
## [5] SNOW ICE STORM/FLASH FLOOD
## [7] SNOW/ICE WINTER STORM
## [9] HURRICANE OPAL/HIGH WINDS THUNDERSTORM WINDS
## [11] RECORD COLD HURRICANE ERIN
## [13] HURRICANE OPAL HEAVY RAIN
## [15] LIGHTNING THUNDERSTORM WIND
## [17] DENSE FOG RIP CURRENT
## [19] THUNDERSTORM WINS FLASH FLOOD
## [21] FLASH FLOODING HIGH WINDS
## [23] FUNNEL CLOUD TORNADO F0
## [25] THUNDERSTORM WINDS LIGHTNING THUNDERSTORM WINDS/HAIL
## [27] HEAT WIND
## [29] LIGHTING HEAVY RAINS
## [31] LIGHTNING AND HEAVY RAIN FUNNEL
## [33] WALL CLOUD FLOODING
## [35] THUNDERSTORM WINDS HAIL FLOOD
## [37] COLD HEAVY RAIN/LIGHTNING
## [39] FLASH FLOODING/THUNDERSTORM WI WALL CLOUD/FUNNEL CLOUD
## [41] THUNDERSTORM WATERSPOUT
## [43] EXTREME COLD HAIL 1.75)
## [45] LIGHTNING/HEAVY RAIN HIGH WIND
## [47] BLIZZARD BLIZZARD WEATHER
## [49] WIND CHILL BREAKUP FLOODING
## [51] HIGH WIND/BLIZZARD RIVER FLOOD
## [53] HEAVY SNOW FREEZE
## [55] COASTAL FLOOD HIGH WIND AND HIGH TIDES
## [57] HIGH WIND/BLIZZARD/FREEZING RA HIGH TIDES
## [59] HIGH WIND AND HEAVY SNOW RECORD COLD AND HIGH WIND
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
It’s quite evident that here we have a problem: the event types were not coded in a consistent way over the years. There are typos (e.g., “LIGHTING”), different text describing the same event (e.g., various “HURRICANE…”), not understandable events (e.g., “SUMMARY OF…”, not visible in the above list). Therefore, a cleaning of the EVTYPE variable is needed before proceeding with the analysis.
First, the event type will be made upper case. Then the text will be homogenized as much as possible.
#convert EVTYPE to upper case and transform it to a string
weather_events$EVTYPE <- toupper(as.character(weather_events$EVTYPE))
#homogenize text
weather_events$EVTYPE[grepl("AVALAN", weather_events$EVTYPE)] <- "AVALANCHE"
weather_events$EVTYPE[grepl("COLD", weather_events$EVTYPE)] <- "COLD"
weather_events$EVTYPE[grepl("DRY", weather_events$EVTYPE)] <- "DRY"
weather_events$EVTYPE[grepl("DUST|SMOKE|FUNNEL", weather_events$EVTYPE)] <- "DUST"
weather_events$EVTYPE[grepl("FIRE", weather_events$EVTYPE)] <- "WILDFIRE"
weather_events$EVTYPE[grepl("FLOOD", weather_events$EVTYPE)] <- "FLOOD"
weather_events$EVTYPE[grepl("FREEZ|FROST", weather_events$EVTYPE)] <- "FREEZE"
weather_events$EVTYPE[grepl("HAIL", weather_events$EVTYPE)] <- "HAIL"
weather_events$EVTYPE[grepl("HEAT", weather_events$EVTYPE)] <- "HEAT"
weather_events$EVTYPE[grepl("ICE", weather_events$EVTYPE)] <- "ICE"
weather_events$EVTYPE[grepl("LAND", weather_events$EVTYPE)] <- "LANDSLIDE"
weather_events$EVTYPE[grepl("LIGHT|LIGNT", weather_events$EVTYPE)] <- "LIGHTNING"
weather_events$EVTYPE[grepl("LOW TEMP", weather_events$EVTYPE)] <- "LOW TEMPERATURE"
weather_events$EVTYPE[grepl("MUD", weather_events$EVTYPE)] <- "MUD SLIDE"
weather_events$EVTYPE[grepl("RAIN|PRECIP", weather_events$EVTYPE)] <- "RAIN"
weather_events$EVTYPE[grepl("RIP|ROGUE|ROUGH|DROWN|SURF|SEA|WATER|WAVE|MARINE|TIDE", weather_events$EVTYPE)] <- "ROUGH WATERS"
weather_events$EVTYPE[grepl("HURRI", weather_events$EVTYPE)] <- "HURRICANE"
weather_events$EVTYPE[grepl("SNOW|BLIZZ", weather_events$EVTYPE)] <- "SNOW"
weather_events$EVTYPE[grepl("SPOUT", weather_events$EVTYPE)] <- "WATERSPOUT"
weather_events$EVTYPE[grepl("STORM", weather_events$EVTYPE)] <- "STORM"
weather_events$EVTYPE[grepl("SUMMARY|OTHER|[?]", weather_events$EVTYPE)] <- "UNKNOWN"
weather_events$EVTYPE[grepl("THUNDERSTORM|TSTM", weather_events$EVTYPE)] <- "THUNDERSTORM"
weather_events$EVTYPE[grepl("TORN", weather_events$EVTYPE)] <- "TORNADO"
weather_events$EVTYPE[grepl("UNSEAS|URBAN|RECORD|ABNOR", weather_events$EVTYPE)] <- "OTHER"
weather_events$EVTYPE[grepl("VOLC", weather_events$EVTYPE)] <- "VOLCANO"
weather_events$EVTYPE[grepl("WARM", weather_events$EVTYPE)] <- "WARM"
weather_events$EVTYPE[grepl("WIND|WND|GUST", weather_events$EVTYPE)] <- "WIND"
weather_events$EVTYPE[grepl("WINTER|WINTRY", weather_events$EVTYPE)] <- "WINTER WEATHER"
#transform back EVTYPE into a factor
weather_events$EVTYPE <- factor(weather_events$EVTYPE)
According to the NWS Directive 10-1605, para. 2.7, page 12, only three types of damage exponents should be present: K (1,000), M (1,000,000) and B (1,000,000,000). Unfortunately, by retrieving the unique values, the full list of exponents for property damages shows:
unique(weather_events$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
And for crop damages:
unique(weather_events$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
Therefore, also for these variables we need a strategy to cope with unexpected content. Taking into consideration that we don’t have a clue concerning the meaning of those letters, the decision is made to ignore the records corresponding to them: the value corresponding to those events will be set to 0. Concerning the lowercase “m” and “k” present, they will be made uppercase and considered as “M” and “K”.
#create a subset of data for the economical impact analysis
econ_impact <- subset(weather_events, select=c(EVTYPE, PROPDMG, PROPDMGEXP,
CROPDMG, CROPDMGEXP))
#add two variables to the dataset, which will hold the value of the damage
econ_impact$PROPVAL <- 0
econ_impact$CROPVAL <- 0
#transform the exponents into upper case
econ_impact$PROPDMGEXP <- toupper(as.character(econ_impact$PROPDMGEXP))
econ_impact$CROPDMGEXP <- toupper(as.character(econ_impact$CROPDMGEXP))
#cycle through the dataset
for (i in 1:nrow(econ_impact)) {
#compute value for property damages
if (econ_impact$PROPDMGEXP[i] == "K") {
econ_impact$PROPVAL[i] <- econ_impact$PROPDMG[i] * 1000
} else if (econ_impact$PROPDMGEXP[i] == "M") {
econ_impact$PROPVAL[i] <- econ_impact$PROPDMG[i] * 1000000
} else if (econ_impact$PROPDMGEXP[i] == "B") {
econ_impact$PROPVAL[i] <- econ_impact$PROPDMG[i] * 1000000000
} else {
econ_impact$PROPVAL[i] <- 0
}
#compute value for crop field damages
if (econ_impact$CROPDMGEXP[i] == "K") {
econ_impact$CROPVAL[i] <- econ_impact$CROPDMG[i] * 1000
} else if (econ_impact$CROPDMGEXP[i] == "M") {
econ_impact$CROPVAL[i] <- econ_impact$CROPDMG[i] * 1000000
} else if (econ_impact$CROPDMGEXP[i] == "B") {
econ_impact$CROPVAL[i] <- econ_impact$CROPDMG[i] * 1000000000
} else {
econ_impact$CROPVAL[i] <- 0
}
}
The dataset is now ready for the analysis to be made. The impact on health is calculated by creating a dataset containing the aggregation of casualties by event. Then, the top 10 fatalities and injuries are extracted.
#create an aggregated dataset of casualties by event
CasualtiesByEvent <- aggregate(weather_events[c("FATALITIES", "INJURIES")],
by=weather_events["EVTYPE"],
FUN="sum")
#compute and show the top 10 fatalities
top10Fatalities <- head(CasualtiesByEvent[order(CasualtiesByEvent$FATALITIES, decreasing=TRUE),], 10)
top10Fatalities
## EVTYPE FATALITIES INJURIES
## 69 TORNADO 5633 91364
## 26 HEAT 3138 9224
## 21 FLOOD 1525 8604
## 58 ROUGH WATERS 837 972
## 43 LIGHTNING 818 5234
## 68 THUNDERSTORM 505 6962
## 67 STORM 501 4227
## 7 COLD 451 320
## 85 WIND 442 1852
## 65 SNOW 244 1954
#compute and show the top 10 injuries
top10Injuries <- head(CasualtiesByEvent[order(CasualtiesByEvent$INJURIES, decreasing=TRUE),], 10)
top10Injuries
## EVTYPE FATALITIES INJURIES
## 69 TORNADO 5633 91364
## 26 HEAT 3138 9224
## 21 FLOOD 1525 8604
## 68 THUNDERSTORM 505 6962
## 43 LIGHTNING 818 5234
## 67 STORM 501 4227
## 39 ICE 102 2164
## 65 SNOW 244 1954
## 85 WIND 442 1852
## 84 WILDFIRE 90 1608
#create a panel plot of the results
par(mfrow=c(2,1))
par(cex = 0.6)
barplot(top10Fatalities$FATALITIES,
names=top10Fatalities$EVTYPE,
axes=FALSE,
main="Top 10 Casualties by Type of Event - USA 1950-2011\nSource: NOAA Storm Events Database",
ylab="Fatalities",
las=2)
axis(2, at=seq(0, 7000, 500))
barplot(top10Injuries$INJURIES,
names=top10Injuries$EVTYPE,
axes=FALSE,
ylab="Injuries",
las=2)
axis(2, at=seq(0, 100000, 15000))
It appears that tornadoes are, by far, the most dangerous type of event for human health, causing the highest number of both fatalities and injuries. They are followed by heat and flood, again in both fatalities and injuries. Interesting to be noted: whilst rough waters are the fourth cause of fatalities, that event doesn’t appear in the top 10 of the injuries: one could say that water doesn’t forgive.
The impact on economy is calculated by creating a dataset containing the aggregation of damages in US dollars by event. Then, the top 10 events causing damages to properties and to crop fields are extracted.
#create an aggregated dataset of damages to properties and to crop fields by event
DamagesByEvent <- aggregate(econ_impact[c("PROPVAL", "CROPVAL")],
by=econ_impact["EVTYPE"],
FUN="sum")
#compute and show the top 10 damages to properties
top10PropDamages <- head(DamagesByEvent[order(DamagesByEvent$PROPVAL, decreasing=TRUE),], 10)
top10PropDamages
## EVTYPE PROPVAL CROPVAL
## 21 FLOOD 167529740320 12380109100
## 37 HURRICANE 84756180010 5515292800
## 67 STORM 64217927990 1380141300
## 69 TORNADO 56941932180 414961310
## 25 HAIL 17619990720 3114212850
## 84 WILDFIRE 8501628500 403281630
## 85 WIND 6070315700 767316950
## 58 ROUGH WATERS 5869280440 13973476000
## 68 THUNDERSTORM 4493691940 554007350
## 39 ICE 3968294360 5022114300
#compute and show the top 10 damages to crop fields
top10CropDamages <- head(DamagesByEvent[order(DamagesByEvent$CROPVAL, decreasing=TRUE),], 10)
top10CropDamages
## EVTYPE PROPVAL CROPVAL
## 58 ROUGH WATERS 5869280440 13973476000
## 21 FLOOD 167529740320 12380109100
## 37 HURRICANE 84756180010 5515292800
## 39 ICE 3968294360 5022114300
## 25 HAIL 17619990720 3114212850
## 23 FREEZE 37854500 1997061000
## 7 COLD 246579450 1416765550
## 67 STORM 64217927990 1380141300
## 26 HEAT 20325750 904469280
## 53 RAIN 3234687690 806162800
#create a panel plot of the results
par(mfrow=c(2,1))
par(cex = 0.6)
barplot(top10PropDamages$PROPVAL,
names=top10PropDamages$EVTYPE,
axes=FALSE,
main="Top 10 Damages by Type of Event - USA 1950-2011\nSource: NOAA Storm Events Database",
ylab="Damages to Properties",
las=2)
axis(2, at=seq(0, 200e9, 50e9))
barplot(top10CropDamages$CROPVAL,
names=top10CropDamages$EVTYPE,
axes=FALSE,
ylab="Damages to Crop Fields",
las=2)
axis(2, at=seq(0, 14e9, 1e9))
In this case floods, hurricanes, storms and tornadoes are the events causing most of the damages to properties. Crop fields are mostly impacted by rough waters and floods and, to a minor extent, by hurricanes and ice.