Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
In this report we present a brief overview of the impact of some of the main severe weather events. More specifically, we focus on their effects on human life and their economic impact. This helps to paint a clearer picture of the total cost of the most destructive severe weather events.
The data for this project comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and was obtained from the archive Storm Data. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. A detailed description of the data set can be obtained here.
The bzip2-compressed archive was downloaded and read directly into a data frame with the read.csv function. read.csv decompresses the file transparently, so no separate extraction step is needed.
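# Download the compressed storm data once; mode = "wb" keeps the binary file intact on Windows
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destfile <- "repdata-data-StormData.csv.bz2"
if (!file.exists(destfile)) download.file(url, destfile = destfile, mode = "wb")
# read.csv decompresses the bzip2 file on the fly; empty fields become NA
raw <- read.csv(destfile, na.strings = "")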
dim(raw)
## [1] 902297 37
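The claim that read.csv can read the compressed file directly is easy to check on a tiny example. The sketch below writes a small, made-up CSV (a temporary file with two illustrative columns) through a bzip2 connection and reads it back. It also shows the effect of na.strings = "", which turns empty fields into NA.

```r
# Write a tiny, made-up CSV through a bzip2 connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,REMARKS", "TORNADO,", "HAIL,large hail"), con)
close(con)

# read.csv opens the path via file(), which detects the bzip2 compression
demo <- read.csv(tmp, na.strings = "")
print(demo)
# Without na.strings = "", the empty REMARKS field would be read as "" instead of NA
```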
To get an idea of what this large data set of 902,297 observations and 37 variables contains, we can look at its first few rows and its column names.
head(raw)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 2 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 3 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 4 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 5 TORNADO 0 <NA> <NA> <NA> <NA> 0
## 6 TORNADO 0 <NA> <NA> <NA> <NA> 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 <NA> <NA> 14.0 100 3 0 0
## 2 NA 0 <NA> <NA> 2.0 150 2 0 0
## 3 NA 0 <NA> <NA> 0.1 123 2 0 0
## 4 NA 0 <NA> <NA> 0.0 100 2 0 0
## 5 NA 0 <NA> <NA> 0.0 150 2 0 0
## 6 NA 0 <NA> <NA> 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0 <NA> <NA> <NA> <NA>
## 2 0 2.5 K 0 <NA> <NA> <NA> <NA>
## 3 2 25.0 K 0 <NA> <NA> <NA> <NA>
## 4 2 2.5 K 0 <NA> <NA> <NA> <NA>
## 5 2 2.5 K 0 <NA> <NA> <NA> <NA>
## 6 6 2.5 K 0 <NA> <NA> <NA> <NA>
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 <NA> 1
## 2 3042 8755 0 0 <NA> 2
## 3 3340 8742 0 0 <NA> 3
## 4 3458 8626 0 0 <NA> 4
## 5 3412 8642 0 0 <NA> 5
## 6 3450 8748 0 0 <NA> 6
names(raw)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
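library(dplyr, warn.conflicts = FALSE)
# tbl_df() is deprecated in current dplyr; as_tibble() replaces it
raw <- as_tibble(raw)
# Lower-case the column names and replace the run of underscores with a period
names(raw) <- sub("_+", ".", tolower(names(raw)))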
For the analysis in this report we use the R package dplyr, which makes this large data frame easier to work with. We also convert the column names to lower case and replace the underscores with periods, in line with a common R naming convention.
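Because sub() replaces only the first match in each string, the renaming relies on no column name containing more than one run of underscores. This holds for the 37 names listed above. The quick check below uses three of those names plus one made-up name ("A_B_C") for contrast with gsub(), which replaces every run.

```r
# Three real column names plus a made-up one with two separate underscores
nm <- c("STATE__", "BGN_DATE", "LONGITUDE_", "A_B_C")

renamed_sub <- sub("_+", ".", tolower(nm))    # first run of underscores only
renamed_gsub <- gsub("_+", ".", tolower(nm))  # every run of underscores
print(renamed_sub)   # "state."  "bgn.date"  "longitude."  "a.b_c"
print(renamed_gsub)  # "state."  "bgn.date"  "longitude."  "a.b.c"
```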
There are two major questions that we are trying to address in this report:
1. Identify the weather events that are the most harmful in terms of population health.
2. Identify the weather events that have the greatest economic consequences.
As we can see above, our data set has two columns that record the number of fatalities and injuries, respectively, for each severe weather event. We can use them to aggregate the total number of deaths and injuries for each event type and then rank the event types by their combined toll on human health.
# Sum deaths and injuries per event type, then rank by their combined toll
harm <- raw %>%
  group_by(evtype) %>%
  summarise(deaths = sum(fatalities), injuries = sum(injuries)) %>%
  mutate(totaltoll = deaths + injuries) %>%
  arrange(desc(totaltoll))
harm[1:15, ]
## Source: local data frame [15 x 4]
##
## evtype deaths injuries totaltoll
## (fctr) (dbl) (dbl) (dbl)
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
## 11 HIGH WIND 248 1137 1385
## 12 HAIL 15 1361 1376
## 13 HURRICANE/TYPHOON 64 1275 1339
## 14 HEAVY SNOW 127 1021 1148
## 15 WILDFIRE 75 911 986
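The same aggregation can be cross-checked without dplyr using base R's aggregate(). The sketch below runs on a few made-up records that stand in for the evtype, fatalities and injuries columns; it is an illustration, not part of the report's pipeline. One difference to keep in mind: the formula interface of aggregate() drops rows containing NA by default, whereas sum() inside summarise() would return NA for such a group.

```r
# Made-up records standing in for the evtype, fatalities and injuries columns
toy <- data.frame(
  evtype = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  fatalities = c(2, 0, 1, 4),
  injuries = c(10, 3, 5, 0)
)

# Sum both columns within each event type (base R only)
harm_toy <- aggregate(cbind(fatalities, injuries) ~ evtype, data = toy, FUN = sum)
harm_toy$totaltoll <- harm_toy$fatalities + harm_toy$injuries

# Rank event types by combined toll, largest first
harm_toy <- harm_toy[order(harm_toy$totaltoll, decreasing = TRUE), ]
print(harm_toy)
```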
To compare the scale of these impacts, we plot total deaths and total injuries by event type, in each case for the 15 event types with the highest totals.
library(ggplot2, warn.conflicts = FALSE)
library(gridExtra, warn.conflicts = FALSE)
deathsplot <- ggplot(arrange(harm, desc(deaths))[1:15,], aes(x=reorder(evtype, desc(deaths)), y=deaths)) +
geom_bar(stat = "identity", fill = "magenta4") +
labs(title = "Total number of deaths and injuries, by event", y = "total deaths") +
theme(axis.text.x = element_text(angle = 35, hjust = 1), axis.title.x=element_blank())
injrplot <- ggplot(arrange(harm, desc(injuries))[1:15,], aes(x=reorder(evtype, desc(injuries)), y=injuries)) +
geom_bar(stat = "identity", fill = "midnightblue") +
labs(x = "event type", y = "total injuries") +
theme(axis.text.x = element_text(angle = 35, hjust = 1))
grid.arrange(deathsplot, injrplot, nrow = 2)
We can see from the figure that tornadoes cause by far the greatest number of both deaths and injuries. Excessive heat is a distant second in terms of total deaths (and second in terms of deaths and injuries combined), followed by flash floods and heat. Thunderstorm winds (TSTM WIND), floods and excessive heat cause similar numbers of injuries, each far behind tornadoes.
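The table above lists TSTM WIND and THUNDERSTORM WIND as separate event types, so the toll from thunderstorm winds is split across two labels. The report does not merge them. The sketch below shows one hypothetical way to do so: the helper normalize_evtype and its single rewrite rule are illustrative assumptions, not a complete mapping of NOAA's event types.

```r
# Hypothetical helper: trim whitespace, upper-case, and expand the "TSTM" abbreviation
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  sub("^TSTM ", "THUNDERSTORM ", x)
}

evtypes <- c("TSTM WIND", " thunderstorm wind", "HAIL")
print(normalize_evtype(evtypes))
```

Applied before grouping, for example with mutate(raw, evtype = normalize_evtype(evtype)), it would change the totals reported above, so it is shown here only as an option.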
Our data set also contains variables recording the value of the property damage and crop damage caused by each event.
head(select(raw, propdmg, propdmgexp, cropdmg, cropdmgexp), 5)
## Source: local data frame [5 x 4]
##
## propdmg propdmgexp cropdmg cropdmgexp
## (dbl) (fctr) (dbl) (fctr)
## 1 25.0 K 0 NA
## 2 2.5 K 0 NA
## 3 25.0 K 0 NA
## 4 2.5 K 0 NA
## 5 2.5 K 0 NA
The propdmg and cropdmg variables give the base number for the value of property and crop damage, respectively, while propdmgexp and cropdmgexp give the corresponding multipliers: “K” for thousands of dollars, “M” for millions and “B” for billions. In a small number of records these variables take other values, including the lower-case codes “k” and “m”. We ignore them (counting their damage as zero) because they are so rare, and in many of those cases the base number is zero anyway. For example, among records with non-zero crop damage, the exponent codes are distributed as follows:
table(filter(raw, cropdmg != 0)$cropdmgexp)
##
## ? 0 2 B k K m M
## 0 12 0 7 21 20137 1 1918
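The mapping from exponent code to multiplier can also be written as a named lookup vector instead of nested ifelse() calls. This is an alternative sketch, not the method used in this report: the helper to_dollars is hypothetical, and because it upper-cases the codes it also counts the rare lower-case “k” and “m” entries, which the report's code treats as zero.

```r
# Multipliers for the documented exponent codes
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: base number times multiplier; unknown or missing codes give 0
to_dollars <- function(base, exp) {
  m <- unname(multiplier[toupper(as.character(exp))])
  m[is.na(m)] <- 0
  base * m
}

print(to_dollars(c(25, 2.5, 1.5, 3, 7), c("K", "M", "b", NA, "?")))
```

With dplyr it could stand in for the nested calls, e.g. mutate(raw, proptotal = to_dollars(propdmg, propdmgexp)), at the cost of slightly different totals from the ones reported here.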
We therefore add two new variables to the data frame that convert each base number and exponent into a dollar value. We then add the property and crop values together and aggregate the total damage for each event type.
raw <- mutate(raw, proptotal =
ifelse(is.na(propdmgexp), 0, propdmg *
ifelse(propdmgexp == "K", 1000,
ifelse(propdmgexp == "M", 1000000,
ifelse(propdmgexp == "B", 1000000000, 0)))))
raw <- mutate(raw, croptotal =
ifelse(is.na(cropdmgexp), 0, cropdmg *
ifelse(cropdmgexp == "K", 1000,
ifelse(cropdmgexp == "M", 1000000,
ifelse(cropdmgexp == "B", 1000000000, 0)))))
damages <- summarise(group_by(raw, evtype), totaldamage = sum(proptotal) + sum(croptotal))
damages <- arrange(damages, desc(totaldamage))
damages[1:15, ]
## Source: local data frame [15 x 2]
##
## evtype totaldamage
## (fctr) (dbl)
## 1 FLOOD 150319678250
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57340613590
## 4 STORM SURGE 43323541000
## 5 HAIL 18752904170
## 6 FLASH FLOOD 17562128610
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041310
## 11 TROPICAL STORM 8382236550
## 12 WINTER STORM 6715441250
## 13 HIGH WIND 5908617560
## 14 WILDFIRE 5060586800
## 15 TSTM WIND 5038935790
To see how the economic consequences of the costliest weather events compare with one another, we plot the total damage (property plus crops) for the 15 most damaging event types.
damagesplot <- ggplot(arrange(damages, desc(totaldamage))[1:15,],
aes(x=reorder(evtype, desc(totaldamage)), y=totaldamage/1000000000))
damagesplot + geom_bar(stat = "identity", fill = "forestgreen") +
labs(title = "Total property and crops damage, by event") +
labs(x = "event type", y = "total value (in billions $)") +
theme(axis.text.x = element_text(angle = 35, hjust = 1))
As we can see, floods are by far the costliest events, responsible for about $150 billion in damages in our data set. Hurricanes/typhoons, tornadoes and storm surges follow, with roughly $43 to $72 billion each.
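The dollar figures quoted above come from dividing the totals in the damage table by 10^9. The short check below reproduces them from the table values. As an illustration only, it also shows how the hurricane total would change if the separate HURRICANE and HURRICANE/TYPHOON labels were combined; the report keeps them apart.

```r
# Total damage in dollars for selected event types, copied from the table above
totals <- c(
  "FLOOD" = 150319678250,
  "HURRICANE/TYPHOON" = 71913712800,
  "TORNADO" = 57340613590,
  "STORM SURGE" = 43323541000,
  "HURRICANE" = 14610229010
)

# Convert to billions of dollars, rounded to one decimal place
billions <- round(totals / 1e9, 1)
print(billions)

# Illustration only: treat the two hurricane labels as one category
combined_hurricane <- round(sum(totals[c("HURRICANE/TYPHOON", "HURRICANE")]) / 1e9, 1)
print(combined_hurricane)
```

Combined, hurricanes would account for about $86.5 billion, which would still rank second behind floods.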
In this brief report, we have looked at which types of severe weather events have the greatest economic consequences and the biggest impact on human health. We have based our analysis on the NOAA storm database records of severe weather events from 1950 to 2011.