Human and Economic Consequences of Hazardous Weather Events in the United States Between 1950 and 2011

Author: Toni Rib

Synopsis

In this report I aim to describe both the human and economic consequences of various storm events in the United States between the years 1950 and 2011. Data was obtained from the National Oceanic and Atmospheric Administration’s (NOAA) storm database for the analysis. From the data, I concluded that while Tornados are the overwhelming cause of human death and injury, floods andd hurricanes/typhoons tend to cause the most economic damage as measured in property and crop damage costs.

Data Processing

The raw data for this research was obtained from this link from the Coursera Reproducible Research course website for Peer Assessment 2 on Monday, May 18. However, the data originally came from the United States National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur as well as estimates of any fatalities, injuries, and property/crop damage.

Reading in the Raw Data

The data is originally in comma-separated values (csv) format and compressed with the bz2 algorithm. The ‘read.csv’ function can read in this type of data without decompressing it first, so I will use that to add the raw data to an R data frame called ‘sdata.’

sdata <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE)

The result is a data set of 37 variables and 902297 observations.

The next two sections will create the tidy data sets that will be used for the analysis. One data set will be used for analysis of population health while the second data set will be used for analysis of economic consequences.

Data Processing for Population Health

In the first section of our analysis, we are going to be looking at events which are harmful to human health. We assume that ‘harmful to human health’ means there must be at least one injury or fatality related to the observation. We will create a data set that only includes observations with at least one fatality or injury and sort them based on the number of fatalities and injuries.

mdata <- sdata[sdata$INJURIES > 0 | sdata$FATALITIES > 0, ]
## Sort data in decreasing order based on fatalities then injuries
library(plyr)
mdata <- arrange(mdata, FATALITIES, INJURIES, decreasing = TRUE)

One of the columns we are interested in is the type of event (variable EVTYPE). A quick look at this variable using ‘str’ shows that this is a factor variable with 985 levels.

str(mdata$EVTYPE)
##  Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 275 834 834 834 130 834 834 130 130 834 ...

However, now that we have removed observations with no injuries or fatalities, we can use the aggregate function to see how many event types are still left in our data set.

mdataAggregates <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = mdata, sum)

With this new data set, there are only 220 event types in our data set.

We can also add a new column that contains the total number of fatalities plus injuries for each storm event type. To make the data easier to read, we will sort the data in decreasing order on this new column.

library(plyr)
mdataAggregates$TOTALHARM <- mdataAggregates$INJURIES + mdataAggregates$FATALITIES
mdataAggregates <- arrange(mdataAggregates, TOTALHARM, decreasing = TRUE)

We will look at this data set further in the Results section.

Data Processing for Economic Consequences

For this section, we are interested in the economic consequences of storms. We will assume that ‘economic consequences’ means total cost of property damage plus crop damage. These are columns ‘PROPDMG’ and ‘CROPDMG’ in the original, raw data set (which we called ‘sdata’) with the associated columns ‘PROPDMGEXP’ and ‘CROPDMGEXP’ which give a one letter symbol signifying the units of the first two columns.

Similar to our previous data processing, we will first remove all observations where no property damage or crop damage was present, as these are not relevant to our analysis. We will also shrink down our data set to only include the columns we are interested in.

econdata <- sdata[sdata$PROPDMG > 0 | sdata$CROPDMG > 0, ]
econSimple <- econdata[ ,c(8, 25, 26, 27, 28)]
library(plyr)
econSimple <- arrange(econSimple, PROPDMG, CROPDMG, decreasing = TRUE)

Before we can aggregate the data for each storm event type, we need to first take into account the fact that the numbers in the ‘PROPDMG’ and ‘CROPDMG’ columns are not all in the same units. We will need to use the ‘PROPDMPEXP’ and ‘CROPDMGEXP’ columns to convert the numbers into actual dollars, all in the same units of $1. First we will take a look at what symbols we have for each column and how many of our observations fall into each category.

## Property Damage Units
table(econSimple$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
##   4357      1      0      5    209      0      1      1      4     18 
##      6      7      8      B      h      H      K      m      M 
##      3      2      0     40      1      6 229057      7  11319
## Crop Damage Units
table(econSimple$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 145037      6     17      0      7     21  97960      1   1982

There are some unusual units in the data, such as ‘?’ or ‘+’ which do not follow the rules outlined in section 2.7 of the National Weather Service Storm Data Documentation which only allows , K (thousands), M (millions), and B (billions). However, from the tables above we can see that these account for very little of the observations, so we will drop them from the data. Additionally, we will convert the values in the ‘PROPDMG’ and ‘CROPDMG’ columns to their actual values (in units of $1) by multiplying by the appropriate magnitude from the ‘PROPDMGEXP’ and ‘CROPDMGEXP’ columns.

We do this by adding two additional columns, ‘PROPMAG’ and ‘CROPMAG’ which will be equal to either 0, 1, 1000, 100000, or 1000000000 depending on the values in the ‘PROPDMGEXP’ and ‘CROPDMGEXP’ columns. All rows will initially be set to a value of 0, then only those rows we wish to keep based on the above criteria will be set to the appropriate non-zero value.

After the new columns are added, we can calculate the actual values of the damage by multiplying the columns together.

## Set all initial multipliers to 0
econSimple$PROPMAG <- 0
econSimple$CROPMAG <- 0

## Set correct multipliers for property damage
econSimple$PROPMAG[econSimple$PROPDMGEXP == "B"] <- 1000000000
econSimple$PROPMAG[econSimple$PROPDMGEXP == "M" | econSimple$PROPDMGEXP == "m"] <- 1000000
econSimple$PROPMAG[econSimple$PROPDMGEXP == "K" | econSimple$PROPDMGEXP == "k"] <- 1000
econSimple$PROPMAG[econSimple$PROPDMGEXP == ""] <- 1

## Set correct multipliers for property damage
econSimple$CROPMAG[econSimple$CROPDMGEXP == "B"] <- 1000000000
econSimple$CROPMAG[econSimple$CROPDMGEXP == "M" | econSimple$CROPDMGEXP == "m"] <- 1000000
econSimple$CROPMAG[econSimple$CROPDMGEXP == "K" | econSimple$CROPDMGEXP == "k"] <- 1000
econSimple$CROPMAG[econSimple$CROPDMGEXP == ""] <- 1

## Calculate actual values
econSimple$PROPDMGACT <- econSimple$PROPDMG * econSimple$PROPMAG
econSimple$CROPDMGACT <- econSimple$CROPDMG * econSimple$CROPMAG

Now that we have converted the columns to actual dollars and set the data we are not interested in to zero, we can use the aggregate function to calculate the total damage for both property and crops for each storm event type. We also add a column for the total damage cost which is equal to the property damage cost plus the crop damage cost. For easier viewing, we sort this new ‘TOTALDAMAGE’ column with the highest values at the top.

EconAggregates <- aggregate(cbind(PROPDMGACT, CROPDMGACT) ~ EVTYPE, data = econSimple, sum)
EconAggregates$TOTALDAMAGE <- EconAggregates$PROPDMGACT + EconAggregates$CROPDMGACT
EconAggregates <- arrange(EconAggregates, TOTALDAMAGE, decreasing = TRUE)
head(EconAggregates, 10)
##               EVTYPE   PROPDMGACT  CROPDMGACT  TOTALDAMAGE
## 1              FLOOD 144657709807  5661968450 150319678257
## 2  HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 3            TORNADO  56937160483   414953110  57352113593
## 4        STORM SURGE  43323536000        5000  43323541000
## 5               HAIL  15732266777  3025954453  18758221230
## 6        FLASH FLOOD  16140811717  1421317100  17562128817
## 7            DROUGHT   1046106000 13972566000  15018672000
## 8          HURRICANE  11868319010  2741910000  14610229010
## 9        RIVER FLOOD   5118945500  5029459000  10148404500
## 10         ICE STORM   3944927810  5022113500   8967041310

We will look at this data set further in the results section.

Results

Now that we have two tidy data sets, we analyze them to answer the following two questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Results for Population Health

Looking at the first 10 rows of the ‘mdataAggregates’ data set, we can see there is a clear winner in all 3 columns: tornados.

head(mdataAggregates, 10)
##               EVTYPE FATALITIES INJURIES TOTALHARM
## 1            TORNADO       5633    91346     96979
## 2     EXCESSIVE HEAT       1903     6525      8428
## 3          TSTM WIND        504     6957      7461
## 4              FLOOD        470     6789      7259
## 5          LIGHTNING        816     5230      6046
## 6               HEAT        937     2100      3037
## 7        FLASH FLOOD        978     1777      2755
## 8          ICE STORM         89     1975      2064
## 9  THUNDERSTORM WIND        133     1488      1621
## 10      WINTER STORM        206     1321      1527

It should be noted that while some event types could be combined for a more detailed analysis (for example: ‘TSTM WIND’ and ‘THUNDERSTORM WIND’), even adding these two columns together will only account to 10% of the total injuries and fatalities cause by tornados. In fact, tornados count for approximately 62% of ALL fatalities & injuries in the data set.

(mdataAggregates[mdataAggregates$EVTYPE == "TORNADO", 4] / (sum(mdataAggregates$TOTALHARM))) * 100
## [1] 62.29661

Therefore, combining even some of the top ranking events does not remove the TORNADO category from the number one spot.

However, we would also like to look at the top categories below tornados, so we will combine a few of the top categories to get a clear picture of which storm event catgories cause the most damage to humans. Ignoring the TORNADO outlier in the first column, the total fatalities and injuries in columns 2-15 account for almost 80% of the total fatalities and injuries in the data set (again, minus TORNADOS) so we will focus on those columns.

head(mdataAggregates, 15)
##               EVTYPE FATALITIES INJURIES TOTALHARM
## 1            TORNADO       5633    91346     96979
## 2     EXCESSIVE HEAT       1903     6525      8428
## 3          TSTM WIND        504     6957      7461
## 4              FLOOD        470     6789      7259
## 5          LIGHTNING        816     5230      6046
## 6               HEAT        937     2100      3037
## 7        FLASH FLOOD        978     1777      2755
## 8          ICE STORM         89     1975      2064
## 9  THUNDERSTORM WIND        133     1488      1621
## 10      WINTER STORM        206     1321      1527
## 11         HIGH WIND        248     1137      1385
## 12              HAIL         15     1361      1376
## 13 HURRICANE/TYPHOON         64     1275      1339
## 14        HEAVY SNOW        127     1021      1148
## 15          WILDFIRE         75      911       986
(sum(mdataAggregates$TOTALHARM[2:15]) / sum(mdataAggregates$TOTALHARM[2:220])) * 100
## [1] 79.1086

In the first 15 columns, we can combine ‘TSTM WIND’ with ‘THUNDERSTORM WIND’; ‘EXCESSIVE HEAT’ with ‘HEAT’; and ‘WINTER STORM’ with ‘HEAVY SNOW.’ We can then re-aggregate and sort the data.

h <- mdataAggregates[1:15, ]
h$EVTYPE[h$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
h$EVTYPE[h$EVTYPE == "HEAT"] <- "EXCESSIVE HEAT"
h$EVTYPE[h$EVTYPE == "HEAVY SNOW"] <- "WINTER STORM"
hAgg <- aggregate(cbind(FATALITIES, INJURIES, TOTALHARM) ~ EVTYPE, data = h, sum)
hAgg <- arrange(hAgg, TOTALHARM, decreasing = TRUE)
print(hAgg)
##               EVTYPE FATALITIES INJURIES TOTALHARM
## 1            TORNADO       5633    91346     96979
## 2     EXCESSIVE HEAT       2840     8625     11465
## 3  THUNDERSTORM WIND        637     8445      9082
## 4              FLOOD        470     6789      7259
## 5          LIGHTNING        816     5230      6046
## 6        FLASH FLOOD        978     1777      2755
## 7       WINTER STORM        333     2342      2675
## 8          ICE STORM         89     1975      2064
## 9          HIGH WIND        248     1137      1385
## 10              HAIL         15     1361      1376
## 11 HURRICANE/TYPHOON         64     1275      1339
## 12          WILDFIRE         75      911       986

This data is better viewed in a barplot showing the top categories.

par(las=2, mar = c(11,5,2,2), cex.axis=0.75)
with(data = hAgg, barplot(TOTALHARM, names.arg = hAgg$EVTYPE[1:12],
                          main = "Total Fatalities & Injuries by Storm Event", 
                          ylab = "# of Fatalities + Injuries",
                          col = "red"))

From this barplot, we can see that the storm event types that cause the next highest number of fatalities and injuries are excessive heat, thunderstorm wind, floods, and lightning. After lightning, the numbers drop off significantly.

Results for Economic Consequences

To draw some conclusions about the economic consequences of storms, we will create a stacked barplot that allows us to see not only the total aggregate damage, but also the damage costs broken out by property and crops. We will only include the first 20 columns. To do this, we also need to drop the TOTALDAMAGE column.

ecoSmall <- EconAggregates[1:20, 1:3]
ecoSmall$EVTYPE <- reorder(ecoSmall$EVTYPE, rowSums(ecoSmall[-1]))
library(reshape2)
library(ggplot2)
eco.m <- melt(ecoSmall, id.var = "EVTYPE")
eco.m <- arrange(eco.m, variable, value, decreasing = TRUE)
ggplot(eco.m, aes(x=EVTYPE, y = value, fill = variable)) + geom_bar(stat= 'identity') + 
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      labs(title = "Property and Crop Damage by Storm Event", 
           x = "Storm Event Type", y ="Cost of Damage in US Dollars")

From this plot we can see that floods cause the most property damage, followed by hurricane/typhoon and tornados. However, the majority of crop damage is caused from drought, which is interested since it is the opposite of flooding. Therefore, while too much rain is bad for property, too little rain is bad for crops.