Synopsis

A short analysis is presented of the NOAA (National Oceanic and Atmospheric Administration) storm database that included recorded storm events in the USA from 1950 to 2011. Two questions were addressed: first, What are the event types that were most harmful to human health? Second, What are the event types that caused most economic damage? The data was first filtered to only contain events that caused harm to human health or caused economic damage. The remaining events were then calssified based on the 48 official event types of the NOAA. Events that could not be classified were ignored. Then the injuries and fatalities caused by the events of the different types were calculated and summaries presented. The same was done for the property and crop damage caused by the events. The actual costs had to be calculated first based on the units given in the data.

Data Processing

Loading the data into R

Loading the used R packages

require(dplyr)
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
require(ggplot2)
## Loading required package: ggplot2
require(xtable)
## Loading required package: xtable
require(tidyr)
## Loading required package: tidyr
require(gridExtra)
## Loading required package: gridExtra
## Loading required package: grid
# R can read compressed files directly, so no need for unzipping
# bzfile() can be used but is not necessary either
Data <- read.csv('repdata-data-StormData.csv.bz2', sep = ",", header = TRUE, na.strings = c("", "NA"))

Removing events with neither injuries nor damage

I first removed all rows (events) from the data that caused neither harm nor damage as they are irrelevant to this analysis.

Data <- tbl_df(Data) 

DataRelevant <- filter(Data, FATALITIES != 0 | INJURIES != 0 | 
                               PROPDMG != 0 | CROPDMG != 0)

Assigning the official event types to the described events in the data

The data contained the categorial variable EVTYPE in which the event types were listed. However, there were 985 different types listed whereas the NOAA considers only 48 official events. The official events can be found via their website (http://www.ncdc.noaa.gov/stormevents/details.jsp). I saved the official events in the variable Events.

Events = c("Astronomical Low Tide", "Avalanche", "Blizzard",
           "Coastal Flood", "Cold/Wind Chill", "Debris Flow", 
           "Dense Fog", "Dense Smoke", "Drought", "Dust Devil",
           "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill",
           "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud",
           "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow",
           "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm",
           "Lake-Effect Snow", "Lakeshore Flood", "Lightning", 
           "Marine Hail", "Marine High Wind", "Marine Strong Wind",
           "Marine Thunderstorm Wind", "Rip Current", "Seiche",
           "Sleet", "Storm Surge/Tide", "Strong Wind", 
           "Thunderstorm Wind", "Tornado", "Tropical Depression", 
           "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout",
           "Wildfire", "Winter Storm", "Winter Weather")

To facilitate the search for these official events in the events listed in the Data, I had to consider terms such as flood that are part of several types, e.g. Flood and Flash Flood. I adjusted the order so flood was first, and the more specific flood times came later in the list. This was also done for “Heat”. Furthermore, the type Hurricane (Typhone) was changed to the regular expression Hurricane|Typhoon to allow the following regular expression search. The new list with these few changes was called EventsSearch, it was used to assign the listed event types with the official types. Events that could not be assigned were not considered.

# the adjusted search list of the 48 event types
EventsSearch = c("Hurricane|Typhoon", "Low Tide", "Avalanche", "Blizzard", "Flood",
           "Coastal Flood", "Cold/Wind Chill", "Debris Flow", 
           "Dense Fog", "Dense Smoke", "Drought", "Dust Devil",
           "Dust Storm", "Heat", "Excessive Heat", "Extreme Cold/Wind Chill",
           "Flash Flood", "Frost/Freeze", "Funnel Cloud",
           "Freezing Fog", "Hail", "Heavy Rain", "Heavy Snow",
           "High Surf", "High Wind", "Ice Storm",
           "Lake-Effect Snow", "Lakeshore Flood", "Lightning", 
           "Marine Hail", "Marine High Wind", "Marine Strong Wind",
           "Marine Thunderstorm Wind", "Rip Current", "Seiche",
           "Sleet", "Storm Surge/Tide", "Strong Wind", 
           "Thunderstorm Wind", "Tornado", "Tropical Depression", 
           "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout",
           "Wildfire", "Winter Storm", "Winter Weather")

# add a column with the 48 official storm types to the Data

DataRelevant$OffEvent = vector(mode = "character", 
                               length = dim(DataRelevant)[1])
for (i in EventsSearch){
        DataRelevant$OffEvent[grep(i, DataRelevant$EVTYPE,
                                   ignore.case = TRUE)] = i
}

# Events that could not be assigned to the official events were 
# removed

DataRelevant <- filter(DataRelevant, OffEvent != "")

The adjusted data still contained 188352 events that belonged to 45 of the official 48 types.

I only kept the relevant columns of the data.

DataRelevant <- select(DataRelevant, OffEvent, FATALITIES, INJURIES,
                       PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Adding actual property and crops cost columns to the data

The data contains columns that specify the costs that were caused by damage of property (PROPDMG) or crops (CROPDMG). The units of the costs are given in two separate columns, PROPDMGEXP and CROPDMGEXP, respectively. A K stands for Thousand, a M for million, and a B for billion dollars. However, a look in the unit colums shows that also other “values” appear.

table(DataRelevant$PROPDMGEXP)
## 
##      +      -      0      1      2      3      4      5      6      7 
##      5      1    208      0      1      1      4     18      3      3 
##      8      ?      B      H      K      M      h      m 
##      0      0     36      6 169246  10574      1      7
table(DataRelevant$CROPDMGEXP)
## 
##     0     2     ?     B     K     M     k     m 
##    17     0     6     6 94237  1772    20     1

For the purpose of this analysis I set the few damage values without correct unit to NA. In the analysis of the damage, these events will be ignored.

So I added columns with the values of the damage in US Dollar to receive the final Data.frame DataRelevant that was used for the subsequent analyses.

DataRelevant$PROPDMGEXP <- as.character(DataRelevant$PROPDMGEXP)
DataRelevant$CROPDMGEXP <- as.character(DataRelevant$CROPDMGEXP)

DataK <- filter(DataRelevant, PROPDMGEXP == "K")
DataK <- mutate(DataK, Prop_Damage = PROPDMG*1000)
DataM <- filter(DataRelevant, PROPDMGEXP == "M" | PROPDMGEXP == "m")
DataM <- mutate(DataM, Prop_Damage = PROPDMG*1000000)
DataB <- filter(DataRelevant, PROPDMGEXP == "B")
DataB <- mutate(DataB, Prop_Damage = PROPDMG*1000000000)
DataOther <- filter(DataRelevant, PROPDMGEXP != "K", PROPDMGEXP != "M",
                 PROPDMGEXP != "m", PROPDMGEXP != "B")
DataOther <- mutate(DataOther, Prop_Damage = NA)
DataNA <- filter(DataRelevant, is.na(PROPDMGEXP))
DataNA <- mutate(DataNA, Prop_Damage = NA)

DataRelevant <- rbind(DataK, DataM, DataB, DataOther, DataNA) 
rm(DataK, DataB, DataM, DataOther, DataNA)

# same procedure for the crops damage cost

DataK <- filter(DataRelevant, CROPDMGEXP == "K" | CROPDMGEXP == "k")
DataK <- mutate(DataK, Crop_Damage = CROPDMG*1000)
DataM <- filter(DataRelevant, CROPDMGEXP == "M" | CROPDMGEXP == "m")
DataM <- mutate(DataM, Crop_Damage = CROPDMG*1000000)
DataB <- filter(DataRelevant, CROPDMGEXP == "B")
DataB <- mutate(DataB, Crop_Damage = CROPDMG*1000000000)
DataOther <- filter(DataRelevant, CROPDMGEXP != "K", CROPDMGEXP != "M",
                    CROPDMGEXP != "m", CROPDMGEXP != "B", 
                    CROPDMGEXP != "k")
DataOther <- mutate(DataOther, Crop_Damage = NA)
DataNA <- filter(DataRelevant, is.na(CROPDMGEXP))
DataNA <- mutate(DataNA, Crop_Damage = NA)

DataRelevant <- rbind(DataK, DataM, DataB, DataOther, DataNA)
rm(DataK, DataB, DataM, DataOther, DataNA)

Results

Analysis of storm events that caused the most injuries and fatalities

For all of the events classified into the 48 different categories, I calculated the mean and the sum of injuries and fatalities.

Group_Data <- group_by(DataRelevant, OffEvent)

SummaryPersonDamageSum <- summarise(Group_Data, 
                                 Sum_Injuries = sum(INJURIES),
                                 Sum_Fatalities = sum(FATALITIES))

SummaryPersonDamageMean <- summarise(Group_Data, 
                                     Mean_Injuries = mean(INJURIES),
                                     Mean_Fatalities = mean(FATALITIES))

SummaryPersonDamageSum <- arrange(SummaryPersonDamageSum, 
                                  desc(Sum_Injuries), 
                                  desc(Sum_Fatalities))

SummaryPersonDamageMean <- arrange(SummaryPersonDamageMean, 
                                  desc(Mean_Injuries), 
                                  desc(Mean_Fatalities))

The results are shown in the following two tables, that were ordered in decending order after injuries and then fatalities. First the Table summarising the sum of all Injuries and Fatalities caused by all events of the different types from 1950 - 2011 in the USA

        table_sum <- xtable(SummaryPersonDamageSum, digits = 0)
        print(table_sum, type = 'html')
OffEvent Sum_Injuries Sum_Fatalities
1 Tornado 91364 5658
2 Flood 6795 483
3 Excessive Heat 6525 1922
4 Lightning 5232 817
5 Heat 2699 1216
6 Thunderstorm Wind 2428 209
7 Ice Storm 1992 89
8 Flash Flood 1800 1035
9 High Wind 1507 297
10 Hail 1466 20
11 Winter Storm 1353 217
12 Hurricane|Typhoon 1333 133
13 Heavy Snow 1034 127
14 Wildfire 911 75
15 Blizzard 805 101
16 Winter Weather 538 61
17 Rip Current 529 577
18 Dust Storm 440 22
19 Tropical Storm 383 66
20 Dense Fog 342 18
21 Strong Wind 323 125
22 Heavy Rain 255 99
23 High Surf 204 146
24 Avalanche 170 224
25 Tsunami 129 33
26 Waterspout 72 6
27 Dust Devil 43 2
28 Extreme Cold/Wind Chill 24 125
29 Cold/Wind Chill 12 95
30 Coastal Flood 7 6
31 Storm Surge/Tide 5 11
32 Drought 4 0
33 Funnel Cloud 3 0
34 Marine High Wind 1 1
35 Sleet 0 2
36 Dense Smoke 0 0
37 Freezing Fog 0 0
38 Frost/Freeze 0 0
39 Lake-Effect Snow 0 0
40 Lakeshore Flood 0 0
41 Low Tide 0 0
42 Marine Hail 0 0
43 Seiche 0 0
44 Tropical Depression 0 0
45 Volcanic Ash 0 0

Second the Table showing the mean numbers of Injuries and Fatalities caused by the events of the different types from 1950 - 2011

        table_mean <- xtable(SummaryPersonDamageMean, digits = 2)
        print(table_mean, type = 'html')
OffEvent Mean_Injuries Mean_Fatalities
1 Heat 9.96 4.49
2 Tsunami 9.21 2.36
3 Excessive Heat 9.20 2.71
4 Hurricane|Typhoon 5.75 0.57
5 Dense Fog 4.62 0.24
6 Dust Storm 4.27 0.21
7 Blizzard 3.16 0.40
8 Ice Storm 2.79 0.12
9 Tornado 2.29 0.14
10 Waterspout 1.12 0.09
11 High Surf 1.09 0.78
12 Wildfire 1.06 0.09
13 Winter Weather 0.98 0.11
14 Tropical Storm 0.91 0.16
15 Winter Storm 0.89 0.14
16 Rip Current 0.82 0.90
17 Heavy Snow 0.74 0.09
18 Flood 0.64 0.05
19 Avalanche 0.63 0.84
20 Dust Devil 0.45 0.02
21 Lightning 0.39 0.06
22 High Wind 0.24 0.05
23 Funnel Cloud 0.23 0.00
24 Heavy Rain 0.22 0.09
25 Extreme Cold/Wind Chill 0.22 1.13
26 Cold/Wind Chill 0.13 1.06
27 Storm Surge/Tide 0.11 0.23
28 Strong Wind 0.09 0.04
29 Flash Flood 0.08 0.05
30 Hail 0.06 0.00
31 Marine High Wind 0.05 0.05
32 Thunderstorm Wind 0.04 0.00
33 Coastal Flood 0.03 0.03
34 Drought 0.02 0.00
35 Sleet 0.00 0.20
36 Dense Smoke 0.00 0.00
37 Freezing Fog 0.00 0.00
38 Frost/Freeze 0.00 0.00
39 Lake-Effect Snow 0.00 0.00
40 Lakeshore Flood 0.00 0.00
41 Low Tide 0.00 0.00
42 Marine Hail 0.00 0.00
43 Seiche 0.00 0.00
44 Tropical Depression 0.00 0.00
45 Volcanic Ash 0.00 0.00

Analysis of storm events that caused the most economic damage

I calculated the mean and the sum of the property and crop damages caused by the events of the different event types, and ordered the results after the total damage (property plus crop) with the event types causing the highest damage on top.

SummaryCostSum <- summarise(Group_Data, 
                            Sum_Property_Damage = sum(Prop_Damage, na.rm = TRUE),
                            Sum_Crop_Damage = sum(Crop_Damage, na.rm = TRUE))
                                
SummaryCostSum <- mutate(SummaryCostSum, Total = Sum_Property_Damage +
                                 Sum_Crop_Damage)

SummaryCostSum <- arrange(SummaryCostSum, desc(Total),
                          desc(Sum_Property_Damage))

SummaryCostMean <- summarise(Group_Data, 
                            Mean_Property_Damage = mean(Prop_Damage, na.rm = TRUE),
                            Mean_Crop_Damage = mean(Crop_Damage, na.rm = TRUE))

SummaryCostMean <- mutate(SummaryCostMean, Total = Mean_Property_Damage +
                                  Mean_Crop_Damage)

SummaryCostMean <- arrange(SummaryCostMean, desc(Total),
                           desc(Mean_Property_Damage),
                           desc(Mean_Crop_Damage))

### For the Plots the Data was rearranged and the levels of the OffEvent factor ordered based on the Total damage
names(SummaryCostSum)[2:3] = c("Property", "Crop")

Plot_SummaryCostSum <- SummaryCostSum[1:15,]


Plot_SummaryCostSum <- gather(Plot_SummaryCostSum, Type, Damage,
                              -OffEvent, -Total)
Plot_SummaryCostSum <- arrange(Plot_SummaryCostSum, 
                               desc(Total))
Plot_SummaryCostSum$Damage <- Plot_SummaryCostSum$Damage/10^9

Plot_SummaryCostSum$OffEvent <- as.factor(Plot_SummaryCostSum$OffEvent)

# Reorder the levels
Plot_SummaryCostSum$OffEvent <-
        reorder(Plot_SummaryCostSum$OffEvent,-Plot_SummaryCostSum$Total)

### same procedure for the mean data

names(SummaryCostMean)[2:3] = c("Property", "Crop")

Plot_SummaryCostMean <- SummaryCostMean[1:15,]

Plot_SummaryCostMean <- gather(Plot_SummaryCostMean, Type, Damage,
                               -OffEvent, -Total)
Plot_SummaryCostMean <- arrange(Plot_SummaryCostMean, 
                                desc(Total))
Plot_SummaryCostMean$Damage <- Plot_SummaryCostMean$Damage/1000000

Plot_SummaryCostMean$OffEvent <- as.factor(Plot_SummaryCostMean$OffEvent)

Plot_SummaryCostMean$OffEvent <-
        reorder(Plot_SummaryCostMean$OffEvent,-Plot_SummaryCostMean$Total)
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", 
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
TrS <- ggplot(Plot_SummaryCostSum, 
              aes(x = OffEvent, y = Damage, fill = Type)) +
        geom_bar(stat = "identity", position = position_dodge(),
               alpha = 3/4) +
        ylab("Damage in Billion US Dollar") +
        xlab("NOAA Storm Event Type") +
        ggtitle("Total Damage (1950 - 2011)") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        scale_fill_manual(values=cbPalette[2:3]) +
        theme(legend.position="top")

TrS2 <- TrS + 
        coord_cartesian(ylim=c(0, 20))

grid.arrange(TrS, TrS2, ncol = 2)

Figure Caption: The 15 storm event types whose recorded events caused the highest total damage of property and crop in the USA from 1950 - 2011 are shown. The figure on the right is the same as on the left just with an adjusted y-axis scale.

TrM <- ggplot(Plot_SummaryCostMean, 
              aes(x = OffEvent, y = Damage, fill = Type)) +
        geom_bar(stat = "identity", position = position_dodge(),
                 alpha = 3/4) +
        ylab("Damage in Million US Dollar") +
        xlab("NOAA Storm Event Type") +
        ggtitle("Mean Damage Caused by Events (1950 - 2011)") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
        scale_fill_manual(values=cbPalette[2:3])
        #theme(legend.position="top")

TrM

Figure showing the total damage caused by the recorded storm events of the different types in the USA between 1950-2011

Figure Caption: The 15 storm event types whose recorded events caused the highest average damage of property and crop in the USA from 1950 - 2011.