Synopsis

This report takes NOAAs meteorological events list and attempt to draw health and economic generalities from it. I will take all data, include only those that have fatalities, injuries or economic damage via property or crops and then, for comparison, only include data from 1994 onward because it appears that is when more category types were included. Cleaning of the event names is essential to the categorical analysis and will entail taking NOAAs official name list and coercing most of the data points into those categories and pruning high value categories to a more general one. Once the events are more generalised, we’ll compare the top events by year to see trends in the most destructive types then rank them individually for fatalities, injures then economic value. I’ll then cross reference the top 20 of those lists to see what shows up between both fatalities and injuries as well as what events cause the worst effects in all three realms.

Data Processing

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(stringdist)
## Warning: package 'stringdist' was built under R version 3.2.3
library(lubridate)

Read in the Data

#Storm Data Downloaded Sunday, Jan 24 at 08:24 AM PST
StormDataFull <- tbl_df(read.csv(bzfile('repdata-data-StormData.csv.bz2')))

Start Processing

Getting Familiar with the Data

StormData <- select(StormDataFull, Begin = BGN_DATE, Event = EVTYPE, FATALITIES:CROPDMGEXP)
EventTable <- tbl_df(as.data.frame(with(StormData, table(Event)))) %>% arrange(desc(Freq))

Looks like Hail, TSTM Wind, Thunderstorm Wind, Tornado, Flash Floods, Floods, Thunderstorm Winds, High Wind, Lightning and Heavy Snow are our top ten number of entries. Also, Thunderstorm Wind, TSTM Wind and Thunderstorm Winds should be the same category. There may be other categories that are similar to each other so let’s take a look:

#conducted name similarity to see how the names were grouping
#NameListLong <- EventTable$Event
#agrep(NameListLong[1], NameListLong, ignore.case = TRUE, value = TRUE, useBytes = FALSE)
#agrep(NameListLong[2], NameListLong, ignore.case = TRUE, value = TRUE, useBytes = FALSE)
#agrep(NameListLong[3], NameListLong, ignore.case = TRUE, value = TRUE, useBytes = FALSE)
#grep(NameListLong[4], NameListLong, ignore.case = TRUE, value = TRUE, useBytes = FALSE)
#agrep(NameListLong[5], NameListLong, ignore.case = TRUE, value = TRUE, useBytes = FALSE)
#commented out for brevity of report

This was only the first top five categories and looks like a lot of the categories are duplicated as well and events can be in multiple categories with some modifiers. Later, I’d like to generalize these so that smaller categories get lumped with the main ones. But first, let’s look at economic and health effects at this stage.

Modify Damage amounts based on exponent

Exponents <- unique(c(levels(StormData$PROPDMGEXP), levels(StormData$CROPDMGEXP)))
Expvalues <- as.numeric(c(1, 1, 1, 1, 1, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7, 10^9, 10^9, 10^2, 10^2, 10^3, 10^6, 10^6, 10^3))
#create a dictionary of values and mapped values
Modifier <- data.frame(DMGEXP = Exponents, Values = Expvalues, stringsAsFactors = TRUE)
#Use the dictionary to change the original exp values for property and crops to numerical values
StormData <- left_join(StormData, Modifier, by = c("PROPDMGEXP"="DMGEXP")) %>% rename(PROPEXP = Values)
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
StormData <- left_join(StormData, Modifier, by = c("CROPDMGEXP"="DMGEXP")) %>% rename(CROPEXP = Values)
## Warning in left_join_impl(x, y, by$x, by$y): joining factors with different
## levels, coercing to character vector
#Use the values provided and exp values create total values for property and crop
#Then add those two together to get total economic impact
StormData <- mutate(StormData, PROPVALUE = PROPDMG * PROPEXP) %>% 
    mutate(CROPVALUE = CROPDMG * CROPEXP) %>%
    mutate(TOTALVALUE = PROPVALUE + CROPVALUE) %>% 
    select(Begin, Event, Fatalities = FATALITIES, Injuries = INJURIES, TotalValue = TOTALVALUE) %>%
    filter(Fatalities > 0 | Injuries > 0 | TotalValue > 0)

Now to do a little pre-processing before we start lumping categories together, let’s entirely eliminate the ones that simply don’t matter to the statistics. For this, we’ll group by event type and only keep those in which there was a fatality, injury, or some form of property damage.

#Translate all Events to uppercase as well to prevent duplicates by case
StormData$Event <- toupper(StormData$Event)
StormData$Event <- gsub('TSTMW', 'THUNDERSTORM WIND', x = StormData$Event)
StormData$Event <- as.factor(gsub('TSTM', 'THUNDERSTORM', x = StormData$Event))
NameList <- levels(as.factor(as.character(StormData$Event))) #remove uneeded factor levels

Right off the bat we’ve gone from 985 categories to 441 categories. Now this is still ten times larger than the official list given in the NOAA data documentation but it’s a start.

ValidNames <- as.factor(c('ASTRONOMICAL LOW TIDE', 'AVALANCHE', 'BLIZZARD', 'COASTAL FLOOD', 'COLD/WIND CHILL', 'DEBRIS FLOW', 'DENSE FOG', 'DENSE SMOKE', 'DROUGHT', 'DUST DEVIL', 'DUST STORM', 'EXCESSIVE HEAT', 'EXTREME COLD/WIND CHILL', 'FLASH FLOOD', 'FLOOD', 'FREEZING FOG', 'FROST/FREEZE', 'FUNNEL CLOUD', 'HAIL', 'HEAT', 'HEAVY RAIN', 'HEAVY SNOW', 'HIGH SURF', 'HIGH WIND', 'HURRICANE/TYPHOON', 'ICE STORM', 'LAKESHORE FLOOD', 'LAKE-EFFECT SNOW', 'LIGHTNING', 'MARINE HAIL', 'MARINE HIGH WIND', 'MARINE STRONG WIND', 'MARINE THUNDERSTORM WIND', 'RIP CURRENT', 'SEICHE', 'SLEET', 'STORM TIDE', 'STRONG WIND', 'THUNDERSTORM WIND', 'TORNADO', 'TROPICAL DEPRESSION', 'TROPICAL STORM', 'TSUNAMI', 'VOLCANIC ASH', 'WATERSPOUT', 'WILDFIRE', 'WINDSTORM', 'WINTER STORM', 'WINTER WEATHER'))

DistanceMatrix <- adist(NameList, ValidNames)
mins<-apply(DistanceMatrix,1,function(x)return(array(which.min(x))))
minvalues <- apply(DistanceMatrix, 1, function(x)return(array(min(x))))

#Create a dictionary with old Event names and start by mapping to a Valid Name that's only a character or two different 
#Except for HAIL because it would turn into RAIN
Mapped <- data.frame(Original = NameList, New = ValidNames[mins], Distance = minvalues)
Mapped <- mutate(Mapped, FinalEvent = ifelse( Mapped$Distance == 1 | (Mapped$Distance==2 & Mapped$Original!='HAIL'), as.character(Mapped$New), as.character(Mapped$Original )))
#Now, based on data exploration, start to substitute larger category names for smaller ones
Mapped$FinalEvent <- gsub('SML', 'SMALL', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('FLD', 'FLOOD', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('CSTL', 'COASTAL', x = Mapped$FinalEvent) 
#Change abbreviations
Mapped$FinalEvent <- gsub(' +', ' ', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('/', ' ', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub(' AND ', ' ', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub(' & ', ' ', x = Mapped$FinalEvent) 
#Change tiny characters
Mapped$FinalEvent <- gsub('^ +', '', x = Mapped$FinalEvent) 
#get rid of any spaces at the start of the event name
Mapped$FinalEvent <- gsub('^THUNDERSTORM WIND.*', 'THUNDERSTORM WIND', x = Mapped$FinalEvent) 
#Anything starting with thunderstorm wind will be only that
Mapped$FinalEvent <- gsub('^TORNADO.*', 'TORNADO', x = Mapped$FinalEvent)
#Same for Tornado
Mapped$FinalEvent <- gsub('.*WATERSPOUT.*', 'WATERSPOUT', x = Mapped$FinalEvent) 
#anything with waterspout is just waterspout
Mapped$FinalEvent <- gsub('FLOODING', 'FLOOD', x = Mapped$FinalEvent)
Mapped$FinalEvent <- gsub('.*FLASH FLOOD.*', 'FLASH FLOOD', x = Mapped$FinalEvent)
#Flood and flooding are synonomous
Mapped$FinalEvent <- gsub('^HURRICANE.*', 'HURRICANE/TYPHOON', x = Mapped$FinalEvent) 
#Hurricanes all together
Mapped$FinalEvent <- gsub('.*TYPHOON.*', 'HURRICANE/TYPHOON', x = Mapped$FinalEvent)  
#A typhoon is just another name for a Hurricane and not officially recognized in the list of valid names
Mapped$FinalEvent <- gsub('.*TROPICAL STORM.*', 'TROPICAL STORM', x = Mapped$FinalEvent)  
#Group all tropical storms together
Mapped$FinalEvent <- gsub('.*BLIZZARD.*', 'BLIZZARD', x = Mapped$FinalEvent) 
#Chose to group anything with Blizzard as BLIZZARD although there is one category with Heavy Snow/Blizzard/Avalanche - it appears only once in 1994 
#Avalanche is not a very frequent event and this avalanche was probably caused by the blizzard anyways. 
Mapped$FinalEvent <- gsub('.*WILD.*', 'WILDFIRE', x = Mapped$FinalEvent) 
#Wildfires in all forms
Mapped$FinalEvent <- gsub('.*SURGE.*', 'STORM TIDE', x = Mapped$FinalEvent) 
#storm surge/tide in their forms are officially called with storm tide 
Mapped$FinalEvent <- gsub('.*MARINE HAIL.*', 'abcdefg', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('.*HAIL.*', 'HAIL', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('.*abcdefg.*', 'MARINE HAIL', x = Mapped$FinalEvent) 
#anything with Hail other than Marine Hail is Hail
Mapped$FinalEvent <- gsub('.*ICE STORM.*', 'ICE STORM', x = Mapped$FinalEvent) 
#Anything else with ICE STORM is infrequent so lump them together
Mapped$FinalEvent <- gsub('.*ROAD.*', 'ICE STORM', x = Mapped$FinalEvent) 
#Anything with Road has to do with icy road so lump with ice storm
Mapped$FinalEvent <- gsub('^GUSTY.*', 'HIGH WIND', x = Mapped$FinalEvent) 
#Gusty wind to high winds
Mapped$FinalEvent <- gsub('.*SLIDE.*', 'DEBRIS FLOW', x = Mapped$FinalEvent) 
#Isolate landslides and mudslides
Mapped$FinalEvent <- gsub('.*URBAN.*', 'FLOOD', x = Mapped$FinalEvent) 
#Change all the urban flooding to FLOOD
Mapped$FinalEvent <- gsub('.*FLASH.*', 'FLASH FLOOD', x = Mapped$FinalEvent) 
#All remaining categories with flash in them are flash flood
Mapped$FinalEvent <- gsub('.*COASTAL FLOOD.*', 'COASTAL FLOOD', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('.*TIDAL FLOOD.*', 'COASTAL FLOOD', x = Mapped$FinalEvent) 
#Anything with coastal or tidal flood get mapped to coastal flood
Mapped$FinalEvent <- gsub('.*LAKE FLOOD.*', 'LAKESHORE FLOOD', x = Mapped$FinalEvent) 
#Rename lake flood with lakeshore flood from official names
Mapped$FinalEvent <- gsub('.*COASTAL FLOOD.*', 'XX1', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('.*FLASH FLOOD.*', 'XX2', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('.*LAKESHORE FLOOD.*', 'XX3', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('.*FLOOD.*', 'FLOOD', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('XX1', 'COASTAL FLOOD', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('XX2', 'FLASH FLOOD', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('XX3', 'LAKESHORE FLOOD', x = Mapped$FinalEvent)
#Any weirdly named flooding that isn't a known name will get grouped with generic flooding
Mapped$FinalEvent <- gsub('.*LIGHTNING.*', 'LIGHTNING', x = Mapped$FinalEvent) 
#All lightning is lightning
Mapped$FinalEvent <- gsub('.*HIGH WINDS.*', 'HIGH WINDS', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('^HIGH WIND.*', 'HIGH WIND', x = Mapped$FinalEvent) 
#Any high winds are just high winds without taking out marine high wind
Mapped$FinalEvent <- gsub('.*HEAVY RAIN.*', 'HEAVY RAIN', x = Mapped$FinalEvent) 
#All remaining Heavy Rain are Heavy rain
Mapped$FinalEvent <- gsub('.+HEAT$', 'EXCESSIVE HEAT', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('^HEAT.*', 'HEAT', x = Mapped$FinalEvent) 
#Anything that ended with heat was record or extreme heat, if it started with heat, it was a heat wave
Mapped$FinalEvent <- gsub('.*SURF.*', 'HIGH SURF', x = Mapped$FinalEvent) 
#Group anything with surf into high surf
Mapped$FinalEvent <- gsub('.*EXTREME COLD.*', 'EXTREME COLD', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub(' WIND CHILL$', '', x = Mapped$FinalEvent) 
Mapped$FinalEvent <- gsub('^EXTREME$', 'EXTREME COLD', x = Mapped$FinalEvent) 
#Gets rid of all Extreme cold and wind chill

MappedDict <- select(Mapped, Original, FinalEvent)
MappedDict$FinalEvent <- as.factor(MappedDict$FinalEvent)
NameList <- levels(as.factor(MappedDict$FinalEvent))

#select just the final values for analysis and rename finalevent as event
StormData <- left_join(StormData, MappedDict, by = c("Event"="Original")) %>%
    select(Begin, Event = FinalEvent, Fatalities:TotalValue)
StormData$Event <- as.factor(StormData$Event)

Now we need to extract the year each event occurred

StormData$Begin <- parse_date_time(StormData$Begin, "%m%d%Y %H%M%S")
StormData <- mutate(StormData, Year = as.numeric(format(Begin,'%Y')))

Results

SummaryStatistics <- group_by(StormData, Year, Event) %>% 
    summarise(Freq = n(), Fatalities = sum(Fatalities), Injuries = sum(Injuries), TotalBillions = sum(TotalValue)/1000000000) %>%
    filter(Fatalities >0 | Injuries >0 | TotalBillions > 0) %>%
    group_by(Year) %>% 
    mutate( ValueRank = rank(-TotalBillions)) %>%
    mutate( FatalRank = rank(-Fatalities)) %>%
    mutate( InjuryRank = rank(-Injuries)) %>% 
    arrange(Year, desc(TotalBillions), desc(Fatalities), desc(Injuries)) 
#year that more events are in so better data to compare each to
YearStart <- 1993

From splitting the data, you start to lose granularity if more than 15 or 20 event types are included in the analysis, I will be looking at the top 20 events by Fatalities, Injuries and Total Cost in damage.

RankNumber <- 6
#Keep only those years where the year is comparable to each other
SumStatsValid <- SummaryStatistics[SummaryStatistics$Year > YearStart,]
#Rank by fatalities

RankNumber <- 6
SumStatsValid <- arrange(SumStatsValid, Year, desc(Fatalities), desc(Injuries), desc(TotalBillions)) 

ggplot(SumStatsValid[SumStatsValid$FatalRank<RankNumber, ], aes(Year, Fatalities, fill = Event)) + geom_bar(stat="identity", position = "stack")

Here we can see the devastating effect of heat wave in chicago of 1995 Heat and tornados are some of the highest events that consistently cause devastation We can also see along the bottom how Excessive Heat, Heat and Tornados, rip currents and lightning all group towards teh bottom and are the worst.

#Rank by Injuries

RankNumber <- 6
SumStatsValid <- arrange(SumStatsValid, Year, desc(Injuries), desc(TotalBillions), desc(Fatalities)) 

ggplot(SumStatsValid[SumStatsValid$InjuryRank<RankNumber, ], aes(Year, Injuries, fill = Event)) + geom_bar(stat="identity", position = "stack")

We can see the biggest effects of Tornados in the injury ranking they’re also evident in the fatal Also of note are the number of injuries in 1998 from Hurricane Madeline in Oct of 1998 causing widespread flooding and injuries

RankNumber <-6
SumStatsValid <- arrange(SumStatsValid, Year, desc(TotalBillions), desc(Fatalities), desc(Injuries)) 

ggplot(SumStatsValid[SumStatsValid$ValueRank<RankNumber,], aes(Year, TotalBillions, fill = Event))+geom_bar(stat="identity", position = "stack")

Here the huge effect of Hurricane Katrina and its followup flooding causing record breaking damage. Hurricanes are commonly the biggest cause of damage along with tornadoes and flooding.

TotalEffect <- SumStatsValid %>%
    group_by(Event) %>%
    summarise(Freq = n(), Fatal = sum(Fatalities), Injured = sum(Injuries), Billions = sum(TotalBillions)) %>%
    arrange(desc(Fatal), desc(Injured), desc(Billions)) %>%
    mutate( FatalRank = rank(-Fatal)) %>%
    mutate( InjuryRank = rank(-Injured)) %>%
    mutate( ValueRank = rank(-Billions))

I don’t think there’s a good way to weight injuries vs. fatalities, no number of injuries could ever compare to a death so I will rank them seperately.

Events in the top 20 by Fatalities:

TotalEffect[TotalEffect$FatalRank<21,] %>% arrange(desc(Fatal)) %>% select(Event, Fatalities = Fatal)
## Source: local data frame [20 x 2]
## 
##                Event Fatalities
##               (fctr)      (dbl)
## 1     EXCESSIVE HEAT       2020
## 2            TORNADO       1593
## 3               HEAT       1111
## 4        FLASH FLOOD       1007
## 5          LIGHTNING        795
## 6        RIP CURRENT        572
## 7              FLOOD        487
## 8  THUNDERSTORM WIND        434
## 9       EXTREME COLD        277
## 10         HIGH WIND        263
## 11         AVALANCHE        224
## 12      WINTER STORM        195
## 13         HIGH SURF        165
## 14 HURRICANE/TYPHOON        135
## 15              COLD        127
## 16        HEAVY SNOW        123
## 17       STRONG WIND        110
## 18        HEAVY RAIN         98
## 19         ICE STORM         92
## 20          WILDFIRE         87

Events in the top 20 by Injuries

TotalEffect[TotalEffect$InjuryRank<21,]%>% arrange(desc(Injured)) %>% select(Event, Injured)
## Source: local data frame [20 x 2]
## 
##                Event Injured
##               (fctr)   (dbl)
## 1            TORNADO   22589
## 2              FLOOD    6861
## 3     EXCESSIVE HEAT    6730
## 4  THUNDERSTORM WIND    5949
## 5          LIGHTNING    5118
## 6               HEAT    2489
## 7          ICE STORM    2018
## 8        FLASH FLOOD    1772
## 9           WILDFIRE    1456
## 10 HURRICANE/TYPHOON    1332
## 11         HIGH WIND    1320
## 12      WINTER STORM    1298
## 13        HEAVY SNOW     980
## 14              HAIL     953
## 15               FOG     734
## 16       RIP CURRENT     529
## 17        DUST STORM     439
## 18    WINTER WEATHER     398
## 19          BLIZZARD     390
## 20    TROPICAL STORM     383

Events in the top 20 of both

TotalEffect[TotalEffect$FatalRank<21 & TotalEffect$InjuryRank<21,] %>% select(Event, Fatalities = Fatal, Injured)
## Source: local data frame [14 x 3]
## 
##                Event Fatalities Injured
##               (fctr)      (dbl)   (dbl)
## 1     EXCESSIVE HEAT       2020    6730
## 2            TORNADO       1593   22589
## 3               HEAT       1111    2489
## 4        FLASH FLOOD       1007    1772
## 5          LIGHTNING        795    5118
## 6        RIP CURRENT        572     529
## 7              FLOOD        487    6861
## 8  THUNDERSTORM WIND        434    5949
## 9          HIGH WIND        263    1320
## 10      WINTER STORM        195    1298
## 11 HURRICANE/TYPHOON        135    1332
## 12        HEAVY SNOW        123     980
## 13         ICE STORM         92    2018
## 14          WILDFIRE         87    1456

All these events need to be treated with an appropriate amount of respect because they cause an incredibly amount of hurt. Events in the top 20 by Damage

TotalEffect[TotalEffect$ValueRank<21,] %>% arrange(desc(Billions)) %>% select(Event, Billions)
## Source: local data frame [20 x 2]
## 
##                  Event    Billions
##                 (fctr)       (dbl)
## 1                FLOOD 150.2574504
## 2    HURRICANE/TYPHOON  90.8165278
## 3           STORM TIDE  47.8355790
## 4              TORNADO  25.9971928
## 5          FLASH FLOOD  18.6725961
## 6                 HAIL  18.5850748
## 7              DROUGHT  14.9681720
## 8    THUNDERSTORM WIND  10.7469470
## 9            ICE STORM   8.8553446
## 10      TROPICAL STORM   8.4082766
## 11            WILDFIRE   8.2753451
## 12           HIGH WIND   6.1467907
## 13          HEAVY RAIN   4.0041309
## 14        WINTER STORM   1.5720413
## 15        EXTREME COLD   1.3889084
## 16 SEVERE THUNDERSTORM   1.2055600
## 17        FROST FREEZE   1.1046660
## 18           LIGHTNING   0.8955079
## 19          HEAVY SNOW   0.8675707
## 20            BLIZZARD   0.5392189

Flooding, Hurricanes and Tornadoes cause a huge amount of damage.

Evemts that are in all 3 top 20:

    TotalEffect[TotalEffect$FatalRank<21 & TotalEffect$InjuryRank<21 & TotalEffect$ValueRank<21,] %>% select(Event, Fatalities = Fatal, Injured, Billions)
## Source: local data frame [11 x 4]
## 
##                Event Fatalities Injured    Billions
##               (fctr)      (dbl)   (dbl)       (dbl)
## 1            TORNADO       1593   22589  25.9971928
## 2        FLASH FLOOD       1007    1772  18.6725961
## 3          LIGHTNING        795    5118   0.8955079
## 4              FLOOD        487    6861 150.2574504
## 5  THUNDERSTORM WIND        434    5949  10.7469470
## 6          HIGH WIND        263    1320   6.1467907
## 7       WINTER STORM        195    1298   1.5720413
## 8  HURRICANE/TYPHOON        135    1332  90.8165278
## 9         HEAVY SNOW        123     980   0.8675707
## 10         ICE STORM         92    2018   8.8553446
## 11          WILDFIRE         87    1456   8.2753451

Tornados, (Flash) Floods, Lightning, Thunderstorms and their winds, Winter storms in all varieties and Hurricanes are overall the worst events that can happen right now.