Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The data comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb] https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

National Climatic Data Center Storm Events FAQ https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Let’s get started by loading the appropriate packages.

Data Processing

setwd("C:/Users/johnarmistead/Data Sci Coursera/Class5/StormData")
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(scales)
library(reshape)
## 
## Attaching package: 'reshape'
## The following object is masked from 'package:lubridate':
## 
##     stamp
## The following object is masked from 'package:dplyr':
## 
##     rename

Next let’s load and take a look at the data. It’s a pretty big file, so we create a “bare” file that we don’t have to load more than once. The “raw” file will get munged.

bare <- as.data.frame(read.csv("repdata%2Fdata%2FStormData.csv.bz2", header=TRUE, sep=","))
raw <- bare

str(raw)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
head(raw)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

OK, it looks like we need to clean up the dates first. We won’t actually need these factors since we’re looking at “all time” stats. Nevertheless, it might be helpful later.

raw$BGN_DATE <- mdy_hms(raw$BGN_DATE) #convert to date-readable format
raw$YEAR <- year(raw$BGN_DATE) #add column for year in case of further analysis later
head(raw)
##   STATE__   BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1 1950-04-18     0130       CST     97     MOBILE    AL TORNADO
## 2       1 1950-04-18     0145       CST      3    BALDWIN    AL TORNADO
## 3       1 1951-02-20     1600       CST     57    FAYETTE    AL TORNADO
## 4       1 1951-06-08     0900       CST     89    MADISON    AL TORNADO
## 5       1 1951-11-15     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 1951-11-15     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES
## 1         0                      14.0   100 3   0          0       15
## 2         0                       2.0   150 2   0          0        0
## 3         0                       0.1   123 2   0          0        2
## 4         0                       0.0   100 2   0          0        2
## 5         0                       0.0   150 2   0          0        2
## 6         0                       1.5   177 2   0          0        6
##   PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE
## 1    25.0          K       0                                         3040
## 2     2.5          K       0                                         3042
## 3    25.0          K       0                                         3340
## 4     2.5          K       0                                         3458
## 5     2.5          K       0                                         3412
## 6     2.5          K       0                                         3450
##   LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM YEAR
## 1      8812       3051       8806              1 1950
## 2      8755          0          0              2 1950
## 3      8742          0          0              3 1951
## 4      8626          0          0              4 1951
## 5      8642          0          0              5 1951
## 6      8748          0          0              6 1951

Now we need to get the Property and Crop Damages values in a consistent format. It looks like K indicates a thousand, M a million, and B a billion. We create additional columns and multiply the original column by the 10x factor.

raw$PROPFACTOR <- 0
raw$PROPFACTOR[raw$PROPDMGEXP %in% c("k", "K")] <- 1000
raw$PROPFACTOR[raw$PROPDMGEXP %in% c("m", "M")] <- 1000000
raw$PROPFACTOR[raw$PROPDMGEXP %in% c("b", "B")] <- 1000000000
raw$PROPDMG <- raw$PROPDMG*raw$PROPFACTOR #multiply original variable by Exp factor

raw$CROPFACTOR <- 0
raw$CROPFACTOR[raw$CROPDMGEXP %in% c("k", "K")] <- 1000
raw$CROPFACTOR[raw$CROPDMGEXP %in% c("m", "M")] <- 1000000
raw$CROPFACTOR[raw$CROPDMGEXP %in% c("b", "B")] <- 1000000000
raw$CROPDMG <- raw$CROPDMG*raw$CROPFACTOR #multiply original variable by Exp factor

Now let’s take a look at a sample.

#preliminary analysis
data <-raw %>% 
        group_by(EVTYPE) %>%
        summarize(n=n()) %>%  #sum everything for a quick glance at the data
        arrange(desc(n)) %>%
        print(data)
## Source: local data frame [985 x 2]
## 
##                EVTYPE      n
##                (fctr)  (int)
## 1                HAIL 288661
## 2           TSTM WIND 219940
## 3   THUNDERSTORM WIND  82563
## 4             TORNADO  60652
## 5         FLASH FLOOD  54277
## 6               FLOOD  25326
## 7  THUNDERSTORM WINDS  20843
## 8           HIGH WIND  20212
## 9           LIGHTNING  15754
## 10         HEAVY SNOW  15708
## ..                ...    ...

At first glance, Hail could be a large contributor to human and dollar impact, but that doesn’t make sense. A more logical hypothesis would be that Storms or Tornadoes hurt a lot of people.

There appear to be several acronyms and various classifications of events that should be cleaned up. Let’s take a look at the whole list of weather events.

eventlist <- sort(unique(raw$EVTYPE))
head(eventlist, n=100) #take a subset because actual unique list > 1000
##   [1]    HIGH SURF ADVISORY           COASTAL FLOOD                
##   [3]  FLASH FLOOD                    LIGHTNING                    
##   [5]  TSTM WIND                      TSTM WIND (G45)              
##   [7]  WATERSPOUT                     WIND                         
##   [9] ?                              ABNORMAL WARMTH               
##  [11] ABNORMALLY DRY                 ABNORMALLY WET                
##  [13] ACCUMULATED SNOWFALL           AGRICULTURAL FREEZE           
##  [15] APACHE COUNTY                  ASTRONOMICAL HIGH TIDE        
##  [17] ASTRONOMICAL LOW TIDE          AVALANCE                      
##  [19] AVALANCHE                      BEACH EROSIN                  
##  [21] Beach Erosion                  BEACH EROSION                 
##  [23] BEACH EROSION/COASTAL FLOOD    BEACH FLOOD                   
##  [25] BELOW NORMAL PRECIPITATION     BITTER WIND CHILL             
##  [27] BITTER WIND CHILL TEMPERATURES Black Ice                     
##  [29] BLACK ICE                      BLIZZARD                      
##  [31] BLIZZARD AND EXTREME WIND CHIL BLIZZARD AND HEAVY SNOW       
##  [33] Blizzard Summary               BLIZZARD WEATHER              
##  [35] BLIZZARD/FREEZING RAIN         BLIZZARD/HEAVY SNOW           
##  [37] BLIZZARD/HIGH WIND             BLIZZARD/WINTER STORM         
##  [39] BLOW-OUT TIDE                  BLOW-OUT TIDES                
##  [41] BLOWING DUST                   blowing snow                  
##  [43] Blowing Snow                   BLOWING SNOW                  
##  [45] BLOWING SNOW- EXTREME WIND CHI BLOWING SNOW & EXTREME WIND CH
##  [47] BLOWING SNOW/EXTREME WIND CHIL BREAKUP FLOODING              
##  [49] BRUSH FIRE                     BRUSH FIRES                   
##  [51] COASTAL  FLOODING/EROSION      COASTAL EROSION               
##  [53] Coastal Flood                  COASTAL FLOOD                 
##  [55] coastal flooding               Coastal Flooding              
##  [57] COASTAL FLOODING               COASTAL FLOODING/EROSION      
##  [59] Coastal Storm                  COASTAL STORM                 
##  [61] COASTAL SURGE                  COASTAL/TIDAL FLOOD           
##  [63] COASTALFLOOD                   COASTALSTORM                  
##  [65] Cold                           COLD                          
##  [67] COLD AIR FUNNEL                COLD AIR FUNNELS              
##  [69] COLD AIR TORNADO               Cold and Frost                
##  [71] COLD AND FROST                 COLD AND SNOW                 
##  [73] COLD AND WET CONDITIONS        Cold Temperature              
##  [75] COLD TEMPERATURES              COLD WAVE                     
##  [77] COLD WEATHER                   COLD WIND CHILL TEMPERATURES  
##  [79] COLD/WIND CHILL                COLD/WINDS                    
##  [81] COOL AND WET                   COOL SPELL                    
##  [83] CSTL FLOODING/EROSION          DAM BREAK                     
##  [85] DAM FAILURE                    Damaging Freeze               
##  [87] DAMAGING FREEZE                DEEP HAIL                     
##  [89] DENSE FOG                      DENSE SMOKE                   
##  [91] DOWNBURST                      DOWNBURST WINDS               
##  [93] DRIEST MONTH                   Drifting Snow                 
##  [95] DROUGHT                        DROUGHT/EXCESSIVE HEAT        
##  [97] DROWNING                       DRY                           
##  [99] DRY CONDITIONS                 DRY HOT WEATHER               
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

If we did our analysis without cleaning up these headers, we may miss something. Let’s clean up some of these headers if they contain certain key words or phrases.

raw$EVTYPE <- gsub(".*HEAT.*", "Heat", raw$EVTYPE) #Make every column that contains the word "HEAT" to "Heat"
raw$EVTYPE <- gsub(".*TSTM.*", "Storm", raw$EVTYPE) # and so on
raw$EVTYPE <- gsub(".*STORM.*", "Storm", raw$EVTYPE)
raw$EVTYPE <- gsub(".*FLOOD.*", "Flood", raw$EVTYPE)
raw$EVTYPE <- gsub(".*STRONG WIND.*", "Wind", raw$EVTYPE)
raw$EVTYPE <- gsub(".*HIGH WIND.*", "Wind", raw$EVTYPE)
raw$EVTYPE <- gsub(".*FIRE.*", "Fire", raw$EVTYPE)
raw$EVTYPE <- gsub(".*HAIL.*", "Hail", raw$EVTYPE)
raw$EVTYPE <- gsub(".*BLIZZARD.*", "Blizzard", raw$EVTYPE)
raw$EVTYPE <- gsub(".*HURRICANE.*", "Hurricane", raw$EVTYPE)
raw$EVTYPE <- gsub(".*COLD.*", "Cold", raw$EVTYPE)
raw$EVTYPE <- gsub(".*WINTER.*", "Cold", raw$EVTYPE)
raw$EVTYPE <- gsub(".*RIP.*", "Tide/Surf", raw$EVTYPE)
raw$EVTYPE <- gsub(".*FOG.*", "Fog", raw$EVTYPE)
raw$EVTYPE <- gsub(".*AVALANCHE.*", "Avalanche", raw$EVTYPE)
raw$EVTYPE <- gsub(".*RAIN.*", "Rain", raw$EVTYPE)
raw$EVTYPE <- gsub(".*SURF.*", "Tide/Surf", raw$EVTYPE)
raw$EVTYPE <- gsub(".*THUNDERSTORM.*", "Storm", raw$EVTYPE)
raw$EVTYPE <- gsub(".*CLOUD.*", "Storm", raw$EVTYPE)
raw$EVTYPE <- gsub(".*LIGHTNING.*", "Lightning", raw$EVTYPE)
raw$EVTYPE <- gsub(".*SNOW.*", "Snow", raw$EVTYPE)
raw$EVTYPE <- gsub(".*TORNADO.*", "Tornado", raw$EVTYPE)
raw$EVTYPE <- gsub(".*FROST.*", "Cold", raw$EVTYPE)
raw$EVTYPE <- gsub(".*DROUGHT.*", "Drought", raw$EVTYPE)

Now that we’ve done that, let’s take another look at the data in summary.

data <-raw %>% 
        group_by(EVTYPE) %>%
        summarize(n=n()) %>%  #quickly summarize again
        arrange(desc(n)) %>%
        print(data)
## Source: local data frame [431 x 2]
## 
##       EVTYPE      n
##        (chr)  (int)
## 1      Storm 358793
## 2       Hail 289270
## 3      Flood  82659
## 4    Tornado  60698
## 5       Wind  25763
## 6       Snow  17543
## 7  Lightning  15760
## 8       Rain  12165
## 9       Cold  11974
## 10      Fire   4240
## ..       ...    ...

Clearly, storms make much more of an impact than originally seen. Lastly let’s make sure everything’s numeric.

raw$FATALITIES <- as.numeric(raw$FATALITIES)
raw$INJURIES <- as.numeric(raw$INJURIES)

raw$PROPDMG <- as.numeric(raw$PROPDMG)
raw$CROPDMG <- as.numeric(raw$CROPDMG)

Results

Human Impact

Now for the fun part. Here we take our first look at the human impact of weather events: fatalities and injuries.

humtest <- raw %>% 
        group_by(EVTYPE) %>%
        mutate(Total=FATALITIES+INJURIES)  %>% #add column for total for easy sorting and viewing as table if necessary
        summarise(Fatalities=sum(FATALITIES),
                  Injuries=sum(INJURIES),
                  Total=sum(Total)) %>%
        arrange(desc(Total))
head(humtest)
## Source: local data frame [6 x 4]
## 
##      EVTYPE Fatalities Injuries Total
##       (chr)      (dbl)    (dbl) (dbl)
## 1   Tornado       5636    91407 97043
## 2     Storm       1177    13759 14936
## 3      Heat       3138     9154 12292
## 4     Flood       1523     8601 10124
## 5 Lightning        817     5231  6048
## 6      Wind        422     1831  2253

What are the mean Fatalies and Injuries from weather events?

mean(humtest$Fatalities)
## [1] 35.13921
mean(humtest$Injuries)
## [1] 326.051

Let’s re-run the summary table by only the top 12 above-average events that impact human life.

humset <- raw %>% 
        group_by(EVTYPE) %>%
        mutate(Total=FATALITIES+INJURIES)  %>% 
        summarise(Fatalities=sum(FATALITIES),
                  Injuries=sum(INJURIES),
                  Total=sum(Total))  %>%
        filter(Fatalities > 35 | Injuries > 326) %>% #filter above averages
        arrange(desc(Total))

humset <- head(humset, n=12) #subset only Top 12

Finally, we’ll unpivot the table and plot it.

human <- cbind(humset[1], stack(humset[-1])) %>% #unpivot so all values are in same column
     filter(ind!="Total")

humplot <- ggplot(human) +
        aes(x=reorder(EVTYPE, +values), y=values, fill=ind) + # plot and order high to low
        geom_histogram(stat="identity")+coord_flip() + 
        ggtitle("Total Human Impact by Top 12 Weather Events") + 
        theme(plot.title=element_text(lineheight =0.8, face="bold"))+xlab("Event Type")
        print(humplot)

Looks like Tornadoes by far injure and kill the most people. Heat kills the next-most people but storms cause slightly more injuries. Investing in storm detection techniques may prove useful from a policy perspective in preventing a negative impact on human life.

Dollar Impact

What about property and crop damage?

doltest <- raw %>% 
        group_by(EVTYPE) %>%
        mutate(Total=PROPDMG+CROPDMG)  %>% #add column for total for easy sorting and viewing as table if necessary
        summarise(Property=sum(PROPDMG),
                  Crop=sum(CROPDMG),
                  Total=sum(Total)) %>%
        arrange(desc(Total))

Mean Property and Crop Damage ($US):

mean(doltest$Property)
## [1] 991458566
mean(doltest$Crop)
## [1] 113930840

Life before, we filter for only the most impactful events, reformat and plot the data.

dolset <- raw %>% 
        group_by(EVTYPE) %>%
        mutate(Total=PROPDMG+CROPDMG)  %>% 
        summarise(Property=sum(PROPDMG),
                  Crop=sum(CROPDMG),
                  Total=sum(Total))  %>%
        filter(Property > 991458566 | Crop > 113930840) %>% #filter above averages
        arrange(desc(Total))

dolset <- head(dolset, n=12) #subset only Top 12

dollar <- cbind(dolset[1], stack(dolset[-1])) %>% #unpivot
        filter(ind!="Total")

dollar$values <- dollar$values/1000000 #convert to Millions for easy plotting

dolplot <- ggplot(dollar) +
        aes(x=reorder(EVTYPE, +values), y=values, fill=ind) +
        geom_histogram(stat="identity")+coord_flip() +
        ggtitle("Total Dollar Impact by Top 12 Weather Events ($US Millions)") +
        theme(plot.title=element_text(lineheight =.8, face="bold")) +
        xlab("Event Type")
        print(dolplot)

From a dollar impact perspective, Floods are the most costly weather events. Hurricanes, Storms, and Tornadoes also cause significant loss. To prevent property damage costs, policy makers should invest more in flood prevention techniques like dams, canals, and other coastal defenses.

In all, this dataset provides significant insights into how weather events effect human life and property. More research must be done however, to class all the original Event Types into their appropriate buckets. Time series or location-based analysis may also make insights more relevant to individual policymakers. Documentation of how this analysis is performed is critical though. That is the only way we can keep ourselves safe from the storm of data.