Synopsis

In this report we aim to identify the most harmful types of storm events, both in terms of human health and economic damages. Our hypothesis is events with high frequency of occurrence -even though the cost per occurrence may be small- have generated more damages than one-time catastrophic events. To investigate this hypothesis we obtained data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States. From these data we found that, between 1996 and 2011, the most damaging types of events for human health are related to tornadoes, floods and excessive heat; while the most harmful types, measured by economic losses, are related to floods, hurricanes and storm surge/tides.

Data Processing

We obtained data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States. The events in the database start in the year 1950 and end in November 2011.

Reading in the data

We first read in the data from the raw text file included in the zip archive. The data is a delimited file. We kept the header data from the original file.

dbraw <- read.csv("repdata_data_StormData.csv.bz2")

After reading in the data, we checked the first few rows (there are more than 900,000) rows in this dataset, as well as its structure.

head(dbraw,10)
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1        1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2        1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3        1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4        1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5        1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6        1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
## 7        1 11/16/1951 0:00:00     0100       CST      9     BLOUNT    AL
## 8        1  1/22/1952 0:00:00     0900       CST    123 TALLAPOOSA    AL
## 9        1  2/13/1952 0:00:00     2000       CST    125 TUSCALOOSA    AL
## 10       1  2/13/1952 0:00:00     2000       CST     57    FAYETTE    AL
##     EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1  TORNADO         0                                               0         NA
## 2  TORNADO         0                                               0         NA
## 3  TORNADO         0                                               0         NA
## 4  TORNADO         0                                               0         NA
## 5  TORNADO         0                                               0         NA
## 6  TORNADO         0                                               0         NA
## 7  TORNADO         0                                               0         NA
## 8  TORNADO         0                                               0         NA
## 9  TORNADO         0                                               0         NA
## 10 TORNADO         0                                               0         NA
##    END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1          0                      14.0   100 3   0          0       15    25.0
## 2          0                       2.0   150 2   0          0        0     2.5
## 3          0                       0.1   123 2   0          0        2    25.0
## 4          0                       0.0   100 2   0          0        2     2.5
## 5          0                       0.0   150 2   0          0        2     2.5
## 6          0                       1.5   177 2   0          0        6     2.5
## 7          0                       1.5    33 2   0          0        1     2.5
## 8          0                       0.0    33 1   0          0        0     2.5
## 9          0                       3.3   100 3   0          1       14    25.0
## 10         0                       2.3   100 3   0          0        0    25.0
##    PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1           K       0                                         3040      8812
## 2           K       0                                         3042      8755
## 3           K       0                                         3340      8742
## 4           K       0                                         3458      8626
## 5           K       0                                         3412      8642
## 6           K       0                                         3450      8748
## 7           K       0                                         3405      8631
## 8           K       0                                         3255      8558
## 9           K       0                                         3334      8740
## 10          K       0                                         3336      8738
##    LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1        3051       8806              1
## 2           0          0              2
## 3           0          0              3
## 4           0          0              4
## 5           0          0              5
## 6           0          0              6
## 7           0          0              7
## 8           0          0              8
## 9        3336       8738              9
## 10       3337       8737             10
str(dbraw)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Given the size of the dataset, we created a dataset with smaller dimensions (and size), choosing only the columns we considered significant for the analysis.

cols <- c("BGN_DATE","TIME_ZONE", "COUNTYNAME", "STATE", "EVTYPE",
          "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", 
          "CROPDMGEXP")
dbmin <- dbraw[,cols]
str(dbmin)
## 'data.frame':    902297 obs. of  11 variables:
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
rm(dbraw)

After this size reduction, we added columns with the Date of each event and Year of each event.

dbmin$DATE <- sapply(strsplit(as.character(dbmin$BGN_DATE), " "), "[", 1)
dbmin$YEAR <- substring(dbmin$DATE,nchar(dbmin$DATE)-3,nchar(dbmin$DATE))
dbmin$YEAR <- as.integer(dbmin$YEAR)
dbmin$DATE <- as.Date(as.character(dbmin$DATE),format="%m/%d/%Y")

We checked for missing values in the most important fields in this dataset, finding none.

table(is.na(dbmin$EVTYPE))
## 
##  FALSE 
## 902297
table(is.na(dbmin$PROPDMG))
## 
##  FALSE 
## 902297
table(is.na(dbmin$CROPDMG))
## 
##  FALSE 
## 902297
table(is.na(dbmin$FATALITIES))
## 
##  FALSE 
## 902297
table(is.na(dbmin$INJURIES))
## 
##  FALSE 
## 902297
sum(rowSums(is.na(dbmin)))
## [1] 0

The first transformation consisted of estimating the economic damages in US dollars. The first step was to convert the unit identifiers (K – thousands, M – millions, B – billions) to their corresponding numeric values. Non-valid unit identifiers were considered to have a value of one unit.

acceptedvals <- c("K","k","M","m","B","b")
dbmin$PROPDMMULT[dbmin$PROPDMGEXP=="K"|dbmin$PROPDMGEXP=="k"]<-1000
dbmin$PROPDMMULT[dbmin$PROPDMGEXP=="M"|dbmin$PROPDMGEXP=="m"]<-1000000
dbmin$PROPDMMULT[dbmin$PROPDMGEXP=="B"|dbmin$PROPDMGEXP=="b"]<-1000000000
dbmin$PROPDMMULT[!(dbmin$PROPDMGEXP %in% acceptedvals)]<-1

dbmin$CROPDMMULT[dbmin$CROPDMGEXP=="K"|dbmin$CROPDMGEXP=="k"]<-1000
dbmin$CROPDMMULT[dbmin$CROPDMGEXP=="M"|dbmin$CROPDMGEXP=="m"]<-1000000
dbmin$CROPDMMULT[dbmin$CROPDMGEXP=="B"|dbmin$CROPDMGEXP=="B"]<-1000000000
dbmin$CROPDMMULT[!(dbmin$CROPDMGEXP %in% acceptedvals)]<-1

The multiples were then multiplied to the corresponding columns and a total amount of damages was estimated.

dbmin$USPROPDM <- dbmin$PROPDMG* dbmin$PROPDMMULT
dbmin$USCROPDM <- dbmin$CROPDMG* dbmin$CROPDMMULT
dbmin$USDAMAGES <- dbmin$USPROPDM + dbmin$USCROPDM

To determine the impact on human health, human fatalities and injuries were added up in one variable.

dbmin$FATplusINJ <- dbmin$FATALITIES + dbmin$INJURIES

The most important transformation was the standardization of the types of events included in the dataset. In spite of a description of 48 event types in the NOAA description of the dataset, we found more than 900 different descriptions. We first changed terms that were duplicated, misspelled, poorly written and related them to the most similar event type listed in the 48-type list. We then identified descriptions that contained terms in the 48-type list that appear at the beginning of the description provided or in any part of it.

dbmin$EVTYPE <- toupper(dbmin$EVTYPE)
dbmin$EVTYPE <- gsub("^ +","",dbmin$EVTYPE)
titles <- c("ASTRONOMICAL LOW TIDE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Astronomical low tide"
titles <- c("AVALANCE","AVALANCHE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Avalanche"
titles <- c("BLIZZARD","BLIZZARD AND EXTREME WIND CHIL","BLIZZARD AND HEAVY SNOW",
            "BLIZZARD WEATHER","BLIZZARD/FREEZING RAIN","BLIZZARD/HEAVY SNOW",
            "BLIZZARD/HIGH WIND","BLIZZARD/WINTER STORM","GROUND BLIZZARD")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Blizzard"
titles <- c("BEACH EROSION/COASTAL FLOOD","COASTAL  FLOODING/EROSION",
            "COASTAL FLOOD","COASTAL FLOOD","COASTAL FLOODING",
            "COASTAL FLOODING/EROSION","COASTALFLOOD","COASTAL/TIDAL FLOOD",
            "CSTL FLOODING/EROSION","BEROSION/CSTL FLOOD")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Coastal Flood"
titles <- c("BITTER WIND CHILL","BITTER WIND CHILL TEMPERATURES",
            "COLD","COLD AND SNOW","COLD AND WET CONDITIONS",
            "COLD TEMPERATURE","COLD TEMPERATURES","COLD WAVE",
            "COLD WEATHER","COLD WIND CHILL TEMPERATURES",
            "COLD/WIND CHILL","COLD/WINDS","COOL AND WET","COOL SPELL",
            "LOW TEMPERATURE","LOW TEMPERATURE RECORD","LOW WIND CHILL",
            "WIND CHILL")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Cold/Wind Chill"
titles <- c("DENSE FOG","FOG","COLD","FOG AND COLD TEMPERATURES",
            "PATCHY DENSE FOG")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Dense Fog"
titles <- c("DENSE SMOKE","SMOKE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Dense Smoke"
titles <- c("DROUGHT","DROUGHT/EXCESSIVE HEAT","DRY","DRY CONDITIONS",
            "DRY HOT WEATHER","DRY PATTERN","DRY SPELL","DRY WEATHER",
            "DRYNESS","EXCESSIVELY DRY","HEAT DROUGHT","HEAT WAVE DROUGHT",
            "RECORD DRY MONTH","RECORD DRYNESS")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Drought"
titles <- c("DUST DEVEL","DUST DEVIL","DUST DEVIL WATERSPOUT")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Dust devil"
titles <- c("DUST STORM","DUST STORM/HIGH WINDS","DUSTSTORM")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Dust Storm"
titles <- c("EXCESSIVE HEAT","EXCESSIVE HEAT/DROUGHT","EXTREME HEAT",
            "HEAT","HEAT WAVE","HEAT WAVES","HEAT/DROUGHT","HEATBURST",
            "HIGH TEMPERATURE RECORD","HOT AND DRY","HOT PATTERN","HOT WEATHER",
            "HOT/DRY PATTERN","RECORD HEAT","RECORD HEAT WAVE","RECORD HIGH TEMPERATURE",
            "RECORD HIGH TEMPERATURES","RECORD/EXCESSIVE HEAT")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Excessive Heat"
titles <- c("BLOWING SNOW- EXTREME WIND CHI","BLOWING SNOW & EXTREME WIND CH",
            "BLOWING SNOW/EXTREME WIND CHIL","EXCESSIVE COLD","EXTENDED COLD",
            "EXTREME COLD","EXTREME COLD/WIND CHILL","EXTREME WIND CHILL",
            "EXTREME WIND CHILL/BLOWING SNO","EXTREME WIND CHILLS",
            "EXTREME WINDCHILL","EXTREME WINDCHILL TEMPERATURES","EXTREME/RECORD COLD",
            "RECORD  COLD","RECORD COLD","RECORD COLD AND HIGH WIND","RECORD COLD/FROST",
            "RECORD COOL","SEVERE COLD")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Extreme Cold/Wind Chill"
titles <- c("FREEZING FOG","ICE FOG")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Freezing Fog"
titles <- c("EXCESSIVE PRECIPITATION","EXCESSIVE RAIN","EXCESSIVE RAINFALL",
            "EXCESSIVE WETNESS","EXTREMELY WET","HEAVY PRECIPATATION",
            "HEAVY PRECIPITATION","HEAVY RAIN","HEAVY RAIN AND FLOOD",
            "HEAVY RAIN AND WIND","HEAVY RAIN EFFECTS","HEAVY RAIN/FLOODING",
            "HEAVY RAIN/HIGH SURF","HEAVY RAIN/LIGHTNING","HEAVY RAIN/MUDSLIDES/FLOOD",
            "HEAVY RAIN/SEVERE WEATHER","HEAVY RAIN/SMALL STREAM URBAN","HEAVY RAIN/SNOW",
            "HEAVY RAIN/URBAN FLOOD", "HEAVY RAIN/WIND","HEAVY RAIN; URBAN FLOOD WINDS;",
            "HEAVY RAINFALL","HEAVY RAINS","HEAVY RAINS/FLOODING","HEAVY SHOWER",
            "HEAVY SHOWERS","HVY RAIN","LOCALLY HEAVY RAIN","RAIN","RAIN (HEAVY)",
            "RAIN AND WIND","RAIN DAMAGE","RAIN/SNOW","RAIN/WIND","RAINSTORM",
            "RECORD PRECIPITATION","RECORD RAINFALL","RECORD/EXCESSIVE RAINFALL",
            "SNOW/RAIN","TORRENTIAL RAIN","TORRENTIAL RAINFALL","TSTM HEAVY RAIN")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Heavy Rain"
titles <- c("ACCUMULATED SNOWFALL","EXCESSIVE SNOW","HEAVY WET SNOW",
            "ICE AND SNOW","RECORD SNOW","RECORD SNOW/COLD",
            "RECORD SNOWFALL","RECORD WINTER SNOW","SNOW ACCUMULATION")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Heavy Snow"
titles <- c("HAZARDOUS SURF","HEAVY SURF","HEAVY SURF AND WIND",
            "HEAVY SURF COASTAL FLOODING","HEAVY SURF/HIGH SURF",
            "HIGH SURF","HIGH SURF ADVISORIES","HIGH SURF ADVISORY",
            "HIGH SURF ADVISORY")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "High Surf"
titles <- c("ICE AND SNOW","GLAZE ICE","GLAZE/ICE STORM","ICE STORM",
            "ICE STORM AND SNOW","ICE STORM/FLASH FLOOD","ICE/SNOW",
            "ICE/STRONG WINDS","ICESTORM/BLIZZARD","SNOW AND ICE STORM",
            "SNOW/ICE STORM")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Ice Storm"
titles <- c("LIGHTING","LIGNTNING")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Lightning"
titles <- c("MARINE HAIL")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine Hail"
titles <- c("MARINE HIGH WIND")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine High Wind"
titles <- c("MARINE STRONG WIND")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine Strong Wind"
titles <- c("MARINE THUNDERSTORM WIND","MARINE TSTM WIND")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine Thunderstorm Wind"
titles <- c("SEICHE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Seiche"
titles <- c("LIGHT SNOW AND SLEET","SNOW AND SLEET","SNOW SLEET",
            "SNOW/RAIN/SLEET","SNOW/SLEET","SNOW/SLEET/FREEZING RAIN",
            "SNOW/SLEET/RAIN")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine Thunderstorm Wind"
titles <- c("COASTAL SURGE","STORM SURGE","STORM SURGE/TIDE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Storm Surge/Tide"
titles <- c("DOWNBURST","DOWNBURST WINDS","DRY MICROBURST","DRY MICROBURST 50",
            "DRY MICROBURST 53","DRY MICROBURST 58","DRY MICROBURST 61",
            "DRY MICROBURST 84","DRY MICROBURST WINDS","DRY MIRCOBURST WINDS",
            "GRADIENT WIND","GRADIENT WINDS","MICROBURST","MICROBURST WINDS",
            "STRONG WIND","STRONG WIND GUST","STRONG WINDS","WHIRLWIND",
            "WIND","WIND ADVISORY","WIND AND WAVE","WIND DAMAGE","WIND GUSTS",
            "WIND/HAIL","WINDS","WND")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Strong Wind"
titles <- c("SEVERE THUNDERSTORM","SEVERE THUNDERSTORM WINDS",
            "SEVERE THUNDERSTORMS","TUNDERSTORM WIND","WIND STORM")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Thunderstorm Wind"
titles <- c("TROPICAL DEPRESSION")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Tropical Depression"
titles <- c("WATER SPOUT","WAYTERSPOUT")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Waterspout"
titles <- c("BRUSH FIRE","BRUSH FIRES","FOREST FIRES","GRASS FIRES",
            "WILD FIRES","WILD/FOREST FIRE","WILD/FOREST FIRES",
            "WILDFIRE","WILDFIRES")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Wildfire"
titles <- c("WINTER MIX","WINTERY MIX","WINTRY MIX")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Winter Weather"
dbmin$TYPE[grep("FLASH FLOOD",dbmin$EVTYPE)] <- "Flash Flood"
dbmin$TYPE[grep("FROST|FREEZ",dbmin$EVTYPE)] <- "Frost/Freeze"
dbmin$TYPE[grep("FUNNEL",dbmin$EVTYPE)] <- "Funnel Cloud"
dbmin$TYPE[grep("HAIL",dbmin$EVTYPE)] <- "Hail"
dbmin$TYPE[grep("HEAVY SNOW",dbmin$EVTYPE)] <- "Heavy Snow"
dbmin$TYPE[grep("HIGH WIND",dbmin$EVTYPE)] <- "High Wind"
dbmin$TYPE[grep("HURRICANE|TYPHOON",dbmin$EVTYPE)] <- "Hurricane"
dbmin$TYPE[grep("LAKE EFFECT SNOW",dbmin$EVTYPE)] <- "Lake-Effect Snow"
dbmin$TYPE[grep("LAKE FLOOD",dbmin$EVTYPE)] <- "Lakeshore Flood"
dbmin$TYPE[grep("^LIGHTING",dbmin$EVTYPE)] <- "Lightning"
dbmin$TYPE[grep("^RIP CURRENT",dbmin$EVTYPE)] <- "Rip Current"
dbmin$TYPE[grep("^SLEET",dbmin$EVTYPE)] <- "Sleet"
dbmin$TYPE[grep("^GUST",dbmin$EVTYPE)] <- "Thunderstorm Wind"
dbmin$TYPE[grep("^THUNDERSTORM|^THUDERSTORM|^THUNDEERSTORM
                | ^THUNDERESTORM|^THUNDERTORM|^THUNDERTSORM
                | ^THUNDESTORM|^THUNERSTORM",dbmin$EVTYPE)] <- "Thunderstorm Wind"
dbmin$TYPE[grep("^TORNADO|^TORNDAO",dbmin$EVTYPE)] <- "Tornado"
dbmin$TYPE[grep("^TROPICAL STORM",dbmin$EVTYPE)] <- "Tropical Storm"
dbmin$TYPE[grep("^TSUNAMI",dbmin$EVTYPE)] <- "Tsunami"
dbmin$TYPE[grep("^VOLCANIC ASH",dbmin$EVTYPE)] <- "Volcanic Ash"
dbmin$TYPE[grep("^WATERSPOUT",dbmin$EVTYPE)] <- "Waterspout"
dbmin$TYPE[grep("^WINTER STORM",dbmin$EVTYPE)] <- "Winter Storm"
dbmin$TYPE[grep("^WINTER WEATHER",dbmin$EVTYPE)] <- "Winter Weather"
dbmin$TYPE[grep("FLOOD|FLD",dbmin$EVTYPE)] <- "Flood"

Those observations in the dataset for which we could not find a match in the 48-type list were excluded from the dataset to be analyzed.

db <- dbmin[is.na(dbmin$TYPE)==FALSE,]
rm(dbmin)

Now that the event type list was reduced, we changed the event type column from string to factor.

db$TYPE <- as.factor(db$TYPE)

We generated an histogram from a table of the frequency of events per year.

a <- table(db$YEAR)
z <- row.names(a)
barplot(a,main="Events per Year",
        ylab="Events per Year",xlab="Year",names.arg = z,col="Blue")

We found a major increase in reported cases in the mid-nineties. From the documentation of the dataset, we found that reporting to NOOA was limited to a certain number of events up until 1995; and since 1996 all the events in the 48-type list were reported. So, for consistency, we eliminated all records from 1995 and earlier.

db <- db[db$YEAR>1995,]

Results

Once the final dataset was created, the first step consisted of exhibiting the frequency of occurrence of each type of event from 1996 to 2011 with a table:

table(db$TYPE,db$YEAR)
##                           
##                             1996  1997  1998  1999  2000  2001  2002  2003
##   Astronomical low tide        0     0     0     0     0     0     0     0
##   Avalanche                    6     4    17    24    27    26    25    11
##   Blizzard                   146   112    59    56   104    77    80    52
##   Cold/Wind Chill              2     1    10     1     0     0     6     4
##   Dense Fog                   42    75    76    80   106   124    84    81
##   Dense Smoke                  0     0     0     0     0     3     8     0
##   Drought                     46    24   116   205   128    94   157    52
##   Dust devil                   9     8     4     8     2     7     8    14
##   Dust Storm                   7     7     9    15    19    18    40    34
##   Excessive Heat              55    71   164   202   162   160   134    59
##   Extreme Cold/Wind Chill    229   142    54   121   182    79    67    62
##   Flood                     4613  4016  4974  3422  3578  3875  4180  4914
##   Freezing Fog                 2     0     0     0     0     0     0     0
##   Frost/Freeze                28    49    39    52    87    72    53    79
##   Funnel Cloud               295   371   442   322   315   302   232   348
##   Hail                     10855  8801 12730 10236 11372 12389 12704 13952
##   Heavy Rain                 371   354   682   488   581   453   336   934
##   Heavy Snow                 980   704   661   786   803   819   777   854
##   High Surf                   17    50    56    59    51   105    48    61
##   High Wind                 1114   704   735   948   842   820   870   669
##   Hurricane                   45    18    29    40     6     5     8     9
##   Ice Storm                  203    88   114    57    95    58    94    51
##   Lake-Effect Snow             0     8     5     5     1     2     0     0
##   Lightning                    0     0     0     0     0     0     0     0
##   Marine Strong Wind           0     0     0     0     0     0     0     0
##   Marine Thunderstorm Wind     0     1     2     4     0    99  1142  1446
##   Rip Current                 31    17    38    35    43    78    61    46
##   Seiche                       0     0     8     1     1     0     1     0
##   Sleet                        0     2     0     0     3     0     0     6
##   Storm Surge/Tide            20    26    55    28     4    17    24    14
##   Strong Wind                 79   119   140   135   206   145   143   182
##   Thunderstorm Wind            5    36    20    15    22    27    16     0
##   Tornado                   1239  1180  1529  1519  1169  1351  1041  1534
##   Tropical Depression          0     0     1     0     0     0     0     3
##   Tropical Storm              33     4    24     9     7    21    20    35
##   Tsunami                      0     0     0     0     0     0     0     0
##   Volcanic Ash                 2     2     0     0     0     3     2     7
##   Waterspout                 159   218   176   270   263   178   207   239
##   Wildfire                    62    37   113   298   372   217   344   312
##   Winter Storm               435   452   261   395   514   438   462   547
##   Winter Weather               8    38    28    37    34    50    53   316
##                           
##                             2004  2005  2006  2007  2008  2009  2010  2011
##   Astronomical low tide        0     0     0    44    39    31    34    26
##   Avalanche                   18    35    36    27    39    12    50    21
##   Blizzard                    74    86   104   185   342   487   356   313
##   Cold/Wind Chill              0     6    15   112   135   118    87    66
##   Dense Fog                   78    56   119   211   206   110   172   152
##   Dense Smoke                  0     0     1     3     2     2     1     1
##   Drought                     38   110   175   319   237   162   214   383
##   Dust devil                   9     9     7     6    13    12    11    10
##   Dust Storm                  28    19    39    35    15    46    40    46
##   Excessive Heat              27    82   162   154   109    76   379   428
##   Extreme Cold/Wind Chill    138    65    37    94   259   144    56   155
##   Flood                     5704  4331  3936  5631  6168  6177  6783  7276
##   Freezing Fog                 0     0     0     0     0     0     0     0
##   Frost/Freeze                73   113   106   294   178   208   211   123
##   Funnel Cloud               386   433   366   340   378   449   525   564
##   Hail                     13196 13829 16729 12764 17594 13341 10947 17807
##   Heavy Rain                 715   854   810   874   854  1087  1121  1065
##   Heavy Snow                 644   705   678   944  1346  1032  1372   934
##   High Surf                   72    88    61    78    60    88    84    64
##   High Wind                  669   715  1448  1628  2467  2142  2082  2194
##   Hurricane                   31    47     0     5    23     1     1     3
##   Ice Storm                   79   113    44   376   172   122   103   115
##   Lake-Effect Snow             0     0     0     0     0     0     0     0
##   Lightning                    0     0     0     0     0     0     0     0
##   Marine Strong Wind           0     0     4     8     5     2    17    12
##   Marine Thunderstorm Wind  1335  1165  1118   870  1181   928  1161  1545
##   Rip Current                 44    39    30    55    56    52    63    46
##   Seiche                       2     1     1     0     3     1     1     1
##   Sleet                        4     2     7     8    21     5    13     1
##   Storm Surge/Tide            18    44    17     7    54    11    12    50
##   Strong Wind                177   202   351   338   613   544   518   412
##   Thunderstorm Wind            0     0   739 12998 16782 13379 15826 21678
##   Tornado                   1947  1343  1264  1238  1891  1272  1446  2192
##   Tropical Depression         26     4     0     5    15     6     0     0
##   Tropical Storm              42    78    29    20   158    10    33   159
##   Tsunami                      0     0     4     0     0     1     5    10
##   Volcanic Ash                 4     3     1     1     2     0     0     0
##   Waterspout                 238   192   230   275   214   220   148   165
##   Wildfire                   123   176   371   250   275   231   220   775
##   Winter Storm               429   531   463  1036  1424  1268  1488  1174
##   Winter Weather             294   429   420  1132  1384  1185  1528  1250

We then proceeded to find, with the help of barplots, those events that generated greater damages to human health and in economic terms. First we added the human and economic damages by each event type, and we then generated combined charts for cumulative effects from 1996 to 2011, where the 8 most significant event types are shown.

FatINJ <- tapply(db$FATplusINJ,db$TYPE,sum)
FatINJ <- FatINJ[order(-FatINJ)]
print(FatINJ[1:8])
##           Tornado             Flood    Excessive Heat Thunderstorm Wind 
##             22178              9857              9719              1549 
##          Wildfire      Winter Storm         Hurricane         High Wind 
##              1545              1483              1453              1320
Dam <- tapply(db$USDAMAGES/1000000,db$TYPE,sum)
Dam <- Dam[order(-Dam)]
print(Dam[1:8],digits=2)
##            Flood        Hurricane Storm Surge/Tide          Tornado 
##           166114            87069            47836            24900 
##             Hail          Drought   Tropical Storm         Wildfire 
##            17201            14414             8320             8163
par(oma=c(0,0,0,0))
par(mar=c(3,8.5,2,1))
par(mfrow=c(2,1)) ##en lugar de 2 x 1 poner  12 x 4
barplot(FatINJ[1:8],main="Cumulative Fatalities and Injuries 1996-2011",
          horiz=TRUE, col="Blue", las=1,cex.axis = 0.65, cex.lab=0.5)
barplot(Dam[1:8],main="Cumulative Damages in US$ millions 1996-2011",
        horiz=TRUE, col="Green", las=1,cex.axis = 0.65, cex.lab=0.5)

In order to show the average human and economic damage per event, for the 8 most relevant types of event under this criterion, we averaged the damages by each type of event and generated the following panel of barplots:

FatINJAvg <- tapply(db$FATplusINJ,db$TYPE,mean)
FatINJAvg <- FatINJAvg[order(-FatINJAvg)]
print(FatINJAvg[1:8])
##            Tsunami          Hurricane     Excessive Heat        Rip Current 
##          8.1000000          5.3616236          4.0094884          1.4237057 
##          Avalanche            Tornado         Dust Storm Marine Strong Wind 
##          1.0026455          0.9578061          0.9280576          0.7500000
DamAvg <- tapply(db$USDAMAGES/1000000,db$TYPE,mean)
DamAvg <- DamAvg[order(-DamAvg)]
print(DamAvg[1:8],digits=0)
##        Hurricane Storm Surge/Tide   Tropical Storm          Tsunami 
##              321              119               12                7 
##          Drought            Flood         Wildfire        Ice Storm 
##                6                2                2                2
par(oma=c(0,0,0,0))
par(mar=c(2,8.5,2,1))
par(mfrow=c(2,1)) ##en lugar de 2 x 1 poner  12 x 4
barplot(FatINJAvg[1:8],main="Average Fatalities and Injuries 1996-2011",
          horiz=TRUE,cex.axis = 0.65, cex.lab=0.5, col="Blue", las=1)
barplot(DamAvg[1:10],main="Average Damages in US$ millions 1996-2011",
        horiz=TRUE,cex.axis = 0.65, cex.lab=0.25, col="Green", las=1)

From the barplots above, we can see that frequent types of events account for most of the cumulative human and economic damages from 1996 to 2011 -with the exception of hurricanes, even though on a single-event basis their cost to human health and economic activity is less significant than catastrophic events. However, it is of great importance to mention that the results are heavily dependent on the classification of event types proposed by the researcher, and so one proposal is to standardize the data entry from its origin.