In this report we aim to identify the most harmful types of storm events, both in terms of human health and economic damages. Our hypothesis is events with high frequency of occurrence -even though the cost per occurrence may be small- have generated more damages than one-time catastrophic events. To investigate this hypothesis we obtained data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States. From these data we found that, between 1996 and 2011, the most damaging types of events for human health are related to tornadoes, floods and excessive heat; while the most harmful types, measured by economic losses, are related to floods, hurricanes and storm surge/tides.
We obtained data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States. The events in the database start in the year 1950 and end in November 2011.
We first read in the data from the raw text file included in the zip archive. The data is a delimited file. We kept the header data from the original file.
dbraw <- read.csv("repdata_data_StormData.csv.bz2")
After reading in the data, we checked the first few rows (there are more than 900,000) rows in this dataset, as well as its structure.
head(dbraw,10)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## 7 1 11/16/1951 0:00:00 0100 CST 9 BLOUNT AL
## 8 1 1/22/1952 0:00:00 0900 CST 123 TALLAPOOSA AL
## 9 1 2/13/1952 0:00:00 2000 CST 125 TUSCALOOSA AL
## 10 1 2/13/1952 0:00:00 2000 CST 57 FAYETTE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 TORNADO 0 0 NA
## 2 TORNADO 0 0 NA
## 3 TORNADO 0 0 NA
## 4 TORNADO 0 0 NA
## 5 TORNADO 0 0 NA
## 6 TORNADO 0 0 NA
## 7 TORNADO 0 0 NA
## 8 TORNADO 0 0 NA
## 9 TORNADO 0 0 NA
## 10 TORNADO 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## 7 0 1.5 33 2 0 0 1 2.5
## 8 0 0.0 33 1 0 0 0 2.5
## 9 0 3.3 100 3 0 1 14 25.0
## 10 0 2.3 100 3 0 0 0 25.0
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## 7 K 0 3405 8631
## 8 K 0 3255 8558
## 9 K 0 3334 8740
## 10 K 0 3336 8738
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
## 7 0 0 7
## 8 0 0 8
## 9 3336 8738 9
## 10 3337 8737 10
str(dbraw)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Given the size of the dataset, we created a dataset with smaller dimensions (and size), choosing only the columns we considered significant for the analysis.
cols <- c("BGN_DATE","TIME_ZONE", "COUNTYNAME", "STATE", "EVTYPE",
"FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG",
"CROPDMGEXP")
dbmin <- dbraw[,cols]
str(dbmin)
## 'data.frame': 902297 obs. of 11 variables:
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
rm(dbraw)
After this size reduction, we added columns with the Date of each event and Year of each event.
dbmin$DATE <- sapply(strsplit(as.character(dbmin$BGN_DATE), " "), "[", 1)
dbmin$YEAR <- substring(dbmin$DATE,nchar(dbmin$DATE)-3,nchar(dbmin$DATE))
dbmin$YEAR <- as.integer(dbmin$YEAR)
dbmin$DATE <- as.Date(as.character(dbmin$DATE),format="%m/%d/%Y")
We checked for missing values in the most important fields in this dataset, finding none.
table(is.na(dbmin$EVTYPE))
##
## FALSE
## 902297
table(is.na(dbmin$PROPDMG))
##
## FALSE
## 902297
table(is.na(dbmin$CROPDMG))
##
## FALSE
## 902297
table(is.na(dbmin$FATALITIES))
##
## FALSE
## 902297
table(is.na(dbmin$INJURIES))
##
## FALSE
## 902297
sum(rowSums(is.na(dbmin)))
## [1] 0
The first transformation consisted of estimating the economic damages in US dollars. The first step was to convert the unit identifiers (K – thousands, M – millions, B – billions) to their corresponding numeric values. Non-valid unit identifiers were considered to have a value of one unit.
acceptedvals <- c("K","k","M","m","B","b")
dbmin$PROPDMMULT[dbmin$PROPDMGEXP=="K"|dbmin$PROPDMGEXP=="k"]<-1000
dbmin$PROPDMMULT[dbmin$PROPDMGEXP=="M"|dbmin$PROPDMGEXP=="m"]<-1000000
dbmin$PROPDMMULT[dbmin$PROPDMGEXP=="B"|dbmin$PROPDMGEXP=="b"]<-1000000000
dbmin$PROPDMMULT[!(dbmin$PROPDMGEXP %in% acceptedvals)]<-1
dbmin$CROPDMMULT[dbmin$CROPDMGEXP=="K"|dbmin$CROPDMGEXP=="k"]<-1000
dbmin$CROPDMMULT[dbmin$CROPDMGEXP=="M"|dbmin$CROPDMGEXP=="m"]<-1000000
dbmin$CROPDMMULT[dbmin$CROPDMGEXP=="B"|dbmin$CROPDMGEXP=="B"]<-1000000000
dbmin$CROPDMMULT[!(dbmin$CROPDMGEXP %in% acceptedvals)]<-1
The multiples were then multiplied to the corresponding columns and a total amount of damages was estimated.
dbmin$USPROPDM <- dbmin$PROPDMG* dbmin$PROPDMMULT
dbmin$USCROPDM <- dbmin$CROPDMG* dbmin$CROPDMMULT
dbmin$USDAMAGES <- dbmin$USPROPDM + dbmin$USCROPDM
To determine the impact on human health, human fatalities and injuries were added up in one variable.
dbmin$FATplusINJ <- dbmin$FATALITIES + dbmin$INJURIES
The most important transformation was the standardization of the types of events included in the dataset. In spite of a description of 48 event types in the NOAA description of the dataset, we found more than 900 different descriptions. We first changed terms that were duplicated, misspelled, poorly written and related them to the most similar event type listed in the 48-type list. We then identified descriptions that contained terms in the 48-type list that appear at the beginning of the description provided or in any part of it.
dbmin$EVTYPE <- toupper(dbmin$EVTYPE)
dbmin$EVTYPE <- gsub("^ +","",dbmin$EVTYPE)
titles <- c("ASTRONOMICAL LOW TIDE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Astronomical low tide"
titles <- c("AVALANCE","AVALANCHE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Avalanche"
titles <- c("BLIZZARD","BLIZZARD AND EXTREME WIND CHIL","BLIZZARD AND HEAVY SNOW",
"BLIZZARD WEATHER","BLIZZARD/FREEZING RAIN","BLIZZARD/HEAVY SNOW",
"BLIZZARD/HIGH WIND","BLIZZARD/WINTER STORM","GROUND BLIZZARD")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Blizzard"
titles <- c("BEACH EROSION/COASTAL FLOOD","COASTAL FLOODING/EROSION",
"COASTAL FLOOD","COASTAL FLOOD","COASTAL FLOODING",
"COASTAL FLOODING/EROSION","COASTALFLOOD","COASTAL/TIDAL FLOOD",
"CSTL FLOODING/EROSION","BEROSION/CSTL FLOOD")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Coastal Flood"
titles <- c("BITTER WIND CHILL","BITTER WIND CHILL TEMPERATURES",
"COLD","COLD AND SNOW","COLD AND WET CONDITIONS",
"COLD TEMPERATURE","COLD TEMPERATURES","COLD WAVE",
"COLD WEATHER","COLD WIND CHILL TEMPERATURES",
"COLD/WIND CHILL","COLD/WINDS","COOL AND WET","COOL SPELL",
"LOW TEMPERATURE","LOW TEMPERATURE RECORD","LOW WIND CHILL",
"WIND CHILL")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Cold/Wind Chill"
titles <- c("DENSE FOG","FOG","COLD","FOG AND COLD TEMPERATURES",
"PATCHY DENSE FOG")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Dense Fog"
titles <- c("DENSE SMOKE","SMOKE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Dense Smoke"
titles <- c("DROUGHT","DROUGHT/EXCESSIVE HEAT","DRY","DRY CONDITIONS",
"DRY HOT WEATHER","DRY PATTERN","DRY SPELL","DRY WEATHER",
"DRYNESS","EXCESSIVELY DRY","HEAT DROUGHT","HEAT WAVE DROUGHT",
"RECORD DRY MONTH","RECORD DRYNESS")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Drought"
titles <- c("DUST DEVEL","DUST DEVIL","DUST DEVIL WATERSPOUT")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Dust devil"
titles <- c("DUST STORM","DUST STORM/HIGH WINDS","DUSTSTORM")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Dust Storm"
titles <- c("EXCESSIVE HEAT","EXCESSIVE HEAT/DROUGHT","EXTREME HEAT",
"HEAT","HEAT WAVE","HEAT WAVES","HEAT/DROUGHT","HEATBURST",
"HIGH TEMPERATURE RECORD","HOT AND DRY","HOT PATTERN","HOT WEATHER",
"HOT/DRY PATTERN","RECORD HEAT","RECORD HEAT WAVE","RECORD HIGH TEMPERATURE",
"RECORD HIGH TEMPERATURES","RECORD/EXCESSIVE HEAT")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Excessive Heat"
titles <- c("BLOWING SNOW- EXTREME WIND CHI","BLOWING SNOW & EXTREME WIND CH",
"BLOWING SNOW/EXTREME WIND CHIL","EXCESSIVE COLD","EXTENDED COLD",
"EXTREME COLD","EXTREME COLD/WIND CHILL","EXTREME WIND CHILL",
"EXTREME WIND CHILL/BLOWING SNO","EXTREME WIND CHILLS",
"EXTREME WINDCHILL","EXTREME WINDCHILL TEMPERATURES","EXTREME/RECORD COLD",
"RECORD COLD","RECORD COLD","RECORD COLD AND HIGH WIND","RECORD COLD/FROST",
"RECORD COOL","SEVERE COLD")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Extreme Cold/Wind Chill"
titles <- c("FREEZING FOG","ICE FOG")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Freezing Fog"
titles <- c("EXCESSIVE PRECIPITATION","EXCESSIVE RAIN","EXCESSIVE RAINFALL",
"EXCESSIVE WETNESS","EXTREMELY WET","HEAVY PRECIPATATION",
"HEAVY PRECIPITATION","HEAVY RAIN","HEAVY RAIN AND FLOOD",
"HEAVY RAIN AND WIND","HEAVY RAIN EFFECTS","HEAVY RAIN/FLOODING",
"HEAVY RAIN/HIGH SURF","HEAVY RAIN/LIGHTNING","HEAVY RAIN/MUDSLIDES/FLOOD",
"HEAVY RAIN/SEVERE WEATHER","HEAVY RAIN/SMALL STREAM URBAN","HEAVY RAIN/SNOW",
"HEAVY RAIN/URBAN FLOOD", "HEAVY RAIN/WIND","HEAVY RAIN; URBAN FLOOD WINDS;",
"HEAVY RAINFALL","HEAVY RAINS","HEAVY RAINS/FLOODING","HEAVY SHOWER",
"HEAVY SHOWERS","HVY RAIN","LOCALLY HEAVY RAIN","RAIN","RAIN (HEAVY)",
"RAIN AND WIND","RAIN DAMAGE","RAIN/SNOW","RAIN/WIND","RAINSTORM",
"RECORD PRECIPITATION","RECORD RAINFALL","RECORD/EXCESSIVE RAINFALL",
"SNOW/RAIN","TORRENTIAL RAIN","TORRENTIAL RAINFALL","TSTM HEAVY RAIN")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Heavy Rain"
titles <- c("ACCUMULATED SNOWFALL","EXCESSIVE SNOW","HEAVY WET SNOW",
"ICE AND SNOW","RECORD SNOW","RECORD SNOW/COLD",
"RECORD SNOWFALL","RECORD WINTER SNOW","SNOW ACCUMULATION")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Heavy Snow"
titles <- c("HAZARDOUS SURF","HEAVY SURF","HEAVY SURF AND WIND",
"HEAVY SURF COASTAL FLOODING","HEAVY SURF/HIGH SURF",
"HIGH SURF","HIGH SURF ADVISORIES","HIGH SURF ADVISORY",
"HIGH SURF ADVISORY")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "High Surf"
titles <- c("ICE AND SNOW","GLAZE ICE","GLAZE/ICE STORM","ICE STORM",
"ICE STORM AND SNOW","ICE STORM/FLASH FLOOD","ICE/SNOW",
"ICE/STRONG WINDS","ICESTORM/BLIZZARD","SNOW AND ICE STORM",
"SNOW/ICE STORM")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Ice Storm"
titles <- c("LIGHTING","LIGNTNING")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Lightning"
titles <- c("MARINE HAIL")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine Hail"
titles <- c("MARINE HIGH WIND")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine High Wind"
titles <- c("MARINE STRONG WIND")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine Strong Wind"
titles <- c("MARINE THUNDERSTORM WIND","MARINE TSTM WIND")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine Thunderstorm Wind"
titles <- c("SEICHE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Seiche"
titles <- c("LIGHT SNOW AND SLEET","SNOW AND SLEET","SNOW SLEET",
"SNOW/RAIN/SLEET","SNOW/SLEET","SNOW/SLEET/FREEZING RAIN",
"SNOW/SLEET/RAIN")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Marine Thunderstorm Wind"
titles <- c("COASTAL SURGE","STORM SURGE","STORM SURGE/TIDE")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Storm Surge/Tide"
titles <- c("DOWNBURST","DOWNBURST WINDS","DRY MICROBURST","DRY MICROBURST 50",
"DRY MICROBURST 53","DRY MICROBURST 58","DRY MICROBURST 61",
"DRY MICROBURST 84","DRY MICROBURST WINDS","DRY MIRCOBURST WINDS",
"GRADIENT WIND","GRADIENT WINDS","MICROBURST","MICROBURST WINDS",
"STRONG WIND","STRONG WIND GUST","STRONG WINDS","WHIRLWIND",
"WIND","WIND ADVISORY","WIND AND WAVE","WIND DAMAGE","WIND GUSTS",
"WIND/HAIL","WINDS","WND")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Strong Wind"
titles <- c("SEVERE THUNDERSTORM","SEVERE THUNDERSTORM WINDS",
"SEVERE THUNDERSTORMS","TUNDERSTORM WIND","WIND STORM")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Thunderstorm Wind"
titles <- c("TROPICAL DEPRESSION")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Tropical Depression"
titles <- c("WATER SPOUT","WAYTERSPOUT")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Waterspout"
titles <- c("BRUSH FIRE","BRUSH FIRES","FOREST FIRES","GRASS FIRES",
"WILD FIRES","WILD/FOREST FIRE","WILD/FOREST FIRES",
"WILDFIRE","WILDFIRES")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Wildfire"
titles <- c("WINTER MIX","WINTERY MIX","WINTRY MIX")
dbmin$TYPE[dbmin$EVTYPE %in% titles] <- "Winter Weather"
dbmin$TYPE[grep("FLASH FLOOD",dbmin$EVTYPE)] <- "Flash Flood"
dbmin$TYPE[grep("FROST|FREEZ",dbmin$EVTYPE)] <- "Frost/Freeze"
dbmin$TYPE[grep("FUNNEL",dbmin$EVTYPE)] <- "Funnel Cloud"
dbmin$TYPE[grep("HAIL",dbmin$EVTYPE)] <- "Hail"
dbmin$TYPE[grep("HEAVY SNOW",dbmin$EVTYPE)] <- "Heavy Snow"
dbmin$TYPE[grep("HIGH WIND",dbmin$EVTYPE)] <- "High Wind"
dbmin$TYPE[grep("HURRICANE|TYPHOON",dbmin$EVTYPE)] <- "Hurricane"
dbmin$TYPE[grep("LAKE EFFECT SNOW",dbmin$EVTYPE)] <- "Lake-Effect Snow"
dbmin$TYPE[grep("LAKE FLOOD",dbmin$EVTYPE)] <- "Lakeshore Flood"
dbmin$TYPE[grep("^LIGHTING",dbmin$EVTYPE)] <- "Lightning"
dbmin$TYPE[grep("^RIP CURRENT",dbmin$EVTYPE)] <- "Rip Current"
dbmin$TYPE[grep("^SLEET",dbmin$EVTYPE)] <- "Sleet"
dbmin$TYPE[grep("^GUST",dbmin$EVTYPE)] <- "Thunderstorm Wind"
dbmin$TYPE[grep("^THUNDERSTORM|^THUDERSTORM|^THUNDEERSTORM
| ^THUNDERESTORM|^THUNDERTORM|^THUNDERTSORM
| ^THUNDESTORM|^THUNERSTORM",dbmin$EVTYPE)] <- "Thunderstorm Wind"
dbmin$TYPE[grep("^TORNADO|^TORNDAO",dbmin$EVTYPE)] <- "Tornado"
dbmin$TYPE[grep("^TROPICAL STORM",dbmin$EVTYPE)] <- "Tropical Storm"
dbmin$TYPE[grep("^TSUNAMI",dbmin$EVTYPE)] <- "Tsunami"
dbmin$TYPE[grep("^VOLCANIC ASH",dbmin$EVTYPE)] <- "Volcanic Ash"
dbmin$TYPE[grep("^WATERSPOUT",dbmin$EVTYPE)] <- "Waterspout"
dbmin$TYPE[grep("^WINTER STORM",dbmin$EVTYPE)] <- "Winter Storm"
dbmin$TYPE[grep("^WINTER WEATHER",dbmin$EVTYPE)] <- "Winter Weather"
dbmin$TYPE[grep("FLOOD|FLD",dbmin$EVTYPE)] <- "Flood"
Those observations in the dataset for which we could not find a match in the 48-type list were excluded from the dataset to be analyzed.
db <- dbmin[is.na(dbmin$TYPE)==FALSE,]
rm(dbmin)
Now that the event type list was reduced, we changed the event type column from string to factor.
db$TYPE <- as.factor(db$TYPE)
We generated an histogram from a table of the frequency of events per year.
a <- table(db$YEAR)
z <- row.names(a)
barplot(a,main="Events per Year",
ylab="Events per Year",xlab="Year",names.arg = z,col="Blue")
We found a major increase in reported cases in the mid-nineties. From the documentation of the dataset, we found that reporting to NOOA was limited to a certain number of events up until 1995; and since 1996 all the events in the 48-type list were reported. So, for consistency, we eliminated all records from 1995 and earlier.
db <- db[db$YEAR>1995,]
Once the final dataset was created, the first step consisted of exhibiting the frequency of occurrence of each type of event from 1996 to 2011 with a table:
table(db$TYPE,db$YEAR)
##
## 1996 1997 1998 1999 2000 2001 2002 2003
## Astronomical low tide 0 0 0 0 0 0 0 0
## Avalanche 6 4 17 24 27 26 25 11
## Blizzard 146 112 59 56 104 77 80 52
## Cold/Wind Chill 2 1 10 1 0 0 6 4
## Dense Fog 42 75 76 80 106 124 84 81
## Dense Smoke 0 0 0 0 0 3 8 0
## Drought 46 24 116 205 128 94 157 52
## Dust devil 9 8 4 8 2 7 8 14
## Dust Storm 7 7 9 15 19 18 40 34
## Excessive Heat 55 71 164 202 162 160 134 59
## Extreme Cold/Wind Chill 229 142 54 121 182 79 67 62
## Flood 4613 4016 4974 3422 3578 3875 4180 4914
## Freezing Fog 2 0 0 0 0 0 0 0
## Frost/Freeze 28 49 39 52 87 72 53 79
## Funnel Cloud 295 371 442 322 315 302 232 348
## Hail 10855 8801 12730 10236 11372 12389 12704 13952
## Heavy Rain 371 354 682 488 581 453 336 934
## Heavy Snow 980 704 661 786 803 819 777 854
## High Surf 17 50 56 59 51 105 48 61
## High Wind 1114 704 735 948 842 820 870 669
## Hurricane 45 18 29 40 6 5 8 9
## Ice Storm 203 88 114 57 95 58 94 51
## Lake-Effect Snow 0 8 5 5 1 2 0 0
## Lightning 0 0 0 0 0 0 0 0
## Marine Strong Wind 0 0 0 0 0 0 0 0
## Marine Thunderstorm Wind 0 1 2 4 0 99 1142 1446
## Rip Current 31 17 38 35 43 78 61 46
## Seiche 0 0 8 1 1 0 1 0
## Sleet 0 2 0 0 3 0 0 6
## Storm Surge/Tide 20 26 55 28 4 17 24 14
## Strong Wind 79 119 140 135 206 145 143 182
## Thunderstorm Wind 5 36 20 15 22 27 16 0
## Tornado 1239 1180 1529 1519 1169 1351 1041 1534
## Tropical Depression 0 0 1 0 0 0 0 3
## Tropical Storm 33 4 24 9 7 21 20 35
## Tsunami 0 0 0 0 0 0 0 0
## Volcanic Ash 2 2 0 0 0 3 2 7
## Waterspout 159 218 176 270 263 178 207 239
## Wildfire 62 37 113 298 372 217 344 312
## Winter Storm 435 452 261 395 514 438 462 547
## Winter Weather 8 38 28 37 34 50 53 316
##
## 2004 2005 2006 2007 2008 2009 2010 2011
## Astronomical low tide 0 0 0 44 39 31 34 26
## Avalanche 18 35 36 27 39 12 50 21
## Blizzard 74 86 104 185 342 487 356 313
## Cold/Wind Chill 0 6 15 112 135 118 87 66
## Dense Fog 78 56 119 211 206 110 172 152
## Dense Smoke 0 0 1 3 2 2 1 1
## Drought 38 110 175 319 237 162 214 383
## Dust devil 9 9 7 6 13 12 11 10
## Dust Storm 28 19 39 35 15 46 40 46
## Excessive Heat 27 82 162 154 109 76 379 428
## Extreme Cold/Wind Chill 138 65 37 94 259 144 56 155
## Flood 5704 4331 3936 5631 6168 6177 6783 7276
## Freezing Fog 0 0 0 0 0 0 0 0
## Frost/Freeze 73 113 106 294 178 208 211 123
## Funnel Cloud 386 433 366 340 378 449 525 564
## Hail 13196 13829 16729 12764 17594 13341 10947 17807
## Heavy Rain 715 854 810 874 854 1087 1121 1065
## Heavy Snow 644 705 678 944 1346 1032 1372 934
## High Surf 72 88 61 78 60 88 84 64
## High Wind 669 715 1448 1628 2467 2142 2082 2194
## Hurricane 31 47 0 5 23 1 1 3
## Ice Storm 79 113 44 376 172 122 103 115
## Lake-Effect Snow 0 0 0 0 0 0 0 0
## Lightning 0 0 0 0 0 0 0 0
## Marine Strong Wind 0 0 4 8 5 2 17 12
## Marine Thunderstorm Wind 1335 1165 1118 870 1181 928 1161 1545
## Rip Current 44 39 30 55 56 52 63 46
## Seiche 2 1 1 0 3 1 1 1
## Sleet 4 2 7 8 21 5 13 1
## Storm Surge/Tide 18 44 17 7 54 11 12 50
## Strong Wind 177 202 351 338 613 544 518 412
## Thunderstorm Wind 0 0 739 12998 16782 13379 15826 21678
## Tornado 1947 1343 1264 1238 1891 1272 1446 2192
## Tropical Depression 26 4 0 5 15 6 0 0
## Tropical Storm 42 78 29 20 158 10 33 159
## Tsunami 0 0 4 0 0 1 5 10
## Volcanic Ash 4 3 1 1 2 0 0 0
## Waterspout 238 192 230 275 214 220 148 165
## Wildfire 123 176 371 250 275 231 220 775
## Winter Storm 429 531 463 1036 1424 1268 1488 1174
## Winter Weather 294 429 420 1132 1384 1185 1528 1250
We then proceeded to find, with the help of barplots, those events that generated greater damages to human health and in economic terms. First we added the human and economic damages by each event type, and we then generated combined charts for cumulative effects from 1996 to 2011, where the 8 most significant event types are shown.
FatINJ <- tapply(db$FATplusINJ,db$TYPE,sum)
FatINJ <- FatINJ[order(-FatINJ)]
print(FatINJ[1:8])
## Tornado Flood Excessive Heat Thunderstorm Wind
## 22178 9857 9719 1549
## Wildfire Winter Storm Hurricane High Wind
## 1545 1483 1453 1320
Dam <- tapply(db$USDAMAGES/1000000,db$TYPE,sum)
Dam <- Dam[order(-Dam)]
print(Dam[1:8],digits=2)
## Flood Hurricane Storm Surge/Tide Tornado
## 166114 87069 47836 24900
## Hail Drought Tropical Storm Wildfire
## 17201 14414 8320 8163
par(oma=c(0,0,0,0))
par(mar=c(3,8.5,2,1))
par(mfrow=c(2,1)) ##en lugar de 2 x 1 poner 12 x 4
barplot(FatINJ[1:8],main="Cumulative Fatalities and Injuries 1996-2011",
horiz=TRUE, col="Blue", las=1,cex.axis = 0.65, cex.lab=0.5)
barplot(Dam[1:8],main="Cumulative Damages in US$ millions 1996-2011",
horiz=TRUE, col="Green", las=1,cex.axis = 0.65, cex.lab=0.5)
In order to show the average human and economic damage per event, for the 8 most relevant types of event under this criterion, we averaged the damages by each type of event and generated the following panel of barplots:
FatINJAvg <- tapply(db$FATplusINJ,db$TYPE,mean)
FatINJAvg <- FatINJAvg[order(-FatINJAvg)]
print(FatINJAvg[1:8])
## Tsunami Hurricane Excessive Heat Rip Current
## 8.1000000 5.3616236 4.0094884 1.4237057
## Avalanche Tornado Dust Storm Marine Strong Wind
## 1.0026455 0.9578061 0.9280576 0.7500000
DamAvg <- tapply(db$USDAMAGES/1000000,db$TYPE,mean)
DamAvg <- DamAvg[order(-DamAvg)]
print(DamAvg[1:8],digits=0)
## Hurricane Storm Surge/Tide Tropical Storm Tsunami
## 321 119 12 7
## Drought Flood Wildfire Ice Storm
## 6 2 2 2
par(oma=c(0,0,0,0))
par(mar=c(2,8.5,2,1))
par(mfrow=c(2,1)) ##en lugar de 2 x 1 poner 12 x 4
barplot(FatINJAvg[1:8],main="Average Fatalities and Injuries 1996-2011",
horiz=TRUE,cex.axis = 0.65, cex.lab=0.5, col="Blue", las=1)
barplot(DamAvg[1:10],main="Average Damages in US$ millions 1996-2011",
horiz=TRUE,cex.axis = 0.65, cex.lab=0.25, col="Green", las=1)
From the barplots above, we can see that frequent types of events account for most of the cumulative human and economic damages from 1996 to 2011 -with the exception of hurricanes, even though on a single-event basis their cost to human health and economic activity is less significant than catastrophic events. However, it is of great importance to mention that the results are heavily dependent on the classification of event types proposed by the researcher, and so one proposal is to standardize the data entry from its origin.