Data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database were explored to answer what events have the most serious impacts on health and properties across the USA. The data were preprocessed, so that only data younger than 1995 were analysed. The results are reported for three consecutive half-decades to account for eventual so called black swan events. The most harmful events with respect to population health are tornados and excessive heat (former with increasing tendency during years, latter with decreasing one). The greatest economic consequences have hurricanes and floods, though hurricanes were absent from top five most harmful events in the last half-decade.
#clear workspace:
rm(list=ls())
#load necessary libraries:
library(dplyr)
library(lubridate)
library(xtable)
library(ggplot2)
#set working enviro to english settings:
Sys.setlocale("LC_ALL","English")
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
# download zip file if needed and unzip:
# wrote by Leonard Greski, I added rm() and write()
if(!file.exists("repdata-data-StormData.csv.bz2")){
dlMethod <- "curl"
if(substr(Sys.getenv("OS"),1,7) == "Windows") dlMethod <- "wininet"
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url,destfile='repdata-data-StormData.csv.bz2',method=dlMethod,mode="wb")
#unzip(zipfile = "repdata-data-StormData.csv.bz2")
rm(dlMethod, url)
write(date(), "time_downloaded.txt")
}
#read in raw data:
a<-read.csv("repdata-data-StormData.csv.bz2")
#select ony target variables:
b<- select(a, STATE, BGN_DATE, EVTYPE, FATALITIES:CROPDMGEXP)
head(b)
## STATE BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1 AL 4/18/1950 0:00:00 TORNADO 0 15 25.0 K
## 2 AL 4/18/1950 0:00:00 TORNADO 0 0 2.5 K
## 3 AL 2/20/1951 0:00:00 TORNADO 0 2 25.0 K
## 4 AL 6/8/1951 0:00:00 TORNADO 0 2 2.5 K
## 5 AL 11/15/1951 0:00:00 TORNADO 0 2 2.5 K
## 6 AL 11/15/1951 0:00:00 TORNADO 0 6 2.5 K
## CROPDMG CROPDMGEXP
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
#converting dates to recognizable format:
b$begin_date<-mdy_hms(b$BGN_DATE)
#get rid of no longer needed variable:
b$BGN_DATE<-NULL
#get just year:
b$year<-year(b$begin_date)
#get rid of "Summary something"" rows:
b2<-b[!(grepl("^Summary|^SUMMARY", b$EVTYPE)),]
#get rid of unrecognized event type:
b2<-filter(b2, !b2$EVTYPE == "?")
According to official web (http://www.ncdc.noaa.gov/stormevents/details.jsp) the data were collected from 1950, thought only for tornados. Later, in 1955, thunderstorm winds and hails were added, and finally in 1996 the events started to be more or less classified into 48 event types. I decide to discard all data before year 1996, so that overall impact of tornados, thunderstorm winds and hails would not be overestimated due to far longer time of data collecting.
#discard everything before 1996:
b3<-filter(b2, year >= 1996)
#also get rid of rows with no fatality, injury or damage - those are not target for analysis:
b4<-filter(b3, !(FATALITIES==0 & INJURIES==0 & PROPDMG==0 & CROPDMG==0))
I manually extracted 48 classified eventy types from the National Weather Service Storm Data Documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf). I dediced to use just these 48 categories and discard everything else - in case of real data, I would consult authorities about cases not clearly belonging to any category stated above. For this assignment, I chose to discard them completely (as it’s not my job to clean that cursed poorly formated database).
#48 official event types:
events <- c("Astronomical_Low_Tide", "Avalanche", "Blizzard", "Coastal_Flood",
"Cold_Wind_Chill", "Debris_Flow", "Dense_Fog", "Dense_Smoke",
"Drought", "Dust_Devil", "Dust_Storm", "Excessive_Heat",
"Extreme_Cold_Wind_Chill", "Flash_Flood", "Flood", "Frost_Freeze",
"Funnel_Cloud", "Freezing_Fog", "Hail", "Heat", "Heavy_Rain", "Heavy_Snow",
"High_Surf", "High_Wind", "Hurricane_Typhoon", "Ice_Storm", "Lake-Effect_Snow",
"Lakeshore_Flood", "Lightning", "Marine_Hail", "Marine_High_Wind",
"Marine_Strong_Wind", "Marine_Thunderstorm_Wind", "Rip_Current", "Seiche",
"Sleet", "Storm_Surge_Tide", "Strong_Wind", "Thunderstorm_Wind", "Tornado",
"Tropical_Depression", "Tropical_Storm", "Tsunami", "Volcanic_Ash", "Waterspout",
"Wildfire", "Winter_Storm", "Winter_Weather")
#change all event types of dataset to upper cases:
b4$EVTYPE<-toupper(b4$EVTYPE)
#filter out events with correct or near correct names:
b5<-b4[grepl("LOW(.*)TIDE", b4$EVTYPE),]
b5<-rbind(b5, b4[grepl("AVALANCHE", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("BLIZZARD", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("COAST(.*)FLOOD", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("COLD|WIND(.*)CHILL", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("DEBRIS", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("DENS(.*)FOG", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("SMOKE", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("DROUGH|DRY", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("DUST(.*)DEVIL", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("DUST(.*)STORM", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("HEAT|WARMTH", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("FLASH(.*)FLOOD", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("^FLOOD", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("FROST|FREEZ", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("FUNNEL", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("FREEZING(.*)FOG", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("HAIL", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("HEAT", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("HEAVY(.*)RAIN", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("HEAVY(.*)SNOW", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("HIGH(.*)SURF", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("HIGH(.*)WIND", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("HURRICANE|TYPHOON", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("ICE(.*)STORM", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("EFFECT(.*)SNOW", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("LAKESH(.*)FLOOD", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("LIGHTNING", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("^MARINE", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("RIP(.*)CURRENT", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("SEICHE", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("SLEET", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("STORM(.*)(SURGE|TIDE)", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("STRONG(.*)WIND", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("(THUNDERSTORM|TSTM)(.*)WIND", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("TORNADO", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("TROPIC(.*)DEPRESS", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("TROPIC(.*)STORM", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("TSUNAMI", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("VOLCAN(.*)ASH", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("WATERSPOUT", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("WILDFIRE", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("WINTER(.*)STORM", b4$EVTYPE),])
b5<-rbind(b5, b4[grepl("WINTER(.*)WEATHER", b4$EVTYPE),])
#filter out two cases that get through grepl statements and I dont want them:
b5<-filter(b5, !b5$EVTYPE=="MARINE ACCIDENT")
b5<-filter(b5, !b5$EVTYPE=="NON-TSTM WIND")
#rename event types, so that there is max of 48 types at the end:
b5$EVTYPE[grep("LOW(.*)TIDE", b5$EVTYPE)]<-"Low tide"
b5$EVTYPE[grep("AVALANCHE", b5$EVTYPE)]<-"Avalanche"
b5$EVTYPE[grep("BLIZZARD", b5$EVTYPE)]<-"Blizzard"
b5$EVTYPE[grep("COAST(.*)FLOOD", b5$EVTYPE)]<-"Coastal flood"
b5$EVTYPE[grep("COLD|WIND(.*)CHILL", b5$EVTYPE)]<-"Cold or Wind chill"
b5$EVTYPE[grep("DEBRIS", b5$EVTYPE)]<-"Debris flow"
b5$EVTYPE[grep("DENS(.*)FOG", b5$EVTYPE)]<-"Dense fog"
b5$EVTYPE[grep("SMOKE", b5$EVTYPE)]<-"Dense smoke"
b5$EVTYPE[grep("DROUGH|DRY", b5$EVTYPE)]<-"Drought"
b5$EVTYPE[grep("DUST(.*)DEVIL", b5$EVTYPE)]<-"Dust devil"
b5$EVTYPE[grep("DUST(.*)STORM", b5$EVTYPE)]<-"Dust storm"
b5$EVTYPE[grep("HEAT|WARMTH", b5$EVTYPE)]<-"Excessive heat"
b5$EVTYPE[grep("FLASH(.*)FLOOD", b5$EVTYPE)]<-"Flash flood"
b5$EVTYPE[grep("^FLOOD", b5$EVTYPE)]<-"Flood"
b5$EVTYPE[grep("FROST|FREEZ", b5$EVTYPE)]<-"Frost or Freeze"
b5$EVTYPE[grep("FUNNEL", b5$EVTYPE)]<-"Funnel cloud"
b5$EVTYPE[grep("FREEZING(.*)FOG", b5$EVTYPE)]<-"Freezing fog"
b5$EVTYPE[grep("HAIL", b5$EVTYPE)]<-"Hail"
b5$EVTYPE[grep("HEAT", b5$EVTYPE)]<-"Heat"
b5$EVTYPE[grep("HEAVY(.*)RAIN", b5$EVTYPE)]<-"Heavy rain"
b5$EVTYPE[grep("HEAVY(.*)SNOW", b5$EVTYPE)]<-"Heavy snow"
b5$EVTYPE[grep("SURF", b5$EVTYPE)]<-"High surf"
b5$EVTYPE[grep("HIGH(.*)WIND", b5$EVTYPE)]<-"High wind"
b5$EVTYPE[grep("HURRICANE|TYPHOON", b5$EVTYPE)]<-"Hurricane"
b5$EVTYPE[grep("ICE(.*)STORM", b5$EVTYPE)]<-"Ice storm"
b5$EVTYPE[grep("EFFECT(.*)SNOW", b5$EVTYPE)]<-"Lake-effect snow"
b5$EVTYPE[grep("LAKESH(.*)FLOOD", b5$EVTYPE)]<-"Lakeshore flood"
b5$EVTYPE[grep("LIGHTNING", b5$EVTYPE)]<-"Lightning"
b5$EVTYPE[grep("^MARINE(.*)HAIL", b5$EVTYPE)]<-"Marine hail"
b5$EVTYPE[grep("^MARINE(.*)HIGH", b5$EVTYPE)]<-"Marine high wind"
b5$EVTYPE[grep("^MARINE(.*)STRONG", b5$EVTYPE)]<-"Marine strong wild"
b5$EVTYPE[grep("^MARINE(.*)(THUNDER|TSTM)", b5$EVTYPE)]<-"Marine thunderstorm"
b5$EVTYPE[grep("RIP(.*)CURRENT", b5$EVTYPE)]<-"Rip current"
b5$EVTYPE[grep("SEICHE", b5$EVTYPE)]<-"Seiche"
b5$EVTYPE[grep("SLEET", b5$EVTYPE)]<-"Sleet"
b5$EVTYPE[grep("STORM(.*)(SURGE|TIDE)", b5$EVTYPE)]<-"Storm surge"
b5$EVTYPE[grep("STRONG(.*)WIND", b5$EVTYPE)]<-"Strong wind"
b5$EVTYPE[grep("^TSTM|^THUNDERSTORM| TSTM", b5$EVTYPE)]<-"Thunderstorm wind"
b5$EVTYPE[grep("TORNADO", b5$EVTYPE)]<-"Tornado"
b5$EVTYPE[grep("TROPIC(.*)DEPRESS", b5$EVTYPE)]<-"Tropic depression"
b5$EVTYPE[grep("TROPIC(.*)STORM", b5$EVTYPE)]<-"Tropic storm"
b5$EVTYPE[grep("TSUNAMI", b5$EVTYPE)]<-"Tsunami"
b5$EVTYPE[grep("VOLCAN(.*)ASH", b5$EVTYPE)]<-"Volcanic ash"
b5$EVTYPE[grep("WATERSPOUT", b5$EVTYPE)]<-"Waterspout"
b5$EVTYPE[grep("WILDFIRE", b5$EVTYPE)]<-"Wildfire"
b5$EVTYPE[grep("WINTER(.*)STORM", b5$EVTYPE)]<-"Winter storm"
b5$EVTYPE[grep("WINTER(.*)WEATHER", b5$EVTYPE)]<-"Winter storm"
#check the event types, if there are only those I want and if their names are recoded properly:
d<-as.character(unique(b5$EVTYPE))
d<-sort(d)
d
## [1] "Avalanche" "Blizzard" "Coastal flood"
## [4] "Cold or Wind chill" "Dense fog" "Dense smoke"
## [7] "Drought" "Dust devil" "Dust storm"
## [10] "Excessive heat" "Flash flood" "Flood"
## [13] "Frost or Freeze" "Funnel cloud" "Hail"
## [16] "Heavy rain" "Heavy snow" "High surf"
## [19] "High wind" "Hurricane" "Ice storm"
## [22] "Lake-effect snow" "Lakeshore flood" "Lightning"
## [25] "Low tide" "Marine strong wild" "Marine thunderstorm"
## [28] "Rip current" "Seiche" "Storm surge"
## [31] "Strong wind" "Thunderstorm wind" "Tornado"
## [34] "Tropic depression" "Tropic storm" "Tsunami"
## [37] "Volcanic ash" "Waterspout" "Wildfire"
## [40] "Winter storm"
From pdf file reffered in previous part, the only valid marks for exponents are “K” for thousands, “M” for millions, and “B” for billions.
#lets see what mess is in this variable:
table(b5$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 8798 0 0 0 0 0 0 0 0 0
## 6 7 8 B h H K m M
## 0 0 0 31 0 0 184756 0 7246
table(b5$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 102143 0 0 0 2 0 96923 0 1763
#so only B, K, and M are used
#create variables for filling in costs with exponents:
b6 <- mutate(b5, prop_cost=PROPDMG, crop_cost=CROPDMG)
#property damage:
b6$prop_cost<-ifelse(b6$PROPDMGEXP=="K", b6$prop_cost*1000, b6$prop_cost)
b6$prop_cost<-ifelse(b6$PROPDMGEXP=="M", b6$prop_cost*1000000, b6$prop_cost)
b6$prop_cost<-ifelse(b6$PROPDMGEXP=="B", b6$prop_cost*1000000000, b6$prop_cost)
#crop damage:
b6$crop_cost<-ifelse(b6$PROPDMGEXP=="K", b6$crop_cost*1000, b6$crop_cost)
b6$crop_cost<-ifelse(b6$PROPDMGEXP=="M", b6$crop_cost*1000000, b6$crop_cost)
b6$crop_cost<-ifelse(b6$PROPDMGEXP=="B", b6$crop_cost*1000000000, b6$crop_cost)
#select only target variables
fin<-select(b6, EVTYPE:INJURIES, year:crop_cost)
head(fin)
## EVTYPE FATALITIES INJURIES year prop_cost crop_cost
## 1 Low tide 0 0 2007 120000 0
## 2 Low tide 0 0 2008 200000 0
## 3 Avalanche 0 2 1996 0 0
## 4 Avalanche 1 1 1996 0 0
## 5 Avalanche 2 0 1996 0 0
## 6 Avalanche 0 2 1996 0 0
I decide to report not only effects on whole time scale (1996 to 2011; table data), but also for three consecutive half-decades (1996 to 2000, 2001 to 2005, and 2006 to 2011; table and plot). Rationale behind it is that 11 year may be too long interval in current changing, unstable and variable clima, so that shorter intervals, like 5 years, could show some interesting trends or at least changes in what events are the most harmful.
I work with simple sum of number of fatalities and injuries as a representation of harm to population health.
#create aggregated dataset with event type and health impact:
fin_health<-group_by(fin, EVTYPE) %>%
summarise(health_impact=sum(FATALITIES+INJURIES))
#arrange dataset so that the most hamrful events are at the beginning:
fin_health<-arrange(fin_health, desc(health_impact))
Tab.1. Number of fatalities and injuries together caused by events in years 1996 to 2011 across USA:
#print table
print(xtable(fin_health), type="html")
| EVTYPE | health_impact | |
|---|---|---|
| 1 | Tornado | 22178.00 |
| 2 | Excessive heat | 19438.00 |
| 3 | Flood | 7172.00 |
| 4 | Thunderstorm wind | 5407.00 |
| 5 | Lightning | 4792.00 |
| 6 | Flash flood | 2561.00 |
| 7 | Winter storm | 2027.00 |
| 8 | Hurricane | 1453.00 |
| 9 | High wind | 1322.00 |
| 10 | Rip current | 1045.00 |
| 11 | Wildfire | 986.00 |
| 12 | Hail | 930.00 |
| 13 | Heavy snow | 807.00 |
| 14 | Cold or Wind chill | 521.00 |
| 15 | Blizzard | 455.00 |
| 16 | Strong wind | 409.00 |
| 17 | Ice storm | 400.00 |
| 18 | Tropic storm | 395.00 |
| 19 | Dust storm | 387.00 |
| 20 | Avalanche | 379.00 |
| 21 | High surf | 330.00 |
| 22 | Heavy rain | 324.00 |
| 23 | Tsunami | 162.00 |
| 24 | Dense fog | 152.00 |
| 25 | Marine thunderstorm | 106.00 |
| 26 | Marine strong wild | 72.00 |
| 27 | Storm surge | 55.00 |
| 28 | Dust devil | 41.00 |
| 29 | Drought | 32.00 |
| 30 | Frost or Freeze | 22.00 |
| 31 | Coastal flood | 13.00 |
| 32 | Waterspout | 4.00 |
| 33 | Funnel cloud | 1.00 |
| 34 | Dense smoke | 0.00 |
| 35 | Lake-effect snow | 0.00 |
| 36 | Lakeshore flood | 0.00 |
| 37 | Low tide | 0.00 |
| 38 | Seiche | 0.00 |
| 39 | Tropic depression | 0.00 |
| 40 | Volcanic ash | 0.00 |
Now for the half-decades part. Categories I chose are, as their names say, half a decade, 5 year - from 1996 to 2000, from 2001 to 2005, and from 2006 to 2011 (this one is 6 years long, as the data represent 16 years long data collection).
#dataset with half-decades:
fin2<-fin
fin2$halfdecade<-cut(fin2$year, c(1995,2000,2006,2011), labels=c("1996-2000", "2001-2005", "2006-2011"))
#aggregate for decade and event type, arrange by health_impact, rank by health impact in halfdecades:
fin2_health<-group_by(fin2, halfdecade, EVTYPE) %>%
summarise(health_impact=sum(FATALITIES+INJURIES)) %>%
arrange(desc(health_impact)) %>%
mutate(rank=1:length(halfdecade))
#create tables for printing:
tab_1 <- filter(fin2_health, halfdecade=="1996-2000")
tab_2 <- filter(fin2_health, halfdecade=="2001-2005")
tab_3 <- filter(fin2_health, halfdecade=="2006-2011")
tab_123<-merge(tab_1, tab_2, by="EVTYPE", all=T)
tab_123<-merge(tab_123, tab_3, by="EVTYPE", all=T)
tab_123<-select(tab_123, -contains("halfdecade"))
#rename collumns
names(tab_123)<-c("Event_type", "health_impact_96-00", "rank_96-00", "health_impact_01-05",
"rank_01-05", "health_impact_06-11", "rank_06-11")
Tab.2. Number of fatalities and injuries together caused by events in three consecutive half-decades:
#print:
print(xtable(tab_123), type="html")
| Event_type | health_impact_96-00 | rank_96-00 | health_impact_01-05 | rank_01-05 | health_impact_06-11 | rank_06-11 | |
|---|---|---|---|---|---|---|---|
| 1 | Avalanche | 107.00 | 17 | 134.00 | 18 | 138.00 | 16 |
| 2 | Blizzard | 414.00 | 10 | 33.00 | 24 | 8.00 | 27 |
| 3 | Coastal flood | 7.00 | 25 | 2.00 | 29 | 4.00 | 28 |
| 4 | Cold or Wind chill | 245.00 | 14 | 118.00 | 21 | 158.00 | 15 |
| 5 | Dense fog | 149.00 | 17 | 3.00 | 29 | ||
| 6 | Dense smoke | 0.00 | 30 | ||||
| 7 | Drought | 20.00 | 24 | 12.00 | 25 | 0.00 | 31 |
| 8 | Dust devil | 3.00 | 26 | 12.00 | 26 | 26.00 | 21 |
| 9 | Dust storm | 164.00 | 15 | 212.00 | 15 | 11.00 | 26 |
| 10 | Excessive heat | 8338.00 | 1 | 7334.00 | 1 | 3766.00 | 2 |
| 11 | Flash flood | 1208.00 | 6 | 744.00 | 6 | 609.00 | 5 |
| 12 | Flood | 6603.00 | 3 | 237.00 | 13 | 332.00 | 9 |
| 13 | Frost or Freeze | 22.00 | 23 | 0.00 | 31 | 0.00 | 32 |
| 14 | Funnel cloud | 1.00 | 28 | 0.00 | 32 | 0.00 | 33 |
| 15 | Hail | 377.00 | 11 | 371.00 | 11 | 182.00 | 11 |
| 16 | Heavy rain | 162.00 | 16 | 110.00 | 22 | 52.00 | 19 |
| 17 | Heavy snow | 528.00 | 9 | 235.00 | 14 | 44.00 | 20 |
| 18 | High surf | 29.00 | 22 | 124.00 | 19 | 177.00 | 12 |
| 19 | High wind | 651.00 | 8 | 504.00 | 9 | 167.00 | 13 |
| 20 | Hurricane | 94.00 | 18 | 1344.00 | 5 | 15.00 | 25 |
| 21 | Ice storm | 263.00 | 12 | 120.00 | 20 | 17.00 | 23 |
| 22 | Lake-effect snow | 0.00 | 29 | 0.00 | 33 | 0.00 | 34 |
| 23 | Lakeshore flood | 0.00 | 34 | 0.00 | 35 | ||
| 24 | Lightning | 1756.00 | 5 | 1954.00 | 3 | 1082.00 | 4 |
| 25 | Low tide | 0.00 | 36 | ||||
| 26 | Marine strong wild | 4.00 | 27 | 68.00 | 18 | ||
| 27 | Marine thunderstorm | 36.00 | 23 | 70.00 | 17 | ||
| 28 | Rip current | 246.00 | 13 | 465.00 | 10 | 334.00 | 8 |
| 29 | Seiche | 0.00 | 30 | 0.00 | 35 | 0.00 | 37 |
| 30 | Storm surge | 36.00 | 21 | 3.00 | 28 | 16.00 | 24 |
| 31 | Strong wind | 65.00 | 20 | 160.00 | 16 | 184.00 | 10 |
| 32 | Thunderstorm wind | 2306.00 | 4 | 1580.00 | 4 | 1521.00 | 3 |
| 33 | Tornado | 6695.00 | 2 | 5012.00 | 2 | 10471.00 | 1 |
| 34 | Tropic depression | 0.00 | 36 | 0.00 | 38 | ||
| 35 | Tropic storm | 78.00 | 19 | 296.00 | 12 | 21.00 | 22 |
| 36 | Tsunami | 0.00 | 37 | 162.00 | 14 | ||
| 37 | Volcanic ash | 0.00 | 38 | ||||
| 38 | Waterspout | 3.00 | 27 | 1.00 | 30 | 0.00 | 39 |
| 39 | Wildfire | 531.00 | 8 | 455.00 | 6 | ||
| 40 | Winter storm | 1086.00 | 7 | 532.00 | 7 | 409.00 | 7 |
Fig.1. Plot of top five most harmful events:
#dataset for plotting top five most harmful:
fin2_health2<-filter(fin2_health, rank<6)
#plot is:
ggplot(fin2_health2, aes(x=factor(EVTYPE), y=health_impact, fill=factor(EVTYPE))) +
geom_bar(stat="identity") +
facet_wrap(~halfdecade) +
theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1)) +
labs(title="Top five most harmful events in years 1996-2011 in USA
splitted up to 3 half-decades",
x="", y="Number of killed or injured people",
fill="Event types")
As can be seen from the plot, the most harmful types of events are tornados followed by excessive heat. The decline in numbers of injuries and fatalities due to warmth could be possibly linked with widespread of air-conditionig. One can also notice the impact of floods in first half-decade, that disappears from top five in next two decades. Overall data in Tab.1. shows similar picture.
#aggreagate property and crop damage:
fin_dmg<-group_by(fin, EVTYPE) %>%
summarise(damage=(sum(prop_cost+crop_cost)/1000000))
#arrange dataset so that the most costly events are at the beginning:
fin_dmg<-arrange(fin_dmg, desc(damage))
Tab.3. Number of fatalities and injuries together caused by events in years 1996 to 2011 across USA:
#print table
print(xtable(fin_dmg), type="html")
| EVTYPE | damage | |
|---|---|---|
| 1 | Hurricane | 1617419.53 |
| 2 | Flood | 226156.35 |
| 3 | Tornado | 49924.00 |
| 4 | Flash flood | 48669.60 |
| 5 | Storm surge | 47840.57 |
| 6 | Hail | 27672.74 |
| 7 | Thunderstorm wind | 20945.05 |
| 8 | High wind | 12425.03 |
| 9 | Wildfire | 11932.48 |
| 10 | Tropic storm | 9460.48 |
| 11 | Drought | 4491.34 |
| 12 | Ice storm | 3656.71 |
| 13 | Winter storm | 2238.11 |
| 14 | Heavy snow | 1601.65 |
| 15 | Lightning | 1044.61 |
| 16 | Heavy rain | 644.89 |
| 17 | Blizzard | 532.72 |
| 18 | Cold or Wind chill | 432.52 |
| 19 | Coastal flood | 390.24 |
| 20 | Frost or Freeze | 236.34 |
| 21 | Strong wind | 226.96 |
| 22 | Tsunami | 164.06 |
| 23 | High surf | 93.97 |
| 24 | Lake-effect snow | 40.18 |
| 25 | Excessive heat | 19.67 |
| 26 | Marine thunderstorm | 11.81 |
| 27 | Lakeshore flood | 7.54 |
| 28 | Dense fog | 7.32 |
| 29 | Dust storm | 6.33 |
| 30 | Waterspout | 5.73 |
| 31 | Avalanche | 3.71 |
| 32 | Tropic depression | 1.74 |
| 33 | Seiche | 0.98 |
| 34 | Marine strong wild | 0.84 |
| 35 | Dust devil | 0.66 |
| 36 | Volcanic ash | 0.50 |
| 37 | Low tide | 0.32 |
| 38 | Rip current | 0.16 |
| 39 | Funnel cloud | 0.13 |
| 40 | Dense smoke | 0.10 |
Now for the half-decades part. The categories are the same as for health impact.
#aggregate for decade and event type, arrange by health_impact, rank by health impact in halfdecades:
fin2_dmg<-group_by(fin2, halfdecade, EVTYPE) %>%
summarise(damage=(sum(prop_cost+crop_cost)/1000000)) %>%
arrange(desc(damage)) %>%
mutate(rank=1:length(halfdecade))
#create tables for printing:
tab_4 <- filter(fin2_dmg, halfdecade=="1996-2000")
tab_5 <- filter(fin2_dmg, halfdecade=="2001-2005")
tab_6 <- filter(fin2_dmg, halfdecade=="2006-2011")
tab_456<-merge(tab_4, tab_5, by="EVTYPE", all=T)
tab_456<-merge(tab_456, tab_6, by="EVTYPE", all=T)
tab_456<-select(tab_456, -contains("halfdecade"))
#rename collumns
names(tab_456)<-c("Event_type", "damage_96-00", "rank_96-00", "damage_01-05",
"rank_01-05", "damage_06-11", "rank_06-11")
Tab.4. Cost of damage in millions of dollars caused by events in three consecutive half-decades:
#print:
print(xtable(tab_456), type="html")
| Event_type | damage_96-00 | rank_96-00 | damage_01-05 | rank_01-05 | damage_06-11 | rank_06-11 | |
|---|---|---|---|---|---|---|---|
| 1 | Avalanche | 0.95 | 23 | 0.38 | 30 | 2.39 | 30 |
| 2 | Blizzard | 412.75 | 15 | 91.23 | 18 | 28.74 | 21 |
| 3 | Coastal flood | 156.33 | 17 | 69.37 | 19 | 164.55 | 14 |
| 4 | Cold or Wind chill | 420.98 | 14 | 1.92 | 26 | 9.63 | 24 |
| 5 | Dense fog | 4.56 | 25 | 2.76 | 29 | ||
| 6 | Dense smoke | 0.10 | 36 | ||||
| 7 | Drought | 1130.34 | 9 | 3350.08 | 11 | 10.92 | 23 |
| 8 | Dust devil | 0.06 | 28 | 0.22 | 31 | 0.38 | 34 |
| 9 | Dust storm | 1.44 | 22 | 1.71 | 27 | 3.18 | 28 |
| 10 | Excessive heat | 6.64 | 20 | 7.34 | 24 | 5.69 | 26 |
| 11 | Flash flood | 13019.20 | 3 | 17583.12 | 4 | 18067.29 | 3 |
| 12 | Flood | 29675.69 | 2 | 157361.54 | 2 | 39119.13 | 1 |
| 13 | Frost or Freeze | 110.81 | 18 | 0.07 | 34 | 125.47 | 16 |
| 14 | Funnel cloud | 0.05 | 29 | 0.01 | 37 | 0.07 | 38 |
| 15 | Hail | 10195.74 | 5 | 8444.26 | 7 | 9032.73 | 4 |
| 16 | Heavy rain | 176.98 | 16 | 354.48 | 14 | 113.43 | 17 |
| 17 | Heavy snow | 1340.11 | 8 | 193.72 | 16 | 67.82 | 19 |
| 18 | High surf | 0.56 | 25 | 10.45 | 22 | 82.97 | 18 |
| 19 | High wind | 1107.01 | 10 | 6251.88 | 9 | 5066.13 | 6 |
| 20 | Hurricane | 811858.78 | 1 | 802912.65 | 1 | 2648.11 | 8 |
| 21 | Ice storm | 1673.30 | 7 | 1203.37 | 12 | 780.05 | 12 |
| 22 | Lake-effect snow | 0.06 | 27 | 23.41 | 20 | 16.71 | 22 |
| 23 | Lakeshore flood | 0.01 | 38 | 7.53 | 25 | ||
| 24 | Lightning | 485.50 | 13 | 255.52 | 15 | 303.60 | 13 |
| 25 | Low tide | 0.32 | 35 | ||||
| 26 | Marine strong wild | 0.03 | 35 | 0.80 | 33 | ||
| 27 | Marine thunderstorm | 10.84 | 21 | 0.97 | 32 | ||
| 28 | Rip current | 0.00 | 30 | 0.16 | 32 | 0.00 | 39 |
| 29 | Seiche | 0.82 | 24 | 0.07 | 33 | 0.09 | 37 |
| 30 | Storm surge | 28.76 | 19 | 43170.32 | 3 | 4641.49 | 7 |
| 31 | Strong wind | 2.62 | 21 | 177.82 | 17 | 46.52 | 20 |
| 32 | Thunderstorm wind | 8103.28 | 6 | 4062.23 | 10 | 8779.55 | 5 |
| 33 | Tornado | 11400.57 | 4 | 13927.03 | 5 | 24596.41 | 2 |
| 34 | Tropic depression | 0.43 | 29 | 1.30 | 31 | ||
| 35 | Tropic storm | 508.11 | 12 | 7955.43 | 8 | 996.95 | 10 |
| 36 | Tsunami | 9.20 | 23 | 154.86 | 15 | ||
| 37 | Volcanic ash | 0.50 | 28 | ||||
| 38 | Waterspout | 0.50 | 26 | 0.03 | 36 | 5.20 | 27 |
| 39 | Wildfire | 9640.04 | 6 | 2292.44 | 9 | ||
| 40 | Winter storm | 839.90 | 11 | 469.73 | 13 | 928.48 | 11 |
Fig.2. Plot of top five most harmful events:
#dataset for plotting top five most harmful:
fin2_dmg2<-filter(fin2_dmg, rank<6)
#plot is:
ggplot(fin2_dmg2, aes(x=factor(EVTYPE), y=damage, fill=factor(EVTYPE))) +
geom_bar(stat="identity") +
facet_wrap(~halfdecade) +
theme(axis.text.x = element_text(angle=90, vjust=0.25, hjust=1)) +
labs(title="Top five events with the greatest economic consequences
in years 1996-2011 in USA splitted up to 3 half-decades",
x=" ", y="Damage to properties and crops (millions of $)",
fill="Event types")
The hurricanes were the most prominent cause of destruction in first two half-decades, though in third half-decade their impact is lower and below top five most costly events. Second in rank are floods causing high damage in every half-decade. In overall data those two events exceed others in two and one levels of magnitude of how costly their impact is.