In accordance with “Storm data” (1950-2011) Tornado, thunderstorm wind, excessive heat, flood, lightnin, heat, flash flood, ice storm, high wind, and wildfire are in TOP10 wether ewents wich are most harmful with respect to population health across the United States.
Flood, hurricane (typhoon), tornado, storm surge/tide, hail, flash flood, drought, thunderstorm wind, ice storm, and wildfire are in TOP10 wether ewents wich have the greatest economic consequences.
My data analysis addresses the following questions:
1. Across the United States, which types of events are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
“Storm Data” is an official publication of the National Oceanic and Atmospheric Administration (NOAA).
Loading packages we are going to work with
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Downloading data if it’s needed
if( !file.exists("./data") ) {dir.create("./data")}
if ( !file.exists("./data/StormData.csv.bz2") )
{
## downloading file into 'working_directory/data'
library(downloader)
url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download(url, "./data/StormData.csv.bz2")
unlink(url)
rm(url) ## removing spent variable
}
Reading file into data frame
StormData <- read.csv("./data/StormData.csv.bz2",
header = TRUE,
na.strings="",
stringsAsFactors = FALSE) ## very importent
colnames(StormData) <- tolower(colnames(StormData)) ##!!
dim(StormData)
## [1] 902297 37
StormData1 <- StormData %>%
select(evtype, bgn_date,
fatalities, injuries,
propdmg,propdmgexp,
cropdmg,cropdmgexp
) %>%
mutate(evtype = tolower(evtype)) %>%
arrange(desc(fatalities),desc(injuries))
unique(StormData1$propdmgexp)
## [1] NA "B" "M" "K" "0" "5" "-" "m" "H" "7" "+" "6" "?" "4" "2" "3" "h"
## [18] "1" "8"
unique(StormData1$cropdmgexp)
## [1] NA "K" "M" "B" "0" "m" "?" "k" "2"
For the purposes of this analysis, I have assumed that
. characters {+,-,?, 0 2 3 4 5 6 7 8 9 } in columns ‘propdmgexp’ and ‘propdmgexp’ are errors, so I will replace these characters with NA; . “[kK]” “[mM]” “[bB]” mean “kilo”, “mega”or“millins”, “billions”, so I will replace these charachters with “103“,”106”, “10^9”; . “[hH]” mean maybe ‘hundreds’ so I will replace these with “10^2”.
Fixing ‘propdmgexp’ data
StormData1$propdmgexp <- gsub(pattern="[0-9]", replacement="NA", StormData1$propdmgexp)
StormData1$propdmgexp <- gsub(pattern="[h,H]", replacement="1E2", StormData1$propdmgexp)
StormData1$propdmgexp <- gsub(pattern="[k,K]", replacement="1E3", StormData1$propdmgexp)
StormData1$propdmgexp <- gsub(pattern="[m,M]", replacement="1E6", StormData1$propdmgexp)
StormData1$propdmgexp <- gsub(pattern="[b,B]", replacement="1E9", StormData1$propdmgexp)
StormData1$propdmgexp <- as.numeric(StormData1$propdmgexp)
## Warning: NAs introduced by coercion
StormData1$propdmgexp[is.na(StormData1$propdmgexp)] <- 0
Fixing ‘cropdmgexp’ data
StormData1$cropdmgexp <- gsub(pattern="[0-9]", replacement="NA", StormData1$cropdmgexp)
StormData1$cropdmgexp <- gsub(pattern="[h,H]", replacement="1E2", StormData1$cropdmgexp)
StormData1$cropdmgexp <- gsub(pattern="[k,K]", replacement="1E3", StormData1$cropdmgexp)
StormData1$cropdmgexp <- gsub(pattern="[m,M]", replacement="1E6", StormData1$cropdmgexp)
StormData1$cropdmgexp <- gsub(pattern="[b,B]", replacement="1E9", StormData1$cropdmgexp)
StormData1$cropdmgexp <- as.numeric(StormData1$cropdmgexp)
## Warning: NAs introduced by coercion
StormData1$cropdmgexp[is.na(StormData1$cropdmgexp)] <- 0
StormData1 <- StormData1 %>%
mutate(propdmg = propdmg*propdmgexp,
cropdmg = cropdmg*cropdmgexp,
dmg = propdmg+cropdmg,
victims = fatalities+injuries) %>%
select(-c(propdmgexp,cropdmgexp)) %>%
filter(!(is.na(fatalities) & is.na(injuries) &
is.na(propdmg) & is.na(cropdmg)))
StormData1$bgn_date <- strptime(StormData1$bgn_date, format = "%m/%d/%Y %H:%M:%S")
StormData1 <- mutate(StormData1,evyear = year(bgn_date)) %>%
select(-bgn_date)
In “Storm Data” are only following 48 types of events: “astronomical low tide”, “avalanche”, “blizzard”, “coastal flood”,“cold/wind chill”,“debris flow”, “dense fog”, “dense smoke”,“drought”,“dust devil”, “dust storm”, “excessive heat”, “extreme cold/wind chill”, “flash flood”, “flood”, “frost/freeze”,“funnel cloud”, “freezing fog”, “hail”,“heat”,“heavy rain”,“heavy snow”, “high surf”,“high wind”, “hurricane (typhoon)”, “ice storm”, “lake-effect snow”, “lakeshore flood”, “lightning”, “marine hail”, “marine high wind”, “marine strong wind”, “marine thunderstorm wind”, “rip current”, “seiche”, “sleet”,“storm surge/tide”, “strong wind”, “thunderstorm wind”, “tornado”, “tropical depression”,“tropical storm”,“tsunami”, “volcanic ash”,“waterspout”, “wildfire”, “winter storm”,“winter weather”.
StormData1 has now some more types of events than in “Storm Data”:
length(unique(StormData1$evtype))
## [1] 898
Basic fixing and agregation values in ‘evtype’ column
StormData1$evtype = gsub(pattern="[?)(1-9]", replacement="", StormData1$evtype)
StormData1$evtype[grepl("astronomical low tide", StormData1$evtype)] <- "astronomical low tide"
StormData1$evtype[grepl("(avalanch?e)|(landslide)", StormData1$evtype)] <- "avalanche"
StormData1$evtype[grepl("blizzards?", StormData1$evtype)] <- "blizzard"
# StormData1$evtype[grepl("astronomical high tide",
# StormData1$evtype)] <- "coastal flood"
StormData1$evtype[grepl("coastal flood",
StormData1$evtype)] <- "coastal flood"
# ?"cold/wind chill"
StormData1$evtype[grepl("debris flow", StormData1$evtype)] <- "debris flow"
StormData1$evtype[grepl("(dense )?fog", StormData1$evtype)] <- "dense fog"
StormData1$evtype[grepl("dense smoke", StormData1$evtype)] <- "dense smoke"
StormData1$evtype[grepl("drought", StormData1$evtype)] <- "drought"
StormData1$evtype[grepl("dust devil", StormData1$evtype)] <- "dust devil"
StormData1$evtype[grepl("dust storm", StormData1$evtype)] <- "dust storm"
StormData1$evtype[grepl("((excessive|extreme) heat)|(hyperthermia/exposure)",
StormData1$evtype)] <- "excessive heat"
StormData1$evtype[grepl("(extreme cold)|(winds?.chill)|(extreme windchill)",
StormData1$evtype)] <- "extreme cold/wind chill"
StormData1$evtype[grepl("(flash.floods?)|(floods?.flash)",
StormData1$evtype)] <- "flash flood"
StormData1$evtype[grepl("flooding|(river(.+)flood)|(flood/rain/winds)",
StormData1$evtype)] <- "flood"
StormData1$evtype[grepl("urban(.+)floods?", StormData1$evtype)] <- "flood"
StormData1$evtype[grepl("(agricultural freeze)|(frost/freeze)"
, StormData1$evtype)] <- "frost/freeze"
StormData1$evtype[grepl("funnel cloud", StormData1$evtype)] <- "funnel cloud"
StormData1$evtype[grepl("freezing fog", StormData1$evtype)] <- "freezing fog"
# ?"hail" contained in "marine hail"
# ?"heat" contained in "excessive heat"
StormData1$evtype[grepl("(heat wave)|(unseasonably warm)",
StormData1$evtype)] <- "heat"
StormData1$evtype[grepl("heavy rains?", StormData1$evtype)] <- "heavy rain"
StormData1$evtype[grepl("heavy snow", StormData1$evtype)] <- "heavy snow"
StormData1$evtype[grepl("high surf", StormData1$evtype)] <- "high surf"
StormData1$evtype[grepl("high winds?", StormData1$evtype)] <- "high wind"
StormData1$evtype[grepl("hurricanes?|typhoons?", StormData1$evtype)] <- "hurricane (typhoon)"
StormData1$evtype[grepl("ice storm", StormData1$evtype)] <- "ice storm"
StormData1$evtype[grepl("lake.effect snow", StormData1$evtype)] <- "lake-effect snow"
StormData1$evtype[grepl("lakeshore flood", StormData1$evtype)] <- "lakeshore flood"
StormData1$evtype[grepl("heavy rains?", StormData1$evtype)] <- "heavy rain"
StormData1$evtype[grepl("lightn?ing", StormData1$evtype)] <- "lightning"
StormData1$evtype[grepl("marine hail", StormData1$evtype)] <- "marine hail"
StormData1$evtype[grepl("marine high winds?", StormData1$evtype)] <- "marine high wind"
StormData1$evtype[grepl("marine strong winds?", StormData1$evtype)] <- "marine strong wind"
StormData1$evtype[grepl("marine thunderstorms? winds?", StormData1$evtype)] <- "marine thunderstorm wind"
StormData1$evtype[grepl("rip currents?", StormData1$evtype)] <- "rip currents"
StormData1$evtype[grepl("seiche", StormData1$evtype)] <- "seiche"
StormData1$evtype[grepl("sleet", StormData1$evtype)] <- "sleet"
StormData1$evtype[grepl("storm surge", StormData1$evtype)] <- "storm surge/tide"
StormData1$evtype[grepl("strong winds", StormData1$evtype)] <- "strong wind"
StormData1$evtype[grepl("tstm winds?", StormData1$evtype)] <- "thunderstorm wind"
StormData1$evtype[grepl("severe thunderstorms?", StormData1$evtype)] <- "thunderstorm wind"
StormData1$evtype[grepl("thunderstorms winds",
StormData1$evtype)] <- "thunderstorm wind"
StormData1$evtype[grepl("tornado", StormData1$evtype)] <- "tornado"
StormData1$evtype[grepl("tropical depression", StormData1$evtype)] <- "tropical depression"
StormData1$evtype[grepl("tropical storms?", StormData1$evtype)] <- "tropical storm"
StormData1$evtype[grepl("tsunami", StormData1$evtype)] <- "tsunami"
StormData1$evtype[grepl("volcanic ash", StormData1$evtype)] <- "volcanic ash"
StormData1$evtype[grepl("waterspout?", StormData1$evtype)] <- "waterspout"
StormData1$evtype[grepl("(wild)?fires?", StormData1$evtype)] <- "wildfire"
StormData1$evtype[grepl("winter storms?", StormData1$evtype)] <- "winter storm"
StormData1$evtype[grepl("winter weather|wintry", StormData1$evtype)] <- "winter weather"
After fixing StormData1 has foolowing number of event types:
length(unique(StormData1$evtype))
## [1] 487
losses_by_year <- StormData1 %>%
group_by(evyear) %>%
summarize(total_fatalities = sum(fatalities, na.rm = TRUE),
total_injuries = sum(injuries, na.rm = TRUE),
total_victims = sum(victims, na.rm = TRUE),
total_propdmg = sum(propdmg, na.rm = TRUE),
total_cropdmg = sum(cropdmg, na.rm = TRUE),
total_dmg = sum(dmg, na.rm = TRUE)
)
par(mfrow = c(2,2), mar=c(4, 4, 1, 1))
with (losses_by_year,
{
plot(total_fatalities ~ evyear,
type = "b", lty = "dashed",
xlab = "Year", ylab = "Fatalities")
plot(total_injuries ~ evyear,
type = "b", lty = "dashed",
xlab = "Year", ylab = "Injuries")
plot(total_propdmg ~ evyear,log = "y",
type = "b", lty = "dashed",
xlab = "Year", ylab = "Property damage (in $USA)")
plot(total_cropdmg ~ evyear,log = "y", xlim = c(1993,2011),
type = "b", lty = "dashed",
xlab = "Year", ylab = "Crop damage (in $USA)")
}
)
## Warning in xy.coords(x, y, xlabel, ylabel, log): 43 y values <= 0 omitted
## from logarithmic plot
Fig.1. Victims and damages in USA (1950-2011) caused by the weather events (detailed)
Estimates of crop damage exist in “Storm Data” only for time period from 1993 to 2011 year.
model1 <- lm(total_victims ~ evyear, losses_by_year)
par(mfrow = c(1,2), mar=c(4, 4, 1, 1))
with (losses_by_year,
{
plot(x=evyear, y=total_victims,
xlab = "Year", ylab = "Victims")
abline(model1, lwd = 2, col = "red")
plot(x=evyear, y=total_dmg, log="y",
xlab = "Year", ylab = "Damage (in $USA)")
}
)
Fig.2. Victims and damages in USA (1950-2011) caused by the weather events (in general)
Red line on Fig.1 mean linear aproximation. It seems total damages have exponential depends from time.
Calculation data frame with total fatalities, injuries, and damage in all time period from 4/18/1950 to 11/29/2011 for each type of wether events
losses_by_evtype <- StormData1 %>%
group_by(evtype)%>% #, state
summarize(total_fatalities = sum(fatalities, na.rm = TRUE),
total_injuries = sum(injuries, na.rm = TRUE),
total_victims = sum(victims, na.rm = TRUE),
total_propdmg = sum(propdmg, na.rm = TRUE),
total_cropdmg = sum(cropdmg, na.rm = TRUE),
total_dmg = sum(dmg, na.rm = TRUE)
) %>%
arrange(desc(total_victims), desc(total_dmg))
TopFatEv <- arrange(losses_by_evtype, desc(total_fatalities))[1:10,1]
print(TopFatEv)
## Source: local data frame [10 x 1]
##
## evtype
## 1 tornado
## 2 excessive heat
## 3 heat
## 4 flash flood
## 5 lightning
## 6 thunderstorm wind
## 7 rip currents
## 8 flood
## 9 extreme cold/wind chill
## 10 high wind
TopInjEv <- arrange(losses_by_evtype, desc(total_injuries))[1:10,1]
print(TopInjEv)
## Source: local data frame [10 x 1]
##
## evtype
## 1 tornado
## 2 thunderstorm wind
## 3 flood
## 4 excessive heat
## 5 lightning
## 6 heat
## 7 ice storm
## 8 flash flood
## 9 wildfire
## 10 high wind
Following 10 types of wether events are most harmful with respect to population health across the United States (1950-2011):
TopVictEv <- arrange(losses_by_evtype, desc(total_victims))[1:10,1]
print(TopVictEv)
## Source: local data frame [10 x 1]
##
## evtype
## 1 tornado
## 2 thunderstorm wind
## 3 excessive heat
## 4 flood
## 5 lightning
## 6 heat
## 7 flash flood
## 8 ice storm
## 9 high wind
## 10 wildfire
Following 10 types of events have the greatest economic consequences across the United States (1950-2011):
TopDmgEv <- arrange(losses_by_evtype, desc(total_dmg))[1:10,1]
print(TopDmgEv)
## Source: local data frame [10 x 1]
##
## evtype
## 1 flood
## 2 hurricane (typhoon)
## 3 tornado
## 4 storm surge/tide
## 5 hail
## 6 flash flood
## 7 drought
## 8 thunderstorm wind
## 9 ice storm
## 10 wildfire
TopEv <- data.frame(rank = c(1:10), TopVictEv, TopDmgEv)
colnames(TopEv) = c("rank", "evtypeVict", "evtypeDmg")
print(TopEv)
## rank evtypeVict evtypeDmg
## 1 1 tornado flood
## 2 2 thunderstorm wind hurricane (typhoon)
## 3 3 excessive heat tornado
## 4 4 flood storm surge/tide
## 5 5 lightning hail
## 6 6 heat flash flood
## 7 7 flash flood drought
## 8 8 ice storm thunderstorm wind
## 9 9 high wind ice storm
## 10 10 wildfire wildfire
Below is a list with most dangerous for peple and economics types of wether event:
TopDangerEv <- unique(rbind(TopVictEv,TopDmgEv))
print(TopDangerEv)
## Source: local data frame [14 x 1]
##
## evtype
## 1 tornado
## 2 thunderstorm wind
## 3 excessive heat
## 4 flood
## 5 lightning
## 6 heat
## 7 flash flood
## 8 ice storm
## 9 high wind
## 10 wildfire
## 11 hurricane (typhoon)
## 12 storm surge/tide
## 13 hail
## 14 drought