Data on the most damaging weather events, provided by NOAA, lets us summarise the damage these events cause to the population.
The goal is to give the authorities clear information on how to better allocate resources to mitigate that damage.
The two parameters chosen for this purpose are “Public health damages” and “Economy damages”. “Flooding” events are the worst in economic terms, considering both “Property damage” and “Crop damage”. In terms of public health, the most harmful weather events are “tornadoes”.
Our code downloads the data automatically.
# setwd('~/Desktop')
if (!file.exists("stormdata.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "stormdata.csv.bz2", method = "curl") # it can take some minutes
}
The data is provided as a bzip2-compressed file. We decompress it and then read the CSV file into an R data frame.
library(R.utils) # provides bunzip2()
bunzip2("stormdata.csv.bz2", remove = FALSE, overwrite = TRUE) # decompress, keeping the .bz2 file
data <- read.csv("stormdata.csv", stringsAsFactors = FALSE) # reading
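As a possible shortcut, `read.csv` can usually open a bzip2-compressed file directly, because R's file connections detect the compression when reading, so the separate decompression step may be optional. The sketch below only demonstrates the round trip on a small temporary file; the toy values are hypothetical and are not taken from the NOAA data.

```r
# Toy data standing in for the storm file (hypothetical values)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1),
                  stringsAsFactors = FALSE)

# Write it as a bzip2-compressed csv in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the compressed file without an explicit decompression step
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)
```

If this works on your installation, the same call on "stormdata.csv.bz2" would avoid keeping a large uncompressed copy on disk.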
We keep only the columns that we will use next. Then we convert the BGN_DATE column values into a proper R date-time object.
col_used <- which(names(data) %in% c("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES",
"PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
df <- data[, col_used]
df$BGN_DATE = strptime(as.character(df$BGN_DATE), format = "%m/%d/%Y %H:%M:%S")
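`strptime` returns a `POSIXlt` object, which is a list-like structure and can be awkward as a data frame column. A minimal sketch of one alternative, assuming only the date part is needed: parse with an explicit time zone and convert to `Date`. The sample strings are illustrative values in the same format as BGN_DATE, not rows read from the file.

```r
# Sample values in the same format as BGN_DATE (illustrative only)
raw <- c("4/18/1950 0:00:00", "12/1/2011 0:00:00")

# Parse with an explicit time zone so the result does not depend on the local one
parsed <- strptime(raw, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# A Date vector is simpler to store and to group by than POSIXlt
dates <- as.Date(parsed)
years <- as.integer(format(dates, "%Y"))
print(dates)
print(years)
```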
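We want to calculate the actual damages in US dollars, so we need the multiplying factor encoded in the PROPDMGEXP and CROPDMGEXP fields. We assume the following conversion, as implemented in the code below: “h” or “H” means hundreds, “k” or “K” thousands, “m” or “M” millions and “B” billions; a digit n means a factor of 10^n; a blank or a symbol (“-”, “?”, “+”) means a factor of 0.
To obtain damages in USD, the values in the PROPDMG and CROPDMG fields are multiplied by these factors, and the results are added to the data as two extra columns (named “p.damage” and “c.damage”).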
p.level = levels(as.factor(df$PROPDMGEXP))
p = c(0, 0, 0, 0, 1, 10, 100, 1000, 10000, 1e+05, 1e+06, 1e+07, 1e+08, 1e+09,
100, 100, 1000, 1e+06, 1e+06)
p.deno = rep(0, dim(df)[1])
for (i in 1:(length(p.level))) {
p.deno[df$PROPDMGEXP == p.level[i]] = p[i]
}
p.damage = p.deno * df$PROPDMG
df$p.damage = p.damage
# =======================================
c.level = levels(as.factor(df$CROPDMGEXP))
# named c.mult rather than c, to avoid masking the base function c()
c.mult <- c(0, 0, 1, 100, 1e+09, 1000, 1000, 1e+06, 1e+06)
c.deno = rep(0, dim(df)[1])
for (i in 1:(length(c.level))) {
c.deno[df$CROPDMGEXP == c.level[i]] = c.mult[i]
}
c.damage = c.deno * df$CROPDMG
df$c.damage = c.damage
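The loops above pair each multiplier with a factor level by position, so they depend on the order returned by `levels()`, which follows the locale's sort order and can differ between machines. A minimal sketch of a name-based lookup that avoids this, using the same factors assumed above; `exp_mult` and `to_usd` are hypothetical names introduced for illustration.

```r
# Named lookup: exponent code -> multiplier (same factors as assumed above)
exp_mult <- c("0" = 1, "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4, "5" = 1e5,
              "6" = 1e6, "7" = 1e7, "8" = 1e8,
              "H" = 100, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Hypothetical helper: convert a damage value and its exponent code to USD
to_usd <- function(value, code) {
  mult <- exp_mult[toupper(code)]  # look up by name, ignoring case
  mult[is.na(mult)] <- 0           # blank or unknown codes ("", "-", "?", "+") count as 0
  unname(value * mult)
}

print(to_usd(c(25, 2.5, 10, 5), c("K", "m", "", "?")))
```

With this helper, the two columns could be computed as `to_usd(df$PROPDMG, df$PROPDMGEXP)` and `to_usd(df$CROPDMG, df$CROPDMGEXP)`, without any dependence on level order.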
Analyzing the impact of different events on population health (FATALITIES and INJURIES). The numbers of injuries and fatalities are aggregated by event type to show which events are more harmful than others.
aggd1 = aggregate(df$FATALITIES, by = list(df$EVTYPE), FUN = sum)
fatalities = aggd1[order(aggd1$x, decreasing = T), ]
names(fatalities) = c("Event.Type", "Fatalities")
head(fatalities, n = 8)
## Event.Type Fatalities
## 826 TORNADO 5633
## 124 EXCESSIVE HEAT 1903
## 151 FLASH FLOOD 978
## 271 HEAT 937
## 453 LIGHTNING 816
## 846 TSTM WIND 504
## 167 FLOOD 470
## 572 RIP CURRENT 368
aggd2 = aggregate(df$INJURIES, by = list(df$EVTYPE), FUN = sum)
injuries = aggd2[order(aggd2$x, decreasing = T), ]
names(injuries) = c("Event.Type", "Injuries")
head(injuries, n = 8)
## Event.Type Injuries
## 826 TORNADO 91346
## 846 TSTM WIND 6957
## 167 FLOOD 6789
## 124 EXCESSIVE HEAT 6525
## 453 LIGHTNING 5230
## 271 HEAT 2100
## 422 ICE STORM 1975
## 151 FLASH FLOOD 1777
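The two aggregations above could also be done in a single call with the formula interface of `aggregate`, which keeps fatalities and injuries side by side for each event type. This is a sketch on hypothetical toy records that reuse the column names of `df`; it is not the report's own method.

```r
# Toy records (hypothetical values) with the same column names as df
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 4, 1),
  INJURIES   = c(10, 3, 5, 0, 2)
)

# Sum both health columns per event type in one call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities, breaking ties with injuries
health <- health[order(health$FATALITIES, health$INJURIES, decreasing = TRUE), ]
print(health)
```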
Plotting the 20 most harmful events (for both injuries and fatalities):
# par(oma = c(3.5, 0, 0, 0))
barplot(height = fatalities$Fatalities[1:20], names.arg = fatalities$Event.Type[1:20],
las = 2, cex.axis = 0.8, cex.names = 0.7, col = rainbow(20, start = 0, end = 0.35),
ylab = "Number of Fatalities")
title("Top Events \n causing Fatalities", line = -2)
barplot(height = injuries$Injuries[1:20], names.arg = injuries$Event.Type[1:20],
las = 2, cex.axis = 0.7, cex.names = 0.7, col = rainbow(20, start = 0, end = 0.35),
ylab = "Number of Injuries")
title("Top Events \n causing Injuries", line = -2)
Summarizing the economic consequences of weather disaster events across the USA.
Property and crop damages in USD are aggregated by event type to show which events are more harmful than others.
agg.d1 = aggregate(df$p.damage, by = list(df$EVTYPE), FUN = sum)
property = agg.d1[order(agg.d1$x, decreasing = T), ]
names(property) = c("Event.Type", "Property.Damage")
row.names(property) = 1:dim(property)[1]
head(property, 10)
## Event.Type Property.Damage
## 1 FLOOD 1.226e+11
## 2 HURRICANE/TYPHOON 6.550e+10
## 3 STORM SURGE 4.256e+10
## 4 HURRICANE 5.707e+09
## 5 TORNADO 5.677e+09
## 6 TROPICAL STORM 5.157e+09
## 7 WINTER STORM 5.015e+09
## 8 RIVER FLOOD 5.001e+09
## 9 STORM SURGE/TIDE 4.001e+09
## 10 HURRICANE OPAL 3.120e+09
agg.d2 = aggregate(df$c.damage, by = list(df$EVTYPE), FUN = sum)
crop = agg.d2[order(agg.d2$x, decreasing = T), ]
names(crop) = c("Event.Type", "Crop.Damage")
row.names(crop) = 1:dim(crop)[1]
head(crop, 10)
## Event.Type Crop.Damage
## 1 RIVER FLOOD 5.003e+09
## 2 ICE STORM 5.002e+09
## 3 DROUGHT 1.534e+09
## 4 HURRICANE/TYPHOON 1.515e+09
## 5 HAIL 9.962e+08
## 6 HEAT 4.007e+08
## 7 FREEZE 2.009e+08
## 8 FLASH FLOOD 1.792e+08
## 9 FLOOD 1.680e+08
## 10 TSTM WIND 1.092e+08
Plotting the 15 most harmful events in terms of property and crop damage.
# par(oma = c(3.5, 0, 0, 0))
barplot(height = property$Property.Damage[1:15], names.arg = property$Event.Type[1:15],
las = 2, cex.axis = 0.7, cex.names = 0.7, col = rainbow(20, start = 0, end = 0.35),
ylab = "Damage in US dollars")
title("Top Events \n causing Property damages", line = -2)
barplot(height = crop$Crop.Damage[1:15], names.arg = crop$Event.Type[1:15],
las = 2, cex.axis = 0.7, cex.names = 0.7, col = rainbow(20, start = 0, end = 0.35),
ylab = "Damage in US dollars")
title("Top Events \n causing Crop damages", line = -2)
In terms of public health, the most harmful events are “tornadoes”, followed by others such as “excessive heat”, “flood” and “tstm wind” (thunderstorm wind). Regarding the economy, although “flood” events appear to be the most damaging type in both cases, the ranking of the most harmful events differs depending on whether we focus on property or crop damage.
These graphs should be taken as input for further methods, such as a Pareto diagram, to prioritise the events that cause most of the damage to public health and the economy.
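To illustrate the Pareto idea, the sketch below computes the cumulative share of fatalities using the eight totals from the fatalities table above. The shares are relative to these eight event types only, not to the full data set, and the 80% threshold is an assumption chosen for illustration.

```r
# Fatality totals copied from the top-8 table above
fatal <- c(TORNADO = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
           HEAT = 937, LIGHTNING = 816, "TSTM WIND" = 504, FLOOD = 470,
           "RIP CURRENT" = 368)

# Cumulative share of each event type, within these eight types only
share <- cumsum(fatal) / sum(fatal)
print(round(100 * share, 1))

# Smallest leading set of event types that reaches the assumed 80% threshold
vital_few <- names(fatal)[seq_len(which(share >= 0.8)[1])]
print(vital_few)
```

On these figures, four event types (tornado, excessive heat, flash flood and heat) account for just over 80% of the fatalities among the eight listed.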