Although citizens are aware, through the general media, of the damage that severe weather events produce, the total losses in human life and property over the years are not well known. We analyzed the weather events that cause the worst damage. We compared the events, identifying those that cost the most lives against those that cause the greatest property losses. We also set aside the exceptional events (identified as outliers) to emphasize the less publicized but more common ones.
For this study we used the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It records major weather events since 1950, including location, duration, and estimates of property damage, fatalities and injuries.
The data comes as a bzip2-compressed CSV file.
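library(ggplot2)
# Use the C locale so that date parsing and formatting do not depend on the system language
Sys.setlocale("LC_TIME", "C")
# bzfile() lets read.csv read the compressed file directly
stormdata <- read.csv(bzfile('repdata-data-StormData.csv.bz2'),
                      comment.char = "", nrows = 1500000,
                      stringsAsFactors = FALSE)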
Damage estimates are coded in thousands, millions and billions of dollars. We converted them to dollars.
# Convert an amount to dollars according to its exponent code (K, M or B)
todollars <- function(amount, expfactor) {
  if (expfactor %in% c("K", "k")) { amount * 1000 }
  else if (expfactor %in% c("M", "m")) { amount * 1000000 }
  else if (expfactor == "B") { amount * 1000000000 }
  else { amount }  # any other code: the amount is used as is
}
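As an illustration of the same conversion, the sketch below uses a named lookup vector so that whole columns can be converted in one call, without `mapply`. The names `multipliers` and `todollars_vec` are hypothetical and do not appear in the original analysis. The sketch assumes the same rule as above: any code other than K, k, M, m or B leaves the amount unchanged.

```r
# Hypothetical lookup table: one multiplier per exponent code handled above.
multipliers <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Vectorized variant of the conversion (illustrative name).
todollars_vec <- function(amount, expfactor) {
  mult <- unname(multipliers[expfactor])  # NA for codes missing from the table
  mult[is.na(mult)] <- 1                  # unknown codes leave the amount unchanged
  amount * mult
}

# Made-up amounts and codes, only to show the behaviour.
converted <- todollars_vec(c(25, 2.5, 1, 7), c("K", "m", "B", ""))
print(converted)
```

With this variant the two `mapply` calls could be replaced by direct calls on the `PROPDMG`/`PROPDMGEXP` and `CROPDMG`/`CROPDMGEXP` columns, which should be noticeably faster on a table of this size.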
storms <- data.frame(as.Date(stormdata$BGN_DATE, '%m/%d/%Y'),
stormdata$STATE,
stormdata$EVTYPE,
stormdata$FATALITIES,
stormdata$INJURIES,
mapply(todollars, stormdata$PROPDMG, stormdata$PROPDMGEXP),
mapply(todollars, stormdata$CROPDMG, stormdata$CROPDMGEXP),
stringsAsFactors=F
)
names(storms) <- c("date","state","evtype",
"fatalities","injuries","propdmg","cropdmg")
As stated in http://www.ncdc.noaa.gov/stormevents/details.jsp, event types have been recorded in more detail since January 1996. The last recorded event in this data file is from November 2011. We excluded the records from before January 1, 1996, and also ignored records that are summary data.
storms96 <- storms[which(as.numeric(format(storms$date, "%Y%m")) >= 199601 &
!grepl("^summary",ignore.case=T,storms$evtype)),]
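The filter can be tried on a small made-up data frame. The sketch below is only an illustration: the `toy` data is invented, and it compares dates directly against `as.Date("1996-01-01")`, which selects the same rows as the year-month comparison used above.

```r
# Toy data standing in for the storms data frame (values are made up).
toy <- data.frame(
  date   = as.Date(c("1995-12-31", "1996-01-01", "2001-06-15", "2011-11-30")),
  evtype = c("TORNADO", "FLOOD", "Summary of June 15", "HAIL"),
  stringsAsFactors = FALSE
)

# Keep records from 1996 onwards whose event type does not start with "summary".
keep <- toy$date >= as.Date("1996-01-01") &
  !grepl("^summary", toy$evtype, ignore.case = TRUE)
toy96 <- toy[keep, ]
print(toy96$evtype)
```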
We did some light cleaning of the event type variable. We converted all strings to lower case and trimmed white space. We normalized hurricanes and typhoons to “hurricane”. We applied similar processes to thunderstorms, floods and ice. We avoided further normalization because it could lead to misclassification. For instance, “heavy rain/high surf” could fall in either the “heavy rain” or the “high surf” category. We also did not differentiate between property and crop (agriculture) damages and considered both together.
storms96$evtype <- tolower(storms96$evtype)
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
storms96$evtype <- trim(storms96$evtype)
# collapse runs of spaces into a single space
storms96$evtype <- gsub(" +", " ", storms96$evtype)
storms96$evtype[storms96$evtype == "hurricane edouard"] <- "hurricane"
storms96$evtype[storms96$evtype == "hurricane/typhoon"] <- "hurricane"
storms96$evtype[storms96$evtype == "typhoon"] <- "hurricane"
storms96$evtype <- gsub("tstm.*","thunderstorm", storms96$evtype)
storms96$evtype[storms96$evtype == "thunderstorms"] <- "thunderstorm"
storms96$evtype[storms96$evtype == "thunderstorm wind"] <- "thunderstorm"
storms96$evtype[storms96$evtype == "thunderstorm wind (g40)"] <- "thunderstorm"
storms96$evtype <- gsub("ice.*","ice", storms96$evtype)
storms96$evtype <- gsub("icy.*","ice", storms96$evtype)
storms96$evtype <- gsub("flood.*","flood", storms96$evtype)
storms96$evtype <- gsub("flash flood.*","flash flood", storms96$evtype)
storms96$evtype <- as.factor(storms96$evtype)
storms96$damages <- storms96$propdmg + storms96$cropdmg
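To see what the cleaning rules do, the sketch below applies a subset of them (lower case, trimming, hurricane, thunderstorm and flood rules) to a few sample strings. The helper name `normalize_evtype` and the sample values are hypothetical. The ice rules are left out for brevity.

```r
# Illustrative helper applying a subset of the cleaning rules described above.
normalize_evtype <- function(x) {
  x <- tolower(x)
  x <- gsub("^\\s+|\\s+$", "", x)   # trim leading and trailing white space
  x <- gsub(" +", " ", x)           # collapse repeated spaces
  x[x %in% c("hurricane edouard", "hurricane/typhoon", "typhoon")] <- "hurricane"
  x <- gsub("tstm.*", "thunderstorm", x)
  x[x %in% c("thunderstorms", "thunderstorm wind",
             "thunderstorm wind (g40)")] <- "thunderstorm"
  x <- gsub("flood.*", "flood", x)  # e.g. "flooding" becomes "flood"
  x
}

samples <- c("  TSTM WIND ", "Hurricane/Typhoon", "Coastal Flooding", "FLASH FLOOD")
cleaned <- normalize_evtype(samples)
print(cleaned)
```

Note that the flood rule only truncates what follows the word "flood", so "coastal flooding" stays distinct from "flash flood", as in the tables of this report.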
We grouped the event types by the casualties they cause.
fatlpertype <- aggregate(fatalities ~ evtype,
data=storms96[storms96$fatalities > 0,],
FUN=sum)
injrpertype <- aggregate(injuries ~ evtype,
data=storms96[storms96$injuries > 0,],
FUN=sum)
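The grouping step can be checked on a tiny example. The sketch below uses an invented `toy` data frame and the same `aggregate` formula interface as above, then sorts the totals in decreasing order.

```r
# Toy records (made-up values): two tornado events, one flood, one harmless hail event.
toy <- data.frame(
  evtype     = c("tornado", "flood", "tornado", "hail"),
  fatalities = c(3, 1, 2, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, considering only events with at least one fatality.
fatl <- aggregate(fatalities ~ evtype,
                  data = toy[toy$fatalities > 0, ],
                  FUN = sum)

# Rank the event types from most to least deadly.
fatl <- fatl[order(fatl$fatalities, decreasing = TRUE), ]
print(fatl)
```

Event types with no fatalities at all, such as "hail" in this toy data, do not appear in the result because their rows are filtered out before aggregating.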
Most of the bad weather events did not cause any fatalities: only 0.8% of the events recorded from January 1996 to November 2011 did.
round(sum(storms96$fatalities != 0)
/ nrow(storms96) * 100, 1)
## [1] 0.8
But some of them were devastating and hard to forget, like the Joplin tornado and the July 1999 heat waves.
storms96[storms96$fatalities >= 50, c(3,1,2,4,5)]
## evtype date state fatalities injuries
## 355128 excessive heat 1999-07-28 IL 99 0
## 371112 excessive heat 1999-07-04 PA 74 135
## 862634 tornado 2011-05-22 MO 158 1150
The 5 worst event types, in descending order of the fatalities they cause per year, are shown below (totals are divided by a factor of 5.916 because 2011 covers only 11 months):
worstfatl <- order(fatlpertype$fatalities, decreasing=T)
fatlpertype$fatalitiesperyear <- round(fatlpertype$fatalities / 5.916)
head(fatlpertype[worstfatl, c("evtype","fatalitiesperyear")], n=5)
## evtype fatalitiesperyear
## 18 excessive heat 304
## 79 tornado 255
## 24 flash flood 150
## 54 lightning 110
## 25 flood 70
The same table ordered by injuries:
worstinjr <- order(injrpertype$injuries, decreasing=T)
injrpertype$injuriesperyear <- round(injrpertype$injuries / 5.916)
head(injrpertype[worstinjr, c("evtype","injuriesperyear")], n=5)
## evtype injuriesperyear
## 74 tornado 3493
## 22 flood 1142
## 15 excessive heat 1080
## 72 thunderstorm 867
## 47 lightning 700
Regarding damages, we calculated the total damages per year, in millions of US dollars, for each type of bad weather event.
dmgspertype <- aggregate(damages ~ evtype,
data=storms96[storms96$damages > 0,],
FUN=sum)
worstdmgs <- order(dmgspertype$damages, decreasing=T)
dmgspertype$damagesperyear <- round(dmgspertype$damages / (5.916 * 1000000))
head(dmgspertype[worstdmgs, c("evtype","damagesperyear")], n=5)
## evtype damagesperyear
## 32 flood 25172
## 64 hurricane 14718
## 102 storm surge 7301
## 108 tornado 4209
## 48 hail 2886
Filtering out the outliers shows that storms and bad weather cause great losses in property and agriculture throughout the year, adding up to a large amount in liabilities. We used R’s boxplot function to identify the outliers.
dmgspermonth <- aggregate(damages ~ evtype + format(date, "%m %Y"),
data=storms96, FUN=sum)
names(dmgspermonth)[2] <- "when"
worstevents <- head(dmgspertype[worstdmgs, c("evtype")], n=5)
dmgspermonthworst5 <- dmgspermonth[dmgspermonth$evtype %in% worstevents,]
dmgspermonthworst5$dwhen <- as.Date(paste("1 ", dmgspermonthworst5$when),
format="%d %m %Y")
outl <- boxplot(dmgspermonthworst5$damages, plot=F)$out
ggplot(subset(dmgspermonthworst5, ! damages %in% outl),
aes(x=dwhen, y=damages/1000000)) +
geom_line(aes(group=1)) +
facet_grid(evtype~.) +
theme(strip.text.y = element_text(
size = 8, colour = "red", angle = 90)) +
labs(x="Date of Event", y="Damages (millions of US$)",
title="Damages per type since January 1996 (outliers filtered).")
Figure 1: Damages since January 1996 in millions of US$, without outliers. Only the 5 most damaging types of bad weather event are shown.
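The outlier filter behind Figure 1 can be illustrated on a toy vector. The sketch below uses made-up monthly values. With its default settings, `boxplot` reports as outliers the values lying more than 1.5 times the interquartile range beyond the box.

```r
# Made-up monthly damages: five ordinary months and one extreme month.
monthly <- c(1, 2, 3, 4, 5, 100)

# plot = FALSE returns the boxplot statistics without drawing anything;
# $out holds the values flagged as outliers.
outl <- boxplot(monthly, plot = FALSE)$out

# Drop the flagged values, as done before plotting Figure 1.
filtered <- monthly[!monthly %in% outl]
print(outl)
print(filtered)
```

In the analysis above the outliers are computed over the monthly damages of the five event types pooled together, so a month is dropped when it is extreme relative to all five types, not only relative to its own type.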
We conclude that excessive heat is the ‘hidden plague’. Its lack of catastrophic images makes it less attractive from a news standpoint. Regrettably, it causes approximately 300 deaths and 1,000 injuries per year. Tornadoes cause about 250 fatalities and 3,500 injuries per year. Floods of different types lead to more than 200 deaths per year.
Floods, hurricanes, storm surges and tornadoes cause fewer fatalities than heat waves but an enormous amount of damage: close to 25, 15, 7 and 4 billion dollars per year respectively.