The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database was analysed to determine which weather events were most harmful to human health and which to the economy. There was a large degree of overlap in the events causing injuries and fatalities, with excessive heat causing the most fatalities and tornadoes causing the most injuries. Property damage and crop damage were led by different events. Hurricanes and storm surges caused the most property damage, with the storms of 2005 accounting for most of it, while drought and hurricanes caused the most crop damage.
The dplyr package was used to analyse the data, tidyr to reformat it, lubridate for dates, xtable for tables and ggplot2 for figures.
suppressMessages({
library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)})
suppressWarnings(library(xtable))
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest <- "FStormData.csv.bz2"
if (!file.exists(dest)) {
download.file(url = url, destfile = dest)
}
stormdata <- read.csv(dest, header = TRUE, stringsAsFactors = FALSE)
stormdata$Date <- mdy_hms(stormdata$BGN_DATE)
stormdata$Year <- year(stormdata$Date)
stormdata <- subset(stormdata, Year >= 1996 & Year <= 2010)
stormdata$EVTYPE[stormdata$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
stormdata$EVTYPE[stormdata$EVTYPE == "HURRICANE"] <- "HURRICANE/TYPHOON"
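# Override the damage exponent recorded for one event (REFNUM 605943) to millions.
stormdata$PROPDMGEXP[stormdata$REFNUM == 605943] <- "M"
# Multipliers for the exponent codes; " " stands in for a blank code,
# because a named vector cannot be indexed by "".
EXP <- c(" " = 1, K = 1000, M = 1e6, B = 1e9)
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == ""] <- " "
stormdata$PropertyDamage <- EXP[stormdata$PROPDMGEXP] * stormdata$PROPDMG
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == ""] <- " "
stormdata$CropDamage <- EXP[stormdata$CROPDMGEXP] * stormdata$CROPDMG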
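The exponent lookup can be tried on a small example. The sketch below uses a made-up data frame (`toy`, `DMG`, `DMGEXP` and `multiplier` are illustrative names, not columns of the NOAA file) to show how a named vector turns exponent codes into multipliers, and why blank codes are mapped to `" "` first.

```r
# Toy data, not taken from the NOAA file: magnitudes with exponent codes.
toy <- data.frame(
  DMG    = c(2.5, 10, 1.2, 50),
  DMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical lookup with the same codes as the analysis;
# " " stands in for a blank exponent.
multiplier <- c(" " = 1, K = 1000, M = 1e6, B = 1e9)

# Blank codes are mapped to " " first, because indexing a named
# vector with "" returns NA.
toy$DMGEXP[toy$DMGEXP == ""] <- " "

# unname() drops the lookup names so the result is a plain numeric vector.
toy$Damage <- unname(multiplier[toy$DMGEXP]) * toy$DMG

# Any code missing from the lookup would give NA, so count those explicitly.
unmatched <- sum(is.na(toy$Damage))
print(toy)
```

Counting the `NA` values after the conversion is a cheap way to check that no exponent code fell outside the lookup.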
The top five events causing fatalities and the top five events causing injuries were calculated over the entire 15-year period, 1996-2010.
topfatal <- stormdata %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES)) %>%
  arrange(desc(Fatalities))
topfatalEVTYPE <- topfatal$EVTYPE[1:5]
topinjury <- stormdata %>%
  group_by(EVTYPE) %>%
  summarise(Injuries = sum(INJURIES)) %>%
  arrange(desc(Injuries))
topinjuryEVTYPE <- topinjury$EVTYPE[1:5]
# Combined set of event types that appear in either top-five list.
topEVTYPE <- unique(c(topfatalEVTYPE, topinjuryEVTYPE))
harmpopwide <- stormdata %>%
  filter(EVTYPE %in% topEVTYPE) %>%
  group_by(EVTYPE, Year) %>%
  summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES))
print(xtable(topfatal[1:5,], digits = 0), type = "html")
| | EVTYPE | Fatalities |
|---|---|---|
| 1 | EXCESSIVE HEAT | 1761 |
| 2 | TORNADO | 924 |
| 3 | FLASH FLOOD | 819 |
| 4 | LIGHTNING | 625 |
| 5 | FLOOD | 356 |
print(xtable(topinjury[1:5,], digits = 0), type = "html")
| | EVTYPE | Injuries |
|---|---|---|
| 1 | TORNADO | 14504 |
| 2 | FLOOD | 6748 |
| 3 | EXCESSIVE HEAT | 6253 |
| 4 | THUNDERSTORM WIND | 4656 |
| 5 | LIGHTNING | 3947 |
As may be expected, there is a large degree of overlap in the events causing fatalities and injuries: four of the five event types appear in both lists, although their order differs.
For comparison, the combined set of events was used to plot total casualties by year for each event type.
harmpop <- harmpopwide %>%
gather(harm , Total, -EVTYPE, -Year)
ggplot(harmpop, aes(x = Year, y = Total, col = EVTYPE)) +
  geom_line() + facet_grid(harm ~ ., scales = "free_y") +
  labs(title = "Total Casualties vs. Year for Weather Phenomena")
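The wide-to-long reshaping used for the casualty plot can also be written with `pivot_longer()`, which newer tidyr versions offer alongside `gather()`. The sketch below is an illustration on made-up numbers (`wide` and `long` are hypothetical names) and assumes tidyr 1.0.0 or later; it is not the code that produced the figure.

```r
library(tidyr)

# Hypothetical yearly totals in wide form (toy numbers, not NOAA data).
wide <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  Year       = c(1996, 1997, 1996),
  Fatalities = c(3, 5, 2),
  Injuries   = c(40, 55, 7)
)

# One row per event type, year and kind of harm, so that the kind of
# harm can be used as a facet variable in a plot.
long <- pivot_longer(
  wide,
  cols      = c(Fatalities, Injuries),
  names_to  = "harm",
  values_to = "Total"
)
print(long)
```

Each input row becomes two output rows, one per measure, and the totals are unchanged by the reshaping.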
The top five events causing property damage and the top five events causing crop damage were calculated over the entire 15-year period, 1996-2010.
topprop <- stormdata %>%
group_by(EVTYPE) %>%
summarise(Property = sum(PropertyDamage)) %>%
arrange(desc(Property))
toppropEVTYPE <- topprop$EVTYPE[1:5]
topcrop <- stormdata %>%
group_by(EVTYPE) %>%
summarise(Crop = sum(CropDamage)) %>%
arrange(desc(Crop))
topcropEVTYPE <- topcrop$EVTYPE[1:5]
print(xtable(topprop[1:5,], digits = 0), type = "html")
| | EVTYPE | Property |
|---|---|---|
| 1 | HURRICANE/TYPHOON | 81108159010 |
| 2 | STORM SURGE | 43193536000 |
| 3 | FLOOD | 21342156100 |
| 4 | TORNADO | 14797345010 |
| 5 | HAIL | 14143813870 |
print(xtable(topcrop[1:5,], digits = 0), type = "html")
| | EVTYPE | Crop |
|---|---|---|
| 1 | DROUGHT | 13336292000 |
| 2 | HURRICANE/TYPHOON | 5338782800 |
| 3 | FLOOD | 4819906400 |
| 4 | HAIL | 2393695450 |
| 5 | EXTREME COLD | 1288973000 |
Unlike fatalities and injuries, the leading causes of property damage and crop damage differ, even though flood, hail and hurricane/typhoon appear in both lists, so the two kinds of damage were considered separately.
The devastating storms of 2005 (including Hurricane Katrina) caused a huge amount of damage. So that the variation in other event types can be seen, individual events with property damage of more than $7 billion are listed separately and were excluded from the property damage plot.
katerina <- which(stormdata$PropertyDamage > 7e9)
print(xtable(stormdata[katerina,c("Year", "EVTYPE", "PropertyDamage")], digits = 0), type = "html")
| | Year | EVTYPE | PropertyDamage |
|---|---|---|---|
| 569308 | 2005 | HURRICANE/TYPHOON | 10000000000 |
| 577675 | 2005 | HURRICANE/TYPHOON | 16930000000 |
| 577676 | 2005 | STORM SURGE | 31300000000 |
| 581533 | 2005 | HURRICANE/TYPHOON | 7350000000 |
| 581535 | 2005 | STORM SURGE | 11260000000 |
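To put these five events in context, their combined damage can be compared with the 1996-2010 totals for the two event types involved. The sketch below simply re-enters the figures printed in the tables above (`big_events`, `type_totals` and `share` are illustrative names) and is a rough check rather than part of the original analysis.

```r
# Property damage of the five listed 2005 events, copied from the table above.
big_events <- c(10000000000, 16930000000, 31300000000,
                7350000000, 11260000000)

# 1996-2010 property damage totals for the two event types involved,
# copied from the top-five property damage table.
type_totals <- c("HURRICANE/TYPHOON" = 81108159010,
                 "STORM SURGE"       = 43193536000)

# Share of hurricane/typhoon and storm surge damage due to these five events.
share <- sum(big_events) / sum(type_totals)
round(100 * share, 1)
```

On these figures the five events account for roughly 62% of all hurricane/typhoon and storm surge property damage in the period.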
damagepropwide <- stormdata[-katerina,] %>%
filter(EVTYPE %in% toppropEVTYPE) %>%
group_by(EVTYPE, Year) %>%
summarise(Property = sum(PropertyDamage))
ggplot(damagepropwide, aes(x = Year, y = Property, col = EVTYPE)) +
  geom_line() +
  labs(title = "Total Property Damage vs. Year for Weather Phenomena",
       subtitle = "Excluding events with damage of more than $7 billion",
       y = "Total Damage ($)")
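Dropping rows with a negative index from `which()` works here because at least one event exceeds the threshold. As a hedged aside, the sketch below uses a made-up data frame (`toy`, `none`, `kept_all` and `kept_some` are hypothetical names) to show what happens when nothing exceeds the threshold, and a logical filter that behaves the same way in both cases.

```r
# Toy data, not taken from the NOAA file.
toy <- data.frame(id = 1:4, damage = c(1e3, 9e9, 5e4, 2e6))

# With a threshold that nothing exceeds, which() returns integer(0) ...
none <- which(toy$damage > 1e12)
# ... and a negative index built from it selects zero rows, not all of them.
nrow(toy[-none, ])  # 0

# A logical condition keeps every row that does not exceed the threshold,
# whether or not any row does.
kept_all  <- toy[!(toy$damage > 1e12), ]
kept_some <- toy[!(toy$damage > 7e9), ]
```

With dplyr, `filter(damage <= 7e9)` expresses the same exclusion inside a pipeline.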
damagecropwide <- stormdata %>%
filter(EVTYPE %in% topcropEVTYPE) %>%
group_by(EVTYPE, Year) %>%
summarise(Crop = sum(CropDamage))
ggplot(damagecropwide, aes(x = Year, y = Crop, col = EVTYPE)) +
  geom_line() +
  labs(title = "Total Crop Damage vs. Year for Weather Phenomena",
       y = "Total Damage ($)")