Having analysed the data from the National Weather Service, I found that the heaviest damage, in terms of both casualties (injuries and fatalities) and economic loss, is caused by severe winds and floods. The single most costly storm type is the tornado, which dwarfs all others.
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destFile <- "StormData.csv.bz2"
if (!file.exists(destFile)) download.file(fileURL, destFile, mode = "wb")  # download once, then reuse the local copy
data <- read.csv(destFile, stringsAsFactors = FALSE)
I downloaded the StormData.csv.bz2 file and read it into a data frame; read.csv can read the bz2-compressed file directly.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
data <- select(data, c(EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)) %>%
  mutate(FatalPlusInjuries = FATALITIES + INJURIES) %>%
  mutate(TotalDamage = PROPDMG + CROPDMG) %>%
  filter(FatalPlusInjuries >= 1 & TotalDamage >= 1)
data$EVTYPE <- as.factor(data$EVTYPE)
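I then selected the storm type (EVTYPE) together with the health and economic damage columns, and calculated fatalities plus injuries (FatalPlusInjuries) and total damage (TotalDamage, the sum of PROPDMG and CROPDMG). I then kept only events that caused at least one casualty and at least one unit of recorded damage. Because the filter combines both conditions with &, events with casualties but no recorded damage (or the reverse) are dropped as well.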
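The filter in the code above joins its two conditions with &, which is stricter than "remove events with no damage at all". A minimal sketch on made-up data may make the difference concrete; the toy data frame and its values are assumptions for illustration, not rows from the storm database.

```r
library(dplyr)

# Toy data (made-up values, not from the storm database).
toy <- data.frame(
  EVTYPE = c("A", "B", "C", "D"),
  FatalPlusInjuries = c(0, 3, 0, 2),
  TotalDamage = c(0, 0, 5, 7),
  stringsAsFactors = FALSE
)

# Both conditions must hold: only event D is kept.
both <- filter(toy, FatalPlusInjuries >= 1 & TotalDamage >= 1)

# Either condition may hold: B, C and D are kept; only A is dropped.
either <- filter(toy, FatalPlusInjuries >= 1 | TotalDamage >= 1)
```

Using | instead would keep every event that did some kind of harm, at the cost of different totals from the ones reported here.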
summdata <- summarise(group_by(data, EVTYPE),
                      Injuries = sum(INJURIES),
                      Fatalities = sum(FATALITIES),
                      FatalPlusInjuries = sum(FatalPlusInjuries),
                      PropertyDamage = sum(PROPDMG),
                      CropDamage = sum(CROPDMG),
                      TotalDamage = sum(TotalDamage))
healthsummary <- select(summdata, c(EVTYPE, Injuries, Fatalities, FatalPlusInjuries)) %>%
  arrange(desc(FatalPlusInjuries)) %>%
  filter(FatalPlusInjuries > 1000)
damagesummary <- select(summdata, c(EVTYPE, PropertyDamage, CropDamage, TotalDamage)) %>%
  arrange(desc(TotalDamage)) %>%
  filter(TotalDamage > 10000)
I then summarised the data by storm type and created one summary for health and another for economic damage. The health summary is limited to storm types with more than 1,000 combined injuries and fatalities. The economic summary is limited to storm types whose total damage exceeds 10,000, measured in the units recorded in the PROPDMG and CROPDMG columns.
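The damage totals here are sums of the raw PROPDMG and CROPDMG values. The storm data set also carries magnitude columns (PROPDMGEXP and CROPDMGEXP) that the code above does not select. As a hedged sketch only, the following shows one way such codes could be turned into multipliers. The helper name exp_to_multiplier is hypothetical, the mapping of K, M and B to thousand, million and billion is an assumption, and the toy values are made up.

```r
# Hypothetical helper: convert a magnitude code to a numeric multiplier.
# Assumes "K", "M", "B" (any case) mean thousand, million, billion.
# Every other code is treated as a multiplier of 1, which is a simplification.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy values, not taken from the storm database.
toy <- data.frame(
  PROPDMG = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  stringsAsFactors = FALSE
)

# Scale each raw value by its multiplier.
toy$PropertyDollars <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
```

Applying a scaling like this before summing could change the ranking of storm types, so the totals reported in this analysis should be read as raw column sums.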
Here are the most damaging storm types.
healthsummary
## Source: local data frame [9 x 4]
##
##              EVTYPE Injuries Fatalities FatalPlusInjuries
## 1           TORNADO    83283       5066             88349
## 2             FLOOD     6734        358              7092
## 3         TSTM WIND     2725        165              2890
## 4       FLASH FLOOD     1490        590              2080
## 5         ICE STORM     1842         40              1882
## 6 HURRICANE/TYPHOON     1275         62              1337
## 7 THUNDERSTORM WIND     1060         85              1145
## 8      WINTER STORM     1004         80              1084
## 9         HIGH WIND      877        151              1028
# Scatter plot of casualties by storm type (equivalent to the deprecated qplot call)
ggplot(healthsummary, aes(FatalPlusInjuries, EVTYPE)) +
  geom_point()
damagesummary
## Source: local data frame [12 x 4]
##
##                EVTYPE PropertyDamage CropDamage TotalDamage
## 1             TORNADO      877418.96   26402.09   903821.05
## 2           TSTM WIND      107854.43    3075.75   110930.18
## 3         FLASH FLOOD       61162.69    8306.20    69468.89
## 4           HIGH WIND       41247.50    1740.69    42988.19
## 5   THUNDERSTORM WIND       38986.20    1405.50    40391.70
## 6               FLOOD       29230.63   10369.85    39600.48
## 7  THUNDERSTORM WINDS       31769.80    1321.55    33091.35
## 8            WILDFIRE       19526.80    1068.20    20595.00
## 9           LIGHTNING       20311.00      13.50    20324.50
## 10       WINTER STORM       14963.91     293.00    15256.91
## 11               HAIL       10738.05    3463.00    14201.05
## 12         HEAVY SNOW       11034.74     170.00    11204.74
# Scatter plot of total damage by storm type (equivalent to the deprecated qplot call)
ggplot(damagesummary, aes(TotalDamage, EVTYPE)) +
  geom_point()
matrixsummary <- inner_join(healthsummary, damagesummary)
## Joining by: "EVTYPE"
# Casualties against total damage, one coloured point per storm type
ggplot(matrixsummary, aes(FatalPlusInjuries, TotalDamage, color = EVTYPE)) +
  geom_point()
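The damage table lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate storm types, which looks like the same kind of event recorded under different spellings. As a rough sketch, assuming these three labels really do describe the same event type, they could be merged before summarising. The helper name normalise_evtype is hypothetical and only handles this one family of labels.

```r
# Hypothetical helper: collapse spelling variants of thunderstorm wind labels.
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))
  sub("^(TSTM|THUNDERSTORM) WINDS?$", "THUNDERSTORM WIND", x)
}

# Raw TotalDamage values copied from the damage table in this report.
labels <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS")
damage <- c(110930.18, 40391.70, 33091.35)

# Sum the damage per normalised label.
combined <- tapply(damage, normalise_evtype(labels), sum)
```

With these three rows merged, the combined thunderstorm wind total is 184413.23 in the same raw units, still well below the tornado total.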