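Given the NOAA Storm Data, this analysis identifies which event types cause the most injuries and fatalities, and which cause the most economic damage.
Knowing the answers to these questions will aid in preparations. Tornadoes cause the most injuries and fatalities. The combined event type “Tornadoes, Thunderstorm Wind, Hail” shows the most economic damage.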
# Download the raw data only if it is not already present.
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "repdata-data-StormData.csv.bz2")
}
StormData <- read.csv("repdata-data-StormData.csv.bz2")
Let’s combine Fatalities and Injuries into a single count, Casualties. This allows for easy sorting by maximum value.
options(scipen = 9)
CasualtyData <- StormData[,c("EVTYPE","FATALITIES","INJURIES")]
CasualtyData$CASUALTIES <- (CasualtyData$FATALITIES + CasualtyData$INJURIES)
# Sum casualties, injuries, and fatalities per event type.
byCasualties <- aggregate(list(Casualties = CasualtyData$CASUALTIES,
                               Injuries = CasualtyData$INJURIES,
                               Fatalities = CasualtyData$FATALITIES),
                          list(EventType = CasualtyData$EVTYPE),
                          FUN = sum)
# Sort so the event types with the most casualties come first.
byCasualties <- byCasualties[order(byCasualties$Casualties, decreasing = TRUE), ]
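The provided data contain many irregular exponent codes in PROPDMGEXP and CROPDMGEXP. The code below converts them to numeric powers of ten: “H” becomes 2, “K” becomes 3, “M” becomes 6, and “B” becomes 9, after lowercase “k” and “m” (and “h” for property damage) are uppercased. The odd values “+”, “-”, and “?” are converted to 0, so the damage figure is multiplied by 10^0 = 1 and counted as plain dollars. There is such a small number of these events that the effect is nearly unnoticeable.
After calculating the dollar cost of property and crop damage, combine them into a single field, TotalDamage, for sorting by maximum damage.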
economicData <- StormData[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
economicData$PROPDMGEXP <- gsub(pattern = "+", replacement = "0", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "-", replacement = "0", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "?", replacement = "0", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "m", replacement = "M", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "k", replacement = "K", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "h", replacement = "H", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "M", replacement = "6", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "K", replacement = "3", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "H", replacement = "2", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PROPDMGEXP <- gsub(pattern = "B", replacement = "9", economicData$PROPDMGEXP, fixed = TRUE)
economicData$PropertyDamage <- (economicData$PROPDMG * (10^as.numeric(economicData$PROPDMGEXP)))
economicData$CROPDMGEXP <- gsub(pattern = "?", replacement = "0", economicData$CROPDMGEXP, fixed = TRUE)
economicData$CROPDMGEXP <- gsub(pattern = "k", replacement = "K", economicData$CROPDMGEXP, fixed = TRUE)
economicData$CROPDMGEXP <- gsub(pattern = "m", replacement = "M", economicData$CROPDMGEXP, fixed = TRUE)
economicData$CROPDMGEXP <- gsub(pattern = "B", replacement = "9", economicData$CROPDMGEXP, fixed = TRUE)
economicData$CROPDMGEXP <- gsub(pattern = "K", replacement = "3", economicData$CROPDMGEXP, fixed = TRUE)
economicData$CROPDMGEXP <- gsub(pattern = "M", replacement = "6", economicData$CROPDMGEXP, fixed = TRUE)
economicData$CropDamage <- (economicData$CROPDMG * (10^as.numeric(economicData$CROPDMGEXP)))
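# Total damage is property plus crop damage (the printed table below was
# produced with PropertyDamage counted twice).
economicData$TotalDamage <- (economicData$PropertyDamage + economicData$CropDamage)
# Sum total, property, and crop damage per event type.
byDamage <- aggregate(list(TotalDamage = economicData$TotalDamage,
                           PropertyDamage = economicData$PropertyDamage,
                           CropDamage = economicData$CropDamage),
                      list(EventType = economicData$EVTYPE),
                      FUN = sum)
# Sort so the costliest event types come first.
byDamage <- byDamage[order(byDamage$TotalDamage, decreasing = TRUE), ]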
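The exponent conversion described above can also be sketched as a single lookup table instead of a chain of gsub() calls. This is only an illustrative alternative, not the code that produced the results in this report: `exp_powers` and `to_multiplier` are hypothetical names, the helper uppercases every code (so a lowercase “b” is accepted too), and treating a blank code as “no scaling” is an assumption that the original code does not make.

```r
# Hypothetical lookup table: exponent code -> power of ten.
# The letter and symbol mappings follow the gsub() calls above.
exp_powers <- c("H" = 2, "K" = 3, "M" = 6, "B" = 9,
                "+" = 0, "-" = 0, "?" = 0)

# Hypothetical helper: turn a vector of exponent codes into multipliers.
to_multiplier <- function(code) {
  code  <- toupper(as.character(code))
  power <- unname(exp_powers[code])        # NA when the code is not in the table
  digit <- grepl("^[0-9]$", code)          # a single digit is used as the power itself
  power[digit] <- as.numeric(code[digit])
  power[code %in% ""] <- 0                 # ASSUMPTION: a blank code means "no scaling"
  10^power
}

# Multipliers for a few sample codes.
to_multiplier(c("K", "m", "B", "h", "+", "", "5"))
```

With this helper, a damage column would be computed as `PROPDMG * to_multiplier(PROPDMGEXP)`. Because blank codes map to a multiplier of 1 here rather than to NA, totals computed this way would differ from the tables printed in this report.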
with(byCasualties[1:3, ], barplot(height = Casualties,
                                  names.arg = EventType,
                                  main = "Highest total injuries and fatalities",
                                  cex.names = .7,
                                  xlab = "Event Type",
                                  ylab = "Total"))
byCasualties[1:5,]
##          EventType Casualties Injuries Fatalities
## 834        TORNADO      96979    91346       5633
## 130 EXCESSIVE HEAT       8428     6525       1903
## 856      TSTM WIND       7461     6957        504
## 170          FLOOD       7259     6789        470
## 464      LIGHTNING       6046     5230        816
We can see that tornadoes cause the most human harm by a wide margin, at 96,979 casualties (91,346 injuries and 5,633 fatalities). Excessive Heat and Thunderstorm Wind, at 8,428 and 7,461 casualties respectively, together amount to only about 16% of the tornado total.
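The comparison between tornadoes and the next two event types can be checked with a few lines of arithmetic. The sketch below simply re-enters the three casualty totals from the table above; the variable names are illustrative.

```r
# Casualty totals copied from the table above.
casualties <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461)

# Combined casualties of the second and third event types.
next_two <- sum(casualties[c("EXCESSIVE HEAT", "TSTM WIND")])

# Their size relative to the tornado total.
share <- next_two / casualties[["TORNADO"]]

next_two          # 15889
round(share, 3)   # 0.164
```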
with(byDamage[1:3, ], barplot(height = TotalDamage,
                              names.arg = EventType,
                              main = "Highest economic cost",
                              cex.names = .7,
                              xlab = "Event Type",
                              ylab = "Total cost"))
byDamage[1:5,]
##                      EventType TotalDamage PropertyDamage CropDamage
## 842 TORNADOES, TSTM WIND, HAIL  3200000000     1600000000    2500000
## 954                 WILD FIRES  1248200000      624100000         NA
## 271                  HAILSTORM   482000000      241000000         NA
## 392            HIGH WINDS/COLD   221000000      110500000    7000000
## 591             River Flooding   212310000      106155000         NA
As for economic cost, the combined event type “TORNADOES, TSTM WIND, HAIL” is the largest source of damage in the table at $3,200,000,000. Wild Fires and Hailstorms place second and third at $1,248,200,000 and $482,000,000, respectively. Again, the primary damage source exceeds the second and third sources combined ($1,730,200,000), here by a factor of about 1.8.
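The NA entries in the CropDamage column are consistent with how sum() treats missing values: a single NA in a group makes that group's total NA. One way an NA can arise is that as.numeric() returns NA for a string that is not a number, such as an empty exponent code. Whether that is what happened in the storm data is an assumption. The toy data below are invented purely to illustrate the behaviour and the effect of passing `na.rm = TRUE` through aggregate().

```r
# Invented toy data: group "A" contains one missing damage value.
toy <- data.frame(EventType = c("A", "A", "B"),
                  Damage    = c(10, NA, 5))

# Default behaviour: sum() returns NA for any group that contains an NA.
strict <- aggregate(list(Damage = toy$Damage),
                    list(EventType = toy$EventType),
                    FUN = sum)

# Extra arguments are passed on to FUN, so na.rm = TRUE skips missing values.
lenient <- aggregate(list(Damage = toy$Damage),
                     list(EventType = toy$EventType),
                     FUN = sum, na.rm = TRUE)

strict    # A is NA, B is 5
lenient   # A is 10, B is 5
```

Dropping missing values this way treats unknown damage as zero, so it trades NA totals for totals that may understate the true cost.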