Every year, adverse weather disrupts the daily lives of communities across the U.S., with severe consequences for both health and finances. To help you prepare, this analysis identifies the worst weather events with respect to a) population health and b) economic consequences. The data come from the NOAA storm database, which tracks various characteristics of major storms; the original source is the bz2-compressed file https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. The results of the analysis appear below.
After loading the data, I converted the damage values to a consistent USD basis by multiplying each one by a “Multiplier” vector derived from its exponent code. Since crop and property damage were recorded separately, I combined them into a single total damage figure per event. I also added injuries and fatalities together to give one value for the population health aspect.
For reference, the libraries used in this analysis are readr, dplyr, ggplot2, scales and gridExtra.
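As a rough illustration of the unit conversion described above, the sketch below maps exponent codes to multipliers with a named lookup vector on a small hand-made table. The `toy` data frame and the `multiplier_lookup` name are hypothetical and not part of the original analysis, and the sketch covers only the letter codes; unmatched or missing codes are assumed to mean a multiplier of 1.

```r
# Hypothetical toy records; the values are illustrative, not taken from the NOAA file.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 5, 10),
  PROPDMGEXP = c("K", "M", NA, "B"),
  stringsAsFactors = FALSE
)

# Assumed mapping from exponent code to USD multiplier (letter codes only).
multiplier_lookup <- c(h = 1e2, H = 1e2, k = 1e3, K = 1e3,
                       m = 1e6, M = 1e6, B = 1e9)

# Indexing by name returns NA for missing or unknown codes ...
multiplier <- unname(multiplier_lookup[toy$PROPDMGEXP])
# ... which are then treated as "no scaling".
multiplier[is.na(multiplier)] <- 1

toy$PropertyDamage <- toy$PROPDMG * multiplier
print(toy)
```

A lookup vector like this is one possible alternative to a long chain of nested `ifelse()` calls; the analysis below uses the nested form.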
library(readr)
library(dplyr)
library(ggplot2)
library(scales)
library(gridExtra)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "StormData.bz2")
con <- bzfile("StormData.bz2")
open(con, "rb")
StormData <- read_csv(con, na = "?")
## Parsed with column specification:
## cols(
## .default = col_character(),
## STATE__ = col_double(),
## COUNTY = col_double(),
## BGN_RANGE = col_double(),
## COUNTY_END = col_double(),
## END_RANGE = col_double(),
## LENGTH = col_double(),
## WIDTH = col_double(),
## F = col_integer(),
## MAG = col_double(),
## FATALITIES = col_double(),
## INJURIES = col_double(),
## PROPDMG = col_double(),
## CROPDMG = col_double(),
## LATITUDE = col_double(),
## LONGITUDE = col_double(),
## LATITUDE_E = col_double(),
## LONGITUDE_ = col_double(),
## REFNUM = col_double()
## )
## See spec(...) for full column specifications.
close(con)
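# Test for NA first: ifelse() returns NA wherever its test is NA,
# so a leading == "" test would never reach the is.na() branch.
PropMultiplier <- ifelse(is.na(StormData$PROPDMGEXP), 1,
ifelse(StormData$PROPDMGEXP == "", 1,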
ifelse(StormData$PROPDMGEXP == "-", 0,
ifelse(StormData$PROPDMGEXP == "+", 1,
ifelse(StormData$PROPDMGEXP == "h", 100,
ifelse(StormData$PROPDMGEXP == "H", 100,
ifelse(StormData$PROPDMGEXP == "K", 1000,
ifelse(StormData$PROPDMGEXP == "m", 1000000,
ifelse(StormData$PROPDMGEXP == "M", 1000000,
ifelse(StormData$PROPDMGEXP == "B", 1000000000,
ifelse(StormData$PROPDMGEXP == 0, 10^0,
ifelse(StormData$PROPDMGEXP == 1, 10^1,
ifelse(StormData$PROPDMGEXP == 2, 10^2,
ifelse(StormData$PROPDMGEXP == 3, 10^3,
ifelse(StormData$PROPDMGEXP == 4, 10^4,
ifelse(StormData$PROPDMGEXP == 5, 10^5,
ifelse(StormData$PROPDMGEXP == 6, 10^6,
ifelse(StormData$PROPDMGEXP == 7, 10^7,
ifelse(StormData$PROPDMGEXP == 8, 10^8,1)))))))))))))))))))
# Test for NA first so that missing exponent codes get a multiplier of 1.
CropMultiplier <- ifelse(is.na(StormData$CROPDMGEXP), 1,
ifelse(StormData$CROPDMGEXP == "", 1,
ifelse(StormData$CROPDMGEXP == "k", 1000,
ifelse(StormData$CROPDMGEXP == "K", 1000,
ifelse(StormData$CROPDMGEXP == "m", 1000000,
ifelse(StormData$CROPDMGEXP == "M", 1000000,
ifelse(StormData$CROPDMGEXP == "B", 1000000000,
ifelse(StormData$CROPDMGEXP == 0, 10^0,
ifelse(StormData$CROPDMGEXP == 2, 10^2,1)))))))))
StormData2 <- mutate(StormData, PropertyDamage = PROPDMG * PropMultiplier,
CropDamage = CROPDMG * CropMultiplier,
CombinedDamage = PropertyDamage + CropDamage,
FatalitiesInjuries = FATALITIES + INJURIES)
StormData3 <- StormData2[which(complete.cases(StormData2$CombinedDamage)), ]
StormData4 <- group_by(StormData3, EVTYPE) %>%
summarize(Total_Damage = sum(CombinedDamage),
Total_InjuriesFatalities = sum(FatalitiesInjuries))
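# Keep the totals numeric so the bar lengths are correct in the plot;
# the comma-formatted text goes in a separate display column.
WorstHealth <- arrange(StormData4[, c(1, 3)], desc(Total_InjuriesFatalities)) %>%
mutate(Total_Formatted = comma_format()(Total_InjuriesFatalities)) %>%
rename(EventType = EVTYPE)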
WorstHealth <- head(WorstHealth, 10)
WorstDamage <- arrange(StormData4[, c(1, 2)], desc(Total_Damage)) %>%
mutate(Total_Damage = Total_Damage/1000000000) %>%
rename(EventType = EVTYPE)
WorstDamage <- head(WorstDamage, 10)
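The grouping and ranking steps above can be sanity-checked on a small hand-made table. The following is a minimal base R sketch of the same idea (sum per event type, sort descending, keep the top rows); the `toy` records and the `totals`, `worst_damage` and `worst_health` names are hypothetical and only mirror the column names used in this analysis.

```r
# Hypothetical per-event records; the values are illustrative only.
toy <- data.frame(
  EVTYPE             = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  CombinedDamage     = c(2e6, 5e8, 3e6, 1e5, 2.5e8),
  FatalitiesInjuries = c(12, 3, 40, 0, 1),
  stringsAsFactors = FALSE
)

# Sum both measures within each event type.
totals <- aggregate(cbind(CombinedDamage, FatalitiesInjuries) ~ EVTYPE,
                    data = toy, FUN = sum)

# Rank by each measure separately and keep the top two event types.
worst_damage <- head(totals[order(-totals$CombinedDamage), ], 2)
worst_health <- head(totals[order(-totals$FatalitiesInjuries), ], 2)

print(worst_damage)
print(worst_health)
```

In this toy table the two rankings differ, which is why the analysis keeps separate top-10 lists for health and for economic damage.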
Here are the results of the analysis. The 10 worst storms and weather events for population health are shown in Plot 1, and the 10 worst for economic consequences are shown in Plot 2.
ggplot(data = WorstHealth, aes(x = EventType, y = Total_InjuriesFatalities)) +
geom_bar(stat = "identity", fill = "blue", colour = "blue") +
coord_flip() +
labs(title = "Top 10 Worst Events for Population Health",
y = "Combined Injuries and Fatalities")
ggplot(data = WorstDamage, aes(x = EventType, y = Total_Damage)) +
geom_bar(stat = "identity", fill = "blue", colour = "blue") +
coord_flip() +
labs(title = "Top 10 Worst Events for Economic Consequences",
y = "Crop and Property Damages (Billions USD)")
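By default, ggplot2 draws a character axis in alphabetical order rather than by value. One possible way to show the bars in rank order is to turn the event type into a factor whose levels are sorted by the plotted total. The sketch below demonstrates only the reordering step in base R, on hypothetical data; the `ranked` table and its values are not part of the original analysis.

```r
# Hypothetical totals; the values are illustrative only.
ranked <- data.frame(
  EventType = c("FLOOD", "TORNADO", "HAIL"),
  Total     = c(150, 90, 20),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by the given scores
# in ascending order.
ranked$EventType <- reorder(ranked$EventType, ranked$Total)

print(levels(ranked$EventType))
```

Because discrete axes follow factor level order, using such a factor as `x` together with `coord_flip()` should place the largest total at the top of the chart.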