This analysis examines the impact of severe weather events on population health and the economy. The dataset, StormData, contains 902,297 rows and 37 columns. We subset the data to the variables related to health impact (FATALITIES, INJURIES) and economic impact (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP), grouped by event type (EVTYPE).
The analysis identifies the 10 event types with the largest totals for each measure. For health impact (fatalities plus injuries), the top 10 events are TORNADO, EXCESSIVE HEAT, TSTM WIND, FLOOD, LIGHTNING, HEAT, FLASH FLOOD, ICE STORM, THUNDERSTORM WIND, and WINTER STORM.
For economic impact (property damage plus crop damage), the top 10 events are FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, HAIL, FLASH FLOOD, DROUGHT, HURRICANE, RIVER FLOOD, and ICE STORM.
These rankings show which severe weather events weigh most heavily on health and the economy, which can inform preparedness for such events.
Load the libraries.
library(stringr)
library(dplyr)
library(ggplot2)
library(tidyr)
Load the data.
stormData <- read.csv("repdata_data_StormData.csv")
Remove unnecessary rows and columns.
stormData_subset <- stormData[ ,c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
stormData_subset <- subset(stormData_subset,
                           !(grepl("^Summary", EVTYPE) | EVTYPE %in% c("?", "NONE", "Other")) &
                             (INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0))
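The row filter can be checked on a small example. The following is a minimal sketch on made-up data: the `toy` data frame and the helper vectors `is_placeholder` and `has_impact` are hypothetical names, not part of the analysis. It keeps a row only if its event type is not a summary or placeholder label and it reports at least one casualty or some damage.

```r
# Toy stand-in for stormData_subset; all values are made up for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "Summary of May 5", "?", "HAIL", "FLOOD"),
  FATALITIES = c(1, 0, 0, 0, 0),
  INJURIES   = c(0, 0, 2, 0, 0),
  PROPDMG    = c(0, 5, 0, 0, 25),
  CROPDMG    = c(0, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Rows whose event type is a summary line or a placeholder label.
is_placeholder <- grepl("^Summary", toy$EVTYPE) |
  toy$EVTYPE %in% c("?", "NONE", "Other")

# Rows that report at least one casualty or some damage.
has_impact <- toy$INJURIES > 0 | toy$FATALITIES > 0 |
  toy$PROPDMG > 0 | toy$CROPDMG > 0

kept <- toy[!is_placeholder & has_impact, ]
print(kept$EVTYPE)  # "TORNADO" "FLOOD"
```

The "Summary" and "?" rows are dropped even though they carry non-zero values, and HAIL is dropped because it reports no casualties or damage.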
Convert the exponent columns (PROPDMGEXP and CROPDMGEXP) to numeric multipliers.
stormData_subset$CROPDMGEXP <- toupper(stormData_subset$CROPDMGEXP)
stormData_subset$PROPDMGEXP <- toupper(stormData_subset$PROPDMGEXP)
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP %in% c("", "-", "?", "+", "0")] <- 10^0
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "2"] <- 10^2
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "3"] <- 10^3
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "4"] <- 10^4
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "5"] <- 10^5
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "6"] <- 10^6
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "7"] <- 10^7
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "H"] <- 10^2
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "K"] <- 10^3
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "M"] <- 10^6
stormData_subset$PROPDMGEXP[stormData_subset$PROPDMGEXP == "B"] <- 10^9
stormData_subset$CROPDMGEXP[stormData_subset$CROPDMGEXP %in% c("", "?", "0")] <- 10^0
stormData_subset$CROPDMGEXP[stormData_subset$CROPDMGEXP == "K"] <- 10^3
stormData_subset$CROPDMGEXP[stormData_subset$CROPDMGEXP == "M"] <- 10^6
stormData_subset$CROPDMGEXP[stormData_subset$CROPDMGEXP == "B"] <- 10^9
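The same recoding can be expressed as a single lookup table. The following is a minimal sketch rather than the method used in this analysis: `exp_lookup` and `exp_to_multiplier` are hypothetical names, and applying one table to both columns is an assumption (the steps above recode only "", "?", "0", K, M and B for CROPDMGEXP). Codes missing from the table come back as NA so they can be inspected instead of being carried forward unnoticed.

```r
# Multipliers for the exponent codes handled in the steps above.
exp_lookup <- c(
  "-" = 1, "?" = 1, "+" = 1, "0" = 1,
  "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5, "6" = 1e6, "7" = 1e7,
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9
)

# Hypothetical helper: map a character vector of codes to multipliers.
exp_to_multiplier <- function(code) {
  code <- toupper(code)
  mult <- unname(exp_lookup[code])  # codes not in the table become NA
  mult[code %in% ""] <- 1           # "" cannot be matched by name, so set it here
  mult
}

print(exp_to_multiplier(c("k", "M", "", "B", "8")))
```

In this sketch the code "8" is returned as NA because it is not in the table; whether such codes should be mapped, and to what, is a decision left to the analyst.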
Create the property cost and crop cost columns.
stormData_subset$PROPDMG <- as.numeric(stormData_subset$PROPDMG)
stormData_subset$PROPDMGEXP <- as.numeric(stormData_subset$PROPDMGEXP)
stormData_subset$CROPDMG <- as.numeric(stormData_subset$CROPDMG)
stormData_subset$CROPDMGEXP <- as.numeric(stormData_subset$CROPDMGEXP)
stormData_subset$PROPCOST <- stormData_subset$PROPDMG * stormData_subset$PROPDMGEXP
stormData_subset$CROPCOST <- stormData_subset$CROPDMG * stormData_subset$CROPDMGEXP
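The conversion works because R stores each assigned number as text inside the character column, and `as.numeric()` parses that text back into a number. A small illustration with made-up values (the vectors `code`, `dmg`, `mult` and `cost` are hypothetical):

```r
# Made-up exponent codes and damage figures, for illustration only.
code <- c("K", "M", "8")
dmg  <- c(2.5, 1.5, 1)

# Assigning a number into a character vector stores it as text.
code[code == "K"] <- 10^3   # stored as "1000"
code[code == "M"] <- 10^6   # stored as "1e+06"

# as.numeric() parses the text back into numbers.
mult <- as.numeric(code)

# Cost is the damage figure times its multiplier.
cost <- dmg * mult
print(cost)
```

Here 2.5 with code K becomes 2500 and 1.5 with code M becomes 1500000. A digit code that was never recoded, such as "8" in this toy vector, passes through as its literal value 8 rather than a power of ten, so it may be worth tabulating the codes (for example with `table()`) before converting.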
To estimate health impact, sum the fatalities and injuries columns by event type and add the two totals.
stormData_subset_HI <- stormData_subset %>%
  group_by(EVTYPE) %>%
  summarize(FATALITIES = sum(FATALITIES),
            INJURIES = sum(INJURIES),
            HEALTHIMP = sum(FATALITIES + INJURIES)) %>%
  arrange(desc(HEALTHIMP))
top10_HI <- head(stormData_subset_HI, 10)
top10_HI <- top10_HI[, -ncol(top10_HI)]
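Within `summarize()`, an expression can refer to columns defined earlier in the same call, so HEALTHIMP is built from the per-event totals rather than from the raw rows. A minimal sketch on made-up data (the `toy` and `toy_HI` objects are hypothetical) shows the grouping, totals, and ranking:

```r
library(dplyr)

# Made-up events: two rows for "A", one row for "B".
toy <- data.frame(
  EVTYPE     = c("A", "A", "B"),
  FATALITIES = c(1, 2, 0),
  INJURIES   = c(10, 0, 5),
  stringsAsFactors = FALSE
)

toy_HI <- toy %>%
  group_by(EVTYPE) %>%
  summarize(FATALITIES = sum(FATALITIES),          # per-event total
            INJURIES   = sum(INJURIES),            # per-event total
            HEALTHIMP  = FATALITIES + INJURIES) %>%  # uses the totals above
  arrange(desc(HEALTHIMP))

print(toy_HI)
```

Event A ends up with 3 fatalities, 10 injuries, and a combined total of 13, ranking ahead of B with a total of 5.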
Reshape the data to long format.
top10_HI_long <- tidyr::gather(top10_HI, variable, value, -EVTYPE)
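In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. The following is a sketch of the equivalent reshape on made-up data (`toy_wide` and `toy_long` are hypothetical names). It yields the same columns, though the rows come out grouped by event rather than by variable.

```r
library(tidyr)

# Made-up wide table: one row per event, one column per measure.
toy_wide <- data.frame(
  EVTYPE     = c("A", "B"),
  FATALITIES = c(3, 0),
  INJURIES   = c(10, 5),
  stringsAsFactors = FALSE
)

# Long format: one row per event and measure.
toy_long <- pivot_longer(toy_wide,
                         cols = -EVTYPE,
                         names_to = "variable",
                         values_to = "value")
print(toy_long)
```

Each event now has one row for FATALITIES and one for INJURIES, which is the shape that a grouped or stacked bar chart expects.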
The top 10 events according to health impact (fatalities and
injuries) are TORNADO, EXCESSIVE HEAT, TSTM WIND, FLOOD, LIGHTNING,
HEAT, FLASH FLOOD, ICE STORM, THUNDERSTORM WIND, and WINTER STORM. These
weather events have shown the greatest adverse effects on population
health based on the combined impact of fatalities and injuries.
To estimate economic impact, sum the property cost and crop cost columns by event type and add the two totals.
stormData_subset_EI <- stormData_subset %>%
  group_by(EVTYPE) %>%
  summarize(PROPCOST = sum(PROPCOST),
            CROPCOST = sum(CROPCOST),
            ECONIMP = sum(PROPCOST + CROPCOST)) %>%
  arrange(desc(ECONIMP))
top10_EI <- head(stormData_subset_EI, 10)
top10_EI <- top10_EI[, -ncol(top10_EI)]
Reshape the data to long format.
top10_EI_long <- tidyr::gather(top10_EI, variable, value, -EVTYPE)
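The long format is convenient for plotting with ggplot2. The plotting code is not part of this section, so the following is only a sketch under assumptions: a stacked bar chart is assumed as the chart type, the axis labels are illustrative, and `toy_long` is a made-up stand-in with the same columns as top10_EI_long (the same pattern would apply to top10_HI_long).

```r
library(ggplot2)

# Made-up long-format data with the columns EVTYPE, variable, value.
toy_long <- data.frame(
  EVTYPE   = rep(c("FLOOD", "HAIL"), each = 2),
  variable = rep(c("PROPCOST", "CROPCOST"), times = 2),
  value    = c(100, 10, 50, 30),
  stringsAsFactors = FALSE
)

# Stacked bars, one per event, ordered by total value (largest first).
p <- ggplot(toy_long,
            aes(x = reorder(EVTYPE, -value, FUN = sum),
                y = value,
                fill = variable)) +
  geom_col() +
  labs(x = "Event type", y = "Estimated cost", fill = "Component") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

Stacking the two components in one bar shows both the combined total used for the ranking and how much of it comes from each component.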
The top 10 events according to economic impact (property cost and
crop cost) are FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, HAIL,
FLASH FLOOD, DROUGHT, HURRICANE, RIVER FLOOD, and ICE STORM. These
weather events have shown the greatest adverse effects on the economy
based on the combined impact of property damage and crop damage.