This analysis aims to understand the impact of severe weather events in the US. The impact of each event type on population health (fatalities and injuries) and on the economy (property damage and crop damage) was summarized.
The analysis concludes that “TORNADO” is the event type most harmful to population health, and that “FLOOD” is the event type with the greatest economic consequences.
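library(tidyverse) # provides dplyr, stringr, forcats and ggplot2, all used below
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, "./StormData.bz2")
DATA <- read.csv("./StormData.bz2") # read the file downloaded on the previous line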
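Because the compressed file is large, re-running the download on every knit can be slow. A minimal sketch of a guard that downloads only when the file is missing is shown here; `download_once` is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
# Returns TRUE when a download was performed and FALSE when the file was reused.
download_once <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb") # binary mode keeps the bz2 archive intact
    return(TRUE)
  }
  FALSE
}

# Intended use with the names from the analysis (not run here):
# download_once(fileUrl, "./StormData.bz2")
# DATA <- read.csv("./StormData.bz2")
```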
Total fatalities and injuries were summed for each event type and saved in data frames.
df1 <- DATA %>%
  group_by(EVTYPE) %>%
  summarise(Fatal = sum(FATALITIES), Injury = sum(INJURIES)) %>%
  # keeps only event types with at least one fatality AND at least one injury
  filter(Fatal != 0, Injury != 0) %>%
  mutate(Total.harmed = Fatal + Injury,
         EVTYPE = as.factor(EVTYPE))
# top 5 event types by total fatalities
df1.1 <- df1 %>%
  select(EVTYPE, Fatal) %>%
  arrange(desc(Fatal)) %>%
  head(5)
# top 5 event types by total injuries
df1.2 <- df1 %>%
  select(EVTYPE, Injury) %>%
  arrange(desc(Injury)) %>%
  head(5)
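In `filter()`, conditions separated by a comma are combined with AND, so the filter above drops event types that have fatalities but no injuries (or the reverse). The sketch below illustrates the difference between the comma form and `|` on a small made-up table; the `toy` data are invented for illustration only.

```r
library(dplyr)

# Invented data: A has both kinds of harm, B only fatalities, C none
toy <- tibble(
  EVTYPE     = c("A", "A", "B", "C"),
  FATALITIES = c(1, 2, 3, 0),
  INJURIES   = c(4, 0, 0, 0)
)

toy_sum <- toy %>%
  group_by(EVTYPE) %>%
  summarise(Fatal = sum(FATALITIES), Injury = sum(INJURIES))

# Comma = AND: only event types with both fatalities and injuries remain
both <- toy_sum %>% filter(Fatal != 0, Injury != 0)

# `|` = OR: event types with any fatalities or injuries remain
either <- toy_sum %>% filter(Fatal != 0 | Injury != 0)
```

Here `both` contains only event type A, while `either` contains A and B. Whether the stricter AND form is the intended one depends on the question being asked.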
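Records with zero property damage and zero crop damage were removed.
PROPDMGEXP and CROPDMGEXP were converted into numeric multipliers (K = 1,000; M = 1,000,000; B = 1,000,000,000). Some exponent values are coded irregularly (blank, digits, or the symbols “-”, “+” and “?”) and the official documentation does not specify their meaning, so I treated them as a multiplier of 1.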
df2 <- DATA %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  filter(PROPDMG > 0 | CROPDMG > 0) %>%
  # convert the exponent codes into numeric multipliers
  mutate(PROPDMGEXP.n = case_when(PROPDMGEXP == "" ~ 1,
                                  str_detect(PROPDMGEXP, "[:digit:]") ~ 1,
                                  str_detect(PROPDMGEXP, "[-+?]") ~ 1,
                                  str_detect(PROPDMGEXP, "[Kk]") ~ 1000,
                                  str_detect(PROPDMGEXP, "[Mm]") ~ 1000000,
                                  str_detect(PROPDMGEXP, "[Bb]") ~ 1000000000),
         CROPDMGEXP.n = case_when(CROPDMGEXP == "" ~ 1,
                                  str_detect(CROPDMGEXP, "[:digit:]") ~ 1,
                                  str_detect(CROPDMGEXP, "[-+?]") ~ 1,
                                  str_detect(CROPDMGEXP, "[Kk]") ~ 1000,
                                  str_detect(CROPDMGEXP, "[Mm]") ~ 1000000,
                                  str_detect(CROPDMGEXP, "[Bb]") ~ 1000000000)) %>%
  # compute each damage amount in dollars
  mutate(PROPDMG.total = PROPDMG * PROPDMGEXP.n,
         CROPDMG.total = CROPDMG * CROPDMGEXP.n) %>%
  group_by(EVTYPE) %>%
  summarise(PROPDMG.total = sum(PROPDMG.total),
            CROPDMG.total = sum(CROPDMG.total)) %>%
  mutate(Total.DMG = PROPDMG.total + CROPDMG.total)
# top 5 event types by total property damage
df2.1 <- df2 %>%
  select(EVTYPE, PROPDMG.total) %>%
  arrange(desc(PROPDMG.total)) %>%
  head(5)
# top 5 event types by total crop damage
df2.2 <- df2 %>%
  select(EVTYPE, CROPDMG.total) %>%
  arrange(desc(CROPDMG.total)) %>%
  head(5)
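One property of the `case_when()` conversion above is that any exponent code matching none of its branches yields `NA`, which then propagates through `sum()` for that event type. As an alternative sketch, the mapping can be written as a lookup table with an explicit fallback of 1. `exp_to_multiplier` is a hypothetical helper, and treating every unrecognised code as 1 is an assumption that extends the rule stated in the text.

```r
# Hypothetical helper: map exponent codes to numeric multipliers.
# K/M/B (either case) become 1e3/1e6/1e9; every other code, including
# blanks, digits and symbols, falls back to 1 (an assumption).
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)]) # unmatched codes give NA here
  mult[is.na(mult)] <- 1                # explicit fallback instead of NA
  mult
}

codes <- c("K", "m", "B", "", "5", "+", "H")
multipliers <- exp_to_multiplier(codes)
```

With this helper, the two `case_when()` calls could be replaced by `exp_to_multiplier(PROPDMGEXP)` and `exp_to_multiplier(CROPDMGEXP)`; the results would differ from the original only for codes that the original leaves as `NA`.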
From the summary tables and bar plot below, the top 5 event types by total number of people harmed are “TORNADO”, “EXCESSIVE HEAT”, “TSTM WIND”, “FLOOD” and “LIGHTNING”.
“TORNADO”, the leading cause of both fatalities and injuries, caused 5633 fatalities and 91346 injuries, harming 96979 people in total.
df1.1
## # A tibble: 5 x 2
## EVTYPE Fatal
## <fct> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
The data frame above shows the top 5 event types by total fatalities.
df1.2
## # A tibble: 5 x 2
## EVTYPE Injury
## <fct> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
The data frame above shows the top 5 event types by total injuries.
df1 %>%
  arrange(desc(Total.harmed)) %>%
  head(5) %>%
  ggplot(aes(x = fct_reorder(EVTYPE, Total.harmed, .desc = TRUE),
             y = Total.harmed/1000)) +
  geom_col() +
  labs(x = "Type of events", y = "Total people harmed (K)",
       title = "Top 5 events harmful to population health") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, vjust = 0.5))
From the summary tables and bar plot below, the top 5 event types with the greatest economic consequences are “FLOOD”, “HURRICANE/TYPHOON”, “TORNADO”, “STORM SURGE” and “FLASH FLOOD”.
“FLOOD” caused the most property damage.
“DROUGHT” caused the most crop damage.
In total, “FLOOD” caused the most economic damage.
df2.1
## # A tibble: 5 x 2
## EVTYPE PROPDMG.total
## <chr> <dbl>
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56937160779.
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16140812067.
The data frame above shows the top 5 event types by total property damage (in dollars).
df2.2
## # A tibble: 5 x 2
## EVTYPE CROPDMG.total
## <chr> <dbl>
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954473
The data frame above shows the top 5 event types by total crop damage (in dollars).
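As a quick check of how the two tables combine, the FLOOD figures printed above can be added by hand. This is only an arithmetic sketch using the values copied from the output; the variable names are illustrative.

```r
# Values copied from the df2.1 and df2.2 output for FLOOD
flood_prop <- 144657709807 # total property damage, dollars
flood_crop <- 5661968450   # total crop damage, dollars

flood_total <- flood_prop + flood_crop # corresponds to Total.DMG for FLOOD
flood_total_billion <- flood_total / 1e9
```

The sum is 150319678257 dollars, or roughly 150.3 billion dollars, which is the quantity plotted on the y-axis of the bar chart for FLOOD.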
df2 %>%
  arrange(desc(Total.DMG)) %>%
  head(5) %>%
  ggplot(aes(x = fct_reorder(EVTYPE, Total.DMG, .desc = TRUE),
             y = Total.DMG/1000000000)) +
  geom_col() +
  labs(x = "Type of events", y = "Total economic damage\n(Billion dollars)",
       title = "Top 5 events damaging to economics") +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, vjust = 0.5))