Download the database and place it in the same directory as the R project used to run this analysis. Load the database into R and store it in an object called "data". Then load the dplyr and ggplot2 packages.
data <- read.csv("repdata_data_StormData.csv.bz2")
library(dplyr)
library(ggplot2)
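As a small guard around the loading step, the sketch below fails early with a clear message when the data file is not in the project directory. The helper name load_storm_data is hypothetical and not part of the original analysis; it relies on read.csv() being able to read bzip2-compressed files directly.

```r
# Hypothetical helper (not in the original analysis): check that the
# file is present before trying to read it.
load_storm_data <- function(path = "repdata_data_StormData.csv.bz2") {
  if (!file.exists(path)) {
    stop("Data file not found: ", path,
         ". Place it in the project directory first.")
  }
  # read.csv() reads .bz2-compressed files without manual decompression
  read.csv(path)
}
```

With the file in place, data <- load_storm_data() would give the same object as the read.csv() call above.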
Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
To investigate which events are most harmful to population health, I focus on two variables: FATALITIES and INJURIES.
To answer this question, I group the data by EVTYPE, sum these two variables, and calculate the total number of people affected by each event type. The results are arranged in descending order.
# Calculate fatalities + injuries
health_data <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(FATALITIES, na.rm = TRUE),
    injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(total_harm = fatalities + injuries) %>%
  arrange(desc(total_harm))
# Return top harmful events
head(health_data, 10)
## # A tibble: 10 × 4
##    EVTYPE            fatalities injuries total_harm
##    <chr>                  <dbl>    <dbl>      <dbl>
##  1 TORNADO                 5633    91346      96979
##  2 EXCESSIVE HEAT          1903     6525       8428
##  3 TSTM WIND                504     6957       7461
##  4 FLOOD                    470     6789       7259
##  5 LIGHTNING                816     5230       6046
##  6 HEAT                     937     2100       3037
##  7 FLASH FLOOD              978     1777       2755
##  8 ICE STORM                 89     1975       2064
##  9 THUNDERSTORM WIND        133     1488       1621
## 10 WINTER STORM             206     1321       1527
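To make the grouping strategy above easier to follow, here is a minimal sketch on a made-up data frame. The toy values are hypothetical and are not taken from the storm database; only the pipeline mirrors the one used for health_data.

```r
library(dplyr)

# Hypothetical toy records, not taken from the storm database
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, NA)
)

toy_health <- toy %>%
  group_by(EVTYPE) %>%                             # one group per event type
  summarise(
    fatalities = sum(FATALITIES, na.rm = TRUE),    # NA values are skipped
    injuries   = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(total_harm = fatalities + injuries) %>%   # combined harm per event
  arrange(desc(total_harm))                        # most harmful first

toy_health
```

The two TORNADO rows collapse into a single row with 5 fatalities and 15 injuries, and the missing injury count for HEAT is dropped rather than turning the whole sum into NA.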
Question 2: Across the United States, which types of events have the greatest economic consequences?
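To investigate which events have the greatest economic consequences, I focus on two variables, PROPDMG (property damage) and CROPDMG (crop damage), together with their exponent columns PROPDMGEXP and CROPDMGEXP.
Because the exponents are stored as short letter codes, I first converted the values of PROPDMGEXP and CROPDMGEXP into numeric multipliers (H = 1e2, K = 1e3, M = 1e6, B = 1e9; any other code is treated as 1). I then used the same strategy as in Question 1: group by EVTYPE, calculate the total, arrange in descending order, and finally visualize.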
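# Convert exponent codes to numeric multipliers:
# "H" = hundreds, "K" = thousands, "M" = millions, "B" = billions.
# Any other code (blank, lower case, digits, symbols) falls back to 1.
convert_exp <- function(exp) {
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9,
                       ifelse(exp == "H", 1e2, 1))))
}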
# Calculate total economic damage
economic_data <- data %>%
  mutate(
    prop_multiplier = convert_exp(PROPDMGEXP),
    crop_multiplier = convert_exp(CROPDMGEXP),
    property_damage = PROPDMG * prop_multiplier,
    crop_damage = CROPDMG * crop_multiplier,
    total_damage = property_damage + crop_damage
  ) %>%
  group_by(EVTYPE) %>%
  summarise(
    total_economic_damage = sum(total_damage, na.rm = TRUE)
  ) %>%
  arrange(desc(total_economic_damage))
# Return top economic damage events
head(economic_data, 10)
## # A tibble: 10 × 2
##    EVTYPE            total_economic_damage
##    <chr>                             <dbl>
##  1 FLOOD                     150319678257
##  2 HURRICANE/TYPHOON          71913712800
##  3 TORNADO                    57340614060.
##  4 STORM SURGE                43323541000
##  5 HAIL                       18752905438.
##  6 FLASH FLOOD                17562129167.
##  7 DROUGHT                    15018672000
##  8 HURRICANE                  14610229010
##  9 RIVER FLOOD                10148404500
## 10 ICE STORM                   8967041360
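The exponent conversion used for these totals can also be written as a lookup table, which makes the arithmetic easy to check by hand. The sketch below is an illustrative alternative, not the function used for the results above: the names exp_lookup and convert_exp_lookup are hypothetical, the damage values are made up, and the toupper() call is an added assumption that treats lower-case codes such as "k" like "K" (the original function maps them to 1).

```r
# Hypothetical alternative to convert_exp(): a named lookup vector
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

convert_exp_lookup <- function(exp) {
  # Assumption: toupper() makes the lookup case-insensitive
  multiplier <- exp_lookup[toupper(as.character(exp))]
  # Codes missing from the table (blank, digits, symbols) give NA;
  # fall back to a multiplier of 1 for those
  multiplier[is.na(multiplier)] <- 1
  unname(multiplier)
}

# Worked arithmetic on made-up values
damage <- c(25, 2.5, 1.2, 10)
code   <- c("K", "M", "B", "")
damage * convert_exp_lookup(code)
```

Here 25 with code "K" becomes 25 * 1e3 = 25,000 USD, 2.5 with "M" becomes 2,500,000 USD, 1.2 with "B" becomes 1,200,000,000 USD, and the blank code leaves 10 unchanged.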
This part presents the graphs for both questions, built with ggplot2.
Question 1:
# Visualize top harmful events
top_health <- health_data %>%
  slice_max(order_by = total_harm, n = 10)
ggplot(top_health,
       aes(x = reorder(EVTYPE, total_harm),
           y = total_harm)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Weather Events",
    x = "Event Type",
    y = "Total Fatalities and Injuries"
  )
Question 2:
# Visualize top economic damage events
top_economic <- economic_data %>%
  slice_max(order_by = total_economic_damage, n = 10)
ggplot(top_economic,
       aes(x = reorder(EVTYPE, total_economic_damage),
           y = total_economic_damage)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Weather Events by Economic Damage",
    x = "Event Type",
    y = "Economic Damage (USD)"
  )
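Because the damage totals run into the hundreds of billions, the axis of the economic damage plot can be easier to read when expressed in billions of USD. The following is a minimal sketch of that variation, assuming a made-up data frame toy_economic in place of top_economic; the values are rounded placeholders, not results.

```r
library(ggplot2)

# Made-up totals standing in for top_economic (hypothetical values)
toy_economic <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HAIL"),
  total_economic_damage = c(150e9, 57e9, 19e9)
)

# Divide by 1e9 so the axis shows billions of USD instead of raw dollars
p <- ggplot(toy_economic,
            aes(x = reorder(EVTYPE, total_economic_damage),
                y = total_economic_damage / 1e9)) +
  geom_col() +
  coord_flip() +
  labs(
    x = "Event Type",
    y = "Economic Damage (billion USD)"
  )
p
```

Scaling inside aes() leaves the underlying data untouched, so the same totals can still be reported in raw USD elsewhere.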
# Most harmful event
head(health_data, 1)
## # A tibble: 1 × 4
##   EVTYPE  fatalities injuries total_harm
##   <chr>        <dbl>    <dbl>      <dbl>
## 1 TORNADO       5633    91346      96979
# Most economic damage event
head(economic_data, 1)
## # A tibble: 1 × 2
##   EVTYPE total_economic_damage
##   <chr>                  <dbl>
## 1 FLOOD           150319678257
Tornadoes were the most harmful weather events with respect to population health in the United States, with 5,633 fatalities and 91,346 injuries (96,979 combined). Floods had the greatest economic consequences across the United States, with about 150.3 billion USD in total property and crop damage.