Data Processing

Download the database and place it in the same directory as the R project used to run this analysis. Load the database into R and store it in an object called "data". Then load the dplyr and ggplot2 packages.

data <- read.csv("repdata_data_StormData.csv.bz2") 
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To identify the events most harmful to population health, I focus on two variables:

  1. FATALITIES, the number of deaths
  2. INJURIES, the number of people injured

To answer this question, I group the data by EVTYPE, sum each of these two variables, and add the sums to get the total number of people affected by each event type. The results are arranged in descending order.

#Calculate fatalities + injuries 
health_data <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(FATALITIES, na.rm = TRUE),
    injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(total_harm = fatalities + injuries) %>%
  arrange(desc(total_harm))

#Return top harmful events
head(health_data, 10)
## # A tibble: 10 × 4
##    EVTYPE            fatalities injuries total_harm
##    <chr>                  <dbl>    <dbl>      <dbl>
##  1 TORNADO                 5633    91346      96979
##  2 EXCESSIVE HEAT          1903     6525       8428
##  3 TSTM WIND                504     6957       7461
##  4 FLOOD                    470     6789       7259
##  5 LIGHTNING                816     5230       6046
##  6 HEAT                     937     2100       3037
##  7 FLASH FLOOD              978     1777       2755
##  8 ICE STORM                 89     1975       2064
##  9 THUNDERSTORM WIND        133     1488       1621
## 10 WINTER STORM             206     1321       1527
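
One caveat in the table above: TSTM WIND and THUNDERSTORM WIND appear as separate rows, so closely related labels seem to be counted separately. Below is a minimal sketch of how labels could be tidied before grouping. The helper clean_evtype() and the toy data frame are hypothetical and not part of the analysis above, and the only abbreviation handled is the TSTM prefix.

```r
# Hypothetical helper (an assumption, not the method used in this report):
# tidy raw EVTYPE labels so that near-duplicates fall into one group.
clean_evtype <- function(x) {
  x <- toupper(trimws(x))                # unify case, drop surrounding spaces
  x <- gsub("[[:space:]]+", " ", x)      # collapse repeated whitespace
  x <- sub("^TSTM", "THUNDERSTORM", x)   # expand the TSTM abbreviation
  x
}

# Toy data, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "Thunderstorm Wind", " THUNDERSTORM  WIND", "TORNADO"),
  FATALITIES = c(1, 2, 3, 4),
  INJURIES   = c(10, 20, 30, 40)
)

toy$EVTYPE <- clean_evtype(toy$EVTYPE)

# Sum both variables per cleaned label
toy_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy_totals
```

In the pipeline above, the same idea could be applied with mutate(EVTYPE = clean_evtype(EVTYPE)) before group_by(). The ranking printed in this report was produced without it.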

Question 2: Across the United States, which types of events have the greatest economic consequences?

To identify the events with the greatest economic consequences, I focus on two variables:

  1. PROPDMG, representing property damage
  2. CROPDMG, representing crop damage

Because the magnitude of each damage figure is stored as a one-letter code in PROPDMGEXP and CROPDMGEXP (H = hundreds, K = thousands, M = millions, B = billions), I first convert these codes to numeric multipliers and apply them to the two damage variables. After that, I use the same strategy as in Question 1: group by EVTYPE, calculate the total, arrange in descending order, and finally visualize.

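#Convert exponent codes to numeric multipliers
#Matching is case-sensitive: only uppercase K, M, B and H are recognised.
#Any other code (including a blank) is treated as a multiplier of 1.
convert_exp <- function(exp) {
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9,
                       ifelse(exp == "H", 1e2, 1))))
}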

#Calculate total economic damage
economic_data <- data %>%
  mutate(
    prop_multiplier = convert_exp(PROPDMGEXP),
    crop_multiplier = convert_exp(CROPDMGEXP),
    
    property_damage = PROPDMG * prop_multiplier,
    crop_damage = CROPDMG * crop_multiplier,
    
    total_damage = property_damage + crop_damage
  ) %>%
  group_by(EVTYPE) %>%
  summarise(
    total_economic_damage = sum(total_damage, na.rm = TRUE)
  ) %>%
  arrange(desc(total_economic_damage))


#Return top economic damage events
head(economic_data, 10)
## # A tibble: 10 × 2
##    EVTYPE            total_economic_damage
##    <chr>                             <dbl>
##  1 FLOOD                     150319678257 
##  2 HURRICANE/TYPHOON          71913712800 
##  3 TORNADO                    57340614060.
##  4 STORM SURGE                43323541000 
##  5 HAIL                       18752905438.
##  6 FLASH FLOOD                17562129167.
##  7 DROUGHT                    15018672000 
##  8 HURRICANE                  14610229010 
##  9 RIVER FLOOD                10148404500 
## 10 ICE STORM                   8967041360
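
The nested ifelse() in convert_exp() matches uppercase codes only. As a sketch of an alternative, the same mapping can be written as a named lookup vector that ignores case. The names exp_lookup and convert_exp_ci are hypothetical, and this assumes that lowercase codes such as "k" or "m" should mean the same as their uppercase forms. Codes outside H, K, M and B still get a multiplier of 1, as in the original function.

```r
# Hypothetical alternative to convert_exp(): a named lookup vector.
# Assumption: lowercase codes carry the same meaning as uppercase ones.
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

convert_exp_ci <- function(exp) {
  # Look each code up by name after upper-casing it
  mult <- unname(exp_lookup[toupper(as.character(exp))])
  # Unrecognised, blank, or missing codes fall back to a multiplier of 1
  mult[is.na(mult)] <- 1
  mult
}

# Quick check on a few sample codes
sample_mult <- convert_exp_ci(c("K", "m", "B", "", "?", NA))
sample_mult
```

If the data do contain lowercase codes, totals computed with this variant could differ from the table above, which was produced with the case-sensitive function.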

Results

This section presents a graph for each question, produced with ggplot2.

Question 1:

#Visualize top harmful events
top_health <- health_data %>%
  slice_max(order_by = total_harm, n = 10)

ggplot(top_health,
       aes(x = reorder(EVTYPE, total_harm),
           y = total_harm)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Weather Events",
    x = "Event Type",
    y = "Total Fatalities and Injuries"
  )

Question 2:

#Visualize top economic damage events
top_economic <- economic_data %>%
  slice_max(order_by = total_economic_damage, n = 10)

ggplot(top_economic,
       aes(x = reorder(EVTYPE, total_economic_damage),
           y = total_economic_damage)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Weather Events by Economic Damage",
    x = "Event Type",
    y = "Economic Damage (USD)"
  )
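
Because the damage totals run into the hundreds of billions, the axis may be easier to read if the values are rescaled first. The sketch below shows one way to do that. It uses three rows copied from the economic_data output shown earlier, and the names top3 and damage_billion are hypothetical.

```r
library(ggplot2)

# Three rows copied from the economic_data output above (for illustration)
top3 <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "STORM SURGE"),
  total_economic_damage = c(150319678257, 71913712800, 43323541000)
)

# Rescale from USD to billions of USD
top3$damage_billion <- top3$total_economic_damage / 1e9

p <- ggplot(top3,
            aes(x = reorder(EVTYPE, damage_billion),
                y = damage_billion)) +
  geom_col() +
  coord_flip() +
  labs(
    x = "Event Type",
    y = "Economic Damage (billion USD)"
  )
p
```

The same rescaling could be applied to top_economic before plotting. The ordering of the bars is unchanged.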

Conclusion

#Most harmful event
head(health_data, 1)
## # A tibble: 1 × 4
##   EVTYPE  fatalities injuries total_harm
##   <chr>        <dbl>    <dbl>      <dbl>
## 1 TORNADO       5633    91346      96979
#Most economic damage event
head(economic_data, 1)
## # A tibble: 1 × 2
##   EVTYPE total_economic_damage
##   <chr>                  <dbl>
## 1 FLOOD           150319678257

Tornadoes were the most harmful weather events with respect to population health in the United States, causing the highest combined number of fatalities and injuries. Floods caused the greatest economic consequences across the United States, resulting in the highest total property and crop damage.