SYNOPSIS

This study addresses two questions: (1) across the United States, which types of weather events are most harmful to population health, and (2) which types of events have the greatest economic consequences? For each question, we total the relevant measures per event type and compare the top five event types in a stacked bar chart. Tornadoes account for by far the most fatalities and injuries. Floods cause the most property damage, with the largest combined property and crop losses among the top-ranked event types.

DATA PROCESSING

The analysis and plots use the tidyverse packages.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.1
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.2     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.0.2     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.1
## Warning: package 'tidyr' was built under R version 4.1.1
## Warning: package 'readr' was built under R version 4.1.1
## Warning: package 'purrr' was built under R version 4.1.1
## Warning: package 'dplyr' was built under R version 4.1.1
## Warning: package 'forcats' was built under R version 4.1.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Load the data set, which comes from the National Climatic Data Center Storm Events database.

storm <- read.csv("./data/repdata_data_StormData.csv")
storm <- tibble::as_tibble(storm)

Look at the first 6 rows of the data set.

head(storm)
## # A tibble: 6 x 37
##   STATE__ BGN_DATE   BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
##     <dbl> <chr>      <chr>    <chr>      <dbl> <chr>      <chr> <chr>      <dbl>
## 1       1 4/18/1950~ 0130     CST           97 MOBILE     AL    TORNA~         0
## 2       1 4/18/1950~ 0145     CST            3 BALDWIN    AL    TORNA~         0
## 3       1 2/20/1951~ 1600     CST           57 FAYETTE    AL    TORNA~         0
## 4       1 6/8/1951 ~ 0900     CST           89 MADISON    AL    TORNA~         0
## 5       1 11/15/195~ 1500     CST           43 CULLMAN    AL    TORNA~         0
## 6       1 11/15/195~ 2000     CST           77 LAUDERDALE AL    TORNA~         0
## # ... with 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## #   END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## #   END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <int>,
## #   MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## #   PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## #   STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## #   LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>

Select the columns we will be interested in for the data analysis.

storm_data <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
storm_data$EVTYPE <- as.factor(storm_data$EVTYPE)
head(storm_data)
## # A tibble: 6 x 7
##   EVTYPE  FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##   <fct>        <dbl>    <dbl>   <dbl> <chr>        <dbl> <chr>     
## 1 TORNADO          0       15    25   K                0 ""        
## 2 TORNADO          0        0     2.5 K                0 ""        
## 3 TORNADO          0        2    25   K                0 ""        
## 4 TORNADO          0        2     2.5 K                0 ""        
## 5 TORNADO          0        2     2.5 K                0 ""        
## 6 TORNADO          0        6     2.5 K                0 ""

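Total the fatalities and injuries for each event type, sort in descending order of fatalities (with injuries breaking ties), and keep the ten highest-ranked event types. The first five rows are set aside for plotting.

# desc() takes one column per call, so each sort key gets its own desc()
pop_health <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(Total_Fatalities = sum(FATALITIES), Total_Injuries = sum(INJURIES)) %>%
  arrange(desc(Total_Fatalities), desc(Total_Injuries)) %>%
  slice(1:10)

top_5 <- pop_health[1:5,]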

head(pop_health)
## # A tibble: 6 x 3
##   EVTYPE         Total_Fatalities Total_Injuries
##   <fct>                     <dbl>          <dbl>
## 1 TORNADO                    5633          91346
## 2 EXCESSIVE HEAT             1903           6525
## 3 FLASH FLOOD                 978           1777
## 4 HEAT                        937           2100
## 5 LIGHTNING                   816           5230
## 6 TSTM WIND                   504           6957
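
The group-and-rank step can be illustrated on a small toy table. This is a minimal sketch with hypothetical values (the toy tibble and the name ranked are not part of the analysis); it passes each sort key to its own desc() call so that injuries break ties in fatalities.

```r
library(dplyr)

# Toy data with hypothetical values, not taken from the storm data set
toy <- tibble::tibble(
  EVTYPE     = c("A", "A", "B", "C", "C"),
  FATALITIES = c(2, 3, 5, 1, 0),
  INJURIES   = c(10, 0, 4, 7, 1)
)

ranked <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Fatalities = sum(FATALITIES),
    Total_Injuries   = sum(INJURIES),
    .groups = "drop"
  ) %>%
  # Fatalities are the primary key; injuries only matter when fatalities tie
  arrange(desc(Total_Fatalities), desc(Total_Injuries))

ranked
```

Events A and B both total 5 fatalities, so A is placed first only because it has more injuries.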

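Convert the damage exponent codes to numeric multipliers (K = thousands, M = millions, B = billions; any other code is treated as 1), compute the dollar amount of property and crop damage for each record, and total both by event type. The result is sorted in descending order of property damage, with crop damage breaking ties.

storm_data$CROPDMGEXP <- recode(storm_data$CROPDMGEXP, "K" = 1000, "M" = 1000000, "B" = 1000000000, .default = 1)
storm_data$PROPDMGEXP <- recode(storm_data$PROPDMGEXP, "K" = 1000, "M" = 1000000, "B" = 1000000000, .default = 1)
damages <- storm_data %>%
  mutate(Prop_dmg = PROPDMG * PROPDMGEXP, Crop_dmg = CROPDMG * CROPDMGEXP)
# desc() takes one column per call, so each sort key gets its own desc()
damages <- damages %>%
  group_by(EVTYPE) %>%
  summarize(Total_propDmg = sum(Prop_dmg), Total_cropDmg = sum(Crop_dmg)) %>%
  arrange(desc(Total_propDmg), desc(Total_cropDmg))
Top5_Dmg <- damages[1:5,]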
head(damages)
## # A tibble: 6 x 3
##   EVTYPE            Total_propDmg Total_cropDmg
##   <fct>                     <dbl>         <dbl>
## 1 FLOOD             144657709807     5661968450
## 2 HURRICANE/TYPHOON  69305840000     2607872800
## 3 TORNADO            56925660790.     414953270
## 4 STORM SURGE        43323536000           5000
## 5 FLASH FLOOD        16140812067.    1421317100
## 6 HAIL               15727367053.    3025537890
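
Because recode() matches codes exactly, a lowercase k or m, if present in the exponent columns, would fall through to the default multiplier of 1. A minimal sketch of a case-insensitive variant follows. The helper exp_to_multiplier() and the toy values are hypothetical. Applying this to the real data could change the totals reported here, so it is offered as an option rather than as the method used in this analysis.

```r
library(dplyr)

# Hypothetical helper: convert an exponent code to a numeric multiplier.
# Codes are upper-cased first so that "k" and "K" are treated alike;
# anything other than K, M, or B maps to 1.
exp_to_multiplier <- function(code) {
  recode(toupper(code), "K" = 1e3, "M" = 1e6, "B" = 1e9, .default = 1)
}

# Toy records with hypothetical values
toy <- tibble::tibble(
  PROPDMG    = c(25, 2.5, 1, 3),
  PROPDMGEXP = c("K", "m", "B", "")
)

toy <- toy %>%
  mutate(Prop_dmg = PROPDMG * exp_to_multiplier(PROPDMGEXP))

toy$Prop_dmg
```

Keeping the raw PROPDMGEXP column and writing the result to a new column makes it easy to check afterwards which codes were mapped to 1.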

RESULTS

A stacked bar chart of fatalities and injuries for the five event types with the most fatalities.

plot_health <- gather(top_5, TYPE, VALUE, Total_Fatalities:Total_Injuries)
ggplot(plot_health, aes(x = reorder(EVTYPE, -VALUE), y = VALUE, fill = TYPE)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 5 Most Harmful Event Types", x = "Event Type", y = "Sum of Fatalities and Injuries")

From the chart above, tornadoes are clearly the most harmful event type with respect to population health, with 5,633 fatalities and 91,346 injuries.
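
The reshaping step before the plot turns the two total columns into TYPE/VALUE pairs, one row per bar segment. As a sketch, the same long format can be produced with pivot_longer(), which tidyr offers as the successor to gather(). The names top_5_demo, plot_health_demo, and bar_heights are hypothetical; the values are copied from the pop_health output shown earlier.

```r
library(tidyr)

# First five rows of pop_health, copied from the printed output
top_5_demo <- tibble::tibble(
  EVTYPE           = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  Total_Fatalities = c(5633, 1903, 978, 937, 816),
  Total_Injuries   = c(91346, 6525, 1777, 2100, 5230)
)

# One row per (event type, measure) pair
plot_health_demo <- pivot_longer(
  top_5_demo,
  cols      = c(Total_Fatalities, Total_Injuries),
  names_to  = "TYPE",
  values_to = "VALUE"
)

# Height of each stacked bar: fatalities plus injuries per event type
bar_heights <- tapply(plot_health_demo$VALUE, plot_health_demo$EVTYPE, sum)
bar_heights[["TORNADO"]]
```

The resulting table has ten rows, and the tornado bar has a total height of 96,979.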

A stacked bar chart of property and crop damage for the five event types with the most property damage.

plot_dmg <- gather(Top5_Dmg, TYPE, VALUE, Total_propDmg:Total_cropDmg)
ggplot(plot_dmg, aes(x = reorder(EVTYPE, -VALUE), y = VALUE, fill = TYPE)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 5 Most Economically Damaging Event Types", x = "Event Type", y = "Sum in Dollars") +
  theme(axis.text = element_text(size = 8))

From the chart above, floods cause the most damage: about $144.7 billion in property damage and $5.7 billion in crop damage.
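
Dollar totals of this size are easier to compare in billions. The following sketch adds property and crop damage for the five plotted event types and rescales the result. The names top5_dmg_demo and Total_billion are hypothetical, and the inputs are the rounded values printed in the damages output, so the results are approximate.

```r
# Top five rows of the damages table, copied from the printed output
# (values with a trailing "." in the printout are rounded)
top5_dmg_demo <- data.frame(
  EVTYPE        = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "FLASH FLOOD"),
  Total_propDmg = c(144657709807, 69305840000, 56925660790, 43323536000, 16140812067),
  Total_cropDmg = c(5661968450, 2607872800, 414953270, 5000, 1421317100)
)

# Combined damage per event type, in billions of dollars
top5_dmg_demo$Total_billion <-
  (top5_dmg_demo$Total_propDmg + top5_dmg_demo$Total_cropDmg) / 1e9

# Sort from largest to smallest combined damage
top5_dmg_demo[order(-top5_dmg_demo$Total_billion), c("EVTYPE", "Total_billion")]
```

Among these five event types, floods remain first on the combined measure, at roughly $150.3 billion.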