This study addresses two questions about storm events across the United States:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
Stacked bar charts are used to compare fatalities and injuries, and property and crop damage, across event types. The findings show that tornadoes account for the highest totals of both fatalities and injuries, and that floods cause the most damage to property and crops.
The tidyverse package was used for the data analysis and the plots.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.1
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.0.2 v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.1
## Warning: package 'tidyr' was built under R version 4.1.1
## Warning: package 'readr' was built under R version 4.1.1
## Warning: package 'purrr' was built under R version 4.1.1
## Warning: package 'dplyr' was built under R version 4.1.1
## Warning: package 'forcats' was built under R version 4.1.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Load the Storm Events data set provided by the National Climatic Data Center.
storm <- read.csv("./data/repdata_data_StormData.csv")
storm <- tibble::as_tibble(storm)
Look at the first 6 rows of the data set.
head(storm)
## # A tibble: 6 x 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/1950~ 0130 CST 97 MOBILE AL TORNA~ 0
## 2 1 4/18/1950~ 0145 CST 3 BALDWIN AL TORNA~ 0
## 3 1 2/20/1951~ 1600 CST 57 FAYETTE AL TORNA~ 0
## 4 1 6/8/1951 ~ 0900 CST 89 MADISON AL TORNA~ 0
## 5 1 11/15/195~ 1500 CST 43 CULLMAN AL TORNA~ 0
## 6 1 11/15/195~ 2000 CST 77 LAUDERDALE AL TORNA~ 0
## # ... with 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## # END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## # END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <int>,
## # MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## # PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## # STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## # LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
Select the columns needed for the analysis and convert EVTYPE to a factor.
storm_data <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
storm_data$EVTYPE <- as.factor(storm_data$EVTYPE)
head(storm_data)
## # A tibble: 6 x 7
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## <fct> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 TORNADO 0 15 25 K 0 ""
## 2 TORNADO 0 0 2.5 K 0 ""
## 3 TORNADO 0 2 25 K 0 ""
## 4 TORNADO 0 2 2.5 K 0 ""
## 5 TORNADO 0 2 2.5 K 0 ""
## 6 TORNADO 0 6 2.5 K 0 ""
Sum the fatalities and injuries per event type, sort in descending order of fatalities (ties broken by injuries), and keep the top 10.
pop_health <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(Total_Fatalities = sum(FATALITIES), Total_Injuries = sum(INJURIES)) %>%
  # desc() takes a single vector, so each sort key gets its own call
  arrange(desc(Total_Fatalities), desc(Total_Injuries)) %>%
  slice(1:10)
top_5 <- pop_health[1:5,]
head(pop_health)
## # A tibble: 6 x 3
## EVTYPE Total_Fatalities Total_Injuries
## <fct> <dbl> <dbl>
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
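The ranking above sorts by fatalities. As a rough sketch of an alternative, the rows could instead be ranked by combined casualties. The example below rebuilds only the six rows printed above, and `Total_Casualties` is a hypothetical column name introduced here, so event types that are not shown could still rank higher in the full data.

```r
library(dplyr)
library(tibble)

# Totals copied from the six rows printed above (not the full data set).
pop_health_top <- tribble(
  ~EVTYPE,          ~Total_Fatalities, ~Total_Injuries,
  "TORNADO",        5633,              91346,
  "EXCESSIVE HEAT", 1903,              6525,
  "FLASH FLOOD",    978,               1777,
  "HEAT",           937,               2100,
  "LIGHTNING",      816,               5230,
  "TSTM WIND",      504,               6957
)

# Total_Casualties is a hypothetical name used only in this sketch.
by_casualties <- pop_health_top %>%
  mutate(Total_Casualties = Total_Fatalities + Total_Injuries) %>%
  arrange(desc(Total_Casualties))

print(by_casualties)
```

Among these six rows, TORNADO stays first under either ordering, while the rows below it change places.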
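Convert the damage exponent codes (K, M, B) to numeric multipliers, compute the dollar amounts, and sum the property and crop damage per event type. Any other code is treated as a multiplier of 1.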
storm_data$CROPDMGEXP <- recode(storm_data$CROPDMGEXP, "K" = 1000, "M" = 1000000, "B" = 1000000000, .default = 1)
storm_data$PROPDMGEXP <- recode(storm_data$PROPDMGEXP, "K" = 1000, "M" = 1000000, "B" = 1000000000, .default = 1)
damages <- storm_data %>% mutate(Prop_dmg = PROPDMG * PROPDMGEXP, Crop_dmg = CROPDMG * CROPDMGEXP)
damages <- damages %>%
  group_by(EVTYPE) %>%
  summarize(Total_propDmg = sum(Prop_dmg), Total_cropDmg = sum(Crop_dmg)) %>%
  # desc() takes a single vector, so each sort key gets its own call
  arrange(desc(Total_propDmg), desc(Total_cropDmg))
Top5_Dmg <- damages[1:5,]
head(damages)
## # A tibble: 6 x 3
## EVTYPE Total_propDmg Total_cropDmg
## <fct> <dbl> <dbl>
## 1 FLOOD 144657709807 5661968450
## 2 HURRICANE/TYPHOON 69305840000 2607872800
## 3 TORNADO 56925660790. 414953270
## 4 STORM SURGE 43323536000 5000
## 5 FLASH FLOOD 16140812067. 1421317100
## 6 HAIL 15727367053. 3025537890
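The exponent conversion used for these totals matches only uppercase K, M and B. If the exponent columns also contain lowercase codes, those rows fall through to a multiplier of 1. The sketch below shows one possible case-insensitive variant. The helper name `exp_to_multiplier` and the toy rows are made up for illustration, and applying it to the real data could change the totals shown above.

```r
library(dplyr)

# Hypothetical helper: map an exponent code to a numeric multiplier,
# ignoring case. Codes other than K/M/B fall back to 1.
exp_to_multiplier <- function(code) {
  recode(toupper(code), "K" = 1e3, "M" = 1e6, "B" = 1e9, .default = 1)
}

# Made-up rows for illustration only; not taken from the storm data.
toy <- tibble(
  PROPDMG    = c(25, 2.5, 3, 10),
  PROPDMGEXP = c("K", "m", "B", "")
)

toy <- toy %>%
  mutate(Prop_dmg = PROPDMG * exp_to_multiplier(PROPDMGEXP))

print(toy)
```

Keeping the mapping in one small function means the same rule can be applied to both PROPDMGEXP and CROPDMGEXP.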
A stacked bar chart of the five event types with the most fatalities, showing fatalities and injuries for each.
plot_health <- gather(top_5, TYPE, VALUE, Total_Fatalities:Total_Injuries)
ggplot(plot_health, aes(x = reorder(EVTYPE, -VALUE), y = VALUE, fill = TYPE)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 5 Most Harmful Event Hazards", x = "Weather Type", y = "Sum of Fatalities and Injuries")
The chart shows that tornadoes are by far the most harmful event type with respect to population health, with 5,633 fatalities and 91,346 injuries.
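The reshaping step before the plot uses `gather()`, which still works but is marked as superseded in the tidyr documentation in favor of `pivot_longer()`. A minimal sketch of the equivalent call follows. The input is rebuilt from the five rows printed earlier, and the names `top_5_sketch` and `plot_health_long` are introduced here only for the example.

```r
library(tidyr)
library(tibble)

# Top five rows copied from the printed fatalities/injuries summary.
top_5_sketch <- tribble(
  ~EVTYPE,          ~Total_Fatalities, ~Total_Injuries,
  "TORNADO",        5633,              91346,
  "EXCESSIVE HEAT", 1903,              6525,
  "FLASH FLOOD",    978,               1777,
  "HEAT",           937,               2100,
  "LIGHTNING",      816,               5230
)

# One row per event type and measure, with the same TYPE/VALUE
# column names that the gather() call produces.
plot_health_long <- pivot_longer(
  top_5_sketch,
  cols = c(Total_Fatalities, Total_Injuries),
  names_to = "TYPE",
  values_to = "VALUE"
)

print(plot_health_long)
```

The row order differs from `gather()` output because `pivot_longer()` keeps each event's measures together, but the plot is unaffected since ggplot2 groups by EVTYPE and TYPE.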
A stacked bar chart of the five event types with the highest property damage, showing property and crop damage for each.
plot_dmg <- gather(Top5_Dmg, TYPE, VALUE, Total_propDmg:Total_cropDmg)
ggplot(plot_dmg, aes(x = reorder(EVTYPE, -VALUE), y = VALUE, fill = TYPE)) +
  geom_bar(stat = "identity") +
  labs(title = "Top 5 Most Economically Devastating Events", x = "Event Type", y = "Sum in Dollars") +
  theme(axis.text = element_text(size = 8))
The chart shows that floods cause the most damage, with about $144.7 billion in property damage and about $5.7 billion in crop damage.