The data is processed with tidyverse packages and then presented with the ggplot2 package. To find the weather event types most harmful to population health, fatalities and injuries are aggregated by event type. To estimate economic damage, the property damage (PROPDMG, PROPDMGEXP) and crop damage (CROPDMG, CROPDMGEXP) columns are used. The values span several orders of magnitude, so the plots use a log10 scale to keep the bars comparable. TORNADO is the most harmful to population health, and FLOOD causes the most property and crop damage.
The libraries and the data are loaded.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.2.0
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.3 ✔ tibble 3.3.1
## ✔ lubridate 1.9.5 ✔ tidyr 1.3.2
## ✔ purrr 1.2.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)
data <- read_csv("StormData.csv.bz2", locale = locale(encoding = "latin1"), show_col_types = FALSE)
# Total fatalities and injuries per event type (the grouping column is named `by`)
deaths <- data %>%
  group_by(by = EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES, na.rm = TRUE),
            injuries = sum(INJURIES, na.rm = TRUE))
# Sort by fatalities, break ties by injuries, and keep the top 10
deaths <- deaths %>%
  arrange(desc(fatalities), desc(injuries)) %>%
  slice_head(n = 10)
A new data frame is created that contains only the event type, fatalities, and injuries. The rows are grouped by event type, and fatalities and injuries are summed for each event type. The event types are sorted in descending order by fatalities, with ties broken by injuries, and only the 10 most dangerous event types are kept.
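The aggregation and the tie-breaking order can be checked on a small toy data set. This is a minimal sketch in base R (aggregate() and order()) rather than dplyr, and the event names and counts are made up for illustration; they are not taken from StormData.

```r
# Hypothetical toy data (not from StormData): three event types
toy <- data.frame(
  EVTYPE     = c("A", "B", "A", "C", "B", "C"),
  FATALITIES = c(2, 1, 1, 3, 2, 1),
  INJURIES   = c(5, 4, 0, 1, 6, 2)
)

# Sum fatalities and injuries for each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by fatalities (descending); ties are broken by injuries (descending)
ranked <- totals[order(totals$FATALITIES, totals$INJURIES, decreasing = TRUE), ]

# Keep the top 2 event types (the report keeps the top 10)
top2 <- head(ranked, 2)
print(ranked)
```

Here C ranks first with 4 fatalities even though it has the fewest injuries, while A and B tie at 3 fatalities and B (10 injuries) is placed ahead of A (5 injuries).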
# Keep the event type plus the property (PROP*) and crop (CROP*) damage columns
damage <- select(data, EVTYPE, starts_with("PROP") | starts_with("CROP"))
# Drop every row with a missing value in any of the selected columns
damage <- drop_na(damage)
# Convert the exponent codes (H, K, M, B; case-insensitive) into dollar amounts.
# Codes not listed here (e.g. digits, "+", "?") match no condition, so case_when() returns NA.
damage <- mutate(damage,
  PROPDMG = case_when(
    toupper(PROPDMGEXP) == "H" ~ PROPDMG * 100,
    toupper(PROPDMGEXP) == "K" ~ PROPDMG * 1000,
    toupper(PROPDMGEXP) == "M" ~ PROPDMG * 10^6,
    toupper(PROPDMGEXP) == "B" ~ PROPDMG * 10^9),
  CROPDMG = case_when(
    toupper(CROPDMGEXP) == "H" ~ CROPDMG * 100,
    toupper(CROPDMGEXP) == "K" ~ CROPDMG * 1000,
    toupper(CROPDMGEXP) == "M" ~ CROPDMG * 10^6,
    toupper(CROPDMGEXP) == "B" ~ CROPDMG * 10^9)
)
# Total damage (property + crop) per event type; an NA total sorts last.
# Keep the top 10 and express the totals in billions of dollars.
damage <- damage %>%
  group_by(EVTYPE) %>%
  summarise(DMG = sum(PROPDMG) + sum(CROPDMG)) %>%
  arrange(desc(DMG)) %>%
  slice_head(n = 10) %>%
  mutate(DMG = DMG / 10^9)
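Another data frame is created from the full data set (data) containing the event type and the property and crop damage columns. Rows with missing values are dropped first. To compute the damage in dollars, the exponent codes (H, K, M, B for hundreds, thousands, millions, and billions) are converted into numeric multipliers, and the PROPDMG and CROPDMG values are multiplied by them. Property and crop damage are then summed per event type, only the 10 most damaging event types are kept, and the totals are converted into billions of dollars for clean presentation.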
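The exponent conversion can also be sketched with a named lookup vector instead of case_when(). This is a swapped-in base R technique, and the helper name to_multiplier and the sample values are hypothetical. It also shows what happens to codes outside H, K, M, and B: they map to NA, so sum() without na.rm = TRUE returns NA. The same holds for the case_when() calls above, which have no default branch.

```r
# Hypothetical lookup table: exponent code -> dollar multiplier
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Case-insensitive lookup; codes missing from the table (e.g. "+", "5") give NA
to_multiplier <- function(exp_code) {
  unname(multiplier[toupper(exp_code)])
}

# Made-up damage values and exponent codes
prop_dmg     <- c(25, 2.5, 1.5, 10)
prop_dmg_exp <- c("K", "m", "B", "+")

dollars <- prop_dmg * to_multiplier(prop_dmg_exp)
print(dollars)                      # the "+" entry is NA
print(sum(dollars))                 # NA: one unmapped code makes the total NA
print(sum(dollars, na.rm = TRUE))   # total of the mapped entries only
```

A lookup vector keeps the code-to-multiplier mapping in one place for both PROPDMGEXP and CROPDMGEXP; whether unmapped codes should become NA, 0, or a multiplier of 1 is an analysis decision this sketch does not make.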
# Injuries per event type; fct_inorder() keeps the ranking order of `deaths` on the y-axis
plots <- ggplot(deaths, aes(y = fct_inorder(by), x = injuries))
plots + geom_col() + scale_x_log10(labels = scales::comma) + labs(x = "Injuries (log scale)", y = "")
# Fatalities per event type, same layout
plots <- ggplot(deaths, aes(y = fct_inorder(by), x = fatalities))
plots + geom_col() + scale_x_log10(labels = scales::comma) + labs(x = "Fatalities (log scale)", y = "")
As the graphs above show, TORNADO causes the most fatalities and the most injuries, and the gap to the other event types is large: 5,633 fatalities versus 1,903 for EXCESSIVE HEAT, and 91,346 injuries versus 6,957 for TSTM WIND. The x-axes use a log10 scale so that event types with far smaller counts remain visible next to TORNADO. The following table lists fatalities and injuries in descending order of fatalities.
kable(deaths, format = "html", caption = "Weather Casualties", col.names = c("Event type", "Fatalities", "Injuries"))
| Event type | Fatalities | Injuries |
|---|---|---|
| TORNADO | 5633 | 91346 |
| EXCESSIVE HEAT | 1903 | 6525 |
| FLASH FLOOD | 978 | 1777 |
| HEAT | 937 | 2100 |
| LIGHTNING | 816 | 5230 |
| TSTM WIND | 504 | 6957 |
| FLOOD | 470 | 6789 |
| RIP CURRENT | 368 | 232 |
| HIGH WIND | 248 | 1137 |
| AVALANCHE | 224 | 170 |
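To make the effect of the log10 scale concrete, the short calculation below uses three injury totals from the table above. It is only an illustration of the transformation; ggplot2's scale_x_log10() performs the equivalent transformation on the axis.

```r
# Injury totals for three event types, copied from the casualties table
injuries <- c(TORNADO = 91346, "TSTM WIND" = 6957, AVALANCHE = 170)

# Linear scale: the largest bar would be over 500 times longer than the smallest
ratio_linear <- max(injuries) / min(injuries)

# Log10 scale: each bar ends at log10(value), so the spread shrinks to a few units
log_positions <- round(log10(injuries), 2)
print(ratio_linear)
print(log_positions)
```

On the log10 axis the three bars end at roughly 4.96, 3.84, and 2.23, so AVALANCHE is still clearly visible next to TORNADO.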
# Total property + crop damage per event type, in billions of dollars (log10 x-axis)
plots <- ggplot(damage, aes(y = fct_inorder(EVTYPE), x = DMG))
plots + geom_col() + scale_x_log10(labels = scales::comma) + labs(x = "Damage (billions USD, log scale)", y = "")
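FLOOD causes by far the most property and crop damage, about 138 billion dollars, which is more than four times the total of the next event type, HURRICANE/TYPHOON (about 29 billion dollars).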
kable(damage, format = "html", caption = "Weather Damages in Billions", col.names = c("Event type", "Damage (billions USD)"))
| Event type | Damage (billions USD) |
|---|---|
| FLOOD | 138.007444 |
| HURRICANE/TYPHOON | 29.348168 |
| HURRICANE | 12.405268 |
| RIVER FLOOD | 10.108369 |
| STORM SURGE/TIDE | 4.641493 |
| THUNDERSTORM WIND | 3.813648 |
| WILDFIRE | 3.684468 |
| HIGH WIND | 3.057667 |
| HURRICANE OPAL | 2.187000 |
| DROUGHT | 1.886417 |