Synopsis

The data are processed with the tidyverse packages, and the results are plotted with ggplot2. To measure harm to population health, fatalities and injuries are summed by weather event type to find the most dangerous event types. To measure economic damage, the property (PROPDMG, PROPDMGEXP) and crop (CROPDMG, CROPDMGEXP) damage columns are converted to dollar amounts and summed by event type. The plots use a log10 scale so that event types whose totals differ by orders of magnitude can be compared on the same axis. Tornadoes are the most harmful to population health, and floods cause the most combined property and crop damage.

Data processing

The required libraries are loaded and the compressed storm data are read.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
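library(knitr)
# Read the bzip2-compressed CSV directly, using latin1 encoding
data <- read_csv("StormData.csv.bz2", locale = locale(encoding = "latin1"), show_col_types = FALSE)
# Total fatalities and injuries per event type (the grouping column is named `by`)
deaths <- data %>%
  group_by(by = EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES, na.rm = TRUE),
            injuries = sum(INJURIES, na.rm = TRUE))
# Sort by fatalities, breaking ties with injuries, and keep the top 10
ind <- order(deaths$fatalities, deaths$injuries, decreasing = TRUE)
deaths <- deaths[ind, ]
deaths <- deaths[1:10, ]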

A new data frame is created that contains only the event type and the total fatalities and injuries: the records are grouped by event type (EVTYPE), and the fatalities and injuries are summed within each group. The event types are sorted in descending order of fatalities, with ties broken by injuries, and only the 10 most dangerous event types are kept.
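The grouping and ranking step can be illustrated on a few made-up records. This sketch uses base R's aggregate() and order() instead of dplyr so that it runs without the tidyverse; the toy data frame events and the cut-off of 2 rows (instead of 10) are illustrative assumptions, not part of the original analysis.

```r
# Toy storm records (hypothetical values, not taken from StormData)
events <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 2, 1, 2, 2, 0),
  INJURIES   = c(10, 1, 5, 4, 0, 2)
)

# Sum fatalities and injuries per event type (base R counterpart of group_by + summarise)
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = events, FUN = sum)
names(totals) <- c("EVTYPE", "fatalities", "injuries")

# Descending order by fatalities; ties are broken by injuries
ranked <- totals[order(totals$fatalities, totals$injuries, decreasing = TRUE), ]

# Keep the top rows (2 here to fit the toy data; the report keeps 10)
top_n_events <- head(ranked, 2)
print(top_n_events)
```

TORNADO and HEAT both total 4 fatalities in this toy data, so the tie is resolved by injuries, which places TORNADO first.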

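# Keep the event type plus the property and crop damage columns and their exponents
damage <- select(data, EVTYPE, starts_with("PROP") | starts_with("CROP"))
# Drop rows with any missing value, including a missing PROPDMGEXP or CROPDMGEXP
damage <- drop_na(damage)
# Turn each exponent code into a multiplier; codes other than H, K, M, B
# (in either case) match no branch, so case_when() returns NA for them
damage <- mutate(damage,
                 PROPDMG = case_when(
                   PROPDMGEXP %in% c("H", "h") ~ PROPDMG * 100,
                   PROPDMGEXP %in% c("K", "k") ~ PROPDMG * 1000,
                   PROPDMGEXP %in% c("M", "m") ~ PROPDMG * 10^6,
                   PROPDMGEXP %in% c("B", "b") ~ PROPDMG * 10^9),
                 CROPDMG = case_when(
                   CROPDMGEXP %in% c("H", "h") ~ CROPDMG * 100,
                   CROPDMGEXP %in% c("K", "k") ~ CROPDMG * 1000,
                   CROPDMGEXP %in% c("M", "m") ~ CROPDMG * 10^6,
                   CROPDMGEXP %in% c("B", "b") ~ CROPDMG * 10^9))
# Total damage per event type; any NA in a group makes that group's total NA
damage <- damage %>%
  group_by(EVTYPE) %>%
  summarise(DMG = sum(PROPDMG) + sum(CROPDMG))
# Keep the 10 most damaging event types (NA totals sort last)
damage <- damage[order(damage$DMG, decreasing = TRUE), ]
damage <- damage[1:10, ]
# Express the totals in billions of dollars
damage <- mutate(damage, DMG = DMG / 10^9)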

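A second data frame, damage, is created from data and keeps only the event type and the property and crop damage columns. Rows with a missing value in any of these columns, including a missing exponent, are dropped. The exponent codes in PROPDMGEXP and CROPDMGEXP (H = hundred, K = thousand, M = million, B = billion, in either case) are converted into multipliers, and PROPDMG and CROPDMG are multiplied by them; any other code yields NA, so an event type containing such a record gets an NA total and falls to the bottom of the ranking. Property and crop damage are then summed per event type, the 10 most damaging event types are kept, and the totals are converted to billions of dollars for a cleaner presentation.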
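A minimal sketch of the exponent conversion, assuming the same four codes (H, K, M, B, in either case) as the mutate() call above. The helper exp_multiplier() is hypothetical: it swaps case_when() for a named lookup vector, but like the original it returns NA for any other code.

```r
# Hypothetical helper (not part of the original analysis): map an exponent code
# to a dollar multiplier. Codes other than H, K, M, B (either case) give NA,
# mirroring the case_when() call that has no default branch.
exp_multiplier <- function(code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  unname(multipliers[toupper(code)])
}

# Toy damage values with their exponent codes
dmg  <- c(25, 2.5, 1.2, 3, 7)
code <- c("K", "m", "B", "h", "+")

# Dollar amounts; the unrecognised "+" code produces NA
dollars <- dmg * exp_multiplier(code)
print(dollars)

# Convert to billions of dollars, as done for the DMG column
billions <- dollars / 1e9
```

A lookup vector keeps the code-to-multiplier table in one place, which makes it easy to see which codes are deliberately left unmapped.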

Results

# Injuries per event type, in the fatality-based order of deaths, on a log10 axis
plots <- ggplot(deaths, aes(y = fct_inorder(by), x = injuries))
plots + geom_col() + scale_x_log10(labels = scales::comma) + ylab(label = "")

# Fatalities per event type on a log10 axis
plots <- ggplot(deaths, aes(y = fct_inorder(by), x = fatalities))
plots + geom_col() + scale_x_log10(labels = scales::comma) + ylab(label = "")

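As the graphs above show, tornadoes cause the most fatalities and, among these ten event types, the most injuries. The gap between tornadoes and every other event type is large: about three times as many fatalities as excessive heat (5,633 vs 1,903) and more than thirteen times as many injuries as the next event type (91,346 vs 6,957 for TSTM WIND). The x-axes use a log10 scale so that totals of very different sizes remain readable on the same plot. The following table lists the fatalities and injuries of these ten event types in descending order of fatalities.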
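The effect of the log10 axis can be checked with a little arithmetic on two rows of the casualty table. On a log10 axis equal distances correspond to equal ratios, so the roughly 25-fold gap in fatalities between TORNADO and AVALANCHE becomes a distance of about 1.4 units. The values below are copied by hand from the table; the variable names are only illustrative.

```r
# Totals copied from the casualty table in this report
fatalities <- c(TORNADO = 5633, AVALANCHE = 224)
injuries   <- c(TORNADO = 91346, AVALANCHE = 170)

# Ratio between the largest and smallest fatality totals on a linear scale
fatality_ratio <- fatalities[["TORNADO"]] / fatalities[["AVALANCHE"]]

# Distance between the same two values on a log10 axis
fatality_gap <- log10(fatalities[["TORNADO"]]) - log10(fatalities[["AVALANCHE"]])
injury_gap   <- log10(injuries[["TORNADO"]]) - log10(injuries[["AVALANCHE"]])

# A distance on the log10 axis equals the log10 of the ratio
stopifnot(isTRUE(all.equal(fatality_gap, log10(fatality_ratio))))

print(round(c(ratio = fatality_ratio, fatality_gap = fatality_gap, injury_gap = injury_gap), 2))
```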

kable(deaths, format = "html", caption = "Weather Casualties",
      col.names = c("Event type", "Fatalities", "Injuries"))
Weather Casualties
Event type      Fatalities  Injuries
TORNADO               5633     91346
EXCESSIVE HEAT        1903      6525
FLASH FLOOD            978      1777
HEAT                   937      2100
LIGHTNING              816      5230
TSTM WIND              504      6957
FLOOD                  470      6789
RIP CURRENT            368       232
HIGH WIND              248      1137
AVALANCHE              224       170

# Total property and crop damage per event type, in billions of dollars, on a log10 axis
plots <- ggplot(damage, aes(y = fct_inorder(EVTYPE), x = DMG))
plots + geom_col() + scale_x_log10(labels = scales::comma) + labs(x = "Damage (billions of US dollars)", y = "")

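Floods cause by far the most economic damage: about 138 billion US dollars in combined property and crop damage, roughly 4.7 times the total of the next event type, HURRICANE/TYPHOON (about 29 billion).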

kable(damage, format = "html", caption = "Weather Damage (billions of US dollars)",
      col.names = c("Event type", "Damage (billion US$)"))
Weather Damage (billions of US dollars)
Event type          Damage (billion US$)
FLOOD                         138.007444
HURRICANE/TYPHOON              29.348168
HURRICANE                      12.405268
RIVER FLOOD                    10.108369
STORM SURGE/TIDE                4.641493
THUNDERSTORM WIND               3.813648
WILDFIRE                        3.684468
HIGH WIND                       3.057667
HURRICANE OPAL                  2.187000
DROUGHT                         1.886417
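The dominance of floods can be quantified from the damage table. This sketch copies the ten DMG values by hand; note that the shares are relative to the top-10 total only, not to all event types in the data, and the variable names are illustrative.

```r
# Top-10 damage totals in billions of US dollars, copied from the damage table
dmg_billions <- c(
  "FLOOD" = 138.007444, "HURRICANE/TYPHOON" = 29.348168, "HURRICANE" = 12.405268,
  "RIVER FLOOD" = 10.108369, "STORM SURGE/TIDE" = 4.641493,
  "THUNDERSTORM WIND" = 3.813648, "WILDFIRE" = 3.684468, "HIGH WIND" = 3.057667,
  "HURRICANE OPAL" = 2.187000, "DROUGHT" = 1.886417
)

# Share of the top-10 total attributable to each event type, in percent
share <- dmg_billions / sum(dmg_billions)
print(round(100 * share, 1))

# How many times larger the FLOOD total is than the runner-up
flood_vs_next <- dmg_billions[["FLOOD"]] / dmg_billions[["HURRICANE/TYPHOON"]]
print(round(flood_vs_next, 2))
```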