suppressMessages(library(dplyr))
suppressMessages(library(tidyr))
suppressMessages(library(lubridate))
suppressMessages(library(ggplot2))
suppressMessages(library(gt))
storm_original <- read.csv("repdata_data_StormData.csv", header=TRUE)
# Total fatalities, injuries, and property damage per event type
agg_total <- storm_original %>%
  group_by(EVTYPE) %>%
  summarize(
    number_fatalities = sum(FATALITIES),
    number_injuries = sum(INJURIES),
    propdmg = sum(PROPDMG)
  )
According to the pairwise scatter plot of the aggregated variables, a small number of event types caused most of the damage to both health and the economy. The government needs to prepare for the most disastrous events, so I extract the top 20 event types by fatalities, injuries, and property damage.
summary(agg_total)
## EVTYPE number_fatalities number_injuries propdmg
## Length:985 Min. : 0.00 Min. : 0.0 Min. : 0
## Class :character 1st Qu.: 0.00 1st Qu.: 0.0 1st Qu.: 0
## Mode :character Median : 0.00 Median : 0.0 Median : 0
## Mean : 15.38 Mean : 142.7 Mean : 11050
## 3rd Qu.: 0.00 3rd Qu.: 0.0 3rd Qu.: 35
## Max. :5633.00 Max. :91346.0 Max. :3212258
plot(agg_total)
top20_fatalities <- agg_total %>%
arrange(desc(number_fatalities)) %>%
head(20)
top20_injuries <- agg_total %>%
arrange(desc(number_injuries)) %>%
head(20)
top20_propdmg <- agg_total %>%
arrange(desc(propdmg)) %>%
head(20)
count_event <- storm_original %>%
group_by(EVTYPE) %>%
summarize(count=n())
The table below lists the top 20 event types by fatalities and the top 20 by injuries. It has 28 rows because only 12 event types appear in both lists; an NA means the event type is outside the top 20 for that measure, not that its total is zero. The table also includes the number of times each event type occurred. According to this table, tornado, excessive heat, flash flood, heat, lightning, tstm wind, and flood are the most harmful with respect to population health.
top20_fatalities_select <- top20_fatalities %>%
select(EVTYPE, number_fatalities)
top20_injuries_select <- top20_injuries %>%
select(EVTYPE, number_injuries)
agg_health <- full_join(top20_fatalities_select, top20_injuries_select, by="EVTYPE")
agg_health <- left_join(agg_health, count_event, by="EVTYPE")
gt(agg_health)
| EVTYPE | number_fatalities | number_injuries | count |
|---|---|---|---|
| TORNADO | 5633 | 91346 | 60652 |
| EXCESSIVE HEAT | 1903 | 6525 | 1678 |
| FLASH FLOOD | 978 | 1777 | 54277 |
| HEAT | 937 | 2100 | 767 |
| LIGHTNING | 816 | 5230 | 15754 |
| TSTM WIND | 504 | 6957 | 219940 |
| FLOOD | 470 | 6789 | 25326 |
| RIP CURRENT | 368 | NA | 470 |
| HIGH WIND | 248 | 1137 | 20212 |
| AVALANCHE | 224 | NA | 386 |
| WINTER STORM | 206 | 1321 | 11433 |
| RIP CURRENTS | 204 | NA | 304 |
| HEAT WAVE | 172 | NA | 74 |
| EXTREME COLD | 160 | NA | 655 |
| THUNDERSTORM WIND | 133 | 1488 | 82563 |
| HEAVY SNOW | 127 | 1021 | 15708 |
| EXTREME COLD/WIND CHILL | 125 | NA | 1002 |
| STRONG WIND | 103 | NA | 3566 |
| BLIZZARD | 101 | 805 | 2719 |
| HIGH SURF | 101 | NA | 725 |
| ICE STORM | NA | 1975 | 2006 |
| HAIL | NA | 1361 | 288661 |
| HURRICANE/TYPHOON | NA | 1275 | 88 |
| WILDFIRE | NA | 911 | 2761 |
| THUNDERSTORM WINDS | NA | 908 | 20843 |
| FOG | NA | 734 | 538 |
| WILD/FOREST FIRE | NA | 545 | 1457 |
| DUST STORM | NA | 440 | 427 |
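The NA entries in the table come from the full join of the two top-20 lists. The following is a minimal sketch of that behavior on toy data; the tibbles `top_fatal` and `top_injury` and their values are hypothetical and are not taken from the storm data.

```r
library(dplyr)

# Toy top lists (hypothetical values, for illustration only)
top_fatal <- tibble(
  EVTYPE = c("TORNADO", "RIP CURRENT"),
  number_fatalities = c(10, 5)
)
top_injury <- tibble(
  EVTYPE = c("TORNADO", "HAIL"),
  number_injuries = c(100, 40)
)

# full_join keeps every EVTYPE found in either table;
# a value missing on one side is filled with NA
toy_health <- full_join(top_fatal, top_injury, by = "EVTYPE")
toy_health
```

Here TORNADO has both values, while RIP CURRENT and HAIL each get one NA, which mirrors how 20 + 20 entries with 12 in common yield 28 rows.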
The plot below shows which event types are most harmful and most frequent in terms of health. The x axis is the natural logarithm of the number of fatalities, and the y axis is the event type. The size and color of each point indicate how often the event occurred.
agg_health %>%
drop_na(number_fatalities) %>%
ggplot(aes(x=log(number_fatalities), y=EVTYPE, size=count, color=count)) +
geom_point() +
theme_bw()
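As a possible variation on the plot above, the event types could be sorted by fatalities instead of alphabetically, and the axis itself could be put on a log scale so the tick labels stay in original units. This is only a sketch: it uses three rows copied from the health table, the data frame name `toy` is hypothetical, and `scale_x_log10()` uses base 10 rather than the natural logarithm used above.

```r
library(ggplot2)

# Three rows copied from the health table, for illustration
toy <- data.frame(
  EVTYPE = c("TORNADO", "HEAT", "FLOOD"),
  number_fatalities = c(5633, 937, 470),
  count = c(60652, 767, 25326)
)

# reorder() sorts the y axis by fatalities;
# scale_x_log10() transforms the axis instead of the data
p <- ggplot(toy, aes(x = number_fatalities,
                     y = reorder(EVTYPE, number_fatalities),
                     size = count, color = count)) +
  geom_point() +
  scale_x_log10() +
  labs(x = "Fatalities (log10 scale)", y = "Event type") +
  theme_bw()
p
```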
The table below lists the top 20 event types by property damage (the sum of the PROPDMG column), together with the number of times each event type occurred. According to this table, tornado, flash flood, tstm wind, flood, thunderstorm wind, and hail are the most harmful with respect to economic consequences.
top20_propdmg_select <- top20_propdmg %>%
select(EVTYPE, propdmg) %>%
arrange(desc(propdmg))
agg_propdmg <- left_join(top20_propdmg_select, count_event, by="EVTYPE")
gt(agg_propdmg)
| EVTYPE | propdmg | count |
|---|---|---|
| TORNADO | 3212258.16 | 60652 |
| FLASH FLOOD | 1420124.59 | 54277 |
| TSTM WIND | 1335965.61 | 219940 |
| FLOOD | 899938.48 | 25326 |
| THUNDERSTORM WIND | 876844.17 | 82563 |
| HAIL | 688693.38 | 288661 |
| LIGHTNING | 603351.78 | 15754 |
| THUNDERSTORM WINDS | 446293.18 | 20843 |
| HIGH WIND | 324731.56 | 20212 |
| WINTER STORM | 132720.59 | 11433 |
| HEAVY SNOW | 122251.99 | 15708 |
| WILDFIRE | 84459.34 | 2761 |
| ICE STORM | 66000.67 | 2006 |
| STRONG WIND | 62993.81 | 3566 |
| HIGH WINDS | 55625.00 | 1533 |
| HEAVY RAIN | 50842.14 | 11723 |
| TROPICAL STORM | 48423.68 | 690 |
| WILD/FOREST FIRE | 39344.95 | 1457 |
| FLASH FLOODING | 28497.15 | 682 |
| URBAN/SML STREAM FLD | 26051.94 | 3392 |
The plot below shows which event types are most harmful and most frequent in terms of economic consequences. The x axis is the summed property damage, and the y axis is the event type. The size and color of each point indicate how often the event occurred.
agg_propdmg %>%
drop_na(propdmg) %>%
ggplot(aes(x=propdmg, y=EVTYPE, size=count, color=count)) +
geom_point() +
theme_bw()
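The property damage totals above sum the PROPDMG column as is. In the NOAA storm data, PROPDMG is accompanied by a PROPDMGEXP column holding a magnitude code (for example K, M, B), so a dollar-scaled total may rank events differently. The sketch below shows one way this could be handled on toy rows. The values are hypothetical, the names `toy_storm`, `multiplier`, and `propdmg_usd` are made up for illustration, and treating any other code as a multiplier of 1 is an assumption that should be checked against the data documentation.

```r
library(dplyr)

# Toy rows (hypothetical values) using the raw column names
toy_storm <- tibble(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 500),
  PROPDMGEXP = c("M", "K", "")
)

toy_dollars <- toy_storm %>%
  mutate(
    # Map the magnitude code to a numeric multiplier
    multiplier = case_when(
      toupper(PROPDMGEXP) == "K" ~ 1e3,
      toupper(PROPDMGEXP) == "M" ~ 1e6,
      toupper(PROPDMGEXP) == "B" ~ 1e9,
      TRUE ~ 1  # assumption: any other code leaves the value unscaled
    ),
    propdmg_usd = PROPDMG * multiplier
  ) %>%
  group_by(EVTYPE) %>%
  summarize(propdmg_usd = sum(propdmg_usd))
toy_dollars
```

In this toy case FLOOD outranks HAIL once the codes are applied, even though HAIL has the larger raw PROPDMG sum.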