suppressMessages(library(dplyr))
suppressMessages(library(tidyr))
suppressMessages(library(lubridate))
suppressMessages(library(ggplot2))
suppressMessages(library(gt))

Synopsis

Data Processing

storm_original <- read.csv("repdata_data_StormData.csv", header=TRUE)

Aggregate fatalities, injuries, and property damage by event type, then join the three aggregations into agg_total.

agg_fatalities <- storm_original %>% 
  group_by(EVTYPE) %>% 
  summarize(number_fatalities = sum(FATALITIES))

agg_injuries <- storm_original %>% 
  group_by(EVTYPE) %>% 
  summarize(number_injuries = sum(INJURIES))

agg_propdmg <- storm_original %>% 
  group_by(EVTYPE) %>% 
  summarize(propdmg = sum(PROPDMG))

agg_total <- left_join(agg_fatalities, agg_injuries, by="EVTYPE")
agg_total <- left_join(agg_total, agg_propdmg, by="EVTYPE")
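
The three aggregations and two joins above could also be computed in a single grouped pass. A minimal sketch, assuming a hypothetical toy data frame `storm_toy` that only mimics the column names of the storm data:

```r
library(dplyr)

# Hypothetical toy data with the same column names as the storm data set
storm_toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 0, 0),
  INJURIES   = c(10, 5, 3, 0),
  PROPDMG    = c(25, 2.5, 100, 0)
)

# One summarize() call yields all three totals per event type,
# so no joins are needed afterwards
agg_total_toy <- storm_toy %>%
  group_by(EVTYPE) %>%
  summarize(
    number_fatalities = sum(FATALITIES),
    number_injuries   = sum(INJURIES),
    propdmg           = sum(PROPDMG),
    .groups = "drop"
  )
```

The result has one row per event type and the same three total columns as agg_total.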

Summarize the aggregated fatalities, injuries, and property damage, then plot the pairwise relationships among them.

According to the pairwise plot, a small number of event types account for most of the harm to health and property: the median and third quartile of fatalities and injuries are both 0, while the maxima are 5,633 and 91,346. Governments need to prepare for the most disastrous events, so I extract the top 20 event types by fatalities, by injuries, and by property damage.

summary(agg_total)
##     EVTYPE          number_fatalities number_injuries      propdmg       
##  Length:985         Min.   :   0.00   Min.   :    0.0   Min.   :      0  
##  Class :character   1st Qu.:   0.00   1st Qu.:    0.0   1st Qu.:      0  
##  Mode  :character   Median :   0.00   Median :    0.0   Median :      0  
##                     Mean   :  15.38   Mean   :  142.7   Mean   :  11050  
##                     3rd Qu.:   0.00   3rd Qu.:    0.0   3rd Qu.:     35  
##                     Max.   :5633.00   Max.   :91346.0   Max.   :3212258
plot(agg_total)

top20_fatalities <- agg_total %>% 
  arrange(desc(number_fatalities)) %>% 
  head(20)

top20_injuries <- agg_total %>% 
  arrange(desc(number_injuries)) %>% 
  head(20)

top20_propdmg <- agg_total %>% 
  arrange(desc(propdmg)) %>% 
  head(20)

Count the number of occurrences of each event type.

count_event <- storm_original %>% 
  group_by(EVTYPE) %>% 
  summarize(count=n())

Results

Which types of events are most harmful with respect to population health?

The table below lists the top 20 event types by fatalities and the top 20 by injuries. It has 28 rows because the two lists are joined: an event type that appears in only one list shows NA in the other column. The table also includes the number of times each event occurred. According to this table, tornado, excessive heat, flash flood, heat, lightning, TSTM wind, and flood are the most harmful with respect to population health.

top20_fatalities_select <- top20_fatalities %>% 
  select(EVTYPE, number_fatalities)
top20_injuries_select <- top20_injuries %>% 
  select(EVTYPE, number_injuries)
agg_health <- full_join(top20_fatalities_select, top20_injuries_select, by="EVTYPE")
agg_health <- left_join(agg_health, count_event, by="EVTYPE")
gt(agg_health)
EVTYPE                   number_fatalities  number_injuries   count
TORNADO                               5633            91346   60652
EXCESSIVE HEAT                        1903             6525    1678
FLASH FLOOD                            978             1777   54277
HEAT                                   937             2100     767
LIGHTNING                              816             5230   15754
TSTM WIND                              504             6957  219940
FLOOD                                  470             6789   25326
RIP CURRENT                            368               NA     470
HIGH WIND                              248             1137   20212
AVALANCHE                              224               NA     386
WINTER STORM                           206             1321   11433
RIP CURRENTS                           204               NA     304
HEAT WAVE                              172               NA      74
EXTREME COLD                           160               NA     655
THUNDERSTORM WIND                      133             1488   82563
HEAVY SNOW                             127             1021   15708
EXTREME COLD/WIND CHILL                125               NA    1002
STRONG WIND                            103               NA    3566
BLIZZARD                               101              805    2719
HIGH SURF                              101               NA     725
ICE STORM                               NA             1975    2006
HAIL                                    NA             1361  288661
HURRICANE/TYPHOON                       NA             1275      88
WILDFIRE                                NA              911    2761
THUNDERSTORM WINDS                      NA              908   20843
FOG                                     NA              734     538
WILD/FOREST FIRE                        NA              545    1457
DUST STORM                              NA              440     427
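
The NA entries and the row count of the combined table follow from how full_join() behaves on two separately selected top-N lists. A small sketch of that behaviour, using hypothetical toy totals (the event names A to D and all values are for illustration only) and a top 2 instead of a top 20:

```r
library(dplyr)

# Hypothetical toy totals; names and values are for illustration only
toy <- data.frame(
  EVTYPE            = c("A", "B", "C", "D"),
  number_fatalities = c(50, 40, 1, 0),
  number_injuries   = c(500, 2, 300, 0)
)

# Top 2 by fatalities: A and B
top2_fatalities <- toy %>%
  arrange(desc(number_fatalities)) %>%
  head(2) %>%
  select(EVTYPE, number_fatalities)

# Top 2 by injuries: A and C
top2_injuries <- toy %>%
  arrange(desc(number_injuries)) %>%
  head(2) %>%
  select(EVTYPE, number_injuries)

# full_join() keeps the union of both lists; a value is NA when the
# event type is missing from the other list, not when the total is zero
combined <- full_join(top2_fatalities, top2_injuries, by = "EVTYPE")
```

Here `combined` has three rows (A, B, C): B has NA injuries and C has NA fatalities, even though both have nonzero totals in `toy`. If complete values were wanted for every listed event, one option would be to filter the full aggregate by the union of event names instead of joining the two selections.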

The plot below shows which event types are most harmful and most frequent in terms of health. The x-axis shows the natural logarithm of the number of fatalities, and the y-axis shows the event type. The size and color of each point indicate how often the event occurred.

agg_health %>% 
  drop_na(number_fatalities) %>% 
  ggplot(aes(x=log(number_fatalities), y=EVTYPE, size=count, color=count)) +
  geom_point() +
  theme_bw()
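
A possible variation on this plot, sketched on hypothetical toy values: reorder() sorts the event types by fatalities instead of alphabetically, and scale_x_log10() applies a base-10 log scale (not the natural log used above) so that the axis labels stay in original units. The data frame `health_toy` and the axis labels are assumptions made for illustration.

```r
library(ggplot2)

# Hypothetical toy data; names and values are for illustration only
health_toy <- data.frame(
  EVTYPE            = c("A", "B", "C"),
  number_fatalities = c(1000, 100, 10),
  count             = c(50, 5000, 500)
)

p <- ggplot(health_toy,
            aes(x = number_fatalities,
                # order the y-axis by fatalities, smallest at the bottom
                y = reorder(EVTYPE, number_fatalities),
                size = count, color = count)) +
  geom_point() +
  scale_x_log10() +  # log-scaled axis, tick labels in raw counts
  labs(x = "Fatalities (log10 scale)", y = "Event type") +
  theme_bw()
```

Printing `p` draws the plot.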

Across the United States, which types of events have the greatest economic consequences?

The table below lists the top 20 event types by property damage, together with the number of times each event occurred. According to this table, tornado, flash flood, TSTM wind, flood, thunderstorm wind, and hail are the most harmful with respect to economic consequences.

top20_propdmg_select <- top20_propdmg %>% 
  select(EVTYPE, propdmg) %>% 
  arrange(desc(propdmg))
agg_propdmg <- left_join(top20_propdmg_select, count_event, by="EVTYPE")
gt(agg_propdmg)
EVTYPE                   propdmg   count
TORNADO               3212258.16   60652
FLASH FLOOD           1420124.59   54277
TSTM WIND             1335965.61  219940
FLOOD                  899938.48   25326
THUNDERSTORM WIND      876844.17   82563
HAIL                   688693.38  288661
LIGHTNING              603351.78   15754
THUNDERSTORM WINDS     446293.18   20843
HIGH WIND              324731.56   20212
WINTER STORM           132720.59   11433
HEAVY SNOW             122251.99   15708
WILDFIRE                84459.34    2761
ICE STORM               66000.67    2006
STRONG WIND             62993.81    3566
HIGH WINDS              55625.00    1533
HEAVY RAIN              50842.14   11723
TROPICAL STORM          48423.68     690
WILD/FOREST FIRE        39344.95    1457
FLASH FLOODING          28497.15     682
URBAN/SML STREAM FLD    26051.94    3392
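
The propdmg column here is the sum of the raw PROPDMG values. In the NOAA storm data, PROPDMG is normally read together with a magnitude code in the PROPDMGEXP column (for example K, M, and B for thousands, millions, and billions). A minimal sketch of converting to dollars before summing, assuming only those three codes are scaled and every other code is treated as a multiplier of 1; the toy rows and the names `damage_toy`, `multiplier`, and `propdmg_usd` are hypothetical:

```r
library(dplyr)

# Hypothetical toy rows using the storm data column names
damage_toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2, 500, 10),
  PROPDMGEXP = c("M", "K", "")
)

# Assumed mapping: only K, M and B are scaled; any other code counts as 1
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

damage_usd <- damage_toy %>%
  mutate(
    # look up the multiplier by code; unknown or empty codes give NA
    mult = unname(multiplier[toupper(PROPDMGEXP)]),
    mult = ifelse(is.na(mult), 1, mult),
    propdmg_usd = PROPDMG * mult
  ) %>%
  group_by(EVTYPE) %>%
  summarize(propdmg_usd = sum(propdmg_usd), .groups = "drop")
```

In this toy example FLOOD totals 2,500,000 and HAIL totals 10. Applied to the real data, such scaling could change the ranking, since a single event recorded in millions or billions outweighs many events recorded in thousands.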

The plot below shows which event types are most harmful and most frequent in terms of economic consequences. The x-axis shows the property damage, and the y-axis shows the event type. The size and color of each point indicate how often the event occurred.

agg_propdmg %>% 
  drop_na(propdmg) %>% 
  ggplot(aes(x=propdmg, y=EVTYPE, size=count, color=count)) +
  geom_point() +
  theme_bw()
