synopsis: This document covers the following points:
- We download and open the National Weather Service Storm Dataset in RStudio.
- We subset the data for fatalities and injuries and see which event types posed the most danger to humans.
- In the second part we look at the events again, this time from the perspective of economic cost.
- There are many event types, but we list only the top 20 to keep the plots readable.

DATA PROCESSING:

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
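# Download the compressed dataset once; skip the download if the file is already present.
# mode = 'wb' keeps the binary file intact on Windows.
if (!file.exists('StormData.csv.bz2')) {
  download.file(url = 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', destfile = 'StormData.csv.bz2', mode = 'wb')
}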

storm_data <- read.csv(bzfile('StormData.csv.bz2'))
# Select the three columns needed for the fatality and injury analysis
fi_data <- select(storm_data,EVTYPE,FATALITIES,INJURIES)

# Use group_by and summarise from dplyr to sum the fatalities and injuries for each event type
fi_data_by_type <- group_by(fi_data,EVTYPE)
sum_fi <- summarise(fi_data_by_type,FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES)) 
# Which 20 event types cause the most fatalities?
fatality_sorted <- arrange(sum_fi, desc(FATALITIES))
fatality_sorted_top20 <- head(fatality_sorted, 20)

# Set the factor levels in ranked order so that the bars are plotted from highest to lowest
f_level <- fatality_sorted_top20$EVTYPE
fatality_sorted_top20$EVTYPE <- factor(fatality_sorted_top20$EVTYPE, levels = f_level)

# The top 20 event types by number of fatalities
fatality_sorted_top20[c(1,2)]
## Source: local data frame [20 x 2]
## 
##                     EVTYPE FATALITIES
## 1                  TORNADO       5633
## 2           EXCESSIVE HEAT       1903
## 3              FLASH FLOOD        978
## 4                     HEAT        937
## 5                LIGHTNING        816
## 6                TSTM WIND        504
## 7                    FLOOD        470
## 8              RIP CURRENT        368
## 9                HIGH WIND        248
## 10               AVALANCHE        224
## 11            WINTER STORM        206
## 12            RIP CURRENTS        204
## 13               HEAT WAVE        172
## 14            EXTREME COLD        160
## 15       THUNDERSTORM WIND        133
## 16              HEAVY SNOW        127
## 17 EXTREME COLD/WIND CHILL        125
## 18             STRONG WIND        103
## 19                BLIZZARD        101
## 20               HIGH SURF        101
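The listing above contains near-duplicate labels, for example RIP CURRENT and RIP CURRENTS, or TSTM WIND and THUNDERSTORM WIND. The analysis in this report keeps them separate. The following is only a sketch of how such labels could be merged before summing. The `normalise_evtype` helper and its two replacement rules are assumptions made for illustration, not part of the original analysis; the counts in `toy` are copied from the table above.

```r
# Hypothetical helper: the rules below are illustrative, not exhaustive.
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))                           # uniform case, no stray spaces
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)   # expand the abbreviation
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)      # fold plural into singular
  x
}

# Four rows taken from the top-20 fatality table.
toy <- data.frame(
  EVTYPE = c("RIP CURRENT", "RIP CURRENTS", "TSTM WIND", "THUNDERSTORM WIND"),
  FATALITIES = c(368, 204, 504, 133),
  stringsAsFactors = FALSE
)

toy$EVTYPE <- normalise_evtype(toy$EVTYPE)

# Sum fatalities per cleaned label.
merged <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
merged
```

With merged labels the totals for these two groups are larger than any of the four separate rows, so a cleaned ranking could differ from the one shown in this report.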
# Plot the top 20 events that caused the most fatalities
ggplot(fatality_sorted_top20, aes(EVTYPE, FATALITIES)) + geom_bar(colour="black", stat="identity", fill = 'blue') + 
  theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
  xlab('Top 20 event types by fatalities') +
  ylab('Number of people killed')

# Which 20 event types cause the most injuries?
inj_sorted <- arrange(sum_fi, desc(INJURIES))
inj_sorted_top20 <- head(inj_sorted, 20)

# Set the factor levels in ranked order so that the bars are plotted from highest to lowest
i_level <- inj_sorted_top20$EVTYPE
inj_sorted_top20$EVTYPE <- factor(inj_sorted_top20$EVTYPE, levels = i_level)

# The top 20 event types by number of injuries
inj_sorted_top20[c(1,3)]
## Source: local data frame [20 x 2]
## 
##                EVTYPE INJURIES
## 1             TORNADO    91346
## 2           TSTM WIND     6957
## 3               FLOOD     6789
## 4      EXCESSIVE HEAT     6525
## 5           LIGHTNING     5230
## 6                HEAT     2100
## 7           ICE STORM     1975
## 8         FLASH FLOOD     1777
## 9   THUNDERSTORM WIND     1488
## 10               HAIL     1361
## 11       WINTER STORM     1321
## 12  HURRICANE/TYPHOON     1275
## 13          HIGH WIND     1137
## 14         HEAVY SNOW     1021
## 15           WILDFIRE      911
## 16 THUNDERSTORM WINDS      908
## 17           BLIZZARD      805
## 18                FOG      734
## 19   WILD/FOREST FIRE      545
## 20         DUST STORM      440
# Plot the top 20 events that caused the most injuries
ggplot(inj_sorted_top20, aes(EVTYPE, INJURIES)) + geom_bar(colour="black", stat="identity", fill = 'red') + 
  theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
  xlab('Top 20 event types by injuries') +
  ylab('Number of people injured')

# Keep only the rows whose property damage exponent is K, M/m or B
property_damage <- subset(storm_data, PROPDMGEXP %in% c('B','M','m','K'))

# Keep only the event type and the two property damage columns
property_damage <- select(property_damage,c(EVTYPE,PROPDMG,PROPDMGEXP))

# PROPDMGEXP may be read as a factor (the default before R 4.0); convert it to character
property_damage$PROPDMGEXP  <- as.character(property_damage$PROPDMGEXP)

# Using PROPDMGEXP and PROPDMG, add another field that holds the actual damage amount in USD
property_damage <- mutate(property_damage, DAMAGECOST = ifelse(PROPDMGEXP == 'K' , PROPDMG*1000,
                                       ifelse(PROPDMGEXP %in% c('M','m') , PROPDMG*1000000,
                                              ifelse(PROPDMGEXP == 'B' , PROPDMG*1000000000,0))))
# Group the events by their type and add the costs
p_damage_by_event <- group_by(property_damage,EVTYPE)
p_damage_by_event_sum <- summarise(p_damage_by_event, P_DAMAGESUM =sum(DAMAGECOST))

# Top 20 event types by property damage (left commented out; the combined ranking is computed below)
#p_sorted <- head(arrange(p_damage_by_event_sum, desc(P_DAMAGESUM)), 20)
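The nested ifelse() calls above map each exponent code to a multiplier. The same mapping could also be written as a named lookup vector, which is easier to read and to extend. This is a minimal sketch on made-up rows; `multiplier` and `toy` are names introduced here for illustration and do not appear in the original analysis.

```r
# Multipliers for the exponent codes handled in this report (K/k, M/m, B).
multiplier <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Made-up rows, only to show the mapping.
toy <- data.frame(
  PROPDMG = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", "m"),
  stringsAsFactors = FALSE
)

# Index the lookup vector by the exponent code.
# PROPDMGEXP must be character here: indexing by a factor would use its integer codes.
# Codes that are not in the lookup vector give NA.
toy$DAMAGECOST <- toy$PROPDMG * unname(multiplier[toy$PROPDMGEXP])
toy
```

Unlike the ifelse() chain, which falls back to 0, this version yields NA for an unrecognised code, so such rows would need to be filtered out or handled explicitly.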
# Keep only the rows whose crop damage exponent is K/k, M/m or B
crop_damage <- subset(storm_data, CROPDMGEXP %in% c('B','M','m','K','k'))

# Keep only the event type and the two crop damage columns
crop_damage <- select(crop_damage,c(EVTYPE,CROPDMG,CROPDMGEXP))

# CROPDMGEXP may be read as a factor (the default before R 4.0); convert it to character
crop_damage$CROPDMGEXP  <- as.character(crop_damage$CROPDMGEXP)

# Using CROPDMGEXP and CROPDMG, add another field that holds the actual damage amount in USD
crop_damage <- mutate(crop_damage, DAMAGECOST = ifelse(CROPDMGEXP %in% c('K','k') , CROPDMG*1000,
                                                       ifelse(CROPDMGEXP %in% c('M','m') , CROPDMG*1000000,
                                                              ifelse(CROPDMGEXP == 'B' , CROPDMG*1000000000,0))))
# Group the events by their type and add the costs
c_damage_by_event <- group_by(crop_damage,EVTYPE)
c_damage_by_event_sum <- summarise(c_damage_by_event, C_DAMAGESUM =sum(DAMAGECOST))

# Top 20 event types by crop damage (left commented out; the combined ranking is computed below)
#c_sorted <- head(arrange(c_damage_by_event_sum, desc(C_DAMAGESUM)), 20)
# Merge the two damage summaries (event types missing from one of them get NA)
merge_damage <- merge(p_damage_by_event_sum,c_damage_by_event_sum,by = 'EVTYPE', all = TRUE)

# Replace the NAs with 0
merge_damage$C_DAMAGESUM[is.na(merge_damage$C_DAMAGESUM)] <- 0
merge_damage$P_DAMAGESUM[is.na(merge_damage$P_DAMAGESUM)] <- 0
merge_damage <- mutate(merge_damage,TOTAL_DAMAGE = C_DAMAGESUM + P_DAMAGESUM)
merge_damage <- arrange(merge_damage, desc(TOTAL_DAMAGE))
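The merge() call with all = TRUE above performs a full outer join, so an event type that appears in only one of the two summaries gets NA in the other column, and that NA has to become 0 before the totals are added. A compact sketch of the same steps on toy values follows; the data frames `p` and `c_dmg` and their numbers are made up for illustration.

```r
# Toy summaries: FLOOD is in both, STORM SURGE and DROUGHT are in one each.
p <- data.frame(EVTYPE = c("FLOOD", "STORM SURGE"),
                P_DAMAGESUM = c(100, 40),
                stringsAsFactors = FALSE)
c_dmg <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"),
                    C_DAMAGESUM = c(5, 14),
                    stringsAsFactors = FALSE)

# Full outer join: keeps every event type from either side.
both <- merge(p, c_dmg, by = "EVTYPE", all = TRUE)

# Replace NA with 0 in all numeric columns in one step.
num_cols <- sapply(both, is.numeric)
both[num_cols][is.na(both[num_cols])] <- 0

both$TOTAL_DAMAGE <- both$P_DAMAGESUM + both$C_DAMAGESUM
both[order(-both$TOTAL_DAMAGE), ]
```

Without the NA replacement, the totals for STORM SURGE and DROUGHT would themselves be NA, because NA plus any number is NA in R.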

# Show the 20 event types with the highest total damage
head(merge_damage,20)
##                       EVTYPE  P_DAMAGESUM C_DAMAGESUM TOTAL_DAMAGE
## 1                      FLOOD 144657709800  5661968450 150319678250
## 2          HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 3                    TORNADO  56937160480   414953110  57352113590
## 4                STORM SURGE  43323536000        5000  43323541000
## 5                       HAIL  15732266720  3025954450  18758221170
## 6                FLASH FLOOD  16140811510  1421317100  17562128610
## 7                    DROUGHT   1046106000 13972566000  15018672000
## 8                  HURRICANE  11868319010  2741910000  14610229010
## 9                RIVER FLOOD   5118945500  5029459000  10148404500
## 10                 ICE STORM   3944927810  5022113500   8967041310
## 11            TROPICAL STORM   7703890550   678346000   8382236550
## 12              WINTER STORM   6688497250    26944000   6715441250
## 13                 HIGH WIND   5270046260   638571300   5908617560
## 14                  WILDFIRE   4765114000   295472800   5060586800
## 15                 TSTM WIND   4484928440   554007350   5038935790
## 16          STORM SURGE/TIDE   4641188000      850000   4642038000
## 17         THUNDERSTORM WIND   3483121140   414843050   3897964190
## 18            HURRICANE OPAL   3172846000    19000000   3191846000
## 19          WILD/FOREST FIRE   3001829500   106796830   3108626330
## 20 HEAVY RAIN/SEVERE WEATHER   2500000000           0   2500000000
top_20 <- head(merge_damage,20)

# Set the factor levels in ranked order and plot the values
top <- top_20$EVTYPE
top_20$EVTYPE <- factor(top_20$EVTYPE, levels = top)

ggplot(top_20, aes(EVTYPE, TOTAL_DAMAGE)) + geom_bar(colour="black", stat="identity", fill = 'blue') + 
  theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
  xlab('Different Event Types Causing Economic Damage') +
  ylab('Total Damage in USD')
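Setting the factor levels by hand, as done above, fixes the bar order, and rotating the labels keeps them readable. One possible alternative is to let reorder() derive the order from the values and to flip the axes so that the long event names are printed horizontally. The sketch below uses three rows copied from the damage table; `toy` and `p` are illustrative names, and the axis is rescaled to billions of USD purely for readability.

```r
library(ggplot2)

# Three rows copied from the top-20 damage table.
toy <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  TOTAL_DAMAGE = c(150319678250, 71913712800, 57352113590)
)

# reorder() sorts the factor levels by TOTAL_DAMAGE (ascending);
# coord_flip() turns the bars horizontal, so the largest value ends up on top.
p <- ggplot(toy, aes(reorder(EVTYPE, TOTAL_DAMAGE), TOTAL_DAMAGE / 1e9)) +
  geom_col(colour = "black", fill = "blue") +
  coord_flip() +
  xlab("Event type") +
  ylab("Total damage (billion USD)")
p
```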

RESULTS:

From the perspective of human cost, tornadoes are far more dangerous than any other event type. In the data analysed here, tornadoes alone caused 5633 deaths and 91346 injuries over roughly the last 60 years.
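To put the tornado figure in proportion, its share of the fatalities among the 20 listed event types can be computed directly from the table in the data processing section. This is a rough check only: the denominator covers the top 20 event types, not the whole dataset.

```r
# FATALITIES column of the top-20 table, in the order listed.
top20_fatalities <- c(5633, 1903, 978, 937, 816, 504, 470, 368, 248, 224,
                      206, 204, 172, 160, 133, 127, 125, 103, 101, 101)

# Tornado is the first entry; divide by the top-20 total.
tornado_share <- top20_fatalities[1] / sum(top20_fatalities)
round(100 * tornado_share, 1)  # about 41.7 percent of the top-20 total
```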

From the perspective of economic cost, floods caused the most damage (about 150 billion USD in property and crop damage combined), followed by hurricanes/typhoons and tornadoes. Response planning for these event types should therefore be a top priority, and public awareness of them should be kept high.