SYNOPSIS:
In this document we'll look into the following points:
- We'll download the National Weather Service Storm Dataset and open it in RStudio.
- We'll subset the data for fatalities and injuries and see which event types posed the greatest danger to humans.
- In the second part we'll look at the event types again, this time from the perspective of economic cost.
- There are many event types, but for clarity in the plots we'll list only the top 20.
DATA PROCESSING:
library(ggplot2)
library(dplyr)
# Download the bzip2-compressed CSV (mode = 'wb' keeps the binary file intact on Windows) and read it
download.file(url = 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2', destfile = 'StormData.csv.bz2', mode = 'wb')
storm_data <- read.csv(bzfile('StormData.csv.bz2'))
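`read.csv()` can read the compressed file directly through a `bzfile()` connection, so `StormData.csv.bz2` never has to be decompressed on disk. The minimal sketch below checks this round trip on a tiny hypothetical table written to a `tempfile()`, so it runs without the real download. The column names mirror the storm data, but the values are made up.

```r
# Hypothetical three-row table standing in for the storm data (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                  FATALITIES = c(2, 0, 1),
                  INJURIES = c(10, 3, 0))

# Write it as a bzip2-compressed CSV to a temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back through a bzfile() connection, as done for StormData.csv.bz2
back <- read.csv(bzfile(path))
str(back)
```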
# Select 3 Columns for processing
fi_data <- select(storm_data,EVTYPE,FATALITIES,INJURIES)
#Use group_by and summarise commands from dplyr to get a sum of fatalities and injuries of each event type
fi_data_by_type <- group_by(fi_data,EVTYPE)
sum_fi <- summarise(fi_data_by_type,FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES))
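The `group_by()` and `summarise()` pair collapses the data to one row per event type holding the summed counts. For readers without dplyr, the same per-group sums can be sketched in base R with `aggregate()`. The data frame below is a hypothetical toy example, not rows from the storm dataset.

```r
# Hypothetical toy rows: two tornado reports and one hail report
fi_toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "TORNADO"),
                     FATALITIES = c(3, 0, 2),
                     INJURIES = c(40, 5, 10))

# Sum both columns within each EVTYPE (same idea as group_by + summarise)
sum_toy <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = fi_toy, FUN = sum)
print(sum_toy)
```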
# Which 20 event types cause the most fatalities?
fatality_sorted <- arrange(sum_fi, desc(FATALITIES))
fatality_sorted_top20 <- head(fatality_sorted, 20)
# Set the factor levels to the sorted order so the bars are plotted in descending order
f_level <- fatality_sorted_top20$EVTYPE
fatality_sorted_top20$EVTYPE <- factor(fatality_sorted_top20$EVTYPE, levels = f_level)
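ggplot2 orders a discrete axis by factor levels, and a character column is converted to a factor with alphabetical levels. Re-levelling with the already-sorted event names is what makes the bars descend. The sketch below shows the same sort, take top n, and re-level steps in base R on hypothetical totals. The numbers are made up.

```r
# Hypothetical per-event totals (not real storm data)
totals <- data.frame(EVTYPE = c("HAIL", "TORNADO", "FLOOD", "LIGHTNING"),
                     FATALITIES = c(10, 50, 30, 20),
                     stringsAsFactors = FALSE)

# Sort in descending order of fatalities and keep the top n rows
n <- 3
top_n_rows <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], n)

# Default factor levels would be alphabetical...
alphabetical <- levels(factor(top_n_rows$EVTYPE))
# ...so fix them to the sorted order instead
top_n_rows$EVTYPE <- factor(top_n_rows$EVTYPE, levels = top_n_rows$EVTYPE)
print(levels(top_n_rows$EVTYPE))
```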
# List the top 20 event types by number of fatalities
fatality_sorted_top20[c(1,2)]
## Source: local data frame [20 x 2]
##
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
## 16 HEAVY SNOW 127
## 17 EXTREME COLD/WIND CHILL 125
## 18 STRONG WIND 103
## 19 BLIZZARD 101
## 20 HIGH SURF 101
# Plot the top 20 events that caused the most fatalities
ggplot(fatality_sorted_top20, aes(EVTYPE, FATALITIES)) + geom_bar(colour="black", stat="identity", fill = 'blue') +
theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
xlab('Top 20 event types by fatalities') +
ylab('Number of people killed')
# Which 20 event types cause the most injuries?
inj_sorted <- arrange(sum_fi, desc(INJURIES))
inj_sorted_top20 <- head(inj_sorted, 20)
# Set the factor levels to the sorted order so the bars are plotted in descending order
i_level <- inj_sorted_top20$EVTYPE
inj_sorted_top20$EVTYPE <- factor(inj_sorted_top20$EVTYPE, levels = i_level)
# List the top 20 event types by number of injuries
inj_sorted_top20[c(1,3)]
## Source: local data frame [20 x 2]
##
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
## 11 WINTER STORM 1321
## 12 HURRICANE/TYPHOON 1275
## 13 HIGH WIND 1137
## 14 HEAVY SNOW 1021
## 15 WILDFIRE 911
## 16 THUNDERSTORM WINDS 908
## 17 BLIZZARD 805
## 18 FOG 734
## 19 WILD/FOREST FIRE 545
## 20 DUST STORM 440
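Both tables contain near-duplicate labels for what is arguably the same kind of event. Examples are `TSTM WIND`, `THUNDERSTORM WIND` and `THUNDERSTORM WINDS`, or `RIP CURRENT` and `RIP CURRENTS`, so their counts are split across several bars. The analysis here does not merge them. Below is a minimal sketch of one possible clean-up. It assumes a small hand-written mapping: the `normalise_evtype()` helper and its aliases are hypothetical and cover only the variants named above.

```r
# Hypothetical helper: map a few EVTYPE label variants onto one canonical name.
# Labels are trimmed and upper-cased; labels not in the mapping are otherwise kept as-is.
normalise_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  # Hypothetical mapping: variant label -> canonical label
  aliases <- c("TSTM WIND"          = "THUNDERSTORM WIND",
               "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
               "RIP CURRENTS"       = "RIP CURRENT",
               "WILD/FOREST FIRE"   = "WILDFIRE")
  hit <- x %in% names(aliases)
  x[hit] <- aliases[x[hit]]
  x
}

raw_labels <- c("TSTM WIND", "Thunderstorm Winds ", "RIP CURRENTS", "TORNADO")
print(normalise_evtype(raw_labels))
```

To take effect, the normalisation would have to run on `EVTYPE` before the `group_by()` step. A complete clean-up of the dataset's labels would need a much longer mapping.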
# Plot the top 20 events that caused the most injuries
ggplot(inj_sorted_top20, aes(EVTYPE, INJURIES)) + geom_bar(colour="black", stat="identity", fill = 'red') +
theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
xlab('Top 20 event types by injuries') +
ylab('Number of people injured')
# Keep only rows whose property-damage exponent is a recognised multiplier (K, M/m, B)
property_damage <- subset(storm_data, PROPDMGEXP %in% c('B','M','m','K'))
# We only need the event type, the damage value and its exponent code
property_damage <- select(property_damage,c(EVTYPE,PROPDMG,PROPDMGEXP))
# Older R versions (< 4.0) read PROPDMGEXP as a factor; make sure it is character
property_damage$PROPDMGEXP <- as.character(property_damage$PROPDMGEXP)
# Use PROPDMGEXP and PROPDMG to add another column, DAMAGECOST, holding the damage in USD
property_damage <- mutate(property_damage, DAMAGECOST = ifelse(PROPDMGEXP == 'K' , PROPDMG*1000,
ifelse(PROPDMGEXP %in% c('M','m') , PROPDMG*1000000,
ifelse(PROPDMGEXP == 'B' , PROPDMG*1000000000,0))))
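The nested `ifelse()` above maps each exponent code to a multiplier. Below is an alternative sketch using a named lookup vector under the same rules: K = 10^3, M = 10^6, B = 10^9, and anything unrecognised gives 0. `damage_in_usd()` is a hypothetical helper. Because it matches codes case-insensitively, the same helper could serve the crop-damage columns as well.

```r
# Hypothetical helper: convert a damage value plus its exponent code to US dollars.
# Codes are matched case-insensitively; unrecognised codes (e.g. "", "?", "+") give 0,
# mirroring the final else-branch of the nested ifelse().
damage_in_usd <- function(value, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- unname(multipliers[toupper(as.character(exp_code))])
  ifelse(is.na(m), 0, value * m)
}

damage_in_usd(c(25, 1.5, 3, 10), c("M", "B", "k", "?"))
```

With dplyr loaded, it would fit into the existing `mutate()` call as `DAMAGECOST = damage_in_usd(PROPDMG, PROPDMGEXP)`.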
# Group the events by their type and add the costs
p_damage_by_event <- group_by(property_damage,EVTYPE)
p_damage_by_event_sum <- summarise(p_damage_by_event, P_DAMAGESUM =sum(DAMAGECOST))
# Optional: top 20 event types by property damage alone
#p_sorted <- head(arrange(p_damage_by_event_sum, desc(P_DAMAGESUM)), 20)
# Keep only rows whose crop-damage exponent is a recognised multiplier (K/k, M/m, B)
crop_damage <- subset(storm_data, CROPDMGEXP %in% c('B','M','m','K','k'))
# We only need the event type, the damage value and its exponent code
crop_damage <- select(crop_damage,c(EVTYPE,CROPDMG,CROPDMGEXP))
# Older R versions (< 4.0) read CROPDMGEXP as a factor; make sure it is character
crop_damage$CROPDMGEXP <- as.character(crop_damage$CROPDMGEXP)
# Use CROPDMGEXP and CROPDMG to add another column, DAMAGECOST, holding the damage in USD
crop_damage <- mutate(crop_damage, DAMAGECOST = ifelse(CROPDMGEXP %in% c('K','k') , CROPDMG*1000,
ifelse(CROPDMGEXP %in% c('M','m') , CROPDMG*1000000,
ifelse(CROPDMGEXP == 'B' , CROPDMG*1000000000,0))))
# Group the events by their type and add the costs
c_damage_by_event <- group_by(crop_damage,EVTYPE)
c_damage_by_event_sum <- summarise(c_damage_by_event, C_DAMAGESUM =sum(DAMAGECOST))
# Optional: top 20 event types by crop damage alone
#c_sorted <- head(arrange(c_damage_by_event_sum, desc(C_DAMAGESUM)), 20)
# Merge the property and crop totals; the outer join (all = TRUE) leaves NAs for event types found in only one table
merge_damage <- merge(p_damage_by_event_sum,c_damage_by_event_sum,by = 'EVTYPE', all = TRUE)
# Treat those NAs as zero damage
merge_damage$C_DAMAGESUM[is.na(merge_damage$C_DAMAGESUM)] <- 0
merge_damage$P_DAMAGESUM[is.na(merge_damage$P_DAMAGESUM)] <- 0
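`merge(..., all = TRUE)` performs a full outer join. An event type that appears only in the property table or only in the crop table is kept, with `NA` in the missing column. That is why the NAs are replaced with 0 before the two sums are added. The toy illustration below uses hypothetical totals, not the real figures.

```r
# Hypothetical totals (not the real figures): one type only has property damage,
# another only has crop damage
prop <- data.frame(EVTYPE = c("FLOOD", "STORM SURGE"), P_DAMAGESUM = c(100, 40))
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), C_DAMAGESUM = c(5, 70))

# Full outer join on EVTYPE keeps every event type from either table
both <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# A missing sum means no recorded damage of that kind, so treat it as 0
both$P_DAMAGESUM[is.na(both$P_DAMAGESUM)] <- 0
both$C_DAMAGESUM[is.na(both$C_DAMAGESUM)] <- 0
both$TOTAL_DAMAGE <- both$P_DAMAGESUM + both$C_DAMAGESUM
print(both)
```

With the default `all = FALSE`, only `FLOOD` would survive the join and the other two event types would silently drop out of the totals.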
merge_damage <- mutate(merge_damage,TOTAL_DAMAGE = C_DAMAGESUM + P_DAMAGESUM)
merge_damage <- arrange(merge_damage, desc(TOTAL_DAMAGE))
# Show the top 20 event types by total economic damage (property + crop)
head(merge_damage,20)
## EVTYPE P_DAMAGESUM C_DAMAGESUM TOTAL_DAMAGE
## 1 FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937160480 414953110 57352113590
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15732266720 3025954450 18758221170
## 6 FLASH FLOOD 16140811510 1421317100 17562128610
## 7 DROUGHT 1046106000 13972566000 15018672000
## 8 HURRICANE 11868319010 2741910000 14610229010
## 9 RIVER FLOOD 5118945500 5029459000 10148404500
## 10 ICE STORM 3944927810 5022113500 8967041310
## 11 TROPICAL STORM 7703890550 678346000 8382236550
## 12 WINTER STORM 6688497250 26944000 6715441250
## 13 HIGH WIND 5270046260 638571300 5908617560
## 14 WILDFIRE 4765114000 295472800 5060586800
## 15 TSTM WIND 4484928440 554007350 5038935790
## 16 STORM SURGE/TIDE 4641188000 850000 4642038000
## 17 THUNDERSTORM WIND 3483121140 414843050 3897964190
## 18 HURRICANE OPAL 3172846000 19000000 3191846000
## 19 WILD/FOREST FIRE 3001829500 106796830 3108626330
## 20 HEAVY RAIN/SEVERE WEATHER 2500000000 0 2500000000
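The raw dollar totals are hard to compare at a glance. Dividing by 10^9 and rounding expresses them in billions of USD. The three values below are copied from the first three rows of the table above.

```r
# TOTAL_DAMAGE for the top three event types, copied from the table above (USD)
top3_total <- c("FLOOD" = 150319678250,
                "HURRICANE/TYPHOON" = 71913712800,
                "TORNADO" = 57352113590)

# Express in billions of USD, rounded to one decimal place
top3_billions <- round(top3_total / 1e9, 1)
print(top3_billions)
```

The plot's y axis could be rescaled the same way, for example with `scale_y_continuous(labels = function(x) x / 1e9)` and an axis label in billions of USD.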
top_20 <- head(merge_damage,20)
# Set the factor levels to the sorted order so the bars descend, then plot
top <- top_20$EVTYPE
top_20$EVTYPE <- factor(top_20$EVTYPE, levels = top)
ggplot(top_20, aes(EVTYPE, TOTAL_DAMAGE)) + geom_bar(colour="black", stat="identity", fill = 'blue') +
theme(axis.text.x = element_text(angle = 40, hjust = 1)) +
xlab('Top 20 event types by economic damage') +
ylab('Total Damage in USD')
RESULTS:
From the perspective of human cost, tornadoes are far more dangerous than any other event type. Tornadoes alone caused 5633 deaths and 91346 injuries over the roughly 60 years covered by the dataset.
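From the economic perspective, floods caused the most damage (about 150.3 billion USD in combined property and crop damage). Hurricanes/typhoons follow at about 71.9 billion USD, then tornadoes at about 57.4 billion USD. Preparedness and response for these event types should therefore be a top priority, and public awareness of them should be kept high.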