Synopsis
Severe weather events cause human and economic damage. This report explores the U.S. National Oceanic and Atmospheric Administration’s storm database, downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. This database tracks characteristics of major storms and weather events in the US from 1950 through November 2011, including estimates of any fatalities, injuries, and property damage. Explanatory information for this dataset is available via:
Documentation: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
FAQ: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf
Based on this data, this report addresses, with R code and tables and diagrams, the following questions: 1) Which types of events are most harmful with respect to population health? 2) Which types of events have the greatest economic consequences? The intended audience for the report is a government or municipal manager responsible for preparing for severe weather events, e.g. by prioritizing response capabilities for different types of events.
Data processing
To begin, we load the required R libraries and then read the data into the data frame df for the analysis:
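library(plyr)     # ddply, arrange, mutate
library(ggplot2)  # bar chart
# stringsAsFactors = TRUE keeps text columns as factors, which the levels() calls below rely on
# (since R 4.0.0 the default is FALSE)
df = read.csv('repdata_data_StormData.csv.bz2', stringsAsFactors = TRUE)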
The further processing of df needed to answer the two questions of this report is described step by step below.
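The read.csv call above expects the compressed file to be present in the working directory. A minimal sketch of fetching it first, assuming the download URL given in the synopsis; the helper name fetch_if_missing is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is absent,
# then return the local path either way.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the bz2 archive intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata_data_StormData.csv.bz2"

# Usage (commented out so the sketch does not trigger a download):
# df <- read.csv(fetch_if_missing(storm_url, storm_file), stringsAsFactors = TRUE)
```

Skipping the download when the file already exists keeps repeated runs of the report from fetching the archive again.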
Results
Question 1: Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
To study this question, we review the columns of df, together with the explanatory documentation referenced above, to identify the columns that record harm to population health:
names(df)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
We identify the columns INJURIES and FATALITIES as recording harm to human health. From these we form the variable IDX, an index of the harm that each weather event in df caused to the human population. Each fatality is weighted by scaler, the ratio of the mean number of injuries to the mean number of fatalities per event:
scaler = mean(df$INJURIES)/mean(df$FATALITIES)
scaler
## [1] 9.278838
df$IDX = df$INJURIES+scaler*df$FATALITIES
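To make the weighting concrete, here is a small worked example of the same index on made-up numbers. The data frame toy and the variable toy_scaler are illustrative assumptions and are not taken from the storm database.

```r
# Toy data (made up for illustration; not from the storm database).
toy <- data.frame(
  INJURIES   = c(10, 0, 8),
  FATALITIES = c(1, 1, 0)
)

# Weight of one fatality, expressed in injuries: ratio of the column means.
# Here mean(INJURIES) = 6 and mean(FATALITIES) = 2/3, so the weight is 9.
toy_scaler <- mean(toy$INJURIES) / mean(toy$FATALITIES)

# Harm index per event: injuries plus weighted fatalities.
toy$IDX <- toy$INJURIES + toy_scaler * toy$FATALITIES
print(toy)
```

With this choice of weight, the weighted fatalities and the injuries contribute equally to the grand total of IDX, since scaler * sum(FATALITIES) equals sum(INJURIES).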
We then list the top 20 weather event types by the total, average and maximum of their IDX scores:
typesums = ddply(df,c('EVTYPE'),summarize,IDXsum=round(sum(IDX)))
typesums = arrange(typesums,-IDXsum)
head(typesums,20)
## EVTYPE IDXsum
## 1 TORNADO 143614
## 2 EXCESSIVE HEAT 24183
## 3 LIGHTNING 12802
## 4 TSTM WIND 11634
## 5 FLOOD 11150
## 6 FLASH FLOOD 10852
## 7 HEAT 10794
## 8 RIP CURRENT 3647
## 9 HIGH WIND 3438
## 10 WINTER STORM 3232
## 11 ICE STORM 2801
## 12 THUNDERSTORM WIND 2722
## 13 AVALANCHE 2248
## 14 HEAVY SNOW 2199
## 15 RIP CURRENTS 2190
## 16 HEAT WAVE 1905
## 17 HURRICANE/TYPHOON 1869
## 18 BLIZZARD 1742
## 19 EXTREME COLD 1716
## 20 WILDFIRE 1607
typeaves = ddply(df,c('EVTYPE'),summarize,IDXmean=round(mean(IDX)))
typeaves = arrange(typeaves,-IDXmean)
head(typeaves,20)
## EVTYPE IDXmean
## 1 TORNADOES, TSTM WIND, HAIL 232
## 2 COLD AND SNOW 130
## 3 TROPICAL STORM GORDON 117
## 4 Heat Wave 70
## 5 RECORD/EXCESSIVE HEAT 53
## 6 HEAT WAVE DROUGHT 52
## 7 EXTREME HEAT 48
## 8 HIGH WIND AND SEAS 48
## 9 WILD FIRES 44
## 10 HIGH WIND/SEAS 37
## 11 WINTER STORMS 37
## 12 MARINE MISHAP 35
## 13 Heavy surf and wind 28
## 14 THUNDERSTORMW 27
## 15 HEAT WAVE 26
## 16 ROUGH SEAS 26
## 17 WINTER STORM HIGH WINDS 24
## 18 HEAT WAVES 23
## 19 RIP CURRENTS/HEAVY SURF 23
## 20 TSUNAMI 22
typemaxs = ddply(df,c('EVTYPE'),summarize,IDXmax=round(max(IDX)))
typemaxs = arrange(typemaxs,-IDXmax)
head(typemaxs,20)
## EVTYPE IDXmax
## 1 HEAT 5410
## 2 TORNADO 2616
## 3 ICE STORM 1577
## 4 EXCESSIVE HEAT 919
## 5 HURRICANE/TYPHOON 845
## 6 FLOOD 819
## 7 EXTREME HEAT 529
## 8 BLIZZARD 431
## 9 TSUNAMI 426
## 10 HEAT WAVE 306
## 11 UNSEASONABLY WARM AND DRY 269
## 12 FLASH FLOOD 234
## 13 TORNADOES, TSTM WIND, HAIL 232
## 14 WILDFIRE 220
## 15 WINTER STORM 211
## 16 TROPICAL STORM 209
## 17 HEAVY SNOW 185
## 18 HEAVY RAIN 184
## 19 WILD FIRES 178
## 20 RECORD/EXCESSIVE HEAT 158
Based on the three rankings above, tornadoes, heat, thunderstorms, severe winter weather and tropical storms have been the most dangerous forms of severe weather for the human population across the US from 1950 through 2011.
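The rankings above list some event types under several spellings, for example RIP CURRENT and RIP CURRENTS, TSTM WIND and THUNDERSTORM WIND, or Heat Wave and HEAT WAVE. A minimal sketch of merging such labels before aggregating is given below. The helper normalize_evtype and its synonym table are hypothetical illustrations, not an official NOAA mapping, and applying them would change the totals reported here.

```r
# Hypothetical helper: standardize case and whitespace, then apply a small
# synonym table. The synonyms are illustrative assumptions only.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("\\s+", " ", x)                 # collapse repeated whitespace
  synonyms <- c(
    "TSTM WIND"    = "THUNDERSTORM WIND",
    "RIP CURRENTS" = "RIP CURRENT",
    "WILD FIRES"   = "WILDFIRE"
  )
  hit <- x %in% names(synonyms)
  x[hit] <- unname(synonyms[x[hit]])        # replace only the labels that have a synonym
  x
}

normalize_evtype(c("TSTM WIND", " Heat Wave", "RIP CURRENTS", "FLOOD"))
```

One possible use would be df$EVTYPE = normalize_evtype(df$EVTYPE) before the ddply calls, so that variant spellings are summed together.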
Question 2: Which types of events have the greatest economic consequences?
To answer this question, we form a US dollar measure of the total economic cost for the weather event types in df. For that purpose, we identify the df columns PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP as containing the necessary raw data. Columns PROPDMGEXP and CROPDMGEXP hold the exponent codes for their respective cost coefficients, PROPDMG and CROPDMG, and take the following values:
levels(df$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(df$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
With the information above, we form ECONDMGBUSD as the measure of total economic damage, in billion USD, for the events in df as follows:
# Property damage: translate the exponent codes into powers of ten
df$PROPDMGEXP = as.character(df$PROPDMGEXP)
df$PROPDMGEXP[df$PROPDMGEXP %in% c('', '-', '?', '+')] = 1   # blank and symbol codes
df$PROPDMGEXP[df$PROPDMGEXP %in% c('h', 'H')] = 2
df$PROPDMGEXP[df$PROPDMGEXP %in% c('k', 'K')] = 3
df$PROPDMGEXP[df$PROPDMGEXP %in% c('m', 'M')] = 6
df$PROPDMGEXP[df$PROPDMGEXP %in% c('b', 'B')] = 12
# digit codes ('0' to '8') pass through as.numeric() unchanged
df$PROPDMGUSD = df$PROPDMG * 10^as.numeric(df$PROPDMGEXP)
# Crop damage: same translation
df$CROPDMGEXP = as.character(df$CROPDMGEXP)
df$CROPDMGEXP[df$CROPDMGEXP %in% c('', '-', '?', '+')] = 1
df$CROPDMGEXP[df$CROPDMGEXP %in% c('h', 'H')] = 2
df$CROPDMGEXP[df$CROPDMGEXP %in% c('k', 'K')] = 3
df$CROPDMGEXP[df$CROPDMGEXP %in% c('m', 'M')] = 6
df$CROPDMGEXP[df$CROPDMGEXP %in% c('b', 'B')] = 12
df$CROPDMGUSD = df$CROPDMG * 10^as.numeric(df$CROPDMGEXP)
# Combined damage divided by 10^12. With 'B' assigned exponent 12, 'B' rows come out in
# billion USD; rows with any other code end up a factor of 1000 smaller than that scale.
df$ECONDMGBUSD = (df$PROPDMGUSD+df$CROPDMGUSD)/10^12
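The same conversion can also be written as a lookup table. The sketch below is an assumption-laden alternative and not the code used for the results in this report. It assumes the conventional reading H = 10^2, K = 10^3, M = 10^6 and B = 10^9, and treats blank and symbol codes as "no multiplier" (exponent 0). Both choices differ from the code above, which assigns 12 to B and 1 to blank and symbol codes, so totals computed this way would differ from those reported here. The names exp_lookup and to_usd are hypothetical.

```r
# Hypothetical lookup: damage-exponent code -> power of ten.
# Assumed reading: H = hundreds, K = thousands, M = millions, B = billions.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Convert a damage coefficient and its exponent code to US dollars.
to_usd <- function(value, code) {
  code  <- toupper(as.character(code))
  pow   <- unname(exp_lookup[code])        # NA for codes not in the lookup
  digit <- grepl("^[0-9]$", code)
  pow[digit] <- as.numeric(code[digit])    # digit codes used directly as exponents
  pow[is.na(pow)] <- 0                     # assumption: blank/symbol codes mean no multiplier
  value * 10^pow
}

to_usd(c(25, 2.5, 1, 7), c("K", "M", "B", ""))
```

Under these assumptions, a damage figure in billion USD would be obtained by dividing the dollar amount by 10^9.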
We list and visualize the financially most damaging types of severe weather events in df:
typetotals = ddply(df,c('EVTYPE'),summarize,totals=round(sum(ECONDMGBUSD)))
typetotals = arrange(typetotals,-totals)
head(typetotals,20)
## EVTYPE totals
## 1 FLOOD 123
## 2 HURRICANE/TYPHOON 67
## 3 STORM SURGE 43
## 4 RIVER FLOOD 10
## 5 HURRICANE 6
## 6 ICE STORM 5
## 7 TORNADO 5
## 8 TROPICAL STORM 5
## 9 WINTER STORM 5
## 10 STORM SURGE/TIDE 4
## 11 HURRICANE OPAL 3
## 12 DROUGHT 2
## 13 HAIL 2
## 14 HEAVY RAIN/SEVERE WEATHER 2
## 15 TORNADOES, TSTM WIND, HAIL 2
## 16 WILD/FOREST FIRE 2
## 17 FLASH FLOOD 1
## 18 HIGH WIND 1
## 19 SEVERE THUNDERSTORM 1
## 20 WILDFIRE 1
# keep the event types whose total exceeds 4% of the largest total
tops = subset(typetotals,totals>.04*max(totals))
# lower-case the labels; rebuilding the factor also drops unused levels
tops = mutate(tops,EVTYPE=as.factor(tolower(EVTYPE)))
p = ggplot(data = tops, aes(x=reorder(EVTYPE,-totals), y=totals)) +
  geom_bar(color='red',fill='red',stat='identity') +
  ggtitle('Cumulative financial losses in the US by most damaging severe weather types') +
  labs(x='Severe weather event types',y='Total property and crop damages 1950-2011 [BUSD]')
p
We accordingly conclude that flooding, tropical storms, winter storms and tornadoes have caused the most economic damage across the US from 1950 through 2011, in terms of the total dollar value of losses. Beyond this analysis of cumulative financial damage per event type, the reader could, using e.g. the RStudio environment (https://www.rstudio.com), also identify the most damaging forms of weather in terms of average or maximum financial loss per event type by replacing sum in the R code above with mean or max, depending on the needs of further analysis.
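The substitution of mean or max for sum can be illustrated on made-up data. The sketch below uses base R's aggregate rather than plyr's ddply, and the data frame toy and the helper rank_by are hypothetical names introduced for illustration only.

```r
# Toy data (made up for illustration; not from the storm database).
toy <- data.frame(
  EVTYPE      = c("FLOOD", "FLOOD", "TORNADO", "TORNADO", "HAIL"),
  ECONDMGBUSD = c(4, 2, 1, 3, 0.5)
)

# Hypothetical helper: summarize damage per event type with any statistic
# and sort the result in decreasing order.
rank_by <- function(data, stat) {
  out <- aggregate(ECONDMGBUSD ~ EVTYPE, data = data, FUN = stat)
  out <- out[order(-out$ECONDMGBUSD), ]
  rownames(out) <- NULL
  out
}

rank_by(toy, sum)    # cumulative damage per type
rank_by(toy, mean)   # average damage per event
rank_by(toy, max)    # worst single event per type
```

Passing the statistic as an argument keeps the three rankings in one function, so only the final argument changes between analyses.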