Synopsis

Severe weather events cause both human harm and economic damage. This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. The database tracks characteristics of major storms and weather events in the US from 1950 through November 2011, including estimates of fatalities, injuries and property damage. Explanatory info for this dataset is available via:

Based on this data, this report uses R code, tables and a diagram to address the following questions: 1) Which types of events are most harmful with respect to population health? 2) Which types of events have the greatest economic consequences? The intended audience is a government or municipal manager responsible for preparing for severe weather events, e.g. by prioritizing response capabilities for different types of events.

Data processing

To begin, we load the required R libraries and then read the data into the data frame df for the analysis:

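library(plyr)     # ddply, arrange, mutate
library(ggplot2)  # bar chart of economic damage
# stringsAsFactors = TRUE keeps text columns as factors (no longer the default from R 4.0.0 on);
# the later levels() calls rely on this
df = read.csv('repdata_data_StormData.csv.bz2', stringsAsFactors = TRUE)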
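
The read.csv call above assumes that the compressed file already sits in the working directory. A minimal sketch of fetching it first, using the URL given in the synopsis; the helper name fetch_if_missing is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is absent.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the bz2 archive byte-exact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (commented out so that sourcing the sketch has no side effects):
# fetch_if_missing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "repdata_data_StormData.csv.bz2")
```

No separate decompression step should be needed, since read.csv reads bzip2-compressed files directly.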

The further processing of df needed to answer the two questions of this report is described step by step below.

Results

Question 1: Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To study this question, we review the columns of df, together with the explanatory info referred to above, to identify the columns that record harm to population health:

names(df)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We identify the columns INJURIES and FATALITIES as containing observations of harm to human health. Based on these, we form the variable IDX as an index of the harm each recorded event caused to the population, weighting each fatality by the ratio of mean injuries to mean fatalities (about 9.28 injuries per fatality):

scaler = mean(df$INJURIES)/mean(df$FATALITIES)
scaler
## [1] 9.278838
df$IDX = df$INJURIES+scaler*df$FATALITIES
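
To make the weighting concrete, here is a minimal worked example of the same construction on three invented rows (toy values, not taken from the storm database):

```r
# Toy data, invented for illustration only.
toy <- data.frame(EVTYPE     = c("A", "B", "C"),
                  INJURIES   = c(10, 0, 5),
                  FATALITIES = c(1, 0, 2))

# Same construction as above: ratio of mean injuries to mean fatalities.
scaler <- mean(toy$INJURIES) / mean(toy$FATALITIES)   # 5 / 1 = 5

# One fatality counts as `scaler` injuries.
toy$IDX <- toy$INJURIES + scaler * toy$FATALITIES
toy$IDX   # 15 0 15
```

By construction, the weighted fatalities and the injuries contribute equally to the grand total of the index: here sum(INJURIES) is 15 and scaler * sum(FATALITIES) is also 15.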

We then list the top 20 weather event types by the total, the average and the maximum IDX score of their recorded events:

typesums = ddply(df,c('EVTYPE'),summarize,IDXsum=round(sum(IDX)))
typesums = arrange(typesums,-IDXsum)
head(typesums,20)
##               EVTYPE IDXsum
## 1            TORNADO 143614
## 2     EXCESSIVE HEAT  24183
## 3          LIGHTNING  12802
## 4          TSTM WIND  11634
## 5              FLOOD  11150
## 6        FLASH FLOOD  10852
## 7               HEAT  10794
## 8        RIP CURRENT   3647
## 9          HIGH WIND   3438
## 10      WINTER STORM   3232
## 11         ICE STORM   2801
## 12 THUNDERSTORM WIND   2722
## 13         AVALANCHE   2248
## 14        HEAVY SNOW   2199
## 15      RIP CURRENTS   2190
## 16         HEAT WAVE   1905
## 17 HURRICANE/TYPHOON   1869
## 18          BLIZZARD   1742
## 19      EXTREME COLD   1716
## 20          WILDFIRE   1607
typeaves = ddply(df,c('EVTYPE'),summarize,IDXmean=round(mean(IDX)))
typeaves = arrange(typeaves,-IDXmean)
head(typeaves,20)
##                        EVTYPE IDXmean
## 1  TORNADOES, TSTM WIND, HAIL     232
## 2               COLD AND SNOW     130
## 3       TROPICAL STORM GORDON     117
## 4                   Heat Wave      70
## 5       RECORD/EXCESSIVE HEAT      53
## 6           HEAT WAVE DROUGHT      52
## 7                EXTREME HEAT      48
## 8          HIGH WIND AND SEAS      48
## 9                  WILD FIRES      44
## 10             HIGH WIND/SEAS      37
## 11              WINTER STORMS      37
## 12              MARINE MISHAP      35
## 13        Heavy surf and wind      28
## 14              THUNDERSTORMW      27
## 15                  HEAT WAVE      26
## 16                 ROUGH SEAS      26
## 17    WINTER STORM HIGH WINDS      24
## 18                 HEAT WAVES      23
## 19    RIP CURRENTS/HEAVY SURF      23
## 20                    TSUNAMI      22
typemaxs = ddply(df,c('EVTYPE'),summarize,IDXmax=round(max(IDX)))
typemaxs = arrange(typemaxs,-IDXmax)
head(typemaxs,20)
##                        EVTYPE IDXmax
## 1                        HEAT   5410
## 2                     TORNADO   2616
## 3                   ICE STORM   1577
## 4              EXCESSIVE HEAT    919
## 5           HURRICANE/TYPHOON    845
## 6                       FLOOD    819
## 7                EXTREME HEAT    529
## 8                    BLIZZARD    431
## 9                     TSUNAMI    426
## 10                  HEAT WAVE    306
## 11  UNSEASONABLY WARM AND DRY    269
## 12                FLASH FLOOD    234
## 13 TORNADOES, TSTM WIND, HAIL    232
## 14                   WILDFIRE    220
## 15               WINTER STORM    211
## 16             TROPICAL STORM    209
## 17                 HEAVY SNOW    185
## 18                 HEAVY RAIN    184
## 19                 WILD FIRES    178
## 20      RECORD/EXCESSIVE HEAT    158

Per the above, tornadoes, heat, thunderstorms, severe winter weather and tropical storms have been the most dangerous forms of severe weather for the human population across the US from 1950 through 2011.
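
The tables above list several spelling variants of what appear to be the same event type separately, for example RIP CURRENT and RIP CURRENTS, or Heat Wave and HEAT WAVE. A minimal sketch of merging such variants before aggregating; the function name normalize_evtype and its rules are illustrative assumptions, not an official NOAA mapping and not part of the original analysis:

```r
# Hypothetical helper: fold a few obvious label variants together.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))          # unify case, drop outer blanks
  x <- gsub("\\s+", " ", x)                      # collapse repeated whitespace
  x <- sub("^TSTM", "THUNDERSTORM", x)           # expand a common abbreviation
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)   # fold plural into singular
  x
}

normalize_evtype(c("Heat Wave", " HEAT WAVE", "TSTM WIND", "RIP CURRENTS"))
# "HEAT WAVE" "HEAT WAVE" "THUNDERSTORM WIND" "RIP CURRENT"
```

Applying such a function to df$EVTYPE before the ddply calls would change the totals, so the tables shown in this report would need to be regenerated.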

Question 2: Which types of events have the greatest economic consequences?

To answer this question, we form a US dollar measure of the total economic cost for the weather event types in df. For that purpose, we identify the df columns PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP as containing the necessary raw data. Columns PROPDMGEXP and CROPDMGEXP hold the exponent codes for the cost coefficients in PROPDMG and CROPDMG, respectively, and take the following values:

levels(df$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
levels(df$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"

With the info above, we form ECONDMGBUSD as the measure of total economic damage, in billion USD, for the events in df as follows:

# Translate the exponent codes into powers of ten (behavior unchanged from the original analysis).
# Caution: blank and symbol codes get exponent 1 (a factor of 10), and B gets exponent 12
# rather than 9; digit codes 0-8 pass through unchanged as exponents.
df$PROPDMGEXP = as.character(df$PROPDMGEXP)
df$PROPDMGEXP[df$PROPDMGEXP %in% c('', '-', '?', '+')] = 1
df$PROPDMGEXP[df$PROPDMGEXP %in% c('h', 'H')] = 2
df$PROPDMGEXP[df$PROPDMGEXP %in% c('k', 'K')] = 3
df$PROPDMGEXP[df$PROPDMGEXP %in% c('m', 'M')] = 6
df$PROPDMGEXP[df$PROPDMGEXP %in% c('b', 'B')] = 12
df$PROPDMGUSD = df$PROPDMG * 10^as.numeric(df$PROPDMGEXP)

df$CROPDMGEXP = as.character(df$CROPDMGEXP)
df$CROPDMGEXP[df$CROPDMGEXP %in% c('', '-', '?', '+')] = 1
df$CROPDMGEXP[df$CROPDMGEXP %in% c('h', 'H')] = 2
df$CROPDMGEXP[df$CROPDMGEXP %in% c('k', 'K')] = 3
df$CROPDMGEXP[df$CROPDMGEXP %in% c('m', 'M')] = 6
df$CROPDMGEXP[df$CROPDMGEXP %in% c('b', 'B')] = 12
df$CROPDMGUSD = df$CROPDMG * 10^as.numeric(df$CROPDMGEXP)

# Dividing by 10^12 offsets the exponent 12 used for B: B-coded amounts come out in billions,
# while K- and M-coded amounts end up 1000 times smaller than their value in billion USD.
df$ECONDMGBUSD = (df$PROPDMGUSD+df$CROPDMGUSD)/10^12
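
The conversion above gives the code B the exponent 12 and blank or symbol codes the exponent 1, and then divides by 10^12. A minimal sketch of a lookup-based alternative, assuming the conventional reading H = 10^2, K = 10^3, M = 10^6, B = 10^9, treating blank or symbol codes as 10^0, and reading digit codes as exponents; the helper name exp_to_multiplier is hypothetical, and totals computed this way would differ from the tables shown in this report:

```r
# Hypothetical helper: map an exponent code to a dollar multiplier.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)      # assumed letter meanings
  expo <- unname(lookup[code])                 # NA for codes not in lookup
  digit <- grepl("^[0-9]$", code)
  expo[digit] <- as.numeric(code[digit])       # digit codes read as exponents
  expo[is.na(expo)] <- 0                       # "", "-", "?", "+" -> 10^0
  10^expo
}

# Toy rows, invented for illustration only.
toy <- data.frame(PROPDMG    = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "M", "B", ""))
toy$PROPDMGUSD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy$PROPDMGUSD / 10^9   # damage in billion USD
```

Under these assumptions the division by 10^9 puts K-, M- and B-coded amounts on the same billion USD scale.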

We tabulate and then plot the financially most damaging types of severe weather events in df:

typetotals = ddply(df,c('EVTYPE'),summarize,totals=round(sum(ECONDMGBUSD)))
typetotals = arrange(typetotals,-totals)
head(typetotals,20)
##                        EVTYPE totals
## 1                       FLOOD    123
## 2           HURRICANE/TYPHOON     67
## 3                 STORM SURGE     43
## 4                 RIVER FLOOD     10
## 5                   HURRICANE      6
## 6                   ICE STORM      5
## 7                     TORNADO      5
## 8              TROPICAL STORM      5
## 9                WINTER STORM      5
## 10           STORM SURGE/TIDE      4
## 11             HURRICANE OPAL      3
## 12                    DROUGHT      2
## 13                       HAIL      2
## 14  HEAVY RAIN/SEVERE WEATHER      2
## 15 TORNADOES, TSTM WIND, HAIL      2
## 16           WILD/FOREST FIRE      2
## 17                FLASH FLOOD      1
## 18                  HIGH WIND      1
## 19        SEVERE THUNDERSTORM      1
## 20                   WILDFIRE      1
# Keep the event types whose total exceeds 4% of the largest total
tops = subset(typetotals,totals>.04*max(totals))
tops = mutate(tops,EVTYPE=factor(tolower(EVTYPE)))
p = ggplot(data = tops, aes(x=reorder(EVTYPE,-totals), y=totals)) +
  geom_bar(color='red',fill='red',stat='identity') +
  ggtitle('Cumulative financial losses in the US by most damaging severe weather types') +
  labs(x='Severe weather event types',y='Total property and crop damage 1950-2011 [BUSD]')
p

We accordingly conclude that flooding, tropical storms, winter storms and tornadoes have caused the most economic damage across the US from 1950 through 2011, in terms of the total dollar value of losses. Beyond this analysis of cumulative financial damage per event type, the user of the report could, e.g. in the RStudio environment (https://www.rstudio.com), also identify the most damaging forms of weather in terms of average or maximum financial loss per event type by replacing sum in the R code above with mean or max, depending on the needs of further analysis.
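
The substitution described above can be sketched as a small helper that takes the aggregation function as an argument. This is a minimal illustration on invented toy data, using base R aggregate rather than ddply so that it runs without extra packages; the name rank_by is hypothetical:

```r
# Hypothetical helper: rank event types by any summary of ECONDMGBUSD.
rank_by <- function(data, fun, n = 20) {
  out <- aggregate(ECONDMGBUSD ~ EVTYPE, data = data, FUN = fun)
  out <- out[order(-out$ECONDMGBUSD), ]   # largest value first
  rownames(out) <- NULL
  head(out, n)
}

# Toy rows, invented for illustration only.
toy <- data.frame(
  EVTYPE      = c("FLOOD", "FLOOD", "FLOOD", "FLOOD", "HURRICANE"),
  ECONDMGBUSD = c(1, 1, 1, 1, 3))

rank_by(toy, sum)    # FLOOD first (4 vs 3)
rank_by(toy, mean)   # HURRICANE first (3 vs 1)
rank_by(toy, max)    # HURRICANE first (3 vs 1)
```

The toy data shows why the choice matters: a frequent but individually small event type can lead on totals, while a rare but severe one leads on average and maximum loss.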