Analysis of Severe Weather Events to Improve Preventive Measures

Summary

Severe weather events recorded by the National Climatic Data Center (NCDC; 1950-2011) were analyzed to estimate their public health and economic impact. With estimates of the impact of specific weather events, resources can be allocated more efficiently to reduce that impact. All steps of the analysis are listed below so that another researcher can reproduce the results, which are presented in their own section.

DATA PROCESSING

Read in the NCDC storm database and look at its structure: 902,297 observations and 37 variables, with no missing values. For more information and access to the data, see http://www.ncdc.noaa.gov/stormevents/details.jsp

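##stringsAsFactors=TRUE keeps text columns as factors (the default before R 4.0.0); the levels() steps below rely on this
storm=read.csv('O:/jkatz/programming/repdata_data_StormData.csv',stringsAsFactors=TRUE)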

PUBLIC HEALTH IMPACT

The weather event type (column 8) has 985 distinct values. For a more accurate analysis, like events should be aggregated. The number of deaths (column 23) was used to measure the impact of events on public health; injuries were not used because they are recorded more subjectively. Storm data set attributes are defined here: http://ire.org/nicar/database-library/databases/storm-events/
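The aggregation of like events mentioned above is not carried out in this analysis. A minimal sketch of how it might start is shown here; the helper name normalize_evtype and the handful of rewrite rules are illustrative assumptions, not an official NCDC mapping, and the toy vectors stand in for storm$EVTYPE and storm$FATALITIES.

```r
# Hypothetical helper: collapse obvious spelling variants of event labels.
# The rules below are illustrative assumptions, not an official mapping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))   # drop stray spaces, unify case
  x <- gsub("\\s+", " ", x)               # collapse repeated whitespace
  x <- sub("^TSTM", "THUNDERSTORM", x)    # expand the TSTM abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)
  x
}

# Toy data standing in for the event type and fatality columns
ev         <- c(" TSTM WIND", "THUNDERSTORM WINDS", "Rip Currents", "RIP CURRENT")
deaths_toy <- c(2, 3, 1, 4)

# Sum deaths per cleaned label; four raw labels collapse into two groups
merged <- tapply(deaths_toy, normalize_evtype(ev), sum)
print(merged)
```

Applied to the full data set, a mapping of this kind would merge rows such as TSTM WIND and THUNDERSTORM WIND before ranking, which could change the order of the lists below.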

No missing values

sum(complete.cases(storm[c(8,23)]))
## [1] 902297
deaths=tapply(storm$FATALITIES,storm$EVTYPE,sum);str(deaths);summary(deaths)
##  num [1:985(1d)] 0 0 0 0 0 0 0 0 0 0 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ : chr [1:985] "   HIGH SURF ADVISORY" " COASTAL FLOOD" " FLASH FLOOD" " LIGHTNING" ...
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0      15       0    5630

A list of the 30 weather events that caused the most deaths, followed by the total deaths from all events:

 rev(sort(deaths))[1:30];sum(deaths)
##                 TORNADO          EXCESSIVE HEAT             FLASH FLOOD 
##                    5633                    1903                     978 
##                    HEAT               LIGHTNING               TSTM WIND 
##                     937                     816                     504 
##                   FLOOD             RIP CURRENT               HIGH WIND 
##                     470                     368                     248 
##               AVALANCHE            WINTER STORM            RIP CURRENTS 
##                     224                     206                     204 
##               HEAT WAVE            EXTREME COLD       THUNDERSTORM WIND 
##                     172                     160                     133 
##              HEAVY SNOW EXTREME COLD/WIND CHILL             STRONG WIND 
##                     127                     125                     103 
##               HIGH SURF                BLIZZARD              HEAVY RAIN 
##                     101                     101                      98 
##            EXTREME HEAT         COLD/WIND CHILL               ICE STORM 
##                      96                      95                      89 
##                WILDFIRE      THUNDERSTORM WINDS       HURRICANE/TYPHOON 
##                      75                      64                      64 
##                     FOG               HURRICANE          TROPICAL STORM 
##                      62                      61                      58
## [1] 15145

FINANCIAL IMPACT

storm_econ=storm[c(8,25:28)];head(storm_econ);str(storm_econ)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO    25.0          K       0           
## 2 TORNADO     2.5          K       0           
## 3 TORNADO    25.0          K       0           
## 4 TORNADO     2.5          K       0           
## 5 TORNADO     2.5          K       0           
## 6 TORNADO     2.5          K       0
## 'data.frame':    902297 obs. of  5 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
  ##Subset event and property/crop damage caused

sum(complete.cases(storm_econ))
## [1] 902297
##No missing values
 levels(storm_econ$PROPDMGEXP)=c(rep(0,13),1E9,0,1E2,1E3,0,1E6)

 levels(storm_econ$CROPDMGEXP)=c(rep(0,4),1E9,0,1E3,0,1E6)
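  ##Change letters to reflect multipliers: B=1E9, M=1E6, K=1E3, H=1E2; all other codes (apparently including lowercase h, k and m) are set to 0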
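The two levels() assignments above depend on the order in which R sorts the factor levels, which can vary with the locale. As a hedged alternative, the sketch below looks multipliers up by name instead of by position. The helper exp_to_mult is hypothetical, and treating lowercase letters like their uppercase forms is an assumption that differs from the positional mapping used in this analysis.

```r
# Hypothetical helper: map exponent codes to multipliers by name.
# Assumption: lowercase codes mean the same as uppercase ones.
exp_to_mult <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 0    # blank, digit, or symbol codes contribute nothing
  unname(mult)
}

# Toy values standing in for PROPDMGEXP and PROPDMG
codes  <- c("K", "m", "B", "", "?", "h")
amount <- c(25, 2, 1.5, 10, 3, 4)

cash <- amount * exp_to_mult(codes)
print(cash)
```

Because the lookup is by name, the result does not change if the factor levels are ordered differently on another machine.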

storm_econ$propcash=storm_econ$PROPDMG*as.numeric(as.character(storm_econ$PROPDMGEXP))
head(storm_econ)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP propcash
## 1 TORNADO    25.0       1000       0          0    25000
## 2 TORNADO     2.5       1000       0          0     2500
## 3 TORNADO    25.0       1000       0          0    25000
## 4 TORNADO     2.5       1000       0          0     2500
## 5 TORNADO     2.5       1000       0          0     2500
## 6 TORNADO     2.5       1000       0          0     2500
storm_econ$cropcash=storm_econ$CROPDMG*as.numeric(as.character(storm_econ$CROPDMGEXP))
summary(storm_econ$cropcash);summary(storm_econ$propcash)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 5.44e+04 0.00e+00 5.00e+09
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 4.74e+05 5.00e+02 1.15e+11
storm_econ$totcash=storm_econ$cropcash+storm_econ$propcash
summary(storm_econ$totcash)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 5.28e+05 1.00e+03 1.15e+11
head(storm_econ)
##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP propcash cropcash totcash
## 1 TORNADO    25.0       1000       0          0    25000        0   25000
## 2 TORNADO     2.5       1000       0          0     2500        0    2500
## 3 TORNADO    25.0       1000       0          0    25000        0   25000
## 4 TORNADO     2.5       1000       0          0     2500        0    2500
## 5 TORNADO     2.5       1000       0          0     2500        0    2500
## 6 TORNADO     2.5       1000       0          0     2500        0    2500
cost=tapply(storm_econ$totcash,storm$EVTYPE,sum);str(cost);summary(cost)
##  num [1:985(1d)] 200000 0 50000 0 8100000 8000 0 0 5000 0 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ : chr [1:985] "   HIGH SURF ADVISORY" " COASTAL FLOOD" " FLASH FLOOD" " LIGHTNING" ...
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 4.84e+08 8.50e+04 1.50e+11
  ##a list of the 30 weather events that cause the most damage

rev(sort(cost))[1:30];sum(cost)
##                      FLOOD          HURRICANE/TYPHOON 
##                  1.503e+11                  7.191e+10 
##                    TORNADO                STORM SURGE 
##                  5.734e+10                  4.332e+10 
##                       HAIL                FLASH FLOOD 
##                  1.875e+10                  1.756e+10 
##                    DROUGHT                  HURRICANE 
##                  1.502e+10                  1.461e+10 
##                RIVER FLOOD                  ICE STORM 
##                  1.015e+10                  8.967e+09 
##             TROPICAL STORM               WINTER STORM 
##                  8.382e+09                  6.715e+09 
##                  HIGH WIND                   WILDFIRE 
##                  5.909e+09                  5.061e+09 
##                  TSTM WIND           STORM SURGE/TIDE 
##                  5.039e+09                  4.642e+09 
##          THUNDERSTORM WIND             HURRICANE OPAL 
##                  3.898e+09                  3.162e+09 
##           WILD/FOREST FIRE  HEAVY RAIN/SEVERE WEATHER 
##                  3.109e+09                  2.500e+09 
##         THUNDERSTORM WINDS TORNADOES, TSTM WIND, HAIL 
##                  1.924e+09                  1.602e+09 
##                 HEAVY RAIN               EXTREME COLD 
##                  1.428e+09                  1.361e+09 
##        SEVERE THUNDERSTORM               FROST/FREEZE 
##                  1.206e+09                  1.104e+09 
##                 HEAVY SNOW                  LIGHTNING 
##                  1.067e+09                  9.408e+08 
##                   BLIZZARD                 HIGH WINDS 
##                  7.713e+08                  6.490e+08
## [1] 4.764e+11
sum(cost)
## [1] 4.764e+11
  ##476,373,500,510, i.e. about 476 billion

sum(rev(sort(cost))[1:10])/sum(cost)
## [1] 0.8564
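The share computed above (the ten costliest event types divided by the total) is reused later for the figures. A small sketch of the same calculation as a reusable function follows; the name top_share and the toy totals are assumptions made for illustration only.

```r
# Hypothetical helper: share of the grand total held by the n largest entries
top_share <- function(totals, n = 10) {
  ranked <- sort(totals, decreasing = TRUE)
  sum(head(ranked, n)) / sum(totals)
}

# Toy totals standing in for the cost (or deaths) vector
toy_cost <- c(FLOOD = 150, TORNADO = 57, HAIL = 19, FOG = 1)
share <- top_share(toy_cost, n = 2)
print(share)
```

With the real vectors, top_share(cost) and top_share(deaths) would be expected to reproduce the two shares quoted in the Results section.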

Create dataframe with deaths by time period

##look at deaths by time period to see how recent years compare with earlier decades

death_dates=storm[c(2,8,23)];head(death_dates)
##             BGN_DATE  EVTYPE FATALITIES
## 1  4/18/1950 0:00:00 TORNADO          0
## 2  4/18/1950 0:00:00 TORNADO          0
## 3  2/20/1951 0:00:00 TORNADO          0
## 4   6/8/1951 0:00:00 TORNADO          0
## 5 11/15/1951 0:00:00 TORNADO          0
## 6 11/15/1951 0:00:00 TORNADO          0
  ##keep only date, evtype, and deaths

#small=as.character(death_dates[1:5,1])
       #test dataset

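##create function to extract the year from a single date string such as "4/18/1950 0:00:00"
extr_year=function(x){
  ##split on "/"; the third piece is the year followed by the time
  smaller=strsplit(x,"/")
  smallest=strsplit(smaller[[1]][3]," ")
  ##year has been extracted
  smallest[[1]][1]
}
##apply function to entire date column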

death_dates$year=sapply(as.character(death_dates$BGN_DATE),extr_year);head(death_dates)
##             BGN_DATE  EVTYPE FATALITIES year
## 1  4/18/1950 0:00:00 TORNADO          0 1950
## 2  4/18/1950 0:00:00 TORNADO          0 1950
## 3  2/20/1951 0:00:00 TORNADO          0 1951
## 4   6/8/1951 0:00:00 TORNADO          0 1951
## 5 11/15/1951 0:00:00 TORNADO          0 1951
## 6 11/15/1951 0:00:00 TORNADO          0 1951
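Calling a string-splitting function once per row can be slow on roughly 900,000 rows. As a sketch of an alternative, assuming every BGN_DATE value follows the month/day/year pattern shown above, the year can be extracted for the whole column at once with base R date parsing. The toy vector stands in for the date column.

```r
# Toy values in the same format as BGN_DATE
dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "6/8/1951 0:00:00")

# as.Date() parses the leading month/day/year and ignores the trailing time
parsed <- as.Date(dates, format = "%m/%d/%Y")

# Pull out the four-digit year as an integer
years <- as.integer(format(parsed, "%Y"))
print(years)
```

This version returns integers directly, so the later as.numeric() conversion would not be needed.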
death_dates$year_periods=cut(as.numeric(death_dates$year)
,breaks=c(1949,1959,1969,1979,1989,1999,2011),labels=c("1950-1959","1960-1969","1970-1979",
"1980-1989","1990-1999","2000-2011"))
tail(death_dates)
##                  BGN_DATE         EVTYPE FATALITIES year year_periods
## 902292 11/28/2011 0:00:00 WINTER WEATHER          0 2011    2000-2011
## 902293 11/30/2011 0:00:00      HIGH WIND          0 2011    2000-2011
## 902294 11/10/2011 0:00:00      HIGH WIND          0 2011    2000-2011
## 902295  11/8/2011 0:00:00      HIGH WIND          0 2011    2000-2011
## 902296  11/9/2011 0:00:00       BLIZZARD          0 2011    2000-2011
## 902297 11/28/2011 0:00:00     HEAVY SNOW          0 2011    2000-2011
   ##cut year column into decade intervals (the last interval covers 2000-2011)
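To make the boundary handling of cut() concrete, here is a small sketch using the same breaks and labels on a few boundary years. The toy year values are chosen for illustration.

```r
# Boundary years chosen to show which interval each one falls into
years  <- c(1950, 1959, 1960, 1999, 2000, 2011)
breaks <- c(1949, 1959, 1969, 1979, 1989, 1999, 2011)
labels <- c("1950-1959", "1960-1969", "1970-1979",
            "1980-1989", "1990-1999", "2000-2011")

# cut() uses right-closed intervals by default, e.g. (1949, 1959]
periods <- cut(years, breaks = breaks, labels = labels)
print(data.frame(years, periods))
```

Because the intervals are closed on the right, starting the breaks at 1949 is what places 1950 in the first period, and any year after 2011 would become NA.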


library(plyr)
deaths2=ddply(death_dates,.(year_periods,EVTYPE),summarize,deaths=sum(FATALITIES));str(deaths2)
## 'data.frame':    1121 obs. of  3 variables:
##  $ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 1 1 1 2 2 2 3 3 3 4 ...
##  $ EVTYPE      : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 244 834 856 244 834 856 244 834 856 244 ...
##  $ deaths      : num  0 1419 0 0 942 ...
##Sum fatalities by weather event type FOR EACH YEAR PERIOD
deaths2$EVTYPE=as.character(deaths2$EVTYPE)

spl=split(deaths2,deaths2$year_periods);str(spl)
## List of 6
##  $ 1950-1959:'data.frame':   3 obs. of  3 variables:
##   ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 1 1 1
##   ..$ EVTYPE      : chr [1:3] "HAIL" "TORNADO" "TSTM WIND"
##   ..$ deaths      : num [1:3] 0 1419 0
##  $ 1960-1969:'data.frame':   3 obs. of  3 variables:
##   ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 2 2 2
##   ..$ EVTYPE      : chr [1:3] "HAIL" "TORNADO" "TSTM WIND"
##   ..$ deaths      : num [1:3] 0 942 0
##  $ 1970-1979:'data.frame':   3 obs. of  3 variables:
##   ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 3 3 3
##   ..$ EVTYPE      : chr [1:3] "HAIL" "TORNADO" "TSTM WIND"
##   ..$ deaths      : num [1:3] 0 998 0
##  $ 1980-1989:'data.frame':   3 obs. of  3 variables:
##   ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 4 4 4
##   ..$ EVTYPE      : chr [1:3] "HAIL" "TORNADO" "TSTM WIND"
##   ..$ deaths      : num [1:3] 0 522 177
##  $ 1990-1999:'data.frame':   913 obs. of  3 variables:
##   ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 5 5 5 5 5 5 5 5 5 5 ...
##   ..$ EVTYPE      : chr [1:913] " COASTAL FLOOD" " LIGHTNING" " TSTM WIND" " TSTM WIND (G45)" ...
##   ..$ deaths      : num [1:913] 0 0 0 0 0 0 0 0 0 1 ...
##  $ 2000-2011:'data.frame':   196 obs. of  3 variables:
##   ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 6 6 6 6 6 6 6 6 6 6 ...
##   ..$ EVTYPE      : chr [1:196] "   HIGH SURF ADVISORY" " FLASH FLOOD" " TSTM WIND" " WATERSPOUT" ...
##   ..$ deaths      : num [1:196] 0 0 0 0 0 0 0 0 0 179 ...
nines=lapply(spl[[5]][3],function(x)rev(sort(x))[1:10])
zeros=lapply(spl[[6]][3],function(x)rev(sort(x))[1:10])

top=deaths2[deaths2$deaths %in% c(nines$deaths,zeros$deaths),]
top_deaths=rbind(top,deaths2[1:12,])
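The %in% filter above selects rows by death count alone, so a row from another period that happens to share one of the top counts would also be kept. A hedged alternative is to rank within each period. The sketch below does this in base R on a toy data frame shaped like deaths2; the helper top_n_rows is hypothetical, and this is not the selection used to produce Figure 3.

```r
# Toy stand-in for deaths2 (period, event, deaths)
toy <- data.frame(
  year_periods = rep(c("1990-1999", "2000-2011"), each = 4),
  EVTYPE = rep(c("TORNADO", "HEAT", "FLOOD", "FOG"), times = 2),
  deaths = c(50, 90, 20, 5, 120, 60, 20, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the n rows with the most deaths in one period
top_n_rows <- function(df, n) {
  head(df[order(df$deaths, decreasing = TRUE), ], n)
}

# Rank within each period, then stack the per-period results
by_period <- split(toy, toy$year_periods)
top_by_period <- do.call(rbind, lapply(by_period, top_n_rows, n = 2))
rownames(top_by_period) <- NULL
print(top_by_period)
```

In the toy data, FLOOD has 20 deaths in both periods; ranking within each period keeps it out of the result unless it is among the top rows of that period.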

Results

Figure 1 shows the deaths caused by specific weather events as a share of the total deaths caused by all weather events in the United States from 1950-2011. The figure focuses on the top ten weather events, which account for ~80% of total deaths.

library(ggplot2)
per=rev(sort(deaths))[1:10]/sum(deaths);str(per);head(per)
##  num [1:10(1d)] 0.3719 0.1257 0.0646 0.0619 0.0539 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ : chr [1:10] "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT" ...
##        TORNADO EXCESSIVE HEAT    FLASH FLOOD           HEAT      LIGHTNING 
##        0.37194        0.12565        0.06458        0.06187        0.05388 
##      TSTM WIND 
##        0.03328
 per_graph=data.frame(event=names(per),percentage=as.vector(per));str(per_graph)
## 'data.frame':    10 obs. of  2 variables:
##  $ event     : Factor w/ 10 levels "AVALANCHE","EXCESSIVE HEAT",..: 9 2 3 5 7 10 4 8 6 1
##  $ percentage: num  0.3719 0.1257 0.0646 0.0619 0.0539 ...
per_graph=transform(per_graph,event = reorder(event, percentage))

 p=qplot(event,data=per_graph,geom="bar",weight=percentage,fill=event,xlab="Top 10 severe weather events",ylab="Percentage of deaths resulting from weather event",main="
Figure 1. Deaths caused by severe weather events in the United States from 1950-2011")
 p+theme(axis.text.x = element_text(angle = 90))

[Figure 1. Deaths caused by severe weather events in the United States from 1950-2011]

Figure 2 shows the financial loss caused by specific weather events as a share of the total financial loss from all weather events. The figure focuses on the top ten weather events, which account for ~86% of total financial loss.

library(ggplot2)


per=rev(sort(cost))[1:10]/sum(cost);str(per);head(per)
##  num [1:10(1d)] 0.3156 0.151 0.1204 0.0909 0.0394 ...
##  - attr(*, "dimnames")=List of 1
##   ..$ : chr [1:10] "FLOOD" "HURRICANE/TYPHOON" "TORNADO" "STORM SURGE" ...
##             FLOOD HURRICANE/TYPHOON           TORNADO       STORM SURGE 
##           0.31555           0.15096           0.12037           0.09094 
##              HAIL       FLASH FLOOD 
##           0.03937           0.03687
 per_graph=data.frame(event=names(per),percentage=as.vector(per));str(per_graph)
## 'data.frame':    10 obs. of  2 variables:
##  $ event     : Factor w/ 10 levels "DROUGHT","FLASH FLOOD",..: 3 6 10 9 4 2 1 5 8 7
##  $ percentage: num  0.3156 0.151 0.1204 0.0909 0.0394 ...
per_graph=transform(per_graph,event = reorder(event, percentage))

 p=qplot(event,data=per_graph,geom="bar",weight=percentage,fill=event,xlab="Top 10 severe weather events",ylab="Percentage of financial loss resulting from weather event",main="
    Figure 2. Financial Loss caused by severe weather events in the United States from 1950-2011")
 p+theme(axis.text.x = element_text(angle = 90))

[Figure 2. Financial loss caused by severe weather events in the United States from 1950-2011]

Figure 3 was developed to look at the impact of weather events over time and to check whether certain emerging events have had a greater impact recently despite a smaller impact overall. Global warming or another cause could be behind such emerging, higher-priority weather events.

top_deaths=transform(top_deaths,EVTYPE = reorder(EVTYPE,deaths))

p=qplot(EVTYPE,data=top_deaths,geom="bar",weight=deaths,fill=EVTYPE,xlab="Top severe weather
         events per decade",ylab="Number of deaths resulting from weather event",main="
         Figure 3. Deaths caused by severe weather events in the United States by decade from 1950-2011")
p+facet_wrap(~year_periods, scales="free")+theme(axis.text.x = element_text(angle = 90))

[Figure 3. Deaths caused by severe weather events in the United States by decade from 1950-2011]

Conclusion

A person in charge of allocating preventive resources to reduce the impact of severe weather events on public health and on crops and property can use the reproducible code above to channel resources more efficiently. For instance, Figures 1 and 2 show that tornadoes are a major cause of both deaths and financial loss, so a focus on reducing tornado impact seems appropriate. Additionally, Figure 3 shows that in the 1990s excessive heat caused the most deaths, whereas tornadoes caused the most deaths in all other decades. It would be interesting to research why excessive heat caused more deaths in that decade, and to model the likelihood that excessive heat overtakes tornadoes again in 2010-2020.

Validation of dataset

The annual summary at http://www.ncdc.noaa.gov/oa/climate/sd/annsum1996.pdf was used to obtain a mean of about 87 tornado deaths per year. Multiplying by 61 years gives a total of roughly 5,300, which is close to the 5,633 tornado deaths reported in the NCDC data set used in this project. The distribution (skew, etc.) of the values from which the mean was derived was not taken into account, as time was limited.
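The comparison above can be written out as a short calculation. This sketch assumes the rounded mean of 87 deaths per year and the 61-year span quoted in the text, so its product may differ slightly from a figure derived from an unrounded mean.

```r
mean_deaths_per_year <- 87   # rounded mean quoted in the text (assumed exact here)
n_years <- 61                # number of years used in the text
dataset_total <- 5633        # tornado deaths in the storm data set

# Expected total if the mean held for every year
estimate <- mean_deaths_per_year * n_years

# Gap between the data set and the estimate, relative to the data set
rel_diff <- (dataset_total - estimate) / dataset_total
cat("estimate:", estimate, "\n")
cat("relative difference (%):", round(100 * rel_diff, 1), "\n")
```

A gap of a few percent is consistent with the two sources agreeing in order of magnitude, though it says nothing about year-to-year variation.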