Severe weather events recorded by the National Climatic Data Center (NCDC; 1950-2011) were analyzed to estimate their public health and economic impact. Estimates of the impact of specific weather events can help allocate resources more efficiently to reduce deaths and financial losses. All analysis steps are listed below so that another researcher can reproduce the results, which are discussed at the end of this document.
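Read in the NCDC storm database and inspect its structure (902,297 observations of 37 variables); missing values are checked below for each set of columns used. For more information and access to the data see: http://www.ncdc.noaa.gov/stormevents/details.jsp
# Keep text columns as factors (R >= 4.0.0 defaults to FALSE); the levels() recoding below needs them
storm=read.csv('O:/jkatz/programming/repdata_data_StormData.csv',stringsAsFactors=TRUE)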
PUBLIC HEALTH IMPACT
The weather event type (column 8, EVTYPE) has 985 distinct values. Many are variants of the same event (e.g. TSTM WIND and THUNDERSTORM WIND), so like events should ideally be aggregated for a more accurate analysis; this was not done here. The number of deaths (column 23, FATALITIES) was used to measure the impact of events on public health; injuries were not used because injury counts are more subjective. Storm dataset attributes are defined here: http://ire.org/nicar/database-library/databases/storm-events/
No missing values in the event type and fatality columns:
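A minimal sketch of the aggregation step described above, assuming a few illustrative rewrite rules (trim and uppercase, expand TSTM, singularize a trailing WINDS, merge RIP CURRENTS). The helper `normalize_evtype` and the toy data are hypothetical; a real mapping of all 985 labels would need many more rules.

```r
# Hypothetical sketch: collapse spelling variants of EVTYPE into shared labels.
# The rules below are illustrative assumptions, not a complete NCDC mapping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))   # drop stray spaces, unify case
  x <- gsub("\\s+", " ", x)               # collapse internal whitespace
  x <- sub("^TSTM", "THUNDERSTORM", x)    # expand the TSTM abbreviation
  x <- sub("WINDS$", "WIND", x)           # singularize a trailing WINDS
  x <- sub("^RIP CURRENTS$", "RIP CURRENT", x)
  x
}

# Toy data (hypothetical): deaths summed after normalization
ev <- c(" TSTM WIND", "THUNDERSTORM WINDS", "Rip Currents", "RIP CURRENT", "TORNADO")
deaths_toy <- c(2, 3, 1, 4, 10)
normalize_evtype(ev)
tapply(deaths_toy, normalize_evtype(ev), sum)
```

Applying such a function to `storm$EVTYPE` before the `tapply()` call below would merge counts such as TSTM WIND and THUNDERSTORM WIND, so rankings could change.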
sum(complete.cases(storm[c(8,23)]))
## [1] 902297
deaths=tapply(storm$FATALITIES,storm$EVTYPE,sum);str(deaths);summary(deaths)
## num [1:985(1d)] 0 0 0 0 0 0 0 0 0 0 ...
## - attr(*, "dimnames")=List of 1
## ..$ : chr [1:985] " HIGH SURF ADVISORY" " COASTAL FLOOD" " FLASH FLOOD" " LIGHTNING" ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 15 0 5630
The 30 weather events that caused the most deaths, and the total number of deaths from all events:
rev(sort(deaths))[1:30];sum(deaths)
## TORNADO EXCESSIVE HEAT FLASH FLOOD
## 5633 1903 978
## HEAT LIGHTNING TSTM WIND
## 937 816 504
## FLOOD RIP CURRENT HIGH WIND
## 470 368 248
## AVALANCHE WINTER STORM RIP CURRENTS
## 224 206 204
## HEAT WAVE EXTREME COLD THUNDERSTORM WIND
## 172 160 133
## HEAVY SNOW EXTREME COLD/WIND CHILL STRONG WIND
## 127 125 103
## HIGH SURF BLIZZARD HEAVY RAIN
## 101 101 98
## EXTREME HEAT COLD/WIND CHILL ICE STORM
## 96 95 89
## WILDFIRE THUNDERSTORM WINDS HURRICANE/TYPHOON
## 75 64 64
## FOG HURRICANE TROPICAL STORM
## 62 61 58
## [1] 15145
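The `tapply()` plus `rev(sort())` pattern above (and reused for costs below) can be wrapped in a small helper. This is a sketch; `rank_totals` and the toy data are hypothetical. It uses `sort(decreasing = TRUE)`, which gives the same ranking as `rev(sort())` except for the order of tied values.

```r
# Minimal sketch of the tapply()/sort() ranking pattern as a reusable helper.
# 'rank_totals' and the toy data are hypothetical.
rank_totals <- function(values, groups, n = 30) {
  totals <- tapply(values, groups, sum)          # one total per group
  ranked <- sort(totals, decreasing = TRUE)      # largest totals first
  ranked[seq_len(min(n, length(ranked)))]        # keep at most n groups
}

toy_deaths <- c(3, 0, 5, 1, 2)
toy_event  <- c("HEAT", "HAIL", "TORNADO", "HEAT", "TORNADO")
rank_totals(toy_deaths, toy_event, n = 2)
```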
FINANCIAL IMPACT
storm_econ=storm[c(8,25:28)];head(storm_econ);str(storm_econ)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
## 'data.frame': 902297 obs. of 5 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## Subset: event type plus property and crop damage columns
sum(complete.cases(storm_econ))
## [1] 902297
##No missing values
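## Replace exponent codes with numeric multipliers. levels<- relabels by position,
## so this depends on the (locale-specific) level order of the original session:
## B -> 1e9, H -> 1e2, K -> 1e3, M -> 1e6; lowercase h/k/m, digits, symbols and "" -> 0
levels(storm_econ$PROPDMGEXP)=c(rep(0,13),1E9,0,1E2,1E3,0,1E6)
levels(storm_econ$CROPDMGEXP)=c(rep(0,4),1E9,0,1E3,0,1E6)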
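Because the positional `levels()` recoding above depends on how the factor levels happen to be sorted, a lookup by code name is less fragile. The sketch below is hypothetical (`exp_multiplier` is not part of the original analysis) and assumes that H/K/M/B are the only meaningful codes and that case should be ignored, so lowercase "h", "k" and "m" are counted; the resulting totals would therefore differ slightly from the figures reported here.

```r
# Hypothetical sketch: map damage-exponent codes to multipliers by name.
# Assumption: only H/K/M/B are meaningful and case is ignored; every other
# code (digits, "+", "-", "?", "") maps to 0, as in the recoding above.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  m <- unname(lookup[toupper(as.character(code))])  # unmatched codes give NA
  m[is.na(m)] <- 0
  m
}

exp_multiplier(c("K", "m", "B", "", "?", "5"))
```

Applied to the untouched columns in `storm`, the property damage would be `storm$PROPDMG * exp_multiplier(storm$PROPDMGEXP)`.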
storm_econ$propcash=storm_econ$PROPDMG*as.numeric(as.character(storm_econ$PROPDMGEXP))
head(storm_econ)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP propcash
## 1 TORNADO 25.0 1000 0 0 25000
## 2 TORNADO 2.5 1000 0 0 2500
## 3 TORNADO 25.0 1000 0 0 25000
## 4 TORNADO 2.5 1000 0 0 2500
## 5 TORNADO 2.5 1000 0 0 2500
## 6 TORNADO 2.5 1000 0 0 2500
storm_econ$cropcash=storm_econ$CROPDMG*as.numeric(as.character(storm_econ$CROPDMGEXP))
summary(storm_econ$cropcash);summary(storm_econ$propcash)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 5.44e+04 0.00e+00 5.00e+09
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 4.74e+05 5.00e+02 1.15e+11
storm_econ$totcash=storm_econ$cropcash+storm_econ$propcash
summary(storm_econ$totcash)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 5.28e+05 1.00e+03 1.15e+11
head(storm_econ)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP propcash cropcash totcash
## 1 TORNADO 25.0 1000 0 0 25000 0 25000
## 2 TORNADO 2.5 1000 0 0 2500 0 2500
## 3 TORNADO 25.0 1000 0 0 25000 0 25000
## 4 TORNADO 2.5 1000 0 0 2500 0 2500
## 5 TORNADO 2.5 1000 0 0 2500 0 2500
## 6 TORNADO 2.5 1000 0 0 2500 0 2500
cost=tapply(storm_econ$totcash,storm$EVTYPE,sum);str(cost);summary(cost)
## num [1:985(1d)] 200000 0 50000 0 8100000 8000 0 0 5000 0 ...
## - attr(*, "dimnames")=List of 1
## ..$ : chr [1:985] " HIGH SURF ADVISORY" " COASTAL FLOOD" " FLASH FLOOD" " LIGHTNING" ...
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 4.84e+08 8.50e+04 1.50e+11
## The 30 weather events that caused the most damage (property plus crop, in dollars), and the total
rev(sort(cost))[1:30];sum(cost)
## FLOOD HURRICANE/TYPHOON
## 1.503e+11 7.191e+10
## TORNADO STORM SURGE
## 5.734e+10 4.332e+10
## HAIL FLASH FLOOD
## 1.875e+10 1.756e+10
## DROUGHT HURRICANE
## 1.502e+10 1.461e+10
## RIVER FLOOD ICE STORM
## 1.015e+10 8.967e+09
## TROPICAL STORM WINTER STORM
## 8.382e+09 6.715e+09
## HIGH WIND WILDFIRE
## 5.909e+09 5.061e+09
## TSTM WIND STORM SURGE/TIDE
## 5.039e+09 4.642e+09
## THUNDERSTORM WIND HURRICANE OPAL
## 3.898e+09 3.162e+09
## WILD/FOREST FIRE HEAVY RAIN/SEVERE WEATHER
## 3.109e+09 2.500e+09
## THUNDERSTORM WINDS TORNADOES, TSTM WIND, HAIL
## 1.924e+09 1.602e+09
## HEAVY RAIN EXTREME COLD
## 1.428e+09 1.361e+09
## SEVERE THUNDERSTORM FROST/FREEZE
## 1.206e+09 1.104e+09
## HEAVY SNOW LIGHTNING
## 1.067e+09 9.408e+08
## BLIZZARD HIGH WINDS
## 7.713e+08 6.490e+08
## [1] 4.764e+11
## i.e. $476,373,500,510, about $476 billion in total
## Share of total damage caused by the 10 costliest event types
sum(rev(sort(cost))[1:10])/sum(cost)
## [1] 0.8564
Create dataframe with deaths by time period
## Compare deaths across decades to look for changes over time (the final period, 2000-2011, spans 12 years)
death_dates=storm[c(2,8,23)];head(death_dates)
## BGN_DATE EVTYPE FATALITIES
## 1 4/18/1950 0:00:00 TORNADO 0
## 2 4/18/1950 0:00:00 TORNADO 0
## 3 2/20/1951 0:00:00 TORNADO 0
## 4 6/8/1951 0:00:00 TORNADO 0
## 5 11/15/1951 0:00:00 TORNADO 0
## 6 11/15/1951 0:00:00 TORNADO 0
##keep only date, evtype, and deaths
#small=as.character(death_dates[1:5,1])
#test dataset
extr_year=function(x){
  # "4/18/1950 0:00:00" -> "4", "18", "1950 0:00:00"
  smaller=strsplit(x,"/")
  # "1950 0:00:00" -> "1950", "0:00:00"; return the year part
  smallest=strsplit(smaller[[1]][3]," ")
  smallest[[1]][1]
}
##create function and apply to entire date column
death_dates$year=sapply(as.character(death_dates$BGN_DATE),extr_year);head(death_dates)
## BGN_DATE EVTYPE FATALITIES year
## 1 4/18/1950 0:00:00 TORNADO 0 1950
## 2 4/18/1950 0:00:00 TORNADO 0 1950
## 3 2/20/1951 0:00:00 TORNADO 0 1951
## 4 6/8/1951 0:00:00 TORNADO 0 1951
## 5 11/15/1951 0:00:00 TORNADO 0 1951
## 6 11/15/1951 0:00:00 TORNADO 0 1951
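`extr_year()` splits each date string separately, so `sapply()` calls it 902,297 times. A vectorised alternative is to capture the four-digit year with `sub()`. This is a sketch assuming every BGN_DATE value has the m/d/yyyy layout shown above; `extract_year` is a hypothetical name, and it returns integers, so `as.numeric()` is no longer needed before `cut()`.

```r
# Hypothetical vectorised alternative to extr_year(): capture the four-digit
# year after the second "/" in strings such as "4/18/1950 0:00:00".
extract_year <- function(x) {
  as.integer(sub("^[0-9]{1,2}/[0-9]{1,2}/([0-9]{4}).*$", "\\1", as.character(x)))
}

extract_year(c("4/18/1950 0:00:00", "11/15/1951 0:00:00"))
```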
death_dates$year_periods=cut(as.numeric(death_dates$year)
,breaks=c(1949,1959,1969,1979,1989,1999,2011),labels=c("1950-1959","1960-1969","1970-1979",
"1980-1989","1990-1999","2000-2011"))
tail(death_dates)
## BGN_DATE EVTYPE FATALITIES year year_periods
## 902292 11/28/2011 0:00:00 WINTER WEATHER 0 2011 2000-2011
## 902293 11/30/2011 0:00:00 HIGH WIND 0 2011 2000-2011
## 902294 11/10/2011 0:00:00 HIGH WIND 0 2011 2000-2011
## 902295 11/8/2011 0:00:00 HIGH WIND 0 2011 2000-2011
## 902296 11/9/2011 0:00:00 BLIZZARD 0 2011 2000-2011
## 902297 11/28/2011 0:00:00 HEAVY SNOW 0 2011 2000-2011
## Cut the year column into decades (the last interval, 2000-2011, spans 12 years)
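To show how `cut()` assigns years to the periods used above: its intervals are right-closed by default, so the break 1949 to 1959 covers 1950 through 1959, and any year outside the breaks becomes `NA`. The toy years below are hypothetical; the breaks and labels are the ones from the code above.

```r
# Right-closed intervals: (1949,1959] -> "1950-1959", ..., (1999,2011] -> "2000-2011"
breaks <- c(1949, 1959, 1969, 1979, 1989, 1999, 2011)
labels <- c("1950-1959", "1960-1969", "1970-1979",
            "1980-1989", "1990-1999", "2000-2011")
years <- c(1950, 1959, 1960, 1999, 2000, 2011, 2012)   # toy years
periods <- cut(years, breaks = breaks, labels = labels)
data.frame(years, periods)   # 2012 falls outside the breaks and becomes NA
```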
library(plyr)
deaths2=ddply(death_dates,.(year_periods,EVTYPE),summarize,deaths=sum(FATALITIES));str(deaths2)
## 'data.frame': 1121 obs. of 3 variables:
## $ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 1 1 1 2 2 2 3 3 3 4 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 244 834 856 244 834 856 244 834 856 244 ...
## $ deaths : num 0 1419 0 0 942 ...
##Sum fatalities by weather event type FOR EACH YEAR PERIOD
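The same per-period totals can be computed in base R with `aggregate()`, which avoids the dependency on the retired plyr package. This is a sketch on hypothetical toy data; like `ddply()` with its default `.drop = TRUE`, `aggregate()` returns only period/event combinations that actually occur.

```r
# Base-R sketch of the ddply() step: sum FATALITIES by period and event type.
toy <- data.frame(
  year_periods = c("1950-1959", "1950-1959", "1990-1999", "1990-1999", "1990-1999"),
  EVTYPE       = c("TORNADO", "TORNADO", "HEAT", "TORNADO", "HEAT"),
  FATALITIES   = c(4, 1, 7, 2, 3)
)
toy_by_period <- aggregate(FATALITIES ~ year_periods + EVTYPE, data = toy, FUN = sum)
names(toy_by_period)[3] <- "deaths"   # match the column name used by ddply above
toy_by_period
```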
deaths2$EVTYPE=as.character(deaths2$EVTYPE)
spl=split(deaths2,deaths2$year_periods);str(spl)
## List of 6
## $ 1950-1959:'data.frame': 3 obs. of 3 variables:
## ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 1 1 1
## ..$ EVTYPE : chr [1:3] "HAIL" "TORNADO" "TSTM WIND"
## ..$ deaths : num [1:3] 0 1419 0
## $ 1960-1969:'data.frame': 3 obs. of 3 variables:
## ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 2 2 2
## ..$ EVTYPE : chr [1:3] "HAIL" "TORNADO" "TSTM WIND"
## ..$ deaths : num [1:3] 0 942 0
## $ 1970-1979:'data.frame': 3 obs. of 3 variables:
## ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 3 3 3
## ..$ EVTYPE : chr [1:3] "HAIL" "TORNADO" "TSTM WIND"
## ..$ deaths : num [1:3] 0 998 0
## $ 1980-1989:'data.frame': 3 obs. of 3 variables:
## ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 4 4 4
## ..$ EVTYPE : chr [1:3] "HAIL" "TORNADO" "TSTM WIND"
## ..$ deaths : num [1:3] 0 522 177
## $ 1990-1999:'data.frame': 913 obs. of 3 variables:
## ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 5 5 5 5 5 5 5 5 5 5 ...
## ..$ EVTYPE : chr [1:913] " COASTAL FLOOD" " LIGHTNING" " TSTM WIND" " TSTM WIND (G45)" ...
## ..$ deaths : num [1:913] 0 0 0 0 0 0 0 0 0 1 ...
## $ 2000-2011:'data.frame': 196 obs. of 3 variables:
## ..$ year_periods: Factor w/ 6 levels "1950-1959","1960-1969",..: 6 6 6 6 6 6 6 6 6 6 ...
## ..$ EVTYPE : chr [1:196] " HIGH SURF ADVISORY" " FLASH FLOOD" " TSTM WIND" " WATERSPOUT" ...
## ..$ deaths : num [1:196] 0 0 0 0 0 0 0 0 0 179 ...
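## Top 10 event types by deaths within each of the last two periods (ranked within the period,
## rather than matching death counts, which could pull in unrelated rows from other periods)
nines=head(spl[[5]][order(spl[[5]]$deaths,decreasing=TRUE),],10)
zeros=head(spl[[6]][order(spl[[6]]$deaths,decreasing=TRUE),],10)
## 1950-1989 record only three event types, so keep all of those rows
top_deaths=rbind(deaths2[deaths2$year_periods %in% c("1950-1959","1960-1969","1970-1979","1980-1989"),],nines,zeros)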
Figure 1 shows the share of all weather-related deaths in the United States from 1950-2011 that is attributable to each event type. The figure focuses on the top ten event types, which together account for ~80% of total deaths.
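The ~80% figure can be checked directly from the top-10 death counts and the total of 15,145 deaths printed in the ranked list above; this worked computation only re-uses those printed values.

```r
# Top-10 death counts and total, copied from the ranked output above
top10_deaths <- c("TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
                  "HEAT" = 937, "LIGHTNING" = 816, "TSTM WIND" = 504, "FLOOD" = 470,
                  "RIP CURRENT" = 368, "HIGH WIND" = 248, "AVALANCHE" = 224)
total_deaths <- 15145
share <- sum(top10_deaths) / total_deaths   # 12,081 / 15,145
round(share, 3)                             # 0.798
```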
library(ggplot2)
per=rev(sort(deaths))[1:10]/sum(deaths);str(per);head(per)
## num [1:10(1d)] 0.3719 0.1257 0.0646 0.0619 0.0539 ...
## - attr(*, "dimnames")=List of 1
## ..$ : chr [1:10] "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD" "HEAT" ...
## TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING
## 0.37194 0.12565 0.06458 0.06187 0.05388
## TSTM WIND
## 0.03328
per_graph=data.frame(event=names(per),percentage=as.vector(per));str(per_graph)
## 'data.frame': 10 obs. of 2 variables:
## $ event : Factor w/ 10 levels "AVALANCHE","EXCESSIVE HEAT",..: 9 2 3 5 7 10 4 8 6 1
## $ percentage: num 0.3719 0.1257 0.0646 0.0619 0.0539 ...
per_graph=transform(per_graph,event = reorder(event, percentage))
## qplot() is deprecated in current ggplot2; geom_col() plots the precomputed shares directly
p=ggplot(per_graph,aes(x=event,y=percentage,fill=event))+geom_col()+
  scale_y_continuous(labels=scales::percent)+
  labs(x="Top 10 severe weather events",y="Percentage of deaths resulting from weather event",
       title="Figure 1. Deaths caused by severe weather events in the United States from 1950-2011")
p+theme(axis.text.x = element_text(angle = 90))
Figure 2 shows the share of total financial loss (property plus crop damage) in the United States from 1950-2011 that is attributable to each event type. The figure focuses on the top ten event types, which together account for ~86% of total financial loss.
library(ggplot2)
per=rev(sort(cost))[1:10]/sum(cost);str(per);head(per)
## num [1:10(1d)] 0.3156 0.151 0.1204 0.0909 0.0394 ...
## - attr(*, "dimnames")=List of 1
## ..$ : chr [1:10] "FLOOD" "HURRICANE/TYPHOON" "TORNADO" "STORM SURGE" ...
## FLOOD HURRICANE/TYPHOON TORNADO STORM SURGE
## 0.31555 0.15096 0.12037 0.09094
## HAIL FLASH FLOOD
## 0.03937 0.03687
per_graph=data.frame(event=names(per),percentage=as.vector(per));str(per_graph)
## 'data.frame': 10 obs. of 2 variables:
## $ event : Factor w/ 10 levels "DROUGHT","FLASH FLOOD",..: 3 6 10 9 4 2 1 5 8 7
## $ percentage: num 0.3156 0.151 0.1204 0.0909 0.0394 ...
per_graph=transform(per_graph,event = reorder(event, percentage))
## qplot() is deprecated in current ggplot2; geom_col() plots the precomputed shares directly
p=ggplot(per_graph,aes(x=event,y=percentage,fill=event))+geom_col()+
  scale_y_continuous(labels=scales::percent)+
  labs(x="Top 10 severe weather events",y="Percentage of financial loss resulting from weather event",
       title="Figure 2. Financial loss caused by severe weather events in the United States from 1950-2011")
p+theme(axis.text.x = element_text(angle = 90))
Figure 3 looks at the impact of weather events over time, to check whether certain emerging events have had a greater impact recently despite a smaller impact overall. Global warming or other factors could be the source of such emerging high-priority weather events.
top_deaths=transform(top_deaths,EVTYPE = reorder(EVTYPE,deaths))
p=ggplot(top_deaths,aes(x=EVTYPE,y=deaths,fill=EVTYPE))+geom_col()+
  labs(x="Top severe weather events per decade",y="Number of deaths resulting from weather event",
       title="Figure 3. Deaths caused by severe weather events in the United States by decade from 1950-2011")
p+facet_wrap(~year_periods, scales="free")+theme(axis.text.x = element_text(angle = 90))
A person in charge of allocating preventive resources to reduce the impact of severe weather events on public health and on crops and property can use the reproducible code above to channel resources more efficiently. For instance, Figures 1 and 2 show that tornadoes are both the leading cause of deaths and the third-largest source of financial loss, so a focus on reducing tornado impact seems appropriate. Additionally, Figure 3 shows that in the 1990s excessive heat caused the most deaths, in contrast to tornadoes, which caused the most deaths in every other decade. Note, however, that for 1950-1989 the database contains only three event types (HAIL, TORNADO and TSTM WIND, as the split output above shows), so tornado dominance in those decades partly reflects what was recorded. It would be interesting to research why excessive heat caused more deaths in the 1990s, and to model the likelihood that excessive heat overtakes tornadoes again in 2010-2020.
Mean annual tornado deaths (about 87) were obtained from http://www.ncdc.noaa.gov/oa/climate/sd/annsum1996.pdf and multiplied by 61 years, giving roughly 5,300 deaths, which is close to the 5,633 tornado deaths in the NCDC dataset used in this project. The distribution (skew, etc.) of the values from which the mean was derived was not taken into account because time was limited.
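The back-of-envelope check above can be written out as a worked computation. It uses the rounded mean of 87 deaths per year, so the products differ slightly from an unrounded estimate; it also shows 62 years, since 1950 through 2011 inclusive spans 62 calendar years.

```r
mean_deaths_per_year <- 87     # rounded mean annual tornado deaths, per the text above
reported <- 5633               # tornado deaths in the NCDC dataset
years <- c(61, 62)             # 61 as used in the text; 62 counts 1950-2011 inclusively
est <- mean_deaths_per_year * years
ratio <- round(est / reported, 3)
data.frame(years, est, ratio)  # 5307 (0.942) and 5394 (0.958)
```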