We wish to examine the effects of damaging weather events on population health and economic prosperity. The data used in this study are published by the National Oceanic & Atmospheric Administration (NOAA), which documents the occurrence of storms and other significant weather events, both damaging and rare. Most of the data in this dataset are provided by the National Weather Service, though other sources are included.
We will examine this dataset to determine both the health and the economic effects of storms: which types of events are most costly in human lives and which are most costly monetarily in terms of physical damages. Recorded data on injuries, fatalities, crop damages, and property damages for all recorded events will be used to answer these questions.
The data analyzed here are made available by NOAA. The file was loaded in RStudio via the read.csv() command. A summary of the data is given below as a quick check that the file was loaded as expected.
StormData=read.csv('repdata-data-StormData.csv')
summary(StormData)
## STATE__ BGN_DATE BGN_TIME
## Min. : 1.0 5/25/2011 0:00:00: 1202 12:00:00 AM: 10163
## 1st Qu.:19.0 4/27/2011 0:00:00: 1193 06:00:00 PM: 7350
## Median :30.0 6/9/2011 0:00:00 : 1030 04:00:00 PM: 7261
## Mean :31.2 5/30/2004 0:00:00: 1016 05:00:00 PM: 6891
## 3rd Qu.:45.0 4/4/2011 0:00:00 : 1009 12:00:00 PM: 6703
## Max. :95.0 4/2/2006 0:00:00 : 981 03:00:00 PM: 6700
## (Other) :895866 (Other) :857229
## TIME_ZONE COUNTY COUNTYNAME STATE
## CST :547493 Min. : 0 JEFFERSON : 7840 TX : 83728
## EST :245558 1st Qu.: 31 WASHINGTON: 7603 KS : 53440
## MST : 68390 Median : 75 JACKSON : 6660 OK : 46802
## PST : 28302 Mean :101 FRANKLIN : 6256 MO : 35648
## AST : 6360 3rd Qu.:131 LINCOLN : 5937 IA : 31069
## HST : 2563 Max. :873 MADISON : 5632 NE : 30271
## (Other): 3631 (Other) :862369 (Other):621339
## EVTYPE BGN_RANGE BGN_AZI
## HAIL :288661 Min. : 0 :547332
## TSTM WIND :219940 1st Qu.: 0 N : 86752
## THUNDERSTORM WIND: 82563 Median : 0 W : 38446
## TORNADO : 60652 Mean : 1 S : 37558
## FLASH FLOOD : 54277 3rd Qu.: 1 E : 33178
## FLOOD : 25326 Max. :3749 NW : 24041
## (Other) :170878 (Other):134990
## BGN_LOCATI END_DATE END_TIME
## :287743 :243411 :238978
## COUNTYWIDE : 19680 4/27/2011 0:00:00: 1214 06:00:00 PM: 9802
## Countywide : 993 5/25/2011 0:00:00: 1196 05:00:00 PM: 8314
## SPRINGFIELD : 843 6/9/2011 0:00:00 : 1021 04:00:00 PM: 8104
## SOUTH PORTION: 810 4/4/2011 0:00:00 : 1007 12:00:00 PM: 7483
## NORTH PORTION: 784 5/30/2004 0:00:00: 998 11:59:00 PM: 7184
## (Other) :591444 (Other) :653450 (Other) :622432
## COUNTY_END COUNTYENDN END_RANGE END_AZI
## Min. :0 Mode:logical Min. : 0 :724837
## 1st Qu.:0 NA's:902297 1st Qu.: 0 N : 28082
## Median :0 Median : 0 S : 22510
## Mean :0 Mean : 1 W : 20119
## 3rd Qu.:0 3rd Qu.: 0 E : 20047
## Max. :0 Max. :925 NE : 14606
## (Other): 72096
## END_LOCATI LENGTH WIDTH F
## :499225 Min. : 0.0 Min. : 0 Min. :0
## COUNTYWIDE : 19731 1st Qu.: 0.0 1st Qu.: 0 1st Qu.:0
## SOUTH PORTION : 833 Median : 0.0 Median : 0 Median :1
## NORTH PORTION : 780 Mean : 0.2 Mean : 8 Mean :1
## CENTRAL PORTION: 617 3rd Qu.: 0.0 3rd Qu.: 0 3rd Qu.:1
## SPRINGFIELD : 575 Max. :2315.0 Max. :4400 Max. :5
## (Other) :380536 NA's :843563
## MAG FATALITIES INJURIES PROPDMG
## Min. : 0 Min. : 0 Min. : 0.0 Min. : 0
## 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.0 1st Qu.: 0
## Median : 50 Median : 0 Median : 0.0 Median : 0
## Mean : 47 Mean : 0 Mean : 0.2 Mean : 12
## 3rd Qu.: 75 3rd Qu.: 0 3rd Qu.: 0.0 3rd Qu.: 0
## Max. :22000 Max. :583 Max. :1700.0 Max. :5000
##
## PROPDMGEXP CROPDMG CROPDMGEXP WFO
## :465934 Min. : 0.0 :618413 :142069
## K :424665 1st Qu.: 0.0 K :281832 OUN : 17393
## M : 11330 Median : 0.0 M : 1994 JAN : 13889
## 0 : 216 Mean : 1.5 k : 21 LWX : 13174
## B : 40 3rd Qu.: 0.0 0 : 19 PHI : 12551
## 5 : 28 Max. :990.0 B : 9 TSA : 12483
## (Other): 84 (Other): 9 (Other):690738
## STATEOFFIC
## :248769
## TEXAS, North : 12193
## ARKANSAS, Central and North Central: 11738
## IOWA, Central : 11345
## KANSAS, Southwest : 11212
## GEORGIA, North and Central : 11120
## (Other) :595920
## ZONENAMES
## :594029
## :205988
## GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M : 639
## GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA : 592
## JEFFERSON - JEFFERSON : 303
## MADISON - MADISON : 302
## (Other) :100444
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_
## Min. : 0 Min. :-14451 Min. : 0 Min. :-14455
## 1st Qu.:2802 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0
## Median :3540 Median : 8707 Median : 0 Median : 0
## Mean :2875 Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.:4019 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. :9706 Max. : 17124 Max. :9706 Max. :106220
## NA's :47 NA's :40
## REMARKS REFNUM
## :287433 Min. : 1
## : 24013 1st Qu.:225575
## Trees down.\n : 1110 Median :451149
## Several trees were blown down.\n : 568 Mean :451149
## Trees were downed.\n : 446 3rd Qu.:676723
## Large trees and power lines were blown down.\n: 432 Max. :902297
## (Other) :588295
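A lighter check than the full summary is to look only at the dimensions, column names, and first rows after loading. The following is a minimal sketch that writes a tiny stand-in CSV to a temporary file; the file contents and the name `toy` are made up for illustration and are not part of the NOAA data.

```r
# Write a tiny stand-in CSV to a temporary file (made-up rows)
tmp <- tempfile(fileext = ".csv")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,12",
             "HAIL,0,0"), tmp)

toy <- read.csv(tmp, stringsAsFactors = FALSE)

# Quick structural checks: row/column counts, column names, first rows
print(dim(toy))
print(names(toy))
print(head(toy, 2))
```

The same three calls could be applied to StormData in place of summary() when only the structure needs confirming.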
The variables of particular interest in this analysis are as follows:
- EVTYPE: Type of storm/weather event
- FATALITIES: Number of deaths due to the event
- INJURIES: Number of injuries due to the event
- PROPDMG: Property damage in dollars (part 1)
- PROPDMGEXP: Indicator of the exponent of the property damage (part 2)
- CROPDMG: Damage to crops in dollars (part 1)
- CROPDMGEXP: Indicator of the exponent of the crop damage (part 2)
CROPDMGEXP and PROPDMGEXP are exponents described by the following key:
- ? or 0 means there is no multiplier (exp=0)
- k or K means x 1000 (exp=3)
- m or M means x 1000000 (exp=6)
- b or B means x 1000000000 (exp=9)
The full value of property damage for an event is given by PROPDMG x 10^exp. The full value of crop damage for an event is given by CROPDMG x 10^exp.
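The key above can be applied with a named lookup vector. The following is a minimal sketch on a small made-up data frame; the names `multiplier` and `toy` and all toy values are assumptions for illustration, not NOAA records. Codes outside the key (blank, digits, other symbols) would come out as NA here, which is one possible choice rather than the only one.

```r
# Lookup from exponent code to multiplier, following the key.
# "?" and "0" mean no multiplier, i.e. 10^0 = 1.
multiplier <- c("?" = 1, "0" = 1, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Toy records (made-up values, for illustration only)
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "m", "B", "0"),
                  stringsAsFactors = FALSE)

# Upper-case the code so that "k" and "K" are treated alike,
# then look up the multiplier by name; unname() drops the lookup labels.
toy$Pexp <- unname(multiplier[toupper(toy$PROPDMGEXP)])

# Full dollar value = base amount x multiplier
toy$PropDollars <- toy$PROPDMG * toy$Pexp
print(toy)
```

Note that the full value is a product of the base amount and the multiplier, so the sketch multiplies the two columns.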
The first question we want to answer is: “Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?”
To answer this, we focus on the number of fatalities and injuries caused by each event type. The first step is to find the total number (sum) of fatalities and of injuries for each of the 985 types of events, across all 902297 recorded events.
fatalitysum=tapply(StormData$FATALITIES,StormData$EVTYPE,sum,na.rm=TRUE)
injurysum=tapply(StormData$INJURIES,StormData$EVTYPE,sum,na.rm=TRUE)
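As a small illustration of how tapply() produces per-type totals, the sketch below uses a made-up data frame; `toy`, `toysum`, and the values are assumptions for illustration only.

```r
# Toy records (made-up values)
toy <- data.frame(EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
                  FATALITIES = c(3, 0, 2, 1, NA))

# Sum FATALITIES within each EVTYPE; na.rm = TRUE is passed on to sum(),
# so the missing HAIL value is ignored rather than turning the total into NA
toysum <- tapply(toy$FATALITIES, toy$EVTYPE, sum, na.rm = TRUE)
print(toysum)
```

The result is a named vector with one entry per event type, which is why it can later be indexed and ordered by name.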
We can look at summaries of these sums to see the distribution of fatalities, of injuries, and of their combined total. Notice that injuries occur in much higher numbers than fatalities.
summary(fatalitysum)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 15 0 5630
summary(injurysum)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 143 0 91300
summary(fatalitysum+injurysum)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 0 0 158 0 97000
We build a data frame sorted by the total number of fatalities and injuries and then examine the distribution of the 15 most harmful event types.
df=data.frame(injurysum[order(injurysum+fatalitysum,decreasing=TRUE)],
fatalitysum[order(injurysum+fatalitysum,decreasing=TRUE)],
names(fatalitysum)[order(injurysum+fatalitysum,decreasing=TRUE)])
colnames(df)=c("Injuries","Fatalities","Type")
par(mar=c(12, 4, 1.5, 0.5))
indices=1:15
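# Total bar (green) is drawn first; names.arg labels each bar with its event type
barplot((df$Fatalities[indices]+df$Injuries[indices]),names.arg=as.character(df$Type[indices]),col='green',ylab="Number of People",las=2, cex.axis=.75)
# Injuries (blue) are overlaid, leaving the fatalities visible in green on top
barplot((df$Injuries[indices]),col='blue',add=TRUE,las=2,cex.axis=.75)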
legend("topright",legend=c("Fatalities","Injuries"),fill=c("green","blue"),cex=.8)
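The overlay above draws the total bars and then covers the injury portion. An alternative sketch passes a two-row matrix to barplot(), so that stacking and the legend are handled in a single call. The vectors `inj` and `fat` hold made-up numbers for illustration, and the plot is sent to a temporary PDF only so that the sketch also runs non-interactively.

```r
# Made-up per-type totals (illustration only)
inj <- c(TORNADO = 900, HEAT = 200, FLOOD = 150)
fat <- c(TORNADO = 60,  HEAT = 90,  FLOOD = 40)

# One row per series; columns are event types
counts <- rbind(Injuries = inj, Fatalities = fat)

# Draw to a temporary PDF so the sketch does not need an open display
pdf(file = tempfile(fileext = ".pdf"))
par(mar = c(8, 4, 1.5, 0.5))
# With a matrix and the default beside = FALSE, rows are stacked within each bar
mids <- barplot(counts, col = c("blue", "green"), las = 2,
                ylab = "Number of People", cex.axis = 0.75,
                legend.text = rownames(counts))
dev.off()
```

Because the column names of the matrix are used as bar labels, the event types appear on the axis without further arguments.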
Given the present data, tornadoes are responsible for more deaths and injuries than any other type of event. In fact, tornadoes account for 37.2% of all deaths (5,633 out of 15,145 total) and 65% of all injuries (91,346 out of 140,528 total). Besides tornadoes, heat events (heat, extreme heat) and various types of wind (thunderstorm wind, recorded as both TSTM WIND and THUNDERSTORM WIND, etc.) also account for many of the remaining injuries and deaths.
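The percentages quoted above follow directly from the totals. A short worked check, using only the four counts stated in the text (the variable names are illustrative):

```r
# Totals quoted in the text
tornado_deaths   <- 5633;  all_deaths   <- 15145
tornado_injuries <- 91346; all_injuries <- 140528

# Share of each total attributable to tornadoes, in percent
death_share  <- round(100 * tornado_deaths / all_deaths, 1)
injury_share <- round(100 * tornado_injuries / all_injuries, 1)
print(c(deaths = death_share, injuries = injury_share))
```

On the full data, the same share could be obtained from the per-type sums, for example 100*fatalitysum["TORNADO"]/sum(fatalitysum).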
The second question we want to answer is: “Across the United States, which types of events have the greatest economic consequences?”
We approach this in much the same way as the health effects of these events. We focus on the monetary property and crop damages caused by each event type. The first step is to find the total amount (sum) of crop and property damage in dollars for each of the 985 types of events, across all 902297 recorded events.
Before this, we must convert the two columns defining crop damages (CROPDMG and CROPDMGEXP) to a single dollar amount and the two columns defining property damages (PROPDMG and PROPDMGEXP) to a single dollar amount.
Below, we replace the letter codes with numerical multipliers and find the total property damage and total crop damage in dollars for each event type.
StormData$Cexp=StormData$CROPDMGEXP
StormData$Cexp=gsub("\\?",0,StormData$Cexp)
StormData$Cexp=gsub("k",10^3,StormData$Cexp,ignore.case=TRUE)
StormData$Cexp=gsub("m",10^6,StormData$Cexp,ignore.case=TRUE)
StormData$Cexp=gsub("b",10^9,StormData$Cexp,ignore.case=TRUE)
StormData$Cexp=as.numeric(StormData$Cexp)
StormData$Pexp=StormData$PROPDMGEXP
StormData$Pexp=gsub("\\?",0,StormData$Pexp)
StormData$Pexp=gsub("k",10^3,StormData$Pexp,ignore.case=TRUE)
StormData$Pexp=gsub("m",10^6,StormData$Pexp,ignore.case=TRUE)
StormData$Pexp=gsub("b",10^9,StormData$Pexp,ignore.case=TRUE)
StormData$Pexp=as.numeric(StormData$Pexp)
## Warning: NAs introduced by coercion
# Full dollar value = base amount x multiplier, summed within each event type
cropdam=tapply(StormData$CROPDMG*StormData$Cexp,StormData$EVTYPE,sum,na.rm=TRUE)
propdam=tapply(StormData$PROPDMG*StormData$Pexp,StormData$EVTYPE,sum,na.rm=TRUE)
A summary of each, and of their sum, is given to show the distribution of values.
summary(cropdam)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 1.14e+07 0.00e+00 4.14e+09
summary(propdam)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 5.26e+07 1.50e+03 1.20e+10
summary(cropdam+propdam)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00e+00 0.00e+00 0.00e+00 6.40e+07 2.03e+03 1.31e+10
We sort the dataframe by the total amount of crop and property damage and then examine the distribution of the top 15 most costly events.
df2=data.frame(propdam[order(propdam+cropdam,decreasing=TRUE)],
cropdam[order(propdam+cropdam,decreasing=TRUE)],
names(cropdam)[order(propdam+cropdam,decreasing=TRUE)])
colnames(df2)=c("PropDamage","CropDamage","Type")
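par(mar=c(12, 4, 1.5, 0.5))
indices=1:15
# Total bar (green) is drawn first; names.arg labels each bar with its event type
barplot((df2$PropDamage[indices]+df2$CropDamage[indices]),
names.arg=as.character(df2$Type[indices]),
col='green',ylab="Dollars",las=2, cex.axis=.75)
# Property damage (blue) is overlaid, leaving crop damage visible in green on top
barplot((df2$PropDamage[indices]),
col='blue',add=TRUE,las=2,cex.axis=.75)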
legend("topright",legend=c("Crop Damage","Property Damage"),fill=c("green","blue"),cex=.8)
Given the present data, hurricanes/typhoons are responsible for more monetary damage than any other type of event, followed by tornadoes, floods, and droughts. Hurricanes and typhoons (Hurricane, Hurricane/Typhoon, Hurricane Opal) are responsible for 28.8% of all crop and property damages. Tornadoes are responsible for 12.1%, floods and flash floods for 15.3%, and droughts for 6.6%. For most of the costliest events, property damage far outweighs crop damage. For drought, however, crop damage far exceeds property damage.
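Combining related labels such as Hurricane, Hurricane/Typhoon, and Hurricane Opal requires matching on the event name. The following is a minimal sketch, assuming that a pattern match on the words HURRICANE or TYPHOON is an acceptable grouping rule; `dam`, `is_hurr`, `hurr_share`, and the dollar values are made up for illustration.

```r
# Made-up damage totals by event type, in dollars (illustration only)
dam <- c("HURRICANE" = 4e9, "HURRICANE/TYPHOON" = 6e9,
         "HURRICANE OPAL" = 1e9, "TORNADO" = 5e9, "DROUGHT" = 4e9)

# Flag every type whose name mentions a hurricane or typhoon
is_hurr <- grepl("HURRICANE|TYPHOON", names(dam), ignore.case = TRUE)

# Combined share of the total, in percent
hurr_share <- round(100 * sum(dam[is_hurr]) / sum(dam), 1)
print(hurr_share)
```

The same pattern applied to names(propdam+cropdam) would give a grouped share on the real totals, though which labels belong together remains a judgment call.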
Floods and tornadoes are quite common and both are quite costly, though tornadoes have a hugely disproportionate effect on health compared with all other events. Different types of wind events are also quite common and lead to many injuries (though not many fatalities). Hurricanes are not common, but are very costly where and when they occur.
With regard to human lives, it would be prudent to emphasize better techniques for identification and prediction, alerting the public, and addressing safety procedures for tornadoes, winds, and heat. Monetarily, it is wise to plan for large damages from tornadoes, floods, and droughts, as these are all quite common and costly. Hurricanes are the most costly due to widespread property damage, but are rarer and generally known ahead of time, allowing for special preparation.