Synopsis

We examine the effects of damaging weather events on population health and economic prosperity. The data used in this study are published by the National Oceanic & Atmospheric Administration (NOAA), which documents the frequency of storms and significant weather events, both damaging and rare. Most of the data in this dataset are provided by the National Weather Service, though other sources are included.

We examine this dataset to determine both the health and economic effects of storms: which types of events are most costly in human lives and which are most costly in terms of physical damage. Recorded data on injuries, fatalities, crop damage, and property damage for all recorded events are used to answer these questions.



Data Processing

The data analyzed here are made available by NOAA. The file was loaded in RStudio with the read.csv() command. A summary of the data is given below as a quick check that the file was loaded as expected.

StormData=read.csv('repdata-data-StormData.csv')

summary(StormData)
##     STATE__                  BGN_DATE             BGN_TIME     
##  Min.   : 1.0   5/25/2011 0:00:00:  1202   12:00:00 AM: 10163  
##  1st Qu.:19.0   4/27/2011 0:00:00:  1193   06:00:00 PM:  7350  
##  Median :30.0   6/9/2011 0:00:00 :  1030   04:00:00 PM:  7261  
##  Mean   :31.2   5/30/2004 0:00:00:  1016   05:00:00 PM:  6891  
##  3rd Qu.:45.0   4/4/2011 0:00:00 :  1009   12:00:00 PM:  6703  
##  Max.   :95.0   4/2/2006 0:00:00 :   981   03:00:00 PM:  6700  
##                 (Other)          :895866   (Other)    :857229  
##    TIME_ZONE          COUNTY         COUNTYNAME         STATE       
##  CST    :547493   Min.   :  0   JEFFERSON :  7840   TX     : 83728  
##  EST    :245558   1st Qu.: 31   WASHINGTON:  7603   KS     : 53440  
##  MST    : 68390   Median : 75   JACKSON   :  6660   OK     : 46802  
##  PST    : 28302   Mean   :101   FRANKLIN  :  6256   MO     : 35648  
##  AST    :  6360   3rd Qu.:131   LINCOLN   :  5937   IA     : 31069  
##  HST    :  2563   Max.   :873   MADISON   :  5632   NE     : 30271  
##  (Other):  3631                 (Other)   :862369   (Other):621339  
##                EVTYPE         BGN_RANGE       BGN_AZI      
##  HAIL             :288661   Min.   :   0          :547332  
##  TSTM WIND        :219940   1st Qu.:   0   N      : 86752  
##  THUNDERSTORM WIND: 82563   Median :   0   W      : 38446  
##  TORNADO          : 60652   Mean   :   1   S      : 37558  
##  FLASH FLOOD      : 54277   3rd Qu.:   1   E      : 33178  
##  FLOOD            : 25326   Max.   :3749   NW     : 24041  
##  (Other)          :170878                  (Other):134990  
##          BGN_LOCATI                  END_DATE             END_TIME     
##               :287743                    :243411              :238978  
##  COUNTYWIDE   : 19680   4/27/2011 0:00:00:  1214   06:00:00 PM:  9802  
##  Countywide   :   993   5/25/2011 0:00:00:  1196   05:00:00 PM:  8314  
##  SPRINGFIELD  :   843   6/9/2011 0:00:00 :  1021   04:00:00 PM:  8104  
##  SOUTH PORTION:   810   4/4/2011 0:00:00 :  1007   12:00:00 PM:  7483  
##  NORTH PORTION:   784   5/30/2004 0:00:00:   998   11:59:00 PM:  7184  
##  (Other)      :591444   (Other)          :653450   (Other)    :622432  
##    COUNTY_END COUNTYENDN       END_RANGE      END_AZI      
##  Min.   :0    Mode:logical   Min.   :  0          :724837  
##  1st Qu.:0    NA's:902297    1st Qu.:  0   N      : 28082  
##  Median :0                   Median :  0   S      : 22510  
##  Mean   :0                   Mean   :  1   W      : 20119  
##  3rd Qu.:0                   3rd Qu.:  0   E      : 20047  
##  Max.   :0                   Max.   :925   NE     : 14606  
##                                            (Other): 72096  
##            END_LOCATI         LENGTH           WIDTH            F         
##                 :499225   Min.   :   0.0   Min.   :   0   Min.   :0       
##  COUNTYWIDE     : 19731   1st Qu.:   0.0   1st Qu.:   0   1st Qu.:0       
##  SOUTH PORTION  :   833   Median :   0.0   Median :   0   Median :1       
##  NORTH PORTION  :   780   Mean   :   0.2   Mean   :   8   Mean   :1       
##  CENTRAL PORTION:   617   3rd Qu.:   0.0   3rd Qu.:   0   3rd Qu.:1       
##  SPRINGFIELD    :   575   Max.   :2315.0   Max.   :4400   Max.   :5       
##  (Other)        :380536                                   NA's   :843563  
##       MAG          FATALITIES     INJURIES         PROPDMG    
##  Min.   :    0   Min.   :  0   Min.   :   0.0   Min.   :   0  
##  1st Qu.:    0   1st Qu.:  0   1st Qu.:   0.0   1st Qu.:   0  
##  Median :   50   Median :  0   Median :   0.0   Median :   0  
##  Mean   :   47   Mean   :  0   Mean   :   0.2   Mean   :  12  
##  3rd Qu.:   75   3rd Qu.:  0   3rd Qu.:   0.0   3rd Qu.:   0  
##  Max.   :22000   Max.   :583   Max.   :1700.0   Max.   :5000  
##                                                               
##    PROPDMGEXP        CROPDMG        CROPDMGEXP          WFO        
##         :465934   Min.   :  0.0          :618413          :142069  
##  K      :424665   1st Qu.:  0.0   K      :281832   OUN    : 17393  
##  M      : 11330   Median :  0.0   M      :  1994   JAN    : 13889  
##  0      :   216   Mean   :  1.5   k      :    21   LWX    : 13174  
##  B      :    40   3rd Qu.:  0.0   0      :    19   PHI    : 12551  
##  5      :    28   Max.   :990.0   B      :     9   TSA    : 12483  
##  (Other):    84                   (Other):     9   (Other):690738  
##                                STATEOFFIC    
##                                     :248769  
##  TEXAS, North                       : 12193  
##  ARKANSAS, Central and North Central: 11738  
##  IOWA, Central                      : 11345  
##  KANSAS, Southwest                  : 11212  
##  GEORGIA, North and Central         : 11120  
##  (Other)                            :595920  
##                                                                                                                                                                                                     ZONENAMES     
##                                                                                                                                                                                                          :594029  
##                                                                                                                                                                                                          :205988  
##  GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M                                                                                                                                         :   639  
##  GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA                                                                                                                                                       :   592  
##  JEFFERSON - JEFFERSON                                                                                                                                                                                   :   303  
##  MADISON - MADISON                                                                                                                                                                                       :   302  
##  (Other)                                                                                                                                                                                                 :100444  
##     LATITUDE      LONGITUDE        LATITUDE_E     LONGITUDE_    
##  Min.   :   0   Min.   :-14451   Min.   :   0   Min.   :-14455  
##  1st Qu.:2802   1st Qu.:  7247   1st Qu.:   0   1st Qu.:     0  
##  Median :3540   Median :  8707   Median :   0   Median :     0  
##  Mean   :2875   Mean   :  6940   Mean   :1452   Mean   :  3509  
##  3rd Qu.:4019   3rd Qu.:  9605   3rd Qu.:3549   3rd Qu.:  8735  
##  Max.   :9706   Max.   : 17124   Max.   :9706   Max.   :106220  
##  NA's   :47                      NA's   :40                     
##                                            REMARKS           REFNUM      
##                                                :287433   Min.   :     1  
##                                                : 24013   1st Qu.:225575  
##  Trees down.\n                                 :  1110   Median :451149  
##  Several trees were blown down.\n              :   568   Mean   :451149  
##  Trees were downed.\n                          :   446   3rd Qu.:676723  
##  Large trees and power lines were blown down.\n:   432   Max.   :902297  
##  (Other)                                       :588295

The variables of particular interest in this analysis are as follows:

- EVTYPE: Type of storm/weather event
- FATALITIES: Number of deaths due to the event
- INJURIES: Number of injuries due to the event
- PROPDMG: Property damage in dollars (part 1)
- PROPDMGEXP: Indicator of the exponent of the property damage (part 2)
- CROPDMG: Damage to crops in dollars (part 1)
- CROPDMGEXP: Indicator of the exponent of the crop damage (part 2)
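As a small illustration of narrowing a data frame to the variables listed above, the sketch below uses a made-up three-row stand-in for StormData. The names toy, keep, and toy_small are hypothetical and the values are invented; only the column names come from the dataset.

```r
# Toy stand-in for StormData; every value here is made up for illustration.
toy <- data.frame(
  STATE      = c("TX", "KS", "OK"),
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1),
  INJURIES   = c(10, 0, 3),
  PROPDMG    = c(25, 5, 1.5),
  PROPDMGEXP = c("K", "K", "M"),
  CROPDMG    = c(0, 2, 10),
  CROPDMGEXP = c("", "K", "K"),
  stringsAsFactors = FALSE
)

# The seven variables of interest named in the text
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Keep only those columns; STATE is dropped
toy_small <- toy[, keep]
str(toy_small)
```

Working on a narrower data frame is optional; it mainly keeps later summaries readable.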

CROPDMGEXP and PROPDMGEXP are exponents that are described by the following key:

- ? or 0 means there is no multiplier (exp=0)
- k or K means x 1000 (exp=3)
- m or M means x 1000000 (exp=6)
- b or B means x 1000000000 (exp=9)

The full value of property damage by the event is given by PROPDMG x 10^exp. The full value of damage to crops by the event is given by CROPDMG x 10^exp.
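The key and formula above can be sketched as a lookup from exponent code to power of ten. The helper exp_to_multiplier is hypothetical (it is not part of the original analysis), the damage figures are made up, and returning NA for codes outside the key is an assumption of this sketch.

```r
# Hypothetical helper: map an exponent code to its multiplier, 10^exp,
# using the key above. Codes that are not in the key give NA.
exp_to_multiplier <- function(code) {
  key <- c("?" = 0, "0" = 0, "K" = 3, "M" = 6, "B" = 9)
  unname(10^key[toupper(code)])
}

# Made-up base amounts and codes, for illustration only
dmg  <- c(25, 1.5, 2, 7)
code <- c("K", "m", "B", "0")

# Full dollar value = base amount x 10^exp
total <- dmg * exp_to_multiplier(code)
print(total)
```

Here the 25 with code K becomes 25,000 dollars, and the 7 with code 0 stays at 7 dollars because 10^0 is 1.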



Results

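This section addresses two questions: which types of events are most harmful to population health, and which have the greatest economic consequences.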

Health Consequences

The first question we want to answer is: “Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?”

To answer this, we focus on the number of fatalities and injuries caused by each event type. The first step is to find the total number (sum) of fatalities and injuries for each of the 985 types of events, across all 902297 recorded events.

fatalitysum=tapply(StormData$FATALITIES,StormData$EVTYPE,sum,na.rm=TRUE)
injurysum=tapply(StormData$INJURIES,StormData$EVTYPE,sum,na.rm=TRUE)
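To show what these tapply() calls do, here is a minimal sketch on made-up data. The data frame events and its values are invented for illustration; the call has the same shape as the ones above.

```r
# Five made-up events; one fatality count is missing (NA)
events <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 2, 1, NA)
)

# Split FATALITIES by EVTYPE and sum each group, skipping missing values
fatal_by_type <- tapply(events$FATALITIES, events$EVTYPE, sum, na.rm = TRUE)
print(fatal_by_type)
```

The result has one named entry per event type, so fatal_by_type["TORNADO"] gives the total for tornadoes (5 in this toy data).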

We can look at the summaries of these sums to see the distribution of fatalities, injuries, and both. Notice that the injuries occur in much higher numbers than fatalities.

summary(fatalitysum)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0      15       0    5630
summary(injurysum)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0     143       0   91300
summary(fatalitysum+injurysum)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0       0       0     158       0   97000

We build a data frame sorted by the total number of fatalities and injuries and then examine the distribution of the 15 most harmful event types.

# Order event types by combined injuries and fatalities, highest first
ord=order(injurysum+fatalitysum,decreasing=TRUE)
df=data.frame(injurysum[ord],fatalitysum[ord],names(fatalitysum)[ord])
colnames(df)=c("Injuries","Fatalities","Type")

par(mar=c(12, 4, 1.5, 0.5))
indices=1:15
# Draw the totals in green, then overlay injuries in blue;
# the green that remains visible is the fatalities
barplot((df$Fatalities[indices]+df$Injuries[indices]),col='green',ylab="Number of People",las=2, cex.axis=.75,
        names.arg=as.character(df$Type[indices]))
barplot((df$Injuries[indices]),col='blue',add=TRUE,las=2,cex.axis=.75)
legend("topright",legend=c("Fatalities","Injuries"),fill=c("green","blue"),cex=.8)

plot of chunk Health Effects
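The same kind of chart can also be drawn as a single stacked bar plot by passing barplot() a matrix, where each row becomes a segment and each column a bar. This is an alternative sketch, not the method used above; the totals and the names injuries, fatalities, and harm are made up for illustration.

```r
# Made-up totals by event type, for illustration only
injuries   <- c(TORNADO = 90, HEAT = 20, FLOOD = 15, HAIL = 5)
fatalities <- c(TORNADO = 10, HEAT = 8, FLOOD = 4, HAIL = 0)

# Rank by combined harm and keep the top 3
ord <- order(injuries + fatalities, decreasing = TRUE)[1:3]

# Rows become stacked segments, columns become bars
harm <- rbind(Injuries = injuries[ord], Fatalities = fatalities[ord])

# legend.text = TRUE labels the segments with the row names of harm
barplot(harm, col = c("blue", "green"), las = 2,
        ylab = "Number of People", legend.text = TRUE,
        args.legend = list(x = "topright", cex = 0.8))
```

With a matrix, the bar labels come from the column names, so no separate names.arg is needed.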

Given the present data, tornadoes are responsible for more deaths and injuries than any other type of event. In fact, tornadoes are responsible for 37.2% of all deaths (5,633 out of 15,145 total) and 65% of all injuries (91,346 out of 140,528 total). Besides tornadoes, heat events (heat, extreme heat) and various types of wind (TSTM wind, thunderstorm wind, etc.) account for many of the remaining injuries and deaths.
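The percentages quoted above follow directly from the counts in the same sentence. A short check, using only those four numbers (the variable names are chosen here for illustration):

```r
# Counts quoted in the text
tornado_deaths   <- 5633
total_deaths     <- 15145
tornado_injuries <- 91346
total_injuries   <- 140528

# Share of the total, as a percentage rounded to one decimal place
death_share  <- round(100 * tornado_deaths / total_deaths, 1)
injury_share <- round(100 * tornado_injuries / total_injuries, 1)

print(c(deaths = death_share, injuries = injury_share))
```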


Economic Consequences

The second question we want to answer is: “Across the United States, which types of events have the greatest economic consequences?”

We approach this in much the same way as the health effects of these events. We focus on the monetary property and crop damage caused by each event type. The first step is to find the total (sum) of all crop and property damage in dollars for each of the 985 types of events, across all 902297 recorded events.

Before this, we must convert the two columns defining crop damages (CROPDMG and CROPDMGEXP) to a single dollar amount and the two columns defining property damages (PROPDMG and PROPDMGEXP) to a single dollar amount.

Below, we substitute numerical multipliers for the exponent codes and find total property and total crop damage in dollars by event type.

# Translate the CROPDMGEXP codes into numeric multipliers
StormData$Cexp=StormData$CROPDMGEXP
StormData$Cexp=gsub("^[?0]$",1,StormData$Cexp)   # ? or 0: exp=0, so the multiplier is 10^0 = 1
StormData$Cexp=gsub("k",10^3,StormData$Cexp,ignore.case=TRUE)
StormData$Cexp=gsub("m",10^6,StormData$Cexp,ignore.case=TRUE)
StormData$Cexp=gsub("b",10^9,StormData$Cexp,ignore.case=TRUE)
StormData$Cexp=as.numeric(StormData$Cexp)   # blank or unrecognized codes become NA

# Same translation for the PROPDMGEXP codes
StormData$Pexp=StormData$PROPDMGEXP
StormData$Pexp=gsub("^[?0]$",1,StormData$Pexp)
StormData$Pexp=gsub("k",10^3,StormData$Pexp,ignore.case=TRUE)
StormData$Pexp=gsub("m",10^6,StormData$Pexp,ignore.case=TRUE)
StormData$Pexp=gsub("b",10^9,StormData$Pexp,ignore.case=TRUE)
StormData$Pexp=as.numeric(StormData$Pexp)
## Warning: NAs introduced by coercion
# Damage in dollars = base amount x multiplier, summed by event type
cropdam=tapply(StormData$CROPDMG*StormData$Cexp,StormData$EVTYPE,sum,na.rm=TRUE)
propdam=tapply(StormData$PROPDMG*StormData$Pexp,StormData$EVTYPE,sum,na.rm=TRUE)

A summary of each, and of their sum, is given to show the distribution of values.

summary(cropdam)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 1.14e+07 0.00e+00 4.14e+09
summary(propdam)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 5.26e+07 1.50e+03 1.20e+10
summary(cropdam+propdam)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 0.00e+00 0.00e+00 0.00e+00 6.40e+07 2.03e+03 1.31e+10

We build a data frame sorted by the total amount of crop and property damage and then examine the distribution of the 15 most costly event types.

# Order event types by combined property and crop damage, highest first
ord2=order(propdam+cropdam,decreasing=TRUE)
df2=data.frame(propdam[ord2],cropdam[ord2],names(cropdam)[ord2])
colnames(df2)=c("PropDamage","CropDamage","Type")

par(mar=c(12, 4, 1.5, 0.5))
indices=1:15
# Draw the totals in green, then overlay property damage in blue;
# the green that remains visible is the crop damage
barplot((df2$PropDamage[indices]+df2$CropDamage[indices]),
        col='green',ylab="Dollars",las=2, cex.axis=.75,
        names.arg=as.character(df2$Type[indices]))
barplot((df2$PropDamage[indices]),
        col='blue',add=TRUE,las=2,cex.axis=.75)
legend("topright",legend=c("Crop Damage","Property Damage"),fill=c("green","blue"),cex=.8)

plot of chunk Economic Effects

Given the present data, hurricanes/typhoons are responsible for more monetary damage than any other type of event, followed by tornadoes, floods, and droughts. Hurricanes and typhoons (Hurricane, Hurricane/Typhoon, Hurricane Opal) are responsible for 28.8% of all crop and property damage. Tornadoes are responsible for 12.1%, floods and flash floods for 15.3%, and droughts for 6.6%. For most of these costliest events, property damage far outweighs crop damage. For drought, however, crop damage far exceeds property damage.
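The text does not show how related event types such as Hurricane, Hurricane/Typhoon, and Hurricane Opal were combined into one share. One possible approach, sketched here on made-up totals, is to match event names against a pattern with grepl(). The vector damage, its values, and the pattern are assumptions for illustration only.

```r
# Made-up damage totals (in dollars) by event type, for illustration only
damage <- c("HURRICANE" = 40, "HURRICANE/TYPHOON" = 50, "HURRICANE OPAL" = 10,
            "TORNADO" = 60, "FLOOD" = 30, "DROUGHT" = 10)

# Flag every event type whose name contains "hurricane", ignoring case
is_hurricane <- grepl("hurricane", names(damage), ignore.case = TRUE)

# Combined share of all damage, as a percentage
hurricane_share <- 100 * sum(damage[is_hurricane]) / sum(damage)
print(round(hurricane_share, 1))
```

Pattern matching can also catch names that were not intended, so it is worth printing names(damage)[is_hurricane] to inspect what was grouped.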



Conclusion

Floods and tornadoes are both quite common and quite costly, though tornadoes have a hugely disproportionate effect on health compared to all other events. Different types of wind are also quite common and lead to many injuries (though not many fatalities). Hurricanes are not common, but they are very costly where and when they occur.

With regard to human lives, it would be prudent to emphasize better techniques for identification and prediction, alerting the public, and addressing safety procedures for tornadoes, winds, and heat. Monetarily, it is wise to plan for large damages from tornadoes, floods, and droughts, as these are all quite common and costly. Hurricanes are the most costly due to widespread property damage, but they are rarer and generally known ahead of time, allowing for special preparation.