Tornados/Wind have the greatest impact on United States from a population health and economic consequence persepective

Analyzing he NOAA Storm Database

Synopsis

The goal here is to expore the NOAA Storm Database and answer some basic questions about severe weather events. Furthermore, the two questions of interest here are:

This analysis condluded that, Wind/Torados are responsible for 75% of the total effect on population health and 49% of the total economic consequence for all weather related events in the United States. It is to be noted that, Wind/Tornado events have extremely more imapct to property than crops in the United Stated and cause more injuries then deaths.

Other major events that are noteworthy include Heat/Drought due to their impact to population health, accounting for 9% of total incidences. While from an economic consequence standpoint, hurricanes and floods account for a total of 37% of the total economic damange.

Overall, injuries from wether events have a greater impact on population health. In similar respect, property damage from weather events has a greater economic consequence than crop damages.

Data Processing

Provided below are the list of libraries required for this analysis

## Loading required package: survival
## Loading required package: splines
## Loading required package: MASS

The code below extract the rae file and loads it in the StormData data.frame

OrigDataURL = "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("dataFile.bz2")) 
download.file(OrigDataURL, "dataFile.bz2")

# Read downloaded data file
StormData <- read.csv("dataFile.bz2")

In order to simplify the number of classes of events, reclassification of the $EVTYPE variable was done to accomodate the following 11 classes: Flood, Rain, Hurricane, Heat/Drought, Cold, Fire, Wind/tornado, Marine, Dust/Fog, Slides, and Other. The new calssifications were stored in the $TYPE varaible.

StormData$TYPE <- StormData$EVTYPE
StormData$TYPE<-rep("Other",length(StormData$EVTYPE)) # Inizialize with a generic value

StormData$TYPE[grep("*flood*",StormData$EVTYPE,ignore.case=TRUE)]<-"Flood"
StormData$TYPE[grep("(*rain*|*hail*|*storm*|*lightning*)",StormData$EVTYPE,ignore.case=TRUE)]<-"Percipitation"
StormData$TYPE[grep("*hurrica*",StormData$EVTYPE,ignore.case=TRUE)]<-"Hurricane"
StormData$TYPE[grep("(*heat*|*drought*)",StormData$EVTYPE,ignore.case=TRUE)]<-"Heat/Drought"
StormData$TYPE[grep("(*cold*|*freeze*)",StormData$EVTYPE,ignore.case=TRUE)]<-"Cold"
StormData$TYPE[grep("*fire*",StormData$EVTYPE,ignore.case=TRUE)]<-"Fire"
StormData$TYPE[grep("(*blizzard*|*wind*|*torn*)",StormData$EVTYPE,ignore.case=TRUE)]<-"Wind/Tornado"
StormData$TYPE[grep("(*marine*|*tsunami*|*seiche*|*tide*|*coast*)",StormData$EVTYPE,ignore.case=TRUE)]<-"Marine"
StormData$TYPE[grep("(*dust*|*smoke*|*fog*|*ash*)",StormData$EVTYPE,ignore.case=TRUE)]<-"Dust/Fog"
StormData$TYPE[grep("(*debris*|*mud*|*slide*|*avalanche*)",StormData$EVTYPE,ignore.case=TRUE)]<-"Slides"

StormData$TYPE<-as.factor(StormData$TYPE)

To make data processing faster, the main columns from the original dataset were extracted. The columns related to health data that will help us answer our first question are:

The columns related to economic data that will help us answer our second question are:

Both these datasets have the event classification column:

HealthData <- StormData[,c("TYPE", "FATALITIES", "INJURIES")]
EconData <-  StormData[,c("TYPE", "PROPDMG", "CROPDMG", "PROPDMGEXP", "CROPDMGEXP")]

We can now see overall statistics for the two data sets created

summary(HealthData)
##             TYPE          FATALITIES     INJURIES     
##  Wind/Tornado :438977   Min.   :  0   Min.   :   0.0  
##  Percipitation:304659   1st Qu.:  0   1st Qu.:   0.0  
##  Dust/Fog     : 61092   Median :  0   Median :   0.0  
##  Heat/Drought : 33077   Mean   :  0   Mean   :   0.2  
##  Flood        : 26141   3rd Qu.:  0   3rd Qu.:   0.0  
##  Other        : 18705   Max.   :583   Max.   :1700.0  
##  (Other)      : 19646
summary(EconData)
##             TYPE           PROPDMG        CROPDMG        PROPDMGEXP    
##  Wind/Tornado :438977   Min.   :   0   Min.   :  0.0          :465934  
##  Percipitation:304659   1st Qu.:   0   1st Qu.:  0.0   K      :424665  
##  Dust/Fog     : 61092   Median :   0   Median :  0.0   M      : 11330  
##  Heat/Drought : 33077   Mean   :  12   Mean   :  1.5   0      :   216  
##  Flood        : 26141   3rd Qu.:   0   3rd Qu.:  0.0   B      :    40  
##  Other        : 18705   Max.   :5000   Max.   :990.0   5      :    28  
##  (Other)      : 19646                                  (Other):    84  
##    CROPDMGEXP    
##         :618413  
##  K      :281832  
##  M      :  1994  
##  k      :    21  
##  0      :    19  
##  B      :     9  
##  (Other):     9

The table below shows the break down of exponents coded in the $PROPDMGEXP field. Here, we see that there are issues with letter cases and annomaly values such as '-' or '+'.

table(EconData$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330

To get the true value of property damage, the $PROPDMG variable was raised to the power of $PROPDMGEXP. A intermidiate step where a numeric value was assigned to the character values in the $PROPDMGEXP variable was performed. The table above should provide some insight on what to code the values in the $PROPDMGEXP field with.

# Adding the exponenet to relavent values
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="K"]<-EconData$PROPDMG*10^3
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="M"]<-EconData$PROPDMG*10^6
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="m"]<-EconData$PROPDMG*10^6
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="B"]<-EconData$PROPDMG*10^9
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="0"]<-EconData$PROPDMG*1^0
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="1"]<-EconData$PROPDMG*1^1
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="2"]<-EconData$PROPDMG*1^2
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="3"]<-EconData$PROPDMG*1^3
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="4"]<-EconData$PROPDMG*1^4
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="5"]<-EconData$PROPDMG*1^5
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="6"]<-EconData$PROPDMG*1^6
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="7"]<-EconData$PROPDMG*1^7
EconData$PROPDMG_VAL[EconData$PROPDMGEXP=="8"]<-EconData$PROPDMG*1^8

The table below shows the break down of exponents coded in the $CROPDMGEXP field. Here, we see that there are issues with letter cases and annomaly values such as '-' or '+'.

table(EconData$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

To get the true value of crop damage, the $CROPDMG variable was raised to the power of $CROPDMGEXP. A intermidiate step where a numeric value was assigned to the character values in the $CROPDMGEXP variable was performed.

#EconData$CROPDMGEXP[is.na(EconData$CROPDMGEXP)]<-"1"
EconData$CROPDMG_VAL[EconData$CROPDMGEXP=="K"]<-EconData$CROPDMG*10^3
EconData$CROPDMG_VAL[EconData$CROPDMGEXP=="k"]<-EconData$CROPDMG*10^3
EconData$CROPDMG_VAL[EconData$CROPDMGEXP=="M"]<-EconData$CROPDMG*10^6
EconData$CROPDMG_VAL[EconData$CROPDMGEXP=="m"]<-EconData$CROPDMG*10^6
EconData$CROPDMG_VAL[EconData$CROPDMGEXP=="B"]<-EconData$CROPDMG*10^9
EconData$CROPDMG_VAL[EconData$CROPDMGEXP=="m"]<-EconData$CROPDMG*10^6
EconData$CROPDMG_VAL[EconData$CROPDMGEXP=="0"]<-EconData$CROPDMG*1^0
EconData$CROPDMG_VAL[EconData$CROPDMGEXP=="2"]<-EconData$CROPDMG*1^2

Added a total and percentage column to aid in analysis in further secions.

HealthData_Tidy <- HealthData[]
HealthData_Tidy <- aggregate(.~TYPE, data= HealthData_Tidy, FUN=sum)
HealthData_Tidy$TOTAL <- HealthData_Tidy$FATALITIES + HealthData_Tidy$INJURIES
HealthData_Tidy$PERCENT_TOTAL <- round((HealthData_Tidy$TOTAL/sum(HealthData_Tidy$TOTAL))*100,2)
#HealthData_Tidy[order(HealthData_Tidy$TOTAL, decreasing = TRUE),]

Added a total and percentage column to aid in analysis in further secions.

EconData_Tidy <- EconData[]
EconData_Tidy <- aggregate(.~TYPE, data= EconData_Tidy, FUN=sum)
EconData_Tidy$TOTAL <- EconData_Tidy$PROPDMG_VAL + EconData_Tidy$CROPDMG_VAL
EconData_Tidy$PERCENT_TOTAL <- round((EconData_Tidy$TOTAL/sum(EconData_Tidy$TOTAL))*100,2)
#EconData_Tidy[order(EconData_Tidy$TOTAL, decreasing = TRUE),]

Results

Across the United States, which types of events are most harmful with respect to population health?


HealthData_Tidy_Graph <-HealthData_Tidy[,1:3]
HealthData_Tidy_Graph <- melt(HealthData_Tidy_Graph, id = c("TYPE"))
HealthData_Tidy_Graph[order(HealthData_Tidy_Graph$TYPE, HealthData_Tidy_Graph$value, HealthData_Tidy_Graph$variable, decreasing = TRUE),]
##             TYPE   variable  value
## 22  Wind/Tornado   INJURIES 108055
## 11  Wind/Tornado FATALITIES   7596
## 10        Slides FATALITIES    268
## 21        Slides   INJURIES    226
## 20 Percipitation   INJURIES   6625
## 9  Percipitation FATALITIES    838
## 19         Other   INJURIES   1351
## 8          Other FATALITIES    778
## 18        Marine   INJURIES    199
## 7         Marine FATALITIES     86
## 17     Hurricane   INJURIES   1328
## 6      Hurricane FATALITIES    133
## 16  Heat/Drought   INJURIES  10604
## 5   Heat/Drought FATALITIES   3419
## 15         Flood   INJURIES   6794
## 4          Flood FATALITIES    483
## 14          Fire   INJURIES   1063
## 3           Fire FATALITIES     78
## 13      Dust/Fog   INJURIES   3966
## 2       Dust/Fog FATALITIES   1227
## 12          Cold   INJURIES    317
## 1           Cold FATALITIES    239
g <- ggplot(HealthData_Tidy_Graph, aes(x =TYPE, y=log(value), fill = variable ))
g <- g + geom_bar(stat="identity") 
g <- g + guides(fill=guide_legend(reverse=TRUE, title = "Type of Incident"))
g <- g + scale_fill_brewer(palette="Pastel1")
g <- g + theme(legend.position="bottom")
g <- g + xlab("Type of Event") + ylab("Log Number of Incidences [Units: log(count)]") + labs(title = "Number of incidences to population health by event type")
print(g)

plot of chunk unnamed-chunk-10 Figure 1. Effects of weather related events on public health

x1 <- xtable(HealthData_Tidy[order(HealthData_Tidy$TOTAL, decreasing = TRUE),])
print(x1,floating=FALSE, type = "html")
TYPE FATALITIES INJURIES TOTAL PERCENT_TOTAL
11 Wind/Tornado 7596.00 108055.00 115651.00 74.29
5 Heat/Drought 3419.00 10604.00 14023.00 9.01
9 Percipitation 838.00 6625.00 7463.00 4.79
4 Flood 483.00 6794.00 7277.00 4.67
2 Dust/Fog 1227.00 3966.00 5193.00 3.34
8 Other 778.00 1351.00 2129.00 1.37
6 Hurricane 133.00 1328.00 1461.00 0.94
3 Fire 78.00 1063.00 1141.00 0.73
1 Cold 239.00 317.00 556.00 0.36
10 Slides 268.00 226.00 494.00 0.32
7 Marine 86.00 199.00 285.00 0.18

Analysis: From figure 1 and the table above, we see that Wind/Tornado events are the most harmful and account for about 74% of the total damage. Furthremore, the number of fatalaties and injuriesinduvidually from this type of event are greater than any other type of event. Even on a log scale the total number of Wind/Tornado events are over an unit more than the scond highest event type.

Heat/Drought event type is the second most harful. Although these types of events only account for roughly 9% of the total, they are double of the next highest event type. Futhermore, the number of fatalites in this type of event type constitute for a large percentage, about 23%, of the total fatality amount.

The rest of that event types account for 16.5% of the total.

Across the United States, which types of events have the greatest economic consequences?

EconData_Tidy_Graph <-EconData_Tidy[,c(1,6,7)]
EconData_Tidy_Graph <- melt(EconData_Tidy_Graph, id = c("TYPE"))

#so as not to break log function
EconData_Tidy_Graph[EconData_Tidy_Graph$value==0, 3] <- 0.0001
#EconData_Tidy_Graph[order(EconData_Tidy_Graph$TYPE, EconData_Tidy_Graph$value, EconData_Tidy_Graph$variable, decreasing = TRUE),]

h <- ggplot(EconData_Tidy_Graph, aes(x =TYPE, y=log(value), fill = variable ))
h <- h + geom_bar(stat="identity") 
h <- h + guides(fill=guide_legend(reverse=TRUE,  title = "Type of Cost"))
h <- h + scale_fill_brewer(palette="Pastel2")
h <- h + theme(legend.position="bottom")
h <- h + xlab("Type of Event") + ylab("Log Cost [Units: log($ USD)]") + labs(title = "Economic consequence by event type")
print(h)

plot of chunk unnamed-chunk-12 Figure 2. Economic consequence of weather related events in the United Stated

x2 <- xtable(EconData_Tidy[order(EconData_Tidy$TOTAL, decreasing = TRUE),c(1,6:9)])
print(x2,floating=FALSE, type = "html")
TYPE PROPDMG_VAL CROPDMG_VAL TOTAL PERCENT_TOTAL
11 Wind/Tornado 299071791907.50 128504070.00 299200295977.50 48.55
6 Hurricane 182977647500.00 0.00 182977647500.00 29.69
4 Flood 42709046140.00 14031200.00 42723077340.00 6.93
9 Percipitation 30347558105.00 60427750.00 30407985855.00 4.93
3 Fire 26638610080.00 3447100.00 26642057180.00 4.32
7 Marine 25342471330.00 7980300.00 25350451630.00 4.11
2 Dust/Fog 7460721977.50 16308910.00 7477030887.50 1.21
5 Heat/Drought 791327070.00 10588300.00 801915370.00 0.13
8 Other 365058310.00 4011900.00 369070210.00 0.06
10 Slides 257828350.00 170500.00 257998850.00 0.04
1 Cold 13439710.00 526500.00 13966210.00 0.00

Analysis: From figure 2 and the table above, we see that Wind/Tornados account for about 49% of the total economic impact. This is significantly greater than the next event type. In this type of event, property damage is significantly more than crop damage. The second highest amount of economic consequence comes from Hurricanes, which accounts for about 30% of the total economic consequence. It is to be noted here that all the economic consequence for this event type comes solely from property damage.

The top two event tyes account for 78.24% of the total damage, while the rest account for 21.73%.


Additional Information

sessionInfo()
## R version 3.1.0 (2014-04-10)
## Platform: i386-w64-mingw32/i386 (32-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] splines   stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] xtable_1.7-3    doBy_4.5-10     MASS_7.3-33     survival_2.37-7
## [5] reshape2_1.4    plyr_1.8.1      ggplot2_1.0.0   knitr_1.6      
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.2-4    digest_0.6.4        evaluate_0.5.5     
##  [4] formatR_0.10        grid_3.1.0          gtable_0.1.2       
##  [7] lattice_0.20-29     lme4_1.1-6          Matrix_1.1-3       
## [10] minqa_1.2.3         munsell_0.4.2       nlme_3.1-117       
## [13] proto_0.3-10        Rcpp_0.11.1         RcppEigen_0.3.2.1.2
## [16] scales_0.2.4        stringr_0.6.2       tools_3.1.0

Done By: Jovan Sardinha (jovan.sardinha@gmail.com)
Created On: June 16, 2014
Last Updated: June 22, 2014