Weather Events That Cause the Most Harm to Population Health and the Greatest Economic Consequences

Synopsis

In the US, tornadoes and floods are the extreme weather events that have caused the most harm to population health and the greatest economic damage, respectively, since 1950.

Data Processing

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

About the data

The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file, Storm Data (47Mb), from the web site and unzip it into your working directory.

There is also some documentation of the database available, which explains how some of the variables are constructed/defined: the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ.
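As a side note, R's connection functions can read bzip2-compressed text directly, so a separate unzip step is optional. The sketch below does not touch the real storm file: it writes a tiny made-up table (`toy`, with hypothetical values) to a temporary `.csv.bz2` file and reads it back through `bzfile()`.

```r
# Hypothetical miniature table standing in for the real storm data.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the table as bzip2-compressed CSV to a temporary file.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back; read.csv() opens and closes the bzfile() connection itself.
toy_back <- read.csv(bzfile(path), stringsAsFactors = FALSE)
print(toy_back)
```

The same pattern, `read.csv(bzfile(path))`, should apply to the downloaded storm file under whatever name it was saved.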

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Data processing and analysis environment

library(ggplot2)
library(knitr)
sessionInfo()
## R version 3.0.3 (2014-03-06)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ggplot2_0.9.3.1 knitr_1.5      
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.2-4 digest_0.6.4     evaluate_0.5.3   formatR_0.10    
##  [5] grid_3.0.3       gtable_0.1.2     MASS_7.3-29      munsell_0.4.2   
##  [9] plyr_1.8.1       proto_0.3-10     Rcpp_0.11.1      reshape2_1.2.2  
## [13] scales_0.2.4     stringr_0.6.2    tools_3.0.3

Data loading and processing

storm <- read.csv("./repdata-data-StormData.csv")
names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Across the United States, which types of events are most harmful with respect to population health?

From the 37 variables listed above, we can see that event types correspond to the variable “EVTYPE”, and population health corresponds to the variables “FATALITIES” and “INJURIES”, which are the numbers of fatalities and injuries in an event. We therefore subset the data to only these three variables/columns.

events_pophealth <- storm[, c("EVTYPE", "FATALITIES", "INJURIES")]

The next step is to add the two harm variables, “FATALITIES” and “INJURIES”, together and calculate the sum of the result grouped by “EVTYPE”.

events_pophealth$harm <- events_pophealth$FATALITIES + events_pophealth$INJURIES
harm_by_evtype <- tapply(events_pophealth$harm, factor(events_pophealth$EVTYPE), 
    sum)
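To make the grouping step concrete, here is a minimal sketch on a made-up four-row table. The data frame `toy` and its values are hypothetical and are not taken from the storm database.

```r
# Hypothetical events: two tornadoes, one flood, one heat event.
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
    FATALITIES = c(2, 1, 0, 6),
    INJURIES = c(10, 3, 5, 0)
)

# Harm per event is fatalities plus injuries.
toy$harm <- toy$FATALITIES + toy$INJURIES

# Sum the per-event harm within each event type.
toy_harm <- tapply(toy$harm, factor(toy$EVTYPE), sum)
print(sort(toy_harm, decreasing = TRUE))
```

In this toy table the two tornado rows are combined into a single total of 17, which is the same collapsing that happens to every event type in the real data.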

Across the United States, which types of events have the greatest economic consequences?

From the 37 variables listed earlier, we know that the “PROPDMG”, “PROPDMGEXP”, “CROPDMG”, and “CROPDMGEXP” variables correspond to economic consequences. These variables hold the property damage and crop damage for an event, together with the magnitude of each amount: “K” (thousand), “M” (million), or “B” (billion). We therefore subset the data to these four variables/columns, together with “EVTYPE”, to analyze the economic consequences.

events_ecodmg <- storm[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Before we can add the two variables “PROPDMG” and “CROPDMG” together, we need to apply the magnitude of each damage amount; we can then calculate the sum of the two.

# Multipliers for the magnitude codes; any other code leaves the amount unscaled.
magnitude <- c(K = 1e+03, M = 1e+06, B = 1e+09)
prop_mult <- unname(magnitude[as.character(events_ecodmg$PROPDMGEXP)])
crop_mult <- unname(magnitude[as.character(events_ecodmg$CROPDMGEXP)])
prop_mult[is.na(prop_mult)] <- 1
crop_mult[is.na(crop_mult)] <- 1
# Multiply instead of pasting zeros onto the amount, so that fractional
# amounts such as 2.5 with "M" become 2500000 rather than 2.5.
events_ecodmg$PROPDMG <- events_ecodmg$PROPDMG * prop_mult
events_ecodmg$CROPDMG <- events_ecodmg$CROPDMG * crop_mult
# Total damage per event, in billions of dollars.
events_ecodmg$ecodmg <- (events_ecodmg$PROPDMG + events_ecodmg$CROPDMG)/1e+09
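As a small worked illustration of the magnitude codes, the sketch below scales a few made-up amounts by a lookup table of multipliers. The helper name `to_dollars` and the toy values are hypothetical, and treating any code other than “K”, “M”, or “B” as a multiplier of 1 is an assumption of this sketch, not something taken from the database documentation.

```r
# Hypothetical helper: convert an amount and its magnitude code to dollars.
to_dollars <- function(amount, code) {
    multiplier <- c(K = 1e+03, M = 1e+06, B = 1e+09)
    # Look up each code; codes not in the table come back as NA.
    scale <- unname(multiplier[as.character(code)])
    # Assumption: unknown or empty codes leave the amount unscaled.
    scale[is.na(scale)] <- 1
    amount * scale
}

toy_amount <- c(25, 2.5, 115, 7)
toy_code <- c("K", "M", "B", "")
toy_dollars <- to_dollars(toy_amount, toy_code)
print(toy_dollars)
```

Multiplying keeps fractional amounts intact: 2.5 with code “M” becomes 2,500,000 dollars, and 7 with an empty code stays at 7.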

Now we can calculate the total economic damage grouped by “EVTYPE”.

ecodmg_by_evtype <- tapply(events_ecodmg$ecodmg, factor(events_ecodmg$EVTYPE), 
    sum)

Results

To answer the question “Across the United States, which types of events are most harmful with respect to population health?”, we look only at the five event types (EVTYPE) with the largest harm totals and plot them.

harm_by_evtype <- sort(harm_by_evtype, decreasing = TRUE)[1:5]
Events <- names(harm_by_evtype)
qplot(Events, harm_by_evtype, main = "Top 5 Events Harmful to US Population Health", 
    ylab = "Total Number of Fatalities and Injuries", geom = "bar", stat = "identity", 
    fill = Events)

Figure: bar chart of the total number of fatalities and injuries for the top 5 event types.

From the plot we can see that the weather event most harmful to US population health since 1950 is Tornado, whose total is more than an order of magnitude larger than that of other events such as Excessive Heat, TSTM Wind, and Flood.

To answer the question “Across the United States, which types of events have the greatest economic consequences?”, we similarly look only at the five event types (EVTYPE) with the largest economic damage totals and plot them.

ecodmg_by_evtype <- sort(ecodmg_by_evtype, decreasing = TRUE)[1:5]
Events_Type <- names(ecodmg_by_evtype)
qplot(Events_Type, ecodmg_by_evtype, main = "Top 5 Events Have Great Economic Consequences in US", 
    ylab = "Property and Crop Damages (in Billions)", geom = "bar", stat = "identity", 
    fill = Events_Type)

Figure: bar chart of property and crop damages (in billions) for the top 5 event types.

The above plot shows that, among the various weather event types, Flood has the greatest economic consequences in the US, having caused about 150 billion dollars in damages since 1950.