In the US, tornadoes and floods are the two types of extreme weather event that have caused the most damage to population health and to the economy, respectively, since 1950.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file, Storm Data (47Mb), from the web site and unzip it into your working directory. Some documentation of the database is also available, describing how some of the variables are constructed and defined: the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
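As a side note on the unzip step, R can usually read a bzip2-compressed CSV without decompressing it first, because `read.csv` accepts a connection opened with `bzfile`. The following is a minimal sketch on a small temporary file; the file name and the toy rows are invented for illustration and are not part of the storm database.

```r
# Toy data and a hypothetical file name, used only to illustrate the round trip
toy_csv <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
bz_path <- file.path(tempdir(), "toy-storm.csv.bz2")

# Write the CSV through a bzip2 connection
con <- bzfile(bz_path, open = "w")
write.csv(toy_csv, con, row.names = FALSE)
close(con)

# Read it back directly from the compressed file
toy_back <- read.csv(bzfile(bz_path), stringsAsFactors = FALSE)
print(toy_back)
```

Reading the compressed file directly saves disk space, at the cost of decompressing it again on every read.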
library(ggplot2)
library(knitr)
sessionInfo()
## R version 3.0.3 (2014-03-06)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] ggplot2_0.9.3.1 knitr_1.5
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.2-4 digest_0.6.4 evaluate_0.5.3 formatR_0.10
## [5] grid_3.0.3 gtable_0.1.2 MASS_7.3-29 munsell_0.4.2
## [9] plyr_1.8.1 proto_0.3-10 Rcpp_0.11.1 reshape2_1.2.2
## [13] scales_0.2.4 stringr_0.6.2 tools_3.0.3
storm <- read.csv("./repdata-data-StormData.csv")
names(storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
From the 37 variables listed above, we can see that the type of event corresponds to the variable “EVTYPE”, and population health corresponds to the variables “FATALITIES” and “INJURIES”, which are the numbers of fatalities and injuries in an event. We therefore subset the data to only these three variables/columns.
events_pophealth <- storm[, c("EVTYPE", "FATALITIES", "INJURIES")]
The next step is to add the two harm variables, “FATALITIES” and “INJURIES”, together and sum the result for each “EVTYPE”.
events_pophealth$harm <- events_pophealth$FATALITIES + events_pophealth$INJURIES
harm_by_evtype <- tapply(events_pophealth$harm, factor(events_pophealth$EVTYPE),
sum)
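To make the grouping step concrete, here is a minimal sketch of the same `tapply` pattern on a few invented rows (the values are toy data, not taken from the storm database). Each distinct event type gets one total, and the result is a named vector ordered by factor level.

```r
# Toy data, not taken from the storm database
toy_events <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES = c(10, 0, 5, 1)
)

# Combine fatalities and injuries, then sum within each event type
toy_events$harm <- toy_events$FATALITIES + toy_events$INJURIES
toy_harm <- tapply(toy_events$harm, factor(toy_events$EVTYPE), sum)
print(toy_harm)
```

Here TORNADO totals 17 (12 from the first row plus 5 from the third), HEAT totals 4, and FLOOD totals 1.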
From the 37 variables listed earlier, we know that the “PROPDMG”, “PROPDMGEXP”, “CROPDMG”, and “CROPDMGEXP” variables correspond to economic consequences. They record the property and crop damages in an event, together with the magnitudes of those damages: “K” (thousand), “M” (million), and “B” (billion). We therefore subset the data to these four variables/columns, together with the variable “EVTYPE”, to analyze the economic results.
events_ecodmg <- storm[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Before we can add the two variables “PROPDMG” and “CROPDMG” together, we need to consider the magnitudes of the damages, and then calculate the sum of these two.
# Map magnitude codes to numeric multipliers. Multiplying avoids the problem
# with pasting zeros onto the number, where 2.5 with "K" becomes "2.5000".
# Assumption: lower-case codes count as their upper-case form, and any code
# other than K, M, or B (including blank) is treated as a multiplier of 1.
exp_mult <- c(K = 1e+03, M = 1e+06, B = 1e+09)
prop_mult <- unname(exp_mult[toupper(as.character(events_ecodmg$PROPDMGEXP))])
crop_mult <- unname(exp_mult[toupper(as.character(events_ecodmg$CROPDMGEXP))])
prop_mult[is.na(prop_mult)] <- 1
crop_mult[is.na(crop_mult)] <- 1
events_ecodmg$PROPDMG <- events_ecodmg$PROPDMG * prop_mult
events_ecodmg$CROPDMG <- events_ecodmg$CROPDMG * crop_mult
# Total damage per event, expressed in billions of dollars
events_ecodmg$ecodmg <- (events_ecodmg$PROPDMG + events_ecodmg$CROPDMG)/1e+09
events_ecodmg$ecodmg[which(is.na(events_ecodmg$ecodmg))] <- 0
Now we can sum the economic damage for each “EVTYPE”.
ecodmg_by_evtype <- tapply(events_ecodmg$ecodmg, factor(events_ecodmg$EVTYPE),
sum)
To answer the question “Across the United States, which types of events are most harmful with respect to population health?”, we look only at the five event types (EVTYPE) with the largest combined totals of fatalities and injuries, and plot them.
harm_by_evtype <- sort(harm_by_evtype, decreasing = TRUE)[1:5]
Events <- names(harm_by_evtype)
qplot(Events, harm_by_evtype, main = "Top 5 Events Harmful to US Population Health",
ylab = "Total Number of Fatalities and Injuries", geom = "bar", stat = "identity",
fill = Events)
From the plot we can see that the weather event most harmful to US population health since 1950 is Tornado, whose total is more than an order of magnitude larger than that of the other events, such as Excessive Heat, TSTM Wind, and Flood.
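The top-5 selection relies on sorting a named vector in decreasing order and keeping its first elements. A minimal sketch on invented totals (toy values, not results from the database) shows the pattern, along with one caveat: indexing with `[1:5]` pads the result with `NA` when fewer than five groups exist, whereas `head()` stops at the end of the vector.

```r
# Toy totals per event type, invented for illustration
toy_totals <- c(FLOOD = 1, HEAT = 4, TORNADO = 17)

# Keep the two largest totals, largest first
top2 <- sort(toy_totals, decreasing = TRUE)[1:2]
print(top2)

# Asking for five elements from a three-element vector
top5_padded <- sort(toy_totals, decreasing = TRUE)[1:5]  # padded with NA
top5_safe <- head(sort(toy_totals, decreasing = TRUE), 5)  # only 3 elements
print(top5_safe)
```

With the full storm data there are far more than five event types, so either form gives the same result here.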
To answer the question “Across the United States, which types of events have the greatest economic consequences?”, we similarly look only at the five event types (EVTYPE) with the largest total economic damage, and plot them.
ecodmg_by_evtype <- sort(ecodmg_by_evtype, decreasing = TRUE)[1:5]
Events_Type <- names(ecodmg_by_evtype)
qplot(Events_Type, ecodmg_by_evtype, main = "Top 5 Events with the Greatest Economic Consequences in the US",
ylab = "Property and Crop Damages (in Billions)", geom = "bar", stat = "identity",
fill = Events_Type)
The plot above shows that, among the various weather event types, Flood has the greatest economic consequences in the US, having caused about 150 billion dollars in damages since 1950.