This study aims to develop a reproducible analysis about the impact of sever weather events on the population health (injuries and fatalities) and economic damages (crops). The data used in this study come from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database over 1950 to 2011. The analysis finds that the most injuries and fatalities are caused by tornados. Also, it is shown that floods are the main cause of the property damages and droughts are the primary cause of the crop damages. This analysis can be used by U.S. government for reallocating resources based on the prediction of primary events that will induce the highest social and economic cost.
The data for this analysis come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size (Storm Data [47Mb]). The data and the data documentation can be downloaded the file from the Reproducible Research Course (Week4) web site. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
rm(list = ls())
### Upload data.
storm.data.released <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"), sep = ",", header = TRUE, stringsAsFactors = FALSE)
### A quick look at the data set, its variables and its dimensionality.
names(storm.data.released)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
str(storm.data.released)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
This analysis focuses only on severe weather events with consequences on health and economic issues. The analysis needs the following subset of variables: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP.
storm.data.released2 <- storm.data.released[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
### Clean the data set
library(plyr)
## Warning: package 'plyr' was built under R version 3.2.5
storm.data.released2$EVTYPE = mapvalues(storm.data.released2$EVTYPE, from = c("TSTM WIND", "THUNDERSTORM WINDS", "RIVER FLOOD", "HURRICANE/TYPHOON", "HURRICANE"), to = c("THUNDERSTORM WIND", "THUNDERSTORM WIND", "FLOOD", "HURRICANE", "HURRICANE"))
storm.data.released2$PROPDMGEXP = mapvalues(storm.data.released2$PROPDMGEXP, from = c("K", "M", "", "B", "m", "+", "0", "5", "6", "?", "4", "2", "3", "h", "7", "H", "-", "1", "8"), to = c(10^3, 10^6, 1, 10^9, 10^6, 1, 1, 10^5, 10^6, 1, 10^4, 10^2, 10^3, 10^3, 10^7, 10^2, 1, 1, 10^8))
print(head(storm.data.released2$PROPDMG, 12))
## [1] 25.0 2.5 25.0 2.5 2.5 2.5 2.5 2.5 25.0 25.0 2.5 2.5
print(head(storm.data.released2$PROPDMGEXP, 12))
## [1] "1000" "1000" "1000" "1000" "1000" "1000" "1000" "1000"
## [9] "1000" "1000" "1e+06" "1e+06"
PROPFINAL <- as.numeric(storm.data.released2$PROPDMG) * as.numeric(storm.data.released2$PROPDMGEXP)
storm.data.released2$CROPDMGEXP = mapvalues(storm.data.released2$CROPDMGEXP, from = c("M", "K", "m", "B", "?", "0", "k", "2"), to = c(10^6, 10^3, 10^3, 10^9, 1, 1, 10^3, 10^2))
print(head(storm.data.released2$CROPDMG, 12))
## [1] 0 0 0 0 0 0 0 0 0 0 0 0
print(head(storm.data.released2$CROPDMGEXP, 12))
## [1] "" "" "" "" "" "" "" "" "" "" "" ""
CROPFINAL <- as.numeric(storm.data.released2$CROPDMG) * as.numeric(storm.data.released2$CROPDMGEXP)
From analysing the data set, it can be deduced that the events with the most harmful impacts to the population health are fatalities and injuries. The total values related to these two variables are:
total.injuries <- aggregate(INJURIES ~ EVTYPE, data = storm.data.released2, FUN = sum)
total.fatalities <- aggregate(FATALITIES ~ EVTYPE, data = storm.data.released2, FUN = sum)
It is necessary to reorder the values of the injuries variable, to be ready for a plot.
total.injuries <- total.injuries[order(total.injuries$INJURIES, decreasing = TRUE),][1:12,]
print(head(total.injuries, 12))
## EVTYPE INJURIES
## 831 TORNADO 91346
## 758 THUNDERSTORM WIND 9353
## 170 FLOOD 6791
## 130 EXCESSIVE HEAT 6525
## 463 LIGHTNING 5230
## 275 HEAT 2100
## 426 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 244 HAIL 1361
## 402 HURRICANE 1321
## 968 WINTER STORM 1321
## 359 HIGH WIND 1137
The graph showing the impact of the wether events on the injuries in the US is:
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
library(extrafont)
## Warning: package 'extrafont' was built under R version 3.2.5
## Registering fonts with R
p1 <- ggplot(data = total.injuries, aes(x = reorder(EVTYPE, - INJURIES), y = INJURIES)) +
geom_bar(fill = heat.colors(12, alpha = 1), color ="black", stat = "identity") +
ylab("Total number of injuries") + xlab("Event type \n (Fig.1)") +
expand_limits(y = c(0, 100000)) +
ggtitle("Weather events' impact on the health of the US population. \n Events with the highest number of injuries")
p1 <- p1 + theme(axis.text.x =
element_text(size = 10,
angle = 45,
hjust = 1,
vjust = 1))
p1 <- p1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
p1 <- p1 + theme(
plot.title = element_text(color = "red", size = 14, face = "bold.italic", family ="Comic Sans MS"),
axis.title.x = element_text(color = "blue", size = 14, face = "bold"),
axis.title.y = element_text(color = "#993333", size = 14, face = "bold"))
print(p1)
It is necessary to reorder the values of the fatalities variable, to be ready for a plot.
total.fatalities <- total.fatalities[order(total.fatalities$FATALITIES, decreasing = TRUE),][1:12,]
print(head(total.fatalities, 12))
## EVTYPE FATALITIES
## 831 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 463 LIGHTNING 816
## 758 THUNDERSTORM WIND 701
## 170 FLOOD 472
## 584 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
## 968 WINTER STORM 206
## 585 RIP CURRENTS 204
The graph showing the impact of the wether events on the injuries in the US is:
library(ggplot2)
library(extrafont)
p1 <- ggplot(data = total.fatalities, aes(x = reorder(EVTYPE, - FATALITIES), y = FATALITIES)) +
geom_bar(fill = heat.colors(12, alpha = 1), color ="black", stat = "identity") +
ylab("Total number of fatalities") + xlab("Event type \n (Fig.2)") +
expand_limits(y = c(0, 7000)) +
ggtitle("Weather events' impact on the health of the US population. \n Events with the highest number of fatalities")
p1 <- p1 + theme(axis.text.x =
element_text(size = 10,
angle = 45,
hjust = 1,
vjust = 1))
p1 <- p1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
p1 <- p1 + theme(
plot.title = element_text(color = "red", size = 14, face = "bold.italic", family ="Comic Sans MS"),
axis.title.x = element_text(color = "blue", size = 14, face = "bold"),
axis.title.y = element_text(color = "#993333", size = 14, face = "bold"))
print(p1)
From analysing the data set, it can be deduced that the events with the most harmful impacts to the economic development are property damages and crop damages. The total values related to these two variables are:
total.property.damages <- aggregate(PROPFINAL ~ EVTYPE, data = storm.data.released2, FUN = sum)
total.crop.damages <- aggregate(CROPFINAL ~ EVTYPE, data = storm.data.released2, FUN = sum)
It is necessary to reorder the values of the property damages and crop damages variables, to be ready for a plot.
total.property.damages <- total.property.damages[order(total.property.damages$PROPFINAL, decreasing = TRUE),][1:12,]
print(head(total.property.damages, 12))
## EVTYPE PROPFINAL
## 170 FLOOD 149776655307
## 402 HURRICANE 81174159010
## 831 TORNADO 56947380677
## 668 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 758 THUNDERSTORM WIND 9912641826
## 845 TROPICAL STORM 7703890550
## 968 WINTER STORM 6688497251
## 359 HIGH WIND 5270046295
## 953 WILDFIRE 4765114000
## 669 STORM SURGE/TIDE 4641188000
total.crop.damages <- total.crop.damages[order(total.crop.damages$CROPFINAL, decreasing = TRUE),][1:12,]
print(head(total.crop.damages, 12))
## EVTYPE CROPFINAL
## 16 DROUGHT 13972566000
## 35 FLOOD 10691427450
## 78 HURRICANE 5349782800
## 85 ICE STORM 5022113500
## 53 HAIL 3025954470
## 30 FLASH FLOOD 1421317100
## 26 EXTREME COLD 1292973000
## 115 THUNDERSTORM WIND 1159505180
## 47 FROST/FREEZE 1094086000
## 65 HEAVY RAIN 733399800
## 132 TROPICAL STORM 678346000
## 73 HIGH WIND 638571300
#total.crop.damages <- total.crop.damages[order(total.crop.damages$CROPDMGVAL, decreasing = TRUE),][1:12,]
#print(head(total.crop.damages, 12))
The graphs showing the impact of the wether events on the property damages and crop damages in the US are:
library(ggplot2)
library(extrafont)
p1 <- ggplot(data = total.property.damages, aes(x = reorder(EVTYPE, - PROPFINAL), y = PROPFINAL / 10^9)) +
geom_bar(fill = heat.colors(12, alpha = 1), color ="black", stat = "identity") +
ylab("Total number of property damages \n (billions of dollars)") + xlab("Event type") +
expand_limits(y = c(0, 170)) +
ggtitle("Weather events' impact \n on the economic development in the US. \n Events with the highest number \n of property damages")
p1 <- p1 + theme(axis.text.x =
element_text(size = 10,
angle = 45,
hjust = 1,
vjust = 1))
p1 <- p1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
p1 <- p1 + theme(
plot.title = element_text(color = "red", size = 10, face = "bold.italic", family ="Comic Sans MS"),
axis.title.x = element_text(color = "blue", size = 10, face = "bold"),
axis.title.y = element_text(color = "#993333", size = 9, face = "bold"))
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.2.5
p1 <- p1 + theme(plot.margin = unit(c(1,0.6,0.2,0.3), "cm"))
p2 <- ggplot(data = total.crop.damages, aes(x = reorder(EVTYPE, - CROPFINAL), y = CROPFINAL / 10^9)) +
geom_bar(fill = heat.colors(12, alpha = 1), color ="black", stat = "identity") +
ylab("Total number of crop damages \n (billions of dollars)") + xlab("Event type") +
expand_limits(y = c(0, 16)) +
ggtitle("Weather events' impact \n on the economic development in the US. \n Events with the highest number \n of crop damages")
p2 <- p2 + theme(axis.text.x =
element_text(size = 10,
angle = 45,
hjust = 1,
vjust = 1))
p2 <- p2 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"))
p2 <- p2 + theme(
plot.title = element_text(color = "red", size = 10, face = "bold.italic", family = "Comic Sans MS"),
axis.title.x = element_text(color = "blue", size = 10, face = "bold"),
axis.title.y = element_text(color = "#993333", size = 9, face = "bold"))
p2 <- p2 + theme(plot.margin = unit(c(1,0,0.2,0.6), "cm"))
library(gridExtra)
library(grid)
gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)
maxWidth = unit.pmax(gA$widths[2:3], gB$widths[2:3])
# Set the widths
gA$widths[2:3] <- as.list(maxWidth)
gB$widths[2:3] <- as.list(maxWidth)
# Arrange the two charts
grid.arrange(gA, gB, nrow = 1, bottom = textGrob("(Fig.3)", gp = gpar(col = "blue", fontsize = 14, fontface = "bold")))
This analysis based on the NOAA data set shows that the maximum number of injuries and fatalities is caused by tornados. The maximum number of the property damages is caused by floods. The maximum number of crop damages is caused by droughts. The graphs show other weather events that are harmful to the population health and economic development.
sessionInfo()
## R version 3.2.4 Revised (2016-03-16 r70336)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] gridExtra_2.2.1 extrafont_0.17 ggplot2_2.1.0 plyr_1.8.4
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.5 digest_0.6.9 Rttf2pt1_1.3.4 gtable_0.2.0
## [5] formatR_1.3 magrittr_1.5 evaluate_0.9 scales_0.4.0
## [9] stringi_1.0-1 extrafontdb_1.0 rmarkdown_0.9.6 labeling_0.3
## [13] tools_3.2.4 stringr_1.0.0 munsell_0.4.3 colorspace_1.2-6
## [17] htmltools_0.3.5 knitr_1.14