————————————————–

Reproducible Research Course - Project 2

————————————————–

Consequences of Severe Weather on Health and Economic Events in the US

1. Synopsis

This study aims to develop a reproducible analysis about the impact of sever weather events on the population health (injuries and fatalities) and economic damages (crops). The data used in this study come from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database over 1950 to 2011. The analysis finds that the most injuries and fatalities are caused by tornados. Also, it is shown that floods are the main cause of the property damages and droughts are the primary cause of the crop damages. This analysis can be used by U.S. government for reallocating resources based on the prediction of primary events that will induce the highest social and economic cost.

1. Data Processing

1.a. About Data

The data for this analysis come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size (Storm Data [47Mb]). The data and the data documentation can be downloaded the file from the Reproducible Research Course (Week4) web site. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

1.b. Data Pre-processing

Set the working directory
setwd(“I:/Coursera/Data Science Specialization/Course5_Reproducible Research/Assignments/Project2”) ; using the console
Clean the environment

rm(list = ls())

Install some R packages and upload libraries
install.packages(“knitr”)
install.packages(“markdown”)
install.packages(“dplyr”)
install.packages(“ggplot2”“)
install.packages(“gridExtra”)

————————————-

library(knitr)
library(markdown)
library(dplyr)
library(ggplot2)
library(gridExtra)

————————————–

Change fonts in ggplot2. Use instructions in the following order:
install.packages(“extrafont”) ; instruction uploaded using the console
library(extrafont) ; instruction uploaded using the console
font_import(pattern=“[C/c]omic”) ; instruction uploaded using the console
font_import(pattern=“[A/a]rial”) ; instruction uploaded using the console
loadfonts(device=“win”) ; instruction uploaded using the console

————————————–

Upload data. Pre-process data
### Upload data.
storm.data.released <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"), sep = ",", header = TRUE, stringsAsFactors = FALSE)
### A quick look at the data set, its variables and its dimensionality.
names(storm.data.released)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
str(storm.data.released)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

This analysis focuses only on severe weather events with consequences on health and economic issues. The analysis needs the following subset of variables: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP.

storm.data.released2 <- storm.data.released[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
### Clean the data set
library(plyr)
## Warning: package 'plyr' was built under R version 3.2.5
storm.data.released2$EVTYPE = mapvalues(storm.data.released2$EVTYPE, from = c("TSTM WIND", "THUNDERSTORM WINDS", "RIVER FLOOD", "HURRICANE/TYPHOON", "HURRICANE"), to = c("THUNDERSTORM WIND", "THUNDERSTORM WIND", "FLOOD", "HURRICANE", "HURRICANE"))

storm.data.released2$PROPDMGEXP = mapvalues(storm.data.released2$PROPDMGEXP, from = c("K", "M", "", "B", "m", "+", "0", "5", "6", "?", "4", "2", "3", "h", "7", "H", "-", "1", "8"), to = c(10^3, 10^6, 1, 10^9, 10^6, 1, 1, 10^5, 10^6, 1, 10^4, 10^2, 10^3, 10^3, 10^7, 10^2, 1, 1, 10^8))

print(head(storm.data.released2$PROPDMG, 12))
##  [1] 25.0  2.5 25.0  2.5  2.5  2.5  2.5  2.5 25.0 25.0  2.5  2.5
print(head(storm.data.released2$PROPDMGEXP, 12))
##  [1] "1000"  "1000"  "1000"  "1000"  "1000"  "1000"  "1000"  "1000" 
##  [9] "1000"  "1000"  "1e+06" "1e+06"
PROPFINAL <- as.numeric(storm.data.released2$PROPDMG) * as.numeric(storm.data.released2$PROPDMGEXP)

storm.data.released2$CROPDMGEXP = mapvalues(storm.data.released2$CROPDMGEXP, from = c("M", "K", "m", "B", "?", "0", "k", "2"), to = c(10^6, 10^3, 10^3, 10^9, 1, 1, 10^3, 10^2))

print(head(storm.data.released2$CROPDMG, 12))
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0
print(head(storm.data.released2$CROPDMGEXP, 12))
##  [1] "" "" "" "" "" "" "" "" "" "" "" ""
CROPFINAL <- as.numeric(storm.data.released2$CROPDMG) * as.numeric(storm.data.released2$CROPDMGEXP)

2. Results/Conclusions

2.a. Calculation of total number of events by type.

2.a.1. Injuries and fatalities.

From analysing the data set, it can be deduced that the events with the most harmful impacts to the population health are fatalities and injuries. The total values related to these two variables are:

total.injuries <- aggregate(INJURIES ~ EVTYPE, data = storm.data.released2, FUN = sum)
total.fatalities <- aggregate(FATALITIES ~ EVTYPE, data = storm.data.released2, FUN = sum)

It is necessary to reorder the values of the injuries variable, to be ready for a plot.

total.injuries <- total.injuries[order(total.injuries$INJURIES, decreasing = TRUE),][1:12,]
print(head(total.injuries, 12))
##                EVTYPE INJURIES
## 831           TORNADO    91346
## 758 THUNDERSTORM WIND     9353
## 170             FLOOD     6791
## 130    EXCESSIVE HEAT     6525
## 463         LIGHTNING     5230
## 275              HEAT     2100
## 426         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 244              HAIL     1361
## 402         HURRICANE     1321
## 968      WINTER STORM     1321
## 359         HIGH WIND     1137

The graph showing the impact of the wether events on the injuries in the US is:

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.5
library(extrafont)
## Warning: package 'extrafont' was built under R version 3.2.5
## Registering fonts with R
p1 <- ggplot(data = total.injuries, aes(x = reorder(EVTYPE, - INJURIES), y = INJURIES)) +
   geom_bar(fill = heat.colors(12, alpha = 1), color ="black", stat = "identity")  + 
    ylab("Total number of injuries") + xlab("Event type \n (Fig.1)") +
    expand_limits(y = c(0, 100000)) +
    ggtitle("Weather events' impact on the health of the US population. \n Events with the highest number of injuries")

p1 <- p1 + theme(axis.text.x =
            element_text(size  = 10,
                         angle = 45,
                         hjust = 1,
                         vjust = 1))

p1 <- p1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
            panel.background = element_blank(), axis.line = element_line(colour = "black"))

p1 <- p1 + theme(
            plot.title = element_text(color = "red", size = 14, face = "bold.italic", family ="Comic Sans MS"),
            axis.title.x = element_text(color = "blue", size = 14, face = "bold"),
            axis.title.y = element_text(color = "#993333", size = 14, face = "bold"))

print(p1)

It is necessary to reorder the values of the fatalities variable, to be ready for a plot.

total.fatalities <- total.fatalities[order(total.fatalities$FATALITIES, decreasing = TRUE),][1:12,]
print(head(total.fatalities, 12))
##                EVTYPE FATALITIES
## 831           TORNADO       5633
## 130    EXCESSIVE HEAT       1903
## 153       FLASH FLOOD        978
## 275              HEAT        937
## 463         LIGHTNING        816
## 758 THUNDERSTORM WIND        701
## 170             FLOOD        472
## 584       RIP CURRENT        368
## 359         HIGH WIND        248
## 19          AVALANCHE        224
## 968      WINTER STORM        206
## 585      RIP CURRENTS        204

The graph showing the impact of the wether events on the injuries in the US is:

library(ggplot2)
library(extrafont)
p1 <- ggplot(data = total.fatalities, aes(x = reorder(EVTYPE, - FATALITIES), y = FATALITIES)) +
   geom_bar(fill = heat.colors(12, alpha = 1), color ="black", stat = "identity")  + 
    ylab("Total number of fatalities") + xlab("Event type \n (Fig.2)") +
    expand_limits(y = c(0, 7000)) +
    ggtitle("Weather events' impact on the health of the US population. \n Events with the highest number of fatalities")

p1 <- p1 + theme(axis.text.x =
            element_text(size  = 10,
                         angle = 45,
                         hjust = 1,
                         vjust = 1))

p1 <- p1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
            panel.background = element_blank(), axis.line = element_line(colour = "black"))

p1 <- p1 + theme(
            plot.title = element_text(color = "red", size = 14, face = "bold.italic", family ="Comic Sans MS"),
            axis.title.x = element_text(color = "blue", size = 14, face = "bold"),
            axis.title.y = element_text(color = "#993333", size = 14, face = "bold"))

print(p1)

2.a.2. Property damages and crop damages.

From analysing the data set, it can be deduced that the events with the most harmful impacts to the economic development are property damages and crop damages. The total values related to these two variables are:

total.property.damages <- aggregate(PROPFINAL ~ EVTYPE, data = storm.data.released2, FUN = sum)
total.crop.damages <- aggregate(CROPFINAL ~ EVTYPE, data = storm.data.released2, FUN = sum)

It is necessary to reorder the values of the property damages and crop damages variables, to be ready for a plot.

total.property.damages <- total.property.damages[order(total.property.damages$PROPFINAL, decreasing = TRUE),][1:12,]
print(head(total.property.damages, 12))
##                EVTYPE    PROPFINAL
## 170             FLOOD 149776655307
## 402         HURRICANE  81174159010
## 831           TORNADO  56947380677
## 668       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 758 THUNDERSTORM WIND   9912641826
## 845    TROPICAL STORM   7703890550
## 968      WINTER STORM   6688497251
## 359         HIGH WIND   5270046295
## 953          WILDFIRE   4765114000
## 669  STORM SURGE/TIDE   4641188000
total.crop.damages <- total.crop.damages[order(total.crop.damages$CROPFINAL, decreasing = TRUE),][1:12,]
print(head(total.crop.damages, 12))
##                EVTYPE   CROPFINAL
## 16            DROUGHT 13972566000
## 35              FLOOD 10691427450
## 78          HURRICANE  5349782800
## 85          ICE STORM  5022113500
## 53               HAIL  3025954470
## 30        FLASH FLOOD  1421317100
## 26       EXTREME COLD  1292973000
## 115 THUNDERSTORM WIND  1159505180
## 47       FROST/FREEZE  1094086000
## 65         HEAVY RAIN   733399800
## 132    TROPICAL STORM   678346000
## 73          HIGH WIND   638571300
#total.crop.damages <- total.crop.damages[order(total.crop.damages$CROPDMGVAL, decreasing = TRUE),][1:12,]
#print(head(total.crop.damages, 12))

The graphs showing the impact of the wether events on the property damages and crop damages in the US are:

library(ggplot2)
library(extrafont)

p1 <- ggplot(data = total.property.damages, aes(x = reorder(EVTYPE, - PROPFINAL), y = PROPFINAL / 10^9)) +
   geom_bar(fill = heat.colors(12, alpha = 1), color ="black", stat = "identity")  + 
    ylab("Total number of property damages \n (billions of dollars)") + xlab("Event type") +
    expand_limits(y = c(0, 170)) +
    ggtitle("Weather events' impact \n on the economic development in the US. \n Events with the highest number \n of property damages")

p1 <- p1 + theme(axis.text.x =
            element_text(size  = 10,
                         angle = 45,
                         hjust = 1,
                         vjust = 1))

p1 <- p1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
            panel.background = element_blank(), axis.line = element_line(colour = "black"))

p1 <- p1 + theme(
            plot.title = element_text(color = "red", size = 10, face = "bold.italic", family ="Comic Sans MS"),
            axis.title.x = element_text(color = "blue", size = 10, face = "bold"),
            axis.title.y = element_text(color = "#993333", size = 9, face = "bold"))

library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.2.5
p1 <- p1 + theme(plot.margin = unit(c(1,0.6,0.2,0.3), "cm"))


p2 <- ggplot(data = total.crop.damages, aes(x = reorder(EVTYPE, - CROPFINAL), y = CROPFINAL / 10^9)) +
   geom_bar(fill = heat.colors(12, alpha = 1), color ="black", stat = "identity")  + 
    ylab("Total number of crop damages \n (billions of dollars)") + xlab("Event type") +
    expand_limits(y = c(0, 16)) +
    ggtitle("Weather events' impact \n on the economic development in the US. \n Events with the highest number \n of crop damages")

p2 <- p2 + theme(axis.text.x =
            element_text(size  = 10,
                         angle = 45,
                         hjust = 1,
                         vjust = 1))

p2 <- p2 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
            panel.background = element_blank(), axis.line = element_line(colour = "black"))

p2 <- p2 + theme(
            plot.title = element_text(color = "red", size = 10, face = "bold.italic", family = "Comic Sans MS"),
            axis.title.x = element_text(color = "blue", size = 10, face = "bold"),
            axis.title.y = element_text(color = "#993333", size = 9, face = "bold"))

p2 <- p2 + theme(plot.margin = unit(c(1,0,0.2,0.6), "cm"))

library(gridExtra)
library(grid)


gA <- ggplotGrob(p1)
gB <- ggplotGrob(p2)

maxWidth = unit.pmax(gA$widths[2:3], gB$widths[2:3])

# Set the widths
gA$widths[2:3] <- as.list(maxWidth)
gB$widths[2:3] <- as.list(maxWidth)

# Arrange the two charts
grid.arrange(gA, gB, nrow = 1, bottom = textGrob("(Fig.3)", gp = gpar(col = "blue", fontsize = 14, fontface = "bold")))

2.b. Conclusions

This analysis based on the NOAA data set shows that the maximum number of injuries and fatalities is caused by tornados. The maximum number of the property damages is caused by floods. The maximum number of crop damages is caused by droughts. The graphs show other weather events that are harmful to the population health and economic development.

2.c. Information about the session version and packages.

sessionInfo()
## R version 3.2.4 Revised (2016-03-16 r70336)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] gridExtra_2.2.1 extrafont_0.17  ggplot2_2.1.0   plyr_1.8.4     
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.5      digest_0.6.9     Rttf2pt1_1.3.4   gtable_0.2.0    
##  [5] formatR_1.3      magrittr_1.5     evaluate_0.9     scales_0.4.0    
##  [9] stringi_1.0-1    extrafontdb_1.0  rmarkdown_0.9.6  labeling_0.3    
## [13] tools_3.2.4      stringr_1.0.0    munsell_0.4.3    colorspace_1.2-6
## [17] htmltools_0.3.5  knitr_1.14