Economic and Public Health Impact of Extreme Weather Events

1.0.- Synopsis

This analysis addresses two questions. First, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? Second, which types of events have the greatest economic consequences? To answer them, the analysis reads the original data (repdata-data-StormData.csv), processes it, and selects the records from 1990 to 2011. The results show that “TORNADO” caused the most injuries and also had a large impact on fatalities, although “EXCESSIVE HEAT” ranked first for fatalities. On the economic side, “FLOOD” caused the most property damage and “DROUGHT” caused the most crop damage.

2.0.- Data Processing

The data for this analysis come as a comma-separated-value file compressed with bzip2. The file was downloaded from the Coursera course page, unzipped, and saved in my working directory.
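As a side note on the compressed format described above, R can usually read a bzip2-compressed CSV without a separate unzip step. The following is only a minimal sketch on a toy file written to a temporary path; the file contents and the names `tmp` and `toy` are made up for illustration and are not part of the original analysis.

```r
# Toy stand-in for the real file: write a tiny bzip2-compressed CSV
# to a temporary path (hypothetical data, for illustration only).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# read.csv() opens the path through file(), which detects bzip2
# compression when reading, so the file does not need to be unzipped first.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy)
```

Reading the compressed file directly keeps the report reproducible from the downloaded file alone, at the cost of a slower read.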

library(dplyr)
library(lubridate)
setwd("C:/Users/Usuario/Desktop/archivos/coursera R/Reproducible Research/RepData_PeerAssessment2")
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
## [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
## [5] LC_TIME=Spanish_Spain.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] lubridate_1.3.3 dplyr_0.4.1    
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1  DBI_0.3.1       digest_0.6.8    evaluate_0.5.5 
##  [5] formatR_1.0     htmltools_0.2.6 knitr_1.9       magrittr_1.5   
##  [9] memoise_0.2.1   parallel_3.1.2  plyr_1.8.1      Rcpp_0.11.5    
## [13] rmarkdown_0.5.1 stringr_0.6.2   tools_3.1.2     yaml_2.1.13
Sys.getlocale(category = "LC_ALL")
## [1] "LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252"
Sys.setlocale(category = "LC_ALL", locale = "English")
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

2.1.- Read the data from my working directory and inspect its structure

data<-read.csv("repdata-data-StormData.csv", stringsAsFactors = FALSE, sep=",")
dim(data)
## [1] 902297     37
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

2.2.- Formatting and exploring data

data$BGN_DATE <- strptime(data$BGN_DATE, "%m/%d/%Y %H:%M:%S")
data$BGN_DATE<-as.Date(data$BGN_DATE)
#plot a histogram with the total data by year
hist(year(data$BGN_DATE), xlab="years", main="Histogram data/year",breaks=30, 
     ylim=c(0,125000))

In the histogram above, the number of events recorded from 1950 to 1980 is small compared with the period 1990-2011. We therefore treat the later period (1990-2011) as the best-recorded period in the data.
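The year-based comparison and selection can also be sketched in base R, without lubridate. This is an illustrative example on three made-up date strings in the same format as BGN_DATE; the variable names `bgn_date`, `parsed`, `event_year` and `recent` are hypothetical.

```r
# Three made-up dates in the same "month/day/year time" format as BGN_DATE.
bgn_date <- c("4/18/1950 0:00:00", "6/8/1995 0:00:00", "11/15/2011 0:00:00")

# Parse the strings and extract the calendar year as an integer.
parsed <- as.Date(bgn_date, format = "%m/%d/%Y %H:%M:%S")
event_year <- as.integer(format(parsed, "%Y"))

# Count events per year (the numbers behind a histogram by year).
print(table(event_year))

# Keep only the events from 1990 onwards.
recent <- parsed[event_year >= 1990]
print(recent)
```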

2.3.- Select the data from 1990 to 2011

stormData<-filter(data, year(BGN_DATE)>=1990)
dim(stormData)
## [1] 751740     37

2.4.- Process the economic impact variables

The property damage (PROPDMG) and crop damage (CROPDMG) values are scaled by the PROPDMGEXP and CROPDMGEXP variables, which encode a multiplier as follows: h or H (hundred), k or K (thousand), m or M (million), and B (billion).
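The mapping from exponent codes to dollar amounts can be illustrated with a small lookup table. This is a sketch under stated assumptions, not the code used in this report: `exp_lookup` and `damage_in_dollars` are hypothetical names, and any code other than H, K, M or B (blank, symbols, digits) is treated as a multiplier of 1, which mirrors the choice made in the processing code of this section.

```r
# Hypothetical lookup table: exponent letter -> power of ten.
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Hypothetical helper: convert a damage value and its exponent code to dollars.
# Codes not found in the lookup table give NA, which is replaced by 0 (10^0 = 1).
damage_in_dollars <- function(value, exp_code) {
  power <- unname(exp_lookup[toupper(exp_code)])
  power[is.na(power)] <- 0
  value * 10^power
}

# Made-up example values: 25 K, 2.5 m, 1 B, 3 with a blank code, 7 with code "5".
example <- damage_in_dollars(c(25, 2.5, 1, 3, 7), c("K", "m", "B", "", "5"))
print(example)
```

A lookup table keeps the letter-to-exponent rule in one place, which makes the assumption about unrecognised codes easier to see and to change.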

table(stormData$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 346265      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 396151      7   8956
table(stormData$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 467856      7     19      1      9     21 281832      1   1994
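# Recode the exponent letters H, K, M, B as the powers of ten 2, 3, 6, 9.
# Numeric codes (1-8) are set to 0; other symbols become NA and are then set to 0.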
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="1" | stormData$PROPDMGEXP=="2"
                     | stormData$PROPDMGEXP=="3" | stormData$PROPDMGEXP=="4" 
                     | stormData$PROPDMGEXP=="5" | stormData$PROPDMGEXP=="6" 
                     | stormData$PROPDMGEXP=="7" | stormData$PROPDMGEXP=="8")]<-"0"
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="m" | stormData$PROPDMGEXP=="M")]<-"6"
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="B" )]<-"9"
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="K")]<-"3"
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="h" | stormData$PROPDMGEXP=="H")]<-"2"
stormData$PROPDMGEXP<-as.numeric(stormData$PROPDMGEXP)
stormData$PROPDMGEXP[is.na(stormData$PROPDMGEXP)]<-0

# New variable: property damage in dollars
propertyDam<-stormData$PROPDMG*10^stormData$PROPDMGEXP


# Recode the crop damage exponent codes in the same way
stormData$CROPDMGEXP[(stormData$CROPDMGEXP=="2" | stormData$CROPDMGEXP=="?")]<-"0"
stormData$CROPDMGEXP[(stormData$CROPDMGEXP=="k"| stormData$CROPDMGEXP=="K")]<-"3"
stormData$CROPDMGEXP[(stormData$CROPDMGEXP=="M"| stormData$CROPDMGEXP=="m")]<-"6"
stormData$CROPDMGEXP[(stormData$CROPDMGEXP=="B")]<-"9"
stormData$CROPDMGEXP<-as.numeric(stormData$CROPDMGEXP)
stormData$CROPDMGEXP[is.na(stormData$CROPDMGEXP)]<-0

# New variable: crop damage in dollars
cropDam<-stormData$CROPDMG*10^stormData$CROPDMGEXP

# New data set with only the variables needed for the results
newData<-mutate(stormData,propertyDam,cropDam)%>%
        select(EVTYPE,INJURIES,FATALITIES,propertyDam,cropDam)

3.0.- Results

3.1.- Total Fatalities, Injuries and Economic Damage for the Ten Most Important Weather Events (1990-2011)

# summarise by event fatalities, injuries,crop damage and property damage
sumData<-newData%>%
        group_by(EVTYPE)%>%
        summarise(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES),
                  propertyD=sum(propertyDam), cropD=sum(cropDam))
# numbers of Injuries by event
injuries<-sumData%>%
        select(EVTYPE,INJURIES)%>%
        mutate(VALUE=INJURIES, EFECT=as.factor(c("injuries")))%>%
        select(EVTYPE,EFECT,VALUE)%>%
        arrange(desc(VALUE))

# Numbers of fatalities by event
fatalities<-sumData%>%
        select(EVTYPE,FATALITIES)%>%
        mutate(VALUE=FATALITIES, EFECT=as.factor(c("fatalities")))%>%
        select(EVTYPE,EFECT,VALUE)%>%
        arrange(desc(VALUE))

# Property damage by event
propertyDamage<-sumData%>%
        select(EVTYPE,propertyD)%>%
        mutate(VALUE=propertyD, EFECT=as.factor(c("property damage")))%>%
        select(EVTYPE,EFECT,VALUE)%>%
        arrange(desc(VALUE))

# Crop damage by event
cropDamage<-sumData%>%
        select(EVTYPE,cropD)%>%
        mutate(VALUE=cropD, EFECT=as.factor(c("crop damage")))%>%
        select(EVTYPE,EFECT,VALUE)%>%
        arrange(desc(VALUE))

# Join the four data frames (985 events for injuries, fatalities, property damage
# and crop damage) vertically, using the rbind function
joinData<-rbind(injuries,fatalities,propertyDamage, cropDamage)

# Join two data frames (injuries, fatalities) for the ten most important events
# vertically, using the rbind function
injuriesFatalitiesFirst10<-rbind(head(injuries,10),head(fatalities,10))
        
# Join two data frames (property damage and crop damage) for the ten
# most important events vertically, using the rbind function
propertyCropDamegeFirst10<-rbind(head(propertyDamage,10), head(cropDamage,10))

print(injuriesFatalitiesFirst10)
## Source: local data frame [20 x 3]
## 
##               EVTYPE      EFECT VALUE
## 1            TORNADO   injuries 26674
## 2              FLOOD   injuries  6789
## 3     EXCESSIVE HEAT   injuries  6525
## 4          LIGHTNING   injuries  5230
## 5          TSTM WIND   injuries  5022
## 6               HEAT   injuries  2100
## 7          ICE STORM   injuries  1975
## 8        FLASH FLOOD   injuries  1777
## 9  THUNDERSTORM WIND   injuries  1488
## 10      WINTER STORM   injuries  1321
## 11    EXCESSIVE HEAT fatalities  1903
## 12           TORNADO fatalities  1752
## 13       FLASH FLOOD fatalities   978
## 14              HEAT fatalities   937
## 15         LIGHTNING fatalities   816
## 16             FLOOD fatalities   470
## 17       RIP CURRENT fatalities   368
## 18         TSTM WIND fatalities   327
## 19         HIGH WIND fatalities   248
## 20         AVALANCHE fatalities   224
print(propertyCropDamegeFirst10)
## Source: local data frame [20 x 3]
## 
##               EVTYPE           EFECT        VALUE
## 1              FLOOD property damage 144657709807
## 2  HURRICANE/TYPHOON property damage  69305840000
## 3        STORM SURGE property damage  43323536000
## 4            TORNADO property damage  30458515609
## 5        FLASH FLOOD property damage  16140812067
## 6               HAIL property damage  15732267543
## 7          HURRICANE property damage  11868319010
## 8     TROPICAL STORM property damage   7703890550
## 9       WINTER STORM property damage   6688497251
## 10         HIGH WIND property damage   5270046295
## 11           DROUGHT     crop damage  13972566000
## 12             FLOOD     crop damage   5661968450
## 13       RIVER FLOOD     crop damage   5029459000
## 14         ICE STORM     crop damage   5022113500
## 15              HAIL     crop damage   3025954473
## 16         HURRICANE     crop damage   2741910000
## 17 HURRICANE/TYPHOON     crop damage   2607872800
## 18       FLASH FLOOD     crop damage   1421317100
## 19      EXTREME COLD     crop damage   1292973000
## 20      FROST/FREEZE     crop damage   1094086000

The tables above show the numbers of fatalities and injuries, and the economic damage, for the ten most important events in each category.
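The "sum by event type, then take the largest totals" step behind these tables can be sketched in base R as an alternative to the dplyr pipeline. The data frame below is made up for illustration, and the names `toy`, `totals` and `top_injuries` are hypothetical.

```r
# Made-up event records (not taken from the storm data).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 5, 1),
  INJURIES   = c(10, 3, 20, 4, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by total injuries, largest first, and keep at most ten rows.
top_injuries <- head(totals[order(-totals$INJURIES), ], 10)
print(top_injuries)
```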

3.2.- Plot Total Fatalities, Injuries and Economic Damage for the Ten Most Important Weather Events (1990-2011)

library(ggplot2)
library(gridExtra)
# plot  injuries
g1<-ggplot(head(injuriesFatalitiesFirst10,10), aes(EVTYPE,VALUE) ) + geom_bar(stat="identity", fill="blue",  
        alpha=0.3)+coord_flip() + labs(title="Injuries by weather Event (1990-2011)") + xlab("event type") +ylab("number")
# plot  fatalities
g11<-ggplot(tail(injuriesFatalitiesFirst10,10), aes(EVTYPE,VALUE) ) + geom_bar(stat="identity", fill="blue",  
        alpha=0.3)+coord_flip() + labs(title="Fatalities by weather Event (1990-2011)") + xlab("event type") +ylab("number")
grid.arrange(g1,g11, ncol=1)

Tornadoes caused the most injuries and excessive heat caused the most fatalities from 1990 to 2011 in the United States.
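Bar charts of this kind place the event types in alphabetical order by default, which can make the ranking harder to read. One possible adjustment, shown here only as a sketch, is to order the event types by their values with the base R function `reorder`. The three values are taken from the injuries table above; the names `toy` and `ordered_evtype` are hypothetical.

```r
# Three rows taken from the injuries table above.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "EXCESSIVE HEAT"),
  VALUE  = c(26674, 6789, 6525),
  stringsAsFactors = FALSE
)

# reorder() returns a factor whose levels are sorted by increasing VALUE,
# instead of the default alphabetical order.
ordered_evtype <- reorder(toy$EVTYPE, toy$VALUE)
print(levels(ordered_evtype))
```

In a ggplot2 call, the same idea would be written as `aes(reorder(EVTYPE, VALUE), VALUE)`; combined with `coord_flip()`, the largest bar then appears at the top of the panel.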

#plot damage of the properties
g2<-ggplot(head(propertyCropDamegeFirst10,10), aes(EVTYPE,VALUE/1e+06) ) + geom_bar(stat="identity", fill="blue",  
        alpha=0.3)+coord_flip() + labs(title="Property Economic Damage by Weather Event(1990-2011)") + xlab("event type") +ylab("million $")
#plot damage of the crops
g21<-ggplot(tail(propertyCropDamegeFirst10,10), aes(EVTYPE,VALUE/1e+06) ) + geom_bar(stat="identity", fill="blue",  
        alpha=0.3)+coord_flip() + labs(title="Crop Economic Damage by Weather Event(1990-2011)") + xlab("event type") +ylab("million $")
grid.arrange(g2,g21, ncol=1)

Floods caused the most property damage and droughts caused the most crop damage from 1990 to 2011 in the United States.