This analysis tries to answer two questions. The first is which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health. The second is which types of events have the greatest economic consequences. To answer these questions, the analysis reads the original data (repdata-data-StormData.csv), processes it, and selects the records from 1990 to 2011. The results show that “TORNADO” caused the most injuries. It also had a large impact on fatalities, ranking second behind “EXCESSIVE HEAT”. On the economic side, “FLOOD” caused the most property damage and “DROUGHT” caused the most crop damage.
The data for this analysis come as a comma-separated-value file compressed with bzip2. The file was downloaded from the Coursera course page, unzipped, and saved in the working directory.
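Because `read.csv` accepts a connection, the manual unzip step is optional. A `bzfile()` connection decompresses the archive while it is read. The minimal sketch below uses a tiny temporary file in place of the compressed course file. The real archive's name, `repdata-data-StormData.csv.bz2`, is an assumption.

```r
# Sketch: read a bzip2-compressed CSV without unzipping it first.
# A tiny temporary file stands in for the real archive (assumed to be named
# "repdata-data-StormData.csv.bz2").
bzPath <- tempfile(fileext = ".csv.bz2")

# Write a few illustrative rows through a compressing connection
con <- bzfile(bzPath, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "FLOOD,0"), con)
close(con)

# read.csv opens the bzfile() connection, decompresses while reading,
# and closes the connection again when it is done
smallData <- read.csv(bzfile(bzPath), stringsAsFactors = FALSE)
str(smallData)
```

With the real file, the equivalent call would be `read.csv(bzfile("repdata-data-StormData.csv.bz2"), stringsAsFactors = FALSE)`. It assumes the archive sits in the working directory.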
library(dplyr)
library(lubridate)
setwd("C:/Users/Usuario/Desktop/archivos/coursera R/Reproducible Research/RepData_PeerAssessment2")
sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252
## [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Spain.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.3.3 dplyr_0.4.1
##
## loaded via a namespace (and not attached):
## [1] assertthat_0.1 DBI_0.3.1 digest_0.6.8 evaluate_0.5.5
## [5] formatR_1.0 htmltools_0.2.6 knitr_1.9 magrittr_1.5
## [9] memoise_0.2.1 parallel_3.1.2 plyr_1.8.1 Rcpp_0.11.5
## [13] rmarkdown_0.5.1 stringr_0.6.2 tools_3.1.2 yaml_2.1.13
Sys.getlocale(category = "LC_ALL")
## [1] "LC_COLLATE=Spanish_Spain.1252;LC_CTYPE=Spanish_Spain.1252;LC_MONETARY=Spanish_Spain.1252;LC_NUMERIC=C;LC_TIME=Spanish_Spain.1252"
Sys.setlocale(category = "LC_ALL", locale = "English")
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
data<-read.csv("repdata-data-StormData.csv", stringsAsFactors = FALSE, sep=",")
dim(data)
## [1] 902297 37
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0
## 2        0     2.5          K       0
## 3        2    25.0          K       0
## 4        2     2.5          K       0
## 5        2     2.5          K       0
## 6        6     2.5          K       0
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
data$BGN_DATE <- strptime(data$BGN_DATE, "%m/%d/%Y %H:%M:%S")
data$BGN_DATE<-as.Date(data$BGN_DATE)
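The two assignments above can be collapsed into one. `as.Date()` accepts a `format` string and parses it with `strptime()`, which ignores trailing characters it does not need, so the `0:00:00` time part is dropped. Parsing in one step also avoids storing an intermediate `POSIXlt` column in a 900,000-row data frame. The sketch below runs on two values copied from the `BGN_DATE` column.

```r
# Sketch: parse BGN_DATE-style strings ("m/d/Y H:M:S") straight to Date.
# strptime(), used internally by as.Date(), ignores the unused time part.
bgnDates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")
parsed <- as.Date(bgnDates, format = "%m/%d/%Y")
print(parsed)

# Years as integers, equivalent to lubridate's year(parsed)
as.integer(format(parsed, "%Y"))
```

Applied to the full data, this would be `data$BGN_DATE <- as.Date(data$BGN_DATE, format = "%m/%d/%Y")`.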
# Histogram of the number of recorded events per year
hist(year(data$BGN_DATE), xlab="years", main="Histogram data/year",breaks=30,
ylim=c(0,125000))
The histogram above shows that far fewer events were recorded from 1950 to 1980 than in the period 1990-2011. We therefore treat 1990-2011 as the period with the most complete record in the data and keep only those years.
stormData<-filter(data, year(BGN_DATE)>=1990)
dim(stormData)
## [1] 751740 37
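The effect of the filter can be quantified from the two `dim()` calls. 751,740 of the 902,297 records, about 83%, fall in 1990-2011. The hypothetical helper `shareSince()` below computes that share for any cutoff year and is illustrated on made-up dates. Assuming `BGN_DATE` has no missing values, `shareSince(data$BGN_DATE, 1990)` on the full data should reproduce the ratio of the two row counts.

```r
# Hypothetical helper: fraction of events whose date falls in or after fromYear
shareSince <- function(dates, fromYear) {
  years <- as.integer(format(dates, "%Y"))
  mean(years >= fromYear)
}

# Percentage of rows kept: rows of stormData over rows of data (from dim() above)
round(100 * 751740 / 902297, 1)

# Small illustration with made-up dates: 3 of 4 fall in or after 1990
toyDates <- as.Date(c("1955-06-01", "1995-06-01", "2005-06-01", "2010-06-01"))
shareSince(toyDates, 1990)
```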
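The property damage (PROPDMG) and crop damage (CROPDMG) values must be scaled by the exponent codes in PROPDMGEXP and CROPDMGEXP: B (billion), h or H (hundred), K or k (thousand), m or M (million). In this analysis every other code (digits, symbols, or blank) is treated as a multiplier of 1.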
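As a worked example, the first record shown by `head(data)` has `PROPDMG` = 25.0 and `PROPDMGEXP` = "K". That is 25 × 10^3 = 25,000 dollars. The hypothetical helper `expToPower()` below expresses the same mapping as a lookup vector instead of a chain of assignments. It follows the rules applied in this analysis (B = 9, M/m = 6, K/k = 3, H/h = 2, any other code = 0). Its name and structure are illustrative only.

```r
# Hypothetical helper: convert exponent codes to powers of ten using the same
# rules as this analysis (B = 9, M/m = 6, K/k = 3, H/h = 2, anything else = 0)
expToPower <- function(code) {
  lookup <- c(B = 9, M = 6, m = 6, K = 3, k = 3, H = 2, h = 2)
  power <- unname(lookup[code])   # unmatched codes ("", "?", digits) give NA
  power[is.na(power)] <- 0        # ...which become a multiplier of 10^0 = 1
  power
}

damage <- c(25, 2.5, 1.2, 3)
codes  <- c("K", "M", "", "5")
damage * 10^expToPower(codes)   # 25000, 2500000, 1.2, 3
```

On the data, this would read `stormData$PROPDMG * 10^expToPower(stormData$PROPDMGEXP)`. It must run on the original character codes, before they are converted to numbers.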
table(stormData$PROPDMGEXP)
##
##            -      ?      +      0      1      2      3      4      5
## 346265      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 396151      7   8956
table(stormData$CROPDMGEXP)
##
##            ?      0      2      B      k      K      m      M
## 467856      7     19      1      9     21 281832      1   1994
# Convert property exponent codes to powers of ten: digits 1-8 -> 0,
# m/M -> 6, B -> 9, K -> 3, h/H -> 2; any remaining non-numeric code
# ("", "-", "?", "+") becomes NA in as.numeric() and is then set to 0
stormData$PROPDMGEXP[stormData$PROPDMGEXP %in% as.character(1:8)]<-"0"
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="m" | stormData$PROPDMGEXP=="M")]<-"6"
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="B" )]<-"9"
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="K")]<-"3"
stormData$PROPDMGEXP[(stormData$PROPDMGEXP=="h" | stormData$PROPDMGEXP=="H")]<-"2"
stormData$PROPDMGEXP<-as.numeric(stormData$PROPDMGEXP)
stormData$PROPDMGEXP[is.na(stormData$PROPDMGEXP)]<-0
# Property damage in dollars
propertyDam<-stormData$PROPDMG*10^stormData$PROPDMGEXP
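# Same conversion for crop exponent codes: 2 and ? -> 0, k/K -> 3, m/M -> 6,
# B -> 9; blank codes become NA in as.numeric() and are then set to 0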
stormData$CROPDMGEXP[(stormData$CROPDMGEXP=="2" | stormData$CROPDMGEXP=="?")]<-"0"
stormData$CROPDMGEXP[(stormData$CROPDMGEXP=="k"| stormData$CROPDMGEXP=="K")]<-"3"
stormData$CROPDMGEXP[(stormData$CROPDMGEXP=="M"| stormData$CROPDMGEXP=="m")]<-"6"
stormData$CROPDMGEXP[(stormData$CROPDMGEXP=="B")]<-"9"
stormData$CROPDMGEXP<-as.numeric(stormData$CROPDMGEXP)
stormData$CROPDMGEXP[is.na(stormData$CROPDMGEXP)]<-0
# Crop damage in dollars
cropDam<-stormData$CROPDMG*10^stormData$CROPDMGEXP
# Add both damage columns and keep only the variables used below
newData<-mutate(stormData,propertyDam,cropDam)%>%
select(EVTYPE,INJURIES,FATALITIES,propertyDam,cropDam)
# Total fatalities, injuries, property damage and crop damage per event type
sumData<-newData%>%
group_by(EVTYPE)%>%
summarise(FATALITIES=sum(FATALITIES),INJURIES=sum(INJURIES),
propertyD=sum(propertyDam), cropD=sum(cropDam))
# Number of injuries by event type, in decreasing order
injuries<-sumData%>%
select(EVTYPE,INJURIES)%>%
mutate(VALUE=INJURIES, EFECT=as.factor(c("injuries")))%>%
select(EVTYPE,EFECT,VALUE)%>%
arrange(desc(VALUE))
# Number of fatalities by event type, in decreasing order
fatalities<-sumData%>%
select(EVTYPE,FATALITIES)%>%
mutate(VALUE=FATALITIES, EFECT=as.factor(c("fatalities")))%>%
select(EVTYPE,EFECT,VALUE)%>%
arrange(desc(VALUE))
# Property damage (dollars) by event type, in decreasing order
propertyDamage<-sumData%>%
select(EVTYPE,propertyD)%>%
mutate(VALUE=propertyD, EFECT=as.factor(c("property damage")))%>%
select(EVTYPE,EFECT,VALUE)%>%
arrange(desc(VALUE))
# Crop damage (dollars) by event type, in decreasing order
cropDamage<-sumData%>%
select(EVTYPE,cropD)%>%
mutate(VALUE=cropD, EFECT=as.factor(c("crop damage")))%>%
select(EVTYPE,EFECT,VALUE)%>%
arrange(desc(VALUE))
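The four blocks above repeat the same select / mutate / arrange pipeline. The base-R sketch below defines a hypothetical helper, `rankEffect()`, that builds the same three-column layout (EVTYPE, EFECT, VALUE) for any summary column. It is shown on a made-up three-row summary.

```r
# Hypothetical helper: reshape one summary column into the EVTYPE/EFECT/VALUE
# layout used above, sorted by VALUE in decreasing order
rankEffect <- function(sumData, column, label) {
  out <- data.frame(EVTYPE = as.character(sumData$EVTYPE),
                    EFECT = factor(label),        # recycled to every row
                    VALUE = sumData[[column]],
                    stringsAsFactors = FALSE)
  out <- out[order(out$VALUE, decreasing = TRUE), ]
  rownames(out) <- NULL
  out
}

# Made-up summary table for illustration only
toySum <- data.frame(EVTYPE = c("A", "B", "C"),
                     INJURIES = c(5, 20, 1),
                     FATALITIES = c(2, 0, 7),
                     stringsAsFactors = FALSE)
rankEffect(toySum, "INJURIES", "injuries")
rankEffect(toySum, "FATALITIES", "fatalities")
```

With the real summary, `rankEffect(sumData, "INJURIES", "injuries")` should contain the same rows as the dplyr pipeline for `injuries`. The result is a plain data frame rather than a dplyr `tbl_df`.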
# Stack the four data frames (985 event types each) for injuries, fatalities,
# property damage and crop damage vertically, using the rbind function
joinData<-rbind(injuries,fatalities,propertyDamage, cropDamage)
# Stack the ten event types with the most injuries and the ten with the most
# fatalities vertically, using the rbind function
injuriesFatalitiesFirst10<-rbind(head(injuries,10),head(fatalities,10))
# Stack the ten event types with the most property damage and the ten with the
# most crop damage vertically, using the rbind function
propertyCropDamegeFirst10<-rbind(head(propertyDamage,10), head(cropDamage,10))
print(injuriesFatalitiesFirst10)
## Source: local data frame [20 x 3]
##
##               EVTYPE      EFECT VALUE
## 1            TORNADO   injuries 26674
## 2              FLOOD   injuries  6789
## 3     EXCESSIVE HEAT   injuries  6525
## 4          LIGHTNING   injuries  5230
## 5          TSTM WIND   injuries  5022
## 6               HEAT   injuries  2100
## 7          ICE STORM   injuries  1975
## 8        FLASH FLOOD   injuries  1777
## 9  THUNDERSTORM WIND   injuries  1488
## 10      WINTER STORM   injuries  1321
## 11    EXCESSIVE HEAT fatalities  1903
## 12           TORNADO fatalities  1752
## 13       FLASH FLOOD fatalities   978
## 14              HEAT fatalities   937
## 15         LIGHTNING fatalities   816
## 16             FLOOD fatalities   470
## 17       RIP CURRENT fatalities   368
## 18         TSTM WIND fatalities   327
## 19         HIGH WIND fatalities   248
## 20         AVALANCHE fatalities   224
print(propertyCropDamegeFirst10)
## Source: local data frame [20 x 3]
##
##               EVTYPE           EFECT        VALUE
## 1              FLOOD property damage 144657709807
## 2  HURRICANE/TYPHOON property damage  69305840000
## 3        STORM SURGE property damage  43323536000
## 4            TORNADO property damage  30458515609
## 5        FLASH FLOOD property damage  16140812067
## 6               HAIL property damage  15732267543
## 7          HURRICANE property damage  11868319010
## 8     TROPICAL STORM property damage   7703890550
## 9       WINTER STORM property damage   6688497251
## 10         HIGH WIND property damage   5270046295
## 11           DROUGHT     crop damage  13972566000
## 12             FLOOD     crop damage   5661968450
## 13       RIVER FLOOD     crop damage   5029459000
## 14         ICE STORM     crop damage   5022113500
## 15              HAIL     crop damage   3025954473
## 16         HURRICANE     crop damage   2741910000
## 17 HURRICANE/TYPHOON     crop damage   2607872800
## 18       FLASH FLOOD     crop damage   1421317100
## 19      EXTREME COLD     crop damage   1292973000
## 20      FROST/FREEZE     crop damage   1094086000
The tables above show the number of injuries and fatalities, and the property and crop damage in dollars, for the ten most harmful event types in each category.
library(ggplot2)
library(gridExtra)
# plot injuries
g1<-ggplot(head(injuriesFatalitiesFirst10,10), aes(EVTYPE,VALUE) ) + geom_bar(stat="identity", fill="blue",
alpha=0.3)+coord_flip() + labs(title="Injuries by weather Event (1990-2011)") + xlab("event type") +ylab("number")
# plot fatalities
g11<-ggplot(tail(injuriesFatalitiesFirst10,10), aes(EVTYPE,VALUE) ) + geom_bar(stat="identity", fill="blue",
alpha=0.3)+coord_flip() + labs(title="Fatalities by weather Event (1990-2011)") + xlab("event type") +ylab("number")
grid.arrange(g1,g11, ncol=1)
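With `aes(EVTYPE, VALUE)`, ggplot2 places the bars in alphabetical order of `EVTYPE`, so the ranking is not visible at a glance. One option is to map `reorder(EVTYPE, VALUE)` instead. This turns EVTYPE into a factor whose levels are sorted by VALUE. The sketch below uses made-up values and only draws the plot if ggplot2 is installed.

```r
# Made-up values for illustration only
toyTop <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
                     VALUE = c(50, 300, 20),
                     stringsAsFactors = FALSE)

# reorder() sorts the factor levels by VALUE (ascending), so after
# coord_flip() the largest bar ends up at the top of the chart
evOrder <- reorder(toyTop$EVTYPE, toyTop$VALUE)
levels(evOrder)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toyTop, aes(reorder(EVTYPE, VALUE), VALUE)) +
    geom_bar(stat = "identity", fill = "blue", alpha = 0.3) +
    coord_flip() +
    xlab("event type") + ylab("number")
  print(p)
}
```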
Tornadoes caused the most injuries, and excessive heat caused the most fatalities, from 1990 to 2011 in the United States.
# plot property damage
g2<-ggplot(head(propertyCropDamegeFirst10,10), aes(EVTYPE,VALUE/1e+06) ) + geom_bar(stat="identity", fill="blue",
alpha=0.3)+coord_flip() + labs(title="Property Economic Damage by Weather Event (1990-2011)") + xlab("event type") +ylab("million $")
# plot crop damage
g21<-ggplot(tail(propertyCropDamegeFirst10,10), aes(EVTYPE,VALUE/1e+06) ) + geom_bar(stat="identity", fill="blue",
alpha=0.3)+coord_flip() + labs(title="Crop Economic Damage by Weather Event (1990-2011)") + xlab("event type") +ylab("million $")
grid.arrange(g2,g21, ncol=1)
“FLOOD” caused the most property damage and “DROUGHT” the most crop damage from 1990 to 2011 in the United States.