By ‘effects on population health’ we mean the counts of fatalities and injuries. By ‘economic consequences’ we mean property and crop damage, expressed as the dollar value of the damages.
It should be noted that we have not corrected for any ‘bias’ in the recorded numbers. For example, fatalities and injuries may not have been properly documented in the 1950s and 1960s, or may have been attributed to other events. Similarly, damage valuations may have been less rigorous in the early years and have since become more sophisticated and accurate. We have also not adjusted the crop and property values for inflation.
Note also that, for both analyses, we consider only the top 5 event types (in terms of health or economic effects) when identifying trends.
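To make the inflation caveat concrete, the following is a minimal sketch of how nominal damage figures could be restated in constant prices. It is not part of this analysis: the damage amounts and the price index values are invented for illustration, and the column names `nominal`, `index` and `real` are hypothetical.

```r
# Invented damage figures and an invented price index; not real CPI data
dmg <- data.frame(
  YEAR    = c(1980, 1995, 2010),
  nominal = c(100, 100, 100),
  index   = c(50, 80, 100)   # hypothetical price level, base year 2010 = 100
)

# Express every year's damage in base-year (2010) prices
base_index <- dmg$index[dmg$YEAR == 2010]
dmg$real <- dmg$nominal * base_index / dmg$index

print(dmg)
```

Under these made-up numbers, the same nominal amount counts for more in earlier years, which is why unadjusted totals can overstate a rising trend.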
# read.csv() reads the bzip2-compressed file directly, so no separate bzfile() connection is opened
file <- read.csv("repdata_data_StormData.csv.bz2")
df <- file[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]
# Extract the four-digit year from BGN_DATE and store it as an integer so that it plots on a numeric axis
df$YEAR <- as.integer(format(strptime(as.character(df$BGN_DATE), "%m/%d/%Y %H:%M:%S"), "%Y"))
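The year extraction above can be checked in isolation on a couple of date strings. This is a minimal sketch, assuming `BGN_DATE` values follow the month/day/year layout used in the format string; the sample strings and the names `bgn_date` and `year` are made up for illustration.

```r
# Hypothetical sample values in the same month/day/year layout as BGN_DATE
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# strptime() parses the strings into POSIXlt; format(..., "%Y") pulls out the year
year <- as.integer(format(strptime(bgn_date, "%m/%d/%Y %H:%M:%S"), "%Y"))

print(year)
```

Converting the result to an integer keeps the year numeric, which is convenient when it is later used as the x-axis of a plot.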
library(sqldf)
dfh <- sqldf("select EVTYPE, sum(FATALITIES) as sfat, sum(INJURIES) as sinj, YEAR from df group by EVTYPE, YEAR")
dfht <- sqldf("select EVTYPE, sum(sfat) as ssfat, sum(sinj) as ssinj from dfh group by EVTYPE")
dfht <- dfht[order(dfht$ssfat, dfht$ssinj),]
tail(dfht)
##             EVTYPE ssfat ssinj
## 846      TSTM WIND   504  6957
## 453      LIGHTNING   816  5230
## 271           HEAT   937  2100
## 151    FLASH FLOOD   978  1777
## 124 EXCESSIVE HEAT  1903  6525
## 826        TORNADO  5633 91346
As the list above shows, the 5 most harmful event types to date, ranked by fatalities (with injuries as the tie-breaker), are TORNADO, EXCESSIVE HEAT, FLASH FLOOD, HEAT and LIGHTNING, in that order. Note that the list is read from the bottom up, because the sort is in ascending order.
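The same grouping and ranking can also be expressed in base R, sorted in descending order so that the top entries come first. This is only a sketch of an alternative, not the method used in this report: the data frame `toy` and its values are invented, and only `aggregate()`, `order()` and `head()` from base R are used.

```r
# Toy data standing in for df; the values are invented for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "LIGHTNING"),
  FATALITIES = c(10, 5, 7, 1, 2),
  INJURIES   = c(100, 50, 20, 5, 8)
)

# Sum fatalities and injuries per event type (base R counterpart of the GROUP BY)
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort descending so the most harmful events come first, then keep the top 2
totals <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]
top <- head(totals, 2)

print(top)
```

With a descending sort, `head()` returns the ranking in reading order, so there is no need to read the list from the bottom up.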
Let us now examine the trend in fatalities and injuries for these 5 events.
dfhp <- dfh[dfh$EVTYPE %in% c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),]
plot(dfhp$YEAR,dfhp$sfat,type="c", main="Fatalities of 5 top events over the years")
lines(lowess(dfhp$YEAR,dfhp$sfat))
plot(dfhp$YEAR,dfhp$sinj,type="c", main = "Injuries from top 5 events over the years")
lines(lowess(dfhp$YEAR,dfhp$sinj))
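The plots above draw one value per event type per year. If a single combined figure per year is wanted, the per-event values can be summed by year before smoothing. The following is a minimal sketch of that idea on invented data; the data frame `toy` mimics the shape of `dfhp`, and the names `yearly` and `trend` are hypothetical.

```r
# Toy per-event yearly fatalities, shaped like dfhp; the values are invented
toy <- data.frame(
  EVTYPE = rep(c("TORNADO", "HEAT"), each = 6),
  YEAR   = rep(1990:1995, times = 2),
  sfat   = c(30, 25, 28, 20, 22, 18, 5, 9, 4, 6, 3, 7)
)

# Add the events together so that there is one total per year
yearly <- aggregate(sfat ~ YEAR, data = toy, FUN = sum)

# lowess() returns the smoothed x/y pairs that lines() draws
trend <- lowess(yearly$YEAR, yearly$sfat)

plot(yearly$YEAR, yearly$sfat, type = "b",
     xlab = "Year", ylab = "Fatalities",
     main = "Combined fatalities per year (toy data)")
lines(trend)
```

Summing first means the smoothed line tracks the total across the selected events rather than an average of the individual event series.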
dfe <- sqldf("select EVTYPE, sum(PROPDMG) as sprop, sum(CROPDMG) as scrop, YEAR from df group by EVTYPE, YEAR")
dfet <- sqldf("select EVTYPE, sum(sprop) as ssprop, sum(scrop) as sscrop from dfe group by EVTYPE")
totalDmg <- dfet$ssprop + dfet$sscrop
dfet <- data.frame(dfet,totalDmg)
dfet <- dfet[order(dfet$totalDmg, dfet$ssprop, dfet$sscrop),]
tail(dfet)
##                EVTYPE  ssprop sscrop totalDmg
## 753 THUNDERSTORM WIND  876844  66791   943636
## 167             FLOOD  899938 168038  1067976
## 241              HAIL  688693 579596  1268290
## 846         TSTM WIND 1335966 109203  1445168
## 151       FLASH FLOOD 1420125 179200  1599325
## 826           TORNADO 3212258 100019  3312277
As shown above, the 5 event types with the most adverse economic consequences are TORNADO, FLASH FLOOD, TSTM WIND, HAIL and FLOOD. Again, the list is read from the bottom up, because the sort is in ascending order.
Let us now examine the trend in total damage (property and crop combined) for these 5 events.
dfep <- dfe[dfe$EVTYPE %in% c("TORNADO", "FLASH FLOOD", "TSTM WIND", "HAIL","FLOOD"),]
plot(dfep$YEAR, (dfep$sprop + dfep$scrop), type="c", main="Property and Crop Damages of top 5 events over the years")
lines(lowess(dfep$YEAR, (dfep$sprop + dfep$scrop)))
From our analysis we found that:
1. The effects on population health have decreased over time, though very slowly.
2. The economic consequences of storms have been increasing rapidly since the 1980s.