Synopsis: Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. Using the NOAA Storm Database from 1950 to 2011, I analyze which types of events are most harmful with respect to population health and which have the greatest economic consequences. Based on my analysis, TORNADO is the most harmful event type for public health, with respect to both fatalities and injuries. As for economic damage, FLOOD causes the greatest property damage and DROUGHT causes the greatest crop damage. FLOOD also causes the greatest combined property and crop damage.
Questions to answer: 1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? 2. Across the United States, which types of events have the greatest economic consequences?
library(R.utils)
library(utils)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
if (!file.exists("./repdata-data-StormData.csv")) {
  fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # keep the .csv.bz2 suffix so that bunzip2() writes repdata-data-StormData.csv
  download.file(url=fileURL, destfile="./repdata-data-StormData.csv.bz2", method="curl")
  bunzip2("./repdata-data-StormData.csv.bz2")
}
if (!file.exists("./subdata0.rds")){
data <- read.csv("./repdata-data-StormData.csv")
subdata <- subset(data, select=c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
saveRDS(subdata, "./subdata0.rds")
}
subdata <- readRDS("./subdata0.rds")
totharm <- subset(subdata, select=c("EVTYPE", "FATALITIES", "INJURIES"))
totFATALITIES <- aggregate(FATALITIES~EVTYPE, data=totharm, sum)
totINJURIES <- aggregate(INJURIES~EVTYPE, data=totharm, sum)
totharm <- merge(totFATALITIES, totINJURIES)
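The two `aggregate()` calls followed by `merge()` can also be written as a single call. The sketch below uses a small made-up data frame (`toy` and its values are illustrative assumptions, not rows from the storm data) to show the idea.

```r
# Toy stand-in for the storm subset; the values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 3, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side of the formula sums both columns per
# event type in one pass, so no separate merge() step is needed.
toy_tot <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy_tot)
```

The result has one row per event type with both totals side by side, which is the same shape as the merged table used in this report.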
Check the names of event types.
head(unique(totharm$EVTYPE))
## [1] HIGH SURF ADVISORY COASTAL FLOOD FLASH FLOOD
## [4] LIGHTNING TSTM WIND TSTM WIND (G45)
## 985 Levels: HIGH SURF ADVISORY COASTAL FLOOD ... WND
Normalize these names so that harm can be summed per broad event type.
# reshape event names
totharm$EVTYPE <- gsub(".*FLOOD.*", "FLOOD",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*HIGH WINDS.*", "HIGH WINDS",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*HAIL.*", "HAIL",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*SNOW.*", "SNOW",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*LIGHTNING.*", "LIGHTNING",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*THUNDERSTORM WIND.*", "THUNDERSTORM WINDS",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*TORNADO.*", "TORNADO",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*TSTM WIND.*", "TSTM WIND",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*WATERSPOUT.*", "WATERSPOUT",totharm$EVTYPE)
totharm$EVTYPE <- gsub(".*WIND.*", "WIND",totharm$EVTYPE)
# sum the total harmfulness caused by various event types
totINJURIES <- aggregate(INJURIES~EVTYPE, data=totharm, sum) # number of injuries caused by each event type
totFATALITIES <- aggregate(FATALITIES~EVTYPE, data=totharm, sum) # number of fatalities caused by each event type
tot_FatInjHarm <- merge(totFATALITIES, totINJURIES) # merge totFATALITIES and totINJURIES into a single table
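The repeated `gsub()` calls can be folded into one small helper. The following is a simplified sketch, not the exact procedure used above: `normalize_evtype` and its `keys` argument are hypothetical names, and names that match more than one pattern may end up in a different category than with the original sequence.

```r
# Hypothetical helper: collapse raw event names into broad categories.
# Keys are applied in order, and any name containing a key is replaced
# by that key. "WIND" comes last so that it absorbs "TSTM WIND",
# "THUNDERSTORM WINDS" and "HIGH WINDS".
normalize_evtype <- function(x, keys = c("FLOOD", "HAIL", "SNOW", "LIGHTNING",
                                         "TORNADO", "WATERSPOUT", "WIND")) {
  x <- as.character(x)  # works for factors as well as character vectors
  for (key in keys) {
    x[grepl(key, x, fixed = TRUE)] <- key
  }
  x
}

raw <- c("FLASH FLOOD", "TSTM WIND (G45)", "THUNDERSTORM WINDS",
         "TORNADO F2", "EXCESSIVE HEAT")
normalized <- normalize_evtype(raw)
print(normalized)
```

Names that contain none of the keys, such as "EXCESSIVE HEAT", are left unchanged.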
Check the top event types ranked by INJURIES and by FATALITIES.
head(arrange(tot_FatInjHarm, desc(INJURIES)))
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5636 91407
## 2 WIND 1416 11398
## 3 FLOOD 1523 8603
## 4 EXCESSIVE HEAT 1903 6525
## 5 LIGHTNING 817 5232
## 6 HEAT 937 2100
head(arrange(tot_FatInjHarm, desc(FATALITIES)))
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 5636 91407
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLOOD 1523 8603
## 4 WIND 1416 11398
## 5 HEAT 937 2100
## 6 LIGHTNING 817 5232
The most harmful events are “TORNADO”, “WIND”, “FLOOD” and “EXCESSIVE HEAT”. Plot them for comparison.
tot_FatInjHarm_top <- filter(tot_FatInjHarm, EVTYPE %in% c("TORNADO", "WIND", "FLOOD", "EXCESSIVE HEAT"))
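Instead of typing the event names by hand, the top events can be picked from the ranked tables. This is a minimal sketch on made-up data: `top_n_by` is a hypothetical helper and the `toy` values are assumptions for illustration.

```r
# Toy totals table; the values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("A", "B", "C", "D"),
  FATALITIES = c(5, 1, 9, 3),
  INJURIES   = c(50, 10, 20, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n rows with the largest value in `column`.
top_n_by <- function(df, column, n) {
  idx <- order(df[[column]], decreasing = TRUE)
  df[head(idx, n), , drop = FALSE]
}

# Keep every event that is in the top 2 for fatalities or for injuries.
top_names <- union(top_n_by(toy, "FATALITIES", 2)$EVTYPE,
                   top_n_by(toy, "INJURIES", 2)$EVTYPE)
toy_top <- toy[toy$EVTYPE %in% top_names, ]
print(toy_top)
```

Taking the union of both rankings keeps an event that leads in only one of the two measures.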
For the second question, which types of events have the greatest economic consequences, I subset the damage-related variables into an object named “totdamage”. The variables “PROPDMGEXP” and “CROPDMGEXP” are the multipliers of “PROPDMG” and “CROPDMG” respectively: “K” means thousand, “M” means million, and “B” means billion.
totdamage <- subset(subdata, select=c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
totdamage <- subset(totdamage, PROPDMGEXP %in% c("B", "h", "H", "K", "m", "M") | CROPDMGEXP %in% c("B", "k", "K", "m", "M"), select=c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
I only count values whose multiplier is “K”, “M”, or “B”, and convert them to millions of dollars. Rows that pass the filter with any other multiplier code contribute zero.
# in the unit of million dollars
totdamage$CROPDMGvalue <- totdamage$CROPDMG*0
totdamage[which(totdamage$CROPDMGEXP=="B"), "CROPDMGvalue"] <- totdamage[which(totdamage$CROPDMGEXP=="B"), "CROPDMG"]*1000
totdamage[which(totdamage$CROPDMGEXP=="M"), "CROPDMGvalue"] <- totdamage[which(totdamage$CROPDMGEXP=="M"), "CROPDMG"]
totdamage[which(totdamage$CROPDMGEXP=="K"), "CROPDMGvalue"] <- totdamage[which(totdamage$CROPDMGEXP=="K"), "CROPDMG"]*0.001
totdamage$PROPDMGvalue <- totdamage$PROPDMG*0
totdamage[which(totdamage$PROPDMGEXP=="B"), "PROPDMGvalue"] <- totdamage[which(totdamage$PROPDMGEXP=="B"), "PROPDMG"]*1000
totdamage[which(totdamage$PROPDMGEXP=="M"), "PROPDMGvalue"] <- totdamage[which(totdamage$PROPDMGEXP=="M"), "PROPDMG"]
totdamage[which(totdamage$PROPDMGEXP=="K"), "PROPDMGvalue"] <- totdamage[which(totdamage$PROPDMGEXP=="K"), "PROPDMG"]*0.001
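The same conversion can be expressed with a lookup table. The sketch below is an illustration under the stated rule (only “K”, “M”, and “B” count); `exp_to_millions` and `to_millions` are hypothetical names, and the sample amounts are made up.

```r
# Multipliers in millions of dollars, keyed by exponent code.
exp_to_millions <- c(K = 0.001, M = 1, B = 1000)

# Hypothetical helper: convert an amount and its exponent code to
# millions of dollars. Codes other than "K", "M", "B" (including
# lowercase and empty codes) map to 0, matching the rule in the text.
to_millions <- function(amount, exp_code) {
  mult <- exp_to_millions[as.character(exp_code)]
  mult[is.na(mult)] <- 0
  unname(amount * mult)
}

converted <- to_millions(c(25, 2.5, 1.2, 10), c("K", "M", "B", "h"))
print(converted)
```

Applied to the real data, this would be one call per damage column, for example on `PROPDMG` with `PROPDMGEXP`.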
Normalize the event names in the same way so that damage can be summed per broad event type.
totdamage$EVTYPE <- gsub(".*FLOOD.*", "FLOOD",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*HIGH WINDS.*", "HIGH WINDS",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*HAIL.*", "HAIL",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*SNOW.*", "SNOW",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*LIGHTNING.*", "LIGHTNING",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*THUNDERSTORM WIND.*", "THUNDERSTORM WINDS",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*TORNADO.*", "TORNADO",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*TSTM WIND.*", "TSTM WIND",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*WATERSPOUT.*", "WATERSPOUT",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*WIND.*", "WIND",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*HURRICANE.*", "HURRICANE",totdamage$EVTYPE )
totdamage$EVTYPE <- gsub(".*STORM.*", "STORM",totdamage$EVTYPE )
Sum the property damage (PROPDMG) and crop damage (CROPDMG) for each event type.
totPROPDMGvalue <- aggregate(PROPDMGvalue~EVTYPE, data=totdamage, sum)
totCROPDMGvalue <- aggregate(CROPDMGvalue~EVTYPE, data=totdamage, sum)
tot_PROPCROPDMG <- merge(totPROPDMGvalue, totCROPDMGvalue)
Check the event types that cause the most damage.
head(arrange(tot_PROPCROPDMG, desc(PROPDMGvalue)))
## EVTYPE PROPDMGvalue CROPDMGvalue
## 1 FLOOD 167379.14 12352.0791
## 2 HURRICANE 84636.18 5495.2928
## 3 STORM 67528.05 5766.6135
## 4 TORNADO 56981.60 414.9614
## 5 HAIL 17615.09 3113.7958
## 6 WIND 16084.38 1979.1151
head(arrange(tot_PROPCROPDMG, desc(CROPDMGvalue)))
## EVTYPE PROPDMGvalue CROPDMGvalue
## 1 DROUGHT 1046.106 13972.566
## 2 FLOOD 167379.144 12352.079
## 3 STORM 67528.052 5766.614
## 4 HURRICANE 84636.180 5495.293
## 5 HAIL 17615.091 3113.796
## 6 WIND 16084.377 1979.115
Select “DROUGHT”, “FLOOD”, “STORM”, “HURRICANE”, and “TORNADO” as the top damage sources.
tot_PROPCROPDMG_top <- filter(tot_PROPCROPDMG, EVTYPE %in% c("DROUGHT", "FLOOD", "STORM", "HURRICANE", "TORNADO"))
tot_PROPCROPDMG_top$totdamage <- tot_PROPCROPDMG_top$PROPDMGvalue+tot_PROPCROPDMG_top$CROPDMGvalue
The events most harmful to population health are plotted below.
par(mfrow=c(1,2))
barplot(tot_FatInjHarm_top$INJURIES, col="blue", xlab="Event Type", ylab="Population", main="INJURIES", names.arg=tot_FatInjHarm_top$EVTYPE, cex.names=0.4)
barplot(tot_FatInjHarm_top$FATALITIES, col="red", xlab="Event Type", ylab="Population", main="FATALITIES", names.arg=tot_FatInjHarm_top$EVTYPE, cex.names=0.4)
Whether measured by INJURIES or by FATALITIES, TORNADO is the most harmful event type for population health.
As for economic damage, the totals for PROPDMG, CROPDMG, and their sum (in millions of dollars) are listed below.
arrange(tot_PROPCROPDMG_top, desc(totdamage))
## EVTYPE PROPDMGvalue CROPDMGvalue totdamage
## 1 FLOOD 167379.144 12352.0791 179731.22
## 2 HURRICANE 84636.180 5495.2928 90131.47
## 3 STORM 67528.052 5766.6135 73294.67
## 4 TORNADO 56981.598 414.9614 57396.56
## 5 DROUGHT 1046.106 13972.5660 15018.67
FLOOD causes the greatest property damage. DROUGHT causes the greatest crop damage. FLOOD also causes the greatest combined property and crop damage.
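As a quick arithmetic check, the table above can be rebuilt by hand and the leading event in each column looked up with `which.max()`. The figures are copied from the printed output (millions of dollars); the object name `reported` is an assumption for this sketch.

```r
# Values copied from the printed table above, in millions of dollars.
reported <- data.frame(
  EVTYPE       = c("FLOOD", "HURRICANE", "STORM", "TORNADO", "DROUGHT"),
  PROPDMGvalue = c(167379.144, 84636.180, 67528.052, 56981.598, 1046.106),
  CROPDMGvalue = c(12352.0791, 5495.2928, 5766.6135, 414.9614, 13972.5660),
  stringsAsFactors = FALSE
)
reported$totdamage <- reported$PROPDMGvalue + reported$CROPDMGvalue

# Event type with the largest value in each damage column.
top_prop  <- reported$EVTYPE[which.max(reported$PROPDMGvalue)]
top_crop  <- reported$EVTYPE[which.max(reported$CROPDMGvalue)]
top_total <- reported$EVTYPE[which.max(reported$totdamage)]
print(c(property = top_prop, crop = top_crop, total = top_total))
```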