The following analysis is intended for state and municipal representatives with responsibility for the protection of human welfare and minimizing the economic impact of severe weather. The report uses historical data gather by the National Weather Services. This analysis will answer the question of what types of storms have the highest toll on the affected population and which storms cause the greatest economic impact to the region. If the readers geographic area is frequented by these storms, it is high recommended that contingency plans are developed and resources set aside to minimize the immediate impacts, and strategies are developed to minimizing the long term economic impact. It is not within the scope of this report to recommend any specific plans or strategy as this will depend on the degree of impact and available resources.
The following section describes the data processing performed to support the analysis contained in the results section.
setwd("~/Documents/workspace/coursera/Data_Science/Reproducible_Research/RepData_PeerAssessment2")
## RESET THE ENVIRONMENT
rm(list = ls())
### PACKAGES
suppressMessages(require(Hmisc, quietly = TRUE))
suppressMessages(require(lattice, quietly = TRUE))
suppressMessages(require(ggplot2, quietly = TRUE))
suppressMessages(require(reshape, quietly = TRUE))
suppressMessages(require(plyr, quietly = TRUE))
suppressMessages(require(xtable, quietly = TRUE))
suppressMessages(require(choroplethr, quietly = TRUE))
## READ THE RAW DATA
wdata <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)
The economic data is recorded as property damage and crop damage. For each storm event this data is in the following columns, PROPDMG and CROPDMG. For this anaylsis the property and crop damage amounts are combined for each storm for a total economic impact due to damage. The values in PROPDMG and CROPDMG have a companion column, PROPDMGEXP and CROPDMGEXP, that indicates the unit, i.e. K=$1,000, M=$1,000,000, etc. However, while most of the values are K, there are execptions. For this analysis the values that were not K, M, or B in PROPDMGEXP and CROPDMGEXP, were converted to K. Prior to calculating the total, all amounts are converted to $K.
### NORMALIZE THE DAMAGE AMOUNT TO $M FIX THE PROPDMGEXP, MAKE THEN K, M, or B
wdata$fixPROPDMGEXP <- wdata$PROPDMGEXP
wdata$fixPROPDMGEXP[wdata$PROPDMGEXP == "k"] <- "K"
wdata$fixPROPDMGEXP[wdata$PROPDMGEXP == "m"] <- "M"
wdata$fixPROPDMGEXP[wdata$PROPDMGEXP == "m"] <- "M"
wdata$fixPROPDMGEXP[wdata$PROPDMGEXP != "B" & wdata$PROPDMGEXP != "M" & wdata$PROPDMGEXP !=
"K"] <- "K"
### FIX THE PROPDMGEXP, MAKE THEN K, M, or B
wdata$fixCROPDMGEXP <- wdata$CROPDMGEXP
wdata$fixCROPDMGEXP[wdata$CROPDMGEXP == "k"] <- "K"
wdata$fixCROPDMGEXP[wdata$CROPDMGEXP == "m"] <- "M"
wdata$fixCROPDMGEXP[wdata$CROPDMGEXP == "m"] <- "M"
wdata$fixCROPDMGEXP[wdata$CROPDMGEXP != "B" & wdata$CROPDMGEXP != "M" & wdata$CROPDMGEXP !=
"K"] <- "K"
### NORMALIZE DAMAGE TO $K
wdata$fixPROPDMG <- wdata$PROPDMG
wdata$fixPROPDMG[wdata$fixPROPDMGEXP == "M"] <- wdata$fixPROPDMG[wdata$fixPROPDMGEXP ==
"M"] * 1000
wdata$fixPROPDMG[wdata$fixPROPDMGEXP == "B"] <- wdata$fixPROPDMG[wdata$fixPROPDMGEXP ==
"B"] * 1e+05
### NORMALIZE CROP DAMAGE to $K
wdata$fixCROPDMG <- wdata$CROPDMG
wdata$fixCROPDMG[wdata$fixCROPDMGEXP == "M"] <- wdata$fixCROPDMG[wdata$fixCROPDMGEXP ==
"M"] * 1000
wdata$fixCROPDMG[wdata$fixCROPDMGEXP == "B"] <- wdata$fixCROPDMG[wdata$fixCROPDMGEXP ==
"B"] * 1e+05
### TOTAL THE DAMAGE AND CROP DAMAGE FOR THE TOTAL
wdata$TOTAL.DAMAGE <- wdata$fixPROPDMG + wdata$fixCROPDMG
### CREATE A FACTOR FROM THE EXISTING EVTYPE FOR PROCESSING IN ddply
wdata$EVENT.TYPE <- as.factor(wdata$EVTYPE)
### SUMMARIZE BY EVENT.TYPE TO DETERMINE THE STORMS TYPE WITH THE MOST DAMAGE,
### ARRANGE IN DESC ORDER
damage.table <- ddply(wdata, .(EVENT.TYPE), summarize, STORM.TOTAL.DAMAGE.MM = sum(TOTAL.DAMAGE)/1000)
damage.table <- arrange(damage.table, desc(STORM.TOTAL.DAMAGE.MM))
For purpose of this limited analysis the impact on population health is being limited to the total number of injuries and fatalities. The researcher realizes there are deeper long term health impacts from severe weather, however, those longer term health impacts are not considered in this analysis. The total impact is determined by adding FATALITIES and INJURIES in the weather dataset.
### BY STORM ADD THE INJURIES AND FATALITIES TOGETHER
wdata$HEALTH.IMPACT <- wdata$FATALITIES + wdata$INJURIES
health.table <- ddply(wdata, .(EVENT.TYPE), summarize, STORM.TOTAL.HEALTH.IMPACT = sum(HEALTH.IMPACT))
health.table <- arrange(health.table, desc(STORM.TOTAL.HEALTH.IMPACT))
The following table summarize the total health and economic impact of storms
### PRINT THE TABLE OF TOP 10 STORM TYPES
total_impact <- cbind(health.table[1:10, ], damage.table[1:10, ])
names(total_impact) <- c("Storm Type", "Health Impact", "Storm Type", "Economic Impact ($MM)")
print(xtable(head(total_impact, 10)), type = "html")
| Storm Type | Health Impact | Storm Type | Economic Impact ($MM) | |
|---|---|---|---|---|
| 1 | TORNADO | 96979.00 | TORNADO | 52571.08 |
| 2 | EXCESSIVE HEAT | 8428.00 | FLOOD | 40069.69 |
| 3 | TSTM WIND | 7461.00 | HAIL | 17133.68 |
| 4 | FLOOD | 7259.00 | FLASH FLOOD | 16662.69 |
| 5 | LIGHTNING | 6046.00 | DROUGHT | 13668.67 |
| 6 | HEAT | 3037.00 | HURRICANE/TYPHOON | 11604.71 |
| 7 | FLASH FLOOD | 2755.00 | HURRICANE | 9480.23 |
| 8 | ICE STORM | 2064.00 | TSTM WIND | 5038.99 |
| 9 | THUNDERSTORM WIND | 1621.00 | STORM SURGE | 5019.54 |
| 10 | WINTER STORM | 1527.00 | HIGH WIND | 4738.65 |
The following chart shows the state by state impact of the TORNADO which has the largest health impact.
### DATAFRAME FOR TOP 10 STORMS
topStorms <- health.table[1:2, ]
topStorms$EVENT.TYPE <- factor(as.character(topStorms$EVENT.TYPE), ordered = TRUE)
sugar <- list()
options(warn = -1)
healthCharts <- dlply(topStorms, .(EVENT.TYPE), function(df) {
STORM <- as.character(df$EVENT.TYPE[1])
res1 <- ddply(wdata, .(STATE), function(df2) {
region <- df2$STATE[1]
ss <- subset(df2, EVTYPE == STORM)
if (region %in% state.abb) {
value <- sum(ss$HEALTH.IMPACT)
row <- cbind(region = region, value = value)
return(row)
}
})
res1$region <- as.character(res1$region)
res1$value <- as.numeric(as.character(res1$value))
res1 <- rbind(res1, data.frame(STATE = "DC", region = "DC", value = 0))
return(choroplethr(res1, "state", states = state.abb, title = paste("Health Impact\nState by State Storm Type: ",
df$EVENT.TYPE[1])))
})
## [1] "The following regions were missing and are being set to NA: district of columbia"
## [1] "The following regions were missing and are being set to NA: district of columbia"
## INDEX FOR TOP STORM topStorms
### PRINT THE TORNADO CHART
print(healthCharts[[2]])
The following chart shows the state by state impact of the TORNADO which has the largest economic impact.
### DATAFRAME FOR TOP 4 STORMS
topStorms <- damage.table[1:2, ]
topStorms$EVENT.TYPE <- factor(as.character(topStorms$EVENT.TYPE), ordered = TRUE)
economicCharts <- dlply(topStorms, .(EVENT.TYPE), function(df) {
STORM <- as.character(df$EVENT.TYPE[1])
res1 <- ddply(wdata, .(STATE), function(df2) {
region <- df2$STATE[1]
ss <- subset(df2, EVTYPE == STORM)
if (region %in% state.abb) {
value <- sum(ss$TOTAL.DAMAGE)/1000
row <- cbind(region = region, value = value)
return(row)
}
})
res1$region <- as.character(res1$region)
res1$value <- as.numeric(as.character(res1$value))
return(choroplethr(res1, "state", title = paste("Economic Impact\nState by State Storm Type: ",
df$EVENT.TYPE[1])))
})
## [1] "The following regions were missing and are being set to NA: district of columbia"
## [1] "The following regions were missing and are being set to NA: district of columbia"
### PRINT THE TORNADO CHART
print(economicCharts[[2]])