This analysis evaluates National Weather Service storm data collected from 1950 through 2011. Records from the earlier years are less complete because record keeping was less thorough at the time.
The goal of this analysis is to assess the types of weather events that cause the most harm to human health in the form of injuries and death, and to the economy in the form of property damage and crop damage costs.
This is the final project for the Coursera course titled Reproducible Research.
Several packages were loaded and utilized for this analysis.
#Add packages
library(downloader)
library(plyr)
library(ggplot2)
library(gridExtra)
library(grid)
The data file was downloaded with the downloader package and read into the project. The compressed file is saved in the working directory.
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# The file is bzip2-compressed, so keep the .bz2 extension; read.csv() decompresses it on read
download(url, destfile = "stormdata.csv.bz2", mode = "wb")
sd <- read.csv("stormdata.csv.bz2")
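Reading the download without a separate decompression step works because `read.csv()` opens the file through a connection that detects bzip2 compression. A minimal sketch of that behavior, using a small made-up data frame and a temporary file (both are illustrative assumptions, not part of the storm data):

```r
# Write a tiny CSV through a bzip2-compressing connection (toy data, not the real file)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
toy <- read.csv(tmp)
toy
```

The same call pattern applies to the downloaded storm file, only with a much larger input.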
The entire data set is not necessary for the analysis, so only seven variables were selected and placed in a subset.
sdKeep <- sd[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG","CROPDMGEXP")]
Injury and fatality counts were summed by weather event type, then sorted in decreasing order of each measure.
harm <- ddply(sdKeep, .(EVTYPE), summarize, fatalities = sum(FATALITIES), injuries = sum(INJURIES))
fatal <- harm[order(harm$fatalities, decreasing = TRUE), ]
injury <- harm[order(harm$injuries, decreasing = TRUE), ]
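The summarize-then-sort step can be illustrated on a few rows. The sketch below uses base R's `aggregate()` rather than `ddply()`, and the `toy` data frame is invented for illustration; it is not drawn from the storm data.

```r
# Made-up events, used only to show the group-and-sum pattern
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 2, 7)
)

# Sum both measures within each event type
toy_harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by total fatalities, largest first
toy_fatal <- toy_harm[order(toy_harm$FATALITIES, decreasing = TRUE), ]
toy_fatal
```

Here TORNADO totals 5 fatalities, HEAT 4, and FLOOD 1, so the sorted result lists them in that order.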
Cost data in this data set is not in an ideal form for analysis. The exponent columns use letters to indicate the magnitude of each cost (hundreds, thousands, etc.), and these had to be converted into numeric powers of ten. A function was created for this process.
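Exp <- function(e) {
  # Convert a damage exponent code to a power of ten
  e <- as.character(e)  # on a factor, as.numeric() would return level codes
  if (e %in% c("h", "H"))
    return(2)
  else if (e %in% c("k", "K"))
    return(3)
  else if (e %in% c("m", "M"))
    return(6)
  else if (e %in% c("b", "B"))
    return(9)
  else if (e %in% c("", "-", "?", "+"))  # blank or symbol: no multiplier
    return(0)
  else if (grepl("^[0-9]+$", e))  # a digit is used directly as the exponent
    return(as.numeric(e))
  else {
    stop("Invalid value.")
  }
}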
The function was then used to produce property and crop damage values that could be easily compared.
propExp <- sapply(sdKeep$PROPDMGEXP, FUN = Exp)
sdKeep$propDamage <- sdKeep$PROPDMG * (10^propExp)
cropExp <- sapply(sdKeep$CROPDMGEXP, FUN = Exp)
sdKeep$cropDamage <- sdKeep$CROPDMG * (10^cropExp)
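The letter-to-exponent conversion can also be expressed as a single vectorized lookup. The sketch below is an illustrative alternative, not the function used in this analysis: `exp_lookup`, `to_power`, and the toy data frame are hypothetical names, and unlike `Exp` it maps unrecognized codes to 0 instead of stopping.

```r
# Hypothetical lookup table: exponent letter -> power of ten
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

to_power <- function(codes) {
  codes <- tolower(trimws(as.character(codes)))
  power <- unname(exp_lookup[codes])        # NA for anything other than h/k/m/b
  is_digit <- grepl("^[0-9]$", codes)       # single digits are used as the exponent
  power[is_digit] <- as.numeric(codes[is_digit])
  power[is.na(power)] <- 0                  # "", "-", "?", "+" and any other code
  power
}

# Made-up rows showing the multiplier applied to the base amount
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 10, 3),
                  PROPDMGEXP = c("K", "M", "B", "", "2"))
toy$propDamage <- toy$PROPDMG * 10^to_power(toy$PROPDMGEXP)
toy$propDamage
```

For example, 25 with code "K" becomes 25,000 and 2.5 with code "M" becomes 2,500,000, while a blank code leaves the amount unchanged.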
Once this was done, cost data was summarized by event type.
cost <- ddply(sdKeep, .(EVTYPE), summarize, propDamage = sum(propDamage),
              cropDamage = sum(cropDamage))
Events without any cost were excluded.
cost <- cost[(cost$propDamage > 0 | cost$cropDamage > 0), ]
Data was then sorted in decreasing order of property damage and of crop damage.
propOrder <- cost[order(cost$propDamage, decreasing = TRUE), ]
cropOrder <- cost[order(cost$cropDamage, decreasing = TRUE), ]
With data processing complete, results can now be assessed.
Lists were produced of the Top 10 weather events affecting public health in the form of injuries and fatalities.
head(injury[, c("EVTYPE", "injuries")], 10)
##                EVTYPE injuries
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
head(fatal[, c("EVTYPE", "fatalities")], 10)
##             EVTYPE fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
Plots were created to present the results in a more visual form. The impact of tornadoes is quite apparent. Note that both “Heat” and “Excessive Heat” appear as weather events here. The difference between the two depends on regionally and locally defined heat advisory criteria. Flooding is another major contributor, and it is likewise separated into different types based on zoning, terrain, and other characteristics.
injplot <- ggplot(data = head(injury, 10), aes(x = reorder(EVTYPE, injuries), y = injuries)) +
  geom_bar(fill = "blue", stat = "identity") +
  coord_flip() +
  ylab("Total number of injuries") +
  xlab("Weather Event Type") +
  ggtitle("Injuries due to Weather Events, Top 10") +
  theme(legend.position = "none")
fatalplot <- ggplot(data = head(fatal, 10), aes(x = reorder(EVTYPE, fatalities), y = fatalities)) +
  geom_bar(fill = "red", stat = "identity") +
  coord_flip() +
  ylab("Total number of deaths") +
  xlab("Weather Event Type") +
  ggtitle("Fatalities due to Weather Events, Top 10") +
  theme(legend.position = "none")
grid.arrange(injplot, fatalplot, nrow=2)
## Impact on the Economy
The top 10 weather events with the highest property and crop damage costs are listed below. Flooding appears several times for property damage. Drought leads in damage to crops.
head(propOrder[, c("EVTYPE", "propDamage")], 10)
##                 EVTYPE   propDamage
## 153        FLASH FLOOD 6.820237e+13
## 786 THUNDERSTORM WINDS 2.086532e+13
## 834            TORNADO 1.078951e+12
## 244               HAIL 3.157558e+11
## 464          LIGHTNING 1.729433e+11
## 170              FLOOD 1.446577e+11
## 411  HURRICANE/TYPHOON 6.930584e+10
## 185           FLOODING 5.920826e+10
## 670        STORM SURGE 4.332354e+10
## 310         HEAVY SNOW 1.793259e+10
head(cropOrder[, c("EVTYPE", "cropDamage")], 10)
##                EVTYPE  cropDamage
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025974480
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000
Again, plotting these figures shows the magnitude of the costs associated with these weather events. Because the values are large and span several orders of magnitude, the log10 of each total is shown here.
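To make the effect of the log10 transform concrete, the sketch below applies it to the largest and smallest property damage totals from the top 10 list above. The vector name `top_prop` and the other variable names are illustrative assumptions.

```r
# Largest and smallest property damage totals from the top 10 list (USD)
top_prop <- c("FLASH FLOOD" = 6.820237e+13, "HEAVY SNOW" = 1.793259e+10)

# Bar heights on the log10 scale: roughly 13.8 and 10.3
bar_heights <- log10(top_prop)
bar_heights

# A difference of about 3.6 on the log scale corresponds to a ratio
# of several thousand on the original dollar scale
log_gap <- bar_heights[["FLASH FLOOD"]] - bar_heights[["HEAVY SNOW"]]
raw_ratio <- top_prop[["FLASH FLOOD"]] / top_prop[["HEAVY SNOW"]]
c(log_gap = log_gap, raw_ratio = raw_ratio)
```

On the log scale, each unit of bar length represents a tenfold change in cost, so bars of similar length can still differ by large dollar amounts.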
propplot <- ggplot(data = head(propOrder, 10),
                   aes(x = reorder(EVTYPE, propDamage), y = log10(propDamage))) +
  geom_bar(fill = "darkblue", stat = "identity") +
  coord_flip() +
  ylab("Property Damage, USD, log10") +
  xlab("Weather Event Type") +
  ggtitle("Property Damage due to Weather Events, Top 10") +
  theme(plot.title = element_text(hjust = 0))
cropplot <- ggplot(data = head(cropOrder, 10),
                   aes(x = reorder(EVTYPE, cropDamage), y = log10(cropDamage))) +
  geom_bar(fill = "darkred", stat = "identity") +
  coord_flip() +
  ylab("Crop Damage, USD, log10") +
  xlab("Weather Event Type") +
  ggtitle("Crop Damage due to Weather Events, Top 10") +
  theme(legend.position = "none")
grid.arrange(propplot, cropplot, nrow=2)