This report is Peer Assessment 2 for the Reproducible Research course on Coursera.
The task is to analyze the StormData data set and determine which types of weather events are most harmful to population health and which have the greatest economic impact.
The analysis concludes that tornadoes are most harmful to population health (fatalities and injuries), and that tornado, storm and wind events have the greatest economic impact (property and crop damage).
Download the data and load the raw csv file.
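library(R.utils)  # provides bunzip2()
src <- "StormData.csv"
if (!file.exists(src)) {
  # download the compressed archive and unpack it into the working directory
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2", method = "curl")
  bunzip2("StormData.csv.bz2", overwrite = TRUE, remove = FALSE)
}
data <- read.csv(src)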
names(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
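As an aside, the separate decompression step may not be strictly necessary, because base R's read.csv() opens bzip2-compressed files transparently. A minimal sketch on a small temporary file (the toy data frame and the temporary file are illustrative and not part of the StormData set):

```r
# Toy stand-in for the storm data; the values are made up for illustration.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write the toy data to a bzip2-compressed CSV in a temporary location.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression when opening the file and reads it directly.
toy.back <- read.csv(tmp)
print(toy.back)
```

Applied to this report, that would mean calling read.csv() on "StormData.csv.bz2" itself, at the cost of decompressing the file on every run.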
Keep only the event type, health and damage columns, then aggregate the health and damage figures by weather event type.
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
# subset data
data <- data[, c("EVTYPE","FATALITIES","INJURIES","PROPDMG","CROPDMG")]
# total fatalities and injuries per event type
data.health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data, sum, na.rm = TRUE)
#data.health$y <- rowSums(data.health[, c(2, 3)])
data.health$sum <- data.health$FATALITIES + data.health$INJURIES
# sort by combined total, largest first, and keep the top 10
data.health <- data.health[order(-data.health$sum), ]
data.health <- head(data.health, 10)
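To make the aggregate, sort and truncate pattern easier to follow, here is a minimal sketch of the same steps on a toy data frame. The values are invented for illustration and do not come from the StormData set:

```r
# Toy records: several rows per event type, as in the raw storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 1, 3, 0, 0),
  INJURIES   = c(10, 4, 5, 1, 0)
)

# Sum both columns within each event type.
toy.health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum, na.rm = TRUE)

# Combine the two measures and rank event types from most to least harmful.
toy.health$sum <- toy.health$FATALITIES + toy.health$INJURIES
toy.health <- toy.health[order(-toy.health$sum), ]
print(toy.health)
```

Each event type collapses to a single row, and the row with the largest combined total ends up first, which is what head() then relies on.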
Across the United States, which types of events have the greatest economic consequences?
# total property and crop damage per event type
data.economic <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data, sum, na.rm = TRUE)
data.economic$sum <- data.economic$PROPDMG + data.economic$CROPDMG
# sort by combined total, largest first, and keep the top 10
data.economic <- data.economic[order(-data.economic$sum), ]
data.economic <- head(data.economic, 10)
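The totals above add the raw PROPDMG and CROPDMG figures. The data set also carries PROPDMGEXP and CROPDMGEXP columns (see the column listing), which encode a magnitude for each figure, so a dollar-valued ranking would need to scale by them first. A minimal sketch on toy values, assuming the codes K, M and B stand for thousands, millions and billions and that any other code means no scaling; the helper exp_to_multiplier is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: translate a magnitude code into a numeric multiplier.
# Assumption: K, M, B are thousands, millions, billions; anything else maps to 1.
exp_to_multiplier <- function(exp) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(exp))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy damage records; the values are made up for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", "")
)

# Scale each figure by its magnitude code to get a dollar amount.
toy$PROPDMG.usd <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

In the report this would require keeping PROPDMGEXP and CROPDMGEXP in the column subset and aggregating the scaled values instead of the raw ones.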
barplot(data.health$sum,
las=3,
names.arg = data.health$EVTYPE,
main = "Top 10 Most Harmful Weather Events",
ylab = "Number of fatalities & injuries",
col = "red")
The graph above shows that tornadoes have the greatest impact on population health.
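With las=3 the event names are drawn vertically, and long names can be clipped by the default bottom margin. A minimal sketch of one way to leave more room, using toy values; the margin sizes and label scaling are assumptions that would need tuning for the real plot:

```r
# Toy totals with long labels; the values are made up for illustration.
toy.sum   <- c(20, 5, 1)
toy.names <- c("TORNADO", "THUNDERSTORM WIND", "EXCESSIVE HEAT")

# Enlarge the bottom margin (first element of mar) and remember the old settings.
op <- par(mar = c(10, 4, 4, 2) + 0.1)

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(toy.sum,
                las = 3,
                names.arg = toy.names,
                cex.names = 0.8,
                main = "Toy example",
                ylab = "Number of fatalities & injuries",
                col = "red")

# Restore the previous graphical parameters.
par(op)
```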
barplot(data.economic$sum,
las=3,
names.arg = data.economic$EVTYPE,
main = "Top 10 Weather Events With Highest Economic Impact",
ylab = "Property & crop damages",
col = "blue")
The graph above shows that tornado, flood and wind events have the greatest economic impact in terms of property and crop damage.