The second peer-assessed project in the Reproducible Research course requires analyzing the Storm Data publication from the National Weather Service (NWS) to determine which types of storm events have the greatest impact on population health and which have the greatest economic consequences. In this analysis, two metrics are defined and computed to estimate which weather events have the greatest impact in these areas. The impact on population health is estimated as the sum of fatalities and injuries, and the economic impact is estimated as the sum of property and crop damage. The top 5 weather event types are determined for each metric to answer the two questions in the assignment.
The conclusions reached in this analysis are that the event types which are the most harmful to population health are: tornado, excessive heat, thunderstorm wind, flood, and lightning, and the event types which have the greatest economic impact are: flood, hurricane/typhoon, tornado, storm surge, and hail.
The storm data file was downloaded from the link provided on the course assignment page, and the compressed CSV data file was loaded into a data frame.
rawdata <- read.csv("repdata_data_StormData.csv.bz2")
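The call above passes the .bz2 file straight to read.csv() without a separate decompression step. The following is a minimal sketch of why that works, using a toy temporary file rather than the storm data; the file contents and the variable names tmp, con, and toy are illustrative assumptions.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data only,
# not taken from the Storm Data file)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
toy <- read.csv(tmp, stringsAsFactors = FALSE)
str(toy)

# Remove the temporary file
unlink(tmp)
```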
To create a metric that approximates the total impact on population health, the number of fatalities and injuries for each event type are added together. Likewise, to create a metric that approximates the economic consequences of each storm event type, the dollar amounts for property and crop damage are added together. While other metrics could be created, these two are simple and give a rough idea of the magnitude of the population health impact and economic consequences of the weather events.
First the raw data is aggregated by event type using the aggregate() function and the fatalities and injuries are summed. This creates a new data frame, and the metric is renamed to HARMEVCOUNT. The data are sorted by the HARMEVCOUNT in descending order.
pophealth <- aggregate(FATALITIES + INJURIES ~ EVTYPE, data=rawdata, sum)
names(pophealth)[2] <- "HARMEVCOUNT"
pophealth <- pophealth[order(-pophealth$HARMEVCOUNT),]
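To make the aggregation step concrete, here is a small sketch of the same three operations on a toy data frame. The values and the names toy and toy_health are hypothetical and chosen only so the sums are easy to check by hand.

```r
# Toy data: hypothetical values, not taken from the Storm Data file
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 0, 1, 0, 3),
  INJURIES   = c(10, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities + injuries within each event type
toy_health <- aggregate(FATALITIES + INJURIES ~ EVTYPE, data = toy, sum)

# The second column is named after the formula's left-hand side; rename it
names(toy_health)[2] <- "HARMEVCOUNT"

# Sort in descending order of the metric
toy_health <- toy_health[order(-toy_health$HARMEVCOUNT), ]
print(toy_health)
```

In this toy case the two TORNADO rows contribute 12 and 5, so TORNADO sorts first with a total of 17, followed by FLOOD with 6 and HAIL with 0.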
Now create a temporary data frame with only the event types and the computed values for property and crop damage.
The property damage amounts must be computed by multiplying the amount in PROPDMG by 1,000 if the value of PROPDMGEXP is ‘K’, by 1,000,000 for ‘M’, and by 1,000,000,000 for ‘B’. The crop damage amounts must be computed the same way using the values of CROPDMG and CROPDMGEXP. This code replaces ‘K’, ‘M’, and ‘B’ with 1e+03, 1e+06, and 1e+09 respectively and computes the final value.
tmpdata <- data.frame(EVTYPE = rawdata$EVTYPE,
PROPDMG=rawdata$PROPDMG,
PROPDMGEXP=as.character(rawdata$PROPDMGEXP),
CROPDMG=rawdata$CROPDMG,
CROPDMGEXP=as.character(rawdata$CROPDMGEXP),
stringsAsFactors=FALSE)
# replace 'K', 'M', and 'B' with multipliers, substitute zero for NAs,
# and convert to numeric
tmpdata$PROPDMGEXP[tmpdata$PROPDMGEXP=='K' | tmpdata$PROPDMGEXP=='k'] <- 1e+03
tmpdata$PROPDMGEXP[tmpdata$PROPDMGEXP=='M' | tmpdata$PROPDMGEXP=='m'] <- 1e+06
tmpdata$PROPDMGEXP[tmpdata$PROPDMGEXP=='B' | tmpdata$PROPDMGEXP=='b'] <- 1e+09
tmpdata$CROPDMGEXP[tmpdata$CROPDMGEXP=='K' | tmpdata$CROPDMGEXP=='k'] <- 1e+03
tmpdata$CROPDMGEXP[tmpdata$CROPDMGEXP=='M' | tmpdata$CROPDMGEXP=='m'] <- 1e+06
tmpdata$CROPDMGEXP[tmpdata$CROPDMGEXP=='B' | tmpdata$CROPDMGEXP=='b'] <- 1e+09
# convert the character data to numeric
tmpdata$PROPDMGEXP <- suppressWarnings(as.numeric(tmpdata$PROPDMGEXP))
tmpdata$CROPDMGEXP <- suppressWarnings(as.numeric(tmpdata$CROPDMGEXP))
# substitute zero for NA
tmpdata$PROPDMGEXP[is.na(tmpdata$PROPDMGEXP)] <- 0
tmpdata$CROPDMGEXP[is.na(tmpdata$CROPDMGEXP)] <- 0
# compute the total damage
tmpdata$TOTDMG <- (tmpdata$PROPDMG * tmpdata$PROPDMGEXP) +
(tmpdata$CROPDMG * tmpdata$CROPDMGEXP)
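As an alternative illustration of the exponent conversion, the sketch below uses a named lookup vector instead of repeated in-place replacement. The helper exp_to_multiplier() and the toy data are hypothetical and not part of the original analysis. One assumption differs from the code above: every code other than K, M, or B (in either case) maps to 0 here, including digit codes, whereas the code above keeps a digit code's numeric value as the multiplier.

```r
# Hypothetical helper: map exponent codes to numeric multipliers.
# Assumption: anything other than K, M, B (any case) becomes 0.
exp_to_multiplier <- function(codes) {
  lookup <- c(K = 1e+03, M = 1e+06, B = 1e+09)
  # Index the lookup by upper-cased code; unmatched codes yield NA
  mult <- lookup[toupper(as.character(codes))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy data: hypothetical values, not taken from the Storm Data file
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  CROPDMG    = c(0, 1, 0, 5),
  CROPDMGEXP = c("", "K", "", "k"),
  stringsAsFactors = FALSE
)

# Total damage = property damage + crop damage, each scaled by its multiplier
toy$TOTDMG <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
print(toy$TOTDMG)
```

A lookup vector keeps the code-to-multiplier mapping in one place, so the property and crop columns cannot drift apart through a copy-and-paste slip.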
Next the temporary data frame is aggregated by event type and the total damage amounts are summed to create a new data frame. This metric is called ECONIMPACT in the aggregated data frame.
# create ECONIMPACT metric by event type using aggregate() and summing
# the property and crop damage amounts.
econimpact <- aggregate(TOTDMG ~ EVTYPE, data=tmpdata, sum)
names(econimpact)[2] <- "ECONIMPACT"
# sort the data by ECONIMPACT in descending order
econimpact <- econimpact[order(-econimpact$ECONIMPACT),]
To answer the 2 questions posed in the assignment, the top 5 most harmful events for both population health and economic consequences are determined and plotted.
pophealth <- pophealth[1:5, ]
econimpact <- econimpact[1:5, ]
barplot(pophealth$HARMEVCOUNT,
names.arg=pophealth$EVTYPE,
main="Top 5 Weather Events Most Harmful To Population Health",
xlab="Event Type",
ylab="Fatalities + Injuries")
The first barplot above shows that the event types which are the most harmful to population health are: tornado, excessive heat, thunderstorm wind, flood, and lightning.
Next create a similar barplot showing the weather events with the greatest economic consequences.
barplot(econimpact$ECONIMPACT / 1e+09,
names.arg=econimpact$EVTYPE,
main="Top 5 Weather Events With Greatest Economic Impact",
xlab="Event Type",
ylab="Economic Impact in Billions of Dollars")
The above barplot shows that the event types which have the greatest economic impact are: flood, hurricane/typhoon, tornado, storm surge, and hail.