Reproducible Research: Peer Assessment 2 - Storm and Severe Weather Events

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

My assessment

Tornadoes are by far the worst weather event for property damage and for population health (both fatalities and injuries). Crops are most affected by water-related events (floods, hurricanes, and hail). Tornadoes stand apart from every other event type by a very large margin in fatalities, injuries, and property damage.

Background information

Data for this assignment
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ

Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Data Downloading/Fetching

library(R.utils)  # provides bunzip2()

file_loc <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
#download file if we don't have it already (binary mode so the archive isn't corrupted on Windows)
if (!file.exists("StormData.csv.bz2")) {
    download.file(file_loc, destfile="StormData.csv.bz2", mode="wb")
    message("StormData.csv.bz2 has been downloaded!")
} else message("File already downloaded!")
## File already downloaded!
#unzip the file if it's not unzipped already; keep the .bz2 so it isn't downloaded again
if (!file.exists("StormData.csv")) {
    bunzip2("StormData.csv.bz2", remove = FALSE)
    message("StormData has been unzipped!")
} else message("File already unzipped!")
## File already unzipped!
if (!exists("rawData")) {
    rawData <- read.csv("StormData.csv")
    message("rawData has been loaded!")
} else message("rawData already loaded!")
## rawData has been loaded!
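
As an alternative to the separate unzip step, R's file connections recognise bzip2 compression when reading, so `read.csv()` can usually be pointed at the `.bz2` file directly. The sketch below checks this on a tiny made-up data frame written to a temporary file; the file name and values are illustrative only, not taken from StormData.

```r
# Tiny illustrative data frame (not StormData), written as a bzip2-compressed CSV
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0))
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the bzip2 compression itself
back <- read.csv(bz_path)
print(back)
```

With the real data this would amount to `read.csv("StormData.csv.bz2")`, trading the extra CSV on disk for a slower read.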

Data Processing

Here we begin to process & clean the data.

#keep only the event type and the columns that record casualties or damage

myData <- subset(rawData,
                 select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
                            "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))

#delete summary records (a row filter, hence the comma before the closing bracket)
myData <- myData[!grepl("SUMMARY", myData$EVTYPE, ignore.case = TRUE), ]
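
The `-grep()` idiom for dropping rows has a pitfall worth knowing: when `grep()` finds no match it returns `integer(0)`, and indexing with `-integer(0)` selects no rows at all rather than all of them. This sketch, on made-up event names, compares it with a logical filter built from `grepl()`, which keeps every row when nothing matches.

```r
# Illustrative event types; none of them contains "SUMMARY"
events <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"))

hits <- grep("SUMMARY", events$EVTYPE, ignore.case = TRUE)   # integer(0)

# Negative indexing with an empty vector drops every row
by_grep <- events[-hits, , drop = FALSE]

# A logical filter keeps all rows when there are no matches
by_grepl <- events[!grepl("SUMMARY", events$EVTYPE, ignore.case = TRUE), , drop = FALSE]

nrow(by_grep)   # 0
nrow(by_grepl)  # 3
```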

#clean up misspellings and group similar event types
#the assignments run in order, so a later match overwrites an earlier one
#(e.g. "TSTM WIND" ends up as HIGH WIND); types matching no pattern stay NA
myData$myEVTYPE <- NA_character_
myData$myEVTYPE[grepl("THUNDERSTORM",myData$EVTYPE)] <- "THUNDERSTORM"
myData$myEVTYPE[grepl("TSTM",myData$EVTYPE)] <- "THUNDERSTORM"
myData$myEVTYPE[grepl("SNOW",myData$EVTYPE)] <- "WINTER STORM"
myData$myEVTYPE[grepl("WINT",myData$EVTYPE)] <- "WINTER STORM"
myData$myEVTYPE[grepl("ICE",myData$EVTYPE)] <- "WINTER STORM"
myData$myEVTYPE[grepl("FREEZ",myData$EVTYPE)] <- "WINTER STORM"
myData$myEVTYPE[grepl("LIGHTNING",myData$EVTYPE)] <- "THUNDERSTORM"
myData$myEVTYPE[grepl("CHILL",myData$EVTYPE)] <- "COLD"
myData$myEVTYPE[grepl("COLD",myData$EVTYPE)] <- "COLD"
myData$myEVTYPE[grepl("WIND",myData$EVTYPE)] <- "HIGH WIND"
myData$myEVTYPE[grepl("FLOOD",myData$EVTYPE)] <- "FLOOD"
myData$myEVTYPE[grepl("URBAN",myData$EVTYPE)] <- "FLOOD"
myData$myEVTYPE[grepl("HURRICANE",myData$EVTYPE)] <- "HURRICANE"
myData$myEVTYPE[grepl("TROPICAL",myData$EVTYPE)] <- "HURRICANE"
myData$myEVTYPE[grepl("TORN",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("WATERSPOUT",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("FUNNEL",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("HAIL",myData$EVTYPE)] <- "HAIL"
myData$myEVTYPE[grepl("RAIN",myData$EVTYPE)] <- "HEAVY RAIN"
myData$myEVTYPE[grepl("FIRE",myData$EVTYPE)] <- "WILDFIRE"
myData$myEVTYPE[grepl("SPOUT",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("MICRO",myData$EVTYPE)] <- "TORNADO"
myData$myEVTYPE[grepl("DRY",myData$EVTYPE)] <- "DROUGHT"
myData$myEVTYPE[grepl("HEAT",myData$EVTYPE)] <- "HEAT"
myData$myEVTYPE <- as.factor(myData$myEVTYPE)
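
The run of `grepl()` assignments above can also be expressed as an ordered lookup table applied in a loop, which keeps the patterns in one place. This is a sketch with a shortened, hypothetical pattern list and made-up event types; like the original, rows are applied top to bottom, so a later match overwrites an earlier one and thunderstorm winds end up in the wind category.

```r
# Illustrative raw event types
evtype <- c("THUNDERSTORM WIND", "TSTM WIND", "HEAVY SNOW", "FLASH FLOOD",
            "TORNADO F0", "DENSE FOG")

# Hypothetical, shortened lookup table; rows are applied in order,
# so a later match overwrites an earlier one
lookup <- data.frame(
  pattern  = c("THUNDERSTORM|TSTM", "SNOW|WINT|ICE|FREEZ", "WIND",
               "FLOOD|URBAN", "TORN|SPOUT|FUNNEL"),
  category = c("THUNDERSTORM", "WINTER STORM", "HIGH WIND",
               "FLOOD", "TORNADO"),
  stringsAsFactors = FALSE
)

grouped <- rep(NA_character_, length(evtype))   # unmatched types stay NA
for (i in seq_len(nrow(lookup))) {
  grouped[grepl(lookup$pattern[i], evtype)] <- lookup$category[i]
}

grouped
```

Reordering the rows of `lookup` is then the only change needed to give thunderstorm patterns priority over the generic wind pattern.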

We need to scale the damage figures, because StormData.csv stores the magnitude separately in PROPDMGEXP and CROPDMGEXP: “K” for thousands, “M” for millions, and “B” for billions. For simplicity, we'll express everything in billions of dollars.

#adjust property damage
myData$myPropDmg <- myData$PROPDMG

myData$myPropDmg[myData$PROPDMGEXP == "K"] <- 
    myData$PROPDMG[myData$PROPDMGEXP == "K"] * 1000
myData$myPropDmg[myData$PROPDMGEXP == "M"] <- 
    myData$PROPDMG[myData$PROPDMGEXP == "M"] * 1000000
myData$myPropDmg[myData$PROPDMGEXP == "B"] <- 
    myData$PROPDMG[myData$PROPDMGEXP == "B"] * 1000000000

#adjust crop damage
myData$myCropDmg <- myData$CROPDMG

myData$myCropDmg[myData$CROPDMGEXP == "K"] <- 
    myData$CROPDMG[myData$CROPDMGEXP == "K"] * 1000
myData$myCropDmg[myData$CROPDMGEXP == "M"] <- 
    myData$CROPDMG[myData$CROPDMGEXP == "M"] * 1000000
myData$myCropDmg[myData$CROPDMGEXP == "B"] <- 
    myData$CROPDMG[myData$CROPDMGEXP == "B"] * 1000000000

#express everything in billions
myData$myPropDmg <- myData$myPropDmg / 1000000000
myData$myCropDmg <- myData$myCropDmg / 1000000000
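
The exponent handling above can also be written with a named lookup vector, which avoids repeating each condition twice. In this sketch the damage values and codes are made up, and, unlike the code above, the codes are upper-cased first (an assumption, so that lowercase "k" and "m" would also count). Codes missing from the table get a multiplier of 1, which matches how the code above leaves such rows unscaled.

```r
# Multipliers for the exponent codes: K = thousand, M = million, B = billion
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

dmg     <- c(25, 2.5, 1.2, 10)      # illustrative damage figures
dmg_exp <- c("K", "m", "B", "")     # illustrative exponent codes

# Look up each upper-cased code; codes not in the table give NA, replaced by 1
mult <- multiplier[toupper(dmg_exp)]
mult[is.na(mult)] <- 1

dollars  <- dmg * unname(mult)
billions <- dollars / 1e9
```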

#total property damage per event type, largest first
propDmg <- tapply(myData$myPropDmg, myData$myEVTYPE, FUN=sum, na.rm=TRUE)
propDmg <- propDmg[order(propDmg, decreasing=TRUE)]

#total crop damage per event type, largest first
cropDmg <- tapply(myData$myCropDmg, myData$myEVTYPE, FUN=sum, na.rm=TRUE)
cropDmg <- cropDmg[order(cropDmg, decreasing=TRUE)]

#add crop and property damage per event type
#(align by name first, since the two totals are sorted in different orders)
allDmg <- propDmg + cropDmg[names(propDmg)]
allDmg <- allDmg[order(allDmg, decreasing=TRUE)]
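
Note that `c()` concatenates two vectors rather than adding them: each event then appears twice, once with its crop total and once with its property total. Adding per-event totals means aligning the two vectors by name first, because each was sorted independently. A small sketch with made-up numbers:

```r
# Illustrative per-event totals (in billions), each sorted independently
prop <- c(TORNADO = 50, FLOOD = 20, HAIL = 10)
crop <- c(FLOOD = 7, HAIL = 3, TORNADO = 0.5)

concatenated <- c(crop, prop)          # 6 elements, every name appears twice
combined <- prop + crop[names(prop)]   # reorder crop to match prop, then add
combined <- sort(combined, decreasing = TRUE)
```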

Next we total the fatalities for each event type.

#total fatalities per event type
fatalities <- tapply(myData$FATALITIES, myData$myEVTYPE, FUN=sum, na.rm=TRUE)

#sort event types by total fatalities, largest first
allFatalities <- fatalities[order(fatalities, decreasing=TRUE)]

#keep only the ten deadliest event types
topFatalities <- allFatalities[1:10]

topFatalities
##      TORNADO         HEAT        FLOOD    HIGH WIND THUNDERSTORM 
##         5639         3138         1550         1413          818 
## WINTER STORM         COLD    HURRICANE   HEAVY RAIN     WILDFIRE 
##          520          222          201          114           90
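
`head(sort(x, decreasing = TRUE), n)` is an equivalent, slightly shorter way to take the top events. It also behaves better than `[1:10]` when there are fewer than ten categories: `[1:10]` pads the result with `NA`s, whereas `head()` simply returns what exists. A sketch with made-up counts:

```r
# Illustrative per-event fatality totals (only four categories)
fatal <- c(HAIL = 15, TORNADO = 500, HEAT = 300, FLOOD = 150)

top_by_index <- fatal[order(fatal, decreasing = TRUE)][1:10]  # padded with NA
top_by_head  <- head(sort(fatal, decreasing = TRUE), 10)      # just 4 values
```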

Next, the ten event types that caused the most injuries.

#total injuries per event type
Injuries <- tapply(myData$INJURIES, myData$myEVTYPE, FUN=sum, na.rm=TRUE)

#sort event types by total injuries, largest first
allInjuries <- Injuries[order(Injuries, decreasing=TRUE)]

#keep only the ten most injury-producing event types
topInjuries <- allInjuries[1:10]

topInjuries
##      TORNADO    HIGH WIND         HEAT        FLOOD THUNDERSTORM 
##        91439        11398         9154         8682         5271 
## WINTER STORM    HURRICANE     WILDFIRE         HAIL   HEAVY RAIN 
##         5236         1709         1608         1467          301

For our purposes, harm to population health is measured as the total number of injuries plus fatalities for each event type.

#add injuries and fatalities per event type; both come from tapply() over the
#same factor, so their elements line up by event type
topHealth <- Injuries + fatalities

#keep the ten event types with the most casualties, largest first
topHealth <- topHealth[order(topHealth, decreasing=TRUE)][1:10]
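
Both casualty columns can also be totalled in a single pass with base R's `rowsum()`, which sums every column within each group, and then combined with `rowSums()`. A sketch on a made-up data frame whose column names mirror StormData but whose values do not:

```r
# Illustrative records; column names follow StormData, values are made up
storms <- data.frame(
  myEVTYPE   = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(3, 2, 1, 0),
  INJURIES   = c(40, 5, 10, 2)
)

# One row per event type, one column per casualty measure
by_event <- rowsum(storms[, c("FATALITIES", "INJURIES")], storms$myEVTYPE)

# Injuries plus fatalities per event type, largest first
health <- sort(rowSums(by_event), decreasing = TRUE)
health
```

Keeping `by_event` around also leaves the separate fatality and injury totals available for the side-by-side plots.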

Results

Question 1

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Answer

“Tornado” and “high wind” events were the most hazardous to overall population health, with tornadoes far ahead of every other event type.

barplot(topHealth[1:5]/1000,
        main="Events causing fatalities and injuries",
        col=c("yellow"),
        xlab="Event",
        ylab="Frequency in thousands",
        ylim=c(0,100))

[Figure: Events causing fatalities and injuries]

Question 2

Across the United States, which types of events have the greatest economic consequences?

Answer

The highest property damage event was “tornado” with damage at $51.7B

The highest crop damage event was “flood” with damage at $7.25B

The top damage-producing economic event was “tornado” with damage at $51.7B

Here we look at the five events with the highest crop and property damage.

barplot(allDmg[1:5], 
        xlab="Event", 
        ylab="Damage in Billions", 
        main="Top 5 Economic Damaging Events",
        col=c("purple"),
        ylim=c(0, max(allDmg[1:5], na.rm=TRUE) * 1.1))

[Figure: Top 5 Economic Damaging Events]

Final Conclusion

Comparing the top ten events for each measure side by side brings out a few patterns.

#a couple of settings shared by all four plots
my_las <- 2
num_events <- 10

par(mfrow=c(2,2))
par(mar=c(9,3,1,1))

barplot(cropDmg[1:num_events], 
        ylab="Damage in Billions", 
        main="Top Crop Damage Events",
        col=c("red"),
        las=my_las,
        ylim=c(0, max(cropDmg, propDmg, na.rm=TRUE)))

barplot(propDmg[1:num_events], 
        ylab="Damage in Billions", 
        main="Top Property Damaging Events",
        col=c("purple"),
        las=my_las,
        ylim=c(0, max(cropDmg, propDmg, na.rm=TRUE)))

barplot(allInjuries[1:num_events]/1000,
        col=c("yellow"),
        ylab="Frequency in thousands",
        main="Top Injury Events",
        las=my_las,
        ylim=c(0,100))

barplot(allFatalities[1:num_events]/1000,
        col=c("green"),
        ylab="Frequency in thousands",
        main="Top Fatality Events",
        las=my_las,
        ylim=c(0,100))

[Figure: Top Crop Damage Events, Top Property Damaging Events, Top Injury Events, Top Fatality Events]

Although we cannot put a cost on human life, tornadoes clearly cause catastrophic loss of both life and property. They do not, however, cause the largest crop losses. If we set tornadoes aside, floods become the most damaging event, while heat becomes the event with the most casualties (even though heat ranks fairly low among the economically damaging events).
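
The "set tornadoes aside" comparison amounts to dropping one name from a ranked vector before taking the maximum. A sketch on made-up totals; the helper `drop_event()` is hypothetical, and with the real data the inputs would be the damage and casualty totals computed earlier.

```r
# Illustrative totals only
damage   <- c(TORNADO = 52, FLOOD = 30, HAIL = 12, HEAT = 1)
casualty <- c(TORNADO = 97, HEAT = 12, FLOOD = 10, HAIL = 2)

# Hypothetical helper: remove one event type by name
drop_event <- function(x, event) x[names(x) != event]

top_damage   <- names(which.max(drop_event(damage, "TORNADO")))    # "FLOOD"
top_casualty <- names(which.max(drop_event(casualty, "TORNADO")))  # "HEAT"
```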