Synopsis

This data analysis evaluates National Weather Service data collected from 1950 through 2011. Earlier years contain fewer recorded events, most likely because record keeping was less complete at the time.

The goal of this analysis is to assess the types of weather events that cause the most harm to human health in the form of injuries and death, and to the economy in the form of property damage and crop damage costs.

This is the final project for the Coursera course titled Reproducible Research.

Data Processing

The following packages were used in this analysis.

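# Packages used in the analysis
library(downloader)  # download(): file download helper
library(plyr)        # ddply(): grouped summaries
library(ggplot2)     # plotting
library(gridExtra)   # grid.arrange(): stacking plots
library(grid)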

The data were downloaded with the downloader package and saved to the working directory. The file is a bzip2-compressed CSV; read.csv() decompresses it transparently, so no separate extraction step is needed.

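url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download once; the file is bzip2-compressed, so name it accordingly
if (!file.exists("stormdata.csv.bz2")) {
    download(url, dest = "stormdata.csv.bz2", mode = "wb")
}
# read.csv() detects the compression; keep text columns as character
sd <- read.csv("stormdata.csv.bz2", stringsAsFactors = FALSE)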
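Because read.csv() opens files through file(), which autodetects gzip, bzip2 and xz compression when reading, the compressed download can be read without unpacking it first. A minimal, self-contained check of that behaviour, using a small made-up data frame written to a temporary file (`toy`, `tmp` and `back` are illustrative names only):

```r
# Illustrative toy data; not part of the storm data set
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0),
                  stringsAsFactors = FALSE)

# Write it through a bzip2 connection to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses transparently; no bunzip2 step is needed
back <- read.csv(tmp, stringsAsFactors = FALSE)
print(back)
```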

Only seven variables are needed for this analysis, so they were extracted into a smaller data frame.

sdKeep <- sd[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG","CROPDMGEXP")]

Impacts on Public Health

Fatalities and injuries were summed by weather event type (EVTYPE), and the result was sorted in decreasing order of each measure.

harm <- ddply(sdKeep, .(EVTYPE), summarize, fatalities = sum(FATALITIES), injuries = sum(INJURIES))
fatal <- harm[order(harm$fatalities, decreasing = TRUE), ]
injury <- harm[order(harm$injuries, decreasing = TRUE), ]
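plyr is no longer actively developed, so for reference here is a minimal base-R sketch of the same summarize-then-sort step, swapping ddply() for aggregate() and keeping order() for the sort. It runs on a small made-up data frame (`toy`); with the real data, `sdKeep` would take its place, and `harm_base`, `fatal_base` and `injury_base` are illustrative names.

```r
# Made-up rows standing in for sdKeep; values are illustrative only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 1, 2, 0),
  INJURIES   = c(10, 4, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type
harm_base <- aggregate(toy[, c("FATALITIES", "INJURIES")],
                       by = list(EVTYPE = toy$EVTYPE), FUN = sum)
names(harm_base) <- c("EVTYPE", "fatalities", "injuries")

# Sort by each measure, largest first
fatal_base  <- harm_base[order(harm_base$fatalities, decreasing = TRUE), ]
injury_base <- harm_base[order(harm_base$injuries,   decreasing = TRUE), ]
print(fatal_base)
```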

Impacts on the Economy

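The cost data are not stored in a directly usable form. Each damage amount (PROPDMG, CROPDMG) is paired with an exponent column (PROPDMGEXP, CROPDMGEXP) that encodes its magnitude as a letter (h/H for hundreds, k/K for thousands, m/M for millions, b/B for billions), a digit, or a symbol. A helper function, Exp(), converts each code into a power of ten: digits are used directly as the exponent, and blank or symbolic codes ("", "-", "?", "+") are treated as an exponent of 0 (no multiplier).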

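Exp <- function(e) {
    e <- as.character(e)  # factor input would otherwise yield level codes
    if (e %in% c("h", "H"))
        return(2)
    else if (e %in% c("k", "K"))
        return(3)
    else if (e %in% c("m", "M"))
        return(6)
    else if (e %in% c("b", "B"))
        return(9)
    else if (e %in% c("", "-", "?", "+"))
        return(0)  # blank or symbolic code: no multiplier
    else if (!is.na(suppressWarnings(as.numeric(e))))
        return(as.numeric(e))  # a digit is used directly as the exponent
    else {
        stop("Invalid value.")
    }
}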
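Exp() relies on as.numeric() to recognise digit codes. Before R 4.0.0, read.csv() defaulted to stringsAsFactors = TRUE, so the exponent columns would arrive as factors, and as.numeric() on a factor returns the internal level codes rather than the digits shown. A small illustration with a made-up factor `f`:

```r
# A made-up exponent column stored as a factor (levels sort as "", "5", "K")
f <- factor(c("", "5", "K"))

# Level codes, not the printed values
codes <- as.numeric(f)

# Converting through character first recovers the digit; non-numeric codes become NA
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)
print(values)
```

Reading the data with stringsAsFactors = FALSE, or converting with as.character() before as.numeric(), avoids this pitfall.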

Exp() was then applied to both exponent columns to express property and crop damage in US dollars, so that the two could be compared directly.

# Convert exponent codes to powers of ten, then compute damage in US dollars
propExp <- sapply(sdKeep$PROPDMGEXP, FUN = Exp)
sdKeep$propDamage <- sdKeep$PROPDMG * 10^propExp
cropExp <- sapply(sdKeep$CROPDMGEXP, FUN = Exp)
sdKeep$cropDamage <- sdKeep$CROPDMG * 10^cropExp
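Applying Exp() element by element with sapply() works but makes one R function call per row. A vectorized alternative is a named lookup vector indexed by the exponent codes. In the minimal sketch below, `exp_lookup` and `code_to_exp` are hypothetical names; the mapping mirrors Exp(), except that unrecognised codes become NA instead of raising an error.

```r
# Hypothetical vectorized equivalent of Exp(): names are codes, values are powers of ten
exp_lookup <- c(h = 2, H = 2, k = 3, K = 3, m = 6, M = 6, b = 9, B = 9,
                "-" = 0, "?" = 0, "+" = 0,
                setNames(0:9, as.character(0:9)))

code_to_exp <- function(codes) {
  codes <- as.character(codes)        # guard against factor input
  out <- unname(exp_lookup[codes])    # NA for codes not in the table (including "")
  out[codes == ""] <- 0               # blank code: no multiplier
  out
}

code_to_exp(c("K", "M", "", "5", "?", "b"))
```

With the real data, `sdKeep$PROPDMG * 10^code_to_exp(sdKeep$PROPDMGEXP)` would take the place of the sapply() call, and any NA in the result would flag an unexpected code.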

The dollar amounts were then summed by event type.

cost <-ddply(sdKeep, .(EVTYPE), summarize, propDamage = sum(propDamage), 
             cropDamage = sum(cropDamage))

Event types with no recorded property or crop damage were excluded.

cost <- cost[(cost$propDamage > 0 | cost$cropDamage > 0), ]

The remaining event types were then sorted in decreasing order of property damage and, separately, of crop damage.

propOrder <-cost[order(cost$propDamage, decreasing = TRUE), ]
cropOrder <-cost[order(cost$cropDamage, decreasing = TRUE), ]

With data processing complete, results can now be assessed.

Results

Impacts on Public Health

The ten weather event types causing the most injuries and the most fatalities are listed below.

head(injury[, c("EVTYPE", "injuries")], 10)
##                EVTYPE injuries
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
head(fatal[, c("EVTYPE", "fatalities")], 10)
##             EVTYPE fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

Plots were created to show these results visually. Tornadoes dominate both measures, with 91,346 injuries and 5,633 deaths. Note that “HEAT” and “EXCESSIVE HEAT” appear as separate event types; the distinction between them depends on regionally and locally defined heat advisory criteria. Flooding also ranks highly in both lists and is likewise split into separate event types (FLOOD and FLASH FLOOD) based on terrain, zoning, and other characteristics.

injplot <- ggplot(data=head(injury,10), aes(x=reorder(EVTYPE, injuries), y=injuries)) +
                geom_bar(fill= "blue",stat="identity") + coord_flip() +  ylab("Total number of injuries") + 
                xlab("Weather Event Type") + 
                ggtitle("Injuries due to Weather Events, Top 10") +
                  theme(legend.position="none")

fatalplot <- ggplot(data=head(fatal,10), aes(x=reorder(EVTYPE, fatalities), y=fatalities)) +
                geom_bar(fill= "red",stat="identity") + coord_flip() + 
                ylab("Total number of deaths") + xlab("Weather Event Type") +
                ggtitle("Fatalities due to Weather Events, Top 10") +
                  theme(legend.position="none")
grid.arrange(injplot, fatalplot, nrow=2)
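The four plots in this report differ only in the data, the column plotted, the colour and the labels. A hypothetical helper (`top10_plot` is an illustrative name, not part of any package) could build each one. This sketch uses geom_col(), ggplot2's shorthand for geom_bar(stat = "identity"), and the .data pronoun to select the plotted column by name.

```r
library(ggplot2)

# Hypothetical helper: horizontal bars for the first n rows of an already-sorted data frame
top10_plot <- function(df, var, fill, ylab, title, n = 10) {
  ggplot(head(df, n), aes(x = reorder(EVTYPE, .data[[var]]), y = .data[[var]])) +
    geom_col(fill = fill) +
    coord_flip() +
    xlab("Weather Event Type") +
    ylab(ylab) +
    ggtitle(title)
}

# Usage with made-up values; in the report, `injury` or `fatal` would be passed instead
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"), injuries = c(50, 20, 5))
p <- top10_plot(toy, "injuries", "blue", "Total number of injuries",
                "Injuries due to Weather Events, Top 10")
```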

Impacts on the Economy

The ten weather event types with the highest property and crop damage costs are listed below. Flood-related event types (FLASH FLOOD, FLOOD, FLOODING) appear several times in the property damage list, while drought leads crop damage.

head(propOrder[, c("EVTYPE", "propDamage")], 10)
##                 EVTYPE   propDamage
## 153        FLASH FLOOD 6.820237e+13
## 786 THUNDERSTORM WINDS 2.086532e+13
## 834            TORNADO 1.078951e+12
## 244               HAIL 3.157558e+11
## 464          LIGHTNING 1.729433e+11
## 170              FLOOD 1.446577e+11
## 411  HURRICANE/TYPHOON 6.930584e+10
## 185           FLOODING 5.920826e+10
## 670        STORM SURGE 4.332354e+10
## 310         HEAVY SNOW 1.793259e+10
head(cropOrder[, c("EVTYPE", "cropDamage")], 10)
##                EVTYPE  cropDamage
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025974480
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000

As before, plots show the magnitude of these costs. Because the damage totals span several orders of magnitude, the bars show the log10 of each dollar amount.

propplot <- ggplot(data = head(propOrder, 10),
                   aes(x = reorder(EVTYPE, propDamage), y = log10(propDamage))) +
                geom_bar(fill = "darkblue", stat = "identity") + coord_flip() +
                ylab("Property Damage, USD, log10") + xlab("Weather Event Type") +
                ggtitle("Property Damage due to Weather Events, Top 10") +
                theme(plot.title = element_text(hjust = 0))

cropplot <- ggplot(data = head(cropOrder, 10),
                   aes(x = reorder(EVTYPE, cropDamage), y = log10(cropDamage))) +
                geom_bar(fill = "darkred", stat = "identity") +
                coord_flip() + ylab("Crop Damage, USD, log10") +
                xlab("Weather Event Type") +
                ggtitle("Crop Damage due to Weather Events, Top 10") +
                theme(legend.position = "none")
        
grid.arrange(propplot, cropplot, nrow=2)
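A caveat on reading the log10 bars: equal differences in bar length correspond to equal ratios, not equal dollar differences. Using two values from the crop damage table (DROUGHT and FROST/FREEZE), a short calculation shows how a roughly 13-fold gap shrinks to about one unit on the log10 axis:

```r
drought <- 13972566000   # DROUGHT crop damage, USD, from the table above
frost   <- 1094086000    # FROST/FREEZE crop damage, USD, from the table above

ratio   <- drought / frost                 # about 12.8
log_gap <- log10(drought) - log10(frost)   # about 1.1, equal to log10(ratio)

round(c(ratio = ratio, log_gap = log_gap), 2)
```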