Synopsis

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. This project addresses the following questions: 1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? 2) Across the United States, which types of events have the greatest economic consequences?

Data Processing

The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site: Storm Data.

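# Load the packages used below (plyr before dplyr to limit masking problems)
library(plyr); library(dplyr); library(ggplot2); library(gridExtra)
# Download the data and read it into R
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "stormData.csv.bz2")
storms <- read.csv("stormData.csv.bz2", stringsAsFactors = FALSE)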
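
To avoid repeating the download each time the analysis is re-run, the download step can be guarded by a check for a local copy. The following is a minimal sketch, not part of the original analysis: `fetch_if_missing` and its `downloader` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper (an assumption, not from the original report):
# download `url` to `dest` only when `dest` does not exist yet.
# `downloader` defaults to download.file and can be replaced, e.g. in tests.
fetch_if_missing <- function(url, dest, downloader = utils::download.file) {
    if (!file.exists(dest)) {
        downloader(url, dest)
    }
    invisible(dest)
}

storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Usage (left commented out so the sketch does not touch the network):
# fetch_if_missing(storm_url, "stormData.csv.bz2")
```

Passing the downloader as an argument keeps the helper easy to exercise without network access.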

In order to reduce the size of the dataset, we select the following variables of interest, using dplyr: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP.

storms2 <- select(storms, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

The next step of the data processing is to summarize the population health data (FATALITIES, INJURIES) by event type and then sort the totals in decreasing order.

popHealth <- ddply(storms2, .(EVTYPE), summarize, fatalities = sum(FATALITIES), injuries = sum(INJURIES))
fatalities <- popHealth[order(popHealth$fatalities, decreasing = TRUE), ]
injury <- popHealth[order(popHealth$injuries, decreasing = TRUE), ]
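
To make the summarize-then-sort step concrete, here is a small sketch on made-up data. It uses base R's `aggregate` instead of `ddply`, so it is an illustration of the same idea and not the code used in this report; the data frame `toy` and its values are invented for the example.

```r
# Toy data (invented values, for illustration only)
toy <- data.frame(
    EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
    FATALITIES = c(3, 1, 2, 0, 2),
    INJURIES   = c(10, 0, 5, 2, 1),
    stringsAsFactors = FALSE
)

# Sum both outcome columns within each event type
toyHealth <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by total fatalities, largest first
toyHealth <- toyHealth[order(toyHealth$FATALITIES, decreasing = TRUE), ]
print(toyHealth)
```

In this toy case TORNADO ranks first with 5 fatalities, followed by HEAT with 3 and FLOOD with 0.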

The last step of the data processing is to calculate the value of the economic damage (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP). To do this, we first need to convert the exponent codes used in the dataset (letters and numbers, such as h = hundred, k = thousand, m = million, b = billion, etc.) into usable powers of ten, applying a function:

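# Convert a single damage exponent code into a power of ten
getExp <- function(e) {
    if (e %in% c("h", "H"))              # hundred
        return(2)
    else if (e %in% c("k", "K"))         # thousand
        return(3)
    else if (e %in% c("m", "M"))         # million
        return(6)
    else if (e %in% c("b", "B"))         # billion
        return(9)
    else if (!is.na(as.numeric(e)))      # numeric codes; as.numeric() warns on "-", "?", "+"
        return(as.numeric(e))
    else if (e %in% c("", "-", "?", "+"))    # empty or symbolic codes: treated as 10^0
        return(0)
    else {
        stop("Invalid value.")
    }
}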

Applying the above function, we calculate the values of property damage and crop damage.

propExp <- sapply(storms2$PROPDMGEXP, FUN=getExp)
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
storms2$propDamage <- storms2$PROPDMG * (10 ^ propExp)
cropExp <- sapply(storms2$CROPDMGEXP, FUN=getExp)
## Warning in FUN(X[[i]], ...): NAs introduced by coercion
storms2$cropDamage <- storms2$CROPDMG * (10 ^ cropExp)
# Summarize the financial damage to crops and property by event type
econDamage <- ddply(storms2, .(EVTYPE), summarize, propDamage = sum(propDamage), cropDamage = sum(cropDamage))
# Omit the events that caused no financial damage
econDamage <- econDamage[(econDamage$propDamage > 0 | econDamage$cropDamage > 0), ]
# Arrange the data in decreasing order
propDmgSorted <- econDamage[order(econDamage$propDamage, decreasing = TRUE), ]
cropDmgSorted <- econDamage[order(econDamage$cropDamage, decreasing = TRUE), ]
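
As a worked illustration of the exponent conversion, the sketch below maps whole vectors of codes to powers of ten with a lookup table, which avoids the coercion warnings because `as.numeric` is only applied to digits. It is an alternative sketch and not the function used in this report: `expToPower` is a hypothetical name, and it assumes that numeric codes are single digits.

```r
# Hypothetical vectorized alternative (an assumption, not the report's getExp):
# map exponent codes to powers of ten using a lookup table.
expToPower <- function(codes) {
    codes <- toupper(as.character(codes))
    lookup <- c(H = 2, K = 3, M = 6, B = 9)
    power <- unname(lookup[codes])            # NA for codes not in the table
    isDigit <- grepl("^[0-9]$", codes)        # assumption: digit codes are single digits
    power[isDigit] <- as.numeric(codes[isDigit])
    power[codes %in% c("", "-", "?", "+")] <- 0
    if (anyNA(power)) stop("Invalid value.")
    power
}

# Made-up damage figures and codes, for illustration only
dmg  <- c(25, 2.5, 10, 1)
code <- c("K", "m", "", "5")
dmg * 10^expToPower(code)
# values: 25000, 2500000, 10, 100000
```

For instance, a damage figure of 25 with the code K becomes 25 * 10^3 = 25000 dollars, while an empty code leaves the figure unchanged.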

Results

Population Health

We can now identify and rank the top 5 types of events affecting population health (FATALITIES and INJURIES).
First, with regard to fatalities, the top 5 types of events are: TORNADO, EXCESSIVE HEAT, FLASH FLOOD, HEAT, and LIGHTNING.

fatalities <- popHealth[order(popHealth$fatalities, decreasing = TRUE), ]

Second, with regard to injuries, the top 5 types of events are: TORNADO, TSTM WIND, FLOOD, EXCESSIVE HEAT, and LIGHTNING.

injury <- popHealth[order(popHealth$injuries, decreasing = TRUE), ]

The plots below show the ten highest-ranked event types for each measure, along with the number of victims:

plot1 <- ggplot(data = head(injury, 10), aes(x = reorder(EVTYPE, injuries), y = injuries)) +
    geom_bar(fill = "olivedrab", stat = "identity") + coord_flip() +
    ylab("Total number of injuries") + xlab("Event type") +
    ggtitle("Impacts on Population Health of the Top Ten Weather Events") +
    theme(legend.position = "none")

plot2 <- ggplot(data = head(fatalities, 10), aes(x = reorder(EVTYPE, fatalities), y = fatalities)) +
    geom_bar(fill = "red4", stat = "identity") + coord_flip() +
    ylab("Total number of fatalities") + xlab("Event type") +
    theme(legend.position = "none")

grid.arrange(plot1, plot2, nrow = 2)

Greatest Economic Consequences

We can also identify and rank the top 5 types of events causing the greatest economic consequences.
First, for property damage, the top 5 types of events are: FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, and FLASH FLOOD.

propDmgSorted <- econDamage[order(econDamage$propDamage, decreasing = TRUE), ]

Second, for crop damage, the top 5 types of events are: DROUGHT, FLOOD, RIVER FLOOD, ICE STORM, and HAIL.

cropDmgSorted <- econDamage[order(econDamage$cropDamage, decreasing = TRUE), ]

The plots below show the ten highest-ranked event types for each kind of damage, along with the value of the damage (property damage is shown on a log10 scale):

plot1 <- ggplot(data = head(propDmgSorted, 10), aes(x = reorder(EVTYPE, propDamage), y = log10(propDamage))) +
    geom_bar(fill = "blue", stat = "identity") + coord_flip() +
    xlab("Event type") + ylab("Property damage in dollars (log10)") +
    ggtitle("Top Ten Weather Events Causing the Greatest Economic Consequences") +
    theme(plot.title = element_text(hjust = 0))

plot2 <- ggplot(data = head(cropDmgSorted, 10), aes(x = reorder(EVTYPE, cropDamage), y = cropDamage)) +
    geom_bar(fill = "green", stat = "identity") + coord_flip() +
    xlab("Event type") + ylab("Crop damage in dollars") +
    theme(legend.position = "none")

grid.arrange(plot1, plot2, ncol = 1, nrow = 2)

Conclusion

This study identifies and evaluates the types of events that are most harmful with respect to population health and those that have the greatest economic consequences.

First, with regard to population health:
- the top 5 types of events causing the most fatalities are: TORNADO, EXCESSIVE HEAT, FLASH FLOOD, HEAT, and LIGHTNING.
- the top 5 types of events causing the most injuries are: TORNADO, TSTM WIND, FLOOD, EXCESSIVE HEAT, and LIGHTNING.

Second, with regard to economic consequences:
- the top 5 types of events causing the greatest property damage are: FLOOD, HURRICANE/TYPHOON, TORNADO, STORM SURGE, and FLASH FLOOD.
- the top 5 types of events causing the greatest crop damage are: DROUGHT, FLOOD, RIVER FLOOD, ICE STORM, and HAIL.