IMPACT OF SEVERE WEATHER EVENTS IN THE US

1. Synopsis

This is the final analysis in the Coursera Reproducible Research course, part of the Data Science Specialization. This assignment explores the NOAA Storm Database and the effects of severe weather events on both population and economy.

The data covers the period from 1950 to November 2011. Fewer events were recorded in the earlier years.

The analysis investigates different types of severe weather events and how harmful they are to population health in terms of injuries and fatalities. Economic consequences are also explored, including the financial damage to both property and agriculture.

2. Data Processing

The data can be downloaded from the course website: Storm Data. Documentation of the database is available in the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ.

The following packages are needed for this analysis:

library(plyr)
library(ggplot2)
library(gridExtra)
library(grid)

Set working directory

setwd("C://Users//u182335//Documents//DataScience//Course 5 Week 2")

Read the data (once it has been downloaded into the working directory).

stormData <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"))
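Because decompressing and parsing the archive is slow, the read can be guarded so that it only runs when the object is not yet in the workspace. The following is a minimal sketch of that pattern, not part of the original analysis: a tiny temporary file with invented values stands in for the real archive, and the names demoFile and demoData are illustrative only.

```r
# Sketch: write a tiny bzip2-compressed CSV, then read it back the same way
# the storm data is read. demoFile and demoData are illustrative names and
# the values are invented.
demoFile <- tempfile(fileext = ".csv.bz2")
con <- bzfile(demoFile, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(3, 1)),
          con, row.names = FALSE)
close(con)

# Only read the file if the object is not already in the workspace
if (!exists("demoData")) {
  demoData <- read.csv(bzfile(demoFile), stringsAsFactors = FALSE)
}
str(demoData)
```

Setting stringsAsFactors = FALSE keeps text columns as character vectors regardless of the R version in use.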

This analysis looks at the health and economic consequences of severe weather events, so only the following columns are required: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

stormDataRed <- stormData[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

The data will now be prepared to investigate the effects on population health and the economic consequences:

2.1. Population Health

The fatalities and injuries are summed by event type and then sorted in decreasing order.

harm2health <- ddply(stormDataRed, .(EVTYPE), summarize,
                     fatalities = sum(FATALITIES), injuries = sum(INJURIES))
fatal <- harm2health[order(harm2health$fatalities, decreasing = TRUE), ]
injury <- harm2health[order(harm2health$injuries, decreasing = TRUE), ]
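The group-and-sort step can be tried on a small example without plyr. The sketch below uses base R's aggregate() as a stand-in for ddply; the toy data frame and the names toy, toyHarm, toyFatal and toyInjury are invented for illustration and are not part of the storm data.

```r
# Toy data with the same column names as the storm subset (values invented)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2)
)

# Sum both measures per event type (base R stand-in for the ddply call)
toyHarm <- aggregate(toy[, c("FATALITIES", "INJURIES")],
                     by = list(EVTYPE = toy$EVTYPE), FUN = sum)
names(toyHarm) <- c("EVTYPE", "fatalities", "injuries")

# Order by each measure, largest first
toyFatal  <- toyHarm[order(toyHarm$fatalities, decreasing = TRUE), ]
toyInjury <- toyHarm[order(toyHarm$injuries, decreasing = TRUE), ]
print(toyFatal)
```

The two orderings need not agree: in this toy data HEAT ranks second for fatalities but last for injuries, which is why the analysis keeps separate sorted tables.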

2.2. Economic Consequences

The exponents of the damage figures are stored in a separate column as letters (h = hundred, k = thousand, m = million, b = billion), which complicates the calculation of the financial damage. The first step deals with this using a function that converts the letter code of the exponent into a usable number.

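# Convert one exponent code from PROPDMGEXP / CROPDMGEXP into a power of ten
getExp <- function(e) {
  if (e %in% c("h", "H"))                 # hundred
    return(2)
  else if (e %in% c("k", "K"))            # thousand
    return(3)
  else if (e %in% c("m", "M"))            # million
    return(6)
  else if (e %in% c("b", "B"))            # billion
    return(9)
  else if (!is.na(as.numeric(e)))         # a number is used directly as the exponent
    return(as.numeric(e))
  else if (e %in% c("", "-", "?", "+"))   # no usable exponent
    return(0)
  else {
    stop("Invalid value.")
  }
}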

Using this function, the actual values are calculated for property damage and crop damage.

propExp <- sapply(stormDataRed$PROPDMGEXP, FUN = getExp)
stormDataRed$propDamage <- stormDataRed$PROPDMG * (10 ^ propExp)
cropExp <- sapply(stormDataRed$CROPDMGEXP, FUN = getExp)
stormDataRed$cropDamage <- stormDataRed$CROPDMG * (10 ^ cropExp)
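As a cross-check on the conversion, the same mapping can be written as a vectorised lookup and run on a few hand-made values. This is a sketch under stated assumptions rather than the method used above: expLookup, toExponent, toyDmg and toyExp are hypothetical names, digits are treated as exponents in the same way as in getExp, and unknown codes are mapped to 0 instead of raising an error. The as.character() call matters when the column is a factor, because as.numeric() on a factor returns the internal level codes rather than the printed values.

```r
# Lookup table: letter codes map to powers of ten (lower case only,
# input is lower-cased before the lookup)
expLookup <- c(h = 2, k = 3, m = 6, b = 9)

toExponent <- function(codes) {
  # as.character() guards against factor input
  codes <- tolower(as.character(codes))
  # NA wherever the code is not one of the letters in the table
  out <- unname(expLookup[codes])
  # Digit codes are used directly as the exponent (assumption, as in getExp)
  isDigit <- grepl("^[0-9]+$", codes)
  out[isDigit] <- as.numeric(codes[isDigit])
  # Everything else ("", "-", "?", "+", unknown codes) becomes exponent 0
  out[is.na(out)] <- 0
  out
}

# Invented example values: 25 K, 2.5 M, 10 with no code, 7 with code "2"
toyDmg <- c(25, 2.5, 10, 7)
toyExp <- c("K", "M", "", "2")
toyDmg * 10^toExponent(toyExp)
```

Here 25 with code K becomes 25000 and 2.5 with code M becomes 2500000, while the value without a code is left unchanged.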

Next, the financial damage to crops and property is summed by event type.

econDamage <- ddply(stormDataRed, .(EVTYPE), summarize,propDamage = sum(propDamage), cropDamage = sum(cropDamage))

Events not causing any financial damage are excluded.

econDamage <- econDamage[(econDamage$propDamage > 0 | econDamage$cropDamage > 0), ]

Finally, the data is sorted in decreasing order:

propDmgSorted <- econDamage[order(econDamage$propDamage, decreasing = TRUE), ]
cropDmgSorted <- econDamage[order(econDamage$cropDamage, decreasing = TRUE), ]

Now that the data has been processed the results can be derived.

3. Results

3.1. Effects on population health

The Top 5 weather events affecting population health (injuries and deaths) are listed below. For both injuries and fatalities, tornados were the most devastating events in the given time period.

head(injury[, c("EVTYPE", "injuries")],5)
##             EVTYPE injuries
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230
head(fatal[, c("EVTYPE", "fatalities")],5)
##             EVTYPE fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816

The plots of the Top 10 events display an even clearer picture:

p1 <- ggplot(data=head(injury,10), aes(x=reorder(EVTYPE, injuries), y=injuries)) +
  geom_bar(fill="olivedrab",stat="identity")  + coord_flip() + 
  ylab("Total number of injuries") + xlab("Event type") +
  ggtitle("Health impact of weather events in the US - Top 10") +
  theme(legend.position="none")

p2 <- ggplot(data=head(fatal,10), aes(x=reorder(EVTYPE, fatalities), y=fatalities)) +
  geom_bar(fill="red4",stat="identity") + coord_flip() +
  ylab("Total number of fatalities") + xlab("Event type") +
  theme(legend.position="none")

grid.arrange(p1, p2, nrow =2)

The plots show that tornados are the most dangerous events when it comes to population health. Heat and floods also cause a high number of deaths and injuries.

3.2. Economic Consequences

The Top 5 weather events causing financial damage to property and to crops are displayed below. For property, flash floods, thunderstorm winds and tornados cause by far the most damage. As expected, the weather event causing the most financial damage to agriculture (i.e. crops) is drought. Flood events as well as ice storms and hail also cause a substantial amount of damage.

head(propDmgSorted[, c("EVTYPE", "propDamage")], 5)
##                 EVTYPE   propDamage
## 153        FLASH FLOOD 6.820237e+13
## 786 THUNDERSTORM WINDS 2.086532e+13
## 834            TORNADO 1.078951e+12
## 244               HAIL 3.157558e+11
## 464          LIGHTNING 1.729433e+11
head(cropDmgSorted[, c("EVTYPE", "cropDamage")], 5)
##          EVTYPE  cropDamage
## 95      DROUGHT 13972566000
## 170       FLOOD  5661968450
## 590 RIVER FLOOD  5029459000
## 427   ICE STORM  5022113500
## 244        HAIL  3025974480

Plotting the Top 10 events for both property and crop damage provides more evidence to support the findings above. Note that the property damage is plotted on a logarithmic scale to make the plot easier to read.

p1 <- ggplot(data=head(propDmgSorted,10), aes(x=reorder(EVTYPE, propDamage), y=log10(propDamage), fill=propDamage )) +
  geom_bar(fill="darkred", stat="identity") + coord_flip() +
  xlab("Event type") + ylab("Property damage in dollars (log10)") +
  ggtitle("Economic impact of weather events in the US - Top 10") +
  theme(plot.title = element_text(hjust = 0))

p2 <- ggplot(data=head(cropDmgSorted,10), aes(x=reorder(EVTYPE, cropDamage), y=cropDamage, fill=cropDamage)) +
  geom_bar(fill="goldenrod", stat="identity") + coord_flip() + 
  xlab("Event type") + ylab("Crop damage in dollars") + 
  theme(legend.position="none")

grid.arrange(p1, p2, ncol=1, nrow =2)
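The effect of the logarithmic scale on the property damage plot can be illustrated with the Top 5 totals listed in section 3.2. The sketch below only reuses those printed figures; the names propTop5, ratio and logTop5 are illustrative.

```r
# Top 5 property damage totals as printed in section 3.2
propTop5 <- c("FLASH FLOOD"        = 6.820237e+13,
              "THUNDERSTORM WINDS" = 2.086532e+13,
              "TORNADO"            = 1.078951e+12,
              "HAIL"               = 3.157558e+11,
              "LIGHTNING"          = 1.729433e+11)

# On the raw scale the largest total is several hundred times the fifth one,
# so the smaller bars would be almost invisible
ratio <- propTop5[["FLASH FLOOD"]] / propTop5[["LIGHTNING"]]
print(ratio)

# On the log10 scale the same totals differ by fewer than three units
logTop5 <- log10(propTop5)
print(round(logTop5, 2))
```

The trade-off is that bar lengths on the log10 axis no longer compare linearly: a difference of one unit corresponds to a factor of ten in dollars.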