Synopsis

This analysis is the second assignment of the Reproducible Research course in the Coursera Data Science Specialization. Its objective is to investigate the effect of severe weather events on the US population and economy using the Storm Database of the National Oceanic and Atmospheric Administration (NOAA). Harm to the population is measured through fatalities and injuries, whereas economic harm is measured as financial damage to crops and property. The database covers the years 1950 to 2011, with more records available for the later years of this period.

Data Processing

The data for the analysis is available as a flat file in CSV (comma-separated values) format, compressed with the bzip2 algorithm. At the time of the analysis, it could be downloaded here:

https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

The data was downloaded from the URL above. The file was then renamed to Stormdata.csv and moved to the local R working directory.

Loading base data

The CSV-file is read into R from the local working directory.

## read CSV-file:
basedata <- read.csv("Stormdata.csv", header=TRUE, na.strings = "") 
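As a side note, read.csv() can read bzip2-compressed files directly, so the archive does not necessarily have to be unpacked by hand. The sketch below illustrates this with a small toy file written to a temporary location. The file contents and the names toy, toy_path and toy_read are made up for illustration and are not part of the original analysis.

```r
## write a tiny CSV to a bzip2-compressed temporary file (toy data, hypothetical)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

## read.csv() detects the compression and decompresses while reading
toy_read <- read.csv(toy_path, header = TRUE, stringsAsFactors = FALSE)
print(toy_read)
```

With the real data, the same call would presumably work on the downloaded archive, whatever name it is stored under locally.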

Reducing base data and generating the dataset for analysis

Seven of the available variables were identified as relevant for the subsequent analysis steps:

  1. EVTYPE represents the type of weather event (e.g. tornado, flood, etc.)
  2. FATALITIES is one measure of harm to the population
  3. INJURIES is another measure of harm to the population
  4. PROPDMG holds the numeric part of the property damage estimate in USD and therefore reflects economic damage
  5. PROPDMGEXP reflects the magnitude of property damage (e.g. hundreds, thousands etc.)
  6. CROPDMG holds the numeric part of the crop damage estimate in USD and hence reflects economic damage
  7. CROPDMGEXP reflects the magnitude of crop damage (e.g. hundreds, thousands etc.)

The original dataset is therefore subset to an analysis dataset that contains only the needed variables.

## subset relevant variables:
stormdata <- basedata[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")] 

Aggregate and visualize data

The first question of the assignment asks for the most harmful events with regard to the health of the population. Event types are reflected by the variable EVTYPE which is contained in the analysis dataset.

In the next step, fatalities and injuries are summed by event type using functions from the plyr package, which is loaded in the code chunk. In order to evaluate the impact of each event type, the aforementioned aggregates are then sorted in decreasing order.

## load library plyr:
library(plyr) 

## aggregate fatalities and injuries:
Harm_to_population <- ddply(stormdata, .(EVTYPE), summarize,fatalities = sum(FATALITIES),injuries = sum(INJURIES)) 

## order aggregated data decreasingly by fatalities and assign to new data frame
FatalIncidents <- Harm_to_population[order(Harm_to_population$fatalities, decreasing = TRUE), ]

## order aggregated data decreasingly by injuries and assign to new data frame
InjuryIncidents <- Harm_to_population[order(Harm_to_population$injuries, decreasing = TRUE), ]
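To make the aggregation step concrete, the following sketch applies the same idea to a small made-up data frame. It uses base R's aggregate() instead of plyr so that it runs without extra packages, which is an alternative formulation and not the code used in this analysis. All values and the names toy, toy_harm and toy_fatal are invented for illustration.

```r
## toy data: three event types with invented counts
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 7, 1, 3)
)

## sum fatalities and injuries per event type
toy_harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

## order decreasingly by fatalities
toy_fatal <- toy_harm[order(toy_harm$FATALITIES, decreasing = TRUE), ]
print(toy_fatal)
```

In this toy example the two tornado records are combined into one row with 7 fatalities and 17 injuries, which then ranks first.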

Results

Identify Top 10 of Event Types for fatalities and injuries

The aggregation forms the basis for identifying the top 10 weather events that led to fatalities and injuries. The shortlisted data is then visualized using the package ggplot2, which is loaded in the subsequent code chunk.

## use head() to extract the top 10 events in terms of fatalities
FatalIncidentsTop10 <- head(FatalIncidents[order(FatalIncidents$fatalities, decreasing = TRUE), ], 10)

## use head() to extract the top 10 events in terms of injuries
InjuryIncidentsTop10 <- head(InjuryIncidents[order(InjuryIncidents$injuries, decreasing = TRUE), ], 10)

## load library ggplot2
library(ggplot2)

## plot top 10 events - fatalities
ggplot(data = FatalIncidentsTop10, aes(x = EVTYPE, y = fatalities)) +
  geom_bar(fill = "steelblue", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("No. of Fatalities") +
  ggtitle("NOAA Top 10: Highest Fatality Counts, 1950-2011")

## plot top 10 events - injuries
ggplot(data = InjuryIncidentsTop10, aes(x = EVTYPE, y = injuries)) +
  geom_bar(fill = "orange", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("No. of Injuries") +
  ggtitle("NOAA Top 10: Highest Injury Counts, 1950-2011")
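By default ggplot2 arranges a discrete x-axis alphabetically, so the bars in such plots are not sorted by height. One possible refinement, shown here on invented toy data, is to pass the event type through reorder() so that the bars appear in decreasing order of the plotted value. The data frame toy_top and its values are hypothetical, and the plot is only built if ggplot2 is installed.

```r
## toy data (invented values)
toy_top <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
  fatalities = c(1, 4, 7)
)

## reorder() sorts the factor levels by a numeric variable;
## negating the value yields a decreasing order
toy_top$EVTYPE <- reorder(toy_top$EVTYPE, -toy_top$fatalities)
print(levels(toy_top$EVTYPE))

## build the plot only if ggplot2 is available; call print(p) to draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy_top, aes(x = EVTYPE, y = fatalities)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    xlab("Event Type") + ylab("No. of Fatalities")
}
```

The same reorder() call could be applied to EVTYPE in the top 10 data frames before plotting, if sorted bars are preferred.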

Tornadoes clearly stand out as the most harmful type of weather event, both in terms of fatalities and injuries caused.

Identify Top 10 of Event Types in terms of Economic Damage

The top 10 weather events in terms of economic damage are evaluated as well. Economic damage is measured using numeric variables for damage to property and crops. The magnitude of each damage estimate is stored in a separate column as an exponent, represented by letters such as h for hundred and k for thousand. This has to be taken into account when calculating the total economic damage. The following code chunk therefore contains the steps to convert the data accordingly.

## replace missing values (blank fields were read as NA because of na.strings = "")
stormdata$PROPDMG[is.na(stormdata$PROPDMG)] <- 0
stormdata$CROPDMG[is.na(stormdata$CROPDMG)] <- 0

## convert into character
stormdata$PROPDMGEXP <- as.character(stormdata$PROPDMGEXP)
stormdata$CROPDMGEXP <- as.character(stormdata$CROPDMGEXP)

## conduct conversion for property damage: each letter is translated to the respective exponent, e.g. h = 2 for later 10^2 = 100
## blank exponents were read as NA (na.strings = ""), so both NA and "" are set to 0
stormdata$PROPDMGEXP[is.na(stormdata$PROPDMGEXP) | (stormdata$PROPDMGEXP == "")] <- 0
stormdata$PROPDMGEXP[(stormdata$PROPDMGEXP == "+") | (stormdata$PROPDMGEXP == "-") | (stormdata$PROPDMGEXP == "?")] <- 1
stormdata$PROPDMGEXP[(stormdata$PROPDMGEXP == "h") | (stormdata$PROPDMGEXP == "H")] <- 2
stormdata$PROPDMGEXP[(stormdata$PROPDMGEXP == "k") | (stormdata$PROPDMGEXP == "K")] <- 3
stormdata$PROPDMGEXP[(stormdata$PROPDMGEXP == "m") | (stormdata$PROPDMGEXP == "M")] <- 6
stormdata$PROPDMGEXP[(stormdata$PROPDMGEXP == "B")] <- 9

## conduct conversion for crop damage: each letter is translated to the respective exponent, e.g. h = 2 for later 10^2 = 100
## blank exponents were read as NA (na.strings = ""), so both NA and "" are set to 0
stormdata$CROPDMGEXP[is.na(stormdata$CROPDMGEXP) | (stormdata$CROPDMGEXP == "")] <- 0
stormdata$CROPDMGEXP[(stormdata$CROPDMGEXP == "+") | (stormdata$CROPDMGEXP == "-") | (stormdata$CROPDMGEXP == "?")] <- 1
stormdata$CROPDMGEXP[(stormdata$CROPDMGEXP == "h") | (stormdata$CROPDMGEXP == "H")] <- 2
stormdata$CROPDMGEXP[(stormdata$CROPDMGEXP == "k") | (stormdata$CROPDMGEXP == "K")] <- 3
stormdata$CROPDMGEXP[(stormdata$CROPDMGEXP == "m") | (stormdata$CROPDMGEXP == "M")] <- 6
stormdata$CROPDMGEXP[(stormdata$CROPDMGEXP == "B")] <- 9

## convert the exponents back to integer for the computation in the next step
stormdata$PROPDMGEXP <- as.integer(stormdata$PROPDMGEXP)
stormdata$CROPDMGEXP <- as.integer(stormdata$CROPDMGEXP)

## calculate the total damage for each event as the sum of property and crop damage:
## the damage figure (PROPDMG, CROPDMG) is multiplied by 10 to the power of its exponent

economic_damage <- stormdata$PROPDMG * 10^stormdata$PROPDMGEXP + stormdata$CROPDMG * 10^stormdata$CROPDMGEXP
stormdata_econ_damage <- cbind(stormdata, economic_damage)
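The conversion from damage figure and exponent symbol to a USD amount can be illustrated with a compact, self-contained sketch. It uses a named lookup vector instead of repeated assignments, which is an alternative formulation and not the code used in this analysis. The names exp_lookup and to_usd as well as the toy values are hypothetical. Missing exponents are treated as 0 and the symbols +, - and ? as 1, mirroring the mapping above.

```r
## lookup from exponent symbol (upper case) to power of ten
exp_lookup <- c("H" = 2, "K" = 3, "M" = 6, "B" = 9,
                "+" = 1, "-" = 1, "?" = 1)

## convert a damage figure and its exponent symbol into USD
to_usd <- function(value, exp_symbol) {
  sym <- toupper(as.character(exp_symbol))
  power <- unname(exp_lookup[sym])             # NA for digits, blanks and unknown symbols
  digits <- suppressWarnings(as.numeric(sym))  # digit exponents such as "5"
  power <- ifelse(is.na(power), digits, power)
  power[is.na(power)] <- 0                     # missing or unknown exponent: 10^0 = 1
  value * 10^power
}

## toy records (invented values)
prop <- c(25, 2.5, 1, 10)
prop_exp <- c("K", "m", "B", NA)
damage <- to_usd(prop, prop_exp)
print(damage)
```

In this example a figure of 25 with exponent K becomes 25,000 USD, while a record without an exponent keeps its face value.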

## subset relevant variables

stormdata_econ_damage <- stormdata_econ_damage[, c("EVTYPE", "FATALITIES", "INJURIES", "economic_damage")]

## aggregate

EconomicDamagesAggregate <- aggregate(. ~ EVTYPE, data = stormdata_econ_damage, FUN = sum)

## order dataset decreasingly by aggregated variable

EconomicDamagesAggregateSorted <- EconomicDamagesAggregate[order(EconomicDamagesAggregate$economic_damage, decreasing = TRUE), ]

## use head-function to come up with top 10 events for economic damage

EconomicDamageIncidentsTop10 <- head(EconomicDamagesAggregateSorted, 10)

## visualize with ggplot2, which was loaded before; damage is divided by 1e6 to match the axis label (million USD)

ggplot(data = EconomicDamageIncidentsTop10, aes(x = EVTYPE, y = economic_damage / 1e6)) +
  geom_bar(fill = "lightgreen", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Economic Damage in million USD") +
  ggtitle("NOAA Top 10: Highest Economic Costs, 1950-2011")

Conclusion: With regard to the economic damage caused, hurricanes/typhoons represent the most harmful type of weather event. Floods are the runner-up, and tornadoes rank third.