Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities.

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The data cover the period from 1950 to the end of November 2011. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The analysis addresses the following two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
    The analysis finds that tornadoes are the most harmful weather event for population health.
  2. Across the United States, which types of events have the greatest economic consequences?
    The analysis shows that floods have the greatest economic consequences across the US.

Data Processing

The data has been obtained from Storm Data.
The following documents include additional information:
National Weather Service National Climatic Data Center Storm Events Storm Data Documentation
National Climatic Data Center Storm Events FAQ

The data is read from repdata_data_StormData.csv.bz2, which is assumed to have been downloaded to the current working directory.

# read.csv decompresses the .bz2 file transparently
data <- read.csv("repdata_data_StormData.csv.bz2", sep = ",", header = TRUE)

Loading the required libraries, ggplot2 and dplyr:

library(ggplot2)
library(dplyr)

I’m using the full data from 1950 to 2011. In a second analysis, it might be interesting to use only the newer, complete data.
The following transformations are applied:

# subsetting the data to only the relevant columns for this analysis
data <- select(data,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)

# PROPDMGEXP column
# converting factor to character
data$PROPDMGEXP <- as.character(data$PROPDMGEXP)
# substituting symbols and unit letters with numeric multipliers
data$PROPDMGEXP <- gsub("\\-|\\?|\\+", "0", data$PROPDMGEXP)
data$PROPDMGEXP <- gsub("H|h", "100", data$PROPDMGEXP)
data$PROPDMGEXP <- gsub("M|m", "1000000", data$PROPDMGEXP)
data$PROPDMGEXP <- gsub("B|b", "1000000000", data$PROPDMGEXP)
data$PROPDMGEXP <- gsub("K|k", "1000", data$PROPDMGEXP)
# converting to numeric values; note that empty codes become NA and
# digit codes are kept as literal multipliers
data$PROPDMGEXP <- as.numeric(data$PROPDMGEXP)

# CROPDMGEXP column
# converting factor to character
data$CROPDMGEXP <- as.character(data$CROPDMGEXP)
# substituting symbols and unit letters with numeric multipliers
data$CROPDMGEXP <- gsub("\\-|\\?|\\+", "0", data$CROPDMGEXP)
data$CROPDMGEXP <- gsub("H|h", "100", data$CROPDMGEXP)
data$CROPDMGEXP <- gsub("M|m", "1000000", data$CROPDMGEXP)
data$CROPDMGEXP <- gsub("B|b", "1000000000", data$CROPDMGEXP)
data$CROPDMGEXP <- gsub("K|k", "1000", data$CROPDMGEXP)
# converting to numeric values; note that empty codes become NA and
# digit codes are kept as literal multipliers
data$CROPDMGEXP <- as.numeric(data$CROPDMGEXP)

# calculating total for Property and Crop
data$PROPTOTAL <- (data$PROPDMG * data$PROPDMGEXP) 
data$CROPTOTAL <- (data$CROPDMG * data$CROPDMGEXP) 
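
The substitutions above turn an empty exponent code into NA and keep a digit code as a literal multiplier. As a minimal sketch of one alternative, the exponent codes could be mapped through a named lookup vector. The helper name exp_to_multiplier is hypothetical, and the treatment of digits as powers of ten and of empty or unrecognised codes as 0 is an assumption made for illustration, not the method used in this analysis.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to a numeric multiplier via a named lookup vector.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])            # NA for codes not in the lookup
  # Assumption: a single digit n stands for 10^n
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])
  # Assumption: empty or unrecognised codes ("", "-", "?", "+") contribute 0
  mult[is.na(mult)] <- 0
  mult
}

# Toy codes, made up for illustration
codes <- c("K", "m", "B", "", "?", "2")
multipliers <- exp_to_multiplier(codes)
print(multipliers)
```

With this approach no multiplier is NA, so a row with a missing exponent code contributes 0 for that damage type instead of producing an NA total.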

Results

1. Across the United States, which types of events are most harmful with respect to population health?

Fatalities and injuries are summed separately by event type. A separate figure is plotted for each, showing the 10 event types with the highest totals.

# aggregating the data for Fatalities and Injuries by Event
data_h <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data, sum)
# ordering and get the top 10
top1 <- arrange(data_h, desc(FATALITIES))[1:10,]
top1$EVTYPE <- factor(top1$EVTYPE, levels=top1$EVTYPE)
top2 <- arrange(data_h, desc(INJURIES))[1:10,]
top2$EVTYPE <- factor(top2$EVTYPE, levels=top2$EVTYPE)
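
The aggregation and ordering steps can be followed on a small example. This is only a sketch on made-up toy values (not taken from the storm database), written in base R so that it runs without dplyr; the names toy, toy_h and toy_top are hypothetical.

```r
# Toy data, made up for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both columns per event type
toy_h <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)

# Sort by fatalities (descending) and keep at most the top 2 rows
toy_top <- head(toy_h[order(-toy_h$FATALITIES), ], 2)

# Fixing the factor levels in sorted order makes ggplot2 draw the bars
# in that order instead of alphabetically
toy_top$EVTYPE <- factor(toy_top$EVTYPE, levels = toy_top$EVTYPE)
print(levels(toy_top$EVTYPE))
```

Unlike indexing with [1:10,], head() does not pad the result with NA rows when fewer rows than requested are available.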

Fatalities Plot
# using ggplot to show a barplot of the Top 10 events for health
ggplot(top1, aes(x=EVTYPE, y=FATALITIES)) + geom_bar(stat="identity", fill="slateblue", col="black") +
    theme(axis.text.x = element_text(angle = -90, hjust = 1)) + 
    xlab("Weather Event Type") + 
    ylab("Number of Fatalities") + 
    ggtitle("Number of Fatalities Caused by Top 10 Events")

 

Injuries Plot
ggplot(top2, aes(x=EVTYPE, y=INJURIES)) + geom_bar(stat="identity", fill="slateblue", col="black") +
    theme(axis.text.x = element_text(angle = -90, hjust = 1)) + 
    xlab("Weather Event Type") + 
    ylab("Number of Injuries") + 
    ggtitle("Number of Injuries Caused by Top 10 Events")

 

Both graphs show that tornadoes are the most harmful weather event for population health.

2. Across the United States, which types of events have the greatest economic consequences?

The total costs of property and crop damage are summed per event type.
A graph shows the top 10 weather events and their corresponding costs.

# aggregating data to get total sum per Event for Damages
data_eco <- aggregate(PROPTOTAL+CROPTOTAL ~ EVTYPE, data, sum)
names(data_eco) <- c("EVTYPE", "TOTAL")

# ordering and get the top 10
top3 <- arrange(data_eco, desc(TOTAL))[1:10,]
top3$EVTYPE <- factor(top3$EVTYPE, levels=top3$EVTYPE)
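
One detail of the formula interface of aggregate() is worth keeping in mind here: by default it removes any row in which the summed expression is NA before aggregating, which can happen when an exponent code could not be converted to a number. The sketch below shows the effect on made-up toy values and one possible alternative using rowSums(); the names toy, dropped and kept are hypothetical, and treating a missing component as 0 is an assumption.

```r
# Toy data, made up for illustration: one row has a missing crop total
toy <- data.frame(
  EVTYPE    = c("FLOOD", "FLOOD", "HAIL"),
  PROPTOTAL = c(100, 50, 10),
  CROPTOTAL = c(5, NA, 1),
  stringsAsFactors = FALSE
)

# Formula interface: the row containing NA is removed before summing
# (default na.action = na.omit)
dropped <- aggregate(PROPTOTAL + CROPTOTAL ~ EVTYPE, toy, sum)
names(dropped) <- c("EVTYPE", "TOTAL")

# Alternative (assumption): treat a missing component as 0 so that the
# row still contributes its known damage
toy$TOTAL <- rowSums(toy[, c("PROPTOTAL", "CROPTOTAL")], na.rm = TRUE)
kept <- aggregate(TOTAL ~ EVTYPE, toy, sum)

print(dropped)
print(kept)
```

In the toy data the FLOOD total differs between the two variants, because the second FLOOD row is only counted in the second one.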

Property and Crop Damage Plot

ggplot(top3, aes(x=EVTYPE, y=TOTAL)) + geom_bar(stat="identity", fill="slateblue", col="black") +
    theme(axis.text.x = element_text(angle = -90, hjust = 1)) + 
    xlab("Weather Event Type") + 
    ylab("Property and Crop Damages in US$") + 
    ggtitle("Property and Crop Damages in US$ Caused by Top 10 Events")

The bar plot shows that floods have the greatest economic consequences.