Storm Damage - Analyzing consequences of natural catastrophes

Synopsis

This report describes the harm caused by storms and other severe weather events. The analysis identifies which type of weather event is most harmful to the population, measured by fatalities, and which type has the greatest economic consequences, measured by property damage.
The analysis is based on data from the NOAA Storm Database and is restricted to weather events that occurred in the US between 1990 and 2011.
The next section loads and processes the data. The section after that presents the results of the analysis.

Loading and Processing the Raw Data

The raw data can be downloaded from the course website as a bzip2-compressed file. We download the file and decompress it in our local repository.
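
As a side note, R can read bzip2-compressed CSV files directly, so the manual decompression step is optional. The sketch below demonstrates this on a small temporary file; the file and its two toy rows are made up for illustration and are not part of the storm data.

```r
# Write a tiny, made-up CSV into a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# Read the compressed file back without decompressing it first
demo <- read.csv(bzfile(tmp))
print(demo)
```

Under the same assumption, the storm file could be read in one step by passing the path of the compressed download to read.csv.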

Reading the data

# read data
data <- read.csv(file = "repdata_data_StormData.csv", sep = ",")
# display structure, data types and dimension of data frame, variables
# str(data)

Cleaning the data

# eliminate inconsistencies in labels and convert EVTYPE to lower case
data$EVTYPE <- gsub("[ /\t.]+", " ", as.character(data$EVTYPE))
data$EVTYPE <- tolower(as.character(data$EVTYPE))
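
To illustrate what the two cleaning steps do, the following sketch applies them to a few made-up labels. The final trimws() call is an additional step that is not used in this report; it removes the leading or trailing space that the substitution can leave behind.

```r
# Made-up labels, not taken from the storm data
labels <- c("THUNDERSTORM  WINDS", "Tstm Wind/Hail", "Flash Flood.")

# Same two steps as in the report: collapse separators, then lower-case
cleaned <- tolower(gsub("[ /\t.]+", " ", labels))

# Extra step (not part of the report): strip leading/trailing whitespace
trimmed <- trimws(cleaned)
print(trimmed)
```

Without the trimming step, the third label keeps a trailing space and would be treated as a different event type from a label without one.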

Aggregating the data

# convert date into date format
data$date <- as.Date(as.character(data$BGN_DATE), format = "%m/%d/%Y")
# subset data to date range 1990 - 2011
dataNew <- subset(data, date >= as.Date("1990-01-01"))
# aggregate fatalities and property damage by event type
dataAgg <- aggregate(cbind(FATALITIES, PROPDMG) ~ EVTYPE, dataNew, FUN = sum)

The raw data consists of 37 variables and 902297 observations. We subset the data to the time range 1990-2011, which reduces the data frame to 751740 observations. The variable EVTYPE describes the type of weather event and is the explanatory variable in the following analysis. The variable FATALITIES records the deaths caused by the event and serves as the indicator for the threat to the population. The variable PROPDMG records the property damage caused by the event and serves as the indicator for economic damage.
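
Note that PROPDMG is summed here as a raw number. In the NOAA data a companion column, PROPDMGEXP, holds a magnitude code for each value, so a dollar figure requires scaling first. The sketch below shows one possible way to do this on made-up rows. The mapping of codes to multipliers (H, K, M, B, with every other code treated as 1) and the column name PROPDMG_USD are assumptions for illustration, not part of this report's analysis.

```r
# Toy rows standing in for the storm data (values are made up)
toy <- data.frame(
  EVTYPE     = c("tornado", "tornado", "flood", "hail"),
  PROPDMG    = c(2.5, 10, 1.2, 500),
  PROPDMGEXP = c("M", "K", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping of exponent codes to multipliers
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up the multiplier per row; unknown or empty codes fall back to 1
scale <- unname(multiplier[toupper(toy$PROPDMGEXP)])
scale[is.na(scale)] <- 1

# Scaled damage (hypothetical column name), then the same aggregation as above
toy$PROPDMG_USD <- toy$PROPDMG * scale
damage_by_type <- aggregate(PROPDMG_USD ~ EVTYPE, toy, FUN = sum)
print(damage_by_type)
```

With scaling applied, the ranking of event types by damage may differ from the ranking of the raw PROPDMG sums.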

Results

library(ggplot2)
summary(dataAgg)
##     EVTYPE            FATALITIES        PROPDMG       
##  Length:868         Min.   :   0.0   Min.   :      0  
##  Class :character   1st Qu.:   0.0   1st Qu.:      0  
##  Mode  :character   Median :   0.0   Median :      0  
##                     Mean   :  12.8   Mean   :  10669  
##                     3rd Qu.:   0.0   3rd Qu.:     50  
##                     Max.   :1903.0   Max.   :1588733
# Top 10 types for fatalities
top10Fat <- head(dataAgg[order(dataAgg$FATALITIES, decreasing = TRUE), ], 10)
top10Fat
##             EVTYPE FATALITIES   PROPDMG
## 115 excessive heat       1903    1460.0
## 736        tornado       1752 1588733.0
## 137    flash flood        978 1420124.6
## 239           heat        937     298.5
## 408      lightning        816  603351.8
## 152          flood        470  899938.5
## 513    rip current        368       1.0
## 757      tstm wind        327 1335995.6
## 311      high wind        248  324731.6
## 19       avalanche        224    1623.9
# plot top 10 fatalities
ggplot(data = top10Fat, aes(EVTYPE, FATALITIES)) + geom_bar(stat = "identity") + 
    labs(x = "Weather") + theme(axis.text.x = element_text(angle = -90))

Figure: bar chart of fatalities for the top 10 event types.

# Top 10 types for property damage
top10Dmg <- head(dataAgg[order(dataAgg$PROPDMG, decreasing = TRUE), ], 10)
top10Dmg
##                 EVTYPE FATALITIES PROPDMG
## 736            tornado       1752 1588733
## 137        flash flood        978 1420125
## 757          tstm wind        327 1335996
## 152              flood        470  899938
## 668  thunderstorm wind        133  876844
## 208               hail         15  688693
## 408          lightning        816  603352
## 693 thunderstorm winds         64  446318
## 311          high wind        248  324732
## 860       winter storm        206  132721
# plot top 10 damage
ggplot(data = top10Dmg, aes(EVTYPE, PROPDMG)) + geom_bar(stat = "identity") + 
    labs(x = "Weather", y = "Damage") + theme(axis.text.x = element_text(angle = -90))

Figure: bar chart of property damage for the top 10 event types.

# combine the top 10 of both categories and drop duplicate rows
top10 <- rbind(top10Dmg, top10Fat)
top10 <- unique(top10)
ggplot(data = top10, aes(FATALITIES, PROPDMG)) + geom_point(aes(color = EVTYPE)) + 
    labs(x = "Fatalities", y = "Damage")

Figure: scatter plot of property damage against fatalities, colored by event type.

The biggest threat to the population comes from excessive heat (1903 fatalities), whereas tornadoes cause the most economic damage (a PROPDMG total of 1588733).