Max Welling

July 7, 2017

Synopsis

In this report I describe which types of weather are most harmful to people and to the economy. It is based on data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (47 MB). This database contains information on storms as well as other types of weather.
Further documentation on this dataset can be found here:
1. National Weather Service Storm Data Documentation
2. National Climatic Data Center Storm Events FAQ
3. NF429405 - National Weather Service - NOAA

Load (and install if necessary) required libraries

Because we have to handle a compressed bz2 file, the package R.utils is loaded, and installed first if it is not present.
The same goes for the package plyr, which is used for sorting our data.

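if (!requireNamespace("R.utils", quietly = TRUE)) install.packages("R.utils")
if (!requireNamespace("plyr", quietly = TRUE)) install.packages("plyr")
suppressWarnings(suppressMessages(library("R.utils")))
suppressWarnings(suppressMessages(library("plyr")))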

Get and load the data

If neither StormData.csv nor StormData.csv.bz2 is present in the working directory, the compressed file is downloaded from the location mentioned above and decompressed to StormData.csv.
If StormData.csv is not present in the working directory but the compressed file is, the latter is decompressed.
Finally, StormData.csv is read into the data frame stormdata.

if (!file.exists("StormData.csv")) {
    if (!file.exists("StormData.csv.bz2")) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      destfile="StormData.csv.bz2")
    }
    bunzip2(filename = "StormData.csv.bz2", dest = "StormData.csv", ext = "bz2")
}
if (!exists("stormdata")) stormdata <- read.csv("StormData.csv")
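
As an aside, the explicit decompression step is not strictly required: read.csv() opens a file path through file(), which detects bzip2 compression when reading. The sketch below demonstrates this on a small made-up data frame written to a temporary file; the names toy, path and roundtrip are illustrative and not part of the original analysis.

```r
# Toy data standing in for the storm database (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 1))

# Write the toy data to a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() receives the compressed path directly; no bunzip2() call is needed
roundtrip <- read.csv(path, stringsAsFactors = FALSE)
print(roundtrip)
```

Reading the compressed file directly saves disk space, at the cost of decompressing it again on every read.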

As I found little explanation of the meaning of the attributes in the documentation, the FAQ, or the website, I had to guess which attributes to use. As this is only an exercise to show how to conduct reproducible research, I think this approach is acceptable. In real research it certainly would not be.

The following attributes are of interest for this exercise:
1. EVTYPE: the type of storm
2. FATALITIES: the number of fatalities
3. INJURIES: the number of injuries
4. PROPDMG: the amount of damage in US$ to properties
5. PROPDMGEXP: the factor with which to multiply PROPDMG
6. CROPDMG: the amount of damage in US$ to agricultural crops (in thousands?)
7. CROPDMGEXP: the factor with which to multiply CROPDMG

Data processing

EVTYPE has a total of 985 levels, with a lot of duplicates. We convert the values to lowercase and remove punctuation.

stormdata$EVTYPE <- tolower(as.character(stormdata$EVTYPE))
stormdata$EVTYPE <- gsub("[[:punct:]]","",stormdata$EVTYPE)
stormdata$EVTYPE <- as.factor(stormdata$EVTYPE)

Now 874 levels of EVTYPE are left.
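
To illustrate why this cleanup reduces the number of levels only modestly, here is a minimal sketch on a handful of made-up event labels (the vector raw is hypothetical and not taken from the dataset). It applies the same two steps, tolower() and gsub() with the [[:punct:]] class.

```r
# Hypothetical raw labels that differ in case, punctuation, and wording
raw <- c("TSTM WIND", "Tstm Wind", "THUNDERSTORM WINDS.", "Thunderstorm Winds", "HAIL")

# Same cleanup as in the report: lowercase, then strip punctuation
clean <- gsub("[[:punct:]]", "", tolower(raw))

n.raw <- length(unique(raw))     # distinct labels before cleanup
n.clean <- length(unique(clean)) # distinct labels after cleanup
print(unique(clean))
```

Variants that differ only in case or punctuation collapse into one label, but different spellings of the same phenomenon (here "tstm wind" and "thunderstorm winds") stay separate, which is consistent with many levels remaining after this step.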

We are interested in the types of weather that are most harmful to people across the US. So we will aggregate (sum) the FATALITIES and INJURIES over the weather type EVTYPE.

For economic damage we follow the same approach as for people, but it needs some extra preprocessing. Damage amounts are stored in PROPDMG and CROPDMG, and the magnitudes of the amounts in PROPDMGEXP and CROPDMGEXP respectively. These factors hold some values that are not straightforward to interpret, such as ? and +. To calculate the damage I’ve chosen the following strategy:

PROPDMGEXP / CROPDMGEXP    Action
numeric                    multiply with PROPDMGEXP/CROPDMGEXP
h/H                        multiply with 100
k/K                        multiply with 1,000
m/M                        multiply with 1,000,000
b/B                        multiply with 1,000,000,000
other values               discard sample

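
The strategy in the table can be sketched as a small lookup function. This is only an illustration under stated assumptions: exp_multiplier is a hypothetical helper that does not appear in the original analysis, numeric codes are taken literally as the multiplication factor (as the table says), and the sample values are made up.

```r
# Hypothetical helper: translate exponent codes into multipliers per the table
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- rep(NA_real_, length(code))        # NA marks samples to discard
  digits <- grepl("^[0-9]+$", code)
  mult[digits] <- as.numeric(code[digits])   # numeric code: used as the factor itself
  letter.map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  known <- code %in% names(letter.map)
  mult[known] <- unname(letter.map[code[known]])
  mult
}

# Made-up amounts and codes, one per row of the table
dmg  <- c(25, 2.5, 10, 5, 3)
code <- c("K", "m", "B", "?", "2")
damage <- dmg * exp_multiplier(code)
damage[is.na(damage)] <- 0                   # discarded samples contribute nothing
print(damage)
```

A single lookup keeps the property and crop calculations identical, so a fix to the interpretation of a code only has to be made once.
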
#aggregate (sum) fatalities by type
fatals.per.type <- aggregate(stormdata$FATALITIES, by=list(stormdata$EVTYPE),sum)
names(fatals.per.type) <- c("weather","fatalities")
#sort by fatalities, descending, so the most harmful types come first
fatals.per.type <- arrange(fatals.per.type,desc(fatalities))

#aggregate (sum) injuries by type
injuries.per.type <- aggregate(stormdata$INJURIES, by=list(stormdata$EVTYPE),sum)
names(injuries.per.type) <- c("weather","injuries")
#sort by injuries, descending, so the most harmful types come first
injuries.per.type <- arrange(injuries.per.type,desc(injuries))

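# Process PROPDMGEXP where it is numeric or in "H","K", "M" and "B" (upper or lower case)
stormdata$PROPDAMAGE <- NA_real_  # NA marks samples that are discarded further down
# is.numeric() is always FALSE for character data, so detect digit-only codes with a regex
prop.exp <- as.character(stormdata$PROPDMGEXP)
prop.num <- grepl("^[0-9]+$", prop.exp)
stormdata$PROPDAMAGE[prop.num] <- stormdata$PROPDMG[prop.num] * as.numeric(prop.exp[prop.num])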
stormdata$PROPDAMAGE[stormdata$PROPDMGEXP %in% c("h","H")] <- 
  stormdata$PROPDMG[stormdata$PROPDMGEXP %in% c("h","H")] * 1e2
stormdata$PROPDAMAGE[stormdata$PROPDMGEXP %in% c("k","K")] <- 
  stormdata$PROPDMG[stormdata$PROPDMGEXP %in% c("k","K")] * 1e3
stormdata$PROPDAMAGE[stormdata$PROPDMGEXP %in% c("m","M")] <- 
  stormdata$PROPDMG[stormdata$PROPDMGEXP %in% c("m","M")] * 1e6
stormdata$PROPDAMAGE[stormdata$PROPDMGEXP %in% c("b","B")] <- 
  stormdata$PROPDMG[stormdata$PROPDMGEXP %in% c("b","B")] * 1e9
stormdata$PROPDAMAGE[is.na(stormdata$PROPDAMAGE)] <- 0
#aggregate (sum) PROPdmg by type
prop.per.type <- aggregate(stormdata$PROPDAMAGE, by=list(stormdata$EVTYPE),sum)
names(prop.per.type) <- c("weather","damage")
prop.per.type <- arrange(prop.per.type,desc(damage))

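# Process CROPDMGEXP where it is numeric or in "H","K", "M" and "B" (upper or lower case)
stormdata$CROPDAMAGE <- NA_real_  # NA marks samples that are discarded further down
# is.numeric() is always FALSE for character data, so detect digit-only codes with a regex
crop.exp <- as.character(stormdata$CROPDMGEXP)
crop.num <- grepl("^[0-9]+$", crop.exp)
stormdata$CROPDAMAGE[crop.num] <- stormdata$CROPDMG[crop.num] * as.numeric(crop.exp[crop.num])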
stormdata$CROPDAMAGE[stormdata$CROPDMGEXP %in% c("h","H")] <- 
  stormdata$CROPDMG[stormdata$CROPDMGEXP %in% c("h","H")] * 1e2
stormdata$CROPDAMAGE[stormdata$CROPDMGEXP %in% c("k","K")] <- 
  stormdata$CROPDMG[stormdata$CROPDMGEXP %in% c("k","K")] * 1e3
stormdata$CROPDAMAGE[stormdata$CROPDMGEXP %in% c("m","M")] <- 
  stormdata$CROPDMG[stormdata$CROPDMGEXP %in% c("m","M")] * 1e6
stormdata$CROPDAMAGE[stormdata$CROPDMGEXP %in% c("b","B")] <- 
  stormdata$CROPDMG[stormdata$CROPDMGEXP %in% c("b","B")] * 1e9
stormdata$CROPDAMAGE[is.na(stormdata$CROPDAMAGE)] <- 0
#aggregate (sum) cropdmg by type
crop.per.type <- aggregate(stormdata$CROPDAMAGE, by=list(stormdata$EVTYPE),sum)
names(crop.per.type) <- c("weather","damage")
crop.per.type <- arrange(crop.per.type,desc(damage))

Results

Which of the events are most harmful with respect to population health?

Because there are almost nine hundred weather types (EVTYPE), we show barplots of the 10 most harmful weather types for people.

par(mfrow=c(1,2),mar=c(8,4,2,2))
with(head(fatals.per.type,10), barplot(fatalities, names.arg = head(fatals.per.type$weather,10), las=2, main="Total fatalities"))
with(head(injuries.per.type,10), barplot(injuries, names.arg = head(injuries.per.type$weather,10), las=2, main="Total injuries"))

Conclusion on harmfulness to people

From the barplots above we conclude that the tornado is by far the most harmful type of weather to people. What is striking is that the difference between the first and second weather type is about a factor of 3 for fatalities and more than a factor of 10 for injuries.

Which of the events have the greatest economic consequences?

Again, because there are almost nine hundred weather types (EVTYPE), we show barplots of the 10 weather types that cause the most economic damage.

par(mfrow=c(1,2),mar=c(8,4,2,2))
with(head(prop.per.type,10), barplot(damage/1e9, names.arg = head(prop.per.type$weather,10), las=2, main="Total property damage", ylab= "$ Billion"))
with(head(crop.per.type,10), barplot(damage/1e9, names.arg = head(crop.per.type$weather,10), las=2, main="Total crop damage", ylab= "$ Billion"))

Conclusion on harmfulness to the economy

From the barplots it is clear that the damage to property is about 10 times higher than the damage to crops. Flood is most harmful to property, whereas drought is most harmful to crops. Flood is nevertheless the second-largest cause of crop damage.
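
To rank weather types by overall economic damage, the property and crop totals can be added per type. The sketch below shows one way to do this with merge(); the data frames prop.toy and crop.toy and all their numbers are made up for illustration and are not results from the storm database.

```r
# Made-up totals purely for illustration; NOT results from the storm database
prop.toy <- data.frame(weather = c("flood", "tornado", "hail"), damage = c(100, 50, 10))
crop.toy <- data.frame(weather = c("drought", "flood", "hail"), damage = c(12, 5, 3))

# Full outer join, so types that appear in only one table are kept
total <- merge(prop.toy, crop.toy, by = "weather", all = TRUE,
               suffixes = c(".prop", ".crop"))

# A type missing from one table caused no recorded damage of that kind
total$damage.prop[is.na(total$damage.prop)] <- 0
total$damage.crop[is.na(total$damage.crop)] <- 0

# Combined damage, sorted descending
total$damage.total <- total$damage.prop + total$damage.crop
total <- total[order(-total$damage.total), ]
print(total)
```

In the report itself the same steps would apply to prop.per.type and crop.per.type, which share the column names weather and damage.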

General conclusion

Tornadoes are clearly the most dangerous type of weather for people. Most damage is caused by flood, although drought is the main cause of crop damage.