Severe climate events can have a great impact on society. The costs can be severe both in terms of human lives and economically, and the effects can be significant for communities and municipalities. Analysing the impact of such tragedies can help in preparing for them and in prioritising the areas and investments that mitigate their consequences.
This report presents a brief analysis of the impact of events recorded by the US National Oceanic and Atmospheric Administration (NOAA) between 1950 and November 2011, and provides figures addressing two major questions.
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The dataset for this report is the Storm Database from the US National Oceanic and Atmospheric Administration (NOAA), which includes characteristics of major storms and weather events in the United States.
The period covered in the database goes from 1950 to November 2011, and it has attributes about the events, such as when and where they occurred, as well as estimates of any fatalities, injuries, and property damage.
The dataset has fewer events during the first years and becomes richer for more recent periods.
The dataset is available at the course’s web site as a compressed bz2 file. Link here: Storm Dataset (47Mb)
options(scipen=15, digits=2)
my.colours <- c("#4D4D4D", "#5DA5DA")
SOURCE.URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
SOURCE.LOCAL <- "StormData.csv.bz2"
if (!file.exists(SOURCE.LOCAL)) { # if the file does not exist locally ...
download.file(url = SOURCE.URL,
destfile = SOURCE.LOCAL,
method = "curl",
quiet = TRUE) # download it
}
data <- read.csv(bzfile(SOURCE.LOCAL),
stringsAsFactors = FALSE)
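As a quick check of the earlier note that coverage is sparser in the first years, the number of recorded events per year can be tabulated. This is only a sketch and assumes the BGN_DATE column holds the event start date in a month/day/year format; it is not used in the rest of the analysis.
# Sketch only: events recorded per year (assumes BGN_DATE looks like "4/18/1950 0:00:00")
years <- format(as.Date(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y")
table(years)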
It is possible to reduce the original dataset to the minimum necessary to address the specific questions posed.
The key variables are the types of events, the geographical scope, population health and economic consequences.
Types of events: The variable EVTYPE contains the data for the type of events;
Geographical scope: The whole United States, so there is no need for more granular locations such as states/cities or longitude/latitude coordinates;
Population health: The variables FATALITIES and INJURIES contain the relevant data;
Economic consequences: The variables PROPDMG (with PROPDMGEXP) and CROPDMG (with CROPDMGEXP) hold the data related to Economic consequences.
More details are available on the document National Weather Service Storm Data Documentation, available at the course’s web site.
Only the variables mentioned above are necessary, so all the others can be discarded.
library(dplyr)
reduce.data <- select(data, EVTYPE,
FATALITIES, INJURIES,
PROPDMG, PROPDMGEXP,
CROPDMG, CROPDMGEXP)
Events that have no impact on population health or economic consequences can be discarded, so only the records where any of FATALITIES, INJURIES, PROPDMG or CROPDMG is non-zero are kept.
Also, events that cannot be identified or classified cannot contribute to the analysis, so they must be removed as well.
# Keep only records with some impact
reduce.data <- filter(reduce.data, FATALITIES > 0 |
INJURIES > 0 |
PROPDMG > 0 |
CROPDMG > 0)
# Remove events with no details
reduce.data <- filter(reduce.data, EVTYPE != "?")
As per item 2.1.1-Storm Data Event Table from the National Weather Service Storm Data Documentation, the Types of Events can be:
EVENTS <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood",
"Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke",
"Drought", "Dust Devil", "Dust Storm", "Excessive Heat",
"Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze",
"Funnel Cloud", "Freezing Fog", "Hail", "Heat",
"Heavy Rain", "Heavy Snow", "High Surf", "High Wind",
"Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood",
"Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind",
"Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet",
"Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado",
"Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash",
"Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
# create an extra variable for cleared events (keep the original), and a flag for when it is cleared
reduce.data <- mutate(reduce.data, EVNEW = tolower(EVTYPE),
EVFIX = tolower(EVTYPE) %in% tolower(EVENTS))
# reorder the columns
reduce.data <- select(reduce.data, EVTYPE, EVNEW,
FATALITIES, INJURIES,
PROPDMG, PROPDMGEXP,
CROPDMG, CROPDMGEXP,
EVFIX)
mc <- sum(reduce.data$EVFIX)
Initially there are 172,872 events that match the classification, out of a total of 254,632 observations. There is a total of 446 distinct classifications, including the 48 from NOAA's classification table.
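The figures above can be obtained along these lines (the exact numbers depend on the dataset version):
# Sketch: summary figures quoted above
mc                                  # events matching the official classification
nrow(reduce.data)                   # total observations kept
length(unique(reduce.data$EVNEW))   # distinct classifications found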
# The function clear receives a copy of the data (data), a regular expression (find),
# and a string to replace it with (replace).
# It also receives a flag to mark the record as cleared or not. An EVNEW is cleared (EVFIX=TRUE)
# when it matches one of the classifications AND no longer needs to be dealt with again.
clear <- function (data, find, replace, clear) {
# creates a logical vector for where the 'find' was 'found'
found <- grepl(find, data$EVNEW)
# restrict the found to the ones that are not flagged as cleared
valid <- found & !data$EVFIX
# if there are any valid records to be changed
if (any(valid)) {
# make the actual changes
data$EVNEW[valid] <- gsub(find, replace, data$EVNEW[valid])
# and flag if this change was final
data$EVFIX[valid] <- clear
}
return(data)
}
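To illustrate how clear behaves, here is a hypothetical toy example (the small data frame below is not part of the real dataset):
# Hypothetical illustration of clear(): row 1 is rewritten and flagged as
# cleared; row 2 is skipped because it is already flagged (EVFIX = TRUE)
toy <- data.frame(EVNEW = c("tstm wind", "tornado"),
                  EVFIX = c(FALSE, TRUE),
                  stringsAsFactors = FALSE)
clear(toy, "tstm", "thunderstorm", TRUE)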
# Simple Capitalisation function
simpleCap <- function(str) {
# split the string in a vector of its individual characters
s <- strsplit(str, "")[[1]]
# find the location of the first character after the separators space, "-", "(" and "/";
# and the first position
up <- c(1, grep("[-(/ ]", s) + 1)
# convert those characters to uppercase
s[up] <- toupper(s[up])
# return the characters back as a string
return(paste(s, sep="", collapse=""))
}
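For example, simpleCap capitalises the first character and any character following a space, "-", "(" or "/":
simpleCap("extreme cold/wind chill")   # "Extreme Cold/Wind Chill"
simpleCap("hurricane (typhoon)")       # "Hurricane (Typhoon)"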
For cleaning the data, the chosen approach tries to identify elements of the event type description and deal with them, instead of cataloguing the data problems beforehand. This allows the report to be run periodically with new data, without having to change the R code. This is a design choice and not a judgement against any other approach.
The actual cleaning of the variable EVNEW starts by removing punctuation and references to qualification and quantification. Next, double spaces are removed. Then some expressions that are equivalent to the official classifications are translated, followed by the translation of some terms that can be confusing.
# Remove punctuation
# Remove qualification, quantification
finds <- c("non-",
"-(.)*$",
"[\\\\/\\(\\),&0-9](.)*$",
" and (.)*$",
" from (.)*$",
" on (.)*$",
"cold air",
"dry",
"excessive",
"late season",
"record",
"severe",
"small",
"snowmelt")
for (f in 1:length(finds)) {
find <- finds[f]
reduce.data <- clear(reduce.data, find, "", FALSE)
}
# Remove double spaces
reduce.data <- clear(reduce.data, "[[:space:]]{2,}", " ", FALSE)
# Remove heading and trailing spaces
reduce.data <- clear(reduce.data, "^[[:space:]]+|[[:space:]]+$", "", FALSE)
# Remove hard cases (an expression can be replaced by the final category)
finds.l <- list(
c("thu(.)*", "tun(.)*", "lightning thunderstorm winds", "storm force winds", "turbulence"),
c("avalance"),
c("(.)*erosion"),
c("cool", "low temperature"),
c("extreme heat"),
c("extreme cold"),
c("dam break", "rapidly rising(.)*", "river(.)*"),
c("ice jam(.)*", "urban"),
c("glaze(.)*", "light freezing rain"),
c("(.)*warm(.)*"),
c("gustnado", "lands(.)*", "mud(.)*", "rainfall", "wetness"),
c("rogue wave"),
c("ligntning"),
c("rough seas"),
c("storm surge"),
c("downburst", "(.)*microburst(.)*", "mirco(.)*", "torndao"),
c("black ice", "mixed precip(.)*", "winter(.)*", "wintry mix"))
replaces.l <- c("thunderstorm wind",
"avalanche",
"coastal flood",
"cold/wind chill",
"excessive heat",
"extreme cold/wind chill",
"flash flood",
"flood",
"frost/freeze",
"heat",
"heavy rain",
"high surf",
"lightning",
"marine strong wind",
"storm surge/tide",
"tornado",
"winter weather")
for (r in 1:length(replaces.l)) {
finds <- finds.l[[r]]
replace <- replaces.l[r]
for (f in 1:length(finds)) {
find <- paste("^", finds[f], "$", sep="")
reduce.data <- clear(reduce.data, find, replace, TRUE)
}
}
# Remove soft cases (an expression is replaced, but the result is NOT yet a final category)
finds <- c("heavy lake",
"heat wave drought",
"icy",
"lake flood",
"tstm")
replaces <- c("lake",
"drought",
"ice",
"flood",
"thunderstorm wind")
for (f in 1:length(finds)) {
find <- paste("^(.)*(", finds[f], ")(.)*$", sep="")
replace <- replaces[f]
reduce.data <- clear(reduce.data, find, replace, FALSE)
}
Continuing with the cleaning process, only the first two words of each description are kept, to avoid double interpretation. The first word is considered to be the most relevant one, and any subsequent word just a more detailed explanation. Following that, there is a new attempt to match the classification, followed by a word-by-word matching exercise on the remaining records.
# Reduce event to two words to avoid multiple classification
categories <- sort(unique(reduce.data$EVNEW))
in.list <- categories %in% tolower(EVENTS)
for (c in categories[!in.list]) {
words <- unlist(strsplit(c, " "))
if (length(words) > 2) {
find <- paste("^(", c, ")$", sep="")
replace <- paste(words[1], words[2])
reduce.data <- clear(reduce.data, find, replace, FALSE)
}
}
# Check by the events' list
for (i in tolower(EVENTS)) {
find <- paste("^(.)*(", i, ")(.)*$", sep="")
replace <- i
reduce.data <- clear(reduce.data, find, replace, TRUE)
}
# Try to guess the remaining events
categories <- sort(unique(reduce.data$EVNEW))
in.list <- categories %in% tolower(EVENTS)
for (c in categories[!in.list]) {
words <- strsplit(c, " ")[[1]]
for (w in words) {
loc <- grep(w, tolower(EVENTS))
if (length(loc) > 0) {
find <- paste("^(.)*(", w, ")(.)*$", sep="")
replace <- tolower(EVENTS[loc[1]])
reduce.data <- clear(reduce.data, find, replace, TRUE)
break
}
}
}
At the very end, any event that was not matched is classified as “other”. The final list is then capitalised and converted to a factor.
# Convert all the remaining to a category "other"
reduce.data <- clear(reduce.data, "(.)*", "other", TRUE)
# Capitalise the events
reduce.data$EVNEW <- sapply(reduce.data$EVNEW, simpleCap)
# Convert to factor
reduce.data$EVNEW <- as.factor(reduce.data$EVNEW)
# Remove temporary columns
reduce.data <- select(reduce.data, EVTYPE, EVNEW,
FATALITIES, INJURIES,
PROPDMG, PROPDMGEXP,
CROPDMG, CROPDMGEXP)
em <- nrow(reduce.data) - sum(reduce.data$EVNEW=="Other")
c <- (nrow(reduce.data) - sum(reduce.data$EVNEW=="Other")) / nrow(reduce.data) * 100
After all the transformations on the EVNEW variable there are 254,586 events that match the original classification and 46 events categorised as “Other”. This covers 99.98% of the records. There are 48 distinct classifications out of the original 48, plus one category labeled “Other”.
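These coverage figures can be checked with a short sketch:
# Sketch: coverage figures quoted above
em                                   # events matched to an official category
sum(reduce.data$EVNEW == "Other")    # events that fell back to "Other"
length(levels(reduce.data$EVNEW))    # distinct final categories, including "Other"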
The FATALITIES and INJURIES variables are quite simple and direct. The only necessary check is for their availability.
# check for NAs
f.na <- sum(is.na(reduce.data$FATALITIES))
i.na <- sum(is.na(reduce.data$INJURIES))
Out of 254,632 records there are 0 missing figures for Fatalities and 0 missing figures for Injuries.
The only action that can be called cleaning is converting the values to integers.
reduce.data$FATALITIES <- as.integer(reduce.data$FATALITIES)
reduce.data$INJURIES <- as.integer(reduce.data$INJURIES)
There are two variables representing Property Damage: a value (PROPDMG) and an exponent factor (PROPDMGEXP). An initial assessment is to check for data availability and the domain of the exponents.
# check for NAs
p.na <- sum(is.na(reduce.data$PROPDMG))
# check for Exponents
table(reduce.data$PROPDMGEXP)
##
## - + 0 2 3 4 5 6 7
## 11585 1 5 210 1 1 4 18 3 3
## B h H K m M
## 40 1 6 231427 7 11320
There are 0 missing records out of 254,632.
The exponents of the property damage figures are quite dispersed and contain some missing or inconsistent values that must be cleaned.
To correct the exponents, the following rules were adopted (a short worked example follows the list):
missing values are assumed to mean no exponent, so the actual damage number is taken as is;
values of “+” and “-” are considered typos and, in any case, do not change the magnitude of the actual damage number;
numeric values represent the actual exponent, as in \(10^n, n = 0, 2, \dots, 7\);
character values represent multipliers: “B” for Billions, “H” for Hundreds, “K” for Thousands and “M” for Millions.
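For example, under these rules a record with PROPDMG = 25 and PROPDMGEXP = “K” represents \(25 \times 10^3\) = $25,000, while PROPDMG = 2.5 with PROPDMGEXP = “B” represents $2.5 billion. A minimal illustration (the lookup vector below is only for this example; the actual conversion is done in the chunk that follows):
# Illustration only: decode a couple of exponent codes by hand
exp.lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
25  * exp.lookup["K"]   # 25,000
2.5 * exp.lookup["B"]   # 2,500,000,000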
# create a temporary variable for the new exponent
reduce.data <- mutate(reduce.data, EXPCALC = 0)
# convert the exponents to upper case
reduce.data$PROPDMGEXP <- toupper(reduce.data$PROPDMGEXP)
# lists all the cases and provides a matching list to replace them
finds <- c("^$", "\\-", "\\+", "0", "2", "3", "4", "5", "6", "7", "B", "H", "K", "M")
replaces <- c(1e0, 1e0, 1e0, 1e0, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e9, 1e2, 1e3, 1e6)
# for each case
for (i in 1:length(finds)) {
find <- finds[i]
replace <- replaces[i]
# identify the records for a particular case
valid <- grep(find, reduce.data$PROPDMGEXP)
# assign the proper exponent
reduce.data$EXPCALC[valid] <- gsub(find, replace, reduce.data$PROPDMGEXP[valid])
}
# force the exponent to be a number, as gsub produces a character
reduce.data$EXPCALC <- as.numeric(reduce.data$EXPCALC)
# creates a new variable for the actual damage figure
reduce.data <- mutate(reduce.data, PROPDMGVAL = PROPDMG * EXPCALC)
# keep and reorder the variables
reduce.data <- select(reduce.data, EVTYPE, EVNEW,
FATALITIES, INJURIES,
PROPDMG, PROPDMGEXP, PROPDMGVAL,
CROPDMG, CROPDMGEXP)
There are two variables representing Crops Damage: a value (CROPDMG) and an exponent factor (CROPDMGEXP). An initial assessment is to check for data availability and the domain of the exponents.
# check for NAs
c.na <- sum(is.na(reduce.data$CROPDMG))
# check for Exponents
table(reduce.data$CROPDMGEXP)
##
## ? 0 B k K m M
## 152663 6 17 7 21 99932 1 1985
There are 0 missing records out of 254,632.
The exponents of the crops damage figures are quite dispersed and contain some missing or inconsistent values that must be cleaned.
To correct the exponents, the following rules were adopted:
missing and incorrect values, such as “?”, are assumed to mean no exponent, so the actual damage number is taken as is;
numeric values represent the actual exponent; as the only one present is “0”, it is changed to \(10^0\);
character values represent multipliers: “B” for Billions, “K” for Thousands and “M” for Millions.
# create a temporary variable for the new exponent
reduce.data <- mutate(reduce.data, EXPCALC=0)
# convert the exponents to upper case
reduce.data$CROPDMGEXP <- toupper(reduce.data$CROPDMGEXP)
# lists all the cases and provides a matching list to replace them
finds <- c("^$", "\\?", "0", "B", "K", "M")
replaces <- c(1e0, 1e0, 1e0, 1e9, 1e3, 1e6)
# for each case
for (i in 1:length(finds)) {
find <- finds[i]
replace <- replaces[i]
# identify the records for a particular case
valid <- grep(find, reduce.data$CROPDMGEXP)
# assign the proper exponent
reduce.data$EXPCALC[valid] <- gsub(find, replace, reduce.data$CROPDMGEXP[valid])
}
# force the exponent to be a number, as gsub produces a character
reduce.data$EXPCALC <- as.numeric(reduce.data$EXPCALC)
# creates a new variable for the actual damage figure
reduce.data <- mutate(reduce.data, CROPDMGVAL = CROPDMG * EXPCALC)
# keep and reorder the variables
reduce.data <- select(reduce.data, EVTYPE, EVNEW,
FATALITIES, INJURIES,
PROPDMG, PROPDMGEXP, PROPDMGVAL,
CROPDMG, CROPDMGEXP, CROPDMGVAL)
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Compiling the data and checking the table below:
f <- count(reduce.data, EVNEW, wt = FATALITIES)
f <- mutate(f, Victims = "Fatalities")
i <- count(reduce.data, EVNEW, wt = INJURIES)
i <- mutate(i, Victims = "Injuries")
t <- data.frame(E = f$EVNEW, F = f$n, I = i$n, T = f$n + i$n)
t <- filter(t, T > 0)
t <- arrange(t, desc(T))
t$F <- prettyNum(t$F, big.mark = ",")
t$I <- prettyNum(t$I, big.mark = ",")
t$T <- prettyNum(t$T, big.mark = ",")
library(knitr)
kable(t, row.names = NA,
col.names = c("Event Type", "Fatalities", "Injuries", "Total"),
align = c("l", "r", "r", "r"),
caption = "Victims by Type of Events in the USA\nbetween 1950 and Nov/2011")
Event Type | Fatalities | Injuries | Total |
---|---|---|---|
Tornado | 5,661 | 91,394 | 97,055 |
Thunderstorm Wind | 721 | 9,518 | 10,239 |
Excessive Heat | 1,999 | 6,680 | 8,679 |
Flood | 525 | 6,886 | 7,411 |
Lightning | 817 | 5,232 | 6,049 |
Heat | 1,156 | 2,548 | 3,704 |
Flash Flood | 1,023 | 1,788 | 2,811 |
Ice Storm | 101 | 2,146 | 2,247 |
High Wind | 293 | 1,471 | 1,764 |
Wildfire | 90 | 1,608 | 1,698 |
Winter Storm | 206 | 1,321 | 1,527 |
Hurricane (Typhoon) | 135 | 1,333 | 1,468 |
Hail | 15 | 1,371 | 1,386 |
Heavy Snow | 146 | 1,156 | 1,302 |
Dense Fog | 81 | 1,077 | 1,158 |
Rip Current | 577 | 529 | 1,106 |
Blizzard | 101 | 805 | 906 |
Winter Weather | 76 | 697 | 773 |
Extreme Cold/Wind Chill | 304 | 260 | 564 |
Heavy Rain | 152 | 337 | 489 |
Dust Storm | 22 | 440 | 462 |
Tropical Storm | 66 | 383 | 449 |
High Surf | 171 | 257 | 428 |
Strong Wind | 111 | 301 | 412 |
Avalanche | 225 | 170 | 395 |
Cold/Wind Chill | 200 | 166 | 366 |
Frost/Freeze | 9 | 234 | 243 |
Tsunami | 33 | 129 | 162 |
Waterspout | 6 | 71 | 77 |
Storm Surge/Tide | 24 | 43 | 67 |
Freezing Fog | 11 | 38 | 49 |
Marine Strong Wind | 22 | 27 | 49 |
Dust Devil | 2 | 43 | 45 |
Marine Thunderstorm Wind | 10 | 26 | 36 |
Other | 27 | 4 | 31 |
Drought | 6 | 19 | 25 |
Coastal Flood | 10 | 9 | 19 |
Marine Hail | 8 | 7 | 15 |
Funnel Cloud | 0 | 3 | 3 |
Marine High Wind | 1 | 1 | 2 |
Sleet | 2 | 0 | 2 |
From the table, the most harmful type of event with respect to population health is Tornado, with 5,661 fatalities and 91,394 injuries (a total of 97,055 victims).
Below is a graphical representation:
library(ggplot2)
harm <- union(f, i)
harm <- filter(harm, n > 0)
g <- ggplot(harm)
g <- g + geom_bar(aes(x = reorder(EVNEW, n), y = n, fill = Victims),
position = "dodge", stat = "identity")
g <- g + scale_fill_manual(values = my.colours)
g <- g + scale_y_sqrt()
g <- g + coord_flip()
g <- g + theme(axis.text.x = element_text(hjust = 1, vjust = 0.5))
g <- g + labs(title = "Victims by Type of Events in the USA\nbetween 1950 and Nov/2011",
x = "Type of Event",
y = "Count\n(scaled with sqrt)")
print(g)
Across the United States, which types of events have the greatest economic consequences?
Compiling the data and checking the table below:
m <- 1e6 # Millions
p <- count(reduce.data, EVNEW, wt = PROPDMGVAL)
p <- mutate(p, Damage = "Property")
c <- count(reduce.data, EVNEW, wt = CROPDMGVAL)
c <- mutate(c, Damage = "Crops")
t <- data.frame(E = p$EVNEW, P = p$n / m, C = c$n / m, T = (p$n + c$n) / m)
t <- filter(t, T > 0)
t <- arrange(t, desc(T))
t$P <- formatC(t$P, big.mark = ",", format = "f", digits = 2)
t$C <- formatC(t$C, big.mark = ",", format = "f", digits = 2)
t$T <- formatC(t$T, big.mark = ",", format = "f", digits = 2)
kable(t, row.names = NA,
col.names = c("Event Type", "Property Damage", "Crops Damage", "Total"),
align = c("l", "r", "r", "r"),
caption = "Damage by Type of Events in the USA, in Millions\nbetween 1950 and Nov/2011")
Event Type | Property Damage | Crops Damage | Total |
---|---|---|---|
Flood | 145,156.02 | 5,893.49 | 151,049.51 |
Hurricane (Typhoon) | 85,356.41 | 5,516.12 | 90,872.53 |
Tornado | 58,559.49 | 417.48 | 58,976.97 |
Storm Surge/Tide | 47,964.72 | 0.85 | 47,965.58 |
Flash Flood | 22,652.17 | 6,494.65 | 29,146.82 |
Hail | 15,977.54 | 3,046.89 | 19,024.43 |
Drought | 1,046.31 | 13,972.62 | 15,018.93 |
Thunderstorm Wind | 11,184.86 | 1,271.66 | 12,456.52 |
Ice Storm | 3,966.54 | 5,022.11 | 8,988.65 |
Wildfire | 8,496.63 | 403.28 | 8,899.91 |
Tropical Storm | 7,714.39 | 694.90 | 8,409.29 |
Winter Storm | 6,688.50 | 26.94 | 6,715.44 |
High Wind | 6,003.36 | 686.30 | 6,689.66 |
Heavy Rain | 3,565.53 | 968.52 | 4,534.05 |
Frost/Freeze | 20.96 | 1,997.06 | 2,018.02 |
Extreme Cold/Wind Chill | 77.19 | 1,330.02 | 1,407.21 |
Heavy Snow | 981.46 | 134.68 | 1,116.15 |
Lightning | 935.45 | 12.09 | 947.54 |
Blizzard | 659.81 | 112.06 | 771.87 |
Excessive Heat | 7.87 | 497.40 | 505.27 |
Coastal Flood | 445.18 | 0.06 | 445.24 |
Heat | 12.26 | 407.02 | 419.28 |
Strong Wind | 177.67 | 69.95 | 247.63 |
Cold/Wind Chill | 71.09 | 102.81 | 173.90 |
Tsunami | 144.06 | 0.02 | 144.08 |
Winter Weather | 88.60 | 20.50 | 109.10 |
High Surf | 102.05 | 0.00 | 102.05 |
Waterspout | 60.69 | 0.00 | 60.69 |
Lake-Effect Snow | 40.68 | 0.00 | 40.68 |
Dense Fog | 22.83 | 0.00 | 22.83 |
Freezing Fog | 10.60 | 0.00 | 10.60 |
Astronomical Low Tide | 9.74 | 0.00 | 9.74 |
Dust Storm | 5.60 | 3.60 | 9.20 |
Lakeshore Flood | 7.54 | 0.00 | 7.54 |
Avalanche | 3.72 | 0.00 | 3.72 |
Tropical Depression | 1.74 | 0.00 | 1.74 |
Marine High Wind | 1.30 | 0.00 | 1.30 |
Other | 0.21 | 1.03 | 1.24 |
Seiche | 0.98 | 0.00 | 0.98 |
Dust Devil | 0.74 | 0.00 | 0.74 |
Sleet | 0.50 | 0.00 | 0.50 |
Volcanic Ash | 0.50 | 0.00 | 0.50 |
Marine Thunderstorm Wind | 0.44 | 0.05 | 0.49 |
Marine Strong Wind | 0.42 | 0.00 | 0.42 |
Funnel Cloud | 0.19 | 0.00 | 0.19 |
Rip Current | 0.16 | 0.00 | 0.16 |
Dense Smoke | 0.10 | 0.00 | 0.10 |
Marine Hail | 0.05 | 0.00 | 0.05 |
From the table, the type of event with the greatest economic consequences is Flood, with $145,156.02 million in property damage and $5,893.49 million in crop damage (a total of $151,049.51 million).
Below is a graphical representation:
damage <- union(p, c)
damage <- filter(damage, n > 0)
g <- ggplot(damage)
g <- g + geom_bar(aes(x = reorder(EVNEW, n), y = n / m, fill = Damage),
position = "dodge", stat = "identity")
g <- g + scale_fill_manual(values = my.colours)
g <- g + scale_y_sqrt()
g <- g + coord_flip()
g <- g + theme(axis.text.x = element_text(hjust = 1, vjust = 0.5))
g <- g + labs(title = "Damage by Type of Events in the USA\nbetween 1950 and Nov/2011",
x = "Type of Event",
y = "Cost in Millions\n(scaled with sqrt)")
print(g)
With a total of 97,055 victims (5,661 fatalities and 91,394 injuries), Tornado is the type of event that was the most harmful with respect to population health.
For economic consequences, Flood is the type of event that had the greatest impact, adding up to $151,049.51 million ($145,156.02 million in property damage and $5,893.49 million in crop damage).