Synopsis

This analysis uses the National Oceanic and Atmospheric Administration (NOAA) storm data set to identify the weather events with the most severe effects on human health and property.

First, we cleaned the data set to standardize event type names and discarded records from before 1996, when the current event naming standard was introduced.

We analysed four variables: fatalities, injuries, property damage and crop damage. For each of the 48 official event types, we computed the total and average value of each variable.

Data loading

Creating general settings and loading libraries.

library(knitr)
opts_chunk$set(echo = TRUE,
               message = TRUE,
               cache = TRUE)
library(dplyr)
library(tools)
library(stringdist)
library(VennDiagram)

Downloading data.

URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

name <- "StormData.csv.bz2"

download.file(URL, destfile = name)

To make sure the data has not been changed in any way, let’s use a checksum to verify its integrity. Using the function md5sum() from the tools package, we generated an MD5 hash of the raw file: df4aa61fff89427db6b7f7b1113b5553. Let’s check whether the hashes match.

original.hash <- "df4aa61fff89427db6b7f7b1113b5553"
raw.hash <- md5sum(name)[[1]]

if (original.hash == raw.hash) {
  print("The checksums match. File integrity verified.")
} else {
  print("ATTENTION! Hashes do not match. The file you downloaded may not be identical to the one used when this analysis was originally done. It might have been corrupted or altered in some way.")
}
## [1] "The checksums match. File integrity verified."
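
The check above relies on md5sum() returning the same hash for identical bytes and a different hash after any change. A minimal sketch of that behaviour on throwaway files (the helper name hash.of and the sample contents are illustrative assumptions, not part of the analysis):

```r
library(tools)

# Hypothetical helper: write raw bytes to a temporary file and return its MD5 hash.
hash.of <- function(text) {
  path <- tempfile()
  on.exit(unlink(path))            # remove the temporary file afterwards
  writeBin(charToRaw(text), path)  # raw bytes, so line endings do not vary by platform
  unname(md5sum(path))
}

same.1  <- hash.of("storm data")
same.2  <- hash.of("storm data")
altered <- hash.of("storm data!")

same.1 == same.2   # identical content gives identical hashes
same.1 == altered  # a one-character change gives a different hash
```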

Loading data.

DataStorm <- read.csv(name)

str(DataStorm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Preprocessing data

Date

The NOAA website states that, although data collection began in 1950, only tornadoes were recorded in the first years, and only three other event types were added over the following decades. From 1996 onwards, the database has recorded a set of 48 event types. To make our comparisons more consistent, we’ll work with a subset of the original database comprising the records from 1996 or later.

Storm96 <- DataStorm %>%
  # Parse the date part of BGN_DATE; the trailing time is ignored
  mutate(Date = as.Date(BGN_DATE, format = "%m/%d/%Y")) %>%
  filter(Date >= as.Date("1996-01-01"))

The original data set has 902297 rows, while our subset has 653530 (72.43%).
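
The date filter can be illustrated on a handful of values. This is only a sketch: the sample dates below are made up in the same layout as BGN_DATE, and the comparison uses an explicit Date object.

```r
# Toy dates in the "month/day/year time" layout of BGN_DATE (assumed sample values).
bgn <- c("4/18/1950 0:00:00", "12/31/1995 0:00:00",
         "1/1/1996 0:00:00", "8/29/2005 0:00:00")

# as.Date() parses the part described by the format; the trailing time is ignored.
dates <- as.Date(bgn, format = "%m/%d/%Y")

# Keep records from January 1, 1996 onwards.
keep <- dates >= as.Date("1996-01-01")

# Share of rows kept, computed the same way as the percentage reported above.
kept.share <- round(100 * sum(keep) / length(keep), 2)
kept.share
```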

Event type

Working from the previous subset (all events that happened on or after January 1, 1996), we will standardize the names of the event types.

From 1996 onwards, the database has used an official list of 48 event types. We will store this list as an object in our environment.

list.of.events <- tolower(c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane Typhoon", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather"))

The number of unique event types in the data set does not match the official list. There are 48 events in the official list and 516 different event names in the data set.

Let’s work with the names of event types in our subset to try and match them to the official list. First, we’ll try to standardize the names by:
1. making everything lower case;
2. deleting unwanted spaces;
3. deleting parentheses;
4. changing the common “tstm” abbreviation to “thunderstorm”.

before <- length(unique(Storm96$EVTYPE))

Storm96$EVTYPE <- Storm96$EVTYPE %>%
  as.character %>%
  tolower %>%
  gsub(pattern = "^\\s+|\\s+$", replacement = "") %>%
  gsub(pattern = "\\s+", replacement = " ") %>%
  gsub(pattern = "\\(|\\)", replacement = "") %>%
  gsub(pattern = "tstm", replacement = "thunderstorm")

after <- length(unique(Storm96$EVTYPE))

There were 516 different events before standardization and 421 after, still a lot more than the 48 from the official list.
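
To see what the four steps do to individual names, here is a small sketch that wraps them in a function and applies them to a few sample strings. The helper name clean.evtype and the sample names are assumptions for illustration; the patterns are the ones used above.

```r
# Hypothetical helper applying the four standardization steps in order.
clean.evtype <- function(x) {
  x <- tolower(as.character(x))
  x <- gsub("^\\s+|\\s+$", "", x)   # trim leading and trailing spaces
  x <- gsub("\\s+", " ", x)         # collapse runs of spaces into one
  x <- gsub("\\(|\\)", "", x)       # drop parentheses
  gsub("tstm", "thunderstorm", x)   # expand the common abbreviation
}

raw <- c("  TSTM WIND", "Tstm  Wind (G45)", "HIGH SURF ADVISORY ")
cleaned <- clean.evtype(raw)
cleaned
```

The first two sample names still differ after cleaning ("thunderstorm wind" and "thunderstorm wind g45"), which is why many distinct names remain after this step.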

Let’s now match the event names in the data set against the official list.

Storm96.events <- Storm96$EVTYPE %>%
  table %>%
  as.data.frame %>%
  `names<-` (c("Original_name", "Frequency"))

match.names <- paste("^",
                     Storm96.events$Original_name,
                     "$",
                     sep = "") %>%
  lapply(grep,
         list.of.events,
         ignore.case = T,
         value = T)

how.many.matches <- match.names %>%
  lapply(length) %>%
  unlist %>%
  table %>%
  data.frame %>%
  `colnames<-`(c("Number.of.matches", "Frequency")) %>%
  print
##   Number.of.matches Frequency
## 1                 0       375
## 2                 1        46

There were only 46 exact matches, but there are 48 names in the official list, so some of the official events found no equivalent in our data set. They are: debris flow, hurricane typhoon.
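
Since whole names are being compared, the same kind of exact match can also be sketched with %in% and setdiff(), which compare strings literally and so never interpret characters such as "?" or "." as regular expression syntax. This is an alternative illustration on made-up vectors, not the code used for the results in this report.

```r
# Assumed excerpts of the official list and of the names seen in the data.
official <- c("flood", "hail", "debris flow", "hurricane typhoon")
observed <- c("flood", "hail", "landslide", "hurricane/typhoon", "hail")

observed.names <- unique(observed)

# TRUE when an observed name is identical to an official name.
is.exact <- observed.names %in% official
table(is.exact)

# Official names that have no identical counterpart in the data.
no.counterpart <- setdiff(official, observed.names)
no.counterpart
```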

There are still 375 event names that had no match in the official list. Let’s create a table with summary information on each event type name and see how the rows are distributed across the matched and unmatched names.

match.number <- vector("list", length = length(match.names))
match.factor <- vector("list", length = length(match.names))

for (i in 1:length(match.names)) {
  
  match.number[i] <- length(match.names[[i]])
  
  if (match.number[i] == 0) {
    match.factor[i] <- "No match"
  } else if (match.number[i] == 1) {
    match.factor[i] <- "Exact match"
  } else if (match.number[i] > 1) {
    match.factor[i] <- "Multiple matches"
  } else {
    match.factor[i] <- "Other"
  }
}

match.table <- cbind(as.character(Storm96.events$Original_name),
                     Storm96.events$Frequency,
                     match.number,
                     Match_factor = match.factor) %>%
  as.data.frame %>%
  rename(Original_name = V1,
         Frequency = V2,
         Number_of_matches = match.number)

class(match.table$Original_name) <- "character"
class(match.table$Frequency) <- "numeric"
match.table$Match_factor <- match.table$Match_factor %>%
  unlist %>%
  as.factor

table.of.match.groups <- match.table %>%
  group_by(Match_factor) %>%
  summarise(Frequency = sum(Frequency),
            Percentage = sum(Frequency) * 100 / nrow(Storm96)) %>%
  print
## # A tibble: 2 × 3
##   Match_factor Frequency Percentage
##         <fctr>     <dbl>      <dbl>
## 1  Exact match    639474  97.849219
## 2     No match     14056   2.150781

Only 2.15% of rows correspond to unmatched names (although they amount to 375 different entries).

To minimize loss of data, we’ll try to manually match event name entries that correspond to more than 500 rows but were not matched.

maybe.match <- match.table %>%
  filter(Match_factor == "No match") %>%
  filter(Frequency >= 500) %>%
  print
##            Original_name Frequency Number_of_matches Match_factor
## 1           extreme cold       617                 0     No match
## 2                    fog       532                 0     No match
## 3              landslide       588                 0     No match
## 4 thunderstorm wind/hail      1028                 0     No match
## 5   urban/sml stream fld      3392                 0     No match
## 6       wild/forest fire      1443                 0     No match
## 7     winter weather/mix      1104                 0     No match

There are 7 unmatched event names with at least 500 rows. With the aid of the Storm Data Preparation document, we matched them one by one. Then, the modified names were inserted back into the subset Storm96.

matched.list <- c("extreme cold/wind chill", "dense fog", "debris flow", "winter storm", "heavy rain", "wildfire", "winter weather")

manual.match <- cbind(matched.list, maybe.match$Original_name)
  
for (i in 1:nrow(manual.match)) {
  Storm96$EVTYPE <- gsub(pattern = paste("^",
                                         manual.match[[i, 2]],
                                         "$",
                                         sep = ""),
                         replacement = manual.match[[i, 1]],
                         x = Storm96$EVTYPE)
}
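
The same kind of renaming can be sketched with a named vector used as a lookup table, which keeps each old name next to its replacement. The vector below is only an excerpt of the pairs listed above, and the sample evtype values are made up.

```r
# Names are the entries found in the data; values are the official names.
recode <- c("fog"              = "dense fog",
            "landslide"        = "debris flow",
            "wild/forest fire" = "wildfire")

evtype <- c("fog", "hail", "landslide", "fog", "wild/forest fire")

# Replace only the entries that appear in the lookup table.
hit <- evtype %in% names(recode)
evtype[hit] <- unname(recode[evtype[hit]])
evtype
```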

Let’s rerun part of the previous code to check the impact of the manual match of data.

Storm96.events.2 <- Storm96$EVTYPE %>%
  table %>%
  as.data.frame %>%
  `names<-` (c("Original_name", "Frequency"))

match.names.2 <- paste("^",
                     Storm96.events.2$Original_name,
                     "$",
                     sep = "") %>%
  lapply(grep,
         list.of.events,
         ignore.case = T,
         value = T)

how.many.matches.2 <- match.names.2 %>%
  lapply(length) %>%
  unlist %>%
  table %>%
  data.frame %>%
  `colnames<-`(c("Number.of.matches", "Frequency")) %>%
  print
##   Number.of.matches Frequency
## 1                 0       368
## 2                 1        47

match.number.2 <- vector("list", length = length(match.names.2))
match.factor.2 <- vector("list", length = length(match.names.2))

for (i in 1:length(match.names.2)) {
  
  match.number.2[i] <- length(match.names.2[[i]])
  
  if (match.number.2[i] == 0) {
    match.factor.2[i] <- "No match"
  } else if (match.number.2[i] == 1) {
    match.factor.2[i] <- "Exact match"
  } else if (match.number.2[i] > 1) {
    match.factor.2[i] <- "Multiple matches"
  } else {
    match.factor.2[i] <- "Other"
  }
}

match.table.2 <- cbind(as.character(Storm96.events.2$Original_name),
                     Storm96.events.2$Frequency,
                     match.number.2,
                     Match_factor = match.factor.2) %>%
  as.data.frame %>%
  rename(Original_name = V1,
         Frequency = V2,
         Number_of_matches = match.number.2)

class(match.table.2$Original_name) <- "character"
class(match.table.2$Frequency) <- "numeric"
match.table.2$Match_factor <- match.table.2$Match_factor %>%
  unlist %>%
  as.factor

table.of.match.groups.2 <- match.table.2 %>%
  group_by(Match_factor) %>%
  summarise(Frequency = sum(Frequency),
            Percentage = sum(Frequency) * 100 / nrow(Storm96)) %>%
  print
## # A tibble: 2 x 3
##   Match_factor Frequency Percentage
##         <fctr>     <dbl>      <dbl>
## 1  Exact match    648178 99.1810628
## 2     No match      5352  0.8189372

We now have more than 99% of the rows of our subset matching one entry in the official list.

There are 47 matched events in the data set, but 48 in the official list, so there seems to be a single event type that has had zero matches. Let’s find out which.

unmatched <- setdiff(list.of.events,
        match.table.2 %>%
          filter(Match_factor == "Exact match") %>%
          `[`(,1)) %>%
  print
## [1] "hurricane typhoon"

Let’s make a little bit more of an effort to find some matches for it.

hur.typ.match <- lapply(c("typhoon", "hurricane"),
       grep,
       match.table.2$Original_name,
       value = T) %>%
  unlist %>%
  unique

hur.typ.table <- match.table.2 %>%
  filter(Original_name %in% hur.typ.match) %>%
  print
##       Original_name Frequency Number_of_matches Match_factor
## 1         hurricane       170                 0     No match
## 2 hurricane edouard         2                 0     No match
## 3 hurricane/typhoon        88                 0     No match
## 4           typhoon        11                 0     No match

Let’s change those names to the standard format.

for (i in 1:nrow(hur.typ.table)) {
  Storm96$EVTYPE <- gsub(pattern = paste("^",
                                         hur.typ.table[[i, 1]],
                                         "$",
                                         sep = ""),
                         replacement = unmatched,
                         x = Storm96$EVTYPE)
}
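
Because every name containing "hurricane" or "typhoon" ends up with the same official name, the step can also be sketched as a single logical index. The sample vector is illustrative and reuses the four names found above plus one unrelated name.

```r
evtype <- c("hurricane", "hurricane edouard", "hurricane/typhoon",
            "typhoon", "tornado")

# TRUE for any name that mentions a hurricane or a typhoon.
is.hurricane <- grepl("hurricane|typhoon", evtype)

# Give all of them the standardized official name.
evtype[is.hurricane] <- "hurricane typhoon"
evtype
```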

Finally, let’s discard the unmatched rows.

Storm.96.std <- Storm96 %>%
  filter(EVTYPE %in% list.of.events)

The data set is now ready for analysis. But before that, let’s visualize how row selection affected the size of our final data set.

The following Venn diagram shows the difference between the number of rows in the original data set and in our final subset.

no.color <- rgb(red = 0, blue = 0, green = 0, alpha = 0)
Dataset.rows <- nrow(DataStorm)
Subset.rows <- nrow(Storm.96.std)

plot.new()

venn.plot <- draw.pairwise.venn(area1 = Dataset.rows,
                   area2 = Subset.rows,
                   cross.area = Subset.rows,
                   lty = 0,
                   fill = c("blue", "red"),
                   alpha = c(0.5, 0.5),
                   category = c("                                 Original data set,
                                902,297 rows",
                                "                               Final subset,
                                648,449 rows"),
                   cat.pos = c(50, 250),
                   cat.dist = c(-0.038, -0.02),
                   cat.cex = 1.2,
                   cat.col = c("black", "white"),
                   label.col = rep(no.color, 3),
                   ext.text = F)

grid.draw(venn.plot)
title("Proportion of data sets", outer = T, line = -1)

Weather events and population health

Which types of events are most harmful to population health?

Let’s create a summary table from the standardized subset showing health-related statistics for each event type.

That was accomplished by selecting only the variables EVTYPE, FATALITIES and INJURIES from the standardized subset (Storm.96.std), then grouping it by EVTYPE.

Next, a summary table was created that included: the total number of occurrences for that event type; the total and mean number of fatalities; and the total and mean number of injured people.

storm.health <- Storm.96.std %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(Occurrences = n(),
            Total.deaths = sum(FATALITIES),
            Average.deaths = FATALITIES %>%
              mean %>%
              round(3),
            Total.injured = sum(INJURIES),
            Average.injured = INJURIES %>%
              mean %>%
              round(3)) %>%
  print
## # A tibble: 48 x 6
##                   EVTYPE Occurrences Total.deaths Average.deaths
##                    <chr>       <int>        <dbl>          <dbl>
##  1 astronomical low tide         174            0          0.000
##  2             avalanche         378          223          0.590
##  3              blizzard        2633           70          0.027
##  4         coastal flood         596            3          0.005
##  5       cold/wind chill         539           95          0.176
##  6           debris flow         588           37          0.063
##  7             dense fog        1725           69          0.040
##  8           dense smoke          10            0          0.000
##  9               drought        2433            0          0.000
## 10            dust devil         136            2          0.015
## # ... with 38 more rows, and 2 more variables: Total.injured <dbl>,
## #   Average.injured <dbl>
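
The difference between totals and per-occurrence averages can be seen on a toy example. This sketch uses base R and made-up numbers; the column names mirror the ones in the data set.

```r
# Made-up records: two heat events and three flood events.
toy <- data.frame(EVTYPE     = c("heat", "heat", "flood", "flood", "flood"),
                  FATALITIES = c(2, 0, 1, 0, 0),
                  stringsAsFactors = FALSE)

# One value per event type, in alphabetical order of EVTYPE.
occurrences <- tapply(toy$FATALITIES, toy$EVTYPE, length)
total       <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
average     <- round(total / occurrences, 3)

cbind(occurrences, total, average)
```

In this toy data floods occur more often, but heat has the higher average number of deaths per occurrence, which is the quantity used for the rankings in this report.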

Deaths

The 5 deadliest types of weather events, ranked by average number of deaths per occurrence, are the following:

storm.dead <- storm.health %>%
  select(EVTYPE, Average.deaths, Total.deaths, Occurrences) %>%
  arrange(desc(Average.deaths)) %>%
  head(5) %>%
  print
## # A tibble: 5 x 4
##              EVTYPE Average.deaths Total.deaths Occurrences
##               <chr>          <dbl>        <dbl>       <int>
## 1           tsunami          1.650           33          20
## 2    excessive heat          1.085         1797        1656
## 3       rip current          0.787          340         432
## 4         avalanche          0.590          223         378
## 5 hurricane typhoon          0.461          125         271

Injuries

The 5 event types with the highest average number of injured people per occurrence are the following:

storm.inj <- storm.health %>%
  select(EVTYPE, Average.injured, Total.injured, Occurrences) %>%
  arrange(desc(Average.injured)) %>%
  head(5) %>%
  print
## # A tibble: 5 x 4
##              EVTYPE Average.injured Total.injured Occurrences
##               <chr>           <dbl>         <dbl>       <int>
## 1           tsunami           6.450           129          20
## 2 hurricane typhoon           4.900          1328         271
## 3    excessive heat           3.859          6391        1656
## 4              heat           1.707          1222         716
## 5        dust storm           0.902           376         417

Most harmful

Putting together the top 5 events from both lists (average number of deaths and average number of injured people), we get this list of the most harmful events for human health:

harmful.index <- storm.dead$EVTYPE %>%
  c(storm.inj$EVTYPE) %>%
  unique %>%
  match(storm.health$EVTYPE)
  
most.harmful <- storm.health[harmful.index, c(1, 4, 6)] %>%
  arrange(desc(Average.deaths)) %>%
  print
## # A tibble: 7 x 3
##              EVTYPE Average.deaths Average.injured
##               <chr>          <dbl>           <dbl>
## 1           tsunami          1.650           6.450
## 2    excessive heat          1.085           3.859
## 3       rip current          0.787           0.484
## 4         avalanche          0.590           0.413
## 5 hurricane typhoon          0.461           4.900
## 6              heat          0.331           1.707
## 7        dust storm          0.026           0.902
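
The combined list has 7 rows rather than 10 because events that appear in both rankings are counted once. A small sketch of that step with shortened, made-up rankings:

```r
# Assumed top-3 excerpts of the two rankings and of the summary table's names.
top.deaths  <- c("tsunami", "excessive heat", "rip current")
top.injured <- c("tsunami", "hurricane typhoon", "excessive heat")
all.names   <- c("avalanche", "excessive heat", "hurricane typhoon",
                 "rip current", "tsunami")

# unique() keeps the first occurrence of each name, so shared events appear once.
combined <- unique(c(top.deaths, top.injured))

# match() gives the row position of each combined name in the summary table.
row.index <- match(combined, all.names)
row.index
```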

Weather events and economic impact

Which types of events are responsible for most material damage?

Let’s create a subset from the standardized data set showing damage summary statistics for each event type. We will separate material damage into property damage and crop damage.

First, we have to convert the values in the exponent columns (PROPDMGEXP and CROPDMGEXP) from factors to the actual numbers they represent.

storm.damage <- Storm.96.std %>%
  select(EVTYPE, PROPDMG,
         PROPDMGEXP, CROPDMG, CROPDMGEXP)

exps <- c("B", "b", "K", "k", "M", "m", "H", "h", "+", "-", "?", "", 0:9)
values <- c(10^9, 10^9, 10^3, 10^3, 10^6, 10^6, 10^2, 10^2, 1, 0, 0, 0, rep(10, 10))

exp.table <- data.frame(exps, values)

for(i in 1:nrow(exp.table)){
  
  storm.damage$PROPDMGEXP <- storm.damage$PROPDMGEXP %>%
    gsub(pattern = paste("^", exp.table[i,1], "$", sep = ""),
         replacement = exp.table[i,2])
  
  storm.damage$CROPDMGEXP <- storm.damage$CROPDMGEXP %>%
    gsub(pattern = paste("^", exp.table[i,1], "$", sep = ""),
         replacement = exp.table[i,2])
  }
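
The same conversion can be sketched as a single lookup, so that each code is translated exactly once and a value produced by one replacement (such as "0") cannot be picked up by a later pattern. This is an illustrative alternative, not the code that produced the results in this report; the helper name exp.to.number is hypothetical, and the multipliers follow the table defined above (digits count as 10, blank or unknown codes as 0).

```r
# Multipliers for the letter and symbol codes, keyed by upper-case code.
multiplier <- c("B" = 1e9, "M" = 1e6, "K" = 1e3, "H" = 1e2,
                "+" = 1, "-" = 0, "?" = 0)

# Hypothetical helper: translate exponent codes to numeric multipliers.
exp.to.number <- function(code) {
  code <- toupper(as.character(code))
  out <- unname(multiplier[code])        # NA for codes not in the table
  out[grepl("^[0-9]$", code)] <- 10      # single digits count as 10
  out[is.na(out)] <- 0                   # blank or unknown codes count as 0
  out
}

codes <- c("K", "m", "B", "", "0", "5", "?")
converted <- exp.to.number(codes)
converted

# Damage in dollars is then the reported value times its multiplier.
2.5 * exp.to.number("K")
```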

Property damage

Now, let’s group by event type and create summary statistics for property damage that include: the total number of occurrences for that event type and the total and mean property damage.

storm.damage.prop <- storm.damage %>%
  group_by(EVTYPE) %>%
  summarise(Average.damage = PROPDMGEXP %>%
              as.numeric %>%
              `*` (PROPDMG) %>%
              mean %>%
              round,
            Property.damage = sum(PROPDMG * as.numeric(PROPDMGEXP)) %>%
              format(big.mark = ","),
            Occurrences = n()) %>%
  arrange((desc(Average.damage)))

storm.damage.prop$Average.damage <- storm.damage.prop$Average.damage %>%
  format(big.mark = ",")
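
Note that the averages are sorted before the thousands separators are added. format() returns character strings, which no longer sort by magnitude, so the ordering has to happen while the values are still numeric. A small sketch with three averages taken from the table in this section (the vector names are illustrative):

```r
avg <- c(flood = 5936359, tsunami = 7203100, hurricane = 301545716)

# Sort while the values are still numbers.
sorted <- sort(avg, decreasing = TRUE)

# Formatting adds separators and turns the values into text.
labels <- format(sorted, big.mark = ",")
is.character(labels)
labels
```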

The six event types that cause the highest average property damage per occurrence are:

head(storm.damage.prop, 6)
## # A tibble: 6 x 4
##              EVTYPE Average.damage Property.damage Occurrences
##               <chr>          <chr>           <chr>       <int>
## 1 hurricane typhoon    301,545,716  81,718,889,010         271
## 2  storm surge/tide     31,359,378   4,641,188,000         148
## 3    tropical storm     11,205,976   7,642,475,550         682
## 4           tsunami      7,203,100     144,062,000          20
## 5             flood      5,936,359 143,944,833,550       24248
## 6         ice storm      1,938,397   3,642,248,810        1879

Crop damage

Now, let’s do the same with crop damage.

storm.damage.crop <- storm.damage %>%
  group_by(EVTYPE) %>%
  summarise(Average.damage = CROPDMGEXP %>%
              as.numeric %>%
              `*` (CROPDMG) %>%
              mean %>%
              round,
            Property.damage = sum(CROPDMG * as.numeric(CROPDMGEXP)) %>%
              format(big.mark = ","),
            Occurrences = n()) %>%
  arrange((desc(Average.damage)))

storm.damage.crop$Average.damage <- storm.damage.crop$Average.damage %>%
  format(big.mark = ",")

The six event types that have the highest average crop damage per occurrence are listed below (in this table, the Property.damage column holds the total crop damage):

head(storm.damage.crop, 6)
## # A tibble: 6 x 4
##                    EVTYPE Average.damage Property.damage Occurrences
##                     <chr>          <chr>           <chr>       <int>
## 1       hurricane typhoon     19,742,095   5,350,107,800         271
## 2                 drought      5,494,273  13,367,566,000        2433
## 3          tropical storm        993,711     677,711,000         682
## 4            frost/freeze        814,733   1,094,186,000        1343
## 5 extreme cold/wind chill        808,538   1,309,023,000        1619
## 6          excessive heat        297,344     492,402,000        1656

Results

Out of the 48 weather event types identified by NOAA, these 7 are the most harmful to human health (by average number of fatalities and injuries per occurrence):

print(most.harmful)
## # A tibble: 7 x 3
##              EVTYPE Average.deaths Average.injured
##               <chr>          <dbl>           <dbl>
## 1           tsunami          1.650           6.450
## 2    excessive heat          1.085           3.859
## 3       rip current          0.787           0.484
## 4         avalanche          0.590           0.413
## 5 hurricane typhoon          0.461           4.900
## 6              heat          0.331           1.707
## 7        dust storm          0.026           0.902

Ranked by average property damage per occurrence, the 48 weather event types identified by NOAA are ordered as follows (first 10 shown):

print(storm.damage.prop)
## # A tibble: 48 x 4
##               EVTYPE Average.damage Property.damage Occurrences
##                <chr>          <chr>           <chr>       <int>
##  1 hurricane typhoon    301,545,716  81,718,889,010         271
##  2  storm surge/tide     31,359,378   4,641,188,000         148
##  3    tropical storm     11,205,976   7,642,475,550         682
##  4           tsunami      7,203,100     144,062,000          20
##  5             flood      5,936,359 143,944,833,550       24248
##  6         ice storm      1,938,397   3,642,248,810        1879
##  7          wildfire      1,858,790   7,760,449,500        4175
##  8           tornado      1,063,183  24,616,945,710       23154
##  9       debris flow        552,003     324,578,000         588
## 10           drought        429,963   1,046,101,000        2433
## # ... with 38 more rows

Ranked by average crop damage per occurrence, the 48 weather event types identified by NOAA are ordered as follows (first 10 shown):

print(storm.damage.crop)
## # A tibble: 48 x 4
##                     EVTYPE Average.damage Property.damage Occurrences
##                      <chr>          <chr>           <chr>       <int>
##  1       hurricane typhoon     19,742,095   5,350,107,800         271
##  2                 drought      5,494,273  13,367,566,000        2433
##  3          tropical storm        993,711     677,711,000         682
##  4            frost/freeze        814,733   1,094,186,000        1343
##  5 extreme cold/wind chill        808,538   1,309,023,000        1619
##  6          excessive heat        297,344     492,402,000        1656
##  7                   flood        205,162   4,974,778,400       24248
##  8                wildfire         96,349     402,255,130        4175
##  9              heavy rain         49,374     736,657,900       14920
## 10             debris flow         34,043      20,017,000         588
## # ... with 38 more rows

Put together, these are the types of weather events that cause the most harm:

c(storm.damage.crop$EVTYPE %>% head(6),
  storm.damage.prop$EVTYPE %>% head(6),
  most.harmful$EVTYPE) %>% 
  unique
##  [1] "hurricane typhoon"       "drought"                
##  [3] "tropical storm"          "frost/freeze"           
##  [5] "extreme cold/wind chill" "excessive heat"         
##  [7] "storm surge/tide"        "tsunami"                
##  [9] "flood"                   "ice storm"              
## [11] "rip current"             "avalanche"              
## [13] "heat"                    "dust storm"

Other information

About this report

This report was written as part of the second project assignment of the Reproducible Research course, taught by Johns Hopkins University through Coursera. It is the fifth course in the Data Science Specialization.

About the data set

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

More information about the data set can be found here:
1. Storm Data Preparation, a documentation of the data set.
2. Storm Data FAQ Page.
3. The NOAA website.

Session information

sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 10240)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Portuguese_Brazil.1252  LC_CTYPE=Portuguese_Brazil.1252   
## [3] LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C                      
## [5] LC_TIME=Portuguese_Brazil.1252    
## 
## attached base packages:
## [1] grid      tools     stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
## [1] bindrcpp_0.2        VennDiagram_1.6.17  futile.logger_1.4.3
## [4] stringdist_0.9.4.6  dplyr_0.7.2         knitr_1.16         
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.12         bindr_0.1            magrittr_1.5        
##  [4] R6_2.2.2             rlang_0.1.1          stringr_1.2.0       
##  [7] parallel_3.4.1       lambda.r_1.1.9       htmltools_0.3.6     
## [10] yaml_2.1.14          assertthat_0.2.0     rprojroot_1.2       
## [13] digest_0.6.12        tibble_1.3.3         codetools_0.2-15    
## [16] futile.options_1.0.0 glue_1.1.1           evaluate_0.10.1     
## [19] rmarkdown_1.6        stringi_1.1.5        compiler_3.4.1      
## [22] backports_1.1.0      pkgconfig_2.0.1