Version 1.0 / Jim P / 2015-03-22

Important Note: 

All of the R code used to create this report is listed at the bottom of the report.
For clarity, it is not shown inline in the report.

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. In this report we describe the public health and economic consequences of the different weather event types. Our overall hypothesis is that, of the 48 different event types, fewer than 20% (at most 9 event types) cause at least 80% of the damage (the Pareto, or 80 - 20, rule). To investigate this hypothesis, we obtained Storm Data from the National Weather Service and analyzed the damage caused by each event type. We found that the 80 - 20 rule holds for three of the four categories: 80% of Property Damage, Crop Damage, and Injuries is covered by the top 4, 6, and 8 event types respectively. Fatalities come close: the top 10 event types account for about 79.5% of all fatalities. Our findings suggest that proactively working to reduce the effects of a relatively small number of event types would address roughly 80% of the damage caused by storms and other severe weather events.

Data Processing

The data for this report came in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.
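Because the file is bzip2-compressed, it does not need to be unpacked with unzip() (which only handles .zip archives). The base R sketch below shows that read.csv() can read a .bz2 file directly; the file name and the two sample columns are hypothetical stand-ins, not the real Storm Data file.

```r
# Sketch: read.csv() opens its file with file(), which recognises
# bzip2-compressed files when reading, so no separate decompression
# step is needed. The file name and columns are hypothetical.
csv_bz2 <- file.path(tempdir(), "storm_sample.csv.bz2")

# write a tiny sample CSV through a bzip2 connection
sample_df <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                        FATALITIES = c(2, 0),
                        stringsAsFactors = FALSE)
con <- bzfile(csv_bz2, "w")
write.csv(sample_df, con, row.names = FALSE)
close(con)

# read the compressed file back directly
storm_sample <- read.csv(csv_bz2, stringsAsFactors = FALSE)
print(storm_sample)
```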

Documentation of the database is also available; it describes how some of the variables are constructed and defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete. As a result, only data from 1996 onward is used in this report; 1996 is also the year from which the National Weather Service records all 48 official event types.
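The 1996 cut-off can be applied by parsing each event's begin date and comparing the year as a number. The report's code uses lubridate's mdy_hms() and year(); the sketch below is a base R alternative that assumes BGN_DATE strings shaped like "4/18/1950 0:00:00" (month/day/year followed by a time). The sample rows are hypothetical.

```r
# Sketch of the "1996 onward" restriction using base R only.
# BGN_DATE values are assumed to be "month/day/year h:mm:ss" strings;
# the rows are hypothetical.
events <- data.frame(
    BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/30/2011 0:00:00"),
    EVTYPE   = c("TORNADO", "WINTER STORM", "HIGH WIND"),
    stringsAsFactors = FALSE
)

# parse the date part (the trailing time is ignored) and take the year
begin_year <- as.integer(format(as.Date(events$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# keep events from 1996 onward, comparing numbers rather than strings
events_1996 <- events[begin_year >= 1996, ]
print(events_1996)
```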

Processing Steps

The following steps were performed in processing the data:

  1. downloaded the file and, on the first run, decompressed it
  2. read the file into a dataframe using read.csv
  3. performed clean up of the data and its event types
    1. kept only events from 1996 onward with at least one fatality or injury or some property or crop damage
    2. made all event types lower case
    3. removed leading and trailing blanks
    4. cleaned up ALL event types, reclassifying 221 event types to the approved 48 event types
  4. calculated property and crop damages in dollars based upon the exponent modifier (K, M, or B)
    1. wrote a function for adjusting the crop and property damage
    2. added two new columns to the df
    3. set the property and crop damages based upon the exponent
  5. created information for result tables
    1. created dataframes containing the top 10 event types for each category
    2. created dataframes for tables summarizing the results
  6. created information for plotting
    1. created dataframes for plotting the results
    2. cleaned up the event type names for the plots
    3. created information for the 80% cutoff line
    4. wrote a function for combining two plots
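Steps 3 and 4 above can be sketched compactly in base R as an ordered table of rules plus a vectorised exponent conversion. This is an illustration, not the report's exact code (which uses stringr's str_detect in a long series of separate assignments and a row-by-row loop); the helper names reclassify_evtype and calc_dmg_vec, the rule subset, and the sample rows are hypothetical.

```r
# Sketch of steps 3 and 4 in base R; rules, helpers and rows are hypothetical.

# Step 3: apply rules in order. A relabelled value is upper case, so later
# lower-case patterns cannot match it: the first matching rule wins.
reclassify_evtype <- function(evtype, rules) {
    clean <- trimws(tolower(evtype))
    for (pattern in names(rules)) {
        hit <- grepl(pattern, clean, fixed = TRUE)   # plain-text match
        clean[hit] <- rules[[pattern]]
    }
    clean
}

evtype_rules <- c(
    "marine hail"       = "MARINE HAIL",   # must come before generic "hail"
    "hail"              = "HAIL",
    "tstm"              = "THUNDERSTORM WIND",
    "thunderstorm wind" = "THUNDERSTORM WIND"
)

# Step 4: K/k, M/m and B/b become dollar multipliers; any other code
# (including "") leaves the value unchanged.
calc_dmg_vec <- function(dmg, exp) {
    multiplier <- c(K = 1e3, M = 1e6, B = 1e9)[toupper(exp)]
    multiplier[is.na(multiplier)] <- 1
    unname(dmg * multiplier)
}

raw <- data.frame(
    EVTYPE     = c(" TSTM WIND", "Marine Hail", "HAIL ", "Thunderstorm Wind"),
    PROPDMG    = c(25, 0, 1.5, 2),
    PROPDMGEXP = c("K", "", "M", "B"),
    stringsAsFactors = FALSE
)
raw$CLEAN_EVTYPE  <- reclassify_evtype(raw$EVTYPE, evtype_rules)
raw$CLEAN_PROPDMG <- calc_dmg_vec(raw$PROPDMG, raw$PROPDMGEXP)
print(raw)
```

Because the first matching rule wins, listing "hail" before "marine hail" would relabel marine hail events as HAIL; more specific patterns have to come first.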

Results

The following table summarizes the property and crop damage (in billions of dollars) and the counts of fatalities and injuries. Its rows are:

  1. Total - the total dollar amount or count
  2. Top Ten - the combined dollar amount or count of the top ten event types in each category
  3. 80 % - the 80% cutoff of the total dollar amount or count in each category

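It can be seen that the top 10 event types account for most of the total in every category. Three of the four categories meet the 80 - 20 rule, which requires 80% of the total to come from 20% or fewer of the 48 event types (at most 9). Measured against the 80 % row of the table, 80% of Property Damage, Crop Damage, and Injuries is reached by the top 4, 6, and 8 event types respectively. Fatalities fall just short: the top 10 event types account for 6944 of 8732 fatalities (about 79.5%), below the 80% cutoff of 6986, so more than 10 event types are needed.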

results
##         Property Damage Crop Damage Fatalities Injuries
## Total               367          35       8732    57975
## Top Ten             354          33       6944    49828
## 80 %                294          28       6986    46380
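The 80% comparison can be made exact by finding the smallest number of top event types whose cumulative total reaches 80% of the overall total. The sketch below (the helper types_to_reach is hypothetical) applies this to the injury and fatality figures reported in this section; it returns NA when even the listed top 10 do not reach the cutoff.

```r
# Sketch: smallest number of (already sorted) event types whose cumulative
# total reaches a given share of the overall total; NA if never reached.
types_to_reach <- function(sorted_values, total, share = 0.8) {
    cum_share <- cumsum(sorted_values) / total
    unname(which(cum_share >= share)[1])
}

# top 10 counts and overall totals as reported in this section
injuries_top10   <- c(20667, 6760, 6393, 5043, 4141, 1753, 1342, 1328, 1292, 1109)
fatalities_top10 <- c(1797, 1511, 915, 651, 543, 419, 383, 247, 239, 239)

k_injuries   <- types_to_reach(injuries_top10, 57975)
k_fatalities <- types_to_reach(fatalities_top10, 8732)

cat("Injuries:  ", k_injuries, "event types reach 80%\n")
cat("Fatalities:", k_fatalities, "(NA means more than 10 are needed)\n")
cat("20% of the 48 event types is", 0.2 * 48, "\n")
```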

Events Harmful to Population Health

The following two tables show the event types that have had the most harmful effects with respect to population health.

Fatalities top 10 event types

fatalities_sum_top_ten_tbl
## Source: local data frame [10 x 2]
## 
##           Event Type Fatalities
## 1     EXCESSIVE HEAT       1797
## 2            TORNADO       1511
## 3        FLASH FLOOD        915
## 4          LIGHTNING        651
## 5        RIP CURRENT        543
## 6              FLOOD        419
## 7  THUNDERSTORM WIND        383
## 8    COLD/WIND CHILL        247
## 9               HEAT        239
## 10         HIGH WIND        239

Injuries top 10 event types

injuries_sum_top_ten_tbl
## Source: local data frame [10 x 2]
## 
##           Event Type Injuries
## 1            TORNADO    20667
## 2              FLOOD     6760
## 3     EXCESSIVE HEAT     6393
## 4  THUNDERSTORM WIND     5043
## 5          LIGHTNING     4141
## 6        FLASH FLOOD     1753
## 7       WINTER STORM     1342
## 8  HURRICANE/TYPHOON     1328
## 9               HEAT     1292
## 10         HIGH WIND     1109
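The top 10 tables are built by summing each category per event type and sorting in decreasing order. The report does this with dplyr (group_by, summarise, arrange, head); the sketch below shows the same idea in base R with aggregate() and order(). The helper top_n_by and the event-level rows are hypothetical.

```r
# Sketch of building "top N event types" tables in base R.
# The event-level rows are hypothetical.
events <- data.frame(
    CLEAN_EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL", "LIGHTNING"),
    FATALITIES   = c(3, 0, 5, 2, 1, 0),
    INJURIES     = c(20, 4, 11, 0, 2, 3),
    stringsAsFactors = FALSE
)

# sum fatalities and injuries per event type
sums <- aggregate(cbind(FATALITIES, INJURIES) ~ CLEAN_EVTYPE,
                  data = events, FUN = sum)

# sort by one column in decreasing order and keep the first n rows
top_n_by <- function(df, column, n = 10) {
    ordered <- df[order(df[[column]], decreasing = TRUE),
                  c("CLEAN_EVTYPE", column)]
    rownames(ordered) <- NULL
    head(ordered, n)
}

top_fatalities <- top_n_by(sums, "FATALITIES", n = 3)
top_injuries   <- top_n_by(sums, "INJURIES", n = 3)
print(top_fatalities)
print(top_injuries)
```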

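The following graph shows the top 10 event types in each category. The blue bars show each event type's total, the black line shows the running (cumulative) total, and the red line marks 80% of the overall total.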

print_plots(fatalities_plot, injuries_plot)
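These plots are Pareto-style charts: a bar per event type, a cumulative line, and a horizontal line at 80% of the overall total. The sketch below draws the same kind of chart with base R graphics instead of ggplot2 (a substitution, not the report's code), using the fatality figures above, and writes it to a temporary PDF file. With these numbers the cumulative line ends just below the red line.

```r
# Sketch of a Pareto-style chart in base graphics, using the fatality
# figures reported above; output goes to a temporary PDF file.
fatalities_top10 <- c(1797, 1511, 915, 651, 543, 419, 383, 247, 239, 239)
event_labels <- c("Excess Heat", "Tornado", "Flash Flood", "Lightning",
                  "Rip Current", "Flood", "TStorm Wind", "Cold/WChill",
                  "Heat", "High Wind")
cum_fatalities <- cumsum(fatalities_top10)
cutoff_80 <- 0.8 * 8732   # 80% of all fatalities in the data

pdf_file <- tempfile(fileext = ".pdf")
pdf(pdf_file)
par(mar = c(8, 4, 3, 1))  # extra bottom margin for the rotated labels
mids <- as.vector(barplot(fatalities_top10, names.arg = event_labels,
                          las = 2, col = "blue",
                          ylim = c(0, 1.05 * max(cum_fatalities, cutoff_80)),
                          ylab = "Fatalities", main = "Fatalities (top 10)"))
lines(mids, cum_fatalities, type = "b", pch = 19)  # cumulative total
abline(h = cutoff_80, col = "red")                 # 80% cutoff
invisible(dev.off())
```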

Events With High Economic Consequences

The following two tables show the event types that have had the highest economic consequences.

Property damage top 10 event types

prop_sum_top_ten_tbl
## Source: local data frame [10 x 2]
## 
##           Event Type Property Damage
## 1              FLOOD      144.095035
## 2  HURRICANE/TYPHOON       81.718889
## 3   STORM SURGE/TIDE       47.834724
## 4            TORNADO       24.616953
## 5        FLASH FLOOD       15.280569
## 6               HAIL       14.639573
## 7  THUNDERSTORM WIND        7.874601
## 8     TROPICAL STORM        7.642476
## 9          HIGH WIND        5.251463
## 10          WILDFIRE        4.758667

Crop damage top 10 event types

crop_sum_top_ten_tbl
## Source: local data frame [10 x 2]
## 
##           Event Type Crop Damage
## 1            DROUGHT  13.3675660
## 2  HURRICANE/TYPHOON   5.3501078
## 3              FLOOD   5.0046734
## 4               HAIL   2.5615187
## 5        FLASH FLOOD   1.3433898
## 6    COLD/WIND CHILL   1.3397155
## 7             FREEZE   1.3346310
## 8  THUNDERSTORM WIND   0.9522464
## 9         HEAVY RAIN   0.7399198
## 10    TROPICAL STORM   0.6777110

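The following graph shows the top 10 event types in each category. The blue bars show each event type's total, the black line shows the running (cumulative) total, and the red line marks 80% of the overall total.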

print_plots(prop_plot, crop_plot)

To reduce the damage caused by storms and other severe weather events, our findings suggest proactively focusing on a relatively small number of event types, which together account for roughly 80% of the damage in each category.


Reproducible R Code

Below is all of the R code used to create this report.

suppressWarnings(
    suppressPackageStartupMessages(
        {
            library(dplyr)
            library(lubridate)
            library(ggplot2)
            library(stringr)
            library(grid)
        }
    )
)
#################################
#################################
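# get the file; unzip() only handles .zip archives, so the downloaded
# bzip2 file is decompressed by reading it with readLines(), which
# detects the compression, and writing the plain CSV back out
if (!file.exists("./repdata-data-StormData.csv")) {
    file_url <-
        "http://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
    bz2_file <- "./repdata-data-StormData.csv.bz2"
    download.file(file_url, bz2_file, mode = "wb")
    writeLines(readLines(bz2_file), "./repdata-data-StormData.csv")
    unlink(bz2_file)
}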
#################################
#################################
# read in the data
storm_df <- read.csv("./repdata-data-StormData.csv", 
                      sep = ",", stringsAsFactors = FALSE)
#################################
#################################
# initial clean up: keep events from 1996 onward that caused at least one
# fatality or injury or some property or crop damage, and make all event
# types lower case
storm_cleaned_df <- storm_df %>%
    filter(year(mdy_hms(BGN_DATE)) >= 1996) %>%
    filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) %>%
    mutate(CLEAN_EVTYPE = tolower(EVTYPE))

# remove leading and trailing blanks
storm_cleaned_df$CLEAN_EVTYPE <- str_trim(storm_cleaned_df$CLEAN_EVTYPE,
                                          side = "both")

# find the event types that are still lower case, i.e. that have not yet
# been reclassified to an upper-case official name
ev_type_clean <- storm_cleaned_df$CLEAN_EVTYPE[
    (grepl("[a-z]", storm_cleaned_df$CLEAN_EVTYPE))]

# if (length(ev_type_clean) == 0) {
#     print("All Event Types have been corrected")
# } else {
#     print("All Event Types have not yet been corrected")
# }

#################################
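# clean up ALL event types: each rule relabels matching lower-case event
# types with an upper-case official name; because later (lower-case)
# patterns cannot match upper-case labels, the first matching rule wins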
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'astronomical low tide'))] <-
    'ASTRONOMICAL LOW TIDE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'avalanche'))] <- 'AVALANCHE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'blizzard'))] <- 'BLIZZARD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'coastal flood'))] <- 'COASTAL FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'beach erosion'))] <- 'COASTAL FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'coast'))] <- 'COASTAL FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'astronomical high tide'))] <-
    'COASTAL FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'landslide'))] <- 'DEBRIS FLOW'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'mud'))] <- 'DEBRIS FLOW'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'dam'))] <- 'DEBRIS FLOW'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'landslump'))] <- 'DEBRIS FLOW'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'rock slide'))] <- 'DEBRIS FLOW'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'dense fog'))] <- 'DENSE FOG'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'fog'))] <- 'DENSE FOG'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'dense smoke'))] <- 'DENSE SMOKE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'drought'))] <- 'DROUGHT'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'dust devil'))] <- 'DUST DEVIL'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'dust storm'))] <- 'DUST STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'blowing dust'))] <- 'DUST STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'excessive heat'))] <- 'EXCESSIVE HEAT'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'warm weather'))] <- 'EXCESSIVE HEAT'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'extreme cold/wind chill'))] <-
    'EXTREME COLD/WIND CHILL'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'extreme windchill'))] <-
    'EXTREME COLD/WIND CHILL'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'cold/wind chill'))] <- 'COLD/WIND CHILL'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'extreme cold'))] <- 'COLD/WIND CHILL'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'cold'))] <- 'COLD/WIND CHILL'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'flash flood'))] <- 'FLASH FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'urban/sml stream fld'))] <- 'FLASH FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'flood'))] <- 'FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
        storm_cleaned_df$CLEAN_EVTYPE, 'high water'))] <- 'FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'freezing fog'))] <- 'FREEZING FOG'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'freez'))] <- 'FREEZE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'frost'))] <- 'FREEZE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'funnel cloud'))] <- 'FUNNEL CLOUD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'hail'))] <- 'HAIL'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'heat'))] <- 'HEAT'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'heavy rain'))] <- 'HEAVY RAIN'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'rain'))] <- 'HEAVY RAIN'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'downburst'))] <- 'HEAVY RAIN'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'heavy snow'))] <- 'HEAVY SNOW'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'surf'))] <- 'HIGH SURF'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'seas'))] <- 'HIGH SURF'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'wave'))] <- 'HIGH SURF'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'swells'))] <- 'HIGH SURF'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'high wind'))] <- 'HIGH WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'microburst'))] <- 'HIGH WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'hurricane'))] <- 'HURRICANE/TYPHOON'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'typhoon'))] <- 'HURRICANE/TYPHOON'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'ice storm'))] <- 'ICE STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'glaze'))] <- 'ICE STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'ice'))] <- 'ICE STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'lake-effect snow'))] <- 'LAKE-EFFECT SNOW'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'lakeshore flood'))] <- 'LAKESHORE FLOOD'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'lightning'))] <- 'LIGHTNING'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'marine hail'))] <- 'MARINE HAIL'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'marine high wind'))] <- 'MARINE HIGH WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'marine strong wind'))] <- 'MARINE STRONG WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'marine thunderstorm wind'))] <-
    'MARINE THUNDERSTORM WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'rip current'))] <- 'RIP CURRENT'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'drowning'))] <- 'RIP CURRENT'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'seiche'))] <- 'SEICHE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'sleet'))] <- 'SLEET'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'storm surge'))] <- 'STORM SURGE/TIDE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'strong wind'))] <- 'STRONG WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'tstm'))] <- 'THUNDERSTORM WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'thunderstorm wind'))] <-
    'THUNDERSTORM WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'thunderstorm'))] <- 'THUNDERSTORM WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'wind'))] <- 'STRONG WIND'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'tornado'))] <- 'TORNADO'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'landspout'))] <- 'TORNADO'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'tropical depression'))] <-
    'TROPICAL DEPRESSION'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'tropical storm'))] <- 'TROPICAL STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'tsunami'))] <- 'TSUNAMI'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'volcanic ash'))] <- 'VOLCANIC ASH'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'waterspout'))] <- 'WATERSPOUT'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'wildfire'))] <- 'WILDFIRE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'brush fire'))] <- 'WILDFIRE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'excessive snow'))] <- 'HEAVY SNOW'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'light snow'))] <- 'WINTER STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'squall'))] <- 'WINTER STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'snow'))] <- 'WINTER STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'wild/forest fire'))] <- 'WILD/FOREST FIRE'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'winter weather'))] <- 'WINTER WEATHER'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'winter storm'))] <- 'WINTER STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'hyperthermia/exposure'))] <- 'WINTER STORM'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'wintry mix'))] <- 'WINTER WEATHER'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'icy roads'))] <- 'WINTER WEATHER'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'mixed precipitation'))] <- 'WINTER WEATHER'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'mixed precip'))] <- 'WINTER WEATHER'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'other'))] <- 'OTHER'
storm_cleaned_df$CLEAN_EVTYPE[which(str_detect(
    storm_cleaned_df$CLEAN_EVTYPE, 'marine accident'))] <- 'OTHER'

# verify that no lower-case event types remain, i.e. that every event type
# has been reclassified to an upper-case official name
ev_type_clean <- storm_cleaned_df$CLEAN_EVTYPE[
    (grepl("[a-z]", storm_cleaned_df$CLEAN_EVTYPE))]

# if (length(ev_type_clean) == 0) {
#     print("All Event Types have been corrected")
# } else {
#     print("All Event Types have not yet been corrected")
# }
#################################
#################################
# function for converting crop and property damage to dollars using the
# exponent code (K = thousands, M = millions, B = billions, either case);
# any other code leaves the value unchanged. It is vectorised, so it can be
# applied to whole columns instead of looping over every row.
calc_dmg <- function(dmg, exp) {
    multiplier <- c(K = 1e3, M = 1e6, B = 1e9)[toupper(exp)]
    multiplier[is.na(multiplier)] <- 1
    unname(dmg * multiplier)
}

# add the two new dollar-valued damage columns to the df
storm_cleaned_df <- storm_cleaned_df %>%
    mutate(CLEAN_PROPDMG = calc_dmg(PROPDMG, PROPDMGEXP),
           CLEAN_CROPDMG = calc_dmg(CROPDMG, CROPDMGEXP))

# create the sums df
sum_df <- storm_cleaned_df %>%
    group_by(CLEAN_EVTYPE) %>%
    summarise(Property_Damage = sum(CLEAN_PROPDMG) / 1000000000,
              Crop_Damage = sum(CLEAN_CROPDMG) / 1000000000,
              Fatalities = sum(FATALITIES),
              Injuries = sum(INJURIES))
#################################
#################################
# create the property damage df
prop_sum_df <- sum_df %>%
    arrange(desc(Property_Damage)) %>%
    head(n = 10)

# create the display table for property damage
prop_sum_top_ten_tbl <- prop_sum_df[-c(3, 4, 5)]
colnames(prop_sum_top_ten_tbl) <- c("Event Type", "Property Damage")

# clean up the plot names
prop_sum_df$CLEAN_EVTYPE <-
    c("Flood", "Hurricane", "Surge/Tide", "Tornado", "Flash Flood",
      "Hail", "TStorm Wind", "Trop Storm", "High Wind", "Wildfire")

# set the property damage df up for plotting
prop_sum_df$CLEAN_EVTYPE <- factor(prop_sum_df$CLEAN_EVTYPE,
                                   levels=prop_sum_df$CLEAN_EVTYPE)
prop_sum_df$Cum_Sum <- cumsum(prop_sum_df$Property_Damage)

#################################
# create the crop damage df
crop_sum_df <- sum_df %>%
    arrange(desc(Crop_Damage)) %>%
    head(n = 10)

# create the display table for crop damage
crop_sum_top_ten_tbl <- crop_sum_df[-c(2, 4, 5)]
colnames(crop_sum_top_ten_tbl) <- c("Event Type", "Crop Damage")

# clean up the plot names
crop_sum_df$CLEAN_EVTYPE <-
    c("Drought", "Hurricane", "Flood", "Hail", "Flash Flood",
      "Cold/WChill", "Freeze", "TStorm Wind", "Heavy Rain", "Trop Storm")

# set the crop damage df up for plotting
crop_sum_df$CLEAN_EVTYPE <- factor(crop_sum_df$CLEAN_EVTYPE,
                                   levels=crop_sum_df$CLEAN_EVTYPE)
crop_sum_df$Cum_Sum <- cumsum(crop_sum_df$Crop_Damage)

#################################
# create the fatalities df
fatalities_sum_df <- sum_df %>%
    arrange(desc(Fatalities)) %>%
    head(n = 10)

# create the display table for fatalities
fatalities_sum_top_ten_tbl <- fatalities_sum_df[-c(2, 3, 5)]
colnames(fatalities_sum_top_ten_tbl) <- c("Event Type", "Fatalities")

# clean up the plot names (same order as fatalities_sum_top_ten_tbl)
fatalities_sum_df$CLEAN_EVTYPE <-
    c("Excess Heat", "Tornado", "Flash Flood", "Lightning", "Rip Current",
      "Flood", "TStorm Wind", "Cold/WChill", "Heat", "High Wind")

# set the fatalities df up for plotting
fatalities_sum_df$CLEAN_EVTYPE <- factor(fatalities_sum_df$CLEAN_EVTYPE,
                                   levels=fatalities_sum_df$CLEAN_EVTYPE)
fatalities_sum_df$Cum_Sum <- cumsum(fatalities_sum_df$Fatalities)

#################################
# create the injuries df
injuries_sum_df <- sum_df %>%
    arrange(desc(Injuries)) %>%
    head(n = 10)

# create the display table for injuries
injuries_sum_top_ten_tbl <- injuries_sum_df[-c(2, 3, 4)]
colnames(injuries_sum_top_ten_tbl) <- c("Event Type", "Injuries")

# clean up the plot names (same order as injuries_sum_top_ten_tbl)
injuries_sum_df$CLEAN_EVTYPE <-
    c("Tornado", "Flood", "Excess Heat", "TStorm Wind", "Lightning",
      "Flash Flood", "Winter Storm", "Hurricane", "Heat", "High Wind")

# set the injuries df up for plotting
injuries_sum_df$CLEAN_EVTYPE <- factor(injuries_sum_df$CLEAN_EVTYPE,
                                   levels=injuries_sum_df$CLEAN_EVTYPE)
injuries_sum_df$Cum_Sum <- cumsum(injuries_sum_df$Injuries)

#################################
#################################
# create the table information and graph 80% lines
prop_dmg_tot <- round(sum(storm_cleaned_df$CLEAN_PROPDMG) / 1000000000)
crop_dmg_tot <- round(sum(storm_cleaned_df$CLEAN_CROPDMG) / 1000000000)
fatalities_tot <- round(sum(storm_cleaned_df$FATALITIES))
injuries_tot <- round(sum(storm_cleaned_df$INJURIES))
prop_dmg_80 <- round(prop_dmg_tot * .8)
crop_dmg_80 <- round(crop_dmg_tot * .8)
fatalities_80 <- round(fatalities_tot * .8)
injuries_80 <- round(injuries_tot * .8)
prop_top_ten <- round(prop_sum_df$Cum_Sum[10])
crop_top_ten <- round(crop_sum_df$Cum_Sum[10])
fatalities_top_ten <- round(fatalities_sum_df$Cum_Sum[10])
injuries_top_ten <- round(injuries_sum_df$Cum_Sum[10])

# create table showing counts
results <- matrix( c(prop_dmg_tot, prop_top_ten, prop_dmg_80,
                     crop_dmg_tot, crop_top_ten, crop_dmg_80,
                     fatalities_tot, fatalities_top_ten, fatalities_80,
                     injuries_tot, injuries_top_ten, injuries_80),
                   3, 4)
rownames(results) <- c("Total", "Top Ten", "80 %")
colnames(results) <-
    c("Property Damage", "Crop Damage", "Fatalities", "Injuries")
#################################
#################################
# print two plots stacked vertically (one column, two rows) on one page
print_plots <- function (plot1, plot2) {
    cols <- 1
    numPlots <- 2
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    
    matchidx <- as.data.frame(which(layout == 1, arr.ind = TRUE))
    print(plot1, vp = viewport(layout.pos.row = matchidx$row,
                               layout.pos.col = matchidx$col))
    
    matchidx <- as.data.frame(which(layout == 2, arr.ind = TRUE))
    print(plot2, vp = viewport(layout.pos.row = matchidx$row,
                               layout.pos.col = matchidx$col))
}

#################################
# create the plot for property damage
prop_plot <- ggplot(prop_sum_df, aes(x=CLEAN_EVTYPE)) +
    geom_bar(aes(y=Property_Damage), fill="blue", stat="identity") +
    geom_point(aes(y=Cum_Sum)) +
    geom_path(aes(y=Cum_Sum, group=1)) +
    theme(axis.text.x=element_text(angle = 90, vjust = 0.5)) +
    labs(y = "Property Damage",
         x = "Event Type",
         title=paste("Property and Crop Damage (in Billions)", "\n")) +
    geom_hline(yintercept = prop_dmg_80, colour = "red")

# create the plot for crop damage
crop_plot <- ggplot(crop_sum_df, aes(x=CLEAN_EVTYPE)) +
    geom_bar(aes(y=Crop_Damage), fill="blue", stat="identity") +
    geom_point(aes(y=Cum_Sum)) +
    geom_path(aes(y=Cum_Sum, group=1)) +
    theme(axis.text.x=element_text(angle = 90, vjust = 0.5)) +
    labs(y = "Crop Damage",
         x = "Event Type") +
    geom_hline(yintercept = crop_dmg_80, colour = "red")

#################################
# plot the fatalities
fatalities_plot <- ggplot(fatalities_sum_df, aes(x=CLEAN_EVTYPE)) +
    geom_bar(aes(y=Fatalities), fill="blue", stat="identity") +
    geom_point(aes(y=Cum_Sum)) +
    geom_path(aes(y=Cum_Sum, group=1)) +
    theme(axis.text.x=element_text(angle = 90, vjust = 0.5)) +
    labs(y = "Fatalities",
         x = "Event Type",
         title=paste("Fatalities and Injuries", "\n")) +
    geom_hline(yintercept = fatalities_80, colour = "red")

# plot the injuries
injuries_plot <- ggplot(injuries_sum_df, aes(x=CLEAN_EVTYPE)) +
    geom_bar(aes(y=Injuries), fill="blue", stat="identity") +
    geom_point(aes(y=Cum_Sum)) +
    geom_path(aes(y=Cum_Sum, group=1)) +
    theme(axis.text.x=element_text(angle = 90, vjust = 0.5)) +
    labs(y = "Injuries",
         x = "Event Type") +
    geom_hline(yintercept = injuries_80, colour = "red")