Synopsis

This analysis will review storm damage for weather events across the United States focusing on harm to the population and economic cost to both property and crop. The data will consist of storm data provided by the National Oceanic & Atmospheric Administration (NOAA). Each recorded event will consist of an event type, injury count, fatality count, property damage cost, and crop damage cost. The report will show the storms with the most harm to the population which include fatalities and injuries with a breakdown of each harmful type. On the economic side, the report will show the top storm events costing the most damage for both property and crops. The report will use the “NOAA National Weather Service Insturction 10-1605”" on Storm Data Preparation as a guideline for grouping weather events and calculating damage cost.

Requirements

This section with outline the necessary requirements in order to replicate and reproduce the report. The outline will describe the packages used and version of R used to generate the report.

Below are the list of R libraries used in the analysis:

library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(mice)
library(VIM)
library(grid)
library(gridExtra)

Data Processing

The data file used in the analysis is a CSV file compressed as a BZ2 file. The R function read.csv will automatically decompress the file and open it as a CSV file. Due to the size of the file, the time to open and read the data will depend on the machine memory and processor. Our execution took one minute to complete.

data <- read.csv("./StormData.csv.bz2", sep = ",", header = TRUE, fill = TRUE)

The data was first filtered by states, any state not directly a member of the United States was removed from the analysis.

The next step was to clean the Event Type data values. After completing a quick analysis on the data values, several key items are discovered. First, there are more event types then listed in the NOAA document. Second, the values seem to be manually entered based on the best guess by the observer. This caused the data values to contain missing spellings or similiar events defined with different values. To clean the Event Type values, the values were uppercased and the spaces removed. Then, the data was grouped under the researcher’s best guess based on the NOAA document definition. In order to achieve more accurate results based on weather event, the researcher believed this was a necessary step. Note, this is an area for others to review and move events to other groupings.

filteredData = data %>% 
  filter(STATE %in% state.abb) %>%
  mutate(EVTYPE = toupper(gsub(" ", "", EVTYPE, fixed = FALSE)), PROPDMGEXP = toupper(PROPDMGEXP), CROPDMGEXP = toupper(CROPDMGEXP)) %>%
  mutate(NOAAEVTYPE = ifelse(EVTYPE == "AVALANCE", "AVALANCHE",
                  ifelse(EVTYPE %in% c("COASTALFLOODING","COASTAL FLOODING"), "COASTALFLOOD",
                  ifelse(EVTYPE %in% c("COLD/WINDS","COLD","COLDTEMPERATURE","COLDWEATHER",
                                       "RECORDCOLD","EXTENDEDCOLD"), "COLD/WINDCHILL",
                  ifelse(EVTYPE %in% c("EXTREMECOLD","EXTREMEWINDCHILL"), "EXTREMECOLD/WINDCHILL",
                  ifelse(EVTYPE %in% c("FLASHFLOODING","FLOOD/FLASHFLOOD","FLASHFLOOD/FLOOD","FLASHFLOODS",
                                       "FLASHFLOODING/FLOOD","FLASHFLOODING/FLOOD"), "FLASHFLOOD",
                  ifelse(EVTYPE %in% c("URBANANDSMALLSTREAMFLOODIN","RIVERFLOODING"),"FLOOD",
                  ifelse(EVTYPE %in% c("FOG"),"DENSEFOG",
                  ifelse(EVTYPE %in% c("FOGANDCOLDTEMPERATURES","GLAZE","GLAZE/ICESTORM"),"FREEZINGFOG",
                  ifelse(EVTYPE %in% c("FREEZE","FREEZINGRAIN","FREEZINGDRIZZLE",
                                       "FREEZINGSPRAY","FREEZINGRAIN/SNOW"),"FROST/FREEZE",
                  ifelse(EVTYPE %in% c("HEAT","EXTREMEHEAT","HEATWAVE","HEATWAVES",
                                       "UNSEASONABLYWARMANDDRY","WARMWEATHER","RECORDHEAT",
                                       "RECORD/EXCESSIVEHEAT","HEATWAVEDROUGHT"),"EXCESSIVEHEAT",
                  ifelse(EVTYPE %in% c("SMALLHAIL"),"HAIL",
                  ifelse(EVTYPE %in% c("HEAVYRAINS"),"HEAVYRAIN",
                  ifelse(EVTYPE %in% c("HEAVYSNOWANDHIGHWINDS"),"HEAVYSNOW",
                  ifelse(EVTYPE %in% c("HEAVYSURF/HIGHSURF"),"HEAVYSURFANDWIND",
                  ifelse(EVTYPE %in% c("HIGHWIND","HIGHWIND48"),"HIGHWINDS",
                  ifelse(EVTYPE %in% c("HURRICANE","HURRICANEEMILY","HURRICANEEDOUARD",
                                       "HURRICANE-GENERATEDSWELLS",
                                       "HURRICANEOPAL/HIGHWINDS","HURRICANEFELIX","HURRICANEOPAL",
                                       "HURRICANEERIN"),"HURRICANE/TYPHOON",
                  ifelse(EVTYPE %in% c("HYPERTHERMIA/EXPOSURE"),"HYPOTHERMIA",
                  ifelse(EVTYPE %in% c("ICEONROAD","SLEET"),"ICEROADS",
                  ifelse(EVTYPE %in% c("LIGHTNING.","LIGHTNINGINJURY"),"LIGHTNING",
                  ifelse(EVTYPE %in% c("MUDSLIDE"),"MUDSLIDES",
                  ifelse(EVTYPE %in% c("THUNDERTORMWINDS","THUNDERSTORMWIND","THUNDERSTORMW",
                                       "THUNDERSTORMWINDS/HAIL",
                                       "THUNDERSTORMWINDS13","THUNDERSTORMWINDSS","THUNDERSTORMWINDG52",
                                       "THUNDERSTORMWIND(G40)","TSTMWIND(G35)","TSTMWIND(G40)","TSTMWIND(G45)",
                                       "LIGHTNINGANDTHUNDERSTORMWIN","TSTMWIND",
                                       "THUNDERSTORM"),"THUNDERSTORMWINDS",
                  ifelse(EVTYPE %in% c("HIGHWIND/HEAVYSNOW","HEAVYSNOWSHOWER","SNOW/BITTERCOLD","EXCESSIVESNOW",
                                       "LIGHTSNOW","SNOWSQUALL","SNOWSQUALLS"),"SNOW",
                  ifelse(EVTYPE %in% c("MARINEMISHAP"),"MARINEACCIDENT",
                  ifelse(EVTYPE %in% c("RIPCURRENT","RIPCURRENTS/HEAVYSURF"),"RIPCURRENTS",
                  ifelse(EVTYPE %in% c("RIVERFLOOD"),"RIVERFLOODING",
                  ifelse(EVTYPE %in% c("TORNADOES,TSTMWIND,HAIL","TORNADOF3","TORNADOF2"),"TORNADO",
                  ifelse(EVTYPE %in% c("TROPICALSTORMGORDON"),"TROPICALSTORM",
                  ifelse(EVTYPE %in% c("WILD/FORESTFIRE","WILDFIRES"),"WILDFIRE",
                  ifelse(EVTYPE %in% c("WATERSPOUT/TORNADO","WATERSPOUTTORNADO"),"WATERSPOUT",
                  ifelse(EVTYPE %in% c("WINDS","WIND","GUSTYWIND","GUSTYWINDS","NONTSTMWIND",
                                       "STRONGWIND","STRONGWINDS"),"WINDSTORM",
                  ifelse(EVTYPE %in% c("WINTERSTORMS","WINTERSTORMHIGHWINDS"),"WINTERSTORM",
                  ifelse(EVTYPE %in% c("WINTRYMIX","WINTERWEATHERMIX","SNOWANDICE",
                                       "WINTERWEATHER/MIX"),"WINTERWEATHER",
                  EVTYPE)))))))))))))))))))))))))))))))))

After cleaning the data, the column containing harmful counts which is the sum of fatalities and injuries will be used later in the analysis.

report = filteredData %>% mutate(HARMFUL = FATALITIES + INJURIES)

Results

Population Impact

For events causing harm to the population, group the data by Event Type and summarize the data for number of events, number of injuries, fatalities and total harm (fatalities + injuries).

report.summary = report %>% 
  group_by(EVENT_TYPE = as.factor(NOAAEVTYPE)) %>% 
  summarize(NUMBER_OF_EVENTS = n(), 
            MEAN = mean(HARMFUL, na.rm = TRUE), 
            TOTAL = sum(HARMFUL, na.rm = TRUE),
            INJURIES = sum(INJURIES, na.rm = TRUE),
            FATALITIES = sum(FATALITIES, na.rm = TRUE)) %>%
  mutate(PER_EVENT = TOTAL / NUMBER_OF_EVENTS) %>%
  arrange(desc(TOTAL))

Now select only the top 20 for reporting and charts. Note: For per event counts, in order to make the data meaningful, records which had 30 or more events where selected.

topCount = 20
report.summary.topHarmful <- report.summary[1:topCount,]
report.summary.topInjuries <- arrange(report.summary, desc(INJURIES))[1:topCount,]
report.summary.topFatalities <- arrange(report.summary, desc(FATALITIES))[1:topCount,]
report.summary.topNumberOfEvents <- arrange(report.summary, desc(NUMBER_OF_EVENTS))[1:topCount,]

report.summary.topPerEvent <- (report.summary %>% filter(NUMBER_OF_EVENTS > 30) %>% arrange(desc(PER_EVENT)))[1:topCount,]

Below is a chart showing the top storms causing the most harm to the population based on injury and fatalities counts.

populationHealthChart <- ggplot(data = report.summary.topHarmful,
                                aes(x=reorder(EVENT_TYPE,TOTAL), y=TOTAL, 
                                    fill=colorRampPalette(brewer.pal(9,"Set1"))(topCount))) + 
  geom_bar(stat="identity") +
  geom_text(aes(label=TOTAL), position=position_dodge(width=0.5), vjust=0.50, hjust="inward") +
  coord_cartesian(ylim=c(0,200000)) + coord_flip() +
  labs(y="Injuries or Fatalities to Population", x="Type of Event",
       title="Most Harmful with Respect to Population Health", subtitle="(Top Twenty)") +
  theme(legend.position = "none", 
        plot.title = element_text(hjust = 0.5, face = "bold"), 
        plot.subtitle = element_text(hjust = 0.5, face = "bold")) +
  theme(axis.title = element_text(face="bold")) 

print(populationHealthChart)

The report shows Tornados with the largest number of fatalities and injuries combined across the United States. This is probably expected due to the large number of sever weather events across the Southern and Mid-West states. Next question to ask will this event hold true for fatalities and injuries.

Top events with the largest fatalities:

report.summary.topFatalities %>% select(EVENT_TYPE, NUMBER_OF_EVENTS, FATALITIES)
## # A tibble: 20 x 3
##               EVENT_TYPE NUMBER_OF_EVENTS FATALITIES
##                   <fctr>            <int>      <dbl>
##  1               TORNADO            60641       5658
##  2         EXCESSIVEHEAT             2613       3143
##  3            FLASHFLOOD            54623        994
##  4             LIGHTNING            15702        807
##  5     THUNDERSTORMWINDS           323294        703
##  6           RIPCURRENTS              668        523
##  7                 FLOOD            24983        467
##  8 EXTREMECOLD/WINDCHILL             1864        303
##  9             HIGHWINDS            21712        283
## 10             AVALANCHE              387        225
## 11           WINTERSTORM            11419        216
## 12             WINDSTORM             4191        140
## 13        COLD/WINDCHILL              695        140
## 14             HEAVYSNOW            15701        129
## 15     HURRICANE/TYPHOON              237        113
## 16              BLIZZARD             2716        101
## 17              WILDFIRE             4191         90
## 18              ICESTORM             2004         89
## 19              HIGHSURF              686         87
## 20              DENSEFOG             1821         80

Top events with the largest injuries:

report.summary.topInjuries %>% select(EVENT_TYPE, NUMBER_OF_EVENTS, INJURIES)
## # A tibble: 20 x 3
##           EVENT_TYPE NUMBER_OF_EVENTS INJURIES
##               <fctr>            <int>    <dbl>
##  1           TORNADO            60641    91364
##  2 THUNDERSTORMWINDS           323294     9369
##  3     EXCESSIVEHEAT             2613     8910
##  4             FLOOD            24983     6787
##  5         LIGHTNING            15702     5213
##  6          ICESTORM             2004     1975
##  7        FLASHFLOOD            54623     1790
##  8          WILDFIRE             4191     1605
##  9         HIGHWINDS            21712     1437
## 10              HAIL           288667     1371
## 11       WINTERSTORM            11419     1353
## 12          DENSEFOG             1821     1076
## 13         HEAVYSNOW            15701     1021
## 14 HURRICANE/TYPHOON              237      973
## 15          BLIZZARD             2716      803
## 16     WINTERWEATHER             8248      616
## 17       RIPCURRENTS              668      471
## 18         DUSTSTORM              422      440
## 19         WINDSTORM             4191      386
## 20     TROPICALSTORM              619      381

Top events with the largest Per Event Harm to the Population:

report.summary.topPerEvent %>% select(EVENT_TYPE, NUMBER_OF_EVENTS, TOTAL, PER_EVENT)
## # A tibble: 20 x 4
##               EVENT_TYPE NUMBER_OF_EVENTS TOTAL PER_EVENT
##                   <fctr>            <int> <dbl>     <dbl>
##  1         EXCESSIVEHEAT             2613 12053 4.6127057
##  2     HURRICANE/TYPHOON              237  1086 4.5822785
##  3           FREEZINGFOG               91   240 2.6373626
##  4                   ICE               61   143 2.3442623
##  5               TORNADO            60641 97022 1.5999406
##  6           RIPCURRENTS              668   994 1.4880240
##  7              ICYROADS               32    36 1.1250000
##  8             DUSTSTORM              422   462 1.0947867
##  9              ICESTORM             2004  2064 1.0299401
## 10             AVALANCHE              387   395 1.0206718
## 11         TROPICALSTORM              619   440 0.7108239
## 12              DENSEFOG             1821  1156 0.6348160
## 13             HEAVYSURF               85    48 0.5647059
## 14              WILDFIRE             4191  1695 0.4044381
## 15             LIGHTNING            15702  6020 0.3833907
## 16      HEAVYSURFANDWIND              192    69 0.3593750
## 17              HIGHSURF              686   230 0.3352770
## 18              BLIZZARD             2716   904 0.3328424
## 19 EXTREMECOLD/WINDCHILL             1864   563 0.3020386
## 20             DUSTDEVIL              149    45 0.3020134

Tornados remained the top event for the most fatalities and injuries to the population while Excessive Heat was the largest Per Event totals.

Economic Impact to Crops and Properties

All economic dollar values are stored with a multiplier

PROPDMG and CROPDMG contain the value
PROPRDMGEXP and CROPDMGEXP contain the multipler with the following reference
   H - hundreds  multiplier $100
   K - thousands multiplier $1000
   M - millions multiplier $1,000,000
   B - Billions multiplier $1,000,000,000

   All other multipliers will be ignored due to lack of documenation and not a
   significant number of recorded events to change the data outcome.

The first step is to make all dollar values for crop and property damage to be in the same monetary unit.

hundred = 100
thousand = 1000
million = 1000000
billion = 1000000000

economic = filteredData %>% filter(PROPDMGEXP %in% c("H","K","M","B") | CROPDMGEXP %in% c("H","K","M","B")) %>%
  mutate(PROPERTY_COST = ifelse(PROPDMGEXP == "H", PROPDMG * hundred,
                         ifelse(PROPDMGEXP == "K", PROPDMG * thousand,
                         ifelse(PROPDMGEXP == "M", PROPDMG * million,
                         ifelse(PROPDMGEXP == "B", PROPDMG * billion, 0)))),
         CROP_COST = ifelse(CROPDMGEXP == "H", CROPDMG * hundred,
                     ifelse(CROPDMGEXP == "K", CROPDMG * thousand,
                     ifelse(CROPDMGEXP == "M", CROPDMG * million,
                     ifelse(CROPDMGEXP == "B", CROPDMG * billion, 0))))
        )

For events causing damage to crops and property, group the data by Event Type and summarize the data by cost of damage to crops, damage to properties and total cost damage (Property Cost + Crop Cost).

economic.summary = economic %>%
  group_by(EVENT_TYPE = as.factor(NOAAEVTYPE)) %>%
  summarize(NUMBER_OF_EVENTS = n(),
            PROPERTY_MEAN = mean(PROPERTY_COST, na.rm = TRUE),
            PROPERTY_DAMAGE = sum(PROPERTY_COST, na.rm = TRUE),
            CROP_MEAN = mean(CROP_COST, na.rm = TRUE),
            CROP_DAMAGE = sum(CROP_COST, na.rm = TRUE),
            TOTAL_DAMAGE = sum(PROPERTY_COST + CROP_COST))

Summarize the data for top total cost, crop cost and property cost.

economic.summary.property = economic.summary %>%
  select(EVENT_TYPE, DAMAGE_COST = PROPERTY_DAMAGE) %>% 
  mutate(DAMAGE_COST_TYPE = "Property Damage", DAMAGE_COST = DAMAGE_COST / billion) %>%
  arrange(desc(DAMAGE_COST))
economic.summary.property.top = economic.summary.property[1:topCount,]

economic.summary.crop = economic.summary %>%
  select(EVENT_TYPE, DAMAGE_COST = CROP_DAMAGE) %>% 
  mutate(DAMAGE_COST_TYPE = "Crop Damage", DAMAGE_COST = DAMAGE_COST / billion) %>%
  arrange(desc(DAMAGE_COST))
economic.summary.crop.top = economic.summary.crop[1:topCount,]

economic.summary.total = economic.summary %>%
  select(EVENT_TYPE, DAMAGE_COST = TOTAL_DAMAGE) %>% 
  mutate(DAMAGE_COST_TYPE = "Total Damage", DAMAGE_COST = DAMAGE_COST / billion) %>%
  arrange(desc(DAMAGE_COST))
economic.summary.total.top = economic.summary.total[1:topCount,]

Below is a chart showing the top highest events impacting cost on properties and crops.

economicChart <- ggplot(data = economic.summary.total.top,
                                aes(x=reorder(EVENT_TYPE,DAMAGE_COST), y=DAMAGE_COST,
                                    fill=colorRampPalette(brewer.pal(9,"Set1"))(topCount))) + 
  geom_bar(stat="identity") +
  geom_text(aes(label=format(round(DAMAGE_COST, digits=2), big.mark = ",", scientific = FALSE)), 
            position=position_dodge(width=0.5), vjust=0.50, hjust="inward") +
  coord_flip() +
  labs(y="Cost in Dollars (Billions)", x="Type of Event",
       title="Highest Economic Damage Cost") +
  theme(legend.position = "none", 
        plot.title = element_text(hjust = 0.5, face = "bold"), 
        plot.subtitle = element_text(hjust = 0.5, face = "bold")) +
  theme(axis.title = element_text(face="bold")) 

print(economicChart)

The report shows Floods with the highest cost (in billions) for damage to properties and crops combined across the United States. Tornados continue to show up in the top 10 due to the same factors found in comparing harm to the population.

Top events with the highest cost impact for Properties:

economic.summary.property.top %>% select(EVENT_TYPE, DAMAGE_COST)
## # A tibble: 20 x 2
##                 EVENT_TYPE DAMAGE_COST
##                     <fctr>       <dbl>
##  1                   FLOOD 144.6474628
##  2       HURRICANE/TYPHOON  82.5286190
##  3                 TORNADO  58.5390135
##  4              STORMSURGE  43.3234610
##  5              FLASHFLOOD  16.6312325
##  6                    HAIL  15.7323302
##  7       THUNDERSTORMWINDS   9.6962257
##  8                WILDFIRE   8.4844715
##  9           TROPICALSTORM   7.4756215
## 10             WINTERSTORM   6.7489922
## 11               HIGHWINDS   5.8765009
## 12           RIVERFLOODING   5.1189420
## 13         STORMSURGE/TIDE   4.6400380
## 14                ICESTORM   3.9449078
## 15 HEAVYRAIN/SEVEREWEATHER   2.5000000
## 16      SEVERETHUNDERSTORM   1.2053600
## 17                 DROUGHT   1.0411060
## 18               HEAVYSNOW   0.9325541
## 19               LIGHTNING   0.9276158
## 20               HEAVYRAIN   0.7022801

Top events with the highest cost impact for Crops:

economic.summary.crop.top %>% select(EVENT_TYPE, DAMAGE_COST)
## # A tibble: 20 x 2
##               EVENT_TYPE DAMAGE_COST
##                   <fctr>       <dbl>
##  1               DROUGHT  13.9723610
##  2                 FLOOD   5.6419935
##  3         RIVERFLOODING   5.0294590
##  4              ICESTORM   5.0221135
##  5     HURRICANE/TYPHOON   4.9589408
##  6                  HAIL   3.0467474
##  7          FROST/FREEZE   1.5509110
##  8            FLASHFLOOD   1.5177851
##  9 EXTREMECOLD/WINDCHILL   1.3300230
## 10     THUNDERSTORMWINDS   1.1605896
## 11         EXCESSIVEHEAT   0.9044635
## 12             HEAVYRAIN   0.7937898
## 13             HIGHWINDS   0.6790179
## 14         TROPICALSTORM   0.5744950
## 15               TORNADO   0.4174521
## 16              WILDFIRE   0.4016046
## 17        DAMAGINGFREEZE   0.2962300
## 18      EXCESSIVEWETNESS   0.1420000
## 19             HEAVYSNOW   0.1346531
## 20      FLOOD/RAIN/WINDS   0.1128000

It is ironic that both Floods and Droughts had the highest costly impact due to weather events across the United States.