Synopsis

This report shows how severe weather events recorded between 1950 and November 2011 have affected population health and the economy in the United States. This information is useful for prioritising resources across the different types of events. Population health was measured by injuries and fatalities, while economic consequences were measured by the cost of damage to property and crops. It was found that tornadoes, thunderstorm winds, excessive heat, floods and lightning caused the most fatalities and injuries. Floods (including flash floods), hurricanes/typhoons, hail and thunderstorm winds had the largest economic consequences.

Loading and Processing the Raw Data

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database tracks characteristics of major storms and weather events in the United States, including when and where they occur.

Loading the data

The data was in the form of a comma-separated-value file compressed via the bzip2 algorithm. It was loaded directly from the compressed file, since read.csv can read bzip2 files without a separate decompression step.

        library(dplyr)
        stormdata <- tbl_df(read.csv("repdata_data_StormData.csv.bz2"))
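The direct read from a bzip2 file can be checked on a small scale. The following is a minimal sketch in base R, assuming a made-up two-row table (`toy`) and a temporary file; neither is part of the storm dataset.

```r
# Write a tiny, made-up CSV through a bzip2 connection, then read it back.
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))

con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses on the fly.
roundtrip <- read.csv(tmp, stringsAsFactors = FALSE)
print(roundtrip)

unlink(tmp)
```

The same call pattern is what allows the storm data file to be read without unpacking it first.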

Data Processing

The number of rows, columns and structure of the dataset were checked, as well as the first and last few rows to ensure that the data was loaded correctly.

        dim(stormdata)
## [1] 902297     37
        str(stormdata)
## Classes 'tbl_df', 'tbl' and 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
        head(stormdata)
## # A tibble: 6 x 37
##   STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
##     <dbl> <fct>    <fct>    <fct>      <dbl> <fct>      <fct> <fct> 
## 1       1 4/18/19~ 0130     CST           97 MOBILE     AL    TORNA~
## 2       1 4/18/19~ 0145     CST            3 BALDWIN    AL    TORNA~
## 3       1 2/20/19~ 1600     CST           57 FAYETTE    AL    TORNA~
## 4       1 6/8/195~ 0900     CST           89 MADISON    AL    TORNA~
## 5       1 11/15/1~ 1500     CST           43 CULLMAN    AL    TORNA~
## 6       1 11/15/1~ 2000     CST           77 LAUDERDALE AL    TORNA~
## # ... with 29 more variables: BGN_RANGE <dbl>, BGN_AZI <fct>,
## #   BGN_LOCATI <fct>, END_DATE <fct>, END_TIME <fct>, COUNTY_END <dbl>,
## #   COUNTYENDN <lgl>, END_RANGE <dbl>, END_AZI <fct>, END_LOCATI <fct>,
## #   LENGTH <dbl>, WIDTH <dbl>, F <int>, MAG <dbl>, FATALITIES <dbl>,
## #   INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <fct>, CROPDMG <dbl>,
## #   CROPDMGEXP <fct>, WFO <fct>, STATEOFFIC <fct>, ZONENAMES <fct>,
## #   LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>, LONGITUDE_ <dbl>,
## #   REMARKS <fct>, REFNUM <dbl>
        tail(stormdata)
## # A tibble: 6 x 37
##   STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
##     <dbl> <fct>    <fct>    <fct>      <dbl> <fct>      <fct> <fct> 
## 1      47 11/28/2~ 03:00:0~ CST           21 TNZ001>00~ TN    WINTE~
## 2      56 11/30/2~ 10:30:0~ MST            7 WYZ007 - ~ WY    HIGH ~
## 3      30 11/10/2~ 02:48:0~ MST            9 MTZ009 - ~ MT    HIGH ~
## 4       2 11/8/20~ 02:58:0~ AKS          213 AKZ213     AK    HIGH ~
## 5       2 11/9/20~ 10:21:0~ AKS          202 AKZ202     AK    BLIZZ~
## 6       1 11/28/2~ 08:00:0~ CST            6 ALZ006     AL    HEAVY~
## # ... with 29 more variables: BGN_RANGE <dbl>, BGN_AZI <fct>,
## #   BGN_LOCATI <fct>, END_DATE <fct>, END_TIME <fct>, COUNTY_END <dbl>,
## #   COUNTYENDN <lgl>, END_RANGE <dbl>, END_AZI <fct>, END_LOCATI <fct>,
## #   LENGTH <dbl>, WIDTH <dbl>, F <int>, MAG <dbl>, FATALITIES <dbl>,
## #   INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <fct>, CROPDMG <dbl>,
## #   CROPDMGEXP <fct>, WFO <fct>, STATEOFFIC <fct>, ZONENAMES <fct>,
## #   LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>, LONGITUDE_ <dbl>,
## #   REMARKS <fct>, REFNUM <dbl>

The documentation provided with the data describes 48 different event types all in upper case. The number of unique event types in the database was checked.

        length(unique(stormdata$EVTYPE))
## [1] 985

The 985 unique values in the EVTYPE column show a huge variation in how information was entered for this variable. There seem to be misspellings, variations of the same name, combinations of several names, and mixtures of upper and lower case letters.
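To illustrate how much of this variation is purely cosmetic, here is a small sketch in base R. The vector `raw_names` is made up for illustration, and the helper `normalise_evtype` is hypothetical: it trims and collapses whitespace in addition to upper-casing, which goes slightly beyond the upper-casing step applied later in this report.

```r
# Made-up examples of the kinds of variation seen in EVTYPE.
raw_names <- c("TORNADO", "Tornado", "  TORNADO",
               "HEAT WAVE", "Heat Wave", "TSTM  WIND")

# Hypothetical helper: trim, upper-case, and collapse repeated whitespace.
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))
  gsub("[[:space:]]+", " ", x)
}

clean_names <- normalise_evtype(raw_names)

length(unique(raw_names))    # distinct spellings before normalising
length(unique(clean_names))  # distinct names after normalising
```

Normalising like this only removes case and spacing differences; synonyms such as TSTM WIND still need an explicit replacement.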

It was decided that some of these event type names should be replaced with the names in the documentation, but only those that might affect the results of the analysis. A description of what was done, and the justification for the changes made, are provided in the sections below.

Processing the data to determine how events affect Population Health

The FATALITIES and INJURIES variables were used to show how weather events affect population health. Since there were so many more event type names in the dataset compared to what was in the documentation provided, some checks were done to see how these ‘other’ names contribute to the numbers of fatalities and injuries.

Justification for replacing event names - checking injuries and fatalities

A vector with the 48 event types described in the documentation was created. The raw data was then filtered to obtain those rows with event types that are NOT included in this vector. The proportions of fatalities and injuries attributed to these invalid event types were then calculated.

        # Create vector of valid event types
                event_types <- c("ASTRONOMICAL LOW TIDE", "AVALANCHE", "BLIZZARD", "COASTAL FLOOD", "COLD/WIND CHILL", "DEBRIS FLOW", "DENSE FOG", "DENSE SMOKE", "DROUGHT", "DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT", "EXTREME COLD/WIND CHILL", "FLASH FLOOD", "FLOOD", "FREEZING FOG", "FROST/FREEZE", "FUNNEL CLOUD", "HAIL", "HEAT", "HEAVY RAIN", "HEAVY SNOW", "HIGH SURF", "HIGH WIND", "HURRICANE/TYPHOON", "ICE STORM", "LAKESHORE FLOOD", "LAKE-EFFECT SNOW", "LIGHTNING", "MARINE HAIL", "MARINE HIGH WIND", "MARINE STRONG WIND", "MARINE THUNDERSTORM WIND", "RIP CURRENT", "SEICHE", "SLEET", "STORM TIDE", "STRONG WIND", "THUNDERSTORM WIND", "TORNADO", "TROPICAL DEPRESSION", "TROPICAL STORM", "TSUNAMI", "VOLCANIC ASH", "WATERSPOUT", "WILDFIRE", "WINTER STORM", "WINTER WEATHER")

        # Filter invalid event types from the data
                invEvents <- filter(stormdata, !(EVTYPE %in% event_types))
        
        # Calculate how much invalid event types contribute to the total fatalities and injuries
                sum(invEvents$FATALITIES)/sum(stormdata$FATALITIES)
## [1] 0.1344998
                sum(invEvents$INJURIES)/sum(stormdata$INJURIES)
## [1] 0.08943413

About 13% of fatalities and 9% of injuries can be attributed to events with names not in the documentation. These amounts are large enough to justify replacing some of these names with the documented names.

Description of the selection process for replacing event names

It may not be necessary to replace every invalid event type. Some analysis was done to determine which ones should be replaced.

        # Filter rows with invalid event names and injuries not equal to 0
                storm_i <- filter(stormdata, !(EVTYPE %in% event_types) & INJURIES != 0)

        # Sum injuries for each event type, and arrange in descending order
                isum <- storm_i %>%
                        group_by(EVTYPE) %>%
                        summarise (inj_sum = sum(INJURIES)) %>%
                        arrange(desc(inj_sum))
                head(isum, 20)
## # A tibble: 20 x 2
##    EVTYPE               inj_sum
##    <fct>                  <dbl>
##  1 TSTM WIND               6957
##  2 THUNDERSTORM WINDS       908
##  3 FOG                      734
##  4 WILD/FOREST FIRE         545
##  5 HEAT WAVE                309
##  6 HIGH WINDS               302
##  7 RIP CURRENTS             297
##  8 EXTREME COLD             231
##  9 GLAZE                    216
## 10 EXTREME HEAT             155
## 11 WILD FIRES               150
## 12 ICE                      137
## 13 TSTM WIND/HAIL            95
## 14 WIND                      86
## 15 URBAN/SML STREAM FLD      79
## 16 WINTRY MIX                77
## 17 WINTER WEATHER/MIX        72
## 18 Heat Wave                 70
## 19 WINTER WEATHER MIX        68
## 20 LANDSLIDE                 52
        # Checking proportion if injuries < 100 for invalid events
                lessthan100_i <- filter(isum, !(EVTYPE %in% event_types) & inj_sum < 100)
                sum(lessthan100_i$inj_sum)/sum(stormdata$INJURIES)
## [1] 0.01157776

Invalid event names with fewer than 100 injuries each contribute only about 1% of the total injuries, which is very small. A decision was therefore made to replace only those invalid event names with more than 100 injuries.

        # Filter rows where injuries are over 100
                morethan100_i <- filter(isum, !(EVTYPE %in% event_types) & inj_sum > 100)  
                
        # Select unique event names with over 100 injuries
                unique(morethan100_i$EVTYPE)
##  [1] TSTM WIND          THUNDERSTORM WINDS FOG               
##  [4] WILD/FOREST FIRE   HEAT WAVE          HIGH WINDS        
##  [7] RIP CURRENTS       EXTREME COLD       GLAZE             
## [10] EXTREME HEAT       WILD FIRES         ICE               
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

The raw dataset was copied to sd_edited and event names were replaced in this copied dataset.

        # Raw data copied
                sd_edited <- stormdata
        
        # Ensure all values in the EVTYPE variable are in upper case
                sd_edited$EVTYPE <- toupper(sd_edited$EVTYPE)
        
        # Replace event names
                sd_edited$EVTYPE <- gsub("^TSTM WIND$|^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", 
                                         sd_edited$EVTYPE)
                sd_edited$EVTYPE <- gsub("^FOG$", "DENSE FOG", sd_edited$EVTYPE)
                sd_edited$EVTYPE <- gsub("^WILD/FOREST FIRE$|^WILD FIRES$", "WILDFIRE", 
                                         sd_edited$EVTYPE)
                sd_edited$EVTYPE <- gsub("^HEAT WAVE$|^EXTREME HEAT$", "EXCESSIVE HEAT", 
                                         sd_edited$EVTYPE)
                sd_edited$EVTYPE <- gsub("^HIGH WINDS$|^STRONG WINDS$|^STRONG WIND$", "HIGH WIND", 
                                         sd_edited$EVTYPE)
                sd_edited$EVTYPE <- gsub("^RIP CURRENTS$", "RIP CURRENT", sd_edited$EVTYPE)
                sd_edited$EVTYPE <- gsub("^EXTREME COLD$", "EXTREME COLD/WIND CHILL", 
                                         sd_edited$EVTYPE)
                sd_edited$EVTYPE <- gsub("^GLAZE$", "FROST/FREEZE", sd_edited$EVTYPE)
                sd_edited$EVTYPE <- gsub("^ICE$", "ICE STORM", sd_edited$EVTYPE)
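As an alternative to chaining anchored gsub calls, exact-match replacements like these can be kept in a lookup table. The sketch below is base R only; `evtype_map` and `recode_evtype` are hypothetical names, and the table holds just a few of the replacements above for illustration.

```r
# Hypothetical lookup table: names are the variants, values the documented names.
evtype_map <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "FOG"                = "DENSE FOG",
  "RIP CURRENTS"       = "RIP CURRENT"
)

# Replace only exact matches; everything else is left untouched.
recode_evtype <- function(x, map) {
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

toy <- c("TSTM WIND", "FOG", "TORNADO", "TSTM WIND/HAIL")
recoded <- recode_evtype(toy, evtype_map)
print(recoded)
```

Because matching is exact, a combined name such as TSTM WIND/HAIL is not changed, which mirrors the effect of the `^...$` anchors in the gsub patterns.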

A similar check was done for FATALITIES.

        # Filter rows with invalid event names and fatalities are not 0
                storm_f <- filter(sd_edited, !(EVTYPE %in% event_types) & FATALITIES != 0)

        # Sum fatalities for each event type, and arrange in descending order
                fsum <- storm_f %>%
                        group_by(EVTYPE) %>%
                        summarise (fat_sum = sum(FATALITIES)) %>%
                        arrange(desc(fat_sum))

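        # Calculate contribution (note: the denominator below is total INJURIES,
        # not total FATALITIES, so the printed ratio is not a share of fatalities)
                sum(fsum$fat_sum)/sum(sd_edited$INJURIES)
## [1] 0.004974098

No further event names were replaced. The ratio printed above is below 1%, but because it divides by total injuries rather than by sum(sd_edited$FATALITIES), it understates the share of total fatalities attributable to invalid names.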

Processing the data to show how events affect the economy

The following variables were used to determine how events affect the economy:

  • PROPDMG - contains figures representing estimates of the damage done to property

  • PROPDMGEXP - an alpha character that signifies the magnitude of the amounts in PROPDMG, i.e.:

      + H - hundreds
      + K - thousands
      + M - millions
      + B - billions
  • CROPDMG - contains figures representing estimates of the damage done to crops

  • CROPDMGEXP - an alpha character that signifies the magnitude of the amounts in CROPDMG. The values are the same as for PROPDMGEXP.

Note that going forward the updated sd_edited dataset was used for all processing.

Checking PROPDMGEXP and CROPDMGEXP

PROPDMGEXP and CROPDMGEXP variables were checked to ensure that they contain only H, K, M or B.

        unique(sd_edited$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
        unique(sd_edited$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

There are a few values which are not valid. Since there is nothing in the documentation to indicate what these characters mean, for the purposes of this analysis they were ignored.
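One way to make "ignoring" the undocumented codes explicit is to map each code to a multiplier and let anything unrecognised become NA. This is a minimal base R sketch; `exp_multiplier` is a hypothetical helper and the `toy` rows are made up.

```r
# Hypothetical helper: K/M/B (either case) map to a multiplier, anything else to NA.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  unname(lookup[toupper(as.character(code))])
}

# Made-up rows, including two codes that are not documented.
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 10, 3),
                  PROPDMGEXP = c("K", "m", "B", "?", ""),
                  stringsAsFactors = FALSE)

toy$multiplier <- exp_multiplier(toy$PROPDMGEXP)
toy$damage     <- toy$PROPDMG * toy$multiplier

print(toy)
total_damage <- sum(toy$damage, na.rm = TRUE)  # rows with NA are left out
```

Rows with undocumented codes stay in the table but drop out of the total, so it remains visible how many records were excluded.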

Total damages for the billions magnitude were summed to get an idea of the figures for damages. Total hundreds were also checked to determine its contribution to the total.

        # Check billions
                sum(filter(sd_edited, sd_edited$PROPDMGEXP %in% c("B", "b"))$PROPDMG)*1000000000
## [1] 2.7585e+11
        # Check hundreds        
                sum(filter(sd_edited, sd_edited$PROPDMGEXP %in% c("H", "h"))$PROPDMG)*100
## [1] 2700

Total damages where the magnitude is in the hundreds are very low compared to the total damages (which are in the billions), so these figures were ignored.

Justification for replacing event names - checking property and crop damage

For both property and crops, invalid events were checked to see how much they contribute to total damages.

The data required was separated into 2 datasets: prop and crop.

The following was done to the data:

  • The columns used in the analysis were selected: EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP.

  • Rows where the damage magnitude is B, M or K (in either case) were kept.

  • The damage magnitude characters were substituted with the corresponding numerical values:

      + B or b was replaced with 1 000 000 000
      + M or m was replaced with 1 000 000
      + K or k was replaced with 1 000
  • Damages were calculated and added to a new column in each dataset.

  • Rows with events which are not in the documentation were filtered.

  • The proportion of the total damages from invalid event names was calculated.

        # Select the appropriate columns
                econtmp <- select(sd_edited, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

        # Filter required rows 
                prop <- filter(econtmp, PROPDMGEXP %in% c("B", "b", "M", "m", "K", "k"))
                crop <- filter(econtmp, CROPDMGEXP %in% c("B", "b", "M", "m", "K", "k"))
        
        # Substitute the characters with numerical values
                prop$PROPDMGEXP <- gsub("K", 1000, prop$PROPDMGEXP, ignore.case = TRUE)
                prop$PROPDMGEXP <- gsub("M", 1e+06, prop$PROPDMGEXP, ignore.case = TRUE)
                prop$PROPDMGEXP <- gsub("B", 1e+09, prop$PROPDMGEXP, ignore.case = TRUE)
                prop$PROPDMGEXP <- as.numeric(prop$PROPDMGEXP)
                        
                crop$CROPDMGEXP <- gsub("K", 1000, crop$CROPDMGEXP, ignore.case = TRUE)
                crop$CROPDMGEXP <- gsub("M", 1e+06, crop$CROPDMGEXP, ignore.case = TRUE)
                crop$CROPDMGEXP <- gsub("B", 1e+09, crop$CROPDMGEXP, ignore.case = TRUE)
                crop$CROPDMGEXP <- as.numeric(crop$CROPDMGEXP)        

        # Calculate damages
                prop <- mutate(prop, propDamages = PROPDMG * PROPDMGEXP)
                crop <- mutate(crop, cropDamages = CROPDMG * CROPDMGEXP)

        # Filter rows with invalid events
                invEventsProp <- filter(prop, !(EVTYPE %in% event_types))
                invEventsCrop <- filter(crop, !(EVTYPE %in% event_types))
        
        # Find proportion of total contribution from invalid events
                sum(invEventsProp$propDamages)/sum(prop$propDamages)
## [1] 0.1803346
                sum(invEventsCrop$cropDamages)/sum(crop$cropDamages)
## [1] 0.1962875

About 18% of property damage and 20% of crop damage can be attributed to invalid event types. These amounts are large enough to justify replacing some of these names with the documented names.

Description of the selection process for replacing event names

Property damage for invalid events was checked. An arbitrary threshold was chosen by experiment, and the names of invalid event types with damage above this threshold were replaced.

        # Group by invalid event type and arrange in descending order
                invPropOrdered <- invEventsProp %>%
                        group_by(EVTYPE) %>%
                        summarise (invp_sum = sum(propDamages)) %>%
                        arrange(desc(invp_sum))

        # Checking threshold to use for which events to replace
                lessthan10B_p <- filter(invPropOrdered, invp_sum < 1e+10)
                sum(lessthan10B_p$invp_sum)/sum(prop$propDamages)
## [1] 0.05117609

Invalid events with less than 10 billion in damages each contributed about 5% of total property damage, so it was decided that names would be replaced only for those events where property damage was over 10 billion.

                morethan10B_p <- filter(invPropOrdered, invp_sum > 1e+10)
                unique(morethan10B_p$EVTYPE)
## [1] "STORM SURGE" "HURRICANE"
        # Replace event names
                prop$EVTYPE <- gsub("^HURRICANE .*|^HURRICANE$|^TYPHOON$", "HURRICANE/TYPHOON", 
                                    prop$EVTYPE)
                prop$EVTYPE <- gsub("^STORM SURGE.*", "STORM TIDE", prop$EVTYPE)
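The threshold check used above can be sketched on a small scale as follows. All figures are made up (arbitrary units), and `share_below` is a hypothetical helper; the sketch only shows the shape of the calculation, not the report's actual numbers.

```r
# Made-up damage totals per invalid event name (arbitrary units).
inv_totals <- c("STORM SURGE" = 40, "HURRICANE" = 12,
                "RIVER FLOOD" = 5, "LANDSLIDE" = 2, "OTHER" = 1)
overall_total <- 300  # made-up total across all events, valid and invalid

inv_sorted <- sort(inv_totals, decreasing = TRUE)

# Share of the overall total that comes from invalid names below a threshold.
share_below <- function(totals, threshold, overall) {
  sum(totals[totals < threshold]) / overall
}

left_unfixed <- share_below(inv_sorted, 10, overall_total)
to_replace   <- names(inv_sorted)[inv_sorted > 10]

print(left_unfixed)
print(to_replace)
```

If the share left below the threshold is judged small enough, only the names in `to_replace` need a documented equivalent.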

A similar exercise was done on invalid event names for crop damages.

        # Group by invalid event type and arrange in descending order
                invCropOrdered <- invEventsCrop %>%
                        group_by(EVTYPE) %>%
                        summarise (invc_sum = sum(cropDamages)) %>%
                        arrange(desc(invc_sum))

        # Checking threshold to use for which events to replace
                lessthan1B_c <- filter(invCropOrdered, invc_sum < 1e+9)
                sum(lessthan1B_c$invc_sum)/sum(crop$cropDamages)
## [1] 0.03802465

Invalid events with less than 1 billion in damages each contributed about 4% of total crop damage, so it was decided that names would be replaced only for those invalid events where crop damage was over 1 billion.

                morethan1B_c <- filter(invCropOrdered, invc_sum >1e+9)
                unique(morethan1B_c$EVTYPE)
## [1] "RIVER FLOOD" "HURRICANE"
        # Replace event names
                crop$EVTYPE <- gsub("^HURRICANE .*|^HURRICANE$", "HURRICANE/TYPHOON", crop$EVTYPE)
                crop$EVTYPE <- gsub("^RIVER FLOOD.*|FLOOD/RAIN.*", "FLOOD", crop$EVTYPE)

Results

Weather events most harmful to population health

The edited database, sd_edited, was used to create a plot showing the events most harmful to the population, i.e. those causing fatalities and injuries.

First, the injuries were plotted against event type in a barchart.

        # Group by Event type, sum injuries for each event and arrange in descending order
                inj_ordered <- sd_edited %>%
                        group_by(EVTYPE) %>%
                        summarise (inj_sum = sum(INJURIES)) %>%
                        arrange(desc(inj_sum))

                head(inj_ordered, 20)
## # A tibble: 20 x 2
##    EVTYPE            inj_sum
##    <chr>               <dbl>
##  1 TORNADO             91346
##  2 THUNDERSTORM WIND    9353
##  3 EXCESSIVE HEAT       7059
##  4 FLOOD                6789
##  5 LIGHTNING            5230
##  6 ICE STORM            2112
##  7 HEAT                 2100
##  8 FLASH FLOOD          1777
##  9 HIGH WIND            1740
## 10 WILDFIRE             1606
## 11 HAIL                 1361
## 12 WINTER STORM         1321
## 13 HURRICANE/TYPHOON    1275
## 14 DENSE FOG            1076
## 15 HEAVY SNOW           1021
## 16 BLIZZARD              805
## 17 RIP CURRENT           529
## 18 DUST STORM            440
## 19 WINTER WEATHER        398
## 20 TROPICAL STORM        340

The top ten events were sufficient to reasonably illustrate the relationship between weather events and injuries.

        # Take the top 10 events
                inj_plotdata <- head(inj_ordered, 10)
        
        # Make EVTYPE an ordered factor so that ggplot does not re-order it
                inj_plotdata$EVTYPE <- factor(inj_plotdata$EVTYPE, levels = inj_plotdata$EVTYPE)

        # Plot injuries
                library(ggplot2)
                injPlot <- ggplot(inj_plotdata, aes(EVTYPE, inj_sum)) +
                        geom_bar(stat = "identity", fill = "#3296FF") +
                        scale_y_continuous(name = "No. of Injuries", breaks=seq(0, 1e+5, 1e+4)) +
                        xlab("Type of Weather Event") +
                        theme(axis.text.x = element_text(angle = 60, hjust = 1))
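The factor conversion before plotting matters because a factor built without explicit levels sorts them alphabetically, and ggplot draws bars in level order. The base R sketch below uses made-up values to show the difference; strictly speaking the result is a factor with a chosen level order rather than an "ordered" factor in R's sense.

```r
# Made-up top-3 table, already sorted by descending injuries.
toy <- data.frame(EVTYPE  = c("TORNADO", "EXCESSIVE HEAT", "FLOOD"),
                  inj_sum = c(300, 200, 100),
                  stringsAsFactors = FALSE)

# Default: levels are sorted alphabetically, losing the ranking.
default_levels <- levels(factor(toy$EVTYPE))

# Explicit levels: the order of the rows is preserved.
kept_levels <- levels(factor(toy$EVTYPE, levels = toy$EVTYPE))

print(default_levels)
print(kept_levels)
```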

Fatalities were then plotted against event type. Again the top ten events were used to illustrate the relationship between weather events and fatalities.

        # Group by Event type, sum fatalities for each event and arrange in descending order
                fat_ordered <- sd_edited %>%
                        group_by(EVTYPE) %>%
                        summarise (fat_sum = sum(FATALITIES)) %>%
                        arrange(desc(fat_sum))
        
        # Take the top 10 events
                fat_plotdata <- head(fat_ordered, 10)
        
        # Make EVTYPE an ordered factor so that ggplot does not re-order it
                fat_plotdata$EVTYPE <- factor(fat_plotdata$EVTYPE, levels = fat_plotdata$EVTYPE)

        # Plot fatalities
                fatPlot <- ggplot(fat_plotdata, aes(EVTYPE, fat_sum)) +
                        geom_bar(stat = "identity", fill = "#0080C0") +
                        scale_y_continuous(name = "No. of Fatalities", breaks=seq(0, 10000, 1000)) +
                        xlab("Type of Weather Event") +
                        theme(axis.text.x = element_text(angle = 60, hjust = 1))

Both charts are displayed in one panel.

        library(ggpubr)
        panelInjFat <- ggarrange(injPlot, fatPlot, align = "h")
        annotate_figure(panelInjFat,
                        top = text_grob("Events most harmful to population health", 
                                        color = "navy blue", face = "bold", size = 12),
                        fig.lab = "Figure 1", fig.lab.face = "bold")

The barcharts show that tornadoes, thunderstorm winds and excessive heat are the top 3 causes of injuries, while tornadoes, excessive heat and flash floods are the top 3 causes of fatalities.

Tornadoes, thunderstorm winds, excessive heat, floods, lightning, heat and high wind are the major causes of injuries and fatalities.

Weather events that have the greatest economic consequences

Property damage was plotted against event type using a barchart. Again the top ten events were sufficient to illustrate the weather events that caused the greatest damage.

        # Group by event type, sum damages and arrange in descending order
                propOrdered <- prop %>%
                        group_by(EVTYPE) %>%
                        summarise (p_sum = sum(propDamages)) %>%
                        arrange(desc(p_sum))

        # Take the top 10 events
                prop_plotdata <- head(propOrdered, 10)
        
        # Make EVTYPE an ordered factor so that ggplot does not re-order it
                prop_plotdata$EVTYPE <- factor(prop_plotdata$EVTYPE, levels = prop_plotdata$EVTYPE)

        # Plot property damage
                propPlot <- ggplot(prop_plotdata, aes(EVTYPE, p_sum)) +
                        geom_bar(stat = "identity", fill = "#008080") +
                        scale_y_continuous(name = "Amount of Property Damage", 
                                           breaks=seq(0, 1.5e+11, 1e+10)) +
                        xlab("Type of Weather Event") +
                        theme(axis.text.x = element_text(angle = 60, hjust = 1))

Crop Damage was then plotted against event type.

        # Group by event type, sum damages and arrange in descending order
                cropOrdered <- crop %>%
                                group_by(EVTYPE) %>%
                                summarise (c_sum = sum(cropDamages)) %>%
                                arrange(desc(c_sum))

        # Take the top 10 events
                crop_plotdata <- head(cropOrdered, 10)
        
        # Make EVTYPE an ordered factor so that ggplot does not re-order it
                crop_plotdata$EVTYPE <- factor(crop_plotdata$EVTYPE, levels = crop_plotdata$EVTYPE)

        # Plot crop damage
                cropPlot <- ggplot(crop_plotdata, aes(EVTYPE, c_sum)) +
                        geom_bar(stat = "identity", fill = "#00CCCC") +
                        scale_y_continuous(name = "Amount of Crop Damage", 
                                           breaks=seq(0, 1.4e+10, 1e+9)) +
                        xlab("Type of Weather Event") +
                        theme(axis.text.x = element_text(angle = 60, hjust = 1))

Both plots are displayed in one panel.

        panelDamage <- ggarrange(propPlot, cropPlot, align = "h")
        annotate_figure(panelDamage,
                        top = text_grob("Events with greatest economic consequences", 
                                        color = "navy blue", face = "bold", size = 12),
                        fig.lab = "Figure 2", fig.lab.face = "bold")

The barcharts show that floods, hurricanes/typhoons and tornadoes are the top 3 causes of property damage, while drought, floods and hurricanes/typhoons are the top 3 causes of damage to crops.

Floods (including flash floods), hurricanes/typhoons, hail and thunderstorm winds are the major causes of damage to both property and crops.