Synopsis

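This project evaluates the NOAA weather events data from 1950 through November 2011. It cleans and reorganizes the data and presents answers to two questions: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences?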

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database for 1950 to November, 2011. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The documentation from the National Weather Service (NWS) and the National Climatic Data Center guided the cleaning of the data for processing.

Data Processing

Loading the data

Before loading and analyzing the data, we load the packages that simplify cleaning and organizing the data and graphing the results.

# load the necessary packages
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(scales)   # provides label_number(), used for the plot axes

 

We begin our analysis by downloading the data and saving it to a local file. Then we read the file into R for further processing.

 

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2")
stormData <- as_tibble(read.csv("StormData.csv.bz2"))
rows <- format(nrow(stormData),  big.mark = ",", scientific=FALSE)
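
Because the compressed file is large, it can help to skip the download when a local copy already exists. The following is a minimal sketch of that idea, not part of the original analysis; the helper name fetch_once is hypothetical.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is missing.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # binary mode keeps the compressed .bz2 file intact on all platforms
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Possible usage with the file name from the analysis above:
# fetch_once("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#            "StormData.csv.bz2")
```

Re-running the report then reuses the existing file instead of fetching it again.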

 

The first step is to take a quick look at the data and see its size and the type of information it contains in its columns (especially the ones important to our questions).

The data contain close to a million (902,297) observations in 37 variables, containing information about the weather events, their location, duration and the damage they caused.

However, the information relevant to our analysis is only contained in the following seven columns:

  • EVTYPE - the name of the weather event that took place
  • FATALITIES - the number of fatalities that occurred as the result of the weather event
  • INJURIES - the number of injuries that occurred as the result of the weather event
  • PROPDMG - the amount of damage in dollars caused to property by the weather event
  • PROPDMGEXP - the modifier to be applied to the value in PROPDMG, defined as:
    • h - hundreds
    • k - thousands
    • m - millions
    • b - billions
  • CROPDMG - the amount of damage in dollars caused to crops by the weather event
  • CROPDMGEXP - the modifier for the value in CROPDMG, defined in the same way as PROPDMGEXP

     

str(stormData)
## tibble [902,297 × 37] (S3: tbl_df/tbl/data.frame)
##  $ STATE__   : num [1:902297] 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr [1:902297] "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr [1:902297] "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr [1:902297] "" "" "" "" ...
##  $ BGN_LOCATI: chr [1:902297] "" "" "" "" ...
##  $ END_DATE  : chr [1:902297] "" "" "" "" ...
##  $ END_TIME  : chr [1:902297] "" "" "" "" ...
##  $ COUNTY_END: num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi [1:902297] NA NA NA NA NA NA ...
##  $ END_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr [1:902297] "" "" "" "" ...
##  $ END_LOCATI: chr [1:902297] "" "" "" "" ...
##  $ LENGTH    : num [1:902297] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num [1:902297] 100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int [1:902297] 3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
##  $ CROPDMG   : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr [1:902297] "" "" "" "" ...
##  $ WFO       : chr [1:902297] "" "" "" "" ...
##  $ STATEOFFIC: chr [1:902297] "" "" "" "" ...
##  $ ZONENAMES : chr [1:902297] "" "" "" "" ...
##  $ LATITUDE  : num [1:902297] 3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num [1:902297] 8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num [1:902297] 3051 0 0 0 0 ...
##  $ LONGITUDE_: num [1:902297] 8806 0 0 0 0 ...
##  $ REMARKS   : chr [1:902297] "" "" "" "" ...
##  $ REFNUM    : num [1:902297] 1 2 3 4 5 6 7 8 9 10 ...

 

Here are 10 randomly selected rows of data that include only the columns useful for the analysis.

 

set.seed(1976)   
stormData[sample(1:nrow(stormData), size = 10, replace = FALSE),
          c("EVTYPE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
## # A tibble: 10 × 7
##    EVTYPE            FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##    <chr>                  <dbl>    <dbl>   <dbl> <chr>        <dbl> <chr>     
##  1 THUNDERSTORM WIND          0        0    10   "K"              0 "K"       
##  2 TSTM WIND                  0        0     1   "K"              0 ""        
##  3 FLOOD/FLASH FLOOD          0        0     0.5 "K"              0 ""        
##  4 HAIL                       0        0     0   "K"              0 "K"       
##  5 THUNDERSTORM WIND          0        0    12   "K"              0 "K"       
##  6 HAIL                       0        0     0   ""               0 ""        
##  7 HAIL                       0        0     0   ""               0 ""        
##  8 LIGHTNING                  0        0     0   ""               0 ""        
##  9 HAIL                       0        0     0   ""               0 ""        
## 10 TSTM WIND                  0        0     0   ""               0 ""

 

Clean the data

Next, the data will be cleaned to ease further processing. Event names and the damage modifiers will be converted to lower case, and any leading, trailing, or repeated white space will be removed.

The columns with the amounts for property and crop damage will be converted to proper dollar amounts by multiplying the values in PROPDMG and CROPDMG by the multiplier that corresponds to the letter in PROPDMGEXP and CROPDMGEXP, respectively. Where the modifier column does not contain one of the four recognized letters, the amount is left unchanged and treated as a plain dollar amount.

 

cleanedData <- stormData

# normalize event names and damage modifiers: lower case, no extra white space

cleanedData$EVTYPE <- str_squish(tolower(cleanedData$EVTYPE))
cleanedData$PROPDMGEXP <- str_squish(tolower(cleanedData$PROPDMGEXP))
cleanedData$CROPDMGEXP <- str_squish(tolower(cleanedData$CROPDMGEXP))


# clean up and reformat the data to help with later analysis

cleanedData <- cleanedData %>% 
        mutate(PROPDMG = ifelse(PROPDMGEXP =="k", PROPDMG*1000, 
            ifelse(PROPDMGEXP == "m", PROPDMG*1000000, 
                  ifelse(PROPDMGEXP == "b", PROPDMG*1000000000, 
                         ifelse(PROPDMGEXP == "h", PROPDMG*100,
                                PROPDMG)))))  # Otherwise, keep the original value


cleanedData <- cleanedData %>% 
        mutate(CROPDMG = ifelse(CROPDMGEXP == "k", CROPDMG*1000,  
            ifelse(CROPDMGEXP == "m", CROPDMG*1000000, 
                  ifelse(CROPDMGEXP == "b", CROPDMG*1000000000,
                         ifelse(CROPDMGEXP == "h", CROPDMG*100,
                                CROPDMG)))))  # Otherwise, keep the original value



# clean values in the field meant to signify the magnitude of the numbers in the preceding column

validExp <- c("h","k","m","b")
cleanedData <- cleanedData %>% mutate(PROPDMGEXP = ifelse(PROPDMGEXP %in% validExp, PROPDMGEXP, "")) 
cleanedData <- cleanedData %>% mutate(CROPDMGEXP = ifelse(CROPDMGEXP %in% validExp, CROPDMGEXP, "")) 
allEvents <- length(unique(cleanedData$EVTYPE))
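
The nested ifelse() calls above can also be written with a named lookup vector. The sketch below is an alternative formulation under the same four modifier codes, shown on made-up values rather than the storm data; the helper name scale_damage is hypothetical.

```r
# Multipliers for the four documented modifier codes (lower case).
multipliers <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Hypothetical helper: scale `amount` by the multiplier named in `exp_code`.
# Empty or unrecognized codes leave the amount unchanged (multiplier of 1).
scale_damage <- function(amount, exp_code) {
  code <- trimws(tolower(exp_code))
  mult <- unname(multipliers[code])   # NA for codes not in the lookup
  mult[is.na(mult)] <- 1
  amount * mult
}

# Made-up example values, not taken from the storm database.
amount   <- c(25, 2.5, 1.2, 7, 3)
exp_code <- c("K", "m", "B", "", "?")
scaled   <- scale_damage(amount, exp_code)
```

Adding or changing a code then means editing one vector instead of another level of nesting.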

 

After cleaning, the same 10 rows presented above look a little different.

 

set.seed(1976)   
cleanedData[sample(1:nrow(cleanedData), size = 10, replace = FALSE),
                  c("EVTYPE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
## # A tibble: 10 × 7
##    EVTYPE            FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##    <chr>                  <dbl>    <dbl>   <dbl> <chr>        <dbl> <chr>     
##  1 thunderstorm wind          0        0   10000 "k"              0 "k"       
##  2 tstm wind                  0        0    1000 "k"              0 ""        
##  3 flood/flash flood          0        0     500 "k"              0 ""        
##  4 hail                       0        0       0 "k"              0 "k"       
##  5 thunderstorm wind          0        0   12000 "k"              0 "k"       
##  6 hail                       0        0       0 ""               0 ""        
##  7 hail                       0        0       0 ""               0 ""        
##  8 lightning                  0        0       0 ""               0 ""        
##  9 hail                       0        0       0 ""               0 ""        
## 10 tstm wind                  0        0       0 ""               0 ""

 

However, there are still 883 unique values for the weather events in the data, which is significantly higher than the 48 event types listed in the National Weather Service documentation that provides more details and explanations.

To make processing more efficient, the unnecessary columns will be discarded and the remaining columns renamed for better readability. The 902,297 observations are grouped by event type later, when the results are computed. The remarks column has been kept with the data to help disentangle ambiguous weather event names.

 

harmfulEvents <- cleanedData %>% select(event = EVTYPE,
                    fatalities = FATALITIES,
                    injuries = INJURIES,
                    propertyDamage = PROPDMG,
                    cropDamage = CROPDMG,
                    remarks=REMARKS)

set.seed(1976)   
harmfulEvents[sample(1:nrow(harmfulEvents), size = 10, replace = FALSE), ]
## # A tibble: 10 × 6
##    event             fatalities injuries propertyDamage cropDamage remarks      
##    <chr>                  <dbl>    <dbl>          <dbl>      <dbl> <chr>        
##  1 thunderstorm wind          0        0          10000          0 "EPISODE NAR…
##  2 tstm wind                  0        0           1000          0 "Large trees…
##  3 flood/flash flood          0        0            500          0 "Heavy thund…
##  4 hail                       0        0              0          0 "EPISODE NAR…
##  5 thunderstorm wind          0        0          12000          0 "EVENT NARRA…
##  6 hail                       0        0              0          0 "The Sheriff…
##  7 hail                       0        0              0          0 ""           
##  8 lightning                  0        0              0          0 "Thunderstor…
##  9 hail                       0        0              0          0 ""           
## 10 tstm wind                  0        0              0          0 ""

 

The next part is the most labor-intensive part of processing the data, but it is necessary for getting proper results.

As mentioned earlier, the National Weather Service (NWS) only lists 48 event names that are documented (see table below).

 

NWS Event Name              NWS Event Name         NWS Event Name
Astronomical Low Tide       Debris Flow            Excessive Heat
Avalanche                   Dense Fog              Extreme Cold/Wind Chill
Blizzard                    Dense Smoke            Flash Flood
Coastal Flood               Drought                Flood
Cold/Wind Chill             Dust Devil             Frost/Freeze
Funnel Cloud                Heat                   Heavy Rain
Freezing Fog                Heavy Snow             High Surf
High Wind                   Hurricane (Typhoon)    Ice Storm
Lake-Effect Snow            Lakeshore Flood        Lightning
Marine Hail                 Marine High Wind       Marine Strong Wind
Marine Thunderstorm Wind    Rip Current            Seiche
Sleet                       Storm Surge/Tide       Strong Wind
Thunderstorm Wind           Tornado                Tropical Depression
Tropical Storm              Tsunami                Volcanic Ash
Waterspout                  Wildfire               Winter Storm
Winter Weather

 

In cleaning the event names, the definitions provided in the documentation took priority. However, in some cases the names were misspelled, contained more than one weather event, or included incomplete information (e.g. “high”, “normal precipitation”, or “?”). When possible, the “Remarks” column was checked to see if the event name could be ascertained. A separate category called “Other” was also created for data that could not be assigned to a specific weather event.

The code below shows the processing involved in matching event names in the data to the NWS event names.

 

# rename fields that have more than one event type, based on the first event type - assume that's a primary cause

harmfulEvents$event[grepl("(?i)^astron(.*)low", harmfulEvents$event)] <- "Astronomical Low Tide"
harmfulEvents$event[grepl("(?i)^avalan", harmfulEvents$event)] <- "Avalanche"
harmfulEvents$event[grepl("(?i)^blizzard|ground blizzard", harmfulEvents$event) |
         grepl("(?i)^blowing snow$", harmfulEvents$event)] <- "Blizzard"  
harmfulEvents$event[
        grepl("(?i)(coastal|cstl|tidal|beach)(.*)(flood.*|storm|surg|eros)", harmfulEvents$event)] <- "Coastal Flood"
harmfulEvents$event[grepl("(?i)^cold$", harmfulEvents$event) |
         grepl("(?i)^hypo", harmfulEvents$event) |
         grepl("(?i)^(low|cool).*(wet|spell|temp)", harmfulEvents$event) |
         !grepl("(?i)(snow|ex|prolong|record|severe|unseason|unusual|bitter)", harmfulEvents$event) &
         grepl("(?i)^(cold|low wind|wind).*(wind|temp|weath|wet|chill|wave)", harmfulEvents$event)] <- 
         "Cold/Wind Chill"
harmfulEvents$event[grepl("(?i)debris|landsl|mudsl|slide", harmfulEvents$event)] <- "Debris Flow/Landslide"
harmfulEvents$event[grepl("(?i)dense fog" , harmfulEvents$event)] <- "Dense Fog"
harmfulEvents$event[grepl("(?i)smoke", harmfulEvents$event)] <- "Dense Smoke"
harmfulEvents$event[grepl("(?i)drought|dry|below normal prec", harmfulEvents$event)] <- "Drought"
harmfulEvents$event[grepl("(?i)^(dus)(.*)d|whirl", harmfulEvents$event)] <- "Dust Devil"
harmfulEvents$event[grepl("(?i)^(dus)(.*)st", harmfulEvents$event) |
        grepl("(?i)saharan dust", harmfulEvents$event)] <- "Dust Storm"
harmfulEvents$event[grepl("(?i)(ex|rec|pro|unseas)(.*)(high|heat|hot|warm.*)", harmfulEvents$event)|
        grepl("(?i)^high temperature record", harmfulEvents$event)] <- "Excessive Heat"
harmfulEvents$event[grepl("(?i)^.*low temp|record low", harmfulEvents$event) |
        !grepl("(?i)snow|low", harmfulEvents$event) &
        grepl("(?i)^(ex|prolong|record|severe|unseason|unusual|bitter).*(cool|cold|wind)", 
              harmfulEvents$event)] <- "Extreme Cold/Wind Chill"
harmfulEvents$event[grepl("(?i)flash|((high|ris).*water)", harmfulEvents$event) |
        grepl("(?i)ice (floes|jam)|(breakup|local).*flood.*", harmfulEvents$event)|
        grepl("(?i)((street|snowmelt|highway).*flood.*)", harmfulEvents$event)] <- "Flash Flood"
harmfulEvents$event[grepl("(?i)^flood(|s|ing)$", harmfulEvents$event) |
        grepl("(?i)^dam", harmfulEvents$event) |
        grepl("(?i)(major|minor).*flood.*|flood watch", harmfulEvents$event)] <- "Flood"
harmfulEvents$event[grepl("(?i)frost|freeze$" , harmfulEvents$event) |
        grepl("(?i)^ice$", harmfulEvents$event) |
        grepl("(?i)^glaze$", harmfulEvents$event) |
        grepl("(?i)^(black ice)|(ic.*road)|patchy ice", harmfulEvents$event)] <- "Frost/Freeze"
harmfulEvents$event[grepl("(?i)funnel" , harmfulEvents$event)] <- "Funnel Cloud"
harmfulEvents$event[grepl("(?i)^(freezing|ice|)(\\s*|\\W*)fog", harmfulEvents$event) |
        grepl("(?i)^fog and cold", harmfulEvents$event)] <- "Freezing Fog"
harmfulEvents$event[!grepl("(?i)marine", harmfulEvents$event) &
        grepl("(?i)hail|ice pellets", harmfulEvents$event)] <- "Hail"
harmfulEvents$event[grepl("(?i)^heat", harmfulEvents$event) |
        grepl("(?i)warm weather|(abn|very|un).*warm|hyper|hot.*(pattern|spell|weather)", harmfulEvents$event)
        ] <- "Heat"
harmfulEvents$event[grepl ("(?i)(urban|rural)|rai.*(h.*vy|dama|storm)|(ex|abn|uns).*wet|wet year", harmfulEvents$event) |
        grepl("(?i)(ex|h.*vy|torr|long|record|unseas|early).*(rain|show|preci)", harmfulEvents$event) |
        grepl("(?i)^rain$", harmfulEvents$event) |
        grepl("(?i)\\/rain|rain\\/", harmfulEvents$event) |
        grepl("(?i)^(tstm|thun)(\\s*|\\W*)rain", harmfulEvents$event)] <- "Heavy Rain"
harmfulEvents$event[grepl("(?i)^heavy(.*)snow|record(.*)snow|ex(.*)snow", harmfulEvents$event)|
        !grepl("(?i)(ice|rain|squall|wind|lack|sleet|flurries|flood|lake|cold|drought|blow)", harmfulEvents$event) &
        grepl("(?i)^(snow)", harmfulEvents$event)] <- "Heavy Snow"
harmfulEvents$event[grepl("(?i)^astron(.*)high", harmfulEvents$event) |
        grepl("(?i)surf", harmfulEvents$event) & !grepl("(?i)rain", harmfulEvents$event) |
        grepl("(?i)(high|heavy|rough).*(seas|swell.*|tide|waves)", harmfulEvents$event)] <- "High Surf"
harmfulEvents$event[grepl("(?i)^high wind", harmfulEvents$event)|
        grepl("(?i)^wind|winds$", harmfulEvents$event) |
        grepl("(?i)((non-sev|low|gust|grad|rain and).*wind)", harmfulEvents$event)] <- "High Wind"
harmfulEvents$event[grepl("(?i)(wind.*(adv))", harmfulEvents$event)] <- "High Wind"
harmfulEvents$event[grepl("(?i)^hurricane|typh|floyd", harmfulEvents$event)] <- "Hurricane/Typhoon"
harmfulEvents$event[grepl("(?i)^ice storm|glaze.*ice" , harmfulEvents$event)] <- "Ice Storm"
harmfulEvents$event[grepl("(?i)lake(.*)snow", harmfulEvents$event)] <- "Lake-Effect Snow"
harmfulEvents$event[grepl("(?i)(lake|creek|river|stream).*fl.*d|small stream", harmfulEvents$event)] <- "Lakeshore Flood"
harmfulEvents$event[grepl("(?i)^lig.t", harmfulEvents$event)] <- "Lightning"
harmfulEvents$event[grepl("(?i)^marine hail", harmfulEvents$event)] <- "Marine Hail"
harmfulEvents$event[grepl("(?i)^marine high wind", harmfulEvents$event)] <- "Marine High Wind"
harmfulEvents$event[grepl("(?i)^marine strong wind", harmfulEvents$event)] <- "Marine Strong Wind"
harmfulEvents$event[grepl("(?i)^marine(\\s*|\\W*)(tstm|thun)", harmfulEvents$event)] <- "Marine Thunderstorm Wind"
harmfulEvents$event[grepl("(?i)^(rip)(.*)cur", harmfulEvents$event)] <- "Rip Current"
harmfulEvents$event[grepl("(?i)^seiche", harmfulEvents$event)] <- "Seiche"
harmfulEvents$event[grepl("(?i)^sleet" , harmfulEvents$event)] <- "Sleet"
harmfulEvents$event[grepl("(?i)surge" , harmfulEvents$event)] <- "Storm Surge/Tide"
harmfulEvents$event[!grepl("(?i)marine", harmfulEvents$event) &
        grepl("(?i)stron.*wind|((storm).*wind)", harmfulEvents$event)|
        grepl("(?i)wind.*storm", harmfulEvents$event)] <- "Strong Wind"
harmfulEvents$event[!grepl("(?i)marine", harmfulEvents$event) &
        (grepl("(?i)(tstm|thu|tun)(.*)w", harmfulEvents$event) |
        grepl("(?i)^thunderstorm|severe thunderstorm.*|tstm$", harmfulEvents$event) |
        grepl("(?i)burst|gustnado", harmfulEvents$event))] <- "Thunderstorm Wind"
harmfulEvents$event[
        grepl("(?i)(torn|landspo|cold air funnel)", harmfulEvents$event) & 
        !grepl("(?i)debris", harmfulEvents$event)] <- "Tornado"
harmfulEvents$event[grepl("(?i)^tropical depression", harmfulEvents$event)] <- "Tropical Depression"
harmfulEvents$event[grepl("(?i)^tropical storm", harmfulEvents$event)] <- "Tropical Storm"
harmfulEvents$event[grepl("(?i)tsunami" , harmfulEvents$event)] <- "Tsunami"
harmfulEvents$event[grepl("(?i)^volcan", harmfulEvents$event)] <- "Volcanic Ash"
harmfulEvents$event[grepl("(?i)^(wa)(.*)(te)(.*)sp", harmfulEvents$event)] <- "Waterspout"
harmfulEvents$event[grepl("(?i)fire" , harmfulEvents$event)] <- "Wildfire"
harmfulEvents$event[
        grepl("(?i)^((winter|snow)(.*)storm)|(snow(.*)squall|blow)|(ice(.*)blizzard)", harmfulEvents$event)
        ] <- "Winter Storm"
harmfulEvents$event[grepl("(?i)^winter(.*)weather|mix", harmfulEvents$event)|
        grepl("(?i)^freezing.*(rain|drizzle|sleet|spray)", harmfulEvents$event)|
        grepl("(?i)^light freezing rain", harmfulEvents$event) |
        grepl("(?i)snow drought|late snow|ice.*winds", harmfulEvents$event)|
        grepl("(?i)(snow|flurries)", harmfulEvents$event) &
        grepl(paste0("(?i)(wind|cold|drift|ice|lack|season|light|moderate|rain|sleet|",
                     "late|early|wet|first|fall|mountain)"), harmfulEvents$event)] <- "Winter Weather"
harmfulEvents$event[grepl("(?i)^excessive$", harmfulEvents$event)|
        grepl("(?i)month|none|wall cloud|turbulence|rogue|pattern|northern|red flag", harmfulEvents$event)|
        grepl("(?i)normal precip|temperature record|wet weather", harmfulEvents$event)|
        grepl("(?i)^high$", harmfulEvents$event)|
        grepl("(?i)^record temperature.*$", harmfulEvents$event)|
        grepl("(?i)^(other|summary|\\?|.*county|.*mishap|.*accident|metro|drowning|wnd|vog|no severe|south.*)", harmfulEvents$event)
        ] <- "Other"

# check the total number of unique event names after processing

allEvents <- length(unique(harmfulEvents$event))
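
To illustrate the general idea of mapping messy names to canonical ones with regular expressions, here is a small sketch on made-up inputs. It is not the code used above: the rule table and the helper map_event are hypothetical, it lets the first matching rule win rather than overwriting in sequence, and it uses ignore.case = TRUE in place of the inline (?i) flag.

```r
# Hypothetical rule table: regular expression -> canonical event name.
# Order matters: more specific rules come before more general ones.
rules <- c(
  "^avalan"                   = "Avalanche",
  "^marine.*(tstm|thun)"      = "Marine Thunderstorm Wind",
  "(tstm|thunderstorm).*wind" = "Thunderstorm Wind",
  "torn"                      = "Tornado"
)

# Return the canonical name of the first matching rule, or "Other".
map_event <- function(x, rules) {
  out <- rep("Other", length(x))
  # Apply rules from last to first so that earlier rules overwrite later ones.
  for (i in rev(seq_along(rules))) {
    hit <- grepl(names(rules)[i], x, ignore.case = TRUE)
    out[hit] <- rules[[i]]
  }
  out
}

# Made-up raw names in the style of the EVTYPE column.
raw    <- c("TSTM WIND", "thunderstorm winds", "Marine Tstm Wind",
            "TORNADO F2", "avalance", "?")
mapped <- map_event(raw, rules)
```

Keeping the rules in one table makes it easier to see which pattern takes precedence when a name matches more than one of them.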

 

At this point the number of unique event names had been brought down to 49: the 48 provided by the NWS plus one more for the “Other” category. The data was then ready to be used to answer the questions.

Results

Which types of events are most harmful with respect to population health?

 

To answer which weather events are the most harmful to population health across the United States, the data was again regrouped by the type of the weather event and only included the information on fatalities and injuries that occurred due to that event.

Since there were 49 possible events and the question only concerns the most harmful ones, the data was sorted in decreasing order by the sum of fatalities and injuries and truncated after the top 15 events.

 

# sort the necessary data, select only the top values for plotting

healthTop <- harmfulEvents %>% group_by(event) %>% 
        summarise(fatalities = sum(fatalities),  # Sum the fatalities
                  injuries = sum(injuries),        # sum the injuries
                  totalPersonalHarm = sum(fatalities + injuries)) %>% 
        arrange(desc(totalPersonalHarm))
healthTop15 <- subset(healthTop[1:15,1:3])

plotHealth<-gather(healthTop15, Type, value, fatalities:injuries) 


# plot the selected data

healthPlot <- ggplot(plotHealth, 
        aes(x=reorder(event,-value), y=value, fill=Type))+
        geom_bar(stat="identity")+
        labs(title="Events Harmful to Population", 
             x="Weather Event", 
             y="Number of People Affected")+
        scale_y_continuous(labels = label_number(big.mark = ","))+
        scale_fill_manual(values = c("violet","thistle"), labels=c("Fatalities", "Injuries"))+
        theme( panel.background = element_rect(fill = "white"),
                plot.title = element_text(hjust = 0.5, face = "bold", size = 16), 
                axis.text.x = element_text(face = "bold", angle = 90, hjust = 1, vjust=0.1),
                axis.text.y = element_text(face = "bold")
             )
healthPlot
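
The same group, sum, sort, and reshape steps can be sketched on a tiny made-up data set. This version uses pivot_longer(), which the tidyr documentation recommends over the superseded gather(); the data and object names are illustrative only.

```r
library(dplyr)
library(tidyr)

# Made-up toy data, not taken from the storm database.
toy <- tibble(
  event      = c("Tornado", "Heat", "Tornado", "Flood", "Heat"),
  fatalities = c(2, 5, 1, 0, 3),
  injuries   = c(30, 10, 12, 4, 0)
)

# Sum per event, rank by combined harm, keep the top two events.
top2 <- toy %>%
  group_by(event) %>%
  summarise(fatalities = sum(fatalities),
            injuries   = sum(injuries)) %>%
  mutate(total = fatalities + injuries) %>%  # total computed after summarising
  arrange(desc(total)) %>%
  slice_head(n = 2)

# One row per event and harm type, the shape a stacked bar chart expects.
long <- top2 %>%
  select(-total) %>%
  pivot_longer(c(fatalities, injuries), names_to = "Type", values_to = "value")
```

Computing the total in a separate mutate() step makes it explicit that it is built from the per-event sums.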

 

The stacked graph above helps us see that there is a clear leader in the weather event most harmful to the population.

Tornadoes cause by far the most injuries (91,407) and deaths (5,636). The next most harmful event, Excessive Heat, comes in far behind with 6,747 injuries and 2,029 fatalities.

 

Which types of events have the greatest economic consequences?

To answer which weather events across the United States have the greatest economic consequences, the data was regrouped by the weather event and only included the information on property and crop damage that occurred due to that event.

Since there were 49 possible events and the question only concerns the costliest ones, the data was sorted in decreasing order by the sum of property and crop damage and truncated after the top 15 events.

 

# sort the necessary data, select only the top values for plotting

economyTop <- harmfulEvents %>% group_by(event) %>% 
        summarise(propertyDamage = sum(propertyDamage),  # sum the property damage
                  cropDamage = sum(cropDamage),      # sum the crop damage
                  totalDamages = sum(propertyDamage + cropDamage)) %>% 
        arrange(desc(totalDamages))
economyTop15 <- subset(economyTop[1:15,1:3])


# plot the selected data

plotEconomy<-gather(economyTop15, Type, value, propertyDamage:cropDamage)

economyPlot <- ggplot(plotEconomy, 
        aes(x=reorder(event,-value), y=value/1000000000, fill=Type))+
        geom_bar(stat="identity")+
        labs(title="Events Harmful to Property and Crops", 
             x="Weather Event", 
             y="Amount of Damage in Billions of Dollars")+
        scale_y_continuous(labels = label_number(big.mark = ","))+
        scale_fill_manual(values = c("slateblue2","lightblue"), labels=c("Crop Damage", "Property Damage"))+
        theme( panel.background = element_rect(fill = "white"),
               plot.title = element_text(hjust = 0.5, face = "bold", size = 16), 
               axis.text.x = element_text(face = "bold", angle = 90, hjust = 1, vjust=0.1),
               axis.text.y = element_text(face = "bold")
             )
economyPlot

 

The graph above helps us visualize the weather events that cause the most damage to property and crops.

While floods cause by far the most property damage (worth $144.89 billion), their damage to crops is not the highest ($5.97 billion). Drought is the worst for crops, causing $13.97 billion in damages, but very little property damage (compared to floods) at $1.05 billion.

Floods are the worst weather event overall for economic impact, and cause a total of $150.85 billion in combined damages.