“Weather Damage Between 1950 and 2011”

author: “mbandi” date: “2025-06-18” —————-

Synopsis

The purpose of this report is to analyse the impact of different weather events in the health of the population as well as the economic impact in the United States between the years 1950 and 2011. The hypothesis we want to investigate is if there are a type of weather events that have significantly worst impact in the population’s health and a higher economic impact, so if preemptive measures are to be taken, this efforts can have the biggest impact. The data was obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and comprises details of different weather events. From the data we found that the biggest impact seems to be related with extreme wind and water phenomena, like tornadoes, hurricanes and floods, but also heat has a particularly negative effect in the population’s health.

Data Processing

Although the original data was gathered from the NOAA, the raw data utilized for this report was hosted in the Reproducible Research Coursera’s course. Additional documentation can be found in National Weather Service and National Climatic Data Center Storm Events

Downloading and reading the data

Download the data:

compressedFileName <- "repdata_data_StormData.csv.bz2"
if(!file.exists(compressedFileName)) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", compressedFileName)
}

Read the data: Read the data:

rawData <- read.csv(compressedFileName, na.strings = "")
str(rawData)

## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  NA NA NA NA ...
##  $ BGN_LOCATI: chr  NA NA NA NA ...
##  $ END_DATE  : chr  NA NA NA NA ...
##  $ END_TIME  : chr  NA NA NA NA ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  NA NA NA NA ...
##  $ END_LOCATI: chr  NA NA NA NA ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  NA NA NA NA ...
##  $ WFO       : chr  NA NA NA NA ...
##  $ STATEOFFIC: chr  NA NA NA NA ...
##  $ ZONENAMES : chr  NA NA NA NA ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  NA NA NA NA ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

We’re interested in health and economic damage, so we’ll calculate the average impact each year for the different types of events across the U.S. Thus, the most relevant variables are:

BGN_DATE to group the data by year EVTYPE event type, this is a free text field that we’ll clean later
FATALITIES and INJURIES, we’ll count them together as casualties PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, damage to property and crop. Here DMG column is the numeric value with 3 representative digits and EXP is a char (K, M, B) representing the magnitude.

Keep the useful columns and free some memory

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data <- select(.data = rawData, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
rawData <- NULL

Cleaning the data

Let’s start by converting BGN_DATE from char to date

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

data <- mutate(data, BGN_DATE = mdy_hms(BGN_DATE))

Now let’s take a look to the event types

length(unique(data$EVTYPE))

## [1] 985

head(unique(data$EVTYPE), 50)

##  [1] "TORNADO"                        "TSTM WIND"                     
##  [3] "HAIL"                           "FREEZING RAIN"                 
##  [5] "SNOW"                           "ICE STORM/FLASH FLOOD"         
##  [7] "SNOW/ICE"                       "WINTER STORM"                  
##  [9] "HURRICANE OPAL/HIGH WINDS"      "THUNDERSTORM WINDS"            
## [11] "RECORD COLD"                    "HURRICANE ERIN"                
## [13] "HURRICANE OPAL"                 "HEAVY RAIN"                    
## [15] "LIGHTNING"                      "THUNDERSTORM WIND"             
## [17] "DENSE FOG"                      "RIP CURRENT"                   
## [19] "THUNDERSTORM WINS"              "FLASH FLOOD"                   
## [21] "FLASH FLOODING"                 "HIGH WINDS"                    
## [23] "FUNNEL CLOUD"                   "TORNADO F0"                    
## [25] "THUNDERSTORM WINDS LIGHTNING"   "THUNDERSTORM WINDS/HAIL"       
## [27] "HEAT"                           "WIND"                          
## [29] "LIGHTING"                       "HEAVY RAINS"                   
## [31] "LIGHTNING AND HEAVY RAIN"       "FUNNEL"                        
## [33] "WALL CLOUD"                     "FLOODING"                      
## [35] "THUNDERSTORM WINDS HAIL"        "FLOOD"                         
## [37] "COLD"                           "HEAVY RAIN/LIGHTNING"          
## [39] "FLASH FLOODING/THUNDERSTORM WI" "WALL CLOUD/FUNNEL CLOUD"       
## [41] "THUNDERSTORM"                   "WATERSPOUT"                    
## [43] "EXTREME COLD"                   "HAIL 1.75)"                    
## [45] "LIGHTNING/HEAVY RAIN"           "HIGH WIND"                     
## [47] "BLIZZARD"                       "BLIZZARD WEATHER"              
## [49] "WIND CHILL"                     "BREAKUP FLOODING"

We can see there are some entries that represent multiple events and some entries do not follow the National Weather Service instructions for the event names. We’ll convert the event’s to the names instructed in the NWS documentation, table 1. To accomplish this we’ve compared the keywords in the NWS event table with the EVTYPE unique entries to match them as best as we could. E.g. Since not all entries for “Fog” contain the formula “Dense Fog”, we’ll group all fog instances in “Dense fog” by matching the keywords “FOG” and “VOG”. This will include also “Freezing Fog” in the “Dense Fog” category. If more in-depth analysis are made, this should be kept in mind. The file used for the categories matching can be found here

names <- make.names(
    c(
        "Astronomical Low Tide",
        "Avalanche",
        "Blizzard",
        "Coastal Flood",
        "Cold/Wind Chill",
        "Debris Flow",
        "Dense Fog",
        "Dense Smoke",
        "Drought",
        "Dust Devil",
        "Dust Storm",
        "Excessive Heat",
        "Extreme Cold/Wind Chill",
        "Flash Flood",
        "Flood",
        "Frost/Freeze",
        "Funnel Cloud",
        "Freezing Fog",
        "Hail",
        "Heat",
        "Heavy Rain",
        "Heavy Snow",
        "High Surf",
        "High Wind",
        "Hurricane Typhoon",
        "Ice Storm",
        "Lake-Effect Snow",
        "Lakeshore Flood",
        "Lightning",
        "Marine Hail",
        "Marine High Wind",
        "Marine Strong Wind",
        "Marine Thunderstorm Wind",
        "Rip Current",
        "Seiche",
        "Sleet",
        "Storm Surge/Tide",
        "Strong Wind",
        "Thunderstorm Wind",
        "Tornado",
        "Tropical Depression",
        "Tropical Storm",
        "Tsunami",
        "Volcanic Ash",
        "Waterspout",
        "Wildfire",
        "Winter Storm",
        "Winter Weather"
    ),
    unique = TRUE
)
keywords <- list(
    c("LOW TIDE", "BLOW-OUT TIDE"),
    c("AVALANCHE"),
    c("BLIZZARD"),
    c("COASTAL FLOOD"),
    c("WIND CHILL", "COLD"),
    c("LANDSLIDE"),
    c("FOG", "VOG"),
    c("SMOKE"),
    c("DROUGHT"),
    c("DUST DEVIL", "DUST DEVEL"),
    c("DUST"),
    c("EXCESSIVE HEAT"),
    c("EXTREME COLD"),
    c("FLASH FLOOD"),
    c("FLOOD"),
    c("FROST", "FREEZE"),
    c("CLOUD") ,
    c("FREEZING FOG", "ICE FOG"),
    c("HAIL"),
    c("HEAT"),
    c("RAIN", "HEAVY SHOWER"),
    c("SNOW"),
    c("SURF"),
    c("HIGH WIND"),
    c("HURRICANE", "TYPHOON"),
    c("ICE"),
    c("LAKE-EFFECT SNOW", "LAKE EFFECT SNOW", "LAKE SNOW"),
    c("LAKESHORE FLOOD"),
    c("LIGHTNING"),
    c("MARINE HAIL"),
    c("MARINE HIGH WIND"),
    c("MARINE STRONG WIND"),
    c("MARINE THUNDERSTORM WIND", "MARINE TSTM WIND"),
    c("RIP CURRENT"),
    c("SEICHE"),
    c("SLEET"),
    c("STORM SURGE/TIDE", "HIGH TIDE"),
    c("STRONG WIND"),
    c("THUNDERSTORM WIND", "TSTM WIND", "THUNDERSNOW"),
    c("TORNADO"),
    c("TROPICAL DEPRESSION"),
    c("TROPICAL STORM"),
    c("TSUNAMI"),
    c("VOLCANIC ASH", "VOLCANIC ERUPTION"),
    c("WATERSPOUT"),
    c("FIRE"),
    c("WINTER STORM"),
    c("WINTER WEATHER")
)
refCategories <- tibble (category = names, keywords = keywords)
data[names] <- FALSE 

## This function fills each category column with true when the EVTYPE matches
matchEventTypes <- function(data, refCategories) {
    for(i in rownames(refCategories)) {
        eventTest <- refCategories[i,]$category
        data[eventTest] <- apply(sapply(unlist(refCategories[refCategories$category == eventTest,]$keywords), grepl, data$EVTYPE), 1, any)
    }
    data
}

data  <- matchEventTypes(data, refCategories)

Processing

We have entries with multiple events (e.g. “THUNDERSTORM WINDS/ FLOOD”). Since we cannot separate the damage corresponding to each event, we’ll count the whole damage for the thunderstorm, and the whole damage for the flood. We’ll calculate the casualties and economic damage for each “merged” event.

library(tidyr)
data <- data %>% mutate(year = year(BGN_DATE), .before = EVTYPE) ## I want the mean by year and type of event
## Here we fix the values with the magnitude column
data <- mutate(data, propValue = ifelse(is.na(PROPDMGEXP), 0, ifelse(PROPDMGEXP == "K", PROPDMG*1e3, ifelse(PROPDMGEXP == "M", PROPDMG*1e6, ifelse(PROPDMGEXP == "B", PROPDMG*1e9, 0)))), .before = Astronomical.Low.Tide)
data <- data <- mutate(data, cropValue = ifelse(is.na(CROPDMGEXP), 0, ifelse(CROPDMGEXP == "K", CROPDMG*1e3, ifelse(CROPDMGEXP == "M", CROPDMG*1e6, ifelse(CROPDMGEXP == "B", CROPDMG*1e9, 0)))), .before = Astronomical.Low.Tide)

data <- mutate(data, valueEvent = propValue + cropValue, .before = Astronomical.Low.Tide) ## property and crop value
data <- mutate(data, casualtiesEvent = FATALITIES + INJURIES, .before = Astronomical.Low.Tide) ## casualties

## Create a new column with the correct event types, this generates more rows because there were multiple event types in the same EVTYPE entry
data <- data %>% pivot_longer(Astronomical.Low.Tide:Winter.Weather, names_to = "eventType", values_to = "isEvent") %>% filter(isEvent == TRUE)
data <- data %>% mutate(casualtiesYear = sum(casualtiesEvent), valueYear = sum(valueEvent), .by = c(eventType, year)) ## year total by type of event.
dataMean <- data %>% select(year, eventType, casualtiesYear, valueYear) %>% 
    unique %>% 
    mutate(casualtiesMean = mean(casualtiesYear), valueMean = mean(valueYear), .by = eventType) ## year mean by type of event
ranking <- dataMean %>% group_by(eventType) %>% 
    filter(row_number()==1) %>% 
    arrange(eventType) %>% 
    select(eventType, casualtiesMean, valueMean) %>% 
    mutate(casualtiesMean = casualtiesMean/100, valueMean = valueMean/1e9) ## select one row for each group and sort by event type. Also, fix scale to hundreds casualties and billions damage

We can plot the 10 events which have the most impact in population’s health and economic damage every year, on average.

library(ggplot2)
library(patchwork)

valueBar <- ggplot(data = head(arrange(ranking, desc(valueMean)), 10)) + geom_bar(aes(x = eventType, y = valueMean, fill = eventType), stat = "identity") + labs(x = "", y = "Property Damage (billions)") + theme(axis.text.x=element_blank())

casualtiesBar <- ggplot(data = head(arrange(ranking, desc(casualtiesMean)), 10)) + geom_bar(aes(x = eventType, y = casualtiesMean, fill = eventType), stat = "identity") + labs(x = "", y = "Casualties (hundreds)") + theme(axis.text.x=element_blank())

casualtiesBar + valueBar

From this data it seems that the events with higher impact every year are tornadoes, hurricanes, floods (includes flash flood), heat (includes excessive heat), lightning and thunderstorm wind. Let’s see how the damages of each of this events looks like over the years.

events <- c("Tornado", "Hurricane.Typhoon", "Flood", "Heat", "Lightning", "Thunderstorm.Wind")

histData <- data %>% select(year, eventType, casualtiesYear, valueYear) %>% 
    unique %>%
    filter(eventType %in% events) %>% 
    mutate(casualtiesYear = casualtiesYear/100, valueYear = valueYear/1e9) %>%
    pivot_longer(casualtiesYear:valueYear, names_to = "damageType", values_to = "damageAmount")

ggplot(data=histData, aes(x=year, y=damageAmount)) + 
    geom_line(aes(colour = eventType), linewidth = 1) +
    facet_grid(damageType ~ ., labeller = as_labeller(c(casualtiesYear = "Casualties (hundreds)", valueYear = "Value (billions)"))) + 
    coord_cartesian(ylim = c(0, 75)) +
    labs(title = "Year Damage by Event Type", x = "Year", y = "Damage", colour = "Event Type") +
    scale_x_continuous(limits = c(1950, 2011), breaks = seq(1950, 2011, 10)) +
    theme(legend.position = "bottom") +
    scale_color_hue(labels=c("Flood", "Heat", "Hurricane", "Lightning", "Thunderstorm/Wind", "Tornado"))

For the casualties, we can see that heat and flood have some spikes, signaling some years with particularly intense events, while thunderstorms follow a smoother line. But clearly, the most dangerous weather event for the population are tornadoes.

For the economic damage, we find two extreme events, the hurricane of 2005 and the floods of 2006.They’re also the events that tend to cause most of the economic damage year after year, except maybe for the tornadoes breakout of 2011.

Results

We find that the events with the most impact on the population are tornadoes, hurricanes, floods and heat. Strong winds and water related measures may work to mitigate the negative effects of multiple of this events, but further analysis to study the real causes of the damage this events create should be done. This study has also the limitation with the weather event categories, maybe a more in-depth study could find a better way to categorize the data and draw more precise conclusions. Specific analysis of each event type, including seasonality, areas typically affected, and changes in the demographics of the area as well as the amount of infrastructure developed over time could be useful both to measure effectivity of current measures and to help to decide future measures.