Synopsis

This report identifies the most damaging types of severe weather events in the NOAA storm record for the United States from 1950 to 2011. We look on the one hand at damage to property and agriculture, and on the other hand at harm to public health. The reader will learn which events to prepare for. The results are surprisingly clear: in each of the two categories a single event type stands far ahead of all other ranked event types.

Loading and Processing the Raw Data

We obtained the storm data for the years 1950 to 2011 from the NOAA Storm Database.

Reading in the NOAA data

We read the database from a comma-separated file, which uses the comma character as the field separator. The file has a header row naming the fields. The fields are not officially coded or documented and are therefore subject to interpretation.
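Reading a bzip2-compressed, comma-separated file with a header row can be tried on a small throwaway file first. The following is only a sketch: the toy values are invented, and only the column names are borrowed from this report.

```r
# Toy stand-in for the real file (values invented for illustration).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                  FATALITIES = c(1, 0, 2),
                  PROPDMGEXP = c("K", "M", "B"))

# Write it bzip2-compressed to a temporary path.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv is read.table with header = TRUE and sep = "," preset;
# the compressed file is decompressed transparently on reading.
back <- read.csv(path, stringsAsFactors = FALSE)
str(back)
```

Checking `str()` on a small sample like this is a cheap way to see how each field is typed before loading the full file.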

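options(scipen=999)  # print large numbers without scientific notation
# read the data only once per session
if(!exists('pm0')) {
    pm0 <- read.table("repdata_data_StormData.csv.bz2", comment.char = "#", 
                  header = TRUE, sep = ",", na.strings = "")
    alarm()  # audible signal once loading has finished
}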

Question 1

Across the United States, which types of events are most harmful with respect to population health?

Exploring data

Checking the cleanliness of the EVTYPE factor variable:

head(unique(pm0$EVTYPE), 50)
##  [1] TORNADO                        TSTM WIND                     
##  [3] HAIL                           FREEZING RAIN                 
##  [5] SNOW                           ICE STORM/FLASH FLOOD         
##  [7] SNOW/ICE                       WINTER STORM                  
##  [9] HURRICANE OPAL/HIGH WINDS      THUNDERSTORM WINDS            
## [11] RECORD COLD                    HURRICANE ERIN                
## [13] HURRICANE OPAL                 HEAVY RAIN                    
## [15] LIGHTNING                      THUNDERSTORM WIND             
## [17] DENSE FOG                      RIP CURRENT                   
## [19] THUNDERSTORM WINS              FLASH FLOOD                   
## [21] FLASH FLOODING                 HIGH WINDS                    
## [23] FUNNEL CLOUD                   TORNADO F0                    
## [25] THUNDERSTORM WINDS LIGHTNING   THUNDERSTORM WINDS/HAIL       
## [27] HEAT                           WIND                          
## [29] LIGHTING                       HEAVY RAINS                   
## [31] LIGHTNING AND HEAVY RAIN       FUNNEL                        
## [33] WALL CLOUD                     FLOODING                      
## [35] THUNDERSTORM WINDS HAIL        FLOOD                         
## [37] COLD                           HEAVY RAIN/LIGHTNING          
## [39] FLASH FLOODING/THUNDERSTORM WI WALL CLOUD/FUNNEL CLOUD       
## [41] THUNDERSTORM                   WATERSPOUT                    
## [43] EXTREME COLD                   HAIL 1.75)                    
## [45] LIGHTNING/HEAVY RAIN           HIGH WIND                     
## [47] BLIZZARD                       BLIZZARD WEATHER              
## [49] WIND CHILL                     BREAKUP FLOODING              
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

It is apparent that the event type strings are not assigned systematically. The same kind of event appears in many similar variants, so the raw values cannot be aggregated under one category as they are.

A good example is the event 'Thunderstorm', whose event type string often also contains wind speeds, although wind speed is clearly a separate piece of information.
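To make the problem concrete, a handful of labels copied from the listing above can be checked against a pattern. This is a small sketch; the pattern `THUNDERS|TSTM` is an assumption about which spellings belong together.

```r
# A handful of labels copied from the listing above.
labels <- c("TSTM WIND", "THUNDERSTORM WINDS", "THUNDERSTORM WIND",
            "THUNDERSTORM WINS", "THUNDERSTORM WINDS LIGHTNING",
            "THUNDERSTORM WINDS/HAIL", "THUNDERSTORM", "HAIL", "TORNADO")

# Which labels describe a thunderstorm in one spelling or another?
is_thunder <- grepl("THUNDERS|TSTM", labels)
labels[is_thunder]

# Number of distinct spellings that would need to be merged.
n_variants <- sum(is_thunder)
n_variants
```

Seven of the nine sample labels refer to the same kind of event, which is why the labels are merged before aggregating.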

Data Processing

# cleaning and streamlining efforts
# goal is to have most similar event types under one category
# note: order matters, a label is assigned to the first pattern it matches

pm0$EVTYPE <- toupper(pm0$EVTYPE)
pm0$EVTYPE[grep("THUNDERS",pm0$EVTYPE)] <- "THUNDERSTORM"
pm0$EVTYPE[grep("TSTM",pm0$EVTYPE)] <- "THUNDERSTORM"
pm0$EVTYPE[grep("LIGHTNING",pm0$EVTYPE)] <- "THUNDERSTORM"
pm0$EVTYPE[grep("TORNAD",pm0$EVTYPE)] <- "TORNADO"
pm0$EVTYPE[grep("HAIL",pm0$EVTYPE)] <- "HAIL"
pm0$EVTYPE[grep("FLOOD",pm0$EVTYPE)] <- "FLOOD"
pm0$EVTYPE[grep("HEAT",pm0$EVTYPE)] <- "HEAT"
pm0$EVTYPE[grep("WIND",pm0$EVTYPE)] <- "WIND"
pm0$EVTYPE[grep("SNOW",pm0$EVTYPE)] <- "SNOW"
pm0$EVTYPE[grep("HURRICANE",pm0$EVTYPE)] <- "HURRICANE"
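The same idea can be sketched as an ordered rule table applied by a small function. `normalize_evtype` and `rules` are hypothetical names introduced here for illustration, and the rule list is shortened. As in the code above, a label goes to the first pattern it matches.

```r
# Ordered rules: pattern -> category. Earlier rules take precedence.
rules <- c(THUNDERS = "THUNDERSTORM", TSTM = "THUNDERSTORM",
           TORNAD = "TORNADO", HAIL = "HAIL", FLOOD = "FLOOD", WIND = "WIND")

# Hypothetical helper, not part of the report's own code.
normalize_evtype <- function(x, rules) {
  x <- toupper(x)
  out <- x                       # labels without a matching rule stay as-is
  done <- rep(FALSE, length(x))  # tracks labels that are already assigned
  for (pattern in names(rules)) {
    hit <- !done & grepl(pattern, x, fixed = TRUE)
    out[hit] <- rules[[pattern]]
    done <- done | hit
  }
  out
}

res <- normalize_evtype(c("Tstm Wind", "THUNDERSTORM WINDS/HAIL", "HIGH WINDS",
                          "FLASH FLOODING", "DENSE FOG"), rules)
res
```

Keeping the rules in one named vector makes the precedence explicit, so reordering two entries is enough to change which category a mixed label such as "THUNDERSTORM WINDS/HAIL" ends up in.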

To determine which event types have the strongest impact on public health, we need to aggregate the large number of events (more than 900,000) by event type.

As the measure of harm we use the sum of the FATALITIES and INJURIES columns.
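As a toy illustration of this measure, the sketch below sums fatalities and injuries per event type with base R's `aggregate`. The data values are invented; only the column names come from the data set.

```r
# Invented toy records using the column names of the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(2, 0, 1, 3, 0),
  INJURIES   = c(10, 5, 0, 4, 0)
)

# Harm per record, then summed per event type (records without harm dropped).
toy$HARMED <- toy$FATALITIES + toy$INJURIES
harmed <- aggregate(HARMED ~ EVTYPE, data = toy[toy$HARMED > 0, ], FUN = sum)

# Most harmful event types first.
harmed <- harmed[order(harmed$HARMED, decreasing = TRUE), ]
harmed
```

Note that weighting a fatality the same as an injury is a simplification of this measure, not a property of the data.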

# public health subset: keep only events with at least one fatality or injury
pm1 <- subset(pm0, FATALITIES + INJURIES > 0)
# total number of people harmed per event
pm1$HARMED <- pm1$FATALITIES + pm1$INJURIES
#pm1 <- pm1[order(pm1$HARMED, decreasing = T),]

library(plyr)
# aggregate by event type
pm1.sum <- ddply(pm1, c("EVTYPE"), summarize, HARMED = sum(HARMED))
# order by impact
pm1.sum <- pm1.sum[order(pm1.sum$HARMED, decreasing = TRUE),]

# plotting Top 10 event types
library(ggplot2)
ggplot(pm1.sum[1:10,], aes(x = reorder(EVTYPE, HARMED), y = HARMED)) + 
  # fill (not colour) sets the bar colour; colour only draws the outline
  geom_bar(stat = "identity", fill = "steelblue") +
  ggtitle("Types of weather events on record most harmful to population health\n in the USA between 1950 and 2011") +
  labs(x = "Event Type", y = "Affected People (Fatalities + Injuries)",
       caption = "Weather events sorted descending by health damages | Data source: NOAA") +
  coord_flip()

Question 2

Across the United States, which types of events have the greatest economic consequences?

To answer this question we take both agricultural (crop) damage and property damage into account. Earlier records are believed to be incomplete and imprecise, and the damage figures are estimates in any case. Altogether this introduces an unknown margin of error.
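Each damage amount is stored as a number (PROPDMG, CROPDMG) together with a unit code (PROPDMGEXP, CROPDMGEXP) that stands for a power of ten. The sketch below shows the conversion on invented values. `decode_damage` is a hypothetical helper, and treating a missing or unknown code as exponent 0 is an assumption of this sketch.

```r
# Hypothetical helper; the code-to-exponent mapping follows the one used
# in this report (H = 2, K = 3, M = 6, B = 9).
decode_damage <- function(value, code) {
  power <- c(H = 2, K = 3, M = 6, B = 9)[toupper(code)]
  power[is.na(power)] <- 0   # assumption: unknown or missing code = plain USD
  unname(value * 10^power)
}

# Invented examples: 25 thousand, 2.5 million, 1.5 billion, 500 without a code.
usd <- decode_damage(c(25, 2.5, 1.5, 500), c("K", "m", "B", NA))
usd
```

Converting both damage fields to plain USD in this way is what makes property and crop damage addable into a single total.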

# preparing and subsetting damages data for summarizing - by merging number and unit field into a single number field
prop <- pm0[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
prop$PROPDMGEXP <- toupper(prop$PROPDMGEXP)
# map unit codes to powers of ten (H = 2, K = 3, M = 6, B = 9); digit codes are used as given
prop$PROPDMGEXP[prop$PROPDMGEXP %in% c("", "+" ,"0" ,"?" ,"-")] <- 0
prop$PROPDMGEXP[prop$PROPDMGEXP == "K"] <- 3
prop$PROPDMGEXP[prop$PROPDMGEXP == "M"] <- 6
prop$PROPDMGEXP[prop$PROPDMGEXP == "H"] <- 2
prop$PROPDMGEXP[prop$PROPDMGEXP == "B"] <- 9
# prepare numeric power calc
prop$PROPDMGEXP <- as.numeric(prop$PROPDMGEXP)
# finally calculate the single damage number that should have been there in the first place
prop$PROPTOT <- prop$PROPDMG * 10^(prop$PROPDMGEXP)

#preparing damages data for summarizing by merging number and unit field into a single number field
prop$CROPDMGEXP <- toupper(prop$CROPDMGEXP)
# operationalize power calc
prop$CROPDMGEXP[prop$CROPDMGEXP == "?"] <-0
prop$CROPDMGEXP[prop$CROPDMGEXP == ""] <- 0
prop$CROPDMGEXP[prop$CROPDMGEXP == "B"] <- 9
prop$CROPDMGEXP[prop$CROPDMGEXP == "M"] <- 6
prop$CROPDMGEXP[prop$CROPDMGEXP == "K"] <- 3
# prepare numeric power calc 
prop$CROPDMGEXP <- as.numeric(prop$CROPDMGEXP)
# finally calculate the single damage number that should have been there in the first place
prop$CROPTOT <- prop$CROPDMG * 10^(prop$CROPDMGEXP)
# records with a missing unit code (read in as NA) yield NA totals; set these to zero
prop$CROPTOT[is.na(prop$CROPTOT)] <- 0
prop$PROPTOT[is.na(prop$PROPTOT)] <- 0
# calc combined total damage
prop$TOTAL <- prop$PROPTOT + prop$CROPTOT

library(plyr)
# aggregate all events by event type and summarize TOTAL to GRAND TOTAL
prop.sum <- ddply(prop, c("EVTYPE"), summarize, TOTAL = sum(TOTAL))
prop.sum <- prop.sum[order(prop.sum$TOTAL, decreasing = TRUE),]

# custom tick marks for Billion USD formatting
ylab <- c(25, 50, 100, 150)

library(ggplot2)
ggplot(prop.sum[1:10,], aes(x = reorder(EVTYPE, TOTAL), y = TOTAL)) + 
  # fill (not colour) sets the bar colour; colour only draws the outline
  geom_bar(stat = "identity", fill = "steelblue") +
  ggtitle("Types of weather events on record with the greatest economic damage\n in the USA between 1950 and 2011") +
  labs(x = "Event Type", y = "Damages in USD (Property & Crops)",
       caption = "Weather events sorted descending by total damages (property and crops) | Data source: NOAA") +
  coord_flip() +
  scale_y_continuous(labels = paste0(ylab, "B"), breaks = 10^9 * ylab)
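As an alternative to hard-coded tick labels, the billions formatting could be written as a small labelling function. `label_billions` is a hypothetical helper name, not a ggplot2 function. This is a sketch that assumes the axis values are plain USD amounts.

```r
# Hypothetical helper: format USD amounts as billions with a "B" suffix.
label_billions <- function(x) paste0(x / 1e9, "B")

# The tick positions used in the plot above, in USD.
ticks <- c(25, 50, 100, 150) * 1e9
tick_labels <- label_billions(ticks)
tick_labels
```

Such a function could be passed as `scale_y_continuous(labels = label_billions)`, so that the labels stay correct if the breaks are changed later.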

Results

By far the greatest danger to people's lives and health comes from the weather event type TORNADO, followed by THUNDERSTORM.

The number one type of weather event in terms of economic damage (property and crops) is, by far, flooding. FLOOD events alone are responsible for about 180B USD of damage in the period between 1950 and 2011.