Public Health and Economic Impact of Storm Events Included in the NOAA Weather Events Database

By Antonio Ferraro Date: 16 January 2016

Synopsis

Weather events like flash floods, tornadoes and thunderstorms have often caused a high number of casualties and heavy economic damage.

This exploratory analysis assesses the public health and economic impact of these catastrophic events, following the Project 2 rubric of the Reproducible Research course (January 2016). The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which contains events dating from 1950 to November 2011.

The analysis reads the data and reduces it by removing elements that are not useful for the specific tasks. Aggregations are then created and used to show the results in the form of bar plots and tables.

Data Processing

We load our data.

storms = read.csv(bzfile("repdata_data_StormData.csv.bz2"), header = TRUE)

It is always good to have a look at the raw data first, to see what kind of database we are dealing with. (The output is not displayed here; it is just part of my analysis.)

str(storms)
summary(storms)
head(storms)

The data may also contain events which (luckily) caused no damage, fatalities or injuries. To get an idea of how many, let us count them.

nrow(storms[storms$INJURIES==0.0 & storms$PROPDMG==0.0 & storms$FATALITIES==0.0 & storms$CROPDMG==0.0 , ])
## [1] 647664

We are not interested in these entries, as they affect neither public health nor the economy, so we can remove them from the dataset (which also makes it lighter). I will also remove columns that I think are not relevant for this analysis.

For fatalities and injuries I simply aggregate the relevant events by type. For damages I compute the damage to property and the damage to crops separately, and then sum them into a single aggregated table. The plots are then generated from these tables in the usual way.

storms <- storms[,c("REFNUM", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
storms <- storms[!(storms$INJURIES==0 & storms$PROPDMG==0.0 & storms$FATALITIES==0 & storms$CROPDMG==0.0), ]

# AGGREGATIONS: FATALITIES 
stormfatalities <- aggregate(storms$FATALITIES, by=list(storms$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(stormfatalities) = c("typeofstorm", "fatalities")
stormfatalities <- stormfatalities[order(-stormfatalities$fatalities),] 

# This data frame may contain rows that relate to events that have no casualties 
nrow(stormfatalities[stormfatalities$fatalities==0,])
## [1] 320
# I remove them as well 
stormfatalities <- stormfatalities[stormfatalities$fatalities !=0,]
# I check how many rows I have, to see whether I could make a full plot 
nrow(stormfatalities)
## [1] 168
# AGGREGATIONS: INJURIES 
storminjuries <- aggregate(storms$INJURIES, by=list(storms$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(storminjuries) = c("typeofstorm", "injuries")
storminjuries <- storminjuries[order(-storminjuries$injuries),] 
# This data frame may contain rows that relate to events that have no injuries 
nrow(storminjuries[storminjuries$injuries==0,])
## [1] 330
# I remove them as well 
storminjuries <- storminjuries[storminjuries$injuries !=0,]
# I check how many rows I have, to see whether I could make a full plot 
nrow(storminjuries)
## [1] 158
# Numerical multipliers for property damages (making assumptions based on table(storms$PROPDMGEXP))
# Add exp numerical columns 
storms$PROPDMGEXPNum<-0
storms$CROPDMGEXPNum<-0

storms[storms$PROPDMGEXP=="-" | storms$PROPDMGEXP=="+" | storms$PROPDMGEXP=="?" | storms$PROPDMGEXP=="0" | storms$PROPDMGEXP=="", ]$PROPDMGEXPNum<-1
storms[storms$PROPDMGEXP=="H" | storms$PROPDMGEXP=="h" | storms$PROPDMGEXP=="2",]$PROPDMGEXPNum<-100
storms[storms$PROPDMGEXP=="K" | storms$PROPDMGEXP=="k" | storms$PROPDMGEXP=="3",]$PROPDMGEXPNum<-1000
storms[storms$PROPDMGEXP=="4",]$PROPDMGEXPNum<-10000
storms[storms$PROPDMGEXP=="5",]$PROPDMGEXPNum<-100000
storms[storms$PROPDMGEXP=="M" | storms$PROPDMGEXP=="m" | storms$PROPDMGEXP=="6",]$PROPDMGEXPNum<-1000000
storms[storms$PROPDMGEXP=="7",]$PROPDMGEXPNum<-10000000
storms[storms$PROPDMGEXP=="B" | storms$PROPDMGEXP=="b",]$PROPDMGEXPNum<-1000000000
# No events in database with exponent 8, 9 etc 

# Numerical multipliers for crop damages (making assumptions based on table(storms$CROPDMGEXP))
storms[storms$CROPDMGEXP=="?" | storms$CROPDMGEXP=="0", ]$CROPDMGEXPNum<-1
storms[storms$CROPDMGEXP=="K" | storms$CROPDMGEXP=="k" ,]$CROPDMGEXPNum<-1000
storms[storms$CROPDMGEXP=="M" | storms$CROPDMGEXP=="m",]$CROPDMGEXPNum<-1000000
storms[storms$CROPDMGEXP=="B" | storms$CROPDMGEXP=="b",]$CROPDMGEXPNum<-1000000000

# AGGREGATIONS: MATERIAL DAMAGES Using the exponentials
stormdamagesP <- aggregate(storms$PROPDMG*storms$PROPDMGEXPNum, by=list(storms$EVTYPE), FUN=sum, na.rm=TRUE)

colnames(stormdamagesP) = c("typeofstorm", "damages")
stormdamagesP <- stormdamagesP[order(-stormdamagesP$damages),] 
# This data frame may contain rows that relate to events that have no damages 
nrow(stormdamagesP[stormdamagesP$damages==0,])
## [1] 82
# I remove them as well 
stormdamagesP <- stormdamagesP[stormdamagesP$damages !=0  ,]
# I check how many rows do I have to see if I could make a full plot 

# AGGREGATIONS: CROP DAMAGES
stormdamagesC <- aggregate(storms$CROPDMG * storms$CROPDMGEXPNum, by=list(storms$EVTYPE), FUN=sum, na.rm=TRUE)
colnames(stormdamagesC) = c("typeofstorm", "damages")
# This data frame may contain rows that relate to events that have no damages 
nrow(stormdamagesC[stormdamagesC$damages==0,])
## [1] 352
# I remove them as well 
stormdamagesC <- stormdamagesC[stormdamagesC$damages !=0  ,]
# I combine property and crop damages into a single data frame 
stormdamages <- rbind(stormdamagesP, stormdamagesC)
# And now aggregate by stormtype
stormdamages<-aggregate(stormdamages$damages, by=list(stormdamages$typeofstorm), FUN=sum, na.rm=TRUE)
# All damages
colnames(stormdamages) = c("typeofstorm", "damages")
stormdamages <- stormdamages[order(-stormdamages$damages),] 
nrow(stormdamages)
## [1] 431
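
The exponent recoding above repeats one assignment per code. As a minimal sketch of an alternative, the same mapping can be written as a named lookup vector. The names `exp_lookup` and `exp_to_multiplier` are hypothetical and not part of the original analysis. Note one assumption that differs from the code above: every code missing from the lookup (including the blank one) falls back to a multiplier of 1, whereas the original leaves unmapped codes at 0, so totals computed this way may differ from those reported here.

```r
# Sketch only: recode damage exponents with a named lookup vector.
# `exp_lookup` and `exp_to_multiplier` are hypothetical names.
exp_lookup <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
                "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
                "6" = 1e6, "7" = 1e7)

exp_to_multiplier <- function(code, default = 1) {
  code <- toupper(as.character(code))  # treat "k" and "K" alike; also handles factors
  mult <- unname(exp_lookup[code])     # NA for any code not in the lookup
  mult[is.na(mult)] <- default         # assumption: "", "-", "+", "?", "0" and unknown codes
  mult
}

# Toy data, for illustration only
toy <- data.frame(PROPDMG = c(2.5, 10, 3, 7),
                  PROPDMGEXP = c("K", "m", "", "B"))
toy$damage <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy$damage
```

Because the function is vectorised, it does not depend on every code actually occurring in the data, which the row-subset assignments above implicitly do.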

Results

Fatalities

The number of storm types that have caused fatalities is too large for a readable bar plot, so I only consider the 15 types of storm that cost the most human lives during the period covered by the database.

library(ggplot2)
library(colorspace)

mypal<-terrain_hcl(15) 

ggplot(data=head(stormfatalities,15), aes(x=reorder(typeofstorm, -fatalities), 
                                          y=fatalities, fill=typeofstorm)) + 
    geom_bar(stat="identity", colour="darkblue") + 
    scale_fill_manual(values = mypal)+
    scale_y_continuous(breaks=seq(0, 5500, 250))+      
    xlab("Type of storm") + 
    ylab("Total Fatalities" ) + 
    ggtitle("Fatalities By Event Type") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

Tornadoes are the storms that cause the most fatalities. It is still worth noting that we have only considered the first 15. By doing this we left out:

sum(stormfatalities$fatalities)-sum(head(stormfatalities,15)$fatalities)
## [1] 2189

Out of a total number of fatalities of:

sum(stormfatalities$fatalities)
## [1] 15145

Corresponding to the following proportion of the total fatalities:

(sum(stormfatalities$fatalities)-sum(head(stormfatalities,15)$fatalities))/
                                           sum(stormfatalities$fatalities)
## [1] 0.1445362
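
The "left out" proportion is the share of the total that falls outside the 15 largest values. As a small sketch, it could be wrapped in a helper; `left_out_share` is a hypothetical name, not part of the original analysis.

```r
# Sketch only: share of a total that falls outside the n largest values.
# `left_out_share` is a hypothetical helper.
left_out_share <- function(values, n = 15) {
  values <- sort(values, decreasing = TRUE)  # largest first, like the ordered tables
  total <- sum(values)
  (total - sum(head(values, n))) / total
}

# Toy check: the two largest values sum to 90 out of 100
share <- left_out_share(c(50, 40, 5, 3, 2), n = 2)
share
```

With the data frame built earlier, `left_out_share(stormfatalities$fatalities)` should give the same proportion as the expression above.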

Any human life is important. Excluding 14% of victims from a statistic does not seem right to me, so I feel it is right to publish the full table.

This table raises questions: it is hard to understand the difference between events like “ICE ON ROAD” and “ICY ROADS”. The database documentation does not help answer this kind of question.

library(knitr)
colnames(stormfatalities) = c("Storm Type", "Fatalities")
kable(stormfatalities, row.names=FALSE, format = "html")
Storm Type Fatalities
TORNADO 5633
EXCESSIVE HEAT 1903
FLASH FLOOD 978
HEAT 937
LIGHTNING 816
TSTM WIND 504
FLOOD 470
RIP CURRENT 368
HIGH WIND 248
AVALANCHE 224
WINTER STORM 206
RIP CURRENTS 204
HEAT WAVE 172
EXTREME COLD 160
THUNDERSTORM WIND 133
HEAVY SNOW 127
EXTREME COLD/WIND CHILL 125
STRONG WIND 103
BLIZZARD 101
HIGH SURF 101
HEAVY RAIN 98
EXTREME HEAT 96
COLD/WIND CHILL 95
ICE STORM 89
WILDFIRE 75
HURRICANE/TYPHOON 64
THUNDERSTORM WINDS 64
FOG 62
HURRICANE 61
TROPICAL STORM 58
HEAVY SURF/HIGH SURF 42
LANDSLIDE 38
COLD 35
HIGH WINDS 35
TSUNAMI 33
WINTER WEATHER 33
UNSEASONABLY WARM AND DRY 29
URBAN/SML STREAM FLD 28
WINTER WEATHER/MIX 28
TORNADOES, TSTM WIND, HAIL 25
WIND 23
DUST STORM 22
FLASH FLOODING 19
DENSE FOG 18
EXTREME WINDCHILL 17
FLOOD/FLASH FLOOD 17
RECORD/EXCESSIVE HEAT 17
HAIL 15
COLD AND SNOW 14
FLASH FLOOD/FLOOD 14
MARINE STRONG WIND 14
STORM SURGE 13
WILD/FOREST FIRE 12
STORM SURGE/TIDE 11
UNSEASONABLY WARM 11
MARINE THUNDERSTORM WIND 10
WINTER STORMS 10
MARINE TSTM WIND 9
ROUGH SEAS 8
TROPICAL STORM GORDON 8
FREEZING RAIN 7
GLAZE 7
HEAVY SURF 7
LOW TEMPERATURE 7
MARINE MISHAP 7
STRONG WINDS 7
FLOODING 6
HURRICANE ERIN 6
ICE 6
COLD WEATHER 5
FLASH FLOODING/FLOOD 5
HEAT WAVES 5
HIGH SEAS 5
ICY ROADS 5
RIP CURRENTS/HEAVY SURF 5
SNOW 5
TSTM WIND/HAIL 5
GUSTY WINDS 4
HEAT WAVE DROUGHT 4
HIGH WIND/SEAS 4
Hypothermia/Exposure 4
Mudslide 4
RAIN/SNOW 4
ROUGH SURF 4
SNOW AND ICE 4
COASTAL FLOOD 3
COASTAL STORM 3
Cold 3
COLD WAVE 3
DRY MICROBURST 3
HEAVY SEAS 3
Heavy surf and wind 3
High Surf 3
HIGH WATER 3
HIGH WIND AND SEAS 3
HIGH WINDS/SNOW 3
HYPOTHERMIA/EXPOSURE 3
WATERSPOUT 3
WATERSPOUT/TORNADO 3
WILD FIRES 3
Coastal Flooding 2
Cold Temperature 2
DROUGHT/EXCESSIVE HEAT 2
DUST DEVIL 2
EXCESSIVE RAINFALL 2
Extreme Cold 2
FLASH FLOODS 2
FREEZING DRIZZLE 2
HEAVY SNOW AND HIGH WINDS 2
HURRICANE OPAL/HIGH WINDS 2
MIXED PRECIP 2
RECORD HEAT 2
RIVER FLOOD 2
RIVER FLOODING 2
SLEET 2
SNOW SQUALL 2
UNSEASONABLY COLD 2
AVALANCE 1
BLACK ICE 1
blowing snow 1
BLOWING SNOW 1
COASTAL FLOODING 1
COASTALSTORM 1
COLD/WINDS 1
DROWNING 1
Extended Cold 1
FALLING SNOW/ICE 1
FLOOD & HEAVY RAIN 1
FLOOD/RIVER FLOOD 1
FOG AND COLD TEMPERATURES 1
FREEZE 1
FREEZING RAIN/SNOW 1
Freezing Spray 1
FROST 1
GUSTY WIND 1
Heavy Surf 1
HIGH SWELLS 1
HIGH WAVES 1
HURRICANE FELIX 1
HURRICANE OPAL 1
HYPERTHERMIA/EXPOSURE 1
HYPOTHERMIA 1
ICE ON ROAD 1
LANDSLIDES 1
LIGHTNING. 1
LIGHT SNOW 1
Marine Accident 1
MARINE HIGH WIND 1
MINOR FLOODING 1
Mudslides 1
RAIN/WIND 1
RAPIDLY RISING WATER 1
RECORD COLD 1
SNOW/ BITTER COLD 1
Snow Squalls 1
Strong Winds 1
THUNDERSNOW 1
THUNDERSTORM 1
THUNDERSTORM WIND (G40) 1
THUNDERSTORM WIND G52 1
THUNDERTORM WINDS 1
TSTM WIND (G35) 1
URBAN AND SMALL STREAM FLOODIN 1
Whirlwind 1
WINDS 1
WIND STORM 1
WINTER STORM HIGH WINDS 1
WINTRY MIX 1
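
Several labels in the table above differ only in letter case, for example “HIGH SURF” and “High Surf”, or “COLD” and “Cold”. A minimal sketch of how such labels could be merged before aggregating is shown here; `normalize_evtype` is a hypothetical helper, not part of the original analysis, and the toy rows are copied from the table. It only handles case and whitespace: pairs such as “ICE ON ROAD” and “ICY ROADS”, or singular and plural forms, would still need a manual mapping.

```r
# Sketch only: collapse event labels that differ in case or spacing.
# `normalize_evtype` is a hypothetical helper.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))  # "High Surf" -> "HIGH SURF"
  gsub("\\s+", " ", x)                   # squeeze repeated whitespace
}

# Toy data: a few rows taken from the fatalities table
toy <- data.frame(
  typeofstorm = c("HIGH SURF", "High Surf", "COLD", "Cold",
                  "BLOWING SNOW", "blowing snow"),
  fatalities = c(101, 3, 35, 3, 1, 1)
)
toy$typeofstorm <- normalize_evtype(toy$typeofstorm)

# Re-aggregate after normalising, then sort as in the tables above
merged <- aggregate(fatalities ~ typeofstorm, data = toy, FUN = sum)
merged <- merged[order(-merged$fatalities), ]
merged
```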

Injuries

We take exactly the same approach for injuries.

Here too there are too many rows for a bar plot with all types of storm. So the 15 event types that cause the most injuries are identified, followed by a brief evaluation of what is not shown.

ggplot(data=head(storminjuries,15), aes(x=reorder(typeofstorm, -injuries), 
                                        y=injuries, fill=typeofstorm)) + 
    geom_bar(stat="identity", colour="darkblue") + 
    scale_fill_manual(values = mypal)+
    scale_y_continuous(breaks=seq(0, 100000, 4000))+      
    xlab("Type of storm") + 
    ylab("Total Injuries" ) + 
    ggtitle("Injuries By Event Type") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

Here too, as expected, tornadoes are the most harmful type of storm event. Only the first 15 have been considered. By doing this, the following number of injuries has been left out:

sum(storminjuries$injuries)-sum(head(storminjuries,15)$injuries)
## [1] 9315

Out of a total number of injuries of:

sum(storminjuries$injuries)
## [1] 140528

Corresponding to the following proportion of the total injuries:

(sum(storminjuries$injuries)-sum(head(storminjuries,15)$injuries))/
                                       sum(storminjuries$injuries)
## [1] 0.06628572

Damages

The same reasoning is applied again to material damages (property and crop damages combined).

ggplot(data=head(stormdamages,15), aes(x=reorder(typeofstorm, -damages), 
                                       y=damages, fill=typeofstorm)) + 
    geom_bar(stat="identity", colour="darkblue") + 
    scale_fill_manual(values = mypal)+
    scale_y_continuous(breaks=seq(0, 400000000000, 20000000000))+      
    xlab("Type of storm") + 
    ylab("Total Damages in USD" ) + 
    ggtitle("Material damages by event type") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

By plotting only the first 15 we left out the following sum:

sum(stormdamages$damages)-sum(head(stormdamages,15)$damages)
## [1] 37554416962

Out of a total sum, in USD, of:

sum(stormdamages$damages)
## [1] 477329060938

Corresponding to the following proportion of the total damages:

(sum(stormdamages$damages)-sum(head(stormdamages,15)$damages))/
                                     sum(stormdamages$damages)
## [1] 0.07867616

Floods clearly cause more material damage than any other type of storm.