Analysis of the NOAA Storm Data to identify high-impact weather event types

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This project involves analysis of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The objective of the research was to identify the weather event types across the United States that:
1. are most harmful with respect to population health;
2. have the greatest economic consequences.

The outcome is relevant for policy makers and other stakeholders who are involved in limiting the impact of severe weather.

Synopsis

A basic analysis of the NOAA data shows that tornadoes, thunderstorm winds, (flash) floods, lightning and winter storms are the weather event types that are most harmful in terms of both human casualties and economic damage. This conclusion is based on the NOAA data for the years 1951-2011.

Data Processing

Data loading

The data on weather event types for the years 1951-2011 was obtained from the NOAA database (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2).

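library(ggplot2)

# Working directory on the author's machine; adjust this path to your own setup
setwd("~/Datasciencecoursera/Module 5 Reproducible Research/Week4 Programming Assignment")

# mode = "wb" keeps the compressed .bz2 file intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "NOAA_datafile", mode = "wb")

# read.csv() detects the bzip2 compression and decompresses the file while reading
data <- read.csv("NOAA_datafile")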
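The full file has 902,297 rows and 37 columns, while the analysis only needs a handful of them. A minimal sketch of how the load could be limited to those columns is shown below; the helper `read_storm_subset()` and the `keep` vector are hypothetical names introduced here, and the small temporary .bz2 file only stands in for the real NOAA download. It relies on `read.csv()` skipping any column whose `colClasses` entry is `"NULL"`.

```r
# Hypothetical helper: read only the columns listed in `keep`.
# Columns marked "NULL" in colClasses are skipped; NA means "detect the type".
read_storm_subset <- function(path, keep) {
  header <- names(read.csv(path, nrows = 1))   # read the header (plus one row) only
  col_classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(path, colClasses = col_classes)
}

# Stand-in data: a tiny bzip2-compressed CSV with a few NOAA-like columns
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(data.frame(STATE__ = 1:2, EVTYPE = c("TORNADO", "HAIL"),
                     FATALITIES = c(0, 1), REMARKS = c("a", "b")),
          con, row.names = FALSE)
close(con)

subset_df <- read_storm_subset(tmp, keep = c("EVTYPE", "FATALITIES"))
names(subset_df)  # "EVTYPE" "FATALITIES"
```

Applied to the real file, `keep` would list EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.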

For a first look at the data loaded into the data object, we use the str() function:

 str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

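Combined with the NOAA data description (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf), this means that the variables FATALITIES and INJURIES are the relevant metrics for the health impact of event types. The variables PROPDMG and CROPDMG give the estimated property and crop damage of an event and are relevant for analysing the economic damage that events caused. According to the NOAA documentation, these amounts are only expressed in US dollars once the magnitude in PROPDMGEXP and CROPDMGEXP is applied ("K" for thousands, "M" for millions, "B" for billions); the analysis below sums the unscaled PROPDMG and CROPDMG values.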
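Turning PROPDMG and CROPDMG into dollar amounts means multiplying each value by the magnitude encoded in its exponent column. The sketch below assumes only the documented codes "K", "M" and "B" (in either case) carry a multiplier and treats every other code (blank, "+", "-", "?", digits) as a multiplier of 1; that treatment of undocumented codes is an assumption, not NOAA guidance. `exp_to_multiplier()` is a hypothetical helper and the three-row data frame is made-up illustration data.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: only K/M/B are meaningful; all other codes map to 1.
exp_to_multiplier <- function(exp_code) {
  code <- toupper(trimws(as.character(exp_code)))  # also works on factor columns
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(multipliers[code])                 # unknown codes give NA
  out[is.na(out)] <- 1
  out
}

# Made-up example rows in the NOAA column layout
storm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 5, 1),
  CROPDMGEXP = c("", "M", "k"),
  stringsAsFactors = FALSE
)

storm$DAMAGE_USD <- storm$PROPDMG * exp_to_multiplier(storm$PROPDMGEXP) +
                    storm$CROPDMG * exp_to_multiplier(storm$CROPDMGEXP)
storm$DAMAGE_USD  # 25000 2505000000 1010
```

On the full data set the same two multiplications could be applied to `data` before the aggregation step, with the resulting dollar column replacing `PROPDMG + CROPDMG`.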

Data preparation and transformation

First, the human health impact and the economic damage need to be aggregated by event type (variable EVTYPE). The following code sums the casualties and the damage for each of the 985 event types in the NOAA data frame:

# Sum casualties (fatalities + injuries) and damage (property + crop) per event type
totals_per_evtype <- aggregate(cbind(FATALITIES + INJURIES, PROPDMG + CROPDMG) ~ EVTYPE, data, sum)
colnames(totals_per_evtype) <- c("Event_type", "Casualties", "Damage")

# Rank the event types from highest to lowest
sorted_on_casualties <- totals_per_evtype[order(totals_per_evtype$Casualties, decreasing = TRUE),]
sorted_on_damage <- totals_per_evtype[order(totals_per_evtype$Damage, decreasing = TRUE),]
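To make the aggregation step concrete, here is the same `aggregate()` call run on a made-up four-row data frame (the values are illustrative, not NOAA data). It shows that rows sharing an EVTYPE are collapsed into one row, with the two `cbind()` expressions summed per event type.

```r
# Made-up rows in the NOAA column layout; TORNADO appears twice
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(1, 0, 2, 0),
  INJURIES   = c(10, 3, 1, 0),
  PROPDMG    = c(25, 2.5, 5, 1),
  CROPDMG    = c(0, 0, 1, 0.5)
)

# One row per event type: summed casualties and summed damage
toy_totals <- aggregate(cbind(FATALITIES + INJURIES, PROPDMG + CROPDMG) ~ EVTYPE,
                        data = toy, FUN = sum)
colnames(toy_totals) <- c("Event_type", "Casualties", "Damage")

sorted_toy <- toy_totals[order(toy_totals$Casualties, decreasing = TRUE), ]
sorted_toy
# TORNADO: 14 casualties, 27.5 damage; FLOOD: 3, 6; HAIL: 0, 1.5
```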

Next, we take the top 10 event types by casualties (the sum of fatalities and injuries). We also calculate what share of all casualties falls in the top 10 versus the remaining 975 event types, to get a sense of how dominant the top 10 is.

top_10_casualties <- sorted_on_casualties[1:10,]

Percentage_casualties_in_top_10 <- sum(top_10_casualties$Casualties)/sum(totals_per_evtype$Casualties)

Percentage_casualties_in_top_10
## [1] 0.8811868

This shows that 88% of all casualties fall within the top 10 event types.

Next, we rank the event types by economic damage, using the columns PROPDMG and CROPDMG, which hold the estimated damage to property and crops.

top_10_damage <- sorted_on_damage[1:10,]

Percentage_damage_in_top_10 <- sum(top_10_damage$Damage)/sum(totals_per_evtype$Damage)

Percentage_damage_in_top_10
## [1] 0.9121675

This shows that the top 10 event types by damage cover 91% of all damage, so the remaining 975 event types account for only about 9% of the total. This makes the top 10 highly relevant.
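The two share calculations above follow the same pattern, so they could be wrapped in a small function. The sketch below is an illustration under that assumption: `top_n_share()` and `n_to_reach()` are hypothetical helpers, and the toy totals are made up. The second helper answers the related question of how many event types are needed to reach a given share of the total.

```r
# Hypothetical helper: share of the total carried by the n largest values
top_n_share <- function(x, n = 10) {
  ranked <- sort(x, decreasing = TRUE)
  sum(ranked[seq_len(min(n, length(ranked)))]) / sum(ranked)
}

# Hypothetical helper: smallest number of values whose cumulative share reaches `threshold`
n_to_reach <- function(x, threshold = 0.9) {
  ranked <- sort(x, decreasing = TRUE)
  which(cumsum(ranked) / sum(ranked) >= threshold)[1]
}

toy_totals <- c(50, 30, 10, 5, 5)           # made-up totals per event type
top_n_share(toy_totals, n = 2)              # 0.8
n_to_reach(toy_totals, threshold = 0.85)    # 3
```

Called as `top_n_share(totals_per_evtype$Casualties, 10)`, the first helper computes the same quantity as `Percentage_casualties_in_top_10`, because summing the ten largest values does not depend on how ties are ordered.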

Results for human health impact

Now we graphically present the data for the human health impact.

top_10_casualties$Event_type <- factor(top_10_casualties$Event_type, levels = top_10_casualties$Event_type)

ggplot(top_10_casualties, aes(Event_type, Casualties)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Event type", y = "Number of casualties", title = "Top 10 Events by Human Casualties")
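The factor() call before each plot matters because ggplot2 orders a discrete axis by factor levels, and factor() on its own sorts the levels alphabetically. Passing the already-ranked event types as `levels` keeps the bars in descending order. A base-R illustration with three of the event types from this data set:

```r
ranked_types <- c("TORNADO", "TSTM WIND", "FLOOD")   # already in ranking order

levels(factor(ranked_types))                          # "FLOOD" "TORNADO" "TSTM WIND"
levels(factor(ranked_types, levels = ranked_types))   # "TORNADO" "TSTM WIND" "FLOOD"
```

An alternative is to reorder inside the plot call, e.g. `aes(reorder(Event_type, -Casualties), Casualties)`, since reorder() also returns a factor.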

Results for economic consequences

Now we graphically present the data for economic damage.

top_10_damage$Event_type <- factor(top_10_damage$Event_type, levels = top_10_damage$Event_type)

ggplot(top_10_damage, aes(Event_type, Damage)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Event type", y = "Damage (PROPDMG + CROPDMG, exponent not applied)", title = "Top 10 Events by Economic Damage")

Again, tornadoes account for the highest damage. Interestingly, the gap between tornadoes and the other event types is smaller than it is for human casualties.

Finally, we look at the event types that appear in both top 10 lists:

in_both_top_10s <- intersect(top_10_casualties$Event_type, top_10_damage$Event_type)

in_both_top_10s
## [1] "TORNADO"           "TSTM WIND"         "FLOOD"            
## [4] "LIGHTNING"         "FLASH FLOOD"       "THUNDERSTORM WIND"
## [7] "WINTER STORM"
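The list contains both "TSTM WIND" and "THUNDERSTORM WIND"; TSTM is the NOAA abbreviation for thunderstorm, so these are the same kind of event under two labels. A minimal sketch of how EVTYPE values could be normalised before aggregating is given below. `normalise_evtype()` is a hypothetical helper, its rules (upper case, trimmed and collapsed whitespace, TSTM expanded) are assumptions, and they are far from a complete cleanup of the 985 raw labels. The analysis in this report did not apply such a step.

```r
# Hypothetical helper: harmonise a few common spelling variants in EVTYPE
normalise_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))           # consistent case, no outer spaces
  x <- gsub("\\s+", " ", x)                             # collapse repeated inner spaces
  gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)    # expand the TSTM abbreviation
}

raw_types <- c("TSTM WIND", " THUNDERSTORM WIND", "Marine tstm  wind")
normalise_evtype(raw_types)
# "THUNDERSTORM WIND" "THUNDERSTORM WIND" "MARINE THUNDERSTORM WIND"
```

Used as `data$EVTYPE <- normalise_evtype(data$EVTYPE)` before the `aggregate()` call, this would merge the two thunderstorm wind labels into a single event type.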

Conclusion

Tornadoes, thunderstorm winds, (flash) floods, lightning and winter storms are the weather event types that are most harmful in terms of both human casualties and economic damage. This conclusion is drawn from the NOAA data for the years 1951-2011.

For policy makers, one line of reasoning is that flooding can typically be addressed preventively (water management and flood protection), whereas the other event types are atmospheric phenomena that are hard to prevent. For those, the emphasis should be on monitoring and timely response to limit casualties and damage.