NOAA Data Analysis

Synopsis

Severe weather can cause devastating harm to individuals and impose massive costs on the economy through the destruction it leaves behind. If these events are not monitored, it becomes much harder to prepare for future ones, and the harm they cause is likely to be greater.

This study seeks to gauge the extent to which weather events in America affect its people and its economy. By looking at each type of impact in turn, it is possible to recommend which events should be prepared for, from both a health and an economic perspective.

Data Processing

The data can be downloaded either manually or with the code below. If you work inside an R project, the file downloads into the project folder, so there is no need to set or change the working directory:

Downloading the data

library(tidyverse)

library(R.utils)

library(ggplot2)

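data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Download and decompress only once; mode = "wb" avoids corrupting the binary file on Windows
if (!file.exists("StormData.csv")) {
        download.file(data_url, "StormData.csv.bz2", mode = "wb")
        bunzip2("StormData.csv.bz2", "StormData.csv")   # bunzip2() comes from R.utils
}

NOAA_Data <- read.csv("StormData.csv")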
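
As an aside, `read.csv()` can usually read a bzip2-compressed file directly, because R's file connections detect the compression when reading. That would make the separate decompression step optional. The sketch below illustrates this on a tiny temporary file; the toy columns and values are made up for the example, and this is not the method used in the analysis.

```r
# Toy data standing in for the storm file (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it as a bzip2-compressed CSV in a temporary location.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path through a connection that detects bzip2,
# so no explicit decompression is needed before reading.
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

For the real file this would presumably amount to calling `read.csv()` on `StormData.csv.bz2` itself, at the cost of a slower read.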

Health impact

To gauge the health impact, there are two variables we can look at: total injuries from all events and total fatalities from all events.

library(tidyverse)
NOAA_HealthImpact <- NOAA_Data %>% select(EVTYPE, FATALITIES, INJURIES) %>% 
        group_by(EVTYPE) %>% 
        mutate(HEALTH_IMPACT = FATALITIES + INJURIES) %>%       # Measures ALL those affected
        summarise(TOTAL.HEALTHIMPACT = sum(HEALTH_IMPACT)) %>% 
        arrange(-TOTAL.HEALTHIMPACT)

head(NOAA_HealthImpact)

## # A tibble: 6 × 2
##           EVTYPE TOTAL.HEALTHIMPACT
##           <fctr>              <dbl>
## 1        TORNADO              96979
## 2 EXCESSIVE HEAT               8428
## 3      TSTM WIND               7461
## 4          FLOOD               7259
## 5      LIGHTNING               6046
## 6           HEAT               3037
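
Because the combined total weights a fatality and an injury equally, it can help to keep the two components visible next to the total. The following is a minimal base R sketch on made-up numbers (the `toy` data frame is hypothetical), not the analysis run above.

```r
# Hypothetical records: one row per event, as in the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 5, 1),
  INJURIES   = c(10, 4, 1, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries separately for each event type.
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Add the combined total, then sort from most to least affected.
by_event$TOTAL <- by_event$FATALITIES + by_event$INJURIES
by_event <- by_event[order(-by_event$TOTAL), ]
print(by_event)
```

Keeping the columns apart makes it easy to see whether an event ranks highly because of deaths, injuries, or both.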

Economic Impact

Processing the data for the economic impact differs from the health impact because each damage figure is stored alongside an exponent symbol that determines the multiplier to apply. The dataset does not come with an index of these symbols, so we looked online for a table of their values. Fortunately, this article provides values for the symbols, which are used in this study.

The first thing we need to do is select the columns that we will be using, namely event type, property damage, property damage exponent, crop damage and crop damage exponent. Thereafter we find out which values in the exponent column are unique.

NOAA_DAMAGE <- NOAA_Data %>% select(EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
Exponents <- unique(NOAA_Data$PROPDMGEXP)

Now that we know the unique symbols, we create a vector holding the numeric multiplier for each one, listed in the same order as the symbols in Exponents:

Exponent_values <- c(10^2, 10^6, 0, 10^9, 10^6, 1, 0, 10, 10, 0, 10, 10, 10, 10^2, 10, 10^2, 0, 10, 10)

Next, bind the two together into a lookup table:

Exponents_Final <- data.frame(Exponents, Exponent_values)
colnames(Exponents_Final) <- c('PROPDMGEXP', 'PROPDMGNUM')
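
A positional vector like the one above only lines up if its order matches the order returned by `unique()`, which is easy to get wrong. A more robust alternative, sketched here as an assumption rather than as the method used in this study, is a lookup keyed on the symbol itself. The multipliers assume the commonly cited reading of the symbols (H = 10^2, K = 10^3, M = 10^6, B = 10^9, digits 0 to 8 = 10, `+` = 1, everything else 0); note that this differs from the vector above, which assigns 10^2 to K. The helper name `exp_multiplier` is hypothetical.

```r
# Map an exponent symbol to a numeric multiplier (assumed values, see text).
exp_multiplier <- function(symbol) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "+" = 1)
  key <- toupper(as.character(symbol))     # treat "k" and "K" alike
  out <- unname(lookup[key])               # NA where the symbol is not listed
  out[key %in% as.character(0:8)] <- 10    # digit symbols
  out[is.na(out)] <- 0                     # blank, "?", "-" and anything unknown
  out
}

# Hypothetical rows: each damage column is scaled by its own exponent column.
toy <- data.frame(PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "M"),
                  CROPDMG = c(1, 0),    CROPDMGEXP = c("k", ""),
                  stringsAsFactors = FALSE)
toy$PROPDMGFINAL <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy$CROPDMGFINAL <- toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
print(toy)
```

Keying on the symbol means the same helper can be reused for the crop damage exponents, including lowercase symbols that may not appear in the property damage column.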

To add the multipliers to the damage table we can use a left join. It keeps every row of the original table and matches on the shared PROPDMGEXP column, adding the corresponding multiplier from the lookup table as a new column.

NOAA_DAMAGE_FINAL <- left_join(NOAA_DAMAGE, Exponents_Final, by = "PROPDMGEXP")
NOAA_DAMAGE_FINAL <- NOAA_DAMAGE_FINAL %>% mutate(PROPDMGFINAL = PROPDMG*PROPDMGNUM) %>% mutate(CROPDMGFINAL = CROPDMG*PROPDMGNUM)

head(NOAA_DAMAGE_FINAL)

##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGNUM PROPDMGFINAL
## 1 TORNADO    25.0          K       0                   100         2500
## 2 TORNADO     2.5          K       0                   100          250
## 3 TORNADO    25.0          K       0                   100         2500
## 4 TORNADO     2.5          K       0                   100          250
## 5 TORNADO     2.5          K       0                   100          250
## 6 TORNADO     2.5          K       0                   100          250
##   CROPDMGFINAL
## 1            0
## 2            0
## 3            0
## 4            0
## 5            0
## 6            0
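
To make the join behaviour concrete, here is a small sketch with made-up rows (the data frames `damage` and `lookup`, and the symbol `X`, are hypothetical). It shows that `left_join()` keeps every row of the left table and fills `NA` where the lookup has no matching symbol, which is worth checking before multiplying.

```r
library(dplyr)

# Hypothetical damage rows; "X" has no entry in the lookup table.
damage <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
                     PROPDMGEXP = c("K", "M", "X"),
                     stringsAsFactors = FALSE)

# Hypothetical lookup table (multipliers here are assumed values).
lookup <- data.frame(PROPDMGEXP = c("K", "M"),
                     PROPDMGNUM = c(1e3, 1e6),
                     stringsAsFactors = FALSE)

# Every row of `damage` is kept; unmatched symbols get NA in PROPDMGNUM.
joined <- left_join(damage, lookup, by = "PROPDMGEXP")
print(joined)

# Number of rows whose symbol was not found in the lookup.
n_unmatched <- sum(is.na(joined$PROPDMGNUM))
print(n_unmatched)
```

A non-zero count of unmatched rows would mean some damage values silently become `NA` after multiplication.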

We can also add a column with the total damage:

NOAA_DAMAGE_FINAL <- NOAA_DAMAGE_FINAL %>% mutate(TOTALDMG = PROPDMGFINAL + CROPDMGFINAL)
head(NOAA_DAMAGE_FINAL)

##    EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGNUM PROPDMGFINAL
## 1 TORNADO    25.0          K       0                   100         2500
## 2 TORNADO     2.5          K       0                   100          250
## 3 TORNADO    25.0          K       0                   100         2500
## 4 TORNADO     2.5          K       0                   100          250
## 5 TORNADO     2.5          K       0                   100          250
## 6 TORNADO     2.5          K       0                   100          250
##   CROPDMGFINAL TOTALDMG
## 1            0     2500
## 2            0      250
## 3            0     2500
## 4            0      250
## 5            0      250
## 6            0      250

Lastly, we need to summarise the damage statistics and sort the final dataset:

NOAA_DAMAGE_TOTAL <- NOAA_DAMAGE_FINAL %>% group_by(EVTYPE) %>% summarize(EVTYPE_TOTALDAMAGE = sum(TOTALDMG))

head(NOAA_DAMAGE_TOTAL) # Needs to be ordered.

## # A tibble: 6 × 2
##                  EVTYPE EVTYPE_TOTALDAMAGE
##                  <fctr>              <dbl>
## 1    HIGH SURF ADVISORY              20000
## 2         COASTAL FLOOD                  0
## 3           FLASH FLOOD               5000
## 4             LIGHTNING                  0
## 5             TSTM WIND            8010000
## 6       TSTM WIND (G45)                800

# Sort DAMAGE by magnitude

NOAA_DAMAGE_TOTAL <- as.data.frame(NOAA_DAMAGE_TOTAL)
NOAA_DAMAGE_TOTAL <- NOAA_DAMAGE_TOTAL %>% arrange(desc(EVTYPE_TOTALDAMAGE))
head(NOAA_DAMAGE_TOTAL)

##              EVTYPE EVTYPE_TOTALDAMAGE
## 1         HURRICANE       814739475501
## 2 HURRICANE/TYPHOON       802071608133
## 3             FLOOD       231021569207
## 4           TORNADO        82300967348
## 5       FLASH FLOOD        53574336025
## 6       STORM SURGE        43311790600
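
The summarise and sort steps can also be written as a single pipeline. The sketch below uses made-up numbers (the `toy` data frame is hypothetical) and adds `na.rm = TRUE`, on the assumption that some rows could carry a missing damage value, for example after an unmatched join; without it, one `NA` would turn the whole group total into `NA`.

```r
library(dplyr)

# Hypothetical per-event damage values, one of them missing.
toy <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "TORNADO", "HAIL"),
  TOTALDMG = c(5e6, NA, 2e6, 1e3),
  stringsAsFactors = FALSE
)

# Group, total while ignoring missing values, and sort largest first.
totals <- toy %>%
  group_by(EVTYPE) %>%
  summarise(EVTYPE_TOTALDAMAGE = sum(TOTALDMG, na.rm = TRUE)) %>%
  arrange(desc(EVTYPE_TOTALDAMAGE))

print(totals)
```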

Results

Looking at the health impact, we can view the first few results graphically to ascertain which events are causing the most harm in America:

g <- ggplot(NOAA_HealthImpact[1:5,], aes(x=reorder(EVTYPE, -TOTAL.HEALTHIMPACT), y=TOTAL.HEALTHIMPACT))
g + geom_bar(stat="identity", width = 0.5, fill="tomato2") + 
        theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
        labs(title = "Total Number of Fatalities & Injuries per Event",
             x = "Event Type",
             y = "Number of Persons affected")

As is evident from the graph, tornadoes are by far the leading cause of fatalities and injuries in America, with 96,979 people affected. More surprising is excessive heat, which ranks second in combined health impact with 8,428.

In terms of the economic impact, we again can view the summarised information graphically:

g <- ggplot(NOAA_DAMAGE_TOTAL[1:5,], aes(x=reorder(EVTYPE, -EVTYPE_TOTALDAMAGE), y=EVTYPE_TOTALDAMAGE))
g + geom_bar(stat="identity", width = 0.5, fill="tomato2") + 
        theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
        labs(title = "Economic Damage per Event",
             x = "Event Type",
             y = "Monetary Value")

According to the above graph, the HURRICANE and HURRICANE/TYPHOON categories account for the highest economic damage of any weather event, with floods a distant third.

Conclusion

Weather events impose heavy personal and economic costs every year, so they are important to understand. The health impacts are very large, and the damage from certain events costs very large amounts of money to repair. It is therefore important to know which events are the most harmful, both to prepare for them and to prioritise which events should be researched further. This study found that tornadoes are by far the most dangerous events for people, while hurricanes cause the most economic damage to property and crops. It must be noted, though, that this study is limited by records dating back to before proper recording instruments were mainstream, and by the lack of a proper index of what the damage cost exponents in the dataset mean.