Reproducible Research: Course Project 2

Analysis of Weather Events on Population Health and the Economy in the United States using NOAA Storm Data from 1950 - 2011

Synopsis

This report examines weather data to determine the types of events most likely to affect population health and the economy. Specifically, the data set used is NOAA storm data for the United States from 1950-2011. The data will be analyzed to determine the types of weather events that cause the greatest number of injuries and fatalities in the population. In addition, the data will be analyzed to determine the top weather events that cause damage to property and crops in the United States.

Setup R

Load necessary libraries:

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(scales)

Data Processing

The raw data file “StormData.csv.bz2” is in the working directory and will be read with read.csv:

# The file "StormData.csv.bz2" should already be in the working directory.
stormdata <- read.csv('StormData.csv.bz2')
print('Finished reading data file.', quote = FALSE)
## [1] Finished reading data file.

Next, this code chunk will reduce the raw data file down to seven columns of variables that will be sufficient to answer the two assignment questions, as well as check for NA values:

storm <- stormdata[, c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG',
                  'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
names(storm) <- tolower(names(storm))
na_sum <- sum(is.na(storm))

The NA count for these seven columns is 0.
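
A single overall count does not show which column any missing values come from. As a minimal sketch on a made-up toy data frame (the values below are illustrative and are not taken from the NOAA data), the same check can be broken out per column with colSums:

```r
# Toy data frame standing in for the reduced `storm` table (values are made up).
toy <- data.frame(
    evtype     = c('TORNADO', 'FLOOD', 'HAIL'),
    fatalities = c(1, NA, 0),
    injuries   = c(3, 2, NA)
)

# Count missing values in each column rather than across the whole table.
na_by_column <- colSums(is.na(toy))
print(na_by_column)
```

Applied to storm, the same call would return seven zeros if the overall count of 0 reported above holds.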

To answer the question about harm to human health, the injuries and fatalities variables have been combined: the sum of these two columns is stored in a new column named casualties. This should allow us to examine the full impact of the various weather events on population health.

storm <- mutate(storm, casualties = fatalities + injuries)
stormcas <- aggregate(casualties ~ evtype, storm, sum)
stormcas <- stormcas[order(stormcas$casualties, decreasing = TRUE),]
head(stormcas, 30)
##                 evtype casualties
## 834            TORNADO      96979
## 130     EXCESSIVE HEAT       8428
## 856          TSTM WIND       7461
## 170              FLOOD       7259
## 464          LIGHTNING       6046
## 275               HEAT       3037
## 153        FLASH FLOOD       2755
## 427          ICE STORM       2064
## 760  THUNDERSTORM WIND       1621
## 972       WINTER STORM       1527
## 359          HIGH WIND       1385
## 244               HAIL       1376
## 411  HURRICANE/TYPHOON       1339
## 310         HEAVY SNOW       1148
## 957           WILDFIRE        986
## 786 THUNDERSTORM WINDS        972
## 30            BLIZZARD        906
## 188                FOG        796
## 585        RIP CURRENT        600
## 955   WILD/FOREST FIRE        557
## 586       RIP CURRENTS        501
## 278          HEAT WAVE        481
## 117         DUST STORM        462
## 978     WINTER WEATHER        431
## 848     TROPICAL STORM        398
## 19           AVALANCHE        394
## 140       EXTREME COLD        391
## 676        STRONG WIND        383
## 89           DENSE FOG        360
## 290         HEAVY RAIN        349

Examining the top 30 weather events that cause casualties in the United States, we see a variety of different weather types. Comparing the data with the NOAA data set codebook also shows that several of the event types are near duplicates. Several of these duplicate weather event types in the evtype variable will be combined to obtain a more accurate analysis of the data. For example, “TSTM WIND”, “THUNDERSTORM WIND”, and “THUNDERSTORM WINDS” appear to be duplicates and will be combined into one type.

# Merge near-duplicate event labels into a single label per event type.
storm$evtype[storm$evtype == 'THUNDERSTORM WINDS'] <- 'THUNDERSTORM WIND'
storm$evtype[storm$evtype == 'TSTM WIND'] <- 'THUNDERSTORM WIND'
storm$evtype[storm$evtype == 'RIP CURRENTS'] <- 'RIP CURRENT'
storm$evtype[storm$evtype == 'HEAT'] <- 'EXCESSIVE HEAT'
storm$evtype[storm$evtype == 'HEAT WAVE'] <- 'EXCESSIVE HEAT'
storm$evtype[storm$evtype == 'HURRICANE/TYPHOON'] <- 'HURRICANE'
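
If more labels need to be merged later, the one-line-per-label approach gets long. A possible alternative, sketched here under the assumption that a named lookup vector is acceptable, keeps the six pairs above in one place. The names evtype_map and recode_evtype are hypothetical and are not part of the original analysis:

```r
# Lookup table: names are the raw labels, values are the consolidated labels.
# These six pairs are the same replacements made in the report.
evtype_map <- c(
    'THUNDERSTORM WINDS' = 'THUNDERSTORM WIND',
    'TSTM WIND'          = 'THUNDERSTORM WIND',
    'RIP CURRENTS'       = 'RIP CURRENT',
    'HEAT'               = 'EXCESSIVE HEAT',
    'HEAT WAVE'          = 'EXCESSIVE HEAT',
    'HURRICANE/TYPHOON'  = 'HURRICANE'
)

# Hypothetical helper: replace mapped labels, leave every other label unchanged.
recode_evtype <- function(evtype) {
    evtype <- as.character(evtype)
    mapped <- evtype_map[evtype]   # NA wherever the label is not in the table
    unname(ifelse(is.na(mapped), evtype, mapped))
}

toy_types <- c('TSTM WIND', 'TORNADO', 'HEAT WAVE', 'RIP CURRENTS')
recoded <- recode_evtype(toy_types)
print(recoded)
```

Note that this helper returns a character vector, so a factor column would need to be converted back with factor() if factor behavior is required downstream.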

After combining the duplicate event types, the following script re-runs the aggregation of total casualties (injuries plus fatalities) by event type and sorts the result in decreasing order, so that the event types most harmful to population health appear first. The top ten are listed in the Results section:

stormcas <- aggregate(casualties ~ evtype, storm, sum)
stormcas <- stormcas[order(stormcas$casualties, decreasing = TRUE),]

Next, to answer the second assignment question, we need to process the property and crop damage variables to obtain totals in the proper form to sum as a monetary total. Let’s examine the levels of the propdmgexp and cropdmgexp variables:

summary(storm$propdmgexp)
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
summary(storm$cropdmgexp)
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

To properly use the damage and damage exponent data, the following script converts the exponent codes to numeric multipliers: “k|K” to thousands, “m|M” to millions, and “b|B” to billions. The remaining codes (digits and symbols) are used very rarely; they, along with blank entries, are assigned a multiplier of 0, so the corresponding damage values do not contribute to the totals. The script then creates two new data columns with the mutate function, multiplying property damage by its multiplier and crop damage by its multiplier.

storm$propdmgexp <- as.character(storm$propdmgexp) 
storm$cropdmgexp <- as.character(storm$cropdmgexp) 

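exponent_function <- function(x) {
    # Return the numeric multiplier for a single exponent code.
    if (x == 'k' || x == 'K') {
        1e+03
    } else if (x == 'm' || x == 'M') {
        1e+06
    } else if (x == 'b' || x == 'B') {
        1e+09
    } else {
        # Any other code (including blank) contributes nothing.
        0
    }
}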

storm$propdmgexp <- as.numeric(sapply(storm$propdmgexp, exponent_function))
storm$cropdmgexp <- as.numeric(sapply(storm$cropdmgexp, exponent_function))
storm <- mutate(storm, prop = propdmg * propdmgexp, crop = cropdmg * cropdmgexp)
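
Calling a function once per row with sapply can be slow on a table of this size. As a hedged alternative, the same mapping can be written as a vectorized lookup. The names exp_lookup and to_multiplier are hypothetical, and the sketch assumes the same rule as exponent_function (K, M, and B are recognized, everything else becomes 0):

```r
# Multipliers for the three recognized exponent codes; all other codes
# (including blanks) fall through to 0, matching exponent_function.
exp_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical vectorized helper: looks up every code in one step.
to_multiplier <- function(codes) {
    mult <- exp_lookup[toupper(as.character(codes))]
    mult[is.na(mult)] <- 0   # unrecognized or blank codes
    unname(mult)
}

multipliers <- to_multiplier(c('k', 'M', '', 'B', '5'))
print(multipliers)
```

With such a helper, the two sapply calls could be replaced by to_multiplier(storm$propdmgexp) and to_multiplier(storm$cropdmgexp).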

The final data processing step is to aggregate the property damage and crop damage variables by event type. Again, the damage variables will be ordered in decreasing order so that we can determine the most harmful weather events:

stormpropdmg <- aggregate(prop ~ evtype, storm, sum)
stormpropdmg <- stormpropdmg[order(stormpropdmg$prop, decreasing = TRUE),]
stormpropsum <- sum(stormpropdmg$prop)

stormcropdmg <- aggregate(crop ~ evtype, storm, sum)
stormcropdmg <- stormcropdmg[order(stormcropdmg$crop, decreasing = TRUE),]
stormcropsum <- sum(stormcropdmg$crop)
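
The report keeps property and crop damage separate. If a single ranking of economic impact is wanted, one possible approach is to aggregate both columns at once and add them. The sketch below uses a made-up toy data frame, and the total column is an assumption that is not part of the original analysis:

```r
# Toy data (made-up values) with the same column names as `storm`.
toy <- data.frame(
    evtype = c('FLOOD', 'FLOOD', 'DROUGHT', 'HAIL'),
    prop   = c(2e6, 1e6, 1e5, 5e5),
    crop   = c(1e5, 0, 4e6, 2e5)
)

# Sum both damage columns per event type in a single aggregate call.
damage <- aggregate(cbind(prop, crop) ~ evtype, toy, sum)

# Combined economic damage, ordered from largest to smallest.
damage$total <- damage$prop + damage$crop
damage <- damage[order(damage$total, decreasing = TRUE), ]
print(damage)
```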

Now that we have finished processing the data, we can examine the results to determine if the two assignment questions can be answered.

Results

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Ordering the new data frame stormcas so that we can see the “Top 10” weather events that impact population health reveals that tornadoes, heat, thunderstorms, flooding, and lightning cause the most significant casualties.

head(stormcas, 10)
##                evtype casualties
## 829           TORNADO      96979
## 130    EXCESSIVE HEAT      11946
## 756 THUNDERSTORM WIND      10054
## 170             FLOOD       7259
## 461         LIGHTNING       6046
## 153       FLASH FLOOD       2755
## 424         ICE STORM       2064
## 966      WINTER STORM       1527
## 400         HURRICANE       1446
## 357         HIGH WIND       1385

The following plot shows the top ten weather events that cause casualties in the United States according to the NOAA data set:

g1 <- ggplot(stormcas[1:10,], aes(x=reorder(evtype, casualties), y=casualties, fill = casualties)) +
    geom_bar(stat = 'identity') +
    coord_flip() +
    ylab('Total Casualties (Injuries and Fatalities)') +
    xlab('') +
    ggtitle('Top Ten Weather Events in the United States (1950-2011)') +
    theme(legend.position='none')
g1
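
The scales package is loaded during setup but is not used in the plots. One way it could be applied, shown here as a sketch rather than as the original plotting code, is to format the numeric axis with thousands separators. The toy data frame reuses three casualty totals from the table above:

```r
library(ggplot2)
library(scales)

# Three rows taken from the top-ten casualty table, standing in for stormcas[1:10, ].
toy <- data.frame(
    evtype     = c('TORNADO', 'FLOOD', 'LIGHTNING'),
    casualties = c(96979, 7259, 6046)
)

g <- ggplot(toy, aes(x = reorder(evtype, casualties), y = casualties)) +
    geom_bar(stat = 'identity') +
    coord_flip() +
    # comma() from scales adds thousands separators to the axis labels.
    scale_y_continuous(labels = comma) +
    ylab('Total Casualties (Injuries and Fatalities)') +
    xlab('')
g
```

The same scale_y_continuous layer could be added to the damage plots, where the raw dollar values are much larger.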

Question 2: Across the United States, which types of events have the greatest economic consequences?

Examining first the property damage results, the following is a list of the top ten weather events causing property damage in the United States from 1950-2011:

head(stormpropdmg, 10)
##                evtype         prop
## 170             FLOOD 144657709800
## 400         HURRICANE  81174159010
## 829           TORNADO  56937160480
## 666       STORM SURGE  43323536000
## 153       FLASH FLOOD  16140811510
## 244              HAIL  15732266720
## 756 THUNDERSTORM WIND   9704002430
## 843    TROPICAL STORM   7703890550
## 966      WINTER STORM   6688497250
## 357         HIGH WIND   5270046260

The top events include floods, hurricanes, tornadoes, storm surges, and flash floods.

The following is a plot in answer to Question 2, the top ten weather events causing property damage:

g2 <- ggplot(stormpropdmg[1:10,], aes(x=reorder(evtype, prop), y = prop, fill = prop)) +
    geom_bar(stat = 'identity') +
    coord_flip() +
    ylab('Total Property Damage') +
    xlab('') +
    ggtitle('Top Ten Weather Events in the United States (1950-2011)') +
    theme(legend.position='none')
g2

The final part of the analysis is to examine the impact of weather events on crop damage. The top ten weather events causing crop damage in the United States from 1950-2011 are:

head(stormcropdmg, 10)
##                evtype        crop
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 400         HURRICANE  5349782800
## 586       RIVER FLOOD  5029459000
## 424         ICE STORM  5022113500
## 244              HAIL  3025954450
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 756 THUNDERSTORM WIND  1159505100
## 212      FROST/FREEZE  1094086000

Although there is some overlap with the list of property damage events, the top crop damage events include drought, flood, hurricane, river flood, and ice storm.

The following plot shows the top ten weather events causing crop damage:

g3 <- ggplot(stormcropdmg[1:10,], aes(x=reorder(evtype, crop), y = crop, fill = crop)) +
    geom_bar(stat = 'identity') +
    coord_flip() +
    ylab('Total Crop Damage') +
    xlab('') +
    ggtitle('Top Ten Weather Events in the United States (1950-2011)') +
    theme(legend.position='none')
g3

The total amount of property damage recorded in this data set is $427,318,642,100 and the total amount for crop damage is $49,104,191,910.
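
The report does not show how these totals were formatted for display. A minimal sketch, assuming base R's format function and a hypothetical helper named fmt_usd, takes the two totals stated above (the values held in stormpropsum and stormcropsum) and prints them with a dollar sign and thousands separators:

```r
# Totals as reported in the text (stormpropsum and stormcropsum in the analysis).
prop_total <- 427318642100
crop_total <- 49104191910

# Hypothetical helper: dollar sign plus thousands separators, no scientific notation.
fmt_usd <- function(x) {
    paste0('$', format(x, big.mark = ',', scientific = FALSE))
}

print(fmt_usd(prop_total))
print(fmt_usd(crop_total))

# Combined property and crop damage, and the share due to property damage.
combined_total <- prop_total + crop_total
print(fmt_usd(combined_total))
print(round(100 * prop_total / combined_total, 1))
```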