Evaluation of the economic burden and population harm caused by weather events since the 1950s

Synopsis

This analysis measures the impact of weather events on population health through injuries and fatalities, and measures economic burden through property and crop damage. The canonical event categories were not followed in every record, so overlapping categories must be grouped before the impact of each event type can be assessed accurately. Tornadoes had by far the greatest impact on human health, with roughly an order of magnitude more casualties than any other event type. Flooding caused by far the greatest economic burden; flooding alone was responsible for $1.5 trillion in damage to property and agriculture. Many homes are built in known flood zones, which may help explain why the impact of floods is so large. Tornadoes strike suddenly and with great force, which is probably why they take such a toll on human life and well-being. Prevention and planning can help offset the potential burden of these weather events.

Data Processing

library(ggplot2)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(readr)

## Warning: package 'readr' was built under R version 3.2.5

The data for this project were downloaded from a course repository associated with the Coursera Reproducible Research course. The data for this analysis are saved as the compressed file “StormData.csv.bz2” and can be downloaded as follows:

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2", method = "curl")
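Because the download is repeated every time the report is run, it can be worth skipping when the file is already on disk. The following is a minimal sketch of that idea, not part of the original analysis; the helper name `fetch_if_missing` is hypothetical.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` is absent.
# Extra arguments (for example method = "curl") are passed to download.file().
fetch_if_missing <- function(url, dest, ...) {
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb", ...)
  }
  invisible(dest)
}
```

It would be called with the same URL and file name as above, for example `fetch_if_missing(url, "StormData.csv.bz2", method = "curl")`, where `url` holds the address used in the `download.file` call.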

Read the compressed file into a data frame, “StormData_csv”, to examine its structure and variables.

# Read from the working directory, where download.file() saved the file
StormData_csv <- read_csv("StormData.csv.bz2")

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   STATE__ = col_double(),
##   COUNTY = col_double(),
##   BGN_RANGE = col_double(),
##   COUNTY_END = col_double(),
##   END_RANGE = col_double(),
##   LENGTH = col_double(),
##   WIDTH = col_double(),
##   F = col_integer(),
##   MAG = col_double(),
##   FATALITIES = col_double(),
##   INJURIES = col_double(),
##   PROPDMG = col_double(),
##   CROPDMG = col_double(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double(),
##   LATITUDE_E = col_double(),
##   LONGITUDE_ = col_double(),
##   REFNUM = col_double()
## )

## See spec(...) for full column specifications.

The goal is to find which types of events are most harmful to population health. Harm to population health can be investigated through the injuries and fatalities recorded for each weather event. The rationale employed here is to sum the injuries and fatalities for each event and call the result casualties; these are stored in a new variable, Casualties.

StormData_csv$Casualties <- StormData_csv$INJURIES + StormData_csv$FATALITIES
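As a small illustration of the casualty sum, the sketch below uses made-up numbers (not taken from the storm data). It also shows that `+` propagates missing values, whereas `rowSums(..., na.rm = TRUE)` treats them as zero; whether the real columns contain any missing values is not checked here.

```r
# Toy data with made-up values; the third row has a missing injury count.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HEAT"),
  INJURIES   = c(10, 2, NA),
  FATALITIES = c(1, 0, 3)
)

# Same construction as in the analysis: element-wise sum, NA stays NA.
toy$Casualties <- toy$INJURIES + toy$FATALITIES

# Alternative that counts a missing value as zero.
toy$CasualtiesNoNA <- rowSums(toy[, c("INJURIES", "FATALITIES")], na.rm = TRUE)

print(toy)
```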

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To determine the impact of weather events on human health, we sum the casualties for each EVTYPE and sort the totals from greatest to least. We then keep only the event types with one or more casualties.

StormCasualties <- count(StormData_csv, EVTYPE, wt=Casualties, sort = TRUE)
StormHarmful <- subset(StormCasualties, n > 0)
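The weighted count above amounts to a grouped sum. As a cross-check, the same steps can be sketched in base R on made-up data; the toy values below are assumptions for illustration only.

```r
# Toy data with made-up casualty counts.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  Casualties = c(5, 2, 7, 0)
)

# Sum casualties within each event type, then sort from greatest to least.
totals <- tapply(toy$Casualties, toy$EVTYPE, sum)
totals <- sort(totals, decreasing = TRUE)

# Keep only event types with at least one casualty.
harmful <- totals[totals > 0]
print(harmful)
```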

This plot shows the 10 event types with the most casualties, which gives a sense of which event types have caused the most harm to human health.

bp <- ggplot(head(StormHarmful, 10), aes(x=EVTYPE, y=n)) + geom_bar(stat = "identity")
bp + scale_y_continuous(trans = "log", breaks = c(10,100,1000,10000,100000)) + theme(axis.text.x = element_text(angle = 60, hjust = 1)) + xlab("Event Type") + ylab("log Number Fatalities + Injuries")
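One possible refinement of the bar chart is to order the bars by their totals, since ggplot2 otherwise arranges a character x variable alphabetically. The sketch below uses made-up totals and is a suggestion rather than the plot used in this report. It also uses `scale_y_log10()`, which is base 10, in place of `trans = "log"`, which is the natural log; with explicit breaks the tick labels are the same values.

```r
library(ggplot2)

# Toy totals (made-up numbers) standing in for head(StormHarmful, 10).
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "LIGHTNING"),
  n      = c(90000, 7000, 6000)
)

# reorder() sorts the x-axis by descending n instead of alphabetically;
# geom_col() is shorthand for geom_bar(stat = "identity").
p <- ggplot(toy, aes(x = reorder(EVTYPE, -n), y = n)) +
  geom_col() +
  scale_y_log10(breaks = c(10, 100, 1000, 10000, 100000)) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  xlab("Event Type") +
  ylab("Fatalities + Injuries (log scale)")

# print(p) renders the plot.
```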

Examining the plot, it is apparent that some of the event types are synonymous: “THUNDERSTORM WIND” is the canonical category and “TSTM WIND” is shorthand for it. To ensure we are evaluating the proper impact of events on casualties, the names of the top 30 event types were reviewed to determine whether some classifications should be combined. Three pairs were chosen to be combined: “FOG” with “DENSE FOG”, “HIGH WINDS” with “HIGH WIND”, and “TSTM WIND” with “THUNDERSTORM WIND”.

head(as.data.frame(StormHarmful), 30)

##                EVTYPE     n
## 1             TORNADO 96979
## 2      EXCESSIVE HEAT  8428
## 3           TSTM WIND  7461
## 4               FLOOD  7259
## 5           LIGHTNING  6046
## 6                HEAT  3037
## 7         FLASH FLOOD  2755
## 8           ICE STORM  2064
## 9   THUNDERSTORM WIND  1621
## 10       WINTER STORM  1527
## 11          HIGH WIND  1385
## 12               HAIL  1376
## 13  HURRICANE/TYPHOON  1339
## 14         HEAVY SNOW  1148
## 15           WILDFIRE   986
## 16 THUNDERSTORM WINDS   972
## 17           BLIZZARD   906
## 18                FOG   796
## 19        RIP CURRENT   600
## 20   WILD/FOREST FIRE   557
## 21       RIP CURRENTS   501
## 22          HEAT WAVE   481
## 23         DUST STORM   462
## 24     WINTER WEATHER   431
## 25     TROPICAL STORM   398
## 26          AVALANCHE   394
## 27       EXTREME COLD   391
## 28        STRONG WIND   383
## 29          DENSE FOG   360
## 30         HEAVY RAIN   349

StormData_csv$EVTYPE <- gsub('^FOG$', 'DENSE FOG', StormData_csv$EVTYPE)
StormData_csv$EVTYPE <- gsub('^HIGH WINDS$', 'HIGH WIND', StormData_csv$EVTYPE)
StormData_csv$EVTYPE <- gsub('^TSTM WIND$', 'THUNDERSTORM WIND', StormData_csv$EVTYPE)
StormData_csv$Casualties <- StormData_csv$INJURIES + StormData_csv$FATALITIES
StormCasualties2 <- count(StormData_csv, EVTYPE, wt=Casualties, sort = TRUE)
StormHarmful2 <- subset(StormCasualties2, n > 0)
head(as.data.frame(StormHarmful2), 30)

##                EVTYPE     n
## 1             TORNADO 96979
## 2   THUNDERSTORM WIND  9082
## 3      EXCESSIVE HEAT  8428
## 4               FLOOD  7259
## 5           LIGHTNING  6046
## 6                HEAT  3037
## 7         FLASH FLOOD  2755
## 8           ICE STORM  2064
## 9           HIGH WIND  1722
## 10       WINTER STORM  1527
## 11               HAIL  1376
## 12  HURRICANE/TYPHOON  1339
## 13          DENSE FOG  1156
## 14         HEAVY SNOW  1148
## 15           WILDFIRE   986
## 16 THUNDERSTORM WINDS   972
## 17           BLIZZARD   906
## 18        RIP CURRENT   600
## 19   WILD/FOREST FIRE   557
## 20       RIP CURRENTS   501
## 21          HEAT WAVE   481
## 22         DUST STORM   462
## 23     WINTER WEATHER   431
## 24     TROPICAL STORM   398
## 25          AVALANCHE   394
## 26       EXTREME COLD   391
## 27        STRONG WIND   383
## 28         HEAVY RAIN   349
## 29          HIGH SURF   253
## 30       EXTREME HEAT   251
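The three substitutions used above can also be written as a lookup table, which is easier to extend if more synonyms are merged later. This is a minimal sketch; the helper name `merge_event_types` is hypothetical, and the trimming and upper-casing steps are additions that the original `gsub` calls do not perform.

```r
# Hypothetical helper: replace exact matches using a named character vector.
# Names are the variants to replace, values are the categories to keep.
merge_event_types <- function(x, synonyms) {
  x <- toupper(trimws(x))          # assumption: ignore case and outer spaces
  hit <- x %in% names(synonyms)
  x[hit] <- unname(synonyms[x[hit]])
  x
}

# The same three merges applied in the analysis.
synonyms <- c(
  "FOG"        = "DENSE FOG",
  "HIGH WINDS" = "HIGH WIND",
  "TSTM WIND"  = "THUNDERSTORM WIND"
)

merged <- merge_event_types(c("Fog", " TSTM WIND", "TORNADO"), synonyms)
print(merged)
```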

Plot of casualties by event type after merging the overlapping event classifications:

bp <- ggplot(head(StormHarmful2, 10), aes(x=EVTYPE, y=n)) + geom_bar(stat = "identity")
bp + scale_y_continuous(trans = "log", breaks = c(10,100,1000,10000,100000)) + theme(axis.text.x = element_text(angle = 60, hjust = 1)) + xlab("Event Type") + ylab("log Number Fatalities + Injuries")

Across the events, tornadoes have by far the greatest impact on human health, with roughly an order of magnitude more casualties than any other event type over the time span of the records reviewed. Floods, heat, winter storms and thunderstorm winds have also had a substantial impact on human health across the United States.

Across the United States, which types of events have the greatest economic consequences?

Economic consequences are measured here with two variables, PROPDMG and CROPDMG, which record the property and crop damage from a weather event in dollar amounts.

Before summing the damage in dollar amounts, we have to apply the magnitude (thousands, millions or billions) to the value recorded in PROPDMG or CROPDMG.

To determine the multiplier for the property and crop damage, we first look at the exponent codes recorded in CROPDMGEXP and PROPDMGEXP.

table(StormData_csv$CROPDMGEXP)

## 
##      ?      0      2      B      k      K      m      M 
##      7     19      1      9     21 281832      1   1994

table(StormData_csv$PROPDMGEXP)

## 
##      -      ?      +      0      1      2      3      4      5      6 
##      1      8      5    216     25     13      4      4     28      4 
##      7      8      B      h      H      K      m      M 
##      5      1     40      1      6 424665      7  11330

To coerce the classifiers to factors that can be multiplied, we first clean them up by converting them all to uppercase.

StormData_csv$CROPDMGEXP<- toupper(StormData_csv$CROPDMGEXP)
StormData_csv$PROPDMGEXP<- toupper(StormData_csv$PROPDMGEXP)

Create the factors for property and crop damage exp

StormData_csv$CROPDMGEXP<- factor(StormData_csv$CROPDMGEXP)
StormData_csv$PROPDMGEXP<- factor(StormData_csv$PROPDMGEXP)

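Change the levels for the property and crop damage exp to numbers corresponding to the documentation (0, 1000, 1000000, 1000000000). The replacement vectors follow the sorted order of the factor levels; codes other than K, M and B are set to 0.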

levels(StormData_csv$CROPDMGEXP) <- c("0","0","0","1000000000","1000","1000000")
levels(StormData_csv$PROPDMGEXP) <- c(rep("0",12),"1000000000","0","1000","1000000")
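Relabelling factor levels by position depends on the order in which the levels happen to sort. An alternative that looks each code up by name is sketched below; the function name `exp_multiplier` is hypothetical. It keeps the same choices as this analysis: K, M and B map to thousands, millions and billions, and every other code, including missing values, maps to 0.

```r
# Hypothetical helper: map exponent codes to numeric multipliers by name.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(lookup[toupper(code)])  # unknown or missing codes give NA
  out[is.na(out)] <- 0                  # same choice as above: treat as 0
  out
}

codes <- c("K", "m", "B", "?", "5", NA, "H")
multipliers <- exp_multiplier(codes)
print(multipliers)

# Worked example: a recorded damage of 25 with code "K" is 25 * 1000 dollars.
damage <- 25 * exp_multiplier("K")
```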

Convert all the NAs to zero, since we can't arbitrarily assign an exponent to them.

StormData_csv$PROPDMGEXP[is.na(StormData_csv$PROPDMGEXP)] <- 0
StormData_csv$CROPDMGEXP[is.na(StormData_csv$CROPDMGEXP)] <- 0

Create a new variable for property and crop damage multiplied by its exponent of magnitude.

StormData_csv$propdmgmag <- StormData_csv$PROPDMG * as.numeric(as.character(StormData_csv$PROPDMGEXP))
StormData_csv$cropdmgmag <- StormData_csv$CROPDMG * as.numeric(as.character(StormData_csv$CROPDMGEXP))

Total the property and crop damage by event type, sorted from greatest to least

CROPDMGMAG <- count(StormData_csv, EVTYPE, wt=cropdmgmag, sort = TRUE)
PROPDMGMAG <- count(StormData_csv, EVTYPE, wt=propdmgmag, sort = TRUE)
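Because both tables are already sorted, `head(PROPDMGMAG, 10)` or `head(CROPDMGMAG, 10)` would display the largest sources. As an illustration of reading such a table, the sketch below uses made-up totals (not results from the storm data) and adds a column in billions of dollars for readability.

```r
# Toy totals with made-up dollar amounts.
toy <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "DROUGHT", "TORNADO"),
  n      = c(5e6, 3e9, 1.4e10, 4e8)
)

# Sort from greatest to least and keep the top three rows.
top <- head(toy[order(-toy$n), ], 3)

# Express the totals in billions of dollars.
top$billions <- round(top$n / 1e9, 2)
print(top, row.names = FALSE)
```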

Create a new variable that holds the sum of the property and crop damage for a weather event

StormData_csv$PROPCROPDMG <- StormData_csv$cropdmgmag + StormData_csv$propdmgmag

Find the weather events with the highest combined property and crop damage and plot that amount vs event type

PROPCROPDMG <- count(StormData_csv, EVTYPE, wt=PROPCROPDMG, sort = TRUE)
bp <- ggplot(head(PROPCROPDMG, 10), aes(x=EVTYPE, y=n)) + geom_bar(stat = "identity")
bp + scale_y_continuous(trans = "log", breaks = c(10,100,1000,10000,1e+06,1e+09,1e+12)) + theme(axis.text.x = element_text(angle = 60, hjust = 1)) + xlab("Event Type") + ylab("log Total in $USD \nProperty and Crop Damage")
