To measure the impact of weather events on human health, injuries and fatalities were evaluated; economic burden was measured from property and crop damage. Note that data collectors did not always follow the canonical event categories when recording events. Grouping synonymous categories appropriately is therefore critical to an accurate estimate of each event type's impact. Tornadoes had by far the greatest impact on human health, causing roughly an order of magnitude more casualties than any other event type. Among weather events, flooding caused by far the greatest economic burden: flooding alone was responsible for $1.5 trillion in damages to property and agriculture. Because many homes are built in known flood zones, it is plausible that floods cause damage on this scale. Tornadoes strike suddenly and with great force, which is probably why they take such a toll on human life and well-being. Prevention and planning can help offset the potential burden from these weather events.
library(ggplot2)
library(dplyr)
library(readr)
The data for this project were downloaded from the repository for the Coursera Reproducible Research course. The compressed data file is saved as “StormData.csv.bz2” and can be downloaded as follows:
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2", method = "curl")
Read the compressed file into a data frame, StormData_csv, to examine the data structure and variables (read_csv() decompresses .bz2 files automatically).
StormData_csv <- read_csv("StormData.csv.bz2")
## Parsed with column specification:
## cols(
## .default = col_character(),
## STATE__ = col_double(),
## COUNTY = col_double(),
## BGN_RANGE = col_double(),
## COUNTY_END = col_double(),
## END_RANGE = col_double(),
## LENGTH = col_double(),
## WIDTH = col_double(),
## F = col_integer(),
## MAG = col_double(),
## FATALITIES = col_double(),
## INJURIES = col_double(),
## PROPDMG = col_double(),
## CROPDMG = col_double(),
## LATITUDE = col_double(),
## LONGITUDE = col_double(),
## LATITUDE_E = col_double(),
## LONGITUDE_ = col_double(),
## REFNUM = col_double()
## )
## See spec(...) for full column specifications.
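Re-running the download on every knit is slow, and reading from a machine-specific path breaks reproducibility. The base-R sketch below downloads the archive only when it is missing and then reads the compressed CSV directly. fetch_if_missing() is a hypothetical helper, not part of the original analysis, and read.csv() is shown only as a dependency-free alternative to read_csv().

```r
# Hypothetical helper (not part of the original analysis): download a file
# only when it is not already on disk, and return its path.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on Windows.
    download.file(url, destfile, mode = "wb")
  }
  destfile
}

# Base R's read.csv() opens paths with file(), which detects and decompresses
# bzip2 input, so the .bz2 archive can be read without unpacking it first.
# Not run here because the full file is large:
# storm <- read.csv(
#   fetch_if_missing(
#     "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#     "StormData.csv.bz2"
#   ),
#   stringsAsFactors = FALSE
# )
```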
The goal is to find which types of events are most harmful with respect to population health. Harm to population health can be investigated through the injuries and fatalities recorded for each weather event. The rationale employed here is to sum the injuries and fatalities for each event and call the result casualties; the casualties for each event are stored in a new variable, Casualties.
StormData_csv$Casualties <- StormData_csv$INJURIES + StormData_csv$FATALITIES
To determine the impact each type of weather event has on human health, we sum the casualties for each event type (EVTYPE) and sort the totals from greatest to least. We then keep only the event types with one or more casualties.
StormCasualties <- count(StormData_csv, EVTYPE, wt=Casualties, sort = TRUE)
StormHarmful <- subset(StormCasualties, n > 0)
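With a wt argument, dplyr's count() sums the weight within each group rather than counting rows, so count(StormData_csv, EVTYPE, wt = Casualties) is the total casualties per event type. For readers less familiar with dplyr, a base-R sketch of the same tally on a small, made-up data frame (the values are illustrative, not taken from StormData):

```r
# Toy records (hypothetical values, not taken from StormData).
storm <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  INJURIES   = c(10, 0, 5, 2, 0),
  FATALITIES = c(1, 0, 0, 1, 0),
  stringsAsFactors = FALSE
)
storm$Casualties <- storm$INJURIES + storm$FATALITIES

# Sum casualties within each event type (what count(..., wt = Casualties) does).
casualties <- aggregate(Casualties ~ EVTYPE, data = storm, FUN = sum)
names(casualties)[2] <- "n"

# Sort from greatest to least and keep event types with at least one casualty.
casualties <- casualties[order(-casualties$n), ]
harmful <- subset(casualties, n > 0)
harmful
```

Here TORNADO (16) and FLOOD (3) remain, while HAIL is dropped because it has no casualties.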
This plot shows the 10 event types with the greatest number of casualties, which gives a sense of which event types have caused the most harm to human health.
# reorder() sorts the bars by casualties instead of alphabetically
bp <- ggplot(head(StormHarmful, 10), aes(x = reorder(EVTYPE, -n), y = n)) + geom_bar(stat = "identity")
bp + scale_y_log10(breaks = c(10, 100, 1000, 10000, 100000)) + theme(axis.text.x = element_text(angle = 60, hjust = 1)) + xlab("Event Type") + ylab("Fatalities + Injuries (log scale)")
The plot shows that some event types appear to be synonymous: “THUNDERSTORM WIND” is the canonical category, and “TSTM WIND” is shorthand for it. To ensure we evaluate the true impact of events on casualties, the names of the top 30 event types were examined to determine whether some event classifications should be combined. Three labels were merged into their canonical categories.
head(as.data.frame(StormHarmful), 30)
## EVTYPE n
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
## 11 HIGH WIND 1385
## 12 HAIL 1376
## 13 HURRICANE/TYPHOON 1339
## 14 HEAVY SNOW 1148
## 15 WILDFIRE 986
## 16 THUNDERSTORM WINDS 972
## 17 BLIZZARD 906
## 18 FOG 796
## 19 RIP CURRENT 600
## 20 WILD/FOREST FIRE 557
## 21 RIP CURRENTS 501
## 22 HEAT WAVE 481
## 23 DUST STORM 462
## 24 WINTER WEATHER 431
## 25 TROPICAL STORM 398
## 26 AVALANCHE 394
## 27 EXTREME COLD 391
## 28 STRONG WIND 383
## 29 DENSE FOG 360
## 30 HEAVY RAIN 349
# Merge non-canonical labels into their canonical categories (whole-label matches only)
StormData_csv$EVTYPE <- gsub('^FOG$', 'DENSE FOG', StormData_csv$EVTYPE)
StormData_csv$EVTYPE <- gsub('^HIGH WINDS$', 'HIGH WIND', StormData_csv$EVTYPE)
StormData_csv$EVTYPE <- gsub('^TSTM WIND$', 'THUNDERSTORM WIND', StormData_csv$EVTYPE)
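As more synonymous labels are found, one gsub() call per label becomes hard to maintain. A sketch of an alternative keeps the merges in a single lookup table. The names evtype_map and canonicalize() are hypothetical, and exact matching reproduces the whole-label `^...$` patterns used in the gsub() calls.

```r
# Hypothetical lookup table: non-canonical label -> canonical label.
# These are the three merges made with gsub() in this analysis.
evtype_map <- c(
  "FOG"        = "DENSE FOG",
  "HIGH WINDS" = "HIGH WIND",
  "TSTM WIND"  = "THUNDERSTORM WIND"
)

# Replace labels that exactly match a name in the map; leave the rest alone.
canonicalize <- function(evtype, map) {
  hit <- evtype %in% names(map)
  evtype[hit] <- unname(map[evtype[hit]])
  evtype
}

canonicalize(c("TSTM WIND", "TORNADO", "FOG"), evtype_map)
# returns c("THUNDERSTORM WIND", "TORNADO", "DENSE FOG")
```

Applied to the full data, StormData_csv$EVTYPE <- canonicalize(StormData_csv$EVTYPE, evtype_map) would stand in for the three gsub() calls. New merges then become one more entry in the table.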
StormData_csv$Casualties <- StormData_csv$INJURIES + StormData_csv$FATALITIES
StormCasualties2 <- count(StormData_csv, EVTYPE, wt=Casualties, sort = TRUE)
StormHarmful2 <- subset(StormCasualties2, n > 0)
head(as.data.frame(StormHarmful2), 30)
## EVTYPE n
## 1 TORNADO 96979
## 2 THUNDERSTORM WIND 9082
## 3 EXCESSIVE HEAT 8428
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 HIGH WIND 1722
## 10 WINTER STORM 1527
## 11 HAIL 1376
## 12 HURRICANE/TYPHOON 1339
## 13 DENSE FOG 1156
## 14 HEAVY SNOW 1148
## 15 WILDFIRE 986
## 16 THUNDERSTORM WINDS 972
## 17 BLIZZARD 906
## 18 RIP CURRENT 600
## 19 WILD/FOREST FIRE 557
## 20 RIP CURRENTS 501
## 21 HEAT WAVE 481
## 22 DUST STORM 462
## 23 WINTER WEATHER 431
## 24 TROPICAL STORM 398
## 25 AVALANCHE 394
## 26 EXTREME COLD 391
## 27 STRONG WIND 383
## 28 HEAVY RAIN 349
## 29 HIGH SURF 253
## 30 EXTREME HEAT 251
Plot of casualties by event type after merging the overlapping event classifications:
# reorder() sorts the bars by casualties instead of alphabetically
bp <- ggplot(head(StormHarmful2, 10), aes(x = reorder(EVTYPE, -n), y = n)) + geom_bar(stat = "identity")
bp + scale_y_log10(breaks = c(10, 100, 1000, 10000, 100000)) + theme(axis.text.x = element_text(angle = 60, hjust = 1)) + xlab("Event Type") + ylab("Fatalities + Injuries (log scale)")
Across all event types, tornadoes have by far the greatest impact on human health, with roughly an order of magnitude more casualties than any other event type over the time span of the records reviewed. Thunderstorm winds, heat, floods, and winter storms have also had a substantial impact on human health across the United States.
Which types of events have the greatest economic consequences? The variables PROPDMG and CROPDMG record the property and crop damage, in dollars, caused by each weather event.
Before summing the damage, we have to apply the magnitude (thousands, millions, or billions) recorded in PROPDMGEXP and CROPDMGEXP to the values recorded in PROPDMG and CROPDMG.
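Written out, the total damage for an event type e under the mapping applied in the steps below is shown here. K, M, and B are multipliers, codes are compared after uppercasing, and every other code (including a missing one) maps to 0:

```latex
D_e = \sum_{i \in e} \Big( \mathrm{PROPDMG}_i \, m(\mathrm{PROPDMGEXP}_i)
      + \mathrm{CROPDMG}_i \, m(\mathrm{CROPDMGEXP}_i) \Big),
\qquad
m(c) = \begin{cases}
  10^{3} & c = \mathrm{K} \\
  10^{6} & c = \mathrm{M} \\
  10^{9} & c = \mathrm{B} \\
  0      & \text{otherwise, including a missing code}
\end{cases}
```

For example, a record with PROPDMG = 2.5 and PROPDMGEXP = "M" contributes 2.5 × 10^6 = $2,500,000 in property damage.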
To determine these multipliers, we first look at which exponent codes were recorded for property and crop damage.
table(StormData_csv$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 7 19 1 9 21 281832 1 1994
table(StormData_csv$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5 6
## 1 8 5 216 25 13 4 4 28 4
## 7 8 B h H K m M
## 5 1 40 1 6 424665 7 11330
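table() drops missing values by default. read_csv() reads blank fields as NA, so the blank exponent codes do not appear in the counts above. Passing useNA = "ifany" makes them visible. A small sketch with made-up codes:

```r
# Hypothetical exponent codes; NA stands in for blank fields,
# which read_csv() reads as missing values.
codes <- c("K", "k", "M", NA, NA, "B", "?")

# Uppercasing merges "k" and "K"; toupper() leaves NA as NA.
upper <- toupper(codes)

table(upper)                   # counts ?, B, K, M only; the two NAs are dropped
table(upper, useNA = "ifany")  # adds an <NA> entry with a count of 2
```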
Before converting the exponent codes to factors whose levels can be mapped to multipliers, we clean them up by converting them all to uppercase (so that, for example, “k” and “K” are treated alike).
StormData_csv$CROPDMGEXP<- toupper(StormData_csv$CROPDMGEXP)
StormData_csv$PROPDMGEXP<- toupper(StormData_csv$PROPDMGEXP)
Convert the property and crop damage exponent codes to factors.
StormData_csv$CROPDMGEXP<- factor(StormData_csv$CROPDMGEXP)
StormData_csv$PROPDMGEXP<- factor(StormData_csv$PROPDMGEXP)
Replace the factor levels with the multipliers given in the documentation: K = 1,000, M = 1,000,000, and B = 1,000,000,000. All other codes are mapped to 0.
levels(StormData_csv$CROPDMGEXP) <- c("0","0","0","1000000000","1000","1000000")
levels(StormData_csv$PROPDMGEXP) <- c(rep("0",12),"1000000000","0","1000","1000000")
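levels<- relabels factor levels by position, so the vectors above are correct only if the sorted levels match the codes tabulated earlier exactly. A position-independent sketch instead looks each code up by name. exp_multiplier() is a hypothetical helper. It follows the same rule used in this analysis: K, M, and B map to their multipliers, and every other code, including NA, maps to 0.

```r
# Hypothetical helper: map damage exponent codes to dollar multipliers by
# name rather than by factor-level position. Codes other than K, M, and B,
# and missing codes, get 0, matching the mapping used in this analysis.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # unknown or NA codes index to NA
  mult[is.na(mult)] <- 0
  mult
}

exp_multiplier(c("k", "M", "B", "H", "?", NA))
# returns c(1e3, 1e6, 1e9, 0, 0, 0)
```

Because the lookup uppercases the codes itself, it would be applied to the raw character columns before any factor conversion, e.g. StormData_csv$PROPDMG * exp_multiplier(StormData_csv$PROPDMGEXP).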
Set missing exponent codes (NA) to 0, since we cannot arbitrarily assign an exponent to them. Any damage recorded without an exponent is therefore counted as zero.
StormData_csv$PROPDMGEXP[is.na(StormData_csv$PROPDMGEXP)] <- 0
StormData_csv$CROPDMGEXP[is.na(StormData_csv$CROPDMGEXP)] <- 0
Create new variables holding the property and crop damage multiplied by their magnitudes.
StormData_csv$propdmgmag <- StormData_csv$PROPDMG * as.numeric(as.character(StormData_csv$PROPDMGEXP))
StormData_csv$cropdmgmag <- StormData_csv$CROPDMG * as.numeric(as.character(StormData_csv$CROPDMGEXP))
Tally the property and crop damage separately by event type to view the top sources of each.
CROPDMGMAG <- count(StormData_csv, EVTYPE, wt=cropdmgmag, sort = TRUE)
PROPDMGMAG <- count(StormData_csv, EVTYPE, wt=propdmgmag, sort = TRUE)
Create a new variable holding the combined property and crop damage for each weather event.
StormData_csv$PROPCROPDMG <- StormData_csv$cropdmgmag + StormData_csv$propdmgmag
Find the weather events with the highest combined property and crop damage, and plot the totals by event type.
PROPCROPDMG <- count(StormData_csv, EVTYPE, wt=PROPCROPDMG, sort = TRUE)
# reorder() sorts the bars by total damage instead of alphabetically
bp <- ggplot(head(PROPCROPDMG, 10), aes(x = reorder(EVTYPE, -n), y = n)) + geom_bar(stat = "identity")
bp + scale_y_log10(breaks = c(10, 100, 1000, 10000, 1e+06, 1e+09, 1e+12)) + theme(axis.text.x = element_text(angle = 60, hjust = 1)) + xlab("Event Type") + ylab("Total Property and Crop Damage\n(USD, log scale)")
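Exact totals are hard to read off a log-scale axis, so a table in billions of dollars can complement the plot. The base-R sketch below works on made-up records that use the same column names (propdmgmag, cropdmgmag). It sums property and crop damage per event type, adds a combined total, and sorts by that total.

```r
# Toy damage records in dollars (hypothetical values, not StormData).
dmg <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
  propdmgmag = c(2.0e9, 5.0e8, 1.0e9, 1.0e7),
  cropdmgmag = c(1.0e8, 0, 5.0e7, 2.0e7),
  stringsAsFactors = FALSE
)

# Sum property and crop damage per event type, then add a combined total.
by_event <- aggregate(cbind(propdmgmag, cropdmgmag) ~ EVTYPE, data = dmg, FUN = sum)
by_event$total <- by_event$propdmgmag + by_event$cropdmgmag

# Sort by combined damage and express the totals in billions of dollars.
by_event <- by_event[order(-by_event$total), ]
by_event$total_billions <- by_event$total / 1e9
by_event
```

In this toy example FLOOD leads with 3.15 billion, followed by TORNADO (0.5) and HAIL (0.03).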