Synopsis

Using NOAA storm data from 1996 onward, we attempt to answer two questions. Which weather events have the greatest effect on human health? Which weather events cause the greatest economic loss, measured as property loss plus crop loss? Our analysis finds that heat has the greatest health effect, as measured by recorded fatalities, followed by tornadoes. For economic loss, floods cause the largest total (property plus crop) loss. Although droughts cause substantial crop loss, their total loss is about ten times smaller than that of floods.

Pre-Processing

Setup of the required libraries and reading of the data file

## Packages used in this analysis
library(lubridate)
library(stringdist)
library(knitr)
library(dplyr)
library(ggplot2)
library(ggrepel)

Data Processing

The analysis uses data from 1996 onward, because earlier records do not cover all types of weather events.

Additionally, events that caused no fatalities, injuries, or economic loss are excluded from this analysis.

## Convert BGN_DATE to a date-time and keep events from 1996 onward; earlier years do not cover all EVTYPEs
d1$BGN_DATE <- mdy_hms(d1$BGN_DATE)
d1 <- filter(d1, BGN_DATE >= as.POSIXct("1996-01-01", tz = "UTC"))
## Keep only events with at least one fatality, injury, property damage, or crop damage
d1 <- filter(d1, FATALITIES != 0 | INJURIES != 0 | PROPDMG != 0 | CROPDMG != 0)
## Keep only the 7 columns needed for the rest of the analysis
d1 <- select(d1, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
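As a minimal base-R sketch of the "any non-zero outcome" rule above, the snippet below applies it to a hypothetical four-row data frame (toy values, not NOAA records). It uses rowSums() over the four outcome columns instead of dplyr::filter(); this is an equivalent formulation, not the code used in this analysis.

```r
# Hypothetical toy data, not drawn from the NOAA file
toy <- data.frame(
  EVTYPE     = c("tornado", "fog", "hail", "heat"),
  FATALITIES = c(2, 0, 0, 0),
  INJURIES   = c(5, 0, 0, 0),
  PROPDMG    = c(0, 0, 25, 0),
  CROPDMG    = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# A row is kept when at least one outcome column is non-zero
outcome_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
keep <- rowSums(toy[outcome_cols] != 0) > 0
toy_kept <- toy[keep, ]
toy_kept$EVTYPE
```

Only "tornado" (fatalities and injuries) and "hail" (property damage) survive; "fog" and "heat" have all-zero outcomes and are dropped.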

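Convert crop and property loss into dollar values. PROPDMGEXP and CROPDMGEXP give the magnitude of PROPDMG and CROPDMG as a letter code (K = thousands, M = millions, B = billions).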

## Multiply each damage value by the magnitude its exponent code denotes
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
for (code in names(multiplier)) {
  test <- grepl(code, d1$PROPDMGEXP)
  d1$PROPDMG[test] <- d1$PROPDMG[test] * multiplier[[code]]
  test <- grepl(code, d1$CROPDMGEXP)
  d1$CROPDMG[test] <- d1$CROPDMG[test] * multiplier[[code]]
}
d1 <- select(d1, EVTYPE, FATALITIES, INJURIES, PROPDMG,  CROPDMG)
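The code above scales each damage value by its magnitude code. A vectorised sketch of the same idea, on hypothetical toy values, looks up one multiplier per record and leaves blank or unrecognised codes unscaled (multiplier 1), just as records without a K, M, or B code are left unchanged above. It assumes single-letter codes; the grepl() approach above would also match a code embedded in a longer string.

```r
# Hypothetical damage values and exponent codes, not from the NOAA file
dmg      <- c(25, 2.5, 1.2, 10)
exp_code <- c("K", "M", "B", "")

# Dollar multiplier per magnitude code; blank or unknown codes get 1
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- ifelse(exp_code %in% names(multiplier), multiplier[exp_code], 1)
dollars <- dmg * mult
dollars
```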

Weather events (EVTYPE) are mapped to the official event types listed in National Weather Service Instruction 10-1605 (dated August 2007).

The data contain many errors in EVTYPE coding. The code below corrects the most common ones so that most records can be matched to the appropriate official EVTYPE.

## convert all to lower case
d1$EVTYPE <- tolower(d1$EVTYPE)

## Official EVTYPEs as listed in National Weather Service Instruction 10-1605 (dated August 2007)
officialEVTYPE <- 
c('Astronomical Low Tide', 
'Avalanche',
'Blizzard',
'Coastal Flood',
'Cold/Wind Chill',
'Debris Flow',
'Dense Fog',
'Dense Smoke',
'Drought',
'Dust Devil',
'Dust Storm',
'Excessive Heat',
'Extreme Cold/Wind Chill',
'Flash Flood',
'Flood',
'Freezing Fog',
'Frost/Freeze',
'Funnel Cloud',
'Hail',
'Heat',
'Heavy Rain',
'Heavy Snow',
'High Surf',
'High Wind',
'Hurricane/Typhoon',
'Ice Storm',
'Lakeshore Flood',
'Lake-Effect Snow',
'Lightning',
'Marine Hail',
'Marine High Wind',
'Marine Strong Wind',
'Marine Thunderstorm Wind',
'Rip Current',
'Seiche',
'Sleet',
'Storm Tide',
'Strong Wind',
'Thunderstorm Wind',
'Tornado',
'Tropical Depression',
'Tropical Storm',
'Tsunami',
'Volcanic Ash',
'Waterspout',
'Wildfire',
'Winter Storm',
'Winter Weather')

## clean EVTYPE to conform to official EVTYPE list
d1$EVTYPE[grepl("hurricane", d1$EVTYPE)] <- "hurricane/typhoon"
d1$EVTYPE[grepl("typhoon", d1$EVTYPE)] <- "hurricane/typhoon"
d1$EVTYPE[grepl("tstm wind|gusty wind", d1$EVTYPE)] <- "high wind"
d1$EVTYPE[grepl("thunderstorm", d1$EVTYPE)] <- "thunderstorm wind"
d1$EVTYPE[grepl("frost", d1$EVTYPE)] <- "frost/freeze"
d1$EVTYPE[grepl("freeze|freezing|cold", d1$EVTYPE)] <- "frost/freeze"
d1$EVTYPE[grepl("surf", d1$EVTYPE)] <- "high surf"
d1$EVTYPE[grepl("fire", d1$EVTYPE)] <- "wildfire"
d1$EVTYPE[grepl("tide", d1$EVTYPE)] <- "storm tide"
d1$EVTYPE[grepl("flash/flood|flash flood", d1$EVTYPE)] <- "flash flood"
d1$EVTYPE[grepl("stream fld|river flood|river flooding|unseasonal rain|dam break", d1$EVTYPE)] <- "flood"
d1$EVTYPE[grepl("tidal flooding|cstl flood|coastal flooding|flooding/erosion|beach erosion|coastal erosion", d1$EVTYPE)] <- "coastal flood"
d1$EVTYPE[grepl("snow", d1$EVTYPE)] <- "heavy snow"
d1$EVTYPE[grepl("heat wave", d1$EVTYPE)] <- "excessive heat"
d1$EVTYPE[grepl("landslide|mudslide|mud slide|rock slide", d1$EVTYPE)] <- "debris flow"
d1$EVTYPE[grepl("hail", d1$EVTYPE)] <- "hail"
d1$EVTYPE[grepl("extreme windchill", d1$EVTYPE)] <- "extreme cold/wind chill"

##Match data to official EVTYPE list
matchEVTYPE <- amatch(d1$EVTYPE, tolower(officialEVTYPE), maxDist = 4)
d1 <- mutate(d1, correctedEVTYPE = officialEVTYPE[matchEVTYPE])
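stringdist's amatch() picks, for each cleaned EVTYPE, the closest official name within maxDist = 4 edits. The sketch below illustrates the same nearest-match-with-cutoff idea in base R using utils::adist(), so it runs without extra packages. Note that adist() computes plain Levenshtein distance, while amatch() defaults to optimal string alignment (which also counts an adjacent transposition as one edit), so the two can differ on some inputs. The official subset and raw strings are hypothetical examples.

```r
# Hypothetical subset of the official list and raw EVTYPE strings
official <- c("Flood", "Flash Flood", "Frost/Freeze", "Wildfire")
raw      <- c("flood", "flash flood", "frost/freezing", "wild fire", "other")

# Levenshtein distance from every raw string (rows) to every lower-cased official name (columns)
d <- adist(raw, tolower(official))

# Closest official name per raw string; NA when even the closest is more than 4 edits away
best <- apply(d, 1, which.min)
best[apply(d, 1, min) > 4] <- NA
matched <- official[best]
matched

# Proportion of strings left unmatched
mean(is.na(matched))
```

Here "other" is more than four edits from every official name, so it stays unmatched; records like this are the ones that end up as NA in correctedEVTYPE.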


## proportion of records that could not be placed on the official list
m1 <- mean(is.na(d1$correctedEVTYPE))
Unable to match about 0.083% of the records (a proportion of 8.35 × 10^-4) to an appropriate official EVTYPE.

Results

Question 1

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Assumption: Fatalities are more reliably reported in connection with a weather event than injuries, and unlike injuries they do not vary in severity. This analysis therefore measures the harmful health effect of weather events by fatalities only.

Result of Question 1

Heat is the biggest threat to human health, followed by tornadoes and then floods/flash floods.
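The official list splits heat into Excessive Heat and Heat, and flooding into Flood and Flash Flood. A quick check with the fatality totals from the table in this section shows that the ranking holds when related categories are combined (a simple sum for illustration, not a step of the original analysis):

```r
# Fatality totals copied from the top-quartile table in this section
fatalities <- c(`Excessive Heat` = 1797, Heat = 237, Tornado = 1511,
                `Flash Flood` = 887, Flood = 523)

heat_total  <- sum(fatalities[c("Excessive Heat", "Heat")])  # 2034
flood_total <- sum(fatalities[c("Flash Flood", "Flood")])    # 1410
c(heat = heat_total, tornado = fatalities[["Tornado"]], flood = flood_total)
```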

## Keep only events with at least one fatality, then total fatalities by event type
d2 <- filter(d1, FATALITIES > 0)
f1 <- aggregate(FATALITIES ~ correctedEVTYPE, data = d2, sum)

## Keep only the EVTYPEs whose total fatalities fall in the top quartile
q1 <- quantile(f1$FATALITIES)
f2 <- filter(f1, FATALITIES >= q1[4])

kable(f2, caption = "Most dangerous weather events (top quartile) by fatalities")
Most dangerous weather events (top quartile) by fatalities
correctedEVTYPE  FATALITIES
Excessive Heat         1797
Flash Flood             887
Flood                   523
Frost/Freeze            378
Heat                    237
High Wind               499
Lightning               650
Rip Current             542
Tornado                1511
ggplot(f2, aes(x=correctedEVTYPE, y=FATALITIES)) + geom_bar(stat = "identity", fill="blue")+geom_text(aes(label=FATALITIES), col = "white", vjust = 1.5)+theme_minimal()+ggtitle("Fatalities by Weather Events")+ labs(x="Weather Events", y= "Fatalities")
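The top-quartile cut (q1[4]) relies on quantile() returning five named values (0%, 25%, 50%, 75%, 100%) by default, so the fourth element is the 75th percentile. A toy example with hypothetical totals:

```r
# Hypothetical per-event totals
totals <- c(a = 1, b = 2, c = 3, d = 4, e = 5)

q <- quantile(totals)  # named: 0%, 25%, 50%, 75%, 100%
q[4]                   # the 75th percentile

# Keep events at or above the 75th percentile
top_quartile <- totals[totals >= q[4]]
names(top_quartile)
```

Indexing by name, q[["75%"]], makes the intent more explicit. Because the comparison uses >=, events tied with the 75th percentile are kept, so the selection can hold slightly more than a quarter of the event types. The economic-loss table uses the same cut.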

Question 2

Across the United States, which types of events have the greatest economic consequences?

Result of Question 2

Floods cause the most economic loss, most of it property loss. As expected, drought causes the most crop loss but little property loss, and its total loss is about a factor of 10 smaller than the total loss caused by floods.
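The "factor of 10" comparison and the property skew of flood losses can be checked with simple arithmetic on the totals reported in the economic-loss table of this section (values in US dollars):

```r
# Totals (US dollars) copied from the economic-loss table in this section
flood_total   <- 149169628700
flood_prop    <- 144146167200
drought_total <- 14413667000
drought_crop  <- 13367566000

flood_total / drought_total   # about 10.3
flood_prop / flood_total      # property share of flood loss, about 0.97
drought_crop / drought_total  # crop share of drought loss, about 0.93
```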

## Keep only weather events with property or crop damage
c1 <- filter(d1, PROPDMG > 0 | CROPDMG > 0)

## Sum crop and property damage by event type, then compute the total
c2 <- aggregate(CROPDMG ~ correctedEVTYPE, data = c1, sum)
c3 <- aggregate(PROPDMG ~ correctedEVTYPE, data = c1, sum)
c4 <- merge(c2, c3)
c4$TOTAL <- c4$CROPDMG + c4$PROPDMG

## Table and plot only events with a large effect (total loss in the top quartile)
q2 <- quantile(c4$TOTAL)
c4 <- filter(c4, TOTAL >= q2[4])

kable(c4, caption = "Economic Loss Due to Weather Events (top quartile) in dollars")
Economic Loss Due to Weather Events (top quartile) in dollars
correctedEVTYPE        CROPDMG       PROPDMG         TOTAL
Drought            13367566000    1046101000   14413667000
Flash Flood         1334901700   15222268910   16557170610
Flood               5023461500  144146167200  149169628700
Hail                2497072450   14595517420   17092589870
High Wind           1252382900    9786379800   11038762700
Hurricane/Typhoon   5350107800   81718889010   87068996810
Storm Tide              855000   47844469000   47845324000
Thunderstorm Wind    398381000    3383090840    3781471840
Tornado              283425010   24616905710   24900330720
Tropical Storm       677711000    7642475550    8320186550
Wildfire             402255130    7760449500    8162704630
ggplot(c4, aes(x = CROPDMG, y = PROPDMG)) + geom_point(aes(color = TOTAL)) + geom_text_repel(aes(label = correctedEVTYPE)) + theme_light() + ggtitle("Property and Crop Economic Loss by Weather Events") + labs(x = "Crop Loss ($)", y = "Property Loss ($)")
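The two aggregate() calls followed by merge() in this section can also be written as a single aggregate() call that sums both damage columns per event type, using cbind() on the left-hand side of the formula. A sketch on hypothetical toy records (not NOAA data):

```r
# Hypothetical toy records
c1_toy <- data.frame(
  correctedEVTYPE = c("Flood", "Flood", "Drought"),
  CROPDMG         = c(1e6, 2e6, 5e6),
  PROPDMG         = c(3e6, 4e6, 0),
  stringsAsFactors = FALSE
)

# One aggregate() call sums both damage columns per event type
c4_toy <- aggregate(cbind(CROPDMG, PROPDMG) ~ correctedEVTYPE, data = c1_toy, FUN = sum)
c4_toy$TOTAL <- c4_toy$CROPDMG + c4_toy$PROPDMG
c4_toy
```

One difference to keep in mind: with the formula interface, a row with NA in either damage column is dropped from both sums, whereas separate calls drop it only from the sum of the column that is missing.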