To assess which weather events contribute most to public health and economic impact in the USA, storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA) covering 1950 to 2011 was analysed. The data was transformed using RStudio (details at the end of this document). The findings of the analysis:
In conclusion, tornadoes have the most adverse public health and economic impact of all weather events in the USA.
# Download the raw NOAA storm data archive
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, "archiveRaw.csv.bz2")
# read.csv reads the bz2-compressed file directly
storm <- read.csv("archiveRaw.csv.bz2", header = TRUE, sep = ",", quote = "\"", stringsAsFactors = FALSE)
# Normalise the exponent codes to upper case before mapping them
storm$PROPDMGEXP <- toupper(storm$PROPDMGEXP)
storm$CROPDMGEXP <- toupper(storm$CROPDMGEXP)
# Map property damage alphanumeric exponents to numeric values
propDmgKey <- c("\"\"" = 10^0,
"-" = 10^0,
"+" = 10^0,
"0" = 10^0,
"1" = 10^1,
"2" = 10^2,
"3" = 10^3,
"4" = 10^4,
"5" = 10^5,
"6" = 10^6,
"7" = 10^7,
"8" = 10^8,
"9" = 10^9,
"H" = 10^2,
"K" = 10^3,
"M" = 10^6,
"B" = 10^9)
storm$PROPDMGEXP <- propDmgKey[as.character(storm$PROPDMGEXP)]
storm$PROPDMGEXP[is.na(storm$PROPDMGEXP)] <- 10^0
# Map crop damage alphanumeric exponents to numeric values
cropDmgKey <- c("\"\"" = 10^0,
"?" = 10^0,
"0" = 10^0,
"K" = 10^3,
"M" = 10^6,
"B" = 10^9)
storm$CROPDMGEXP <- cropDmgKey[as.character(storm$CROPDMGEXP)]
storm$CROPDMGEXP[is.na(storm$CROPDMGEXP)] <- 10^0
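The mapping above relies on indexing a named vector by character codes, with NA returned for any code that has no entry. A minimal sketch on toy codes (the `codes` and `key` objects are illustrative and not part of the analysis):

```r
# Toy exponent codes, including an empty string and an unknown symbol
codes <- c("K", "m", "", "B", "?", "5")

# Lookup table: names are the codes, values are the multipliers
key <- c("0" = 10^0, "5" = 10^5, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Index the named vector by the upper-cased codes; unmatched codes give NA
multiplier <- unname(key[toupper(codes)])

# Treat anything unmatched (here "" and "?") as a multiplier of 1
multiplier[is.na(multiplier)] <- 10^0

print(multiplier)
```

An empty string never matches a name in R, so blank codes fall through to the NA replacement step rather than to an explicit key entry.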
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
For this question, we need to define a response variable aggregating the health effects. There are two candidate variables: INJURIES and FATALITIES.
I assume that there is a relationship between the number of fatalities and the number of injuries:
summary(lm(FATALITIES ~ INJURIES, data = storm))
##
## Call:
## lm(formula = FATALITIES ~ INJURIES, data = storm)
##
## Residuals:
## Min 1Q Median 3Q Max
## -70.07 -0.01 -0.01 -0.01 582.99
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0097265 0.0007631 12.74 <2e-16 ***
## INJURIES 0.0453207 0.0001404 322.71 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7246 on 902295 degrees of freedom
## Multiple R-squared: 0.1035, Adjusted R-squared: 0.1035
## F-statistic: 1.041e+05 on 1 and 902295 DF, p-value: < 2.2e-16
There is a relationship between INJURIES and FATALITIES. The slope estimate is about 0.045 fatalities per injury, which corresponds to roughly 22 injuries per fatality on average (1 / 0.0453207). We will use this regression parameter estimate to create a summary variable of damages on health later on. Here is a plot of the discussed relationship:
library(ggplot2)
ggplot(storm, aes(x = INJURIES, y = FATALITIES)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm") +
  ggtitle(label = "relationship of INJURIES vs FATALITIES in weather related accidents") +
  coord_cartesian(xlim = c(0, 1000), ylim = c(0, 1000))
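The weight of about 22 comes from inverting the fitted slope. As a minimal sketch, assuming a toy data frame with an exactly linear relationship (all values hypothetical), the slope can be read from the model with `coef()` rather than typed in by hand:

```r
# Toy data with an exactly linear relationship (hypothetical values)
toy <- data.frame(INJURIES = c(0, 10, 20, 30), FATALITIES = c(0, 1, 2, 3))

fit <- lm(FATALITIES ~ INJURIES, data = toy)

# Slope: expected change in FATALITIES per additional injury
slope <- unname(coef(fit)["INJURIES"])

# Inverse of the slope: injuries treated as equivalent to one fatality
injuries_per_fatality <- 1 / slope
print(injuries_per_fatality)

# The same arithmetic with the estimate reported in the regression output
print(1 / 0.0453207)  # about 22.1
```

Reading the coefficient from the fitted object keeps the weight in step with the model if the data or the formula changes.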
Next, we will create a new variable which describes the total health damage caused by a weather event. We will use the regression estimate to assign a weight to FATALITIES: each fatality will count as roughly 22 injuries. The new variable combining the health-related effects is called PEOPLEDAMAGE.
In the following commands, we aggregate the storm dataset by EVTYPE to get the sum of this newly calculated PEOPLEDAMAGE variable for each event type.
We find that the most harmful type of weather event in the United States is the tornado.
storm$PEOPLEDAMAGE <- storm$INJURIES + 1/0.0453207 * storm$FATALITIES
aggregated <- aggregate(PEOPLEDAMAGE ~ EVTYPE, storm, FUN = sum)
aggregated[which.max(aggregated$PEOPLEDAMAGE),]
## EVTYPE PEOPLEDAMAGE
## 834 TORNADO 215638
aggregated <- aggregated[with(aggregated,order(PEOPLEDAMAGE, decreasing = TRUE)),]
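To make the weighting and aggregation above concrete, here is a small sketch on a hypothetical toy data frame. The rounded weight of 22 and all record values are assumptions chosen for illustration only:

```r
# Hypothetical toy records, not taken from the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  INJURIES   = c(10, 5, 2, 0),
  FATALITIES = c(1, 0, 0, 2),
  stringsAsFactors = FALSE
)

# Assumed weight: one fatality counted as about 22 injuries
weight <- 22

# Injuries plus weighted fatalities, per record
toy$PEOPLEDAMAGE <- toy$INJURIES + weight * toy$FATALITIES

# Sum per event type, then sort from most to least harmful
totals <- aggregate(PEOPLEDAMAGE ~ EVTYPE, toy, FUN = sum)
totals <- totals[order(totals$PEOPLEDAMAGE, decreasing = TRUE), ]
print(totals)
```

In this toy example HEAT ranks first (44) even though it has no injuries, which shows how strongly the fatality weight drives the ranking.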
To analyze economic damage, we have to construct a variable combining all damage to assets. We obtain this variable by summing PROPDMG and CROPDMG. The new variable will be called THINGDMG.
storm$THINGDMG <- as.numeric(storm$PROPDMG) + as.numeric(storm$CROPDMG)
aggregated2 <- aggregate(THINGDMG ~ EVTYPE, storm, FUN = sum)
as.character(aggregated2[which.max(aggregated2$THINGDMG),1])
## [1] "TORNADO"
aggregated2 <- aggregated2[with(aggregated2,order(THINGDMG, decreasing = TRUE)),]
library(reshape2)
agg <- melt(head(aggregated,10))
## Warning in melt.data.table(head(aggregated, 10)): To be consistent with
## reshape2's melt, id.vars and measure.vars are internally guessed when both
## are 'NULL'. All non-numeric/integer/logical type columns are conisdered
## id.vars, which in this case are columns [EVTYPE, variable]. Consider
## providing at least one of 'id' or 'measure' vars in future.
## Duplicate column names found in molten data.table. Setting unique names using 'make.names'
ggplot(agg, aes(x = reorder(EVTYPE, value),y = value)) + geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ggtitle(label = "Health effects of weather events") + xlab("Event type") + ylab("Injuries and equivalents") +
coord_flip()
agg <- melt(head(aggregated2,15))
## Using EVTYPE as id variables
ggplot(agg, aes(x = reorder(EVTYPE, value),y = value)) + geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + ggtitle(label = "Economic effects of weather events") + xlab("Event type") + ylab("Economic damage in USD") + scale_y_continuous(labels = scales::dollar) +
coord_flip()
The most economically harmful type of meteorological event in the United States is the tornado.
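The economic totals above sum PROPDMG and CROPDMG as recorded. As a possible refinement, assuming the multipliers mapped into PROPDMGEXP and CROPDMGEXP are meant to scale those columns, dollar amounts could be sketched as follows on a hypothetical toy data frame (`THINGDMG_USD` is an illustrative name, not a column from the analysis):

```r
# Hypothetical toy records; the EXP columns already hold numeric multipliers
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD"),
  PROPDMG    = c(25, 1.5, 2),
  PROPDMGEXP = c(10^3, 10^9, 10^6),
  CROPDMG    = c(0, 10, 0),
  CROPDMGEXP = c(10^0, 10^6, 10^0),
  stringsAsFactors = FALSE
)

# Scale each damage figure by its multiplier before adding the two together
toy$THINGDMG_USD <- toy$PROPDMG * toy$PROPDMGEXP + toy$CROPDMG * toy$CROPDMGEXP

# Sum per event type and sort from largest to smallest
totals <- aggregate(THINGDMG_USD ~ EVTYPE, toy, FUN = sum)
totals <- totals[order(totals$THINGDMG_USD, decreasing = TRUE), ]
print(totals)
```

Whether this scaling changes the ranking on the full dataset is not shown here; the sketch only illustrates the arithmetic.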
The analysis was performed in R; the session info follows.
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] data.table_1.9.6 reshape2_1.4.2 ggplot2_2.1.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.7 digest_0.6.10 assertthat_0.1 chron_2.3-47
## [5] plyr_1.8.4 grid_3.3.2 gtable_0.2.0 formatR_1.4
## [9] magrittr_1.5 evaluate_0.10 scales_0.4.0 stringi_1.1.2
## [13] rmarkdown_1.1 labeling_0.3 tools_3.3.2 stringr_1.1.0
## [17] munsell_0.4.3 yaml_2.1.13 colorspace_1.2-7 htmltools_0.3.5
## [21] knitr_1.14 tibble_1.2