Weather events can cause devastating harm to people and heavy economic losses through the cost of the destruction they leave behind. If these events are not monitored, it becomes much harder to prepare for future ones and to limit the harm they cause.
This study gauges the extent to which weather events in the United States affect its people and its economy. By examining the health impact and the economic impact separately, it recommends which events should be prepared for from a health perspective and which from an economic perspective.
The data can be downloaded either manually or with the code below. If you work inside an RStudio project, the file is downloaded into the project folder, so there is no need to set or change the working directory:
library(tidyverse)
library(R.utils)
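library(ggplot2)
data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode = "wb" writes the archive in binary mode so it is not corrupted on Windows
download.file(data_url, "StormData.csv.bz2", mode = "wb")
# Decompress with R.utils::bunzip2(); overwrite = TRUE lets the script be re-run
bunzip2("StormData.csv.bz2", "StormData.csv", overwrite = TRUE)
NOAA_Data <- read.csv("StormData.csv")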
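Because `read.csv()` opens its file through R's `file()` connection, which recognises bzip2-compressed files, the separate decompression step can also be skipped. A minimal sketch, assuming a hypothetical helper `load_storm_data()` that downloads the archive only when no local copy exists yet:

```r
# Hypothetical helper: fetch the archive once, then read it without unpacking
load_storm_data <- function(url, dest = "StormData.csv.bz2") {
  # Skip the large download when a local copy already exists
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")
  }
  # read.csv() detects the bzip2 compression and decompresses on the fly
  read.csv(dest, stringsAsFactors = FALSE)
}

# Usage (downloads on the first run only):
# NOAA_Data <- load_storm_data("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
```

`stringsAsFactors = FALSE` keeps text columns as character vectors (the default since R 4.0.0), whereas the outputs in this report show factor columns.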
To gauge the health impact, we use two variables: the number of fatalities (FATALITIES) and the number of injuries (INJURIES), each summed over all events of a given type.
library(tidyverse)
NOAA_HealthImpact <- NOAA_Data %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  # Measures ALL those affected: fatalities plus injuries
  summarise(TOTAL.HEALTHIMPACT = sum(FATALITIES + INJURIES)) %>%
  arrange(desc(TOTAL.HEALTHIMPACT))
head(NOAA_HealthImpact)
## # A tibble: 6 × 2
## EVTYPE TOTAL.HEALTHIMPACT
## <fctr> <dbl>
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
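Adding FATALITIES and INJURIES weights a death the same as an injury. A minimal sketch that keeps both totals next to the combined figure, using a small hypothetical data frame in place of `NOAA_Data`:

```r
library(dplyr)

# Hypothetical toy records standing in for NOAA_Data
storms <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 5, 1),
  INJURIES = c(10, 3, 1, 4),
  stringsAsFactors = FALSE
)

health <- storms %>%
  group_by(EVTYPE) %>%
  # summarise() can refer to columns it has just created,
  # so TOTAL is built from the per-event sums
  summarise(FATALITIES = sum(FATALITIES),
            INJURIES = sum(INJURIES),
            TOTAL = FATALITIES + INJURIES) %>%
  arrange(desc(TOTAL))
print(health)
```

Keeping the two components makes it possible to rank events by fatalities alone, which may lead to a different ordering than the combined total.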
Processing the data for the economic impact differs from the health impact because each damage value must be multiplied by a magnitude that is recorded as a symbol in a separate exponent column. Since the dataset does not define these symbols, we looked online for a table of their numeric values; the values given in the article we found are the ones used in this study.
The first thing we need to do is select the columns we will use, namely event type, property damage, property damage exponent, crop damage and crop damage exponent. Thereafter we list the distinct codes that occur in the property damage exponent column.
NOAA_DAMAGE <- NOAA_Data %>% select(EVTYPE, PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
Exponents <- unique(NOAA_Data$PROPDMGEXP)
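The multipliers defined next are matched to these codes purely by position, so the order returned by `unique()` matters. A small sketch on a hypothetical vector of codes shows that `unique()` keeps the order of first appearance, while `table()` also reports how often each code occurs:

```r
# Hypothetical sample standing in for NOAA_Data$PROPDMGEXP
codes <- c("K", "K", "M", "", "B", "k", "+", "5", "K")

# Each code once, in order of first appearance
print(unique(codes))

# Frequency of each code, most common first
counts <- sort(table(codes), decreasing = TRUE)
print(counts)
```

The counts help judge how much the choice of value for a rare code (such as `+` or a digit) can affect the totals.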
Now that we know the distinct codes, we create a vector holding the numeric multiplier for each of them, listed in the same order as the codes in Exponents:
Exponent_values <- c(10^2, 10^6, 0, 10^9, 10^6, 1, 0, 10, 10, 0, 10, 10, 10, 10^2, 10, 10^2, 0, 10, 10)
Bind the two together into a lookup table:
Exponents_Final <- data.frame(Exponents, Exponent_values)
colnames(Exponents_Final) <- c('PROPDMGEXP', 'PROPDMGNUM')
To add the multipliers to the damage data we can use a left join. It keeps every row of NOAA_DAMAGE and, matching on PROPDMGEXP, looks up the corresponding multiplier in the new table, adding it as the column PROPDMGNUM. The damage values are then multiplied by this column:
NOAA_DAMAGE_FINAL <- left_join(NOAA_DAMAGE, Exponents_Final, by = "PROPDMGEXP")
NOAA_DAMAGE_FINAL <- NOAA_DAMAGE_FINAL %>% mutate(PROPDMGFINAL = PROPDMG*PROPDMGNUM) %>% mutate(CROPDMGFINAL = CROPDMG*PROPDMGNUM)
head(NOAA_DAMAGE_FINAL)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGNUM PROPDMGFINAL
## 1 TORNADO 25.0 K 0 100 2500
## 2 TORNADO 2.5 K 0 100 250
## 3 TORNADO 25.0 K 0 100 2500
## 4 TORNADO 2.5 K 0 100 250
## 5 TORNADO 2.5 K 0 100 250
## 6 TORNADO 2.5 K 0 100 250
## CROPDMGFINAL
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
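Because a left join keeps every row of the left table, a code that is missing from the lookup table does not drop the record; it leaves `NA` in the multiplier column, which then turns the per-event sum into `NA`. A toy sketch with hypothetical tables, in which the code `"X"` is deliberately absent from the lookup:

```r
library(dplyr)

# Hypothetical toy tables: damage records and a code-to-multiplier lookup
records <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
                      PROPDMG = c(25, 3, 1),
                      PROPDMGEXP = c("K", "M", "X"),
                      stringsAsFactors = FALSE)
lookup <- data.frame(PROPDMGEXP = c("K", "M"),
                     PROPDMGNUM = c(1e3, 1e6),
                     stringsAsFactors = FALSE)

# left_join() keeps all rows of records; the unmatched code "X" gets NA
joined <- left_join(records, lookup, by = "PROPDMGEXP")
print(joined)

# NA propagates through the multiplication and through sum()
print(sum(joined$PROPDMG * joined$PROPDMGNUM))
```

Since the lookup table in this study is built from `unique()` of the same column, every code should find a match; with a hand-built table, `anyNA(NOAA_DAMAGE_FINAL$PROPDMGNUM)` is a quick check.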
We can also add a column with the total damage:
NOAA_DAMAGE_FINAL <- NOAA_DAMAGE_FINAL %>% mutate(TOTALDMG = PROPDMGFINAL + CROPDMGFINAL)
head(NOAA_DAMAGE_FINAL)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGNUM PROPDMGFINAL
## 1 TORNADO 25.0 K 0 100 2500
## 2 TORNADO 2.5 K 0 100 250
## 3 TORNADO 25.0 K 0 100 2500
## 4 TORNADO 2.5 K 0 100 250
## 5 TORNADO 2.5 K 0 100 250
## 6 TORNADO 2.5 K 0 100 250
## CROPDMGFINAL TOTALDMG
## 1 0 2500
## 2 0 250
## 3 0 2500
## 4 0 250
## 5 0 250
## 6 0 250
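Crop damage has its own exponent column, CROPDMGEXP, while the lookup table above is built from PROPDMGEXP only. A minimal sketch of a lookup keyed by the code itself rather than by position, applying each damage column's own exponent before adding them up. `exp_multiplier()` is a hypothetical helper; K, M and B follow NOAA's storm data convention of thousands, millions and billions, and the remaining values (H as hundreds, `+` as 1, digits as 10, blank, `-` and `?` as 0) are assumptions:

```r
library(dplyr)

# Hypothetical helper: convert exponent codes to multipliers by name, not position.
# K, M, B = thousands, millions, billions; H = hundreds and the remaining codes
# (+ as 1, digits as 10, blank, - and ? as 0) are assumptions.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "+" = 1, "-" = 0, "?" = 0)
  code <- toupper(trimws(as.character(code)))
  out <- unname(lookup[code])
  # Single digits are treated as a multiplier of 10
  out[grepl("^[0-9]$", code)] <- 10
  # Blank or unrecognised codes contribute no damage
  out[is.na(out)] <- 0
  out
}

# Hypothetical records: each damage column is scaled by its own exponent column
damage <- data.frame(PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "m"),
                     CROPDMG = c(0, 3), CROPDMGEXP = c("", "K"))
damage <- damage %>%
  mutate(PROPDMGFINAL = PROPDMG * exp_multiplier(PROPDMGEXP),
         CROPDMGFINAL = CROPDMG * exp_multiplier(CROPDMGEXP),
         TOTALDMG = PROPDMGFINAL + CROPDMGFINAL)
print(damage)
```

Keying the lookup by symbol also makes upper- and lower-case variants (such as `m` and `M`) map to the same value, and keeps the result independent of the order in which codes first appear in the data.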
Lastly, we need to summarise the damage statistics and sort the final dataset:
NOAA_DAMAGE_TOTAL <- NOAA_DAMAGE_FINAL %>% group_by(EVTYPE) %>% summarize(EVTYPE_TOTALDAMAGE = sum(TOTALDMG))
head(NOAA_DAMAGE_TOTAL) # Needs to be ordered.
## # A tibble: 6 × 2
## EVTYPE EVTYPE_TOTALDAMAGE
## <fctr> <dbl>
## 1 HIGH SURF ADVISORY 20000
## 2 COASTAL FLOOD 0
## 3 FLASH FLOOD 5000
## 4 LIGHTNING 0
## 5 TSTM WIND 8010000
## 6 TSTM WIND (G45) 800
# Sort DAMAGE by magnitude
NOAA_DAMAGE_TOTAL <- as.data.frame(NOAA_DAMAGE_TOTAL)
NOAA_DAMAGE_TOTAL <- NOAA_DAMAGE_TOTAL %>% arrange(desc(EVTYPE_TOTALDAMAGE))
head(NOAA_DAMAGE_TOTAL)
## EVTYPE EVTYPE_TOTALDAMAGE
## 1 HURRICANE 814739475501
## 2 HURRICANE/TYPHOON 802071608133
## 3 FLOOD 231021569207
## 4 TORNADO 82300967348
## 5 FLASH FLOOD 53574336025
## 6 STORM SURGE 43311790600
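EVTYPE labels are free text, so the same kind of event can appear under several names: the sorted table lists HURRICANE and HURRICANE/TYPHOON separately, and the unsorted output shows TSTM WIND next to TSTM WIND (G45). A sketch that collapses such variants before summing; `normalise_evtype()` and its two regular-expression rules are hypothetical and would need extending to cover the full list of labels:

```r
library(dplyr)

# Hypothetical rules: map label variants onto one canonical event name
normalise_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))
  x[grepl("HURRICANE|TYPHOON", x)] <- "HURRICANE/TYPHOON"
  x[grepl("^(TSTM|THUNDERSTORM) WIND", x)] <- "THUNDERSTORM WIND"
  x
}

# Hypothetical per-record damage values
damage <- data.frame(EVTYPE = c("HURRICANE", "HURRICANE/TYPHOON", "TSTM WIND",
                                "TSTM WIND (G45)", "FLOOD"),
                     TOTALDMG = c(5, 7, 2, 1, 4))

totals <- damage %>%
  mutate(EVTYPE = normalise_evtype(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarise(EVTYPE_TOTALDAMAGE = sum(TOTALDMG)) %>%
  arrange(desc(EVTYPE_TOTALDAMAGE))
print(totals)
```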
To see which events cause the most harm to people in the United States, we plot the five event types with the highest combined fatalities and injuries:
g <- ggplot(NOAA_HealthImpact[1:5,], aes(x=reorder(EVTYPE, -TOTAL.HEALTHIMPACT), y=TOTAL.HEALTHIMPACT))
g + geom_bar(stat="identity", width = 0.5, fill="tomato2") +
theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
labs(title = "Total Number of Fatalities & Injuries per Event",
x = "Event Type",
y = "Number of Persons affected")
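The bar chart above merges fatalities and injuries into one figure. A sketch that stacks the two components instead, using hypothetical totals (labelled Event A to C) and `tidyr::pivot_longer()` to reshape the data into one row per event and outcome:

```r
library(ggplot2)
library(tidyr)

# Hypothetical per-event totals standing in for real summarised values
health <- data.frame(EVTYPE = c("Event A", "Event B", "Event C"),
                     FATALITIES = c(50, 20, 5),
                     INJURIES = c(900, 60, 70))

# One row per event and outcome, as needed for a stacked bar chart
health_long <- pivot_longer(health, cols = c(FATALITIES, INJURIES),
                            names_to = "OUTCOME", values_to = "PERSONS")

# reorder(..., FUN = sum) sorts events by their combined total, largest first
p <- ggplot(health_long, aes(x = reorder(EVTYPE, -PERSONS, FUN = sum),
                             y = PERSONS, fill = OUTCOME)) +
  geom_col(width = 0.5) +
  theme(axis.text.x = element_text(angle = 65, vjust = 0.6)) +
  labs(title = "Fatalities and Injuries per Event",
       x = "Event Type",
       y = "Number of Persons affected")
print(p)
```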
As the graph shows, tornadoes are by far the leading cause of fatalities and injuries in the United States: 96,979 people affected, more than ten times the 8,428 for excessive heat. It is perhaps surprising that excessive heat ranks second in combined health impact.
For the economic impact, we can again plot the five event types with the highest total damage:
g <- ggplot(NOAA_DAMAGE_TOTAL[1:5,], aes(x=reorder(EVTYPE, -EVTYPE_TOTALDAMAGE), y=EVTYPE_TOTALDAMAGE))
g + geom_bar(stat="identity", width = 0.5, fill="tomato2") +
theme(axis.text.x = element_text(angle=65, vjust=0.6)) +
labs(title = "Economic Damage per Event",
x = "Event Type",
y = "Monetary Value")
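The damage totals run to twelve digits, which makes raw axis labels hard to read. A sketch that rescales the axis labels to billions, using hypothetical totals (labelled Event A to E) in place of `NOAA_DAMAGE_TOTAL[1:5,]`; `geom_col()` is equivalent to `geom_bar(stat = "identity")`:

```r
library(ggplot2)

# Hypothetical top-five totals standing in for NOAA_DAMAGE_TOTAL[1:5, ]
top5 <- data.frame(EVTYPE = c("Event A", "Event B", "Event C", "Event D", "Event E"),
                   EVTYPE_TOTALDAMAGE = c(8e11, 6e11, 2e11, 8e10, 5e10))

p <- ggplot(top5, aes(x = reorder(EVTYPE, -EVTYPE_TOTALDAMAGE),
                      y = EVTYPE_TOTALDAMAGE)) +
  geom_col(width = 0.5, fill = "tomato2") +
  # Label the axis in billions instead of raw values
  scale_y_continuous(labels = function(x) x / 1e9) +
  theme(axis.text.x = element_text(angle = 65, vjust = 0.6)) +
  labs(title = "Economic Damage per Event",
       x = "Event Type",
       y = "Damage (billions of US dollars)")
print(p)
```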
According to the graph, the event types HURRICANE and HURRICANE/TYPHOON, which are recorded separately, cause the most economic damage of any weather event (roughly 815 billion and 802 billion respectively), with floods a distant third at about 231 billion.
Weather events impose heavy human and economic costs every year, so they are important to understand. Their health impacts are large, and the damage from certain events costs vast sums to repair. It is therefore important to identify the most harmful events, both to prepare for them and to prioritise which ones need further research. This study found that tornadoes are by far the most dangerous events for people, while hurricanes cause the most economic damage to property and crops. The study is limited, however, by records of events from before modern recording instruments were widespread, and by the lack of an authoritative key to the damage-cost exponent codes in the dataset.