Using records of storm and weather events from January 1996 to November 2011, we analyse which event types contribute most to fatalities and injuries on the one hand, and to crop and property damage on the other.
We conclude that public policy should prioritise protecting the population against floods, tornadoes and heat waves.
For this analysis, we will be using several R packages:
library(data.table); library(R.utils); library(dplyr); library(ggplot2)
This analysis was performed using the following set up:
We will be using a subset of the Storm Events Database (https://www.ncdc.noaa.gov/stormevents/details.jsp?type=eventtype) provided by the U.S. National Climatic Data Center. The compressed file is downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and saved into the R working directory. We then unzip the file in the “data” subfolder. If the subfolder and file are already present, we skip this phase. We then load the data using the very fast fread routine from the data.table package.
Variable names are tidied up from the original data file and added to our data frame.
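if (!file.exists("data/StormData.csv")) {
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  # Download the compressed file into the working directory
  download.file(fileUrl, "repdata-data-StormData.csv.bz2", method = "curl")
  # Create the "data" subfolder only if it does not exist yet
  if (!dir.exists("data")) dir.create("data")
  # Decompress into the subfolder (bunzip2 comes from R.utils); keep the archive
  bunzip2("repdata-data-StormData.csv.bz2", "data/StormData.csv", remove = FALSE)
}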
storm.data <- as.data.frame(fread("data/StormData.csv", header = FALSE, #Using fread to load the data
skip = 1, na.strings=""))
##
Read 22.7% of 967216 rows
Read 45.5% of 967216 rows
Read 61.0% of 967216 rows
Read 78.6% of 967216 rows
Read 93.1% of 967216 rows
Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:07
names(storm.data) <- tolower(make.names(fread("data/StormData.csv",
header = FALSE, nrows = 1)))
From https://www.ncdc.noaa.gov/stormevents/details.jsp?type=eventtype we learn that until 1995, only 1 to 3 types of events were registered. From 1996, all types have been registered. To avoid skewing the results of this analysis, it is necessary to discard any data prior to 1996. An added benefit is that the remaining data is much more complete, especially with regard to damage estimates.
For the purpose of our study, we only require information on dates, event types, fatalities, injuries, crop and property damage, so we retain only these variables, along with each event’s reference number.
Note that the amounts of property and crop damage are each coded over two variables: a numeric amount and a character code giving its magnitude (K = 1,000, M = 1,000,000). We therefore convert units so that every amount is expressed in thousands of dollars: amounts flagged M are multiplied by 1,000, amounts with any other code are kept as they are, and amounts with a missing code become NA.
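The change in recording practice can be checked by counting the distinct event types registered in each year. The sketch below does this in base R on a small made-up data frame; the rows in `toy` are hypothetical and are not taken from the database.

```r
# Toy stand-in for the storm data (hypothetical rows, not real records)
toy <- data.frame(
  bgn_date = as.Date(c("1994-05-01", "1994-06-12", "1995-07-03",
                       "1996-01-06", "1996-02-10", "1996-03-15")),
  evtype   = c("TORNADO", "TORNADO", "HAIL",
               "FLOOD", "WINTER STORM", "TORNADO"),
  stringsAsFactors = FALSE
)

# Extract the calendar year of each event
toy$year <- as.integer(format(toy$bgn_date, "%Y"))

# Count the distinct event types recorded in each year
types_per_year <- tapply(toy$evtype, toy$year,
                         function(x) length(unique(x)))
print(types_per_year)
```

Applied to the full data set, a sharp rise in this count between two consecutive years would mark the point from which event types become comparable.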
storm.data$bgn_date <- as.Date(storm.data$bgn_date, "%m/%d/%Y") # Convert bgn_date to Date
recent.data <- storm.data %>%
  filter(bgn_date >= "1996-01-01") %>%              # Subset for events since 1996
  select(bgn_date, evtype, fatalities, injuries,    # Only select relevant variables
         propdmg, propdmgexp, cropdmg, cropdmgexp,
         refnum) %>%
  mutate(propdmg_value = ifelse(propdmgexp == "M", propdmg * 1000, propdmg), # Convert units
         propdmg = NULL, propdmgexp = NULL) %>%
  mutate(cropdmg_value = ifelse(cropdmgexp == "M", cropdmg * 1000, cropdmg),
         cropdmg = NULL, cropdmgexp = NULL)
recent.data <- recent.data[, c(1, 2, 3, 4, 6, 7, 5)] # Re-order columns and make evtype a factor
recent.data$evtype <- as.factor(recent.data$evtype)
str(recent.data)
## 'data.frame': 653530 obs. of 7 variables:
## $ bgn_date : Date, format: "1996-01-06" "1996-01-11" ...
## $ evtype : Factor w/ 516 levels " HIGH SURF ADVISORY",..: 507 426 434 434 434 142 177 434 434 434 ...
## $ fatalities : num 0 0 0 0 0 0 0 0 0 0 ...
## $ injuries : num 0 0 0 0 0 0 0 0 0 0 ...
## $ propdmg_value: num 380 100 3 5 2 NA 400 12 8 12 ...
## $ cropdmg_value: num 38 NA NA NA NA NA NA NA NA NA ...
## $ refnum : num 248768 248769 248770 248771 248772 ...
We now have a tidy and more compact data set. Using this, we can more easily manipulate the data to extract the required information.
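The units conversion can also be written as a lookup from magnitude code to multiplier. The following is a minimal sketch of that idea, not the code used for the results in this report: the helper `to_kusd` and the `multiplier` table are hypothetical names, and the `B` (billions) entry is an assumption that does not appear in the conversion above.

```r
# Multipliers that express every amount in thousands of dollars (k$).
# "K" and "M" come from the text above; "B" is an assumption.
multiplier <- c(K = 1, M = 1e3, B = 1e6)

# Hypothetical helper: convert an amount and its magnitude code to k$
to_kusd <- function(amount, code) {
  # Indexing by name returns NA for codes that are missing or not in the
  # table, so such amounts become NA instead of being guessed
  unname(amount * multiplier[code])
}

converted <- to_kusd(c(380, 2.5, 1, 10), c("K", "M", "B", NA))
print(converted)
```

Unlike the `ifelse` conversion used in this analysis, which leaves every amount not flagged `M` unchanged, this variant returns NA for any code it does not know.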
To identify the event types most harmful to population health, we group the data by event type and calculate the sums of fatalities and injuries for each group. We then retain only the event types with at least 10 fatalities or at least 10 injuries nationally over the observed period. This keeps the plots readable.
health.sub <- recent.data %>%
  group_by(evtype) %>%
  summarise(fatalities_total = sum(fatalities),
            injuries_total = sum(injuries)) %>%
  filter(fatalities_total >= 10 | injuries_total >= 10)
We can then use this data to build two plots, one for fatalities and one for injuries, with event types sorted in decreasing order of the number of fatalities and injuries respectively. See Figure 1 and Figure 2 in the Results section for the charts and corresponding code.
Again, we subset the recent.data data frame to sum up the damage estimates. This time, however, we add up property and crop damage into a single variable. Exploratory data analysis has shown that property damage is by far the largest contributor, so the total of the two variables is very strongly correlated with property damage.
To make the plot more readable, we only retain individual events that caused $1 million or more in combined damage (1,000 in our units of thousands of dollars). Events for which either estimate is missing are dropped, since their sum is NA.
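Because the filter combines the two conditions with "or", an event type with very few fatalities is still kept when it has 10 or more injuries. A small base R sketch of the same group-and-filter step makes this visible; the rows in `toy` are made up for illustration, and `aggregate` is used here in place of the dplyr pipeline.

```r
# Hypothetical per-event records (not real data)
toy <- data.frame(
  evtype     = c("HEAT", "HEAT", "BLACK ICE", "BLACK ICE", "FUNNEL CLOUD"),
  fatalities = c(8, 4, 1, 0, 0),
  injuries   = c(3, 2, 14, 10, 1)
)

# Sum fatalities and injuries within each event type
totals <- aggregate(cbind(fatalities, injuries) ~ evtype,
                    data = toy, FUN = sum)

# Keep event types reaching 10 on either measure
kept <- totals[totals$fatalities >= 10 | totals$injuries >= 10, ]
print(kept)
```

Here BLACK ICE is retained on the strength of its injuries alone, while FUNNEL CLOUD falls below both thresholds and is dropped.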
damage.sub <- recent.data %>%
  mutate(alldmg = cropdmg_value + propdmg_value) %>%
  filter(alldmg >= 1000 & !is.na(alldmg)) %>%
  group_by(evtype) %>%
  summarise(alldmg_total = sum(alldmg),
            property_total = sum(propdmg_value),
            crop_total = sum(cropdmg_value))
This data will be used to build the plot shown in Figure 3 in the Results section. The code used to build that plot is also provided there.
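One property of the pipeline above is worth spelling out: in R, adding NA to a number gives NA, so an event with only one of the two estimates recorded has a missing total and is removed by the `!is.na(alldmg)` filter. The sketch below illustrates this on made-up values and shows one possible alternative, which rests on the assumption that a missing estimate means no damage; `na_to_zero` is a hypothetical helper and is not part of this analysis.

```r
# Made-up crop and property estimates in k$ for three events
crop <- c(38, NA, 0)
prop <- c(380, 1200, NA)

# Plain addition: any missing component makes the total missing
strict_total <- crop + prop
print(strict_total)

# Alternative, ASSUMING a missing estimate means zero damage
na_to_zero <- function(x) ifelse(is.na(x), 0, x)
lenient_total <- na_to_zero(crop) + na_to_zero(prop)
print(lenient_total)
```

Which behaviour is appropriate depends on what a missing value means in the source data, so the choice should be stated alongside any totals.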
Using the processed data from the previous section, we plot fatalities for each type of weather event.
ggplot(health.sub,
aes(x= reorder(evtype, -fatalities_total), y = fatalities_total)) +
geom_point(colour = "orange") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7)) +
labs(x = "Event types") + labs(y = "Number of fatalities") +
labs(title = "Fatalities from weather events from Jan 1996 to Nov 2011")
Fig.1 - Fatalities from weather events from Jan 1996 to Nov 2011. Event types with fewer than 10 fatalities and fewer than 10 injuries over the period are not included in the chart.
Figure 1 shows that in terms of fatalities, heat, tornadoes, flash floods and lightning strikes have been the most harmful over the 1996-2011 period.
ggplot(health.sub,
aes(x= reorder(evtype, -injuries_total), y = injuries_total)) +
geom_point(colour = "orange") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7)) +
labs(x = "Event types") + labs(y = "Number of injuries") +
labs(title = "Injuries from weather events from Jan 1996 to Nov 2011")
Fig.2 - Injuries from weather events from Jan 1996 to Nov 2011. Event types with fewer than 10 injuries and fewer than 10 fatalities over the period are not included in the chart.
Figure 2 shows that in terms of injuries, tornadoes, floods, heat and lightning strikes have been the most harmful over the period.
In detail, the values for fatalities and injuries are:
print.data.frame(health.sub, row.names = FALSE)
## evtype fatalities_total injuries_total
## AVALANCHE 223 156
## BLACK ICE 1 24
## BLIZZARD 70 385
## COLD 15 12
## COLD AND SNOW 14 0
## COLD/WIND CHILL 95 12
## DENSE FOG 9 143
## DRY MICROBURST 3 25
## DUST DEVIL 2 38
## DUST STORM 11 376
## EXCESSIVE HEAT 1797 6391
## EXTREME COLD 113 79
## EXTREME COLD/WIND CHILL 125 24
## EXTREME WINDCHILL 17 5
## FLASH FLOOD 887 1674
## FLOOD 414 6758
## FOG 60 712
## FREEZING DRIZZLE 2 13
## GLAZE 1 212
## HAIL 7 713
## HEAT 237 1222
## Heat Wave 0 70
## HEAVY RAIN 94 230
## HEAVY SNOW 107 698
## HEAVY SURF 5 40
## HEAVY SURF/HIGH SURF 42 48
## HIGH SURF 87 146
## HIGH WIND 235 1083
## HURRICANE 61 46
## HURRICANE/TYPHOON 64 1275
## ICE STORM 82 318
## ICY ROADS 4 22
## LANDSLIDE 37 52
## LIGHTNING 651 4141
## MARINE STRONG WIND 14 22
## MARINE THUNDERSTORM WIND 10 26
## MIXED PRECIP 2 26
## RIP CURRENT 340 209
## RIP CURRENTS 202 294
## SMALL HAIL 0 10
## SNOW 2 10
## SNOW SQUALL 2 35
## STORM SURGE 2 37
## STORM SURGE/TIDE 11 5
## STRONG WIND 103 278
## STRONG WINDS 6 21
## THUNDERSTORM WIND 130 1400
## TORNADO 1511 20667
## TROPICAL STORM 57 338
## TSTM WIND 241 3629
## TSTM WIND/HAIL 5 95
## TSUNAMI 33 129
## UNSEASONABLY WARM 0 17
## URBAN/SML STREAM FLD 28 79
## WILD/FOREST FIRE 12 545
## WILDFIRE 75 911
## WIND 18 84
## WINTER STORM 191 1292
## WINTER WEATHER 33 343
## WINTER WEATHER MIX 0 68
## WINTER WEATHER/MIX 28 72
## WINTRY MIX 1 77
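The leading event types can also be read off by sorting rather than from the plots. As a minimal sketch, the data frame below copies six rows from the table above by hand (it is not the full `health.sub` object) and ranks them by injuries in base R.

```r
# Six rows copied from the table above (a hand-typed subset)
health <- data.frame(
  evtype = c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD",
             "LIGHTNING", "FLOOD", "RIP CURRENT"),
  fatalities_total = c(1797, 1511, 887, 651, 414, 340),
  injuries_total   = c(6391, 20667, 1674, 4141, 6758, 209),
  stringsAsFactors = FALSE
)

# Sort in decreasing order of injuries and keep the top three
by_injuries <- health[order(-health$injuries_total), ]
top3_injuries <- head(by_injuries$evtype, 3)
print(top3_injuries)
```

Replacing `injuries_total` with `fatalities_total` in the `order` call gives the corresponding ranking for fatalities.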
Using the processed data from the previous section, we plot total damage estimates (property + crops) for each event type (retaining only events that caused at least $1 million in damage).
ggplot(damage.sub,
aes(x= reorder(evtype, -alldmg_total), y = alldmg_total)) +
geom_point(colour = "orange") +
theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
axis.text.y = element_text(size = 7)) +
labs(x = "Event types") +
labs(y = "Total damage estimates in k$") +
labs(title = "Damage from weather events from Jan 1996 to Nov 2011")
Fig.3 - Property and crop damage from weather events from Jan 1996 to Nov 2011. Events that have led to less than $1 million in damage are not included in the chart.
Figure 3 shows that the majority of the damage caused by weather events comes from floods / flash floods and tornadoes. Hail and hurricanes are the next largest contributors.
In detail, the values are:
print.data.frame(damage.sub, row.names = FALSE)
## evtype alldmg_total property_total crop_total
## AVALANCHE 2100.0 2100.0 0.0
## BLIZZARD 35800.0 28800.0 7000.0
## COASTAL FLOOD 161400.0 161400.0 0.0
## DROUGHT 1853527.0 232547.0 1620980.0
## DUST STORM 2140.0 640.0 1500.0
## EXCESSIVE HEAT 492570.0 170.0 492400.0
## EXTREME COLD 4380.0 2230.0 2150.0
## EXTREME COLD/WIND CHILL 6000.0 6000.0 0.0
## FLASH FLOOD 6701189.8 5504008.8 1197181.0
## FLOOD 17231251.6 12517005.1 4714246.5
## FREEZING FOG 2000.0 2000.0 0.0
## Frost/Freeze 1100.0 1000.0 100.0
## FROST/FREEZE 934710.0 8610.0 926100.0
## HAIL 6923998.5 5536155.0 1387843.5
## HEAT 1500.0 1500.0 0.0
## HEAVY RAIN 297435.0 244425.0 53010.0
## Heavy Rain/High Surf 15000.0 13500.0 1500.0
## HEAVY SNOW 210600.0 143100.0 67500.0
## HIGH SURF 81620.0 81620.0 0.0
## HIGH WIND 2923578.1 2297980.0 625598.1
## HURRICANE 6701024.7 4013714.7 2687310.0
## HURRICANE/TYPHOON 3707282.5 2610011.8 1097270.8
## ICE STORM 839965.0 824515.0 15450.0
## LAKE-EFFECT SNOW 26000.0 26000.0 0.0
## LAKESHORE FLOOD 7500.0 7500.0 0.0
## LANDSLIDE 162814.0 142814.0 20000.0
## LIGHTNING 121050.0 117650.0 3400.0
## MARINE HIGH WIND 1000.0 1000.0 0.0
## River Flooding 134010.0 105990.0 28020.0
## STORM SURGE 2615.0 2610.0 5.0
## STORM SURGE/TIDE 636550.0 635700.0 850.0
## STRONG WIND 147750.0 84350.0 63400.0
## THUNDERSTORM WIND 2879357.0 2539347.0 340010.0
## TORNADO 10395791.0 10166482.0 229309.0
## TROPICAL DEPRESSION 1000.0 1000.0 0.0
## TROPICAL STORM 1466135.4 1016695.4 449440.0
## TSTM WIND 930751.6 514522.1 416229.5
## TSTM WIND/HAIL 24329.0 2129.0 22200.0
## TSUNAMI 143320.0 143300.0 20.0
## TYPHOON 16050.0 15400.0 650.0
## URBAN/SML STREAM FLD 4532.0 4290.0 242.0
## WATERSPOUT 5000.0 5000.0 0.0
## WILD/FOREST FIRE 145820.0 48070.0 97750.0
## WILDFIRE 2591708.0 2407926.0 183782.0
## WINTER STORM 957140.0 949500.0 7640.0
## WINTER WEATHER 23650.0 8650.0 15000.0
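Since the totals are expressed in thousands of dollars, dividing by 1e6 gives billions of dollars, which are easier to compare. The sketch below does this for three values copied by hand from the table above and adds up the two flood categories; the variable names are illustrative only.

```r
# Totals in thousands of dollars (k$), copied from the table above
alldmg_total <- c("FLOOD"       = 17231251.6,
                  "FLASH FLOOD" = 6701189.8,
                  "TORNADO"     = 10395791.0)

# 1 billion dollars = 1e6 k$
alldmg_bn <- alldmg_total / 1e6
print(round(alldmg_bn, 1))

# Combined total for the two flood categories, in billions of dollars
flood_types_bn <- sum(alldmg_bn[c("FLOOD", "FLASH FLOOD")])
print(flood_types_bn)
```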
From a health perspective, heat, floods, tornadoes and lightning are the major factors. It is interesting to note that floods and tornadoes are also the two largest contributors to material damage.
In light of these results, public policies should therefore aim at: