Synopsis

Using records of storm and weather events from January 1996 to November 2011, we analyse which event types contribute most to fatalities and injuries on the one hand, and to crop and property damage on the other.
We conclude that public policies should prioritise protecting the population against floods, tornadoes and heat waves.

Data Processing

Environment set up

For this analysis, we will be using several R packages:

library(data.table); library(R.utils); library(dplyr); library(ggplot2)

Data preparation

This analysis was performed using the following set up:

  • Computer: Intel x64 Quad-core
  • OS: Windows 10, build number 10240
  • R version 3.2.2
  • RStudio version 0.99.484
  • Locale = French_France.1252

We will be using a subset of the Storm Events Database (https://www.ncdc.noaa.gov/stormevents/details.jsp?type=eventtype) provided by the U.S. National Climatic Data Center. The compressed file is downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and saved into the R working directory. We then decompress it into the “data” subfolder. If the decompressed file is already present, we skip this step. Finally, we load the data using the fast fread routine from the data.table package and convert the result to a data frame.
Variable names are read from the header row of the original data file, tidied up and assigned to our data frame.

if (!file.exists("data/StormData.csv")) {
    fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(fileUrl, "repdata-data-StormData.csv.bz2", method = "curl")
    dir.create("data")
    bunzip2("repdata-data-StormData.csv.bz2", "data/StormData.csv", remove = FALSE)
}

storm.data <- as.data.frame(fread("data/StormData.csv", header = FALSE,  #Using fread to load the data
                    skip = 1, na.strings=""))
## Read 902297 rows and 37 (of 37) columns from 0.523 GB file in 00:00:07
names(storm.data) <- tolower(make.names(fread("data/StormData.csv", 
                                              header = FALSE, nrows = 1)))

From https://www.ncdc.noaa.gov/stormevents/details.jsp?type=eventtype we learn that until 1995, only 1 to 3 types of events were registered. From 1996, all types have been registered. To avoid skewing the results of this analysis, we therefore discard any data prior to 1996. An added benefit is that the remaining data is much more complete, especially with regard to damage estimates.

For the purpose of our study, we only require information on dates, event types, fatalities, injuries, crop damage and property damage, so we retain these variables only, along with each event’s reference number.
Note that the amounts of property and crop damage are each coded over two variables, one numeric giving the amount and the other a character giving the magnitude (K = 1,000, M = 1,000,000). We therefore need to perform a units conversion: amounts flagged "M" are multiplied by 1,000 so that all converted values are expressed in thousands of dollars (k$).

storm.data$bgn_date <- as.Date(storm.data$bgn_date, "%m/%d/%Y")  # Convert bgn_date to Date

recent.data <- storm.data %>% 
    filter(bgn_date >= "1996-01-01") %>%  # Subset for events since 1996
    select(bgn_date, evtype, fatalities, injuries,  # Only select relevant variables
           propdmg, propdmgexp, cropdmg, cropdmgexp,
           refnum) %>%
    mutate(propdmg_value = ifelse(propdmgexp == "M", propdmg * 1000, propdmg),  # Property damage in k$ ("M" amounts x 1000)
           propdmg = NULL, propdmgexp = NULL) %>%  # Drop the original amount/magnitude pair
    mutate(cropdmg_value = ifelse(cropdmgexp == "M", cropdmg * 1000, cropdmg),  # Crop damage in k$ ("M" amounts x 1000)
           cropdmg = NULL, cropdmgexp = NULL)

recent.data <- recent.data[, c(1, 2, 3, 4, 6, 7, 5)]  # Re-order columns and make evtype a factor
recent.data$evtype <- as.factor(recent.data$evtype)

str(recent.data)
## 'data.frame':    653530 obs. of  7 variables:
##  $ bgn_date     : Date, format: "1996-01-06" "1996-01-11" ...
##  $ evtype       : Factor w/ 516 levels "   HIGH SURF ADVISORY",..: 507 426 434 434 434 142 177 434 434 434 ...
##  $ fatalities   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ injuries     : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ propdmg_value: num  380 100 3 5 2 NA 400 12 8 12 ...
##  $ cropdmg_value: num  38 NA NA NA NA NA NA NA NA NA ...
##  $ refnum       : num  248768 248769 248770 248771 248772 ...
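The conversion above only rescales amounts flagged "M"; any other magnitude code is left unscaled, and a missing code yields NA (as visible in the str() output). As a minimal sketch of a more explicit conversion, the hypothetical helper below (to_kdollars is not part of the original analysis) uses a lookup table. It assumes that a "B" code for billions may also occur in the data, which the text above does not state.

```r
# Hypothetical helper, not part of the original analysis: convert a damage
# amount and its magnitude code to thousands of dollars (k$).
to_kdollars <- function(amount, exp_code) {
    # Multipliers relative to k$. "B" (billions) is an assumption here;
    # the text above only documents K and M.
    multipliers <- c(K = 1, M = 1e3, B = 1e6)
    idx <- match(exp_code, names(multipliers))
    # Unknown or missing codes give NA instead of a silently unscaled amount
    amount * unname(multipliers[idx])
}

# Invented example values, for illustration only
res <- to_kdollars(c(380, 2.5, 1.2, 10), c("K", "M", "B", NA))
print(res)
```

Returning NA for unrecognised codes makes such records easy to count and inspect before deciding how to treat them.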

We now have a tidy and more compact data set. Using this, we can more easily manipulate the data to extract the required information.

1. Across the United States, which types of events are most harmful with respect to population health?

To answer this question, we group the data by event type and calculate sums of fatalities and injuries for each group. We then retain only the event types with at least 10 fatalities or 10 injuries over the observed period, nationally. This is required to make the plot more readable.

health.sub <- recent.data %>% group_by(evtype) %>%
    summarise(fatalities_total = sum(fatalities), 
              injuries_total = sum(injuries)) %>% 
    filter(fatalities_total >= 10 | injuries_total >= 10)
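Because the filter combines the two conditions with "or", an event type with few fatalities is still retained when it has at least 10 injuries, and vice versa. The base R sketch below illustrates this on a small invented data frame (toy, with made-up event types A, B and C); it mirrors the grouping and filtering logic but is not the code used for the analysis.

```r
# Toy data, invented for illustration only
toy <- data.frame(
    evtype     = c("A", "A", "B", "B", "C"),
    fatalities = c(8, 4, 1, 0, 2),
    injuries   = c(0, 3, 20, 5, 1)
)

# Sum fatalities and injuries per event type (base R equivalent of
# group_by() followed by summarise())
totals <- aggregate(cbind(fatalities, injuries) ~ evtype, data = toy, FUN = sum)

# Keep types with at least 10 fatalities OR at least 10 injuries
kept <- totals[totals$fatalities >= 10 | totals$injuries >= 10, ]
print(kept)
```

Type A is kept for its fatalities, type B for its injuries, and type C is dropped because it meets neither threshold.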

We can then use this data to build two plots, one for fatalities and one for injuries, with event types sorted in decreasing order of the number of fatalities and injuries respectively. See Figure 1 and Figure 2 in the Results section for the charts and corresponding code.

2. Across the United States, which types of events have the greatest economic consequences?

Again, we subset the recent.data data frame to sum up the damage estimates. This time, however, we add up property and crop damage into a single variable. Exploratory data analysis has shown that property damage is by far the largest contributor, so the total of the two variables is very strongly correlated with property damage.
To make the plot more readable, we only retain individual events that caused $1 million or more in damage (1,000 or more in k$) before summing by event type.

damage.sub <- recent.data %>%
    mutate(alldmg = cropdmg_value + propdmg_value) %>%
    filter(alldmg >= 1000 & !is.na(alldmg)) %>%
    group_by(evtype) %>%
    summarise(alldmg_total = sum(alldmg), 
              property_total = sum(propdmg_value), 
              crop_total = sum(cropdmg_value))
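One point worth keeping in mind with this pipeline: in R, adding NA to a number gives NA, so an event with a missing crop or property value gets a missing alldmg and is removed by the filter. The sketch below shows the effect on three invented events and compares it with an alternative that treats missing amounts as zero. That alternative is an assumption for illustration, not the approach used in this analysis.

```r
# Toy events in k$, invented for illustration only
toy <- data.frame(
    propdmg_value = c(5000, 200, NA),
    cropdmg_value = c(NA, 900, 1500)
)

# As in the pipeline above: NA in either column makes the sum NA
strict <- toy$cropdmg_value + toy$propdmg_value

# Alternative (an assumption): count a missing amount as zero
lenient <- rowSums(toy[, c("propdmg_value", "cropdmg_value")], na.rm = TRUE)

n_strict  <- sum(strict >= 1000 & !is.na(strict))  # events kept by the filter above
n_lenient <- sum(lenient >= 1000)                  # events kept when NA counts as 0
print(c(strict = n_strict, lenient = n_lenient))
```

Whether missing values should count as zero depends on how the source database records "no damage", so this choice would need to be checked against the data documentation.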

This data will be used to build the plot shown in Figure 3 in the Results section. The code used to build that plot is also provided there.


Results

1. Across the United States, which types of events are most harmful with respect to population health?

Using the processed data from the previous section, we plot fatalities for each type of weather event.

ggplot(health.sub, 
       aes(x= reorder(evtype, -fatalities_total), y = fatalities_total)) +
    geom_point(colour = "orange") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7)) +
    labs(x = "Event types") + labs(y = "Number of fatalities") + 
    labs(title = "Fatalities from weather events from Jan 1996 to Nov 2011") 

Fig.1 - Fatalities from weather events from Jan 1996 to Nov 2011. Event types with fewer than 10 fatalities and fewer than 10 injuries are not included in the chart.

Figure 1 shows that in terms of fatalities, heat, tornadoes, flash floods and lightning strikes have been the most harmful over the 1996-2011 period.

ggplot(health.sub, 
       aes(x= reorder(evtype, -injuries_total), y = injuries_total)) +
    geom_point(colour = "orange") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7)) +
    labs(x = "Event types") + labs(y = "Number of injuries") + 
    labs(title = "Injuries from weather events from Jan 1996 to Nov 2011") 

Fig.2 - Injuries from weather events from Jan 1996 to Nov 2011. Event types with fewer than 10 injuries and fewer than 10 fatalities are not included in the chart.

Figure 2 shows that in terms of injuries, tornadoes, floods, heat and lightning strikes have been the most harmful over the period.

In detail, the values for fatalities and injuries are:

print.data.frame(health.sub, row.names = FALSE)
##                    evtype fatalities_total injuries_total
##                 AVALANCHE              223            156
##                 BLACK ICE                1             24
##                  BLIZZARD               70            385
##                      COLD               15             12
##             COLD AND SNOW               14              0
##           COLD/WIND CHILL               95             12
##                 DENSE FOG                9            143
##            DRY MICROBURST                3             25
##                DUST DEVIL                2             38
##                DUST STORM               11            376
##            EXCESSIVE HEAT             1797           6391
##              EXTREME COLD              113             79
##   EXTREME COLD/WIND CHILL              125             24
##         EXTREME WINDCHILL               17              5
##               FLASH FLOOD              887           1674
##                     FLOOD              414           6758
##                       FOG               60            712
##          FREEZING DRIZZLE                2             13
##                     GLAZE                1            212
##                      HAIL                7            713
##                      HEAT              237           1222
##                 Heat Wave                0             70
##                HEAVY RAIN               94            230
##                HEAVY SNOW              107            698
##                HEAVY SURF                5             40
##      HEAVY SURF/HIGH SURF               42             48
##                 HIGH SURF               87            146
##                 HIGH WIND              235           1083
##                 HURRICANE               61             46
##         HURRICANE/TYPHOON               64           1275
##                 ICE STORM               82            318
##                 ICY ROADS                4             22
##                 LANDSLIDE               37             52
##                 LIGHTNING              651           4141
##        MARINE STRONG WIND               14             22
##  MARINE THUNDERSTORM WIND               10             26
##              MIXED PRECIP                2             26
##               RIP CURRENT              340            209
##              RIP CURRENTS              202            294
##                SMALL HAIL                0             10
##                      SNOW                2             10
##               SNOW SQUALL                2             35
##               STORM SURGE                2             37
##          STORM SURGE/TIDE               11              5
##               STRONG WIND              103            278
##              STRONG WINDS                6             21
##         THUNDERSTORM WIND              130           1400
##                   TORNADO             1511          20667
##            TROPICAL STORM               57            338
##                 TSTM WIND              241           3629
##            TSTM WIND/HAIL                5             95
##                   TSUNAMI               33            129
##         UNSEASONABLY WARM                0             17
##      URBAN/SML STREAM FLD               28             79
##          WILD/FOREST FIRE               12            545
##                  WILDFIRE               75            911
##                      WIND               18             84
##              WINTER STORM              191           1292
##            WINTER WEATHER               33            343
##        WINTER WEATHER MIX                0             68
##        WINTER WEATHER/MIX               28             72
##                WINTRY MIX                1             77
  • Total number of fatalities (01/1996-11/2011): 8,629
  • Total number of injuries (01/1996-11/2011): 57,862

2. Across the United States, which types of events have the greatest economic consequences?

Using the processed data from the previous section, we plot total damage estimates (property + crops) for each event type, retaining only events that caused at least $1 million in damage.

ggplot(damage.sub, 
       aes(x= reorder(evtype, -alldmg_total), y = alldmg_total)) +
    geom_point(colour = "orange") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7),
          axis.text.y = element_text(size = 7)) +
    labs(x = "Event types") + 
    labs(y = "Total damage estimates in k$") +
    labs(title = "Damage from weather events from Jan 1996 to Nov 2011") 

Fig.3 - Property and crop damage from weather events from Jan 1996 to Nov 2011. Events that caused less than $1 million in damage are not included in the chart.

Figure 3 shows that floods / flash floods and tornadoes together account for roughly half of the damage caused by weather events. Hail and hurricanes are the next largest contributors.

In detail, the values are:

print.data.frame(damage.sub, row.names = FALSE)
##                   evtype alldmg_total property_total crop_total
##                AVALANCHE       2100.0         2100.0        0.0
##                 BLIZZARD      35800.0        28800.0     7000.0
##            COASTAL FLOOD     161400.0       161400.0        0.0
##                  DROUGHT    1853527.0       232547.0  1620980.0
##               DUST STORM       2140.0          640.0     1500.0
##           EXCESSIVE HEAT     492570.0          170.0   492400.0
##             EXTREME COLD       4380.0         2230.0     2150.0
##  EXTREME COLD/WIND CHILL       6000.0         6000.0        0.0
##              FLASH FLOOD    6701189.8      5504008.8  1197181.0
##                    FLOOD   17231251.6     12517005.1  4714246.5
##             FREEZING FOG       2000.0         2000.0        0.0
##             Frost/Freeze       1100.0         1000.0      100.0
##             FROST/FREEZE     934710.0         8610.0   926100.0
##                     HAIL    6923998.5      5536155.0  1387843.5
##                     HEAT       1500.0         1500.0        0.0
##               HEAVY RAIN     297435.0       244425.0    53010.0
##     Heavy Rain/High Surf      15000.0        13500.0     1500.0
##               HEAVY SNOW     210600.0       143100.0    67500.0
##                HIGH SURF      81620.0        81620.0        0.0
##                HIGH WIND    2923578.1      2297980.0   625598.1
##                HURRICANE    6701024.7      4013714.7  2687310.0
##        HURRICANE/TYPHOON    3707282.5      2610011.8  1097270.8
##                ICE STORM     839965.0       824515.0    15450.0
##         LAKE-EFFECT SNOW      26000.0        26000.0        0.0
##          LAKESHORE FLOOD       7500.0         7500.0        0.0
##                LANDSLIDE     162814.0       142814.0    20000.0
##                LIGHTNING     121050.0       117650.0     3400.0
##         MARINE HIGH WIND       1000.0         1000.0        0.0
##           River Flooding     134010.0       105990.0    28020.0
##              STORM SURGE       2615.0         2610.0        5.0
##         STORM SURGE/TIDE     636550.0       635700.0      850.0
##              STRONG WIND     147750.0        84350.0    63400.0
##        THUNDERSTORM WIND    2879357.0      2539347.0   340010.0
##                  TORNADO   10395791.0     10166482.0   229309.0
##      TROPICAL DEPRESSION       1000.0         1000.0        0.0
##           TROPICAL STORM    1466135.4      1016695.4   449440.0
##                TSTM WIND     930751.6       514522.1   416229.5
##           TSTM WIND/HAIL      24329.0         2129.0    22200.0
##                  TSUNAMI     143320.0       143300.0       20.0
##                  TYPHOON      16050.0        15400.0      650.0
##     URBAN/SML STREAM FLD       4532.0         4290.0      242.0
##               WATERSPOUT       5000.0         5000.0        0.0
##         WILD/FOREST FIRE     145820.0        48070.0    97750.0
##                 WILDFIRE    2591708.0      2407926.0   183782.0
##             WINTER STORM     957140.0       949500.0     7640.0
##           WINTER WEATHER      23650.0         8650.0    15000.0
  • Total estimated damage (01/1996-11/2011, in k$): 69,954,045
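To put the largest figures in perspective, the short sketch below computes each event type's share of the total estimated damage. The values are copied from the table and the total reported above; the variable names (alldmg_total, grand_total, shares) are introduced here for illustration only.

```r
# Damage totals in k$, copied from the table above
alldmg_total <- c("FLOOD"       = 17231251.6,
                  "TORNADO"     = 10395791.0,
                  "HAIL"        = 6923998.5,
                  "FLASH FLOOD" = 6701189.8,
                  "HURRICANE"   = 6701024.7)

# Total estimated damage reported above, in k$
grand_total <- 69954045

# Share of the total for each event type, in percent
shares <- round(100 * alldmg_total / grand_total, 1)
print(shares)

# Combined share of floods, flash floods and tornadoes, in percent
combined <- sum(shares[c("FLOOD", "FLASH FLOOD", "TORNADO")])
print(combined)
```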

Conclusions and recommendations

From a health perspective, heat, floods, tornadoes and lightning are the major factors. It is interesting to note that floods and tornadoes are also the two largest contributors to material damage.

In light of these results, public policies should therefore aim at: