Severe Weather Events Exploratory Analysis

Project 2

Reproducible Research

Johns Hopkins Bloomberg School of Public Health @ Coursera

M.Giglia October 24, 2015

Public Health and Economic Impacts of Severe Weather Events

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities, including fatalities, injuries, and property damage. Preventing these and other negative outcomes from a severe weather event is a key concern for government agencies. Measured by average harm per event, severe weather events such as excessive heat, hurricanes, and rip currents are the most harmful to public health across the United States. In terms of economic impact (property damage plus crop damage), the most harmful severe weather events are all related to water, including hurricanes, storm surges, river floods, and tropical storms.

Data Processing

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities, including fatalities, injuries, and property damage. Preventing these and other negative outcomes from a severe weather event is a key concern for government agencies. The U.S. National Oceanic and Atmospheric Administration’s (NOAA) database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. In this exploratory analysis we will use the NOAA storm database to answer two specific questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

The data set from the NOAA storm database is comma delimited and may be downloaded from the following link:

Data set: Storm Data [47Mb]

Additional documentation about the NOAA database may also be found at the following links:

We’ll begin our analysis by downloading the NOAA Storm Data set directly and storing it in an initial R object we’ll call import, to differentiate it from any analytic data sets we’ll create later.

# download the file using a temporary connection
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
temp <- tempfile()
download.file(url=fileUrl, destfile = temp, method = 'curl')

# store the temporarily downloaded csv.bz2 file into the import dataframe
## use bzfile instead of unz to unzip a bz2 file, type ?unz for more info
## the second argument of bzfile is the open mode, not a file name, so only the path is passed
import <- read.csv(bzfile(temp), stringsAsFactors = FALSE, strip.white = TRUE)
unlink(temp)
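
As a quick check of how read.csv handles a bz2 connection, the round trip below writes and re-reads a tiny compressed CSV. This is only a sketch: the two-row data frame toy and the temporary file are made up for illustration and are not part of the NOAA data.

```r
# Toy round trip: write a small CSV through a bz2 connection, then read it back.
# The contents of `toy` are hypothetical and only illustrate the mechanics.
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)
toy.path <- tempfile(fileext = ".csv.bz2")

# open the connection for writing ("w" is the open mode)
con <- bzfile(toy.path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens an unopened connection itself and closes it when done
toy.back <- read.csv(bzfile(toy.path), stringsAsFactors = FALSE)
unlink(toy.path)

toy.back
```

The same pattern applies to the downloaded file: only the path is handed to bzfile, and read.csv takes care of opening and closing the connection.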

The data set import contains 902297 observations and 37 attributes. We don’t need all of them to answer our two questions about the impact of meteorological events on public health and the economy, so we’ll limit our focus to the following columns:

  • EVTYPE: The type of storm event (e.g. Blizzard, Dense Fog, Flood, Heat, ..)
  • FATALITIES: The number of fatalities as a result of the storm event (direct or indirect)
  • INJURIES: The number of injuries as a result of the storm event (direct or indirect)
  • PROPDMG: The total property damage rounded to three significant digits used in conjunction with PROPDMGEXP to determine the appropriate size multiplier.
  • PROPDMGEXP: A letter code indicating the magnitude of the PROPDMG dollar amount {“K”,“M”,“B”} for “thousands”, “millions” and “billions” respectively.
  • CROPDMG: The total crop damage rounded to three significant digits used in conjunction with CROPDMGEXP to determine the appropriate size multiplier.
  • CROPDMGEXP: A letter code indicating the magnitude of the CROPDMG dollar amount {“K”,“M”,“B”} for “thousands”, “millions” and “billions” respectively.
  • REFNUM: A unique reference number for the event.

We’ll select just these attributes plus LATITUDE and LONGITUDE in case we want to graph the storm data on a map later for fun. We’ll call this new data set main to differentiate it from our raw import data set.

# select only the attributes relevant to our analysis
main <- import[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP","LATITUDE","LONGITUDE","REFNUM")]

Now that we have our main set we’ll first want to view the data’s structure and its first few observations to determine if we need to make any changes, such as applying date formatting to dates or assigning certain string attributes as factors.

# view the structure of the dataframe
str(main)
## 'data.frame':    902297 obs. of  10 variables:
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
# view the first few observations
head(main)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0           
##   LATITUDE LONGITUDE REFNUM
## 1     3040      8812      1
## 2     3042      8755      2
## 3     3340      8742      3
## 4     3458      8626      4
## 5     3412      8642      5
## 6     3450      8748      6

EVTYPE, PROPDMGEXP, and CROPDMGEXP are shown as character variables, but these variables can only take certain values, so it’s appropriate to change each of them to a factor variable. Additionally, we know that PROPDMGEXP and CROPDMGEXP have a natural order, “K” < “M” < “B”, since the values are codes that represent thousands, millions, and billions of dollars respectively, so we’ll list the factor levels in that order. We can also see from the first few observations that CROPDMGEXP is empty when CROPDMG is zero. This makes sense, since a magnitude multiplier isn’t needed when the value it would multiply is zero. Inspecting more of the data set shows that the same is true for PROPDMGEXP when PROPDMG is zero. To remove these empty values we’ll set CROPDMGEXP or PROPDMGEXP to “K” whenever the respective dollar amount is zero. Our remaining numeric variables are fine as they are for now, so we’ll only change the attribute types for our character variables.

# Change EVTYPE to a factor variable
main$EVTYPE <- factor(main$EVTYPE)

# Set PROPDMGEXP and CROPDMGEXP to "K" wherever their respective dollar amount is zero
## Note that this doesn't affect the magnitude calculation since zero times 1,000 is still zero.
main$PROPDMGEXP[main$PROPDMG == 0] <- "K"
main$CROPDMGEXP[main$CROPDMG == 0] <- "K"

# Change PROPDMGEXP and CROPDMGEXP to factor variables with levels listed in K < M < B order
main$PROPDMGEXP <- factor(main$PROPDMGEXP, levels = c("K","M","B"))
main$CROPDMGEXP <- factor(main$CROPDMGEXP, levels = c("K","M","B"))

# View the new structure 
str(main)
## 'data.frame':    902297 obs. of  10 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 3 levels "K","M","B": 1 1 1 1 1 1 1 1 1 1 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 3 levels "K","M","B": 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Now that we’ve changed EVTYPE into a factor variable we see that it’s not as clean as we would like, with 985 levels! As it turns out, the EVTYPE variable contains some misspellings of proper event names, concatenations of several event types into a single event type, and in some cases a “Summary of …” entry for a particular day. Fully cleaning the data set is outside the scope of this particular analysis, and with 902297 observations these few bad records hopefully won’t affect our comparison. However, there are a few things we can do to bring the number of levels down. One is to ensure that all of the values use a single case format. Since the majority of “correct” EVTYPE values are uppercase, we’ll convert all values to uppercase, trim leading and trailing whitespace, and then apply factor again to see how many levels remain.

#load the plyr package to take advantage of the mutate function
library(plyr)

## upper case all of the EVTYPE and remove leading and trailing spaces
main <- mutate(main, EVTYPE.fix = toupper(EVTYPE))

# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
main$EVTYPE.fix <- trim(main$EVTYPE.fix)

# turn into a factor variable again
main$EVTYPE.fix <- factor(main$EVTYPE.fix)
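
To see why normalizing case and whitespace reduces the number of levels, here is a minimal sketch on a handful of made-up strings (raw.types is hypothetical, not taken from the data set). It uses base R’s trimws, available in R 3.2.0 and later, which does the same job as the trim helper defined above.

```r
# Hypothetical event labels that differ only in case or surrounding whitespace
raw.types <- c("   HIGH SURF ADVISORY", "High Surf Advisory",
               "TSTM WIND", "Tstm Wind ", "HAIL")

# uppercase first, then strip leading and trailing whitespace
clean.types <- trimws(toupper(raw.types))

# number of distinct levels before and after normalization
nlevels(factor(raw.types))
nlevels(factor(clean.types))
```

Normalization only merges labels that differ in case or padding. Misspellings and plural forms still produce separate levels.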

Creating a new EVTYPE.fix variable with all uppercase values brought our number of levels down by 95, but that is still too many for our analysis later. Since we’ll be doing a mean minus mean multiple comparison, we really shouldn’t include values of EVTYPE.fix that have only a small number of observed events, as the confidence intervals around their means would be too wide to be meaningful. We’ll make an executive decision at this point and only evaluate distinct EVTYPE.fix values that are represented in at least 0.005% of the original import set’s observations. This removes any record whose EVTYPE.fix appears in fewer than 45.11485 observations, which in practice means fewer than 46. This should have only a minimal effect on the analysis since we have so many observations, but for a more formal analysis we may wish to spend the considerable time needed to clean all of the data.

# calculate the frequency that each EVTYPE.fix appears in the main set
evtype.fix.tbl <- stack(table(main$EVTYPE.fix))

# isolate the values of EVTYPE.fix that appear in at least 0.005% of observations.
above.min.level <- unique(subset(evtype.fix.tbl, values >= nrow(import)*0.00005, "ind"))$ind

# subset the main data set such that it only contains storm events that meet the minimum number of occurrences.
main <- subset(main, EVTYPE.fix %in% above.min.level)

# reset the levels of EVTYPE.fix
main$EVTYPE.fix <- factor(as.character(main$EVTYPE.fix))

# check the number of levels in our fixed factored EVTYPE
str(main$EVTYPE.fix)
##  Factor w/ 92 levels "ASTRONOMICAL HIGH TIDE",..: 75 75 75 75 75 75 75 75 75 75 ...
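
The threshold arithmetic and the frequency filter can be checked in isolation. The sketch below uses the observation count reported earlier and a made-up factor toy.types with a made-up cutoff of 2. It uses droplevels as an alternative to re-creating the factor, which is an assumption about style rather than what the analysis above does.

```r
# Threshold used above: 0.005% of the imported observations
n.obs <- 902297
min.count <- n.obs * 0.00005   # 45.11485
ceiling(min.count)             # smallest whole count that passes the filter

# The same kind of filter on a hypothetical factor, with a made-up cutoff of 2
toy.types <- factor(c("HAIL", "HAIL", "HAIL", "TORNADO", "TORNADO", "WATERSPOUT"))
toy.counts <- table(toy.types)
keep <- names(toy.counts)[toy.counts >= 2]

# keep only the frequent levels and drop the unused ones
toy.kept <- droplevels(toy.types[toy.types %in% keep])
levels(toy.kept)
```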

We’ve now brought the number of levels down from 985 to 92, which is much more manageable. It’s still a large number and we’ve likely not cleaned it up completely, but this is a good start for our analysis of variance and mean minus mean multiple comparisons. Next we need to make sure that all of the dollar amounts are represented in the same units, so we’ll create two new variables that take the values of PROPDMG and CROPDMG and apply their magnitudes from PROPDMGEXP and CROPDMGEXP respectively. Additionally, we’ll create a full economic impact variable by taking the sum of our new property and crop damage variables.

## create new variables for the dollar amounts that may be used for apples to apples comparisons.
main <- mutate(main,
               PROPDMG.fix = ifelse(PROPDMGEXP == "B", PROPDMG * 10^9,
                             ifelse(PROPDMGEXP == "M", PROPDMG * 10^6, PROPDMG * 10^3)),
               CROPDMG.fix = ifelse(CROPDMGEXP == "B", CROPDMG * 10^9,
                             ifelse(CROPDMGEXP == "M", CROPDMG * 10^6, CROPDMG * 10^3)),
               econ.impact = PROPDMG.fix + CROPDMG.fix
)

### view the first few observations of our main data set
head(main)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0          K
## 2 TORNADO          0        0     2.5          K       0          K
## 3 TORNADO          0        2    25.0          K       0          K
## 4 TORNADO          0        2     2.5          K       0          K
## 5 TORNADO          0        2     2.5          K       0          K
## 6 TORNADO          0        6     2.5          K       0          K
##   LATITUDE LONGITUDE REFNUM EVTYPE.fix PROPDMG.fix CROPDMG.fix econ.impact
## 1     3040      8812      1    TORNADO       25000           0       25000
## 2     3042      8755      2    TORNADO        2500           0        2500
## 3     3340      8742      3    TORNADO       25000           0       25000
## 4     3458      8626      4    TORNADO        2500           0        2500
## 5     3412      8642      5    TORNADO        2500           0        2500
## 6     3450      8748      6    TORNADO        2500           0        2500
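
The nested ifelse calls can also be expressed as a lookup against a named vector. This is a sketch of an alternative, not the method used in this analysis. The names exp.multiplier and toy are hypothetical, and the toy rows are made up.

```r
# Named lookup vector: documented magnitude codes and their multipliers
exp.multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical damage amounts and magnitude codes
toy <- data.frame(DMG = c(25, 2.5, 1.2, 0),
                  EXP = factor(c("K", "M", "B", "K"), levels = c("K", "M", "B")))

# index by name; as.character() matters because indexing by a factor
# would use its integer codes rather than its labels
toy$DMG.fix <- toy$DMG * unname(exp.multiplier[as.character(toy$EXP)])
toy
```

A code that is missing or not in the lookup yields NA, which matches how the ifelse version behaves when the magnitude code is NA.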

We now have a data set that contains variables we can use to compare both the public health and the economic impact of storm events and determine if any storm types are more harmful than others across the United States. But before we start our analysis, let’s quickly look at a summary of the data to determine if we have any missing values and to see the ranges of our continuous variables.

summary(main)
##                EVTYPE         FATALITIES          INJURIES        
##  HAIL             :288661   Min.   :  0.0000   Min.   :   0.0000  
##  TSTM WIND        :219940   1st Qu.:  0.0000   1st Qu.:   0.0000  
##  THUNDERSTORM WIND: 82563   Median :  0.0000   Median :   0.0000  
##  TORNADO          : 60652   Mean   :  0.0164   Mean   :   0.1548  
##  FLASH FLOOD      : 54277   3rd Qu.:  0.0000   3rd Qu.:   0.0000  
##  FLOOD            : 25326   Max.   :583.0000   Max.   :1700.0000  
##  (Other)          :167930                                         
##     PROPDMG        PROPDMGEXP       CROPDMG       CROPDMGEXP   
##  Min.   :   0.00   K   :887787   Min.   :  0.00   K   :897433  
##  1st Qu.:   0.00   M   : 11202   1st Qu.:  0.00   M   :  1874  
##  Median :   0.00   B   :    34   Median :  0.00   B   :     7  
##  Mean   :  12.04   NA's:   326   Mean   :  1.52   NA's:    35  
##  3rd Qu.:   0.50                 3rd Qu.:  0.00                
##  Max.   :5000.00                 Max.   :990.00                
##                                                                
##     LATITUDE      LONGITUDE          REFNUM      
##  Min.   :   0   Min.   :-14451   Min.   :     1  
##  1st Qu.:2830   1st Qu.:  7313   1st Qu.:225854  
##  Median :3542   Median :  8713   Median :452452  
##  Mean   :2884   Mean   :  6962   Mean   :451679  
##  3rd Qu.:4020   3rd Qu.:  9607   3rd Qu.:677407  
##  Max.   :9706   Max.   : 17124   Max.   :902297  
##  NA's   :47                                      
##              EVTYPE.fix      PROPDMG.fix         CROPDMG.fix       
##  HAIL             :288661   Min.   :0.000e+00   Min.   :0.000e+00  
##  TSTM WIND        :219946   1st Qu.:0.000e+00   1st Qu.:0.000e+00  
##  THUNDERSTORM WIND: 82564   Median :0.000e+00   Median :0.000e+00  
##  TORNADO          : 60652   Mean   :4.624e+05   Mean   :5.338e+04  
##  FLASH FLOOD      : 54278   3rd Qu.:5.000e+02   3rd Qu.:0.000e+00  
##  FLOOD            : 25327   Max.   :1.150e+11   Max.   :5.000e+09  
##  (Other)          :167921   NA's   :326         NA's   :35         
##   econ.impact       
##  Min.   :0.000e+00  
##  1st Qu.:0.000e+00  
##  Median :0.000e+00  
##  Mean   :5.158e+05  
##  3rd Qu.:1.000e+03  
##  Max.   :1.150e+11  
##  NA's   :361

Interestingly, we still have some missing values (NA) for PROPDMGEXP and CROPDMGEXP even after accounting for storm events with zero dollar amounts. These missing values carried through into our magnitude-adjusted dollar amount variables PROPDMG.fix, CROPDMG.fix, and econ.impact, since the nested ifelse calls in the mutate function return NA when PROPDMGEXP or CROPDMGEXP is missing. Note that we picked up the missing values when we created factor variables that only take the three values “K”, “M”, and “B” defined in the data dictionary: any other value was converted to NA. Let’s look at the original import set again to see what else was present in PROPDMGEXP and CROPDMGEXP besides the values we were looking for:

# view the unique values found in PROPDMGEXP
unique(import$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
# view the unique values found in CROPDMGEXP
unique(import$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"
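
The effect of restricting a factor to the three documented codes can be reproduced on a small vector. The sketch below uses a made-up vector toy.exp containing some of the codes listed above. The uppercase variant at the end is only an option one could consider and is not applied in this analysis.

```r
# Hypothetical sample of magnitude codes, including some undocumented ones
toy.exp <- c("K", "M", "", "B", "m", "+", "0", "k", "K")

# frequency of each code
table(toy.exp)

# any value outside the declared levels becomes NA
toy.factor <- factor(toy.exp, levels = c("K", "M", "B"))
sum(is.na(toy.factor))

# optional variant (an assumption, not part of this analysis):
# uppercasing first would recover the lowercase "m" and "k"
toy.factor.upper <- factor(toupper(toy.exp), levels = c("K", "M", "B"))
sum(is.na(toy.factor.upper))
```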

Looking at the unique values we see that there is some bad data, including numbers, signs, and lowercase letters. Since only a very small percentage of the original data set is affected by missing values, let’s remove any records that contain one by using the complete.cases function. Note that this also drops the 47 records with a missing LATITUDE.

main <- main[complete.cases(main),]

Our final main data set now has 898941 observations, only 0.37% fewer than our originally imported data set, and it contains only observations with complete, non-missing values. We’re now ready to begin our analysis using an ANOVA and eventually a mean minus mean comparison.
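
The 0.37% figure follows directly from the two observation counts reported in this section. The short sketch below redoes that arithmetic and shows complete.cases on a made-up three-row data frame (toy is hypothetical).

```r
# Share of imported observations that were removed, using the counts reported above
n.import <- 902297
n.final <- 898941
pct.removed <- round(100 * (n.import - n.final) / n.import, 2)
pct.removed

# complete.cases keeps only rows with no NA in any column
toy <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA), stringsAsFactors = FALSE)
toy.complete <- toy[complete.cases(toy), ]
toy.complete
```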


Results

Event Type and Fatalities

We’ll begin our first event type comparison by looking at how harmful a storm event is to public health in terms of fatalities and we’ll run an analysis of variance test to see if there is any relationship between EVTYPE.fix and FATALITIES.

# set up the ANOVA to see if there is any difference in mean fatalities by EVTYPE
fatal.aov <- aov(FATALITIES ~ EVTYPE.fix, main)

# view the summary for the ANOVA model
summary(fatal.aov)
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## EVTYPE.fix      91   4837   53.16   92.54 <2e-16 ***
## Residuals   898849 516333    0.57                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

With a p-value less than 0.05 we can reject the null hypothesis that all of the storm event types have the same mean number of fatalities. There are still too many levels in our EVTYPE.fix variable to do a full mean minus mean multiple comparison, so we’ll use a hierarchical cluster analysis to group storm event types by their means.
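
To give a sense of scale, a full pairwise comparison of 92 levels involves several thousand pairs. The sketch below counts them and runs a Tukey HSD comparison, one common form of mean minus mean comparison, on simulated data with only three groups. The data frame toy is simulated with made-up Poisson rates and has no connection to the NOAA records. Using TukeyHSD here is an assumption about which comparison method is meant.

```r
# number of pairwise comparisons among 92 event types
choose(92, 2)

# simulated fatalities for three hypothetical event types
set.seed(1)
toy <- data.frame(
  type = factor(rep(c("HAIL", "HEAT", "RIP CURRENT"), each = 30)),
  fatalities = c(rpois(30, 0.1), rpois(30, 2), rpois(30, 1))
)

# one-way ANOVA followed by Tukey's honest significant differences
toy.aov <- aov(fatalities ~ type, data = toy)
toy.hsd <- TukeyHSD(toy.aov, "type")

# one row per pair: difference in means, interval bounds, adjusted p-value
toy.hsd$type
```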

# load the reshape2 package to take advantage of the melt and dcast functions
library(reshape2)

# melt the main data set
main.melt <- melt(main, id.vars = c("REFNUM", "EVTYPE.fix", "LATITUDE", "LONGITUDE"), measure.vars = c("FATALITIES", "INJURIES", "econ.impact"))

# recast to show the means of measure variables by EVTYPE.fix
EVTYPE.means <- dcast(main.melt, EVTYPE.fix ~ variable, mean)

# create a variable that we may reference for the EVTYPE.fix factor variable's numeric representation.
EVTYPE.means$EVTYPE.fix.num <- as.numeric(EVTYPE.means$EVTYPE.fix)

#view the EVTYPE means table
EVTYPE.means
##                  EVTYPE.fix   FATALITIES     INJURIES  econ.impact
## 1    ASTRONOMICAL HIGH TIDE 0.000000e+00 0.000000e+00 9.150485e+04
## 2     ASTRONOMICAL LOW TIDE 0.000000e+00 0.000000e+00 1.839080e+03
## 3                 AVALANCHE 5.803109e-01 4.404145e-01 9.641969e+03
## 4                  BLIZZARD 3.714601e-02 2.960647e-01 2.836609e+05
## 5             COASTAL FLOOD 4.566210e-03 3.044140e-03 3.950846e+05
## 6          COASTAL FLOODING 1.639344e-02 0.000000e+00 7.268934e+05
## 7                      COLD 4.634146e-01 5.853659e-01 6.756098e+03
## 8           COLD/WIND CHILL 1.762523e-01 2.226345e-02 4.805195e+03
## 9                 DENSE FOG 1.392111e-02 2.645012e-01 7.481825e+03
## 10                  DROUGHT 0.000000e+00 1.607717e-03 6.036444e+06
## 11           DRY MICROBURST 1.612903e-02 1.505376e-01 3.627742e+04
## 12               DUST DEVIL 1.342282e-02 2.885906e-01 4.823020e+03
## 13               DUST STORM 5.152225e-02 1.030445e+00 2.025527e+04
## 14           EXCESSIVE HEAT 1.134088e+00 3.888558e+00 2.980666e+05
## 15             EXTREME COLD 2.465753e-01 3.515982e-01 2.101538e+06
## 16  EXTREME COLD/WIND CHILL 1.247505e-01 2.395210e-02 8.680639e+03
## 17        EXTREME WINDCHILL 8.333333e-02 2.450980e-02 8.703431e+04
## 18              FLASH FLOOD 1.803430e-02 3.276784e-02 3.238259e+05
## 19           FLASH FLOODING 2.794118e-02 1.176471e-02 4.748054e+05
## 20                    FLOOD 1.855874e-02 2.680750e-01 5.935624e+06
## 21        FLOOD/FLASH FLOOD 2.733119e-02 2.411576e-02 4.325932e+05
## 22                 FLOODING 5.128205e-02 1.709402e-02 1.000944e+06
## 23                      FOG 1.152416e-01 1.364312e+00 2.445260e+04
## 24                   FREEZE 1.315789e-02 0.000000e+00 6.012237e+06
## 25             FREEZING FOG 0.000000e+00 0.000000e+00 4.743478e+04
## 26            FREEZING RAIN 2.692308e-02 8.846154e-02 3.133269e+04
## 27                    FROST 1.754386e-02 5.263158e-02 1.158158e+06
## 28             FROST/FREEZE 0.000000e+00 0.000000e+00 8.225361e+05
## 29                   FUNNEL 0.000000e+00 0.000000e+00 0.000000e+00
## 30             FUNNEL CLOUD 0.000000e+00 4.384042e-04 2.843782e+01
## 31            FUNNEL CLOUDS 0.000000e+00 0.000000e+00 0.000000e+00
## 32              GUSTY WINDS 6.153846e-02 1.692308e-01 2.278462e+04
## 33                     HAIL 5.197091e-05 4.705100e-03 6.488637e+04
## 34                     HEAT 1.221643e+00 2.737940e+00 5.257608e+05
## 35                HEAT WAVE 2.293333e+00 5.053333e+00 2.134673e+05
## 36               HEAVY RAIN 8.366058e-03 2.142735e-02 1.218754e+05
## 37               HEAVY SNOW 7.958235e-03 6.500286e-02 6.794692e+04
## 38               HEAVY SURF 9.195402e-02 4.597701e-01 1.655172e+04
## 39     HEAVY SURF/HIGH SURF 1.842105e-01 2.105263e-01 4.328947e+04
## 40                HIGH SURF 1.416894e-01 2.125341e-01 1.225545e+05
## 41                HIGH WIND 1.217099e-02 5.625371e-02 2.923322e+05
## 42               HIGH WINDS 2.289078e-02 1.962067e-01 4.244894e+05
## 43                HURRICANE 3.505747e-01 2.643678e-01 8.396683e+07
## 44        HURRICANE/TYPHOON 7.272727e-01 1.448864e+01 8.172013e+08
## 45                      ICE 9.836066e-02 2.245902e+00 2.074590e+05
## 46                ICE STORM 4.438903e-02 9.850374e-01 4.472338e+06
## 47         LAKE-EFFECT SNOW 0.000000e+00 0.000000e+00 6.307390e+04
## 48                LANDSLIDE 6.333333e-02 8.666667e-02 5.743550e+05
## 49               LIGHT SNOW 5.681818e-03 1.136364e-02 1.427841e+04
## 50                LIGHTNING 5.186221e-02 3.322741e-01 5.978336e+04
## 51              MARINE HAIL 0.000000e+00 0.000000e+00 9.049774e+00
## 52         MARINE HIGH WIND 7.407407e-03 7.407407e-03 9.607481e+03
## 53       MARINE STRONG WIND 2.916667e-01 4.583333e-01 8.715208e+03
## 54 MARINE THUNDERSTORM WIND 1.720578e-03 4.473503e-03 8.368892e+01
## 55         MARINE TSTM WIND 1.457490e-03 1.295547e-03 8.778947e+02
## 56        MODERATE SNOWFALL 0.000000e+00 0.000000e+00 0.000000e+00
## 57                    OTHER 0.000000e+00 7.692308e-02 2.095962e+04
## 58              RECORD COLD 1.492537e-02 0.000000e+00 8.358209e+05
## 59              RECORD HEAT 2.439024e-02 6.097561e-01 0.000000e+00
## 60            RECORD WARMTH 0.000000e+00 0.000000e+00 0.000000e+00
## 61              RIP CURRENT 7.829787e-01 4.936170e-01 2.127660e+00
## 62             RIP CURRENTS 6.710526e-01 9.769737e-01 5.328947e+02
## 63              RIVER FLOOD 1.156069e-02 1.156069e-02 5.866130e+07
## 64                    SLEET 3.389831e-02 0.000000e+00 0.000000e+00
## 65               SMALL HAIL 0.000000e+00 1.886792e-01 3.936415e+05
## 66                     SNOW 8.103728e-03 5.024311e-02 2.404789e+04
## 67              STORM SURGE 4.980843e-02 1.455939e-01 1.659906e+08
## 68         STORM SURGE/TIDE 7.432432e-02 3.378378e-02 3.136512e+07
## 69              STRONG WIND 2.885962e-02 7.845335e-02 6.730539e+04
## 70             STRONG WINDS 3.921569e-02 1.029412e-01 3.634701e+04
## 71        THUNDERSTORM WIND 1.610950e-03 1.802326e-02 4.721371e+04
## 72       THUNDERSTORM WINDS 3.104084e-03 4.399069e-02 9.304564e+04
## 73  THUNDERSTORM WINDS HAIL 0.000000e+00 0.000000e+00 1.273684e+04
## 74      THUNDERSTORM WINDSS 0.000000e+00 8.000000e-02 3.790300e+04
## 75                  TORNADO 9.285220e-02 1.505525e+00 9.448567e+05
## 76      TROPICAL DEPRESSION 0.000000e+00 0.000000e+00 2.895000e+04
## 77           TROPICAL STORM 8.405797e-02 4.927536e-01 1.214817e+07
## 78                TSTM WIND 2.291482e-03 3.163063e-02 2.294694e+04
## 79           TSTM WIND/HAIL 4.863813e-03 9.241245e-02 1.060620e+05
## 80         UNSEASONABLY DRY 0.000000e+00 0.000000e+00 0.000000e+00
## 81        UNSEASONABLY WARM 8.730159e-02 1.349206e-01 7.936508e+01
## 82              URBAN FLOOD 0.000000e+00 0.000000e+00 7.653984e+04
## 83           URBAN FLOODING 0.000000e+00 0.000000e+00 6.032424e+04
## 84     URBAN/SML STREAM FLD 8.254717e-03 2.329009e-02 1.969273e+04
## 85               WATERSPOUT 7.900974e-04 7.637609e-03 2.463445e+03
## 86         WILD/FOREST FIRE 8.236102e-03 3.740563e-01 2.133580e+06
## 87                 WILDFIRE 2.716407e-02 3.299529e-01 1.832882e+06
## 88                     WIND 6.628242e-02 2.478386e-01 2.592075e+04
## 89             WINTER STORM 1.801959e-02 1.155528e-01 5.874249e+05
## 90           WINTER WEATHER 4.684173e-03 5.649397e-02 5.090987e+03
## 91       WINTER WEATHER/MIX 2.536232e-02 6.521739e-02 5.771739e+03
## 92               WINTRY MIX 1.063830e-02 8.191489e-01 1.329787e+02
##    EVTYPE.fix.num
## 1               1
## 2               2
## 3               3
## 4               4
## 5               5
## 6               6
## 7               7
## 8               8
## 9               9
## 10             10
## 11             11
## 12             12
## 13             13
## 14             14
## 15             15
## 16             16
## 17             17
## 18             18
## 19             19
## 20             20
## 21             21
## 22             22
## 23             23
## 24             24
## 25             25
## 26             26
## 27             27
## 28             28
## 29             29
## 30             30
## 31             31
## 32             32
## 33             33
## 34             34
## 35             35
## 36             36
## 37             37
## 38             38
## 39             39
## 40             40
## 41             41
## 42             42
## 43             43
## 44             44
## 45             45
## 46             46
## 47             47
## 48             48
## 49             49
## 50             50
## 51             51
## 52             52
## 53             53
## 54             54
## 55             55
## 56             56
## 57             57
## 58             58
## 59             59
## 60             60
## 61             61
## 62             62
## 63             63
## 64             64
## 65             65
## 66             66
## 67             67
## 68             68
## 69             69
## 70             70
## 71             71
## 72             72
## 73             73
## 74             74
## 75             75
## 76             76
## 77             77
## 78             78
## 79             79
## 80             80
## 81             81
## 82             82
## 83             83
## 84             84
## 85             85
## 86             86
## 87             87
## 88             88
## 89             89
## 90             90
## 91             91
## 92             92
# perform a hierarchical cluster on the mean of the fatalities.
plot(hclust(dist(EVTYPE.means$FATALITIES)))

From the Cluster Dendrogram for the mean fatalities we can see three distinct clusters at a height of 1.0. The first cluster contains only the EVTYPE.fix.num “35”, which is the storm event “Heat Wave” with a mean of about 2.3 fatalities per event. On the far right of the dendrogram we see the set of “14”, “34”, “44”, “61”, “3”, and “62”, which correspond to “Excessive Heat”, “Heat”, “Hurricane/Typhoon”, “Rip Current”, “Avalanche”, and “Rip Currents” respectively. Each of these has a mean number of FATALITIES greater than 0.5 and, not surprisingly, several of them are related to heat, while “rip currents” appears twice (once plural and once not). The third cluster contains the remaining EVTYPE values, many of them very close to zero and some exactly zero. Therefore we can see that storm events related to heat, “hurricane/typhoons”, “avalanches”, and “rip currents” are the most deadly storm events.
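
Cluster membership at a given height can also be read off programmatically with cutree instead of from the plot. The sketch below is an illustration on a subset: toy.means holds ten of the mean fatality values copied from the EVTYPE.means table, so the tree is not identical to the one plotted from all 92 event types.

```r
# Ten mean fatality values copied from the EVTYPE.means table above
toy.means <- c("HEAT WAVE" = 2.293333, "HEAT" = 1.221643, "EXCESSIVE HEAT" = 1.134088,
               "RIP CURRENT" = 0.7829787, "HURRICANE/TYPHOON" = 0.7272727,
               "RIP CURRENTS" = 0.6710526, "AVALANCHE" = 0.5803109,
               "TORNADO" = 0.0928522, "FLOOD" = 0.01855874, "HAIL" = 0.00005197091)

# same clustering call as above (hclust defaults to complete linkage)
toy.hc <- hclust(dist(toy.means))

# cut the tree at height 1.0 and list the event types in each cluster
groups <- unname(cutree(toy.hc, h = 1.0))
split(names(toy.means), groups)
```

On this subset the cut yields three groups: “HEAT WAVE” on its own, the six event types with mean fatalities above 0.5, and the three near-zero types.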


Event Type and Injuries

We now direct our attention to injuries, and luckily we’ve already done all of the legwork, since our recast EVTYPE.means table already holds the mean number of INJURIES. We only need to look at a cluster dendrogram for the INJURIES to see if any EVTYPE values stand out from the rest.

# perform a hierarchical cluster on the mean of the injuries.
plot(hclust(dist(EVTYPE.means$INJURIES)))

This time “44” stands out from the rest, and we already know that this one corresponds to “Hurricane/Typhoon.” Checking the table we see that this corresponds to approximately 14.5 injuries on average. At a height around 2 we can also break off “14”, “35”, “34”, and “45” from all the rest; not surprisingly, three of these also appeared when we looked at fatalities. These are “Excessive Heat”, “Heat Wave”, “Heat”, and a new one, “Ice”. Each of these has more than 2 injuries on average.

Combining the analysis from FATALITIES and INJURIES we see that, in terms of population health, the most harmful events are related to extreme temperatures such as “Heat” and “Ice”, and extreme storms such as a “Hurricane/Typhoon”. “Rip Currents” might not cause many INJURIES but they are likely to cause deaths.
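
A sorted view of the means is a simple cross-check on what the dendrogram suggests. The sketch below sorts a small hypothetical data frame, toy.means, whose seven rows are copied from the INJURIES column of the EVTYPE.means table. On the full table the same order call would apply to EVTYPE.means directly.

```r
# Seven mean injury values copied from the EVTYPE.means table above
toy.means <- data.frame(
  EVTYPE.fix = c("HAIL", "HEAT", "TORNADO", "HURRICANE/TYPHOON",
                 "ICE", "EXCESSIVE HEAT", "HEAT WAVE"),
  INJURIES = c(0.0047051, 2.737940, 1.505525, 14.48864,
               2.245902, 3.888558, 5.053333),
  stringsAsFactors = FALSE
)

# sort from most to fewest mean injuries and show the top five
top.injuries <- toy.means[order(toy.means$INJURIES, decreasing = TRUE), ]
head(top.injuries, 5)
```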


Event Type and Economic Impact

We’ll now look at the economic impact of the storm events in much the same way, using a cluster dendrogram and our EVTYPE.means table.

# perform a hierarchical cluster on the mean of the economic impact.
plot(hclust(dist(EVTYPE.means$econ.impact)))

From this cluster dendrogram it looks like we can split off “44” as the first cluster, which is “Hurricane/Typhoon” with an average total economic impact of over $800,000,000. “67” appears next as its own cluster, and this is “Storm Surge” with an average economic impact of over $160,000,000. Two more clusters appear that look like they may be close together, and these include “43”, “63”, “68”, and “77”, which correspond to “Hurricane”, “River Flood”, “Storm Surge/Tide”, and “Tropical Storm”. Each of these has an average economic impact of over $10,000,000. Not surprisingly, all of these are related to excessive water damage.
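
The table prints the means in scientific notation, which makes them harder to compare with dollar figures like the ones quoted here. The sketch below formats three values copied from the econ.impact column as whole dollars. The vector name mean.impact is hypothetical.

```r
# Three mean economic impact values copied from the EVTYPE.means table above
mean.impact <- c("HURRICANE/TYPHOON" = 8.172013e+08,
                 "STORM SURGE" = 1.659906e+08,
                 "TROPICAL STORM" = 1.214817e+07)

# fixed notation, no decimals, comma as thousands separator
impact.dollars <- paste0("$", formatC(mean.impact, format = "f",
                                      digits = 0, big.mark = ","))
names(impact.dollars) <- names(mean.impact)
impact.dollars
```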