Synopsis:

The National Climactic Data Center (NCDC) regularly receives storm data from the National Weather service. It should be possible, using this data, to determine which types of events have the greatest effect on the physical health of the human population and the greatest damage potential with respect to property and crops. Because there are many different values for “Event Type” and many of them are similar (misspellings, capitalizaion differences, similar words/terms, etc) it is first necessary to group the event types into larger categories. As an example, “WINTRY MIX” is the same as “wintry/mix”. The economic effects are measured at different scales depending on the row of data, so the second step in preprocessing is to convert these numbers to the same scale so that they may be compared. To get usable numbers for comparison we then group the data by event type and calculate the total, median, and mean (average) effects of fatalities, injuries, property damage, and crop damage for each event type. Separate code shows that the length of time events have been recorded varies greatly depending on the event, so the mean (average) effect is preferred to total (absolute) effect in order to determine the events that are the most physically harmful or have the greatest economic consequences.

Data Processing

First we need to download the data and load it into R.

Unfortunately I had trouble with making the download/unzip code work, so the actual download of the file had to be done manually. We can run a simple command to check that the correct data is present and loaded.

stormDataFileName <- "repdata_data_StormData.csv.bz2"
file.info(stormDataFileName)$size
## [1] 49177144

That’s what we expected.

a <- Sys.time()
stormData <- read.csv(file = stormDataFileName, as.is = TRUE)
b <- Sys.time()
print(paste("Reading the file from", a, "to", b, "."))
## [1] "Reading the file from 2019-03-17 16:42:34 to 2019-03-17 16:43:15 ."

The following libraries are necessary for this particular analysis.

        library(forcats)
        library(dplyr)
        library(lubridate)
        library(knitr)

Check some basic information about the data file.

dim(stormData)
## [1] 902297     37

90229x37 - That’s correct

head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

That looks good.

length(stormData$EVTYPE)
## [1] 902297
length(toupper(stormData$EVTYPE))
## [1] 902297

902297 and 902297. That’s what we wanted to see.

As noted in the synopsis it is the case that there are a lot of similar names in EVTYPE that need to be combined. This has to be somewhat arbitrary, so my goal was at least to group all of the event types used only once into larger groups with other similar event types. Note that hurricanes, tropical Storms, tropical cyclones, and typhoons are all roughly the same thing.

stormData$EVTYPE2 <- toupper(stormData$EVTYPE)
table(unique(stormData$EVTYPE2))

Using the defined weather types in the NATIONAL WEATHER SERVICE INSTRUCTION 10-1605 document of 17 August 2017 I coalesce the event type cateogries into a more manageable set.

EVTYPE3 <-
        fct_collapse(
                stormData$EVTYPE2,
                ASTRONOMICAL_LOW_TIDE = grep("AST.*LO.*TI.*", stormData$EVTYPE2, value = TRUE),
                AVALANCHE = grep("AVAL", stormData$EVTYPE2, value = TRUE),
                BLIZZARD = grep("BLIZ", stormData$EVTYPE2, value = TRUE),
                COASTAL_FLOOD = grep("COA.*FLO.*", stormData$EVTYPE2, value = TRUE),
                COLD_WIND_CHILL = grep("COL.*WIN.*CHI", stormData$EVTYPE2, value = TRUE),
                DEBRIS_FLOW = grep("DEB.*FL.*", stormData$EVTYPE2, value = TRUE),
                DENSE_FOG = grep("DEN.*FOG", stormData$EVTYPE2, value = TRUE),
                DENSE_SMOKE = grep("DEN.*SMO", stormData$EVTYPE2, value = TRUE),
                DROUGHT = grep("DROU.*|DRY", stormData$EVTYPE2, value = TRUE),
                DUST_DEVIL = grep("DUS.*DEV", stormData$EVTYPE2, value = TRUE),
                DUST_STORM = grep("DUS.*STO", stormData$EVTYPE2, value = TRUE),
                EXCESSIVE_HEAT = grep("EXC.*HEA|[UN.*|REC.*]WARM", stormData$EVTYPE2, value = TRUE),
                EXTREME_COLD_WIND_CHILL = grep("EXTR.*COL|WIN.*CHI|UN.*COL", stormData$EVTYPE2, value = TRUE),
                FLASH_FLOOD = grep("FLA.*FLO", stormData$EVTYPE2, value = TRUE),
                FLOOD = grep("[^FLOOD$]|FLDG|FLOODS|FLOOD$", stormData$EVTYPE2, value = TRUE),
                FROST_FREEZE = grep("FROST|FREEZE", stormData$EVTYPE2, value = TRUE),
                FUNNEL_CLOUD = grep("FUN.*CLO", stormData$EVTYPE2, value = TRUE),
                FREEZING_FOG = grep("FREEZ.*FOG|GLAZ", stormData$EVTYPE2, value = TRUE),
                HAIL = grep("HAIL", stormData$EVTYPE2, value = TRUE),
                HEAVY_RAIN = grep("HEAV.*RAI", stormData$EVTYPE2, value = TRUE),
                HEAVY_SNOW = grep("HEAV.*SNO", stormData$EVTYPE2, value = TRUE),
                HIGH_SURF = grep("HIG.*SURF", stormData$EVTYPE2, value = TRUE),
                HIGH_WIND = grep("HIG.*WIN", stormData$EVTYPE2, value = TRUE),
                HURRICANE = grep("HURR|TYPH", stormData$EVTYPE2, value = TRUE),
                ICE_STORM = grep("ICE.*STORM", stormData$EVTYPE2, value = TRUE),
                LAKE_EFFECT_SNOW = grep("LAKE.*EFF.*SNO", stormData$EVTYPE2, value = TRUE),
                LAKESHORE_FLOOD = grep("LAKESH.*FLO", stormData$EVTYPE2, value = TRUE),
                LIGHTNING = grep("LIGHT[N|I]", stormData$EVTYPE2, value = TRUE),
                MARINE_HAIL = grep("MAR.*HAIL", stormData$EVTYPE2, value = TRUE),
                MARINE_HIGH_WIND = grep("MAR.*HIGH.*WIND", stormData$EVTYPE2, value = TRUE),
                MARINE_STRONG_WIND = grep("MAR.*STR.*WIND", stormData$EVTYPE2, value = TRUE),
                MARINE_HIGH_WIND = grep("MAR.*[THUND|TSTM].*WIND", stormData$EVTYPE2, value = TRUE),
                RIP_CURRENT = grep("^RIP$", stormData$EVTYPE2, value = TRUE),
                SEICHE = grep("SEIC", stormData$EVTYPE2, value = TRUE),
                SLEET = grep("SLEET", stormData$EVTYPE2, value = TRUE),
                STORM_SURGE_TIDE = grep("STORM.*SURGE.*?[TIDE]", stormData$EVTYPE2, value = TRUE),
                STRONG_WIND = grep("STR.*WIND", stormData$EVTYPE2, value = TRUE),
                THUNDERSTORM_WIND = grep("[THUND|TSTM].*WIND", stormData$EVTYPE2, value = TRUE),
                TORNADO = grep("TORN", stormData$EVTYPE2, value = TRUE),
                TROPICAL_DEPRESSION = grep("TROP.*DEPR", stormData$EVTYPE2, value = TRUE),
                TROPICAL_STORM = grep("TROP.*STOR", stormData$EVTYPE2, value = TRUE),
                TSUNAMI = grep("TSUN|WAVE", stormData$EVTYPE2, value = TRUE),
                VOLCANIC_ASH = grep("VOLCAN.*?[ASH]", stormData$EVTYPE2, value = TRUE),
                WATERSPOUT = grep("WAT.*OUT", stormData$EVTYPE2, value = TRUE),
                WILDFIRE = grep("?[WILD]FIRE", stormData$EVTYPE2, value = TRUE),
                WINTER_STORM = grep("WINT.*STOR", stormData$EVTYPE2, value = TRUE),
                WINTER_WEATHER = grep("WINT.*MIX|WET.*SNO", stormData$EVTYPE2, value = TRUE)
        )
table(EVTYPE3)
## EVTYPE3
##           HIGH_SURF               FLOOD           LIGHTNING 
##                 969              111530               15767 
##   THUNDERSTORM_WIND          WATERSPOUT        FROST_FREEZE 
##              364444                3862                1512 
##          HEAVY_SNOW        WINTER_STORM             TORNADO 
##               15810               11441               60687 
##             TSUNAMI                HAIL          HEAVY_RAIN 
##                 107              288836               11799 
##        FREEZING_FOG               SLEET        FUNNEL_CLOUD 
##                  91                 122                6933 
##           ICE_STORM      WINTER_WEATHER           HURRICANE 
##                2032                1211                 298 
##    LAKE_EFFECT_SNOW     LAKESHORE_FLOOD         MARINE_HAIL 
##                 659                  23                 442 
##              SEICHE    STORM_SURGE_TIDE TROPICAL_DEPRESSION 
##                  21                 148                  60 
##      TROPICAL_STORM        VOLCANIC_ASH            WILDFIRE 
##                 697                  27                2769

Note that even though 48 categories are permitted it is the case that only 27 end up being used.

We check the loaded types and compare to the given knowledge of the dataset.

length(EVTYPE3)
## [1] 902297
length(unique(EVTYPE3))
## [1] 27

902297, and 27. So EVTYPE3 has event type values for every event, and we have reduced the number of unique event types to the 27 shown below.

The property damage and crop damage amounts are accompanied by (exponent) values indicating the scale (as a power of 10) of each value. We examine those to see what we are working with.

table(stormData$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
table(stormData$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994

Here’s the summary

Note that Both PROPDMGEXP and CROPDMGEXP are of class “factor”. We have to change it to be a character for the sake of future code.

sapply(stormData$PROPDMGEXP, as.character)
sapply(stormData$CROPDMGEXP, as.character)

The following command converts the numeric damage values to the same scale, groups everything by event type, and then creates summaries for total, median, and mean effect of each event type on each of the four effect types (fatality, injury, property damage, and crop damage). Finally it combines fatalities and injuries into one number and combines property damage and crop damage into another number.

a <- Sys.time()
stormDataByType <- stormData %>%
        mutate(PropDmg = if_else(
                PROPDMGEXP %in% c("h", "H"),
                PROPDMG * 10 ^ 2
                ,
                ifelse(
                        PROPDMGEXP %in% c("k", "K"),
                        PROPDMG * 10 ^ 3
                        ,
                        ifelse(
                                PROPDMGEXP %in% c("m", "M"),
                                PROPDMG * 10 ^ 6
                                ,
                                ifelse(
                                        PROPDMGEXP %in% c("b", "B"),
                                        PROPDMG * 10 ^ 9
                                        ,
                                        ifelse(
                                                PROPDMGEXP %in% c("-", "?", ""),
                                                0
                                                ,
                                                ifelse(PROPDMGEXP == "+",
                                                       PROPDMG
                                                       ,
                                                       PROPDMG * 10 ^ as.numeric(PROPDMGEXP))
                                        )
                                )
                        )
                )
        )) %>%
        mutate(CropDmg = if_else(
                CROPDMGEXP %in% c("h", "H"),
                CROPDMG * 10 ^ 2
                ,
                ifelse(
                        CROPDMGEXP %in% c("k", "K"),
                        CROPDMG * 10 ^ 3
                        ,
                        ifelse(
                                CROPDMGEXP %in% c("m", "M"),
                                CROPDMG * 10 ^ 6
                                ,
                                ifelse(
                                        CROPDMGEXP %in% c("b", "B"),
                                        CROPDMG * 10 ^ 9
                                        ,
                                        ifelse(
                                                CROPDMGEXP %in% c("-", "?", ""),
                                                0
                                                ,
                                                ifelse(CROPDMGEXP == "+",
                                                       CROPDMG
                                                       ,
                                                       CROPDMG * 10 ^ as.numeric(CROPDMGEXP))
                                        )
                                )
                        )
                )
        )) %>%
        group_by(EVTYPE3) %>%
        summarise(
                TtlFat =  sum(FATALITIES),
                MedFat = median(FATALITIES),
                MeanFat =  mean(FATALITIES),

                TtlInj = sum(INJURIES),
                MedInj = median(INJURIES),
                MeanInj = mean(INJURIES),

                TtlPopEff = TtlFat + TtlInj,
                MedPopEff = MedFat + MedInj,
                MeanPopEff = MeanFat + MeanInj,

                TtlPropDmg = sum(PropDmg),
                MedPropDmg = median(PropDmg),
                MeanPropDmg = mean(PropDmg),

                TtlCropDmg = sum(CropDmg),
                MedCropDmg = median(CropDmg),
                MeanCropDmg = mean(CropDmg),

                TtlEconEff = TtlPropDmg + TtlCropDmg,
                MedEconEff = MedPropDmg + MedCropDmg,
                MeanEconEff = MeanPropDmg + MeanCropDmg
        )
b <- Sys.time()
b-a
## Time difference of 1.418236 secs

Results

Which events are most harmful with respect to public health?

Note that the total number of effects on the population could depend heavily on the length of time that the event has been recorded.

typeDuration <- matrix(nrow = 0, ncol = 2)
for (col in unique(EVTYPE3)) {
        data <- stormData %>% filter(EVTYPE3 == col)
        parsed <- mdy_hms(data$BGN_DATE)
        maxDate <- max(parsed, na.rm = TRUE)
        minDate <- min(parsed, na.rm = TRUE)
        diff <- as.numeric(maxDate - minDate)
        typeDuration <- rbind(typeDuration, list(as.character(col), diff))
}
typeDuration <- data.frame(typeDuration)
names(typeDuration) <- c("Type", 'Duration')
typeDuration$Duration <- as.numeric(typeDuration$Duration)
arrange(typeDuration, desc(Duration))
##                   Type Duration
## 1              TORNADO    22602
## 2                 HAIL    20763
## 3    THUNDERSTORM_WIND    20756
## 4                FLOOD     6907
## 5           HEAVY_SNOW     6907
## 6         WINTER_STORM     6902
## 7           HEAVY_RAIN     6902
## 8            LIGHTNING     6896
## 9            ICE_STORM     6894
## 10          WATERSPOUT     6891
## 11        FUNNEL_CLOUD     6890
## 12           HIGH_SURF     6871
## 13    LAKE_EFFECT_SNOW     6847
## 14        FROST_FREEZE     6833
## 15      TROPICAL_STORM     6817
## 16           HURRICANE     6783
## 17               SLEET     6532
## 18            WILDFIRE     6414
## 19        FREEZING_FOG     6242
## 20             TSUNAMI     6115
## 21              SEICHE     4747
## 22      WINTER_WEATHER     4744
## 23        VOLCANIC_ASH     4370
## 24 TROPICAL_DEPRESSION     3977
## 25         MARINE_HAIL     3455
## 26    STORM_SURGE_TIDE     2223
## 27     LAKESHORE_FLOOD     1677

Note that TORNADO, HAIL, and THUNDERSTORM_WIND have been recorded for 56+ years. Everything else has been recorded for less than 19 years. That extra 37 years gives those three a big head start on the total number of incidents.

Thinking about that, let’s look at the results for the total absolute effect on the physical health of the population (as measured by fatalities and injuries). We will look at the top 5 (in order to get past the outliers we’ve identified).

n <- length(stormDataByType$TtlPopEff)
for(i in 0:4) {
x <- which(stormDataByType$TtlPopEff == sort(stormDataByType$TtlPopEff,partial=n-i)[n-i])
        print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$TtlPopEff[x])))
}
## [1] "TORNADO 97022"
## [1] "FLOOD 28687"
## [1] "THUNDERSTORM_WIND 12796"
## [1] "LIGHTNING 6049"
## [1] "ICE_STORM 2081"

The top values come from TORNADO, FLOOD, THUNDERSTORM_WIND, LIGHTNING, and ICE_STORM (in that descending order). if we ignore the outliers then FLOOD has the most absolute effect.

We can try looking at the median. Maybe that will be more useful.

with(stormDataByType, table(MedPopEff))
## MedPopEff
##  0 
## 27
n <- length(stormDataByType$MedPopEff)
for(i in 0:1) {
        x <- which(stormDataByType$MedPopEff == sort(stormDataByType$MedPopEff,partial=n-i)[n-i])
        print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$MedPopEff[x])))
}
##  [1] "HIGH_SURF 0"           "FLOOD 0"              
##  [3] "LIGHTNING 0"           "THUNDERSTORM_WIND 0"  
##  [5] "WATERSPOUT 0"          "FROST_FREEZE 0"       
##  [7] "HEAVY_SNOW 0"          "WINTER_STORM 0"       
##  [9] "TORNADO 0"             "TSUNAMI 0"            
## [11] "HAIL 0"                "HEAVY_RAIN 0"         
## [13] "FREEZING_FOG 0"        "SLEET 0"              
## [15] "FUNNEL_CLOUD 0"        "ICE_STORM 0"          
## [17] "WINTER_WEATHER 0"      "HURRICANE 0"          
## [19] "LAKE_EFFECT_SNOW 0"    "LAKESHORE_FLOOD 0"    
## [21] "MARINE_HAIL 0"         "SEICHE 0"             
## [23] "STORM_SURGE_TIDE 0"    "TROPICAL_DEPRESSION 0"
## [25] "TROPICAL_STORM 0"      "VOLCANIC_ASH 0"       
## [27] "WILDFIRE 0"           
##  [1] "HIGH_SURF 0"           "FLOOD 0"              
##  [3] "LIGHTNING 0"           "THUNDERSTORM_WIND 0"  
##  [5] "WATERSPOUT 0"          "FROST_FREEZE 0"       
##  [7] "HEAVY_SNOW 0"          "WINTER_STORM 0"       
##  [9] "TORNADO 0"             "TSUNAMI 0"            
## [11] "HAIL 0"                "HEAVY_RAIN 0"         
## [13] "FREEZING_FOG 0"        "SLEET 0"              
## [15] "FUNNEL_CLOUD 0"        "ICE_STORM 0"          
## [17] "WINTER_WEATHER 0"      "HURRICANE 0"          
## [19] "LAKE_EFFECT_SNOW 0"    "LAKESHORE_FLOOD 0"    
## [21] "MARINE_HAIL 0"         "SEICHE 0"             
## [23] "STORM_SURGE_TIDE 0"    "TROPICAL_DEPRESSION 0"
## [25] "TROPICAL_STORM 0"      "VOLCANIC_ASH 0"       
## [27] "WILDFIRE 0"

Nothing has any value. So much for that idea.

Mean (average) is probably the most useful measure because it indicates how many fatalities + injuries might be expected for each event type.

with(stormDataByType, table(MeanPopEff))
## MeanPopEff
##                   0 0.00043271311120727 0.00330687830687831 
##                   6                   1                   1 
## 0.00479857081527234  0.0163934426229508  0.0201967892283791 
##                   1                   1                   1 
##  0.0300025425883549  0.0351110184280712  0.0734345351043643 
##                   1                   1                   1 
##   0.108108108108108   0.137225766978411   0.203137902559868 
##                   1                   1                   1 
##   0.257213305836995   0.356085229324666   0.361197110423117 
##                   1                   1                   1 
##    0.38364939430456   0.644189383070301    1.02411417322835 
##                   1                   1                   1 
##    1.59872789889103    2.45054945054945    4.91946308724832 
##                   1                   1                   1 
##    6.94392523364486 
##                   1
n <- length(stormDataByType$MeanPopEff)
for(i in 0:4) {
        x <- which(stormDataByType$MeanPopEff == sort(stormDataByType$MeanPopEff,partial=n-i)[n-i])
        print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$MeanPopEff[x])))
}
## [1] "TSUNAMI 6.94392523364486"
## [1] "HURRICANE 4.91946308724832"
## [1] "FREEZING_FOG 2.45054945054945"
## [1] "TORNADO 1.59872789889103"
## [1] "ICE_STORM 1.02411417322835"

Here the highest values are TSUNAMI, HURRICANE, FREEZING_FOG, TORNADO, AND ICE_STORM. TSUNAMI wins by a hefty margin over HURRICANE which is itself significantly above FREEZING_FOG.

Let’s visualize those mean values.

par(mar = c(12,5,4,2), mgp = c(2,1,0))
with(
        stormDataByType,
        barplot(
                height = MeanPopEff,
                yaxt = 'n',
                main = 'Mean Population Effect (Fatalities + Injuries)',
                ylab = 'US Dollars',
                ylim = c(0, 7),
                names.arg = EVTYPE3, 
                las = 2,
                cex.lab = 1
        )
)
grid(nx = NA, ny = NULL)
labelsy <- c(seq(0,7,1))
axis(side = 2, at = labelsy, format(labelsy, scientific = FALSE), las = 1)

Clearly TSUNAMI and HURRICAN show significantly more fatalities and injuries per event on average than anything else.

Across the United States, which types of events have the greatest economic consequences?

We can do the same sort of analysis but use the CropDmg and PropDmg measures instead of fatalities and injuries.

with(stormDataByType, table(TtlEconEff))
## TtlEconEff
##           4000         194600          5e+05         980000        1737000 
##              1              1              1              1              1 
##          2e+06        3487300        6444500        7540000       40182000 
##              1              1              1              1              1 
##       60730700      115025000      161342050    952539516.5     1085600240 
##              1              1              1              1              1 
##     2016261000     4016262942     4642038000     5161586800     6782441251 
##              1              1              1              1              1 
##     8409286550     8967641360  18431430558.1  19023932425.7  58969613943.5 
##              1              1              1              1              1 
##    90762527810 247707730848.5 
##              1              1

We will want to look at a lot of the values here because several are near the top.

n <- length(stormDataByType$TtlEconEff)
for(i in 0:4) {
        x <- which(stormDataByType$TtlEconEff == sort(stormDataByType$TtlEconEff,partial=n-i)[n-i])
        print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$TtlEconEff[x])))
}
## [1] "FLOOD 247707730848.5"
## [1] "HURRICANE 90762527810"
## [1] "TORNADO 58969613943.5"
## [1] "HAIL 19023932425.7"
## [1] "THUNDERSTORM_WIND 18431430558.1"

FLOOD, HURRICANE, and TORNADO are the top three here. FLOOD has 12 digits and is therefore nearly 2.5 times larger than HURRICANE.

We should check the median again, although after the results last time I’m not optimistic that we’ll get anything useful here either.

with(stormDataByType, table(MedEconEff))
## MedEconEff
##     0  2500  5000  6500 1e+06 
##    22     1     2     1     1
n <- length(stormDataByType$MedEconEff)
for(i in 0:4) {
        x <- which(stormDataByType$MedEconEff == sort(stormDataByType$MedEconEff,partial=n-i)[n-i])
        print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$MedEconEff[x])))
}
## [1] "HURRICANE 1e+06"
## [1] "TROPICAL_DEPRESSION 6500"
## [1] "LIGHTNING 5000"      "TROPICAL_STORM 5000"
## [1] "LIGHTNING 5000"      "TROPICAL_STORM 5000"
## [1] "TORNADO 2500"

Here HURRICANE has the greatest median effect followed by TROPICAL_DEPRESSION and LIGHTNING/TROPICAL_STORM.

The final check is the mean. Again this could make sense to be the estimator of the most damaging weather events.

with(stormDataByType, table(MeanEconEff))
## MeanEconEff
## 9.04977375565611 28.0686571469782 5321.63501238646 15725.1941998964 
##                1                1                1                1 
## 16393.4426229508 18518.5185185185            28950  38321.978021978 
##                1                1                1                1 
## 46666.6666666667 50574.1089388219 60413.4912475423 60974.2033383915 
##                1                1                1                1 
## 65864.1319838939 68665.4168247944 118704.850361197 327826.086956522 
##                1                1                1                1 
## 340390.112890923 592818.918888209 971700.923484436 1333505.95238095 
##                1                1                1                1 
## 1507869.62616822 1864061.68291802 2220996.42112884 4413209.33070866 
##                1                1                1                1 
## 12064973.5294118 31365121.6216216 304572240.973154 
##                1                1                1
n <- length(stormDataByType$MeanEconEff)
for(i in 0:4) {
        x <- which(stormDataByType$MeanEconEff == sort(stormDataByType$MeanEconEff,partial=n-i)[n-i])
        print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$MeanEconEff[x])))
}
## [1] "HURRICANE 304572240.973154"
## [1] "STORM_SURGE_TIDE 31365121.6216216"
## [1] "TROPICAL_STORM 12064973.5294118"
## [1] "ICE_STORM 4413209.33070866"
## [1] "FLOOD 2220996.42112884"

HURRICANE is still the highest on the list. This is followed by STORM_SURGE (which could be a consequence of a hurricane) and TROPICAL_STORM (which is just a weakened hurricane).

Here’s the first plot for the mean economic effect.

par(mar = c(12,7,4,2), mgp = c(4,1,0))
with(
        stormDataByType,
        barplot(
                height = MeanEconEff,
                yaxt = 'n',
                main = 'Mean Economic Effect (USD)',
                ylab = 'Dollars',
                ylim = c(0, 3.1*10^8),
                names.arg = EVTYPE3, 
                las = 2,
                cex.lab = 1
        )
)
grid(nx = NA, ny = NULL)
labelsy <- c(seq(from = 0,to = 3*10^8, by = 5*10^7))
axis(side = 2, at = labelsy, format(labelsy, scientific = TRUE), las = 1)

And then here’s the same plot where we allow the value for hurricane to go off the top of the figure so that we can actually see the variation in the smaller values.

par(mar = c(12,7,4,2), mgp = c(4,1,0))
with(
        stormDataByType,
        barplot(
                height = MeanEconEff,
                yaxt = 'n',
                main = 'Mean Economic Effect (USD)',
                ylab = 'Dollars',
                ylim = c(0, 3.1*10^7),
                names.arg = EVTYPE3, 
                las = 2,
                cex.lab = 1
        )
)
grid(nx = NA, ny = NULL)
labelsy <- c(seq(from = 0,to = 3*10^7, by = 5*10^6))
axis(side = 2, at = labelsy, format(labelsy, scientific = TRUE), las = 1)

##Conclusions:

Because some weather events have been measured for longer times than others it is the case that mean (average) appears to be the most effective way to compare the effects of each of the main weather event types. Also, because many events are the same or similar to others but are written down differently it is the case that of the 48 options for weather event types we only use 27. All that being said, the graphs of mean population effect (fatalities and injuries) and mean economic effect (monetary loss in terms of property damage and crop damage) show that for the former TSUNAMI, HURRICANE, AND FREEZING_FOG have the greatest effect on people and HURRICANEs, STORM_SURGEs, and TROPICAL_STORMs weather have the greatest economic effect.