The National Climactic Data Center (NCDC) regularly receives storm data from the National Weather service. It should be possible, using this data, to determine which types of events have the greatest effect on the physical health of the human population and the greatest damage potential with respect to property and crops. Because there are many different values for “Event Type” and many of them are similar (misspellings, capitalizaion differences, similar words/terms, etc) it is first necessary to group the event types into larger categories. As an example, “WINTRY MIX” is the same as “wintry/mix”. The economic effects are measured at different scales depending on the row of data, so the second step in preprocessing is to convert these numbers to the same scale so that they may be compared. To get usable numbers for comparison we then group the data by event type and calculate the total, median, and mean (average) effects of fatalities, injuries, property damage, and crop damage for each event type. Separate code shows that the length of time events have been recorded varies greatly depending on the event, so the mean (average) effect is preferred to total (absolute) effect in order to determine the events that are the most physically harmful or have the greatest economic consequences.
First we need to download the data and load it into R.
Unfortunately I had trouble with making the download/unzip code work, so the actual download of the file had to be done manually. We can run a simple command to check that the correct data is present and loaded.
stormDataFileName <- "repdata_data_StormData.csv.bz2"
file.info(stormDataFileName)$size
## [1] 49177144
That’s what we expected.
a <- Sys.time()
stormData <- read.csv(file = stormDataFileName, as.is = TRUE)
b <- Sys.time()
print(paste("Reading the file from", a, "to", b, "."))
## [1] "Reading the file from 2019-03-17 16:42:34 to 2019-03-17 16:43:15 ."
The following libraries are necessary for this particular analysis.
library(forcats)
library(dplyr)
library(lubridate)
library(knitr)
Check some basic information about the data file.
dim(stormData)
## [1] 902297 37
90229x37 - That’s correct
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
That looks good.
length(stormData$EVTYPE)
## [1] 902297
length(toupper(stormData$EVTYPE))
## [1] 902297
902297 and 902297. That’s what we wanted to see.
As noted in the synopsis it is the case that there are a lot of similar names in EVTYPE that need to be combined. This has to be somewhat arbitrary, so my goal was at least to group all of the event types used only once into larger groups with other similar event types. Note that hurricanes, tropical Storms, tropical cyclones, and typhoons are all roughly the same thing.
stormData$EVTYPE2 <- toupper(stormData$EVTYPE)
table(unique(stormData$EVTYPE2))
Using the defined weather types in the NATIONAL WEATHER SERVICE INSTRUCTION 10-1605 document of 17 August 2017 I coalesce the event type cateogries into a more manageable set.
EVTYPE3 <-
fct_collapse(
stormData$EVTYPE2,
ASTRONOMICAL_LOW_TIDE = grep("AST.*LO.*TI.*", stormData$EVTYPE2, value = TRUE),
AVALANCHE = grep("AVAL", stormData$EVTYPE2, value = TRUE),
BLIZZARD = grep("BLIZ", stormData$EVTYPE2, value = TRUE),
COASTAL_FLOOD = grep("COA.*FLO.*", stormData$EVTYPE2, value = TRUE),
COLD_WIND_CHILL = grep("COL.*WIN.*CHI", stormData$EVTYPE2, value = TRUE),
DEBRIS_FLOW = grep("DEB.*FL.*", stormData$EVTYPE2, value = TRUE),
DENSE_FOG = grep("DEN.*FOG", stormData$EVTYPE2, value = TRUE),
DENSE_SMOKE = grep("DEN.*SMO", stormData$EVTYPE2, value = TRUE),
DROUGHT = grep("DROU.*|DRY", stormData$EVTYPE2, value = TRUE),
DUST_DEVIL = grep("DUS.*DEV", stormData$EVTYPE2, value = TRUE),
DUST_STORM = grep("DUS.*STO", stormData$EVTYPE2, value = TRUE),
EXCESSIVE_HEAT = grep("EXC.*HEA|[UN.*|REC.*]WARM", stormData$EVTYPE2, value = TRUE),
EXTREME_COLD_WIND_CHILL = grep("EXTR.*COL|WIN.*CHI|UN.*COL", stormData$EVTYPE2, value = TRUE),
FLASH_FLOOD = grep("FLA.*FLO", stormData$EVTYPE2, value = TRUE),
FLOOD = grep("[^FLOOD$]|FLDG|FLOODS|FLOOD$", stormData$EVTYPE2, value = TRUE),
FROST_FREEZE = grep("FROST|FREEZE", stormData$EVTYPE2, value = TRUE),
FUNNEL_CLOUD = grep("FUN.*CLO", stormData$EVTYPE2, value = TRUE),
FREEZING_FOG = grep("FREEZ.*FOG|GLAZ", stormData$EVTYPE2, value = TRUE),
HAIL = grep("HAIL", stormData$EVTYPE2, value = TRUE),
HEAVY_RAIN = grep("HEAV.*RAI", stormData$EVTYPE2, value = TRUE),
HEAVY_SNOW = grep("HEAV.*SNO", stormData$EVTYPE2, value = TRUE),
HIGH_SURF = grep("HIG.*SURF", stormData$EVTYPE2, value = TRUE),
HIGH_WIND = grep("HIG.*WIN", stormData$EVTYPE2, value = TRUE),
HURRICANE = grep("HURR|TYPH", stormData$EVTYPE2, value = TRUE),
ICE_STORM = grep("ICE.*STORM", stormData$EVTYPE2, value = TRUE),
LAKE_EFFECT_SNOW = grep("LAKE.*EFF.*SNO", stormData$EVTYPE2, value = TRUE),
LAKESHORE_FLOOD = grep("LAKESH.*FLO", stormData$EVTYPE2, value = TRUE),
LIGHTNING = grep("LIGHT[N|I]", stormData$EVTYPE2, value = TRUE),
MARINE_HAIL = grep("MAR.*HAIL", stormData$EVTYPE2, value = TRUE),
MARINE_HIGH_WIND = grep("MAR.*HIGH.*WIND", stormData$EVTYPE2, value = TRUE),
MARINE_STRONG_WIND = grep("MAR.*STR.*WIND", stormData$EVTYPE2, value = TRUE),
MARINE_HIGH_WIND = grep("MAR.*[THUND|TSTM].*WIND", stormData$EVTYPE2, value = TRUE),
RIP_CURRENT = grep("^RIP$", stormData$EVTYPE2, value = TRUE),
SEICHE = grep("SEIC", stormData$EVTYPE2, value = TRUE),
SLEET = grep("SLEET", stormData$EVTYPE2, value = TRUE),
STORM_SURGE_TIDE = grep("STORM.*SURGE.*?[TIDE]", stormData$EVTYPE2, value = TRUE),
STRONG_WIND = grep("STR.*WIND", stormData$EVTYPE2, value = TRUE),
THUNDERSTORM_WIND = grep("[THUND|TSTM].*WIND", stormData$EVTYPE2, value = TRUE),
TORNADO = grep("TORN", stormData$EVTYPE2, value = TRUE),
TROPICAL_DEPRESSION = grep("TROP.*DEPR", stormData$EVTYPE2, value = TRUE),
TROPICAL_STORM = grep("TROP.*STOR", stormData$EVTYPE2, value = TRUE),
TSUNAMI = grep("TSUN|WAVE", stormData$EVTYPE2, value = TRUE),
VOLCANIC_ASH = grep("VOLCAN.*?[ASH]", stormData$EVTYPE2, value = TRUE),
WATERSPOUT = grep("WAT.*OUT", stormData$EVTYPE2, value = TRUE),
WILDFIRE = grep("?[WILD]FIRE", stormData$EVTYPE2, value = TRUE),
WINTER_STORM = grep("WINT.*STOR", stormData$EVTYPE2, value = TRUE),
WINTER_WEATHER = grep("WINT.*MIX|WET.*SNO", stormData$EVTYPE2, value = TRUE)
)
table(EVTYPE3)
## EVTYPE3
## HIGH_SURF FLOOD LIGHTNING
## 969 111530 15767
## THUNDERSTORM_WIND WATERSPOUT FROST_FREEZE
## 364444 3862 1512
## HEAVY_SNOW WINTER_STORM TORNADO
## 15810 11441 60687
## TSUNAMI HAIL HEAVY_RAIN
## 107 288836 11799
## FREEZING_FOG SLEET FUNNEL_CLOUD
## 91 122 6933
## ICE_STORM WINTER_WEATHER HURRICANE
## 2032 1211 298
## LAKE_EFFECT_SNOW LAKESHORE_FLOOD MARINE_HAIL
## 659 23 442
## SEICHE STORM_SURGE_TIDE TROPICAL_DEPRESSION
## 21 148 60
## TROPICAL_STORM VOLCANIC_ASH WILDFIRE
## 697 27 2769
Note that even though 48 categories are permitted it is the case that only 27 end up being used.
We check the loaded types and compare to the given knowledge of the dataset.
length(EVTYPE3)
## [1] 902297
length(unique(EVTYPE3))
## [1] 27
902297, and 27. So EVTYPE3 has event type values for every event, and we have reduced the number of unique event types to the 27 shown below.
The property damage and crop damage amounts are accompanied by (exponent) values indicating the scale (as a power of 10) of each value. We examine those to see what we are working with.
table(stormData$PROPDMGEXP)
##
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
table(stormData$CROPDMGEXP)
##
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
Here’s the summary
Note that Both PROPDMGEXP and CROPDMGEXP are of class “factor”. We have to change it to be a character for the sake of future code.
sapply(stormData$PROPDMGEXP, as.character)
sapply(stormData$CROPDMGEXP, as.character)
The following command converts the numeric damage values to the same scale, groups everything by event type, and then creates summaries for total, median, and mean effect of each event type on each of the four effect types (fatality, injury, property damage, and crop damage). Finally it combines fatalities and injuries into one number and combines property damage and crop damage into another number.
a <- Sys.time()
stormDataByType <- stormData %>%
mutate(PropDmg = if_else(
PROPDMGEXP %in% c("h", "H"),
PROPDMG * 10 ^ 2
,
ifelse(
PROPDMGEXP %in% c("k", "K"),
PROPDMG * 10 ^ 3
,
ifelse(
PROPDMGEXP %in% c("m", "M"),
PROPDMG * 10 ^ 6
,
ifelse(
PROPDMGEXP %in% c("b", "B"),
PROPDMG * 10 ^ 9
,
ifelse(
PROPDMGEXP %in% c("-", "?", ""),
0
,
ifelse(PROPDMGEXP == "+",
PROPDMG
,
PROPDMG * 10 ^ as.numeric(PROPDMGEXP))
)
)
)
)
)) %>%
mutate(CropDmg = if_else(
CROPDMGEXP %in% c("h", "H"),
CROPDMG * 10 ^ 2
,
ifelse(
CROPDMGEXP %in% c("k", "K"),
CROPDMG * 10 ^ 3
,
ifelse(
CROPDMGEXP %in% c("m", "M"),
CROPDMG * 10 ^ 6
,
ifelse(
CROPDMGEXP %in% c("b", "B"),
CROPDMG * 10 ^ 9
,
ifelse(
CROPDMGEXP %in% c("-", "?", ""),
0
,
ifelse(CROPDMGEXP == "+",
CROPDMG
,
CROPDMG * 10 ^ as.numeric(CROPDMGEXP))
)
)
)
)
)) %>%
group_by(EVTYPE3) %>%
summarise(
TtlFat = sum(FATALITIES),
MedFat = median(FATALITIES),
MeanFat = mean(FATALITIES),
TtlInj = sum(INJURIES),
MedInj = median(INJURIES),
MeanInj = mean(INJURIES),
TtlPopEff = TtlFat + TtlInj,
MedPopEff = MedFat + MedInj,
MeanPopEff = MeanFat + MeanInj,
TtlPropDmg = sum(PropDmg),
MedPropDmg = median(PropDmg),
MeanPropDmg = mean(PropDmg),
TtlCropDmg = sum(CropDmg),
MedCropDmg = median(CropDmg),
MeanCropDmg = mean(CropDmg),
TtlEconEff = TtlPropDmg + TtlCropDmg,
MedEconEff = MedPropDmg + MedCropDmg,
MeanEconEff = MeanPropDmg + MeanCropDmg
)
b <- Sys.time()
b-a
## Time difference of 1.418236 secs
Note that the total number of effects on the population could depend heavily on the length of time that the event has been recorded.
typeDuration <- matrix(nrow = 0, ncol = 2)
for (col in unique(EVTYPE3)) {
data <- stormData %>% filter(EVTYPE3 == col)
parsed <- mdy_hms(data$BGN_DATE)
maxDate <- max(parsed, na.rm = TRUE)
minDate <- min(parsed, na.rm = TRUE)
diff <- as.numeric(maxDate - minDate)
typeDuration <- rbind(typeDuration, list(as.character(col), diff))
}
typeDuration <- data.frame(typeDuration)
names(typeDuration) <- c("Type", 'Duration')
typeDuration$Duration <- as.numeric(typeDuration$Duration)
arrange(typeDuration, desc(Duration))
## Type Duration
## 1 TORNADO 22602
## 2 HAIL 20763
## 3 THUNDERSTORM_WIND 20756
## 4 FLOOD 6907
## 5 HEAVY_SNOW 6907
## 6 WINTER_STORM 6902
## 7 HEAVY_RAIN 6902
## 8 LIGHTNING 6896
## 9 ICE_STORM 6894
## 10 WATERSPOUT 6891
## 11 FUNNEL_CLOUD 6890
## 12 HIGH_SURF 6871
## 13 LAKE_EFFECT_SNOW 6847
## 14 FROST_FREEZE 6833
## 15 TROPICAL_STORM 6817
## 16 HURRICANE 6783
## 17 SLEET 6532
## 18 WILDFIRE 6414
## 19 FREEZING_FOG 6242
## 20 TSUNAMI 6115
## 21 SEICHE 4747
## 22 WINTER_WEATHER 4744
## 23 VOLCANIC_ASH 4370
## 24 TROPICAL_DEPRESSION 3977
## 25 MARINE_HAIL 3455
## 26 STORM_SURGE_TIDE 2223
## 27 LAKESHORE_FLOOD 1677
Note that TORNADO, HAIL, and THUNDERSTORM_WIND have been recorded for 56+ years. Everything else has been recorded for less than 19 years. That extra 37 years gives those three a big head start on the total number of incidents.
Thinking about that, let’s look at the results for the total absolute effect on the physical health of the population (as measured by fatalities and injuries). We will look at the top 5 (in order to get past the outliers we’ve identified).
n <- length(stormDataByType$TtlPopEff)
for(i in 0:4) {
x <- which(stormDataByType$TtlPopEff == sort(stormDataByType$TtlPopEff,partial=n-i)[n-i])
print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$TtlPopEff[x])))
}
## [1] "TORNADO 97022"
## [1] "FLOOD 28687"
## [1] "THUNDERSTORM_WIND 12796"
## [1] "LIGHTNING 6049"
## [1] "ICE_STORM 2081"
The top values come from TORNADO, FLOOD, THUNDERSTORM_WIND, LIGHTNING, and ICE_STORM (in that descending order). if we ignore the outliers then FLOOD has the most absolute effect.
We can try looking at the median. Maybe that will be more useful.
with(stormDataByType, table(MedPopEff))
## MedPopEff
## 0
## 27
n <- length(stormDataByType$MedPopEff)
for(i in 0:1) {
x <- which(stormDataByType$MedPopEff == sort(stormDataByType$MedPopEff,partial=n-i)[n-i])
print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$MedPopEff[x])))
}
## [1] "HIGH_SURF 0" "FLOOD 0"
## [3] "LIGHTNING 0" "THUNDERSTORM_WIND 0"
## [5] "WATERSPOUT 0" "FROST_FREEZE 0"
## [7] "HEAVY_SNOW 0" "WINTER_STORM 0"
## [9] "TORNADO 0" "TSUNAMI 0"
## [11] "HAIL 0" "HEAVY_RAIN 0"
## [13] "FREEZING_FOG 0" "SLEET 0"
## [15] "FUNNEL_CLOUD 0" "ICE_STORM 0"
## [17] "WINTER_WEATHER 0" "HURRICANE 0"
## [19] "LAKE_EFFECT_SNOW 0" "LAKESHORE_FLOOD 0"
## [21] "MARINE_HAIL 0" "SEICHE 0"
## [23] "STORM_SURGE_TIDE 0" "TROPICAL_DEPRESSION 0"
## [25] "TROPICAL_STORM 0" "VOLCANIC_ASH 0"
## [27] "WILDFIRE 0"
## [1] "HIGH_SURF 0" "FLOOD 0"
## [3] "LIGHTNING 0" "THUNDERSTORM_WIND 0"
## [5] "WATERSPOUT 0" "FROST_FREEZE 0"
## [7] "HEAVY_SNOW 0" "WINTER_STORM 0"
## [9] "TORNADO 0" "TSUNAMI 0"
## [11] "HAIL 0" "HEAVY_RAIN 0"
## [13] "FREEZING_FOG 0" "SLEET 0"
## [15] "FUNNEL_CLOUD 0" "ICE_STORM 0"
## [17] "WINTER_WEATHER 0" "HURRICANE 0"
## [19] "LAKE_EFFECT_SNOW 0" "LAKESHORE_FLOOD 0"
## [21] "MARINE_HAIL 0" "SEICHE 0"
## [23] "STORM_SURGE_TIDE 0" "TROPICAL_DEPRESSION 0"
## [25] "TROPICAL_STORM 0" "VOLCANIC_ASH 0"
## [27] "WILDFIRE 0"
Nothing has any value. So much for that idea.
Mean (average) is probably the most useful measure because it indicates how many fatalities + injuries might be expected for each event type.
with(stormDataByType, table(MeanPopEff))
## MeanPopEff
## 0 0.00043271311120727 0.00330687830687831
## 6 1 1
## 0.00479857081527234 0.0163934426229508 0.0201967892283791
## 1 1 1
## 0.0300025425883549 0.0351110184280712 0.0734345351043643
## 1 1 1
## 0.108108108108108 0.137225766978411 0.203137902559868
## 1 1 1
## 0.257213305836995 0.356085229324666 0.361197110423117
## 1 1 1
## 0.38364939430456 0.644189383070301 1.02411417322835
## 1 1 1
## 1.59872789889103 2.45054945054945 4.91946308724832
## 1 1 1
## 6.94392523364486
## 1
n <- length(stormDataByType$MeanPopEff)
for(i in 0:4) {
x <- which(stormDataByType$MeanPopEff == sort(stormDataByType$MeanPopEff,partial=n-i)[n-i])
print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$MeanPopEff[x])))
}
## [1] "TSUNAMI 6.94392523364486"
## [1] "HURRICANE 4.91946308724832"
## [1] "FREEZING_FOG 2.45054945054945"
## [1] "TORNADO 1.59872789889103"
## [1] "ICE_STORM 1.02411417322835"
Here the highest values are TSUNAMI, HURRICANE, FREEZING_FOG, TORNADO, AND ICE_STORM. TSUNAMI wins by a hefty margin over HURRICANE which is itself significantly above FREEZING_FOG.
Let’s visualize those mean values.
par(mar = c(12,5,4,2), mgp = c(2,1,0))
with(
stormDataByType,
barplot(
height = MeanPopEff,
yaxt = 'n',
main = 'Mean Population Effect (Fatalities + Injuries)',
ylab = 'US Dollars',
ylim = c(0, 7),
names.arg = EVTYPE3,
las = 2,
cex.lab = 1
)
)
grid(nx = NA, ny = NULL)
labelsy <- c(seq(0,7,1))
axis(side = 2, at = labelsy, format(labelsy, scientific = FALSE), las = 1)
Clearly TSUNAMI and HURRICAN show significantly more fatalities and injuries per event on average than anything else.
We can do the same sort of analysis but use the CropDmg and PropDmg measures instead of fatalities and injuries.
with(stormDataByType, table(TtlEconEff))
## TtlEconEff
## 4000 194600 5e+05 980000 1737000
## 1 1 1 1 1
## 2e+06 3487300 6444500 7540000 40182000
## 1 1 1 1 1
## 60730700 115025000 161342050 952539516.5 1085600240
## 1 1 1 1 1
## 2016261000 4016262942 4642038000 5161586800 6782441251
## 1 1 1 1 1
## 8409286550 8967641360 18431430558.1 19023932425.7 58969613943.5
## 1 1 1 1 1
## 90762527810 247707730848.5
## 1 1
We will want to look at a lot of the values here because several are near the top.
n <- length(stormDataByType$TtlEconEff)
for(i in 0:4) {
x <- which(stormDataByType$TtlEconEff == sort(stormDataByType$TtlEconEff,partial=n-i)[n-i])
print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$TtlEconEff[x])))
}
## [1] "FLOOD 247707730848.5"
## [1] "HURRICANE 90762527810"
## [1] "TORNADO 58969613943.5"
## [1] "HAIL 19023932425.7"
## [1] "THUNDERSTORM_WIND 18431430558.1"
FLOOD, HURRICANE, and TORNADO are the top three here. FLOOD has 12 digits and is therefore nearly 2.5 times larger than HURRICANE.
We should check the median again, although after the results last time I’m not optimistic that we’ll get anything useful here either.
with(stormDataByType, table(MedEconEff))
## MedEconEff
## 0 2500 5000 6500 1e+06
## 22 1 2 1 1
n <- length(stormDataByType$MedEconEff)
for(i in 0:4) {
x <- which(stormDataByType$MedEconEff == sort(stormDataByType$MedEconEff,partial=n-i)[n-i])
print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$MedEconEff[x])))
}
## [1] "HURRICANE 1e+06"
## [1] "TROPICAL_DEPRESSION 6500"
## [1] "LIGHTNING 5000" "TROPICAL_STORM 5000"
## [1] "LIGHTNING 5000" "TROPICAL_STORM 5000"
## [1] "TORNADO 2500"
Here HURRICANE has the greatest median effect followed by TROPICAL_DEPRESSION and LIGHTNING/TROPICAL_STORM.
The final check is the mean. Again this could make sense to be the estimator of the most damaging weather events.
with(stormDataByType, table(MeanEconEff))
## MeanEconEff
## 9.04977375565611 28.0686571469782 5321.63501238646 15725.1941998964
## 1 1 1 1
## 16393.4426229508 18518.5185185185 28950 38321.978021978
## 1 1 1 1
## 46666.6666666667 50574.1089388219 60413.4912475423 60974.2033383915
## 1 1 1 1
## 65864.1319838939 68665.4168247944 118704.850361197 327826.086956522
## 1 1 1 1
## 340390.112890923 592818.918888209 971700.923484436 1333505.95238095
## 1 1 1 1
## 1507869.62616822 1864061.68291802 2220996.42112884 4413209.33070866
## 1 1 1 1
## 12064973.5294118 31365121.6216216 304572240.973154
## 1 1 1
n <- length(stormDataByType$MeanEconEff)
for(i in 0:4) {
x <- which(stormDataByType$MeanEconEff == sort(stormDataByType$MeanEconEff,partial=n-i)[n-i])
print(paste(as.character(stormDataByType$EVTYPE3[x]), as.character(stormDataByType$MeanEconEff[x])))
}
## [1] "HURRICANE 304572240.973154"
## [1] "STORM_SURGE_TIDE 31365121.6216216"
## [1] "TROPICAL_STORM 12064973.5294118"
## [1] "ICE_STORM 4413209.33070866"
## [1] "FLOOD 2220996.42112884"
HURRICANE is still the highest on the list. This is followed by STORM_SURGE (which could be a consequence of a hurricane) and TROPICAL_STORM (which is just a weakened hurricane).
Here’s the first plot for the mean economic effect.
par(mar = c(12,7,4,2), mgp = c(4,1,0))
with(
stormDataByType,
barplot(
height = MeanEconEff,
yaxt = 'n',
main = 'Mean Economic Effect (USD)',
ylab = 'Dollars',
ylim = c(0, 3.1*10^8),
names.arg = EVTYPE3,
las = 2,
cex.lab = 1
)
)
grid(nx = NA, ny = NULL)
labelsy <- c(seq(from = 0,to = 3*10^8, by = 5*10^7))
axis(side = 2, at = labelsy, format(labelsy, scientific = TRUE), las = 1)
And then here’s the same plot where we allow the value for hurricane to go off the top of the figure so that we can actually see the variation in the smaller values.
par(mar = c(12,7,4,2), mgp = c(4,1,0))
with(
stormDataByType,
barplot(
height = MeanEconEff,
yaxt = 'n',
main = 'Mean Economic Effect (USD)',
ylab = 'Dollars',
ylim = c(0, 3.1*10^7),
names.arg = EVTYPE3,
las = 2,
cex.lab = 1
)
)
grid(nx = NA, ny = NULL)
labelsy <- c(seq(from = 0,to = 3*10^7, by = 5*10^6))
axis(side = 2, at = labelsy, format(labelsy, scientific = TRUE), las = 1)
##Conclusions:
Because some weather events have been measured for longer times than others it is the case that mean (average) appears to be the most effective way to compare the effects of each of the main weather event types. Also, because many events are the same or similar to others but are written down differently it is the case that of the 48 options for weather event types we only use 27. All that being said, the graphs of mean population effect (fatalities and injuries) and mean economic effect (monetary loss in terms of property damage and crop damage) show that for the former TSUNAMI, HURRICANE, AND FREEZING_FOG have the greatest effect on people and HURRICANEs, STORM_SURGEs, and TROPICAL_STORMs weather have the greatest economic effect.