The following analysis breaks down the harm caused by severe weather events since 1990. We examine population health effects (injuries/fatalities) to find the most consistently harmful weather types as well as the most harmful overall. We also examine economic damages to find the most consistently damaging weather types as well as the most harmful overall.
Before starting any analysis, we load the external libraries required by the code below: ggplot2, Hmisc, and xtable.
require(ggplot2)
require(Hmisc)
require(xtable)
Data were made available via the Coursera site as a bzipped csv file, and are originally provided by the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. Original documentation can be found here.
The first step is to read in the data from the bz2 file provided.
setwd("Z:/Lotus/Dropbox/Coursera/5_Reproducible_Research/Projects/RR_Proj2_HarmfulWeatherEvents")
# read.csv should handle bzip compressed files automatically, but we use bzfile just in case
weather <- read.csv(bzfile('repdata_data_StormData.csv.bz2'), header=T, na.strings = "")
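As a minimal sketch of the comment above, the snippet below checks on a small made-up data frame that reading a bzip2-compressed CSV by path gives the same result as going through bzfile(). The toy columns and the temporary file are assumptions for illustration only; they are not the storm data.

```r
# Hypothetical toy data; the columns are illustrative, not the real dataset.
toy <- data.frame(EVTYPE = c("HAIL", "TORNADO"), INJURIES = c(0, 3))

# Write a bzip2-compressed CSV to a temporary file.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back two ways: through an explicit bzfile() connection and by path.
viaConnection <- read.csv(bzfile(path), header = TRUE, na.strings = "")
viaPath <- read.csv(path, header = TRUE, na.strings = "")

# Both routes should yield the same data frame.
identical(viaConnection, viaPath)
```

Keeping the explicit bzfile() call, as the analysis does, costs nothing and makes the compression format visible to the reader.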
Before working with the dataset, let’s make sure it loaded in properly by checking the first few columns:
summary(weather[,1:10])
## STATE__ BGN_DATE BGN_TIME
## Min. : 1.0 5/25/2011 0:00:00: 1202 12:00:00 AM: 10163
## 1st Qu.:19.0 4/27/2011 0:00:00: 1193 06:00:00 PM: 7350
## Median :30.0 6/9/2011 0:00:00 : 1030 04:00:00 PM: 7261
## Mean :31.2 5/30/2004 0:00:00: 1016 05:00:00 PM: 6891
## 3rd Qu.:45.0 4/4/2011 0:00:00 : 1009 12:00:00 PM: 6703
## Max. :95.0 4/2/2006 0:00:00 : 981 03:00:00 PM: 6700
## (Other) :895866 (Other) :857229
## TIME_ZONE COUNTY COUNTYNAME STATE
## CST :547493 Min. : 0.0 JEFFERSON : 7840 TX : 83728
## EST :245558 1st Qu.: 31.0 WASHINGTON: 7603 KS : 53440
## MST : 68390 Median : 75.0 JACKSON : 6660 OK : 46802
## PST : 28302 Mean :100.6 FRANKLIN : 6256 MO : 35648
## AST : 6360 3rd Qu.:131.0 LINCOLN : 5937 IA : 31069
## HST : 2563 Max. :873.0 (Other) :866412 NE : 30271
## (Other): 3631 NA's : 1589 (Other):621339
## EVTYPE BGN_RANGE BGN_AZI
## HAIL :288661 Min. : 0.000 N : 86752
## TSTM WIND :219940 1st Qu.: 0.000 W : 38446
## THUNDERSTORM WIND: 82563 Median : 0.000 S : 37558
## TORNADO : 60652 Mean : 1.484 E : 33178
## FLASH FLOOD : 54277 3rd Qu.: 1.000 NW : 24041
## FLOOD : 25326 Max. :3749.000 (Other):134990
## (Other) :170878 NA's :547332
Since the weather events are organized by when they occurred, we’ll check the distribution over the years to get an idea of the density over time.
# Convert begin dates to Date objects and plot them on a histogram
weather$BGN_DATE <- as.Date(weather$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
hist(weather$BGN_DATE,
col="steelblue",
xlab="Year", breaks=40,
main="Distribution of severe weather observations over time")
weather$END_DATE <- format(as.Date(weather$END_DATE, format = "%m/%d/%Y %H:%M:%S"), "%m/%d/%Y")
By eye, the number of observations starts increasing around 1990. Since data before this point are likely to be sparser and/or of lower quality, let’s check that we won’t be eliminating too much of the dataset by subsetting to dates from 1990 onward. We do this by looking at the decile cutoffs of the begin dates.
as.Date(quantile(as.numeric(weather$BGN_DATE), seq(0.1, 1, by=0.1)), origin="1970-01-01")
## 10% 20% 30% 40% 50%
## "1982-06-21" "1992-06-24" "1996-07-21" "1999-05-23" "2002-03-18"
## 60% 70% 80% 90% 100%
## "2004-06-02" "2006-07-13" "2008-06-17" "2010-06-11" "2011-11-30"
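The share of observations lost to a date cutoff can also be computed directly rather than read off the deciles. The sketch below uses a handful of made-up dates in place of weather$BGN_DATE; the variable names dates, cutoff, shareDropped, and kept are hypothetical.

```r
# Hypothetical dates standing in for weather$BGN_DATE.
dates <- as.Date(c("1975-04-03", "1989-12-31", "1990-01-01",
                   "1998-07-15", "2005-08-29", "2011-04-27"))
cutoff <- as.Date("1990-01-01")

# Proportion of observations that fall before the cutoff.
shareDropped <- mean(dates < cutoff, na.rm = TRUE)
shareDropped

# Date objects compare directly, so no as.numeric() conversion is needed.
# Excluding NA dates explicitly avoids all-NA rows in the subset.
kept <- dates[!is.na(dates) & dates >= cutoff]
length(kept)
```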
The 1990 cutoff falls between the 10% decile (1982) and the 20% decile (1992), so we’ll be eliminating fewer than 20% of the observations in order to ensure higher quality/frequency of observations. This will also make the dataset easier to work with, so let’s do it!
cutoff1990 <- as.numeric(as.Date("01/01/1990", format = "%m/%d/%Y"))
weather <- weather[as.numeric(weather$BGN_DATE) >= cutoff1990,]
# A second histogram would confirm that the distribution is more balanced, but the output is limited to 3 plots
#hist(weather$BGN_DATE,
# col="steelblue",
# xlab="Year", breaks=20,
# main="Distribution of severe weather observations over time, after date filtering")
After filtering, the distribution should be more balanced; the confirming histogram is commented out above because of the three-plot limit.
Now we can also subset the table to just the columns that we will be using for the weather consequences analysis. This will shrink the table and make the data easier to work with. Preserved columns are the following:
| Variable Name | Description |
|---|---|
| BGN_DATE | The date (mm-dd-yyyy) of the severe weather event |
| EVTYPE | The type of weather event e.g. thunderstorm |
| FATALITIES | Number of fatalities resulting from the weather event |
| INJURIES | Number of injuries resulting from the weather event |
| PROPDMG | Property damage amount in USD, before applying the PROPDMGEXP multiplier |
| PROPDMGEXP | Multiplier for property damage, according to NOAA code |
| CROPDMG | Agricultural crop damage amount in USD, before applying the CROPDMGEXP multiplier |
| CROPDMGEXP | Multiplier for crop damage, according to NOAA code |
weather <- weather[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Next, we have to process a few columns which use specific encoding of exponents in order to recover the original numeric values. Looking at the original codebook, there are two columns, PROPDMGEXP and CROPDMGEXP, that represent the size units of the PROPDMG and CROPDMG columns respectively.
levels(weather$PROPDMGEXP)
## [1] "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K" "m"
## [18] "M"
levels(weather$CROPDMGEXP)
## [1] "?" "0" "2" "B" "k" "K" "m" "M"
#So what are all possible exponent representations that we will have to deal with?
exps <- c(NA,union(levels(weather$PROPDMGEXP), levels(weather$CROPDMGEXP)))
exps
## [1] NA "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M" "k"
Though I could not find information regarding some codes (+, -, ?) in the NOAA database handbook, the others are easily interpretable and must be converted before they can be applied to the DMG columns. There’s no good way to do this automatically, so I will make a translation table and use it on both crop and property damage. Digits are read as powers of ten, h/H as hundreds, k/K as thousands, m/M as millions, and B as billions. I will assume that damage codes of +, -, and ?, as well as missing codes, translate to 1e0, meaning they leave the damage value unscaled.
translateExp <- data.frame("exp"=exps,
"num"=c(1e0, 1e0, 1e0, 1e0,
1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8,
1e9, 1e2, 1e2, 1e3, 1e6, 1e6, 1e3))
weather$CROPDMGabs <- weather$CROPDMG * translateExp$num[match(weather$CROPDMGEXP,translateExp$exp)]
weather$PROPDMGabs <- weather$PROPDMG * translateExp$num[match(weather$PROPDMGEXP,translateExp$exp)]
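To make the lookup step concrete, here is a minimal sketch of the same match()-based translation on a few made-up values. The vectors dmg and code and the reduced table lookup are hypothetical; only the technique mirrors the code above.

```r
# Toy damage values and exponent codes (hypothetical, not from the dataset).
dmg <- c(25, 2.5, 10, 3, 7)
code <- c("K", "M", NA, "B", "?")

# A reduced translation table in the same shape as translateExp.
lookup <- data.frame(exp = c(NA, "?", "K", "M", "B"),
                     num = c(1e0, 1e0, 1e3, 1e6, 1e9),
                     stringsAsFactors = FALSE)

# match() gives the row of `lookup` for each code; an NA code matches the NA row.
multiplier <- lookup$num[match(code, lookup$exp)]

# Scale each damage value by its multiplier.
absolute <- dmg * multiplier
absolute
```

Because match() treats NA as a matchable value, including NA in the table is what keeps records with a missing exponent from turning into NA damage.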
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
First, let’s aggregate injuries by weather type and take a look at the total/average injuries for each type of weather event.
# Aggregate by type and sort descending by # of injuries
injByType_total <- aggregate(INJURIES ~ EVTYPE, data=weather, sum, na.rm=T)
injByType_total <- injByType_total[order(-injByType_total$INJURIES),]
injByType_mean <- aggregate(INJURIES ~ EVTYPE, data=weather, mean, na.rm=T)
injByType_mean <- injByType_mean[order(-injByType_mean$INJURIES),]
tail(injByType_total)
## EVTYPE INJURIES
## 975 WINTER STORM/HIGH WINDS 0
## 977 Winter Weather 0
## 981 WINTERY MIX 0
## 982 Wintry mix 0
## 983 Wintry Mix 0
## 985 WND 0
Note here that we see 3 different entries for the same weather type (WINTERY MIX, Wintry Mix, Wintry mix), indicating some inconsistencies in coding the data collection. In a real-world scenario, these should probably be merged, but for this assignment they will be kept separate.
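For readers who do want to merge such labels, one possible approach is sketched below on made-up values. The spelling mapping is an assumption and far from complete; it is not applied anywhere in this analysis.

```r
# Hypothetical labels illustrating the inconsistency noted above.
evtype <- c("WINTERY MIX", "Wintry Mix", "Wintry mix", " WINTRY MIX", "TSTM WIND")

# Step 1: trim whitespace and upper-case, which merges case-only variants.
cleaned <- toupper(trimws(evtype))

# Step 2: map a known spelling variant onto one label (assumed, partial mapping).
cleaned <- sub("^WINTERY MIX$", "WINTRY MIX", cleaned)

# Count the records per cleaned label.
table(cleaned)
```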
We also check how many of them result in any injuries at all - cursory glances at the table show a lot of zeroes.
# Check how many weather event types result in any recorded injuries
injuriousTypes <- sum(injByType_total$INJURIES>0)/nrow(injByType_total)*100
Only about 16.0% of weather event types have any recorded injuries.
Next, let’s actually take a look at the most injuring weather types. We do this with a table to enumerate the top 10 most damaging weather types, in terms of total and average injuries reported/recorded.
Total Injuries
# Print the top 10 most injuring weather types, by total injuries
print(xtable(injByType_total[1:10,1:2]), type='html')
| | EVTYPE | INJURIES |
|---|---|---|
| 834 | TORNADO | 26674.00 |
| 170 | FLOOD | 6789.00 |
| 130 | EXCESSIVE HEAT | 6525.00 |
| 464 | LIGHTNING | 5230.00 |
| 856 | TSTM WIND | 5022.00 |
| 275 | HEAT | 2100.00 |
| 427 | ICE STORM | 1975.00 |
| 153 | FLASH FLOOD | 1777.00 |
| 760 | THUNDERSTORM WIND | 1488.00 |
| 972 | WINTER STORM | 1321.00 |
Average Injuries
# Print the top 10 most injuring weather types, by mean injuries
print(xtable(injByType_mean[1:10,1:2]), type='html')
| | EVTYPE | INJURIES |
|---|---|---|
| 277 | Heat Wave | 70.00 |
| 851 | TROPICAL STORM GORDON | 43.00 |
| 954 | WILD FIRES | 37.50 |
| 821 | THUNDERSTORMW | 27.00 |
| 366 | HIGH WIND AND SEAS | 20.00 |
| 656 | SNOW/HIGH WINDS | 18.00 |
| 224 | GLAZE/ICE STORM | 15.00 |
| 279 | HEAT WAVE DROUGHT | 15.00 |
| 973 | WINTER STORM HIGH WINDS | 15.00 |
| 411 | HURRICANE/TYPHOON | 14.49 |
As you can see, the top ten event types by total injuries differ from the top ten by average injuries. Tornadoes and floods have caused the most injuries overall since 1990, but heat waves, Tropical Storm Gordon, and wild fires are the most consistently injurious, with the highest average injuries per event.
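Because an average can be driven by a handful of events, it may help to report the number of events next to each mean. The sketch below does this on made-up data; the toy values and the names toy and stats are hypothetical.

```r
# Toy data (hypothetical): one rare, severe event type and one common type.
toy <- data.frame(
  EVTYPE   = c("Heat Wave", rep("TORNADO", 4)),
  INJURIES = c(70, 0, 2, 10, 40)
)

# Total, mean, and number of events per type in one pass.
stats <- aggregate(INJURIES ~ EVTYPE, data = toy,
                   FUN = function(x) c(total = sum(x), mean = mean(x), n = length(x)))

# aggregate() returns a matrix column here; flatten it into ordinary columns
# named INJURIES.total, INJURIES.mean, and INJURIES.n.
stats <- do.call(data.frame, stats)

# Sort by mean, keeping the event count visible for context.
stats[order(-stats$INJURIES.mean), ]
```

In this toy case the single "Heat Wave" record tops the ranking by mean even though the common type accounts for several events, which is the kind of pattern an event count would make visible.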
Next, let’s aggregate fatalities by weather type and take a look at the total/average deaths for each type of weather event.
# Aggregate by type and sort descending by # of fatalities
fatByType_total <- aggregate(FATALITIES ~ EVTYPE, data=weather, sum, na.rm=T)
fatByType_total <- fatByType_total[order(-fatByType_total$FATALITIES),]
fatByType_mean <- aggregate(FATALITIES ~ EVTYPE, data=weather, mean, na.rm=T)
fatByType_mean <- fatByType_mean[order(-fatByType_mean$FATALITIES),]
tail(fatByType_total)
## EVTYPE FATALITIES
## 977 Winter Weather 0
## 979 WINTER WEATHER MIX 0
## 981 WINTERY MIX 0
## 982 Wintry mix 0
## 983 Wintry Mix 0
## 985 WND 0
We also check how many of them result in any fatalities at all, as again, cursory glances at the table show a lot of zeroes.
# Check how many weather event types result in any recorded fatalities
fatalTypes <- sum(fatByType_total$FATALITIES>0)/nrow(fatByType_total)*100
Only about 17.1% of weather event types have any recorded deaths.
Next, let’s actually take a look at the most fatal weather types. We do this with a table to enumerate the top 10 most fatal weather types, in terms of total and average deaths reported/recorded.
Total Fatalities
# Print the top 10 most fatal weather types, by total fatalities
print(xtable(fatByType_total[1:10,1:2]), type='html')
| | EVTYPE | FATALITIES |
|---|---|---|
| 130 | EXCESSIVE HEAT | 1903.00 |
| 834 | TORNADO | 1752.00 |
| 153 | FLASH FLOOD | 978.00 |
| 275 | HEAT | 937.00 |
| 464 | LIGHTNING | 816.00 |
| 170 | FLOOD | 470.00 |
| 585 | RIP CURRENT | 368.00 |
| 856 | TSTM WIND | 327.00 |
| 359 | HIGH WIND | 248.00 |
| 19 | AVALANCHE | 224.00 |
Average Fatalities
# Print the top 10 most fatal weather types, by mean fatalities
print(xtable(fatByType_mean[1:10,1:2]), type='html')
| | EVTYPE | FATALITIES |
|---|---|---|
| 842 | TORNADOES, TSTM WIND, HAIL | 25.00 |
| 72 | COLD AND SNOW | 14.00 |
| 851 | TROPICAL STORM GORDON | 8.00 |
| 580 | RECORD/EXCESSIVE HEAT | 5.67 |
| 142 | EXTREME HEAT | 4.36 |
| 279 | HEAT WAVE DROUGHT | 4.00 |
| 373 | HIGH WIND/SEAS | 4.00 |
| 487 | MARINE MISHAP | 3.50 |
| 976 | WINTER STORMS | 3.33 |
| 340 | Heavy surf and wind | 3.00 |
As you can see, heat-related event types rank highly by both total and average deaths, although the exact labels in the two top-ten lists differ. Excessive heat, tornadoes, and flash floods have caused the most deaths overall since 1990, with the combined tornado/thunderstorm wind/hail category, cold and snow, and Tropical Storm Gordon being the most consistently fatal. It is worth noting that there is significant redundancy in the reported weather types (e.g. excessive heat, extreme heat, heat waves) and proper treatment of these event types would increase the accuracy of this analysis.
Now let’s combine the injury and fatality data and look at the aggregate as a reflection of population health effects.
allByType_total <- merge(injByType_total, fatByType_total, by="EVTYPE")
allByType_total$TOTAL <- allByType_total$INJURIES + allByType_total$FATALITIES
allByType_total <- allByType_total[order(-allByType_total$TOTAL),]
allByType_mean <- merge(injByType_mean, fatByType_mean, by="EVTYPE")
allByType_mean$TOTAL <- allByType_mean$INJURIES + allByType_mean$FATALITIES
allByType_mean <- allByType_mean[order(-allByType_mean$TOTAL),]
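As an alternative sketch, both outcome columns can be summed in a single aggregate() call by putting cbind() on the left-hand side of the formula, which avoids the merge step. The toy data and the name both are hypothetical. One assumption to keep in mind: the formula interface drops rows where either column is NA, so results could differ from two separate aggregations if the data contain missing values.

```r
# Toy data (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  INJURIES   = c(10, 5, 2, 0),
  FATALITIES = c(1, 0, 0, 3)
)

# Sum both outcome columns per event type in a single call.
both <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Combine into a total and sort in descending order, as in the analysis.
both$TOTAL <- both$INJURIES + both$FATALITIES
both <- both[order(-both$TOTAL), ]
both
```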
We then print the top 10 by total and by mean, sorted by the sum of injuries and fatalities caused (the total health events).
Total Health Events
print(xtable(allByType_total[1:10,1:4]), type='html')
| | EVTYPE | INJURIES | FATALITIES | TOTAL |
|---|---|---|---|---|
| 834 | TORNADO | 26674.00 | 1752.00 | 28426.00 |
| 130 | EXCESSIVE HEAT | 6525.00 | 1903.00 | 8428.00 |
| 170 | FLOOD | 6789.00 | 470.00 | 7259.00 |
| 464 | LIGHTNING | 5230.00 | 816.00 | 6046.00 |
| 856 | TSTM WIND | 5022.00 | 327.00 | 5349.00 |
| 275 | HEAT | 2100.00 | 937.00 | 3037.00 |
| 153 | FLASH FLOOD | 1777.00 | 978.00 | 2755.00 |
| 427 | ICE STORM | 1975.00 | 89.00 | 2064.00 |
| 760 | THUNDERSTORM WIND | 1488.00 | 133.00 | 1621.00 |
| 972 | WINTER STORM | 1321.00 | 206.00 | 1527.00 |
Mean Total Health Events
print(xtable(allByType_mean[1:10,1:4]), type='html')
| | EVTYPE | INJURIES | FATALITIES | TOTAL |
|---|---|---|---|---|
| 277 | Heat Wave | 70.00 | 0.00 | 70.00 |
| 851 | TROPICAL STORM GORDON | 43.00 | 8.00 | 51.00 |
| 954 | WILD FIRES | 37.50 | 0.75 | 38.25 |
| 821 | THUNDERSTORMW | 27.00 | 0.00 | 27.00 |
| 842 | TORNADOES, TSTM WIND, HAIL | 0.00 | 25.00 | 25.00 |
| 366 | HIGH WIND AND SEAS | 20.00 | 3.00 | 23.00 |
| 279 | HEAT WAVE DROUGHT | 15.00 | 4.00 | 19.00 |
| 656 | SNOW/HIGH WINDS | 18.00 | 0.00 | 18.00 |
| 973 | WINTER STORM HIGH WINDS | 15.00 | 1.00 | 16.00 |
| 411 | HURRICANE/TYPHOON | 14.49 | 0.73 | 15.22 |
We can visualize this in a two-panel horizontal barplot of the most harmful weather types (in descending order, sorted by total harm and total mean harm).
# Set up the plot margins, the two-panel layout, and the bar colors
par(mar=c(5,7,3,3), mfrow=c(1,2))
colorsUsed <- c("slateblue","red4")
# Plot total effects
barplot(t(as.matrix(allByType_total[20:1,2:3])),
horiz=T,
names.arg = allByType_total[20:1,1],
las=1,
main="Total population effects (fatalities/injuries)\n due to weather events since 1990",
xlab="Total injuries/fatalities",
col=colorsUsed,
cex.main=0.8,
cex.names = 0.7
)
legend("right", col=colorsUsed, legend = c("Injuries","Fatalities"), pch=19)
# Plot mean effects
barplot(t(as.matrix(allByType_mean[20:1,2:3])),
horiz=T,
names.arg = allByType_mean[20:1,1],
las=1,
main="Average population effects (fatalities/injuries)\n due to weather events since 1990",
xlab="Mean injuries/fatalities per event",
col=colorsUsed,
cex.main=0.8,
cex.names=0.7
)
legend("right", col=colorsUsed, legend = c("Injuries","Fatalities"), pch=19)
Tornadoes, excessive heat, and floods have caused the most population health effects overall since 1990, with heat waves, Tropical Storm Gordon, and wild fires being the most consistently damaging (injuries/fatalities).
Across the United States, which types of events have the greatest economic consequences?
First, let’s aggregate property damage by weather type and take a look at the total/average damage for each type of weather event.
# Aggregate by type and sort descending by property damage
propByType_total <- aggregate(PROPDMGabs ~ EVTYPE, data=weather, sum, na.rm=T)
propByType_total <- propByType_total[order(-propByType_total$PROPDMGabs),]
propByType_mean <- aggregate(PROPDMGabs ~ EVTYPE, data=weather, mean, na.rm=T)
propByType_mean <- propByType_mean[order(-propByType_mean$PROPDMGabs),]
tail(propByType_total)
## EVTYPE PROPDMGabs
## 974 WINTER STORM/HIGH WIND 0
## 975 WINTER STORM/HIGH WINDS 0
## 977 Winter Weather 0
## 981 WINTERY MIX 0
## 982 Wintry mix 0
## 985 WND 0
We also check how many of them result in any property damage at all - cursory glances at the table show a lot of zeroes.
# Check how many weather event types result in any recorded property damage
propTypes <- sum(propByType_total$PROPDMGabs>0)/nrow(propByType_total)*100
About 41.2% of weather event types have recorded property damage.
Next, let’s actually take a look at the most damaging weather types. We do this with a table to enumerate the top 10 most damaging weather types, in terms of total and average property damage reported/recorded.
Total Property Damage
# Print the top 10 most damaging weather types, by total damage
print(xtable(propByType_total[1:10,1:2]), type='html')
| | EVTYPE | PROPDMGabs |
|---|---|---|
| 170 | FLOOD | 144657709807.00 |
| 411 | HURRICANE/TYPHOON | 69305840000.00 |
| 670 | STORM SURGE | 43323536000.00 |
| 834 | TORNADO | 30468735506.50 |
| 153 | FLASH FLOOD | 16822673978.50 |
| 244 | HAIL | 15735267512.70 |
| 402 | HURRICANE | 11868319010.00 |
| 848 | TROPICAL STORM | 7703890550.00 |
| 972 | WINTER STORM | 6688497251.00 |
| 359 | HIGH WIND | 5270046295.00 |
Average Property Damage
# Print the top 10 most damaging weather types, by mean damage
print(xtable(propByType_mean[1:10,1:2]), type='html')
| | EVTYPE | PROPDMGabs |
|---|---|---|
| 842 | TORNADOES, TSTM WIND, HAIL | 1600000000.00 |
| 298 | HEAVY RAIN/SEVERE WEATHER | 1250000000.00 |
| 411 | HURRICANE/TYPHOON | 787566363.64 |
| 409 | HURRICANE OPAL | 352538444.44 |
| 670 | STORM SURGE | 165990559.39 |
| 954 | WILD FIRES | 156025000.00 |
| 410 | HURRICANE OPAL/HIGH WINDS | 100000000.00 |
| 604 | SEVERE THUNDERSTORM | 92720000.00 |
| 271 | HAILSTORM | 80333333.33 |
| 402 | HURRICANE | 68208729.94 |
As you can see, the top ten events in terms of total property damage differ slightly from the top event types in terms of average property damage. Floods, hurricanes, and storm surges have caused the most damage overall since 1990, but tornadoes, heavy rain, and hurricanes are the most consistently damaging to property.
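Totals of this size are easier to compare when expressed in billions of USD. The short sketch below reformats three values copied from the total property damage table; the names propTotals and labels are hypothetical.

```r
# Values copied from the total property damage table above.
propTotals <- c(FLOOD = 144657709807,
                `HURRICANE/TYPHOON` = 69305840000,
                `STORM SURGE` = 43323536000)

# Express each total in billions of USD with one decimal place.
labels <- sprintf("$%.1fB", unname(propTotals) / 1e9)

# Pair each event type with its formatted total.
data.frame(EVTYPE = names(propTotals), PROPDMG = labels)
```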
Next, let’s aggregate crop damage by weather type and take a look at the total/average recorded damage for each type of weather event.
# Aggregate by type and sort descending by crop damage
cropByType_total <- aggregate(CROPDMGabs ~ EVTYPE, data=weather, sum, na.rm=T)
cropByType_total <- cropByType_total[order(-cropByType_total$CROPDMGabs),]
cropByType_mean <- aggregate(CROPDMGabs ~ EVTYPE, data=weather, mean, na.rm=T)
cropByType_mean <- cropByType_mean[order(-cropByType_mean$CROPDMGabs),]
tail(cropByType_total)
## EVTYPE CROPDMGabs
## 980 WINTER WEATHER/MIX 0
## 981 WINTERY MIX 0
## 982 Wintry mix 0
## 983 Wintry Mix 0
## 984 WINTRY MIX 0
## 985 WND 0
We also check how many of them result in any crop damage at all, as again, cursory glances at the table show a lot of zeroes.
# Check how many weather event types result in any recorded crop damage
cropTypes <- sum(cropByType_total$CROPDMGabs>0)/nrow(cropByType_total)*100
Compared to property damage, less crop damage is reported, with only about 13.8% of weather event types having recorded crop damage.
Next, let’s actually take a look at the most damaging weather types. We do this with a table to enumerate the top 10 most damaging weather types, in terms of total and average crop damage reported/recorded.
Total Crop Damage
# Print the top 10 most damaging weather types, by total crop damage
print(xtable(cropByType_total[1:10,1:2]), type='html')
| | EVTYPE | CROPDMGabs |
|---|---|---|
| 95 | DROUGHT | 13972566000.00 |
| 170 | FLOOD | 5661968450.00 |
| 590 | RIVER FLOOD | 5029459000.00 |
| 427 | ICE STORM | 5022113500.00 |
| 244 | HAIL | 3025954473.00 |
| 402 | HURRICANE | 2741910000.00 |
| 411 | HURRICANE/TYPHOON | 2607872800.00 |
| 153 | FLASH FLOOD | 1421317100.00 |
| 140 | EXTREME COLD | 1292973000.00 |
| 212 | FROST/FREEZE | 1094086000.00 |
Average Crop Damage
# Print the top 10 most damaging weather types, by mean crop damage
print(xtable(cropByType_mean[1:10,1:2]), type='html')
| | EVTYPE | CROPDMGabs |
|---|---|---|
| 136 | EXCESSIVE WETNESS | 142000000.00 |
| 73 | COLD AND WET CONDITIONS | 66000000.00 |
| 87 | DAMAGING FREEZE | 43683333.33 |
| 121 | Early Frost | 42000000.00 |
| 411 | HURRICANE/TYPHOON | 29634918.18 |
| 590 | RIVER FLOOD | 29072017.34 |
| 406 | HURRICANE ERIN | 19430000.00 |
| 182 | FLOOD/RAIN/WINDS | 18800000.00 |
| 86 | Damaging Freeze | 17065000.00 |
| 402 | HURRICANE | 15758103.45 |
Drought, floods, and ice storms have caused the most crop damage overall since 1990, with excessive wetness, cold and wet conditions, and damaging freezes being the most consistently damaging. It is again worth noting that there is significant redundancy in the reported weather types and proper treatment of these event types would increase the accuracy of this analysis.
Now let’s combine the crop and property damage data and look at the aggregate as a reflection of overall economic consequences.
dmgByType_total <- merge(cropByType_total, propByType_total, by="EVTYPE")
dmgByType_total$TOTAL <- dmgByType_total$CROPDMGabs + dmgByType_total$PROPDMGabs
dmgByType_total <- dmgByType_total[order(-dmgByType_total$TOTAL),]
dmgByType_mean <- merge(cropByType_mean, propByType_mean, by="EVTYPE")
dmgByType_mean$TOTAL <- dmgByType_mean$CROPDMGabs + dmgByType_mean$PROPDMGabs
dmgByType_mean <- dmgByType_mean[order(-dmgByType_mean$TOTAL),]
We then print the top 10 by mean and total, sorted by the sum of damages caused.
Total damage caused since 1990
print(xtable(dmgByType_total[1:10,1:4]), type='html')
| | EVTYPE | CROPDMGabs | PROPDMGabs | TOTAL |
|---|---|---|---|---|
| 170 | FLOOD | 5661968450.00 | 144657709807.00 | 150319678257.00 |
| 411 | HURRICANE/TYPHOON | 2607872800.00 | 69305840000.00 | 71913712800.00 |
| 670 | STORM SURGE | 5000.00 | 43323536000.00 | 43323541000.00 |
| 834 | TORNADO | 414953270.00 | 30468735506.50 | 30883688776.50 |
| 244 | HAIL | 3025954473.00 | 15735267512.70 | 18761221985.70 |
| 153 | FLASH FLOOD | 1421317100.00 | 16822673978.50 | 18243991078.50 |
| 95 | DROUGHT | 13972566000.00 | 1046106000.00 | 15018672000.00 |
| 402 | HURRICANE | 2741910000.00 | 11868319010.00 | 14610229010.00 |
| 590 | RIVER FLOOD | 5029459000.00 | 5118945500.00 | 10148404500.00 |
| 427 | ICE STORM | 5022113500.00 | 3944927860.00 | 8967041360.00 |
Mean damage caused per weather event
print(xtable(dmgByType_mean[1:10,1:4]), type='html')
| | EVTYPE | CROPDMGabs | PROPDMGabs | TOTAL |
|---|---|---|---|---|
| 842 | TORNADOES, TSTM WIND, HAIL | 2500000.00 | 1600000000.00 | 1602500000.00 |
| 298 | HEAVY RAIN/SEVERE WEATHER | 0.00 | 1250000000.00 | 1250000000.00 |
| 411 | HURRICANE/TYPHOON | 29634918.18 | 787566363.64 | 817201281.82 |
| 409 | HURRICANE OPAL | 2111111.11 | 352538444.44 | 354649555.56 |
| 670 | STORM SURGE | 19.16 | 165990559.39 | 165990578.54 |
| 954 | WILD FIRES | 0.00 | 156025000.00 | 156025000.00 |
| 136 | EXCESSIVE WETNESS | 142000000.00 | 0.00 | 142000000.00 |
| 410 | HURRICANE OPAL/HIGH WINDS | 10000000.00 | 100000000.00 | 110000000.00 |
| 604 | SEVERE THUNDERSTORM | 15384.62 | 92720000.00 | 92735384.62 |
| 402 | HURRICANE | 15758103.45 | 68208729.94 | 83966833.39 |
We can visualize this in a two-panel horizontal barplot of the most harmful weather types (in descending order, sorted by total harm and total mean harm).
# Set up the plot margins, the two-panel layout, and the bar colors
par(mar=c(5,7,3,3), mfrow=c(1,2))
colorsUsed <- c("darkgreen","navajowhite4")
# Plot total effects
barplot(t(as.matrix(dmgByType_total[20:1,2:3])),
horiz=T,
names.arg = dmgByType_total[20:1,1],
las=1,
main="Total economic damage (crop/property)\n due to weather events since 1990",
xlab="Total damage ($ USD)",
col=colorsUsed,
cex.main=0.8,
cex.names=0.6
)
legend("right", col=colorsUsed, legend = c("Crop Dmg","Property Dmg"), pch=19)
# Plot mean effects
barplot(t(as.matrix(dmgByType_mean[20:1,2:3])),
horiz=T,
names.arg = dmgByType_mean[20:1,1],
las=1,
main="Average economic damage (crop/property)\n due to weather events since 1990",
xlab="Mean damage per event ($ USD)",
col=colorsUsed,
cex.main=0.8,
cex.names=0.6
)
legend("right", col=colorsUsed, legend = c("Crop Dmg","Property Dmg"), pch=19)
Floods, hurricanes, and storm surges have caused the most damage overall since 1990, with tornadoes, heavy rain, and hurricanes being the most consistently damaging.
In terms of population health, tornadoes, excessive heat, and floods have caused the most population health effects overall since 1990, with heat waves, Tropical Storm Gordon, and wild fires being the most consistently damaging (injuries/fatalities).
Economically, floods, hurricanes, and storm surges have caused the most damage overall since 1990, with tornadoes, heavy rain, and hurricanes being the most consistently damaging. This largely reflects the property damage data. Looking at agricultural damage alone, drought, floods, and ice storms have caused the most damage overall since 1990, with excessive wetness, cold and wet conditions, and damaging freezes being the most consistently damaging.
It is worth noting that there is significant redundancy in the reported weather types (e.g. excessive heat, extreme heat, heat waves) as well as inconsistent coding of event types over time (e.g. Wintry mix, Wintery mix, WINTRY MIX) and proper pre-treatment of these event types would increase the accuracy of this analysis.