Tornadoes are, by a significant margin, the most destructive weather events for population health, followed by heat. Heat events, however, have a higher proportion of fatalities among their total casualties than tornadoes do.
The worst weather events for property and crop damage are floods, followed by hurricanes. Most of the economic damage is property damage.
The storm data is compressed as a bz2 file, which can be read directly into R.
storms <- read.csv("repdata_data_StormData.csv.bz2")
head(storms)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
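The direct read works because R's file connections detect bzip2 compression when a file is opened for reading. The sketch below illustrates this with a small round trip; the `toy` data frame and the temporary file are made up for illustration and are not part of the storm data.

```r
# Minimal sketch: write a tiny bzip2-compressed CSV, then read it back.
# The data frame and file path here are hypothetical.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(0, 2))
path <- tempfile(fileext = ".csv.bz2")

con <- bzfile(path, open = "w")          # compress while writing
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() needs no special argument: the compression is detected on read
roundtrip <- read.csv(path)
print(roundtrip)
```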
The variables we are interested in to determine the events that pose the biggest threat to population health are EVTYPE (event type - class character), FATALITIES (class numeric), and INJURIES (class numeric). These variables are taken out of the main data table and stored separately for ease of analysis.
health <- subset(storms, select = c(EVTYPE, FATALITIES, INJURIES))
head(health)
## EVTYPE FATALITIES INJURIES
## 1 TORNADO 0 15
## 2 TORNADO 0 0
## 3 TORNADO 0 2
## 4 TORNADO 0 2
## 5 TORNADO 0 2
## 6 TORNADO 0 6
The fatalities and injuries are summed up over each event type and the outputs combined in one data table called healthsummary.
healthsummary <- aggregate(cbind(health$FATALITIES, health$INJURIES), list(health$EVTYPE), sum)
colnames(healthsummary) <- c("event", "fatalities", "injuries")
head(healthsummary)
## event fatalities injuries
## 1 HIGH SURF ADVISORY 0 0
## 2 COASTAL FLOOD 0 0
## 3 FLASH FLOOD 0 0
## 4 LIGHTNING 0 0
## 5 TSTM WIND 0 0
## 6 TSTM WIND (G45) 0 0
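The same per-event sums can also be written with the formula interface of aggregate(), which keeps the original column names so they do not need to be reassigned afterwards. This is an alternative sketch on made-up data (`toy` and `toysummary` are hypothetical names), not the code used in this analysis.

```r
# Minimal sketch of aggregate() with a formula, using toy data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 0, 2, 3),
  INJURIES   = c(10, 5, 0, 4),
  stringsAsFactors = FALSE
)

# Sum both measures within each event type
toysummary <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                        data = toy, FUN = sum)
print(toysummary)
```

One difference to keep in mind: by default the formula interface drops rows that contain NA in any of the variables used.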
Event types with no fatalities and no injuries are removed. Fatalities and injuries are then added together and the total is placed in a new column.
smallhealthsummary <- subset(healthsummary, fatalities !=0 | injuries != 0)
smallhealthsummary$total <- smallhealthsummary$fatalities + smallhealthsummary$injuries
head(smallhealthsummary)
## event fatalities injuries total
## 18 AVALANCE 1 0 1
## 19 AVALANCHE 224 170 394
## 29 BLACK ICE 1 24 25
## 30 BLIZZARD 101 805 906
## 42 blowing snow 1 1 2
## 44 BLOWING SNOW 1 13 14
There are many rows in “smallhealthsummary” that could be combined (for example, AVALANCE and AVALANCHE). To determine how precisely the data need to be processed, the row with the maximum total fatalities and injuries is pulled.
smallhealthsummary[which.max(smallhealthsummary$total), ]
## event fatalities injuries total
## 834 TORNADO 5633 91346 96979
Since the maximum combined count of fatalities and injuries is nearly 100,000, any row with a total of 1,000 or fewer, roughly 1% of the maximum, is culled from the data set. This approximation should not skew the analysis too heavily, while simplifying the data table significantly.
finalhealthsummary <- subset(smallhealthsummary, total > 1000)
print(finalhealthsummary)
## event fatalities injuries total
## 130 EXCESSIVE HEAT 1903 6525 8428
## 153 FLASH FLOOD 978 1777 2755
## 170 FLOOD 470 6789 7259
## 244 HAIL 15 1361 1376
## 275 HEAT 937 2100 3037
## 310 HEAVY SNOW 127 1021 1148
## 359 HIGH WIND 248 1137 1385
## 411 HURRICANE/TYPHOON 64 1275 1339
## 427 ICE STORM 89 1975 2064
## 464 LIGHTNING 816 5230 6046
## 760 THUNDERSTORM WIND 133 1488 1621
## 834 TORNADO 5633 91346 96979
## 856 TSTM WIND 504 6957 7461
## 972 WINTER STORM 206 1321 1527
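The cutoff can also be derived from the data instead of being typed in by hand. The sketch below assumes a made-up table (`toy`, with hypothetical events A to D) and keeps only rows above 1% of the largest total.

```r
# Minimal sketch: compute a 1% cutoff from the maximum, using toy totals.
toy <- data.frame(
  event = c("A", "B", "C", "D"),
  total = c(1000, 120, 9, 3),
  stringsAsFactors = FALSE
)

cutoff <- 0.01 * max(toy$total)      # 1% of the largest total
kept <- subset(toy, total > cutoff)  # rows above the cutoff
print(kept)
```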
This data table still contains a few rows with data that need to be combined. For instance, “Flood” and “Flash Flood” are combined as “Flood”. The replacements below refer to each row by its position in the table printed above.
finalhealthsummary$event <- replace(finalhealthsummary$event, 1, "HEAT")
finalhealthsummary$event <- replace(finalhealthsummary$event, 2, "FLOOD")
finalhealthsummary$event <- replace(finalhealthsummary$event, 6, "WINTER STORM")
finalhealthsummary$event <- replace(finalhealthsummary$event, 7, "WIND")
finalhealthsummary$event <- replace(finalhealthsummary$event, 8, "HURRICANE")
finalhealthsummary$event <- replace(finalhealthsummary$event, 9, "WINTER STORM")
finalhealthsummary$event <- replace(finalhealthsummary$event, 11, "WIND")
finalhealthsummary$event <- replace(finalhealthsummary$event, 13, "WIND")
print(finalhealthsummary)
## event fatalities injuries total
## 130 HEAT 1903 6525 8428
## 153 FLOOD 978 1777 2755
## 170 FLOOD 470 6789 7259
## 244 HAIL 15 1361 1376
## 275 HEAT 937 2100 3037
## 310 WINTER STORM 127 1021 1148
## 359 WIND 248 1137 1385
## 411 HURRICANE 64 1275 1339
## 427 WINTER STORM 89 1975 2064
## 464 LIGHTNING 816 5230 6046
## 760 WIND 133 1488 1621
## 834 TORNADO 5633 91346 96979
## 856 WIND 504 6957 7461
## 972 WINTER STORM 206 1321 1527
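Because the replacements above depend on row positions, they would need to be updated if the filtered table changed. A name-based lookup avoids that. The sketch below is an alternative, not the code used in this analysis; `events`, `lookup`, and `recoded` are illustrative names, and only a few of the event labels are shown.

```r
# Minimal sketch: recode event labels by name rather than by row position.
events <- c("EXCESSIVE HEAT", "FLASH FLOOD", "FLOOD", "HAIL", "HEAT", "TSTM WIND")

# Old label on the left, combined category on the right
lookup <- c("EXCESSIVE HEAT" = "HEAT",
            "FLASH FLOOD"    = "FLOOD",
            "TSTM WIND"      = "WIND")

# Labels found in the lookup are replaced; all others are kept unchanged
recoded <- unname(ifelse(events %in% names(lookup), lookup[events], events))
print(recoded)
```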
These values are then summed up to make a table with unique rows.
finalhealthsummary <- aggregate(cbind(finalhealthsummary$fatalities, finalhealthsummary$injuries, finalhealthsummary$total), list(finalhealthsummary$event), sum)
colnames(finalhealthsummary) <- c("event", "fatalities", "injuries", "total")
print(finalhealthsummary)
## event fatalities injuries total
## 1 FLOOD 1448 8566 10014
## 2 HAIL 15 1361 1376
## 3 HEAT 2840 8625 11465
## 4 HURRICANE 64 1275 1339
## 5 LIGHTNING 816 5230 6046
## 6 TORNADO 5633 91346 96979
## 7 WIND 885 9582 10467
## 8 WINTER STORM 422 4317 4739
The variables we are interested in to determine the events that have the greatest economic consequences are EVTYPE (event type - class character), PROPDMG (class numeric), PROPDMGEXP (class character), CROPDMG (class numeric), and CROPDMGEXP (class character). These variables are taken out of the main data table and stored separately for ease of analysis.
damage <- subset(storms, select = c(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))
nrow(damage)
## [1] 902297
head(damage)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
Because the damage is listed with various exponents, the values of PROPDMG and CROPDMG cannot simply be summed to get the total amount of damage for each event. First, PROPDMG and CROPDMG have to be converted to their actual amounts, taking the exponent into account. To work with as little data as possible, any rows in which both PROPDMG and CROPDMG are zero are removed first.
smalldamage <- subset(damage, PROPDMG !=0 | CROPDMG != 0)
nrow(smalldamage)
## [1] 245031
head(smalldamage)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 25.0 K 0
## 2 TORNADO 2.5 K 0
## 3 TORNADO 25.0 K 0
## 4 TORNADO 2.5 K 0
## 5 TORNADO 2.5 K 0
## 6 TORNADO 2.5 K 0
Next, the exponents that appear in the data, and the number of rows that use each one, are tabulated. To do this, the dplyr package is used.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked _by_ '.GlobalEnv':
##
## storms
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
The exponent options and counts for property damage and crop damage are shown below:
smalldamage %>% count(PROPDMGEXP)
## PROPDMGEXP n
## 1 4357
## 2 - 1
## 3 + 5
## 4 0 209
## 5 2 1
## 6 3 1
## 7 4 4
## 8 5 18
## 9 6 3
## 10 7 2
## 11 B 40
## 12 h 1
## 13 H 6
## 14 K 229057
## 15 m 7
## 16 M 11319
smalldamage %>% count(CROPDMGEXP)
## CROPDMGEXP n
## 1 145037
## 2 ? 6
## 3 0 17
## 4 B 7
## 5 k 21
## 6 K 97960
## 7 m 1
## 8 M 1982
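The same kind of tally can be produced in base R with table(), without loading a package. A minimal sketch on a made-up vector of exponent codes (`exps` is hypothetical):

```r
# Minimal sketch: count distinct exponent codes with base R.
exps <- c("K", "K", "", "M", "m", "B", "K", "?")

counts <- table(exps)   # one count per distinct code, including ""
print(counts)
```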
The storm data documentation states on page 12 that the allowable entries for an exponent are “K” for thousands, “M” for millions, and “B” for billions. Lowercase “k” and “m” are treated as their uppercase equivalents. All other entries are treated as missing data, and the column is then converted to a numeric class. The code and results for the PROPDMGEXP column are as follows:
# Replacing the valid exponents with numbers
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "K", 1000)
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "M", 1000000)
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "m", 1000000)
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "B", 1000000000)
# Replacing the invalid exponent entries with NA
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "-", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "+", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "0", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "2", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "3", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "4", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "5", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "6", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "7", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "h", "NA")
smalldamage$PROPDMGEXP <- replace(smalldamage$PROPDMGEXP, smalldamage$PROPDMGEXP == "H", "NA")
smalldamage$PROPDMGEXP <- as.numeric(smalldamage$PROPDMGEXP)
## Warning: NAs introduced by coercion
smalldamage %>% count(PROPDMGEXP)
## PROPDMGEXP n
## 1 1e+03 229057
## 2 1e+06 11326
## 3 1e+09 40
## 4 NA 4608
class(smalldamage$PROPDMGEXP)
## [1] "numeric"
The same is done for the CROPDMGEXP column.
# Replacing the valid exponents with numbers
smalldamage$CROPDMGEXP <- replace(smalldamage$CROPDMGEXP, smalldamage$CROPDMGEXP == "K", 1000)
smalldamage$CROPDMGEXP <- replace(smalldamage$CROPDMGEXP, smalldamage$CROPDMGEXP == "k", 1000)
smalldamage$CROPDMGEXP <- replace(smalldamage$CROPDMGEXP, smalldamage$CROPDMGEXP == "M", 1000000)
smalldamage$CROPDMGEXP <- replace(smalldamage$CROPDMGEXP, smalldamage$CROPDMGEXP == "m", 1000000)
smalldamage$CROPDMGEXP <- replace(smalldamage$CROPDMGEXP, smalldamage$CROPDMGEXP == "B", 1000000000)
# Replacing the invalid exponent entries with NA
smalldamage$CROPDMGEXP <- replace(smalldamage$CROPDMGEXP, smalldamage$CROPDMGEXP == "", "NA")
smalldamage$CROPDMGEXP <- replace(smalldamage$CROPDMGEXP, smalldamage$CROPDMGEXP == "?", "NA")
smalldamage$CROPDMGEXP <- replace(smalldamage$CROPDMGEXP, smalldamage$CROPDMGEXP == "0", "NA")
smalldamage$CROPDMGEXP <- as.numeric(smalldamage$CROPDMGEXP)
## Warning: NAs introduced by coercion
smalldamage %>% count(CROPDMGEXP)
## CROPDMGEXP n
## 1 1e+03 97981
## 2 1e+06 1983
## 3 1e+09 7
## 4 NA 145060
class(smalldamage$CROPDMGEXP)
## [1] "numeric"
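The same mapping can be written more compactly with a named lookup vector. The sketch below is an alternative, not the code used in this analysis; `multiplier` and the toy `exps` vector are made up for illustration. Upper-casing first treats "k" and "m" like "K" and "M", and any code that is missing from the lookup becomes NA.

```r
# Minimal sketch: map exponent codes to numeric multipliers by name.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

exps <- c("K", "m", "", "B", "5", "?", "k")

# Codes not present in the lookup (including "") yield NA
values <- unname(multiplier[toupper(exps)])
print(values)
```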
From here, the actual damage value is calculated for both, with each being assigned to a new column.
smalldamage$PROPDMGVALUE <- smalldamage$PROPDMG * smalldamage$PROPDMGEXP
smalldamage$CROPDMGVALUE <- smalldamage$CROPDMG * smalldamage$CROPDMGEXP
head(smalldamage)
## EVTYPE PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPDMGVALUE CROPDMGVALUE
## 1 TORNADO 25.0 1000 0 NA 25000 NA
## 2 TORNADO 2.5 1000 0 NA 2500 NA
## 3 TORNADO 25.0 1000 0 NA 25000 NA
## 4 TORNADO 2.5 1000 0 NA 2500 NA
## 5 TORNADO 2.5 1000 0 NA 2500 NA
## 6 TORNADO 2.5 1000 0 NA 2500 NA
The property damage and crop damage values are summed up over each event type and the outputs combined in one data table called damagesummary. NA values are ignored for these calculations.
damagesummary <- aggregate(cbind(smalldamage$PROPDMGVALUE, smalldamage$CROPDMGVALUE), list(smalldamage$EVTYPE), sum, na.rm=TRUE)
colnames(damagesummary) <- c("event", "propdmg", "cropdmg")
head(damagesummary)
## event propdmg cropdmg
## 1 HIGH SURF ADVISORY 200000 0
## 2 FLASH FLOOD 50000 0
## 3 TSTM WIND 8100000 0
## 4 TSTM WIND (G45) 8000 0
## 5 ? 5000 0
## 6 AGRICULTURAL FREEZE 0 28820000
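The effect of na.rm = TRUE can be seen on a small example. Multiplying by a missing exponent gives NA, and a plain sum over a group containing NA would itself be NA. The `toy` data frame below is made up for illustration.

```r
# Minimal sketch: NA exponents propagate through the product,
# and na.rm = TRUE leaves them out of the sum.
toy <- data.frame(
  CROPDMG    = c(0, 2, 5),
  CROPDMGEXP = c(NA, 1e3, 1e6)
)
toy$CROPDMGVALUE <- toy$CROPDMG * toy$CROPDMGEXP   # NA, 2000, 5e6

plain_sum <- sum(toy$CROPDMGVALUE)                 # NA
clean_sum <- sum(toy$CROPDMGVALUE, na.rm = TRUE)   # 5002000
print(c(plain_sum, clean_sum))
```

Note that a nonzero damage amount paired with a missing exponent is also left out of the total under this approach.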
The property damage and crop damage values are added up and the total is placed in a new column.
damagesummary$total <- damagesummary$propdmg + damagesummary$cropdmg
head(damagesummary)
## event propdmg cropdmg total
## 1 HIGH SURF ADVISORY 200000 0 200000
## 2 FLASH FLOOD 50000 0 50000
## 3 TSTM WIND 8100000 0 8100000
## 4 TSTM WIND (G45) 8000 0 8000
## 5 ? 5000 0 5000
## 6 AGRICULTURAL FREEZE 0 28820000 28820000
As with the population health data, there are many rows in damagesummary that could be combined. To determine how precisely the data need to be processed, the row with the maximum total damage value is pulled.
damagesummary[which.max(damagesummary$total), ]
## event propdmg cropdmg total
## 72 FLOOD 144657709800 5661968450 150319678250
Since the maximum combined property and crop damage is roughly $150 billion, any row with $1,500,000,000 or less in total damage, about 1% of the maximum, is culled from the data set. This approximation should not skew the analysis too heavily, while simplifying the data table significantly.
smalldamagesummary <- subset(damagesummary, total > 1500000000)
print(smalldamagesummary)
## event propdmg cropdmg total
## 39 DROUGHT 1046106000 13972566000 15018672000
## 59 FLASH FLOOD 16140811510 1421317100 17562128610
## 72 FLOOD 144657709800 5661968450 150319678250
## 116 HAIL 15732266720 3025954450 18758221170
## 142 HEAVY RAIN/SEVERE WEATHER 2500000000 0 2500000000
## 174 HIGH WIND 5270046260 638571300 5908617560
## 189 HURRICANE 11868319010 2741910000 14610229010
## 195 HURRICANE OPAL 3172846000 19000000 3191846000
## 197 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 206 ICE STORM 3944927810 5022113500 8967041310
## 262 RIVER FLOOD 5118945500 5029459000 10148404500
## 299 STORM SURGE 43323536000 5000 43323541000
## 300 STORM SURGE/TIDE 4641188000 850000 4642038000
## 313 THUNDERSTORM WIND 3483121140 414843050 3897964190
## 328 THUNDERSTORM WINDS 1735952850 190654700 1926607550
## 354 TORNADO 56937160480 414953110 57352113590
## 360 TORNADOES, TSTM WIND, HAIL 1600000000 2500000 1602500000
## 363 TROPICAL STORM 7703890550 678346000 8382236550
## 369 TSTM WIND 4484928440 554007350 5038935790
## 412 WILD/FOREST FIRE 3001829500 106796830 3108626330
## 414 WILDFIRE 4765114000 295472800 5060586800
## 424 WINTER STORM 6688497250 26944000 6715441250
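The magnitude of the largest total and of the 1% cutoff can be checked with a little arithmetic. The figure used below is the FLOOD total reported above; the variable names are illustrative.

```r
# Minimal sketch: express the maximum total in billions and derive 1% of it.
max_total <- 150319678250          # FLOOD total from the damage summary

in_billions <- max_total / 1e9     # about 150.3
cutoff <- 0.01 * max_total         # about 1.5 billion

print(round(in_billions, 1))
print(cutoff)
```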
This data table still contains a few rows with data that need to be combined. For instance, “Flood” and “Flash Flood” are combined as “Flood”.
smalldamagesummary$event <- replace(smalldamagesummary$event, 2, "FLOOD")
smalldamagesummary$event <- replace(smalldamagesummary$event, 5, "WIND")
smalldamagesummary$event <- replace(smalldamagesummary$event, 6, "WIND")
smalldamagesummary$event <- replace(smalldamagesummary$event, 8, "HURRICANE")
smalldamagesummary$event <- replace(smalldamagesummary$event, 9, "HURRICANE")
smalldamagesummary$event <- replace(smalldamagesummary$event, 10, "WINTER STORM")
smalldamagesummary$event <- replace(smalldamagesummary$event, 11, "FLOOD")
smalldamagesummary$event <- replace(smalldamagesummary$event, 13, "STORM SURGE")
smalldamagesummary$event <- replace(smalldamagesummary$event, 14, "WIND")
smalldamagesummary$event <- replace(smalldamagesummary$event, 15, "WIND")
smalldamagesummary$event <- replace(smalldamagesummary$event, 17, "TORNADO")
smalldamagesummary$event <- replace(smalldamagesummary$event, 18, "HURRICANE")
smalldamagesummary$event <- replace(smalldamagesummary$event, 19, "WIND")
smalldamagesummary$event <- replace(smalldamagesummary$event, 20, "WILDFIRE")
print(smalldamagesummary)
## event propdmg cropdmg total
## 39 DROUGHT 1046106000 13972566000 15018672000
## 59 FLOOD 16140811510 1421317100 17562128610
## 72 FLOOD 144657709800 5661968450 150319678250
## 116 HAIL 15732266720 3025954450 18758221170
## 142 WIND 2500000000 0 2500000000
## 174 WIND 5270046260 638571300 5908617560
## 189 HURRICANE 11868319010 2741910000 14610229010
## 195 HURRICANE 3172846000 19000000 3191846000
## 197 HURRICANE 69305840000 2607872800 71913712800
## 206 WINTER STORM 3944927810 5022113500 8967041310
## 262 FLOOD 5118945500 5029459000 10148404500
## 299 STORM SURGE 43323536000 5000 43323541000
## 300 STORM SURGE 4641188000 850000 4642038000
## 313 WIND 3483121140 414843050 3897964190
## 328 WIND 1735952850 190654700 1926607550
## 354 TORNADO 56937160480 414953110 57352113590
## 360 TORNADO 1600000000 2500000 1602500000
## 363 HURRICANE 7703890550 678346000 8382236550
## 369 WIND 4484928440 554007350 5038935790
## 412 WILDFIRE 3001829500 106796830 3108626330
## 414 WILDFIRE 4765114000 295472800 5060586800
## 424 WINTER STORM 6688497250 26944000 6715441250
These values are then summed up to make a table with unique rows.
finaldamagesummary <- aggregate(cbind(smalldamagesummary$propdmg, smalldamagesummary$cropdmg, smalldamagesummary$total), list(smalldamagesummary$event), sum)
colnames(finaldamagesummary) <- c("event", "propdmg", "cropdmg", "total")
print(finaldamagesummary)
## event propdmg cropdmg total
## 1 DROUGHT 1046106000 13972566000 15018672000
## 2 FLOOD 165917466810 12112744550 178030211360
## 3 HAIL 15732266720 3025954450 18758221170
## 4 HURRICANE 92050895560 6047128800 98098024360
## 5 STORM SURGE 47964724000 855000 47965579000
## 6 TORNADO 58537160480 417453110 58954613590
## 7 WILDFIRE 7766943500 402269630 8169213130
## 8 WIND 17474048690 1798076400 19272125090
## 9 WINTER STORM 10633425060 5049057500 15682482560
The data for fatalities and injuries are formatted to make them easy to plot as a bar graph. First, the log10 of the fatalities and injuries columns is taken to condense the range of values that must be plotted. The two log values are then added together and the table is sorted by this sum in decreasing order, so that the most hazardous event comes first in the plotted data.
# Take log10 of both fatalities and injuries columns
finalhealthsummary$fatalities <- log10(finalhealthsummary$fatalities)
finalhealthsummary$injuries <- log10(finalhealthsummary$injuries)
# Add values and order data
finalhealthsummary$total <- finalhealthsummary$fatalities + finalhealthsummary$injuries
finalhealthsummary <- finalhealthsummary[order(finalhealthsummary$total, decreasing=TRUE), ]
# Format table so barplot can graph the data
healthplot <- rbind(finalhealthsummary$fatalities, finalhealthsummary$injuries)
colnames(healthplot) <- finalhealthsummary$event
print(healthplot)
## TORNADO HEAT FLOOD WIND LIGHTNING WINTER STORM HURRICANE
## [1,] 3.75074 3.453318 3.160769 2.946943 2.911690 2.625312 1.80618
## [2,] 4.96069 3.935759 3.932778 3.981456 3.718502 3.635182 3.10551
## HAIL
## [1,] 1.176091
## [2,] 3.133858
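Adding two log10 values is the same as taking the log10 of the product, so sorting by this sum ranks events by fatalities times injuries rather than by the raw combined count. The sketch below uses three rows from the combined health table above to show that the two orderings can differ; the column names `raw_total` and `log_total` are illustrative.

```r
# Minimal sketch: compare ordering by raw total with ordering by summed logs.
tbl <- data.frame(
  event      = c("FLOOD", "HEAT", "WIND"),
  fatalities = c(1448, 2840, 885),
  injuries   = c(8566, 8625, 9582),
  stringsAsFactors = FALSE
)

tbl$raw_total <- tbl$fatalities + tbl$injuries
tbl$log_total <- log10(tbl$fatalities) + log10(tbl$injuries)

by_raw <- tbl$event[order(tbl$raw_total, decreasing = TRUE)]
by_log <- tbl$event[order(tbl$log_total, decreasing = TRUE)]

print(by_raw)   # ordering by fatalities + injuries
print(by_log)   # ordering by log10(fatalities) + log10(injuries)
```

Here WIND has a larger raw total than FLOOD, while FLOOD has the larger sum of logs, which matches the column order in the printed matrix.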
The data can be plotted as a stacked bar graph to reveal that tornadoes are the most hazardous weather event for population health, in both the number of fatalities and number of injuries. Heat events, which come in second, have a larger percentage of fatalities.
par(mar = c(4,11,4,2))
barplot(healthplot, main = "Most Hazardous Weather Events to Population Health",
horiz=TRUE,
xlab = "log10(Health Event)",
las=1,
legend.text=c("Fatalities", "Injuries"),
col = rainbow(2),
axes = TRUE)
title(ylab="Weather Event", line=9)
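With horiz = TRUE, barplot() draws the first column of the matrix at the bottom of the chart, so a table sorted in decreasing order places its largest bar at the bottom. If the largest bar is wanted at the top, the columns can be reversed before plotting. The sketch below assumes a made-up matrix (`toyplot`) and draws to a null device so that no file is created.

```r
# Minimal sketch: reverse column order so the largest bar appears on top.
toyplot <- rbind(c(3, 2, 1), c(4, 3, 2))
colnames(toyplot) <- c("A", "B", "C")   # sorted largest to smallest

reversed <- toyplot[, ncol(toyplot):1]  # columns now C, B, A

pdf(NULL)                               # null device: nothing is written to disk
mids <- barplot(reversed, horiz = TRUE, las = 1,
                legend.text = c("Fatalities", "Injuries"),
                col = rainbow(2))
invisible(dev.off())
```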
The property and crop damage values are also formatted to make them easy to plot as a bar graph. First, the property damage and crop damage columns are divided by 10,000,000,000 so that they are expressed in tens of billions of dollars, which gives smaller numbers that are easier to compare. The scaled property and crop damage values are then added together and the table is sorted by this total in decreasing order, so that the most costly event comes first in the plotted data.
# Scale property damage and crop damage to tens of billions of dollars
finaldamagesummary$propdmg <- finaldamagesummary$propdmg/10000000000
finaldamagesummary$cropdmg <- finaldamagesummary$cropdmg/10000000000
# Add values and order data
finaldamagesummary$total <- finaldamagesummary$propdmg + finaldamagesummary$cropdmg
finaldamagesummary <- finaldamagesummary[order(finaldamagesummary$total, decreasing=TRUE), ]
# Format table so barplot can graph the data
damageplot <- rbind(finaldamagesummary$propdmg, finaldamagesummary$cropdmg)
colnames(damageplot) <- finaldamagesummary$event
print(damageplot)
## FLOOD HURRICANE TORNADO STORM SURGE WIND HAIL
## [1,] 16.591747 9.2050896 5.85371605 4.7964724 1.7474049 1.5732267
## [2,] 1.211274 0.6047129 0.04174531 0.0000855 0.1798076 0.3025954
## WINTER STORM DROUGHT WILDFIRE
## [1,] 1.0633425 0.1046106 0.77669435
## [2,] 0.5049058 1.3972566 0.04022696
The data can be plotted as a stacked bar graph to reveal that floods are the most destructive weather event in terms of economic consequences, with the vast majority of the damage occurring to property. Hurricanes, which come in second, also cause mostly property damage. The only weather event whose damage falls largely on crops is drought.
par(mar = c(4,11,4,2))
barplot(damageplot, main = "Most Costly Weather Events to Property and Crops",
horiz=TRUE,
xlab = "Damage Value (Tens of Billions)",
las=1,
legend.text=c("Property Damage", "Crop Damage"),
col = rainbow(2),
axes = TRUE)
title(ylab="Weather Event", line=9)