In this report we aim to find, which weather types cause most health and finance damages. To investigate this question, we use U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database data for weather events, occurred in U.S. between 1950 and 2011. After obtaining data, we group health and finance losses by weather event types. Since there are too many duplications in weather event types, we do secondary grouping by parsing, stopwords removing, tokenizing, string distance and supervising. Then we explore grouped data, looking into top weather event types, ordered by different loss types. The analysis shows that most damages to health are done by tornados, heat, storms, floods, lightning and different types of precipitations. Generally, the same weather events are making most financial losses, with an exception that crops damages are made mostly by different types of precipitations, floods, heat, and drought.
library(ggplot2)
library(scales)
library(gridExtra)
library(stringi)
library(stringdist)
library(pander)
We download NOAA data for U.S. weather events between 1950 and 2011 in the form of a comma-separated-value file compressed via the bzip2 algorithm, by link given in assignment:
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "repdata_data_StormData.csv.bz2", method="auto")
data <- read.csv("repdata_data_StormData.csv.bz2")
pander(summary(data))
| STATE__ | BGN_DATE | BGN_TIME | TIME_ZONE | COUNTY | COUNTYNAME | STATE | EVTYPE | BGN_RANGE | BGN_AZI | BGN_LOCATI | END_DATE | END_TIME | COUNTY_END |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Min. : 1.0 | 5/25/2011 0:00:00: 1202 | 12:00:00 AM: 10163 | CST :547493 | Min. : 0.0 | JEFFERSON : 7840 | TX : 83728 | HAIL :288661 | Min. : 0.000 | :547332 | :287743 | :243411 | :238978 | Min. :0 |
| 1st Qu.:19.0 | 4/27/2011 0:00:00: 1193 | 06:00:00 PM: 7350 | EST :245558 | 1st Qu.: 31.0 | WASHINGTON: 7603 | KS : 53440 | TSTM WIND :219940 | 1st Qu.: 0.000 | N : 86752 | COUNTYWIDE : 19680 | 4/27/2011 0:00:00: 1214 | 06:00:00 PM: 9802 | 1st Qu.:0 |
| Median :30.0 | 6/9/2011 0:00:00 : 1030 | 04:00:00 PM: 7261 | MST : 68390 | Median : 75.0 | JACKSON : 6660 | OK : 46802 | THUNDERSTORM WIND: 82563 | Median : 0.000 | W : 38446 | Countywide : 993 | 5/25/2011 0:00:00: 1196 | 05:00:00 PM: 8314 | Median :0 |
| Mean :31.2 | 5/30/2004 0:00:00: 1016 | 05:00:00 PM: 6891 | PST : 28302 | Mean :100.6 | FRANKLIN : 6256 | MO : 35648 | TORNADO : 60652 | Mean : 1.484 | S : 37558 | SPRINGFIELD : 843 | 6/9/2011 0:00:00 : 1021 | 04:00:00 PM: 8104 | Mean :0 |
| 3rd Qu.:45.0 | 4/4/2011 0:00:00 : 1009 | 12:00:00 PM: 6703 | AST : 6360 | 3rd Qu.:131.0 | LINCOLN : 5937 | IA : 31069 | FLASH FLOOD : 54277 | 3rd Qu.: 1.000 | E : 33178 | SOUTH PORTION: 810 | 4/4/2011 0:00:00 : 1007 | 12:00:00 PM: 7483 | 3rd Qu.:0 |
| Max. :95.0 | 4/2/2006 0:00:00 : 981 | 03:00:00 PM: 6700 | HST : 2563 | Max. :873.0 | MADISON : 5632 | NE : 30271 | FLOOD : 25326 | Max. :3749.000 | NW : 24041 | NORTH PORTION: 784 | 5/30/2004 0:00:00: 998 | 11:59:00 PM: 7184 | Max. :0 |
| NA | (Other) :895866 | (Other) :857229 | (Other): 3631 | NA | (Other) :862369 | (Other):621339 | (Other) :170878 | NA | (Other):134990 | (Other) :591444 | (Other) :653450 | (Other) :622432 | NA |
| COUNTYENDN | END_RANGE | END_AZI | END_LOCATI | LENGTH | WIDTH | F | MAG | FATALITIES | INJURIES | PROPDMG | PROPDMGEXP | CROPDMG | CROPDMGEXP | WFO |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mode:logical | Min. : 0.0000 | :724837 | :499225 | Min. : 0.0000 | Min. : 0.000 | Min. :0.0 | Min. : 0.0 | Min. : 0.0000 | Min. : 0.0000 | Min. : 0.00 | :465934 | Min. : 0.000 | :618413 | :142069 |
| NA’s:902297 | 1st Qu.: 0.0000 | N : 28082 | COUNTYWIDE : 19731 | 1st Qu.: 0.0000 | 1st Qu.: 0.000 | 1st Qu.:0.0 | 1st Qu.: 0.0 | 1st Qu.: 0.0000 | 1st Qu.: 0.0000 | 1st Qu.: 0.00 | K :424665 | 1st Qu.: 0.000 | K :281832 | OUN : 17393 |
| NA | Median : 0.0000 | S : 22510 | SOUTH PORTION : 833 | Median : 0.0000 | Median : 0.000 | Median :1.0 | Median : 50.0 | Median : 0.0000 | Median : 0.0000 | Median : 0.00 | M : 11330 | Median : 0.000 | M : 1994 | JAN : 13889 |
| NA | Mean : 0.9862 | W : 20119 | NORTH PORTION : 780 | Mean : 0.2301 | Mean : 7.503 | Mean :0.9 | Mean : 46.9 | Mean : 0.0168 | Mean : 0.1557 | Mean : 12.06 | 0 : 216 | Mean : 1.527 | k : 21 | LWX : 13174 |
| NA | 3rd Qu.: 0.0000 | E : 20047 | CENTRAL PORTION: 617 | 3rd Qu.: 0.0000 | 3rd Qu.: 0.000 | 3rd Qu.:1.0 | 3rd Qu.: 75.0 | 3rd Qu.: 0.0000 | 3rd Qu.: 0.0000 | 3rd Qu.: 0.50 | B : 40 | 3rd Qu.: 0.000 | 0 : 19 | PHI : 12551 |
| NA | Max. :925.0000 | NE : 14606 | SPRINGFIELD : 575 | Max. :2315.0000 | Max. :4400.000 | Max. :5.0 | Max. :22000.0 | Max. :583.0000 | Max. :1700.0000 | Max. :5000.00 | 5 : 28 | Max. :990.000 | B : 9 | TSA : 12483 |
| NA | NA | (Other): 72096 | (Other) :380536 | NA | NA | NA’s :843563 | NA | NA | NA | NA | (Other): 84 | NA | (Other): 9 | (Other):690738 |
| STATEOFFIC | ZONENAMES | LATITUDE | LONGITUDE | LATITUDE_E | LONGITUDE_ | REMARKS | REFNUM |
|---|---|---|---|---|---|---|---|
| :248769 | :594029 | Min. : 0 | Min. :-14451 | Min. : 0 | Min. :-14455 | :287433 | Min. : 1 |
| TEXAS, North : 12193 | :205988 | 1st Qu.:2802 | 1st Qu.: 7247 | 1st Qu.: 0 | 1st Qu.: 0 | : 24013 | 1st Qu.:225575 |
| ARKANSAS, Central and North Central: 11738 | GREATER RENO / CARSON CITY / M - GREATER RENO / CARSON CITY / M : 639 | Median :3540 | Median : 8707 | Median : 0 | Median : 0 | Trees down. : 1110 | Median :451149 |
| IOWA, Central : 11345 | GREATER LAKE TAHOE AREA - GREATER LAKE TAHOE AREA : 592 | Mean :2875 | Mean : 6940 | Mean :1452 | Mean : 3509 | Several trees were blown down. : 568 | Mean :451149 |
| KANSAS, Southwest : 11212 | JEFFERSON - JEFFERSON : 303 | 3rd Qu.:4019 | 3rd Qu.: 9605 | 3rd Qu.:3549 | 3rd Qu.: 8735 | Trees were downed. : 446 | 3rd Qu.:676723 |
| GEORGIA, North and Central : 11120 | MADISON - MADISON : 302 | Max. :9706 | Max. : 17124 | Max. :9706 | Max. :106220 | Large trees and power lines were blown down. : 432 | Max. :902297 |
| (Other) :595920 | (Other) :100444 | NA’s :47 | NA | NA’s :40 | NA | (Other) :588295 | NA |
We process data first by selecting columns we interested in:
et_sub <- data.frame(data$EVTYPE, data$FATALITIES, data$INJURIES, data$PROPDMG, data$CROPDMG)
pander(summary(et_sub))
| data.EVTYPE | data.FATALITIES | data.INJURIES | data.PROPDMG | data.CROPDMG |
|---|---|---|---|---|
| HAIL :288661 | Min. : 0.0000 | Min. : 0.0000 | Min. : 0.00 | Min. : 0.000 |
| TSTM WIND :219940 | 1st Qu.: 0.0000 | 1st Qu.: 0.0000 | 1st Qu.: 0.00 | 1st Qu.: 0.000 |
| THUNDERSTORM WIND: 82563 | Median : 0.0000 | Median : 0.0000 | Median : 0.00 | Median : 0.000 |
| TORNADO : 60652 | Mean : 0.0168 | Mean : 0.1557 | Mean : 12.06 | Mean : 1.527 |
| FLASH FLOOD : 54277 | 3rd Qu.: 0.0000 | 3rd Qu.: 0.0000 | 3rd Qu.: 0.50 | 3rd Qu.: 0.000 |
| FLOOD : 25326 | Max. :583.0000 | Max. :1700.0000 | Max. :5000.00 | Max. :990.000 |
| (Other) :170878 | NA | NA | NA | NA |
As we see, there are 902297 observations of 985 different weather events, let’s explore how many occurrences of each event there are:
et_cnt <- aggregate(et_sub$data.EVTYPE, by = list(et_sub$data.EVTYPE), FUN = length)
colnames(et_cnt) <- c("Event Type Orig", "Event Count")
pander(head(et_cnt[order(et_cnt$`Event Count`, decreasing = T),], 20))
| Â | Event Type Orig | Event Count |
|---|---|---|
| 244 | HAIL | 288661 |
| 856 | TSTM WIND | 219940 |
| 760 | THUNDERSTORM WIND | 82563 |
| 834 | TORNADO | 60652 |
| 153 | FLASH FLOOD | 54277 |
| 170 | FLOOD | 25326 |
| 786 | THUNDERSTORM WINDS | 20843 |
| 359 | HIGH WIND | 20212 |
| 464 | LIGHTNING | 15754 |
| 310 | HEAVY SNOW | 15708 |
| 290 | HEAVY RAIN | 11723 |
| 972 | WINTER STORM | 11433 |
| 978 | WINTER WEATHER | 7026 |
| 216 | FUNNEL CLOUD | 6839 |
| 490 | MARINE TSTM WIND | 6175 |
| 489 | MARINE THUNDERSTORM WIND | 5812 |
| 936 | WATERSPOUT | 3796 |
| 676 | STRONG WIND | 3566 |
| 919 | URBAN/SML STREAM FLD | 3392 |
| 957 | WILDFIRE | 2761 |
Let’s explore first 100 event type names:
pander(matrix(head(et_cnt$`Event Type Orig`, 120), ncol = 6))
| HIGH SURF ADVISORY | Beach Erosion | BLOWING DUST | COASTAL SURGE | COOL AND WET | DRY MICROBURST |
| COASTAL FLOOD | BEACH EROSION | blowing snow | COASTAL/TIDAL FLOOD | COOL SPELL | DRY MICROBURST 50 |
| FLASH FLOOD | BEACH EROSION/COASTAL FLOOD | Blowing Snow | COASTALFLOOD | CSTL FLOODING/EROSION | DRY MICROBURST 53 |
| LIGHTNING | BEACH FLOOD | BLOWING SNOW | COASTALSTORM | DAM BREAK | DRY MICROBURST 58 |
| TSTM WIND | BELOW NORMAL PRECIPITATION | BLOWING SNOW- EXTREME WIND CHI | Cold | DAM FAILURE | DRY MICROBURST 61 |
| TSTM WIND (G45) | BITTER WIND CHILL | BLOWING SNOW & EXTREME WIND CH | COLD | Damaging Freeze | DRY MICROBURST 84 |
| WATERSPOUT | BITTER WIND CHILL TEMPERATURES | BLOWING SNOW/EXTREME WIND CHIL | COLD AIR FUNNEL | DAMAGING FREEZE | DRY MICROBURST WINDS |
| WIND | Black Ice | BREAKUP FLOODING | COLD AIR FUNNELS | DEEP HAIL | DRY MIRCOBURST WINDS |
| ? | BLACK ICE | BRUSH FIRE | COLD AIR TORNADO | DENSE FOG | DRY PATTERN |
| ABNORMAL WARMTH | BLIZZARD | BRUSH FIRES | Cold and Frost | DENSE SMOKE | DRY SPELL |
| ABNORMALLY DRY | BLIZZARD AND EXTREME WIND CHIL | COASTAL FLOODING/EROSION | COLD AND FROST | DOWNBURST | DRY WEATHER |
| ABNORMALLY WET | BLIZZARD AND HEAVY SNOW | COASTAL EROSION | COLD AND SNOW | DOWNBURST WINDS | DRYNESS |
| ACCUMULATED SNOWFALL | Blizzard Summary | Coastal Flood | COLD AND WET CONDITIONS | DRIEST MONTH | DUST DEVEL |
| AGRICULTURAL FREEZE | BLIZZARD WEATHER | COASTAL FLOOD | Cold Temperature | Drifting Snow | Dust Devil |
| APACHE COUNTY | BLIZZARD/FREEZING RAIN | coastal flooding | COLD TEMPERATURES | DROUGHT | DUST DEVIL |
| ASTRONOMICAL HIGH TIDE | BLIZZARD/HEAVY SNOW | Coastal Flooding | COLD WAVE | DROUGHT/EXCESSIVE HEAT | DUST DEVIL WATERSPOUT |
| ASTRONOMICAL LOW TIDE | BLIZZARD/HIGH WIND | COASTAL FLOODING | COLD WEATHER | DROWNING | DUST STORM |
| AVALANCE | BLIZZARD/WINTER STORM | COASTAL FLOODING/EROSION | COLD WIND CHILL TEMPERATURES | DRY | DUST STORM/HIGH WINDS |
| AVALANCHE | BLOW-OUT TIDE | Coastal Storm | COLD/WIND CHILL | DRY CONDITIONS | DUSTSTORM |
| BEACH EROSIN | BLOW-OUT TIDES | COASTAL STORM | COLD/WINDS | DRY HOT WEATHER | EARLY FREEZE |
We see that there are many duplications, which may result in incorrect calculation results. To group duplicated events we use the following approach:
This approach is implemented in the following function:
eventTypeCleanup <- function(eventTypes) {
result <- c()
# Describe a complete set of distinct weather events
evtype_codes <- c("tornado", "erosion", "tsunami", "hail", "wind", "flood", "avalanche", "rain",
"snow", "heat", "lightning", "cyclone", "winter", "blizzard", "fire", "ice",
"cold", "frost", "freeze", "fog", "drought", "tropical", "slide", "surf",
"dust", "hurricane", "current", "surge", "tide", "thunder", "slump", "spout",
"typhoon", "storm", "glaze", "volcano", "sleet", "precipitation")
# Clean up original event types
eventTypes <- tolower(eventTypes)
eventTypes <- gsub("[^[:alnum:]]", " ", eventTypes)
eventTypes <- gsub("([a-z]+)([s]{1,1})([[:blank:]]|$+)", "\\1\\3", eventTypes)
eventTypes <- gsub("tstmw?", "thunderstorm", eventTypes)
eventTypes <- gsub("([a-z]+)storm", "\\1", eventTypes)
eventTypes <- gsub("[a-z]+snow", "snow", eventTypes)
eventTypes <- gsub("([a-z]+)slides?", "slide", eventTypes)
eventTypes <- gsub("([a-z]+)fall", "\\1", eventTypes)
eventTypes <- gsub("avalan([a-z]+)", "avalanche", eventTypes)
eventTypes <- gsub("([a-z]+)burst([a-z]+)", "wind", eventTypes)
eventTypes <- gsub("((lightning)|([a-z]?))(ing)*", "\\1", eventTypes)
eventTypes <- gsub("precip([a-z])*", "precipitation", eventTypes)
eventTypes <- gsub("([a-z]+)ness", "\\1", eventTypes)
# Replace subgroups of minor event types by more general
eventTypes <- gsub("warm(th)?", "heat", eventTypes)
eventTypes <- gsub("((seiche)|(wave))", "tide", eventTypes)
eventTypes <- gsub("[a-z]+therm[a-z]+", "heat", eventTypes)
eventTypes <- gsub("(high water)|(fld)", "flood", eventTypes)
eventTypes <- gsub("gustnado", "tornado", eventTypes)
eventTypes <- gsub("funnel", "tornado", eventTypes)
eventTypes <- gsub("volcanic", "volcano", eventTypes)
# Remove stopwords
eventTypes <- gsub("land|wild|water|whirl|chill", "", eventTypes)
# Tokenize by splitting into words
eventTypesTokenized <- lapply(eventTypes, stri_extract_all_words, simplify = T)
# Select most appropriate event type from complete set (using string distance algorithm)
for (i in 1:length(eventTypesTokenized)) {
evtype_cand_dist <- c()
evtype_cand_value <- c()
for (j in 1:length(eventTypesTokenized[[i]])) {
word <- eventTypesTokenized[[i]][j]
dist <- unlist(lapply(evtype_codes, stringdist, word))
dist_min <- min(dist)
ndx <- which(dist %in% dist_min)[1]
evtype_cand_dist <- c(evtype_cand_dist, dist_min)
evtype_cand_value <-
c(evtype_cand_value, evtype_codes[ndx])
}
evtype_dist <- min(evtype_cand_dist)
evtype_value <- "other"
if (!is.na(evtype_dist) && evtype_dist < 2)
{
evtype_value <-
evtype_cand_value[which(evtype_cand_dist %in% evtype_dist)][1]
}
result <- c(result, evtype_value)
}
# Generalize even more
result <- gsub("winter|frost|freeze", "cold", result)
result <- gsub("surge|tsunami|current|surf", "tide", result)
result <- gsub("blizzard", "snow", result)
result <- gsub("avalanche", "slide", result)
result <- gsub("sleet|glaze", "ice", result)
result <- gsub("hurricane|typhoon|cyclone|thunder|tropical", "storm", result)
result <- gsub("rain|snow|hail", "precipitation", result)
return(result)
}
Let’s compare original and grouped event types:
et_cnt$`Event Type Group` <- eventTypeCleanup(et_cnt$`Event Type Orig`)
egp1 <- ggplot(et_cnt, aes(x = `Event Type Orig`, y = `Event Count`)) +
geom_bar(stat = "identity") +
theme_minimal()+
theme(
plot.title = element_text(size=8, face="bold"),
axis.text.x= element_text(size=4, angle = 90, hjust = 1)
)
egp2 <- ggplot(et_cnt, aes(x = `Event Type Group`, y = `Event Count`)) +
geom_bar(stat = "identity") +
theme_minimal()+
theme(
plot.title = element_text(size=8, face="bold"),
axis.text.x= element_text(size=10, angle = 90, hjust = 1)
)
grid.arrange(egp1, egp2, nrow=2)
Fig. 1
On Fig. 1 on top we see a count of original 985 weather event types, and on the bottom we see a count of grouped 20 event types.
Now we can aggregate health and financial losses by original weather event types:
et_sum_o <- aggregate(et_sub[, 2:ncol(et_sub)], by = list(et_sub$data.EVTYPE), FUN = sum)
colnames(et_sum_o) <- c("Event Type Orig", "Fatalities Total", "Injuries Total", "Property Damages Total", "Crops Damages Total")
et_sum_o$`Event Type Group` <- eventTypeCleanup(et_sum_o$`Event Type Orig`)
et_sum_g <- aggregate(et_sum_o[, 2:5], by = list(et_sum_o$`Event Type Group`), FUN = sum)
colnames(et_sum_g)[1] <- "Event Type"
et_sum_g$`Event Type` <- factor(et_sum_g$`Event Type`)
pander(head(et_sum_g, 20))
| Event Type | Fatalities Total | Injuries Total | Property Damages Total | Crops Damages Total |
|---|---|---|---|---|
| cold | 732 | 2224 | 165543 | 19069 |
| drought | 2 | 4 | 4099 | 33904 |
| dust | 24 | 483 | 5839 | 2102 |
| erosion | 0 | 0 | 882.2 | 0 |
| fire | 90 | 1608 | 125218 | 9566 |
| flood | 1555 | 8681 | 2460600 | 366893 |
| fog | 81 | 1077 | 17259 | 0 |
| heat | 3185 | 9243 | 3233 | 1477 |
| ice | 111 | 2401 | 75555 | 1695 |
| lightning | 817 | 5232 | 603430 | 3581 |
| other | 40 | 130 | 2903 | 1196 |
| precipitation | 378 | 3664 | 927743 | 596637 |
| slide | 269 | 225 | 21822 | 37 |
| slump | 0 | 0 | 570 | 0 |
| spout | 6 | 72 | 10787 | 0 |
| storm | 982 | 11304 | 2779514 | 218351 |
| tide | 772 | 906 | 10452 | 20 |
| tornado | 5633 | 91367 | 3214829 | 1e+05 |
| volcano | 0 | 0 | 500 | 0 |
| wind | 468 | 1907 | 453723 | 23272 |
Then we aggregate health losses by grouped weather event types:
et_sum_g_h <- data.frame(et_sum_g$`Event Type`, et_sum_g$`Fatalities Total`, et_sum_g$`Injuries Total`,
et_sum_g$`Fatalities Total` + et_sum_g$`Injuries Total`)
colnames(et_sum_g_h) <- c("Event Type", "Fatalities Total", "Injuries Total", "Casualties Total")
et_sum_g_h$`Event Type` <- factor(et_sum_g_h$`Event Type`, levels = et_sum_g_h$`Event Type`, ordered = T)
pander(head(et_sum_g_h[order(et_sum_g_h$`Casualties Total`, decreasing = T), ], 20))
| Â | Event Type | Fatalities Total | Injuries Total | Casualties Total |
|---|---|---|---|---|
| 18 | tornado | 5633 | 91367 | 97000 |
| 8 | heat | 3185 | 9243 | 12428 |
| 16 | storm | 982 | 11304 | 12286 |
| 6 | flood | 1555 | 8681 | 10236 |
| 10 | lightning | 817 | 5232 | 6049 |
| 12 | precipitation | 378 | 3664 | 4042 |
| 1 | cold | 732 | 2224 | 2956 |
| 9 | ice | 111 | 2401 | 2512 |
| 20 | wind | 468 | 1907 | 2375 |
| 5 | fire | 90 | 1608 | 1698 |
| 17 | tide | 772 | 906 | 1678 |
| 7 | fog | 81 | 1077 | 1158 |
| 3 | dust | 24 | 483 | 507 |
| 13 | slide | 269 | 225 | 494 |
| 11 | other | 40 | 130 | 170 |
| 15 | spout | 6 | 72 | 78 |
| 2 | drought | 2 | 4 | 6 |
| 4 | erosion | 0 | 0 | 0 |
| 14 | slump | 0 | 0 | 0 |
| 19 | volcano | 0 | 0 | 0 |
Then we aggregate financial losses by grouped weather event types:
et_sum_g_f <- data.frame(et_sum_g$`Event Type`, et_sum_g$`Property Damages Total`, et_sum_g$`Crops Damages Total`,
et_sum_g$`Property Damages Total` + et_sum_g$`Crops Damages Total`)
colnames(et_sum_g_f) <- c("Event Type", "Property Damages Total", "Crops Damages Total", "All Damages Total")
et_sum_g_f$`Event Type` <- factor(et_sum_g_f$`Event Type`, levels = et_sum_g_f$`Event Type`, ordered = T)
pander(head(et_sum_g_f[order(et_sum_g_f$`All Damages Total`, decreasing = T), ], 20))
| Â | Event Type | Property Damages Total | Crops Damages Total | All Damages Total |
|---|---|---|---|---|
| 18 | tornado | 3214829 | 1e+05 | 3314857 |
| 16 | storm | 2779514 | 218351 | 2997865 |
| 6 | flood | 2460600 | 366893 | 2827492 |
| 12 | precipitation | 927743 | 596637 | 1524379 |
| 10 | lightning | 603430 | 3581 | 607010 |
| 20 | wind | 453723 | 23272 | 476995 |
| 1 | cold | 165543 | 19069 | 184612 |
| 5 | fire | 125218 | 9566 | 134784 |
| 9 | ice | 75555 | 1695 | 77250 |
| 2 | drought | 4099 | 33904 | 38003 |
| 13 | slide | 21822 | 37 | 21859 |
| 7 | fog | 17259 | 0 | 17259 |
| 15 | spout | 10787 | 0 | 10787 |
| 17 | tide | 10452 | 20 | 10472 |
| 3 | dust | 5839 | 2102 | 7940 |
| 8 | heat | 3233 | 1477 | 4710 |
| 11 | other | 2903 | 1196 | 4099 |
| 4 | erosion | 882.2 | 0 | 882.2 |
| 14 | slump | 570 | 0 | 570 |
| 19 | volcano | 500 | 0 | 500 |
To save coding, we will use several custom functions to gather data for presentation and create pie charts:
prepTopByCol <- function(table, colname, topcount){
ordered <- table[order(table[colname], decreasing = T),]
top <- head(ordered, topcount)
result <- data.frame(top$`Event Type`, top[colname])
colnames(result) <- c("Event Type", colname)
offset <- topcount+1
tmp <- ordered[offset:nrow(ordered),]
result[topcount+1, "Event Type"] <- "other"
result[topcount+1, colname] <- unlist(lapply(tmp[colname], sum))
result$`Event Type` <- factor(result$`Event Type`, levels = result$`Event Type`, ordered = T)
result["Percent"] <- result[colname]/sum(result[colname])
result["Lbl"] <- paste(percent(result$`Percent`), " - ", result$`Event Type`)
return(result)
}
blank_theme <- theme_minimal()+
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.border = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
plot.title = element_text(size = 9, face = "bold"),
axis.text.x = element_blank(),
legend.title = element_text(size = 9),
legend.text = element_text(size = 9),
legend.position = "bottom",
legend.direction = "vertical",
plot.margin = unit(c(0,0,0,0), "mm")
)
prepPieChart <- function(table, title){
result <- ggplot(table, aes(x = "", y = table$`Percent`, fill = table$`Event Type`)) +
geom_bar(width = 1, stat = "identity", color = "black") +
coord_polar("y", start = 0) +
blank_theme +
scale_fill_discrete(name = "Event type", labels = table$`Lbl`) +
ggtitle(title)
return(result)
}
Now we will present top weather event types, that made biggest losses in human health area:
et_sum_g_h_ft <- prepTopByCol(table = et_sum_g_h, colname = "Fatalities Total", topcount = 10)
et_sum_g_h_it <- prepTopByCol(table = et_sum_g_h, colname = "Injuries Total", topcount = 10)
et_sum_g_h_ct <- prepTopByCol(table = et_sum_g_h, colname = "Casualties Total", topcount = 10)
hftp <- prepPieChart(et_sum_g_h_ft, "By fatalities")
hitp <- prepPieChart(et_sum_g_h_it, "By injuries")
hctp <- prepPieChart(et_sum_g_h_ct, "By all casualties")
grid.arrange(hftp, hitp, hctp, ncol=3, top = "Top 10 weather event types")
Fig. 2
On Fig. 2 we see, that most damages to health are done by tornados, heat, storms, floods, lightning and different types of precipitations.
Now we will present top weather event types, that made biggest losses in financial area:
et_sum_g_f_pt <- prepTopByCol(table = et_sum_g_f, colname = "Property Damages Total", topcount = 10)
et_sum_g_f_ct <- prepTopByCol(table = et_sum_g_f, colname = "Crops Damages Total", topcount = 10)
et_sum_g_f_dt <- prepTopByCol(table = et_sum_g_f, colname = "All Damages Total", topcount = 10)
fptp <- prepPieChart(et_sum_g_f_pt, "By property damages")
fctp <- prepPieChart(et_sum_g_f_ct, "By crops damages")
fdtp <- prepPieChart(et_sum_g_f_dt, "By all damages")
grid.arrange(fptp, fctp, fdtp, ncol = 3, top = "Top 10 weather event types")
Fig. 3
On Fig. 3 we see that generally, the same weather events are making most financial losses, with an exception that crops damages are made mostly by different types of precipitations, floods, heat, and drought.