Volcanic Eruption in Indonesia

Indonesia an archipelago country with more than 13,000 islands is located in the “Pasific ring of fire” a series of seismic fault lines around the ocean of Pacific. There are 127 active volcano spread all over the country from north Sumatra in the west to Banda island in the east. With population over 270 million people, this natural event is one of high risks for the people living in the country.

This notebook is to explore eruption data from the year 1300 until today for deadly eruption, and visualize it to share the story and observe to get better understanding of this natural event.

Install and load packages that we will use:

Create a function for the chart canvas.

fig <- function(width, heigth){
     options(repr.plot.width = width, repr.plot.height = heigth)
}

Load libraries we will use.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(lubridate)
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(skimr)
library(cowplot)
## 
## Attaching package: 'cowplot'
## 
## The following object is masked from 'package:lubridate':
## 
##     stamp
options(warn=-1, message=FALSE)

“── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5 ✔ purrr 0.3.4 ✔ tibble 3.1.4 ✔ dplyr 1.0.7 ✔ tidyr 1.1.3 ✔ stringr 1.4.0 ✔ readr 2.0.1 ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag()

Attaching package: ‘lubridate’

The following objects are masked from ‘package:base’:

date, intersect, setdiff, union

Attaching package: ‘janitor’

The following objects are masked from ‘package:stats’: …

stamp"

Load the data.

Let’s load all data, check what we need for our analysis.

volcano_events <- read_csv("volcano-events-1300-2021.csv", show_col_type=FALSE)

We need to standardize column names.

v_events <- volcano_events %>% clean_names() 
# glimpse(v_events)
# skim_without_charts(v_events)

One important missing data is VEI, according to Newhal and Self, 1982, historic eruptions that were definitely explosive, but carry no other descriptive information are assigned a default VEI of 2.

v_events <- v_events %>% mutate(vei = if_else(is.na(vei), 2, vei),
                               total_deaths = if_else(is.na(total_deaths), 0, total_deaths))
# glimpse(v_events)

Data Observation

Now let’s take a look data from different angles. What is the insight on each angle.

Mount Location

There are 114 eruptions in Java island alone, followed by Nusatenggara, 24 islands. More than 150 people live in Java island.

fig(12, 7)
n_eruptbyisland <- v_events %>% 
  group_by(island) %>% 
  summarize(cnt_mount=n())

ggplot(data=n_eruptbyisland, aes(x=reorder(island, -cnt_mount), y=cnt_mount, fill=island)) +
  geom_bar(stat="identity", width=0.9, color="white") +
  geom_text(aes(label=cnt_mount), position=position_stack(vjust=0.5), size=5) +
  labs(title="Number of Eruptions by Island") +
  xlab("Island") +
  ylab("Number of Mountains") +
  theme(legend.position = "none", text = element_text(size=15))

Top Ten Mountains Eruptions

Show the top ten mountain eruptions.

fig(12, 7)
n_eruptbymountain <- v_events %>% 
  group_by(name) %>% 
  summarise(cnt_bymount=n()) %>% 
  arrange(desc(cnt_bymount))

top10_eruptbymountain <- n_eruptbymountain %>% top_n(10) %>% arrange(desc(cnt_bymount))
## Selecting by cnt_bymount
ggplot(data=top10_eruptbymountain, aes(x=reorder(name, -cnt_bymount), y=cnt_bymount, fill=name)) +
  geom_bar(stat="identity", width=0.9, color="white") +
  scale_x_discrete(guide = guide_axis(angle=45)) +
  geom_text(aes(label=cnt_bymount), position=position_stack(vjust=0.5), size=5) +
  labs(title="Number of Eruption by Mountain") +
  xlab("Mountains Name") +
  ylab("Eruptions") +
  theme(legend.position = "none", text = element_text(size=18))

Selecting by cnt_bymount

Volcanic Explosivity Index (VEI)

A widely used classification scheme to describe the size of explosive eruptions. It is based principally on the erupted mass or volume of a deposit Newhall and Self, 1982. Historic eruptions that were definitely explosive, but carry no other descriptive information are assigned a default VEI of 2. VEI score is from 0 to 8:

VEI General Description Cloud Column Height (km) Volume (m3) Qualitative Description Classification How often Example
0 non-explosive <0.1 1x104 Gentle Hawaiian daily Kilauea
1 Small 0.1-1 1x106 Effusive Haw/Strombolian daily Stromboli
2 Moderate 1-5 1x107 Explosive Strom/Vulcanian weekly Galeras, 1992
3 Moderate-Large 3-15 1x108 Explosive Vulcanian yearly Ruiz, 1985
4 Large 10-25 1x109 Explosive Vulc/Plinian 10’s of years Galunggung, 1982
5 Very Large >25 1x1010 Cataclysmic Plinian 100’s of years St. Helens, 1981
6 >25 km 1x1011 paroxysmal Plin/Ultra-Plinian 100’s of years Krakatau, 1883
7 >25 km 1x1012 colossal Ultra-Plinian 1000’s of years Tambora, 1815
8 >25 km >1x1012 colossal Ultra-Plinian 10,000’s of years Yellowstone, 2 Ma
vei_eruptions <- v_events %>% group_by(vei) %>% summarise(cnt_vei=n())
ggplot(data=vei_eruptions, aes(x=reorder(vei, -cnt_vei), y=cnt_vei, fill=vei)) +
  geom_bar(stat="identity", width=0.8, color="white", position = position_dodge(width = 0.1)) +
  geom_text(aes(label=cnt_vei), position=position_stack(vjust=0.5), size=5) +
  labs(title="VEI Score") +
  xlab("VEI Score") +
  ylab("Eruptions") +
  theme(legend.position = "none", text = element_text(size=18))

Eruptions by Mountain Type

See number of eruptions by mountain type.

n_eruptbytype <- v_events %>% 
  group_by(type) %>% 
  summarise(cnt_bytype=n())
ggplot(data=n_eruptbytype, aes(x=reorder(type, -cnt_bytype), y=cnt_bytype, fill=type)) +
      geom_bar(stat="identity", width=0.8, color="white", position = position_dodge(width = 0.1)) +
      geom_text(aes(label=cnt_bytype), position=position_stack(vjust=0.5), size=5) +
      scale_x_discrete(guide = guide_axis(angle=45)) +
      labs(title="Mountain Type") +
      xlab("Mountains Type") +
      ylab("Number of mountains") +
      theme(legend.position = "none", text = element_text(size=18))

By Mountain Elevation

fig(12,7)
e_mounts <- v_events %>% select(name, elevation_m) %>% distinct()
ggplot(data=e_mounts, aes(x=reorder(name, -elevation_m), y=elevation_m, fill=name)) +
  geom_bar(stat="identity", width=0.8, color="white", position = position_dodge(width = 0.1)) +
  scale_x_discrete(guide = guide_axis(angle=45)) +
  # geom_text(aes(label=elevation_m), position=position_stack(vjust=0.5), size=5) +
  labs(title="Mountain Elevation") +
  xlab("Mountains Name") +
  ylab("Elevation") +
  theme(legend.position = "none", text = element_text(size=15))

Deadly Eruptions

Although extreme eruption with high explosion and causing deaths are rare, we need observe the events.

n_deathspereruptions <- v_events %>% select(name, erupt_date, total_deaths) %>% arrange(desc(total_deaths)) %>% top_n(10)
## Selecting by total_deaths
ggplot(data=n_deathspereruptions, aes(x=reorder(name, -total_deaths), y=total_deaths, fill=name)) +
  geom_bar(stat="identity", width=0.9, color="white") +
  scale_x_discrete(guide = guide_axis(angle=45)) +
  geom_text(aes(label=total_deaths), position=position_stack(vjust=0.5), size=5) +
  labs(title="Total Deaths by Mountain") +
  xlab("Mountains Name") +
  ylab("Deaths") +
  theme(legend.position = "none", text = element_text(size=15))

Deathliest eruption was Mount Tambora on Oct, 8 1822 where killed 60K people, followed by Mount Krakatau on Aug, 27, 1883, and then Kelut erupted twice on New Year 1586 and Mei, 19, 1919, killed 10K and 5K respectively.

Now let us take a look deadly eruptions that happened in the past 30 years.

Normal Distribution Analysis

The histogram chart shows visualization of eruptions based on yearly eruptions.

n_mounts <- v_events %>% select(year) %>% filter(year>1900) 

ggplot(n_mounts, aes(year)) +
  geom_histogram(aes(x=year, y=..density..), bins = 50, fill="#d3d3d3", color="black") +
  stat_function(fun=dnorm, args = list(mean=mean(n_mounts$year), sd=sd(n_mounts$year)), color="red") +
  geom_density(color="blue") +
  ggtitle("Histogram of Eruptions") +
  xlab("Year") +
  ylab("Density") +
  theme_bw() 

Normal distribution chart based on mountains elevation.

e_mounts <- v_events %>% select(elevation_m)
ggplot(e_mounts, aes(elevation_m)) +
  geom_histogram(aes(x=elevation_m, y=..density..), bins = 50, fill="#d3d3d3", color="black") +
  stat_function(fun=dnorm, args = list(mean=mean(e_mounts$elevation_m), sd=sd(e_mounts$elevation_m)), color="red") +
  geom_density(color="blue") +
  ggtitle("Histogram of Eruptions") +
  xlab("Elevation (m)") +
  ylab("Density") +
  theme_bw() 

This chart shows mountain with elevation between 2,000 to 3,000 (m) are most active volcanos.

QQ Analysis

A QQ plot visualizes data based on the quantities of the provided variable against the quantiles that would exist if the data were normally distributed. Data that follows the normal distribution should be in a line with a set slope. Including stat_qq() generates a QQ plot. The following example generates a QQ plot of the eruptions (n_erupts) variable.

y     <- quantile(n_mounts$year, c(0.25, 0.75)) # Find the 1st and 3rd quantiles
x     <- qnorm( c(0.25, 0.75))           # Find the matching normal values on the x-axis
slope <- diff(y) / diff(x)               # Compute the line slope
int   <- y[1] - slope * x[1]             # Compute the line intercept
ggplot(n_mounts, aes(sample = year)) +
  stat_qq() + 
  geom_abline(intercept = int, slope = slope, color = "blue", size = 2, alpha = 0.5 )

y     <- quantile(e_mounts$elevation_m, c(0.25, 0.75)) # Find the 1st and 3rd quantiles
x     <- qnorm( c(0.25, 0.75))           # Find the matching normal values on the x-axis
slope <- diff(y) / diff(x)               # Compute the line slope
int   <- y[1] - slope * x[1]             # Compute the line intercept

ggplot(e_mounts, aes(sample = elevation_m)) +
  stat_qq() + 
  geom_abline(intercept = int, slope = slope, color = "blue", size = 2, alpha = 0.5 )

Box Plot

Now let’s see in box plot by mountain type vs VEI.

p <- ggplot(v_events, aes(island, y=elevation_m))
p + geom_boxplot(fill = "white", colour = "#3366FF") 

Elevation Corelation

This scatter plot chart tells trend of eruptions. X axis is year, the circles are eruptions record, the size tells number of injuries and circle size is death toll. Although it is still debatable argument of how the data is recorded and documented especially for very old records more than 100 years ago. The chart shows eruptions is getting higher in number and intensity.

# Load libraries to make the chart look nicer
library(colorspace) # for darken()
install.packages("devtools")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
devtools::install_github("slowkow/ggrepel")
## Skipping install of 'ggrepel' from a github remote, the SHA1 (e23ff7a3) has not changed since last install.
##   Use `force = TRUE` to force installation
fig(12, 7)
island_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999", "#889EAF", "#D4B499")
e_events <- v_events 
ggplot(data=e_events, aes(x=year, y=elevation_m)) +
  geom_point(aes(color = island, fill = island, size = total_deaths), alpha = 0.5, shape = 21) +
  geom_smooth(
        aes(color = "y ~ log(x)", fill = "y ~ log(x)"),
        method = 'loess', formula = y~log(x), se = FALSE, fullrange = TRUE) +
  scale_size_continuous(range = c(3, 10)) +
  scale_color_manual(values = darken(island_cols, 0.5)) +
  scale_fill_manual(values = island_cols) +
  labs(title="Total Eruptions in the Past 30 years")

Eruptions After 1970, zoom the data to show only from 1970 records. Mountains become more active, since the year 1800, however the impact is getting lower, both in the number of deaths as well as injuries.

fig(18, 12)
e_events_1970 <- v_events %>% filter(year>1970 & elevation_m>0)
# e_events_top10 <- e_events_1970 %>% filter(vei>2)
ggplot(data=e_events_1970, aes(x=year, y=elevation_m)) +
  geom_point(aes(color = island, fill = island, size = vei), alpha = 0.5, shape = 21) +
  geom_smooth(aes(color = "y ~ log(x)", fill = "y ~ log(x)", alpha=.1), size=3,
    method = 'loess', formula = y~log(x), se = FALSE, fullrange = TRUE) +
  geom_text(aes(label = name),
    color = "black",
    size = 11/.pt, # font size 9 pt
    point.padding = 0.1, 
    box.padding = .6,
    min.segment.length = 0,
    max.overlaps = 1000,
    seed = 7654
  ) +
  scale_size_continuous(range = c(3, 15)) +
  scale_color_manual(values = darken(island_cols, 0.5)) +
  scale_fill_manual(values = island_cols) +
  scale_x_continuous(
    name = "Eruption Events Year 1970-2021",
  ) +
  scale_y_continuous(
    name = "Elevation (m)",
    limits = c(10, 4100),
    breaks = c(1, 900, 1800, 2700, 3600, 3900),
    expand = c(0, 0)
  ) +
  labs(title="Total Eruptions in the Past 30 years") +
  theme_minimal_hgrid(12, rel_small = 1) +
  theme(
    legend.position = "bottom",
    legend.justification = "right",
    legend.text = element_text(size = 9),
    legend.box.spacing = unit(0, "pt")
  )

fig(12, 7)
island_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999", "#889EAF", "#D4B499")
ve_data <- v_events %>% filter(elevation_m>0) %>% select(island, type, name, vei, elevation_m, total_deaths)
ggplot(ve_data, aes(vei, elevation_m)) +
  geom_smooth(aes(color = "y ~ log(x)", fill = "y ~ log(x)"),
        method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE) +
  geom_point(aes(color = island, fill = island, size = total_deaths), alpha = 0.5, shape = 21) +
  labs(title="Elevation (m) vs VEI") +
  xlab("VEI") +
  ylab("Elevation (m)") +
  scale_color_manual(values = darken(island_cols, 0.3)) +
  scale_fill_manual(values = island_cols) +
  scale_size_continuous(range = c(3, 21)) +
  guides(size=guide_legend("Total Deaths")) +
  theme_minimal_hgrid(12, rel_small = 1) + 
  theme(
    legend.position = "right",
    legend.justification = "right",
    legend.text = element_text(size = 9),
    legend.box.spacing = unit(3, "pt")
  )

fig(12, 7)
island_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999", "#889EAF", "#D4B499")
ve_data <- v_events %>% filter(elevation_m>0 & total_deaths<10000) %>% select(island, type, name, vei, elevation_m, total_deaths)
ggplot(ve_data, aes(total_deaths, elevation_m)) +
  geom_smooth(aes(color = "y ~ log(x)", fill = "y ~ log(x)"),
        method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE) +
  geom_point(aes(color = island, fill = island, size = total_deaths), alpha = 0.5, shape = 21) +
  labs(title="Elevation (m) vs Total Deaths") +
  xlab("Total Deaths") +
  ylab("Elevation (m)") +
  scale_color_manual(values = darken(island_cols, 0.3)) +
  scale_fill_manual(values = island_cols) +
  scale_size_continuous(range = c(3, 21)) +
  guides(size=guide_legend("Total Deaths")) +
  guides(colour=guide_legend("Island")) +
  theme_minimal_hgrid(12, rel_small = 1) 

#install.packages("sf")
#install.packages("rworldmap")
#install.packages("rgeos")

library(tidyverse)        # for manipulation of simple features objects
# for getMap()

Loading required package: sp

rgeos version: 0.5-5, (SVN revision 640) GEOS runtime version: 3.8.0-CAPI-1.13.1 Linking to sp version: 1.4-5 Polygon checking: TRUE

Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1

Welcome to rworldmap

For a short introduction type : vignette(‘rworldmap’)

Volcano Map

Data visualization on a map is very important to show where are the objects that are discussed and at the same time visualize more data that fit the map.

require("maps")
## Loading required package: maps
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
fig(20, 9)
island_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999", "#889EAF", "#D4B499")
type_cols <- c("#6F69AC", "#95DAC1", "#FFEBA1", "#FD6F96")
mydata <- e_events %>% 
    dplyr::group_by(island, type, name, longitude, latitude) %>%
    dplyr::summarise(sum_erpt = n(), m_vei = mean(vei), m_elev = max(elevation_m), m_deaths = max(total_deaths), .groups="drop") %>%
    select(island, type, name, m_vei, m_elev, longitude, latitude, m_deaths)
global <- map_data("world") # World longitude and latitude data 
ggplot() + 
    geom_polygon(data = global, aes(x=long, y = lat, group = group), 
        fill = "gray85", color = "gray80") + 
    coord_fixed(1.3) +
    xlim(94,142) + ylim(-11,7.5) +
    geom_point(data = mydata, aes(x = longitude, y = latitude, color = type, fill = type, size = m_deaths), 
        alpha = 0.8, show.legend = F, shape=24) +
    geom_text(data = mydata,
        aes(x = longitude, y = latitude, label = name),
        color = "grey30",
        size = 11/.pt, # font size 9 pt
        point.padding = 0.1, 
        box.padding = .6,
        min.segment.length = 0,
        max.overlaps = 1000,
        seed = 7654
      ) +
    ggtitle ("Volcano Eruption") + 
    scale_color_manual(values = darken(type_cols, 0.7)) +
    scale_fill_manual(values = type_cols) +
    scale_size_continuous(range = c(3, 9)) +
    guides(size=guide_legend("Total Deaths")) +
    guides(color=guide_legend("Type")) +
    theme_minimal_hgrid(12, rel_small = 1) + 
    theme(
      legend.position = "bottom",
      legend.justification = "right",
      legend.text = element_text(size = 9),
      legend.box.spacing = unit(3, "pt"))

Loading required package: maps

Attaching package: ‘maps’

The following object is masked from ‘package:purrr’:

map


# Conclusion

From data visualization and statistic method we can conclude from the data we have as follows: * Java has the most volcanic mountians in the country, may be in the world, make this island the most risky island. More than 60% of the country population, or more than 150 million people live in Java. * The more higher elevaltion of a mountain, the more probability of eruption event. * Most of eruptions are low to moderate explosion with no death and minor damage. * Mountain with elevation between 2,000 to 3,000 (m) are most active volcanos. * Mountains become more active, since the year 1800, however the impact is getting lower, both in the number of deaths as well as injuries.

There is a volcanics eruptions competition at Kaggle to predict the next eruption which I plan to develop, hopefully soon. I am still learning machine learning ;-).

5320.jpg
Mount Merapi volcano erupted earlier this year (2021).

Source: Courtesy of theguardian.com

References:
* NCEI/WDS Global Significant Volcanic Eruptions Database, 4360 BC to Present * Personal research for the year 2021 volcanic eruptions, from local online news media. * Visualizing Data, Probability, the Normal Distribution, and Z Scores
* Brief History of Volcano Eruptions in Indonesia, Viz of the day, Tableau Public Gallery.

Notes: Thanks for visiting this notebook. Please let me know if you found any error, incorrect finding, or better method and approach in this notebook on the comment section below.