Indonesia an archipelago country with more than 13,000 islands is located in the “Pasific ring of fire” a series of seismic fault lines around the ocean of Pacific. There are 127 active volcano spread all over the country from north Sumatra in the west to Banda island in the east. With population over 270 million people, this natural event is one of high risks for the people living in the country.
This notebook is to explore eruption data from the year 1300 until today for deadly eruption, and visualize it to share the story and observe to get better understanding of this natural event.
Install and load packages that we will use:
Create a function for the chart canvas.
fig <- function(width, heigth){
options(repr.plot.width = width, repr.plot.height = heigth)
}
Load libraries we will use.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(lubridate)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(skimr)
library(cowplot)
##
## Attaching package: 'cowplot'
##
## The following object is masked from 'package:lubridate':
##
## stamp
options(warn=-1, message=FALSE)
“── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
✔ ggplot2 3.3.5 ✔ purrr 0.3.4 ✔ tibble 3.1.4 ✔ dplyr 1.0.7 ✔ tidyr 1.1.3 ✔ stringr 1.4.0 ✔ readr 2.0.1 ✔ forcats 0.5.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ── ✖ dplyr::filter() masks stats::filter() ✖ dplyr::lag() masks stats::lag()
Attaching package: ‘lubridate’
The following objects are masked from ‘package:base’:
date, intersect, setdiff, union
Attaching package: ‘janitor’
The following objects are masked from ‘package:stats’: …
stamp"
Let’s load all data, check what we need for our analysis.
volcano_events <- read_csv("volcano-events-1300-2021.csv", show_col_type=FALSE)
We need to standardize column names.
v_events <- volcano_events %>% clean_names()
# glimpse(v_events)
# skim_without_charts(v_events)
One important missing data is VEI, according to Newhal and Self, 1982, historic eruptions that were definitely explosive, but carry no other descriptive information are assigned a default VEI of 2.
v_events <- v_events %>% mutate(vei = if_else(is.na(vei), 2, vei),
total_deaths = if_else(is.na(total_deaths), 0, total_deaths))
# glimpse(v_events)
Now let’s take a look data from different angles. What is the insight on each angle.
There are 114 eruptions in Java island alone, followed by Nusatenggara, 24 islands. More than 150 people live in Java island.
fig(12, 7)
n_eruptbyisland <- v_events %>%
group_by(island) %>%
summarize(cnt_mount=n())
ggplot(data=n_eruptbyisland, aes(x=reorder(island, -cnt_mount), y=cnt_mount, fill=island)) +
geom_bar(stat="identity", width=0.9, color="white") +
geom_text(aes(label=cnt_mount), position=position_stack(vjust=0.5), size=5) +
labs(title="Number of Eruptions by Island") +
xlab("Island") +
ylab("Number of Mountains") +
theme(legend.position = "none", text = element_text(size=15))
Show the top ten mountain eruptions.
fig(12, 7)
n_eruptbymountain <- v_events %>%
group_by(name) %>%
summarise(cnt_bymount=n()) %>%
arrange(desc(cnt_bymount))
top10_eruptbymountain <- n_eruptbymountain %>% top_n(10) %>% arrange(desc(cnt_bymount))
## Selecting by cnt_bymount
ggplot(data=top10_eruptbymountain, aes(x=reorder(name, -cnt_bymount), y=cnt_bymount, fill=name)) +
geom_bar(stat="identity", width=0.9, color="white") +
scale_x_discrete(guide = guide_axis(angle=45)) +
geom_text(aes(label=cnt_bymount), position=position_stack(vjust=0.5), size=5) +
labs(title="Number of Eruption by Mountain") +
xlab("Mountains Name") +
ylab("Eruptions") +
theme(legend.position = "none", text = element_text(size=18))
Selecting by cnt_bymount
A widely used classification scheme to describe the size of explosive eruptions. It is based principally on the erupted mass or volume of a deposit Newhall and Self, 1982. Historic eruptions that were definitely explosive, but carry no other descriptive information are assigned a default VEI of 2. VEI score is from 0 to 8:
| VEI | General Description | Cloud Column Height (km) | Volume (m3) | Qualitative Description | Classification | How often | Example |
|---|---|---|---|---|---|---|---|
| 0 | non-explosive | <0.1 | 1x104 | Gentle | Hawaiian | daily | Kilauea |
| 1 | Small | 0.1-1 | 1x106 | Effusive | Haw/Strombolian | daily | Stromboli |
| 2 | Moderate | 1-5 | 1x107 | Explosive | Strom/Vulcanian | weekly | Galeras, 1992 |
| 3 | Moderate-Large | 3-15 | 1x108 | Explosive | Vulcanian | yearly | Ruiz, 1985 |
| 4 | Large | 10-25 | 1x109 | Explosive | Vulc/Plinian | 10’s of years | Galunggung, 1982 |
| 5 | Very Large | >25 | 1x1010 | Cataclysmic | Plinian | 100’s of years | St. Helens, 1981 |
| 6 | >25 km | 1x1011 | paroxysmal | Plin/Ultra-Plinian | 100’s of years | Krakatau, 1883 | |
| 7 | >25 km | 1x1012 | colossal | Ultra-Plinian | 1000’s of years | Tambora, 1815 | |
| 8 | >25 km | >1x1012 | colossal | Ultra-Plinian | 10,000’s of years | Yellowstone, 2 Ma |
vei_eruptions <- v_events %>% group_by(vei) %>% summarise(cnt_vei=n())
ggplot(data=vei_eruptions, aes(x=reorder(vei, -cnt_vei), y=cnt_vei, fill=vei)) +
geom_bar(stat="identity", width=0.8, color="white", position = position_dodge(width = 0.1)) +
geom_text(aes(label=cnt_vei), position=position_stack(vjust=0.5), size=5) +
labs(title="VEI Score") +
xlab("VEI Score") +
ylab("Eruptions") +
theme(legend.position = "none", text = element_text(size=18))
See number of eruptions by mountain type.
n_eruptbytype <- v_events %>%
group_by(type) %>%
summarise(cnt_bytype=n())
ggplot(data=n_eruptbytype, aes(x=reorder(type, -cnt_bytype), y=cnt_bytype, fill=type)) +
geom_bar(stat="identity", width=0.8, color="white", position = position_dodge(width = 0.1)) +
geom_text(aes(label=cnt_bytype), position=position_stack(vjust=0.5), size=5) +
scale_x_discrete(guide = guide_axis(angle=45)) +
labs(title="Mountain Type") +
xlab("Mountains Type") +
ylab("Number of mountains") +
theme(legend.position = "none", text = element_text(size=18))
fig(12,7)
e_mounts <- v_events %>% select(name, elevation_m) %>% distinct()
ggplot(data=e_mounts, aes(x=reorder(name, -elevation_m), y=elevation_m, fill=name)) +
geom_bar(stat="identity", width=0.8, color="white", position = position_dodge(width = 0.1)) +
scale_x_discrete(guide = guide_axis(angle=45)) +
# geom_text(aes(label=elevation_m), position=position_stack(vjust=0.5), size=5) +
labs(title="Mountain Elevation") +
xlab("Mountains Name") +
ylab("Elevation") +
theme(legend.position = "none", text = element_text(size=15))
Although extreme eruption with high explosion and causing deaths are rare, we need observe the events.
n_deathspereruptions <- v_events %>% select(name, erupt_date, total_deaths) %>% arrange(desc(total_deaths)) %>% top_n(10)
## Selecting by total_deaths
ggplot(data=n_deathspereruptions, aes(x=reorder(name, -total_deaths), y=total_deaths, fill=name)) +
geom_bar(stat="identity", width=0.9, color="white") +
scale_x_discrete(guide = guide_axis(angle=45)) +
geom_text(aes(label=total_deaths), position=position_stack(vjust=0.5), size=5) +
labs(title="Total Deaths by Mountain") +
xlab("Mountains Name") +
ylab("Deaths") +
theme(legend.position = "none", text = element_text(size=15))
Deathliest eruption was Mount Tambora on Oct, 8 1822 where killed 60K
people, followed by Mount Krakatau on Aug, 27, 1883, and then Kelut
erupted twice on New Year 1586 and Mei, 19, 1919, killed 10K and 5K
respectively.
Now let us take a look deadly eruptions that happened in the past 30 years.
The histogram chart shows visualization of eruptions based on yearly eruptions.
n_mounts <- v_events %>% select(year) %>% filter(year>1900)
ggplot(n_mounts, aes(year)) +
geom_histogram(aes(x=year, y=..density..), bins = 50, fill="#d3d3d3", color="black") +
stat_function(fun=dnorm, args = list(mean=mean(n_mounts$year), sd=sd(n_mounts$year)), color="red") +
geom_density(color="blue") +
ggtitle("Histogram of Eruptions") +
xlab("Year") +
ylab("Density") +
theme_bw()
Normal distribution chart based on mountains elevation.
e_mounts <- v_events %>% select(elevation_m)
ggplot(e_mounts, aes(elevation_m)) +
geom_histogram(aes(x=elevation_m, y=..density..), bins = 50, fill="#d3d3d3", color="black") +
stat_function(fun=dnorm, args = list(mean=mean(e_mounts$elevation_m), sd=sd(e_mounts$elevation_m)), color="red") +
geom_density(color="blue") +
ggtitle("Histogram of Eruptions") +
xlab("Elevation (m)") +
ylab("Density") +
theme_bw()
This chart shows mountain with elevation between 2,000 to 3,000 (m) are most active volcanos.
A QQ plot visualizes data based on the quantities of the provided variable against the quantiles that would exist if the data were normally distributed. Data that follows the normal distribution should be in a line with a set slope. Including stat_qq() generates a QQ plot. The following example generates a QQ plot of the eruptions (n_erupts) variable.
y <- quantile(n_mounts$year, c(0.25, 0.75)) # Find the 1st and 3rd quantiles
x <- qnorm( c(0.25, 0.75)) # Find the matching normal values on the x-axis
slope <- diff(y) / diff(x) # Compute the line slope
int <- y[1] - slope * x[1] # Compute the line intercept
ggplot(n_mounts, aes(sample = year)) +
stat_qq() +
geom_abline(intercept = int, slope = slope, color = "blue", size = 2, alpha = 0.5 )
y <- quantile(e_mounts$elevation_m, c(0.25, 0.75)) # Find the 1st and 3rd quantiles
x <- qnorm( c(0.25, 0.75)) # Find the matching normal values on the x-axis
slope <- diff(y) / diff(x) # Compute the line slope
int <- y[1] - slope * x[1] # Compute the line intercept
ggplot(e_mounts, aes(sample = elevation_m)) +
stat_qq() +
geom_abline(intercept = int, slope = slope, color = "blue", size = 2, alpha = 0.5 )
Now let’s see in box plot by mountain type vs VEI.
p <- ggplot(v_events, aes(island, y=elevation_m))
p + geom_boxplot(fill = "white", colour = "#3366FF")
This scatter plot chart tells trend of eruptions. X axis is year, the circles are eruptions record, the size tells number of injuries and circle size is death toll. Although it is still debatable argument of how the data is recorded and documented especially for very old records more than 100 years ago. The chart shows eruptions is getting higher in number and intensity.
# Load libraries to make the chart look nicer
library(colorspace) # for darken()
install.packages("devtools")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
devtools::install_github("slowkow/ggrepel")
## Skipping install of 'ggrepel' from a github remote, the SHA1 (e23ff7a3) has not changed since last install.
## Use `force = TRUE` to force installation
fig(12, 7)
island_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999", "#889EAF", "#D4B499")
e_events <- v_events
ggplot(data=e_events, aes(x=year, y=elevation_m)) +
geom_point(aes(color = island, fill = island, size = total_deaths), alpha = 0.5, shape = 21) +
geom_smooth(
aes(color = "y ~ log(x)", fill = "y ~ log(x)"),
method = 'loess', formula = y~log(x), se = FALSE, fullrange = TRUE) +
scale_size_continuous(range = c(3, 10)) +
scale_color_manual(values = darken(island_cols, 0.5)) +
scale_fill_manual(values = island_cols) +
labs(title="Total Eruptions in the Past 30 years")
Eruptions After 1970, zoom the data to show only from 1970 records. Mountains become more active, since the year 1800, however the impact is getting lower, both in the number of deaths as well as injuries.
fig(18, 12)
e_events_1970 <- v_events %>% filter(year>1970 & elevation_m>0)
# e_events_top10 <- e_events_1970 %>% filter(vei>2)
ggplot(data=e_events_1970, aes(x=year, y=elevation_m)) +
geom_point(aes(color = island, fill = island, size = vei), alpha = 0.5, shape = 21) +
geom_smooth(aes(color = "y ~ log(x)", fill = "y ~ log(x)", alpha=.1), size=3,
method = 'loess', formula = y~log(x), se = FALSE, fullrange = TRUE) +
geom_text(aes(label = name),
color = "black",
size = 11/.pt, # font size 9 pt
point.padding = 0.1,
box.padding = .6,
min.segment.length = 0,
max.overlaps = 1000,
seed = 7654
) +
scale_size_continuous(range = c(3, 15)) +
scale_color_manual(values = darken(island_cols, 0.5)) +
scale_fill_manual(values = island_cols) +
scale_x_continuous(
name = "Eruption Events Year 1970-2021",
) +
scale_y_continuous(
name = "Elevation (m)",
limits = c(10, 4100),
breaks = c(1, 900, 1800, 2700, 3600, 3900),
expand = c(0, 0)
) +
labs(title="Total Eruptions in the Past 30 years") +
theme_minimal_hgrid(12, rel_small = 1) +
theme(
legend.position = "bottom",
legend.justification = "right",
legend.text = element_text(size = 9),
legend.box.spacing = unit(0, "pt")
)
fig(12, 7)
island_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999", "#889EAF", "#D4B499")
ve_data <- v_events %>% filter(elevation_m>0) %>% select(island, type, name, vei, elevation_m, total_deaths)
ggplot(ve_data, aes(vei, elevation_m)) +
geom_smooth(aes(color = "y ~ log(x)", fill = "y ~ log(x)"),
method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE) +
geom_point(aes(color = island, fill = island, size = total_deaths), alpha = 0.5, shape = 21) +
labs(title="Elevation (m) vs VEI") +
xlab("VEI") +
ylab("Elevation (m)") +
scale_color_manual(values = darken(island_cols, 0.3)) +
scale_fill_manual(values = island_cols) +
scale_size_continuous(range = c(3, 21)) +
guides(size=guide_legend("Total Deaths")) +
theme_minimal_hgrid(12, rel_small = 1) +
theme(
legend.position = "right",
legend.justification = "right",
legend.text = element_text(size = 9),
legend.box.spacing = unit(3, "pt")
)
fig(12, 7)
island_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999", "#889EAF", "#D4B499")
ve_data <- v_events %>% filter(elevation_m>0 & total_deaths<10000) %>% select(island, type, name, vei, elevation_m, total_deaths)
ggplot(ve_data, aes(total_deaths, elevation_m)) +
geom_smooth(aes(color = "y ~ log(x)", fill = "y ~ log(x)"),
method = 'lm', formula = y~log(x), se = FALSE, fullrange = TRUE) +
geom_point(aes(color = island, fill = island, size = total_deaths), alpha = 0.5, shape = 21) +
labs(title="Elevation (m) vs Total Deaths") +
xlab("Total Deaths") +
ylab("Elevation (m)") +
scale_color_manual(values = darken(island_cols, 0.3)) +
scale_fill_manual(values = island_cols) +
scale_size_continuous(range = c(3, 21)) +
guides(size=guide_legend("Total Deaths")) +
guides(colour=guide_legend("Island")) +
theme_minimal_hgrid(12, rel_small = 1)
#install.packages("sf")
#install.packages("rworldmap")
#install.packages("rgeos")
library(tidyverse) # for manipulation of simple features objects
# for getMap()
Loading required package: sp
rgeos version: 0.5-5, (SVN revision 640) GEOS runtime version: 3.8.0-CAPI-1.13.1 Linking to sp version: 1.4-5 Polygon checking: TRUE
Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
For a short introduction type : vignette(‘rworldmap’)
Data visualization on a map is very important to show where are the objects that are discussed and at the same time visualize more data that fit the map.
require("maps")
## Loading required package: maps
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
fig(20, 9)
island_cols <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#999999", "#889EAF", "#D4B499")
type_cols <- c("#6F69AC", "#95DAC1", "#FFEBA1", "#FD6F96")
mydata <- e_events %>%
dplyr::group_by(island, type, name, longitude, latitude) %>%
dplyr::summarise(sum_erpt = n(), m_vei = mean(vei), m_elev = max(elevation_m), m_deaths = max(total_deaths), .groups="drop") %>%
select(island, type, name, m_vei, m_elev, longitude, latitude, m_deaths)
global <- map_data("world") # World longitude and latitude data
ggplot() +
geom_polygon(data = global, aes(x=long, y = lat, group = group),
fill = "gray85", color = "gray80") +
coord_fixed(1.3) +
xlim(94,142) + ylim(-11,7.5) +
geom_point(data = mydata, aes(x = longitude, y = latitude, color = type, fill = type, size = m_deaths),
alpha = 0.8, show.legend = F, shape=24) +
geom_text(data = mydata,
aes(x = longitude, y = latitude, label = name),
color = "grey30",
size = 11/.pt, # font size 9 pt
point.padding = 0.1,
box.padding = .6,
min.segment.length = 0,
max.overlaps = 1000,
seed = 7654
) +
ggtitle ("Volcano Eruption") +
scale_color_manual(values = darken(type_cols, 0.7)) +
scale_fill_manual(values = type_cols) +
scale_size_continuous(range = c(3, 9)) +
guides(size=guide_legend("Total Deaths")) +
guides(color=guide_legend("Type")) +
theme_minimal_hgrid(12, rel_small = 1) +
theme(
legend.position = "bottom",
legend.justification = "right",
legend.text = element_text(size = 9),
legend.box.spacing = unit(3, "pt"))
Loading required package: maps
Attaching package: ‘maps’
The following object is masked from ‘package:purrr’:
map
# Conclusion
From data visualization and statistic method we can conclude from the data we have as follows: * Java has the most volcanic mountians in the country, may be in the world, make this island the most risky island. More than 60% of the country population, or more than 150 million people live in Java. * The more higher elevaltion of a mountain, the more probability of eruption event. * Most of eruptions are low to moderate explosion with no death and minor damage. * Mountain with elevation between 2,000 to 3,000 (m) are most active volcanos. * Mountains become more active, since the year 1800, however the impact is getting lower, both in the number of deaths as well as injuries.
There is a volcanics eruptions competition at Kaggle to predict the next eruption which I plan to develop, hopefully soon. I am still learning machine learning ;-).

Mount Merapi volcano erupted earlier this year
(2021).
Source: Courtesy of theguardian.com
References:
* NCEI/WDS
Global Significant Volcanic Eruptions Database, 4360 BC to Present *
Personal research for the year 2021 volcanic eruptions, from local
online news media. * Visualizing
Data, Probability, the Normal Distribution, and Z Scores
* Brief
History of Volcano Eruptions in Indonesia, Viz of the day, Tableau
Public Gallery.
Notes: Thanks for visiting this notebook. Please let me know if you found any error, incorrect finding, or better method and approach in this notebook on the comment section below.