About Dataset


The data this week comes from the Smithsonian Institution.

Axios put together a lovely plot of volcano eruptions since Krakatoa (after 1883) by elevation and type.

For more information about volcanoes, check out the Wikipedia article on volcanoes; for the VEI (Volcanic Explosivity Index) specifically, see its own Wikipedia article. Lastly, Google Earth has an interactive site on “10,000 Years of Volcanoes”!

More Information About Dataset.

Loading Libraries and Data


library(tidyverse)
library(maps)
library(knitr)
library(skimr)

eruptions_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-12/eruptions.csv')
volcano_raw <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-12/volcano.csv')

I used several packages for this task: tidyverse, maps, skimr and knitr.

Data Preparation


eruptions <- eruptions_raw %>%
  mutate(
    # Group the Volcanic Explosivity Index into broader categories (VEI 8 would become NA)
    vei_status = case_when(vei == 0 ~ 'non explosive',
                           vei == 1 ~ 'small',
                           vei >= 2 & vei <= 3 ~ 'moderate',
                           vei >= 4 & vei <= 5 ~ 'large',
                           vei >= 6 & vei <= 7 ~ 'very large'),
    # Build dates from their parts; an explicit format returns NA for invalid dates
    start_date = as.Date(paste(start_year, start_month, start_day, sep = "/"), format = "%Y/%m/%d"),
    end_date = as.Date(paste(end_year, end_month, end_day, sep = "/"), format = "%Y/%m/%d"),
    durations_days = as.numeric(end_date - start_date)
  ) %>%
  select(-eruption_number, -start_year, -start_month, -start_day,
         -end_year, -end_month, -end_day)
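The two key transformations in this step, binning `vei` into `vei_status` and building dates from year/month/day parts, can be checked on a few hand-made values. The sketch below uses base R only: `cut()` is a swapped-in alternative to the `case_when()` above (not the author's code), and all input vectors are hypothetical toy values. It also shows that, with an explicit `format`, a component such as a month of 0 simply yields `NA`.

```r
# Hypothetical VEI values, including a missing one
vei <- c(0, 1, 2, 3, 4, 5, 6, 7, NA)

# Base-R alternative to case_when(): right-closed intervals
# (-Inf,0], (0,1], (1,3], (3,5], (5,7] give the same groups for integer VEI
vei_status <- cut(vei,
                  breaks = c(-Inf, 0, 1, 3, 5, 7),
                  labels = c("non explosive", "small", "moderate", "large", "very large"))
print(table(vei_status, useNA = "ifany"))

# Hypothetical date parts; a month or day of 0 is not a valid calendar date
start_year  <- c(2019, 2020, 1883)
start_month <- c(12, 0, 8)
start_day   <- c(9, 0, 26)

# With an explicit format, invalid strings become NA instead of raising an error
start_date <- as.Date(paste(start_year, start_month, start_day, sep = "/"),
                      format = "%Y/%m/%d")
print(start_date)
```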

 
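# Attach volcano-level attributes to each eruption; the shared latitude/longitude
# columns get .x (from eruptions) and .y (from volcano_raw) suffixes
volcano_eruptions <- inner_join(eruptions, volcano_raw, by = c("volcano_number", "volcano_name")) %>%
  select(-c(area_of_activity, latitude.y, longitude.y)) %>%
  # last_eruption_year is read as text; parse_number() turns non-numeric entries into NA
  mutate(last_eruption_year = parse_number(last_eruption_year)) %>%
  # Drop every row that has a missing value in any column
  na.omit()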
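Because both tables carry `latitude` and `longitude`, the join suffixes them with `.x` and `.y`, which is why `latitude.y` and `longitude.y` are dropped afterwards. A small base-R sketch with hypothetical toy tables shows the same behaviour; `merge()` stands in for `inner_join()` here (both default to `.x`/`.y` suffixes, and `merge()` performs an inner join by default). It also shows that `na.omit()` removes a whole row when any column is missing, which can shrink the data noticeably.

```r
# Hypothetical toy versions of the two tables
eruptions_toy <- data.frame(
  volcano_number = c(1, 1, 2),
  volcano_name   = c("A", "A", "B"),
  vei            = c(2, 3, NA),
  latitude       = c(10, 10, 20)
)
volcano_toy <- data.frame(
  volcano_number = c(1, 2, 3),
  volcano_name   = c("A", "B", "C"),
  latitude       = c(10, 20, 30),
  elevation      = c(100, 200, 300)
)

# Inner join on both keys; the shared non-key column gets .x / .y suffixes
joined <- merge(eruptions_toy, volcano_toy, by = c("volcano_number", "volcano_name"))
print(names(joined))

# Volcano "C" has no eruptions, so the inner join drops it (3 rows remain);
# na.omit() then removes the eruption with a missing VEI (2 rows remain)
cleaned <- na.omit(joined)
print(nrow(joined))
print(nrow(cleaned))
```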

volcano_eruptions %>%
 select(1:5, country) %>%
 head() %>%
 kable()
| volcano_number | volcano_name | eruption_category | vei | evidence_method_dating | country |
|---:|:---|:---|---:|:---|:---|
| 241040 | Whakaari/White Island | Confirmed Eruption | 2 | Historical Observations | New Zealand |
| 290240 | Sarychev Peak | Confirmed Eruption | 2 | Historical Observations | Russia |
| 263310 | Tengger Caldera | Confirmed Eruption | 2 | Historical Observations | Indonesia |
| 256010 | Tinakula | Confirmed Eruption | 2 | Historical Observations | Solomon Islands |
| 357040 | Planchon-Peteroa | Confirmed Eruption | 1 | Historical Observations | Chile |
| 282050 | Kuchinoerabujima | Confirmed Eruption | 3 | Historical Observations | Japan |

Summary


Numeric Variables

volcano_eruptions %>% 
 skim() %>%
 yank('numeric') %>% 
 select(-p0, -p25, -p75, -p100) %>%
 kable() 
| skim_variable | n_missing | complete_rate | mean | sd | p50 | hist |
|:---|---:|---:|---:|---:|---:|:---|
| volcano_number | 0 | 1 | 287532.9 | 43915.96 | 282110 | ▃▇▆▃▂ |
| vei | 0 | 1 | 1.8 | 0.94 | 2 | ▆▇▂▁▁ |
| latitude.x | 0 | 1 | 12.4 | 28.09 | 12 | ▁▂▇▇▃ |
| longitude.x | 0 | 1 | 53.1 | 106.78 | 109 | ▁▃▂▂▇ |
| durations_days | 0 | 1 | 373.9 | 2408.67 | 40 | ▇▁▁▁▁ |
| last_eruption_year | 0 | 1 | 2003.0 | 51.31 | 2018 | ▁▁▁▁▇ |
| elevation | 0 | 1 | 2211.6 | 1235.32 | 2084 | ▁▂▇▆▁ |
| population_within_5_km | 0 | 1 | 34585.4 | 182255.84 | 583 | ▇▁▁▁▁ |
| population_within_10_km | 0 | 1 | 58640.9 | 195896.50 | 5682 | ▇▁▁▁▁ |
| population_within_30_km | 0 | 1 | 515226.4 | 888761.04 | 199361 | ▇▁▁▁▁ |
| population_within_100_km | 0 | 1 | 4160678.3 | 6966662.46 | 1002905 | ▇▁▁▁▁ |

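Here, the statistics for volcano_number, latitude.x and longitude.x can be ignored: volcano_number is an identifier and the coordinates describe locations, so their mean and standard deviation carry no real meaning.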
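One way to act on this is to leave identifier and coordinate columns out before summarising. The sketch below is base R on a hypothetical toy data frame, and the helper `summarise_numeric()` is made up for illustration (it is not part of skimr). It reports the median next to the mean, which matters for skewed columns such as `durations_days`, whose mean (373.9) sits far above its median (40).

```r
# Hypothetical helper: mean, sd and median for numeric columns, minus excluded ones
summarise_numeric <- function(df, exclude = character()) {
  is_num <- vapply(df, is.numeric, logical(1))
  cols <- setdiff(names(df)[is_num], exclude)
  data.frame(
    variable  = cols,
    mean      = vapply(df[cols], mean, numeric(1), na.rm = TRUE),
    sd        = vapply(df[cols], sd, numeric(1), na.rm = TRUE),
    median    = vapply(df[cols], median, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# Hypothetical toy data: an ID, coordinates and one skewed measurement
toy <- data.frame(
  volcano_number = c(101, 102, 103, 104),
  latitude.x     = c(10, 20, 30, 40),
  longitude.x    = c(100, 110, 120, 130),
  durations_days = c(2, 5, 10, 400)
)

summary_tbl <- summarise_numeric(toy, exclude = c("volcano_number", "latitude.x", "longitude.x"))
print(summary_tbl)
```

Here the single extreme duration pulls the mean to 104.25 while the median stays at 7.5.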

Character Variables

volcano_eruptions %>% 
 skim() %>%
 yank('character') %>% 
 select(-whitespace, -empty) %>%
 kable() 
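| skim_variable | n_missing | complete_rate | min | max | n_unique |
|:---|---:|---:|---:|---:|---:|
| volcano_name | 0 | 1 | 3 | 32 | 316 |
| eruption_category | 0 | 1 | 18 | 18 | 2 |
| evidence_method_dating | 0 | 1 | 10 | 23 | 3 |
| vei_status | 0 | 1 | 5 | 13 | 5 |
| primary_volcano_type | 0 | 1 | 6 | 19 | 20 |
| country | 0 | 1 | 4 | 32 | 48 |
| region | 0 | 1 | 6 | 30 | 19 |
| subregion | 0 | 1 | 4 | 37 | 70 |
| tectonic_settings | 0 | 1 | 35 | 47 | 10 |
| evidence_category | 0 | 1 | 14 | 17 | 2 |
| major_rock_1 | 0 | 1 | 6 | 40 | 10 |
| major_rock_2 | 0 | 1 | 1 | 40 | 11 |
| major_rock_3 | 0 | 1 | 1 | 40 | 10 |
| major_rock_4 | 0 | 1 | 1 | 40 | 8 |
| major_rock_5 | 0 | 1 | 1 | 40 | 7 |
| minor_rock_1 | 0 | 1 | 1 | 40 | 9 |
| minor_rock_2 | 0 | 1 | 1 | 40 | 10 |
| minor_rock_3 | 0 | 1 | 1 | 40 | 5 |
| minor_rock_4 | 0 | 1 | 1 | 21 | 2 |
| minor_rock_5 | 0 | 1 | 1 | 1 | 1 |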

After cleaning, the dataset contains 316 distinct volcano names spread across 48 countries, with eruptions grouped into 5 VEI (Volcanic Explosivity Index) status levels.
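These counts come from the `n_unique` column of the skim output, and the same numbers can be obtained by counting distinct values per column. A base-R sketch on a hypothetical toy table, where each row is one eruption so volcano names repeat:

```r
# Hypothetical toy eruptions: one row per eruption, so names repeat
toy <- data.frame(
  volcano_name = c("A", "A", "B", "C", "C", "C"),
  country      = c("X", "X", "X", "Y", "Y", "Y"),
  vei_status   = c("small", "moderate", "small", "large", "moderate", "small")
)

# Number of distinct values per column (what skim() reports as n_unique)
n_unique <- vapply(toy, function(x) length(unique(x)), integer(1))
print(n_unique)
```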

Data Exploration


How are the VEI levels spread across the world?

world <- map_data("world")

ggplot() +
  # Light world basemap
  geom_map(
    data = world, map = world,
    aes(long, lat, map_id = region),
    color = "white", fill = "gray20", size = 0.01, alpha = 0.1
  ) +
  # One point per eruption; vei^5 exaggerates the size gap between VEI levels
  geom_point(
    data = volcano_eruptions,
    aes(longitude.x, latitude.x, size = vei^5, color = vei_status),
    alpha = 0.4
  ) + 
  guides(size = "none") +
  theme_void() + 
  #facet_wrap(~vei, ncol = 2) +
  theme(legend.position = 'bottom')

VEI levels appear to be spread fairly randomly across the world. Active volcanoes, however, cluster around Southeast Asia (Indonesia and the Philippines), East Asia (Japan and the Russian Far East), the west coast of the Americas and the Mediterranean.

Which volcanoes erupted most often?

volcano_eruptions %>% 
 group_by(country) %>%
 count(volcano_name) %>%
 arrange(-n) %>%
 head(25) %>%
 ggplot(aes(x = reorder(paste(volcano_name, country, sep = " - "), n), 
       y = n)) +
 geom_bar(stat = 'identity', fill = "steelblue") +
 coord_flip() +
 theme(legend.position = 'none') +
 labs(x = "Mountain Name",
      y = "Number of eruptions")

Piton de la Fournaise has the most recorded eruptions (111), followed by Mount Etna in Italy (104) and Asosan in Japan (66).
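The ranking behind this chart is a count of eruptions per volcano and country pair, sorted from most to least frequent. A base-R sketch of the same counting on hypothetical toy data; `table()` plus `sort()` is a swapped-in equivalent of the dplyr `count()` used above:

```r
# Hypothetical toy eruptions (one row per eruption)
toy <- data.frame(
  volcano_name = c("A", "A", "A", "B", "B", "C"),
  country      = c("X", "X", "X", "Y", "Y", "Z")
)

# Label each eruption "name - country", matching the chart's axis labels
label <- paste(toy$volcano_name, toy$country, sep = " - ")

# Count eruptions per label and sort from most to least frequent
counts <- sort(table(label), decreasing = TRUE)

# Keep the top n (the chart keeps 25)
top <- head(counts, 2)
print(top)
```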

How do the two evidence categories compare over time?

volcano_eruptions %>%
  filter(last_eruption_year > 1800) %>%
  ggplot(aes(last_eruption_year, fill = evidence_category)) +
  # A fixed transparency belongs outside aes(); inside it is treated as a mapped variable
  geom_density(alpha = 0.5) +
  #facet_wrap(~vei_status, scales = "free") +
  scale_x_continuous(breaks = seq(1800, 2020, 20)) +
  theme(legend.position = "bottom")

For volcanoes in the “Eruption Observed” category, last eruption years are concentrated after 2000 and rise much more sharply than for the “Eruption Dated” category.
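A numeric check can accompany the density plot: for each evidence category, the share of rows whose last eruption year falls after 2000. The sketch below is base R on hypothetical toy values; with the real data, `last_eruption_year` and `evidence_category` from `volcano_eruptions` would take their place.

```r
# Hypothetical toy values standing in for the real columns
last_eruption_year <- c(1950, 1990, 2005, 2010, 2018, 1900, 1970, 2001, 1985, 1960)
evidence_category  <- c(rep("Eruption Observed", 5), rep("Eruption Dated", 5))

# Proportion of rows with a last eruption after 2000, per category
share_after_2000 <- tapply(last_eruption_year > 2000, evidence_category, mean)
print(share_after_2000)
```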

How are eruption durations distributed by VEI status?

volcano_eruptions %>%
  select(volcano_name, durations_days, vei_status) %>%
  # log10(0) is -Inf, so zero-day eruptions cannot be drawn on a log scale
  filter(durations_days > 0) %>%
  ggplot() +
  # Fixed transparency set outside aes()
  geom_density(aes(durations_days, fill = vei_status), alpha = 0.5) +
  facet_wrap(~vei_status) +
  scale_x_log10() +
  theme(legend.position = "none") +
  labs(x = "Duration (Days)",
       y = NULL)

Eruptions in the “very large” VEI category tend to last longer than those in the other categories.
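Two details matter when reading this plot. On a log10 axis a duration of 0 days maps to -Inf, so such eruptions cannot appear in the densities, and for durations this skewed the median per group is a steadier summary than the mean. A base-R sketch on hypothetical toy values:

```r
# Hypothetical toy durations (days) and VEI status labels
durations_days <- c(0, 3, 10, 40, 0, 120, 365, 900)
vei_status     <- c("small", "small", "small", "moderate",
                    "moderate", "moderate", "very large", "very large")

# log10(0) is -Inf, which is why zero-day eruptions vanish from a log-scale axis
print(log10(durations_days))

# Median duration per status, keeping only strictly positive durations
positive <- durations_days > 0
median_by_status <- tapply(durations_days[positive], vei_status[positive], median)
print(median_by_status)
```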

Thank You

Amri Rohman.
Sidoarjo, East Java, ID