Eruptions Project 2

Eruptions Around The World

Volcano Erupting

Source: Scientific American

Introduction to dataset

For my dataset I chose a historic volcanic eruption dataset. It runs from 4360 BC to the present and records many variables for each eruption. Because there are so many variables, I will explain them in groups. The first group is the time variables, year and month, which show when the eruption happened. The next group is Tsu and Eq, which show data on the related tsunami or earthquake if one exists. The name variable gives the name of each volcano. The location variables, which are location, country, latitude, and longitude, show where the volcano is. The elevation variable shows the elevation of the volcano. The type variable shows what type of volcano it is. The VEI variable shows the VEI rating of an eruption, which is the size of the volcanic eruption. The agent variable shows how people died from the eruption (for example, A = Avalanche). The final group of variables shows the total destruction caused by the eruption, from deaths to property damage.

Background Research On Data Set And Personal Interest

I have always been really interested in volcanoes, so I decided to use this dataset, which shows volcanic eruptions over the years. What really interests me about it is the sheer range of eruptions it covers, from BCE to the present.

The background research I did was mostly about how VEI, the Volcanic Explosivity Index, is measured. According to the National Park Service, it is a measure of how much magma and debris a volcano releases during an eruption. It ranges from 0 to 8, and like the scales used for other disasters, the higher the number, the more intense the eruption.

#install.packages("readr")
#install.packages("DataExplorer")
library(viridis)
Loading required package: viridisLite
library(tidyverse)
Warning: package 'readr' was built under R version 4.3.3
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(readr)
library(highcharter)
Warning: package 'highcharter' was built under R version 4.3.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
Highcharts (www.highcharts.com) is a Highsoft software product which is
not free for commercial and Governmental use
library(RColorBrewer)
volcano <- read_tsv("volcano-events-2024-04-11_11-41-30_-0400.tsv")
Rows: 878 Columns: 35
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr  (6): Search Parameters, Name, Location, Country, Type, Agent
dbl (29): Year, Mo, Dy, Tsu, Eq, Latitude, Longitude, Elevation (m), VEI, De...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Filtering

There is a lot I'd like to filter out of my data. First I select only the columns I'm using: I remove the non-total damage columns and their descriptions, the search parameters column, and the related tsunami and earthquake columns. I then remove the total damage descriptions. Finally, I remove all rows that have NAs for type, VEI, and total death description, as those are the three values I'll be using the most.

I also want to show dates as month and then year, labeled BC or AD.

# Drop the search parameters column, then the tsunami and earthquake columns
# and the columns at positions 15 to 24, which this analysis does not use
Eruptions <- volcano[-1] |>
  select(-4, -5, -(15:24))

# For the linear regression: keep only rows with no NAs in the values I'm using
EruptionsNoNA <- Eruptions |>
  filter(!is.na(Type) & !is.na(VEI) & !is.na(`Total Death Description`))

# Here I keep only rows with no NAs in Type and VEI
EruptionsNoNA2 <- Eruptions |>
  filter(!is.na(Type) & !is.na(VEI))
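The month-then-year labels in BC or AD mentioned above could be built with a small helper like the one below. This is only a sketch: the function name format_eruption_date is hypothetical, and it assumes that years before year 1 are stored as negative numbers and that the month column holds numbers from 1 to 12 (or NA when the month is unknown).

```r
# Hypothetical helper, not part of the original analysis.
# Assumption: negative years mean BC, and month is 1-12 or NA.
format_eruption_date <- function(year, month) {
  # Make sure month is an integer vector the same length as year
  month <- rep_len(as.integer(month), length(year))
  era <- ifelse(year < 0, "BC", "AD")
  year_label <- paste(abs(year), era)
  month_label <- month.name[month]  # NA month stays NA
  # Use "Month Year Era" when the month is known, otherwise "Year Era"
  ifelse(is.na(month_label), year_label, paste(month_label, year_label))
}

format_eruption_date(c(-4360, 1883), c(NA, 8))
# "4360 BC"        "August 1883 AD"
```

With the filtered data, the same helper could be applied inside mutate(), for example mutate(DateLabel = format_eruption_date(Year, Mo)), where DateLabel is a made-up column name.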

Linear Regression For VEI vs Total Deaths

Below is my linear regression plot of how VEI affects total deaths. I would also like to remove every point that has fewer than 100 deaths.

ggplot(EruptionsNoNA, aes(x = VEI, y = `Total Death Description`)) +
  labs(title = "Volcanic Explosivity Index Rating vs Total Death \nDescription",
       caption = "Source: NOAA") +
  xlab("Volcanic Explosivity Rating") +
  ylab("Total Death Description (1-4)") +
  theme_minimal(base_size = 12) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

The total death description is a simplified version of the number of deaths. It runs from 1 to 4 and describes how many deaths there were, with 1 being very few and 4 being a lot. I chose it over total deaths because total deaths is missing for many old eruptions, while the description gives a wider range of data to work with. This does hurt the plot visually, but I will make up for it by explaining the relationship.

cor(EruptionsNoNA$VEI, EruptionsNoNA$`Total Death Description`)
[1] 0.3535339

There is a weak positive correlation here.

fit1 <- lm(`Total Death Description` ~ VEI,data=EruptionsNoNA)
summary(fit1)

Call:
lm(formula = `Total Death Description` ~ VEI, data = EruptionsNoNA)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.6741 -0.7286 -0.4134  0.5866  3.2169 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.78307    0.11471   6.827 2.64e-11 ***
VEI          0.31517    0.03814   8.263 1.41e-15 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9643 on 478 degrees of freedom
Multiple R-squared:  0.125, Adjusted R-squared:  0.1232 
F-statistic: 68.28 on 1 and 478 DF,  p-value: 1.409e-15

My correlation is 0.3535, which is a weak positive correlation, and the equation for this linear regression is:

Total Death Description = 0.31517(VEI) + 0.78307

This equation means that for every one-point increase in VEI, the predicted total death description increases by 0.31517. The intercept, 0.78307, is the predicted total death description when VEI is 0.
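As a quick check of what the fitted line predicts, the sketch below plugs each possible VEI value into the equation. The coefficients are copied from the summary(fit1) output; the helper name predict_death_description is made up for this illustration.

```r
# Coefficients copied from the summary(fit1) output
intercept <- 0.78307
slope <- 0.31517

# Hypothetical helper name, used only for this illustration
predict_death_description <- function(vei) intercept + slope * vei

# Predicted total death description for every VEI score from 0 to 8
predictions <- data.frame(VEI = 0:8,
                          predicted = predict_death_description(0:8))
print(predictions)

predict_death_description(4)
# 2.04375
```

Going from VEI 0 to VEI 8 moves the prediction from about 0.78 to about 3.30, so each extra VEI point adds the slope, 0.31517, to the predicted description.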

The adjusted R-squared value shows that about 12% of the variation in the total death description is explained by the model, which means that about 88% of the variation is not explained by VEI.
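In a regression with a single predictor, the multiple R-squared is the square of the correlation, so the two numbers reported above can be checked against each other. A small sketch using the value printed by cor():

```r
# Correlation between VEI and total death description, as printed by cor()
r <- 0.3535339

# With one predictor, multiple R-squared equals the squared correlation
r_squared <- r^2
round(r_squared, 3)      # 0.125, matching the "Multiple R-squared" line

# Share of the variation that the model leaves unexplained
round(1 - r_squared, 3)  # 0.875
```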

Plotting on Highcharter

For highcharter I want to plot each volcanic eruption by year and total deaths, so below I first filter out the rows that are missing year, total deaths, or VEI.

Then I make my highcharter graph.

HighcharterData <- Eruptions |>
  filter(!is.na(Year) & !is.na(`Total Deaths`) & !is.na(VEI))
highchart() |>
  hc_add_series(data= HighcharterData,
                type = "scatter",
                hcaes(x = Year , y = `Total Deaths`, group = VEI)) |>
  hc_colors(brewer.pal(8,"Reds")) |>
  hc_chart(backgroundColor = "#444545") |>
  hc_title(text = "Total Number Of Deaths From Volcanoes Throughout The Years") |>
  hc_caption(text = "Source: NOAA") |>
  hc_xAxis(title = list(text= "Year")) |>
  hc_yAxis(title = list(text= "Number Of Deaths")) |>
  hc_plotOptions(series = list(marker = list(symbol = "circle"))) |>
  hc_tooltip(borderColor = "white", pointFormat = "Volcano Name: {point.Name} <br>
             Country: {point.Country} <br>
             Year: {point.Year} <br>
             Volcano Type: {point.Type} <br>
             VEI (Volcanic Explosivity Index) Score: {point.VEI} <br>
             Total Deaths: {point.Total Deaths}")

Closing Thoughts

My graph above shows how the number of deaths from volcanoes has changed over the years. I gave each VEI score its own color. A pattern I see on this graph is that many more eruptions are recorded in recent times than in earlier ones, but that may just be due to a lack of proper documentation in the past.

I really wish that I could have made a leaflet map instead of a highcharter graph. I tried several times to make it happen, but it always failed, which is frustrating because I feel this data would benefit greatly from being visualized on a map. My main issue with leaflet was that I couldn't figure out how to color code the points based on my variables.
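One possible way to color code leaflet markers by a variable is to build a palette function with colorNumeric() and call it inside addCircleMarkers(). The sketch below is not what I ran for this project: it assumes the leaflet package is installed, and it uses three illustrative rows with approximate coordinates and VEI values rather than rows from my dataset.

```r
library(leaflet)

# Illustrative rows only (not taken from the project's dataset); the column
# names mirror the ones used in this report
sample_eruptions <- data.frame(
  Name      = c("Vesuvius", "Krakatau", "Pinatubo"),
  Latitude  = c(40.82, -6.10, 15.13),
  Longitude = c(14.43, 105.42, 120.35),
  VEI       = c(5, 6, 6)
)

# colorNumeric() returns a function that maps a number to a hex color
vei_palette <- colorNumeric(palette = "Reds", domain = c(0, 8))

eruption_map <- leaflet(sample_eruptions) |>
  addTiles() |>
  addCircleMarkers(
    lng = ~Longitude, lat = ~Latitude,
    color = ~vei_palette(VEI),                 # color each marker by its VEI
    popup = ~paste0(Name, " (VEI ", VEI, ")")
  ) |>
  addLegend(pal = vei_palette, values = c(0, 8), title = "VEI")

eruption_map
```

For a variable with categories instead of numbers, such as Type, colorFactor() could be used in the same way.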

Bibliography: https://www.nps.gov/subjects/volcanoes/volcanic-explosivity-index.htm