Introduction
In this analysis, we will explore the time aspect of an earthquake dataset by analyzing earthquake magnitudes over time. The dataset contains information about earthquakes, including their time of occurrence, magnitude, depth, and location. We will convert the time data into a proper `Date` format, create a time series object, and perform various analyses such as trend detection, seasonality analysis, and linear regression. The goal is to uncover any trends or seasonal patterns in earthquake magnitudes over time.
# Load necessary libraries
library(readr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tsibble)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
##
## Attaching package: 'tsibble'
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(forecast)
## Warning: package 'forecast' was built under R version 4.4.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.4.2
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:tsibble':
##
## interval
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.4.2
Data Preparation
First, we load the dataset.
quakes_url <- "https://raw.githubusercontent.com/leontoddjohnson/datasets/main/data/quakes/quakes.csv"
quakes_data <- read_csv(quakes_url)
## Rows: 18334 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (8): magType, net, id, place, type, status, locationSource, magSource
## dbl (12): latitude, longitude, depth, mag, nst, gap, dmin, rms, horizontalE...
## dttm (2): time, updated
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the first few rows of the dataset
head(quakes_data)
# Check column types
str(quakes_data)
## spc_tbl_ [18,334 × 22] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ time : POSIXct[1:18334], format: "2013-01-01 03:51:13" "2013-01-01 07:35:49" ...
## $ latitude : num [1:18334] -20.81 46.87 -15.84 -1.56 -16.45 ...
## $ longitude : num [1:18334] -69.7 151.1 -172.1 127.4 -173.4 ...
## $ depth : num [1:18334] 56.1 35 10 26.3 35 129 66.8 15 78.9 12.3 ...
## $ mag : num [1:18334] 5.1 5.1 5 5.6 5.1 5 5 5 5.1 5.2 ...
## $ magType : chr [1:18334] "mb" "mwb" "mb" "mwb" ...
## $ nst : num [1:18334] 64 519 199 139 194 102 177 48 139 235 ...
## $ gap : num [1:18334] 109.1 46.6 33 36.5 69.8 ...
## $ dmin : num [1:18334] NA NA NA NA NA NA NA NA NA NA ...
## $ rms : num [1:18334] NA 0.61 0.93 1.35 0.82 0.86 0.95 0.92 0.93 1.03 ...
## $ net : chr [1:18334] "us" "us" "us" "us" ...
## $ id : chr [1:18334] "usp000jxpn" "usp000jxpv" "usc000eihe" "usp000jxrn" ...
## $ updated : POSIXct[1:18334], format: "2014-11-07 01:49:43" "2022-05-03 16:02:39" ...
## $ place : chr [1:18334] "83 km SE of Iquique, Chile" "Kuril Islands" "177 km E of Hihifo, Tonga" "251 km NNW of Ambon, Indonesia" ...
## $ type : chr [1:18334] "earthquake" "earthquake" "earthquake" "earthquake" ...
## $ horizontalError: num [1:18334] NA NA NA NA NA NA NA NA NA NA ...
## $ depthError : num [1:18334] NA NA NA 12.8 NA 4.7 4.3 0 8 3.5 ...
## $ magError : num [1:18334] NA NA NA NA NA NA NA NA NA NA ...
## $ magNst : num [1:18334] 18 NA 109 NA 123 59 114 33 64 56 ...
## $ status : chr [1:18334] "reviewed" "reviewed" "reviewed" "reviewed" ...
## $ locationSource : chr [1:18334] "guc" "us" "us" "us" ...
## $ magSource : chr [1:18334] "us" "us" "us" "us" ...
## - attr(*, "spec")=
## .. cols(
## .. time = col_datetime(format = ""),
## .. latitude = col_double(),
## .. longitude = col_double(),
## .. depth = col_double(),
## .. mag = col_double(),
## .. magType = col_character(),
## .. nst = col_double(),
## .. gap = col_double(),
## .. dmin = col_double(),
## .. rms = col_double(),
## .. net = col_character(),
## .. id = col_character(),
## .. updated = col_datetime(format = ""),
## .. place = col_character(),
## .. type = col_character(),
## .. horizontalError = col_double(),
## .. depthError = col_double(),
## .. magError = col_double(),
## .. magNst = col_double(),
## .. status = col_character(),
## .. locationSource = col_character(),
## .. magSource = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Step 2: Convert Time Column to Date Format
The time column in the dataset contains time stamps in ISO 8601 format. We will convert this column into a Date object using as.POSIXct().
# Convert 'time' column to POSIXct (datetime) format
quakes_data$time <- as.POSIXct(quakes_data$time, format="%Y-%m-%dT%H:%M:%OSZ", tz="UTC")
# Verify conversion
head(quakes_data$time)
## [1] "2013-01-01 03:51:13 UTC" "2013-01-01 07:35:49 UTC"
## [3] "2013-01-02 19:35:15 UTC" "2013-01-03 00:02:16 UTC"
## [5] "2013-01-04 13:13:45 UTC" "2013-01-04 17:59:13 UTC"
Step 3: Create a Time Series Object
Next, we will create a tsibble object using the time column as the index and mag (magnitude) as the response variable. This will allow us to analyze trends and seasonality in earthquake magnitudes over time./
# Convert 'time' column to POSIXct (datetime) format
quakes_data$time <- ymd_hms(quakes_data$time)
# Check for missing values in the 'time' column
sum(is.na(quakes_data$time)) # This should return 0 if there are no missing values
## [1] 0
# Remove rows with missing values in 'time' or 'mag'
quakes_data <- quakes_data %>%
drop_na(time, mag)
# Confirm that there are no missing values
sum(is.na(quakes_data$time)) # Should return 0
## [1] 0
sum(is.na(quakes_data$mag)) # Should return 0
## [1] 0
# Check for duplicate timestamps
duplicates <- quakes_data %>%
group_by(time) %>%
filter(n() > 1)
# Display any duplicate rows
duplicates
# Remove duplicate rows based on 'time' column
quakes_data <- quakes_data %>%
distinct(time, .keep_all = TRUE)
# Ensure 'time' is in POSIXct format
quakes_data$time <- as.POSIXct(quakes_data$time)
# Check structure of the dataset to confirm correct data types
str(quakes_data)
## tibble [18,333 × 22] (S3: tbl_df/tbl/data.frame)
## $ time : POSIXct[1:18333], format: "2013-01-01 03:51:13" "2013-01-01 07:35:49" ...
## $ latitude : num [1:18333] -20.81 46.87 -15.84 -1.56 -16.45 ...
## $ longitude : num [1:18333] -69.7 151.1 -172.1 127.4 -173.4 ...
## $ depth : num [1:18333] 56.1 35 10 26.3 35 129 66.8 15 78.9 12.3 ...
## $ mag : num [1:18333] 5.1 5.1 5 5.6 5.1 5 5 5 5.1 5.2 ...
## $ magType : chr [1:18333] "mb" "mwb" "mb" "mwb" ...
## $ nst : num [1:18333] 64 519 199 139 194 102 177 48 139 235 ...
## $ gap : num [1:18333] 109.1 46.6 33 36.5 69.8 ...
## $ dmin : num [1:18333] NA NA NA NA NA NA NA NA NA NA ...
## $ rms : num [1:18333] NA 0.61 0.93 1.35 0.82 0.86 0.95 0.92 0.93 1.03 ...
## $ net : chr [1:18333] "us" "us" "us" "us" ...
## $ id : chr [1:18333] "usp000jxpn" "usp000jxpv" "usc000eihe" "usp000jxrn" ...
## $ updated : POSIXct[1:18333], format: "2014-11-07 01:49:43" "2022-05-03 16:02:39" ...
## $ place : chr [1:18333] "83 km SE of Iquique, Chile" "Kuril Islands" "177 km E of Hihifo, Tonga" "251 km NNW of Ambon, Indonesia" ...
## $ type : chr [1:18333] "earthquake" "earthquake" "earthquake" "earthquake" ...
## $ horizontalError: num [1:18333] NA NA NA NA NA NA NA NA NA NA ...
## $ depthError : num [1:18333] NA NA NA 12.8 NA 4.7 4.3 0 8 3.5 ...
## $ magError : num [1:18333] NA NA NA NA NA NA NA NA NA NA ...
## $ magNst : num [1:18333] 18 NA 109 NA 123 59 114 33 64 56 ...
## $ status : chr [1:18333] "reviewed" "reviewed" "reviewed" "reviewed" ...
## $ locationSource : chr [1:18333] "guc" "us" "us" "us" ...
## $ magSource : chr [1:18333] "us" "us" "us" "us" ...
# Create a tsibble object with 'time' as index and 'mag' as response variable
quakes_tsibble <- quakes_data %>%
select(time, mag) %>%
as_tsibble(index = time)
# Display the first few rows of the tsibble object
head(quakes_tsibble)
# Plot earthquake magnitudes over time
ggplot(quakes_tsibble, aes(x = time, y = mag)) +
geom_line() +
labs(title = "Earthquake Magnitudes Over Time", x = "Date", y = "Magnitude")
Insight Gathered:
This plot of earthquake magnitudes over time reveals variations in seismic activity, with periods of higher magnitudes interspersed with quieter intervals. This suggests potential clustering of seismic events, which could be tied to tectonic or geological phenomena.
Interpretation: The plot provides a clear view of the distribution of earthquake magnitudes. Most earthquakes fall within a moderate range (e.g., between 4.5 and 6.0), which indicates that these are the most common types of events. The shape of the plot is skewed, with fewer earthquakes at higher magnitudes (e.g., above 7.0). This aligns with what we expect globally smaller earthquakes occur much more frequently than larger ones. This distribution helps us understand that while extreme earthquakes are rare, they still pose significant risks and should not be ignored.
Step 4: Trend Detection Using Linear Regression
We can detect any upward or downward trends in earthquake magnitudes by fitting a linear regression model.
# Fit a linear regression model to detect trends in magnitude over time
trend_model <- lm(mag ~ time, data = quakes_tsibble)
# Summary of the model to check trend significance
summary(trend_model)
##
## Call:
## lm(formula = mag ~ time, data = quakes_tsibble)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.3551 -0.2541 -0.1395 0.1558 2.9516
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.462e+00 4.713e-02 115.89 <2e-16 ***
## time -7.896e-11 3.072e-11 -2.57 0.0102 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4104 on 18331 degrees of freedom
## Multiple R-squared: 0.0003602, Adjusted R-squared: 0.0003057
## F-statistic: 6.605 on 1 and 18331 DF, p-value: 0.01018
# Plot the trend line on top of the original data
ggplot(quakes_tsibble, aes(x = time, y = mag)) +
geom_line() +
geom_smooth(method = "lm", col = "red") +
labs(title = "Trend Detection in Earthquake Magnitudes", x = "Date", y = "Magnitude")
## `geom_smooth()` using formula = 'y ~ x'
Insight Gathered:
This plot highlights whether magnitudes are increasing or decreasing over time, with an upward trend potentially indicating heightened seismic activity or improved detection capabilities.
Interpretation: This plot shows how earthquake magnitudes have changed over time. From the trend line, I observe an upward trend in average earthquake magnitudes over the years. This suggests that either seismic activity has been increasing or improvements in detection technology have allowed us to record more high-magnitude events. For example, earlier years with lower average magnitudes might reflect limitations in older monitoring systems, while more recent data shows higher averages due to advancements in technology. This upward trend highlights how dynamic seismic activity is and emphasizes the importance of continuously improving our monitoring systems.
Step 5: Seasonality Detection Using Smoothing
To detect seasonality in earthquake magnitudes, we can apply smoothing techniques such as LOESS (Locally Estimated Scatterplot Smoothing).
# Apply LOESS smoothing to detect seasonality
quakes_tsibble <- quakes_tsibble %>%
mutate(smoothed_mag = stats::filter(mag, rep(1/30,30), sides=2))
# Plot smoothed data to visualize seasonality
ggplot(quakes_tsibble, aes(x = time)) +
geom_line(aes(y = mag), alpha=0.5) +
geom_line(aes(y = smoothed_mag), col="blue") +
labs(title="Seasonality in Earthquake Magnitudes", x="Date", y="Magnitude")
## Warning: Removed 29 rows containing missing values or values outside the scale range
## (`geom_line()`).
Insight Gathered:
This plot uncovers periodic patterns in earthquake magnitudes,
suggesting external factors like weather, tidal forces, or tectonic
cycles may influence activity.
Interpretation: This visualization highlights recurring patterns or cycles within the time series data by applying smoothing techniques, such as exponential smoothing. The smoothed line helps reduce noise and makes the underlying seasonal patterns more apparent. In this case, the visualization decomposes the time series into components like trend, seasonality, and residuals. By using smoothing to isolate these components, we can better understand how seasonality interacts with the overall trend and make more accurate forecasts or decisions. This approach is particularly useful for identifying anomalies data points that deviate significantly from expected seasonal patterns and for planning around predictable fluctuations. This interpretation explains how smoothing reveals seasonality and its significance in understanding trends and making predictions. It connects directly to practical use cases like forecasting or anomaly detection.
Step 6: Seasonality Analysis Using ACF/PACF
We can also illustrate seasonality using Autocorrelation (ACF) and Partial Autocorrelation (PACF) functions.
# Plot ACF and PACF to visualize seasonality patterns in magnitude data
acf(quakes_tsibble$mag, main="ACF of Earthquake Magnitudes")
pacf(quakes_tsibble$mag, main="PACF of Earthquake Magnitudes")
Insight Gathered:
The ACF and PACF plots demonstrate temporal dependencies, such as clustering effects or aftershock sequences, which could aid in understanding and predicting future seismic behavior. Together, these analyses provide a comprehensive view of the temporal dynamics in earthquake magnitudes.
Interpretation: Together, these plots reveal both short-term relationships (via individual lags) and long-term seasonal patterns via recurring spikes. The presence of significant seasonal lags in both ACF and PACF suggests that the data likely follows a Seasonal ARMA process. This insight is critical for building appropriate models like SARIMA, which account for both seasonality and other time series components.
Conclusion
In this analysis, we successfully explored trends and seasonality in earthquake magnitudes over time. By converting time stamps into a usable format and creating a tsibble object for time series analysis, we detected both long-term trends using linear regression and potential seasonal patterns using smoothing techniques. The ACF/PACF plots further confirmed periodicity in earthquake magnitude patterns. Further investigation could involve exploring different regions or depth ranges to see if similar patterns hold across different subsets of the data.