On April 3, 2024, an earthquake with a magnitude of 7.2 on the Richter scale struck eastern Taiwan, caused by fault movement (the USGS catalog used below records the main shock as magnitude 7.4). The shaking lasted about 98 seconds. Aftershocks continued over the next few days, collapsing major roads in the eastern region and damaging bridges and elevated rail lines to varying degrees. During this period, Taiwan issued two national-level alerts. The earthquake caused 18 fatalities and thousands of injuries. To analyze the locations and depths of the earthquakes recorded from April 2 to April 7, I use clustering methods.
# Load required libraries
library(ggplot2)
library(tidyverse)
library(factoextra)
library(cluster)
library(wesanderson)
library(dbscan)
library(plotly)
library(gridExtra)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(tmap)
The data come from the USGS earthquake catalog. I selected the earthquakes that occurred in Taiwan from April 2 to April 7.
query<-read.csv("/Users/ninalin/Desktop/query.csv")
view(query)
str(query)
## 'data.frame': 76 obs. of 5 variables:
## $ time : chr "2024-04-07T22:49:30.435Z" "2024-04-07T10:24:38.426Z" "2024-04-07T02:06:00.792Z" "2024-04-06T21:15:08.953Z" ...
## $ latitude : num 24.2 24.1 24.2 24 24.2 ...
## $ longitude: num 122 122 122 122 122 ...
## $ depth : num 37.5 15.6 16 18.9 33.8 ...
## $ mag : num 4.6 4.5 4.5 4.6 4.8 5.2 4.7 4.9 4.5 4.9 ...
summary(query)
## time latitude longitude depth
## Length:76 Min. :23.73 Min. :121.4 Min. : 6.723
## Class :character 1st Qu.:23.88 1st Qu.:121.7 1st Qu.:16.843
## Mode :character Median :24.04 Median :121.7 Median :26.235
## Mean :24.02 Mean :121.7 Mean :25.274
## 3rd Qu.:24.17 3rd Qu.:121.8 3rd Qu.:35.000
## Max. :24.29 Max. :121.9 Max. :45.326
## mag
## Min. :4.500
## 1st Qu.:4.500
## Median :4.750
## Mean :4.847
## 3rd Qu.:5.000
## Max. :7.400
set.seed(123)
loc_data <- scale(query[, c("latitude", "longitude")])
loc_elbow<-fviz_nbclust(loc_data, kmeans, method = "wss") +ggtitle("Elbow Method")
loc_silhouette<- fviz_nbclust(loc_data, kmeans, method = "silhouette") +ggtitle("Silhouette Method")
grid.arrange(loc_elbow,loc_silhouette, ncol=2, top="Optimal Number of Clusters")
I used both the elbow method and the silhouette method to choose the number of clusters. The plots indicate that the optimal number of clusters is 4.
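To make the two criteria concrete, here is a minimal sketch of what the elbow and silhouette plots summarize. It runs on synthetic points, not on the earthquake data: `toy`, `centers`, `ks` and `best_k` are hypothetical names, and the three well-separated groups are an assumption made only for illustration.

```r
library(cluster)  # silhouette()

set.seed(123)
# Hypothetical stand-in for the scaled latitude/longitude matrix:
# three well-separated synthetic groups of 20 points each.
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(20, centers[i, 1], 0.3), rnorm(20, centers[i, 2], 0.3))
}))

ks <- 2:6
# Elbow method: total within-cluster sum of squares for each k
wss <- sapply(ks, function(k) kmeans(toy, centers = k, nstart = 25)$tot.withinss)
# Silhouette method: average silhouette width for each k
avg_sil <- sapply(ks, function(k) {
  cl <- kmeans(toy, centers = k, nstart = 25)$cluster
  mean(silhouette(cl, dist(toy))[, "sil_width"])
})

data.frame(k = ks, wss = round(wss, 2), avg_sil = round(avg_sil, 3))
best_k <- ks[which.max(avg_sil)]  # k with the highest average silhouette width
```

The elbow is the k after which `wss` stops dropping sharply, and the silhouette method picks the k with the largest `avg_sil`. On real data the two criteria can disagree, which is why it helps to look at both.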
set.seed(123)
location_data <- scale(query[, c("latitude", "longitude")])
kmeans_result2<-kmeans(location_data,centers = 4)
query$cluster2<-as.factor(kmeans_result2$cluster) # store as a factor so the maps color clusters discretely
km_location<-eclust(query[, c("longitude","latitude")], k=4, FUNcluster = "kmeans", hc_metric = "euclidean", graph = F)
fviz_cluster(km_location, data = query[, c("longitude","latitude")],geom = "point", ellipse.type = "convex")+labs(title="Earthquake Location with Kmeans",x = "Longitude", y = "Latitude")+theme_minimal()
world_data <- ne_countries(scale = "medium", returnclass = "sf")
taiwan <- world_data %>%
filter(name == "Taiwan")
ggplot(data = taiwan)+ geom_sf(fill = "gray", color= "black")+geom_point(data= query , aes(x = longitude, y = latitude, color = cluster2), size = 2)+ggtitle("Earthquake Clustering on Taiwan Map")+labs(x = "longitude", y="latitude", color="cluster")
taiwan <- ne_countries(scale = "medium", returnclass = "sf") %>%
dplyr::filter(name == "Taiwan")
tmap_mode("view")
## tmap mode set to interactive viewing
query_sf <- st_as_sf(query, coords = c("longitude", "latitude"), crs = 4326)
tm_shape(taiwan) + tm_polygons(col = "lightblue", border.col = "black") + tm_shape(query_sf) + tm_dots(col = "cluster2", palette = wes_palette("GrandBudapest2", 4), size = 0.1) + tm_layout(title = "K-means clustering on Taiwan Map")
Plot the result on a real-world map.
The clusters show where most of the earthquakes occurred: along Taiwan’s eastern coastline and in the offshore area. K-means clustering separates these events into distinct zones of high seismic activity, which gives insight into how seismic activity is distributed and can inform efforts to limit future damage.
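Because the clustering runs on scaled coordinates, the cluster centers come back in standardized units. A minimal sketch of converting them to degrees is shown here, assuming synthetic coordinates: `toy`, `scaled`, `km` and `centers_deg` are hypothetical names and the values are not taken from the USGS data.

```r
set.seed(123)
# Hypothetical coordinates standing in for query$latitude / query$longitude
toy <- data.frame(
  latitude  = c(rnorm(15, 23.8, 0.02), rnorm(15, 24.2, 0.02)),
  longitude = c(rnorm(15, 121.6, 0.02), rnorm(15, 121.9, 0.02))
)
scaled <- scale(toy)
km <- kmeans(scaled, centers = 2, nstart = 25)

# Undo scale(): multiply by the column SDs, then add back the column means
centers_deg <- sweep(km$centers, 2, attr(scaled, "scaled:scale"), "*")
centers_deg <- sweep(centers_deg, 2, attr(scaled, "scaled:center"), "+")
centers_deg  # cluster centers in degrees of latitude and longitude
```

The same two `sweep()` calls should apply to any matrix produced by `scale()`, since the centering and scaling values are stored as attributes on the result.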
scaled_magnitude<- scale(query$mag)
fviz_nbclust(as.data.frame(scaled_magnitude), kmeans, method =
"wss")+labs(title= "Elbow Method For Optimal K for Magnitude")
Set the number of clusters K chosen from the elbow plot.
set.seed(123)
k<-3
kmeans_result<-kmeans(scaled_magnitude, centers = k, nstart =25)
query$magnitude_cluster<-as.factor(kmeans_result$cluster)
ggplot(query, aes(x=mag, y=0, color = magnitude_cluster))+geom_jitter(width = 0.1, height = 0.1, size=2, alpha=0.8)+scale_color_manual(values = wes_palette("GrandBudapest2", 3)) # pass the palette directly; palette() would return the previous palette instead
table(kmeans_result$cluster)#to check the exact data
##
## 1 2 3
## 45 29 2
A total of 45 earthquakes below magnitude 4.9 form the pink cluster. Another 29 earthquakes, with magnitudes ranging from 4.9 to 5.7, make up the purple cluster. Finally, two major earthquakes with magnitudes of 6.4 and 7.4 respectively, represent the brown cluster.
silhouette_magnitude<- silhouette(kmeans_result$cluster, dist(scaled_magnitude))
fviz_silhouette(silhouette_magnitude)+labs(title="Silhouette Plot of Magnitude Performance", x="Cluster", y="Silhouette Width")
## cluster size ave.sil.width
## 1 1 45 0.72
## 2 2 29 0.52
## 3 3 2 0.40
Visualize the silhouette statistics to evaluate the clustering performance.
Cluster 1 reaches an average silhouette width of 0.72, which indicates that its points are tightly grouped. Cluster 2 contains overlapping and boundary points, so its average silhouette width is only 0.52. Cluster 3 is affected by the outlier (the main shock), so its average silhouette width is 0.40.
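To illustrate why a cluster containing an extreme event scores lower, here is a small sketch on hypothetical magnitudes. The values in `mag` and the hand-assigned labels in `cl` are assumptions for illustration only, not the clusters computed above. For each point the silhouette width is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the nearest other cluster.

```r
library(cluster)  # silhouette()

# Hypothetical magnitudes: a tight group near 4.5 and two distant large events
mag <- c(4.5, 4.5, 4.6, 4.6, 4.7, 6.4, 7.4)
cl  <- c(rep(1L, 5), rep(2L, 2))  # labels assigned by hand for illustration

sil <- silhouette(cl, dist(mag))
# Average silhouette width per cluster
avg_width <- tapply(sil[, "sil_width"], sil[, "cluster"], mean)
avg_width
```

The two large events sit a full magnitude unit apart, so their within-cluster distance a is large compared with the tight group, which pulls the average width of their cluster down.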
scaled_depth<- scale(query$depth)
fviz_nbclust(as.data.frame(scaled_depth), kmeans, method="wss")+labs(title= "Elbow Method For Depth")
set.seed(123)
k<-2
kmeans_result_depth<-kmeans(scaled_depth, centers = k, nstart = 25)
query$depth_cluster<-as.factor(kmeans_result_depth$cluster)
ggplot(query, aes(x= depth, y = 0, color= depth_cluster))+geom_jitter(width= 0.1, height=0.1, size= 2, alpha= 0.8)+scale_color_manual(values= wes_palette("GrandBudapest2", 2))+labs(title="K-means Clustering of Earthquake Depth", x= "Depth" , y="Clusters", color="Cluster")
There are two clusters identified in the depth data.
Cluster 1 (pink Cluster) comprises 33 data points, representing shallow to intermediate-depth earthquakes with focal depths ranging from 6.7 to 24.9 km. On the other hand, Cluster 2 (purple Cluster) includes 43 data points, corresponding to intermediate-depth earthquakes with focal depths between 25.0 and 45.3 km.
silhouette_depth<-silhouette(kmeans_result_depth$cluster, dist(scaled_depth))
fviz_silhouette(silhouette_depth)+labs(title="Silhouette Plot of Depth Performance", x="Cluster", y="Silhouette Width")
## cluster size ave.sil.width
## 1 1 33 0.65
## 2 2 43 0.65
Visualize the silhouette statistics to evaluate the clustering performance.
Both clusters reach an average silhouette width of 0.65, which indicates moderately good performance: the points within each cluster are grouped consistently, and the two clusters are clearly separated.
Based on the earlier outcomes, the data is primarily concentrated in the lower magnitude range. Therefore, I chose DBSCAN for the combined analysis of earthquake magnitude and depth. Since the distribution of earthquakes is highly uneven and the dataset includes extreme magnitudes, I believe DBSCAN is better suited to capture the significant density clustering characteristics of the earthquakes.
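As a minimal sketch of the behavior I am relying on, the toy example below shows DBSCAN labeling an isolated extreme point as noise (cluster 0) instead of forcing it into a cluster. The grid `dense`, the point `extreme`, and the `eps` and `minPts` values are hypothetical and chosen for this toy data only.

```r
library(dbscan)

# Hypothetical data: a dense 5 x 5 grid of points plus one extreme event
dense   <- as.matrix(expand.grid(x = seq(0, 1, by = 0.25), y = seq(0, 1, by = 0.25)))
extreme <- c(10, 10)
toy     <- rbind(dense, extreme)

# eps and minPts are chosen for this toy grid only, not for the earthquake data
res <- dbscan(toy, eps = 0.4, minPts = 5)
table(res$cluster)  # label 0 marks noise points
```

Unlike K-means, no number of clusters is specified here: the dense grid forms one cluster on its own and the distant point is left unassigned.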
depth_magnitude_data<-query[, c("depth", "mag")]
scaled_data<-scale(depth_magnitude_data)
kNNdistplot(scaled_data, k=4)
abline(h= 0.43, col="red", lty=2) # reference line at the eps value passed to dbscan() below
dbscan_result<-dbscan(scaled_data, eps=0.43, minPts = 5)
query$dbscan_cluster<-as.factor(dbscan_result$cluster)
ggplot(query, aes(x= mag, y= depth, color=dbscan_cluster))+geom_point(size=2, alpha=0.8)+scale_color_manual(values= c("0"="darkgray", "1"="pink", "2"="purple", "3"= "lightblue"))
print(dbscan_result)
## DBSCAN clustering for 76 objects.
## Parameters: eps = 0.43, minPts = 5
## Using euclidean distances and borderpoints = TRUE
## The clustering contains 3 cluster(s) and 15 noise points.
##
## 0 1 2 3
## 15 44 12 5
##
## Available fields: cluster, eps, minPts, metric, borderPoints
### Check that the noise ratio is below 20%.
noise_ratio<-summary(query$dbscan_cluster)/nrow(query)
noise_ratio
## 0 1 2 3
## 0.19736842 0.57894737 0.15789474 0.06578947
##Ratio Distribution
##Cluster 0: 19.7%-noise
##Cluster 1: 57.9%
##Cluster 2: 15.8%
##Cluster 3: 6.6%
The clustering distribution appears reasonable, with the majority of the data concentrated in the primary cluster (57.9%). The noise ratio is moderate (19.7%), and the noise points may correspond to sparse regions or extreme earthquake events.
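The shares above can be reproduced from the cluster counts alone. The sketch below rebuilds a label vector from the counts printed by `print(dbscan_result)` (15, 44, 12 and 5); `labels`, `noise_share` and `cluster_share` are hypothetical names used only here.

```r
# Label vector rebuilt from the printed cluster counts (0 = noise)
labels <- rep(0:3, times = c(15, 44, 12, 5))

noise_share   <- mean(labels == 0)          # fraction of points flagged as noise
cluster_share <- prop.table(table(labels))  # share of every label, noise included

round(noise_share, 3)
round(cluster_share, 3)
noise_share < 0.20  # check against the 20% threshold used above
```

Computing `mean(labels == 0)` directly is a compact way to track the noise ratio while tuning `eps` and `minPts`.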
###Check the values of each cluster.
aggregate(query[, c("depth", "mag")], by = list(Cluster =
query$dbscan_cluster), FUN = function(x) c(mean = mean(x), sd = sd(x)))
## Cluster depth.mean depth.sd mag.mean mag.sd
##1 0 21.200333 12.385837 5.34666667 0.74341554
##2 1 30.580455 5.848346 4.73181818 0.20089651
##3 2 17.212667 3.393173 4.51666667 0.03892495
##4 3 10.141200 2.316420 5.16000000 0.08944272
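By comparing the data and the graph, we can infer that the noise group (Cluster 0) contains the main shock together with other scattered events; its depth and magnitude vary widely (standard deviations of 12.4 km and 0.74). Cluster 1 contains the deepest earthquakes in this dataset (mean depth about 30.6 km), with magnitudes concentrated around 4.7. Cluster 2 corresponds to earthquakes of intermediate depth within this dataset (about 17.2 km) and the lowest magnitudes (about 4.5), while Cluster 3 consists of the shallowest earthquakes (about 10.1 km) with relatively higher magnitudes (about 5.2).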
Create a 3D plot to visualize the relationship between Magnitude, Depth, and Cluster assignments, providing a clear representation of how the data points are distributed across the three dimensions.
data<-tibble(magnitude= query$mag, depth = query$depth, cluster = query$dbscan_cluster)
plot_ly(data,
x = ~magnitude,
y = ~depth,
z = ~cluster,
color = ~as.factor(cluster),
colors = as.character(wes_palette("GrandBudapest2", 4))) %>% # plot_ly() takes the color vector through `colors`
add_markers(size = 4) %>%
layout(
title = "3D Scatter Plot of Depth and Magnitude",
scene = list(
xaxis = list(title = "Magnitude"),
yaxis = list(title = "Depth"),
zaxis = list(title = "Cluster")
)
)
Taiwan is located at the collision zone between the Eurasian Plate and the Philippine Sea Plate, which results in frequent earthquakes. These earthquakes vary in magnitude and depth, but they often cause serious and unpredictable damage to infrastructure and loss of life.
For this analysis, I used both K-means and DBSCAN clustering to investigate the seismic activity from April 2 to April 7. K-means provided insight into the general regional clustering, while DBSCAN was effective at capturing denser seismic clusters and detecting outliers.
This research shows how these earthquakes were distributed in Taiwan and offers insights that could be useful for future research. It may also help inform earthquake preparedness and mitigation strategies in high-risk areas.
Sources:
https://medium.com/saralkarki/earthquake-cluster-analysis-k-means-approach-cdb2bf6cb21b
https://www.nature.com/articles/s41467-020-17841-x