library(sp)         # Spatial data handling
library(gstat)      # Geostatistics
library(dbscan)     # Density-based clustering
##
## Attaching package: 'dbscan'
## The following object is masked from 'package:stats':
##
## as.dendrogram
library(factoextra) # Cluster validation and visualization
## Loading required package: ggplot2
library(cluster)    # Clustering algorithms
library(fpc)        # Cluster validation
##
## Attaching package: 'fpc'
## The following object is masked from 'package:dbscan':
##
## dbscan
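DBSCAN: A density-based clustering algorithm that groups points lying in dense regions and labels points in sparse regions as noise. It finds clusters of arbitrary shape without needing the number of clusters in advance, but because it relies on a single global eps it can struggle when clusters differ strongly in density. Commonly used to identify mineral deposit clusters or geological formations.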
# Load spatial data: the meuse data set shipped with sp
# (heavy-metal concentrations in topsoil along the river Meuse)
data(meuse)
coordinates(meuse) <- ~x+y # Convert to SpatialPointsDataFrame
# Choose DBSCAN parameters (eps: neighborhood radius in map units, here metres;
# minPts: minimum number of points needed to form a dense region)
# Call dbscan::dbscan explicitly, because fpc also exports a dbscan() that masks it
db <- dbscan::dbscan(coordinates(meuse), eps = 200, minPts = 5)
# Visualize results: cluster 0 is noise, so adding 1 draws noise points in black
plot(meuse, col = db$cluster + 1, pch = 16)
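One common heuristic for picking eps is to look at the sorted k-nearest-neighbor distances and search for a bend in the curve. The sketch below assumes k = minPts - 1 and loads a fresh copy of the data into a separate environment (the names `e`, `meuse_df`, `xy`, and `knn_d` are illustrative) so that it does not overwrite the `meuse` object used elsewhere.

```r
library(dbscan)

# Fresh copy of the data frame, kept apart from the Spatial object above
e <- new.env()
data(meuse, package = "sp", envir = e)
meuse_df <- e$meuse
xy <- as.matrix(meuse_df[, c("x", "y")])

min_pts <- 5
# Distance from each point to its (minPts - 1)-th nearest neighbor
knn_d <- dbscan::kNNdist(xy, k = min_pts - 1)

# Sorted distances: a sharp bend ("knee") suggests a candidate eps
plot(sort(knn_d), type = "l",
     xlab = "Points sorted by distance",
     ylab = paste0(min_pts - 1, "-NN distance (m)"))
abline(h = 200, lty = 2)  # the eps used above, for comparison

# Share of points whose k-NN distance is within eps = 200
mean(knn_d <= 200)
```

If the dashed line sits well above or below the bend, eps = 200 may be merging clusters or leaving too many points as noise, so it is worth re-running DBSCAN with a value closer to the knee.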
GMM (Gaussian mixture model): Models the data as a mixture of Gaussian distributions and gives each point a probability of belonging to each component. Useful for identifying geochemical anomalies or clusters based on multiple attributes.
library(mclust)
## Package 'mclust' version 6.1.1
## Type 'citation("mclust")' for citing this R package in publications.
# Perform GMM clustering; Mclust selects the number of components
# and the covariance model by BIC
gmm_model <- Mclust(meuse@data[, c("cadmium", "lead")])
# Visualize the resulting classification
plot(gmm_model, what = "classification")
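Beyond the plot, the fitted object can be inspected directly. The following is a minimal sketch, assuming a fresh copy of the two columns (the names `e`, `metals`, and `fit` are illustrative), that shows which model was selected and how certain each assignment is.

```r
library(mclust)

e <- new.env()
data(meuse, package = "sp", envir = e)
metals <- e$meuse[, c("cadmium", "lead")]

fit <- Mclust(metals, verbose = FALSE)

fit$G                      # number of mixture components selected by BIC
fit$modelName              # selected covariance parameterization
table(fit$classification)  # hard cluster sizes

# Posterior membership probabilities: one row per sample, one column per component
head(round(fit$z, 3))

# Samples whose assignment is least certain (uncertainty = 1 - max posterior)
head(order(fit$uncertainty, decreasing = TRUE))
```

Samples with high uncertainty lie between components and may deserve a closer look before being treated as members of a geochemical group.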
# Perform hierarchical clustering
dist_matrix <- dist(meuse@data[, c("cadmium", "lead")])
hc <- hclust(dist_matrix, method = "ward.D2")
# Plot dendrogram
plot(hc, main = "Dendrogram")
Hierarchical clustering: Builds a tree (dendrogram) that represents the relationships between data points by successively merging the most similar groups. Cutting the tree at a chosen height, or asking for a fixed number of groups, yields the clusters. Commonly used in stratigraphic correlation and mineral resource classification.
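Cutting the tree can be sketched with `cutree()`. The example below is an illustration under stated assumptions: it standardizes the two columns first (the code above uses raw values), and the choices k = 3 and h = 5 are arbitrary.

```r
e <- new.env()
data(meuse, package = "sp", envir = e)
# Standardized here (an assumption) so that lead does not dominate the distances
metals <- scale(e$meuse[, c("cadmium", "lead")])

hc_scaled <- hclust(dist(metals), method = "ward.D2")

groups_k <- cutree(hc_scaled, k = 3)  # ask for exactly 3 clusters
groups_h <- cutree(hc_scaled, h = 5)  # or cut at height 5 on the dendrogram

table(groups_k)

plot(hc_scaled, labels = FALSE, main = "Dendrogram")
rect.hclust(hc_scaled, k = 3, border = 2:4)  # outline the 3 clusters
```

Cutting by `k` is convenient when the number of groups is fixed in advance; cutting by `h` keeps the same dissimilarity threshold across data sets.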
library(kohonen)
##
## Attaching package: 'kohonen'
## The following object is masked from 'package:mclust':
##
## map
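# Train SOM (set a seed first: training starts from randomly chosen codebook vectors)
set.seed(1234)
som_grid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal")
som_model <- som(scale(meuse@data[, c("cadmium", "lead")]), grid = som_grid)
# Visualize the codebook vector of each map unit
plot(som_model, type = "codes")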
SOM (self-organizing map): An unsupervised neural network that maps high-dimensional data onto a lower-dimensional grid while preserving topological relationships. Helpful for visualizing complex multivariate geological data and identifying patterns.
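A trained map does not yield clusters by itself; one possible follow-up, sketched here, is to cluster the codebook vectors and pass each unit's label on to the samples mapped to it. The number of groups (3) and the names `som_fit`, `unit_cluster`, and `sample_cluster` are assumptions for illustration.

```r
library(kohonen)

e <- new.env()
data(meuse, package = "sp", envir = e)
metals <- scale(e$meuse[, c("cadmium", "lead")])

set.seed(1234)
som_fit <- som(metals, grid = somgrid(xdim = 5, ydim = 5, topo = "hexagonal"))

# Winning map unit (1..25) for every sample
head(som_fit$unit.classif)

plot(som_fit, type = "changes")  # training progress: the curve should level off
plot(som_fit, type = "counts")   # number of samples mapped to each unit

# Group the 25 codebook vectors into 3 clusters, then label each sample
codes <- getCodes(som_fit)
unit_cluster <- cutree(hclust(dist(codes)), k = 3)
sample_cluster <- unit_cluster[som_fit$unit.classif]
table(sample_cluster)
```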
library(EMCluster)
## Loading required package: MASS
## Loading required package: Matrix
The emcluster function doesn’t handle missing data well. Here’s how to address it:
Check for Missing Values:
# Check for missing values in the relevant columns
any(is.na(meuse@data[, c("cadmium", "lead")]))
## [1] FALSE
Here the check returns FALSE, so these two columns need no treatment. If your own data do contain missing values, you have a few options:
Impute Missing Values:
Mean/Median Imputation: Replace missing values with the mean or median of the respective column.
KNN Imputation: Impute missing values using the k-nearest neighbors algorithm.
More Advanced Methods: Consider multiple imputation or other sophisticated techniques, depending on the nature of your data and the missingness pattern.
Here’s an example of mean imputation:
meuse_imputed <- meuse
meuse_imputed@data$cadmium[is.na(meuse_imputed@data$cadmium)] <- mean(meuse_imputed@data$cadmium, na.rm = TRUE)
meuse_imputed@data$lead[is.na(meuse_imputed@data$lead)] <- mean(meuse_imputed@data$lead, na.rm = TRUE)
Alternatively, Remove Rows with Missing Values:
meuse_complete <- meuse[!is.na(meuse@data$cadmium) & !is.na(meuse@data$lead), ]
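Run EMCluster on Complete Data:
# meuse has no missing values in these columns, so it is used directly;
# substitute meuse_complete or meuse_imputed if rows were removed or imputed
set.seed(1234) # init.EM starts from random initial values
scaled_data <- scale(meuse@data[, c("cadmium", "lead")])
# init.EM picks starting values and then runs EM
em_model <- init.EM(scaled_data, nclass = 3)
# Visualize
plotcluster(meuse@data[, c("cadmium", "lead")], em_model$class)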
EM clustering: Assumes data points are generated from a mixture of underlying probability distributions and estimates the mixture parameters with the expectation-maximization algorithm, providing a probabilistic framework for cluster assignment. Often used for classifying rock types or identifying distinct groups in geochemical data.
library(EMCluster, quietly = TRUE)
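set.seed(1234)
x1 <- da1$da # da1: example data set shipped with EMCluster
emobj <- simple.init(x1, nclass = 10) # random starting values for 10 components
emobj <- shortemcluster(x1, emobj) # short EM run with a loose tolerance to refine the start
summary(emobj)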
## Method:
## n = 500, p = 2, nclass = 10, flag = , total parameters = 59,
## conv.iter = 12, conv.eps = 0.009409358,
## logL = -5827.1582, AIC = 11772.3164, BIC = 12020.9783.
## nc:
## [1] 10
## pi:
## [1] 0.07731 0.05203 0.01943 0.02477 0.19800 0.02424 0.29374 0.12548 0.02601
## [10] 0.15897
# Full EM run from the refined start; assign.class = TRUE also labels each point
ret <- emcluster(x1, emobj, assign.class = TRUE)
summary(ret)
## Method:
## n = 500, p = 2, nclass = 10, flag = , total parameters = 59,
## conv.iter = 56, conv.eps = 8.541177e-07,
## logL = -5775.3087, AIC = 11668.6174, BIC = 11917.2793.
## nc:
## [1] 40 16 38 14 99 28 132 45 13 75
## pi:
## [1] 0.07765 0.03079 0.07566 0.02125 0.19800 0.05803 0.27991 0.08170 0.02491
## [10] 0.15211
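The run above fixes nclass = 10. One way to compare alternatives is to fit several values and look at the BIC reported in the summaries, where lower is better. The sketch below assumes the range 2 to 6 and uses `init.EM` with its default settings; the names `ks` and `bic` are illustrative.

```r
library(EMCluster, quietly = TRUE)

set.seed(1234)
x1 <- da1$da

# Fit 2 to 6 components and record the BIC of each fit (lower is better)
ks <- 2:6
bic <- sapply(ks, function(k) {
  fit <- init.EM(x1, nclass = k)
  em.bic(x1, emobj = fit)
})

data.frame(nclass = ks, BIC = bic)
ks[which.min(bic)]  # number of components with the lowest BIC
```

Because each fit starts from random values, results can vary with the seed, so it is worth repeating the comparison a few times before settling on a number of classes.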
Standardize or normalize your data if necessary, especially for distance-based methods.
Carefully choose the appropriate parameters for each algorithm (e.g., eps and minPts for DBSCAN).
Use techniques such as silhouette analysis or the gap statistic to assess the quality of your clustering results.
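These checks can be sketched with the cluster package. The example below is illustrative only: it assumes standardized cadmium and lead values, Ward clustering for the silhouette comparison, k-means for the gap statistic, and a search range of up to 6 clusters.

```r
library(cluster)

e <- new.env()
data(meuse, package = "sp", envir = e)
metals <- scale(e$meuse[, c("cadmium", "lead")])  # standardize first

d <- dist(metals)
hc_val <- hclust(d, method = "ward.D2")

# Average silhouette width for k = 2..6
# (closer to 1 means tighter, better-separated clusters)
ks <- 2:6
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc_val, k = k), d)
  mean(sil[, "sil_width"])
})
data.frame(k = ks, avg_silhouette = round(avg_sil, 3))

# Gap statistic for k-means, using B = 50 reference data sets
set.seed(1234)
gap <- clusGap(metals, FUNcluster = kmeans, K.max = 6, B = 50, nstart = 10)
print(gap, method = "firstSEmax")
```

The two criteria do not always agree; when they differ, the choice usually comes down to which grouping is easier to interpret geologically.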