Hierarchical Clustering Analysis of Simulated Carbon Sequestration Data

My Analysis of Carbon Sequestration Patterns Using Hierarchical Clustering

In my recent study on carbon sequestration, I was driven by the need to understand how different regions contribute to carbon storage. My main goal was to map out how effectively various ecosystems or forest types sequester carbon, which is pivotal for crafting informed environmental policies and strengthening conservation efforts.

After applying hierarchical clustering to the carbon sequestration data and visualizing the results through a cluster plot, I have managed to discern some clear patterns and relationships among the 200 regions based on their carbon sequestration characteristics. Here’s how I interpret these findings:

Cluster 1 (Red Region): This cluster, primarily in the upper right of the plot, includes regions like 163, 113, 129, and 96. Its regions sit close together, indicating similar soil carbon levels, vegetation density, and annual carbon intake. Given its position along the higher ends of both dimensions, this cluster might represent regions with high carbon sequestration potential, although that reading depends on how the original variables load on each dimension.

Cluster 2 (Blue Region): Regions such as 164, 139, and 149 are in this cluster, located towards the bottom left of the plot. These regions show a distinct separation from others, likely indicating lower scores in the variables considered. The spread and positioning suggest variability in carbon sequestration performance, possibly due to differing soil carbon levels or vegetation densities.

Cluster 3 (Green Region): This cluster covers the middle portion of the plot and includes a diverse mix of regions like 102, 33, 76, and 174. Its moderate spread suggests the regions are broadly, but not closely, alike on the carbon sequestration parameters. This may indicate average to good carbon sequestration capabilities.

Cluster 4 (Purple Region): Located on the far right, this cluster includes regions such as 200, 135, and 194. These regions are characterized by their position on the higher end of Dim1, possibly suggesting they have higher annual carbon intake rates or greater vegetation density, factors that are critical for higher carbon sequestration.
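The readings above are inferred from where each cluster sits in the plot. A more direct check is to average the original variables within each cluster. The following is a minimal sketch under a few assumptions: it rebuilds the simulated data with base R's data.frame() instead of tibble() (the draws should match, since the same seed and call order are used), and the name cluster_profile is introduced here only for illustration.

```r
# Rebuild the simulated data and the four-cluster solution (base R only).
set.seed(123)
carbon_data <- data.frame(
  Region = 1:200,
  SoilCarbon = rnorm(200, mean = 50, sd = 10),
  VegetationDensity = runif(200, min = 20, max = 80),
  AnnualCarbonIntake = rnorm(200, mean = 200, sd = 50)
)
scaled_carbon_data <- scale(carbon_data[-1])
hc <- hclust(dist(scaled_carbon_data), method = "complete")
clusters <- cutree(hc, k = 4)

# Mean of each original (unscaled) variable per cluster.
cluster_profile <- aggregate(
  carbon_data[-1],
  by = list(Cluster = clusters),
  FUN = mean
)

# Number of regions in each cluster, in the same 1..4 order.
cluster_profile$Size <- as.vector(table(clusters))

print(cluster_profile)
```

A cluster described as "high potential" should show above-average means in this table; if it does not, the interpretation drawn from the plot position needs revisiting.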

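Dimension Contributions: Dim1 and Dim2 are the first two principal components of the scaled data. Dim1 (36.5%) and Dim2 (33.9%) together explain 70.4% of the variability in the dataset. Because the data have only three variables, the remaining 29.6% or so lies along a third component that the two-dimensional plot does not show, so regions that look close in the plot can still differ along that hidden direction.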
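The percentages on the plot axes can be recomputed by hand. This sketch assumes, as I understand its behaviour for data with more than two variables, that fviz_cluster() projects the data onto the first two principal components; the names pca and var_explained are mine, and the data are rebuilt with base R so the block runs on its own.

```r
# Rebuild the scaled simulated data (base R only).
set.seed(123)
carbon_data <- data.frame(
  Region = 1:200,
  SoilCarbon = rnorm(200, mean = 50, sd = 10),
  VegetationDensity = runif(200, min = 20, max = 80),
  AnnualCarbonIntake = rnorm(200, mean = 200, sd = 50)
)
scaled_carbon_data <- scale(carbon_data[-1])

# Principal component analysis of the already-scaled variables.
pca <- prcomp(scaled_carbon_data)

# Share of total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(100 * var_explained, 1))

# Loadings: how each original variable contributes to each component.
print(round(pca$rotation, 2))
```

The loadings matter for interpretation: a high Dim1 score means "high annual carbon intake" only if AnnualCarbonIntake loads strongly and positively on the first component.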

The visible separation between the clusters in the plot, particularly between clusters 1 and 2 and between clusters 3 and 4, points to differences in their carbon sequestration characteristics. I did not run a formal significance test, so these differences should be read as descriptive. If they hold up under further checks, they could reflect distinct ecological zones or management practices worth investigating.

The tight grouping in Cluster 1 and the more spread out nature of Clusters 2 and 3 indicate varying degrees of homogeneity within each cluster. Cluster 1’s tight grouping suggests very similar carbon sequestration characteristics among its regions, which could be due to similar environmental conditions or parallel conservation practices.
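How tight or spread out each cluster is can be put into numbers instead of being judged by eye. The sketch below shows two hedged options: the average silhouette width per cluster from cluster::silhouette(), and the mean distance of each region to its cluster centroid in the scaled space. The helper names sil_by_cluster and spread are illustrative, and the data are rebuilt with base R so the block is self-contained.

```r
library(cluster)  # provides silhouette()

# Rebuild the simulated data and the four-cluster solution.
set.seed(123)
carbon_data <- data.frame(
  Region = 1:200,
  SoilCarbon = rnorm(200, mean = 50, sd = 10),
  VegetationDensity = runif(200, min = 20, max = 80),
  AnnualCarbonIntake = rnorm(200, mean = 200, sd = 50)
)
scaled_carbon_data <- scale(carbon_data[-1])
d <- dist(scaled_carbon_data)
clusters <- cutree(hclust(d, method = "complete"), k = 4)

# Silhouette width: near 1 = well placed, near 0 = on a boundary,
# negative = closer to a neighbouring cluster than to its own.
sil <- silhouette(clusters, d)
sil_by_cluster <- tapply(sil[, "sil_width"], sil[, "cluster"], mean)

# Mean Euclidean distance from each region to its cluster centroid.
spread <- sapply(
  split(as.data.frame(scaled_carbon_data), clusters),
  function(cl) {
    centre <- colMeans(cl)
    mean(sqrt(rowSums(sweep(as.matrix(cl), 2, centre)^2)))
  }
)

print(round(sil_by_cluster, 3))
print(round(spread, 3))
```

A smaller spread value means a more homogeneous cluster, so a cluster described as tight should have one of the lowest values here.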

# I'm loading the necessary libraries for data manipulation and plotting.
library(tidyverse)  # I use this for data manipulation and ggplot for plotting.

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(cluster)    # I employ this for the clustering algorithms.
library(factoextra) # I utilize this for enhanced cluster visualization.

# I'm simulating a larger dataset for carbon sequestration across 200 regions.
set.seed(123)  # I set a seed for reproducibility.
carbon_data <- tibble(
  Region = 1:200,  # I increased the number of regions to enhance the dataset.
  SoilCarbon = rnorm(200, mean = 50, sd = 10),  # Simulating soil carbon data.
  VegetationDensity = runif(200, min = 20, max = 80),  # Uniform distribution for vegetation density.
  AnnualCarbonIntake = rnorm(200, mean = 200, sd = 50)  # Normal distribution for annual carbon intake.
)

# I'm scaling the data to normalize the influence of each variable on the clustering process.
scaled_carbon_data <- scale(carbon_data[-1])  # Excluding the 'Region' column for scaling.

# I'm performing hierarchical clustering using the Euclidean distance and complete linkage method.
hc <- hclust(dist(scaled_carbon_data), method = "complete")  # Complete linkage measures the distance between two clusters by their two farthest members.

# I'm visualizing the dendrogram to scrutinize the clustering results.
plot(hc, hang = -1, labels = carbon_data$Region, main = "Dendrogram of Carbon Sequestration Data", xlab = "Regions", ylab = "Distance")

# I'm also interested in visualizing specific clusters, so I cut the tree at four clusters.
clusters <- cutree(hc, k = 4)  # I decide to examine the structure within four clusters.
fviz_cluster(list(data = scaled_carbon_data, cluster = clusters)) +
  ggtitle("Cluster Plot of Carbon Sequestration Data") +
  theme_minimal()  # I apply a minimal theme for a clean visual presentation.
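The choice of complete linkage above is one of several options. One hedged way to compare them is the cophenetic correlation, which measures how faithfully a dendrogram preserves the original pairwise distances. This sketch is an optional check, not part of the analysis above; the set of methods compared and the name coph are my own choices, and the data are rebuilt with base R so the block runs independently.

```r
# Rebuild the scaled simulated data (base R only).
set.seed(123)
carbon_data <- data.frame(
  Region = 1:200,
  SoilCarbon = rnorm(200, mean = 50, sd = 10),
  VegetationDensity = runif(200, min = 20, max = 80),
  AnnualCarbonIntake = rnorm(200, mean = 200, sd = 50)
)
scaled_carbon_data <- scale(carbon_data[-1])
d <- dist(scaled_carbon_data)

# Linkage methods to compare (all are valid hclust() options).
methods <- c("complete", "average", "single", "ward.D2")

# Correlation between the original distances and the dendrogram's
# cophenetic distances; closer to 1 means less distortion.
coph <- sapply(methods, function(m) {
  cor(d, cophenetic(hclust(d, method = m)))
})

print(round(coph, 3))
```

A clearly higher value for another method would be a reason to rerun the clustering with that linkage and see whether the four-cluster picture changes.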

Avery Holloman

2025-01-03