Image from Source: Charles MacLean, “Understanding Scotch Malt
Whisky” (The Council of Whiskey Masters)
This project explores spatial clustering patterns of operational
Scotch whisky distilleries.
By combining geographic coordinates (spatial features) with qualitative
taste profiles (body, sweetness, smoky, medicinal, etc.), this project
aims to discover if regional geographic proximity correlates with
distinct flavor profiles.
This dataset is an enhanced version of data sourced from https://www.mathstat.strath.ac.uk/outreach/nessie/nessie_whisky.html
which contains the location of all currently operational Whisky
distilleries in Scotland.
In addition to the geolocation information, the dataset scores the
whisky of each distillery on twelve characteristics: body, sweetness,
smoky, medicinal, tobacco, honey, spicy, winey, nutty, malty, fruity,
and floral. Only one score per distillery is provided, so it is assumed
that this is a generalisation of the whiskies that each distillery
offers.
Source: https://datashare.ed.ac.uk/handle/10283/2592. Citation:
Pope, Addy. (2017). Scotch Whisky characteristics, [Dataset]. University
of Edinburgh. https://doi.org/10.7488/ds/1942.
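# Load the packages used throughout the analysis
library(sf)         # st_read(), st_coordinates(), st_drop_geometry()
library(tidyverse)  # dplyr verbs, glimpse(), ggplot2
# Read the shapefile
whisky <- st_read("data/Whisky.shp")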
## Reading layer `Whisky' from data source
## `/Users/engkahhui/Downloads/sem4/SpatialLearning/project/data/Whisky.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 86 features and 17 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 126680 ymin: 554260 xmax: 381020 ymax: 1009260
## Projected CRS: OSGB36 / British National Grid
# Inspect the dataset structure
glimpse(whisky)
## Rows: 86
## Columns: 18
## $ RowID_ <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
## $ Distillery <chr> "Aberfeldy", "Aberlour", "AnCnoc", "Ardbeg", "Ardmore", "Ar…
## $ Body <dbl> 2, 3, 1, 4, 2, 2, 0, 2, 2, 2, 4, 3, 4, 2, 3, 2, 1, 2, 2, 1,…
## $ Sweetness <dbl> 2, 3, 3, 1, 2, 3, 2, 3, 2, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ Smoky <dbl> 2, 1, 2, 4, 2, 1, 0, 1, 1, 2, 2, 1, 2, 1, 2, 2, 1, 2, 3, 2,…
## $ Medicinal <dbl> 0, 0, 0, 4, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2,…
## $ Tobacco <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ Honey <dbl> 2, 4, 2, 0, 1, 1, 1, 2, 1, 0, 2, 3, 2, 2, 3, 2, 0, 1, 2, 2,…
## $ Spicy <dbl> 1, 3, 0, 2, 1, 1, 1, 1, 0, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2,…
## $ Winey <dbl> 2, 2, 0, 0, 1, 1, 0, 2, 0, 0, 3, 1, 0, 0, 1, 1, 1, 2, 1, 1,…
## $ Nutty <dbl> 2, 2, 2, 1, 2, 0, 2, 2, 2, 2, 3, 0, 2, 0, 2, 2, 0, 2, 1, 2,…
## $ Malty <dbl> 2, 3, 2, 2, 3, 1, 2, 2, 2, 1, 0, 2, 2, 2, 3, 2, 2, 2, 1, 2,…
## $ Fruity <dbl> 2, 3, 3, 1, 1, 1, 3, 2, 2, 2, 1, 2, 2, 3, 2, 2, 2, 2, 1, 2,…
## $ Floral <dbl> 2, 2, 2, 0, 1, 2, 3, 1, 2, 1, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2,…
## $ Postcode <chr> "PH15 2EB", "AB38 9PJ", "AB5 5LI", "PA42 7EB", "AB54 4NH", …
## $ Easting <dbl> 286580, 326340, 352960, 141560, 355350, 194050, 247670, 340…
## $ Northing <dbl> 749680, 842570, 839320, 646220, 829140, 649950, 672610, 848…
## $ geometry <POINT [m]> POINT (286580 749680), POINT (326340 842570), POINT (…
Image from Source: Charles MacLean, “Understanding Scotch Malt
Whisky” (The Council of Whiskey Masters)
This dataset uses the British National Grid (EPSG:27700), so coordinates
are in meters rather than raw Lat/Lon degrees. The shapefile does not
have a Region column, but the Postcode and geometry columns are fully
intact.
Hence, the 5 traditional Scotch regions are approximated from each distillery's geographic coordinates (Easting and Northing) using well-known boundaries (like the extent of the island of Islay or the “Lowland-Highland” line):
# Engineering the Traditional Whisky Regions using Geographic Rules
whisky <- whisky %>%
  mutate(
    Region = case_when(
      Easting < 150000 ~ "Islay",
      Easting < 210000 & Northing < 630000 ~ "Campbeltown",
      Northing < 700000 ~ "Lowlands",
      Easting >= 300000 & Easting <= 350000 & Northing >= 820000 & Northing <= 860000 ~ "Speyside",
      TRUE ~ "Highlands"
    ),
    Region = as.factor(Region)
  )
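Because case_when() assigns the first rule that matches, the order of the rules above matters. As a minimal sketch, the same rules can be wrapped in a helper and applied to the first five distilleries shown in the glimpse() output. The function name assign_region and the data frame first_five are hypothetical and not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper wrapping the same rules as the mutate() call above.
assign_region <- function(easting, northing) {
  case_when(
    easting < 150000 ~ "Islay",
    easting < 210000 & northing < 630000 ~ "Campbeltown",
    northing < 700000 ~ "Lowlands",
    easting >= 300000 & easting <= 350000 &
      northing >= 820000 & northing <= 860000 ~ "Speyside",
    TRUE ~ "Highlands"
  )
}

# First five rows of the glimpse() output (Easting/Northing in metres).
first_five <- data.frame(
  Distillery = c("Aberfeldy", "Aberlour", "AnCnoc", "Ardbeg", "Ardmore"),
  Easting    = c(286580, 326340, 352960, 141560, 355350),
  Northing   = c(749680, 842570, 839320, 646220, 829140)
)
first_five$Region <- assign_region(first_five$Easting, first_five$Northing)
print(first_five)
```

Under these rules AnCnoc and Ardmore fall just east of the Speyside box (Easting above 350000) and are therefore labelled Highlands, which illustrates how sensitive the approximation is to the chosen thresholds.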
# Check for missing values across flavor columns
flavor_cols <- c("Body", "Sweetness", "Smoky", "Medicinal", "Tobacco",
"Honey", "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral")
missing_counts <- colSums(is.na(whisky %>% st_drop_geometry() %>% select(all_of(flavor_cols))))
print("Missing values per flavor column:")
## [1] "Missing values per flavor column:"
print(missing_counts)
## Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey
## 0 0 0 0 0 0 0 0
## Nutty Malty Fruity Floral
## 0 0 0 0
# Map showing traditional assigned regions across Scotland
ggplot(data = whisky) +
geom_sf(aes(color = Region), size = 3, alpha = 0.8) +
scale_color_brewer(palette = "Set1") +
theme_minimal() +
labs(
title = "Geographic Distribution of Scotch Whisky Regions",
subtitle = "Constructed using geographic boundaries",
x = "Easting (meters)",
y = "Northing (meters)"
)
# Boxplot to see how a flavor (e.g., Smoky) is distributed across regions
ggplot(data = whisky) +
geom_boxplot(aes(x = Region, y = Smoky, fill = Region)) +
theme_minimal() +
scale_fill_brewer(palette = "Set1") +
labs(title = "Distribution of 'Smoky' Flavor by Region")
One of my favourite whiskies is from Islay.
Look at Islay: its median smokiness is way up at 3.5, well above the
rest of Scotland. Meanwhile, Lowlands and Speyside group tightly at the
bottom around 1.
This suggests that flavor profiles are not randomly distributed across
space, which makes them a good candidate for spatial autocorrelation
tests.
A K-Nearest Neighbors (KNN) graph is built instead of a fixed-distance radius. A fixed radius suits Scotland poorly because the Highlands are large and sparse while Speyside is small and densely packed; KNN guarantees that every distillery is evaluated against its closest peers.
k = 4 is used as a standard baseline to generate a spatial weights matrix, on which the Global Moran’s I tests are then run.
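The contrast between the two neighbor definitions can be sketched on a handful of made-up points. The coordinates below are hypothetical (five tightly packed points and three isolated ones), and the 5 km distance band and k = 3 are arbitrary choices for illustration; only spdep functions that the analysis already relies on, plus dnearneigh() and card(), are used.

```r
library(spdep)

# Hypothetical coordinates in metres: five packed points, three isolated ones.
toy_coords <- cbind(
  X = c(0, 1200, 300, 2000, 700, 50000, 90000, 20000),
  Y = c(0, 100, 1500, 1800, 600, 50000, 20000, 90000)
)

# Fixed-distance neighbours: everything within 5 km counts as a neighbour.
dist_nb <- dnearneigh(toy_coords, d1 = 0, d2 = 5000)

# K-nearest neighbours: every point gets exactly k = 3 neighbours.
knn_nb <- knn2nb(knearneigh(toy_coords, k = 3))

# card() returns the number of neighbours per point.
dist_counts <- card(dist_nb)
knn_counts <- card(knn_nb)
print(dist_counts)
print(knn_counts)
```

With the distance band, the three isolated points end up with no neighbors at all, while the KNN graph gives every point the same number of neighbors regardless of local density.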
Moran’s I Value:
- Positive value (near +1): Positive spatial autocorrelation (similar values cluster together, like Islay’s high smoke or Speyside’s low smoke).
- Zero (near 0): Complete spatial randomness.
- Negative value (near -1): Dispersion; similar values are spread out, scattered, or pushed away from each other rather than being grouped together.
p-value < 0.05: Statistically significant evidence of spatial clustering.
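For reference, the statistic being interpreted here can be written out. This is the standard textbook form of Global Moran’s I rather than anything specific to this dataset; the symbols \(x_i\) (flavor score at distillery \(i\)), \(\bar{x}\) (mean score), \(w_{ij}\) (spatial weight between distilleries \(i\) and \(j\)) and \(n\) (number of distilleries) are notation introduced for illustration.

\[
I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}
\]

When the weights are row-standardized, each row of \(w_{ij}\) sums to 1, so \(\sum_{i}\sum_{j} w_{ij} = n\) and the leading factor equals 1. Under the null hypothesis of no spatial autocorrelation the expected value is

\[
E[I] = -\frac{1}{n - 1}
\]

which for the \(n = 86\) distilleries in this dataset is \(-1/85 \approx -0.0118\), slightly below zero rather than exactly zero.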
The KNN network and spatial weights matrix are constructed as follows. Because Scottish distilleries exhibit highly uneven spatial density, a K-Nearest Neighbors approach (\(k = 4\)) is used to establish local neighborhood contexts, and the result is then converted into a row-standardized (“W”) spatial weights matrix.
# install.packages("spdep")
library(spdep)
# Extract geographic coordinates from the Projected CRS (British National Grid)
coords <- st_coordinates(whisky)
# Find the 4 nearest neighbors for each distillery point
whisky_knn <- knearneigh(coords, k = 4)
whisky_nb <- knn2nb(whisky_knn)
# Create row-standardized spatial weights
whisky_weights <- nb2listw(whisky_nb, style = "W")
# neighborhood network structure
# use st_geometry() to draw the base map
plot(st_geometry(whisky), col = "darkgray", main = "4-Nearest Neighbors Network Graph")
# Force R to immediately overlay the lines onto the active plot canvas
plot(whisky_nb, coords, add = TRUE, col = "red", lwd = 1.5)
During the neighbor construction above, warnings reported identical point coordinates (co-located facilities) and the presence of 2 sub-graphs. The sub-graphs are consistent with the geographic isolation of island-based distilleries from the Scottish mainland network.
The warning “simpleWarning: knearneigh: identical points found” means that two or more distilleries in the dataset share exactly the same geographic coordinates (Easting and Northing).
This can happen when several distilleries operate on the same site, or when the original data collector mapped a few close neighbors to a single village center point.
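One way to see which rows trigger the warning is to look for repeated coordinate pairs. The sketch below uses a small hypothetical matrix in which the fourth row deliberately repeats the second; the names toy_coords, is_dup and dup_rows are introduced here for illustration.

```r
# Hypothetical coordinates in metres; row 4 deliberately repeats row 2.
toy_coords <- cbind(
  X = c(286580, 326340, 352960, 326340),
  Y = c(749680, 842570, 839320, 842570)
)

# duplicated() on a matrix compares whole rows. Combining it with
# fromLast = TRUE flags every member of a duplicate group, not just the repeats.
is_dup <- duplicated(toy_coords) | duplicated(toy_coords, fromLast = TRUE)
dup_rows <- which(is_dup)
print(dup_rows)
```

Applying the same two calls to the matrix returned by st_coordinates(whisky) would list the distilleries that share a location.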
The Moran’s I Test calculates the formal spatial statistics for the three chosen flavor profiles: Smoky, Fruity, and Floral to evaluate whether these whisky taste characteristics are clustered geographically, completely random, or dispersed.
# Test 1: Smoky Flavor
print("--- Moran's I: Smoky ---")
## [1] "--- Moran's I: Smoky ---"
moran_smoky <- moran.test(whisky$Smoky, whisky_weights)
print(moran_smoky)
##
## Moran I test under randomisation
##
## data: whisky$Smoky
## weights: whisky_weights
##
## Moran I statistic standard deviate = 7.0852, p-value = 6.94e-13
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.464783566 -0.011764706 0.004523809
# Test 2: Fruity Flavor
print("--- Moran's I: Fruity ---")
## [1] "--- Moran's I: Fruity ---"
moran_fruity <- moran.test(whisky$Fruity, whisky_weights)
print(moran_fruity)
##
## Moran I test under randomisation
##
## data: whisky$Fruity
## weights: whisky_weights
##
## Moran I statistic standard deviate = 2.6994, p-value = 0.003473
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.171020041 -0.011764706 0.004585086
# Test 3: Floral Flavor
print("--- Moran's I: Floral ---")
## [1] "--- Moran's I: Floral ---"
moran_floral <- moran.test(whisky$Floral, whisky_weights)
print(moran_floral)
##
## Moran I test under randomisation
##
## data: whisky$Floral
## weights: whisky_weights
##
## Moran I statistic standard deviate = 1.4936, p-value = 0.06765
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.089165419 -0.011764706 0.004566672
# install.packages("knitr")
library(knitr)  # provides kable()
# 1. Create a clean data frame extracting the exact statistics from the results
moran_results_df <- data.frame(
`Flavor Profile` = c("Smoky", "Fruity", "Floral"),
`Observed Moran's I` = c(
moran_smoky$estimate[1],
moran_fruity$estimate[1],
moran_floral$estimate[1]
),
`Expectation` = c(
moran_smoky$estimate[2],
moran_fruity$estimate[2],
moran_floral$estimate[2]
),
`Z-Score (Standard Deviate)` = c(
moran_smoky$statistic,
moran_fruity$statistic,
moran_floral$statistic
),
`p-value` = c(
moran_smoky$p.value,
moran_fruity$p.value,
moran_floral$p.value
),
`Spatial Pattern` = c(
"Strong Significant Clustering",
"Moderate Significant Clustering",
"Statistically Random"
),
check.names = FALSE # Preserves the clean column titles with spaces
)
# 2. Render the data frame as a beautifully formatted markdown table
kable(
moran_results_df,
digits = 4, # Rounds all numeric decimals cleanly to 4 places
caption = "Global Moran's I Spatial Autocorrelation Results for Selected Whisky Flavors",
align = 'lccccc' # Left-aligns flavor names, centers all statistical metrics
)
| Flavor Profile | Observed Moran’s I | Expectation | Z-Score (Standard Deviate) | p-value | Spatial Pattern |
|---|---|---|---|---|---|
| Smoky | 0.4648 | -0.0118 | 7.0852 | 0.0000 | Strong Significant Clustering |
| Fruity | 0.1710 | -0.0118 | 2.6994 | 0.0035 | Moderate Significant Clustering |
| Floral | 0.0892 | -0.0118 | 1.4936 | 0.0676 | Statistically Random |
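
As a sanity check, the standard deviates and p-values in the table can be reproduced by hand from the printed estimates. The sketch below uses only the numbers reported by moran.test() above and assumes the normal approximation behind the one-sided test; the data frame name moran_check is introduced here for illustration.

```r
# Estimates copied from the moran.test() output above.
moran_check <- data.frame(
  flavor      = c("Smoky", "Fruity", "Floral"),
  observed    = c(0.464783566, 0.171020041, 0.089165419),
  expectation = rep(-0.011764706, 3),
  variance    = c(0.004523809, 0.004585086, 0.004566672)
)

# Standard deviate: (observed - expected) / standard error.
moran_check$z <- (moran_check$observed - moran_check$expectation) /
  sqrt(moran_check$variance)

# One-sided p-value for the alternative "greater".
moran_check$p <- pnorm(moran_check$z, lower.tail = FALSE)

print(moran_check[, c("flavor", "z", "p")])
```

Note that the Smoky p-value shown as 0.0000 in the table is a rounding artifact of digits = 4; the test output reports it as 6.94e-13.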
The Global Moran’s I tests reveal that the geographical location of a distillery has a varying, profile-specific impact on the flavor characteristics of Scotch whisky.
The test for the Smoky flavor profile shows a strong positive spatial
autocorrelation, with an Observed Moran’s I of 0.465 and an
exceptionally high Z-score of 7.085 (\(p < 0.001\)). Smoky whiskies are
therefore not randomly distributed across Scotland; distilleries with
high smoky scores tend to sit near other high-smoky operations. This is
likely driven by the localized concentration of heavily peated
production on islands like Islay.
The Fruity flavor profile yields a moderate but statistically
significant positive spatial autocorrelation (Moran’s I = 0.171,
Z-score = 2.699, \(p = 0.003\)). This may indicate a regional preference
or shared production practices among neighboring distilleries (such as
the use of specific yeast strains or tall copper stills in areas like
Speyside) that create localized pockets of fruity taste profiles.
Interestingly, the Floral profile is not significant at the
conventional 5% level (Moran’s I = 0.089, Z-score = 1.494,
\(p = 0.068\)). Because the p-value is greater than \(0.05\), we fail to
reject the null hypothesis of spatial randomness. This suggests that a
whisky’s floral qualities are more likely dictated by individual
distillery-specific production choices (such as cut-points during
distillation) than by regional or geographic proximity.
The findings above show that geography plays a role in some flavors (like Smoky) but not in others (like Floral). Before building a model, we need to find out how all 12 flavor variables interact with each other. We do this with Principal Component Analysis (PCA), which reduces the 12 taste dimensions so that the main axes of taste variation across Scottish distilleries can be extracted.
# 1. Isolate the 12 active numeric flavor profile columns
flavor_data <- whisky %>%
st_drop_geometry() %>%
select(Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral)
# 2. Run PCA with scaling enabled
whisky_pca <- prcomp(flavor_data, scale. = TRUE)
# 3. Draw the scree plot (#FFBF00 is amber)
fviz_eig(whisky_pca, addlabels = TRUE, ylim = c(0, 50),
barfill = "#FFBF00", barcolor = "#D4AF37",
title = "Scree Plot: Variance Explained by Principal Components")
# 4. Generate the Biplot to see distillery positions and flavor directions
fviz_pca_biplot(whisky_pca,
repel = TRUE,
col.var = "red", # Color of the flavor arrows
col.ind = whisky$Region, # Color points by our engineered geographic regions!
palette = "Set1",
legend.title = "Geographic Region",
title = "PCA Biplot: Whisky Flavor Dimensions vs Geographic Regions")
The first two components together account for about 43% of the variance, so there is still secondary complexity in the remaining dimensions. Even so, a two-dimensional plot is strong enough to visually identify clear clusters and pull out the primary flavor profiles of Scotland.
The Islay Cluster (Green Squares): Notice how almost all the green Islay points pull hard to the far right, aligning closely with the Smoky and Medicinal flavor arrows. This matches the Moran’s I result; Islay is an outlier both geographically and in flavor.
The Speyside Cluster (Orange Boxes): The orange Speyside points group strongly in the left-hand quadrants. They track along the Fruity, Sweetness, Floral, Honey, and Winey arrows. This shows Speyside’s classic profile is light, sweet, and complex, in sharp contrast to the heavy peat of Islay.
Highlands (Blue Triangles) & Lowlands (Purple Crosses): These are much more dispersed in the center and bottom quadrants, which suggests they act as transitional flavor profiles bridging the sweeter Speyside styles and more robust textures.
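The share of variance captured by the leading components can also be read directly from a prcomp object instead of from the scree plot. The sketch below uses the built-in USArrests data purely as a stand-in (it is not the whisky dataset), and the names toy_pca, var_explained and cum_explained are introduced for illustration.

```r
# Stand-in data: USArrests ships with R and is NOT the whisky dataset.
toy_pca <- prcomp(USArrests, scale. = TRUE)

# Each component's variance is its squared standard deviation.
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
cum_explained <- cumsum(var_explained)

print(round(cum_explained, 3))
```

Running the same three lines on whisky_pca would give the cumulative share for the first two whisky components.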
To discover data-driven whisky groupings, a combined matrix of scaled flavor scores and physical Easting/Northing coordinates was fed into a K-Means algorithm, so that the clusters reflect both organoleptic properties and geographic proximity. \(k = 5\) was set to test whether data-driven configurations naturally recreate the 5 traditional regions.
# 1. Extract coordinates from our sf object
coords <- st_coordinates(whisky)
# 2. Combine the 12 flavor variables with the 2 spatial coordinates
combined_features <- cbind(flavor_data, coords)
# 3. Scale the combined matrix so every column has unit variance
#    (12 flavor columns vs. 2 coordinate columns, so flavor carries most of the weight)
scaled_features <- scale(combined_features)
# 4. Run K-Means Clustering
set.seed(123) # Set seed for consistent cluster naming
whisky_km <- kmeans(scaled_features, centers = 5, nstart = 25)
# 5. Append the resulting cluster assignments back onto our spatial sf dataset
whisky$Cluster <- as.factor(whisky_km$cluster)
# 6. Map the data-driven clusters across Scotland
ggplot(data = whisky) +
geom_sf(aes(color = Cluster), size = 3, alpha = 0.8) +
scale_color_brewer(palette = "Set1") +
theme_minimal() +
labs(
title = "Data-Driven Spatial Flavor Clusters",
subtitle = "Generated via K-Means on scaled flavor scores and coordinates (k=5)",
x = "Easting", y = "Northing"
)
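Note that scale() gives every column unit variance, so with 12 flavor columns and 2 coordinate columns the flavor scores contribute most of the total variance that K-Means partitions. One possible way to balance the two blocks, sketched below on randomly generated stand-in data, is to multiply the scaled coordinates by a weight. The variable spatial_weight and its value sqrt(6) are assumptions made for illustration and are not part of the original analysis.

```r
set.seed(1)
n <- 30

# Randomly generated stand-in data, NOT the whisky dataset.
toy_flavor <- matrix(sample(0:4, n * 12, replace = TRUE), ncol = 12)
toy_xy <- cbind(
  X = runif(n, 126680, 381020),   # ranges taken from the shapefile bounding box
  Y = runif(n, 554260, 1009260)
)

scaled_flavor <- scale(toy_flavor)
scaled_xy <- scale(toy_xy)

# Hypothetical weight: sqrt(12 / 2) makes the 2 coordinate columns carry
# the same total variance as the 12 flavor columns.
spatial_weight <- sqrt(6)
features <- cbind(scaled_flavor, scaled_xy * spatial_weight)

# Share of total variance contributed by the coordinate block.
col_var <- apply(features, 2, var)
share_xy <- sum(col_var[13:14]) / sum(col_var)
print(share_xy)

toy_km <- kmeans(features, centers = 5, nstart = 25)
print(table(toy_km$cluster))
```

With spatial_weight set to 1 the coordinate block would contribute 2 of 14 units of variance, which is the situation in the analysis above; larger weights pull the clusters toward geographic compactness.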
# Generate a cross-tabulation table
comparison_table <- table(Traditional_Region = whisky$Region, ML_Cluster = whisky$Cluster)
print(comparison_table)
## ML_Cluster
## Traditional_Region 1 2 3 4 5
## Campbeltown 0 0 2 0 0
## Highlands 13 1 2 6 15
## Islay 0 5 3 0 0
## Lowlands 3 0 4 0 0
## Speyside 17 0 0 1 14
The cross-tabulation matrix shows how the data-driven groups compare with the historical regions:
Cluster 3 (The South & Coast Blend): This cluster captures both Campbeltown distilleries (2) and 4 of the 7 Lowlands distilleries, alongside 3 Islay and 2 Highland distilleries. This suggests that Campbeltown and much of the Lowlands share light or transitional profiles and sit close enough together to form a loose southwestern flavor zone.
Clusters 1 and 5: These two groups dominate the northeastern map.
Cluster 1 contains 17 Speyside and 13 Highland distilleries.
Cluster 5 contains 14 Speyside and 15 Highland distilleries.
Hence, the clustering gives no evidence that Speyside and the surrounding Highlands are distinct flavor regions. Both regions are split almost evenly across the same two clusters, which points to shared sweet, floral, and malty flavor dimensions. Historical labeling divides them, but the K-Means model treats them as one largely contiguous flavor region.
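The degree of agreement between clusters and regions can be summarized with a single number. The sketch below rebuilds the cross-tabulation from the printed counts and computes cluster purity, a simple agreement measure chosen here for illustration rather than one used in the original analysis; the names xtab, purity and region_shares are likewise introduced here.

```r
# Counts copied from the cross-tabulation printed above.
xtab <- matrix(
  c( 0, 0, 2, 0,  0,
    13, 1, 2, 6, 15,
     0, 5, 3, 0,  0,
     3, 0, 4, 0,  0,
    17, 0, 0, 1, 14),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Region  = c("Campbeltown", "Highlands", "Islay", "Lowlands", "Speyside"),
    Cluster = as.character(1:5)
  )
)

# Purity: share of distilleries that sit in their cluster's majority region.
purity <- sum(apply(xtab, 2, max)) / sum(xtab)

# Row-wise shares: how each traditional region is spread across clusters.
region_shares <- round(prop.table(xtab, margin = 1), 2)

print(purity)
print(region_shares)
```

Purity comes out at 47/86, or about 0.55, so only a little over half of the distilleries belong to the region that dominates their cluster.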
Since the traditional regions don’t perfectly align with the flavor data, a supervised classifier is used to test the idea from the other direction.
A Random Forest Classifier is built to answer:
Can a machine learning algorithm accurately predict a
distillery’s traditional region based ONLY on how its whisky
tastes? If the model scores high accuracy, the traditional
classification holds up. If accuracy is low, it confirms that regional
naming conventions are more about geography/marketing than a strict
standard of flavor.
A Random Forest model is trained with 500 trees using the 12 numeric
flavor characteristics as predictors. The target variable is the
engineered traditional Region. The Out-of-Bag (OOB) error is used to
estimate the model’s accuracy on unseen data.
library(randomForest)
# 1. Cleanly isolate the features and target variable (stripping geometry)
rf_data <- whisky %>%
st_drop_geometry() %>%
select(Region, Body, Sweetness, Smoky, Medicinal, Tobacco, Honey, Spicy, Winey, Nutty, Malty, Fruity, Floral)
# 2. Train the Random Forest Model
set.seed(42)
whisky_rf <- randomForest(Region ~ ., data = rf_data, ntree = 500, importance = TRUE)
# 3. Print the Confusion Matrix and OOB Error Rate
print(whisky_rf)
##
## Call:
## randomForest(formula = Region ~ ., data = rf_data, ntree = 500, importance = TRUE)
## Type of random forest: classification
## Number of trees: 500
## No. of variables tried at each split: 3
##
## OOB estimate of error rate: 51.16%
## Confusion matrix:
## Campbeltown Highlands Islay Lowlands Speyside class.error
## Campbeltown 0 2 0 0 0 1.0000000
## Highlands 0 20 1 1 15 0.4594595
## Islay 0 4 4 0 0 0.5000000
## Lowlands 0 5 0 1 1 0.8571429
## Speyside 0 14 0 1 17 0.4687500
# Convert importance results to a tidy dataframe for plotting
importance_df <- as.data.frame(importance(whisky_rf))
importance_df$Flavor <- rownames(importance_df)
# Plot Mean Decrease in Gini
ggplot(importance_df, aes(x = reorder(Flavor, MeanDecreaseGini), y = MeanDecreaseGini)) +
geom_bar(stat = "identity", fill = "#D4AF37", color = "#AA7C11") +
coord_flip() +
theme_minimal() +
labs(
title = "Random Forest: Feature Importance",
subtitle = "Flavors that drive regional differentiation",
x = "Flavor Metric",
y = "Importance (Mean Decrease Gini)"
)
The Random Forest classifier yielded an Out-of-Bag (OOB) error rate of 51.16%, meaning the model misclassified more than half of the distilleries when attempting to predict their traditional geographic region from flavor profiles alone. This high error rate supports the view that traditional whisky regions do not possess uniquely cohesive, self-contained flavor signatures.
The confusion matrix reveals where regional flavor definitions break down:
* Campbeltown (Class Error: 1.000): The model failed to correctly classify either of the 2 Campbeltown distilleries, assigning both to the Highlands. This indicates that Campbeltown lacks an independent organoleptic footprint in this dataset, although two observations are too few for a firm conclusion.
* Lowlands (Class Error: 0.857): Out of 7 Lowland distilleries, only 1 was correctly identified; 5 were misclassified as Highlands and 1 as Speyside.
* The Speyside-Highlands Friction: Out of 32 Speyside distilleries, 14 were misclassified as Highlands. Conversely, out of 37 Highland distilleries, 15 were misclassified as Speyside. This cross-contamination (affecting nearly half of each region’s sample) is consistent with our K-Means finding: Speyside and Highland whiskies share an overlapping flavor continuum.
* Islay (Class Error: 0.500): Islay was the best-recovered of the smaller regions, but 4 of its 8 distilleries were still misclassified into the Highlands, possibly because of lighter, unpeated offerings produced on the island.
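The error rates quoted above follow directly from the confusion matrix, and the same counts give a useful point of comparison: a trivial classifier that always predicts the largest class. The sketch below rebuilds the matrix from the printed output; the majority-class baseline and the names conf_mat, oob_error, class_error and baseline_error are additions for illustration, not part of the original analysis.

```r
# Counts copied from the randomForest confusion matrix above
# (rows = true region, columns = predicted region).
regions <- c("Campbeltown", "Highlands", "Islay", "Lowlands", "Speyside")
conf_mat <- matrix(
  c(0,  2, 0, 0,  0,
    0, 20, 1, 1, 15,
    0,  4, 4, 0,  0,
    0,  5, 0, 1,  1,
    0, 14, 0, 1, 17),
  nrow = 5, byrow = TRUE,
  dimnames = list(True = regions, Predicted = regions)
)

# Overall OOB error: share of distilleries off the diagonal.
oob_error <- 1 - sum(diag(conf_mat)) / sum(conf_mat)

# Per-class error: share of each true region that was misclassified.
class_error <- 1 - diag(conf_mat) / rowSums(conf_mat)

# Baseline: always predict the most common region (Highlands, 37 of 86).
baseline_error <- 1 - max(rowSums(conf_mat)) / sum(conf_mat)

print(round(c(oob = oob_error, baseline = baseline_error), 4))
print(round(class_error, 4))
```

The OOB error is 44/86 (51.16%), while always guessing Highlands would be wrong 49/86 of the time (56.98%), so the forest improves on the majority-class baseline by only about six percentage points.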
The Feature Importance plot clarifies exactly which taste characteristics carry regional information:
Smoky and Medicinal are the dominant predictors, topping the Mean Decrease in Gini metric. This aligns with our Global Moran’s I test; the heavy use of peat creates a distinct, geographically traceable profile that allows the model to separate heavily peated coastal/island distilleries from the rest of Scotland.
Sweetness, Floral, and Fruity occupy a secondary tier of importance. These sweet, ester-driven attributes primarily help the model distinguish the core Speyside style from heavy malts.
Tobacco and Malty provide almost zero informational utility to the splits, indicating that these background traits are uniformly distributed across all Scottish distilleries regardless of geographic territory.
This study combined spatial statistics and machine learning to evaluate whether traditional Scotch whisky regions genuinely define distinct flavor profiles. The empirical data reveals a clear mismatch between historical regional boundaries and actual spirit chemistry.
* While Smoky notes show intense regional clustering (Moran’s I = 0.4648, \(p < 0.001\)), Floral characteristics show no significant spatial clustering (\(p = 0.0676\)). This indicates that localized traditions and independent distillery-level choices coexist.
* Smoky and Medicinal are the only two traits that carry strong regional signatures. For all other traits (such as sweetness, fruitiness, or maltiness), Scottish whiskies function as a continuous, overlapping spectrum rather than isolated territories.
* Traditional whisky regions are rich historical and marketing designations, not reliable flavor guides. Because modern distilleries buy malted barley from centralized maltings and source oak casks globally, a whisky’s flavor is plausibly driven more by internal distillery engineering choices, such as still shape and distillation cuts, than by regional terroir.
While buying an “Islay” bottle remains a valid shortcut to expect heavy smoke, expecting a “Highland” malt to taste inherently different from a “Speyside” malt is not supported by this data. Moving forward, a classification system based on data-driven flavor clusters could be more transparent and accurate for consumers.
Bivand, R. S., & Wong, D. W. S. (2018). Comparing implementations
of global and local indicators of spatial association. TEST,
27(3), 716–748. https://doi.org/10.1007/s11749-018-0599-x [For the
spdep package]
Breiman, L. (2001). Random Forests. Machine Learning,
45(1), 5–32. https://doi.org/10.1023/A:1010933404324 [For the
randomForest package]
Kassambara, A., & Mundt, F. (2020). factoextra: Extract and visualize the results of multivariate data analyses (R package version 1.0.7). https://CRAN.R-project.org/package=factoextra
MacLean, C. (2020). Understanding Scotch malt whisky: Official study guide of the Certified Scotch Professional study and certification program. The Council of Whiskey Masters.
Pebesma, E. (2018). Simple Features for R: Standardized support for
spatial vector data. The R Journal, 10(1), 439–446. https://doi.org/10.32614/RJ-2018-009 [For the
sf package]
Pope, A. (2017). Scotch Whisky characteristics [Dataset]. University of Edinburgh. https://doi.org/10.7488/ds/1942
R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D.,
François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M.,
Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J.,
Robinson, D., Seidel, D. P., Spinu, V., … Yutani, H. (2019). Welcome to
the tidyverse. Journal of Open Source Software, 4(43),
1686. https://doi.org/10.21105/joss.01686 [For the
tidyverse / ggplot2 packages]
In accordance with institutional academic integrity guidelines, the author acknowledges the supportive use of generative artificial intelligence (Gemini) during the development of this project.
Specifically, AI assistance was utilized for the following tasks:
* Code Troubleshooting & Optimization: Debugging specific R environment compilation errors (such as geometry exclusions, package syntax updates, and graphic color parameters).
* Structural Formatting: Technical guidance on setting up dynamic R Markdown templates, layout alignment, and table generation.
* Refinement: Assistance in streamlining the clarity of complex statistical interpretations based on user-supplied data outputs.
All data selection, substantive exploratory analysis, spatial modeling execution, thematic interpretations, and final text composition were fundamentally directed, reviewed, and finalized by the human author. The core intellectual arguments and ownership of this work belong entirely to the author.