Greenfield Analysis (GFA) is a facility location optimization technique used to identify the most suitable placement of new service centers, warehouses, or healthcare facilities when no prior infrastructure constraints exist. Unlike methods that rely on existing facility locations, Greenfield Analysis assumes a “clean slate,” allowing decision makers to design the most efficient network from the ground up.
In this analysis, we applied a center-of-gravity (CoG) approach to identify four candidate facility locations in Connecticut. The CoG method clusters demand points into groups and calculates the geographic center of each group. Each center minimizes the sum of squared straight-line distances to the points in its group, which makes it a reasonable proxy for minimizing the average travel distance of the population served.
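To make the center-of-gravity idea concrete, here is a minimal base R sketch. The three demand points and the `weight` column (imagine annual visits) are made up for illustration and are not part of this tutorial's data. The tutorial itself uses the unweighted version; the weighted version is a common variant when demand volumes are known.

```r
# Hypothetical demand points with a made-up weight column (e.g., annual visits)
demand <- data.frame(
  lon    = c(-73.2, -72.9, -72.7),
  lat    = c(41.2, 41.3, 41.8),
  weight = c(100, 300, 600)
)

# Unweighted center of gravity: the plain mean of the coordinates
cog_unweighted <- c(lon = mean(demand$lon), lat = mean(demand$lat))

# Demand-weighted center of gravity: heavier points pull the center toward them
cog_weighted <- c(
  lon = weighted.mean(demand$lon, demand$weight),
  lat = weighted.mean(demand$lat, demand$weight)
)

cog_unweighted
cog_weighted
```

In this toy example the weighted center shifts toward the third point because it carries most of the demand.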
To illustrate the method, we generated 1000 demand locations across Connecticut and grouped them into four service clusters using k-means clustering. Each cluster’s centroid was then calculated as the facility location that best serves the demand points assigned to it. Finally, we mapped the demand points, their assigned centroids, and the connecting paths using a static plot and an interactive visualization.
This approach provides planners and policymakers with a simple yet powerful tool for exploring how new facilities could be positioned to improve accessibility and service efficiency. While the example here uses simulated demand points, the same framework can be applied to real-world data such as population centers, healthcare demand, or customer locations.
Here are the steps for creating the analysis in R, including data simulation, clustering, centroid calculation, and interactive mapping.
1. Load the packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# The {sf} package provides a set of tools for working with geospatial vectors, i.e. points, lines, polygons, etc.
library(sf)
## Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
# The {tmap} package allows you to create interactive maps
library(tmap)
2. Simulate 1000 locations in Connecticut.
# Approximate bounding box for CT: lat 40.98–42.05, lon -73.73 to -71.78
set.seed(2016)
n_locations <- 1000
connecticut_locs <- data.frame(
  id = 1:n_locations,
  lon = runif(n_locations, -73.73, -71.78),
  lat = runif(n_locations, 40.98, 42.05)
)
# Returns the first six rows
head(connecticut_locs)
## id lon lat
## 1 1 -73.37868 41.85321
## 2 2 -73.45126 41.02757
## 3 3 -72.08879 41.33352
## 4 4 -73.46953 41.18279
## 5 5 -72.79887 41.57457
## 6 6 -73.49355 41.67696
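One caveat when clustering raw longitude/latitude values: at Connecticut's latitude, one degree of longitude covers less ground than one degree of latitude. The sketch below is an optional refinement, not something the tutorial does. It rescales longitude by the cosine of the mean latitude so that both axes are roughly comparable. The names `locs`, `lat_ref`, `lon_scale`, `x`, and `y` are introduced here purely for illustration.

```r
# Assumption: the same simulated bounding box as in the tutorial
set.seed(2016)
n_locations <- 1000
locs <- data.frame(
  lon = runif(n_locations, -73.73, -71.78),
  lat = runif(n_locations, 40.98, 42.05)
)

# One degree of longitude shrinks by cos(latitude) relative to one degree of latitude
lat_ref <- mean(locs$lat)
lon_scale <- cos(lat_ref * pi / 180)

# Rescaled coordinates in which one unit is roughly the same ground distance
# in both directions (a local approximation, not a true map projection)
locs$x <- locs$lon * lon_scale
locs$y <- locs$lat

lon_scale
```

Clustering on `x` and `y` instead of `lon` and `lat` would weight east-west and north-south separation more evenly. For production work, a proper projected coordinate system is usually the safer choice.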
3. Perform k-means clustering to get 4 facility zones.
# Run k-means on the coordinates to form k clusters
k <- 4
coords <- connecticut_locs %>% select(lon, lat)
kmeans_result <- kmeans(coords, centers = k)
connecticut_locs$cluster <- kmeans_result$cluster
# Returns the first six rows
head(connecticut_locs)
## id lon lat cluster
## 1 1 -73.37868 41.85321 4
## 2 2 -73.45126 41.02757 4
## 3 3 -72.08879 41.33352 1
## 4 4 -73.46953 41.18279 4
## 5 5 -72.79887 41.57457 2
## 6 6 -73.49355 41.67696 4
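The choice of four clusters is fixed in this tutorial. If the number of facilities is open, one common heuristic is to compare the total within-cluster sum of squares across several values of k and look for an "elbow". The sketch below assumes the same simulated bounding box; the range `1:8`, `nstart = 25`, and `iter.max = 50` are illustrative choices, not settings used in the tutorial.

```r
set.seed(2016)
coords <- data.frame(
  lon = runif(1000, -73.73, -71.78),
  lat = runif(1000, 40.98, 42.05)
)

# Candidate numbers of facilities (an arbitrary illustrative range)
k_values <- 1:8

# Total within-cluster sum of squares for each k; nstart = 25 reruns k-means
# from 25 random starts and keeps the best solution
wss <- sapply(k_values, function(k) {
  kmeans(coords, centers = k, nstart = 25, iter.max = 50)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens
plot(k_values, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

Because the simulated demand is uniform, the curve may not show a sharp elbow; real demand data with natural population centers tends to give a clearer signal.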
4. Compute CoG (centroid) for each cluster.
centroids <- connecticut_locs %>%
  group_by(cluster) %>%
  summarise(
    cog_lon = mean(lon),
    cog_lat = mean(lat),
    .groups = "drop"
  )
# Show the centroids (one row per cluster)
head(centroids)
## # A tibble: 4 × 3
## cluster cog_lon cog_lat
## <int> <dbl> <dbl>
## 1 1 -72.1 41.5
## 2 2 -72.7 41.8
## 3 3 -72.6 41.2
## 4 4 -73.4 41.5
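As a sanity check, the group means computed above should match the cluster centers that `kmeans()` already stores in its `centers` element. The following base R sketch re-creates the simulation under the same assumptions and compares the two; `manual`, `stored`, and `same` are names chosen for this illustration, and `iter.max = 50` is an added safeguard that the tutorial does not use.

```r
set.seed(2016)
locs <- data.frame(
  lon = runif(1000, -73.73, -71.78),
  lat = runif(1000, 40.98, 42.05)
)
km <- kmeans(locs[, c("lon", "lat")], centers = 4, iter.max = 50)
locs$cluster <- km$cluster

# Group means computed by hand (a base R equivalent of group_by + summarise)
manual <- aggregate(cbind(lon, lat) ~ cluster, data = locs, FUN = mean)

# Centers stored by kmeans(); rows are ordered by cluster label
stored <- km$centers

# Compare the two sets of coordinates, ignoring row and column names
same <- isTRUE(all.equal(
  unname(as.matrix(manual[, c("lon", "lat")])),
  unname(stored)
))
same
```

Computing the means explicitly, as the tutorial does, is still useful: the same `summarise()` step can be swapped for a weighted mean when demand volumes are available.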
5. Join centroids back to the original data by cluster.
connecticut_locs_with_centroids <- connecticut_locs %>%
  left_join(centroids, by = "cluster")
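Once each demand point carries its centroid coordinates, it becomes straightforward to quantify how far points are from their assigned facility. The sketch below uses the haversine great-circle formula in base R with an assumed mean Earth radius of 6371 km. The small `joined` data frame is a hypothetical stand-in with the same columns as `connecticut_locs_with_centroids`, and `haversine_km()` is a helper defined here, not a function from any package.

```r
# Great-circle distance in kilometers (haversine formula; Earth radius assumed 6371 km)
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical stand-in for connecticut_locs_with_centroids
joined <- data.frame(
  cluster = c(1, 1, 2),
  lon     = c(-72.0, -72.2, -73.4),
  lat     = c(41.4, 41.6, 41.5),
  cog_lon = c(-72.1, -72.1, -73.4),
  cog_lat = c(41.5, 41.5, 41.5)
)

# Straight-line distance from each demand point to its assigned centroid
joined$dist_km <- with(joined, haversine_km(lon, lat, cog_lon, cog_lat))

# Mean distance to the assigned facility, per cluster
aggregate(dist_km ~ cluster, data = joined, FUN = mean)
```

These are straight-line distances, so they understate actual road travel; they are best read as a relative measure for comparing clusters or candidate values of k.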
6. Plot the result.
# Plot
ggplot() +
  # Lines from each point to its centroid
  # (linewidth replaces size for lines as of ggplot2 3.4.0)
  geom_segment(data = connecticut_locs_with_centroids,
               aes(x = lon, y = lat,
                   xend = cog_lon, yend = cog_lat,
                   color = factor(cluster)),
               alpha = 0.6, linewidth = 0.7) +
  # Points (demand locations)
  geom_point(data = connecticut_locs,
             aes(x = lon, y = lat, color = factor(cluster)),
             size = 2) +
  # Centroids (facility locations)
  geom_point(data = centroids,
             aes(x = cog_lon, y = cog_lat),
             shape = 17, color = "black", size = 4) +
  # Labels for centroids
  geom_text(data = centroids,
            aes(x = cog_lon, y = cog_lat, label = cluster),
            vjust = -1, size = 4) +
  scale_color_brewer(palette = "Set1", name = "Cluster") +
  labs(title = "Center-of-Gravity Greenfield Analysis (CT)",
       subtitle = "Lines connect demand points to their cluster centroids",
       x = "Longitude", y = "Latitude") +
  theme_minimal() +
  # Customize text elements: make title bigger and bold, subtitle slightly smaller,
  # axis titles and labels larger for readability
  theme(
    plot.title = element_text(size = 27, face = "bold"),
    plot.subtitle = element_text(size = 14),
    axis.title = element_text(size = 14),
    axis.text = element_text(size = 12)
  )
7. Create an interactive map using the {tmap} package.
# Convert demand locations and centroids to sf objects
demand_sf <- st_as_sf(connecticut_locs, coords = c("lon", "lat"), crs = 4326)
centroids_sf <- st_as_sf(centroids, coords = c("cog_lon", "cog_lat"), crs = 4326)
# Create line connections (demand to centroid)
# Use lapply to loop through each row of the dataset "connecticut_locs_with_centroids"
# Each row represents a demand location and the centroid it is assigned to
line_list <- lapply(seq_len(nrow(connecticut_locs_with_centroids)), function(i) {
  # Build a 2 x 2 matrix (row 1 = demand point, row 2 = centroid);
  # st_linestring() converts it into a LINESTRING geometry (a line between two points)
  st_linestring(matrix(
    c(connecticut_locs_with_centroids$lon[i], connecticut_locs_with_centroids$lat[i],
      connecticut_locs_with_centroids$cog_lon[i], connecticut_locs_with_centroids$cog_lat[i]),
    ncol = 2, byrow = TRUE))
})
lines_sf <- st_sf(
  cluster = factor(connecticut_locs_with_centroids$cluster),
  geometry = st_sfc(line_list, crs = 4326))
# Interactive map with tmap
tmap_mode("view")
## ℹ tmap mode set to "view".
tm_shape(lines_sf) +
  tm_lines(col = "cluster", lwd = 2, alpha = 0.5) +
  tm_shape(demand_sf) +
  tm_symbols(col = "cluster", palette = "Set1", size = 0.05, border.col = "black") +
  tm_shape(centroids_sf) +
  tm_symbols(shape = 17, col = "black", size = 0.1) +
  tm_text("cluster", size = 2, col = "black", ymod = 0.0) +
  tm_layout(
    title = "Center-of-Gravity Greenfield Analysis (CT)",
    title.size = 1.5,
    legend.outside = TRUE
  )
##
## ── tmap v3 code detected ───────────────────────────────────────────────────────
## [v3->v4] `tm_lines()`: use `col_alpha` instead of `alpha`.
## [v3->v4] `tm_symbols()`: migrate the argument(s) related to the scale of the
## visual variable `fill` namely 'palette' (rename to 'values') to fill.scale =
## tm_scale(<HERE>).
## [v3->v4] `symbols()`: use 'fill' for the fill color of polygons/symbols
## (instead of 'col'), and 'col' for the outlines (instead of 'border.col').
## [v3->v4] `tm_layout()`: use `tm_title()` instead of `tm_layout(title = )`
## [cols4all] color palettes: use palettes from the R package cols4all. Run
## `cols4all::c4a_gui()` to explore them. The old palette name "Set1" is named
## "brewer.set1"
## Registered S3 method overwritten by 'jsonify':
## method from
## print.json jsonlite
Conclusion
The Greenfield Analysis using the center-of-gravity approach successfully identified four optimal facility locations to serve simulated demand points across Connecticut. By clustering 1000 demand locations and calculating the geographic centroid of each group, the analysis demonstrated how service areas can be organized to minimize travel distance and improve accessibility.
The interactive mapping of demand points, facility centroids, and their connections provided a clear visualization of service assignments, making the results easy to interpret for planning and decision-making. While this example used simulated data, the same framework can be applied to real-world datasets, incorporating factors such as population density, healthcare utilization, or transportation networks.
Overall, this approach illustrates the practicality of center-of-gravity modeling in strategic facility planning, offering a flexible and data-driven method for optimizing service coverage and resource allocation.
Disclaimer: The author of this tutorial, along with any associated organizations, assumes no responsibility for the use or misuse of the code and methods presented. This content is intended for educational purposes only and is not a substitute for professional advice.
A.M.D.G.