SATELLITE DATA FOR AGRICULTURAL ECONOMISTS: Theory and Practice

Lecture date: 08-01-2026

Author
Affiliation

David Wuepper, Hadi, and Wyclife Agumba Oluoch

Land Economics Group, University of Bonn, Bonn, Germany

Published

December 8, 2025

1 Introduction

In this session, we will learn how to use an already built spatial machine learning model from sdm package (Naimi and Araujo 2016) to predict tea fields on our data for the same roi but different time. The main task is to segment tea fields in the same region of interest but on data that the model was never trained on.

2 Google Earth Engine

You can run the code in the snippet below in the GEE code editor to obtain the Sentinel+2 + canopy height for use in this script


var s2 = ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED");
var cs_plus = ee.ImageCollection("GOOGLE/CLOUD_SCORE_PLUS/V1/S2_HARMONIZED");
var roi = projects/ee-wyclifeaoluoch/assets/roi_win_2025 // Change to your asset path
var canopy_ht = projects/sat-io/open-datasets/facebook/meta-canopy-height;

/*
SATELLITE DATA FOR AGRICULTURAL ECONOMISTS
THEORY AND PRACTICE

Mapping tea plantations in Kericho region of Kenya: Machine Learning Approach
David Wuepper, Hadi, Wyclife Agumba Oluoch
Affiliations:
  - Land Economics Group (https://www.ilr1.uni-bonn.de/en/research/research-groups/land-economics), University of Bonn, Bonn, Germany

This script, demonstrates how to create, visualize, and export to drive a cloud-free Sentinel-2 composite and canopy height data
for use in machine and deep learning models. The model trained in 2025 is used to predict in this image of 2020
*/

// --------------------------------------------
// 1. Band Selection for Sentinel-2
// --------------------------------------------
// Define relevant Sentinel-2 bands for vegetation/cropland analysis.
// These include visible, red-edge, near-infrared, and shortwave-infrared bands.

var selected_bands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B8A', 'B11', 'B12'];

s2 = s2.select(selected_bands);

print('selected_bands', selected_bands);

// --------------------------------------------
// 2. Define Time Range for Image Collection
// --------------------------------------------

var start_date = "2020-01-01";  // Start date of image collection
var end_date = "2021-01-01";    // End date of image collection

// --------------------------------------------
// 3. Mosaic the canopy height, clip, convert to float and rename
// --------------------------------------------

canopy_ht = canopy_ht.mosaic().rename('ch');

// --------------------------------------------
// 4. Create Cloud-Free Composite
// --------------------------------------------
// Filter by region of interest (roi), apply cloud score mask, and generate a median composite.

var composite = s2
  .filterBounds(roi)                          // Keep images overlapping the region of interest
  .filterDate(start_date, end_date)           // Filter only for the dates needed
  .linkCollection(cs_plus, 'cs')              // Attach the cloud score band 
  .map(function(img){
    return img.updateMask(img.select('cs').gte(0.6));  // Mask out cloudy pixels (cloud score < 0.6)
  })
  .median()                                   // Reduce image collection to a single median image
  .multiply(0.0001)                           // Scale reflectance values
  .select(selected_bands)                     // Re-select bands post-scaling, 'cs' was linked
  .addBands(canopy_ht)
  .toFloat();                                 // Clip the final composite to the region of interest

// Print image properties to console to verify band structure and completeness

print('Composite with canopy height', composite);  // Expecting a single image with 10 bands

// How many images contributed to the median

print('Images used', 
  s2
  .filterBounds(roi)
  .filterDate(start_date, end_date)
  .linkCollection(cs_plus, 'cs')
    .map(function(img){
      return img.updateMask(img.select('cs').gte(0.6));
      }));
  
//  Compare with cloudy pixel percentage (Old approach...only 21 images or so)
print('Images used no cs_plus', 
  s2
  .filterBounds(roi)
  .filterDate(start_date, end_date)
  .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 10)));

// --------------------------------------------
// 5. Visualization of the Composite
// --------------------------------------------
// Add a true-color layer (RGB: B4, B3, B2) to the map for visual inspection
Map.centerObject({object: roi, zoom: 13});

Map.addLayer(
  composite.clip(roi), 
  {
    min: 0.019347,              // Obtained from the Layers panel, min value (sometimes based on 98% of values)
    max: 0.124803,              // Obtained from the Layers panel, these are to enhance visualization
    bands: ['B4', 'B3', 'B2']   // Bands to use for visualization, RGB (true color composite)
  }, 
  'composite',                  // Name of the layers in Layers panel
  false                         // Do not display layer by default (can be turned on in Layers panel), makes computations faster
);

// --------------------------------------------
// 6. Display ROI Outline
// --------------------------------------------
// Paint the ROI geometry outline in red

var roi_outline = ee.Image().paint({ // Paints the edges of roi as new raster layer
  featureCollection: roi,            // The region of interest to be visualized
  width: 4                           // Outline thickness in pixels
});

// Center the map view on the ROI and add the red border outline

Map.addLayer(roi_outline, {palette: 'red'}, 'roi');

// --------------------------------------------
// 7. Export Image Composite to Google Drive
// --------------------------------------------
// Save the processed image composite for offline use (e.g., in Python or GIS software)

Export.image.toDrive({                // Function to export an image to Google drive
  image: composite,                   // The image to be exported
  description: 'temporal_image',      // Task name to appear in the Tasks tab while exporting
  folder: 'winter2025',               // Folder in your Google Drive to hold the image
  fileNamePrefix: 'temporal2025',     // Prefix for output file name
  region: roi,                        // Export region bounds
  scale: 10,                          // Spatial resolution (Sentinel-2 native resolution)
  maxPixels: 1e13,                    // Allow large exports
  crs: "EPSG:3857"                    // Set coordinate reference system (Web Mercator)
});

// --------------------------------------------
// 10. Script Summary
// --------------------------------------------
/*
This script demonstrates the workflow for generating a small cloud-free Sentinel-2 image composite and canopy height data
for a small administrative region in Kenya, specifically tailored for agricultural analysis.
It includes:

- Selecting and preprocessing relevant spectral bands
- Applying a cloud mask using a cloud score (`cs_plus`)
- Generating a 'clean' median composite
- Visualizing the composite and roi
- Exporting the composite image for further analysis in GIS or machine learning tools

Adapt and extend this script to fit your specific region, time period, or classification objectives.
*/

/////////////////E N D //////////////////

3 Data sets

We are going to make use of raster data we downloaded from Google Earth Engine. You need to have GEE account to be able to see the script in GEE and rerun the extraction task. The raster data is cloud-free median Sentinel-2 image for the year 2020 (filtered for the dates 2020-01-01 to 2021-01-01). The last date ensures that the images for 2020-12-31 are also obtained otherwise they will be ignored.

After exporting the image to drive, we download it to our predefined data folder inside our R project.

4 Loading libraries

We will make use of the following packages to achieve the classification task:

library(sdm)            # version 1.2.67
library(ggplot2)        # version 4.0.1
library(dplyr)          # version 1.1.4

5 Loading the data

Here, our only data is the dataset which we will transfer the model to, that is, the Sentinel-2 image for the year 2020:

  • temporal_stack: Raster dataset of the Sentinel-2 images and canopy height to predict tea plantations
temporal_stack <- rast("data/temporal2025.tif")

We need to apply our model on this new data.

6 Visualize the data

plot(temporal_stack$B7)

We did not clip the image to roi when exporting from GEE. Let us do that locally.

roi <- vect("data/roi.shp")
roi <- project(roi, crs(temporal_stack))
temporal_stack <- crop(temporal_stack, roi, mask = TRUE)
plot(temporal_stack$B7)

7 Load the model

The next step after bringing in the new data is to load the model which we will use to make prediction on it. This is the same model we trained earlier on the 2024 images. We will use a function from sdm package to read the model in.

temporal_model <- read.sdm("models/model_ens.sdm")
temporal_model
class                                 : sdmModels 
======================================================== 
number of species                     :  1 
number of modelling methods           :  3 
names of modelling methods            :  rf, svm, brt 
replicate.methods (data partitioning) :  subsampling 
number of replicates (each method)    :  3 
total number of replicates per model  :  3 (per species) 
test percentage (in subsampling)      :  30 
------------------------------------------
model run success percentage (per species)  :
------------------------------------------
method          label            
---------------------- 
rf         :        100   %
svm        :        100   %
brt        :        100   %

###################################################################
model Mean performance (per species), using test dataset (generated using partitioning):
-------------------------------------------------------------------------------

 ## species   :  label 
=========================

methods    :     AUC     |     COR     |     TSS     |  Deviance 
-------------------------------------------------------------------------
rf         :     1       |     0.98    |     0.99    |     0.09     
svm        :     1       |     0.99    |     1       |     0.06     
brt        :     1       |     0.96    |     0.98    |     0.54     

8 AoA

Before we can use a model trained in one time (2024) to make prediction in a different time (2020), it is important to check how similar the two times are pixelwise. That is whether the pixel values in 2024 data where the model was trained are in the same range as the pixel values in the 2020 data for the same roi. To do this, we can use aoa function from sdm package. It stands for Area of Applicability.

temp_aoa <- aoa(x = temporal_stack, d = temporal_model)

Let us then visualize the area of applicability.

plot(temp_aoa)

The areas with values close to 1 are quite similar in terms of pixel values to those where the model was trained. Therefore, we are more confident in predictions made in such areas. On the other hand, areas that are closer to 0 have values that are different from those that the model was exposed to or trained on, hence our confidence is lower in predictions made in such areas. We expect most areas to be closer to 1 so that we are more confident in our transferred model.

With that confidence, we can then use the model to make prediction on the new data:

p_temporal <- predict(temporal_model, temporal_stack, mean = TRUE)

Visualize the mean prediction:

plot(p_temporal)

8.2 Binary output

We can use the pa function from sdm package (Naimi and Araujo 2016) to calculate the threshold to use that maximizes tss in order to derive a binary map from the probability output:

temporal_bin_ens <- pa(x = temporal_ens, y = temporal_model, id = "ensemble", opt = 2)

Now we can have the binary map based on a threshold that maximizes specificity and sensitivity.

plot(temporal_bin_ens, main = "Binary Presence/Absence Map 2020",
     col = c("grey90", "green4"))

writeRaster(temporal_bin_ens, "outputs/bin_ens_2020.tif", overwrite = TRUE)

9 Interactive inspection

mapview::mapview(temporal_bin_ens, col.regions = c("transparent", "green"), alpha = 0.4)
Number of pixels is above 5e+05.Only about 5e+05 pixels will be shown.
You can increase the value of `maxpixels` to 580920 to avoid this.

10 Area under tea

We can then determine how much land is under tea in the region.

temporal_tea_area <- terra::expanse(temporal_bin_ens, unit = "ha", byValue = TRUE)
temporal_tea_area
  layer value      area
1     1     0 2661.3597
2     1     1  825.6166
temporal_tea_area |> 
  mutate(Class = case_when(value == 1 ~ "Tea",
                           value == 0 ~ "Non-tea",
                           TRUE ~ as.character(value))) |> 
  ggplot(aes(Class, area, fill = Class)) +
  geom_col() +
  # Add text labels on top of the bars
  geom_text(aes(label = round(area)),  # Display area value, rounded to 2 decimals
            vjust = -0.5,                 # Adjust vertical position (move up)
            color = "black",               # Set text color
            size = 5) +                    # Set text size
  scale_fill_manual(values = c("Non-tea" = "gray70", "Tea" = "darkgreen")) +
  theme_classic() +
  # Optional: Adjust y-axis limits to make room for the text
  ylim(0, max(temporal_tea_area$area) * 1.1) # Extends the y-axis limit by 10% to prevent clipping

11 Create change map

For brevity, we will use t1 = 2020 and t2 for 2025. So we want to see areas where tea has increased, decreased or remain the same etc.

t1 <- rast("outputs/bin_ens_2020.tif")
t2 <- rast("outputs/bin_ens.tif")
par(mfrow = c(1, 2))
plot(t1, col = c("white", "darkgreen"), main = "2020")
plot(t2, col = c("white", "darkgreen"), main = "2025")

Let the change map be called ch_mp.

t2 <- crop(t2, roi, mask = TRUE)
ch_mp <- t1 

ch_mp <- ifel(t1 == 0 & t2 == 0, 0,
         ifel(t1 == 0 & t2 == 1, 1,
         ifel(t1 == 1 & t2 == 0, 2, 3)))
plot(ch_mp, col = c("gray", "blue", "red", "darkgreen"))

We can improve the plot

library(tidyterra)

Attaching package: 'tidyterra'
The following object is masked from 'package:raster':

    select
The following object is masked from 'package:stats':

    filter
# Make sure your categories are factors with labels
ch_mp <- as.factor(ch_mp)
levels(ch_mp) <- data.frame(
  value = 0:3,
  label = c("No tea", "Expansion", "Contraction", "Stable Tea")
)

# Plot with ggplot2
ggplot() +
  geom_spatraster(data = ch_mp) +
  scale_fill_manual(
    values = c("gray", "blue", "red", "darkgreen"),
    name   = "Tea Change Map",
    na.value = "transparent",
    na.translate = FALSE
  ) +
  theme_minimal() +
  theme(
    legend.position = "right",
    legend.title = element_text(size = 12, face = "bold"),
    legend.text  = element_text(size = 10)
  )
<SpatRaster> resampled to 501075 cells.

Finally, we can tell area under each class:

expanse(ch_mp, unit = "ha", byValue = TRUE)
  layer       value      area
1     1      No tea 2453.7165
2     1   Expansion  206.4116
3     1 Contraction  105.4059
4     1  Stable Tea  720.8662

12 Session summary

In this session, we have demonstrated how to create a binary tea map using spatial machine learning model trained elsewhere on your new area.

In this case, we did not have to train the model ourselves but used an existing model to make prediction elsewhere.

13 Assignments (Optional)

  1. Practical exercise ==> Apply a pre-trained spatial ML model to a new temporal dataset (e.g., 2017 Sentinel-2 image) to generate a binary tea map using your area of interest. Calculate area under tea.

  2. Practical exercise ==> Create a change map between two time points (e.g., 2017 vs 2025) showing expansion, contraction, stable tea, and no tea areas.

  3. Conceptual question ==> Why is checking Area of Applicability (AoA) important when transferring models across time?

  • Ensures input data at new time is within the range seen during model training.

  • Indicates confidence in predictions: areas close to 1 are reliable, near 0 are uncertain.

  • Helps avoid extrapolation errors and misclassification.

  1. Applied question ==> What factors can lead to inaccuracies when applying a model trained in one year to another year?
  • Differences in pixel values due to sensor, seasonality, or land cover changes.

  • Misalignment or resampling errors between datasets.

  • Model not generalizing well beyond training data.

  • Uncaptured environmental or management changes affecting crop appearance.

14 References

Naimi, Babak, and Miguel B. Araujo. 2016. “Sdm: A Reproducible and Extensible r Platform for Species Distribution Modelling” 39: 368–75. https://doi.org/10.1111/ecog.01881.