Generate predictor variables from the PlanetScope imagery

The PlanetScope imagery

Planet surface reflectance (SR) imagery from four seasons (winter, spring, summer and autumn) was used to generate predictor variables. SR data are essential because they represent the proportion of light reflected by the Earth’s surface after correction for atmospheric effects, which allows scenes acquired on different dates to be compared directly. Using scenes from several seasons adds temporal variation, which strengthens the model’s ability to generalise by capturing seasonal differences in vegetation, such as leaf-out and leaf fall in deciduous species.

The PlanetScope (SuperDove) sensor provides imagery in eight spectral bands: Red, Near-Infrared (NIR), Blue, Green, Green I, Coastal Blue, Yellow, and Red Edge. Among these, the Red, NIR, and Blue bands were combined into three-band colour composites (false-colour, since they include NIR), enabling a detailed visual assessment of Aberfoyle Forest and making vegetation patterns easier to distinguish.

Create vegetation indices (Summer)

Vegetation indices from the summer scene can improve the Random Forest model by providing features that capture the peak of vegetative growth, when foliage is densest and most uniform, which allows a clearer distinction between species or vegetation types. Because vegetation indices combine several bands (typically as normalised differences or ratios), they also reduce some of the noise and illumination effects present in the raw spectral data, which can make predictions more reliable and improve model performance.

Since the assessment focuses on closed canopies, the effects of soil are minimal. For this reason, it makes sense to use indices that focus on small differences in plant health, chlorophyll levels, and canopy structure. MCARI, GNDVI, NDVI, and MSAVI2 each highlight these aspects, and together they provide a fuller picture that helps distinguish species better than indices that only correct for soil or water.

MCARI is highly sensitive to variations in chlorophyll and canopy structure, making it particularly useful for distinguishing species with subtle differences in greenness.
GNDVI, which uses the green band, is more responsive to chlorophyll levels than NDVI and can better differentiate species with similar biomass but differing health.
NDVI remains a reliable general-purpose index and serves as a solid baseline. It complements MCARI and GNDVI effectively.
MSAVI2 can be valuable when species vary in crown density or have incomplete canopy cover, as it adds structural contrast.
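For reference, the four indices can be computed per pixel from the band reflectances. The following Python sketch uses the standard published formulas; it is illustrative only, since the original processing was done in R and that code is not shown. Two assumptions are labelled in the comments: inputs are surface reflectance scaled to 0 to 1 (Planet's SR product stores reflectance multiplied by 10,000, so divide first), and PlanetScope's Red Edge, Red and Green bands stand in for the 700, 670 and 550 nm terms of MCARI.

```python
import math

# Assumption: inputs are surface reflectance scaled to 0-1
# (divide Planet SR integer values by 10,000 first).


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    """GNDVI = (NIR - Green) / (NIR + Green); the green band adds chlorophyll sensitivity."""
    return (nir - green) / (nir + green)


def mcari(rededge, red, green):
    """MCARI = [(R700 - R670) - 0.2 * (R700 - R550)] * (R700 / R670).

    Assumption: PlanetScope Red Edge, Red and Green stand in for the
    700, 670 and 550 nm reflectances of the original definition.
    """
    return ((rededge - red) - 0.2 * (rededge - green)) * (rededge / red)


def msavi2(nir, red):
    """MSAVI2 = (2*NIR + 1 - sqrt((2*NIR + 1)^2 - 8*(NIR - Red))) / 2."""
    return (2 * nir + 1 - math.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2


# Hypothetical summer pixel of a dense broadleaf canopy
pixel = {"green": 0.06, "red": 0.03, "rededge": 0.12, "nir": 0.40}
print(round(ndvi(pixel["nir"], pixel["red"]), 3))
print(round(msavi2(pixel["nir"], pixel["red"]), 3))
```

With NumPy arrays instead of scalars, the same expressions work element-wise if math.sqrt is replaced by numpy.sqrt.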

Combining Spectral Bands and Vegetation Indices into a Multi-Layered Image

The imagery dataset contains several raster images, with each one showing either a spectral band from the Planet sensor or a vegetation index. Since these images cover the same area and have the same resolution, they can be stacked to form a single multi-layered image. This final image brings together the original PlanetScope bands for each season along with the calculated vegetation indices.
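The stacking step can be illustrated without any raster library. The minimal Python sketch below assumes the layers are already co-registered on the same grid and represented as nested lists; it checks only that the row and column counts match, whereas a real stack must also share the same CRS, extent and resolution.

```python
def stack_layers(layers):
    """Stack co-registered 2-D rasters into one multi-layer image.

    layers: dict of layer name -> 2-D list (rows x cols), all on the same grid.
    Returns (names, pixels), where pixels[r][c] is the feature vector of
    pixel (r, c) with values in the same order as names.
    """
    names = list(layers)
    shapes = {(len(grid), len(grid[0])) for grid in layers.values()}
    if len(shapes) != 1:
        # Only row/column counts are checked here; a real stack must also
        # share CRS, extent and resolution.
        raise ValueError(f"layers differ in size: {sorted(shapes)}")
    n_rows, n_cols = shapes.pop()
    pixels = [
        [[layers[name][r][c] for name in names] for c in range(n_cols)]
        for r in range(n_rows)
    ]
    return names, pixels


names, pixels = stack_layers({
    "red_summer": [[0.03, 0.04], [0.05, 0.03]],
    "nir_summer": [[0.40, 0.38], [0.35, 0.41]],
})
print(names)         # ['red_summer', 'nir_summer']
print(pixels[0][1])  # [0.04, 0.38]
```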

Table 1 lists the 36 predictor variables: the eight PlanetScope bands for each of the four seasons (32 layers) and the four summer vegetation indices.

Table 1. Predictor variables used in the Random Forest model
Covariates
coastal.blue_winter
blue_winter
green.I_winter
green_winter
yellow_winter
red_winter
rededge_winter
nir_winter
coastal.blue_spring
blue_spring
green.I_spring
green_spring
yellow_spring
red_spring
rededge_spring
nir_spring
coastal.blue_summer
blue_summer
green.I_summer
green_summer
yellow_summer
red_summer
rededge_summer
nir_summer
coastal.blue_autumn
blue_autumn
green.I_autumn
green_autumn
yellow_autumn
red_autumn
rededge_autumn
nir_autumn
NDVI
GNDVI
MCARI
MSAVI2
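The names in Table 1 follow a band_season pattern, so the list can also be generated programmatically, which is a quick way to confirm the count: 8 bands × 4 seasons give 32 layers, and the 4 summer indices bring the total to 36. This is a small illustrative sketch; the band labels are copied from Table 1.

```python
BANDS = ["coastal.blue", "blue", "green.I", "green",
         "yellow", "red", "rededge", "nir"]
SEASONS = ["winter", "spring", "summer", "autumn"]
INDICES = ["NDVI", "GNDVI", "MCARI", "MSAVI2"]  # summer scene only


def predictor_names():
    """Season-major, band-minor order, matching Table 1."""
    band_layers = [f"{band}_{season}" for season in SEASONS for band in BANDS]
    return band_layers + INDICES


names = predictor_names()
print(len(names))  # 36
```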

Field data (the response variable)

The response variable represents the various cover classes identified in the study area, based on data from the Forester Subcompartment database. To keep the spectral signatures clear, I am only using compartments that are completely covered by a single component.
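A minimal sketch of this filter, assuming each row of the subcompartment data describes one component with its cover class and percentage of the subcompartment area. The field names subcompartment_id, cover_class and percent_cover are hypothetical placeholders, not the actual Forester attribute names.

```python
from collections import defaultdict


def single_component_subcompartments(components):
    """Map subcompartment id -> cover class, keeping only pure subcompartments.

    components: iterable of dicts with the hypothetical keys
    'subcompartment_id', 'cover_class' and 'percent_cover' (one dict per
    component). A subcompartment is kept only if it has exactly one component
    and that component covers 100% of its area.
    """
    by_subcompartment = defaultdict(list)
    for comp in components:
        by_subcompartment[comp["subcompartment_id"]].append(comp)
    return {
        sc_id: comps[0]["cover_class"]
        for sc_id, comps in by_subcompartment.items()
        if len(comps) == 1 and comps[0]["percent_cover"] == 100
    }


records = [
    {"subcompartment_id": "12a", "cover_class": "Oak", "percent_cover": 100},
    {"subcompartment_id": "12b", "cover_class": "Beech", "percent_cover": 60},
    {"subcompartment_id": "12b", "cover_class": "Birch", "percent_cover": 40},
    {"subcompartment_id": "13c", "cover_class": "Larch", "percent_cover": 90},
]
print(single_component_subcompartments(records))  # {'12a': 'Oak'}
```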

Random Forest Classification

Forward Feature Selection (FFS)

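Highly autocorrelated variables can lead to overfitting, so removing them should reduce the problem. The ffs (forward feature selection) function in the CAST package selects predictor variables using a user-defined cross-validation: it first trains models on every pair of predictors and keeps the best pair, then tests each remaining variable for an improvement over the current best model, adding variables one at a time until none improves performance further.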
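The selection logic described above can be sketched in a few lines of Python. This is a simplified greedy version for illustration, not CAST's implementation: score is a hypothetical callable that would return the cross-validated Kappa of a model trained on a given subset, and CAST's ffs() has further options that this sketch ignores.

```python
from itertools import combinations


def forward_feature_selection(features, score):
    """Greedy forward selection in the spirit of CAST::ffs (simplified).

    features: list of predictor names.
    score: hypothetical callable(subset) -> cross-validated performance,
           e.g. Kappa (higher is better); each call would train a model.
    """
    # Step 1: evaluate every pair of predictors and keep the best pair.
    best_subset = list(max(combinations(features, 2), key=lambda p: score(list(p))))
    best_score = score(best_subset)
    remaining = [f for f in features if f not in best_subset]
    # Step 2: repeatedly add the remaining variable that improves the score most.
    while remaining:
        candidate = max(remaining, key=lambda f: score(best_subset + [f]))
        candidate_score = score(best_subset + [candidate])
        if candidate_score <= best_score:
            break  # no remaining variable improves the current best model
        best_subset.append(candidate)
        best_score = candidate_score
        remaining.remove(candidate)
    return best_subset, best_score


# Toy scoring function (hypothetical): each variable adds a fixed amount.
weights = {"nir_spring": 0.30, "red_winter": 0.25, "NDVI": 0.10, "blue_autumn": -0.05}
selected, best_kappa = forward_feature_selection(
    list(weights), lambda s: sum(weights[f] for f in s))
print(selected)  # ['nir_spring', 'red_winter', 'NDVI']
```

Because every score call stands for a full model fit, the number of calls grows quickly with the number of candidate predictors, which is why forward selection over 36 variables is computationally expensive.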

As more variables are added to the model, Kappa increases and the results become more consistent. Kappa stabilises above 0.7 with sixteen variables.

Tune Spatial cross-validation

I evaluate the classification with spatial cross-validation using the blockCV R package. Conventional random k-fold cross-validation tends to overestimate model performance when samples are spatially autocorrelated, because validation pixels are often close to, and therefore similar to, training pixels. blockCV divides the study area into spatial blocks and assigns whole blocks to folds, which reduces the spatial dependence between the training and validation sets and gives a more realistic estimate of how well the model predicts new locations.

The figure below shows the spatial autocorrelation of the predictors. The plots display the range of spatial autocorrelation estimated for each raster covariate, together with the spatial block size derived from the median of these ranges. Based on this, the area is divided into blocks 1200 metres wide. In each round of the 10-fold cross-validation, all samples from the blocks assigned to one fold are held out for validation.
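The core idea of block-based folds can be sketched as follows: points are snapped to square blocks of the chosen width, and whole blocks, rather than individual points, are assigned to folds. This is a simplified illustration, not blockCV's algorithm (blockCV offers more options than shown here); coordinates are assumed to be in a projected CRS in metres.

```python
import random


def spatial_block_folds(points, block_size, k, seed=42):
    """Assign each point to one of k folds via square spatial blocks.

    points: list of (x, y) in a projected CRS (metres).
    block_size: block width in metres (1200 m was suggested here).
    All points falling in the same block share a fold, so a validation fold
    never contains pixels from a block that was also used for training.
    """
    block_of = [(int(x // block_size), int(y // block_size)) for x, y in points]
    blocks = sorted(set(block_of))
    random.Random(seed).shuffle(blocks)  # random block-to-fold assignment
    fold_of_block = {b: i % k for i, b in enumerate(blocks)}
    return [fold_of_block[b] for b in block_of]


def fold_indices(fold_ids, k):
    """Yield (train_idx, test_idx) for each fold: one fold held out per round."""
    for fold in range(k):
        test = [i for i, f in enumerate(fold_ids) if f == fold]
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        yield train, test


pts = [(10, 10), (500, 900), (1300, 10), (2500, 2500)]
folds = spatial_block_folds(pts, block_size=1200, k=3)
print(folds[0] == folds[1])  # True: both points lie in the same 1200 m block
```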

Running Random Forest

I split the data into a training set containing 80% of the samples and a validation set containing the remaining 20%. The model uses 14 predictors and one response variable with 11 cover classes. The R code configures 10-fold spatial cross-validation with trainControl (method = "cv" specifies cross-validation). The random forest model (method = "rf") is then trained with the train function using this cross-validation setup (trControl = ctrl_sp_spatial), a tuning grid (tuneGrid) and 300 trees (ntree = 300), optimizing for the Kappa metric. The classes could be distinguished with a high cross-validated Kappa (0.79 for the selected model), and the optimal mtry value is 2.

Random Forest

18198 samples
14 predictor
11 classes: ‘Beech’, ‘Birch’, ‘Corsican pine’, ‘Douglas fir’, ‘Larch’, ‘Norway spruce’, ‘Oak’, ‘Other Broadleaves’, ‘Other Conifers’, ‘Scots pine’, ‘Sweet chestnut’

No pre-processing
Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 16463, 16603, 16695, 16318, 16341, 16708, …
Resampling results across tuning parameters:

mtry	Accuracy	Kappa
2	0.8259865	0.7911304
8	0.8241937	0.7895839
14	0.8155421	0.7794587

Kappa was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
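The mtry selection in the output above amounts to averaging Kappa over the held-out folds for each candidate value and keeping the largest. The Python sketch below shows only that selection step; evaluate is a hypothetical stand-in for fitting and scoring the Random Forest, and the toy evaluator simply returns the Kappa values reported in the output.

```python
from statistics import mean


def select_mtry(mtry_grid, folds, evaluate):
    """Return the mtry with the largest mean Kappa over the CV folds.

    folds: list of (train_idx, test_idx) pairs, e.g. from spatial blocks.
    evaluate: hypothetical callable(mtry, train_idx, test_idx) -> Kappa on the
              held-out fold; in the real workflow this would fit a 300-tree
              Random Forest on train_idx and score it on test_idx.
    """
    results = {m: mean(evaluate(m, tr, te) for tr, te in folds) for m in mtry_grid}
    best = max(results, key=results.get)  # "largest value" rule
    return best, results


# Toy evaluator returning the reported Kappa values, for illustration only
reported = {2: 0.7911304, 8: 0.7895839, 14: 0.7794587}
best, results = select_mtry([2, 8, 14], [([0], [1]), ([1], [0])],
                            lambda m, tr, te: reported[m])
print(best)  # 2
```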

The figure below shows how much each variable contributes to distinguishing the different species in the forest.

For instance, in the Lodgepole pine panel, nir_spring and red_winter show high importance, which suggests that reflectance in these bands during those seasons is especially useful for distinguishing Lodgepole pine from the other classes.
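Class-specific importance of this kind is commonly computed by permutation: shuffle one predictor, re-predict, and measure how much the accuracy for each class drops. The sketch below assumes the figure is based on such a permutation measure. Note that randomForest computes its permutation importance on out-of-bag samples, whereas this simplified version uses any labelled set and a generic predict function; the toy classifier is hypothetical.

```python
import random


def per_class_permutation_importance(predict, X, y, feature, seed=0):
    """Per-class drop in accuracy when one predictor column is shuffled.

    predict: callable mapping a list of feature rows to predicted labels.
    X: list of feature rows (lists of numbers); y: reference labels.
    feature: index of the column to permute.
    Returns {class: accuracy_before - accuracy_after}; larger = more important.
    """
    def class_accuracy(preds):
        acc = {}
        for cls in set(y):
            idx = [i for i, label in enumerate(y) if label == cls]
            acc[cls] = sum(preds[i] == cls for i in idx) / len(idx)
        return acc

    baseline = class_accuracy(predict(X))
    column = [row[feature] for row in X]
    random.Random(seed).shuffle(column)  # break the link between predictor and labels
    X_perm = [row[:feature] + [value] + row[feature + 1:] for row, value in zip(X, column)]
    permuted = class_accuracy(predict(X_perm))
    return {cls: baseline[cls] - permuted[cls] for cls in baseline}


# Toy classifier (hypothetical) that only looks at column 0
def predict(rows):
    return ["Oak" if r[0] > 0.5 else "Larch" for r in rows]


X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.3], [0.2, 0.4]]
y = ["Oak", "Oak", "Larch", "Larch"]
print(sorted(per_class_permutation_importance(predict, X, y, feature=1).items()))
# [('Larch', 0.0), ('Oak', 0.0)]: column 1 is never used by the classifier
```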

Table 2. Confusion matrix of the Random Forest classification (counts). UA = user's accuracy (%), PA = producer's accuracy (%); the bottom-right value is the overall accuracy (%).
	Beech	Birch	Corsican pine	Douglas fir	Larch	Norway spruce	Oak	Other Broadleaves	Other Conifers	Scots pine	Sweet chestnut	Sum	UA
Beech	1133	3	0	0	7	0	71	3	0	1	7	1225	92.5
Birch	4	276	0	0	13	1	102	1	0	0	1	398	69.3
Corsican pine	0	0	451	6	0	0	0	0	0	35	0	492	91.7
Douglas fir	0	0	1	2313	0	3	0	0	25	2	0	2344	98.7
Larch	0	0	0	0	3133	1	13	2	0	0	1	3150	99.5
Norway spruce	0	0	1	9	0	1818	0	0	10	11	0	1849	98.3
Oak	13	1	0	0	19	0	4849	1	0	0	24	4907	98.8
Other Broadleaves	2	2	0	0	2	1	38	247	0	0	16	308	80.2
Other Conifers	0	0	13	107	2	31	0	1	989	13	0	1156	85.6
Scots pine	0	0	17	1	9	4	0	0	2	819	0	852	96.1
Sweet chestnut	39	1	0	0	4	0	32	2	0	0	1439	1517	94.9
Sum	1191	283	483	2436	3189	1859	5105	257	1026	881	1488	18198	NA
PA	95.1	97.5	93.4	95.0	98.2	97.8	95.0	96.1	96.4	93.0	96.7	NA	96.0
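The summary rows and columns of the matrix follow directly from the counts. The Python sketch below computes user's accuracy, producer's accuracy, overall accuracy and Cohen's kappa; it assumes rows are classified classes and columns are reference classes, which is consistent with UA being calculated per row in this table. Applied to the matrix above, it reproduces the reported UA, PA and overall accuracy (for example, 1133 / 1225 = 92.5% UA for Beech and 17467 / 18198 = 96.0% overall).

```python
def accuracy_report(matrix):
    """UA, PA, overall accuracy (all in %) and Cohen's kappa.

    matrix: square list of lists of counts. Assumption: rows are classified
    (map) classes and columns are reference classes.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diag = [matrix[i][i] for i in range(k)]
    ua = [100 * d / r for d, r in zip(diag, row_sums)]  # correct / total mapped
    pa = [100 * d / c for d, c in zip(diag, col_sums)]  # correct / total reference
    oa = 100 * sum(diag) / n
    po = sum(diag) / n                                   # observed agreement
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / n ** 2  # chance agreement
    kappa = (po - pe) / (1 - pe)
    return ua, pa, oa, kappa


ua, pa, oa, kappa = accuracy_report([[20, 5], [10, 15]])
print(ua, oa, round(kappa, 2))  # [80.0, 60.0] 70.0 0.4
```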

Supervised Classification. Forest of Dean

Document Author

2025-12-17