Abstract

Erratic weather patterns induced by climate change pose a significant challenge to rain-fed agricultural systems, and drought is now a prevailing concern for food security, particularly in arid and semi-arid areas. In Taita-Taveta County, global Principal Component Analysis shows meteorological factors to be the key drivers of Exposure to drought, but at the local level Geographically Weighted Principal Component Analysis identifies altitude and soil sand content as the key variables. The input data are supported by the literature but lack multivariate normality, possibly because of crude data extraction and processing techniques, particularly for time-series data. Applying convolution kernels would better capture trends in time-series data and would remove the need for multivariate normality.

Introduction

Climate change increasingly manifests as erratic weather patterns, with declining rainfall totals combined with increased frequency and intensity of droughts, floods and heatwaves. For smallholders in agricultural systems that rely heavily on natural systems these environmental changes can be extremely challenging, as agro-ecological zones shift and traditional crops and livestock may no longer do well under the prevailing conditions (Mpandeli et al, 2019). Implications for food security are already materialising: Porter et al (2014) estimate that agricultural productivity declined by between 1% and 5% globally in the 30 years prior to their study, with the greatest declines in drought-prone arid and semi-arid lands (ASALs) where agriculture is predominantly rain-fed (Mugabe et al, 2019).

The significant effect of drought on agricultural productivity in ASALs gave rise to the ‘Vulnerability Assessment Model’ (VAM), a conceptual tool for understanding a farmer's or agricultural system's likely level of susceptibility to a drought event and its ramifications (Zarafshani et al, 2016). A VAM has three core components: Exposure, which has an environmental focus and examines the nature, extent, duration and frequency of drought conditions for a farm or geographic area; Sensitivity, which explores the resilience of planted crops and their capacity to cope with and respond to drought conditions, and is related to the cropping pattern and crop condition; and Adaptive capacity, which assesses the resources available to a farm or agricultural system and the ability to deploy them to cope with a drought episode. The general acceptance of VAMs as a conceptual tool was reinforced by the IPCC ‘Fourth Assessment Report’, which promoted the principles (Alonso et al, 2019).

This paper focuses on developing the methodology for assessing the first component of VAM - Exposure - in order to complement existing household survey tools that are better able to capture measures of Sensitivity and Adaptive capacity. Droughts come in three forms, meteorological, hydrological and agricultural, each of which has its own classification criteria but which together form a comprehensive assessment of Exposure (Hoque et al, 2021). Meteorological droughts are driven by dry weather and the imbalance between low precipitation and high evapotranspiration. Areas prone to meteorological drought are characterised by low rainfall and humidity but high temperatures. Hydrological droughts are distinguished by a scarcity of surface and subsurface water supply (i.e. in stream-flows, lakes, reservoirs, waterpans and groundwater), generally considered to be a consequence of topographical factors such as elevation, slope and surface waterbody/stream density, which all contribute to surface runoff rates and, if unfavorable, allow any precipitation to flow away rapidly. Agricultural droughts occur when water is insufficient for crop growth and production; vulnerability is therefore dependent on water demand, driven by vegetation cover, and on water availability, provided by soil moisture in rain-fed agricultural systems (Hoque et al, 2021).

Current VAM research has generated a large number of variables for measuring Exposure to drought, but the literature has not yet converged on a single set of variables or data collection method. Instead, data availability and the researchers' own methodological bias appear to dominate variable choices. To minimise demands on resources and ensure consistency in variable availability across space, this paper utilises and builds on the wide range of remotely sensed variables in the VAM literature on Exposure, with particular inspiration from Hoque et al (2021). The study site is Taita-Taveta County in Kenya, a predominantly rain-fed agricultural community that has in recent years experienced increased frequency and intensity of droughts (ICPAC, 2019). The study area is part of a wider research program (ESSA …..) utilising household surveys to collect data from 332 smallholder farmers. The coordinates of these households have been re-used in this paper for data collection purposes. Using Principal Component Analysis (PCA) and then Geographically Weighted Principal Component Analysis (GWPCA), the primary drivers of drought Exposure are revealed. Whilst PCA has been used before to identify Principal Components (PCs) of Exposure across a whole study site (e.g. Alonso et al, 2019), a literature search has not found examples of GWPCA being used to recover household-level PCs of Exposure and the contributing variables. It is hoped that, by collecting and then ordering the principal drivers of exposure to drought at the household level, this paper can contribute to more focused and spatially tailored VAMs and subsequent policy interventions.

Materials and Methods

The Study Area

The study area covers roughly 876 \(km^2\) of Taita-Taveta County in SE Kenya and is demarcated to the South, West and North by the Tsavo National Parks (see Figure 1). The lowest parts of the study area are at an average elevation of 700 meters above sea level (m.a.s.l.) whilst the highest peaks, known as the Taita Hills, are at 1600-2200 m.a.s.l. The area has two rainy seasons annually: the long rains occur between March and May whilst the short rains fall between November and December. Average rainfall in the highlands is 1100-1400 mm, with mist and cloud precipitation occurring year-round, while in the lowlands it is 400-600 mm (Pellikka et al, 2018). The orographic rainfall pattern means the southeastern slopes of the highland area receive more precipitation than the northwestern slopes. The hottest and driest months in Taita-Taveta County are January and February, whilst June to October are cooler. The Taita Hills are the northernmost part of the Precambrian Eastern-Arc mountain range. The hilltops were typically covered with moist evergreen montane forest used for firewood collection and charcoal manufacturing, but this has reduced by about 50% since 1955 and has been largely replaced by plantation forests, mostly pines and eucalyptuses. Outside the forests the hills are extensively cultivated by smallholders growing maize, beans, tomatoes, peas, cassava, mango, cabbage, bananas and potatoes (Pellikka et al, 2018). By contrast, the foothills and lowlands are characterised by Acacia-Commiphora shrublands, dry croplands, cattle grazing, wildlife conservation and sisal farming.

Figure 1: Study Area and smallholder locations.

Data

Various spatial data have been used in this study to prepare a comprehensive map of drought exposure. The required meteorological drought, surface runoff, soil moisture and soil sand component data were acquired from peer-reviewed data sets constructed using remote sensing and were extracted for each smallholder grid reference via Google Earth Engine. The remaining hydrological and agricultural drought data were extracted using the geographical information system QGIS. These data sets, with their sources and other characteristics, are detailed in Table 1.

Table 1: Exposure variables, unit of measure, data source and reference source justifying each variable's inclusion.

Meteorological Drought Data

Dry weather patterns create favorable conditions for meteorological drought, which is primarily influenced by the combination of rainfall, temperature and humidity that dictates the amount of evapotranspiration, a directly linked driver of meteorological drought (Hoque et al, 2021). All the meteorological raw data in this report (see Table 1) are time-series monthly averages from March 2012 to March 2020. This equates to 96 observations for each variable at each smallholder location. To reduce the 96 observations to a single mean annual value for each smallholder, a moving average with a window (or frame) of 12 months has been used to provide 96 “annual” averages which have, in turn, been averaged to provide a single observation for the whole eight-year period. As can be seen in Figure 2, which uses average maximum temperature data as an example, this process has a smoothing effect and reveals the underlying trend in the data more clearly.

Figure 2: Monthly average temperatures (red), moving averages using a 12 month window (blue) and two trend lines, green is linear whilst orange is logarithmic.
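The reduction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the study: the function names and the monthly series are hypothetical, and the sketch keeps only complete 12-month windows (so 96 months yield 85 averages), which is an assumption because the treatment of incomplete windows at the end of the series is not specified here.

```python
from statistics import fmean

def moving_average(values, window=12):
    """Return the mean of every complete run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [fmean(values[i:i + window]) for i in range(len(values) - window + 1)]

def long_run_mean(values, window=12):
    """Average the moving averages into one value for the whole period."""
    return fmean(moving_average(values, window))

# Hypothetical monthly series: a repeating seasonal cycle plus a slow upward drift
seasonal = [30, 31, 30, 28, 27, 26, 25, 26, 27, 28, 29, 30]
monthly = [seasonal[m % 12] + 0.01 * m for m in range(96)]

smoothed = moving_average(monthly)   # seasonal cycle removed, drift remains
summary = long_run_mean(monthly)     # single value per location and variable
```

Because every complete window contains each calendar month exactly once, the seasonal cycle cancels and only the drift is left in `smoothed`, which is the smoothing effect visible in Figure 2.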


Hydrological Drought Data

Topography is a key determinant of a landscape's ability to retain water, so a smallholder's situation within that landscape is an important determinant of hydrological drought exposure (Hoque et al, 2021). Elevation and slope gradient at the smallholder location, as well as surface runoff, distance to nearest river and distance to nearest waterpan, have been used to classify a smallholder's exposure to hydrological drought. In general, areas of high elevation and steep slope tend to have higher exposure, whilst proximity to water bodies (i.e. rivers and waterpans, including reservoirs) reflects a smallholder's ability to access water in the absence of rain.

Agricultural Drought Data

For smallholder farmers the components of agricultural drought, soil moisture and sand content, are of particular importance. Areas with low soil moisture content are highly vulnerable to drought, and a key determinant of a soil's capacity to hold moisture is its texture, which is governed in large part by the proportion that is made up of sand (Hoque et al, 2021).

Models

Principal Component Analysis

PCA is a dimensionality reduction technique that extracts the latent variables explaining the maximum amount of information present in the original data. Given a data matrix \(\textbf{X}\) with \(n\) rows representing the observations and \(m\) columns representing the variables, the variance-covariance matrix \(\boldsymbol{\Sigma}\) is \(m \times m\) with the variances on the leading diagonal and the covariances off the diagonal. The trace of \(\boldsymbol{\Sigma}\) is the total variance in the data. When \(\textbf{X}\) is standardised the trace of \(\boldsymbol{\Sigma}\) is equal to the number of columns. After an eigendecomposition, a standard result in linear algebra states that:

\[\textbf{LVL}^T = \boldsymbol{\Sigma}\]

where \(\textbf{V}\) is a diagonal matrix of eigenvalues and \(\textbf{L}\) is a matrix of eigenvectors.

Component scores are found by post-multiplying the original data values \(\textbf{X}\) by \(\textbf{L}\); the correlation matrix for \(\textbf{XL}\) is an identity matrix. Component scores are thus linear combinations of the original data values, with the first principal component being the combination associated with the largest eigenvalue. Given the values of the scores and the loadings, the original data values can be recovered by an inverse transformation.
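The steps above can be illustrated with a short NumPy sketch. It is not the code used for the study: the function name `pca` and the random data are hypothetical, and the sketch assumes the columns are standardised with the sample standard deviation before decomposition.

```python
import numpy as np

def pca(X):
    """PCA of the standardised columns of X via eigendecomposition."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # centre and scale each column
    sigma = np.cov(Z, rowvar=False)                   # m x m variance-covariance matrix
    eigenvalues, L = np.linalg.eigh(sigma)            # eigh returns ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]             # reorder so PC1 comes first
    eigenvalues, L = eigenvalues[order], L[:, order]
    scores = Z @ L                                    # component scores XL
    return Z, sigma, eigenvalues, L, scores

rng = np.random.default_rng(0)
# Hypothetical data: 50 observations of 4 variables, two of them correlated
X = rng.normal(size=(50, 4))
X[:, 1] += 0.8 * X[:, 0]

Z, sigma, eigenvalues, L, scores = pca(X)
recovered = scores @ L.T   # inverse transformation back to the standardised data
```

With standardised inputs the eigenvalues sum to the number of columns (here 4), `L @ diag(eigenvalues) @ L.T` reproduces `sigma`, and `recovered` equals `Z` up to floating-point error.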

Geographically Weighted PCA

As with PCA, an assumption of multivariate normality is required for GWPCA. For PCA this assumption must hold at the global level, whilst for GWPCA it must hold locally in geographic space. Thus, for GWPCA a vector of observed variables \(\textbf{x}_i\) at spatial location \(i\) is assumed to have a multivariate normal distribution with mean vector \(\boldsymbol{\mu}\) and variance/covariance matrix \(\boldsymbol{\Sigma}\), such that \(\textbf{x}_i \sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})\). As the data used in this report do not exhibit multivariate normality, and neither are they of the same scale, the data have been centered (subtracting the mean) and scaled (dividing by the standard deviation). In an attempt to bring the data closer to a normal distribution a log transformation was applied, but to no useful effect.

If spatial location \(i\) has coordinates \((u,v)\), then PCA with local geographical effects involves regarding \(\textbf{x}_i\) as conditional on \(u\) and \(v\), which also means that \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) become functions of \(u\) and \(v\). Consequently \(\textbf{x}_i|(u,v) \sim N(\boldsymbol{\mu}(u,v), \boldsymbol{\Sigma}(u,v))\). The components \(\boldsymbol{\mu}(u,v)\) and \(\boldsymbol{\Sigma}(u,v)\) are therefore the geographically weighted (GW) mean vector and the GW variance/covariance matrix, respectively.

To obtain GW principal components, the decomposition of the GW variance/covariance matrix provides the GW eigenvalues and GW eigenvectors. The product of the \(i\)th row of the data matrix with the GW eigenvectors for the \(i\)th location provides the \(i\)th row of GW component scores. The GW variance/covariance matrix is:

\[\boldsymbol{\Sigma}(u,v) = \textbf{X}^T\textbf{W}(u,v)\textbf{X}\]

where \(\textbf{W}(u,v)\) is a diagonal matrix of geographic weights that has been generated using a bi-squared kernel function, such that:

\[w_{ij} = \left(1-\left(\frac{d_{ij}}{r}\right)^2\right)^2\]

if \(d_{ij} \leq r\); otherwise, \(w_{ij} = 0\), where the bandwidth is the geographical distance \(r\) and \(d_{ij}\) is the distance between the spatial locations of the \(i\)th and \(j\)th rows in the data matrix \(\textbf{X}\).
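As a small illustration of the kernel, the Python function below evaluates the weight for a single distance. The function name and the example distances are hypothetical; distance and bandwidth are assumed to share the same units.

```python
def bisquare_weight(d, r):
    """Bi-square kernel weight for distance d and bandwidth r (same units)."""
    if r <= 0:
        raise ValueError("bandwidth r must be positive")
    if d < 0:
        raise ValueError("distance d must be non-negative")
    return (1 - (d / r) ** 2) ** 2 if d <= r else 0.0

# Weights at 0, 5, 10 and 12 km for a 10 km bandwidth
weights = [bisquare_weight(d, r=10.0) for d in (0.0, 5.0, 10.0, 12.0)]
```

The weight is 1 at the location itself, falls to 0.5625 at half the bandwidth, and is 0 at and beyond the bandwidth, so observations further than \(r\) away have no influence on the local decomposition.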

The GW principal components for the location \((u_i, v_i)\) can be written as:

\[\textbf{LVL}^T|(u_i,v_i) = \boldsymbol{\Sigma}(u_i,v_i)\]

where \(\boldsymbol{\Sigma}(u_i,v_i)\) is the GW variance/covariance matrix for location \((u_i,v_i)\).
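A local decomposition at one location might be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: the function names, coordinates and data are hypothetical, and the sketch assumes that the weights are normalised to sum to one and that the data are centred on the GW mean before forming \(\textbf{X}^T\textbf{W}\textbf{X}\).

```python
import numpy as np

def bisquare(d, r):
    """Bi-square weights for an array of distances d and bandwidth r."""
    return np.where(d <= r, (1 - (d / r) ** 2) ** 2, 0.0)

def gw_pca_at(X, coords, i, r):
    """Local eigenvalues and eigenvectors (loadings) at observation i."""
    d = np.linalg.norm(coords - coords[i], axis=1)  # distances d_ij from location i
    w = bisquare(d, r)
    w = w / w.sum()                                 # assumption: weights sum to 1
    mu = w @ X                                      # GW mean vector
    Xc = X - mu                                     # assumption: centre on the GW mean
    sigma = Xc.T @ (w[:, None] * Xc)                # X^T W X with W = diag(w)
    eigenvalues, L = np.linalg.eigh(sigma)
    order = np.argsort(eigenvalues)[::-1]           # largest eigenvalue first
    return eigenvalues[order], L[:, order], sigma

rng = np.random.default_rng(1)
coords = rng.uniform(0, 30, size=(40, 2))  # hypothetical easting/northing in km
X = rng.normal(size=(40, 3))               # hypothetical standardised variables

eigenvalues, L, sigma = gw_pca_at(X, coords, i=0, r=15.0)
local_share = eigenvalues / eigenvalues.sum()  # local proportion of variance per PC
```

Repeating the call for every `i` gives one set of loadings and variance shares per location, which is what allows the leading variable of each local component to be mapped.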

GWPCA: Kernel Bandwidth Selection

A major challenge in GWPCA is bandwidth selection. Suppose that \(q\) denotes the number of components retained, such that \(\textbf{XL}_q\) is a matrix of component scores for the retained components only. It is possible to show that the best (least squares) rank \(q\) approximation to \(\textbf{X}\) is \(\textbf{XL}_q\textbf{L}_q^T\) and that the residual matrix from this, \(\textbf{S}\), is given by \(\textbf{S} = \textbf{X}-\textbf{XL}_q\textbf{L}_q^T\). In effect, through the principal components, we find the minimum of the expression \(\sum_{ij}([\textbf{X}]_{ij}-[\textbf{S}]_{ij})^2\) with respect to \(\textbf{S}\), where \(\textbf{S}\) is a rank \(q\) matrix. The variance levels of the components of the matrix \(\textbf{S}\) therefore measure the ‘goodness of fit’ (GOF) of the projected subplanes and as such:

\[GOF_i = \sum_{j=q+1}^{m} s^2_{ij}\]

is the GOF for the \(i\)th observation and \(s_{ij}\) is the \(j\)th component score for observation \(i\). The total GOF for the entire data set is:

\[GOF = \sum_{i=1}^{n} GOF_{i}\]

For GWPCA, the local principal components for the \(i\)th location represent a similar projection, but with the corresponding loadings defined locally. That is, in this case we find \(\textbf{S}\) to minimise \(\sum_{ij} w_i([\textbf{X}]_{ij} - [\textbf{S}]_{ij})^2\), where \(w_i\) is a locally defined weight for location \(i\). The GOF statistic is defined in an analogous fashion to that for global PCA, with the exception that in each locality \(\textbf{S}\) is defined using local weights, as above. The GOF statistic provides the means of finding an optimal bandwidth for GWPCA by using either a leave-one-out method or a holdback sample when computing the terms of the statistic (Harris et al, 2011).
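For the global case, the GOF statistic can be computed directly from the scores on the discarded components. The NumPy sketch below is illustrative only: the function name `gof` and the random data are hypothetical, and the leave-one-out step used for bandwidth selection is not shown.

```python
import numpy as np

def gof(Z, L, q):
    """Per-observation GOF: sum of squared scores on components q+1..m."""
    scores = Z @ L
    return (scores[:, q:] ** 2).sum(axis=1)

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))                      # hypothetical data
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardised columns
eigenvalues, L = np.linalg.eigh(np.cov(Z, rowvar=False))
L = L[:, np.argsort(eigenvalues)[::-1]]           # columns ordered PC1..PCm

q = 2                                             # number of retained components
gof_i = gof(Z, L, q)                              # one value per observation
total_gof = gof_i.sum()                           # GOF for the whole data set

# The same quantity from the residual matrix of the rank-q approximation
S = Z - Z @ L[:, :q] @ L[:, :q].T
```

The row sums of squares of `S` equal `gof_i`, which confirms that the squared scores on the discarded components and the squared reconstruction residuals measure the same discrepancy.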

Results

Descriptive

Figure 3: Meteorological drought variable scores at the location of each smallholder farm

The plots in Figures 4, 6 and 8 show that the assumption of multivariate normality has not been upheld. This is not ideal but neither does it totally undermine the value of a principal component analysis. More on this is provided in the Discussion section.

Figure 4: Frequency distribution for each meteorological drought variable pre- and post-transformation

Figure 5: Hydrological drought variable scores at the location of each smallholder farm

Figure 6: Frequency distribution for each hydrological drought variable pre- and post-transformation

Figure 7: Agricultural drought variable scores at the location of each smallholder farm

Figure 8: Frequency distribution for each agricultural drought variable pre- and post-transformation

Principal Component Analysis

Decomposition of the data set as part of the global PCA reveals three eigenvalues that have a score greater than one and cumulatively they describe almost 80% of total variance. Three PCs have therefore been retained for further analysis and interpretation.

Figure 9: Global PCA eigenvalues

Plotting the three PCs and colouring each point by the name of the district it is located in reveals definite, although overlapping, clustering patterns.

Figure 10: Principal Components 1 and 2 coloured by location

PC1 describes 52.4% of total variance in the data set. Primarily composed of the variance from water and temperature variables, PC1 is akin to a composite indicator of exposure to meteorological drought variables plus soil moisture and run-off.

PC2, on the other hand, explains 15.6% of the total variance in the data set and is dominated by variables that describe a smallholder's location within the landscape, i.e. altitude, slope gradient, distance to nearest river and proportion of soil composed of sand.

PC3 describes only 8.4% of the variance in the data set and is primarily the variance from the variable ‘months with zero precipitation’.

Figure 11: Contributing Variables to PC1, PC2 and PC3
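The retention decision can be checked with a short calculation from the variance shares reported above. The Python sketch below assumes the inputs were standardised, so that the total variance equals the number of variables (13, as stated in the Discussion); under that assumption each eigenvalue is simply the variance share multiplied by 13. The variable names are illustrative only.

```python
# Variance shares reported for the three retained components
shares = {"PC1": 0.524, "PC2": 0.156, "PC3": 0.084}
n_variables = 13  # number of Exposure variables stated in the Discussion

# Assumption: standardised inputs, so total variance equals n_variables
eigenvalues = {pc: share * n_variables for pc, share in shares.items()}
cumulative_share = sum(shares.values())

# Components passing the "eigenvalue greater than one" rule
retained = [pc for pc, value in eigenvalues.items() if value > 1.0]
```

Under this assumption the implied eigenvalues are roughly 6.81, 2.03 and 1.09, so all three components pass the eigenvalue-greater-than-one rule, and together they account for 76.4% of total variance.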

Geographically Weighted PCA

Adaptive Bandwidth

GWPCA is used to generate spatial insights about drought exposure, that is, to identify smallholders that have an unusual multi-way combination of drought variables relative to their immediate geographical neighbours. Applying the bandwidth selection criteria discussed in the methodology suggests an adaptive bandwidth of 123 data points, with data influence (distance-decaying weights) following a bi-squared kernel, and retention of three PCs, as guided by the global PCA.
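With an adaptive bandwidth the kernel radius differs by location: it is the distance to the \(k\)th nearest data point, so it is small where farms are dense and large where they are sparse. The Python sketch below illustrates the idea with hypothetical coordinates and function names; it assumes that a point counts as its own nearest neighbour.

```python
import math

def bisquare_weight(d, r):
    """Bi-square kernel weight for distance d and bandwidth r."""
    return (1 - (d / r) ** 2) ** 2 if d <= r else 0.0

def adaptive_weights(coords, i, k):
    """Weights for location i using the distance to its k-th nearest point as bandwidth.

    Assumption: the point itself is counted as its first nearest neighbour.
    """
    if not 1 <= k <= len(coords):
        raise ValueError("k must be between 1 and the number of points")
    distances = [math.dist(coords[i], c) for c in coords]
    r = sorted(distances)[k - 1]  # local bandwidth for this location
    if r == 0:
        return [1.0 if d == 0 else 0.0 for d in distances], r
    return [bisquare_weight(d, r) for d in distances], r

# Hypothetical coordinates (km): a tight cluster and two distant points
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (20, 0)]
w_cluster, r_cluster = adaptive_weights(coords, i=0, k=4)
w_remote, r_remote = adaptive_weights(coords, i=4, k=4)
```

For the clustered point the bandwidth is about 14.1 km, whereas for the isolated point it stretches to 20 km to reach the same number of neighbours.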

The initial two PCs at each smallholder location explain greater than 90% of the total variance in the data set localised to that geographic neighbourhood.

The primary contributing variables for PC1 and PC2 are displayed in Figure 12. There is almost no spatial variance in either primary variable. Instead, altitude and sand concentration explain the greatest amount of variance in PC1 and PC2 for each localised version of the data set. This seems reasonable given the coarse granularity of the meteorological variables in particular, which will exhibit higher covariance the more local the geographic scale.

Figure 12: Proportion of variance explained by PC1 and the primary and secondary contributing variables

Discrepancy values in Figure 13 are localised GOF values, which measure the total discrepancy between the true data values and those reconstructed using the PCs. The discrepancy score is the sum of the individual discrepancies. A large individual discrepancy associated with an observation suggests it is very different from the observations near to it.

Figure 13: PC Discrepancies at each Location

Figure 14 suggests that the observations with the largest discrepancies to their neighbours occur in the north and south of the study site.

Figure 14: Location and Size of Outlier Discrepancies

Fixed Bandwidth

Performing GWPCA again, this time with a fixed kernel bandwidth of 11.74 km, returns similar results.

Again the initial two PCs at each smallholder location explain greater than 90% of the total variance in the data set once it has been localised to that geographic location.

Despite the change in kernel bandwidth there is still no spatial differentiation. Again, altitude and soil texture explain the greatest amount of variance in PC1 and PC2 respectively for the localised versions of the data set.

Figure 15: Proportion of variance explained by PC1 and the primary and secondary contributing variables

Outlier discrepancy values in Figure 16 are much larger than those in Figure 13, demonstrating the power of an adaptive kernel but also that, when looking for general patterns, both kernels are effective.

Figure 16: PC Discrepancies at each Location

Whilst the amount of discrepancy has increased, the geographic pattern of the observations with the greatest discrepancy to their neighbours remains similar, i.e. highest in the north and south of the study site.

Figure 17: Location and Size of Outlier Discrepancies

Fixed Kernel with Double Bi-squared Kernel

Altitude is not a particularly useful variable for policy makers. Whilst smallholders at significantly different altitudes will face significantly different conditions, there is not a great deal policy makers can do to address this. It seems logical, therefore, to remove altitude as a variable in the data set and re-run the GWPCA using a double bi-squared kernel with a fixed distance decay bandwidth of 24 km and an altitude distance decay of 200 meters.
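One way to picture a double kernel is as the product of a horizontal bi-squared kernel and an altitude bi-squared kernel. The Python sketch below takes that product form as an assumption, since the way the two kernels are combined is not spelled out here; the function names and farm positions are hypothetical.

```python
import math

def bisquare_weight(d, r):
    """Bi-square kernel weight for distance d and bandwidth r."""
    return (1 - (d / r) ** 2) ** 2 if d <= r else 0.0

def double_bisquare_weight(p, q, r_horizontal=24_000.0, r_altitude=200.0):
    """Weight between points p and q, each given as (easting_m, northing_m, altitude_m).

    Assumption: the two kernels are multiplied, so a neighbour must be close
    in both plan distance and altitude to receive any weight.
    """
    d_horizontal = math.dist(p[:2], q[:2])
    d_altitude = abs(p[2] - q[2])
    return bisquare_weight(d_horizontal, r_horizontal) * bisquare_weight(d_altitude, r_altitude)

# Hypothetical smallholder positions
farm = (0.0, 0.0, 1500.0)
same_height = (12_000.0, 0.0, 1500.0)  # 12 km away, same altitude
much_lower = (12_000.0, 0.0, 900.0)    # 12 km away, 600 m lower

w_same = double_bisquare_weight(farm, same_height)
w_lower = double_bisquare_weight(farm, much_lower)
```

Under this assumption a neighbour 12 km away at the same altitude receives a weight of 0.5625, while a neighbour at the same plan distance but 600 m lower receives no weight at all, so each local decomposition compares farms at similar altitudes only.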

Despite the radical change in kernel and bandwidth parameters there is still no spatial differentiation. Sand is still the primary contributing variable to PC1 in each localised version of the data set. Soil moisture contributes the second highest amount of variance to PC1.

Figure 18: Primary and secondary contributing variables to PC1

Adaptive Kernel without Altitude

For completeness, GWPCA is performed again, this time with a single kernel using an adaptive bandwidth of 330 data points and the altitude variable excluded from the data set.

## Adaptive bandwidth(number of nearest neighbours): 205 CV score: 61070162 
## Adaptive bandwidth(number of nearest neighbours): 128 CV score: 62294124 
## Adaptive bandwidth(number of nearest neighbours): 254 CV score: 59451341 
## Adaptive bandwidth(number of nearest neighbours): 283 CV score: 59139311 
## Adaptive bandwidth(number of nearest neighbours): 302 CV score: 59080816 
## Adaptive bandwidth(number of nearest neighbours): 313 CV score: 59064819 
## Adaptive bandwidth(number of nearest neighbours): 320 CV score: 59059102 
## Adaptive bandwidth(number of nearest neighbours): 324 CV score: 59058344 
## Adaptive bandwidth(number of nearest neighbours): 327 CV score: 59057109 
## Adaptive bandwidth(number of nearest neighbours): 328 CV score: 59056302 
## Adaptive bandwidth(number of nearest neighbours): 330 CV score: 59054321 
## Adaptive bandwidth(number of nearest neighbours): 330 CV score: 59054321

The initial PC at each smallholder location explains greater than 88% of the total variance in the data set localised to that geographic location.

The primary (or winning) variables for PC1 and PC2 are displayed in Figure 19. Again, there is no spatial variance. Instead, soil sand content and average precipitation explain the greatest amount of variance in the respective components in each localised version of the data set.

Figure 19: Proportion of variance explained by PC1 and PC2 and the primary contributing variables

Figure 20: Discrepancy Outliers

Figure 21: Locations of Greatest Discrepancy

Discussion

The global PCA model defined three PCs that, combined, capture almost 80% of the variance previously shared by 13 variables of drought Exposure. Results suggest that at the landscape scale 52.4% of the variance in exposure to drought is explained by PC1, which is composed of variables measuring meteorological drought as well as soil moisture and run-off. PC2 is dominated by topography and soil variables and explains a further 15.6%.

The GWPCA model zooms in to provide a more local interpretation and reveals altitude as the primary driver of variance in exposure to drought. For meteorological variables, localisation limits the amount of variance that those variables can exhibit because of spatial dependence, which should not be ignored when using PCA for geographically distributed data (Cartone & Postiglione, 2020). Hydrological and agricultural drought variables, on the other hand, are naturally more fine-grained and can exhibit much greater local variance. It seems logical, therefore, that non-meteorological factors can explain more of the local variance in drought exposure.

Hydrological and agricultural variables are also arguably more feasibly manipulated by policy makers, so it seems useful to identify which factors may be local drivers of drought exposure. When altitude is used as a weighting component within a GWPCA model, rather than as a variable within the data set, local exposure to drought is primarily driven by soil texture - the amount of sand in the soil - followed by soil moisture levels, which is arguably the other side of the same coin. Based on these results we can say that, ceteris paribus, smallholders at low altitude with soils of high sand content have the highest exposure to drought in Taita-Taveta. In this instance, smallholders in Werugha have the highest elevation and lowest soil sand content (so lowest exposure to drought), whilst smallholders in Nyolo have soils with the highest sand content and are located at medium to low elevations, suggesting relatively high exposure to drought.

Before drawing firm conclusions, the significant limitations of the data used in this report must be recognised. Both PCA and GWPCA are excellent tools for identifying uncorrelated components from a data set of correlated variables. However, components are only guaranteed to be independent as well as uncorrelated when the variables collectively exhibit multivariate normality. When this does not hold, components are still guaranteed to be uncorrelated but not necessarily independent, implying that information included in one PC might also be included in another PC. The effect of non-independence is that component loadings will not carry a unique effect in a given dimension (Kim & Kim, 2012), which can cause double counting of a variable's variance.
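The distinction between uncorrelated and independent can be seen in a toy example. In the Python sketch below, which uses hypothetical values rather than the study's component scores, the second series is completely determined by the first and yet their covariance is zero.

```python
from statistics import fmean

def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = fmean(a), fmean(b)
    return fmean([(x - mean_a) * (y - mean_b) for x, y in zip(a, b)])

# Hypothetical scores: the second is a deterministic function of the first
first = [-2.0, -1.0, 0.0, 1.0, 2.0]
second = [x ** 2 for x in first]

cov = covariance(first, second)  # zero, despite the exact dependence
```

Knowing `first` fixes `second` exactly, so the two are as dependent as they can be, but a method that only looks at covariance, as PCA does, treats them as unrelated.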

When performing GWPCA the choice of an appropriate spatial scale for analysis (bandwidth) is critical. When analysing different spatial levels, the effects of the modifiable areal unit problem (MAUP) on statistical analysis can be severe, sometimes generating puzzling results that lead to interpretation difficulties (Cartone & Postiglione, 2020). In this study, manipulating the size of the kernel both in latitude-longitude and in altitude had limited effect on the proportion of variance explained by the principal components. Similarly, varying the kernel attributes reinforced the finding that altitude and sand are the primary drivers of local variance in exposure to drought.

Future work

Motivation for this study stemmed from a desire to better understand the landscape of Taita-Taveta in relation to drought exposure, to use a data-led approach and to generate ideas for further research. This section first covers improvements that could be made to the present research and then explores options for future research.

The area in clearest need of further development is the quality of data collection and variable composition. There are a huge number of high quality, remotely sensed and publicly available data sets. One such data set is AgERA5 from the Copernicus Climate Change Service, which provides daily indicators specifically useful to agriculture; it was not used in this report but should be utilised in future research. Using a wider range of data sources would allow a wider range of variables to inform Exposure to drought in much more detail. For example, a number of literature sources incorporate Land Use and Land Cover (LULC), Plant Available Water Capacity (PAWC), Soil Depth and Wind Speed, among others.

Rather than using each variable in its raw form, or after a basic reductive technique, to form the input data set, it would be much better if each variable included a measure of quality or seasonality. For example, the hydrological variable ‘distance to river’ could be vastly improved if the seasonality of the river flow were incorporated in some way. Similarly, rainfall and temperature patterns at crucial times in the cropping or phenology calendar may benefit from having greater weight than those at a less significant time of year.

Further improvement to the data could be made in the method used to reduce raw data to a single average figure. Currently a moving average is used to reduce eight years of meteorological data to one number for each smallholder, but this does not take full advantage of the entire time series. Application of bootstrapping is likely to provide a more accurate mean than the moving average can provide whilst also wasting less of the data, i.e. observations that do not have 11 months of data ahead of them could still be used to create as many “annual” averages as observations that do.
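A bootstrap estimate of the long-run mean might look like the following Python sketch. It is a generic illustration, not a specification of the proposed method: the function name and the temperature series are hypothetical, and it uses a simple resampling of individual months, which ignores serial correlation and seasonality (a block bootstrap may be more appropriate for time series).

```python
import random
from statistics import fmean

def bootstrap_mean(values, n_resamples=2000, seed=0):
    """Bootstrap estimate of the mean with a 95% percentile interval."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample
    means = sorted(fmean(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return fmean(means), (lower, upper)

# Hypothetical monthly maximum temperatures (deg C) over eight years
seasonal = [30, 31, 30, 28, 27, 26, 25, 26, 27, 28, 29, 30]
monthly = [float(seasonal[m % 12]) for m in range(96)]

estimate, (lower, upper) = bootstrap_mean(monthly)
```

Every monthly observation can enter every resample, so no observations are lost at the end of the series, and the percentile interval gives a measure of uncertainty that a single averaged moving average does not.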

The lack of multivariate normality in the data set also needs to be addressed. Currently, the raw data are extracted from the raster pixel within which the GPS coordinates of the smallholder are located. When smallholders are close together or raster pixels are large, this results in the same value being attributed to multiple observations and less variation in the data set. This could be addressed by using a pixel-sized buffer to average the values around each observation, creating a unique value for that smallholder and in turn creating more variance and a more realistic data set. Increasing the spread of observation values should improve the chance of a multivariate normal data set, or at least make data transformations to multivariate normal easier.
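The buffering idea can be sketched as an area-weighted average over the cells that a pixel-sized square, centred on the farm, overlaps. The Python example below is illustrative only: the function name, the tiny raster and the farm positions are hypothetical, and coordinates are expressed in pixel units for simplicity.

```python
def buffered_value(raster, x, y, size=1.0):
    """Area-weighted mean of the cells overlapped by a square buffer centred on (x, y).

    Coordinates are in pixel units: cell (row, col) covers col <= x < col + 1
    and row <= y < row + 1. Buffer area outside the raster is ignored.
    """
    half = size / 2
    total = weight_sum = 0.0
    for row in range(len(raster)):
        overlap_y = max(0.0, min(y + half, row + 1) - max(y - half, row))
        for col in range(len(raster[0])):
            overlap_x = max(0.0, min(x + half, col + 1) - max(x - half, col))
            area = overlap_x * overlap_y
            total += area * raster[row][col]
            weight_sum += area
    if weight_sum == 0:
        raise ValueError("buffer does not overlap the raster")
    return total / weight_sum

# Hypothetical 2 x 2 raster of soil sand content (%)
raster = [[40.0, 50.0],
          [60.0, 70.0]]

# Two hypothetical farms that fall inside the same pixel (row 0, col 0)
farm_a = buffered_value(raster, 0.5, 0.5)    # at the pixel centre
farm_b = buffered_value(raster, 0.75, 0.75)  # nearer the shared corner
```

Plain pixel extraction would give both farms the value 40.0, whereas the buffered values are 40.0 and 47.5, so farms sharing a pixel no longer share a value.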

Everything proposed so far is geared towards reducing inputs and outputs to a static, averaged moment in time, which is a clear limitation in capturing real-world experiences and one that more advanced techniques such as convolution kernels can overcome. Whilst some data used to assess drought exposure are static (e.g. slope gradient, elevation, etc), a great deal are time series, and this applies to a range of other common variables used in agricultural research. Convolutional Neural Networks (CNNs) and random convolution kernels are at the cutting edge of techniques that simultaneously capture the shape, frequency and variance of time-series data with a single mechanism whose outputs can be used as inputs to other models (Dempster et al, 2020).
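To give a flavour of the random convolution kernel approach, the Python sketch below applies a set of random kernels to one time series and pools two features per kernel: the maximum output and the proportion of positive values. It is a simplified illustration in the spirit of ROCKET, not the published algorithm, which also uses random dilation and padding; the function names and the input series are hypothetical.

```python
import random

def random_kernel(rng, lengths=(7, 9, 11)):
    """Draw one kernel: mean-centred Gaussian weights and a uniform bias."""
    length = rng.choice(lengths)
    weights = [rng.gauss(0.0, 1.0) for _ in range(length)]
    mean = sum(weights) / length
    weights = [w - mean for w in weights]
    bias = rng.uniform(-1.0, 1.0)
    return weights, bias

def apply_kernel(series, weights, bias):
    """Slide the kernel over the series (no padding or dilation) and pool two features."""
    k = len(weights)
    if len(series) < k:
        raise ValueError("series is shorter than the kernel")
    outputs = [
        bias + sum(w * x for w, x in zip(weights, series[i:i + k]))
        for i in range(len(series) - k + 1)
    ]
    ppv = sum(1 for o in outputs if o > 0) / len(outputs)  # proportion of positive values
    return max(outputs), ppv

def transform(series, n_kernels=50, seed=0):
    """Return 2 * n_kernels features (max, ppv per kernel) for one series."""
    rng = random.Random(seed)
    features = []
    for _ in range(n_kernels):
        weights, bias = random_kernel(rng)
        features.extend(apply_kernel(series, weights, bias))
    return features

# Hypothetical eight-year monthly series
seasonal = [30, 31, 30, 28, 27, 26, 25, 26, 27, 28, 29, 30]
monthly = [seasonal[m % 12] + 0.01 * m for m in range(96)]

features = transform(monthly)
```

Each location's time series would be replaced by a fixed-length feature vector, which could then enter a downstream model in place of a single averaged value.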

Ultimately, the goal is for analysis and results to cover the entire landscape rather than each individual smallholder. Moving to a landscape sized/shaped raster (a spatially located matrix) approach to data capture and analysis, whereby raw data and results are stored and returned for each individual cell/pixel (appropriately sized at, say, 30 x 30 meters) of the raster, would provide an appropriate structure. Indeed, a composite indicator of drought Exposure, constructed, for example, via a Gaussian process to provide probability distributions, could be stored using such a raster structure to provide landscape-scale insight. Whilst a useful asset in itself, it could also be used as an input for a wide range of other analyses, including as one level of a multi-level analysis.

Conclusion

Increasingly erratic weather patterns induced by climate change pose a significant challenge to rain-fed agricultural systems, particularly those situated on arid and semi-arid lands. Consequently, the onset of drought has become a significant concern for food security in these areas, motivating a great deal of research to identify variables measuring exposure to drought. PCA has shown meteorological factors to be the rather obvious global drivers of Exposure to drought, but at the local level GWPCA identifies altitude and soil sand content as key variables.

There are, however, limitations to this study, the most important of which is that the input data lack multivariate normality. Collecting a great deal more data for each observation, coupled with more sophisticated data transformation techniques, could improve the data but this is not guaranteed. Furthermore, the way the time-series data have been processed is very reductive and a lot of information has been lost. Similarly, the seasonality of rivers and the phenology of flora have not been incorporated into the appropriate variables. There is, therefore, a great deal of scope for model improvement through the application of convolution kernels, which can better capture trends in time-series data.

Bibliography

Catarina Alonso, Celia M. Gouveia, Ana Russo, Patricia Pascoa (2019) Crops’ exposure, sensitivity and adaptive capacity to drought occurrence, Natural Hazards and Earth System Sciences, 19(2727-2743)

Alfredo Cartone, Paolo Postiglione (2020) Principal component analysis for geographical data: the role of spatial effects in the definition of composite indicators, Spatial Economic Analysis, 16(126-147)

Angus Dempster, Francois Petitjean, Geoffrey I. Webb (2020) ROCKET: exceptionally fast and accurate time series classification using random convolutional kernels, Data Mining and Knowledge Discovery, 34(1454-1495)

Urska Demsar, Paul Harris, Chris Brunsdon, A. Stewart Fotheringham, Sean McLoone (2013) Principal Component Analysis on Spatial Data: An Overview, Annals of the Association of American Geographers, 103:1, 106-128

Paul Harris, Chris Brunsdon, Martin Charlton (2011) Geographically weighted principal components analysis, International Journal of Geographical Information Science, 25:10, 1717-1736

Muhammad Hoque, Biswajeet Pradhan, Naser Ahmed, Abdullah Alamri (2021) Drought Vulnerability Assessment Using Geospatial Techniques in Southern Queensland, Australia, Sensors, 21(6896)

Donghoh Kim, Se-Kang Kim (2012) Comparing patterns of component loadings: Principal Component Analysis (PCA) versus Independent Component Analysis (ICA) in analyzing multivariate non-normal data, Behavior Research Methods, 44, 1239-1243

S. Mpandeli, L. Nhamo, M. Moeletsi, T. Masupha, J. Magidi, K. Tshikolomo, S. Liphadzi, D. Naidoo, T. Mabhaudhi (2019) Assessing climate change adaptive capacity at local scale using observed and remotely sensed data, Weather and Climate Extremes 26, 100240

Paschal Mugabe, Fiona Mwaniki, Kane Mamary, H.M. Ngibuini (2019) Chapter 14 - An assessment of drought monitoring and early warning systems in Tanzania, Kenya and Mali, Current Directions in Water Scarcity Research, 2(211-219)

Narumasa Tsutsumida, Paul Harris, Alexis Comber (2017) The Application of a Geographically Weighted Principal Component Analysis for Exploring Twenty-three Years of Goat Population Change across Mongolia, Annals of the American Association of Geographers, 107:5, 1060-1074

Kiumars Zarafshani, Lida Sharafi, Hossein Azadi, Steven van Passel (2016) Vulnerability Assessment Models to Drought: Toward a Conceptual Framework, Sustainability, 8(588)

IGAD Climate Prediction & Applications Centre (ICPAC) (2019) Policy Brief: Climate Trends over Taita-Taveta County