Date: April 11, 2017

A. SYNOPSIS

This report is the end result of a sequence of preprocessing and analysis steps applied to the Banana Land Tenure survey, an initiative of the AVANSE banana value chain department. The Data Management team designed a tailor-made data entry interface that mirrors the field form. We discussed with enumerators to resolve discrepancies in data collection, track down missing data and correct messy data. The data are stored on a database server in the cloud and backed up locally on a daily basis. We identified irrigation as the major constraint among banana farmers.

B. DATA ENTRY AND PROCESSING

The data for this report come from Amazon Web Services, which hosts a relational database (MySQL Server). The database is based on a data model designed in coordination with the banana value chain department. Only the Data Management team entered data, through a user-friendly data entry interface. Two validation processes have been set up:

  1. validation rule using regular expression at data entry level
  2. validation cross-checking by the Management Information System (MIS) Specialist
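
A validation rule of the first kind can be sketched in R with a regular expression. The pattern below is hypothetical: it assumes a parcel code made of four letters followed by digits, in line with the coding scheme described in the geoprocessing section. The actual rules used by the data entry interface are not shown in this report.

```r
# Hypothetical rule: four letters followed by one or more digits (e.g. "BANA12").
# The pattern, the function name and the sample codes are assumptions.
is_valid_code <- function(code) {
  grepl("^[A-Za-z]{4}[0-9]+$", code)
}

codes <- c("BANA12", "BAN12", "BANA-12", "bana003")
data.frame(code = codes, valid = is_valid_code(codes))
```

A rule of this kind rejects malformed entries at data entry time, before the cross-checking done by the MIS Specialist.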
We connected to the database with a set of connection parameters to read the datasets.


The database is divided into six main tables, with one-to-many relationships between the first table and the others:

  1. tbl_land: holds the Single Response Categorical Variables (SRCV), e.g. land tenure, soil type
  2. tbl_esp_domin: holds only the Multiple Response Categorical Variable (MRCV) "Especes dominantes par ordre d’importance" (dominant species by order of importance)
  3. tbl_contr_prio: holds only the MRCV "Contraintes prioritaires" (priority constraints)
  4. tbl_avez_fait: holds only the MRCV "Qu’avez-vous fait" (what did you do)
  5. tbl_face_contr: holds only the MRCV "Comment avez-vous fait face a cette contrainte" (how did you cope with this constraint)
  6. tbl_cult_princ: holds only the MRCV "Deux principales cultures de la zone" (two main crops of the area)

B. ANALYSIS

1. Single Response Categorical Variable (SRCV) analysis

1.1 Land tenure versus Priority constraint

In this exercise, we test whether two categorical variables, land tenure and priority constraint, are significantly associated. The contingency table below shows the number of cases in which each priority constraint is observed among farmers grouped by land tenure.


                        credit agricole  eau d’irrigation  main d’oeuvre  semences de qualite  somme1
fermier                               0                10              0                    0      10
gerant                                0                 8              0                    0       8
heritier en indivision                0                 9              0                    0       9
metayer                               0                16              0                    0      16
proprietaire                          2               193              4                    1     200
somme2                                2               236              4                    1     243
The table shows land tenure versus priority constraint. Irrigation water (eau d’irrigation) is the priority constraint for 193 of the 200 owner farmers (proprietaire) and for 236 of the 243 farmers overall. This is only an exploratory output: without a statistical test we cannot assert that the difference between owners and the other tenure groups is significant, but the gap in frequencies appears large. The same table, expressed as shares of the 243 responses, follows.


                        credit agricole  eau d’irrigation  main d’oeuvre  semences de qualite  somme1
fermier                           0.000             0.041          0.000                0.000   0.041
gerant                            0.000             0.033          0.000                0.000   0.033
heritier en indivision            0.000             0.037          0.000                0.000   0.037
metayer                           0.000             0.066          0.000                0.000   0.066
proprietaire                      0.008             0.794          0.016                0.004   0.823
somme2                            0.008             0.971          0.016                0.004   1.000
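
The count table and the share table can be rebuilt from the counts with base R. This is a minimal sketch, not necessarily the code used for the report; the object names counts, with_margins and shares are assumptions.

```r
# Counts copied from the land tenure versus priority constraint table.
counts <- matrix(
  c(0,  10, 0, 0,
    0,   8, 0, 0,
    0,   9, 0, 0,
    0,  16, 0, 0,
    2, 193, 4, 1),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("fermier", "gerant", "heritier en indivision", "metayer", "proprietaire"),
    c("credit agricole", "eau d'irrigation", "main d'oeuvre", "semences de qualite")
  )
)

# Add the row totals (somme1) and the column totals (somme2).
with_margins <- rbind(
  cbind(counts, somme1 = rowSums(counts)),
  somme2 = c(colSums(counts), sum(counts))
)

# Express every cell as a share of the grand total, rounded to 3 decimals.
shares <- round(with_margins / sum(counts), 3)

with_margins
shares
```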

The Chi-square test gives a p-value of 0.9998, which is greater than 0.05, so there is no evidence of an association between land tenure and priority constraint. Group testing analysis will not be performed in this section.
    chisq.test(tbl_land$statut_fonc, tbl_land$contr_prio)
## 
##  Pearson's Chi-squared test
## 
## data:  tbl_land$statut_fonc and tbl_land$contr_prio
## X-squared = 1.5496, df = 12, p-value = 0.9998
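
To make the statistic less of a black box, the sketch below recomputes it by hand from the counts of the contingency table: expected counts come from the row and column totals, and the statistic sums the squared deviations scaled by the expected counts. The object names are assumptions. Several expected counts are far below 5, a situation in which the chi-square approximation is usually considered unreliable.

```r
# Counts from the land tenure versus priority constraint table (no margins).
counts <- matrix(
  c(0,  10, 0, 0,
    0,   8, 0, 0,
    0,   9, 0, 0,
    0,  16, 0, 0,
    2, 193, 4, 1),
  nrow = 5, byrow = TRUE
)

# Expected count of a cell = row total * column total / grand total.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson statistic, degrees of freedom and upper-tail p-value.
x2 <- sum((counts - expected)^2 / expected)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(x2, df, lower.tail = FALSE)

c(X2 = x2, df = df, p = p_value)
min(expected)  # smallest expected count
```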


We can reshape the percentage table by treating land tenure as the id variable and credit agricole, eau d’irrigation, main d’oeuvre and semences de qualite as measure variables. The result is a long table with value as the outcome and variable and tenure as predictors.
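
A reshape of this kind can be sketched in base R. The object name tbMelt used later in the report suggests a melt-style function, but that is an assumption; the sketch below relies only on as.data.frame() applied to a table, and the names shares and long are hypothetical.

```r
# Shares copied from the percentage table (margins left out).
shares <- matrix(
  c(0.000, 0.041, 0.000, 0.000,
    0.000, 0.033, 0.000, 0.000,
    0.000, 0.037, 0.000, 0.000,
    0.000, 0.066, 0.000, 0.000,
    0.008, 0.794, 0.016, 0.004),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    tenure = c("fermier", "gerant", "heritier en indivision", "metayer", "proprietaire"),
    variable = c("credit agricole", "eau d'irrigation", "main d'oeuvre", "semences de qualite")
  )
)

# One row per (tenure, variable) pair, with the share in the "value" column.
long <- as.data.frame(as.table(shares), responseName = "value")
head(long)
```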


The exploratory analysis of the percentages suggests a large difference between the water access constraint (eau d’irrigation) and the other constraints. Group testing analysis will not be performed for this exercise.

1.2 Land tenure versus System of culture

In this exercise, we test whether two categorical variables, land tenure and system of culture, are significantly associated. The contingency table below shows the number of cases in which each system of culture is observed among farmers grouped by land tenure.

agroforesterie association monoculture somme1
fermier 1 1 8 0 10
gerant 0 0 5 3 8
heritier en indivision 0 0 9 0 9
metayer 1 0 15 0 16
proprietaire 58 9 111 21 199
somme2 60 10 148 24 242

The table shows land tenure versus system of culture. Owner farmers (proprietaire) account for 199 of the 242 responses and dominate every column. Though these numbers come from an exploratory output, they suggest a marked difference in system of culture between owners and the other tenure groups.

agroforesterie association monoculture somme1
fermier 0.004 0.004 0.033 0.000 0.041
gerant 0.000 0.000 0.021 0.012 0.033
heritier en indivision 0.000 0.000 0.037 0.000 0.037
metayer 0.004 0.000 0.062 0.000 0.066
proprietaire 0.240 0.037 0.459 0.087 0.822
somme2 0.248 0.041 0.612 0.099 1.000

The Chi-square test gives a p-value of 0.006947, which is less than 0.05 and indicates a significant association between land tenure and system of culture. Group testing analysis will not be performed in this section.

## 
##  Pearson's Chi-squared test
## 
## data:  tbl_land$statut_fonc and tbl_land$syst_culture
## X-squared = 27.32, df = 12, p-value = 0.006947

We can reshape the percentage table by treating land tenure as the id variable and agroforesterie, association and monoculture as measure variables. The result is a long table with value as the outcome and variable and tenure as predictors.


The exploratory analysis of the percentages suggests a large difference between the association system and the others. Group testing analysis will not be performed for this exercise.

1.3 Land tenure versus degree of involvement

In this exercise, we test whether two categorical variables, land tenure and degree of involvement, are significantly associated. The contingency table below shows the number of cases in which each degree of involvement is observed among farmers grouped by land tenure.

                          10%   100%     2%    25%    30%    50%    75%  somme1
fermier                     0      0      0      5      1      2      2      10
gerant                      1      0      0      6      0      1      0       8
heritier en indivision      0      0      1      7      0      0      1       9
metayer                     0      1      0      7      0      4      4      16
proprietaire                3      2      0    124      0     44     27     200
somme2                      4      3      1    149      1     51     34     243

The table shows land tenure versus degree of involvement. The 25% level of engagement is reported by 124 owner farmers (proprietaire) and by 149 of the 243 farmers overall. Though these numbers come from an exploratory output, they suggest a marked difference in the level of involvement between owners and the other tenure groups.

                          10%   100%     2%    25%    30%    50%    75%  somme1
fermier                 0.000  0.000  0.000  0.021  0.004  0.008  0.008   0.041
gerant                  0.004  0.000  0.000  0.025  0.000  0.004  0.000   0.033
heritier en indivision  0.000  0.000  0.004  0.029  0.000  0.000  0.004   0.037
metayer                 0.000  0.004  0.000  0.029  0.000  0.016  0.016   0.066
proprietaire            0.012  0.008  0.000  0.510  0.000  0.181  0.111   0.823
somme2                  0.016  0.012  0.004  0.613  0.004  0.210  0.140   1.000

The Chi-square test gives a p-value of 8.045e-06, which is less than 0.05 and indicates a significant association between land tenure and degree of involvement. Group testing analysis will not be performed in this section.

## 
##  Pearson's Chi-squared test
## 
## data:  tbl_land$statut_fonc and tbl_land$niv_eng
## X-squared = 66.221, df = 24, p-value = 8.045e-06

We can reshape the percentage table by treating land tenure as the id variable and the involvement levels (2%, 10%, 25%, 30%, 50%, 75%, 100%) as measure variables. The result is a long table with percent as the outcome and Level_involvement and tenure as predictors.


The exploratory analysis of the percentages suggests a large difference between the 25% level of involvement and the other levels. Group testing analysis will not be performed for this exercise.

1.4 Land tenure versus work with farmer

In this exercise, we test whether two categorical variables, land tenure and work with farmer, are significantly associated. The contingency table below shows the number of cases in which each answer to the work-with-farmer question is observed among farmers grouped by land tenure.

                          non    oui  somme1
fermier                     5      5      10
gerant                      7      1       8
heritier en indivision      5      4       9
metayer                     4     12      16
proprietaire              134     66     200
somme2                    155     88     243

The table shows land tenure versus work with farmer. Of the 243 farmers, 155 answered non (they do not work with other farmers) and 88 answered oui. This exploratory analysis suggests that more farmers are unwilling than willing to work with others. A statistical test is needed to verify this hypothesis.

                          non    oui  somme1
fermier                 0.021  0.021   0.041
gerant                  0.029  0.004   0.033
heritier en indivision  0.021  0.016   0.037
metayer                 0.016  0.049   0.066
proprietaire            0.551  0.272   0.823
somme2                  0.638  0.362   1.000

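The Chi-square output below was computed on tbl_land$niv_eng (degree of involvement), as in section 1.3, so it does not test the association between land tenure and work with farmer. The test should be rerun on the work-with-farmer variable before any conclusion is drawn.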

## 
##  Pearson's Chi-squared test
## 
## data:  tbl_land$statut_fonc and tbl_land$niv_eng
## X-squared = 66.221, df = 24, p-value = 8.045e-06
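
A test on the work-with-farmer answers can be run directly on the counts of the contingency table in this section. This is a sketch under that assumption; the object names work and res are hypothetical, and the result depends only on the counts printed above.

```r
# Counts copied from the land tenure versus work with farmer table.
work <- matrix(
  c(  5,  5,
      7,  1,
      5,  4,
      4, 12,
    134, 66),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    tenure = c("fermier", "gerant", "heritier en indivision", "metayer", "proprietaire"),
    work_with_farmer = c("non", "oui")
  )
)

# Some expected counts are below 5, so R warns that the approximation
# may be inaccurate; the warning is silenced here, not resolved.
res <- suppressWarnings(chisq.test(work))

res$statistic  # Pearson X-squared
res$parameter  # degrees of freedom: (5 - 1) * (2 - 1)
res$p.value
```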

We can reshape the frequency table by treating land tenure as the id variable and oui and non as measure variables. The result is a long table with frequency as the outcome and responses and tenure as predictors.


1.4.1 T-test

At this step we fit a simple linear regression model with frequency as the outcome and responses as the predictor.

Model 1

fit1 <- lm(frequency ~ factor(responses), data=tbMelt)
summary(fit1)$coef
##                      Estimate Std. Error    t value  Pr(>|t|)
## (Intercept)              31.0   20.16135  1.5375955 0.1627044
## factor(responses)oui    -13.4   28.51245 -0.4699701 0.6509212

First we include only the responses variable together with the intercept. In other words, the mean for oui is expressed as the mean for non plus an offset. The t-test for \(H_0: \beta_{response} = 0\) versus \(H_a: \beta_{response} \neq 0\) has a p-value of 0.65 > 0.05, so the estimate of -13.4 is not statistically significant. Only factor(responses)oui, the positive response, appears in the table because R has chosen non, the negative response, as the reference category. The number -13.4 is the estimated decrease in mean frequency when comparing oui responses to non responses.

Confidence interval

alpha <- 0.05
n <- nrow(tbMelt)
pe <- coef(summary(fit1))["factor(responses)oui", "Estimate"]
se <- coef(summary(fit1))["factor(responses)oui", "Std. Error"]
tstat <- qt(1 - alpha/2, n - 2)  # critical t value; n - 2 degrees of freedom (intercept and slope are estimated)
pe + c(-1, 1) * (se * tstat)
## [1] -79.14984  52.34984

If we chose model 1 as our best model, the 95% confidence interval for the -13.4 difference in frequency would run from -79.14984 to 52.34984. It contains 0, which is consistent with the non-significant t-test.

Model 2

fit2 <- lm(frequency ~ factor(responses) - 1, data=tbMelt)
summary(fit2)$coef
##                      Estimate Std. Error   t value  Pr(>|t|)
## factor(responses)non     31.0   20.16135 1.5375955 0.1627044
## factor(responses)oui     17.6   20.16135 0.8729575 0.4081234

Here we omit the intercept, so the model includes a coefficient for both the oui and the non responses. Each coefficient is the mean of its group: the expected value of the outcome is the mean for oui or for non. As the table shows, oui is about 17.6 and non is about 31.0, which the t-test below confirms.

t.test(frequency ~ factor(responses), data=tbMelt)$estimate
## mean in group non mean in group oui 
##              31.0              17.6
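
Models 1 and 2 are two parameterizations of the same fit, which can be checked by rebuilding the long table from the counts of the work-with-farmer table. The column layout given to tbMelt below is an assumption; only the counts come from the report.

```r
# Long table rebuilt from the work-with-farmer counts (layout assumed).
tenure <- c("fermier", "gerant", "heritier en indivision", "metayer", "proprietaire")
tbMelt <- data.frame(
  tenure    = rep(tenure, times = 2),
  responses = rep(c("non", "oui"), each = 5),
  frequency = c(5, 7, 5, 4, 134,   # non
                5, 1, 4, 12, 66)   # oui
)

fit1 <- lm(frequency ~ factor(responses), data = tbMelt)      # reference level + offset
fit2 <- lm(frequency ~ factor(responses) - 1, data = tbMelt)  # one mean per group

coef(fit1)
coef(fit2)

# Both models give the same fitted values: the group means.
all.equal(unname(fitted(fit1)), unname(fitted(fit2)))
```

With the intercept, the oui coefficient is the difference between the two group means; without it, each coefficient is a group mean.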


2. Multiple Response Categorical Variable (MRCV) analysis

Here we retrieve the MRCV data from the relational database through the one-to-many relationships, using Structured Query Language (SQL).
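
The queries themselves are not reproduced here. As an illustration only, the effect of a one-to-many join can be sketched in base R with merge() on two miniature tables. The key name parcel_code, the sample rows and the exact column names are assumptions.

```r
# Hypothetical miniature tbl_land: one row per parcel.
tbl_land <- data.frame(
  parcel_code = c("P1", "P2", "P3"),
  statut_fonc = c("proprietaire", "metayer", "fermier")
)

# Hypothetical miniature child table: several responses per parcel.
tbl_contr_prio <- data.frame(
  parcel_code = c("P1", "P1", "P2", "P3", "P3", "P3"),
  contr_prod  = c("eau d'irrigation", "credit agricole", "eau d'irrigation",
                  "eau d'irrigation", "formation", "maladies")
)

# Inner join on the key: each response row picks up the tenure of its parcel.
qry_contr_prod <- merge(tbl_land, tbl_contr_prio, by = "parcel_code")

table(qry_contr_prod$statut_fonc, qry_contr_prod$contr_prod)
```

Because a parcel can contribute several responses, the totals of MRCV tables count responses rather than farmers.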

2.1 Land tenure versus Farmers mutual aid

In this exercise, we test whether two categorical variables, land tenure and farmers' mutual aid, are significantly associated. The contingency table below shows the number of responses obtained for the question among farmers grouped by land tenure.


achat et ou location de pompe d’irrigation association de producteur association de producteurs aucun drainage forage konbit lutte integree na participation/ affiliation au ffs somme1
fermier 2 0 1 1 0 1 2 0 1 1 9
gerant 1 0 0 0 0 0 0 0 0 0 1
heritier en indivision 1 0 1 0 0 1 1 0 0 0 4
metayer 3 0 3 2 0 0 2 0 1 0 11
proprietaire 14 1 15 1 1 2 21 3 0 4 62
somme2 21 1 20 4 1 4 26 3 2 5 87
The table shows land tenure versus farmers' mutual aid. The konbit response has a frequency of 21 among owner farmers (proprietaire). The top responses are konbit, achat et ou location de pompe d’irrigation and association de producteurs.


achat et ou location de pompe d’irrigation association de producteur association de producteurs aucun drainage forage konbit lutte integree na participation/ affiliation au ffs somme1
fermier 0.023 0.000 0.011 0.011 0.000 0.011 0.023 0.000 0.011 0.011 0.103
gerant 0.011 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.011
heritier en indivision 0.011 0.000 0.011 0.000 0.000 0.011 0.011 0.000 0.000 0.000 0.046
metayer 0.034 0.000 0.034 0.023 0.000 0.000 0.023 0.000 0.011 0.000 0.126
proprietaire 0.161 0.011 0.172 0.011 0.011 0.023 0.241 0.034 0.000 0.046 0.713
somme2 0.241 0.011 0.230 0.046 0.011 0.046 0.299 0.034 0.023 0.057 1.000
The Chi-square test gives a p-value of 0.8632, which is greater than 0.05, so there is no evidence of an association between land tenure and farmers' mutual aid. Group testing analysis will not be performed in this section.
## 
##  Pearson's Chi-squared test
## 
## data:  qry_avez_fait$statut_fonc and qry_avez_fait$avez_fait
## X-squared = 26.925, df = 36, p-value = 0.8632


We can reshape the percentage table by treating land tenure as the id variable and achat et ou location de pompe d’irrigation, association de producteurs, drainage, forage, konbit, lutte integree and participation/affiliation au ffs as measure variables. The result is a long table with value as the outcome and variable and tenure as predictors.


The exploratory analysis of the percentages suggests a large difference between the leading responses (konbit, achat et ou location de pompe d’irrigation, association de producteurs) and the others. Group testing analysis will not be performed for this exercise.

2.2 Land tenure versus Constraints of production

In this exercise, we test whether two categorical variables, land tenure and constraints of production, are significantly associated. The contingency table below shows the number of responses obtained for the question among farmers grouped by land tenure.


acces au marche breche de la riviere credit agricole drainage eau d’irrigation formation inondation main d’oeuvre maladies materiel de labourage na rongeurs et insectes semences de qualite vol de recoltes somme1
fermier 2 0 7 2 10 2 2 2 6 1 0 3 2 3 42
gerant 1 0 7 1 8 3 0 0 2 0 0 1 2 3 28
heritier en indivision 1 0 6 0 9 3 3 0 3 0 0 2 2 2 31
metayer 6 1 13 6 16 4 0 1 11 0 0 10 4 8 80
proprietaire 45 2 146 44 202 107 5 34 49 4 1 42 65 47 793
somme2 55 3 179 53 245 119 10 37 71 5 1 58 75 63 974
The table shows land tenure versus constraints of production. The eau d’irrigation response has a frequency of 202 among owner farmers (proprietaire). The top responses are eau d’irrigation, credit agricole and formation.


acces au marche breche de la riviere credit agricole drainage eau d’irrigation formation inondation main d’oeuvre maladies materiel de labourage na rongeurs et insectes semences de qualite vol de recoltes somme1
fermier 0.002 0.000 0.007 0.002 0.010 0.002 0.002 0.002 0.006 0.001 0.000 0.003 0.002 0.003 0.043
gerant 0.001 0.000 0.007 0.001 0.008 0.003 0.000 0.000 0.002 0.000 0.000 0.001 0.002 0.003 0.029
heritier en indivision 0.001 0.000 0.006 0.000 0.009 0.003 0.003 0.000 0.003 0.000 0.000 0.002 0.002 0.002 0.032
metayer 0.006 0.001 0.013 0.006 0.016 0.004 0.000 0.001 0.011 0.000 0.000 0.010 0.004 0.008 0.082
proprietaire 0.046 0.002 0.150 0.045 0.207 0.110 0.005 0.035 0.050 0.004 0.001 0.043 0.067 0.048 0.814
somme2 0.056 0.003 0.184 0.054 0.252 0.122 0.010 0.038 0.073 0.005 0.001 0.060 0.077 0.065 1.000
The Chi-square test gives a p-value of 0.02559, which is less than 0.05 and indicates a significant association between land tenure and constraints of production. Group testing analysis between eau d’irrigation and credit agricole will be performed.
## 
##  Pearson's Chi-squared test
## 
## data:  qry_contr_prod$statut_fonc and qry_contr_prod$contr_prod
## X-squared = 73.682, df = 52, p-value = 0.02559


We can reshape the frequency table by treating land tenure as the id variable and eau d’irrigation, credit agricole, formation, etc. as measure variables. The result is a long table with value as the outcome and variable and tenure as predictors.


2.2.1 T-test

At this step we fit a simple linear regression model with frequency as the outcome and constraints of production as the predictor. We keep only the eau d’irrigation and credit agricole responses to perform the group testing analysis.

Model

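# %in% keeps every row for both constraints. The original filter used ==, which
# recycles the two-element vector and keeps only some rows (2 and 3 rows per
# group in the output printed below).
contpd <- subset(tbMelt, cons_prod %in% c("eau d'irrigation", "credit agricole"))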
fit1 <- lm(frequency ~ factor(cons_prod) - 1, data=contpd)
summary(fit1)$coef
##                                   Estimate Std. Error   t value  Pr(>|t|)
## factor(cons_prod)credit agricole  10.00000   64.19069 0.1557858 0.8860942
## factor(cons_prod)eau d'irrigation 73.66667   52.41148 1.4055446 0.2545209

Here we omit the intercept, so the model includes a coefficient for both eau d’irrigation and credit agricole. Each coefficient is the mean of its group. In the output, eau d’irrigation is about 73.67 and credit agricole is about 10.00, which the t-test below confirms. Neither coefficient is significantly different from 0, and the t-test p-value of 0.4257 shows no significant difference between the two means.

t.test(frequency ~ factor(cons_prod), data=contpd)
## 
##  Welch Two Sample t-test
## 
## data:  frequency by factor(cons_prod)
## t = -0.99112, df = 2.0087, p-value = 0.4257
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -338.9095  211.5762
## sample estimates:
##  mean in group credit agricole mean in group eau d'irrigation 
##                       10.00000                       73.66667
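
When a subset is defined by several categories, the choice between == and %in% matters: == compares element by element and recycles the shorter vector, while %in% tests membership. The toy vector below is an assumption made for illustration and is not survey data.

```r
# Toy vector: three constraints repeated for four tenure groups.
cons_prod <- rep(c("credit agricole", "eau d'irrigation", "formation"), times = 4)
wanted <- c("eau d'irrigation", "credit agricole")

# Element-wise comparison: `wanted` is recycled along cons_prod, so a row is
# kept only when its value happens to line up with the recycled position.
kept_eq <- sum(cons_prod == wanted)

# Membership test: every row whose value is one of the wanted categories.
kept_in <- sum(cons_prod %in% wanted)

c(equal = kept_eq, in_set = kept_in)
```

A filter based on == can therefore silently drop observations and change the group means and degrees of freedom of the tests that follow.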

2.3 Land tenure versus Most frequent species

In this exercise, we test whether two categorical variables, land tenure and most frequent species, are significantly associated. The contingency table below shows the number of responses obtained for the question among farmers grouped by land tenure.


arachide arbre a pain arbre veritable avocat banane bois de chene cacaoyer canne a sucre canne sucre gombo haricot haricot noir igname mais malanga manguier manioc palmiste papayer patate douce piment pois pois congo pois inconnu pois negre taro somme1
fermier 0 0 1 1 8 0 0 0 0 0 4 0 1 3 0 1 3 0 1 0 0 1 0 0 0 2 26
gerant 0 0 0 0 6 0 0 0 0 0 3 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 13
heritier en indivision 1 1 0 0 6 0 1 0 0 0 7 0 0 3 0 0 2 0 0 0 0 1 1 0 0 0 23
metayer 0 0 0 0 15 0 0 0 0 0 10 0 1 3 0 1 2 0 1 0 0 1 0 0 0 0 34
proprietaire 1 4 2 1 101 1 1 7 4 1 54 3 7 39 3 0 46 1 1 11 1 8 6 1 7 3 314
somme2 2 5 3 2 136 1 2 7 4 1 78 3 10 49 3 2 54 1 4 11 1 11 7 1 7 5 410
The table shows land tenure versus most frequent species. The banane response has a frequency of 101 among owner farmers (proprietaire). The top responses are banane, haricot and manioc.


arachide arbre a pain arbre veritable avocat banane bois de chene cacaoyer canne a sucre canne sucre gombo haricot haricot noir igname mais malanga manguier manioc palmiste papayer patate douce piment pois pois congo pois inconnu pois negre taro somme1
fermier 0.000 0.000 0.002 0.002 0.020 0.000 0.000 0.000 0.00 0.000 0.010 0.000 0.002 0.007 0.000 0.002 0.007 0.000 0.002 0.000 0.000 0.002 0.000 0.000 0.000 0.005 0.063
gerant 0.000 0.000 0.000 0.000 0.015 0.000 0.000 0.000 0.00 0.000 0.007 0.000 0.002 0.002 0.000 0.000 0.002 0.000 0.002 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.032
heritier en indivision 0.002 0.002 0.000 0.000 0.015 0.000 0.002 0.000 0.00 0.000 0.017 0.000 0.000 0.007 0.000 0.000 0.005 0.000 0.000 0.000 0.000 0.002 0.002 0.000 0.000 0.000 0.056
metayer 0.000 0.000 0.000 0.000 0.037 0.000 0.000 0.000 0.00 0.000 0.024 0.000 0.002 0.007 0.000 0.002 0.005 0.000 0.002 0.000 0.000 0.002 0.000 0.000 0.000 0.000 0.083
proprietaire 0.002 0.010 0.005 0.002 0.246 0.002 0.002 0.017 0.01 0.002 0.132 0.007 0.017 0.095 0.007 0.000 0.112 0.002 0.002 0.027 0.002 0.020 0.015 0.002 0.017 0.007 0.766
somme2 0.005 0.012 0.007 0.005 0.332 0.002 0.005 0.017 0.01 0.002 0.190 0.007 0.024 0.120 0.007 0.005 0.132 0.002 0.010 0.027 0.002 0.027 0.017 0.002 0.017 0.012 1.000
The Chi-square output below crosses qry_esp_domin$esp_domin with itself (df = 625 = 25 x 25), so its p-value of less than 2.2e-16 does not measure the association between land tenure and most frequent species. The test should be rerun with the land tenure variable before any group testing analysis between banane and haricot is performed.
## 
##  Pearson's Chi-squared test
## 
## data:  qry_esp_domin$esp_domin and qry_esp_domin$esp_domin
## X-squared = 10250, df = 625, p-value < 2.2e-16


We can reshape the frequency table by treating land tenure as the id variable and banane, haricot, manioc, etc. as measure variables. The result is a long table with value as the outcome and variable and tenure as predictors.


C. GEOPROCESSING

C.1 Input data

Data are collected in the field using GPS tracklogs and stored in .gpx format for every parcel. Each parcel is saved with a unique code that refers to the first 4 letters of the agroforestry activity followed by incremented numbers. GPS receiver tracks are organized in the office by surveyor and device ID using a file storage system.

C.2 Features

We read the GPS tracks file by file, with their original coordinate system if one is defined. We transform the set of points from each file into a polyline, then into a polygon. We add attribute data to the spatial polygons by joining the MySQL database to the parcels on the parcel_code variable.


In order to better visualize parcels on static maps, we read the road and stream network spatial data and reproject them to the WGS 84 / UTM zone 18 coordinate system. We also read the MySQL survey tables that will be joined to the spatial parcels for thematic mapping.

We calculate the centroid of each parcel within its block and retrieve its coordinates from the spatial dataset. The basic idea is to perform a cluster analysis at parcel level.
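
The centroid and area of a parcel polygon can be illustrated without any spatial package, using the shoelace formula on planar coordinates. This is a sketch that assumes projected coordinates in metres (for example UTM); the function name polygon_centroid is hypothetical and the report's own code may rely on a spatial library instead.

```r
# Centroid and area of a simple polygon from its vertex coordinates.
polygon_centroid <- function(x, y) {
  # Close the ring if the first vertex is not repeated at the end.
  if (x[1] != x[length(x)] || y[1] != y[length(y)]) {
    x <- c(x, x[1])
    y <- c(y, y[1])
  }
  i <- seq_len(length(x) - 1)
  cross <- x[i] * y[i + 1] - x[i + 1] * y[i]  # shoelace terms
  area <- sum(cross) / 2                      # signed area in square metres
  cx <- sum((x[i] + x[i + 1]) * cross) / (6 * area)
  cy <- sum((y[i] + y[i + 1]) * cross) / (6 * area)
  c(x = cx, y = cy, area_ha = abs(area) / 10000)
}

# Toy parcel: a 100 m x 100 m square, i.e. one hectare.
square <- polygon_centroid(x = c(0, 100, 100, 0), y = c(0, 0, 100, 100))
square
```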

C.3 More on clustering analysis

k-means clustering is an old technique, developed quite a while ago, but it remains very useful for summarizing high-dimensional data and for getting a sense of the patterns our hillside parcels show and of which parcels are similar to each other.

The basic principle behind k-means clustering is to define what it means for things to be similar to each other and what it means for them to be different. In other words, we define what it means to be close, how we group things together, how we visualize this grouping, and how we interpret what we see.

The most important step is defining what we mean by close. We need a distance metric, because two things can seem close in one context and far apart in another. We use a continuous distance, the Euclidean distance, which is the length of the straight line between two points.
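
As a small illustration, the Euclidean distance between two centroids can be computed by hand or with dist(). The coordinates are toy values, assumed to be in metres.

```r
# Two toy centroids in projected coordinates (metres).
p <- c(x = 0, y = 0)
q <- c(x = 3, y = 4)

# Straight-line distance computed by hand.
d_manual <- sqrt(sum((p - q)^2))

# Same distance from the built-in dist(), whose default method is Euclidean.
d_builtin <- as.numeric(dist(rbind(p, q)))

c(manual = d_manual, builtin = d_builtin)
```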

We partition the parcels into fifteen (15) blocks along each river bank, and each block is divided into five (5) sub-blocks. Each block or group has a centroid, like a center of gravity for the group. Once we have the centroids, we assign each parcel to its closest centroid. The basic idea of the k-means algorithm is to pick centroids, assign all the parcels to them, then recalculate the centroids and reassign the parcels. We iterate until we reach the solution illustrated in the graph below.
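
The assign-and-recalculate loop described above is what kmeans() from the stats package runs. The sketch below applies it to synthetic parcel centroids: the coordinates, the three groups and the object names are assumptions, whereas the report works with 15 blocks.

```r
set.seed(1)

# Synthetic parcel centroids scattered around three locations (metres).
centroids <- data.frame(
  x = c(rnorm(20, 0, 50), rnorm(20, 1000, 50), rnorm(20, 2000, 50)),
  y = c(rnorm(20, 0, 50), rnorm(20, 1000, 50), rnorm(20, 0, 50))
)

# k-means with k = 3; nstart repeats the random initialization and keeps
# the solution with the smallest total within-cluster sum of squares.
km <- kmeans(centroids, centers = 3, nstart = 50)

table(km$cluster)  # number of parcels per cluster
km$centers         # final centroid of each cluster
```

In practice the number of clusters would be set to the number of blocks, and the cluster label would be joined back to the parcels for mapping.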

Top six (6) blocks by number of hectares

   block  area_ha
15    15 97.47553
10    10 91.45048
11    11 70.27694
8      8 68.93399
14    14 66.39683
2      2 47.19502

The table above gives the number of hectares for the top six (6) blocks; it summarizes the area per irrigation block.
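
An aggregation of this kind can be sketched with aggregate() and order(). The parcels data frame below is a toy example; only the column names block and area_ha follow the table above.

```r
# Toy parcels: block number and parcel area in hectares (values assumed).
parcels <- data.frame(
  block   = c(1, 1, 2, 2, 2, 3),
  area_ha = c(0.5, 1.2, 2.0, 0.3, 0.7, 4.1)
)

# Sum the area of all parcels within each block.
per_block <- aggregate(area_ha ~ block, data = parcels, FUN = sum)

# Sort by decreasing area and keep at most the six largest blocks.
top <- head(per_block[order(-per_block$area_ha), ], 6)
top
```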

C.4 Thematic map

At this step, we visualize the characteristics of parcels by theme across the landscape. Thematic maps are designed to convey information about a single topic or theme, such as land tenure or priority constraint. A thematic map communicates more information than the graphs illustrated above. For example, individual parcel locations can be shown over the stream network.





Conclusion 1

We have visualized across the landscape how parcels are scattered according to their characteristics (land tenure, priority constraint, etc.). We aggregated this comparison at block level in order to obtain legible information. However, the number of parcels geoprocessed (371) is greater than the number of parcels surveyed (243). Further data cleaning is crucial in order to produce good thematic maps.