QUESTION

Is there a significant difference between the computed area and the area read from the GPS receiver? Can we use quantitative characteristics of blocks to classify this exercise?
This research is conducted to report on the following indicators:

  1. AVANSE Indicator 1.4 (FtF 4.5.2-2) Number of Hectares under Improved Technologies or Management Practices as a Result of USG Assistance (RiA) (WOG)
  2. AVANSE Indicator 1.4.1 (F, FTF 4.5.1-28): Hectares under New or Improved/Rehabilitated Irrigation and Drainage Services as a Result of USG Assistance (RiA) (WOG)
  3. AVANSE Indicator 1.5.1 (Custom) Number of Rural Hectares Mapped and Adjudicated (S)
  4. AVANSE Indicator 2.1 (F 4.8.1-26) Number of Hectares of Biological Significance and/or Natural Resources under Improved Natural Resource Management as a Result of USG Assistance
  5. AVANSE Indicator 2.3 (F 4.8.1-1) Number of Hectares of Biological Significance and/or Natural Resources Showing Improved Physical Conditions as a Result of USG Assistance

INPUT DATA

Data are collected in the field using GPS tracklogs and stored in .gpx format for every parcel. Each parcel is saved with a unique code that refers to the first 4 letters of the agroforestry activity followed by incremented numbers. GPS receiver tracks are organized in the office by surveyor and device ID using a file storage system.

FEATURES

We read the GPS tracks file by file in their original coordinate system. If a file has no coordinates, the parcel has no features and is skipped. We transform the set of points from each file into a polyline, then into a polygon by linking the first and last points. We add attribute data to each spatial polygon by retrieving the parcel name, the file name, the device ID, and the surveyor name from the GPS file storage system. Finally, we write each file out in shapefile format.
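The step that turns a track into a polygon can be sketched in base R. This is a minimal illustration, not the report's own code: the coordinates and the helper `close_ring()` are hypothetical, and a real workflow would more likely rely on a spatial package.

```r
# Hypothetical track points (longitude, latitude) as read from one .gpx file.
track <- data.frame(
  lon = c(-71.90, -71.89, -71.89, -71.90),
  lat = c(19.50, 19.50, 19.51, 19.51)
)

# Close the ring: a polygon needs its last vertex to repeat the first one.
# close_ring() is a hypothetical helper written for this sketch.
close_ring <- function(pts) {
  first <- unlist(pts[1, ])
  last <- unlist(pts[nrow(pts), ])
  if (!all(first == last)) {
    pts <- rbind(pts, pts[1, ])
  }
  rownames(pts) <- NULL
  pts
}

# Skip files without features, as the text describes.
if (nrow(track) > 0) {
  ring <- close_ring(track)
  print(ring)
}
```

Calling `close_ring()` on a ring that is already closed leaves it unchanged, so the function is safe to apply to every file.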

We compile separate shapefiles into one shapefile. We create a numeric unique ID based on the filename value. Spatial data processing is an important step to:

  1. create efficiencies by minimizing information loss or duplication;
  2. verify consistency across data sources;
  3. make data accessible to those who need it and can use it to make programmatic decisions;
  4. produce needed information in a timely manner;
  5. produce information that is relevant to value-chain needs.
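
Compiling the per-file pieces and deriving a numeric ID from the filename might look like the sketch below. The filenames and the `per_file` list are hypothetical stand-ins for the attribute tables of the separate shapefiles, and `match()` is only one possible way to build the ID.

```r
# Hypothetical per-file attribute tables standing in for the separate shapefiles.
per_file <- list(
  data.frame(filename = "AGRO0001.gpx", surveyor = "A"),
  data.frame(filename = "AGRO0002.gpx", surveyor = "B"),
  data.frame(filename = "AGRO0003.gpx", surveyor = "A")
)

# Compile all pieces into one table.
parcels <- do.call(rbind, per_file)

# Numeric unique ID based on the filename: same filename, same ID.
parcels$id <- match(parcels$filename, unique(parcels$filename))

# A duplicated ID would mean the same file was loaded twice.
print(anyDuplicated(parcels$id) == 0)
print(parcels)
```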

In order to better visualize blocks per Grant in either static maps or a web mapping application, we read the block, road, and stream network spatial data and reproject them to the UTM zone 18 WGS 84 coordinate system, which lets us calculate the area of blocks in hectares. We also read the block spreadsheet, which is joined to the spatial blocks to compare the computed area with the area from GPS reading.
We calculate the centroid of each parcel within a block and store its coordinates in the block spatial dataset. The basic idea is to perform a cluster analysis at Grant and block level. Parcels must be split out per Grant and block because the data collection did not take this aggregation into consideration.
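
Once coordinates are projected in metres, area and centroid follow from the shoelace formula. The sketch below assumes a closed ring in projected coordinates; the rectangle and the helpers `signed_area()` and `ring_centroid()` are hypothetical and only illustrate the arithmetic.

```r
# Hypothetical parcel ring in projected coordinates (metres), already closed:
# a 100 m x 50 m rectangle.
ring <- data.frame(
  x = c(0, 100, 100, 0, 0),
  y = c(0, 0, 50, 50, 0)
)

# Shoelace formula: signed area of a closed ring in square metres.
signed_area <- function(r) {
  i <- seq_len(nrow(r) - 1)
  sum(r$x[i] * r$y[i + 1] - r$x[i + 1] * r$y[i]) / 2
}

# Area-weighted centroid of the same ring.
ring_centroid <- function(r) {
  i <- seq_len(nrow(r) - 1)
  cross <- r$x[i] * r$y[i + 1] - r$x[i + 1] * r$y[i]
  a <- sum(cross) / 2
  c(x = sum((r$x[i] + r$x[i + 1]) * cross) / (6 * a),
    y = sum((r$y[i] + r$y[i + 1]) * cross) / (6 * a))
}

area_ha <- abs(signed_area(ring)) / 10000  # 1 ha = 10,000 square metres
centroid <- ring_centroid(ring)
print(area_ha)   # 0.5
print(centroid)  # x = 50, y = 25
```

The division by 10,000 only makes sense in a metric projected system, which is why the reprojection comes before the area calculation.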

More on clustering analysis

k-means clustering is an old technique, developed quite a while ago, but it remains very useful for summarizing high-dimensional data and for getting a sense of what patterns our hillside parcels show and which parcels are similar to each other.

The basic principle behind k-means clustering is that we define what it means for things to be similar to each other and what it means for them to be different. In other words, we decide what it means to be close, how we group things together, how we visualize this grouping, and how we interpret what we see.

The most important step is defining what we mean by close. We need a distance metric, because two things can seem close in one context and far apart in another. We use a continuous distance, the Euclidean distance, which is the length of the straight line between two points.
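
As a small worked example of the Euclidean distance, assuming two hypothetical centroids in projected coordinates (metres):

```r
# Hypothetical parcel centroids in projected coordinates (metres).
a <- c(x = 0, y = 0)
b <- c(x = 300, y = 400)

# Euclidean distance: length of the straight line joining the two points.
euclid <- function(p, q) sqrt(sum((p - q)^2))
d_ab <- euclid(a, b)
print(d_ab)  # 500

# dist() computes the same metric for every pair of rows in a matrix.
d_all <- dist(rbind(a, b), method = "euclidean")
print(d_all)
```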

We partition the parcels into five (5) grants, and each grant is divided into four (4) blocks. Our spatial dataset has 982 parcels for which we have retrieved the centroids, so we have 982 points. We know that they can be divided into five (5) groups, or grants. Each group has a centroid, like a center of gravity. Once we have the centroids, we assign each parcel to its nearest centroid. The basic idea of the k-means algorithm is that we pick the centroids, assign every parcel to the closest one, recalculate the centroids, and reassign the parcels. We iterate until the assignments stop changing, which gives the solutions illustrated in the five (5) graphs below.
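
The partitioning step can be sketched with `kmeans()` from base R. The points below are simulated for illustration and are not the project data; the number of random starts (`nstart`) is an arbitrary choice.

```r
set.seed(42)  # k-means starts from random centroids, so fix the seed

# Simulated parcel centroids: 5 groups standing in for the 5 grants.
centers_true <- cbind(x = c(0, 10, 20, 30, 40), y = c(0, 10, 0, 10, 0))
grp <- rep(1:5, each = 40)
pts <- centers_true[grp, ] + matrix(rnorm(2 * length(grp), sd = 0.5), ncol = 2)

# Partition the points into k = 5 clusters, keeping the best of 50 random starts.
km <- kmeans(pts, centers = 5, nstart = 50)

# Each point gets a cluster label; each cluster has a centroid.
print(table(km$cluster))
print(km$centers)
```

The same call with `centers = 4` on the parcels of a single grant would give the block-level split.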

We also calculate the total size and the count of raw files and processed files. Knowing the size of the raw files is a good indicator of how much memory to allocate to our Integrated Development Environment (IDE), in our case RStudio.
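
File sizes can be read with `file.size()`. The sketch below writes two temporary files as stand-ins for a raw and a processed file; their contents are made up, and dividing by 1024^2 is one assumed convention for megabytes.

```r
# Write two small temporary files to stand in for a raw and a processed file.
raw_file <- tempfile(fileext = ".gpx")
out_file <- tempfile(fileext = ".txt")
writeLines(rep("<trkpt lat='19.5' lon='-71.9'></trkpt>", 100), raw_file)
writeLines(rep("19.5,-71.9", 100), out_file)

# file.size() returns bytes; divide by 1024^2 for megabytes.
size_mb <- file.size(c(raw_file, out_file)) / 1024^2
names(size_mb) <- c("raw", "processed")
print(size_mb)
```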

Summary of GPS tracks processing

              Size in MB   Number of parcels   Number of hectares
GPS files           5.96                 984               302.01
Shapefiles          0.79                 982               302.01

The number of files decreases after processing from 984 to 982, mainly because some GPS tracks have no features. The total size decreases from 5.96 MB to 0.79 MB, which is one of the objectives of spatial data processing. GPS files are converted to shapefiles because it is a handy format for spatial data.

The table below summarizes the stabilized hillside parcel data split out per Grant and block.

Aggregation of parcels per block and grant

Block   Grant     area_ha
    1       1   15.041999
    2       1   21.650664
    3       1   11.287469
    4       1   22.503947
    1       2   14.158044
    2       2   21.071819
    3       2    7.860917
    4       2   16.958768
    1       3    8.507325
    2       3   11.307393
    3       3   12.213840
    4       3   13.970525
    1       4   26.967610
    2       4   12.511791
    3       4   10.773312
    4       4   13.164418
    1       5   16.290974
    2       5   14.241276
    3       5   13.785306
    4       5   17.740384

We can summarize the number of hectares per Grant:

  1. Grant 1, located on the upstream hillside of Perches, covers 70.48 ha;
  2. Grant 2, located on the downstream hillside of Jean de Nantes, covers 60.05 ha;
  3. Grant 3, located on the upstream hillside of Grisongarde, covers 46.00 ha;
  4. Grant 4, located on the upstream hillside of Acul des Pins, covers 63.42 ha;
  5. Grant 5, located on the upstream hillside of Acul Samedi, covers 62.06 ha.

The table below shows the number of hectares per Grant.

Grant    area_ha   Locality         direction
    1   70.48408   Perches          upstream
    2   60.04955   Jean de Nantes   downstream
    3   45.99908   Grisongarde      upstream
    4   63.41713   Acul des Pins    upstream
    5   62.05794   Acul Samedi      upstream
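
The per-grant totals can be reproduced from the block-level table with `aggregate()`. The sketch below retypes the block areas from that table; the data frame name `blocks` is an assumption, and this is not necessarily how the report computed them.

```r
# Block-level areas in hectares, retyped from the block and grant table.
blocks <- data.frame(
  Block = rep(1:4, times = 5),
  Grant = rep(1:5, each = 4),
  area_ha = c(15.041999, 21.650664, 11.287469, 22.503947,
              14.158044, 21.071819, 7.860917, 16.958768,
              8.507325, 11.307393, 12.213840, 13.970525,
              26.967610, 12.511791, 10.773312, 13.164418,
              16.290974, 14.241276, 13.785306, 17.740384)
)

# Sum block areas within each grant.
per_grant <- aggregate(area_ha ~ Grant, data = blocks, FUN = sum)
print(per_grant)

# Overall total, to compare with the 302.01 ha in the processing summary.
total_ha <- sum(blocks$area_ha)
print(round(total_ha, 2))
```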

Exploratory analysis

The goal of this task is to understand the basic relationships observed between surveyors in the dataset and to prepare to build a model. We can rank surveyors by frequency, that is, by the number of parcels each one collected.

##     Wesner  Grandelin     Robert    Fredlin    Franzdy      Romel 
##        162        161        160        115        105         98 
##    Anthony     Folero PierreMary      Marco     Emanel   Rodemond 
##         34         30         14         13         12         12 
##      Fritz     Samuel  Alexandre     Jocely      Enock      Evens 
##         11         11          8          8          7          7 
##      Lesly   Nicodeme     Guesly 
##          6          6          2

We can also aggregate computed area by surveyor. The basic idea is to explore in a bar plot the area covered by each surveyor.
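
Both summaries, parcels per surveyor and computed area per surveyor, can be sketched with `table()` and `tapply()`. The surveyor labels and areas below are hypothetical.

```r
# Hypothetical parcels: who surveyed each one and its computed area (ha).
parcels <- data.frame(
  surveyor = c("A", "B", "A", "C", "A", "B"),
  area_ha  = c(0.20, 0.50, 0.30, 0.10, 0.40, 0.25)
)

# Number of parcels per surveyor, most frequent first.
freq <- sort(table(parcels$surveyor), decreasing = TRUE)

# Total computed area per surveyor, largest first.
area_by_surveyor <- sort(tapply(parcels$area_ha, parcels$surveyor, sum),
                         decreasing = TRUE)

print(freq)
print(area_by_surveyor)
```

Passing `area_by_surveyor` to `barplot()` would draw the bar plot described above.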

We can also draw a word cloud of the surveyors in which each name is sized by the number of parcels collected or by the total computed parcel area. The two graphs show, respectively, the frequency of parcels per surveyor and the total computed parcel area per surveyor. We can see that Ronald and Velia are the top surveyors.

We can see that Ronald has collected more parcels than his colleagues, but his efficiency in area covered is lower than that of Gonel and Velia.

Can we compare the efficiency of surveyors?

First of all, let’s explore the relationship between the two sources of parcel area, disregarding surveyors.

There is a positive relationship, and it seems to be strong across all surveyors.
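
The direction and strength of such a relationship can be quantified with a correlation coefficient and a simple regression line. The areas below are simulated (the GPS reading is the computed area plus small noise), so the numbers illustrate the method rather than the report's result.

```r
set.seed(1)

# Simulated areas in hectares; gps_read is computed plus small noise.
computed <- runif(50, min = 0.05, max = 1)
gps_read <- computed + rnorm(50, sd = 0.02)

# Pearson correlation: the sign gives the direction, the magnitude the strength.
r <- cor(computed, gps_read)
print(r)

# A slope near 1 and an intercept near 0 mean the two sources agree.
fit <- lm(gps_read ~ computed)
print(coef(fit))
```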

If we split this relationship by the four (4) most frequent surveyors to take a closer look, we see that it is positive and strong for all of them, and strongest for Velia.

Regression models

Let’s build a model with computed area as the outcome and surveyor as the only predictor. The basic idea in this exercise is to understand the difference in computed area between surveyors. The efficiency of a surveyor can be mostly explained by the total computed area, because the number of hectares is the key quantity in the five (5) indicators mentioned above.

##                        Estimate Std. Error     t value    Pr(>|t|)
## (Intercept)         0.194549796 0.09241473  2.10518170 0.035534541
## surveyorAnthony    -0.069917122 0.10271323 -0.68070221 0.496223907
## surveyorEmanel      0.384520812 0.11930690  3.22295531 0.001311454
## surveyorEnock       0.292622032 0.13528131  2.16306325 0.030782301
## surveyorEvens       0.059929622 0.13528131  0.44300001 0.657865429
## surveyorFolero      0.070834273 0.10400935  0.68103758 0.496011776
## surveyorFranzdy     0.107973521 0.09587067  1.12624144 0.260344502
## surveyorFredlin     0.099910026 0.09557511  1.04535607 0.296121298
## surveyorFritz       0.123187393 0.12145675  1.01424906 0.310719208
## surveyorGrandelin   0.090059981 0.09468291  0.95117460 0.341754906
## surveyorGuesly      0.258355489 0.20664562  1.25023456 0.211518225
## surveyorJocely     -0.063141076 0.13069416 -0.48312086 0.629119969
## surveyorLesly      -0.045798547 0.14116583 -0.32443083 0.745682485
## surveyorMarco       0.050507845 0.11745707  0.43001111 0.667283964
## surveyorNicodeme    0.186636842 0.14116583  1.32211061 0.186445915
## surveyorPierreMary -0.040083685 0.11584799 -0.34600241 0.729416635
## surveyorRobert      0.109647929 0.09469692  1.15788277 0.247199564
## surveyorRodemond    0.007198136 0.11930690  0.06033294 0.951903011
## surveyorRomel       0.188038546 0.09611277  1.95643668 0.050703046
## surveyorSamuel      0.142011926 0.12145675  1.16923867 0.242597417
## surveyorWesner      0.165903306 0.09466908  1.75245508 0.080014481

Notice that Alexandre is missing from the table. It is because he is the reference category: the intercept, 0.19, is the mean computed area of his parcels, and every other coefficient is the estimated difference in mean computed area between that surveyor’s parcels and Alexandre’s parcels. The estimate for Fredlin is 0.10, with a P-value of 0.296 > 0.05. In other words, the t-test for \(H_0: \beta_{surveyor} = 0\) versus \(H_a: \beta_{surveyor} \neq 0\) is not significant for Fredlin. We fail to reject the null hypothesis and cannot claim a difference in mean computed area between Fredlin’s parcels and the reference parcels. The same holds for Franzdy (estimate 0.11, P-value 0.260). Only Emanel (P-value 0.0013) and Enock (P-value 0.031) differ significantly from the reference at the 0.05 level.
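
How a factor predictor is coded against a reference category can be checked on a toy dataset. The three surveyors and their areas below are hypothetical and chosen so that the group means are easy to read off.

```r
# Toy data: three hypothetical surveyors with known mean areas.
d <- data.frame(
  surveyor = factor(rep(c("A", "B", "C"), each = 4)),
  area = c(0.1, 0.2, 0.3, 0.2,   # A: mean 0.2
           0.4, 0.5, 0.6, 0.5,   # B: mean 0.5
           0.2, 0.3, 0.4, 0.3)   # C: mean 0.3
)

# With an intercept, the first factor level (A) is the reference category:
# (Intercept) = mean of A, surveyorB = mean(B) - mean(A), and so on.
fit1 <- lm(area ~ surveyor, data = d)
print(coef(fit1))

# relevel() changes the reference, for example to compare everyone against B.
d$surveyor_b <- relevel(d$surveyor, ref = "B")
fit1b <- lm(area ~ surveyor_b, data = d)
print(coef(fit1b))

# Estimates, standard errors, t values and P-values in one table.
print(summary(fit1)$coefficients)
```

The reference level never appears as its own row, which is why one surveyor is always absent from a coefficient table of this kind.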

##                     Estimate Std. Error   t value     Pr(>|t|)
## surveyorAlexandre  0.1945498 0.09241473  2.105182 3.553454e-02
## surveyorAnthony    0.1246327 0.04482773  2.780259 5.537665e-03
## surveyorEmanel     0.5790706 0.07545631  7.674250 4.078538e-14
## surveyorEnock      0.4871718 0.09879550  4.931114 9.629168e-07
## surveyorEvens      0.2544794 0.09879550  2.575820 1.014867e-02
## surveyorFolero     0.2653841 0.04772276  5.560954 3.476926e-08
## surveyorFranzdy    0.3025233 0.02550889 11.859526 2.245552e-30
## surveyorFredlin    0.2944598 0.02437459 12.080608 2.209435e-31
## surveyorFritz      0.3177372 0.07881155  4.031607 5.978150e-05
## surveyorGrandelin  0.2846098 0.02060029 13.815817 9.807106e-40
## surveyorGuesly     0.4529053 0.18482946  2.450396 1.444731e-02
## surveyorJocely     0.1314087 0.09241473  1.421946 1.553665e-01
## surveyorLesly      0.1487512 0.10671134  1.393959 1.636522e-01
## surveyorMarco      0.2450576 0.07249608  3.380288 7.532417e-04
## surveyorNicodeme   0.3811866 0.10671134  3.572129 3.716597e-04
## surveyorPierreMary 0.1544661 0.06985897  2.211114 2.726276e-02
## surveyorRobert     0.3041977 0.02066456 14.720744 2.200123e-44
## surveyorRodemond   0.2017479 0.07545631  2.673705 7.629083e-03
## surveyorRomel      0.3825883 0.02640421 14.489673 3.527239e-43
## surveyorSamuel     0.3365617 0.07881155  4.270462 2.144359e-05
## surveyorWesner     0.3604531 0.02053661 17.551736 4.912875e-60

If we omit the intercept, the model includes all the surveyors. Each coefficient is now the mean computed area for that surveyor rather than a difference from the reference category. There are 21 means in the dataset, one per surveyor, and the expected value of the outcome is the mean for one surveyor or another. As we can see in the table, Fredlin is about 0.29 and Franzdy is about 0.30, which the boxplot below illustrates.
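
The link between the two parameterizations can be verified on a toy dataset: without an intercept the coefficients are the group means, and their differences reproduce the coefficients of the model with an intercept. The data below are hypothetical.

```r
# Toy data: three hypothetical surveyors, four parcels each.
d <- data.frame(
  surveyor = factor(rep(c("A", "B", "C"), each = 4)),
  area = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.5, 0.6, 0.5, 0.2, 0.3, 0.4, 0.3)
)

fit1 <- lm(area ~ surveyor, data = d)      # reference-category coding
fit2 <- lm(area ~ surveyor - 1, data = d)  # no intercept: one mean per level

# Without an intercept, each coefficient is that surveyor's mean area.
means <- tapply(d$area, d$surveyor, mean)
print(rbind(coefficient = coef(fit2), group_mean = means))

# The difference of two means equals the coefficient in the first model.
diff_ba <- coef(fit2)[["surveyorB"]] - coef(fit2)[["surveyorA"]]
print(c(difference = diff_ba, fit1_estimate = coef(fit1)[["surveyorB"]]))
```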

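If we subtract Franzdy’s mean (0.3025233) from Fredlin’s mean (0.2944598), we get -0.008063495, the estimate Fredlin would receive if Franzdy were the reference category. Likewise, subtracting Alexandre’s mean (0.1945498) from Fredlin’s mean gives 0.09991, Fredlin’s estimate in the first model. In other words, the two models are perfectly consistent: whichever surveyor is chosen as the reference, the fitted means stay the same.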

## [1] -0.0952804  0.2951005

If we choose model 1 as our best model, the confidence interval for the 0.0999 difference in computed area between Fredlin and Alexandre, the reference surveyor in the first table, runs from -0.0952804 to 0.2951005. The interval contains zero, which agrees with the non-significant P-value. The graph below shows residuals versus fitted computed area values.
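
A confidence interval for one coefficient can be obtained with `confint()`. The sketch below uses hypothetical toy data and assumes a 95% level, which the text above does not state.

```r
# Toy data: three hypothetical surveyors, four parcels each.
d <- data.frame(
  surveyor = factor(rep(c("A", "B", "C"), each = 4)),
  area = c(0.1, 0.2, 0.3, 0.2, 0.4, 0.5, 0.6, 0.5, 0.2, 0.3, 0.4, 0.3)
)
fit <- lm(area ~ surveyor, data = d)

# Interval for the difference between surveyor B and the reference A.
# The 95% level is an assumption made for this sketch.
ci <- confint(fit, parm = "surveyorB", level = 0.95)
print(ci)

# An interval that contains zero means the difference is not significant
# at the matching level.
contains_zero <- ci[1, 1] <= 0 && ci[1, 2] >= 0
print(contains_zero)
```

Plotting `resid(fit)` against `fitted(fit)` gives a residuals-versus-fitted graph for the same model.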

Conclusion 1

We have checked the difference between the computed parcel areas and the parcel areas written down by surveyors. The difference is not significant. We fail to reject the null hypothesis and conclude that there is no significant difference between computed parcel area and written parcel area.
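
The text does not name the test behind this conclusion. As one plausible way to compare two area measurements taken on the same parcels, a paired t-test can be sketched as follows; the data are simulated and the choice of test is an assumption.

```r
set.seed(7)

# Simulated areas in hectares for the same 60 parcels, measured two ways.
computed <- runif(60, min = 0.05, max = 1)
written <- computed + rnorm(60, mean = 0, sd = 0.02)

# Paired t-test: H0 says the mean of the per-parcel differences is zero.
tt <- t.test(computed, written, paired = TRUE)
print(tt$p.value)   # compare with the chosen significance level, e.g. 0.05
print(tt$conf.int)  # interval for the mean difference
```

A P-value above the chosen level means we fail to reject the null hypothesis of no difference, which is the form of the conclusion stated above.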

STATIC MAPS

At this step, we visualize the difference between computed area and area from GPS reading in a spatial context. Areas from GPS reading are read off the GPS device by the surveyor and transcribed manually onto a paper survey form. They are then entered into a spreadsheet by data entry clerks. Given how the parcels are scattered over the watersheds, they are subsetted by Grant for each map to allow a closer look when comparing the two sources of area.

Conclusion 2

We have compared block areas from the computation and GPS reading methods in a panel visualization. We aggregated this comparison at Grant level to keep the information legible. The difference between the two methods is not significant. We fail to reject the null hypothesis and conclude that there is no significant difference between computed area and area from GPS reading.