Is there a significant difference between the computed and field areas of parcels? Can we use quantitative characteristics of parcels to classify this exercise?
This research is conducted to address the following indicators:
Data are collected in the field as GPS tracklogs and stored in GPX (.gpx) format, one file per parcel. Each parcel is saved with a unique code that refers to the surveyor's name and the crop. In the office, the GPS tracks are organized by surveyor and device ID in a file storage system.
We read each GPS track file in its original coordinate system, provided it contains coordinates; parcels whose files have no features are skipped. We convert the set of points in each file into a polyline, then into a polygon by linking the last point back to the first. We then attach attribute data to each polygon: the parcel name from the metadata file, the file name, and the device ID and surveyor name from the GPS file storage system. Finally, we write each polygon to a shapefile.
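A minimal sketch of this conversion in Python, using only the standard library's `xml.etree.ElementTree` (the report's own processing is done in R and is not reproduced here). It assumes GPX 1.1 files whose points are stored as `trkpt` elements; the function name `gpx_to_ring` is hypothetical, and attaching attributes and writing the shapefile are left out.

```python
import xml.etree.ElementTree as ET

# Assumption: GPX 1.1 files; GPX 1.0 files use "http://www.topografix.com/GPX/1/0".
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}


def gpx_to_ring(gpx_text):
    """Build a closed ring of (lon, lat) pairs from all track points in a GPX
    document, or return None when there are too few points for a polygon."""
    root = ET.fromstring(gpx_text)
    points = [
        (float(pt.get("lon")), float(pt.get("lat")))
        for pt in root.iterfind(".//gpx:trkpt", GPX_NS)
    ]
    if len(points) < 3:
        return None  # no usable features: the parcel is skipped
    if points[0] != points[-1]:
        points.append(points[0])  # link the last point back to the first
    return points
```

Returning `None` for files with fewer than three points mirrors the rule of skipping parcels without features.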
We merge the separate shapefiles into one shapefile and create a unique numeric ID based on the name value from the metadata file.
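One way to derive a numeric ID from the parcel names, sketched in Python under the assumption that names are unique per parcel (two parcels with the same name would share an ID). The helper `make_numeric_ids` is hypothetical; it behaves much like `as.integer(factor(name))` in R, although R's level order can depend on the locale.

```python
def make_numeric_ids(names):
    """Give each distinct name an integer ID (1, 2, ...) in sorted order.
    Parcels that share a name share an ID, so IDs are unique only if names are."""
    lookup = {name: i for i, name in enumerate(sorted(set(names)), start=1)}
    return [lookup[name] for name in names]
```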
To better visualize parcels in static maps or in a web mapping application, we read the parcels, sub-watersheds and road network spatial data and reproject them to the WGS 84 / UTM zone 18 coordinate system, in which we calculate the area of each parcel in hectares. We also read the banana spreadsheet, which will be joined to the spatial parcels to compare computed and field parcel areas.
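The area step can be illustrated with the shoelace formula, which gives the planar area of a polygon whose coordinates are already projected to metres, as they are in UTM. This Python sketch assumes a closed ring of (easting, northing) pairs, for example the output of a reprojection step that is not shown, and converts square metres to hectares; `ring_area_ha` is a hypothetical name.

```python
def ring_area_ha(ring):
    """Planar area of a closed ring of projected (x, y) coordinates in metres,
    returned in hectares (1 ha = 10,000 m^2). The ring must end where it starts."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1  # shoelace term for one edge
    return abs(twice_area) / 2.0 / 10_000.0
```

A 100 m by 100 m square covers 10,000 m², that is 1 ha.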
We also calculate the size and the number of raw and processed files. The size of the raw files is a good indicator of how much memory to allocate to our integrated development environment (IDE), in our case RStudio.
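A sketch of how the file counts and sizes in the table could be computed, in Python with `os.walk`. It assumes one folder tree per file type and defines 1 MB as 1,048,576 bytes; the report does not say which definition it uses, and for shapefiles one would also need to decide whether to count the companion `.shx`, `.dbf` and `.prj` files. `summarize_files` is a hypothetical helper.

```python
import os


def summarize_files(folder, extension):
    """Count files with the given extension under folder (recursively) and
    return (count, total size in MB), using 1 MB = 1024 * 1024 bytes."""
    count, total_bytes = 0, 0
    for dirpath, _dirnames, filenames in os.walk(folder):
        for name in filenames:
            if name.lower().endswith(extension.lower()):
                count += 1
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return count, total_bytes / (1024 * 1024)
```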
| | Size (MB) | Number of parcels | Area (ha) |
|---|---|---|---|
| GPS files | 7.44 | 1645 | 727.62 |
| Shapefiles | 0.90 | 1446 | 646.94 |
After processing, the number of parcels decreases from 1645 to 1446, mainly because some GPS tracks have no features. The total size decreases from 7.44 MB to 0.90 MB, which is one of the objectives of spatial data processing. GPS files are converted to shapefiles because the shapefile is a handy format for working with spatial data.
The goal of this task is to understand the basic relationships between surveyors in the dataset and to prepare for building a model. We can rank surveyors by frequency, that is, by the number of parcels each of them collected.
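| Surveyor | Number of parcels |
|---|---|
| Ronald | 161 |
| Velia | 148 |
| Rochelin | 139 |
| PierreJoel | 127 |
| Nixon | 101 |
| Gonel | 100 |
| Jacquy | 84 |
| Bony | 68 |
| Rony | 68 |
| Suzette | 49 |
| William | 43 |
| Sonel | 41 |
| Henriot | 40 |
| Geffrard | 38 |
| Tardier | 36 |
| Anthony | 27 |
| Sandro | 27 |
| Dorceant | 24 |
| Folero | 24 |
| Otilien | 24 |
| Semeran | 23 |
| Nicodeme | 20 |
| Billy | 18 |
| Remy | 12 |
| Lesly | 4 |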
We can also aggregate the computed area by surveyor and explore, in a bar plot, the total area covered by each surveyor.
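Both summaries (number of parcels and total computed area per surveyor) can be sketched in Python with `collections.Counter`. The input format, a list of `(surveyor, computed_area_ha)` pairs, and the helper name `summarize_by_surveyor` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def summarize_by_surveyor(parcels):
    """parcels: iterable of (surveyor, computed_area_ha) pairs.
    Returns {surveyor: (n_parcels, total_area_ha)}, ordered by parcel count, descending."""
    counts = Counter()
    areas = defaultdict(float)
    for surveyor, area in parcels:
        counts[surveyor] += 1
        areas[surveyor] += area
    return {s: (n, areas[s]) for s, n in counts.most_common()}
```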
We can also represent the surveyors as a word cloud, with each name sized by the number of parcels collected or by the total computed parcel area. The two graphs show, respectively, the number of parcels per surveyor and the total computed parcel area per surveyor. Ronald and Velia are the top surveyors.
Ronald collected more parcels than any other surveyor, but in terms of area covered he is less efficient than Gonel and Velia.
First, let's explore the relationship between the two sources of parcel area, computed and field, regardless of surveyor.
The relationship is positive and appears strong when all surveyors are pooled.
If we split this relationship by the four (4) most frequent surveyors to take a closer look, we see that it is positive and strong for all of them, and strongest for Velia.
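A sketch of the pooled and per-surveyor correlation behind these statements, in Python. It assumes rows of `(surveyor, computed_ha, field_ha)`; Pearson's r is one reasonable way to quantify "positive and strong", not necessarily what the plots in the report display. The helper names are hypothetical, and groups with fewer than three parcels are skipped because r is not informative there.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_by_surveyor(rows):
    """rows: iterable of (surveyor, computed_ha, field_ha). Returns {surveyor: r}."""
    groups = defaultdict(lambda: ([], []))
    for surveyor, computed, field in rows:
        groups[surveyor][0].append(computed)
        groups[surveyor][1].append(field)
    return {s: pearson_r(c, f) for s, (c, f) in groups.items() if len(c) > 2}
```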
Let's build a linear model with computed area as the outcome and surveyor as the only predictor. The aim of this exercise is to understand how computed area differs between surveyors. A surveyor's efficiency is best measured by the total computed area, because the number of hectares is a key term in the five (5) indicators mentioned above.
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 0.38709625 | 0.08627239 | 4.4869079 | 7.809141e-06 |
| surveyorBilly | 0.26395647 | 0.13640862 | 1.9350425 | 5.318331e-02 |
| surveyorBony | 0.21699358 | 0.10197158 | 2.1279809 | 3.351038e-02 |
| surveyorDorceant | -0.18115750 | 0.12576253 | -1.4404728 | 1.499540e-01 |
| surveyorFolero | -0.05595666 | 0.12576253 | -0.4449390 | 6.564316e-01 |
| surveyorGeffrard | 0.06762164 | 0.11283314 | 0.5993065 | 5.490640e-01 |
| surveyorGonel | 0.32111415 | 0.09722404 | 3.3028266 | 9.809488e-04 |
| surveyorHenriot | -0.02168810 | 0.11165527 | -0.1942416 | 8.460145e-01 |
| surveyorJacquy | 0.27378341 | 0.09917305 | 2.7606635 | 5.842578e-03 |
| surveyorLesly | -0.11339899 | 0.24017216 | -0.4721571 | 6.368872e-01 |
| surveyorNicodeme | -0.02845405 | 0.13225306 | -0.2151485 | 8.296824e-01 |
| surveyorNixon | -0.05461095 | 0.09712166 | -0.5622942 | 5.740043e-01 |
| surveyorOtilien | 0.12663408 | 0.12576253 | 1.0069301 | 3.141398e-01 |
| surveyorPierreJoel | -0.06339348 | 0.09500147 | -0.6672895 | 5.046956e-01 |
| surveyorRemy | -0.09680108 | 0.15552975 | -0.6223959 | 5.337814e-01 |
| surveyorRochelin | -0.01147140 | 0.09427976 | -0.1216741 | 9.031744e-01 |
| surveyorRonald | -0.02643322 | 0.09322616 | -0.2835387 | 7.768052e-01 |
| surveyorRony | 0.20450040 | 0.10197158 | 2.0054646 | 4.510271e-02 |
| surveyorSandro | 0.31438707 | 0.12200758 | 2.5767831 | 1.007257e-02 |
| surveyorSemeran | -0.20711291 | 0.12720169 | -1.6282245 | 1.036990e-01 |
| surveyorSonel | 0.07237510 | 0.11110519 | 0.6514106 | 5.148868e-01 |
| surveyorSuzette | -0.03882504 | 0.10744360 | -0.3613527 | 7.178895e-01 |
| surveyorTardier | -0.06247662 | 0.11412764 | -0.5474276 | 5.841709e-01 |
| surveyorVelia | 0.06050020 | 0.09381234 | 0.6449067 | 5.190918e-01 |
| surveyorWilliam | 0.22634095 | 0.11007447 | 2.0562530 | 3.994043e-02 |
The estimate for Billy is 0.26, with a p-value of 5.32e-02. Anthony is missing from the table because he is the reference category: the intercept (0.387) is the mean computed area of Anthony's parcels, and 0.26 is the estimated difference in mean computed area between Billy's parcels and Anthony's parcels. The t-test of \(H_0: \beta_{\text{Billy}} = 0\) versus \(H_a: \beta_{\text{Billy}} \neq 0\) gives a p-value of 0.0532, which is greater than 0.05, so the result is not significant at the 5% level. We therefore fail to reject the null hypothesis: there is no significant difference in mean computed area between Anthony's parcels and Billy's parcels.
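The decision rule can be checked directly from the table. This Python sketch recomputes Billy's t statistic from the estimate and standard error and converts it to a two-sided p-value with a standard normal approximation, which is close to the t distribution when there are many residual degrees of freedom. It gives about 0.053, slightly below the t-based 0.0532 in the table, and either way above 0.05. The helper name `two_sided_p` is hypothetical.

```python
from statistics import NormalDist


def two_sided_p(t_value):
    """Two-sided p-value for a t statistic, using the standard normal as a
    large-sample approximation to the t distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_value)))


# Billy vs Anthony (reference category): estimate and standard error from model 1.
t_billy = 0.26395647 / 0.13640862
p_billy = two_sided_p(t_billy)
alpha = 0.05
decision = "reject H0" if p_billy < alpha else "fail to reject H0"
```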
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| surveyorAnthony | 0.3870963 | 0.08627239 | 4.486908 | 7.809141e-06 |
| surveyorBilly | 0.6510527 | 0.10566166 | 6.161674 | 9.355895e-10 |
| surveyorBony | 0.6040898 | 0.05436248 | 11.112257 | 1.440576e-27 |
| surveyorDorceant | 0.2059387 | 0.09150568 | 2.250557 | 2.456600e-02 |
| surveyorFolero | 0.3311396 | 0.09150568 | 3.618787 | 3.063666e-04 |
| surveyorGeffrard | 0.4547179 | 0.07272134 | 6.252881 | 5.318441e-10 |
| surveyorGonel | 0.7082104 | 0.04482845 | 15.798236 | 6.408757e-52 |
| surveyorHenriot | 0.3654082 | 0.07088000 | 5.155307 | 2.889094e-07 |
| surveyorJacquy | 0.6608797 | 0.04891185 | 13.511649 | 3.156538e-39 |
| surveyorLesly | 0.2736973 | 0.22414223 | 1.221087 | 2.222555e-01 |
| surveyorNicodeme | 0.3586422 | 0.10023945 | 3.577855 | 3.580614e-04 |
| surveyorNixon | 0.3324853 | 0.04460597 | 7.453829 | 1.568936e-13 |
| surveyorOtilien | 0.5137303 | 0.09150568 | 5.614190 | 2.371586e-08 |
| surveyorPierreJoel | 0.3237028 | 0.03977881 | 8.137567 | 8.718479e-16 |
| surveyorRemy | 0.2902952 | 0.12940858 | 2.243245 | 2.503457e-02 |
| surveyorRochelin | 0.3756248 | 0.03802299 | 9.878887 | 2.638428e-22 |
| surveyorRonald | 0.3606630 | 0.03532977 | 10.208475 | 1.164087e-23 |
| surveyorRony | 0.5915967 | 0.05436248 | 10.882445 | 1.507519e-26 |
| surveyorSandro | 0.7014833 | 0.08627239 | 8.131030 | 9.179785e-16 |
| surveyorSemeran | 0.1799833 | 0.09347377 | 1.925496 | 5.436674e-02 |
| surveyorSonel | 0.4594714 | 0.07001027 | 6.562913 | 7.374373e-11 |
| surveyorSuzette | 0.3482712 | 0.06404064 | 5.438285 | 6.325297e-08 |
| surveyorTardier | 0.3246196 | 0.07471408 | 4.344825 | 1.492872e-05 |
| surveyorVelia | 0.4475965 | 0.03684876 | 12.146854 | 2.255475e-32 |
| surveyorWilliam | 0.6134372 | 0.06836274 | 8.973268 | 8.881315e-19 |
If we omit the intercept, the model includes all the surveyors, and Billy is no longer expressed relative to Anthony. There are 25 coefficients, one per surveyor, and each is the mean computed area of that surveyor's parcels: the fitted value for a parcel is simply the mean for its surveyor. Note that the t-tests in this table test whether each surveyor's mean is zero, not whether surveyors differ from one another. As the table shows, Billy's mean is about 0.65 and Anthony's is about 0.39, as illustrated in the boxplot below.
Subtracting Anthony's mean (0.3871) from Billy's mean (0.6511) gives 0.2640, which is exactly Billy's coefficient (0.26396) in the first model. The two parameterizations are therefore consistent: whether we estimate the surveyor means directly or as differences from Anthony, the model yields the same means.
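For a model with a single categorical predictor, the least-squares coefficients are simple functions of the group means, which is why the two parameterizations agree. This Python sketch (hypothetical helpers, toy input of `(surveyor, area)` pairs) computes the cell means, which the no-intercept model reports directly, and the treatment-coded coefficients, which the model with an intercept reports, taking the alphabetically first surveyor as the reference as R does by default.

```python
from collections import defaultdict


def group_means(rows):
    """rows: iterable of (surveyor, computed_area). Returns the mean area per surveyor."""
    sums, counts = defaultdict(float), defaultdict(int)
    for surveyor, area in rows:
        sums[surveyor] += area
        counts[surveyor] += 1
    return {s: sums[s] / counts[s] for s in sums}


def treatment_coefficients(means):
    """Treatment (reference) coding: intercept = reference mean, every other
    level = its mean minus the reference mean. The reference is the first name
    in code-point order, which matches R's default level order for these names."""
    ref = min(means)
    coefs = {"(Intercept)": means[ref]}
    for level in sorted(means):
        if level != ref:
            coefs[level] = means[level] - means[ref]
    return coefs


# Cell means from the no-intercept model for the two surveyors discussed above.
reported = treatment_coefficients({"Anthony": 0.3870963, "Billy": 0.6510527})
```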
If we choose model 1 as our best model, the confidence interval for the 0.264 difference in computed area between Billy and Anthony runs from -0.01462709 to 0.54254004. Because this interval contains zero, the data are compatible with no difference between Billy's and Anthony's parcels. The graph below shows the residuals against the fitted computed area values.
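As a cross-check, a normal-approximation (Wald) interval can be computed from the estimate and standard error in model 1. This Python sketch uses `statistics.NormalDist` for the quantile; `wald_ci` is a hypothetical helper, and the 95% level is an assumption since the text does not state the level of the reported interval.

```python
from statistics import NormalDist


def wald_ci(estimate, std_error, level=0.95):
    """Normal-approximation (Wald) interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return estimate - z * std_error, estimate + z * std_error


# Billy - Anthony difference and its standard error, from the first model's table.
low, high = wald_ci(0.26395647, 0.13640862)
contains_zero = low < 0.0 < high  # True: no evidence of a difference
```

This gives roughly (-0.0034, 0.5313). The interval reported in the text is centred on the same estimate but is slightly wider: its half-width, about 0.2786, corresponds to about 2.04 standard errors. If the model was fitted on all 1446 parcels, it has 1421 residual degrees of freedom, for which the t multiplier is about 1.96, so the degrees of freedom used for the reported interval may be worth checking. Both intervals contain zero, so the conclusion does not change.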
We have checked the difference between computed parcel areas and the parcel areas written down by surveyors. The difference is not significant: we fail to reject the null hypothesis and find no significant difference between computed parcel areas and the areas written down by surveyors.
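The report does not show which test was used for this comparison. One common choice for two measurements of the same parcels is a paired test on the differences; the Python sketch below is a hypothetical illustration (`paired_difference_test` is not from the report) that computes the mean difference, the paired t statistic and a normal-approximation p-value, which is only appropriate with many parcels.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_difference_test(computed, field):
    """Compare two area measurements of the same parcels.
    Returns (mean difference, paired t statistic, two-sided p-value).
    The p-value uses a standard normal approximation, suitable only for many parcels."""
    diffs = [c - f for c, f in zip(computed, field)]
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    t = d_bar / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return d_bar, t, p
```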
At this step, we visualize the difference between computed area and field area in a spatial context. Field areas are read from the GPS device by the surveyor and transcribed manually onto a paper survey form; data entry clerks then enter them into a spreadsheet. Given how scattered the parcels are over the watersheds, we subset them by sub-watershed, one map per sub-watershed, to take a closer look when comparing the two sources of area.
We compared, in a panel visualization, the parcel areas obtained by computation and by GPS reading, aggregating the comparison at the sub-watershed level so that the information stays legible. The difference between the two methods is not significant: we fail to reject the null hypothesis and find no significant difference between computed parcel areas and GPS-reading parcel areas.
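The sub-watershed aggregation behind the panels can be sketched in Python as a sum of both area columns per sub-watershed, plus the relative difference between them. The input layout `(subwatershed, computed_ha, field_ha)` and the helper name `compare_by_subwatershed` are assumptions; the relative difference is left as `None` when the field total is zero.

```python
from collections import defaultdict


def compare_by_subwatershed(rows):
    """rows: iterable of (subwatershed, computed_ha, field_ha).
    Returns {subwatershed: (computed_total, field_total, relative_difference)},
    where relative_difference = (computed - field) / field, or None if field is 0."""
    computed, field = defaultdict(float), defaultdict(float)
    for sub, c, f in rows:
        computed[sub] += c
        field[sub] += f
    summary = {}
    for sub in sorted(computed):
        rel = (computed[sub] - field[sub]) / field[sub] if field[sub] else None
        summary[sub] = (computed[sub], field[sub], rel)
    return summary
```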