Orlenko Irina
2022-09-19
Here is a sample of the dataset:
did | date | time | lat | lon | hAccuracy |
---|---|---|---|---|---|
YjU0c2M3YT | 2019-09-02 | 14:02:20 | 1.307193 | 103.789969 | 16 |
NTVqZnByMj | 2019-09-08 | 15:17:08 | 1.306691 | 103.788064 | 40 |
Yzg3aTI0Zm | 2019-09-07 | 00:04:46 | 1.296493 | 103.788552 | 16 |
NDlwajkxZ3 | 2020-09-29 | 16:23:06 | 1.302764 | 103.789300 | 16 |
Ymh1bzJudj | 2019-09-19 | 09:48:05 | 1.299170 | 103.788830 | 10 |
YXY2ZzUwcG | 2020-09-27 | 23:48:38 | 1.306168 | 103.790950 | 17 |
Nm5rODRyMm | 2020-09-12 | 20:47:53 | 1.291254 | 103.791026 | 43 |
N2d2dDBmY3 | 2021-09-19 | 08:34:20 | 1.297297 | 103.792369 | 9 |
M3A0ZGpyM2 | 2019-09-03 | 12:46:02 | 1.297263 | 103.792366 | 34 |
aGFsczRrb3 | 2019-09-18 | 14:22:41 | 1.297277 | 103.792415 | 11 |
N2d1MHFobT | 2020-09-16 | 22:32:47 | 1.306542 | 103.790730 | 5 |
M3NtcTVob3 | 2019-09-16 | 15:21:41 | 1.298444 | 103.793557 | 18 |
NWlzOHEyaj | 2020-09-17 | 15:37:02 | 1.296259 | 103.790090 | 11 |
MjlwbDBkYW | 2020-09-03 | 17:49:38 | 1.305604 | 103.790868 | 4 |
YXNxcTlqdG | 2020-09-10 | 12:26:56 | 1.299795 | 103.787668 | 22 |
M2liM3BocX | 2021-09-17 | 23:03:59 | 1.298468 | 103.790561 | 28 |
ZGE5M2kxOG | 2020-09-08 | 19:45:47 | 1.292339 | 103.791598 | 5 |
ZTEwZnZoOX | 2020-09-02 | 01:42:49 | 1.306552 | 103.787742 | 11 |
NHA5bGwxNm | 2019-09-27 | 20:18:12 | 1.297539 | 103.787827 | 7 |
ZjJmOHZic3 | 2020-09-24 | 14:43:20 | 1.305188 | 103.788919 | 18 |
ODI5OGphcT | 2020-09-06 | 17:07:39 | 1.297297 | 103.792370 | 5514 |
NjZ1cW9pNG | 2019-09-03 | 12:15:45 | 1.304208 | 103.791564 | 9 |
NGN0cmNjMm | 2020-09-27 | 12:52:14 | 1.295670 | 103.788300 | 8 |
MmpkMXN1NT | 2020-09-07 | 06:52:40 | 1.292591 | 103.793451 | 22 |
NGQwOWNhdH | 2020-09-23 | 11:57:23 | 1.306516 | 103.790526 | 7 |
YnVlOWdtMz | 2020-09-23 | 13:56:27 | 1.306118 | 103.789510 | 10 |
YmI4ajZsbH | 2020-09-24 | 12:01:03 | 1.300471 | 103.788184 | 10 |
NWZvbmRwZz | 2020-09-23 | 13:27:16 | 1.302981 | 103.792100 | 10 |
NTQ2bTJuN2 | 2019-09-13 | 03:06:06 | 1.296743 | 103.795281 | 17 |
YTU0cjU4b2 | 2020-09-14 | 18:10:48 | 1.305662 | 103.787796 | 5 |
MmVkdjhndH | 2020-09-04 | 09:22:45 | 1.307362 | 103.789176 | 7 |
ZGFjcm5rNW | 2019-09-19 | 13:39:09 | 1.305716 | 103.790522 | 6 |
MTNzYnJjZ2 | 2019-09-13 | 17:04:53 | 1.299018 | 103.787447 | 29 |
YW1kajNsYW | 2019-09-25 | 08:25:28 | 1.303971 | 103.787800 | 5 |
Nmd2YmR1bn | 2020-09-28 | 01:06:29 | 1.297297 | 103.792370 | 5514 |
YmlxOXRidj | 2019-09-16 | 08:14:37 | 1.298590 | 103.788280 | 18 |
YzY4bGxoZT | 2019-09-20 | 05:55:26 | 1.305514 | 103.790134 | 5 |
NW1qbW02Yj | 2020-09-01 | 18:43:44 | 1.301549 | 103.787105 | 26 |
OGprMzY1bm | 2020-09-02 | 17:37:23 | 1.302220 | 103.792820 | 6 |
NWlhcGtiZj | 2019-09-24 | 20:35:48 | 1.298815 | 103.787456 | 17 |
We have four data dimensions: location, time, pings, and devices.
decimal places | degrees | distance |
---|---|---|
0 | 1.0 | 111 km |
1 | 0.1 | 11.1 km |
2 | 0.01 | 1.11 km |
3 | 0.001 | 111 m |
4 | 0.0001 | 11.1 m |
5 | 0.00001 | 1.11 m |
6 | 0.000001 | 0.111 m |
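The table above follows a single rule: each extra decimal place divides the covered distance by 10. A minimal Python sketch, assuming one degree is roughly 111 km (a reasonable approximation for latitude, and for longitude near the equator, where the sample coordinates lie); the function name `resolution_m` is illustrative only:

```python
# Assumption: one degree of arc is about 111 km on the ground.
KM_PER_DEGREE = 111.0


def resolution_m(decimal_places):
    """Metres spanned by a step of 10**-decimal_places degrees."""
    return KM_PER_DEGREE * 1000.0 * 10.0 ** (-decimal_places)


# Print the resolution for 0..6 decimal places, mirroring the table.
for d in range(7):
    print(d, f"{resolution_m(d):.3f} m")
```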
Pings are mostly given with 6 decimal places
Some pings have fewer than 6 decimal places in their coordinates
did | lat | lon | lat dp | lon dp |
---|---|---|---|---|
Ymt0c25rNm | 1.289280 | 103.789420 | 5 | 5 |
ZWoybXFkNj | 1.292877 | 103.785080 | 6 | 5 |
Ymt0c25rNm | 1.289280 | 103.789420 | 5 | 5 |
NTBnZmc5MD | 1.291510 | 103.785356 | 5 | 6 |
Ymt0c25rNm | 1.289280 | 103.789420 | 5 | 5 |
NnRrdWJlcT | 1.291309 | 103.784288 | 6 | 6 |
M2trb2RvcG | 1.291588 | 103.784485 | 6 | 6 |
OW44MGdxNW | 1.289958 | 103.790013 | 6 | 6 |
OWpzdDViZW | 1.289099 | 103.790276 | 6 | 6 |
Ymt0c25rNm | 1.289280 | 103.789420 | 5 | 5 |
Ymt0c25rNm | 1.289280 | 103.789420 | 5 | 5 |
NjRnOG1pZj | 1.290559 | 103.785370 | 6 | 5 |
Ymt0c25rNm | 1.289280 | 103.789420 | 5 | 5 |
Ymt0c25rNm | 1.289300 | 103.789444 | 4 | 6 |
OWJqdGhxY2 | 1.290833 | 103.784705 | 6 | 6 |
OW9sNjk4ZW | 1.289461 | 103.791380 | 6 | 5 |
NDl0bTViN3 | 1.287974 | 103.791340 | 6 | 5 |
NzdhZWJibj | 1.289070 | 103.791577 | 5 | 6 |
Y21kNHZvaj | 1.291490 | 103.784394 | 5 | 6 |
NGY1cDlzdT | 1.289850 | 103.787447 | 5 | 6 |
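The "lat dp" and "lon dp" columns count decimal places after dropping trailing zeros. One way to compute them is sketched below in Python, assuming the coordinates are available as strings or as floats recorded with at most 6 decimal places; the helper names are hypothetical and not taken from the original analysis:

```python
from decimal import Decimal


def decimal_places(value):
    """Decimal places of a numeric string, ignoring trailing zeros."""
    # normalize() strips trailing zeros: "1.289280" becomes 1.28928.
    exponent = Decimal(value).normalize().as_tuple().exponent
    # Whole numbers normalize to a non-negative exponent; report 0 for them.
    return max(0, -exponent)


def decimal_places_float(x, max_dp=6):
    """Same count for a float, assuming at most max_dp places were recorded."""
    return decimal_places(f"{x:.{max_dp}f}")


print(decimal_places("1.289280"), decimal_places("103.789444"))
```

Formatting the float to a fixed width first avoids counting binary floating-point noise as extra digits.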
Assume the number of decimal places lies between 2 and 6 (values with 0 or 1 decimal places are counted as 2). If the digits are uniformly distributed, the fraction of points whose longitude has exactly 6 decimal places should be 90%: the 6th decimal digit can be 0, 1, 2, …, 9, and only when it is 0 (1 case in 10) does the longitude have 5 or fewer decimal places. By the same logic, the fraction of points whose longitude has 5 decimal places should be 9% (90% of the remaining 10%), the fraction with 4 decimal places should be 0.9%, and so on.
The same logic is applied to latitude
decimal places | fraction of points |
---|---|
0 / 1 / 2 | 0.01% |
3 | 0.09% |
4 | 0.9% |
5 | 9% |
6 | 90% |
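The expected fractions in this table can be reproduced with a short Python sketch. It assumes every decimal digit is uniform on 0..9 and independent of the others, and it pools everything with 2 or fewer places into one bucket, as in the text; the function name is illustrative:

```python
def expected_fractions(max_dp=6, min_dp=2):
    """Expected share of values with exactly k decimal places."""
    fractions = {}
    for k in range(max_dp, min_dp, -1):
        trailing_zeros = max_dp - k
        # Digit k is non-zero (9/10) and all later digits are zero (1/10 each).
        fractions[k] = 0.9 * 0.1 ** trailing_zeros
    # Pooled bucket: digits min_dp+1 .. max_dp are all zero.
    fractions[min_dp] = 0.1 ** (max_dp - min_dp)
    return fractions


for k, share in sorted(expected_fractions().items()):
    print(k, f"{100 * share:.2f}%")
```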
However, we observe different frequencies. Science Park I:
lon dp | ping count | % | expected count | expected % | observed / expected |
---|---|---|---|---|---|
<=2 | 856 | 1.1 | 8 | 0.01 | 106.61 |
3 | 163 | 0.2 | 72 | 0.09 | 2.26 |
4 | 3236 | 4.0 | 723 | 0.90 | 4.48 |
5 | 17082 | 21.3 | 7226 | 9.00 | 2.36 |
6 | 58954 | 73.4 | 72262 | 90.00 | 0.82 |
lat dp | ping count | % | expected count | expected % | observed / expected |
---|---|---|---|---|---|
<=2 | 532 | 0.7 | 8 | 0.01 | 66.26 |
3 | 38 | 0.0 | 72 | 0.09 | 0.53 |
4 | 1161 | 1.4 | 723 | 0.90 | 1.61 |
5 | 9026 | 11.2 | 7226 | 9.00 | 1.25 |
6 | 69534 | 86.6 | 72262 | 90.00 | 0.96 |
Science Park II:
lon dp | ping count | % | expected count | expected % | observed / expected |
---|---|---|---|---|---|
<=2 | 397 | 0.7 | 6 | 0.01 | 67.57 |
3 | 200 | 0.3 | 53 | 0.09 | 3.78 |
4 | 2519 | 4.3 | 529 | 0.90 | 4.76 |
5 | 17166 | 29.2 | 5288 | 9.00 | 3.25 |
6 | 38469 | 65.5 | 52876 | 90.00 | 0.73 |
lat dp | ping count | % | expected count | expected % | observed / expected |
---|---|---|---|---|---|
<=2 | 341 | 0.6 | 6 | 0.01 | 58.04 |
3 | 24 | 0.0 | 53 | 0.09 | 0.45 |
4 | 1362 | 2.3 | 529 | 0.90 | 2.58 |
5 | 4934 | 8.4 | 5288 | 9.00 | 0.93 |
6 | 52090 | 88.7 | 52876 | 90.00 | 0.99 |
One North:
lon dp | ping count | % | expected count | expected % | observed / expected |
---|---|---|---|---|---|
<=2 | 27720 | 3.2 | 86 | 0.01 | 324.06 |
3 | 18397 | 2.2 | 770 | 0.09 | 23.90 |
4 | 37764 | 4.4 | 7699 | 0.90 | 4.91 |
5 | 193082 | 22.6 | 76986 | 9.00 | 2.51 |
6 | 578437 | 67.6 | 769860 | 90.00 | 0.75 |
lat dp | ping count | % | expected count | expected % | observed / expected |
---|---|---|---|---|---|
<=2 | 955 | 0.1 | 86 | 0.01 | 11.16 |
3 | 845 | 0.1 | 770 | 0.09 | 1.10 |
4 | 9460 | 1.1 | 7699 | 0.90 | 1.23 |
5 | 77512 | 9.1 | 76986 | 9.00 | 1.01 |
6 | 766628 | 89.6 | 769860 | 90.00 | 1.00 |
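The "expected count" and "observed / expected" columns follow directly from the total ping count and the expected shares. The Python sketch below recomputes them for the first longitude table above; the counts are copied from that table, and the function name `compare` is illustrative:

```python
# Observed longitude counts per decimal-place bucket (first table above).
observed = {2: 856, 3: 163, 4: 3236, 5: 17082, 6: 58954}
# Expected shares under the uniform-digit assumption.
expected_share = {2: 0.0001, 3: 0.0009, 4: 0.009, 5: 0.09, 6: 0.9}


def compare(observed, expected_share):
    """Return {dp: (expected count, observed / expected)}."""
    total = sum(observed.values())
    rows = {}
    for dp, count in observed.items():
        expected = total * expected_share[dp]
        rows[dp] = (expected, count / expected)
    return rows


rows = compare(observed, expected_share)
for dp in sorted(rows):
    expected, ratio = rows[dp]
    print(dp, round(expected), round(ratio, 2))
```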
There are too many pings with abnormally few decimal places. Their presence can be explained by:
Rounding errors due to glitches in GPS.
IP-based geolocation. Some pings were located via Wi-Fi rather than via satellite. In that case, the coordinates we have are those of the Wi-Fi router's physical LAN port, and they are imprecise.
Data doctoring.
To estimate the number of incorrect pings in our dataset, note that, if longitude and latitude are independent, the expected fraction of pings where both have 6 decimal places is 81% (90% of 90%). We attribute the difference between the expected and the observed fraction to incorrect data points.
Science Park I
lon dp | lat dp | ping count | % |
---|---|---|---|
2 | 2 | 527 | 0.7 |
2 | 4 | 325 | 0.4 |
2 | 6 | 4 | 0.0 |
3 | 4 | 5 | 0.0 |
3 | 5 | 10 | 0.0 |
3 | 6 | 148 | 0.2 |
4 | 3 | 3 | 0.0 |
4 | 4 | 33 | 0.0 |
4 | 5 | 269 | 0.3 |
4 | 6 | 2931 | 3.7 |
5 | 3 | 3 | 0.0 |
5 | 4 | 112 | 0.1 |
5 | 5 | 3511 | 4.4 |
5 | 6 | 13456 | 16.8 |
6 | 2 | 5 | 0.0 |
6 | 3 | 32 | 0.0 |
6 | 4 | 686 | 0.9 |
6 | 5 | 5236 | 6.5 |
6 | 6 | 52995 | 66.0 |
Estimated percentage of incorrect data points is 81% - 66% = 15%
Science Park II
lon dp | lat dp | ping count | % |
---|---|---|---|
2 | 2 | 340 | 0.6 |
2 | 4 | 5 | 0.0 |
2 | 5 | 3 | 0.0 |
2 | 6 | 49 | 0.1 |
3 | 4 | 1 | 0.0 |
3 | 5 | 13 | 0.0 |
3 | 6 | 186 | 0.3 |
4 | 4 | 27 | 0.0 |
4 | 5 | 239 | 0.4 |
4 | 6 | 2253 | 3.8 |
5 | 3 | 2 | 0.0 |
5 | 4 | 907 | 1.5 |
5 | 5 | 1158 | 2.0 |
5 | 6 | 15099 | 25.7 |
6 | 2 | 1 | 0.0 |
6 | 3 | 22 | 0.0 |
6 | 4 | 422 | 0.7 |
6 | 5 | 3521 | 6.0 |
6 | 6 | 34503 | 58.7 |
Estimated percentage of incorrect data points is 81% - 59% = 22%
One North
lon dp | lat dp | ping count | % |
---|---|---|---|
2 | 1 | 888 | 0.1 |
2 | 3 | 4 | 0.0 |
2 | 4 | 247 | 0.0 |
2 | 5 | 5395 | 0.6 |
2 | 6 | 21186 | 2.5 |
3 | 3 | 203 | 0.0 |
3 | 4 | 310 | 0.0 |
3 | 5 | 3147 | 0.4 |
3 | 6 | 14737 | 1.7 |
4 | 1 | 9 | 0.0 |
4 | 3 | 42 | 0.0 |
4 | 4 | 1408 | 0.2 |
4 | 5 | 3114 | 0.4 |
4 | 6 | 33191 | 3.9 |
5 | 1 | 11 | 0.0 |
5 | 3 | 83 | 0.0 |
5 | 4 | 1429 | 0.2 |
5 | 5 | 14822 | 1.7 |
5 | 6 | 176737 | 20.7 |
6 | 1 | 46 | 0.0 |
6 | 2 | 1 | 0.0 |
6 | 3 | 513 | 0.1 |
6 | 4 | 6066 | 0.7 |
6 | 5 | 51034 | 6.0 |
6 | 6 | 520777 | 60.9 |
Estimated percentage of incorrect data points is 81% - 61% = 20%
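The three estimates above can be recomputed from the tables. In the Python sketch below, the first number per site is the ping count of the (6, 6) row and the second is the sum of all ping counts in that site's table. The 81% baseline assumes latitude and longitude digits are independent; the names are illustrative:

```python
# Assumption: lat and lon digits are independent, so P(both 6 dp) = 0.9 * 0.9.
EXPECTED_BOTH_6DP = 0.9 * 0.9

# (pings with 6 dp in both coordinates, total pings), from the tables above.
sites = {
    "Science Park I": (52995, 80291),
    "Science Park II": (34503, 58751),
    "One North": (520777, 855400),
}


def estimated_incorrect_share(both_6dp, total):
    """Expected minus observed share of pings with full precision."""
    return EXPECTED_BOTH_6DP - both_6dp / total


for name, (both_6dp, total) in sites.items():
    share = estimated_incorrect_share(both_6dp, total)
    print(name, f"{100 * share:.0f}%")
```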
Now we will sort all pings into several bands:
band 2 - pings whose less precise coordinate (latitude or longitude) has 2 decimal places
band 3 - pings whose less precise coordinate has 3 decimal places
band 4 - pings whose less precise coordinate has 4 decimal places
band 5 - pings whose less precise coordinate has 5 decimal places
band 6 - pings where both latitude AND longitude have 6 decimal places
If we believe that the errors we see are just rounding errors, then we can remove every ping that has at least one coordinate with too few decimal places.
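Band assignment can be sketched in Python by giving each ping the band of its less precise coordinate, i.e. the minimum of the two decimal-place counts. The rows are the (lon dp, lat dp, ping count) values from the Science Park I table in the previous section. Aggregating them this way reproduces the Science Park I band counts in the table that follows, which suggests (but does not prove) that this is the rule used:

```python
from collections import Counter


def band(lat_dp, lon_dp):
    """Band of a ping: decimal places of its less precise coordinate."""
    return min(lat_dp, lon_dp)


# (lon dp, lat dp, ping count) copied from the Science Park I table.
rows = [
    (2, 2, 527), (2, 4, 325), (2, 6, 4),
    (3, 4, 5), (3, 5, 10), (3, 6, 148),
    (4, 3, 3), (4, 4, 33), (4, 5, 269), (4, 6, 2931),
    (5, 3, 3), (5, 4, 112), (5, 5, 3511), (5, 6, 13456),
    (6, 2, 5), (6, 3, 32), (6, 4, 686), (6, 5, 5236), (6, 6, 52995),
]

band_counts = Counter()
for lon_dp, lat_dp, n in rows:
    band_counts[band(lat_dp, lon_dp)] += n

# Filtering out "rounded" pings keeps band 6 only.
kept = band_counts[6]
print(dict(sorted(band_counts.items())), kept)
```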
Science Park I
dp | ping count | ping % | cumulative ping % | device count | device % |
---|---|---|---|---|---|
2 | 861 | 1.1 | 1.1 | 99 | 2.3 |
3 | 201 | 0.3 | 1.3 | 94 | 2.2 |
4 | 4031 | 5.0 | 6.3 | 771 | 17.9 |
5 | 22203 | 27.7 | 34.0 | 2112 | 49.0 |
6 | 52995 | 66.0 | 100.0 | 3822 | 88.8 |
Science Park II
dp | ping count | ping % | cumulative ping % | device count | device % |
---|---|---|---|---|---|
2 | 398 | 0.7 | 0.7 | 66 | 2.6 |
3 | 224 | 0.4 | 1.1 | 71 | 2.8 |
4 | 3848 | 6.5 | 7.6 | 565 | 22.5 |
5 | 19778 | 33.7 | 41.3 | 1373 | 54.7 |
6 | 34503 | 58.7 | 100.0 | 2241 | 89.3 |
One North
dp | ping count | ping % | cumulative ping % | device count | device % |
---|---|---|---|---|---|
1 | 954 | 0.1 | 0.1 | 117 | 0.3 |
2 | 26833 | 3.1 | 3.2 | 2990 | 6.6 |
3 | 19035 | 2.2 | 5.5 | 3255 | 7.1 |
4 | 45208 | 5.3 | 10.8 | 7900 | 17.3 |
5 | 242593 | 28.4 | 39.1 | 22322 | 49.0 |
6 | 520777 | 60.9 | 100.0 | 37697 | 82.7 |
Need a plan!
Count the unique points from which each device pinged. What counts as a unique point depends on the chosen resolution: e.g., if a unique point is a pair (lon, lat) with both coordinates rounded to the 4th decimal place, then we are essentially counting the number of distinct \(11\times 11\) m squares that the device has been to.
Stationary devices are devices that pinged from very few \(11\times 11\) m squares.
Devices that always pinged from a single \(11\times 11\) cm square were probably geolocated by IP rather than by satellite.
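The counting step of this plan can be sketched in Python as follows. The device ids `a` and `b` are hypothetical, the coordinates are borrowed from the sample tables for illustration, and `unique_cells` is an illustrative name, not the function used to build the tables below:

```python
from collections import defaultdict


def unique_cells(pings, decimals):
    """Count distinct rounded (lat, lon) cells per device.

    pings: iterable of (did, lat, lon); decimals: rounding resolution,
    e.g. 4 for roughly 11 x 11 m squares, 6 for roughly 11 x 11 cm.
    """
    cells = defaultdict(set)
    for did, lat, lon in pings:
        cells[did].add((round(lat, decimals), round(lon, decimals)))
    return {did: len(c) for did, c in cells.items()}


# Hypothetical pings for two devices.
pings = [
    ("a", 1.289280, 103.789420),
    ("a", 1.289300, 103.789444),
    ("a", 1.291510, 103.785356),
    ("b", 1.297297, 103.792370),
    ("b", 1.297297, 103.792370),
]

for decimals in (2, 4, 6):
    print(decimals, unique_cells(pings, decimals))
```

A device whose count stays at 1 even at 6 decimal places, like `b` here, matches the pattern described above for IP-based geolocation.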
did | dp 6 |
---|---|
NDE1ZWx2cjhvMmRudDpiamQxZmJkYTE0ajI5 | 4 |
YWZqdWhhNzZnZDE3djo5MjE1ZmRjM29oY2Ew | 10 |
Mm5qdTVwNTU3cmwxcTo0OHVvMG82bWo0N2pz | 4 |
YXBsZWE5ajU2cWQ4bTo2bWRpNmE1NDB1aXF2 | 16 |
OTZxa2JobzU5aTI2cjo0ZG1kbWgzNGt0OWhh | 181 |
Y284a2dpMXRjNTcxMTpmazdhc2lyM21kcjNl | 2 |
NmE5cXE0c3FhZmNxbDo1NzUyZmlwc2pscnNv | 34 |
ZGw0dHRsb2NnNnAyajo0bmRhODhwZzkwanNs | 3 |
OHJyNTJuNmJydGNsOTo1cTVnM3E5bjdvYW5q | 3 |
MmEza3MzZXJwbmkzbzpjMW9vNGhmcTVobXRy | 6 |
YnBqYjJyc3ZudGgxNzpiZTRrYmE2ODM1Z2Js | 1 |
M25yMHRwdmRpYWwzcjo0dWYxMWhsbHU0MW03 | 38 |
ZjhibTZlZzRyNTJwaDo5YzhjOWp1cDRsYnJx | 2 |
NmFmNzMxZW9qbGNjaTo4cXJiaG02a2MwYWVy | 1 |
cG00aDcwYWJpdTYzOjNzanNxYTlxamo0Z2M= | 1 |
N3BtbzlvOGFzZnU0cTozN2dyZ2s4ODc0MHY5 | 1 |
ZnUzcWw2ZmJ1MTMzMTo0Mm02b3E2cW02c2Rj | 5 |
NjdiYmx0dWV0NzE4aTplbmxxaXQ5NW92dXFr | 2 |
Y242N2k0ZDBibHFnZjo1cmVqMjhkM21xdjJy | 1 |
NXRicHR1cnBwbWVsazpkcmFrZTNwcjFyNGZr | 1 |
did | 1100 m |
---|---|
Zml2c2IzZHV0cnNwZToxbHE2amRmaW1yczNu | 1 |
MzZmYmo4dmZlMjhwYzozNmNqbHIybXBka2Jy | 1 |
N2gydDg2MjBiY2czaTpmNHU4cjZkMjQ4NHR2 | 1 |
MzByNGVuajhvbjk0OmM0cnFwc3QxamZsM2Y= | 1 |
N3Nqa2ptYW9qbW9najpkNTQzc2cwcWU1bmFz | 1 |
YmMybTZsN25ibHRscDo1MmtyZjg4ZDRvb2Zx | 2 |
Y3NmMGcyMnU4M2piOTo3dWl0Mjd2Nmh0YnRi | 1 |
NDBvMmc0M3NnbXFtYzozZHFhdnFjc3BxbG9l | 1 |
NnM1M2d1OGVlMmdxdTpldjJqOGo4bm5obWt2 | 1 |
OWlmcG1mZzJmbmRmMDpkZzRwbjNuOGJpMnA2 | 1 |
NHYyNHF0MzM5b2trMDpmOTYwazJqcDU1aGU2 | 1 |
ZmMwOThtbDZoZzN2dDozNHB0ZG5lOXJpcHB2 | 1 |
MmNsN3MyaGgzMzR0NjppZGI1OWc4dDloZjg= | 1 |
OTZxa2JobzU5aTI2cjo0ZG1kbWgzNGt0OWhh | 1 |
NGozdnEyY2Ixc3ZmaDo3YW84cTdncGEzbnBu | 1 |
MzRxdG9uMGRobGZtaDpjZXBycXNsN2RkaDho | 1 |
ZmJmMXVpNWNvcGpsODo1MWs4MWswZzEwbTdu | 1 |
YWMydDh2dWNhdmNpazozNmNndjZrNmxiM3Nw | 1 |
NWUzOW8xN2hyOXJobzozMzF1dXJodDRnOGM2 | 1 |
NWU3dnRyZDR2M2w5ZjppNmxqMDh1YWR0cnQ= | 1 |
Science Park
did | 1100 m | 110 m | 11 m | 1.1 m | 0.11 m | ping count |
---|---|---|---|---|---|---|
OHJ2NXNpY21lMzRwZzo0Ymd1OTZqMzhvYXJs | 2 | 6 | 27 | 200 | 299 | 532 |
YWtmbW9ucWdnZmczajphcHF1a2p0Y2FsYTYw | 1 | 5 | 34 | 332 | 453 | 636 |
N3ZmbmNidm1lOXQ3cjo2OG9lZHV2dnBkcXE1 | 1 | 7 | 67 | 324 | 499 | 501 |
MzUzc2FvdjFyOW5xazpldGJhZzJuNGx2dXNx | 2 | 6 | 35 | 301 | 501 | 599 |
Nzg2ZXN1bTdyMDF2aTo3Z2FmNG1wYms3OGhz | 2 | 7 | 68 | 433 | 502 | 506 |
ODQ1bGczdWhjcGRtMjpkc2wwMW9xZm9jdW81 | 2 | 6 | 87 | 477 | 522 | 522 |
YTJla2RlMTQ5Y29pMTo2MGdwN2ZhcTNmZms2 | 2 | 13 | 65 | 383 | 546 | 610 |
ZGttZGJ2dWowNGFxODo1OGQ5MjE5NnBnYTQ4 | 2 | 17 | 71 | 371 | 583 | 588 |
ZTlxdWRtMjhtMG1xYjo4Y2ozZ2xla2ZyNGNu | 1 | 4 | 27 | 256 | 613 | 624 |
YnA4dnZqdnAxY2prMTpkNTkwZWVvM3BmOHVj | 1 | 1 | 4 | 118 | 642 | 662 |
YnV0czdjdTBkZmxnbzozMnYwM2dqbjdpc3Fw | 2 | 10 | 34 | 333 | 658 | 665 |
ZmFscm5xcmk3bmRwaTpkam9lZ2F2aDExZ2Jt | 1 | 15 | 159 | 602 | 664 | 664 |
YWhnbWJmNWVwbmFsczphb281MHQwMWxtMXZz | 2 | 15 | 122 | 456 | 686 | 690 |
YXJsZ3ZrZXVnbzlucDphNHZzbTNzZXNscjlx | 1 | 1 | 2 | 121 | 736 | 762 |
OW9oMHMwbWFxOXNzbjoxaXMxOXFxODRtdm1y | 2 | 10 | 49 | 427 | 751 | 762 |
NjRnOG1pZjN2MW9qczpjY2VhM3JoZjhudTYx | 2 | 13 | 60 | 487 | 796 | 1013 |
ZGVlczRkcWUyNDRzMzo0dWZ0bWZlYnM1N3U4 | 2 | 4 | 38 | 528 | 818 | 820 |
Nm5xaXEzZzdnMHFzMzpiOTQ2NmszMGI0b2U5 | 1 | 13 | 182 | 688 | 966 | 969 |
NGxpNGllMmNmYjZhYzo2aWljMmg1NzloMjRp | 2 | 10 | 63 | 476 | 1011 | 1028 |
Ymt0c25rNmc2NDZwdjppcXJlZDVvMG92azk= | 1 | 1 | 4 | 121 | 1968 | 4434 |
Y2trMGk3amcwamNrcDo0NXMxMDlmanJhY211 | 2 | 5 | 42 | 617 | 2017 | 3727 |
Nm05cHFnOTNvaHA2YjozczlnZXI0NGpmdTBk | 1 | 13 | 130 | 758 | 2100 | 2168 |
Y21kNHZvajJzMDA2NjozN3U5YjRqbTY1Z29q | 2 | 4 | 18 | 359 | 2593 | 2742 |
One North