Birders as Agents of Re-Territorialization

Exploring the Phenomenon through eBird

Travis Bott

The purpose of this project is to get an initial spatial-quantitative handle on birders in Colombia using data from eBird, a popular app among avian enthusiasts. The app is used for both personal and scientific purposes; individuals use it to keep track of their personal lists—what they’ve seen and where they’ve seen it—and researchers treat it as citizen-science data. Whereas most researchers take advantage of this data to track species, I use it to focus on the people making the observations—where they’re going, and how many of them are going there, and how those places may overlap with the post-conflict process. Hopefully, if things go according to plan, we’ll see that there is at least some meat on the bone of my idea that birders are a significant component of this initital wave of ‘re-territorialization.’

There are of course limits to this data. If we see trends like increasing unique observers, that may have less to do with the sheer numnber of people venturing into post-conflict zones than it does with increased app popularity, cell phone usage, or a number of other things. However, I feel that combining eBird usership data with interview based evidence–conversations that I’ve had with birders and birding guides on the groun in Colombia–we can get a reasonable read on whether or not individuals are traveling to certain parts of Colombia in notable numbers.

Initial Packages

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  1.4.2     ✔ dplyr   0.7.7
## ✔ tidyr   0.8.1     ✔ stringr 1.3.1
## ✔ readr   1.1.1     ✔ forcats 0.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggmap)
library(tmap)
library(rgdal)
## Loading required package: sp
## rgdal: version: 1.3-6, (SVN revision 773)
##  Geospatial Data Abstraction Library extensions to R successfully loaded
##  Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
##  Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/gdal
##  GDAL binary built with GEOS: FALSE 
##  Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
##  Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/proj
##  Linking to sp version: 1.3-1
library(sp)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(RColorBrewer)
library(tmaptools)
library(gganimate)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggmap':
## 
##     wind
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout

Data Import…

To get eBird data you merely need to submit a request on their website. You give them a short abstract of what your project is about, and, once approved, you can use their interface to do some initial filtering. I went ahead and requested all observations they have in Colombia. A day or so goes by, then you get a link to download your data, which shows up as a .csv file.

eBird_base <- read_tsv("~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt")
## Warning: Missing column names filled in: 'X47' [47]
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   `LAST EDITED DATE` = col_datetime(format = ""),
##   `TAXONOMIC ORDER` = col_integer(),
##   `OBSERVATION COUNT` = col_integer(),
##   LATITUDE = col_double(),
##   LONGITUDE = col_double(),
##   `OBSERVATION DATE` = col_date(format = ""),
##   `TIME OBSERVATIONS STARTED` = col_time(format = ""),
##   `DURATION MINUTES` = col_integer(),
##   `EFFORT DISTANCE KM` = col_double(),
##   `NUMBER OBSERVERS` = col_integer(),
##   `ALL SPECIES REPORTED` = col_integer(),
##   `HAS MEDIA` = col_integer(),
##   APPROVED = col_integer(),
##   REVIEWED = col_integer()
## )
## See spec(...) for full column specifications.
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 409827 parsing failures.
## row # A tibble: 5 x 5 col     row col         expected  actual file                                  expected   <int> <chr>       <chr>     <chr>  <chr>                                 actual 1  1431 OBSERVATIO… an integ… X      '~/Desktop/Work/2018-19/1 Fall/GEOG … file 2  4481 OBSERVATIO… an integ… X      '~/Desktop/Work/2018-19/1 Fall/GEOG … row 3  4494 OBSERVATIO… an integ… X      '~/Desktop/Work/2018-19/1 Fall/GEOG … col 4  4496 OBSERVATIO… an integ… X      '~/Desktop/Work/2018-19/1 Fall/GEOG … expected 5  4497 OBSERVATIO… an integ… X      '~/Desktop/Work/2018-19/1 Fall/GEOG …
## ... ................. ... .......................................................................... ........ .......................................................................... ...... .......................................................................... .... .......................................................................... ... .......................................................................... ... .......................................................................... ........ ..........................................................................
## See problems(...) for more details.

#####…and having a look

str(eBird_base)
## Classes 'tbl_df', 'tbl' and 'data.frame':    2905351 obs. of  47 variables:
##  $ GLOBAL UNIQUE IDENTIFIER    : chr  "URN:CornellLabOfOrnithology:EBIRD:OBS169258184" "URN:CornellLabOfOrnithology:EBIRD:OBS169276456" "URN:CornellLabOfOrnithology:EBIRD:OBS169274032" "URN:CornellLabOfOrnithology:EBIRD:OBS169260295" ...
##  $ LAST EDITED DATE            : POSIXct, format: "2015-07-20 22:22:34" "2015-07-21 12:24:08" ...
##  $ TAXONOMIC ORDER             : int  16271 12710 24218 33111 28026 28026 28026 16354 12253 6469 ...
##  $ CATEGORY                    : chr  "species" "species" "species" "species" ...
##  $ COMMON NAME                 : chr  "Amazonian Umbrellabird" "Barred Antshrike" "Black-capped Donacobius" "Blue-black Grassquit" ...
##  $ SCIENTIFIC NAME             : chr  "Cephalopterus ornatus" "Thamnophilus doliatus" "Donacobius atricapilla" "Volatinia jacarina" ...
##  $ SUBSPECIES COMMON NAME      : chr  NA NA NA NA ...
##  $ SUBSPECIES SCIENTIFIC NAME  : chr  NA NA NA NA ...
##  $ OBSERVATION COUNT           : int  3 1 2 3 4 1 2 1 1 5 ...
##  $ BREEDING BIRD ATLAS CODE    : chr  NA NA NA NA ...
##  $ BREEDING BIRD ATLAS CATEGORY: chr  NA NA NA NA ...
##  $ AGE/SEX                     : chr  NA NA NA NA ...
##  $ COUNTRY                     : chr  "Colombia" "Colombia" "Colombia" "Colombia" ...
##  $ COUNTRY CODE                : chr  "CO" "CO" "CO" "CO" ...
##  $ STATE                       : chr  "Amazonas" "Amazonas" "Amazonas" "Amazonas" ...
##  $ STATE CODE                  : chr  "CO-AMA" "CO-AMA" "CO-AMA" "CO-AMA" ...
##  $ COUNTY                      : chr  NA NA NA NA ...
##  $ COUNTY CODE                 : chr  NA NA NA NA ...
##  $ IBA CODE                    : chr  NA NA NA NA ...
##  $ BCR CODE                    : chr  NA NA NA NA ...
##  $ USFWS CODE                  : chr  NA NA NA NA ...
##  $ ATLAS BLOCK                 : chr  NA NA NA NA ...
##  $ LOCALITY                    : chr  "Rio Amazonas--Isla de los Micos (Monkey Island)" "COLOMBIA: Amazonas; Leticia outskirts" "COLOMBIA: Amazonas; Leticia to Km 17" "COLOMBIA: Amazonas; Leticia outskirts" ...
##  $ LOCALITY ID                 : chr  "L881399" "L1783841" "L1802270" "L1783841" ...
##  $ LOCALITY TYPE               : chr  "H" "P" "P" "P" ...
##  $ LATITUDE                    : num  -4.04 -4.2 -4.09 -4.2 -4.2 ...
##  $ LONGITUDE                   : num  -70.1 -69.9 -70 -69.9 -69.9 ...
##  $ OBSERVATION DATE            : Date, format: "1973-08-13" "1973-08-25" ...
##  $ TIME OBSERVATIONS STARTED   : 'hms' num  05:45:00 07:15:00 08:15:00 16:45:00 ...
##   ..- attr(*, "units")= chr "secs"
##  $ OBSERVER ID                 : chr  "obsr30312" "obsr30312" "obsr30312" "obsr30312" ...
##  $ SAMPLING EVENT IDENTIFIER   : chr  "S12011239" "S12012365" "S12012226" "S12011388" ...
##  $ PROTOCOL TYPE               : chr  "Traveling" "Traveling" "Traveling" "Traveling" ...
##  $ PROTOCOL CODE               : chr  "P22" "P22" "P22" "P22" ...
##  $ PROJECT CODE                : chr  "EBIRD" "EBIRD" "EBIRD" "EBIRD" ...
##  $ DURATION MINUTES            : int  180 135 315 60 60 135 315 NA 315 60 ...
##  $ EFFORT DISTANCE KM          : num  1 2 17 1 1 2 17 NA 17 1 ...
##  $ EFFORT AREA HA              : chr  NA NA NA NA ...
##  $ NUMBER OBSERVERS            : int  3 2 1 1 1 2 1 NA 1 1 ...
##  $ ALL SPECIES REPORTED        : int  1 1 1 1 1 1 1 0 1 1 ...
##  $ GROUP IDENTIFIER            : chr  NA NA NA NA ...
##  $ HAS MEDIA                   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ APPROVED                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ REVIEWED                    : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ REASON                      : chr  NA NA NA NA ...
##  $ TRIP COMMENTS               : chr  "with Brian Meyers, Elsie Ritchie.  At this point, I knew few voices, so these are largely visual detections only." "from Parador Ticuna to outskirts of town. 80-85, clear, calm" "trucked to KM 17, then walked back (0915-1230) on road to Km 11, and then return; 45 min spent in primary fores"| __truncated__ NA ...
##  $ SPECIES COMMENTS            : chr  NA NA NA NA ...
##  $ X47                         : chr  NA NA NA NA ...
##  - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 409827 obs. of  5 variables:
##   ..$ row     : int  1431 4481 4494 4496 4497 4506 4507 4514 4516 4518 ...
##   ..$ col     : chr  "OBSERVATION COUNT" "OBSERVATION COUNT" "OBSERVATION COUNT" "OBSERVATION COUNT" ...
##   ..$ expected: chr  "an integer" "an integer" "an integer" "an integer" ...
##   ..$ actual  : chr  "X" "X" "X" "X" ...
##   ..$ file    : chr  "'~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt'" "'~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt'" "'~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt'" "'~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt'" ...
##  - attr(*, "spec")=List of 2
##   ..$ cols   :List of 47
##   .. ..$ GLOBAL UNIQUE IDENTIFIER    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ LAST EDITED DATE            :List of 1
##   .. .. ..$ format: chr ""
##   .. .. ..- attr(*, "class")= chr  "collector_datetime" "collector"
##   .. ..$ TAXONOMIC ORDER             : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ CATEGORY                    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ COMMON NAME                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ SCIENTIFIC NAME             : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ SUBSPECIES COMMON NAME      : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ SUBSPECIES SCIENTIFIC NAME  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ OBSERVATION COUNT           : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ BREEDING BIRD ATLAS CODE    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ BREEDING BIRD ATLAS CATEGORY: list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ AGE/SEX                     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ COUNTRY                     : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ COUNTRY CODE                : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ STATE                       : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ STATE CODE                  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ COUNTY                      : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ COUNTY CODE                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ IBA CODE                    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ BCR CODE                    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ USFWS CODE                  : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ ATLAS BLOCK                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ LOCALITY                    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ LOCALITY ID                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ LOCALITY TYPE               : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ LATITUDE                    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ LONGITUDE                   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ OBSERVATION DATE            :List of 1
##   .. .. ..$ format: chr ""
##   .. .. ..- attr(*, "class")= chr  "collector_date" "collector"
##   .. ..$ TIME OBSERVATIONS STARTED   :List of 1
##   .. .. ..$ format: chr ""
##   .. .. ..- attr(*, "class")= chr  "collector_time" "collector"
##   .. ..$ OBSERVER ID                 : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ SAMPLING EVENT IDENTIFIER   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PROTOCOL TYPE               : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PROTOCOL CODE               : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ PROJECT CODE                : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ DURATION MINUTES            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ EFFORT DISTANCE KM          : list()
##   .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
##   .. ..$ EFFORT AREA HA              : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ NUMBER OBSERVERS            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ ALL SPECIES REPORTED        : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ GROUP IDENTIFIER            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ HAS MEDIA                   : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ APPROVED                    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ REVIEWED                    : list()
##   .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
##   .. ..$ REASON                      : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ TRIP COMMENTS               : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ SPECIES COMMENTS            : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   .. ..$ X47                         : list()
##   .. .. ..- attr(*, "class")= chr  "collector_character" "collector"
##   ..$ default: list()
##   .. ..- attr(*, "class")= chr  "collector_guess" "collector"
##   ..- attr(*, "class")= chr "col_spec"

…and…

head(eBird_base)
## # A tibble: 6 x 47
##   `GLOBAL UNIQUE … `LAST EDITED DATE`  `TAXONOMIC ORDE… CATEGORY
##   <chr>            <dttm>                         <int> <chr>   
## 1 URN:CornellLabO… 2015-07-20 22:22:34            16271 species 
## 2 URN:CornellLabO… 2015-07-21 12:24:08            12710 species 
## 3 URN:CornellLabO… 2015-08-21 09:29:12            24218 species 
## 4 URN:CornellLabO… 2015-07-21 12:24:08            33111 species 
## 5 URN:CornellLabO… 2015-07-21 12:24:08            28026 species 
## 6 URN:CornellLabO… 2015-07-21 12:24:08            28026 species 
## # ... with 43 more variables: `COMMON NAME` <chr>, `SCIENTIFIC
## #   NAME` <chr>, `SUBSPECIES COMMON NAME` <chr>, `SUBSPECIES SCIENTIFIC
## #   NAME` <chr>, `OBSERVATION COUNT` <int>, `BREEDING BIRD ATLAS
## #   CODE` <chr>, `BREEDING BIRD ATLAS CATEGORY` <chr>, `AGE/SEX` <chr>,
## #   COUNTRY <chr>, `COUNTRY CODE` <chr>, STATE <chr>, `STATE CODE` <chr>,
## #   COUNTY <chr>, `COUNTY CODE` <chr>, `IBA CODE` <chr>, `BCR CODE` <chr>,
## #   `USFWS CODE` <chr>, `ATLAS BLOCK` <chr>, LOCALITY <chr>, `LOCALITY
## #   ID` <chr>, `LOCALITY TYPE` <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## #   `OBSERVATION DATE` <date>, `TIME OBSERVATIONS STARTED` <time>,
## #   `OBSERVER ID` <chr>, `SAMPLING EVENT IDENTIFIER` <chr>, `PROTOCOL
## #   TYPE` <chr>, `PROTOCOL CODE` <chr>, `PROJECT CODE` <chr>, `DURATION
## #   MINUTES` <int>, `EFFORT DISTANCE KM` <dbl>, `EFFORT AREA HA` <chr>,
## #   `NUMBER OBSERVERS` <int>, `ALL SPECIES REPORTED` <int>, `GROUP
## #   IDENTIFIER` <chr>, `HAS MEDIA` <int>, APPROVED <int>, REVIEWED <int>,
## #   REASON <chr>, `TRIP COMMENTS` <chr>, `SPECIES COMMENTS` <chr>,
## #   X47 <chr>

…so this is pretty cool! There’s tons of stuff here, and this data would be useuful to a whole bunch of different researchers. But not all of it is useful to me. So my first order of business is 1) drop the things I don’t need to make this thing more manageable and 2) reformat the column titles so I’m not in backtick hell. So…

eBird_cleaner <- eBird_base %>% 
  select(-c(`LAST EDITED DATE`, `TAXONOMIC ORDER`, `CATEGORY`, `SUBSPECIES COMMON NAME`, `SUBSPECIES SCIENTIFIC NAME`, `BREEDING BIRD ATLAS CODE`, `BREEDING BIRD ATLAS CATEGORY`, `AGE/SEX`, `COUNTY`, `COUNTY CODE`, `IBA CODE`, `BCR CODE`, `USFWS CODE`, `ATLAS BLOCK`, `TIME OBSERVATIONS STARTED`, `DURATION MINUTES`, `EFFORT AREA HA`, `ALL SPECIES REPORTED`, `HAS MEDIA`, `APPROVED`, `REVIEWED`, `REASON`, `SPECIES COMMENTS`, `X47`, `PROJECT CODE`, `GROUP IDENTIFIER`, `TRIP COMMENTS`, `PROTOCOL TYPE`, `OBSERVATION COUNT`, `GLOBAL UNIQUE IDENTIFIER`))
names(eBird_cleaner) %<>% str_replace_all("\\s","_") %>% tolower

I clearly didn’t do this in the most efficient way possible. It would have been much smarter to select by column numbers, but honestly I had already made the mistake in a previous ‘draft’ so might has well go with ctrl+c and ctrl+v. Likewise, there are lots of things I still don’t need, like species names, etc., but it felt better to still have a more complete-feeling dataset. But we can see that it does look much better and more manageable:

head(eBird_cleaner)
## # A tibble: 6 x 17
##   common_name scientific_name country country_code state state_code
##   <chr>       <chr>           <chr>   <chr>        <chr> <chr>     
## 1 Amazonian … Cephalopterus … Colomb… CO           Amaz… CO-AMA    
## 2 Barred Ant… Thamnophilus d… Colomb… CO           Amaz… CO-AMA    
## 3 Black-capp… Donacobius atr… Colomb… CO           Amaz… CO-AMA    
## 4 Blue-black… Volatinia jaca… Colomb… CO           Amaz… CO-AMA    
## 5 Black-bill… Turdus ignobil… Colomb… CO           Amaz… CO-AMA    
## 6 Black-bill… Turdus ignobil… Colomb… CO           Amaz… CO-AMA    
## # ... with 11 more variables: locality <chr>, locality_id <chr>,
## #   locality_type <chr>, latitude <dbl>, longitude <dbl>,
## #   observation_date <date>, observer_id <chr>,
## #   sampling_event_identifier <chr>, protocol_code <chr>,
## #   effort_distance_km <dbl>, number_observers <int>

One thing I notice, though, is that the dates start in 1973. That’s too far back. Individuals can backdate sitings in eBird, which is why this information exists, but it’s not relevant to us, and has to be dropped. I chose 1998 because that’s essentially where the records become contiguous.

eBird_cleaner <- eBird_cleaner %>% 
  subset(observation_date > "1998-01-01")

Data Manipulation and Visualization

So where do we begin? I have a dataframe with just shy of 2.9 million observations, each of which has locational data. So we know we have something to work with. I also have a feeling plotting 2.9 million observations isn’t going to look very good, and we’re going to have to chop things up to make it meaningful. But I want to start having a look anyways. So here’s what I’m going to do: first, grab a map of Colombia, and second, subset our data by points in relevant FARC-prone states, and then choose only the last few years, which should hopefully give us some kind of rough idea of the most recent incursions.

First, a map of Colombia (which I do by subsetting a world map)…

map_world <- map_data("world")
map_colombia <- map_world %>% 
  subset(region %in% "Colombia")

…and then chopping the data into a manageable piece. Since I’m interested in conflict areas, I select by a list of ‘conflict-prone’ states, which are all the state implicated in CTEP, which I explain below.

target <- c("Cauca", "Arauca", "Antioquia", "Norte de Santander", "Caquetá", "Chocó", "Meta",  "Bolívar", "Nariño", "Putumayo", "Cesar", "Córdoba", "Tolima")
eBird_chopped <- eBird_cleaner %>% 
  filter(state %in% target) %>% 
  subset(observation_date > "2016-01-01")

…and then having a look….

ggplot() +
  geom_polygon(data = map_world, aes(x = long, y = lat, group = group), color = "gray60", fill = "gray84") +
  geom_polygon(data = map_colombia, aes(x = long, y = lat, group = group), color = "gray30", fill = "cornsilk2") +
  geom_point(data = eBird_chopped, aes(x = longitude, y = latitude, color = state), size = .25) +
  coord_quickmap(xlim = c(-80.5, -66), ylim = c(-4.75, 12.75)) +
  theme(panel.background = element_rect(fill = "lightcyan2"),
        panel.border = element_rect(color = "black", size = 1, fill = NA),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        legend.title = element_blank()) +
  ggtitle("Having a look at recent observations...") 

Lesson learned: coord_quickmap() is necessary for a problem-free zoom. coord_map() resulted in odd lines all over the place.

But as we can see above, this is interesting, but perhaps not everything we need. First, there’s a spatial mismatch problem. We can group things by state, but that’s a spatially incomplete view of the phenomenon in question. If we’re interested in an phenomenon that occurs across the Oregon/California border, aggregating the data by the state won’t give us an accurrate view. The ‘localities,’ likewise, in this data set aren’t fine-grained enough. This means I’m going to need to add some additional spatial information, something best done outside of R. But don’t worry, we’ll be back.

CTEP

Circunscripciones Transitorias Especiales de Paz

The conflcit that the FARC launced more than half a century ago sprung from a lack of rural representation and incorporation. As part of the peace accord, the government and FARC reprsentatives agreed to pay special attention to areas with long histories of conflict. To do so, they agreed upon special administrative zones (CTEP, in their Spanish acronym), which would send additional representative members to congress as well as serve as territorial circumscriptions singled out for additional funding and investment opportunities. These areas, for our purposes, are the post-conflict areas. And they’re the the political-spatial demarcations we need to measure our phenomenon by. To do so, I relied on QGIS, which, combined with municipality-level shapefiles from Diva-GIS, allowed me to make a CTEP shapefile. I then used that CTEP shapefile to clip my eBird data and adjust the attribute table to reflect which circusmcription, along with state and municipality, observations fall in.

They look like this:

map_co_shp <- readOGR("Shapefiles/Colombia Admin/Col_0/COL_adm0.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/travisbott/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/eBird/Shapefiles/Colombia Admin/Col_0/COL_adm0.shp", layer: "COL_adm0"
## with 1 features
## It has 70 fields
## Integer64 fields read as strings:  ID_0 OBJECTID_1
map_col_muni_shp <- readOGR("Shapefiles/Colombia Admin/Col_2/COL_adm2.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/travisbott/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/eBird/Shapefiles/Colombia Admin/Col_2/COL_adm2.shp", layer: "COL_adm2"
## with 1065 features
## It has 11 fields
## Integer64 fields read as strings:  ID_0 ID_1 ID_2
map_CTEP_shp <- readOGR("Shapefiles/CTEP_Merged/CTEP_Merged.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/travisbott/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/eBird/Shapefiles/CTEP_Merged/CTEP_Merged.shp", layer: "CTEP_Merged"
## with 160 features
## It has 14 fields
## Integer64 fields read as strings:  ID_0 ID_1 ID_2

At this point I learn that the sp/RGDAL package, from which readOGR is based, is incredibly slow. It produces a very complete spatial object but it does not handle large shapefiles well. The file below, which is essentially a shapefile of my entire eBird set, made it, but my clipped files did not. For the sake of not killing myself at export time, I’m shifting to an sf object.

points_obs_all <- read_sf("Shapefiles/ebird_cleaned_joined/ebird_cleaned_joined.shp")

For aesthetic purposes, I want background and context for my maps. To do so, I import a countries shapefile I happen to have lying around, I make an extent object (basically a matrix of x/y longitude/latitude values), which I can then use in the crop() function to clip down to a manageable size.

library(raster)
## 
## Attaching package: 'raster'
## The following object is masked from 'package:plotly':
## 
##     select
## The following object is masked from 'package:gganimate':
## 
##     animate
## The following object is masked from 'package:dplyr':
## 
##     select
## The following object is masked from 'package:tidyr':
## 
##     extract
map_world_sf <- readOGR("Shapefiles/countries/ne_10m_admin_0_countries.shp")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/travisbott/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/eBird/Shapefiles/countries/ne_10m_admin_0_countries.shp", layer: "ne_10m_admin_0_countries"
## with 255 features
## It has 94 fields
## Integer64 fields read as strings:  POP_EST NE_ID
extent_object <- extent(-80.5, -66, -4.75, 12.75)
map_world_cropped <- crop(map_world_sf, extent_object)
And then, some mapping…

This first map represents the work I did in QGIS, paring down municipalities to just the special CTEP jurisdictions I’m interested in.

tm_shape(map_world_cropped) +
  tm_fill(col = "gray90") +
  tm_borders("grey70") +
tm_shape(map_co_shp) +
  tm_fill(col = "gray80", legend.show = FALSE) +
  tm_borders("grey40") +
tm_shape(map_col_muni_shp) +
  tm_fill(col = "gray80", legend.show = FALSE) +
  tm_borders("grey70", lwd = .3) +
tm_shape(map_CTEP_shp) +
  tm_polygons(col = "layer", palette = brewer.pal("Dark2", n = 8), legend.show = FALSE) +
  tm_borders("grey80") +
tm_layout(main.title = "CTEP, Colombia", bg.color = "lightcyan", inner.margins = c(0, 0, 0, 0))
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).

Next, I can start thinking about more spatial analysis vis-a-vis eBird data. I’ll start by seeing where the observations fall. I drop the background municipality shapefile that’s in the above map as it’s not really necessary and this is already a big operation. I keep it above, though, to show how these municipalities are grouped.

tm_shape(map_world_cropped) +
  tm_fill(col = "gray90") +
  tm_borders("grey70") +
tm_shape(map_co_shp) +
  tm_fill(col = "gray80", legend.show = FALSE) +
  tm_borders("grey40") +
tm_shape(map_CTEP_shp) +
  tm_polygons(col = "layer", palette = brewer.pal("Dark2", n = 8), legend.show = FALSE) +
  tm_borders("grey80") +
tm_shape(points_obs_all) +
  tm_dots(col = "coral1", size = .075, alpha = .4, legend.show = FALSE) +
tm_layout(main.title = "All eBird Observations", title.position = c("LEFT", "BOTTOM"), title.bg.color = "white", title.bg.alpha = .5, bg.color = "lightcyan", inner.margins = c(0, 0, 0, 0))
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).

Turns out, all over the place. So, back to QGIS…The internet suggests that it’s definitely possible to perform a clip operation in R, but I chose the path of least resistance. And I’d do it again if I had to, see (which should be said in 1920’s mobster voice, cigar in mouth). I clipped my eBird obserations by the new jurisdictions, and re-imported that shapefile as an sf object (readOGR could not handle this file, despite it being smaller than the original file, used above. I think it was something that happened during clipping).

points_obs_CTEP <- read_sf("Shapefiles/Obs_by_CTEP_1998_Joined/Obs_by_CTEP_1998_Joined.shp")

Now, let’s only those observations that fall within our areas of interest…

tm_shape(map_world_cropped) +
  tm_fill(col = "gray90") +
  tm_borders("grey70") +
tm_shape(map_co_shp) +
  tm_fill(col = "gray85", legend.show = FALSE) +
  tm_borders("grey40") +
tm_shape(map_CTEP_shp) +
  tm_polygons(col = "layer", palette = brewer.pal("Dark2", n = 8), legend.show = FALSE) +
  tm_borders("grey80") +
tm_shape(points_obs_CTEP) +
  tm_dots(col = "coral1", size = .075, alpha = .4, legend.show = FALSE) +
tm_layout(main.title = "Observations in CTEPs Only", title.bg.color = "white", title.bg.alpha = .5, bg.color = "lightcyan", inner.margins = c(0, 0, 0, 0))
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).

For my purposes, particularly thinking about where I should train my gaze going forward, this map is extremely helpful. Not all of these areas are created equal, and some receive considerably more traffic than others. So if I, say, was planning on writing a proposal to fund a trip to one of these places, this kind of analysis could help justify my site choices.

This map, though, shows all observations, from 1998 onward, in these areas. We get no sense of changes over time. One potential option would be to filter our analysis by a particulary hard-hit zone, and then facet by year. But I’m not convinced that’s the best way. It would help eliminate seasonality in the analysis, which is good, but the human eye isn’t that good at distinguishing the number of dots across mulitple visuals. Instead, I’ll turn in this section to more ‘traditional’ visuals, using ggplot.

Other Visuals

Like so many things in R, there are a lot of ways to skin your cats. As I mention above, I learned after-the-fact that clipping is possible on spatial objects in R (but a GIS is much easier). Likewise, there are simple-enough ways to extract data frames from your spatial objects in R. However, in this case, I did a bit of a workaround. Having done a spatial join in QGIS so my general eBird data set would carry the CTEP and finer-grained municipality data, I exported a .csv file to form the basis of the rest of my analysis. Here they are:

eBird_ctep <- read_csv("Obs_CTEP_1998.csv")
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   field_1 = col_integer(),
##   latitude = col_double(),
##   longitude = col_double(),
##   observatio = col_date(format = ""),
##   effort_dis = col_double(),
##   number_obs = col_integer(),
##   Circ = col_integer()
## )
## See spec(...) for full column specifications.
head(eBird_ctep)
## # A tibble: 6 x 20
##   field_1 common_nam scientific country country_co state state_code
##     <int> <chr>      <chr>      <chr>   <chr>      <chr> <chr>     
## 1 1840462 Amazon Ki… Chlorocer… Colomb… CO         Nari… CO-NAR    
## 2 1840463 American … Haematopu… Colomb… CO         Nari… CO-NAR    
## 3 1840465 Blue-foot… Sula nebo… Colomb… CO         Nari… CO-NAR    
## 4 1840466 Black-bel… Pluvialis… Colomb… CO         Nari… CO-NAR    
## 5 1840467 Blackish … Pardirall… Colomb… CO         Nari… CO-NAR    
## 6 1840469 Blue-blac… Volatinia… Colomb… CO         Nari… CO-NAR    
## # ... with 13 more variables: locality <chr>, locality_i <chr>,
## #   locality_t <chr>, latitude <dbl>, longitude <dbl>, observatio <date>,
## #   observer_i <chr>, sampling_e <chr>, protocol_c <chr>,
## #   effort_dis <dbl>, number_obs <int>, NAME_2 <chr>, Circ <int>

So, I now have a data set that’s all of the observations that occurred in the areas of interest from 1998-onward, with two additional columns indicating which Circumscription they fall in and their proper municipality. Somehow during the R > GQIS > R cycle some of my label names got chopped. I’m ok with it for now. My goal here is to find if the number of observers is, in fact, increasing.

Total Observations in Colombia

First thing’s first: let’s get a general trend for birders in Colombia:

total_obs_year <- eBird_cleaner %>% 
  group_by(year = floor_date(observation_date, "year")) %>% 
  subset(observation_date > "2010-01-01") %>% 
  summarize(unique_observers = n_distinct(observer_id))

I opt to group our time period by year only because seasonal fluctuations are so extreme in Colombia, and the resulting plots are more difficult to read. I could use stat smoothing (which I use later on), but for a general trend line I feel grouping by year is sufficient.

ggplot(total_obs_year, aes(x = year, y = unique_observers)) +
  geom_line(color = "darkolivegreen3", size = 2) +
  theme_minimal() +
  labs(x = "Year", y = "Unique Observers") +
  theme(panel.grid.major = element_blank(),
        panel.border = element_rect(fill = NA)) +
  ggtitle("Unique Observers in Colombia")

We know from this we might be on to something, but we have to check something else, first. The above plot shows the number of unique observers by year. However, this code will represent the same observer, if they appear in multiple years, as unique in each year. Even though a by-year basis is something to still feel pretty good about, we want to make sure we’re getting an accurate picture, so we need to filter out repeat visitors to see if numbers of individuals are increasing. This is less important when grouping by year as opposed to by month, but it’s best practice, regardless.

The first task to to add a column so that observations are truly ‘new,’ in the sense that the same observer is not counted twice.

eBird_cleaner$true_obs <- as.numeric(!duplicated(eBird_cleaner$observer_id))

We will then add up those unique observers by year. Notice the final summarize() bit changes, as they column has already been aggregated and just needs to be added up.

total_true_obs_year <- eBird_cleaner %>% 
  group_by(year = floor_date(observation_date, "year")) %>% 
  subset(observation_date > "2010-01-01") %>% 
  summarize(all_colombia = sum(true_obs))

Now to plot it:

ggplot(total_true_obs_year, aes(x = year, y = all_colombia)) +
  geom_line(color = "darkorange3", size = 2) +
  theme_minimal() +
  labs(x = "Year", y = "Unique Observers") +
  theme(panel.grid.major = element_blank(),
        panel.border = element_rect(fill = NA)) +
  ggtitle("Truly Unique Observers in Colombia")

The number are slightly different, but the trend line is the same, which is good to know. One last thing, just to verify that we’re on the right track….

The people really are what we care about in this scenario. However, there is one other metric of ‘penetration’ that’s useful to take a look at. eBird has something called “checklists,” which appear in the data set as “sampling events.” Checklists are essentially ‘birding events;’ somebody might go out and create a checklist for the day, indicating the species they’d like to see. They’re essentially a measure of depth of penetration of a given observer; you might have 10 observers show up and make a few observations in a particular place. Slightly different though, is a single observer participating in several different birding ‘events’ in a given region. Recall how the number of observations doesn’t say much, as a single person could stand still and make fifty observations in a few minutes. Below I do a similar plot to above, but by unique checklists.

checklists_by_year_all <- eBird_cleaner %>% 
  group_by(year = floor_date(observation_date, "year")) %>% 
  subset(observation_date > "2010-01-01") %>% 
  summarize(checklists = n_distinct(sampling_event_identifier))

And to plot…

ggplot(checklists_by_year_all, aes(x = year, y = checklists)) +
  geom_line(color = "deepskyblue3", size = 2) +
  theme_minimal() +
  labs(x = "Year", y = "Checklists") +
  theme(panel.grid.major = element_blank(),
        panel.border = element_rect(fill = NA)) +
  ggtitle("Unique Birding 'Events' in Colombia")

With the exception of a slight drop-off in checklist counts around 2017, all of our trend lines point in the same direction: there’s more birding activity in Colombia.

Observers by CTEP

As mentioned at the outset, what interests me in this project is the growing number of unique observes in post-conflict regions of the country. The first thing I want to do is compare how those regions as a group compare to Colombia as a whole.

The first step, similar to one performed with the whole data set, is to filter out the truly unique observers–no repeats.

eBird_ctep$Circ <- as.character(eBird_ctep$Circ)
eBird_ctep$true_obs <- as.numeric(!duplicated(eBird_ctep$observer_i))

Then group them by year…

obs_by_year_all_ctep <- eBird_ctep %>% 
  group_by(year = floor_date(observatio, "year")) %>% 
  subset(observatio > "2010-01-01") %>% 
  summarize(CTEP_total = sum(true_obs))

…and compare them to the country as a whole… The first plot compares total observers, the second compares rates.

obs_by_year_joined <- left_join(total_obs_year, obs_by_year_all_ctep) %>% 
  rename(Colombia = unique_observers, CTEP = CTEP_total) %>% 
  gather(Region, Total, Colombia:CTEP)
## Joining, by = "year"
ggplot(obs_by_year_joined, aes(x = year, y = Total, fill = Region)) +
  geom_area(alpha =.8, color = "gray80", size = .3) +
  scale_fill_brewer(palette = "Dark2") +
  labs(x = "Year", y = "Unique Observers") +
  theme_minimal() +
  theme(panel.grid.major = element_blank()) +
  ggtitle("Total Unique Observers", subtitle = "Comparing all of Colombia with CTEP Regions")

obs_year_joined_perc <- obs_by_year_joined %>% group_by(year) %>% mutate(Percent = Total / sum(Total) * 100)
obs_year_joined_perc_2 <- obs_by_year_joined %>% group_by(Region) %>% mutate(Percent = Total / sum(Total) * 100)
ggplot() +
  geom_area(data = obs_year_joined_perc_2, aes(x = year, y = Percent, fill = Region), alpha = .8, color = "white", size = .3) +
  geom_area(data = obs_year_joined_perc, aes(x = year, y = Percent, fill = Region), alpha = .5) +
  scale_fill_brewer(palette = "Dark2") +
  labs(x = "Year") +
  theme_minimal() +
  theme(panel.grid.major = element_blank()) +
  ggtitle("Growth in Observers", subtitle = "Comparing all of Colombia with CTEP Regions")

The above plot is the result of two stacked area plots: one showing each share as percentage of the whole birding population, and another that shows their growth. Both get to the same thing, they just visualize it differently, and I liked the general aesthetic of trying to stack them together. What we find here is disappointing for my research, but not a death blow: birders have been flocking to CTEP regions at a lower rate than they have been coming to Colombia as a whole. That shifts the narrative a little; these are not necessarily zones that have seen the flood gates open and people start streaming in. They have, however, still exerpienced considerable growth in absolute terms, which means that the processes I’m interested in studying are still viable, perhaps just less extreme.

One caveat, though: the above shows an aggregation of all CTEP regions together. Is it possible that some zones are more affected than others?

Focusing on the CTEPs Themselves…

This section explores the CTEP zones individually. Unfortunately Circumscription zones don’t have names assigned to them; they’re just numbered. This makes the aesthetics of our plotting a little wonky, but it’ll be ok.

The first step is to make several of the same transformations I made earlier, just now to the clipped data set that represents observations in CTEP regions only. I elected to do most of these by month, instead of by year, to get a sense for their seasonal variation.

Transforming…

obs_by_month <- eBird_ctep %>% 
  group_by(month = floor_date(observatio, "month"), Circ) %>% 
  subset(observatio > "2010-01-01") %>% 
  summarize(total = sum(true_obs)) %>% 
  arrange(Circ)

…and plotting…

ggplot(obs_by_month, aes(x = month, y = total, color = Circ)) +
  geom_smooth(se = FALSE) +
  theme_minimal() +
  theme(panel.grid.major = element_blank()) +
  labs(x = "Year", y = "Unique Observers") +
  ggtitle("Unique Observers in each CTEP Region")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 14595
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2510.6
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.1656e+06
## Warning in sqrt(sum.squares/one.delta): NaNs produced

From the above plot we see that there is certainly an increasing trend. One, Circumscription 12, has skyrocketed, but its rate seems pretty constant. For Circumscription 11 we see an odd dip in the middle of 2017. But in general, we see an increase, if modest, in unique observers. Let’s try by a different metric.

checklists_by_month <- eBird_ctep %>% 
  group_by(month = floor_date(observatio, "month"), Circ) %>% 
  subset(observatio > "2010-01-01") %>% 
  summarize(checklists = n_distinct(sampling_e)) %>% 
  arrange(Circ)
ggplot(checklists_by_month, aes(x = month, y = checklists, color = Circ)) +
  geom_smooth(se = FALSE) +
  theme_minimal() +
  labs(x = "Year", y = "Number of Checklists") +
  ggtitle("Checklists by CTEP Region")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 14595
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2510.6
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.1656e+06
## Warning in sqrt(sum.squares/one.delta): NaNs produced

The resulting pattern is similar, but we get the same general trend. Good to know. How can we quantify it differntly?

checklists_by_year <- eBird_ctep %>% 
  group_by(year = floor_date(observatio, "year"), Circ) %>% 
  subset(observatio > "2010-01-01") %>% 
  summarize(checklists = n_distinct(sampling_e)) %>% 
  filter(year == "2017-01-01") %>% 
  arrange(desc(checklists))
ggplot(checklists_by_year, aes(x = Circ, y = checklists, fill = Circ)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  theme_minimal() +
  labs(x = "Circumscription Number", y = "Checklists") +
  theme(panel.grid.major = element_blank()) +
  ggtitle("Number of Checklists by Circumscription, 2017")

We have some clear winners: Circumscriptions number 12 and 9 are clearly head and shoulders above the others. 16 is then the start of the more mid-range areas. Below is a map highlighting the two areas in question.

First, to subset the shapefile…

map_top_ctep <- map_CTEP_shp$layer %in% c("Circum_12", "Cirucm_9")
map_top_ctep_shp <- map_CTEP_shp[map_top_ctep, ]

…then to build the map…

tm_shape(map_world_cropped) +
  tm_fill(col = "gray90") +
  tm_borders("grey70") +
tm_shape(map_co_shp) +
  tm_fill(col = "gray80", legend.show = FALSE) +
  tm_borders("grey40") +
tm_shape(map_col_muni_shp) +
  tm_fill(col = "gray80", legend.show = FALSE) +
  tm_borders("grey75", lwd = .3) +
tm_shape(map_CTEP_shp) +
  tm_fill(col = "layer", palette = brewer.pal("Pastel1", n = 8), alpha = .5, legend.show = FALSE) +
  tm_borders("grey70", lwd = .3) +
tm_shape(map_top_ctep_shp) +
  tm_fill(col = "layer", palette = brewer.pal("Dark2", n = 3), legend.show = FALSE) +
  tm_polygons(border.col = "white", lwd = .3) +
  tm_borders("gray70", lwd = .7) +
tm_layout(main.title = "Important Circumscriptions", bg.color = "lightcyan", inner.margins = c(0, 0, 0, 0))
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).

Making the areas ‘pop’ in tmap is a little more difficult than in mapping softward, but it works ok enough. As we can see, though, we have our most important municipalities if we’re interested in getting on the ground.

One last thing…Who are these people?

I decided to sort by number of observations to see who my tob observers are.

top_observers <- eBird_cleaner %>%
  mutate(count = 1) %>% 
  group_by(observer_id) %>% 
  summarize(total = sum(count)) %>% 
  arrange(desc(total)) %>% 
  top_n(20)
## Selecting by total
head(top_observers)
## # A tibble: 6 x 2
##   observer_id total
##   <chr>       <dbl>
## 1 obsr693372  44017
## 2 obsr507040  40204
## 3 obsr205104  30905
## 4 obsr162659  28710
## 5 obsr354845  26502
## 6 obsr173592  23921

Observer numero uno, obsr693372, has made 44,017 observations. That’s ridiculous. But let’s have a look and see where he/she’s been.

obsr693372 <- eBird_cleaner %>% 
  filter(observer_id == "obsr693372") %>% 
  arrange(observation_date)
ggplot() +
  geom_polygon(data = map_world, aes(x = long, y = lat, group = group), color = "gray60", fill = "gray84") +
  geom_polygon(data = map_colombia, aes(x = long, y = lat, group = group), color = "gray30", fill = "cornsilk2") +
  geom_point(data = obsr693372, aes(x = longitude, y = latitude, color = observation_date), size = 1.5, alpha = .9) +
  coord_quickmap(xlim = c(-80.5, -66), ylim = c(-4.75, 12.75)) +
  theme(panel.background = element_rect(fill = "lightcyan2"),
        panel.border = element_rect(color = "black", size = 1, fill = NA),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text = element_blank(),
        legend.title = element_blank()) +
  ggtitle("This guy gets around") 

I have labored and labored to change the default color scheme away from blue and have it group by year, but to no avail. Error codes when using scale_fill_manual() or scale_color_gradient() generally revovlve around setting an ‘origin’ with the date. It’s a bit confusing, as it clearly can set one automatically through its default settings, it just won’t let me substitute a color scheme without additional parameters. Either way, though, we have a rough history of Colombia’s most prolific eBirder, from 2009-2018. What’s interesting is that there are popular areas, such as on the Atlantic Coast near Panama, that this person clearly doesn’t go.

Conclusion

Unfortunately, a few of the things I wanted to do here didn’t work out. First, I was never able to render a proper animation. Attempts in gganimate, plotly, and tmap proved too difficult. I’m sure this could get figured out, but at some point losses have to be cut and we have to move on.

There’s a ton of information here, and somebody with more skill than I can make more use of it. But it’s been incredibly helpful for my purposes. If nothing else I’ve managed to pinpoint CTEP hotspots for birding activity, which, should I ever write a proposal to go do field work, these sites, and the work that I’ve done here to dig into them, will be my justification.

As for my core questions, none of them can be answered here. The questions that I have involve on-the-ground processes in these places, process that can’t really be shown through this data set here. But at least I know that the pheonomenon I’m interested in asking questions about is in fact a real thing. This is indeed justification of some kind that I’m on the right track, and birders can be a valuable tool for studying the deepening prosaic geographies of stateness in post-conflict Colombia.