The purpose of this project is to get an initial spatial-quantitative handle on birders in Colombia using data from eBird, a popular app among avian enthusiasts. The app is used for both personal and scientific purposes; individuals use it to keep track of their personal lists—what they’ve seen and where they’ve seen it—and researchers treat it as citizen-science data. Whereas most researchers take advantage of this data to track species, I use it to focus on the people making the observations—where they’re going, and how many of them are going there, and how those places may overlap with the post-conflict process. Hopefully, if things go according to plan, we’ll see that there is at least some meat on the bone of my idea that birders are a significant component of this initital wave of ‘re-territorialization.’
library(tidyr)
library(ggplot2)
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ tibble 1.4.2 ✔ dplyr 0.7.7
## ✔ readr 1.1.1 ✔ stringr 1.3.1
## ✔ purrr 0.2.5 ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(readr)
library(stringr)
library(ggmap)
library(tmap)
library(rgdal)
## Loading required package: sp
## rgdal: version: 1.3-6, (SVN revision 773)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
## Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/gdal
## GDAL binary built with GEOS: FALSE
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: /Library/Frameworks/R.framework/Versions/3.5/Resources/library/rgdal/proj
## Linking to sp version: 1.3-1
library(sp)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.1.3, PROJ 4.9.3
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(RColorBrewer)
library(tmaptools)
library(gganimate)
To get eBird data you merely need to submit a request on their website. You give them a short abstract of what your project is about, and, once approved, you can use their interface to do some initial filtering. I went ahead and grabbed all observations they have in Colombia.
eBird_base <- read_tsv("~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt")
## Warning: Missing column names filled in: 'X47' [47]
## Parsed with column specification:
## cols(
## .default = col_character(),
## `LAST EDITED DATE` = col_datetime(format = ""),
## `TAXONOMIC ORDER` = col_integer(),
## `OBSERVATION COUNT` = col_integer(),
## LATITUDE = col_double(),
## LONGITUDE = col_double(),
## `OBSERVATION DATE` = col_date(format = ""),
## `TIME OBSERVATIONS STARTED` = col_time(format = ""),
## `DURATION MINUTES` = col_integer(),
## `EFFORT DISTANCE KM` = col_double(),
## `NUMBER OBSERVERS` = col_integer(),
## `ALL SPECIES REPORTED` = col_integer(),
## `HAS MEDIA` = col_integer(),
## APPROVED = col_integer(),
## REVIEWED = col_integer()
## )
## See spec(...) for full column specifications.
## Warning in rbind(names(probs), probs_f): number of columns of result is not
## a multiple of vector length (arg 1)
## Warning: 409827 parsing failures.
## row # A tibble: 5 x 5 col row col expected actual file expected <int> <chr> <chr> <chr> <chr> actual 1 1431 OBSERVATIO… an integ… X '~/Desktop/Work/2018-19/1 Fall/GEOG … file 2 4481 OBSERVATIO… an integ… X '~/Desktop/Work/2018-19/1 Fall/GEOG … row 3 4494 OBSERVATIO… an integ… X '~/Desktop/Work/2018-19/1 Fall/GEOG … col 4 4496 OBSERVATIO… an integ… X '~/Desktop/Work/2018-19/1 Fall/GEOG … expected 5 4497 OBSERVATIO… an integ… X '~/Desktop/Work/2018-19/1 Fall/GEOG …
## ... ................. ... .......................................................................... ........ .......................................................................... ...... .......................................................................... .... .......................................................................... ... .......................................................................... ... .......................................................................... ........ ..........................................................................
## See problems(...) for more details.
#####…and having a look
str(eBird_base)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2905351 obs. of 47 variables:
## $ GLOBAL UNIQUE IDENTIFIER : chr "URN:CornellLabOfOrnithology:EBIRD:OBS169258184" "URN:CornellLabOfOrnithology:EBIRD:OBS169276456" "URN:CornellLabOfOrnithology:EBIRD:OBS169274032" "URN:CornellLabOfOrnithology:EBIRD:OBS169260295" ...
## $ LAST EDITED DATE : POSIXct, format: "2015-07-20 22:22:34" "2015-07-21 12:24:08" ...
## $ TAXONOMIC ORDER : int 16271 12710 24218 33111 28026 28026 28026 16354 12253 6469 ...
## $ CATEGORY : chr "species" "species" "species" "species" ...
## $ COMMON NAME : chr "Amazonian Umbrellabird" "Barred Antshrike" "Black-capped Donacobius" "Blue-black Grassquit" ...
## $ SCIENTIFIC NAME : chr "Cephalopterus ornatus" "Thamnophilus doliatus" "Donacobius atricapilla" "Volatinia jacarina" ...
## $ SUBSPECIES COMMON NAME : chr NA NA NA NA ...
## $ SUBSPECIES SCIENTIFIC NAME : chr NA NA NA NA ...
## $ OBSERVATION COUNT : int 3 1 2 3 4 1 2 1 1 5 ...
## $ BREEDING BIRD ATLAS CODE : chr NA NA NA NA ...
## $ BREEDING BIRD ATLAS CATEGORY: chr NA NA NA NA ...
## $ AGE/SEX : chr NA NA NA NA ...
## $ COUNTRY : chr "Colombia" "Colombia" "Colombia" "Colombia" ...
## $ COUNTRY CODE : chr "CO" "CO" "CO" "CO" ...
## $ STATE : chr "Amazonas" "Amazonas" "Amazonas" "Amazonas" ...
## $ STATE CODE : chr "CO-AMA" "CO-AMA" "CO-AMA" "CO-AMA" ...
## $ COUNTY : chr NA NA NA NA ...
## $ COUNTY CODE : chr NA NA NA NA ...
## $ IBA CODE : chr NA NA NA NA ...
## $ BCR CODE : chr NA NA NA NA ...
## $ USFWS CODE : chr NA NA NA NA ...
## $ ATLAS BLOCK : chr NA NA NA NA ...
## $ LOCALITY : chr "Rio Amazonas--Isla de los Micos (Monkey Island)" "COLOMBIA: Amazonas; Leticia outskirts" "COLOMBIA: Amazonas; Leticia to Km 17" "COLOMBIA: Amazonas; Leticia outskirts" ...
## $ LOCALITY ID : chr "L881399" "L1783841" "L1802270" "L1783841" ...
## $ LOCALITY TYPE : chr "H" "P" "P" "P" ...
## $ LATITUDE : num -4.04 -4.2 -4.09 -4.2 -4.2 ...
## $ LONGITUDE : num -70.1 -69.9 -70 -69.9 -69.9 ...
## $ OBSERVATION DATE : Date, format: "1973-08-13" "1973-08-25" ...
## $ TIME OBSERVATIONS STARTED : 'hms' num 05:45:00 07:15:00 08:15:00 16:45:00 ...
## ..- attr(*, "units")= chr "secs"
## $ OBSERVER ID : chr "obsr30312" "obsr30312" "obsr30312" "obsr30312" ...
## $ SAMPLING EVENT IDENTIFIER : chr "S12011239" "S12012365" "S12012226" "S12011388" ...
## $ PROTOCOL TYPE : chr "Traveling" "Traveling" "Traveling" "Traveling" ...
## $ PROTOCOL CODE : chr "P22" "P22" "P22" "P22" ...
## $ PROJECT CODE : chr "EBIRD" "EBIRD" "EBIRD" "EBIRD" ...
## $ DURATION MINUTES : int 180 135 315 60 60 135 315 NA 315 60 ...
## $ EFFORT DISTANCE KM : num 1 2 17 1 1 2 17 NA 17 1 ...
## $ EFFORT AREA HA : chr NA NA NA NA ...
## $ NUMBER OBSERVERS : int 3 2 1 1 1 2 1 NA 1 1 ...
## $ ALL SPECIES REPORTED : int 1 1 1 1 1 1 1 0 1 1 ...
## $ GROUP IDENTIFIER : chr NA NA NA NA ...
## $ HAS MEDIA : int 0 0 0 0 0 0 0 0 0 0 ...
## $ APPROVED : int 1 1 1 1 1 1 1 1 1 1 ...
## $ REVIEWED : int 0 0 0 0 0 0 0 0 0 0 ...
## $ REASON : chr NA NA NA NA ...
## $ TRIP COMMENTS : chr "with Brian Meyers, Elsie Ritchie. At this point, I knew few voices, so these are largely visual detections only." "from Parador Ticuna to outskirts of town. 80-85, clear, calm" "trucked to KM 17, then walked back (0915-1230) on road to Km 11, and then return; 45 min spent in primary fores"| __truncated__ NA ...
## $ SPECIES COMMENTS : chr NA NA NA NA ...
## $ X47 : chr NA NA NA NA ...
## - attr(*, "problems")=Classes 'tbl_df', 'tbl' and 'data.frame': 409827 obs. of 5 variables:
## ..$ row : int 1431 4481 4494 4496 4497 4506 4507 4514 4516 4518 ...
## ..$ col : chr "OBSERVATION COUNT" "OBSERVATION COUNT" "OBSERVATION COUNT" "OBSERVATION COUNT" ...
## ..$ expected: chr "an integer" "an integer" "an integer" "an integer" ...
## ..$ actual : chr "X" "X" "X" "X" ...
## ..$ file : chr "'~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt'" "'~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt'" "'~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt'" "'~/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/Data/eBird_Oct_2018/ebd_CO_relOct-2018.txt'" ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 47
## .. ..$ GLOBAL UNIQUE IDENTIFIER : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ LAST EDITED DATE :List of 1
## .. .. ..$ format: chr ""
## .. .. ..- attr(*, "class")= chr "collector_datetime" "collector"
## .. ..$ TAXONOMIC ORDER : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ CATEGORY : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ COMMON NAME : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ SCIENTIFIC NAME : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ SUBSPECIES COMMON NAME : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ SUBSPECIES SCIENTIFIC NAME : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ OBSERVATION COUNT : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ BREEDING BIRD ATLAS CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ BREEDING BIRD ATLAS CATEGORY: list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ AGE/SEX : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ COUNTRY : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ COUNTRY CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ STATE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ STATE CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ COUNTY : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ COUNTY CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ IBA CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ BCR CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ USFWS CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ ATLAS BLOCK : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ LOCALITY : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ LOCALITY ID : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ LOCALITY TYPE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ LATITUDE : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ LONGITUDE : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ OBSERVATION DATE :List of 1
## .. .. ..$ format: chr ""
## .. .. ..- attr(*, "class")= chr "collector_date" "collector"
## .. ..$ TIME OBSERVATIONS STARTED :List of 1
## .. .. ..$ format: chr ""
## .. .. ..- attr(*, "class")= chr "collector_time" "collector"
## .. ..$ OBSERVER ID : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ SAMPLING EVENT IDENTIFIER : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PROTOCOL TYPE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PROTOCOL CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ PROJECT CODE : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ DURATION MINUTES : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ EFFORT DISTANCE KM : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ EFFORT AREA HA : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ NUMBER OBSERVERS : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ ALL SPECIES REPORTED : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ GROUP IDENTIFIER : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ HAS MEDIA : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ APPROVED : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ REVIEWED : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ REASON : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ TRIP COMMENTS : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ SPECIES COMMENTS : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ X47 : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
…and…
head(eBird_base)
## # A tibble: 6 x 47
## `GLOBAL UNIQUE … `LAST EDITED DATE` `TAXONOMIC ORDE… CATEGORY
## <chr> <dttm> <int> <chr>
## 1 URN:CornellLabO… 2015-07-20 22:22:34 16271 species
## 2 URN:CornellLabO… 2015-07-21 12:24:08 12710 species
## 3 URN:CornellLabO… 2015-08-21 09:29:12 24218 species
## 4 URN:CornellLabO… 2015-07-21 12:24:08 33111 species
## 5 URN:CornellLabO… 2015-07-21 12:24:08 28026 species
## 6 URN:CornellLabO… 2015-07-21 12:24:08 28026 species
## # ... with 43 more variables: `COMMON NAME` <chr>, `SCIENTIFIC
## # NAME` <chr>, `SUBSPECIES COMMON NAME` <chr>, `SUBSPECIES SCIENTIFIC
## # NAME` <chr>, `OBSERVATION COUNT` <int>, `BREEDING BIRD ATLAS
## # CODE` <chr>, `BREEDING BIRD ATLAS CATEGORY` <chr>, `AGE/SEX` <chr>,
## # COUNTRY <chr>, `COUNTRY CODE` <chr>, STATE <chr>, `STATE CODE` <chr>,
## # COUNTY <chr>, `COUNTY CODE` <chr>, `IBA CODE` <chr>, `BCR CODE` <chr>,
## # `USFWS CODE` <chr>, `ATLAS BLOCK` <chr>, LOCALITY <chr>, `LOCALITY
## # ID` <chr>, `LOCALITY TYPE` <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## # `OBSERVATION DATE` <date>, `TIME OBSERVATIONS STARTED` <time>,
## # `OBSERVER ID` <chr>, `SAMPLING EVENT IDENTIFIER` <chr>, `PROTOCOL
## # TYPE` <chr>, `PROTOCOL CODE` <chr>, `PROJECT CODE` <chr>, `DURATION
## # MINUTES` <int>, `EFFORT DISTANCE KM` <dbl>, `EFFORT AREA HA` <chr>,
## # `NUMBER OBSERVERS` <int>, `ALL SPECIES REPORTED` <int>, `GROUP
## # IDENTIFIER` <chr>, `HAS MEDIA` <int>, APPROVED <int>, REVIEWED <int>,
## # REASON <chr>, `TRIP COMMENTS` <chr>, `SPECIES COMMENTS` <chr>,
## # X47 <chr>
…so this is pretty cool! There’s tons of stuff here, and this data would be useuful to a whole bunch of different researchers. But not all of it is useful to me. So my first order of business is 1) drop the things I don’t need to make this thing more manageable and 2) reformat the column titles so I’m not in backtick hell. So…
eBird_cleaner <- eBird_base %>%
select(-c(`LAST EDITED DATE`, `TAXONOMIC ORDER`, `CATEGORY`, `SUBSPECIES COMMON NAME`, `SUBSPECIES SCIENTIFIC NAME`, `BREEDING BIRD ATLAS CODE`, `BREEDING BIRD ATLAS CATEGORY`, `AGE/SEX`, `COUNTY`, `COUNTY CODE`, `IBA CODE`, `BCR CODE`, `USFWS CODE`, `ATLAS BLOCK`, `TIME OBSERVATIONS STARTED`, `DURATION MINUTES`, `EFFORT AREA HA`, `ALL SPECIES REPORTED`, `HAS MEDIA`, `APPROVED`, `REVIEWED`, `REASON`, `SPECIES COMMENTS`, `X47`, `PROJECT CODE`, `GROUP IDENTIFIER`, `TRIP COMMENTS`, `PROTOCOL TYPE`, `OBSERVATION COUNT`, `GLOBAL UNIQUE IDENTIFIER`))
names(eBird_cleaner) %<>% str_replace_all("\\s","_") %>% tolower
I clearly didn’t do this in the most efficient way possible. It would have been much smarter to select by column numbers, but honestly I had already made the mistake in a previous ‘draft’ so might has well go with ctrl+c and ctrl+v. Likewise, there are lots of things I still don’t need, like species names, etc., but it felt better to still have a more complete-feeling dataset. But we can see that it does look much better and more manageable:
head(eBird_cleaner)
## # A tibble: 6 x 17
## common_name scientific_name country country_code state state_code
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Amazonian … Cephalopterus … Colomb… CO Amaz… CO-AMA
## 2 Barred Ant… Thamnophilus d… Colomb… CO Amaz… CO-AMA
## 3 Black-capp… Donacobius atr… Colomb… CO Amaz… CO-AMA
## 4 Blue-black… Volatinia jaca… Colomb… CO Amaz… CO-AMA
## 5 Black-bill… Turdus ignobil… Colomb… CO Amaz… CO-AMA
## 6 Black-bill… Turdus ignobil… Colomb… CO Amaz… CO-AMA
## # ... with 11 more variables: locality <chr>, locality_id <chr>,
## # locality_type <chr>, latitude <dbl>, longitude <dbl>,
## # observation_date <date>, observer_id <chr>,
## # sampling_event_identifier <chr>, protocol_code <chr>,
## # effort_distance_km <dbl>, number_observers <int>
One thing I notice, though, is that the dates start in 1973. That’s too far back. Individuals can backdate sitings in eBird, which is why this information exists, but it’s not relevant to us, and has to be dropped. I chose 1998 because that’s essentially where the records become contiguous.
eBird_cleaner <- eBird_cleaner %>%
subset(observation_date > "1998-01-01")
So where do we begin? I have a dataframe with just shy of 2.9 million observations, each of which has locational data. So we know we have something to work with. I also have a feeling plotting 2.9 million observations isn’t going to look very good, and we’re going to have to chop things up to make it meaningful. But I want to start having a look anyways. So here’s what I’m going to do: first, grab a map of Colombia, and second, subset our data by points in relevant FARC-prone states, and then choose only the last few years, whcih should hopefully give us some kind of rough idea of the most recent incursions.
First, a map of Colombia (which I do by subsetting a world map)…
map_world <- map_data("world")
map_colombia <- map_world %>%
subset(region %in% "Colombia")
…and then chopping the data into a manageable piece. Since I’m interested in conflict areas, I select by a list of ‘conflict-prone’ states, which are all the state implicated in CTEP, which I explain below.
target <- c("Cauca", "Arauca", "Antioquia", "Norte de Santander", "Caquetá", "Chocó", "Meta", "Bolívar", "Nariño", "Putumayo", "Cesar", "Córdoba", "Tolima")
eBird_chopped <- eBird_cleaner %>%
filter(state %in% target) %>%
subset(observation_date > "2016-01-01")
…and then having a look….
ggplot() +
geom_polygon(data = map_world, aes(x = long, y = lat, group = group), color = "gray60", fill = "gray84") +
geom_polygon(data = map_colombia, aes(x = long, y = lat, group = group), color = "gray30", fill = "cornsilk2") +
geom_point(data = eBird_chopped, aes(x = longitude, y = latitude, color = state), size = .25) +
coord_quickmap(xlim = c(-80.5, -66), ylim = c(-4.75, 12.75)) +
theme(panel.background = element_rect(fill = "lightcyan2"),
panel.border = element_rect(color = "black", size = 1, fill = NA),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank(),
legend.title = element_blank()) +
ggtitle("Having a look at recent observations...")
Lesson learned: coord_quickmap() is necessary for a problem-free zoom. coord_map() resulted in odd lines all over the place.
But as we can see above, this is interesting, but perhaps not everything we need. First, there’s a spatial mismatch problem. We can group things by state, but that’s a spatially incomplete view of the phenomenon in question. If we’re interested in an phenomenon that occurs across the Oregon/California border, aggregating the data by the state won’t give us an accurrate view. The ‘localities,’ likewise, in this data set aren’t fine-grained or complete enough. This means I’m going to need to add some additional spatial information, something best done outside of R. But don’t worry, we’ll be back.
The conflcit that the FARC launced more than half a century ago sprung from a lack of rural representation and incorporation. As part of the peace accord, the government and FARC reprsentatives agreed to pay special attention to areas with long histories of conflict. To do so, they agreed upon special administrative zones (CTEP, in their Spanish acronym), which would send additional representative members to congress as well as serve as territorial circumscriptions singled out for additional funding and investment opportunities. These areas, for our purposes, are the post-conflict areas. And they’re the the political-spatial demarcations we need to measure our phenomenon by. To do so, I relied on QGIS, which, combined with municipality-level shapefiles from Diva-GIS, allowed me to make a CTEP shapefile. I then used that CTEP shapefile to clip my eBird data and adjust the attribute table to reflect which circusmcription, along with state and municipality, observations fall in.
They look sort of like this:
map_co_shp <- readOGR("Shapefiles/Colombia Admin/Col_0/COL_adm0.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/travisbott/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/eBird/Shapefiles/Colombia Admin/Col_0/COL_adm0.shp", layer: "COL_adm0"
## with 1 features
## It has 70 fields
## Integer64 fields read as strings: ID_0 OBJECTID_1
map_col_muni_shp <- readOGR("Shapefiles/Colombia Admin/Col_2/COL_adm2.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/travisbott/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/eBird/Shapefiles/Colombia Admin/Col_2/COL_adm2.shp", layer: "COL_adm2"
## with 1065 features
## It has 11 fields
## Integer64 fields read as strings: ID_0 ID_1 ID_2
map_CTEP_shp <- readOGR("Shapefiles/CTEP_Merged/CTEP_Merged.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/travisbott/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/eBird/Shapefiles/CTEP_Merged/CTEP_Merged.shp", layer: "CTEP_Merged"
## with 160 features
## It has 14 fields
## Integer64 fields read as strings: ID_0 ID_1 ID_2
At this point I learn that the sp/RGDAL package, from which readOGR is based, is incredibly slow. It produces a very complete spatial object but it does not handle large shapefiles well. The file below, which is essentially a shapefile of my entire eBird set, made it, but my clipped files did not. For the sake of not killing myself at export time, I’m shifting to an sf object.
points_obs_all <- read_sf("Shapefiles/ebird_cleaned_joined/ebird_cleaned_joined.shp")
For aesthetic purposes, I want background and context for my maps. To do so, I import a countries shapefile I happen to have lying around, I make an extent object (basically a matrix of x/y longitude/latitude values), which I can then use in the crop() function to clip down to a manageable size.
library(raster)
##
## Attaching package: 'raster'
## The following object is masked from 'package:gganimate':
##
## animate
## The following object is masked from 'package:dplyr':
##
## select
## The following object is masked from 'package:tidyr':
##
## extract
map_world_sf <- readOGR("Shapefiles/countries/ne_10m_admin_0_countries.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/travisbott/Desktop/Work/2018-19/1 Fall/GEOG 208/Project/eBird/Shapefiles/countries/ne_10m_admin_0_countries.shp", layer: "ne_10m_admin_0_countries"
## with 255 features
## It has 94 fields
## Integer64 fields read as strings: POP_EST NE_ID
extent_object <- extent(-80.5, -66, -4.75, 12.75)
map_world_cropped <- crop(map_world_sf, extent_object)
This first map represents the work I did in QGIS, paring down municipalities to just the special CTEP jurisdictions I’m interested in.
tm_shape(map_world_cropped) +
tm_fill(col = "gray90") +
tm_borders("grey70") +
tm_shape(map_co_shp) +
tm_fill(col = "gray80", legend.show = FALSE) +
tm_borders("grey40") +
tm_shape(map_col_muni_shp) +
tm_fill(col = "gray80", legend.show = FALSE) +
tm_borders("grey70") +
tm_shape(map_CTEP_shp) +
tm_polygons(col = "layer", palette = brewer.pal("Dark2", n = 8), legend.show = FALSE) +
tm_borders("grey80") +
tm_layout(main.title = "CTEP, Colombia", bg.color = "lightcyan", inner.margins = c(0, 0, 0, 0))
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).
Next, I can start thinking about more spatial analysis vis-a-vis eBird data. I’ll start by seeing where they fall.
tm_shape(map_world_cropped) +
tm_fill(col = "gray90") +
tm_borders("grey70") +
tm_shape(map_co_shp) +
tm_fill(col = "gray80", legend.show = FALSE) +
tm_borders("grey40") +
tm_shape(map_CTEP_shp) +
tm_polygons(col = "layer", palette = brewer.pal("Dark2", n = 8), legend.show = FALSE) +
tm_borders("grey80") +
tm_shape(points_obs_all) +
tm_dots(col = "coral1", size = .075, alpha = .4, legend.show = FALSE) +
tm_layout(main.title = "All eBird Observations", title.position = c("LEFT", "BOTTOM"), title.bg.color = "white", title.bg.alpha = .5, bg.color = "lightcyan", inner.margins = c(0, 0, 0, 0))
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).
Turns out, all over the place. So, back to QGIS…The internet suggests that it’s definitely possible to perform a clip operation in R, but I chose the path of least resistance. And I’d do it again if I had to, see (which should be said in 1920’s mobster voice, cigar in mouth). I clipped my eBird obserations by the new jurisdictions, and imported that shapefile as an sf object (readOGR could not handle this file, despite it being smaller than the original file, used above. I think it was something that happened during clipping).
points_obs_CTEP <- read_sf("Shapefiles/Obs_by_CTEP_1998_Joined/Obs_by_CTEP_1998_Joined.shp")
Now, only those observations that fall within our areas of interest…
tm_shape(map_world_cropped) +
tm_fill(col = "gray90") +
tm_borders("grey70") +
tm_shape(map_co_shp) +
tm_fill(col = "gray85", legend.show = FALSE) +
tm_borders("grey40") +
tm_shape(map_CTEP_shp) +
tm_polygons(col = "layer", palette = brewer.pal("Dark2", n = 8), legend.show = FALSE) +
tm_borders("grey80") +
tm_shape(points_obs_CTEP) +
tm_dots(col = "coral1", size = .075, alpha = .4, legend.show = FALSE) +
tm_layout(main.title = "Observations in CTEPs Only", title.bg.color = "white", title.bg.alpha = .5, bg.color = "lightcyan", inner.margins = c(0, 0, 0, 0))
## Warning: One tm layer group has duplicated layer types, which are omitted.
## To draw multiple layers of the same type, use multiple layer groups (i.e.
## specify tm_shape prior to each of them).
For my purposes, particularly thinking about where I should train my gaze going forward, this map is extremely helpful. Not all of these areas are created equal, and some receive considerably more traffic than others. So if I, say, was planning on writing a proposal to fund a trip to one of these places, this kind of analysis could help justify my site choices.
This map, though, shows all observations, from 1998 onward, in these areas. We get no sense of changes over time. One potential option would be to filter our analysis by a particulary hard-hit zone, and then facet by year. But I’m not convinced that’s the best way. It would help eliminate seasonality in the analysis, which is good, but the human eye isn’t that good at distinguishing the number of dots across mulitple visuals. Instead, I’ll turn in this section to more ‘traditional’ visuals, using ggplot.
Like so many things in R, there are a lot of ways to skin your cats. As I mention above, I learned after-the-fact that clipping is possible on spatial objects in R (but a GIS is much easier). Likewise, there are simple-enough ways to extract data frames from your spatial objects in R. However, in this case, I did a bit of a workaround. Having done a spatial join in QGIS so my general eBird data set would carry the CTEP and finer-grained municipality data, I exported a .csv file to form the basis of the rest of my analysis. Here they are:
eBird_ctep <- read_csv("Obs_CTEP_1998.csv")
## Parsed with column specification:
## cols(
## .default = col_character(),
## field_1 = col_integer(),
## latitude = col_double(),
## longitude = col_double(),
## observatio = col_date(format = ""),
## effort_dis = col_double(),
## number_obs = col_integer(),
## Circ = col_integer()
## )
## See spec(...) for full column specifications.
head(eBird_ctep)
## # A tibble: 6 x 20
## field_1 common_nam scientific country country_co state state_code
## <int> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 1840462 Amazon Ki… Chlorocer… Colomb… CO Nari… CO-NAR
## 2 1840463 American … Haematopu… Colomb… CO Nari… CO-NAR
## 3 1840465 Blue-foot… Sula nebo… Colomb… CO Nari… CO-NAR
## 4 1840466 Black-bel… Pluvialis… Colomb… CO Nari… CO-NAR
## 5 1840467 Blackish … Pardirall… Colomb… CO Nari… CO-NAR
## 6 1840469 Blue-blac… Volatinia… Colomb… CO Nari… CO-NAR
## # ... with 13 more variables: locality <chr>, locality_i <chr>,
## # locality_t <chr>, latitude <dbl>, longitude <dbl>, observatio <date>,
## # observer_i <chr>, sampling_e <chr>, protocol_c <chr>,
## # effort_dis <dbl>, number_obs <int>, NAME_2 <chr>, Circ <int>
So, I now have a data set that’s all of the observations that occurred in the ares of interest from 1998-onward, with two additional columns indicating which circumscription they fall in and their proper municipality. Somehow during the R > GQIS > R cycle some of my label names got chopped. I’m ok with it for now. My goal here is to find if the number of observers is, in fact, increasing.
First thing’s first: let’s get a general trend for birders in Colombia:
total_obs_year <- eBird_cleaner %>%
group_by(year = floor_date(observation_date, "year")) %>%
subset(observation_date > "2010-01-01") %>%
summarize(unique_observers = n_distinct(observer_id))
ggplot(total_obs_year, aes(x = year, y = unique_observers)) +
geom_line(color = "red", size = 2) +
theme_minimal()
At least from this we know we might be on to something.
Then, to get into this question of birders as ‘agents’ in post-donflict Colombia, we need to filter by unique observer ID’s, which is essentially the person doing the identification, by each CTEP by, in this case, year. However, we also need to make sure it doesn’t count the same person appearing accross multiple years numerous times. I also want to change the Circ column to read as a character.
eBird_ctep$Circ <- as.character(eBird_ctep$Circ)
eBird_ctep$true_obs <- as.numeric(!duplicated(eBird_ctep$observer_i))
Transforming…
obs_by_month <- eBird_ctep %>%
group_by(month = floor_date(observatio, "month"), Circ) %>%
subset(observatio > "2010-01-01") %>%
summarize(total = sum(true_obs)) %>%
arrange(Circ)
…and plotting…
ggplot(obs_by_month, aes(x = month, y = total, color = Circ)) +
geom_smooth(se = FALSE) +
theme_minimal()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 14595
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2510.6
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.1656e+06
## Warning in sqrt(sum.squares/one.delta): NaNs produced
From the above plot we see that there is certainly an increasing trend. One, Circumscription 12, has skyrocketed, but its rate seems pretty constant. For Circumscription 11 we see an odd dip in the middle of 2017. But in general, we see an increase, if modest, in unique observers. Let’s try by a different metric.
Numbers of observations isn’t a good metric; somebody could stand in one spot and make 50 observations, so that doesn’t tell us what we want to know. Total number of unique observers says something, to be sure. But there’s something in between, called checklists. “Checklists” are essentially ‘birding events;’ somebody might go out and create a checklist for the day, indicating the species they’d like to see. They’re essentially a measure of depth of penetration of a given observer, since there mere existance and number of observations don’t in themselves, say much. Below I do a similar plot to above, but by unique checklists.
checklists_by_month <- eBird_ctep %>%
group_by(month = floor_date(observatio, "month"), Circ) %>%
subset(observatio > "2010-01-01") %>%
summarize(checklists = n_distinct(sampling_e)) %>%
arrange(Circ)
ggplot(checklists_by_month, aes(x = month, y = checklists, color = Circ)) +
geom_smooth(se = FALSE) +
theme_minimal()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : span too small. fewer data values than degrees of freedom.
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : pseudoinverse used at 14595
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : neighborhood radius 2510.6
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : reciprocal condition number 0
## Warning in simpleLoess(y, x, w, span, degree = degree, parametric =
## parametric, : There are other near singularities as well. 1.1656e+06
## Warning in sqrt(sum.squares/one.delta): NaNs produced
The resulting pattern is similar, but we get the same general trend. Good to know. How can we quantify it differntly?
checklists_by_year <- eBird_ctep %>%
group_by(year = floor_date(observatio, "year"), Circ) %>%
subset(observatio > "2010-01-01") %>%
summarize(checklists = n_distinct(sampling_e)) %>%
filter(year == "2017-01-01")
ggplot(checklists_by_year, aes(x = Circ, y = checklists, fill = Circ)) +
geom_bar(stat = "identity") +
theme_minimal() +
theme(axis.text.x = element_blank(),
panel.grid.major = element_blank()) +
ggtitle("Number of Checklists by Circumscription, 2017")
I decided to sort by number of observations to see who my tob observers are.
top_observers <- eBird_cleaner %>%
mutate(count = 1) %>%
group_by(observer_id) %>%
summarize(total = sum(count)) %>%
arrange(desc(total)) %>%
top_n(20)
## Selecting by total
head(top_observers)
## # A tibble: 6 x 2
## observer_id total
## <chr> <dbl>
## 1 obsr693372 44017
## 2 obsr507040 40204
## 3 obsr205104 30905
## 4 obsr162659 28710
## 5 obsr354845 26502
## 6 obsr173592 23921
Observer numero uno, obsr693372, has made 44,017 observations. That’s ridiculous. But let’s have a look and see where he/she’s been.
obsr693372 <- eBird_cleaner %>%
filter(observer_id == "obsr693372")
observer_map <- ggplot() +
geom_polygon(data = map_world, aes(x = long, y = lat, group = group), color = "gray60", fill = "gray84") +
geom_polygon(data = map_colombia, aes(x = long, y = lat, group = group), color = "gray30", fill = "cornsilk2") +
geom_point(data = obsr693372, aes(x = longitude, y = latitude, color = "coral1", alpha = .01), size = 1, show.legend = FALSE) +
coord_quickmap(xlim = c(-80.5, -66), ylim = c(-4.75, 12.75)) +
theme(panel.background = element_rect(fill = "lightcyan2"),
panel.border = element_rect(color = "black", size = 1, fill = NA),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.grid = element_blank(),
axis.ticks = element_blank(),
axis.text = element_blank(),
legend.title = element_blank()) +
ggtitle("This guy gets around")
observer_map
gganimate code that I can’t seem to get to work…
transition_time(observation_date) + shadow_mark()
animate(observer_map, nframes = 800, fps = 80, device = ‘png’)