ADD vegan options

Beta Diversity Across Scales Code: Setting up Data

This is code that I used to sample and calculate beta diversity using hexagonal bins for the ebird data. I’ll try and keep this brief and to the point so it’ll be easy to follow.

First we load the packages:

Now we add the data. For simplicity sake let me just do this with aggregated data from May and Dec of 2010. We will only use the May data.

MAY_ebird<-readRDS(file = "Data/ebird/MAY_ebird_2010.rds")
DEC_ebird<-readRDS(file = "Data/ebird/DEC_ebird_2010.rds")

We need to change this to point data with package sf.

MAY_ebird2<-MAY_ebird %>% st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = 4326) %>%
  st_transform(crs = st_crs("EPSG:5070"))
DEC_ebird<-DEC_ebird %>% st_as_sf(coords = c("LONGITUDE", "LATITUDE"), crs = st_crs("EPSG:5070")) %>% st_transform()

Now let’s get a hexgrid. I left you three different resolutions of hexbins. Here I’m just using the 1,000 km ones and I’m reprojecting it to EOSG:5070

hexgrid = readRDS("Data/Hexgrids/US_10000KM2.RDS")

grid<-st_as_sf(hexgrid,crs=st_crs("EPSG:5070")) %>%
  st_transform(grid, crs=st_crs("EPSG:5070")) %>%
  mutate(ID = 1:nrow(.))
grid

## Simple feature collection with 886 features and 1 field
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: -2353797 ymin: 321803.8 xmax: 2254131 ymax: 3163215
## Projected CRS: NAD83 / Conus Albers
## First 10 features:
##                                 x ID
## 1  POLYGON ((-2332343 2085878,...  1
## 2  POLYGON ((-2349881 2282125,...  2
## 3  POLYGON ((-2270591 1802064,...  3
## 4  MULTIPOLYGON (((-2314931 19...  4
## 5  POLYGON ((-2300069 2067245,...  5
## 6  POLYGON ((-2300069 2253366,...  6
## 7  POLYGON ((-2281915 2553087,...  7
## 8  POLYGON ((-2244487 1724953,...  8
## 9  POLYGON ((-2246340 1788063,...  9
## 10 MULTIPOLYGON (((-2246340 19... 10

plot(grid)

Now we intersect ebird points with the hexgrids! So let’s just use the Spring (MAY ebird dataset for this). **might take a while.

MAY_int<-sf::st_intersection(grid, MAY_ebird2)

## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

Code above should give you point data of every occurrence in every bin.

ggplot() +
  geom_sf(data = hexgrid) +
  geom_sf(data = MAY_int)

For simplicity sake, let’s just get unique species occurrence per hexbin (unique species roster per hexbin and discounting abundance of occurences)

occurence_only_may<-MAY_int %>%
  group_by(ID) %>%
  filter(!duplicated(SCIENTIFICNAME)) %>% ungroup()

ggplot() +
  geom_sf(data = hexgrid) +
  geom_sf(data = occurence_only_may)

No visible loss of geographic coverage when counting only single occurrence of a species within each hexbin.

Nearest Neighbor analysis

To do the analysis, I built a bunch of function in the file: “Functions_nn.R” You can take a look at that file to see the code behind the functions:

source(file = "Functions_nn.R")

So first thing, we need to get the original hexgrid cells and remove any cells that had less than 3 species. To do this we first group the point data by hexbin ID and figure out which ID has less than three unique species. Then we filter out the hexgrid for those bins.

cells_to_keep<-occurence_only_may %>%
  mutate(n = 1) %>%
  group_by(ID) %>%
  summarise(S = n()) %>%
  ungroup() %>%
  filter(! S < 3) #here we remove anything where S is less than 3

#make a copy of the original hexgrid
grid2<-grid


#filter for bins that we have > 3 sp. for
sp_grids<-grid2 %>%
  filter(ID %in% cells_to_keep$ID) %>%
  dplyr::select(-ID) %>%
  mutate(ID = 1:nrow(.))


#intersect again with occurrence only points to rename IDS for your point data
occurence_only_may2<-st_intersection(sp_grids, occurence_only_may)

## Warning: attribute variables are assumed to be spatially constant throughout all
## geometries

Now we can see which cells are actually going to be used for the analyses (red are the cells we will include):

ggplot() +
  geom_sf(data = grid, fill = "black") +
  geom_sf(data = sp_grids, fill = "red")

Now let’s identify the neighbors for all those cells

#NOTE we are only looking for neighbors that contain birds! 
#That's why we are using the sp_grids polygon file

#We need to rename the IDs because we removed some hexbins
sp_grids2<-sp_grids 
 
MAY_neighbor<-st_touches(sp_grids2,remove_self =T)
MAY_neighbor

## Sparse geometry binary predicate list of length 778, where the
## predicate was `touches', with remove_self = TRUE
## first 10 elements:
##  1: 4, 5, 10
##  2: 5, 6, 11
##  3: 9, 15
##  4: 1, 9, 10, 16
##  5: 1, 2, 10, 11, 17
##  6: 2, 11, 12, 18
##  7: 12, 13, 19
##  8: 14, 15, 23
##  9: 3, 4, 15, 16, 24
##  10: 1, 4, 5, 16, 17, 25

The way this is set up is that hexcell ID 7 has three neighbors…hex bin 12, 13, and 19. If we plot these bins with hexbin 7 it looks like:

ggplot() +
  geom_sf(data = grid, fill = "black") +
  geom_sf(data = sp_grids %>% filter(ID %in% c(7,12,13,19)), fill = "red")

Now that we have a list of neighbors we calculate beta differences based on which ever hexcell we want!

To do this we need to create a matrix of species as rows and hexbins as columns (species x site). To do this we use a function from the file I previously showed called create_disim. This function will make a species x site matrix we need to calculate betas based on a dataset, a list of sites, and a column where we keep the species name. So for below, I will show an example of this function where we use the occurrence only POINT data from May and create a matrix of just several sites ( 7,12,13,19).

sxs<-create_disim(occurence_only_may2, c(7,12,13,19), sp_column = "SCIENTIFICNAME")

## Warning: Using an external vector in selections was deprecated in tidyselect 1.1.0.
## ℹ Please use `all_of()` or `any_of()` instead.
##   # Was:
##   data %>% select(sp_column)
## 
##   # Now:
##   data %>% select(all_of(sp_column))
## 
## See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.

sxs

## # A tibble: 184 × 5
##    SCIENTIFICNAME             `19`  `12`   `7`  `13`
##    <chr>                     <dbl> <dbl> <dbl> <dbl>
##  1 Sialia mexicana               1     1     0     0
##  2 Selasphorus rufus             1     1     1     1
##  3 Spinus psaltria               1     1     0     0
##  4 Troglodytes aedon             1     0     0     0
##  5 Pheucticus melanocephalus     1     1     1     1
##  6 Passerina amoena              1     1     0     1
##  7 Tachycineta bicolor           1     1     1     0
##  8 Poecile atricapillus          1     1     1     1
##  9 Aphelocoma californica        1     1     1     0
## 10 Buteo lineatus                1     1     0     0
## # … with 174 more rows

Now we can create matrices of nearest neighbors. How do we calculate an average beta? We will use the vegan package to calculate Koleff’s 333 betas

#ok so instead of stealing someone's code let's just use vegan's beta functions to calculate
#first we change the rownames to be our species name
rownames(sxs)<-sxs$SCIENTIFICNAME

## Warning: Setting row names on a tibble is deprecated.

sxs1<-t(sxs[,-1]) #then we transpose the matrix because betadiver takes a specific kind of species by site matrix
beta<-mean(betadiver(sxs1,"j"))
beta

## [1] 0.3252772

So the Jaccard’s metric between hexbin 7 (the reference bin) and hexbin 19 (one of three of its neighbors) is ~0.33. Now we need to do this with all of hexbin 7s’ neighbors but how can we do that with one line of code?

Getting to business

I have created a function calc_beta_mean_vegan that will calculate either all individual betas or a mean beta given a reference hexbin and a neighbor matrix (MAY_neighbor):

calc_beta_mean_vegan(refbinID  = "7", Neighborls = MAY_neighbor, data = occurence_only_may2, betatype = "j")

## [1] 0.3252772

Ideally we should be able to do now calculate an average beta for every cell with this function:

source("Functions_nn.R")

#this will give us the list of betas we can use
betadiver(help=TRUE)

## 1 "w" = (b+c)/(2*a+b+c)
## 2 "-1" = (b+c)/(2*a+b+c)
## 3 "c" = (b+c)/2
## 4 "wb" = b+c
## 5 "r" = 2*b*c/((a+b+c)^2-2*b*c)
## 6 "I" = log(2*a+b+c) - 2*a*log(2)/(2*a+b+c) - ((a+b)*log(a+b) +
## (a+c)*log(a+c)) / (2*a+b+c)
## 7 "e" = exp(log(2*a+b+c) - 2*a*log(2)/(2*a+b+c) - ((a+b)*log(a+b) +
## (a+c)*log(a+c)) / (2*a+b+c))-1
## 8 "t" = (b+c)/(2*a+b+c)
## 9 "me" = (b+c)/(2*a+b+c)
## 10 "j" = a/(a+b+c)
## 11 "sor" = 2*a/(2*a+b+c)
## 12 "m" = (2*a+b+c)*(b+c)/(a+b+c)
## 13 "-2" = pmin(b,c)/(pmax(b,c)+a)
## 14 "co" = (a*c+a*b+2*b*c)/(2*(a+b)*(a+c))
## 15 "cc" = (b+c)/(a+b+c)
## 16 "g" = (b+c)/(a+b+c)
## 17 "-3" = pmin(b,c)/(a+b+c)
## 18 "l" = (b+c)/2
## 19 "19" = 2*(b*c+1)/(a+b+c)/(a+b+c-1)
## 20 "hk" = (b+c)/(2*a+b+c)
## 21 "rlb" = a/(a+c)
## 22 "sim" = pmin(b,c)/(pmin(b,c)+a)
## 23 "gl" = 2*abs(b-c)/(2*a+b+c)
## 24 "z" = (log(2)-log(2*a+b+c)+log(a+b+c))/log(2)

#now with my function and lapply I can iterate and calculate beta for every grid
#cell in sp_grids
betas<-lapply(1:nrow(sp_grids), function(x){calc_beta_mean_vegan(refbinID = x, 
                                                      Neighborls  =MAY_neighbor,
                     data = occurence_only_may2,
                     betatype = 1) })

#make the results into  a clean dataframe
betas<-data.frame(Avg_beta = do.call(rbind, betas))
betas<-betas %>%
  mutate(ID = 1:nrow(.))

Now let’s join the betas back up with the hex grid to map it:

map_data<-left_join(sp_grids, betas, by  = "ID")

Firstnn<-ggplot() +
  geom_sf(data = map_data, aes(fill = Avg_beta)) +
  scale_fill_viridis_c() +
  labs(title = "1st Deg Neighbor Beta")
Firstnn

How about second order neighbors?

Doing the 2nd nearest neighbor gets a little confusing. Here it goes:

Let’s go back to the example of reference bin 7 and the neighbors of 12,13,19

MAY_neighbor[[7]]

## [1] 12 13 19

We need to get the neighbors of those neighbors and then cut out repeats:

#Let's combine the ref hexbin and its first degree neighbors
original<-c(7,MAY_neighbor[[7]])
#This is a list of the neighbors of the first degree neighbors of ref bin 7
new_nn<-unlist(MAY_neighbor[c(12,13,19)])

#Now filter out the list of neighbors from your original= to get only second degree neighbors
Second_nn<-new_nn[!new_nn %in% original]

#Remove duplicates
Second_nn<-Second_nn[!duplicated(Second_nn)]

Let’s see if that worked by plotting:

ggplot() +
  geom_sf(data = hexgrid) +
  geom_sf(data = sp_grids2 %>% filter(ID %in% original), fill = "red") +
  geom_sf(data = sp_grids2 %>% filter(ID %in% Second_nn), fill = "black") +
  labs(title = "Red = ref bin + 1st order neighbors\nBlack = 2nd deg neighbors")

So that’s how we figure out the second order neighbors. How do we do this for all the cells? I made a function to do that called get_neighbor1and2.

get_neighbor1and2(1, MAY_neighbor)

## [1]  9 10 16  2 11 17  4  5 25

We can apply it to all hexbins with lapply. This function gets me the first AND second neighbors for a given cell.

new_neigh_2<-lapply(1:nrow(sp_grids2),
                    function(x){get_neighbor1and2(x,MAY_neighbor)})

Now we have all the first and second order neighbors and guess what? WE CAN USE THE SAME FUNCTIONS TO CALCULATE NEW BETAS!

calc_beta_mean_vegan(refbinID  = "7", 
                     Neighborls = new_neigh_2, 
                     data = occurence_only_may2, betatype = 1)

## [1] 0.425945

So now we just use lapply again like we did previously! Note I change the argument from nrow to length.

average_beta<-lapply(1:length(new_neigh_2), 
                     function(x){calc_beta_mean_vegan(refbinID = paste(x), 
                                                       Neighborls = new_neigh_2, 
                     data = occurence_only_may2, betatype = 1)})

Now let’s join it to the hexgrid shapefile and plot it:

#make the results into  a clean dataframe
betas2<-data.frame(Avg_beta2 = do.call(rbind, average_beta))
betas2<-betas2 %>%
  mutate(ID = 1:nrow(.))

Now let’s join the betas back up with the hex grid to map it:

map_data2<-left_join(sp_grids, betas2, by  = "ID")

twodeg<-ggplot() +
  geom_sf(data = map_data2, aes(fill = Avg_beta2)) +
  scale_fill_viridis_c() +
  labs(title = "1st + 2nd deg Neighbor Beta")
twodeg

We can combine both plots:

library(patchwork)
twodeg + Firstnn + plot_layout(nrow = 2)

Neighbor Sampling GIS with VEGAN

Leo_Ohyama

2023-01-26

Beta Diversity Across Scales Code: Setting up Data

Nearest Neighbor analysis

Getting to business

How about second order neighbors?