require(tidyverse)
require(magrittr)
require(pedigree)
require(inbreedR)

Readme and Motivation

This is an R Notebook. If you’d like to pre-rendered figures, read a summary of analysis and view code, please open the html file in a browser. If you’d like to run the code yourself download and open the rmd file in rstudio or any other integrated development environment that supports r and markdown.

The motivation of this notebook is to collect analyses associated with incorporating genetics into HexSim modeling of proposed NSO translocation and supplementation programs.

Inbreeding Proxy

One of the first challenges to overcome is how (and whether) to incorporate inbreeding depression into the model. First we must determine how to track the inbreeding coefficient ( F ) of an individual in the model. Then we can use F to reduce survival or reproduction using the number of lethal equivalents. Ideally we would keep track of inbreeding as determined by the pedigree ( Fp ). While HexSim can output the pedigree, it can not be read back into the model at each iteration/generation, so we need a workaround. We have proposed two strategies to track F in the model:

  1. Develop a trait that tracks recent ancestors, and use this trait to assign categorical inbreeding levels to individuals. In essence we are hacking together a shallow pedigree for each individual.

  2. Use multilocus heterozygosity (MLH). F is defined as the portion of the genome expected to be identical by descent (IBD). In the absence of mutation, all IBD will result in homozygosity. If each founder has a completely unique set of alleles, then all homozygosity in their descendants must be due to IBD. Therefore, we can use 1-MLH to estimate F. MLH will be relatively easy to define as a trait in HexSim.

Approach (1) presents some logistical challenges for implementation in HexSim and has a few (minor) shortcomings for biological realism: (a) it cannot capture variance in F that arises due to linkage/recombination and (b) it will only be practicable for a small number of generations - i.e. as we get farther along in time, this estimate of F will tend to underestimate Fp. Both of these shortcomings may be unimportant to us. If we can demonstrate that this metric tracks Fp for the time period we are most interested in, then we should probably go with this.

Approach (2) also has some issues. Most importantly, the number of markers required. With fewer markers we expect a noisier relationship the relationship between MLH and F.

Using MLH

For (2), we consider the pedigree and genotypes from a simulation run for 1600 generations, 100 individuals (randomly culled to 100 individuals each generation), 16 loci with 1000 alleles each (randomly assigned - this might cause a problem), and TLF = 2.

Below, I ask: How well does homozygosity track the inbreeding coefficient calculated from the pedigree?

Data Import, Cleanup and Calculation

First we need to get the data into R from the simulation, clean it up and convert it into heterozygosity, and a pedigree based estimate of the inbreeding coefficient

Data wrangling

# Data import
#First did some regex in a text editor to split the alleles at each locus into two columns
ped_genos <- read_tsv("data/pedigree_genos.txt")

ped <- ped_genos %>%
  select(ID = `Child ID`, dam = `Mother ID`, sire = `Father ID`)

genos <- ped_genos %>%
  select(ID = `Child ID`, generation = `Time Step`, starts_with("S"))

Ho

# calculate MLH
# let's use the popular package inbreedR to calculate MLH because it will be very fast

#first convert to heterozygosity format from inbreedR
genos_het <- genos %>%
  column_to_rownames("ID") %>%
  select(-generation) %>%
  convert_raw()

#then calculate the MLH and add to genos object
genos$mlh <- MLH(genos_het)
genos$smlh <- sMLH(genos_het)

#wow thats fast, note that we will have to make a trait in Hexsim that can do this calculation, but it should be relatively straightforward.

Let’s make a quick plot of multilocus heterozygosity over time as QC check.

ggplot(data = genos, aes(generation, mlh))+geom_point(alpha = 0.01)+geom_smooth()+theme_bw()+ylab("Multilocus Heterozygosity")

Yes, this look good. Drift leads to eventual fixation at all loci by ~1400 generations.

Inbreeding Coefficient

Next let’s calculate F from the pedigree. We will use the popular R package “Pedigree”

ped2 <- ped %>%
  rename(id = ID) %>%
  as.data.frame
F <- pedigree::calcInbreeding(ped2)

genos$F <- F

Evaluation

What is the relationship between MLH and Fp? How much variance is there?

ggplot(data = genos)+geom_point(aes(x = 1 - mlh, y = F), alpha = 0.01)+geom_smooth(aes(x = 1 - mlh, y = F), method = "lm")+geom_abline(slope = 1, intercept = 0) +theme_bw()+ylab("F\n Estimated as Inbreeding Coefficient from the Pedigree")+xlab("F\n Estimated as 1 - MLH")

Strong correlation between 1-MLH and Fp. There is also a lot of variance, suggesting that 16 markers may not be sufficient. Alternatively, this might be a desirable property of MLH. Fp will always underestimate the variance in IBD (Kardos 2015).

How does the metric perform for shallower pedigrees? Let’s only look at the data from the first 25 generations of the model (about 100 years for NSO)

ggplot(data = filter(genos, generation < 25))+
  geom_point(aes(x = 1 - mlh, y = F), alpha = 0.1)+
  geom_smooth(aes(x = 1 - mlh, y = F), method = "lm")+
  geom_abline(slope = 1, intercept = 0) +theme_bw() +theme_bw()+ylab("F\n Estimated as Inbreeding Coefficient from the Pedigree")+xlab("F\n Estimated as 1 - MLH")

Disagreement is much greater for shallower pedigrees, where it matters the most for our purposes!

Multilocus heterozygosity over estimates F relative to Fp. Importantly there are individuals with no inbreeding in the pedigree, but show high F according to MLH. How could homozygosity arise without IBD in this system? One source could be a violation of the assumption we laid out to make this all work: unique alleles in each founder. This condition was only approximated by setting the number of alleles at each locus to be 10 fold greater than the number of founders. This could be the source of the problem. I can’t assess it because the output files did not include founder genotypes.