NPY Introduction: Background and Functional Regions

Neuropeptide Y, commonly known as NPY, is a 36 amino acid peptide and neuromodulator known for regulating many functions, such as neuronal development, pain modulation, thermogenesis, stress response, and food intake through the regulation of hormones such as corticotropin releasing hormone (CRH or CRF). The biological effects of NPY are heavily mediated by the modulation of four receptors, Y1, Y2, Y4 and Y5 (Sajdyk 2004). NPY receptor Y1 is heavily involved in sympathetic activity of the central nervous system (Wahlestedt 1986) and in previous studies, it has been found that reduced hypothalamic expression of the Y1 receptor can be induced by fasting and/or food deprivation (Mercer 2011). While there is evidence of expression of receptor Y1 outside of the human brain, receptor Y2 is limitly expressed outside of the human brain. While fasting has shown no effects on Y2 expression, Y2 agonists promote anorexigenic functions through reduced food intake and body weight, as well as better glucose regulation when given to mice while antagonists in satiated rats lead to an increase in feeding (Abbott 2005). Though noted as as a differential receptor of NPY, receptor Y4 is expressed at low levels and mainly linked to expression of the human pancreatic polypeptide (Dumont 1998). NPY receptor Y5 agonists express similar regulation of hypothalamic function as receptor Y1, and additionally, treatment of antagonists in humans revealed reduced food intakce and significant weight loss (Mercer 2011).

It was found in Dumont et. al. (1998) that differences in NPY receptor expression was prevalent among mammalian species, meaning that the differential functions of specific receptors is may be more advantageous to the physiological functions of one species over another. In Vervet monkeys (Genus Chlorocebus) , expression of receptors Y2, Y1 and Y5 have been found to be similar to expression in humans. Given this relationship, exploration of NPY and NPY receptor expression in vervet monkeys could be a good model for understanding how NPY and these receptors function in normal energy balance humans (Mercer 2011). Seeing that NPY is one of the most potent stimulants in food intake, it is possible patterns in BMI and body mass in wild vervets may be linked to differential expression of NPY in relation to these NPY receptors (particularly vervets associated with increased expression in Y1 and Y5). With this, significantly heavier populations in southern Africa are likely to have variants associated with increased food intake.

The role of NPY in food intake

NPY is recognized as one of the most potent orexigenic neuropeptides in the hypothalamus. As a highly conserved gene among mammals and non-mammals alike, it can be suggested that NPY has a vital role in physiology (Mercer 2011). By Intracerebroventricular (ICV) injection, it was found that NPY stimulates food intake. Through various animal models, it was observed that in obese subjects, levels of NPY gene expression and NPY release were elevated. Release of NPY was specifically elevated before the onset of feeding and decreased gradually as food intake proceeded. This increase in NPY is correlated with conditions of energy deprivation, as well as energy demand (Mercer 2011).

NPY within the hypothalamus is mainly supplied from specialized neurons in the ARC (Arcuate Nucleus) which is a regulator of energy homeostasis. This diagram shows the main pathway of NPY in the ARC, between NPY and pro-opiomelanocortin (POMC) and peptide cocaine- and amphetamine-regulated transcript (CART) neurons in the ARC. Here it can be seen how release of NPY to receptors Y1 and Y2 leads to the inhibition of POMC and CART neurons, which are anorexigenic (causing a loss of appetite) in function. It can also be seen that NPY stimulation of receptor Y5 leads to inhibition of melanocortin release.

NPY Y1 expression has been shown to induce NPY mediated food intake. When injected by ICV, it was shown that orexigenic activity occurred, resulting in stimulated food intake and induced body weight gain (Mercer 2011). In past experiments where NPY Y1 was knocked out, a significant reduction in NPY stimulated food intake occurred compared to the other controls used (Mercer 2011). In order to begin investigation as to the correlation between differential NPY expression mediated by specific receptors and differences in BMI and body mass in wild vervets, I will conduct analysis of NPY and NPY Y1 expression in populations of south African vervet monkeys.

Isolating NPY and NPY1

Though the aim of this research is to determine how variation in expression of NPY receptors may be indicative of certain population characheristics, such as increased body mass, BMI, and food intake, general expression of the NPY gene itself may also align with variation in such characteristics.

It would be enlightening to explore, if possible, which SNPs that express variation in the NPY gene align with expression of whichever receptor.

Isolating NPY

To begin, I will first isolate the NPY gene from the whole genome data provided in the vervet pipeline.

module load htslib
tabix -p vcf 163_201701_UG_vervet_monkey_SNPs_all_chrom_beagle_shapeit.vcf.gz
tabix -h 163_201701_UG_vervet_monkey_SNPs_all_chrom_beagle_shapeit.vcf.gz CAE21:34058530-34085127 > vervNPY.full.vcf

For some reason, the tabix code will not run in a chunk, so the code must be copied directly to the terminal window. Once indexed, the genome can be isolated at the desired position, in this case at the NPY genome.

less vervNPY.full.vcf
wc -l vervNPY.full.vcf
grep -c "^##" vervNPY.full.vcf
## 565 vervNPY.full.vcf
## 38

This shows that there are 565 lines in the VCF file, and 38 are metadata headlines. This means that there are 108 lines representing individual call variants in the NPY gene region out of the 527 vervets.

Functional Regions of NPY

To learn more about the NPY genomic region, I can look to ensembl to get the positions of the exonic regions

  • EXON 1 CAE21:34075127-34074940
  • EXON 2 CAE21: 34070881-34070801
  • EXON 3 CAE21:34068735-34068530

Isolating NPY1

Now I will isolate the NPY Y1 gene from the whole genome data provided in the vervet pipeline.

module load htslib
tabix -p vcf 163_201701_UG_vervet_monkey_SNPs_all_chrom_beagle_shapeit.vcf.gz
tabix -h 163_201701_UG_vervet_monkey_SNPs_all_chrom_beagle_shapeit.vcf.gz CAE7:109764918-109786905 > vervNPY1.full.vcf

As mentioned above, the tabix code will not run in a chunk, so the code must be copied directly to the terminal window. Once indexed, the genome can be isolated at the desired position, in this case at the NPY1 genome.

less vervNPY1.full.vcf

In order to determine the amount of variant calls in the genome region, I can use this code:

wc -l vervNPY1.full.vcf
grep -c "^##" vervNPY1.full.vcf
## 471 vervNPY1.full.vcf
## 38

This shows that there are 471 lines in the VCF file, and only 38 lines are metadata headlines. This means there are 433 lines for individual variant calls in the NPY1 gene region.

Functional NPY1 Regions

Once the entire gene has been isolated, I could then look to Ensembl in order to pick out the functional regions of NPY1. From Ensembl, I found that the functional regions include:

  • EXON1: CAE7:109776905-109776056.
  • EXON2: CAE7:109775955-109774918.

Hardy Weinburg Equilibrium (HWE)

Once the genes are isolated, we can now conduct analysis, beginning with Hardy Weinburg Equilibrium (HWE)

Gathering Files

Before I could begin editing, I was having issues opening the vervet.population.panel . From troubleshooting, I realized it was due to the fact that I did not have the file in my directory. In order to acquire the file, I checked in other directories that may have had the file, specifically caschmitt. From here I could copy the file into my directory:

cp ../caschmit/vervet.population.panel .

Once obtained, I could finally begin editing the file and analysis.

grep pygerythrus vervet.population.panel | cut -f1 > pyg.samples.list
gedit pyg.samples.list

The gedit command would’nt work for some reason, displaying the message (gedit:347): Gtk-WARNING **: 22:53:03.389: cannot open display: :0 so I had to open and edit the file using the vim command:

vim pyg.samples.list

From here I deleted the VSAI3005 and VBO individuals using command “d” and saved the new file using “:x!”.

Then I was able to make a vcf file of just pygerythrus:

HWE in NPY Pygerythrus

Vcftools will create a VCF with just the samples comprised of just the South African pygerythrus

module load vcftools
vcftools --vcf vervNPY.full.vcf --keep pyg.samples.list --recode --out pygNPY
mv pygNPY.recode.vcf pygNPY.vcf
## 
## VCFtools - 0.1.16
## (C) Adam Auton and Anthony Marcketta 2009
## 
## Parameters as interpreted:
##  --vcf vervNPY.full.vcf
##  --keep pyg.samples.list
##  --out pygNPY
##  --recode
## 
## Keeping individuals in 'keep' list
## After filtering, kept 51 out of 163 Individuals
## Outputting VCF file...
## After filtering, kept 526 out of a possible 526 Sites
## Run Time = 0.00 seconds
egrep -w 'pygerythrus|cynosuros|hilgerti' vervet.population.panel | cut -f1 > SEexp.samples.list

Another VCF can be made for the southern expansion population

module load vcftools
vcftools --vcf vervNPY.full.vcf --keep SEexp.samples.list --recode --out seNPY
mv seNPY.recode.vcf seNPY.vcf
## 
## VCFtools - 0.1.16
## (C) Adam Auton and Anthony Marcketta 2009
## 
## Parameters as interpreted:
##  --vcf vervNPY.full.vcf
##  --keep SEexp.samples.list
##  --out seNPY
##  --recode
## 
## Keeping individuals in 'keep' list
## After filtering, kept 73 out of 163 Individuals
## Outputting VCF file...
## After filtering, kept 526 out of a possible 526 Sites
## Run Time = 0.00 seconds

To see if there are enough samples to run a HWE, a power test can be used.

library(pwr)
pwr.chisq.test(w = 0.5, df = 1, sig.level = 0.05, power=0.95)
## 
##      Chi squared power calculation 
## 
##               w = 0.5
##               N = 51.97884
##              df = 1
##       sig.level = 0.05
##           power = 0.95
## 
## NOTE: N is the number of observations

This equation would be used for 52 samples, and though there are only 48 in pygerythrus and 73 in the southern expansion, it is the best fit for the data given. Now I can begin to do the HWE starting with the pygerthrus taxon.

NPY: South African Pygerthrus HWE

module load vcftools
vcftools --vcf pygNPY.vcf --hardy --out pygNPY
less pygNPY.hwe

Here all 526 SNPs were kept

vcftools.HWE<-read.table("pygNPY.hwe",header=TRUE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.2.0     ✓ stringr 1.4.0
## ✓ readr   2.1.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
vcftools.HWE.sig <-
  vcftools.HWE %>%
  as_tibble(vcftools.HWE) %>%
  dplyr::select(CHR,POS,ChiSq_HWE,P_HWE) %>%
  filter(P_HWE <= 0.05)
knitr::kable(vcftools.HWE.sig, caption = 'South African pygerthrus SNPs')
South African pygerthrus SNPs
CHR POS ChiSq_HWE P_HWE
CAE21 34058832 13.878890 0.0116233
CAE21 34059166 13.878890 0.0116233
CAE21 34059328 5.000084 0.0378777
CAE21 34059431 6.106048 0.0250379
CAE21 34059435 13.878890 0.0116233
CAE21 34059685 13.878890 0.0116233
CAE21 34060020 6.106048 0.0250379
CAE21 34060108 5.296296 0.0224088
CAE21 34060249 6.106048 0.0250379
CAE21 34063486 13.878890 0.0116233
CAE21 34063575 13.878890 0.0116233
CAE21 34065174 13.878890 0.0116233
CAE21 34066013 5.230772 0.0293162
CAE21 34069785 6.305024 0.0130251
CAE21 34069848 9.587124 0.0077078
CAE21 34069864 7.680000 0.0143732
CAE21 34070401 20.650710 0.0315790
CAE21 34071796 13.878890 0.0116233
CAE21 34072148 5.185192 0.0228769
CAE21 34072414 48.000000 0.0105263
CAE21 34074427 4.409314 0.0376433
CAE21 34075017 5.086454 0.0288535
CAE21 34075345 5.086454 0.0288535
CAE21 34075768 4.005926 0.0465014
CAE21 34075960 5.086454 0.0288535
CAE21 34076108 11.916520 0.0038166
CAE21 34076144 4.005926 0.0465014
CAE21 34078625 6.829473 0.0266078
CAE21 34078706 4.005926 0.0465014
CAE21 34078799 48.000000 0.0105263
CAE21 34078953 6.829473 0.0266078
CAE21 34080030 6.305024 0.0130251
CAE21 34081189 13.878890 0.0116233
CAE21 34081340 14.031550 0.0007867
CAE21 34081427 5.259260 0.0437012
CAE21 34081476 5.259260 0.0437012
CAE21 34081704 4.800351 0.0411025
CAE21 34081920 5.259260 0.0437012
CAE21 34082004 9.781812 0.0038620
CAE21 34082228 5.259260 0.0437012
CAE21 34082417 19.934810 0.0050561
CAE21 34082628 5.259260 0.0437012
CAE21 34083131 9.187500 0.0030584
CAE21 34083478 5.259260 0.0437012
CAE21 34083662 5.259260 0.0437012
CAE21 34083771 5.259260 0.0437012
CAE21 34083929 14.072990 0.0001589
CAE21 34084060 9.781812 0.0038620

After running HWE in the South African pygerthrus, it was found that there were 20 SNPs that are out of HWE, meaning that in the pygerthrus population, there is variation resulting in evolution found in these sequences.

This shows that out of the 73 NPY sequences in the southern expansion, there are 20 sequences that are outside of HWE. The relevance of each SNP sequence is listed below: - CAE21 34058832
- CAE21 34059166
- CAE21 34059328
- CAE21 34059431
- CAE21 34059435
- CAE21 34059685
- CAE21 34060020
- CAE21 34060108
- CAE21 34060249
- CAE21 34063486
- CAE21 34063575
- CAE21 34065174
- CAE21 34066013
- CAE21:34069785 Intronic Variant - CAE21:34069848 Intronic Variant - CAE21:34069864 Intronic Variant - CAE21:34070401 Intronic Variant - CAE21:34071796 Intronic Variant - CAE21:34072148 Intronic Variant - CAE21:34072414 Intronic Variant - CAE21:34074427 Intronic Variant - CAE21:34075017 Synonymous Variant (no change to the encoded amino acid) -unsure if I want to list them all, there are many!

NPY: Southern Expansion HWE

module load vcftools
vcftools --vcf seNPY.vcf --hardy --out seNPY
less seNPY.hwe
vcftools.HWE<-read.table("seNPY.hwe",header=TRUE)
library(tidyverse)
vcftools.HWE.sig <-
  vcftools.HWE %>%
  as_tibble(vcftools.HWE) %>%
  dplyr::select(CHR,POS,ChiSq_HWE,P_HWE) %>%
  filter(P_HWE <= 0.05)
knitr::kable(vcftools.HWE.sig, caption = 'Southern Expansion HWE')
Southern Expansion HWE
CHR POS ChiSq_HWE P_HWE
CAE21 34058832 22.070120 0.0049921
CAE21 34059166 22.070120 0.0049921
CAE21 34059328 7.989901 0.0080698
CAE21 34059431 5.262497 0.0273389
CAE21 34059435 12.148780 0.0173751
CAE21 34059685 22.070120 0.0049921
CAE21 34060020 6.736909 0.0130994
CAE21 34060108 5.990589 0.0183223
CAE21 34060249 4.609415 0.0402947
CAE21 34060394 73.000000 0.0068966
CAE21 34060688 12.430060 0.0011354
CAE21 34063486 22.070120 0.0049921
CAE21 34063575 22.070120 0.0049921
CAE21 34064112 8.164983 0.0139226
CAE21 34065174 22.070120 0.0049921
CAE21 34066013 5.034996 0.0285534
CAE21 34066438 8.164983 0.0139226
CAE21 34067096 6.780227 0.0216437
CAE21 34067819 8.164983 0.0139226
CAE21 34069785 5.100963 0.0289483
CAE21 34069848 12.916160 0.0023409
CAE21 34069864 9.344020 0.0071133
CAE21 34070401 31.767360 0.0206897
CAE21 34071796 22.070120 0.0049921
CAE21 34072117 5.969926 0.0210159
CAE21 34072148 5.743848 0.0252939
CAE21 34072372 5.034996 0.0285534
CAE21 34072414 73.000000 0.0068966
CAE21 34073484 5.969926 0.0210159
CAE21 34074070 5.969926 0.0210159
CAE21 34074427 5.321700 0.0214643
CAE21 34075017 8.204678 0.0059548
CAE21 34075252 6.876338 0.0099801
CAE21 34075345 8.022223 0.0081531
CAE21 34075386 73.000000 0.0068966
CAE21 34075687 8.012347 0.0209903
CAE21 34075768 5.743848 0.0252939
CAE21 34075960 8.022223 0.0081531
CAE21 34076108 20.195350 0.0007092
CAE21 34076144 5.743848 0.0252939
CAE21 34077626 31.767360 0.0206897
CAE21 34078625 12.205950 0.0078355
CAE21 34078706 6.339308 0.0126096
CAE21 34078799 73.000000 0.0068966
CAE21 34078953 12.205950 0.0078355
CAE21 34080030 7.075205 0.0154694
CAE21 34081189 22.070120 0.0049921
CAE21 34081331 73.000000 0.0068966
CAE21 34081340 13.643000 0.0012831
CAE21 34081427 6.486039 0.0319006
CAE21 34081476 6.486039 0.0319006
CAE21 34081704 6.780227 0.0216437
CAE21 34081920 6.486039 0.0319006
CAE21 34082004 11.775930 0.0023733
CAE21 34082228 6.486039 0.0319006
CAE21 34082417 31.068870 0.0021600
CAE21 34082628 6.486039 0.0319006
CAE21 34083131 12.277350 0.0008424
CAE21 34083478 5.230242 0.0465494
CAE21 34083662 6.486039 0.0319006
CAE21 34083771 6.486039 0.0319006
CAE21 34083929 15.418290 0.0000870
CAE21 34084060 5.262497 0.0273389

Here there are also 62 regions of variation, which are actually the same regions listed above in the pygerthrus population.

HWE in NPY1

The process for creating a vcf file can be repeated for NPY1:

module load vcftools
vcftools --vcf vervNPY1.full.vcf --keep pyg.samples.list --recode --out pygNPY1
mv pygNPY1.recode.vcf pygNPY1.vcf
## 
## VCFtools - 0.1.16
## (C) Adam Auton and Anthony Marcketta 2009
## 
## Parameters as interpreted:
##  --vcf vervNPY1.full.vcf
##  --keep pyg.samples.list
##  --out pygNPY1
##  --recode
## 
## Keeping individuals in 'keep' list
## After filtering, kept 51 out of 163 Individuals
## Outputting VCF file...
## After filtering, kept 432 out of a possible 432 Sites
## Run Time = 0.00 seconds
egrep -w 'pygerythrus|cynosuros|hilgerti' vervet.population.panel | cut -f1 > SEexp.samples.list

Another VCF can be made for the southern expansion NPY1 population

module load vcftools
vcftools --vcf vervNPY1.full.vcf --keep SEexp.samples.list --recode --out seNPY1
mv seNPY1.recode.vcf seNPY1.vcf
## 
## VCFtools - 0.1.16
## (C) Adam Auton and Anthony Marcketta 2009
## 
## Parameters as interpreted:
##  --vcf vervNPY1.full.vcf
##  --keep SEexp.samples.list
##  --out seNPY1
##  --recode
## 
## Keeping individuals in 'keep' list
## After filtering, kept 73 out of 163 Individuals
## Outputting VCF file...
## After filtering, kept 432 out of a possible 432 Sites
## Run Time = 0.00 seconds

Since the power test was already done above, this step can be skipped.

NPY1: HWE in South African Pygerthrus

module load vcftools
vcftools --vcf pygNPY1.vcf --hardy --out pygNPY1
less pygNPY1.hwe
vcftools.HWE<-read.table("pygNPY1.hwe",header=TRUE)
library(tidyverse)
vcftools.HWE.sig1 <-
  vcftools.HWE %>%
  as_tibble(vcftools.HWE) %>%
  dplyr::select(CHR,POS,ChiSq_HWE,P_HWE) %>%
  filter(P_HWE <= 0.05)
knitr::kable(vcftools.HWE.sig1, caption = 'South African pygerthrus SNPs')

After running HWE in the South African pygerthrus, it was found that there were 10 SNPs that are out of HWE, meaning that in the pygerthrus population, 10 SNP variation resulting in evolution found in these sequences.

NPY1: HWE in Southern Expansion

module load vcftools
vcftools --vcf seNPY1.vcf --hardy --out seNPY1
less seNPY1.hwe
vcftools.HWE<-read.table("seNPY1.hwe",header=TRUE)
library(tidyverse)
vcftools.HWE.sig <-
  vcftools.HWE %>%
  as_tibble(vcftools.HWE) %>%
  dplyr::select(CHR,POS,ChiSq_HWE,P_HWE) %>%
  filter(P_HWE <= 0.05)
knitr::kable(vcftools.HWE.sig, caption = 'Southern Expansion HWE')
Southern Expansion HWE
CHR POS ChiSq_HWE P_HWE
CAE7 109766025 16.195390 0.0098413
CAE7 109766082 5.137139 0.0329729
CAE7 109766483 17.236310 0.0412346
CAE7 109766617 9.245614 0.0282634
CAE7 109766621 9.884091 0.0131802
CAE7 109766666 8.012347 0.0209903
CAE7 109766747 9.245614 0.0282634
CAE7 109766784 7.096299 0.0431318
CAE7 109766886 16.195390 0.0098413
CAE7 109767821 16.195390 0.0098413
CAE7 109767944 10.907990 0.0017473
CAE7 109768022 8.163399 0.0063574
CAE7 109768050 31.068870 0.0021600
CAE7 109768094 5.599291 0.0323911
CAE7 109768234 5.990589 0.0183223
CAE7 109768598 22.070120 0.0049921
CAE7 109768764 17.236310 0.0412346
CAE7 109769055 22.070120 0.0049921
CAE7 109769271 17.236310 0.0412346
CAE7 109769649 22.070120 0.0049921
CAE7 109769999 22.070120 0.0049921
CAE7 109770093 4.291186 0.0415112
CAE7 109770128 4.291186 0.0415112
CAE7 109770187 4.291186 0.0415112
CAE7 109770358 4.979165 0.0286948
CAE7 109770580 4.268649 0.0472498
CAE7 109770918 9.245614 0.0282634
CAE7 109770921 7.009596 0.0167820
CAE7 109771426 5.599291 0.0323911
CAE7 109771489 16.195390 0.0098413
CAE7 109771496 17.236310 0.0412346
CAE7 109772125 7.222928 0.0093910
CAE7 109772184 6.736909 0.0130994
CAE7 109772436 5.112970 0.0417347
CAE7 109773716 17.236310 0.0412346
CAE7 109774230 4.860916 0.0338828
CAE7 109775436 17.236310 0.0412346
CAE7 109775479 6.035482 0.0184038
CAE7 109775960 6.913583 0.0106484
CAE7 109776039 5.230242 0.0465494
CAE7 109777165 9.196390 0.0033099
CAE7 109777223 4.924113 0.0340392
CAE7 109777564 6.035482 0.0184038
CAE7 109778565 7.888899 0.0081322
CAE7 109778772 7.096299 0.0431318
CAE7 109779088 12.148780 0.0173751
CAE7 109779499 9.176697 0.0048670
CAE7 109779852 9.245614 0.0282634
CAE7 109779976 17.236310 0.0412346
CAE7 109780426 7.888899 0.0081322
CAE7 109781039 9.176697 0.0048670
CAE7 109781618 14.034140 0.0008135
CAE7 109781797 7.888899 0.0081322
CAE7 109782396 29.657410 0.0000001
CAE7 109782413 7.905991 0.0115980
CAE7 109784120 34.265290 0.0000000
CAE7 109784778 23.871700 0.0000356
CAE7 109785842 23.871700 0.0000356
CAE7 109786018 9.245614 0.0282634
CAE7 109786902 15.961550 0.0002307

This shows that out of the 73 NPY sequences in the southern expansion, there are 59 sequences that are outside of HWE. The relevance of each SNP sequence is listed below:

  • CAE7:109775436 3’ UTR Variant
  • CAE7:109775479 3’ UTR Variant
  • CAE7:109775960 Splice Region Variant;Intronic Variant
  • CAE7:109776039 Intronic Variant

Linkage Disequilibrium

First, all the positions in both genes need to be listed:

grep -v "^##" pygNPY.vcf | cut -f1-3 > pygNPY_loci.txt
grep -v "^##" seNPY.vcf | cut -f1-3 > seNPY_loci.txt
grep -v "^##" pygNPY1.vcf | cut -f1-3 > pygNPY1_loci.txt
grep -v "^##" seNPY1.vcf | cut -f1-3 > seNPY1_loci.txt

Then some specialty packages need to be downloaded

library("devtools")
library("BiocManager")
devtools::install_github("SFUStatgen/LDheatmap")
BiocManager::install("snpStats")
BiocManager::install("VariantAnnotation")
BiocManager::install("GenomicFeatures")
BiocManager::install("gpart")

Installing both BiocManager and devtools wouldn’t work in the chunk, so I had to individually install each package. I realized this was due to me accidentally trying to run the code in Terminal. RMariadb doesn’t seem to exist?

It turned out RMariadb was within the BiocManager package, which is why it would not appear in the list of packages available.

The positions for each gene can then be put into a vector of position numbers

positions_pygNPY<-read.table("pygNPY_loci.txt")
positions_pygNPY<-positions_pygNPY$V2
positions_seNPY<-read.table("seNPY_loci.txt")
positions_seNPY<-positions_seNPY$V2
positions_pygNPY1<-read.table("pygNPY1_loci.txt")
positions_pygNPY1<-positions_pygNPY1$V2
positions_seNPY1<-read.table("seNPY1_loci.txt")
positions_seNPY1<-positions_seNPY1$V2

Linkage Disequilibrium in pygerthrus

NPY

First the VCF file must be converted to a SNP matrix

library(tidyverse)
library(snpStats)
## Loading required package: survival
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
library(VariantAnnotation)
## Loading required package: BiocGenerics
## Loading required package: parallel
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
## 
##     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
##     clusterExport, clusterMap, parApply, parCapply, parLapply,
##     parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:dplyr':
## 
##     combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, append, as.data.frame, basename, cbind, colnames,
##     dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
##     grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
##     order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
##     rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
##     union, unique, unsplit, which.max, which.min
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
## 
##     count
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: GenomeInfoDb
## Loading required package: S4Vectors
## Loading required package: stats4
## 
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:Matrix':
## 
##     expand, unname
## The following objects are masked from 'package:dplyr':
## 
##     first, rename
## The following object is masked from 'package:tidyr':
## 
##     expand
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## Loading required package: IRanges
## 
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
## 
##     collapse, desc, slice
## The following object is masked from 'package:purrr':
## 
##     reduce
## Loading required package: GenomicRanges
## Loading required package: SummarizedExperiment
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
## Loading required package: Rsamtools
## Loading required package: Biostrings
## Loading required package: XVector
## 
## Attaching package: 'XVector'
## The following object is masked from 'package:purrr':
## 
##     compact
## 
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
## 
##     strsplit
## 
## Attaching package: 'VariantAnnotation'
## The following object is masked from 'package:stringr':
## 
##     fixed
## The following object is masked from 'package:base':
## 
##     tabulate
library(LDheatmap)

pygNPYvcf<-readVcf("pygNPY.vcf")
pygNPYmatrix<-genotypeToSnpMatrix(pygNPYvcf)
pygNPYvcf
## class: CollapsedVCF 
## dim: 526 51 
## rowRanges(vcf):
##   GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
## info(vcf):
##   DataFrame with 0 columns: 
## geno(vcf):
##   List of length 1: GT
## geno(header(vcf)):
##       Number Type   Description    
##    GT 1      String Phased Genotype
pygNPY_LD<-ld(pygNPYmatrix$genotypes,depth=525,stats="R.squared")

Since the data in pygerthrus NPY is not extremely large, it had to be reset to a depth of 525 for the matrix

cols=colorRampPalette(c("yellow","red"))(10)
image(pygNPY_LD,lwd=0,cuts=9,col.regions=cols,colorkey=TRUE)

The blocks of red indicates that there is a linkage block of selection occurring on this part of the gene

To incorporate the genetic distance of the linkages into the map, we can align the SNPs to the vervet reference:

library(GenomicFeatures)
## Loading required package: AnnotationDbi
## 
## Attaching package: 'AnnotationDbi'
## The following object is masked from 'package:dplyr':
## 
##     select
library(RMariaDB)

txdb <- makeTxDbFromEnsembl(organism="Chlorocebus sabaeus", server="useastdb.ensembl.org") #Create txdb for chlSab2
## Fetch transcripts and genes from Ensembl ...
## OK
##   (fetched 28078 transcripts from 27985 genes)
## Fetch exons and CDS from Ensembl ... OK
## Fetch chromosome names and lengths from Ensembl ...OK
## Gather the metadata ... OK
## Make the TxDb object ... OK
vcfbig <- readVcf("pygNPY.vcf", "txdb") #align data with vervet reference genome
ALLmatrix <- genotypeToSnpMatrix(vcfbig)
ALLmatrix
## $genotypes
## A SnpMatrix with  51 rows and  526 columns
## Row names:  VBOA1003 ... VSAM5007 
## Col names:  CAE21:34058537_A/G ... CAE21:34085036_G/A 
## 
## $map
## DataFrame with 526 rows and 4 columns
##              snp.names       allele.1           allele.2    ignore
##            <character> <DNAStringSet> <DNAStringSetList> <logical>
## 1   CAE21:34058537_A/G              A                  G     FALSE
## 2   CAE21:34058548_T/C              T                  C     FALSE
## 3   CAE21:34058562_C/T              C                  T     FALSE
## 4   CAE21:34058673_C/T              C                  T     FALSE
## 5   CAE21:34058753_G/A              G                  A     FALSE
## ...                ...            ...                ...       ...
## 522 CAE21:34084886_T/G              T                  G     FALSE
## 523 CAE21:34084887_G/A              G                  A     FALSE
## 524 CAE21:34084917_A/G              A                  G     FALSE
## 525 CAE21:34084994_C/T              C                  T     FALSE
## 526 CAE21:34085036_G/A              G                  A     FALSE

You can continue from here and create a heat map, however there’s an easier way to do this:

BiocManager::install("gpart")
## Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
##   re-install: 'gpart'
## Installation paths not writeable, unable to update packages
##   path: /share/pkg.7/r/4.1.2/install/lib64/R/library
##   packages:
##     actuar, AdaptFitOS, ade4, admisc, affxparser, affy, affyio, affyPLM, akima,
##     annotate, AnnotationFilter, AnnotationForge, AnnotationHub, antiword, ape,
##     apeglm, aplot, argparse, arrow, ashr, assertive.properties, babelgene,
##     batchelor, bayesplot, BBmisc, BDgraph, beachmat, BHC, bigleaf, bigmemory,
##     bigsnpr, bigsparser, bigstatsr, Biobase, BiocFileCache, BiocGenerics,
##     BiocIO, BiocNeighbors, BiocParallel, BiocSingular, biomaRt, Biostrings,
##     biovizBase, blob, bluster, BMA, bnlearn, brew, Brobdingnag, broom,
##     BSgenome, C50, Cairo, car, caret, Category, CDM, checkmate, circlize, cli,
##     clipr, cluster, commonmark, ComplexHeatmap, conquer, crayon, cubature,
##     Cubist, cytolib, DelayedArray, DelayedMatrixStats, densEstBayes,
##     densityClust, DEoptimR, desc, DESeq2, deSolve, DiffBind, DNAcopy,
##     doParallel, dplyr, DropletUtils, DT, edgeR, ensembldb, EpiModel, evaluate,
##     evd, expint, extRemes, fansi, fastGHQuad, ff, fftw, fishpond, fitdistrplus,
##     flexclust, float, foreach, formatR, future, future.apply, gam, gamlss,
##     gamlss.dist, gcrma, gdsfmt, gdtools, gee, genefilter, geneplotter, GENESIS,
##     GenomeInfoDb, GenomeInfoDbData, GenomicAlignments, GenomicRanges,
##     geojsonsf, geometry, gert, ggdendro, ggfun, ggplot2, git2r, glmmML,
##     glmmTMB, glmnet, glue, gmp, GO.db, GOstats, gower, gplots, graph,
##     GreyListChIP, GSEABase, gstat, GSVA, GWASTools, haven, HDF5Array, Hmisc,
##     httr, hwriter, igraph, impute, infotheo, IRanges, iterators, jomo,
##     jsonlite, KEGGREST, Kendall, kernlab, knitr, ks, lars, lavaan, leiden, lfe,
##     lhs, limma, linprog, lme4, LMest, lmtest, locfit, logspline, loo,
##     lpsymphony, magic, magrittr, maptools, MASS, MassSpecWavelet, Matching,
##     mathjaxr, MatrixGenerics, matrixStats, mbkmeans, MCMCpack, metap, metapod,
##     mgcv, minpack.lm, mlapi, MLEcens, mlegp, msigdbr, multtest, networkDynamic,
##     nimble, nlme, nloptr, officer, oligo, oligoClasses, openair, OpenMx,
##     openssl, pander, parallelly, parsedate, party, pbdZMQ, pcaMethods, pcaPP,
##     pdftools, pdist, penalized, peperr, plyr, polspline, polynom,
##     preprocessCore, processx, ProtGenerics, ps, pspline, psych, qgraph,
##     quanteda, quantmod, quantreg, quantsmooth, qvalue, ragg, RandomFieldsUtils,
##     randomForest, RBGL, rbibutils, RColorBrewer, Rcpp, RcppArmadillo,
##     RcppEigen, RcppGSL, RCurl, Rdpack, readxl, recipes, REddyProc,
##     ResidualMatrix, Rfast, rgdal, rgenoud, Rgraphviz, rhdf5, rhdf5filters,
##     Rhdf5lib, Rhtslib, RInside, rjags, rlang, rmarkdown, rmio, rms, robustbase,
##     rprojroot, RProtoBufLib, rrcov, Rsamtools, RSpectra, RSQLite, rstan,
##     rstanarm, rstantools, Rsubread, rsvg, rtracklayer, Rtsne, S4Vectors, sass,
##     ScaledMatrix, scales, scater, scattermore, scran, scuttle, segmented,
##     SeqArray, seqminer, SeqVarTools, seriation, sets, Seurat, SeuratObject, sf,
##     sfsmisc, shinystan, ShortRead, SHT, SingleCellExperiment, SingleR, SKAT,
##     sn, SNPRelate, sp, spaMM, sparseMatrixStats, spatstat.core, spatstat.data,
##     spatstat.geom, spatstat.linnet, spatstat.sparse, spatstat.utils, spikeslab,
##     splancs, statnet.common, subplex, SummarizedExperiment, survival, sva,
##     svglite, systemfonts, systemPipeR, tclust, testthat, text2vec, TH.data,
##     tibble, tidygraph, tidyselect, tidytree, tinytex, TMB, tmvtnorm, tseries,
##     TSP, tzdb, udunits2, umap, units, unmarked, uuid, V8, VariantAnnotation,
##     vctrs, vegan, VGAM, waldo, webshot, withr, xfun, xgboost, XML, XVector,
##     yaml, zlibbioc
## Old packages: 'BiocManager', 'LDheatmap'

The gpart package is now within BiocManager, so I had to install from within that package.

Finally, we can produce a heat map in order to visualize the linkage blocks.

library(gpart)
## Loading required package: grid
## 
## Attaching package: 'grid'
## The following object is masked from 'package:Biostrings':
## 
##     pattern
## Loading required package: Homo.sapiens
## Loading required package: OrganismDbi
## Loading required package: GO.db
## 
## Loading required package: org.Hs.eg.db
## 
## Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
## Loading required package: TxDb.Hsapiens.UCSC.hg38.knownGene
pyg_NPY_res1 = BigLD(genofile = "pygNPY.vcf",LD="r2")
## data reformating done!
## working chromosome name: CAE21
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 152
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:32 | chrCAE21:34058548-34085036 | sub-region: 1/1
## CLQ done!
constructLDblock done!
## 
## BigLD done!
pyg_NPY_res1
##     chr start.index end.index start.rsID end.rsID start.bp   end.bp
## 1 CAE21           2       526          .        . 34058548 34085036
nowcex=0.5
LDblockHeatmap(genofile = "pygNPY.vcf", chrN = "CAE21", geneDB = "ensembl", geneid = "hgnc_symbol", blocktype="bigld", LD="r2", type="png", filename = "heatmap_pyg_NPY_r2", CLQmode = "density", res = 600)

NPY Heat Map

This shows that there is only one linkage block created by BigLD, and the start and end points are given.

NPY1

First the VCF file must be converted to a SNP matrix

library(tidyverse)
library(snpStats)
library(VariantAnnotation)
library(LDheatmap)

pygNPY1vcf<-readVcf("pygNPY1.vcf")
pygNPY1matrix<-genotypeToSnpMatrix(pygNPY1vcf)
pygNPY1vcf
## class: CollapsedVCF 
## dim: 432 51 
## rowRanges(vcf):
##   GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
## info(vcf):
##   DataFrame with 0 columns: 
## geno(vcf):
##   List of length 1: GT
## geno(header(vcf)):
##       Number Type   Description    
##    GT 1      String Phased Genotype
pygNPY1_LD<-ld(pygNPY1matrix$genotypes,depth=431,stats="R.squared")

Since the data in pygerthrus NPY Y1 receptor is small, it had to be reset to a depth of 431 for the matrix

cols=colorRampPalette(c("yellow","red"))(10)
image(pygNPY1_LD,lwd=0,cuts=9,col.regions=cols,colorkey=TRUE)

The block of red indicates that there is a linkage block of selection occurring on this part of the gene

We can align the SNPs to the vervet reference as done above:

library(GenomicFeatures)
library(RMariaDB)

txdb <- makeTxDbFromEnsembl(organism="Chlorocebus sabaeus", server="useastdb.ensembl.org") #Create txdb for chlSab2
vcfbig <- readVcf("pygNPY1.vcf", "txdb") #align data with vervet reference genome
ALLmatrix <- genotypeToSnpMatrix(vcfbig)
ALLmatrix

However there’s an easier way to do this:

BiocManager::install("gpart")

The gpart package is now within BiocManager, so I had to install from within that package.

library(gpart)
pyg_npy1_res1 = BigLD(genofile = "pygNPY1.vcf",LD="r2")
## data reformating done!
## working chromosome name: CAE7
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 104
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:33 | chrCAE7:109764999-109786902 | sub-region: 1/1
## CLQ done!
constructLDblock done!
## 
## BigLD done!
pyg_npy1_res1
##     chr start.index end.index start.rsID end.rsID  start.bp    end.bp
## 1  CAE7           2         3          .        . 109764999 109765029
## 2  CAE7          26        87          .        . 109766082 109768247
## 3  CAE7          97       202          .        . 109768697 109773839
## 4  CAE7         214       233          .        . 109775029 109777165
## 5  CAE7         234       243          .        . 109777223 109777564
## 6  CAE7         245       250          .        . 109777686 109778565
## 7  CAE7         253       254          .        . 109778772 109778809
## 8  CAE7         268       322          .        . 109779336 109781797
## 9  CAE7         334       406          .        . 109782396 109786018
## 10 CAE7         429       432          .        . 109786819 109786902
nowcex=0.5
LDblockHeatmap(genofile = "pygNPY1.vcf", chrN = "CAE7", geneDB = "ensembl", geneid = "hgnc_symbol", blocktype="bigld", LD="r2", type="png", filename = "heatmap_pyg_npy1_r2", CLQmode = "density", res = 600)

This code creates a LD heat map to help visualize the linkage blocks produced from the gpart code.

NPY1 Heat Map Here we can see that there are 7 linkage blocks that can be distinguished. This next code below will help in getting specific LD values between SNP points of interest that can be exported to excel and used later when trying to determine the loci we want to access for selection in the easiest manner possible.

library(Matrix)
library(tidyverse)

sparse2triples<-function(m) {
  SM = summary(m)
  D1 = m@Dimnames[[1]][SM[,1]]
  D2 = m@Dimnames[[2]][SM[,2]]
  data.frame(row=D1,col=D2,x=m@x)
}

LD2<-sparse2triples(pygNPY_LD)
LD2<-as_tibble(LD2)
LD3<-
  LD2 %>%
  filter(x!=NA)

write.csv(LD3,"pygnpy1_LDr2.csv")

Linkage Disequilibrium in Southern Expansion

Now I can repeat this process for finding linkage disequilibrium in both NPY and NPY1 in the Souther Expansion population

NPY

library(snpStats)
library(VariantAnnotation)
library(LDheatmap)

seNPYvcf<-readVcf("seNPY.vcf")
seNPYmatrix<-genotypeToSnpMatrix(seNPYvcf)
seNPYvcf
## class: CollapsedVCF 
## dim: 526 73 
## rowRanges(vcf):
##   GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
## info(vcf):
##   DataFrame with 0 columns: 
## geno(vcf):
##   List of length 1: GT
## geno(header(vcf)):
##       Number Type   Description    
##    GT 1      String Phased Genotype
seNPY_LD<-ld(seNPYmatrix$genotypes,depth=525,stats="R.squared")

Next we can visualize the linkage:

cols=colorRampPalette(c("yellow","red"))(10)
image(seNPY_LD,lwd=0,cuts=9,col.regions=cols,colorkey=TRUE)

And use the easy way to visualize the linkage in a heat map that incorporates the genetic distance:

library(gpart)
se_NPY_res1 = BigLD(genofile = "seNPY.vcf",LD="r2")
## data reformating done!
## working chromosome name: CAE21
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 125
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:34 | chrCAE21:34059089-34084994 | sub-region: 1/1
## CLQ done!
constructLDblock done!
## 
## BigLD done!
se_NPY_res1
##     chr start.index end.index start.rsID end.rsID start.bp   end.bp
## 1 CAE21          11       525          .        . 34059089 34084994

This shows that the southern expansion NPY genomic regions fall into 1 distinct linkage block, and it gives us the specific start and end points for the blocks. This is very intriguing, as it shows that the NPY gene is completely coinherited.

Next I can visualize the linkage blocks similar to how I did with the pygerythrus population.

nowcex=0.5
LDblockHeatmap(genofile = "seNPY.vcf", chrN = "CAE21", geneDB = "ensembl", geneid = "hgnc_symbol", blocktype="bigld", LD="r2", type="png", filename = "heatmap_se_NPY_r2", CLQmode = "density", res = 600)
## data reformating done!
## Warning in if (assembly == "GRCh38") {: the condition has length > 1 and only
## the first element will be used
## load gene information from ensembl (assembly GRCh38)
## Number of SNPs after prunning SNPs with MAF< 0.05 : 125
## working chromosome name: CAE21
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 125
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:46 | chrCAE21:34059089-34084994 | sub-region: 1/1
## CLQ done!
constructLDblock done!
## 
## BigLD done!
## This region does not overlap any gene region!
## Generating file....
## heatmap_se_NPY_r2.png

NPY Heat Map

NPY1

Once again, all of these steps can be repeated for the NPY Y1 receptor in the southern expansion population.

library(snpStats)
library(VariantAnnotation)
library(LDheatmap)

seNPY1vcf<-readVcf("seNPY1.vcf")
seNPY1matrix<-genotypeToSnpMatrix(seNPY1vcf)
seNPY1vcf
## class: CollapsedVCF 
## dim: 432 73 
## rowRanges(vcf):
##   GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
## info(vcf):
##   DataFrame with 0 columns: 
## geno(vcf):
##   List of length 1: GT
## geno(header(vcf)):
##       Number Type   Description    
##    GT 1      String Phased Genotype
seNPY1_LD<-ld(seNPY1matrix$genotypes,depth=431,stats="R.squared")

For NPY Y1, the bounds of the data is a bit smaller, so I had to reset the depth to 431.

The linkage blocks matrix can then be visualized into a heatmap:

cols=colorRampPalette(c("yellow","red"))(10)
image(seNPY1_LD,lwd=0,cuts=9,col.regions=cols,colorkey=TRUE)

And the linkage blocks can be characterized and visualized with the genetic distance incorporated:

library(gpart)
se_npy1_res1 = BigLD(genofile = "seNPY1.vcf",LD="r2")
## data reformating done!
## working chromosome name: CAE7
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 119
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:49 | chrCAE7:109764999-109786902 | sub-region: 1/1
## CLQ done!
constructLDblock done!
## 
## BigLD done!
se_npy1_res1
##     chr start.index end.index start.rsID end.rsID  start.bp    end.bp
## 1  CAE7           2         3          .        . 109764999 109765029
## 2  CAE7          20        90          .        . 109765855 109768322
## 3  CAE7          97       113          .        . 109768697 109770013
## 4  CAE7         115       126          .        . 109770093 109770701
## 5  CAE7         133       146          .        . 109770921 109771383
## 6  CAE7         148       158          .        . 109771489 109771902
## 7  CAE7         161       169          .        . 109772125 109772447
## 8  CAE7         171       250          .        . 109772516 109778565
## 9  CAE7         253       254          .        . 109778772 109778809
## 10 CAE7         264       335          .        . 109779088 109782413
## 11 CAE7         352       359          .        . 109783893 109784174
## 12 CAE7         369       432          .        . 109784778 109786902

This shows that the southern expansion NPY genomic regions fall into 12 distinct linkage blocks, and it gives us the specific start and end points for the blocks.

Finally, we can see the final heatmap for NPY Y1 created below:

nowcex=0.5
LDblockHeatmap(genofile = "seNPY1.vcf", chrN = "CAE7", geneDB = "ensembl", geneid = "hgnc_symbol", blocktype="bigld", LD="r2", type="png", filename = "heatmap_se_npy1_r2", CLQmode = "density", res = 600)
## data reformating done!
## Warning in if (assembly == "GRCh38") {: the condition has length > 1 and only
## the first element will be used
## load gene information from ensembl (assembly GRCh38)
## Number of SNPs after prunning SNPs with MAF< 0.05 : 119
## working chromosome name: CAE7
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 119
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:51:02 | chrCAE7:109764999-109786902 | sub-region: 1/1
## CLQ done!
constructLDblock done!
## 
## BigLD done!
## This region does not overlap any gene region!
## Generating file....
## heatmap_se_npy1_r2.png

NPY Heat Map

Population Structure

The final analysis I will be completing will be in order to determine the population structure. Analyzing population structure is important because it tells us whether the variation in the gene region is structured by phylogeny, and therefore due to something like genetic drift, or fundamentally different, showing that selection or something similar is altering the pattern of variation.

The two analyses used will be PCA (principal component analysis) and DAPC (discriminant analysis of principal components) plots to see if they show a pattern consistent with differentiation at the taxonomic level (pygerythrus, cynosuros, and hilgerti), as well as the population level. And we will test the significance of differentiation using Adonis (tests for molecular variation; multivariate AMOVA) and AMOVA (analysis of molecular variance)

First I need to download the necessary packages:

library(devtools)
## Loading required package: usethis
library(vcfR)
## 
##    *****       ***   vcfR   ***       *****
##    This is vcfR 1.12.0 
##      browseVignettes('vcfR') # Documentation
##      citation('vcfR') # Citation
##    *****       *****      *****       *****
library(adegenet)
## Loading required package: ade4
## 
## Attaching package: 'ade4'
## The following object is masked from 'package:Biostrings':
## 
##     score
## The following object is masked from 'package:GenomicRanges':
## 
##     score
## The following object is masked from 'package:BiocGenerics':
## 
##     score
## 
##    /// adegenet 2.1.5 is loaded ////////////
## 
##    > overview: '?adegenet'
##    > tutorials/doc/questions: 'adegenetWeb()' 
##    > bug reports/feature requests: adegenetIssues()
## 
## Attaching package: 'adegenet'
## The following object is masked from 'package:survival':
## 
##     strata
library(ggplot2)
library(vegan)
## Loading required package: permute
## 
## Attaching package: 'permute'
## The following object is masked from 'package:devtools':
## 
##     check
## Loading required package: lattice
## This is vegan 2.5-7
library(poppr)
## Registered S3 method overwritten by 'pegas':
##   method      from
##   print.amova ade4
## This is poppr version 2.9.3. To get started, type package?poppr
## OMP parallel support: available
library(dplyr)

And load the right VCF files:

NPY <- read.vcfR("seNPY.vcf", verbose = TRUE)
## Scanning file to determine attributes.
## File attributes:
##   meta lines: 38
##   header_line: 39
##   variant count: 526
##   column count: 82
## 
Meta line 38 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 526
##   Character matrix gt cols: 82
##   skip: 0
##   nrows: 526
##   row_num: 0
## 
Processed variant: 526
## All variants processed
NPY1 <- read.vcfR("seNPY1.vcf", verbose = TRUE)
## Scanning file to determine attributes.
## File attributes:
##   meta lines: 38
##   header_line: 39
##   variant count: 432
##   column count: 82
## 
Meta line 38 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 432
##   Character matrix gt cols: 82
##   skip: 0
##   nrows: 432
##   row_num: 0
## 
Processed variant: 432
## All variants processed

To organize my work, lets start with the background organizational population structure work for NPY.

NPY

The following code will convert the vcf file into a genind file, read the population panel data, and filter out all non SE-individuals. If this works, all the samples in the VCF and population panel should be the same, and the final line should read TRUE

NPY_genind <- vcfR2genind(NPY, sep = "[|/]")
pop.data <- read.table("vervet.population.panel", sep = "\t", header = FALSE)
colnames(pop.data) <- c("sample", "pop", "region", "country", "taxon")
pop.data<-filter(pop.data,taxon == "pygerythrus" | taxon == "hilgerti" | taxon == "cynosuros")
all(colnames(NPY@gt)[-1] == pop.data$AccessID)
## [1] TRUE

The vectors in the code above did not align due to there being 6 column names but only 5 values in the vector that it was labeling. So I had to delete one name and now the vectors align and the code is TRUE

The next code will assign the population panel data to GENIND, the the population panel as the population, and create a GENLIGHT object for PCA:

strata(NPY_genind) <- pop.data[(match(indNames(NPY_genind), pop.data$sample)), ] 
setPop(NPY_genind) <- ~pop
NPY_genlight <- vcfR2genlight(NPY)
strata(NPY_genlight) <- pop.data[(match(indNames(NPY_genlight), pop.data$sample)), ]
setPop(NPY_genlight) <- ~pop

Now we can repeat this process for NPY Y1!

NPY1

NPY1_genind <- vcfR2genind(NPY1, sep = "[|/]")
pop.data <- read.table("vervet.population.panel", sep = "\t", header = FALSE)
colnames(pop.data) <- c("sample", "region", "pop", "country", "taxon")
pop.data <-filter(pop.data,taxon == "pygerythrus" | taxon == "hilgerti" | taxon == "cynosuros")
all(colnames(NPY1@gt)[-1] == pop.data$AccessID)
## [1] TRUE
strata(NPY1_genind) <- pop.data[(match(indNames(NPY1_genind), pop.data$sample)), ] 
setPop(NPY1_genind) <- ~pop
NPY1_genlight <- vcfR2genlight(NPY1)
strata(NPY1_genlight) <- pop.data[(match(indNames(NPY1_genlight), pop.data$sample)), ]
setPop(NPY1_genlight) <- ~pop

Once this has been done for the gene and it’s receptor, we can move on to the analyses.

NPY Population Structure Analysis

PCA

Separation by taxon

setPop(NPY_genlight) <- ~taxon
pca <- glPca(NPY_genlight, nf = 2)
barplot(100*pca$eig/sum(pca$eig), col = heat.colors(50), main="PCA Eigenvalues",ylab="% of Variance Explained",xlab="Eigenvalues")

This graph will show the amount of variation from the original dataset that is kept in the PCA. This will help us determine the amount of PC’s to keep, usually those that explain the most variation would want to be kept.

Once this is determined, we can graph with ggplot.

pca.scores <- as.data.frame(pca$scores)
pca.scores$taxon <- pop(NPY_genlight)
p <- ggplot(pca.scores, aes(x=PC1, y=PC2, colour= taxon)) + geom_point(size=2) + stat_ellipse(level = 0.95, size = 1) + theme_bw()
print(p)

To see if these differences are significant, we can run the multivariate AMOVA:

adonis1 <- adonis(pca$scores ~ taxon, data = pca.scores, method='eu', na.rm = TRUE)
adonis1
## 
## Call:
## adonis(formula = pca$scores ~ taxon, data = pca.scores, method = "eu",      na.rm = TRUE) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)
## taxon      2     10.37  5.1848  1.2681 0.03496  0.303
## Residuals 70    286.21  4.0887         0.96504       
## Total     72    296.58                 1.00000

According to this output, about 35% of genomic variance of the NPY gene is accounted for at a taxonomic level, but the difference in variance is not significant, with a significance value of 0.291.

Separation by Population

setPop(NPY_genlight) <- ~pop
pca <- glPca(NPY_genlight, nf = 2)

barplot(100*pca$eig/sum(pca$eig), col = heat.colors(50), main="PCA Eigenvalues")
title(ylab="Percent of variance\nexplained", line = 2)
title(xlab="Eigenvalues", line = 1)

This code repeats the process of seeing how much variation from the original data is seen in the PCA for population. This will help us determine what PC’s we should keep.

The results can then be graphed into a ggplot

pca.scores <- as.data.frame(pca$scores)
pca.scores$pop <- pop(NPY_genlight)

p <- ggplot(pca.scores, aes(x=PC1, y=PC2, colour=pop)) + geom_point(size=2) + stat_ellipse(level = 0.95, size = 1) + theme_bw() + scale_color_manual(values=c("purple4","salmon","dark green","forest green","seagreen1","dodgerblue","green","pink", "red","black","light blue","blue3","lavender","yellow","grey","brown","cyan","black","light grey","orange","darkturquoise","darkgoldenrod1"))
print(p)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 14 row(s) containing missing values (geom_path).

ggsave("NPY_PCA.png",plot=p,width=5.5,height=4,units=c("in"),dpi=300)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 14 row(s) containing missing values (geom_path).

Now we can run the multivariate AMOVA using adonis to see if the differences are significant

adonis2 <- adonis(pca$scores ~pop, data = pca.scores, method='eu', na.rm = TRUE)
adonis2
## 
## Call:
## adonis(formula = pca$scores ~ pop, data = pca.scores, method = "eu",      na.rm = TRUE) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## pop       21    176.01  8.3816  3.5454 0.59348  0.001 ***
## Residuals 51    120.57  2.3640         0.40652           
## Total     72    296.58                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This shows that there is a significant difference in populations (p<0.001), and population explains about 59.3% of the genomic variance in the gene region.

DAPC

Now we can move onto the DAPC population structure analysis!

dapc <- dapc(NPY_genind, n.pca = 2, n.da = 2)
dapc$grp
##  [1] Unknown          Unknown          Kasane           Kasane          
##  [5] Samburu          Mosiro           Naivasha         Kimana          
##  [9] Soetdoring       Soetdoring       Soetdoring       Gariep          
## [13] Gariep           Gariep           Gariep           Gariep          
## [17] Gariep           Gariep           Gariep           Gariep          
## [21] Gariep           Gariep           Sandveld         Sandveld        
## [25] Sandveld         Sandveld         Sandveld         Parys           
## [29] Zinkwazi         Zinkwazi         Zinkwazi         Zinkwazi        
## [33] Zinkwazi         Zinkwazi         Blythedale Beach Blythedale Beach
## [37] Blythedale Beach Blythedale Beach Blythedale Beach Kwela           
## [41] Kwela            Kwela            Anerley          Letsitele       
## [45] Letsitele        Shamwari         Shamwari         Shamwari        
## [49] Shamwari         Shamwari         Shamwari         Amakhala        
## [53] Amakhala         Amakhala         Amakhala         Amakhala        
## [57] Addo             Gate             Gate             Gate            
## [61] Gate             Gate             Gate             Mukambi         
## [65] Mukambi          Mukambi          Lilayi           Lilayi          
## [69] Livingstone      Livingstone      Livingstone      Livingstone     
## [73] Livingstone     
## 22 Levels: Unknown Kasane Samburu Mosiro Naivasha Kimana Soetdoring ... Livingstone
scatter(dapc, cex = 2, legend = TRUE, clabel = F, posi.leg = "bottomleft", scree.pca = TRUE,
        posi.pca = "topright", cleg = 0.75, col=c("purple4","pink","dark green","forest green","seagreen1","dodgerblue","green","salmon","red","black","light blue","blue3","lavender","yellow","grey","brown","cyan","black","light grey","orange","darkturquoise","darkgoldenrod1"))

Though there is generally much overlap in variation for NPY between populations and taxons, we can see a bit more clearly that the variation we do see in South Africa appears to scale away a bit from the rest of Africa along DA2, and within South Africa along DA1.

We’ve seen a few things from looking at the PCAs and DAPC from NPY. With the first graph, we see that there is no separation along PC1 and PC2 for the three taxons (pygerythrus, hilgerti and cynosuros). When we look at the sub-populations of each species in a PCA, we see that there’s yet again not a large amount of separation, but the values are shown significance by the Adonis resuls. The cluster of variation that is maintained to a small area within hilgerti and cynosuros taxons is made up of a couple of specific sub-populations not also found in the pygerythrus including but not limited to: Gate, Livingstone, Zinkwazi, Amakhala, Shamwari, Samburu, and Lilayi. There is also a variation in the Anerley population that is within only pygerythrus. In the DAPC, we again see the aforementioned populations clustering within hilgerti and cynosuros. In both graphs, it appears that the Gariep population is an area of overlap between these clusters of variation.

AMOVA

Now we can perform the AMOVA analysis:

library(poppr)
NPY_genlight2 <- popsub(NPY_genlight, exclude = c("Limpopo"))
amova1 <- poppr.amova(NPY_genlight2, ~taxon/pop)
amova1
## $call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
## 
## $results
##                             Df    Sum Sq  Mean Sq
## Between taxon                2  109.4299 54.71494
## Between pop Within taxon    19  678.4398 35.70736
## Between samples Within pop  51  843.4591 16.53841
## Within samples              73 1090.5000 14.93836
## Total                      145 2721.8288 18.77123
## 
## $componentsofcovariance
##                                             Sigma          %
## Variations  Between taxon               0.5565822   2.888387
## Variations  Between pop Within taxon    2.9746899  15.437171
## Variations  Between samples Within pop  0.8000287   4.151754
## Variations  Within samples             14.9383562  77.522689
## Total variations                       19.2696569 100.000000
## 
## $statphi
##                          Phi
## Phi-samples-total 0.22477311
## Phi-samples-pop   0.05083296
## Phi-pop-taxon     0.15896318
## Phi-taxon-total   0.02888387
amova.test <- randtest(amova1) 
plot(amova.test)

amova.test
## class: krandtest lightkrandtest 
## Monte-Carlo tests
## Call: randtest.amova(xtest = amova1)
## 
## Number of tests:   4 
## 
## Adjustment method for multiple comparisons:   none 
## Permutation number:   99 
##                         Test        Obs    Std.Obs   Alter Pvalue
## 1  Variations within samples 14.9383562 -6.0074447    less   0.01
## 2 Variations between samples  0.8000287  0.6320627 greater   0.26
## 3     Variations between pop  2.9746899  6.7976024 greater   0.01
## 4   Variations between taxon  0.5565822  1.2725286 greater   0.15

When we run the AMOVA on NPY looking for molecular variance, we can see that 2.8% of the variance is between taxa, 15.4% is between populations within taxa, 4.15% between samples within populations, and the remaining 77.5% is within samples outside these designations. This suggests that our population designations are relatively homogenous within themselves, suggesting good mixing at that level, but significantly stratified by population, and not significantly stratified by taxon.

Fst

library(hierfstat)
## 
## Attaching package: 'hierfstat'
## The following objects are masked from 'package:adegenet':
## 
##     Hs, read.fstat
wc(NPY_genind)
## $FST
## [1] 0.171096
## 
## $FIS
## [1] 0.05083296

This gives us FST among all of the populations combined (so that means that about 17.1% of the variants in NPY are not shared between populations), but this doesn’t tell us which populations are more or less different. Let’s look at the pairwise FST between each population:

NPYFst <- genet.dist(NPY_genind[1:73,],method="WC84")
NPYFst
##                       Unknown       Kasane      Samburu       Mosiro
## Kasane            0.115044248                                       
## Samburu           0.263157895  0.361111111                          
## Mosiro           -0.075812274 -0.222707424  0.000000000             
## Naivasha         -0.052631579 -0.105882353  0.000000000  0.000000000
## Kimana            0.056250000  0.144578313  0.000000000  0.000000000
## Soetdoring        0.009736330  0.033611623  0.089304258 -0.142302717
## Gariep            0.133840051  0.081888429  0.196413884  0.001223990
## Sandveld          0.093729589  0.164662147  0.084194978  0.021226415
## Parys             0.185840708  0.146341463  0.000000000  0.000000000
## Zinkwazi          0.196373057  0.190174039  0.288836292  0.141394026
## Blythedale Beach  0.194668391  0.243608119  0.267167382  0.180334395
## Kwela             0.178505705  0.248938178  0.131359852  0.130634775
## Anerley           0.542288557  0.574257426  0.000000000  0.000000000
## Letsitele         0.004739336 -0.026455026 -0.014492754 -0.268389662
## Shamwari          0.121543920  0.168794412  0.173590982  0.055290785
## Amakhala          0.253581856  0.239687767  0.337187789  0.206270627
## Addo              0.139175258  0.036585366  0.000000000  0.000000000
## Gate              0.186662784  0.029411765  0.471294766  0.042927794
## Mukambi           0.184128848  0.032129404  0.570637119  0.112359551
## Lilayi            0.165137615  0.135416667  0.485992692  0.127789047
## Livingstone       0.200241209  0.200047498  0.299526440  0.184931507
##                      Naivasha       Kimana   Soetdoring       Gariep
## Kasane                                                              
## Samburu                                                             
## Mosiro                                                              
## Naivasha                                                            
## Kimana            0.000000000                                       
## Soetdoring       -0.047281324 -0.112640801                          
## Gariep            0.020972735  0.107980367  0.049820223             
## Sandveld          0.033694810  0.007148531 -0.014263698  0.084685104
## Parys             0.000000000  0.000000000  0.004032258  0.087673689
## Zinkwazi          0.163301060  0.189121712  0.108393720  0.006833853
## Blythedale Beach  0.196261682  0.237656595  0.109523105  0.110862011
## Kwela             0.115671642  0.180878553  0.074468085  0.117444427
## Anerley           0.000000000  0.000000000  0.358974359  0.372542659
## Letsitele        -0.222857143 -0.023771791 -0.082713478  0.099833145
## Shamwari          0.101218162  0.052869739  0.079412065  0.190474410
## Amakhala          0.246618106  0.276394422  0.155108973  0.243288045
## Addo              0.000000000  0.000000000 -0.144078144  0.032539221
## Gate              0.185127202  0.317603171  0.136553112  0.177410667
## Mukambi           0.289532294  0.368627451  0.097087379  0.167560133
## Lilayi            0.229129663  0.389121339  0.038212981  0.058559051
## Livingstone       0.183394161  0.269731489  0.110080186  0.042914770
##                      Sandveld        Parys     Zinkwazi Blythedale Beach
## Kasane                                                                  
## Samburu                                                                 
## Mosiro                                                                  
## Naivasha                                                                
## Kimana                                                                  
## Soetdoring                                                              
## Gariep                                                                  
## Sandveld                                                                
## Parys            -0.003388554                                           
## Zinkwazi          0.049092518  0.209669142                              
## Blythedale Beach  0.052600473  0.099193548  0.097776406                 
## Kwela            -0.006312603 -0.035502959  0.135299625     -0.030628501
## Anerley           0.204768125  0.000000000  0.434503361      0.247747748
## Letsitele         0.086255699  0.026282854  0.202975558      0.143712198
## Shamwari          0.206980034  0.125993972  0.267888255      0.263986992
## Amakhala          0.306992139  0.186934673  0.363450417      0.346774194
## Addo              0.069264069  0.000000000  0.229097857      0.208762887
## Gate              0.257716535  0.359645404  0.299693252      0.316385688
## Mukambi           0.199585147  0.414464534  0.271457511      0.287076127
## Lilayi            0.091380652  0.243027888  0.135773318      0.146062541
## Livingstone       0.091019417  0.248466258  0.067273054      0.072798487
##                         Kwela      Anerley    Letsitele     Shamwari
## Kasane                                                              
## Samburu                                                             
## Mosiro                                                              
## Naivasha                                                            
## Kimana                                                              
## Soetdoring                                                          
## Gariep                                                              
## Sandveld                                                            
## Parys                                                               
## Zinkwazi                                                            
## Blythedale Beach                                                    
## Kwela                                                               
## Anerley           0.109375000                                       
## Letsitele         0.099875156  0.483028721                          
## Shamwari          0.239948224  0.487153839  0.034851430             
## Amakhala          0.343491480  0.616574586  0.103546205  0.070336354
## Addo              0.124452235  0.000000000 -0.446028513 -0.180581745
## Gate              0.366388723  0.670809077  0.134426230  0.230897704
## Mukambi           0.325503356  0.751099384  0.105159705  0.196299276
## Lilayi            0.142677298  0.609706775  0.032967033  0.169439687
## Livingstone       0.114726972  0.485425342  0.153104592  0.230138915
##                      Amakhala         Addo         Gate      Mukambi
## Kasane                                                              
## Samburu                                                             
## Mosiro                                                              
## Naivasha                                                            
## Kimana                                                              
## Soetdoring                                                          
## Gariep                                                              
## Sandveld                                                            
## Parys                                                               
## Zinkwazi                                                            
## Blythedale Beach                                                    
## Kwela                                                               
## Anerley                                                             
## Letsitele                                                           
## Shamwari                                                            
## Amakhala                                                            
## Addo             -0.403846154                                       
## Gate              0.277538297  0.175138852                          
## Mukambi           0.299324654  0.317343173  0.029970760             
## Lilayi            0.281333767  0.247376312  0.184278351  0.244705882
## Livingstone       0.313798220  0.204834606  0.242188319  0.265708368
##                        Lilayi
## Kasane                       
## Samburu                      
## Mosiro                       
## Naivasha                     
## Kimana                       
## Soetdoring                   
## Gariep                       
## Sandveld                     
## Parys                        
## Zinkwazi                     
## Blythedale Beach             
## Kwela                        
## Anerley                      
## Letsitele                    
## Shamwari                     
## Amakhala                     
## Addo                         
## Gate                         
## Mukambi                      
## Lilayi                       
## Livingstone       0.145811252

Now that we have that, we can create a heatmap to visualize FST using {ggplot2}. The output that {hierfstat} gave us is a dist class file, so first we will use the {reshape2} package to make it a matrix.

library(reshape2)
## 
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
## 
##     smiths
NPYFst <- as.matrix(NPYFst)
NPYFst
##                       Unknown      Kasane     Samburu      Mosiro    Naivasha
## Unknown           0.000000000  0.11504425  0.26315789 -0.07581227 -0.05263158
## Kasane            0.115044248  0.00000000  0.36111111 -0.22270742 -0.10588235
## Samburu           0.263157895  0.36111111  0.00000000  0.00000000  0.00000000
## Mosiro           -0.075812274 -0.22270742  0.00000000  0.00000000  0.00000000
## Naivasha         -0.052631579 -0.10588235  0.00000000  0.00000000  0.00000000
## Kimana            0.056250000  0.14457831  0.00000000  0.00000000  0.00000000
## Soetdoring        0.009736330  0.03361162  0.08930426 -0.14230272 -0.04728132
## Gariep            0.133840051  0.08188843  0.19641388  0.00122399  0.02097274
## Sandveld          0.093729589  0.16466215  0.08419498  0.02122642  0.03369481
## Parys             0.185840708  0.14634146  0.00000000  0.00000000  0.00000000
## Zinkwazi          0.196373057  0.19017404  0.28883629  0.14139403  0.16330106
## Blythedale Beach  0.194668391  0.24360812  0.26716738  0.18033439  0.19626168
## Kwela             0.178505705  0.24893818  0.13135985  0.13063477  0.11567164
## Anerley           0.542288557  0.57425743  0.00000000  0.00000000  0.00000000
## Letsitele         0.004739336 -0.02645503 -0.01449275 -0.26838966 -0.22285714
## Shamwari          0.121543920  0.16879441  0.17359098  0.05529078  0.10121816
## Amakhala          0.253581856  0.23968777  0.33718779  0.20627063  0.24661811
## Addo              0.139175258  0.03658537  0.00000000  0.00000000  0.00000000
## Gate              0.186662784  0.02941176  0.47129477  0.04292779  0.18512720
## Mukambi           0.184128848  0.03212940  0.57063712  0.11235955  0.28953229
## Lilayi            0.165137615  0.13541667  0.48599269  0.12778905  0.22912966
## Livingstone       0.200241209  0.20004750  0.29952644  0.18493151  0.18339416
##                        Kimana   Soetdoring      Gariep     Sandveld
## Unknown           0.056250000  0.009736330 0.133840051  0.093729589
## Kasane            0.144578313  0.033611623 0.081888429  0.164662147
## Samburu           0.000000000  0.089304258 0.196413884  0.084194978
## Mosiro            0.000000000 -0.142302717 0.001223990  0.021226415
## Naivasha          0.000000000 -0.047281324 0.020972735  0.033694810
## Kimana            0.000000000 -0.112640801 0.107980367  0.007148531
## Soetdoring       -0.112640801  0.000000000 0.049820223 -0.014263698
## Gariep            0.107980367  0.049820223 0.000000000  0.084685104
## Sandveld          0.007148531 -0.014263698 0.084685104  0.000000000
## Parys             0.000000000  0.004032258 0.087673689 -0.003388554
## Zinkwazi          0.189121712  0.108393720 0.006833853  0.049092518
## Blythedale Beach  0.237656595  0.109523105 0.110862011  0.052600473
## Kwela             0.180878553  0.074468085 0.117444427 -0.006312603
## Anerley           0.000000000  0.358974359 0.372542659  0.204768125
## Letsitele        -0.023771791 -0.082713478 0.099833145  0.086255699
## Shamwari          0.052869739  0.079412065 0.190474410  0.206980034
## Amakhala          0.276394422  0.155108973 0.243288045  0.306992139
## Addo              0.000000000 -0.144078144 0.032539221  0.069264069
## Gate              0.317603171  0.136553112 0.177410667  0.257716535
## Mukambi           0.368627451  0.097087379 0.167560133  0.199585147
## Lilayi            0.389121339  0.038212981 0.058559051  0.091380652
## Livingstone       0.269731489  0.110080186 0.042914770  0.091019417
##                         Parys    Zinkwazi Blythedale Beach        Kwela
## Unknown           0.185840708 0.196373057       0.19466839  0.178505705
## Kasane            0.146341463 0.190174039       0.24360812  0.248938178
## Samburu           0.000000000 0.288836292       0.26716738  0.131359852
## Mosiro            0.000000000 0.141394026       0.18033439  0.130634775
## Naivasha          0.000000000 0.163301060       0.19626168  0.115671642
## Kimana            0.000000000 0.189121712       0.23765660  0.180878553
## Soetdoring        0.004032258 0.108393720       0.10952311  0.074468085
## Gariep            0.087673689 0.006833853       0.11086201  0.117444427
## Sandveld         -0.003388554 0.049092518       0.05260047 -0.006312603
## Parys             0.000000000 0.209669142       0.09919355 -0.035502959
## Zinkwazi          0.209669142 0.000000000       0.09777641  0.135299625
## Blythedale Beach  0.099193548 0.097776406       0.00000000 -0.030628501
## Kwela            -0.035502959 0.135299625      -0.03062850  0.000000000
## Anerley           0.000000000 0.434503361       0.24774775  0.109375000
## Letsitele         0.026282854 0.202975558       0.14371220  0.099875156
## Shamwari          0.125993972 0.267888255       0.26398699  0.239948224
## Amakhala          0.186934673 0.363450417       0.34677419  0.343491480
## Addo              0.000000000 0.229097857       0.20876289  0.124452235
## Gate              0.359645404 0.299693252       0.31638569  0.366388723
## Mukambi           0.414464534 0.271457511       0.28707613  0.325503356
## Lilayi            0.243027888 0.135773318       0.14606254  0.142677298
## Livingstone       0.248466258 0.067273054       0.07279849  0.114726972
##                    Anerley    Letsitele    Shamwari    Amakhala        Addo
## Unknown          0.5422886  0.004739336  0.12154392  0.25358186  0.13917526
## Kasane           0.5742574 -0.026455026  0.16879441  0.23968777  0.03658537
## Samburu          0.0000000 -0.014492754  0.17359098  0.33718779  0.00000000
## Mosiro           0.0000000 -0.268389662  0.05529078  0.20627063  0.00000000
## Naivasha         0.0000000 -0.222857143  0.10121816  0.24661811  0.00000000
## Kimana           0.0000000 -0.023771791  0.05286974  0.27639442  0.00000000
## Soetdoring       0.3589744 -0.082713478  0.07941206  0.15510897 -0.14407814
## Gariep           0.3725427  0.099833145  0.19047441  0.24328804  0.03253922
## Sandveld         0.2047681  0.086255699  0.20698003  0.30699214  0.06926407
## Parys            0.0000000  0.026282854  0.12599397  0.18693467  0.00000000
## Zinkwazi         0.4345034  0.202975558  0.26788825  0.36345042  0.22909786
## Blythedale Beach 0.2477477  0.143712198  0.26398699  0.34677419  0.20876289
## Kwela            0.1093750  0.099875156  0.23994822  0.34349148  0.12445223
## Anerley          0.0000000  0.483028721  0.48715384  0.61657459  0.00000000
## Letsitele        0.4830287  0.000000000  0.03485143  0.10354621 -0.44602851
## Shamwari         0.4871538  0.034851430  0.00000000  0.07033635 -0.18058175
## Amakhala         0.6165746  0.103546205  0.07033635  0.00000000 -0.40384615
## Addo             0.0000000 -0.446028513 -0.18058175 -0.40384615  0.00000000
## Gate             0.6708091  0.134426230  0.23089770  0.27753830  0.17513885
## Mukambi          0.7510994  0.105159705  0.19629928  0.29932465  0.31734317
## Lilayi           0.6097068  0.032967033  0.16943969  0.28133377  0.24737631
## Livingstone      0.4854253  0.153104592  0.23013891  0.31379822  0.20483461
##                        Gate    Mukambi     Lilayi Livingstone
## Unknown          0.18666278 0.18412885 0.16513761  0.20024121
## Kasane           0.02941176 0.03212940 0.13541667  0.20004750
## Samburu          0.47129477 0.57063712 0.48599269  0.29952644
## Mosiro           0.04292779 0.11235955 0.12778905  0.18493151
## Naivasha         0.18512720 0.28953229 0.22912966  0.18339416
## Kimana           0.31760317 0.36862745 0.38912134  0.26973149
## Soetdoring       0.13655311 0.09708738 0.03821298  0.11008019
## Gariep           0.17741067 0.16756013 0.05855905  0.04291477
## Sandveld         0.25771653 0.19958515 0.09138065  0.09101942
## Parys            0.35964540 0.41446453 0.24302789  0.24846626
## Zinkwazi         0.29969325 0.27145751 0.13577332  0.06727305
## Blythedale Beach 0.31638569 0.28707613 0.14606254  0.07279849
## Kwela            0.36638872 0.32550336 0.14267730  0.11472697
## Anerley          0.67080908 0.75109938 0.60970677  0.48542534
## Letsitele        0.13442623 0.10515971 0.03296703  0.15310459
## Shamwari         0.23089770 0.19629928 0.16943969  0.23013891
## Amakhala         0.27753830 0.29932465 0.28133377  0.31379822
## Addo             0.17513885 0.31734317 0.24737631  0.20483461
## Gate             0.00000000 0.02997076 0.18427835  0.24218832
## Mukambi          0.02997076 0.00000000 0.24470588  0.26570837
## Lilayi           0.18427835 0.24470588 0.00000000  0.14581125
## Livingstone      0.24218832 0.26570837 0.14581125  0.00000000
colnames(NPYFst) <- c("Unknown","Kasane","Samburu","Mosiro","Naivasha","Kimana","Soetdoring","Gariep","Sandveld","Parys","Zinkwazi","Blythedale Beach","Kwela","Anerley","Letsitele","Shamwari","Amakhala","Addo","Gate","Mukambi","Lilayi","Livingstone") #rename columns from numbers to population names. 
rownames(NPYFst) <- c("Unknown","Kasane","Samburu","Mosiro","Naivasha","Kimana","Soetdoring","Gariep","Sandveld","Parys","Zinkwazi","Blythedale Beach","Kwela","Anerley","Letsitele","Shamwari","Amakhala","Addo","Gate","Mukambi","Lilayi","Livingstone") #rename rows from numbers to population names. 
NPYFst
##                       Unknown      Kasane     Samburu      Mosiro    Naivasha
## Unknown           0.000000000  0.11504425  0.26315789 -0.07581227 -0.05263158
## Kasane            0.115044248  0.00000000  0.36111111 -0.22270742 -0.10588235
## Samburu           0.263157895  0.36111111  0.00000000  0.00000000  0.00000000
## Mosiro           -0.075812274 -0.22270742  0.00000000  0.00000000  0.00000000
## Naivasha         -0.052631579 -0.10588235  0.00000000  0.00000000  0.00000000
## Kimana            0.056250000  0.14457831  0.00000000  0.00000000  0.00000000
## Soetdoring        0.009736330  0.03361162  0.08930426 -0.14230272 -0.04728132
## Gariep            0.133840051  0.08188843  0.19641388  0.00122399  0.02097274
## Sandveld          0.093729589  0.16466215  0.08419498  0.02122642  0.03369481
## Parys             0.185840708  0.14634146  0.00000000  0.00000000  0.00000000
## Zinkwazi          0.196373057  0.19017404  0.28883629  0.14139403  0.16330106
## Blythedale Beach  0.194668391  0.24360812  0.26716738  0.18033439  0.19626168
## Kwela             0.178505705  0.24893818  0.13135985  0.13063477  0.11567164
## Anerley           0.542288557  0.57425743  0.00000000  0.00000000  0.00000000
## Letsitele         0.004739336 -0.02645503 -0.01449275 -0.26838966 -0.22285714
## Shamwari          0.121543920  0.16879441  0.17359098  0.05529078  0.10121816
## Amakhala          0.253581856  0.23968777  0.33718779  0.20627063  0.24661811
## Addo              0.139175258  0.03658537  0.00000000  0.00000000  0.00000000
## Gate              0.186662784  0.02941176  0.47129477  0.04292779  0.18512720
## Mukambi           0.184128848  0.03212940  0.57063712  0.11235955  0.28953229
## Lilayi            0.165137615  0.13541667  0.48599269  0.12778905  0.22912966
## Livingstone       0.200241209  0.20004750  0.29952644  0.18493151  0.18339416
##                        Kimana   Soetdoring      Gariep     Sandveld
## Unknown           0.056250000  0.009736330 0.133840051  0.093729589
## Kasane            0.144578313  0.033611623 0.081888429  0.164662147
## Samburu           0.000000000  0.089304258 0.196413884  0.084194978
## Mosiro            0.000000000 -0.142302717 0.001223990  0.021226415
## Naivasha          0.000000000 -0.047281324 0.020972735  0.033694810
## Kimana            0.000000000 -0.112640801 0.107980367  0.007148531
## Soetdoring       -0.112640801  0.000000000 0.049820223 -0.014263698
## Gariep            0.107980367  0.049820223 0.000000000  0.084685104
## Sandveld          0.007148531 -0.014263698 0.084685104  0.000000000
## Parys             0.000000000  0.004032258 0.087673689 -0.003388554
## Zinkwazi          0.189121712  0.108393720 0.006833853  0.049092518
## Blythedale Beach  0.237656595  0.109523105 0.110862011  0.052600473
## Kwela             0.180878553  0.074468085 0.117444427 -0.006312603
## Anerley           0.000000000  0.358974359 0.372542659  0.204768125
## Letsitele        -0.023771791 -0.082713478 0.099833145  0.086255699
## Shamwari          0.052869739  0.079412065 0.190474410  0.206980034
## Amakhala          0.276394422  0.155108973 0.243288045  0.306992139
## Addo              0.000000000 -0.144078144 0.032539221  0.069264069
## Gate              0.317603171  0.136553112 0.177410667  0.257716535
## Mukambi           0.368627451  0.097087379 0.167560133  0.199585147
## Lilayi            0.389121339  0.038212981 0.058559051  0.091380652
## Livingstone       0.269731489  0.110080186 0.042914770  0.091019417
##                         Parys    Zinkwazi Blythedale Beach        Kwela
## Unknown           0.185840708 0.196373057       0.19466839  0.178505705
## Kasane            0.146341463 0.190174039       0.24360812  0.248938178
## Samburu           0.000000000 0.288836292       0.26716738  0.131359852
## Mosiro            0.000000000 0.141394026       0.18033439  0.130634775
## Naivasha          0.000000000 0.163301060       0.19626168  0.115671642
## Kimana            0.000000000 0.189121712       0.23765660  0.180878553
## Soetdoring        0.004032258 0.108393720       0.10952311  0.074468085
## Gariep            0.087673689 0.006833853       0.11086201  0.117444427
## Sandveld         -0.003388554 0.049092518       0.05260047 -0.006312603
## Parys             0.000000000 0.209669142       0.09919355 -0.035502959
## Zinkwazi          0.209669142 0.000000000       0.09777641  0.135299625
## Blythedale Beach  0.099193548 0.097776406       0.00000000 -0.030628501
## Kwela            -0.035502959 0.135299625      -0.03062850  0.000000000
## Anerley           0.000000000 0.434503361       0.24774775  0.109375000
## Letsitele         0.026282854 0.202975558       0.14371220  0.099875156
## Shamwari          0.125993972 0.267888255       0.26398699  0.239948224
## Amakhala          0.186934673 0.363450417       0.34677419  0.343491480
## Addo              0.000000000 0.229097857       0.20876289  0.124452235
## Gate              0.359645404 0.299693252       0.31638569  0.366388723
## Mukambi           0.414464534 0.271457511       0.28707613  0.325503356
## Lilayi            0.243027888 0.135773318       0.14606254  0.142677298
## Livingstone       0.248466258 0.067273054       0.07279849  0.114726972
##                    Anerley    Letsitele    Shamwari    Amakhala        Addo
## Unknown          0.5422886  0.004739336  0.12154392  0.25358186  0.13917526
## Kasane           0.5742574 -0.026455026  0.16879441  0.23968777  0.03658537
## Samburu          0.0000000 -0.014492754  0.17359098  0.33718779  0.00000000
## Mosiro           0.0000000 -0.268389662  0.05529078  0.20627063  0.00000000
## Naivasha         0.0000000 -0.222857143  0.10121816  0.24661811  0.00000000
## Kimana           0.0000000 -0.023771791  0.05286974  0.27639442  0.00000000
## Soetdoring       0.3589744 -0.082713478  0.07941206  0.15510897 -0.14407814
## Gariep           0.3725427  0.099833145  0.19047441  0.24328804  0.03253922
## Sandveld         0.2047681  0.086255699  0.20698003  0.30699214  0.06926407
## Parys            0.0000000  0.026282854  0.12599397  0.18693467  0.00000000
## Zinkwazi         0.4345034  0.202975558  0.26788825  0.36345042  0.22909786
## Blythedale Beach 0.2477477  0.143712198  0.26398699  0.34677419  0.20876289
## Kwela            0.1093750  0.099875156  0.23994822  0.34349148  0.12445223
## Anerley          0.0000000  0.483028721  0.48715384  0.61657459  0.00000000
## Letsitele        0.4830287  0.000000000  0.03485143  0.10354621 -0.44602851
## Shamwari         0.4871538  0.034851430  0.00000000  0.07033635 -0.18058175
## Amakhala         0.6165746  0.103546205  0.07033635  0.00000000 -0.40384615
## Addo             0.0000000 -0.446028513 -0.18058175 -0.40384615  0.00000000
## Gate             0.6708091  0.134426230  0.23089770  0.27753830  0.17513885
## Mukambi          0.7510994  0.105159705  0.19629928  0.29932465  0.31734317
## Lilayi           0.6097068  0.032967033  0.16943969  0.28133377  0.24737631
## Livingstone      0.4854253  0.153104592  0.23013891  0.31379822  0.20483461
##                        Gate    Mukambi     Lilayi Livingstone
## Unknown          0.18666278 0.18412885 0.16513761  0.20024121
## Kasane           0.02941176 0.03212940 0.13541667  0.20004750
## Samburu          0.47129477 0.57063712 0.48599269  0.29952644
## Mosiro           0.04292779 0.11235955 0.12778905  0.18493151
## Naivasha         0.18512720 0.28953229 0.22912966  0.18339416
## Kimana           0.31760317 0.36862745 0.38912134  0.26973149
## Soetdoring       0.13655311 0.09708738 0.03821298  0.11008019
## Gariep           0.17741067 0.16756013 0.05855905  0.04291477
## Sandveld         0.25771653 0.19958515 0.09138065  0.09101942
## Parys            0.35964540 0.41446453 0.24302789  0.24846626
## Zinkwazi         0.29969325 0.27145751 0.13577332  0.06727305
## Blythedale Beach 0.31638569 0.28707613 0.14606254  0.07279849
## Kwela            0.36638872 0.32550336 0.14267730  0.11472697
## Anerley          0.67080908 0.75109938 0.60970677  0.48542534
## Letsitele        0.13442623 0.10515971 0.03296703  0.15310459
## Shamwari         0.23089770 0.19629928 0.16943969  0.23013891
## Amakhala         0.27753830 0.29932465 0.28133377  0.31379822
## Addo             0.17513885 0.31734317 0.24737631  0.20483461
## Gate             0.00000000 0.02997076 0.18427835  0.24218832
## Mukambi          0.02997076 0.00000000 0.24470588  0.26570837
## Lilayi           0.18427835 0.24470588 0.00000000  0.14581125
## Livingstone      0.24218832 0.26570837 0.14581125  0.00000000
melted_fst <- melt(NPYFst) #melt the data to be used for a heatmap
melted_fst
##                 Var1             Var2        value
## 1            Unknown          Unknown  0.000000000
## 2             Kasane          Unknown  0.115044248
## 3            Samburu          Unknown  0.263157895
## 4             Mosiro          Unknown -0.075812274
## 5           Naivasha          Unknown -0.052631579
## 6             Kimana          Unknown  0.056250000
## 7         Soetdoring          Unknown  0.009736330
## 8             Gariep          Unknown  0.133840051
## 9           Sandveld          Unknown  0.093729589
## 10             Parys          Unknown  0.185840708
## 11          Zinkwazi          Unknown  0.196373057
## 12  Blythedale Beach          Unknown  0.194668391
## 13             Kwela          Unknown  0.178505705
## 14           Anerley          Unknown  0.542288557
## 15         Letsitele          Unknown  0.004739336
## 16          Shamwari          Unknown  0.121543920
## 17          Amakhala          Unknown  0.253581856
## 18              Addo          Unknown  0.139175258
## 19              Gate          Unknown  0.186662784
## 20           Mukambi          Unknown  0.184128848
## 21            Lilayi          Unknown  0.165137615
## 22       Livingstone          Unknown  0.200241209
## 23           Unknown           Kasane  0.115044248
## 24            Kasane           Kasane  0.000000000
## 25           Samburu           Kasane  0.361111111
## 26            Mosiro           Kasane -0.222707424
## 27          Naivasha           Kasane -0.105882353
## 28            Kimana           Kasane  0.144578313
## 29        Soetdoring           Kasane  0.033611623
## 30            Gariep           Kasane  0.081888429
## 31          Sandveld           Kasane  0.164662147
## 32             Parys           Kasane  0.146341463
## 33          Zinkwazi           Kasane  0.190174039
## 34  Blythedale Beach           Kasane  0.243608119
## 35             Kwela           Kasane  0.248938178
## 36           Anerley           Kasane  0.574257426
## 37         Letsitele           Kasane -0.026455026
## 38          Shamwari           Kasane  0.168794412
## 39          Amakhala           Kasane  0.239687767
## 40              Addo           Kasane  0.036585366
## 41              Gate           Kasane  0.029411765
## 42           Mukambi           Kasane  0.032129404
## 43            Lilayi           Kasane  0.135416667
## 44       Livingstone           Kasane  0.200047498
## 45           Unknown          Samburu  0.263157895
## 46            Kasane          Samburu  0.361111111
## 47           Samburu          Samburu  0.000000000
## 48            Mosiro          Samburu  0.000000000
## 49          Naivasha          Samburu  0.000000000
## 50            Kimana          Samburu  0.000000000
## 51        Soetdoring          Samburu  0.089304258
## 52            Gariep          Samburu  0.196413884
## 53          Sandveld          Samburu  0.084194978
## 54             Parys          Samburu  0.000000000
## 55          Zinkwazi          Samburu  0.288836292
## 56  Blythedale Beach          Samburu  0.267167382
## 57             Kwela          Samburu  0.131359852
## 58           Anerley          Samburu  0.000000000
## 59         Letsitele          Samburu -0.014492754
## 60          Shamwari          Samburu  0.173590982
## 61          Amakhala          Samburu  0.337187789
## 62              Addo          Samburu  0.000000000
## 63              Gate          Samburu  0.471294766
## 64           Mukambi          Samburu  0.570637119
## 65            Lilayi          Samburu  0.485992692
## 66       Livingstone          Samburu  0.299526440
## 67           Unknown           Mosiro -0.075812274
## 68            Kasane           Mosiro -0.222707424
## 69           Samburu           Mosiro  0.000000000
## 70            Mosiro           Mosiro  0.000000000
## 71          Naivasha           Mosiro  0.000000000
## 72            Kimana           Mosiro  0.000000000
## 73        Soetdoring           Mosiro -0.142302717
## 74            Gariep           Mosiro  0.001223990
## 75          Sandveld           Mosiro  0.021226415
## 76             Parys           Mosiro  0.000000000
## 77          Zinkwazi           Mosiro  0.141394026
## 78  Blythedale Beach           Mosiro  0.180334395
## 79             Kwela           Mosiro  0.130634775
## 80           Anerley           Mosiro  0.000000000
## 81         Letsitele           Mosiro -0.268389662
## 82          Shamwari           Mosiro  0.055290785
## 83          Amakhala           Mosiro  0.206270627
## 84              Addo           Mosiro  0.000000000
## 85              Gate           Mosiro  0.042927794
## 86           Mukambi           Mosiro  0.112359551
## 87            Lilayi           Mosiro  0.127789047
## 88       Livingstone           Mosiro  0.184931507
## 89           Unknown         Naivasha -0.052631579
## 90            Kasane         Naivasha -0.105882353
## 91           Samburu         Naivasha  0.000000000
## 92            Mosiro         Naivasha  0.000000000
## 93          Naivasha         Naivasha  0.000000000
## 94            Kimana         Naivasha  0.000000000
## 95        Soetdoring         Naivasha -0.047281324
## 96            Gariep         Naivasha  0.020972735
## 97          Sandveld         Naivasha  0.033694810
## 98             Parys         Naivasha  0.000000000
## 99          Zinkwazi         Naivasha  0.163301060
## 100 Blythedale Beach         Naivasha  0.196261682
## 101            Kwela         Naivasha  0.115671642
## 102          Anerley         Naivasha  0.000000000
## 103        Letsitele         Naivasha -0.222857143
## 104         Shamwari         Naivasha  0.101218162
## 105         Amakhala         Naivasha  0.246618106
## 106             Addo         Naivasha  0.000000000
## 107             Gate         Naivasha  0.185127202
## 108          Mukambi         Naivasha  0.289532294
## 109           Lilayi         Naivasha  0.229129663
## 110      Livingstone         Naivasha  0.183394161
## 111          Unknown           Kimana  0.056250000
## 112           Kasane           Kimana  0.144578313
## 113          Samburu           Kimana  0.000000000
## 114           Mosiro           Kimana  0.000000000
## 115         Naivasha           Kimana  0.000000000
## 116           Kimana           Kimana  0.000000000
## 117       Soetdoring           Kimana -0.112640801
## 118           Gariep           Kimana  0.107980367
## 119         Sandveld           Kimana  0.007148531
## 120            Parys           Kimana  0.000000000
## 121         Zinkwazi           Kimana  0.189121712
## 122 Blythedale Beach           Kimana  0.237656595
## 123            Kwela           Kimana  0.180878553
## 124          Anerley           Kimana  0.000000000
## 125        Letsitele           Kimana -0.023771791
## 126         Shamwari           Kimana  0.052869739
## 127         Amakhala           Kimana  0.276394422
## 128             Addo           Kimana  0.000000000
## 129             Gate           Kimana  0.317603171
## 130          Mukambi           Kimana  0.368627451
## 131           Lilayi           Kimana  0.389121339
## 132      Livingstone           Kimana  0.269731489
## 133          Unknown       Soetdoring  0.009736330
## 134           Kasane       Soetdoring  0.033611623
## 135          Samburu       Soetdoring  0.089304258
## 136           Mosiro       Soetdoring -0.142302717
## 137         Naivasha       Soetdoring -0.047281324
## 138           Kimana       Soetdoring -0.112640801
## 139       Soetdoring       Soetdoring  0.000000000
## 140           Gariep       Soetdoring  0.049820223
## 141         Sandveld       Soetdoring -0.014263698
## 142            Parys       Soetdoring  0.004032258
## 143         Zinkwazi       Soetdoring  0.108393720
## 144 Blythedale Beach       Soetdoring  0.109523105
## 145            Kwela       Soetdoring  0.074468085
## 146          Anerley       Soetdoring  0.358974359
## 147        Letsitele       Soetdoring -0.082713478
## 148         Shamwari       Soetdoring  0.079412065
## 149         Amakhala       Soetdoring  0.155108973
## 150             Addo       Soetdoring -0.144078144
## 151             Gate       Soetdoring  0.136553112
## 152          Mukambi       Soetdoring  0.097087379
## 153           Lilayi       Soetdoring  0.038212981
## 154      Livingstone       Soetdoring  0.110080186
## 155          Unknown           Gariep  0.133840051
## 156           Kasane           Gariep  0.081888429
## 157          Samburu           Gariep  0.196413884
## 158           Mosiro           Gariep  0.001223990
## 159         Naivasha           Gariep  0.020972735
## 160           Kimana           Gariep  0.107980367
## 161       Soetdoring           Gariep  0.049820223
## 162           Gariep           Gariep  0.000000000
## 163         Sandveld           Gariep  0.084685104
## 164            Parys           Gariep  0.087673689
## 165         Zinkwazi           Gariep  0.006833853
## 166 Blythedale Beach           Gariep  0.110862011
## 167            Kwela           Gariep  0.117444427
## 168          Anerley           Gariep  0.372542659
## 169        Letsitele           Gariep  0.099833145
## 170         Shamwari           Gariep  0.190474410
## 171         Amakhala           Gariep  0.243288045
## 172             Addo           Gariep  0.032539221
## 173             Gate           Gariep  0.177410667
## 174          Mukambi           Gariep  0.167560133
## 175           Lilayi           Gariep  0.058559051
## 176      Livingstone           Gariep  0.042914770
## 177          Unknown         Sandveld  0.093729589
## 178           Kasane         Sandveld  0.164662147
## 179          Samburu         Sandveld  0.084194978
## 180           Mosiro         Sandveld  0.021226415
## 181         Naivasha         Sandveld  0.033694810
## 182           Kimana         Sandveld  0.007148531
## 183       Soetdoring         Sandveld -0.014263698
## 184           Gariep         Sandveld  0.084685104
## 185         Sandveld         Sandveld  0.000000000
## 186            Parys         Sandveld -0.003388554
## 187         Zinkwazi         Sandveld  0.049092518
## 188 Blythedale Beach         Sandveld  0.052600473
## 189            Kwela         Sandveld -0.006312603
## 190          Anerley         Sandveld  0.204768125
## 191        Letsitele         Sandveld  0.086255699
## 192         Shamwari         Sandveld  0.206980034
## 193         Amakhala         Sandveld  0.306992139
## 194             Addo         Sandveld  0.069264069
## 195             Gate         Sandveld  0.257716535
## 196          Mukambi         Sandveld  0.199585147
## 197           Lilayi         Sandveld  0.091380652
## 198      Livingstone         Sandveld  0.091019417
## 199          Unknown            Parys  0.185840708
## 200           Kasane            Parys  0.146341463
## 201          Samburu            Parys  0.000000000
## 202           Mosiro            Parys  0.000000000
## 203         Naivasha            Parys  0.000000000
## 204           Kimana            Parys  0.000000000
## 205       Soetdoring            Parys  0.004032258
## 206           Gariep            Parys  0.087673689
## 207         Sandveld            Parys -0.003388554
## 208            Parys            Parys  0.000000000
## 209         Zinkwazi            Parys  0.209669142
## 210 Blythedale Beach            Parys  0.099193548
## 211            Kwela            Parys -0.035502959
## 212          Anerley            Parys  0.000000000
## 213        Letsitele            Parys  0.026282854
## 214         Shamwari            Parys  0.125993972
## 215         Amakhala            Parys  0.186934673
## 216             Addo            Parys  0.000000000
## 217             Gate            Parys  0.359645404
## 218          Mukambi            Parys  0.414464534
## 219           Lilayi            Parys  0.243027888
## 220      Livingstone            Parys  0.248466258
## 221          Unknown         Zinkwazi  0.196373057
## 222           Kasane         Zinkwazi  0.190174039
## 223          Samburu         Zinkwazi  0.288836292
## 224           Mosiro         Zinkwazi  0.141394026
## 225         Naivasha         Zinkwazi  0.163301060
## 226           Kimana         Zinkwazi  0.189121712
## 227       Soetdoring         Zinkwazi  0.108393720
## 228           Gariep         Zinkwazi  0.006833853
## 229         Sandveld         Zinkwazi  0.049092518
## 230            Parys         Zinkwazi  0.209669142
## 231         Zinkwazi         Zinkwazi  0.000000000
## 232 Blythedale Beach         Zinkwazi  0.097776406
## 233            Kwela         Zinkwazi  0.135299625
## 234          Anerley         Zinkwazi  0.434503361
## 235        Letsitele         Zinkwazi  0.202975558
## 236         Shamwari         Zinkwazi  0.267888255
## 237         Amakhala         Zinkwazi  0.363450417
## 238             Addo         Zinkwazi  0.229097857
## 239             Gate         Zinkwazi  0.299693252
## 240          Mukambi         Zinkwazi  0.271457511
## 241           Lilayi         Zinkwazi  0.135773318
## 242      Livingstone         Zinkwazi  0.067273054
## 243          Unknown Blythedale Beach  0.194668391
## 244           Kasane Blythedale Beach  0.243608119
## 245          Samburu Blythedale Beach  0.267167382
## 246           Mosiro Blythedale Beach  0.180334395
## 247         Naivasha Blythedale Beach  0.196261682
## 248           Kimana Blythedale Beach  0.237656595
## 249       Soetdoring Blythedale Beach  0.109523105
## 250           Gariep Blythedale Beach  0.110862011
## 251         Sandveld Blythedale Beach  0.052600473
## 252            Parys Blythedale Beach  0.099193548
## 253         Zinkwazi Blythedale Beach  0.097776406
## 254 Blythedale Beach Blythedale Beach  0.000000000
## 255            Kwela Blythedale Beach -0.030628501
## 256          Anerley Blythedale Beach  0.247747748
## 257        Letsitele Blythedale Beach  0.143712198
## 258         Shamwari Blythedale Beach  0.263986992
## 259         Amakhala Blythedale Beach  0.346774194
## 260             Addo Blythedale Beach  0.208762887
## 261             Gate Blythedale Beach  0.316385688
## 262          Mukambi Blythedale Beach  0.287076127
## 263           Lilayi Blythedale Beach  0.146062541
## 264      Livingstone Blythedale Beach  0.072798487
## 265          Unknown            Kwela  0.178505705
## 266           Kasane            Kwela  0.248938178
## 267          Samburu            Kwela  0.131359852
## 268           Mosiro            Kwela  0.130634775
## 269         Naivasha            Kwela  0.115671642
## 270           Kimana            Kwela  0.180878553
## 271       Soetdoring            Kwela  0.074468085
## 272           Gariep            Kwela  0.117444427
## 273         Sandveld            Kwela -0.006312603
## 274            Parys            Kwela -0.035502959
## 275         Zinkwazi            Kwela  0.135299625
## 276 Blythedale Beach            Kwela -0.030628501
## 277            Kwela            Kwela  0.000000000
## 278          Anerley            Kwela  0.109375000
## 279        Letsitele            Kwela  0.099875156
## 280         Shamwari            Kwela  0.239948224
## 281         Amakhala            Kwela  0.343491480
## 282             Addo            Kwela  0.124452235
## 283             Gate            Kwela  0.366388723
## 284          Mukambi            Kwela  0.325503356
## 285           Lilayi            Kwela  0.142677298
## 286      Livingstone            Kwela  0.114726972
## 287          Unknown          Anerley  0.542288557
## 288           Kasane          Anerley  0.574257426
## 289          Samburu          Anerley  0.000000000
## 290           Mosiro          Anerley  0.000000000
## 291         Naivasha          Anerley  0.000000000
## 292           Kimana          Anerley  0.000000000
## 293       Soetdoring          Anerley  0.358974359
## 294           Gariep          Anerley  0.372542659
## 295         Sandveld          Anerley  0.204768125
## 296            Parys          Anerley  0.000000000
## 297         Zinkwazi          Anerley  0.434503361
## 298 Blythedale Beach          Anerley  0.247747748
## 299            Kwela          Anerley  0.109375000
## 300          Anerley          Anerley  0.000000000
## 301        Letsitele          Anerley  0.483028721
## 302         Shamwari          Anerley  0.487153839
## 303         Amakhala          Anerley  0.616574586
## 304             Addo          Anerley  0.000000000
## 305             Gate          Anerley  0.670809077
## 306          Mukambi          Anerley  0.751099384
## 307           Lilayi          Anerley  0.609706775
## 308      Livingstone          Anerley  0.485425342
## 309          Unknown        Letsitele  0.004739336
## 310           Kasane        Letsitele -0.026455026
## 311          Samburu        Letsitele -0.014492754
## 312           Mosiro        Letsitele -0.268389662
## 313         Naivasha        Letsitele -0.222857143
## 314           Kimana        Letsitele -0.023771791
## 315       Soetdoring        Letsitele -0.082713478
## 316           Gariep        Letsitele  0.099833145
## 317         Sandveld        Letsitele  0.086255699
## 318            Parys        Letsitele  0.026282854
## 319         Zinkwazi        Letsitele  0.202975558
## 320 Blythedale Beach        Letsitele  0.143712198
## 321            Kwela        Letsitele  0.099875156
## 322          Anerley        Letsitele  0.483028721
## 323        Letsitele        Letsitele  0.000000000
## 324         Shamwari        Letsitele  0.034851430
## 325         Amakhala        Letsitele  0.103546205
## 326             Addo        Letsitele -0.446028513
## 327             Gate        Letsitele  0.134426230
## 328          Mukambi        Letsitele  0.105159705
## 329           Lilayi        Letsitele  0.032967033
## 330      Livingstone        Letsitele  0.153104592
## 331          Unknown         Shamwari  0.121543920
## 332           Kasane         Shamwari  0.168794412
## 333          Samburu         Shamwari  0.173590982
## 334           Mosiro         Shamwari  0.055290785
## 335         Naivasha         Shamwari  0.101218162
## 336           Kimana         Shamwari  0.052869739
## 337       Soetdoring         Shamwari  0.079412065
## 338           Gariep         Shamwari  0.190474410
## 339         Sandveld         Shamwari  0.206980034
## 340            Parys         Shamwari  0.125993972
## 341         Zinkwazi         Shamwari  0.267888255
## 342 Blythedale Beach         Shamwari  0.263986992
## 343            Kwela         Shamwari  0.239948224
## 344          Anerley         Shamwari  0.487153839
## 345        Letsitele         Shamwari  0.034851430
## 346         Shamwari         Shamwari  0.000000000
## 347         Amakhala         Shamwari  0.070336354
## 348             Addo         Shamwari -0.180581745
## 349             Gate         Shamwari  0.230897704
## 350          Mukambi         Shamwari  0.196299276
## 351           Lilayi         Shamwari  0.169439687
## 352      Livingstone         Shamwari  0.230138915
## 353          Unknown         Amakhala  0.253581856
## 354           Kasane         Amakhala  0.239687767
## 355          Samburu         Amakhala  0.337187789
## 356           Mosiro         Amakhala  0.206270627
## 357         Naivasha         Amakhala  0.246618106
## 358           Kimana         Amakhala  0.276394422
## 359       Soetdoring         Amakhala  0.155108973
## 360           Gariep         Amakhala  0.243288045
## 361         Sandveld         Amakhala  0.306992139
## 362            Parys         Amakhala  0.186934673
## 363         Zinkwazi         Amakhala  0.363450417
## 364 Blythedale Beach         Amakhala  0.346774194
## 365            Kwela         Amakhala  0.343491480
## 366          Anerley         Amakhala  0.616574586
## 367        Letsitele         Amakhala  0.103546205
## 368         Shamwari         Amakhala  0.070336354
## 369         Amakhala         Amakhala  0.000000000
## 370             Addo         Amakhala -0.403846154
## 371             Gate         Amakhala  0.277538297
## 372          Mukambi         Amakhala  0.299324654
## 373           Lilayi         Amakhala  0.281333767
## 374      Livingstone         Amakhala  0.313798220
## 375          Unknown             Addo  0.139175258
## 376           Kasane             Addo  0.036585366
## 377          Samburu             Addo  0.000000000
## 378           Mosiro             Addo  0.000000000
## 379         Naivasha             Addo  0.000000000
## 380           Kimana             Addo  0.000000000
## 381       Soetdoring             Addo -0.144078144
## 382           Gariep             Addo  0.032539221
## 383         Sandveld             Addo  0.069264069
## 384            Parys             Addo  0.000000000
## 385         Zinkwazi             Addo  0.229097857
## 386 Blythedale Beach             Addo  0.208762887
## 387            Kwela             Addo  0.124452235
## 388          Anerley             Addo  0.000000000
## 389        Letsitele             Addo -0.446028513
## 390         Shamwari             Addo -0.180581745
## 391         Amakhala             Addo -0.403846154
## 392             Addo             Addo  0.000000000
## 393             Gate             Addo  0.175138852
## 394          Mukambi             Addo  0.317343173
## 395           Lilayi             Addo  0.247376312
## 396      Livingstone             Addo  0.204834606
## 397          Unknown             Gate  0.186662784
## 398           Kasane             Gate  0.029411765
## 399          Samburu             Gate  0.471294766
## 400           Mosiro             Gate  0.042927794
## 401         Naivasha             Gate  0.185127202
## 402           Kimana             Gate  0.317603171
## 403       Soetdoring             Gate  0.136553112
## 404           Gariep             Gate  0.177410667
## 405         Sandveld             Gate  0.257716535
## 406            Parys             Gate  0.359645404
## 407         Zinkwazi             Gate  0.299693252
## 408 Blythedale Beach             Gate  0.316385688
## 409            Kwela             Gate  0.366388723
## 410          Anerley             Gate  0.670809077
## 411        Letsitele             Gate  0.134426230
## 412         Shamwari             Gate  0.230897704
## 413         Amakhala             Gate  0.277538297
## 414             Addo             Gate  0.175138852
## 415             Gate             Gate  0.000000000
## 416          Mukambi             Gate  0.029970760
## 417           Lilayi             Gate  0.184278351
## 418      Livingstone             Gate  0.242188319
## 419          Unknown          Mukambi  0.184128848
## 420           Kasane          Mukambi  0.032129404
## 421          Samburu          Mukambi  0.570637119
## 422           Mosiro          Mukambi  0.112359551
## 423         Naivasha          Mukambi  0.289532294
## 424           Kimana          Mukambi  0.368627451
## 425       Soetdoring          Mukambi  0.097087379
## 426           Gariep          Mukambi  0.167560133
## 427         Sandveld          Mukambi  0.199585147
## 428            Parys          Mukambi  0.414464534
## 429         Zinkwazi          Mukambi  0.271457511
## 430 Blythedale Beach          Mukambi  0.287076127
## 431            Kwela          Mukambi  0.325503356
## 432          Anerley          Mukambi  0.751099384
## 433        Letsitele          Mukambi  0.105159705
## 434         Shamwari          Mukambi  0.196299276
## 435         Amakhala          Mukambi  0.299324654
## 436             Addo          Mukambi  0.317343173
## 437             Gate          Mukambi  0.029970760
## 438          Mukambi          Mukambi  0.000000000
## 439           Lilayi          Mukambi  0.244705882
## 440      Livingstone          Mukambi  0.265708368
## 441          Unknown           Lilayi  0.165137615
## 442           Kasane           Lilayi  0.135416667
## 443          Samburu           Lilayi  0.485992692
## 444           Mosiro           Lilayi  0.127789047
## 445         Naivasha           Lilayi  0.229129663
## 446           Kimana           Lilayi  0.389121339
## 447       Soetdoring           Lilayi  0.038212981
## 448           Gariep           Lilayi  0.058559051
## 449         Sandveld           Lilayi  0.091380652
## 450            Parys           Lilayi  0.243027888
## 451         Zinkwazi           Lilayi  0.135773318
## 452 Blythedale Beach           Lilayi  0.146062541
## 453            Kwela           Lilayi  0.142677298
## 454          Anerley           Lilayi  0.609706775
## 455        Letsitele           Lilayi  0.032967033
## 456         Shamwari           Lilayi  0.169439687
## 457         Amakhala           Lilayi  0.281333767
## 458             Addo           Lilayi  0.247376312
## 459             Gate           Lilayi  0.184278351
## 460          Mukambi           Lilayi  0.244705882
## 461           Lilayi           Lilayi  0.000000000
## 462      Livingstone           Lilayi  0.145811252
## 463          Unknown      Livingstone  0.200241209
## 464           Kasane      Livingstone  0.200047498
## 465          Samburu      Livingstone  0.299526440
## 466           Mosiro      Livingstone  0.184931507
## 467         Naivasha      Livingstone  0.183394161
## 468           Kimana      Livingstone  0.269731489
## 469       Soetdoring      Livingstone  0.110080186
## 470           Gariep      Livingstone  0.042914770
## 471         Sandveld      Livingstone  0.091019417
## 472            Parys      Livingstone  0.248466258
## 473         Zinkwazi      Livingstone  0.067273054
## 474 Blythedale Beach      Livingstone  0.072798487
## 475            Kwela      Livingstone  0.114726972
## 476          Anerley      Livingstone  0.485425342
## 477        Letsitele      Livingstone  0.153104592
## 478         Shamwari      Livingstone  0.230138915
## 479         Amakhala      Livingstone  0.313798220
## 480             Addo      Livingstone  0.204834606
## 481             Gate      Livingstone  0.242188319
## 482          Mukambi      Livingstone  0.265708368
## 483           Lilayi      Livingstone  0.145811252
## 484      Livingstone      Livingstone  0.000000000
library(ggplot2)
ggplot(data = melted_fst, aes(x=Var1, y=Var2, fill=value)) + ylab(NULL) + xlab(NULL) + coord_fixed() + 
  theme(axis.text=element_text(angle=45,vjust=0.5)) + geom_tile()

#this looks okay, but we'd rather not have the redundance of the same values being shown twice. So we will just use the lower half of the matrix:

get_lower_tri<-function(NPYFst){
    NPYFst[lower.tri(NPYFst)] <- NA
    return(NPYFst)
}

lower_tri <- get_lower_tri(NPYFst) #Fill half of the repeated values with "NA's"
lower_tri
##                  Unknown    Kasane   Samburu      Mosiro    Naivasha    Kimana
## Unknown                0 0.1150442 0.2631579 -0.07581227 -0.05263158 0.0562500
## Kasane                NA 0.0000000 0.3611111 -0.22270742 -0.10588235 0.1445783
## Samburu               NA        NA 0.0000000  0.00000000  0.00000000 0.0000000
## Mosiro                NA        NA        NA  0.00000000  0.00000000 0.0000000
## Naivasha              NA        NA        NA          NA  0.00000000 0.0000000
## Kimana                NA        NA        NA          NA          NA 0.0000000
## Soetdoring            NA        NA        NA          NA          NA        NA
## Gariep                NA        NA        NA          NA          NA        NA
## Sandveld              NA        NA        NA          NA          NA        NA
## Parys                 NA        NA        NA          NA          NA        NA
## Zinkwazi              NA        NA        NA          NA          NA        NA
## Blythedale Beach      NA        NA        NA          NA          NA        NA
## Kwela                 NA        NA        NA          NA          NA        NA
## Anerley               NA        NA        NA          NA          NA        NA
## Letsitele             NA        NA        NA          NA          NA        NA
## Shamwari              NA        NA        NA          NA          NA        NA
## Amakhala              NA        NA        NA          NA          NA        NA
## Addo                  NA        NA        NA          NA          NA        NA
## Gate                  NA        NA        NA          NA          NA        NA
## Mukambi               NA        NA        NA          NA          NA        NA
## Lilayi                NA        NA        NA          NA          NA        NA
## Livingstone           NA        NA        NA          NA          NA        NA
##                   Soetdoring     Gariep     Sandveld        Parys    Zinkwazi
## Unknown           0.00973633 0.13384005  0.093729589  0.185840708 0.196373057
## Kasane            0.03361162 0.08188843  0.164662147  0.146341463 0.190174039
## Samburu           0.08930426 0.19641388  0.084194978  0.000000000 0.288836292
## Mosiro           -0.14230272 0.00122399  0.021226415  0.000000000 0.141394026
## Naivasha         -0.04728132 0.02097274  0.033694810  0.000000000 0.163301060
## Kimana           -0.11264080 0.10798037  0.007148531  0.000000000 0.189121712
## Soetdoring        0.00000000 0.04982022 -0.014263698  0.004032258 0.108393720
## Gariep                    NA 0.00000000  0.084685104  0.087673689 0.006833853
## Sandveld                  NA         NA  0.000000000 -0.003388554 0.049092518
## Parys                     NA         NA           NA  0.000000000 0.209669142
## Zinkwazi                  NA         NA           NA           NA 0.000000000
## Blythedale Beach          NA         NA           NA           NA          NA
## Kwela                     NA         NA           NA           NA          NA
## Anerley                   NA         NA           NA           NA          NA
## Letsitele                 NA         NA           NA           NA          NA
## Shamwari                  NA         NA           NA           NA          NA
## Amakhala                  NA         NA           NA           NA          NA
## Addo                      NA         NA           NA           NA          NA
## Gate                      NA         NA           NA           NA          NA
## Mukambi                   NA         NA           NA           NA          NA
## Lilayi                    NA         NA           NA           NA          NA
## Livingstone               NA         NA           NA           NA          NA
##                  Blythedale Beach        Kwela   Anerley    Letsitele
## Unknown                0.19466839  0.178505705 0.5422886  0.004739336
## Kasane                 0.24360812  0.248938178 0.5742574 -0.026455026
## Samburu                0.26716738  0.131359852 0.0000000 -0.014492754
## Mosiro                 0.18033439  0.130634775 0.0000000 -0.268389662
## Naivasha               0.19626168  0.115671642 0.0000000 -0.222857143
## Kimana                 0.23765660  0.180878553 0.0000000 -0.023771791
## Soetdoring             0.10952311  0.074468085 0.3589744 -0.082713478
## Gariep                 0.11086201  0.117444427 0.3725427  0.099833145
## Sandveld               0.05260047 -0.006312603 0.2047681  0.086255699
## Parys                  0.09919355 -0.035502959 0.0000000  0.026282854
## Zinkwazi               0.09777641  0.135299625 0.4345034  0.202975558
## Blythedale Beach       0.00000000 -0.030628501 0.2477477  0.143712198
## Kwela                          NA  0.000000000 0.1093750  0.099875156
## Anerley                        NA           NA 0.0000000  0.483028721
## Letsitele                      NA           NA        NA  0.000000000
## Shamwari                       NA           NA        NA           NA
## Amakhala                       NA           NA        NA           NA
## Addo                           NA           NA        NA           NA
## Gate                           NA           NA        NA           NA
## Mukambi                        NA           NA        NA           NA
## Lilayi                         NA           NA        NA           NA
## Livingstone                    NA           NA        NA           NA
##                    Shamwari   Amakhala        Addo       Gate    Mukambi
## Unknown          0.12154392 0.25358186  0.13917526 0.18666278 0.18412885
## Kasane           0.16879441 0.23968777  0.03658537 0.02941176 0.03212940
## Samburu          0.17359098 0.33718779  0.00000000 0.47129477 0.57063712
## Mosiro           0.05529078 0.20627063  0.00000000 0.04292779 0.11235955
## Naivasha         0.10121816 0.24661811  0.00000000 0.18512720 0.28953229
## Kimana           0.05286974 0.27639442  0.00000000 0.31760317 0.36862745
## Soetdoring       0.07941206 0.15510897 -0.14407814 0.13655311 0.09708738
## Gariep           0.19047441 0.24328804  0.03253922 0.17741067 0.16756013
## Sandveld         0.20698003 0.30699214  0.06926407 0.25771653 0.19958515
## Parys            0.12599397 0.18693467  0.00000000 0.35964540 0.41446453
## Zinkwazi         0.26788825 0.36345042  0.22909786 0.29969325 0.27145751
## Blythedale Beach 0.26398699 0.34677419  0.20876289 0.31638569 0.28707613
## Kwela            0.23994822 0.34349148  0.12445223 0.36638872 0.32550336
## Anerley          0.48715384 0.61657459  0.00000000 0.67080908 0.75109938
## Letsitele        0.03485143 0.10354621 -0.44602851 0.13442623 0.10515971
## Shamwari         0.00000000 0.07033635 -0.18058175 0.23089770 0.19629928
## Amakhala                 NA 0.00000000 -0.40384615 0.27753830 0.29932465
## Addo                     NA         NA  0.00000000 0.17513885 0.31734317
## Gate                     NA         NA          NA 0.00000000 0.02997076
## Mukambi                  NA         NA          NA         NA 0.00000000
## Lilayi                   NA         NA          NA         NA         NA
## Livingstone              NA         NA          NA         NA         NA
##                      Lilayi Livingstone
## Unknown          0.16513761  0.20024121
## Kasane           0.13541667  0.20004750
## Samburu          0.48599269  0.29952644
## Mosiro           0.12778905  0.18493151
## Naivasha         0.22912966  0.18339416
## Kimana           0.38912134  0.26973149
## Soetdoring       0.03821298  0.11008019
## Gariep           0.05855905  0.04291477
## Sandveld         0.09138065  0.09101942
## Parys            0.24302789  0.24846626
## Zinkwazi         0.13577332  0.06727305
## Blythedale Beach 0.14606254  0.07279849
## Kwela            0.14267730  0.11472697
## Anerley          0.60970677  0.48542534
## Letsitele        0.03296703  0.15310459
## Shamwari         0.16943969  0.23013891
## Amakhala         0.28133377  0.31379822
## Addo             0.24737631  0.20483461
## Gate             0.18427835  0.24218832
## Mukambi          0.24470588  0.26570837
## Lilayi           0.00000000  0.14581125
## Livingstone              NA  0.00000000
melted_NPY <- melt(lower_tri, na.rm = TRUE) #melt this half of the matrix

ggplot(data = melted_NPY, aes(x=Var1, y=Var2, fill=value)) + theme_minimal() + ylab(NULL) + xlab(NULL) + coord_fixed() + theme(axis.text=element_text(angle=45, vjust=0.5)) + geom_tile(color = "white")

FST matrices and heatmaps are another way of statistically showing us genetic differentiation within NPY between our populations. Our pairwise FST matrix and heatmap helpfully shows us pairwise levels of differentiation between our populations. For example, Mukambi and Anerley populations seem to have very few differences in their NPY regions, while the Addo and Letsitele populations are among the most different (note that these two populations are also the furthest apart in our DAPC/PCA plots).

Structure

In order to run analyses similar to structure, we must use this code:

install.packages(c("fields","RColorBrewer","mapplots"),repos = "http://cran.us.r-project.org")
## Installing packages into '/usr4/ugrad/masiedu/R/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
BiocManager::install("LEA")
## Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
##   re-install: 'LEA'
## Installation paths not writeable, unable to update packages
##   path: /share/pkg.7/r/4.1.2/install/lib64/R/library
##   packages:
##     actuar, AdaptFitOS, ade4, admisc, affxparser, affy, affyio, affyPLM, akima,
##     annotate, AnnotationFilter, AnnotationForge, AnnotationHub, antiword, ape,
##     apeglm, aplot, argparse, arrow, ashr, assertive.properties, babelgene,
##     batchelor, bayesplot, BBmisc, BDgraph, beachmat, BHC, bigleaf, bigmemory,
##     bigsnpr, bigsparser, bigstatsr, Biobase, BiocFileCache, BiocGenerics,
##     BiocIO, BiocNeighbors, BiocParallel, BiocSingular, biomaRt, Biostrings,
##     biovizBase, blob, bluster, BMA, bnlearn, brew, Brobdingnag, broom,
##     BSgenome, C50, Cairo, car, caret, Category, CDM, checkmate, circlize, cli,
##     clipr, cluster, commonmark, ComplexHeatmap, conquer, crayon, cubature,
##     Cubist, cytolib, DelayedArray, DelayedMatrixStats, densEstBayes,
##     densityClust, DEoptimR, desc, DESeq2, deSolve, DiffBind, DNAcopy,
##     doParallel, dplyr, DropletUtils, DT, edgeR, ensembldb, EpiModel, evaluate,
##     evd, expint, extRemes, fansi, fastGHQuad, ff, fftw, fishpond, fitdistrplus,
##     flexclust, float, foreach, formatR, future, future.apply, gam, gamlss,
##     gamlss.dist, gcrma, gdsfmt, gdtools, gee, genefilter, geneplotter, GENESIS,
##     GenomeInfoDb, GenomeInfoDbData, GenomicAlignments, GenomicRanges,
##     geojsonsf, geometry, gert, ggdendro, ggfun, ggplot2, git2r, glmmML,
##     glmmTMB, glmnet, glue, gmp, GO.db, GOstats, gower, gplots, graph,
##     GreyListChIP, GSEABase, gstat, GSVA, GWASTools, haven, HDF5Array, Hmisc,
##     httr, hwriter, igraph, impute, infotheo, IRanges, iterators, jomo,
##     jsonlite, KEGGREST, Kendall, kernlab, knitr, ks, lars, lavaan, leiden, lfe,
##     lhs, limma, linprog, lme4, LMest, lmtest, locfit, logspline, loo,
##     lpsymphony, magic, magrittr, maptools, MASS, MassSpecWavelet, Matching,
##     mathjaxr, MatrixGenerics, matrixStats, mbkmeans, MCMCpack, metap, metapod,
##     mgcv, minpack.lm, mlapi, MLEcens, mlegp, msigdbr, multtest, networkDynamic,
##     nimble, nlme, nloptr, officer, oligo, oligoClasses, openair, OpenMx,
##     openssl, pander, parallelly, parsedate, party, pbdZMQ, pcaMethods, pcaPP,
##     pdftools, pdist, penalized, peperr, plyr, polspline, polynom,
##     preprocessCore, processx, ProtGenerics, ps, pspline, psych, qgraph,
##     quanteda, quantmod, quantreg, quantsmooth, qvalue, ragg, RandomFieldsUtils,
##     randomForest, RBGL, rbibutils, RColorBrewer, Rcpp, RcppArmadillo,
##     RcppEigen, RcppGSL, RCurl, Rdpack, readxl, recipes, REddyProc,
##     ResidualMatrix, Rfast, rgdal, rgenoud, Rgraphviz, rhdf5, rhdf5filters,
##     Rhdf5lib, Rhtslib, RInside, rjags, rlang, rmarkdown, rmio, rms, robustbase,
##     rprojroot, RProtoBufLib, rrcov, Rsamtools, RSpectra, RSQLite, rstan,
##     rstanarm, rstantools, Rsubread, rsvg, rtracklayer, Rtsne, S4Vectors, sass,
##     ScaledMatrix, scales, scater, scattermore, scran, scuttle, segmented,
##     SeqArray, seqminer, SeqVarTools, seriation, sets, Seurat, SeuratObject, sf,
##     sfsmisc, shinystan, ShortRead, SHT, SingleCellExperiment, SingleR, SKAT,
##     sn, SNPRelate, sp, spaMM, sparseMatrixStats, spatstat.core, spatstat.data,
##     spatstat.geom, spatstat.linnet, spatstat.sparse, spatstat.utils, spikeslab,
##     splancs, statnet.common, subplex, SummarizedExperiment, survival, sva,
##     svglite, systemfonts, systemPipeR, tclust, testthat, text2vec, TH.data,
##     tibble, tidygraph, tidyselect, tidytree, tinytex, TMB, tmvtnorm, tseries,
##     TSP, tzdb, udunits2, umap, units, unmarked, uuid, V8, VariantAnnotation,
##     vctrs, vegan, VGAM, waldo, webshot, withr, xfun, xgboost, XML, XVector,
##     yaml, zlibbioc
## Old packages: 'BiocManager', 'LDheatmap'
library(LEA)
## 
## Attaching package: 'LEA'
## The following object is masked from 'package:lattice':
## 
##     barchart
seNPYgeno<-vcf2geno("seNPY.vcf")
## 
##  - number of detected individuals:   73
##  - number of detected loci:      526
## 
## For SNP info, please check ./seNPY.vcfsnp.
## 
## 0 line(s) were removed because these are not SNPs.
## Please, check ./seNPY.removed file, for more informations.
seNPY.snmf = snmf(seNPYgeno, K = 1:8, ploidy = 2, entropy = T,
alpha = 100, project = "new")
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] 1197416926
## [1] "*************************************"
## [1] "*          create.dataset            *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)                 73
##         -L (number of loci)                        526
##         -s (seed random init)                      1197416926
##         -r (percentage of masked data)             0.05
##         -x (genotype file in .geno format)         /projectnb2/vervpop/masiedu25/seNPY.geno
##         -o (output file in .geno format)           /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## 
##  Write genotype file with masked data, /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## [1] "*************************************"
## [1] "* sNMF K = 1  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          1
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1197416926
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## 
## Main algorithm:
## 
## Least-square error: 4211.151346
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.G: OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                526
##         -K (number of ancestral pops)      1
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.200906
## Cross-Entropy (masked data):  0.246183
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 2  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          2
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1197416926
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=============]
## Number of iterations: 35
## 
## Least-square error: 3703.736023
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.G: OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                526
##         -K (number of ancestral pops)      2
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.169994
## Cross-Entropy (masked data):  0.219773
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          3
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1197416926
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [======]
## Number of iterations: 16
## 
## Least-square error: 3328.029911
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G: OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                526
##         -K (number of ancestral pops)      3
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.15091
## Cross-Entropy (masked data):  0.201915
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 4  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          4
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1197416926
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [========]
## Number of iterations: 22
## 
## Least-square error: 3104.167940
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.G: OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                526
##         -K (number of ancestral pops)      4
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.139231
## Cross-Entropy (masked data):  0.185925
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          5
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  9787351518
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [========]
## Number of iterations: 22
## 
## Least-square error: 2961.967697
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.G: OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                526
##         -K (number of ancestral pops)      5
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.132443
## Cross-Entropy (masked data):  0.185272
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 6  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          6
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  5492384222
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [============]
## Number of iterations: 32
## 
## Least-square error: 2675.056545
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.G: OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                526
##         -K (number of ancestral pops)      6
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.118842
## Cross-Entropy (masked data):  0.169688
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 7  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          7
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  1197416926
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=====================]
## Number of iterations: 55
## 
## Least-square error: 2579.364127
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.G: OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                526
##         -K (number of ancestral pops)      7
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.113686
## Cross-Entropy (masked data):  0.180468
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 8  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          8
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  44147089886
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==============]
## Number of iterations: 38
## 
## Least-square error: 2420.827401
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.G: OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                526
##         -K (number of ancestral pops)      8
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.106535
## Cross-Entropy (masked data):  0.167179
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
plot(seNPY.snmf, col = "blue4", cex = 1.4, pch = 19)

Here we can see there is a gradual plateau down to 8, which should be the selected value for K. Typically, a gradual slide in values suggests that there is a potentially important role for isolation-by-distance processes in the history of this gene region for vervets (in other words, genetic drift might be a major explanation for the patterns of variation we see in this gene region).

To make things a bit simpler, let’s run a population structure analysis that includes K=3 clusters

seNPY.snmf = snmf(seNPYgeno, K = 3, alpha = 100, project = "new")
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    526
##         -K (number of ancestral pops)          3
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  2120019740
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.geno:     OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [========]
## Number of iterations: 22
## 
## Least-square error: 3255.507431
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q:        OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G: OK.
## 
## The project is saved into :
##  seNPY.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY.snmfProject")
qmatrix = Q(seNPY.snmf, K = 3)

And let’s visualize this in a STRUCTURE-type barplot:

barplot(t(qmatrix), col = c("#F8766D","#00BA38","#619CFF"), border = 1, space = 0,
xlab = "Individuals", ylab = "Admixture coefficients", names.arg = pop.data$sample, las = 2, cex.names = 0.5)

Here we can kind of see that the C1 cluster predominates in part of FS North and KwaZulu-Nata along with C2, while the C3 is mostly dominant in Zambia and East Africa and FS South and Eastern Cape along with a bit of C2 in Zambia and East Africa.

Now we can repeat the Population Structure Analysis for NPY Y1!

NPY1 Population Structure Analysis

PCA

Separation by Taxon

setPop(NPY1_genlight) <- ~taxon
pca2 <- glPca(NPY1_genlight, nf = 2)
barplot(100*pca2$eig/sum(pca$eig), col = heat.colors(50), main="PCA Eigenvalues",ylab="% of Variance Explained",xlab="Eigenvalues")

This graph will show the amount of variation from the original dataset that is kept in the PCA. This will help us determine the amount of PC’s to keep, usually those that explain the most variation would want to be kept.

Once this is determines, we can graph with ggplot.

pca2.scores <- as.data.frame(pca2$scores)
pca2.scores$taxon <- pop(NPY1_genlight)
p <- ggplot(pca2.scores, aes(x=PC1, y=PC2, colour=taxon)) + geom_point(size=2) + stat_ellipse(level = 0.95, size = 1) + theme_bw()
print(p)

To see if these differences are significant, we can run the multivariate AMOVA:

adonis3 <- adonis(pca2$scores ~ taxon, data = pca2.scores, method='eu', na.rm = TRUE)
adonis3
## 
## Call:
## adonis(formula = pca2$scores ~ taxon, data = pca2.scores, method = "eu",      na.rm = TRUE) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## taxon      2    110.23  55.113  16.063 0.31458  0.001 ***
## Residuals 70    240.17   3.431         0.68542           
## Total     72    350.39                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This shows that there is a significant difference between taxons for the NPY Y1 receptor (p<0.001), and the taxons explain about 31.5% of the geonmic variation according to the PCA.

Separation by Population

setPop(NPY1_genlight) <- ~pop
pca2 <- glPca(NPY1_genlight, nf = 2)

barplot(100*pca2$eig/sum(pca$eig), col = heat.colors(50), main="PCA Eigenvalues")
title(ylab="Percent of variance\nexplained", line = 2)
title(xlab="Eigenvalues", line = 1)

This code repeats the process of seeing how much variation from the original data is seen in the PCA for population. This will help us determine what PC’s we should keep.

The results can then be graphed into a ggplot

pca2.scores <- as.data.frame(pca2$scores)
pca2.scores$pop <- pop(NPY1_genlight)

p2 <- ggplot(pca2.scores, aes(x=PC1, y=PC2, colour=pop)) + geom_point(size=2) + stat_ellipse(level = 0.95, size = 1) + theme_bw() + scale_color_manual(values=c("purple4","salmon","dark green","forest green","seagreen1","dodgerblue","green","pink","cyan","red","darkblue"))
print(p2)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 6 row(s) containing missing values (geom_path).

ggsave("NPY1_PCA.png",plot=p,width=5.5,height=4,units=c("in"),dpi=300)

Now we can run the multivariate AMOVA using adonis to see if the differences are significant

adonis4 <- adonis(pca2$scores ~ pop, data = pca2.scores, method='eu', na.rm = TRUE)
adonis4
## 
## Call:
## adonis(formula = pca2$scores ~ pop, data = pca2.scores, method = "eu",      na.rm = TRUE) 
## 
## Permutation: free
## Number of permutations: 999
## 
## Terms added sequentially (first to last)
## 
##           Df SumsOfSqs MeanSqs F.Model      R2 Pr(>F)    
## pop       10    236.76 23.6757  12.917 0.67569  0.001 ***
## Residuals 62    113.64  1.8329         0.32431           
## Total     72    350.39                 1.00000           
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This shows that there is a significant difference in populations (p<0.001), and population explains about 67.6% of the genomic variance in the NPY Y1 gene region.

Between significant variation found in NPY and NPY receptor Y1 populations, variation was commonly found in an Unknown population as well as the Kasane and Livingstone sub-populations. This may suggest NPY variation, leading to possible differences in BMI and body weight, may be explained by variation in the NPY Y1 receptor of these populations.

Now we can move onto the DAPC population structure analysis!

DAPC

dapc1 <- dapc(NPY1_genind, n.pca = 2, n.da = 2)
dapc1$grp
##  [1] Unknown       Unknown       Kasane        Kasane        Lowland      
##  [6] Lowland       Highland      Highland      Free State    Free State   
## [11] Free State    Free State    Free State    Free State    Free State   
## [16] Free State    Free State    Free State    Free State    Free State   
## [21] Free State    Free State    Free State    Free State    Free State   
## [26] Free State    Free State    Free State    KwaZulu-Natal KwaZulu-Natal
## [31] KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal
## [36] KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal
## [41] KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal Limpopo       Limpopo      
## [46] Eastern Cape  Eastern Cape  Eastern Cape  Eastern Cape  Eastern Cape 
## [51] Eastern Cape  Eastern Cape  Eastern Cape  Eastern Cape  Eastern Cape 
## [56] Eastern Cape  Eastern Cape  Kafue         Kafue         Kafue        
## [61] Kafue         Kafue         Kafue         Kafue         Kafue        
## [66] Kafue         Lusaka        Lusaka        Livingston    Livingston   
## [71] Livingston    Livingston    Livingston   
## 11 Levels: Unknown Kasane Lowland Highland Free State KwaZulu-Natal ... Livingston
scatter(dapc1, cex = 2, legend = TRUE, clabel = F, posi.leg = "bottomleft", scree.pca2 = TRUE,
        posi.pca2 = "topright", cleg = 0.75, col=c("purple4","salmon","dark green","forest green","seagreen1","dodgerblue","green","pink","cyan","red","darkblue"))
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "scree.pca2" is not a
## graphical parameter
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "posi.pca2" is not a
## graphical parameter

Here we can see a bit more clearly that the variation we see in South Africa appears to scale away from the rest of Africa along DA2, and within South Africa along DA1.

From looking at the PCAs and DAPC from NPY Y1, we can make a couple of observations. With the first graph, we see that pygerythrus is separating from hilgerti and cynosuros mainly along PC1 because of a single cluster of pygerythrus. When we look at the sub-populations of each species in a PCA, we see that the cluster of pygerythrus that is separating from the other two species is made up of four sub-populations: Free State, Limpopo, KwaZulu-Natal, and Eastern Cape. It is clear that the rest of the sub-populations are maintained only to hilgerti and cynosuros populations. The high significance of this separation is also shown by our Adonis results. In the DAPC, we again see the Free State, Limpopo, KwaZulu-Natal, and Eastern Cape populations clustering apart from the rest. In both graphs, it appears that there is very little to no overlap of variation between taxons and sub-populations for NPY Y1.

Now we can perform the AMOVA analysis:

AMOVA

library(poppr)
NPY1_genlight2 <- popsub(NPY1_genlight, exclude = c("Limpopo"))
amova2 <- poppr.amova(NPY1_genlight2, ~taxon/pop)
amova2
## $call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
## 
## $results
##                             Df    Sum Sq   Mean Sq
## Between taxon                2  311.4445 155.72226
## Between pop Within taxon     7  432.8280  61.83258
## Between samples Within pop  61 1030.9528  16.90087
## Within samples              71 1172.5000  16.51408
## Total                      141 2947.7254  20.90585
## 
## $componentsofcovariance
##                                             Sigma           %
## Variations  Between taxon               2.9619781  12.8433255
## Variations  Between pop Within taxon    3.3929391  14.7119999
## Variations  Between samples Within pop  0.1933904   0.8385529
## Variations  Within samples             16.5140845  71.6061218
## Total variations                       23.0623920 100.0000000
## 
## $statphi
##                          Phi
## Phi-samples-total 0.28393878
## Phi-samples-pop   0.01157508
## Phi-pop-taxon     0.16879946
## Phi-taxon-total   0.12843326
amova.test <- randtest(amova2) 
plot(amova.test)

When we run the AMOVA on NPY looking for molecular variance, we can see that 12.8% of the variance is between taxa, 14.7% is between populations within taxa, 0.84% between samples within populations, and the remaining 71.4% is within samples outside these designations. This suggests that our population designations are relatively homogeneous within themselves, suggesting good mixing at that level, but significantly stratified by population and taxon.

Fst

library(hierfstat)
wc(NPY1_genind)
## $FST
## [1] 0.2244068
## 
## $FIS
## [1] 0.0100416
NPY1Fst <- genet.dist(NPY1_genind[1:73,],method="WC84")
NPY1Fst
##                    Unknown       Kasane      Lowland     Highland   Free State
## Kasane         0.133903134                                                    
## Lowland       -0.065616798  0.147453083                                       
## Highland      -0.063829787  0.157556270  0.016216216                          
## Free State     0.260688124  0.146034125  0.199827920  0.254480197             
## KwaZulu-Natal  0.202412833  0.155824645  0.151510963  0.218634966  0.079864810
## Limpopo        0.135678392  0.108013937  0.052910053  0.183783784 -0.088922312
## Eastern Cape   0.539645291  0.488852380  0.471345589  0.535889346  0.336229791
## Kafue          0.123359760  0.082095762  0.076870378  0.136315790  0.189493503
## Lusaka         0.151041667  0.209302326  0.116883117  0.248677249  0.195942547
## Livingston     0.255438332  0.204191741  0.199184679  0.271502096  0.246229463
##               KwaZulu-Natal      Limpopo Eastern Cape        Kafue       Lusaka
## Kasane                                                                         
## Lowland                                                                        
## Highland                                                                       
## Free State                                                                     
## KwaZulu-Natal                                                                  
## Limpopo        -0.009731088                                                    
## Eastern Cape    0.180984767  0.352527298                                       
## Kafue           0.163601101  0.116933297  0.438388424                          
## Lusaka          0.184555854  0.165165165  0.552815198  0.042428920             
## Livingston      0.213107516  0.174884614  0.507877984  0.132050531  0.172304741

Now that we have that, we can create a heatmap to visualize FST using {ggplot2}. The output that {hierfstat} gave us is a dist class file, so first we will use the {reshape2} package to make it a matrix.

library(reshape2)
NPY1Fst <- as.matrix(NPY1Fst)
NPY1Fst
##                   Unknown     Kasane     Lowland    Highland  Free State
## Unknown        0.00000000 0.13390313 -0.06561680 -0.06382979  0.26068812
## Kasane         0.13390313 0.00000000  0.14745308  0.15755627  0.14603413
## Lowland       -0.06561680 0.14745308  0.00000000  0.01621622  0.19982792
## Highland      -0.06382979 0.15755627  0.01621622  0.00000000  0.25448020
## Free State     0.26068812 0.14603413  0.19982792  0.25448020  0.00000000
## KwaZulu-Natal  0.20241283 0.15582464  0.15151096  0.21863497  0.07986481
## Limpopo        0.13567839 0.10801394  0.05291005  0.18378378 -0.08892231
## Eastern Cape   0.53964529 0.48885238  0.47134559  0.53588935  0.33622979
## Kafue          0.12335976 0.08209576  0.07687038  0.13631579  0.18949350
## Lusaka         0.15104167 0.20930233  0.11688312  0.24867725  0.19594255
## Livingston     0.25543833 0.20419174  0.19918468  0.27150210  0.24622946
##               KwaZulu-Natal      Limpopo Eastern Cape      Kafue     Lusaka
## Unknown         0.202412833  0.135678392    0.5396453 0.12335976 0.15104167
## Kasane          0.155824645  0.108013937    0.4888524 0.08209576 0.20930233
## Lowland         0.151510963  0.052910053    0.4713456 0.07687038 0.11688312
## Highland        0.218634966  0.183783784    0.5358893 0.13631579 0.24867725
## Free State      0.079864810 -0.088922312    0.3362298 0.18949350 0.19594255
## KwaZulu-Natal   0.000000000 -0.009731088    0.1809848 0.16360110 0.18455585
## Limpopo        -0.009731088  0.000000000    0.3525273 0.11693330 0.16516517
## Eastern Cape    0.180984767  0.352527298    0.0000000 0.43838842 0.55281520
## Kafue           0.163601101  0.116933297    0.4383884 0.00000000 0.04242892
## Lusaka          0.184555854  0.165165165    0.5528152 0.04242892 0.00000000
## Livingston      0.213107516  0.174884614    0.5078780 0.13205053 0.17230474
##               Livingston
## Unknown        0.2554383
## Kasane         0.2041917
## Lowland        0.1991847
## Highland       0.2715021
## Free State     0.2462295
## KwaZulu-Natal  0.2131075
## Limpopo        0.1748846
## Eastern Cape   0.5078780
## Kafue          0.1320505
## Lusaka         0.1723047
## Livingston     0.0000000
colnames(NPY1Fst) <- c("Unknown","Kasane","Lowland","Highland","Free State","KwaZulu-Natal","Limpopo","Eastern Cape","Kafue","Lusaka","Livingston") #rename columns from numbers to population names. 
rownames(NPY1Fst) <- c("Unknown","Kasane","Lowland","Highland","Free State","KwaZulu-Natal","Limpopo","Eastern Cape","Kafue","Lusaka","Livingston") #rename rows from numbers to population names. 
NPY1Fst
##                   Unknown     Kasane     Lowland    Highland  Free State
## Unknown        0.00000000 0.13390313 -0.06561680 -0.06382979  0.26068812
## Kasane         0.13390313 0.00000000  0.14745308  0.15755627  0.14603413
## Lowland       -0.06561680 0.14745308  0.00000000  0.01621622  0.19982792
## Highland      -0.06382979 0.15755627  0.01621622  0.00000000  0.25448020
## Free State     0.26068812 0.14603413  0.19982792  0.25448020  0.00000000
## KwaZulu-Natal  0.20241283 0.15582464  0.15151096  0.21863497  0.07986481
## Limpopo        0.13567839 0.10801394  0.05291005  0.18378378 -0.08892231
## Eastern Cape   0.53964529 0.48885238  0.47134559  0.53588935  0.33622979
## Kafue          0.12335976 0.08209576  0.07687038  0.13631579  0.18949350
## Lusaka         0.15104167 0.20930233  0.11688312  0.24867725  0.19594255
## Livingston     0.25543833 0.20419174  0.19918468  0.27150210  0.24622946
##               KwaZulu-Natal      Limpopo Eastern Cape      Kafue     Lusaka
## Unknown         0.202412833  0.135678392    0.5396453 0.12335976 0.15104167
## Kasane          0.155824645  0.108013937    0.4888524 0.08209576 0.20930233
## Lowland         0.151510963  0.052910053    0.4713456 0.07687038 0.11688312
## Highland        0.218634966  0.183783784    0.5358893 0.13631579 0.24867725
## Free State      0.079864810 -0.088922312    0.3362298 0.18949350 0.19594255
## KwaZulu-Natal   0.000000000 -0.009731088    0.1809848 0.16360110 0.18455585
## Limpopo        -0.009731088  0.000000000    0.3525273 0.11693330 0.16516517
## Eastern Cape    0.180984767  0.352527298    0.0000000 0.43838842 0.55281520
## Kafue           0.163601101  0.116933297    0.4383884 0.00000000 0.04242892
## Lusaka          0.184555854  0.165165165    0.5528152 0.04242892 0.00000000
## Livingston      0.213107516  0.174884614    0.5078780 0.13205053 0.17230474
##               Livingston
## Unknown        0.2554383
## Kasane         0.2041917
## Lowland        0.1991847
## Highland       0.2715021
## Free State     0.2462295
## KwaZulu-Natal  0.2131075
## Limpopo        0.1748846
## Eastern Cape   0.5078780
## Kafue          0.1320505
## Lusaka         0.1723047
## Livingston     0.0000000
melted_fst1 <- melt(NPY1Fst) #melt the data to be used for a heatmap
melted_fst1
##              Var1          Var2        value
## 1         Unknown       Unknown  0.000000000
## 2          Kasane       Unknown  0.133903134
## 3         Lowland       Unknown -0.065616798
## 4        Highland       Unknown -0.063829787
## 5      Free State       Unknown  0.260688124
## 6   KwaZulu-Natal       Unknown  0.202412833
## 7         Limpopo       Unknown  0.135678392
## 8    Eastern Cape       Unknown  0.539645291
## 9           Kafue       Unknown  0.123359760
## 10         Lusaka       Unknown  0.151041667
## 11     Livingston       Unknown  0.255438332
## 12        Unknown        Kasane  0.133903134
## 13         Kasane        Kasane  0.000000000
## 14        Lowland        Kasane  0.147453083
## 15       Highland        Kasane  0.157556270
## 16     Free State        Kasane  0.146034125
## 17  KwaZulu-Natal        Kasane  0.155824645
## 18        Limpopo        Kasane  0.108013937
## 19   Eastern Cape        Kasane  0.488852380
## 20          Kafue        Kasane  0.082095762
## 21         Lusaka        Kasane  0.209302326
## 22     Livingston        Kasane  0.204191741
## 23        Unknown       Lowland -0.065616798
## 24         Kasane       Lowland  0.147453083
## 25        Lowland       Lowland  0.000000000
## 26       Highland       Lowland  0.016216216
## 27     Free State       Lowland  0.199827920
## 28  KwaZulu-Natal       Lowland  0.151510963
## 29        Limpopo       Lowland  0.052910053
## 30   Eastern Cape       Lowland  0.471345589
## 31          Kafue       Lowland  0.076870378
## 32         Lusaka       Lowland  0.116883117
## 33     Livingston       Lowland  0.199184679
## 34        Unknown      Highland -0.063829787
## 35         Kasane      Highland  0.157556270
## 36        Lowland      Highland  0.016216216
## 37       Highland      Highland  0.000000000
## 38     Free State      Highland  0.254480197
## 39  KwaZulu-Natal      Highland  0.218634966
## 40        Limpopo      Highland  0.183783784
## 41   Eastern Cape      Highland  0.535889346
## 42          Kafue      Highland  0.136315790
## 43         Lusaka      Highland  0.248677249
## 44     Livingston      Highland  0.271502096
## 45        Unknown    Free State  0.260688124
## 46         Kasane    Free State  0.146034125
## 47        Lowland    Free State  0.199827920
## 48       Highland    Free State  0.254480197
## 49     Free State    Free State  0.000000000
## 50  KwaZulu-Natal    Free State  0.079864810
## 51        Limpopo    Free State -0.088922312
## 52   Eastern Cape    Free State  0.336229791
## 53          Kafue    Free State  0.189493503
## 54         Lusaka    Free State  0.195942547
## 55     Livingston    Free State  0.246229463
## 56        Unknown KwaZulu-Natal  0.202412833
## 57         Kasane KwaZulu-Natal  0.155824645
## 58        Lowland KwaZulu-Natal  0.151510963
## 59       Highland KwaZulu-Natal  0.218634966
## 60     Free State KwaZulu-Natal  0.079864810
## 61  KwaZulu-Natal KwaZulu-Natal  0.000000000
## 62        Limpopo KwaZulu-Natal -0.009731088
## 63   Eastern Cape KwaZulu-Natal  0.180984767
## 64          Kafue KwaZulu-Natal  0.163601101
## 65         Lusaka KwaZulu-Natal  0.184555854
## 66     Livingston KwaZulu-Natal  0.213107516
## 67        Unknown       Limpopo  0.135678392
## 68         Kasane       Limpopo  0.108013937
## 69        Lowland       Limpopo  0.052910053
## 70       Highland       Limpopo  0.183783784
## 71     Free State       Limpopo -0.088922312
## 72  KwaZulu-Natal       Limpopo -0.009731088
## 73        Limpopo       Limpopo  0.000000000
## 74   Eastern Cape       Limpopo  0.352527298
## 75          Kafue       Limpopo  0.116933297
## 76         Lusaka       Limpopo  0.165165165
## 77     Livingston       Limpopo  0.174884614
## 78        Unknown  Eastern Cape  0.539645291
## 79         Kasane  Eastern Cape  0.488852380
## 80        Lowland  Eastern Cape  0.471345589
## 81       Highland  Eastern Cape  0.535889346
## 82     Free State  Eastern Cape  0.336229791
## 83  KwaZulu-Natal  Eastern Cape  0.180984767
## 84        Limpopo  Eastern Cape  0.352527298
## 85   Eastern Cape  Eastern Cape  0.000000000
## 86          Kafue  Eastern Cape  0.438388424
## 87         Lusaka  Eastern Cape  0.552815198
## 88     Livingston  Eastern Cape  0.507877984
## 89        Unknown         Kafue  0.123359760
## 90         Kasane         Kafue  0.082095762
## 91        Lowland         Kafue  0.076870378
## 92       Highland         Kafue  0.136315790
## 93     Free State         Kafue  0.189493503
## 94  KwaZulu-Natal         Kafue  0.163601101
## 95        Limpopo         Kafue  0.116933297
## 96   Eastern Cape         Kafue  0.438388424
## 97          Kafue         Kafue  0.000000000
## 98         Lusaka         Kafue  0.042428920
## 99     Livingston         Kafue  0.132050531
## 100       Unknown        Lusaka  0.151041667
## 101        Kasane        Lusaka  0.209302326
## 102       Lowland        Lusaka  0.116883117
## 103      Highland        Lusaka  0.248677249
## 104    Free State        Lusaka  0.195942547
## 105 KwaZulu-Natal        Lusaka  0.184555854
## 106       Limpopo        Lusaka  0.165165165
## 107  Eastern Cape        Lusaka  0.552815198
## 108         Kafue        Lusaka  0.042428920
## 109        Lusaka        Lusaka  0.000000000
## 110    Livingston        Lusaka  0.172304741
## 111       Unknown    Livingston  0.255438332
## 112        Kasane    Livingston  0.204191741
## 113       Lowland    Livingston  0.199184679
## 114      Highland    Livingston  0.271502096
## 115    Free State    Livingston  0.246229463
## 116 KwaZulu-Natal    Livingston  0.213107516
## 117       Limpopo    Livingston  0.174884614
## 118  Eastern Cape    Livingston  0.507877984
## 119         Kafue    Livingston  0.132050531
## 120        Lusaka    Livingston  0.172304741
## 121    Livingston    Livingston  0.000000000
library(ggplot2)
ggplot(data = melted_fst1, aes(x=Var1, y=Var2, fill=value)) + ylab(NULL) + xlab(NULL) + coord_fixed() + 
  theme(axis.text=element_text(angle=45,vjust=0.5)) + geom_tile()

#this looks okay, but as before we'd rather not have the redundance of the same values being shown twice. So we will just use the lower half of the matrix:

get_lower_tri1<-function(NPY1Fst){
    NPY1Fst[lower.tri(NPY1Fst)] <- NA
    return(NPY1Fst)
}

lower_tri1 <- get_lower_tri1(NPY1Fst) #Fill half of the repeated values with "NA's"
lower_tri1
##               Unknown    Kasane    Lowland    Highland Free State KwaZulu-Natal
## Unknown             0 0.1339031 -0.0656168 -0.06382979  0.2606881    0.20241283
## Kasane             NA 0.0000000  0.1474531  0.15755627  0.1460341    0.15582464
## Lowland            NA        NA  0.0000000  0.01621622  0.1998279    0.15151096
## Highland           NA        NA         NA  0.00000000  0.2544802    0.21863497
## Free State         NA        NA         NA          NA  0.0000000    0.07986481
## KwaZulu-Natal      NA        NA         NA          NA         NA    0.00000000
## Limpopo            NA        NA         NA          NA         NA            NA
## Eastern Cape       NA        NA         NA          NA         NA            NA
## Kafue              NA        NA         NA          NA         NA            NA
## Lusaka             NA        NA         NA          NA         NA            NA
## Livingston         NA        NA         NA          NA         NA            NA
##                    Limpopo Eastern Cape      Kafue     Lusaka Livingston
## Unknown        0.135678392    0.5396453 0.12335976 0.15104167  0.2554383
## Kasane         0.108013937    0.4888524 0.08209576 0.20930233  0.2041917
## Lowland        0.052910053    0.4713456 0.07687038 0.11688312  0.1991847
## Highland       0.183783784    0.5358893 0.13631579 0.24867725  0.2715021
## Free State    -0.088922312    0.3362298 0.18949350 0.19594255  0.2462295
## KwaZulu-Natal -0.009731088    0.1809848 0.16360110 0.18455585  0.2131075
## Limpopo        0.000000000    0.3525273 0.11693330 0.16516517  0.1748846
## Eastern Cape            NA    0.0000000 0.43838842 0.55281520  0.5078780
## Kafue                   NA           NA 0.00000000 0.04242892  0.1320505
## Lusaka                  NA           NA         NA 0.00000000  0.1723047
## Livingston              NA           NA         NA         NA  0.0000000
melted_NPY1 <- melt(lower_tri1, na.rm = TRUE) #melt this half of the matrix

ggplot(data = melted_NPY1, aes(x=Var1, y=Var2, fill=value)) + theme_minimal() + ylab(NULL) + xlab(NULL) + coord_fixed() + theme(axis.text=element_text(angle=45, vjust=0.5)) + geom_tile(color = "white")

Here we can see that there are very few differences in the Lusaka and Eastern Cape populations for their NPY Y1 regions, but Limpopo and Free State shows that their populations are among the most different

Structure

In order to run analyses similar to structure, we must use this code:

install.packages(c("fields","RColorBrewer","mapplots"))
BiocManager::install("LEA")
library(LEA)
seNPY1geno<-vcf2geno("seNPY1.vcf")
## 
##  - number of detected individuals:   73
##  - number of detected loci:      432
## 
## For SNP info, please check ./seNPY1.vcfsnp.
## 
## 0 line(s) were removed because these are not SNPs.
## Please, check ./seNPY1.removed file, for more informations.

Once again, we can figure out how many clusters we should look at in our sample using cross-entropy criterion:

seNPY1.snmf = snmf(seNPY1geno, K = 1:8, ploidy = 2, entropy = T,
alpha = 100, project = "new")
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] 281110398
## [1] "*************************************"
## [1] "*          create.dataset            *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)                 73
##         -L (number of loci)                        432
##         -s (seed random init)                      281110398
##         -r (percentage of masked data)             0.05
##         -x (genotype file in .geno format)         /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -o (output file in .geno format)           /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## 
##  Write genotype file with masked data, /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## [1] "*************************************"
## [1] "* sNMF K = 1  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          1
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  281110398
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## 
## Main algorithm:
## 
## Least-square error: 4547.562109
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.G:   OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                432
##         -K (number of ancestral pops)      1
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.258672
## Cross-Entropy (masked data):  0.269014
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 2  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          2
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  281110398
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [================]
## Number of iterations: 44
## 
## Least-square error: 4031.080744
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.G:   OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                432
##         -K (number of ancestral pops)      2
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.221146
## Cross-Entropy (masked data):  0.237834
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          3
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  281110398
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=========]
## Number of iterations: 24
## 
## Least-square error: 3652.375951
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G:   OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                432
##         -K (number of ancestral pops)      3
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.198766
## Cross-Entropy (masked data):  0.223452
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 4  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          4
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  281110398
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [========================]
## Number of iterations: 65
## 
## Least-square error: 3466.029958
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.G:   OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                432
##         -K (number of ancestral pops)      4
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.187517
## Cross-Entropy (masked data):  0.221579
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 5  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          5
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  8871044990
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [===============]
## Number of iterations: 41
## 
## Least-square error: 3333.170342
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.G:   OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                432
##         -K (number of ancestral pops)      5
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.179007
## Cross-Entropy (masked data):  0.223992
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 6  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          6
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  4576077694
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=========================]
## Number of iterations: 68
## 
## Least-square error: 3157.853409
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.G:   OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                432
##         -K (number of ancestral pops)      6
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.166591
## Cross-Entropy (masked data):  0.21531
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 7  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          7
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  281110398
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [=======================]
## Number of iterations: 61
## 
## Least-square error: 3080.779990
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.G:   OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                432
##         -K (number of ancestral pops)      7
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.163003
## Cross-Entropy (masked data):  0.214531
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 8  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          8
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  281110398
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno:       OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [============================]
## Number of iterations: 76
## 
## Least-square error: 2816.073589
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.G:   OK.
## 
## [1] "*************************************"
## [1] "*    cross-entropy estimation       *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)         73
##         -L (number of loci)                432
##         -K (number of ancestral pops)      8
##         -x (genotype file)                 /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture)          /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.Q
##         -g (ancestral frequencies)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.G
##         -i (with masked genotypes)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##         - diploid
## 
## Cross-Entropy (all data):     0.149025
## Cross-Entropy (masked data):  0.217321
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
plot(seNPY1.snmf, col = "blue4", cex = 1.4, pch = 19)

Since there is not a clear gradual plateau present, it cannot be said that isolation-by-distance processes such as genetic drift explain variation among the NPY Y1 region, and it is possible that other population processes such as selection may be taking place.

To make things a bit simpler, let’s run a population structure analysis that includes K=3 clusters:

seNPY1.snmf = snmf(seNPY1geno, K = 3, alpha = 100, project = "new")
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
## 
## [1] "*************************************"
## [1] "* sNMF K = 3  repetition 1      *"
## [1] "*************************************"
## summary of the options:
## 
##         -n (number of individuals)             73
##         -L (number of loci)                    432
##         -K (number of ancestral pops)          3
##         -x (input file)                        /projectnb2/vervpop/masiedu25/seNPY1.geno
##         -q (individual admixture file)         /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q
##         -g (ancestral frequencies file)        /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G
##         -i (number max of iterations)          200
##         -a (regularization parameter)          100
##         -s (seed random init)                  2010044859
##         -e (tolerance error)                   1E-05
##         -p (number of processes)               1
##         - diploid
## 
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.geno:        OK.
## 
## 
## Main algorithm:
##  [                                                                           ]
##  [==========]
## Number of iterations: 26
## 
## Least-square error: 3591.243243
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q:      OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G:   OK.
## 
## The project is saved into :
##  seNPY1.snmfProject 
## 
## To load the project, use:
##  project = load.snmfProject("seNPY1.snmfProject")
## 
## To remove the project, use:
##  remove.snmfProject("seNPY1.snmfProject")
qmatrix1 = Q(seNPY1.snmf, K = 3)

And let’s visualize this in a STRUCTURE-type barplot:

barplot(t(qmatrix1), col = c("#F8766D","#00BA38","#619CFF"), border = 1, space = 0,
xlab = "Individuals", ylab = "Admixture coefficients", names.arg = pop.data$sample, las = 2, cex.names = 0.5)

Here we can kind of see that the C1 and C2 cluster largely predominates in part of FS North and KwaZulu-Nata along, while the C3 is mostly dominant in Zambia and East Africa and a little bit in FS South and Eastern Cape along with a bit of C1 in Zambia and East Africa.

Conclusion

In conclusion, it is clear that there is a significant difference in the variation of gene regions amongst the vervet sub-populations for both NPY and NPY Y1 genes (p<0.001). Though there was a significant difference in variation among taxons for the NPY Y1 receptor gene region (p<0.001), there was not significant difference in taxons for the NPY gene itself. This may imply that generally throughout the vervet species, NPY is highly conserved and therefore integral to their physiological functions. Seeing that the NPY Y1 receptor is integral in the regulation of feeding and body weight regulation, understanding how differences in receptor expression may correlate to overall expression of the NPY gene, and therefore orexigenic processes among the vervet species, may provide insight as to the significance for the type of mechanisms of selection that take place among the species. Analyzing the significance of linkage disequilibrium and the presentation of linkage blocks within both gene regions, which show that there are certainly genetic forces working on the gene regions, provides the space to inquire on why such forces are occurring.

From this research, it can be found that NPY Y1 receptor expresses a significant difference in variation only prevalent within certain taxon and populations. It was found in Mercer et. al. that NPY receptor Y1 knockout is correlated with a decrease in food intake, an increase in body weight, and a decrease in energy expenditure (2011). This shows that in individuals where Y1 variance leads to upregulation of the NPY Y1 gene, there may be a need to control functionality of the NPY gene to result in an increase in energy expenditure and food intake, and a decrease in body weight. In addition to this, the overlap of hypothalamic expression in the between receptor Y1 and Y5 (Mercer 2011) may correlate with the significance of the NPY gene function and its pathways. While this research details the significance of NPY and NPY Y1 gene variation among vervets, the role of variation among other receptors of interest is still unclear. With further analysis of the gene regions for the other prevalent NPY receptors that also commonly exist in humans (NPY Y2 and Y5), we will be able to gain insight as to how variation breaks down within and between vervet taxons and populations.

It is noted that NPY and NPY receptor expression and it’s interaction with corticotropin-releasing hormone (CRH) is linked to regulation of fear, stress and anxiety (Sajdyk 2004). When introduced to stressful or novel environments, some vervet monkeys display defensive or aggressive behavior, while others stop ongoing activities such as feeding and drinking (Elbejjani 2007). In continuation of this research, we may be able to answer why differences in gene expression, variation within the gene region and variation within receptor gene regions occurs in certain populations and taxons through field work analysis of activity, food availability/variability, stressors/external factors, and BMI among vervet populations of interest. This research may be expanded even further by forming understanding how the prevalence of anxiety disorders or other disorders related to stress, fear, and anxiety in human populations may be linked NPY and NPY receptor variation, which therefore correlates to variation of physiological functions related to food intake and body mass.

Works Cited

Abbott, C. R., Small, C. J., Kennedy, A. R., Neary, N. M., Sajedi, A., Ghatei, M. A., & Bloom, S. R. (2005). Blockade of the neuropeptide Y Y2 receptor with the specific antagonist BIIE0246 attenuates the effect of endogenous and exogenous peptide YY (3–36) on food intake. Brain research, 1043(1-2), 139-144.

Dumont, Y., Jacques, D., Bouchard, P., & Quirion, R. (1998). Species differences in the expression and distribution of the neuropeptide Y Y1, Y2, Y4, and Y5 receptors in rodents, guinea pig, and primates brains. Journal of Comparative Neurology, 402(3), 372-384.

Elbejjani, M. (2007). Dissecting anxiety in the vervet monkey: a search for association between polymorphisms in the corticotropin releasing hormone (CRH) and neuropeptide Y (NPY) genes and anxious behavior.

Mercer, R. E., Chee, M. J., & Colmers, W. F. (2011). The role of NPY in hypothalamic mediated food intake. Frontiers in neuroendocrinology, 32(4), 398-415.

Sajdyk, T. J., Shekhar, A., & Gehlert, D. R. (2004). Interactions between NPY and CRF in the amygdala to regulate emotionality. Neuropeptides, 38(4), 225-234.

Schmitt, C. A., Gagnon C. M. (2021). Vervet Selection Pipeline Guide: UCP1 Analysis.

Wahlestedt, C., Yanaihara, N., & Håkanson, R. (1986). Evidence for different pre-and post-junctional receptors for neuropeptide Y and related peptides. Regulatory peptides, 13(3-4), 307-318.

Wang, Q., Bing, C., Al-Barazanji, K., Mossakowaska, D. E., Wang, X. M., McBay, D. L., … & Williams, G. (1997). Interactions between leptin and hypothalamic neuropeptide Y neurons in the control of food intake and energy homeostasis in the rat. Diabetes, 46(3), 335-341.