Neuropeptide Y, commonly known as NPY, is a 36 amino acid peptide and neuromodulator known for regulating many functions, such as neuronal development, pain modulation, thermogenesis, stress response, and food intake through the regulation of hormones such as corticotropin releasing hormone (CRH or CRF). The biological effects of NPY are heavily mediated by the modulation of four receptors, Y1, Y2, Y4 and Y5 (Sajdyk 2004). NPY receptor Y1 is heavily involved in sympathetic activity of the central nervous system (Wahlestedt 1986) and in previous studies, it has been found that reduced hypothalamic expression of the Y1 receptor can be induced by fasting and/or food deprivation (Mercer 2011). While there is evidence of expression of receptor Y1 outside of the human brain, receptor Y2 is limitly expressed outside of the human brain. While fasting has shown no effects on Y2 expression, Y2 agonists promote anorexigenic functions through reduced food intake and body weight, as well as better glucose regulation when given to mice while antagonists in satiated rats lead to an increase in feeding (Abbott 2005). Though noted as as a differential receptor of NPY, receptor Y4 is expressed at low levels and mainly linked to expression of the human pancreatic polypeptide (Dumont 1998). NPY receptor Y5 agonists express similar regulation of hypothalamic function as receptor Y1, and additionally, treatment of antagonists in humans revealed reduced food intakce and significant weight loss (Mercer 2011).
It was found in Dumont et. al. (1998) that differences in NPY receptor expression was prevalent among mammalian species, meaning that the differential functions of specific receptors is may be more advantageous to the physiological functions of one species over another. In Vervet monkeys (Genus Chlorocebus) , expression of receptors Y2, Y1 and Y5 have been found to be similar to expression in humans. Given this relationship, exploration of NPY and NPY receptor expression in vervet monkeys could be a good model for understanding how NPY and these receptors function in normal energy balance humans (Mercer 2011). Seeing that NPY is one of the most potent stimulants in food intake, it is possible patterns in BMI and body mass in wild vervets may be linked to differential expression of NPY in relation to these NPY receptors (particularly vervets associated with increased expression in Y1 and Y5). With this, significantly heavier populations in southern Africa are likely to have variants associated with increased food intake.
NPY is recognized as one of the most potent orexigenic neuropeptides in the hypothalamus. As a highly conserved gene among mammals and non-mammals alike, it can be suggested that NPY has a vital role in physiology (Mercer 2011). By Intracerebroventricular (ICV) injection, it was found that NPY stimulates food intake. Through various animal models, it was observed that in obese subjects, levels of NPY gene expression and NPY release were elevated. Release of NPY was specifically elevated before the onset of feeding and decreased gradually as food intake proceeded. This increase in NPY is correlated with conditions of energy deprivation, as well as energy demand (Mercer 2011).
NPY within the hypothalamus is mainly supplied from specialized neurons in the ARC (Arcuate Nucleus) which is a regulator of energy homeostasis. This diagram shows the main pathway of NPY in the ARC, between NPY and pro-opiomelanocortin (POMC) and peptide cocaine- and amphetamine-regulated transcript (CART) neurons in the ARC. Here it can be seen how release of NPY to receptors Y1 and Y2 leads to the inhibition of POMC and CART neurons, which are anorexigenic (causing a loss of appetite) in function. It can also be seen that NPY stimulation of receptor Y5 leads to inhibition of melanocortin release.
NPY Y1 expression has been shown to induce NPY mediated food intake. When injected by ICV, it was shown that orexigenic activity occurred, resulting in stimulated food intake and induced body weight gain (Mercer 2011). In past experiments where NPY Y1 was knocked out, a significant reduction in NPY stimulated food intake occurred compared to the other controls used (Mercer 2011). In order to begin investigation as to the correlation between differential NPY expression mediated by specific receptors and differences in BMI and body mass in wild vervets, I will conduct analysis of NPY and NPY Y1 expression in populations of south African vervet monkeys.
Though the aim of this research is to determine how variation in expression of NPY receptors may be indicative of certain population characheristics, such as increased body mass, BMI, and food intake, general expression of the NPY gene itself may also align with variation in such characteristics.
It would be enlightening to explore, if possible, which SNPs that express variation in the NPY gene align with expression of whichever receptor.
To begin, I will first isolate the NPY gene from the whole genome data provided in the vervet pipeline.
module load htslib
tabix -p vcf 163_201701_UG_vervet_monkey_SNPs_all_chrom_beagle_shapeit.vcf.gz
tabix -h 163_201701_UG_vervet_monkey_SNPs_all_chrom_beagle_shapeit.vcf.gz CAE21:34058530-34085127 > vervNPY.full.vcf
For some reason, the tabix code will not run in a chunk, so the code must be copied directly to the terminal window. Once indexed, the genome can be isolated at the desired position, in this case at the NPY genome.
less vervNPY.full.vcf
wc -l vervNPY.full.vcf
grep -c "^##" vervNPY.full.vcf
## 565 vervNPY.full.vcf
## 38
This shows that there are 565 lines in the VCF file, and 38 are metadata headlines. This means that there are 108 lines representing individual call variants in the NPY gene region out of the 527 vervets.
To learn more about the NPY genomic region, I can look to ensembl to get the positions of the exonic regions
Now I will isolate the NPY Y1 gene from the whole genome data provided in the vervet pipeline.
module load htslib
tabix -p vcf 163_201701_UG_vervet_monkey_SNPs_all_chrom_beagle_shapeit.vcf.gz
tabix -h 163_201701_UG_vervet_monkey_SNPs_all_chrom_beagle_shapeit.vcf.gz CAE7:109764918-109786905 > vervNPY1.full.vcf
As mentioned above, the tabix code will not run in a chunk, so the code must be copied directly to the terminal window. Once indexed, the genome can be isolated at the desired position, in this case at the NPY1 genome.
less vervNPY1.full.vcf
In order to determine the amount of variant calls in the genome region, I can use this code:
wc -l vervNPY1.full.vcf
grep -c "^##" vervNPY1.full.vcf
## 471 vervNPY1.full.vcf
## 38
This shows that there are 471 lines in the VCF file, and only 38 lines are metadata headlines. This means there are 433 lines for individual variant calls in the NPY1 gene region.
Once the entire gene has been isolated, I could then look to Ensembl in order to pick out the functional regions of NPY1. From Ensembl, I found that the functional regions include:
Once the genes are isolated, we can now conduct analysis, beginning with Hardy Weinburg Equilibrium (HWE)
Before I could begin editing, I was having issues opening the vervet.population.panel . From troubleshooting, I realized it was due to the fact that I did not have the file in my directory. In order to acquire the file, I checked in other directories that may have had the file, specifically caschmitt. From here I could copy the file into my directory:
cp ../caschmit/vervet.population.panel .
Once obtained, I could finally begin editing the file and analysis.
grep pygerythrus vervet.population.panel | cut -f1 > pyg.samples.list
gedit pyg.samples.list
The gedit command would’nt work for some reason, displaying the message (gedit:347): Gtk-WARNING **: 22:53:03.389: cannot open display: :0 so I had to open and edit the file using the vim command:
vim pyg.samples.list
From here I deleted the VSAI3005 and VBO individuals using command “d” and saved the new file using “:x!”.
Then I was able to make a vcf file of just pygerythrus:
Vcftools will create a VCF with just the samples comprised of just the South African pygerythrus
module load vcftools
vcftools --vcf vervNPY.full.vcf --keep pyg.samples.list --recode --out pygNPY
mv pygNPY.recode.vcf pygNPY.vcf
##
## VCFtools - 0.1.16
## (C) Adam Auton and Anthony Marcketta 2009
##
## Parameters as interpreted:
## --vcf vervNPY.full.vcf
## --keep pyg.samples.list
## --out pygNPY
## --recode
##
## Keeping individuals in 'keep' list
## After filtering, kept 51 out of 163 Individuals
## Outputting VCF file...
## After filtering, kept 526 out of a possible 526 Sites
## Run Time = 0.00 seconds
egrep -w 'pygerythrus|cynosuros|hilgerti' vervet.population.panel | cut -f1 > SEexp.samples.list
Another VCF can be made for the southern expansion population
module load vcftools
vcftools --vcf vervNPY.full.vcf --keep SEexp.samples.list --recode --out seNPY
mv seNPY.recode.vcf seNPY.vcf
##
## VCFtools - 0.1.16
## (C) Adam Auton and Anthony Marcketta 2009
##
## Parameters as interpreted:
## --vcf vervNPY.full.vcf
## --keep SEexp.samples.list
## --out seNPY
## --recode
##
## Keeping individuals in 'keep' list
## After filtering, kept 73 out of 163 Individuals
## Outputting VCF file...
## After filtering, kept 526 out of a possible 526 Sites
## Run Time = 0.00 seconds
To see if there are enough samples to run a HWE, a power test can be used.
library(pwr)
pwr.chisq.test(w = 0.5, df = 1, sig.level = 0.05, power=0.95)
##
## Chi squared power calculation
##
## w = 0.5
## N = 51.97884
## df = 1
## sig.level = 0.05
## power = 0.95
##
## NOTE: N is the number of observations
This equation would be used for 52 samples, and though there are only 48 in pygerythrus and 73 in the southern expansion, it is the best fit for the data given. Now I can begin to do the HWE starting with the pygerthrus taxon.
module load vcftools
vcftools --vcf pygNPY.vcf --hardy --out pygNPY
less pygNPY.hwe
Here all 526 SNPs were kept
vcftools.HWE<-read.table("pygNPY.hwe",header=TRUE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.6 ✓ dplyr 1.0.7
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
vcftools.HWE.sig <-
vcftools.HWE %>%
as_tibble(vcftools.HWE) %>%
dplyr::select(CHR,POS,ChiSq_HWE,P_HWE) %>%
filter(P_HWE <= 0.05)
knitr::kable(vcftools.HWE.sig, caption = 'South African pygerthrus SNPs')
| CHR | POS | ChiSq_HWE | P_HWE |
|---|---|---|---|
| CAE21 | 34058832 | 13.878890 | 0.0116233 |
| CAE21 | 34059166 | 13.878890 | 0.0116233 |
| CAE21 | 34059328 | 5.000084 | 0.0378777 |
| CAE21 | 34059431 | 6.106048 | 0.0250379 |
| CAE21 | 34059435 | 13.878890 | 0.0116233 |
| CAE21 | 34059685 | 13.878890 | 0.0116233 |
| CAE21 | 34060020 | 6.106048 | 0.0250379 |
| CAE21 | 34060108 | 5.296296 | 0.0224088 |
| CAE21 | 34060249 | 6.106048 | 0.0250379 |
| CAE21 | 34063486 | 13.878890 | 0.0116233 |
| CAE21 | 34063575 | 13.878890 | 0.0116233 |
| CAE21 | 34065174 | 13.878890 | 0.0116233 |
| CAE21 | 34066013 | 5.230772 | 0.0293162 |
| CAE21 | 34069785 | 6.305024 | 0.0130251 |
| CAE21 | 34069848 | 9.587124 | 0.0077078 |
| CAE21 | 34069864 | 7.680000 | 0.0143732 |
| CAE21 | 34070401 | 20.650710 | 0.0315790 |
| CAE21 | 34071796 | 13.878890 | 0.0116233 |
| CAE21 | 34072148 | 5.185192 | 0.0228769 |
| CAE21 | 34072414 | 48.000000 | 0.0105263 |
| CAE21 | 34074427 | 4.409314 | 0.0376433 |
| CAE21 | 34075017 | 5.086454 | 0.0288535 |
| CAE21 | 34075345 | 5.086454 | 0.0288535 |
| CAE21 | 34075768 | 4.005926 | 0.0465014 |
| CAE21 | 34075960 | 5.086454 | 0.0288535 |
| CAE21 | 34076108 | 11.916520 | 0.0038166 |
| CAE21 | 34076144 | 4.005926 | 0.0465014 |
| CAE21 | 34078625 | 6.829473 | 0.0266078 |
| CAE21 | 34078706 | 4.005926 | 0.0465014 |
| CAE21 | 34078799 | 48.000000 | 0.0105263 |
| CAE21 | 34078953 | 6.829473 | 0.0266078 |
| CAE21 | 34080030 | 6.305024 | 0.0130251 |
| CAE21 | 34081189 | 13.878890 | 0.0116233 |
| CAE21 | 34081340 | 14.031550 | 0.0007867 |
| CAE21 | 34081427 | 5.259260 | 0.0437012 |
| CAE21 | 34081476 | 5.259260 | 0.0437012 |
| CAE21 | 34081704 | 4.800351 | 0.0411025 |
| CAE21 | 34081920 | 5.259260 | 0.0437012 |
| CAE21 | 34082004 | 9.781812 | 0.0038620 |
| CAE21 | 34082228 | 5.259260 | 0.0437012 |
| CAE21 | 34082417 | 19.934810 | 0.0050561 |
| CAE21 | 34082628 | 5.259260 | 0.0437012 |
| CAE21 | 34083131 | 9.187500 | 0.0030584 |
| CAE21 | 34083478 | 5.259260 | 0.0437012 |
| CAE21 | 34083662 | 5.259260 | 0.0437012 |
| CAE21 | 34083771 | 5.259260 | 0.0437012 |
| CAE21 | 34083929 | 14.072990 | 0.0001589 |
| CAE21 | 34084060 | 9.781812 | 0.0038620 |
After running HWE in the South African pygerthrus, it was found that there were 20 SNPs that are out of HWE, meaning that in the pygerthrus population, there is variation resulting in evolution found in these sequences.
This shows that out of the 73 NPY sequences in the southern expansion, there are 20 sequences that are outside of HWE. The relevance of each SNP sequence is listed below: - CAE21 34058832
- CAE21 34059166
- CAE21 34059328
- CAE21 34059431
- CAE21 34059435
- CAE21 34059685
- CAE21 34060020
- CAE21 34060108
- CAE21 34060249
- CAE21 34063486
- CAE21 34063575
- CAE21 34065174
- CAE21 34066013
- CAE21:34069785 Intronic Variant - CAE21:34069848 Intronic Variant - CAE21:34069864 Intronic Variant - CAE21:34070401 Intronic Variant - CAE21:34071796 Intronic Variant - CAE21:34072148 Intronic Variant - CAE21:34072414 Intronic Variant - CAE21:34074427 Intronic Variant - CAE21:34075017 Synonymous Variant (no change to the encoded amino acid) -unsure if I want to list them all, there are many!
module load vcftools
vcftools --vcf seNPY.vcf --hardy --out seNPY
less seNPY.hwe
vcftools.HWE<-read.table("seNPY.hwe",header=TRUE)
library(tidyverse)
vcftools.HWE.sig <-
vcftools.HWE %>%
as_tibble(vcftools.HWE) %>%
dplyr::select(CHR,POS,ChiSq_HWE,P_HWE) %>%
filter(P_HWE <= 0.05)
knitr::kable(vcftools.HWE.sig, caption = 'Southern Expansion HWE')
| CHR | POS | ChiSq_HWE | P_HWE |
|---|---|---|---|
| CAE21 | 34058832 | 22.070120 | 0.0049921 |
| CAE21 | 34059166 | 22.070120 | 0.0049921 |
| CAE21 | 34059328 | 7.989901 | 0.0080698 |
| CAE21 | 34059431 | 5.262497 | 0.0273389 |
| CAE21 | 34059435 | 12.148780 | 0.0173751 |
| CAE21 | 34059685 | 22.070120 | 0.0049921 |
| CAE21 | 34060020 | 6.736909 | 0.0130994 |
| CAE21 | 34060108 | 5.990589 | 0.0183223 |
| CAE21 | 34060249 | 4.609415 | 0.0402947 |
| CAE21 | 34060394 | 73.000000 | 0.0068966 |
| CAE21 | 34060688 | 12.430060 | 0.0011354 |
| CAE21 | 34063486 | 22.070120 | 0.0049921 |
| CAE21 | 34063575 | 22.070120 | 0.0049921 |
| CAE21 | 34064112 | 8.164983 | 0.0139226 |
| CAE21 | 34065174 | 22.070120 | 0.0049921 |
| CAE21 | 34066013 | 5.034996 | 0.0285534 |
| CAE21 | 34066438 | 8.164983 | 0.0139226 |
| CAE21 | 34067096 | 6.780227 | 0.0216437 |
| CAE21 | 34067819 | 8.164983 | 0.0139226 |
| CAE21 | 34069785 | 5.100963 | 0.0289483 |
| CAE21 | 34069848 | 12.916160 | 0.0023409 |
| CAE21 | 34069864 | 9.344020 | 0.0071133 |
| CAE21 | 34070401 | 31.767360 | 0.0206897 |
| CAE21 | 34071796 | 22.070120 | 0.0049921 |
| CAE21 | 34072117 | 5.969926 | 0.0210159 |
| CAE21 | 34072148 | 5.743848 | 0.0252939 |
| CAE21 | 34072372 | 5.034996 | 0.0285534 |
| CAE21 | 34072414 | 73.000000 | 0.0068966 |
| CAE21 | 34073484 | 5.969926 | 0.0210159 |
| CAE21 | 34074070 | 5.969926 | 0.0210159 |
| CAE21 | 34074427 | 5.321700 | 0.0214643 |
| CAE21 | 34075017 | 8.204678 | 0.0059548 |
| CAE21 | 34075252 | 6.876338 | 0.0099801 |
| CAE21 | 34075345 | 8.022223 | 0.0081531 |
| CAE21 | 34075386 | 73.000000 | 0.0068966 |
| CAE21 | 34075687 | 8.012347 | 0.0209903 |
| CAE21 | 34075768 | 5.743848 | 0.0252939 |
| CAE21 | 34075960 | 8.022223 | 0.0081531 |
| CAE21 | 34076108 | 20.195350 | 0.0007092 |
| CAE21 | 34076144 | 5.743848 | 0.0252939 |
| CAE21 | 34077626 | 31.767360 | 0.0206897 |
| CAE21 | 34078625 | 12.205950 | 0.0078355 |
| CAE21 | 34078706 | 6.339308 | 0.0126096 |
| CAE21 | 34078799 | 73.000000 | 0.0068966 |
| CAE21 | 34078953 | 12.205950 | 0.0078355 |
| CAE21 | 34080030 | 7.075205 | 0.0154694 |
| CAE21 | 34081189 | 22.070120 | 0.0049921 |
| CAE21 | 34081331 | 73.000000 | 0.0068966 |
| CAE21 | 34081340 | 13.643000 | 0.0012831 |
| CAE21 | 34081427 | 6.486039 | 0.0319006 |
| CAE21 | 34081476 | 6.486039 | 0.0319006 |
| CAE21 | 34081704 | 6.780227 | 0.0216437 |
| CAE21 | 34081920 | 6.486039 | 0.0319006 |
| CAE21 | 34082004 | 11.775930 | 0.0023733 |
| CAE21 | 34082228 | 6.486039 | 0.0319006 |
| CAE21 | 34082417 | 31.068870 | 0.0021600 |
| CAE21 | 34082628 | 6.486039 | 0.0319006 |
| CAE21 | 34083131 | 12.277350 | 0.0008424 |
| CAE21 | 34083478 | 5.230242 | 0.0465494 |
| CAE21 | 34083662 | 6.486039 | 0.0319006 |
| CAE21 | 34083771 | 6.486039 | 0.0319006 |
| CAE21 | 34083929 | 15.418290 | 0.0000870 |
| CAE21 | 34084060 | 5.262497 | 0.0273389 |
Here there are also 62 regions of variation, which are actually the same regions listed above in the pygerthrus population.
The process for creating a vcf file can be repeated for NPY1:
module load vcftools
vcftools --vcf vervNPY1.full.vcf --keep pyg.samples.list --recode --out pygNPY1
mv pygNPY1.recode.vcf pygNPY1.vcf
##
## VCFtools - 0.1.16
## (C) Adam Auton and Anthony Marcketta 2009
##
## Parameters as interpreted:
## --vcf vervNPY1.full.vcf
## --keep pyg.samples.list
## --out pygNPY1
## --recode
##
## Keeping individuals in 'keep' list
## After filtering, kept 51 out of 163 Individuals
## Outputting VCF file...
## After filtering, kept 432 out of a possible 432 Sites
## Run Time = 0.00 seconds
egrep -w 'pygerythrus|cynosuros|hilgerti' vervet.population.panel | cut -f1 > SEexp.samples.list
Another VCF can be made for the southern expansion NPY1 population
module load vcftools
vcftools --vcf vervNPY1.full.vcf --keep SEexp.samples.list --recode --out seNPY1
mv seNPY1.recode.vcf seNPY1.vcf
##
## VCFtools - 0.1.16
## (C) Adam Auton and Anthony Marcketta 2009
##
## Parameters as interpreted:
## --vcf vervNPY1.full.vcf
## --keep SEexp.samples.list
## --out seNPY1
## --recode
##
## Keeping individuals in 'keep' list
## After filtering, kept 73 out of 163 Individuals
## Outputting VCF file...
## After filtering, kept 432 out of a possible 432 Sites
## Run Time = 0.00 seconds
Since the power test was already done above, this step can be skipped.
module load vcftools
vcftools --vcf pygNPY1.vcf --hardy --out pygNPY1
less pygNPY1.hwe
vcftools.HWE<-read.table("pygNPY1.hwe",header=TRUE)
library(tidyverse)
vcftools.HWE.sig1 <-
vcftools.HWE %>%
as_tibble(vcftools.HWE) %>%
dplyr::select(CHR,POS,ChiSq_HWE,P_HWE) %>%
filter(P_HWE <= 0.05)
knitr::kable(vcftools.HWE.sig1, caption = 'South African pygerthrus SNPs')
After running HWE in the South African pygerthrus, it was found that there were 10 SNPs that are out of HWE, meaning that in the pygerthrus population, 10 SNP variation resulting in evolution found in these sequences.
module load vcftools
vcftools --vcf seNPY1.vcf --hardy --out seNPY1
less seNPY1.hwe
vcftools.HWE<-read.table("seNPY1.hwe",header=TRUE)
library(tidyverse)
vcftools.HWE.sig <-
vcftools.HWE %>%
as_tibble(vcftools.HWE) %>%
dplyr::select(CHR,POS,ChiSq_HWE,P_HWE) %>%
filter(P_HWE <= 0.05)
knitr::kable(vcftools.HWE.sig, caption = 'Southern Expansion HWE')
| CHR | POS | ChiSq_HWE | P_HWE |
|---|---|---|---|
| CAE7 | 109766025 | 16.195390 | 0.0098413 |
| CAE7 | 109766082 | 5.137139 | 0.0329729 |
| CAE7 | 109766483 | 17.236310 | 0.0412346 |
| CAE7 | 109766617 | 9.245614 | 0.0282634 |
| CAE7 | 109766621 | 9.884091 | 0.0131802 |
| CAE7 | 109766666 | 8.012347 | 0.0209903 |
| CAE7 | 109766747 | 9.245614 | 0.0282634 |
| CAE7 | 109766784 | 7.096299 | 0.0431318 |
| CAE7 | 109766886 | 16.195390 | 0.0098413 |
| CAE7 | 109767821 | 16.195390 | 0.0098413 |
| CAE7 | 109767944 | 10.907990 | 0.0017473 |
| CAE7 | 109768022 | 8.163399 | 0.0063574 |
| CAE7 | 109768050 | 31.068870 | 0.0021600 |
| CAE7 | 109768094 | 5.599291 | 0.0323911 |
| CAE7 | 109768234 | 5.990589 | 0.0183223 |
| CAE7 | 109768598 | 22.070120 | 0.0049921 |
| CAE7 | 109768764 | 17.236310 | 0.0412346 |
| CAE7 | 109769055 | 22.070120 | 0.0049921 |
| CAE7 | 109769271 | 17.236310 | 0.0412346 |
| CAE7 | 109769649 | 22.070120 | 0.0049921 |
| CAE7 | 109769999 | 22.070120 | 0.0049921 |
| CAE7 | 109770093 | 4.291186 | 0.0415112 |
| CAE7 | 109770128 | 4.291186 | 0.0415112 |
| CAE7 | 109770187 | 4.291186 | 0.0415112 |
| CAE7 | 109770358 | 4.979165 | 0.0286948 |
| CAE7 | 109770580 | 4.268649 | 0.0472498 |
| CAE7 | 109770918 | 9.245614 | 0.0282634 |
| CAE7 | 109770921 | 7.009596 | 0.0167820 |
| CAE7 | 109771426 | 5.599291 | 0.0323911 |
| CAE7 | 109771489 | 16.195390 | 0.0098413 |
| CAE7 | 109771496 | 17.236310 | 0.0412346 |
| CAE7 | 109772125 | 7.222928 | 0.0093910 |
| CAE7 | 109772184 | 6.736909 | 0.0130994 |
| CAE7 | 109772436 | 5.112970 | 0.0417347 |
| CAE7 | 109773716 | 17.236310 | 0.0412346 |
| CAE7 | 109774230 | 4.860916 | 0.0338828 |
| CAE7 | 109775436 | 17.236310 | 0.0412346 |
| CAE7 | 109775479 | 6.035482 | 0.0184038 |
| CAE7 | 109775960 | 6.913583 | 0.0106484 |
| CAE7 | 109776039 | 5.230242 | 0.0465494 |
| CAE7 | 109777165 | 9.196390 | 0.0033099 |
| CAE7 | 109777223 | 4.924113 | 0.0340392 |
| CAE7 | 109777564 | 6.035482 | 0.0184038 |
| CAE7 | 109778565 | 7.888899 | 0.0081322 |
| CAE7 | 109778772 | 7.096299 | 0.0431318 |
| CAE7 | 109779088 | 12.148780 | 0.0173751 |
| CAE7 | 109779499 | 9.176697 | 0.0048670 |
| CAE7 | 109779852 | 9.245614 | 0.0282634 |
| CAE7 | 109779976 | 17.236310 | 0.0412346 |
| CAE7 | 109780426 | 7.888899 | 0.0081322 |
| CAE7 | 109781039 | 9.176697 | 0.0048670 |
| CAE7 | 109781618 | 14.034140 | 0.0008135 |
| CAE7 | 109781797 | 7.888899 | 0.0081322 |
| CAE7 | 109782396 | 29.657410 | 0.0000001 |
| CAE7 | 109782413 | 7.905991 | 0.0115980 |
| CAE7 | 109784120 | 34.265290 | 0.0000000 |
| CAE7 | 109784778 | 23.871700 | 0.0000356 |
| CAE7 | 109785842 | 23.871700 | 0.0000356 |
| CAE7 | 109786018 | 9.245614 | 0.0282634 |
| CAE7 | 109786902 | 15.961550 | 0.0002307 |
This shows that out of the 73 NPY sequences in the southern expansion, there are 59 sequences that are outside of HWE. The relevance of each SNP sequence is listed below:
First, all the positions in both genes need to be listed:
grep -v "^##" pygNPY.vcf | cut -f1-3 > pygNPY_loci.txt
grep -v "^##" seNPY.vcf | cut -f1-3 > seNPY_loci.txt
grep -v "^##" pygNPY1.vcf | cut -f1-3 > pygNPY1_loci.txt
grep -v "^##" seNPY1.vcf | cut -f1-3 > seNPY1_loci.txt
Then some specialty packages need to be downloaded
library("devtools")
library("BiocManager")
devtools::install_github("SFUStatgen/LDheatmap")
BiocManager::install("snpStats")
BiocManager::install("VariantAnnotation")
BiocManager::install("GenomicFeatures")
BiocManager::install("gpart")
Installing both BiocManager and devtools wouldn’t work in the chunk, so I had to individually install each package. I realized this was due to me accidentally trying to run the code in Terminal. RMariadb doesn’t seem to exist?
It turned out RMariadb was within the BiocManager package, which is why it would not appear in the list of packages available.
The positions for each gene can then be put into a vector of position numbers
positions_pygNPY<-read.table("pygNPY_loci.txt")
positions_pygNPY<-positions_pygNPY$V2
positions_seNPY<-read.table("seNPY_loci.txt")
positions_seNPY<-positions_seNPY$V2
positions_pygNPY1<-read.table("pygNPY1_loci.txt")
positions_pygNPY1<-positions_pygNPY1$V2
positions_seNPY1<-read.table("seNPY1_loci.txt")
positions_seNPY1<-positions_seNPY1$V2
First the VCF file must be converted to a SNP matrix
library(tidyverse)
library(snpStats)
## Loading required package: survival
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
library(VariantAnnotation)
## Loading required package: BiocGenerics
## Loading required package: parallel
##
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:parallel':
##
## clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
## clusterExport, clusterMap, parApply, parCapply, parLapply,
## parLapplyLB, parRapply, parSapply, parSapplyLB
## The following objects are masked from 'package:dplyr':
##
## combine, intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
##
## anyDuplicated, append, as.data.frame, basename, cbind, colnames,
## dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
## grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
## order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
## rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
## union, unique, unsplit, which.max, which.min
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
##
## Attaching package: 'matrixStats'
## The following object is masked from 'package:dplyr':
##
## count
##
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
##
## colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
## colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
## colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
## colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
## colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
## colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
## colWeightedMeans, colWeightedMedians, colWeightedSds,
## colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
## rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
## rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
## rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
## rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
## rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
## rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
## rowWeightedSds, rowWeightedVars
## Loading required package: GenomeInfoDb
## Loading required package: S4Vectors
## Loading required package: stats4
##
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:Matrix':
##
## expand, unname
## The following objects are masked from 'package:dplyr':
##
## first, rename
## The following object is masked from 'package:tidyr':
##
## expand
## The following objects are masked from 'package:base':
##
## expand.grid, I, unname
## Loading required package: IRanges
##
## Attaching package: 'IRanges'
## The following objects are masked from 'package:dplyr':
##
## collapse, desc, slice
## The following object is masked from 'package:purrr':
##
## reduce
## Loading required package: GenomicRanges
## Loading required package: SummarizedExperiment
## Loading required package: Biobase
## Welcome to Bioconductor
##
## Vignettes contain introductory material; view with
## 'browseVignettes()'. To cite Bioconductor, see
## 'citation("Biobase")', and for packages 'citation("pkgname")'.
##
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
##
## rowMedians
## The following objects are masked from 'package:matrixStats':
##
## anyMissing, rowMedians
## Loading required package: Rsamtools
## Loading required package: Biostrings
## Loading required package: XVector
##
## Attaching package: 'XVector'
## The following object is masked from 'package:purrr':
##
## compact
##
## Attaching package: 'Biostrings'
## The following object is masked from 'package:base':
##
## strsplit
##
## Attaching package: 'VariantAnnotation'
## The following object is masked from 'package:stringr':
##
## fixed
## The following object is masked from 'package:base':
##
## tabulate
library(LDheatmap)
pygNPYvcf<-readVcf("pygNPY.vcf")
pygNPYmatrix<-genotypeToSnpMatrix(pygNPYvcf)
pygNPYvcf
## class: CollapsedVCF
## dim: 526 51
## rowRanges(vcf):
## GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
## info(vcf):
## DataFrame with 0 columns:
## geno(vcf):
## List of length 1: GT
## geno(header(vcf)):
## Number Type Description
## GT 1 String Phased Genotype
pygNPY_LD<-ld(pygNPYmatrix$genotypes,depth=525,stats="R.squared")
Since the data in pygerthrus NPY is not extremely large, it had to be reset to a depth of 525 for the matrix
cols=colorRampPalette(c("yellow","red"))(10)
image(pygNPY_LD,lwd=0,cuts=9,col.regions=cols,colorkey=TRUE)
The blocks of red indicates that there is a linkage block of selection occurring on this part of the gene
To incorporate the genetic distance of the linkages into the map, we can align the SNPs to the vervet reference:
library(GenomicFeatures)
## Loading required package: AnnotationDbi
##
## Attaching package: 'AnnotationDbi'
## The following object is masked from 'package:dplyr':
##
## select
library(RMariaDB)
txdb <- makeTxDbFromEnsembl(organism="Chlorocebus sabaeus", server="useastdb.ensembl.org") #Create txdb for chlSab2
## Fetch transcripts and genes from Ensembl ...
## OK
## (fetched 28078 transcripts from 27985 genes)
## Fetch exons and CDS from Ensembl ... OK
## Fetch chromosome names and lengths from Ensembl ...OK
## Gather the metadata ... OK
## Make the TxDb object ... OK
vcfbig <- readVcf("pygNPY.vcf", "txdb") #align data with vervet reference genome
ALLmatrix <- genotypeToSnpMatrix(vcfbig)
ALLmatrix
## $genotypes
## A SnpMatrix with 51 rows and 526 columns
## Row names: VBOA1003 ... VSAM5007
## Col names: CAE21:34058537_A/G ... CAE21:34085036_G/A
##
## $map
## DataFrame with 526 rows and 4 columns
## snp.names allele.1 allele.2 ignore
## <character> <DNAStringSet> <DNAStringSetList> <logical>
## 1 CAE21:34058537_A/G A G FALSE
## 2 CAE21:34058548_T/C T C FALSE
## 3 CAE21:34058562_C/T C T FALSE
## 4 CAE21:34058673_C/T C T FALSE
## 5 CAE21:34058753_G/A G A FALSE
## ... ... ... ... ...
## 522 CAE21:34084886_T/G T G FALSE
## 523 CAE21:34084887_G/A G A FALSE
## 524 CAE21:34084917_A/G A G FALSE
## 525 CAE21:34084994_C/T C T FALSE
## 526 CAE21:34085036_G/A G A FALSE
You can continue from here and create a heat map, however there’s an easier way to do this:
BiocManager::install("gpart")
## Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'gpart'
## Installation paths not writeable, unable to update packages
## path: /share/pkg.7/r/4.1.2/install/lib64/R/library
## packages:
## actuar, AdaptFitOS, ade4, admisc, affxparser, affy, affyio, affyPLM, akima,
## annotate, AnnotationFilter, AnnotationForge, AnnotationHub, antiword, ape,
## apeglm, aplot, argparse, arrow, ashr, assertive.properties, babelgene,
## batchelor, bayesplot, BBmisc, BDgraph, beachmat, BHC, bigleaf, bigmemory,
## bigsnpr, bigsparser, bigstatsr, Biobase, BiocFileCache, BiocGenerics,
## BiocIO, BiocNeighbors, BiocParallel, BiocSingular, biomaRt, Biostrings,
## biovizBase, blob, bluster, BMA, bnlearn, brew, Brobdingnag, broom,
## BSgenome, C50, Cairo, car, caret, Category, CDM, checkmate, circlize, cli,
## clipr, cluster, commonmark, ComplexHeatmap, conquer, crayon, cubature,
## Cubist, cytolib, DelayedArray, DelayedMatrixStats, densEstBayes,
## densityClust, DEoptimR, desc, DESeq2, deSolve, DiffBind, DNAcopy,
## doParallel, dplyr, DropletUtils, DT, edgeR, ensembldb, EpiModel, evaluate,
## evd, expint, extRemes, fansi, fastGHQuad, ff, fftw, fishpond, fitdistrplus,
## flexclust, float, foreach, formatR, future, future.apply, gam, gamlss,
## gamlss.dist, gcrma, gdsfmt, gdtools, gee, genefilter, geneplotter, GENESIS,
## GenomeInfoDb, GenomeInfoDbData, GenomicAlignments, GenomicRanges,
## geojsonsf, geometry, gert, ggdendro, ggfun, ggplot2, git2r, glmmML,
## glmmTMB, glmnet, glue, gmp, GO.db, GOstats, gower, gplots, graph,
## GreyListChIP, GSEABase, gstat, GSVA, GWASTools, haven, HDF5Array, Hmisc,
## httr, hwriter, igraph, impute, infotheo, IRanges, iterators, jomo,
## jsonlite, KEGGREST, Kendall, kernlab, knitr, ks, lars, lavaan, leiden, lfe,
## lhs, limma, linprog, lme4, LMest, lmtest, locfit, logspline, loo,
## lpsymphony, magic, magrittr, maptools, MASS, MassSpecWavelet, Matching,
## mathjaxr, MatrixGenerics, matrixStats, mbkmeans, MCMCpack, metap, metapod,
## mgcv, minpack.lm, mlapi, MLEcens, mlegp, msigdbr, multtest, networkDynamic,
## nimble, nlme, nloptr, officer, oligo, oligoClasses, openair, OpenMx,
## openssl, pander, parallelly, parsedate, party, pbdZMQ, pcaMethods, pcaPP,
## pdftools, pdist, penalized, peperr, plyr, polspline, polynom,
## preprocessCore, processx, ProtGenerics, ps, pspline, psych, qgraph,
## quanteda, quantmod, quantreg, quantsmooth, qvalue, ragg, RandomFieldsUtils,
## randomForest, RBGL, rbibutils, RColorBrewer, Rcpp, RcppArmadillo,
## RcppEigen, RcppGSL, RCurl, Rdpack, readxl, recipes, REddyProc,
## ResidualMatrix, Rfast, rgdal, rgenoud, Rgraphviz, rhdf5, rhdf5filters,
## Rhdf5lib, Rhtslib, RInside, rjags, rlang, rmarkdown, rmio, rms, robustbase,
## rprojroot, RProtoBufLib, rrcov, Rsamtools, RSpectra, RSQLite, rstan,
## rstanarm, rstantools, Rsubread, rsvg, rtracklayer, Rtsne, S4Vectors, sass,
## ScaledMatrix, scales, scater, scattermore, scran, scuttle, segmented,
## SeqArray, seqminer, SeqVarTools, seriation, sets, Seurat, SeuratObject, sf,
## sfsmisc, shinystan, ShortRead, SHT, SingleCellExperiment, SingleR, SKAT,
## sn, SNPRelate, sp, spaMM, sparseMatrixStats, spatstat.core, spatstat.data,
## spatstat.geom, spatstat.linnet, spatstat.sparse, spatstat.utils, spikeslab,
## splancs, statnet.common, subplex, SummarizedExperiment, survival, sva,
## svglite, systemfonts, systemPipeR, tclust, testthat, text2vec, TH.data,
## tibble, tidygraph, tidyselect, tidytree, tinytex, TMB, tmvtnorm, tseries,
## TSP, tzdb, udunits2, umap, units, unmarked, uuid, V8, VariantAnnotation,
## vctrs, vegan, VGAM, waldo, webshot, withr, xfun, xgboost, XML, XVector,
## yaml, zlibbioc
## Old packages: 'BiocManager', 'LDheatmap'
The gpart package is now within BiocManager, so I had to install from within that package.
Finally, we can produce a heat map in order to visualize the linkage blocks.
library(gpart)
## Loading required package: grid
##
## Attaching package: 'grid'
## The following object is masked from 'package:Biostrings':
##
## pattern
## Loading required package: Homo.sapiens
## Loading required package: OrganismDbi
## Loading required package: GO.db
##
## Loading required package: org.Hs.eg.db
##
## Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
## Loading required package: TxDb.Hsapiens.UCSC.hg38.knownGene
pyg_NPY_res1 = BigLD(genofile = "pygNPY.vcf",LD="r2")
## data reformating done!
## working chromosome name: CAE21
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 152
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:32 | chrCAE21:34058548-34085036 | sub-region: 1/1
## CLQ done!
constructLDblock done!
##
## BigLD done!
pyg_NPY_res1
## chr start.index end.index start.rsID end.rsID start.bp end.bp
## 1 CAE21 2 526 . . 34058548 34085036
nowcex=0.5
LDblockHeatmap(genofile = "pygNPY.vcf", chrN = "CAE21", geneDB = "ensembl", geneid = "hgnc_symbol", blocktype="bigld", LD="r2", type="png", filename = "heatmap_pyg_NPY_r2", CLQmode = "density", res = 600)
NPY Heat Map
This shows that there is only one linkage block created by BigLD, and the start and end points are given.
First the VCF file must be converted to a SNP matrix
library(tidyverse)
library(snpStats)
library(VariantAnnotation)
library(LDheatmap)
pygNPY1vcf<-readVcf("pygNPY1.vcf")
pygNPY1matrix<-genotypeToSnpMatrix(pygNPY1vcf)
pygNPY1vcf
## class: CollapsedVCF
## dim: 432 51
## rowRanges(vcf):
## GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
## info(vcf):
## DataFrame with 0 columns:
## geno(vcf):
## List of length 1: GT
## geno(header(vcf)):
## Number Type Description
## GT 1 String Phased Genotype
pygNPY1_LD<-ld(pygNPY1matrix$genotypes,depth=431,stats="R.squared")
Since the data in pygerthrus NPY Y1 receptor is small, it had to be reset to a depth of 431 for the matrix
cols=colorRampPalette(c("yellow","red"))(10)
image(pygNPY1_LD,lwd=0,cuts=9,col.regions=cols,colorkey=TRUE)
The block of red indicates that there is a linkage block of selection occurring on this part of the gene
We can align the SNPs to the vervet reference as done above:
library(GenomicFeatures)
library(RMariaDB)
txdb <- makeTxDbFromEnsembl(organism="Chlorocebus sabaeus", server="useastdb.ensembl.org") #Create txdb for chlSab2
vcfbig <- readVcf("pygNPY1.vcf", "txdb") #align data with vervet reference genome
ALLmatrix <- genotypeToSnpMatrix(vcfbig)
ALLmatrix
However there’s an easier way to do this:
BiocManager::install("gpart")
The gpart package is now within BiocManager, so I had to install from within that package.
library(gpart)
pyg_npy1_res1 = BigLD(genofile = "pygNPY1.vcf",LD="r2")
## data reformating done!
## working chromosome name: CAE7
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 104
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:33 | chrCAE7:109764999-109786902 | sub-region: 1/1
## CLQ done!
constructLDblock done!
##
## BigLD done!
pyg_npy1_res1
## chr start.index end.index start.rsID end.rsID start.bp end.bp
## 1 CAE7 2 3 . . 109764999 109765029
## 2 CAE7 26 87 . . 109766082 109768247
## 3 CAE7 97 202 . . 109768697 109773839
## 4 CAE7 214 233 . . 109775029 109777165
## 5 CAE7 234 243 . . 109777223 109777564
## 6 CAE7 245 250 . . 109777686 109778565
## 7 CAE7 253 254 . . 109778772 109778809
## 8 CAE7 268 322 . . 109779336 109781797
## 9 CAE7 334 406 . . 109782396 109786018
## 10 CAE7 429 432 . . 109786819 109786902
nowcex=0.5
LDblockHeatmap(genofile = "pygNPY1.vcf", chrN = "CAE7", geneDB = "ensembl", geneid = "hgnc_symbol", blocktype="bigld", LD="r2", type="png", filename = "heatmap_pyg_npy1_r2", CLQmode = "density", res = 600)
This code creates a LD heat map to help visualize the linkage blocks produced from the gpart code.
Here we can see that there are 7 linkage blocks that can be distinguished. This next code below will help in getting specific LD values between SNP points of interest that can be exported to excel and used later when trying to determine the loci we want to access for selection in the easiest manner possible.
library(Matrix)
library(tidyverse)
sparse2triples<-function(m) {
SM = summary(m)
D1 = m@Dimnames[[1]][SM[,1]]
D2 = m@Dimnames[[2]][SM[,2]]
data.frame(row=D1,col=D2,x=m@x)
}
LD2<-sparse2triples(pygNPY_LD)
LD2<-as_tibble(LD2)
LD3<-
LD2 %>%
filter(x!=NA)
write.csv(LD3,"pygnpy1_LDr2.csv")
Now I can repeat this process for finding linkage disequilibrium in both NPY and NPY1 in the Souther Expansion population
library(snpStats)
library(VariantAnnotation)
library(LDheatmap)
seNPYvcf<-readVcf("seNPY.vcf")
seNPYmatrix<-genotypeToSnpMatrix(seNPYvcf)
seNPYvcf
## class: CollapsedVCF
## dim: 526 73
## rowRanges(vcf):
## GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
## info(vcf):
## DataFrame with 0 columns:
## geno(vcf):
## List of length 1: GT
## geno(header(vcf)):
## Number Type Description
## GT 1 String Phased Genotype
seNPY_LD<-ld(seNPYmatrix$genotypes,depth=525,stats="R.squared")
Next we can visualize the linkage:
cols=colorRampPalette(c("yellow","red"))(10)
image(seNPY_LD,lwd=0,cuts=9,col.regions=cols,colorkey=TRUE)
And use the easy way to visualize the linkage in a heat map that incorporates the genetic distance:
library(gpart)
se_NPY_res1 = BigLD(genofile = "seNPY.vcf",LD="r2")
## data reformating done!
## working chromosome name: CAE21
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 125
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:34 | chrCAE21:34059089-34084994 | sub-region: 1/1
## CLQ done!
constructLDblock done!
##
## BigLD done!
se_NPY_res1
## chr start.index end.index start.rsID end.rsID start.bp end.bp
## 1 CAE21 11 525 . . 34059089 34084994
This shows that the southern expansion NPY genomic regions fall into 1 distinct linkage block, and it gives us the specific start and end points for the blocks. This is very intriguing, as it shows that the NPY gene is completely coinherited.
Next I can visualize the linkage blocks similar to how I did with the pygerythrus population.
nowcex=0.5
LDblockHeatmap(genofile = "seNPY.vcf", chrN = "CAE21", geneDB = "ensembl", geneid = "hgnc_symbol", blocktype="bigld", LD="r2", type="png", filename = "heatmap_se_NPY_r2", CLQmode = "density", res = 600)
## data reformating done!
## Warning in if (assembly == "GRCh38") {: the condition has length > 1 and only
## the first element will be used
## load gene information from ensembl (assembly GRCh38)
## Number of SNPs after prunning SNPs with MAF< 0.05 : 125
## working chromosome name: CAE21
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 125
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:46 | chrCAE21:34059089-34084994 | sub-region: 1/1
## CLQ done!
constructLDblock done!
##
## BigLD done!
## This region does not overlap any gene region!
## Generating file....
## heatmap_se_NPY_r2.png
NPY Heat Map
Once again, all of these steps can be repeated for the NPY Y1 receptor in the southern expansion population.
library(snpStats)
library(VariantAnnotation)
library(LDheatmap)
seNPY1vcf<-readVcf("seNPY1.vcf")
seNPY1matrix<-genotypeToSnpMatrix(seNPY1vcf)
seNPY1vcf
## class: CollapsedVCF
## dim: 432 73
## rowRanges(vcf):
## GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
## info(vcf):
## DataFrame with 0 columns:
## geno(vcf):
## List of length 1: GT
## geno(header(vcf)):
## Number Type Description
## GT 1 String Phased Genotype
seNPY1_LD<-ld(seNPY1matrix$genotypes,depth=431,stats="R.squared")
For NPY Y1, the bounds of the data is a bit smaller, so I had to reset the depth to 431.
The linkage blocks matrix can then be visualized into a heatmap:
cols=colorRampPalette(c("yellow","red"))(10)
image(seNPY1_LD,lwd=0,cuts=9,col.regions=cols,colorkey=TRUE)
And the linkage blocks can be characterized and visualized with the genetic distance incorporated:
library(gpart)
se_npy1_res1 = BigLD(genofile = "seNPY1.vcf",LD="r2")
## data reformating done!
## working chromosome name: CAE7
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 119
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:50:49 | chrCAE7:109764999-109786902 | sub-region: 1/1
## CLQ done!
constructLDblock done!
##
## BigLD done!
se_npy1_res1
## chr start.index end.index start.rsID end.rsID start.bp end.bp
## 1 CAE7 2 3 . . 109764999 109765029
## 2 CAE7 20 90 . . 109765855 109768322
## 3 CAE7 97 113 . . 109768697 109770013
## 4 CAE7 115 126 . . 109770093 109770701
## 5 CAE7 133 146 . . 109770921 109771383
## 6 CAE7 148 158 . . 109771489 109771902
## 7 CAE7 161 169 . . 109772125 109772447
## 8 CAE7 171 250 . . 109772516 109778565
## 9 CAE7 253 254 . . 109778772 109778809
## 10 CAE7 264 335 . . 109779088 109782413
## 11 CAE7 352 359 . . 109783893 109784174
## 12 CAE7 369 432 . . 109784778 109786902
This shows that the southern expansion NPY genomic regions fall into 12 distinct linkage blocks, and it gives us the specific start and end points for the blocks.
Finally, we can see the final heatmap for NPY Y1 created below:
nowcex=0.5
LDblockHeatmap(genofile = "seNPY1.vcf", chrN = "CAE7", geneDB = "ensembl", geneid = "hgnc_symbol", blocktype="bigld", LD="r2", type="png", filename = "heatmap_se_npy1_r2", CLQmode = "density", res = 600)
## data reformating done!
## Warning in if (assembly == "GRCh38") {: the condition has length > 1 and only
## the first element will be used
## load gene information from ensembl (assembly GRCh38)
## Number of SNPs after prunning SNPs with MAF< 0.05 : 119
## working chromosome name: CAE7
## remove SNPs with MAF < 0.05
## Number of SNPs after prunning SNPs with MAF< 0.05 : 119
## start to split the whole sequence into sub-task regions
## there is only one sub-region!
## 2022-05-08 21:51:02 | chrCAE7:109764999-109786902 | sub-region: 1/1
## CLQ done!
constructLDblock done!
##
## BigLD done!
## This region does not overlap any gene region!
## Generating file....
## heatmap_se_npy1_r2.png
NPY Heat Map
The final analysis I will be completing will be in order to determine the population structure. Analyzing population structure is important because it tells us whether the variation in the gene region is structured by phylogeny, and therefore due to something like genetic drift, or fundamentally different, showing that selection or something similar is altering the pattern of variation.
The two analyses used will be PCA (principal component analysis) and DAPC (discriminant analysis of principal components) plots to see if they show a pattern consistent with differentiation at the taxonomic level (pygerythrus, cynosuros, and hilgerti), as well as the population level. And we will test the significance of differentiation using Adonis (tests for molecular variation; multivariate AMOVA) and AMOVA (analysis of molecular variance)
First I need to download the necessary packages:
library(devtools)
## Loading required package: usethis
library(vcfR)
##
## ***** *** vcfR *** *****
## This is vcfR 1.12.0
## browseVignettes('vcfR') # Documentation
## citation('vcfR') # Citation
## ***** ***** ***** *****
library(adegenet)
## Loading required package: ade4
##
## Attaching package: 'ade4'
## The following object is masked from 'package:Biostrings':
##
## score
## The following object is masked from 'package:GenomicRanges':
##
## score
## The following object is masked from 'package:BiocGenerics':
##
## score
##
## /// adegenet 2.1.5 is loaded ////////////
##
## > overview: '?adegenet'
## > tutorials/doc/questions: 'adegenetWeb()'
## > bug reports/feature requests: adegenetIssues()
##
## Attaching package: 'adegenet'
## The following object is masked from 'package:survival':
##
## strata
library(ggplot2)
library(vegan)
## Loading required package: permute
##
## Attaching package: 'permute'
## The following object is masked from 'package:devtools':
##
## check
## Loading required package: lattice
## This is vegan 2.5-7
library(poppr)
## Registered S3 method overwritten by 'pegas':
## method from
## print.amova ade4
## This is poppr version 2.9.3. To get started, type package?poppr
## OMP parallel support: available
library(dplyr)
And load the right VCF files:
NPY <- read.vcfR("seNPY.vcf", verbose = TRUE)
## Scanning file to determine attributes.
## File attributes:
## meta lines: 38
## header_line: 39
## variant count: 526
## column count: 82
##
Meta line 38 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 526
## Character matrix gt cols: 82
## skip: 0
## nrows: 526
## row_num: 0
##
Processed variant: 526
## All variants processed
NPY1 <- read.vcfR("seNPY1.vcf", verbose = TRUE)
## Scanning file to determine attributes.
## File attributes:
## meta lines: 38
## header_line: 39
## variant count: 432
## column count: 82
##
Meta line 38 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 432
## Character matrix gt cols: 82
## skip: 0
## nrows: 432
## row_num: 0
##
Processed variant: 432
## All variants processed
To organize my work, lets start with the background organizational population structure work for NPY.
The following code will convert the vcf file into a genind file, read the population panel data, and filter out all non SE-individuals. If this works, all the samples in the VCF and population panel should be the same, and the final line should read TRUE
NPY_genind <- vcfR2genind(NPY, sep = "[|/]")
pop.data <- read.table("vervet.population.panel", sep = "\t", header = FALSE)
colnames(pop.data) <- c("sample", "pop", "region", "country", "taxon")
pop.data<-filter(pop.data,taxon == "pygerythrus" | taxon == "hilgerti" | taxon == "cynosuros")
all(colnames(NPY@gt)[-1] == pop.data$AccessID)
## [1] TRUE
The vectors in the code above did not align due to there being 6 column names but only 5 values in the vector that it was labeling. So I had to delete one name and now the vectors align and the code is TRUE
The next code will assign the population panel data to GENIND, the the population panel as the population, and create a GENLIGHT object for PCA:
strata(NPY_genind) <- pop.data[(match(indNames(NPY_genind), pop.data$sample)), ]
setPop(NPY_genind) <- ~pop
NPY_genlight <- vcfR2genlight(NPY)
strata(NPY_genlight) <- pop.data[(match(indNames(NPY_genlight), pop.data$sample)), ]
setPop(NPY_genlight) <- ~pop
Now we can repeat this process for NPY Y1!
NPY1_genind <- vcfR2genind(NPY1, sep = "[|/]")
pop.data <- read.table("vervet.population.panel", sep = "\t", header = FALSE)
colnames(pop.data) <- c("sample", "region", "pop", "country", "taxon")
pop.data <-filter(pop.data,taxon == "pygerythrus" | taxon == "hilgerti" | taxon == "cynosuros")
all(colnames(NPY1@gt)[-1] == pop.data$AccessID)
## [1] TRUE
strata(NPY1_genind) <- pop.data[(match(indNames(NPY1_genind), pop.data$sample)), ]
setPop(NPY1_genind) <- ~pop
NPY1_genlight <- vcfR2genlight(NPY1)
strata(NPY1_genlight) <- pop.data[(match(indNames(NPY1_genlight), pop.data$sample)), ]
setPop(NPY1_genlight) <- ~pop
Once this has been done for the gene and it’s receptor, we can move on to the analyses.
setPop(NPY_genlight) <- ~taxon
pca <- glPca(NPY_genlight, nf = 2)
barplot(100*pca$eig/sum(pca$eig), col = heat.colors(50), main="PCA Eigenvalues",ylab="% of Variance Explained",xlab="Eigenvalues")
This graph will show the amount of variation from the original dataset that is kept in the PCA. This will help us determine the amount of PC’s to keep, usually those that explain the most variation would want to be kept.
Once this is determined, we can graph with ggplot.
pca.scores <- as.data.frame(pca$scores)
pca.scores$taxon <- pop(NPY_genlight)
p <- ggplot(pca.scores, aes(x=PC1, y=PC2, colour= taxon)) + geom_point(size=2) + stat_ellipse(level = 0.95, size = 1) + theme_bw()
print(p)
To see if these differences are significant, we can run the multivariate AMOVA:
adonis1 <- adonis(pca$scores ~ taxon, data = pca.scores, method='eu', na.rm = TRUE)
adonis1
##
## Call:
## adonis(formula = pca$scores ~ taxon, data = pca.scores, method = "eu", na.rm = TRUE)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## taxon 2 10.37 5.1848 1.2681 0.03496 0.303
## Residuals 70 286.21 4.0887 0.96504
## Total 72 296.58 1.00000
According to this output, about 35% of genomic variance of the NPY gene is accounted for at a taxonomic level, but the difference in variance is not significant, with a significance value of 0.291.
setPop(NPY_genlight) <- ~pop
pca <- glPca(NPY_genlight, nf = 2)
barplot(100*pca$eig/sum(pca$eig), col = heat.colors(50), main="PCA Eigenvalues")
title(ylab="Percent of variance\nexplained", line = 2)
title(xlab="Eigenvalues", line = 1)
This code repeats the process of seeing how much variation from the original data is seen in the PCA for population. This will help us determine what PC’s we should keep.
The results can then be graphed into a ggplot
pca.scores <- as.data.frame(pca$scores)
pca.scores$pop <- pop(NPY_genlight)
p <- ggplot(pca.scores, aes(x=PC1, y=PC2, colour=pop)) + geom_point(size=2) + stat_ellipse(level = 0.95, size = 1) + theme_bw() + scale_color_manual(values=c("purple4","salmon","dark green","forest green","seagreen1","dodgerblue","green","pink", "red","black","light blue","blue3","lavender","yellow","grey","brown","cyan","black","light grey","orange","darkturquoise","darkgoldenrod1"))
print(p)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 14 row(s) containing missing values (geom_path).
ggsave("NPY_PCA.png",plot=p,width=5.5,height=4,units=c("in"),dpi=300)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 14 row(s) containing missing values (geom_path).
Now we can run the multivariate AMOVA using adonis to see if the differences are significant
adonis2 <- adonis(pca$scores ~pop, data = pca.scores, method='eu', na.rm = TRUE)
adonis2
##
## Call:
## adonis(formula = pca$scores ~ pop, data = pca.scores, method = "eu", na.rm = TRUE)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## pop 21 176.01 8.3816 3.5454 0.59348 0.001 ***
## Residuals 51 120.57 2.3640 0.40652
## Total 72 296.58 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This shows that there is a significant difference in populations (p<0.001), and population explains about 59.3% of the genomic variance in the gene region.
Now we can move onto the DAPC population structure analysis!
dapc <- dapc(NPY_genind, n.pca = 2, n.da = 2)
dapc$grp
## [1] Unknown Unknown Kasane Kasane
## [5] Samburu Mosiro Naivasha Kimana
## [9] Soetdoring Soetdoring Soetdoring Gariep
## [13] Gariep Gariep Gariep Gariep
## [17] Gariep Gariep Gariep Gariep
## [21] Gariep Gariep Sandveld Sandveld
## [25] Sandveld Sandveld Sandveld Parys
## [29] Zinkwazi Zinkwazi Zinkwazi Zinkwazi
## [33] Zinkwazi Zinkwazi Blythedale Beach Blythedale Beach
## [37] Blythedale Beach Blythedale Beach Blythedale Beach Kwela
## [41] Kwela Kwela Anerley Letsitele
## [45] Letsitele Shamwari Shamwari Shamwari
## [49] Shamwari Shamwari Shamwari Amakhala
## [53] Amakhala Amakhala Amakhala Amakhala
## [57] Addo Gate Gate Gate
## [61] Gate Gate Gate Mukambi
## [65] Mukambi Mukambi Lilayi Lilayi
## [69] Livingstone Livingstone Livingstone Livingstone
## [73] Livingstone
## 22 Levels: Unknown Kasane Samburu Mosiro Naivasha Kimana Soetdoring ... Livingstone
scatter(dapc, cex = 2, legend = TRUE, clabel = F, posi.leg = "bottomleft", scree.pca = TRUE,
posi.pca = "topright", cleg = 0.75, col=c("purple4","pink","dark green","forest green","seagreen1","dodgerblue","green","salmon","red","black","light blue","blue3","lavender","yellow","grey","brown","cyan","black","light grey","orange","darkturquoise","darkgoldenrod1"))
Though there is generally much overlap in variation for NPY between populations and taxons, we can see a bit more clearly that the variation we do see in South Africa appears to scale away a bit from the rest of Africa along DA2, and within South Africa along DA1.
We’ve seen a few things from looking at the PCAs and DAPC from NPY. With the first graph, we see that there is no separation along PC1 and PC2 for the three taxons (pygerythrus, hilgerti and cynosuros). When we look at the sub-populations of each species in a PCA, we see that there’s yet again not a large amount of separation, but the values are shown significance by the Adonis resuls. The cluster of variation that is maintained to a small area within hilgerti and cynosuros taxons is made up of a couple of specific sub-populations not also found in the pygerythrus including but not limited to: Gate, Livingstone, Zinkwazi, Amakhala, Shamwari, Samburu, and Lilayi. There is also a variation in the Anerley population that is within only pygerythrus. In the DAPC, we again see the aforementioned populations clustering within hilgerti and cynosuros. In both graphs, it appears that the Gariep population is an area of overlap between these clusters of variation.
Now we can perform the AMOVA analysis:
library(poppr)
NPY_genlight2 <- popsub(NPY_genlight, exclude = c("Limpopo"))
amova1 <- poppr.amova(NPY_genlight2, ~taxon/pop)
amova1
## $call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
##
## $results
## Df Sum Sq Mean Sq
## Between taxon 2 109.4299 54.71494
## Between pop Within taxon 19 678.4398 35.70736
## Between samples Within pop 51 843.4591 16.53841
## Within samples 73 1090.5000 14.93836
## Total 145 2721.8288 18.77123
##
## $componentsofcovariance
## Sigma %
## Variations Between taxon 0.5565822 2.888387
## Variations Between pop Within taxon 2.9746899 15.437171
## Variations Between samples Within pop 0.8000287 4.151754
## Variations Within samples 14.9383562 77.522689
## Total variations 19.2696569 100.000000
##
## $statphi
## Phi
## Phi-samples-total 0.22477311
## Phi-samples-pop 0.05083296
## Phi-pop-taxon 0.15896318
## Phi-taxon-total 0.02888387
amova.test <- randtest(amova1)
plot(amova.test)
amova.test
## class: krandtest lightkrandtest
## Monte-Carlo tests
## Call: randtest.amova(xtest = amova1)
##
## Number of tests: 4
##
## Adjustment method for multiple comparisons: none
## Permutation number: 99
## Test Obs Std.Obs Alter Pvalue
## 1 Variations within samples 14.9383562 -6.0074447 less 0.01
## 2 Variations between samples 0.8000287 0.6320627 greater 0.26
## 3 Variations between pop 2.9746899 6.7976024 greater 0.01
## 4 Variations between taxon 0.5565822 1.2725286 greater 0.15
When we run the AMOVA on NPY looking for molecular variance, we can see that 2.8% of the variance is between taxa, 15.4% is between populations within taxa, 4.15% between samples within populations, and the remaining 77.5% is within samples outside these designations. This suggests that our population designations are relatively homogenous within themselves, suggesting good mixing at that level, but significantly stratified by population, and not significantly stratified by taxon.
library(hierfstat)
##
## Attaching package: 'hierfstat'
## The following objects are masked from 'package:adegenet':
##
## Hs, read.fstat
wc(NPY_genind)
## $FST
## [1] 0.171096
##
## $FIS
## [1] 0.05083296
This gives us FST among all of the populations combined (so that means that about 17.1% of the variants in NPY are not shared between populations), but this doesn’t tell us which populations are more or less different. Let’s look at the pairwise FST between each population:
NPYFst <- genet.dist(NPY_genind[1:73,],method="WC84")
NPYFst
## Unknown Kasane Samburu Mosiro
## Kasane 0.115044248
## Samburu 0.263157895 0.361111111
## Mosiro -0.075812274 -0.222707424 0.000000000
## Naivasha -0.052631579 -0.105882353 0.000000000 0.000000000
## Kimana 0.056250000 0.144578313 0.000000000 0.000000000
## Soetdoring 0.009736330 0.033611623 0.089304258 -0.142302717
## Gariep 0.133840051 0.081888429 0.196413884 0.001223990
## Sandveld 0.093729589 0.164662147 0.084194978 0.021226415
## Parys 0.185840708 0.146341463 0.000000000 0.000000000
## Zinkwazi 0.196373057 0.190174039 0.288836292 0.141394026
## Blythedale Beach 0.194668391 0.243608119 0.267167382 0.180334395
## Kwela 0.178505705 0.248938178 0.131359852 0.130634775
## Anerley 0.542288557 0.574257426 0.000000000 0.000000000
## Letsitele 0.004739336 -0.026455026 -0.014492754 -0.268389662
## Shamwari 0.121543920 0.168794412 0.173590982 0.055290785
## Amakhala 0.253581856 0.239687767 0.337187789 0.206270627
## Addo 0.139175258 0.036585366 0.000000000 0.000000000
## Gate 0.186662784 0.029411765 0.471294766 0.042927794
## Mukambi 0.184128848 0.032129404 0.570637119 0.112359551
## Lilayi 0.165137615 0.135416667 0.485992692 0.127789047
## Livingstone 0.200241209 0.200047498 0.299526440 0.184931507
## Naivasha Kimana Soetdoring Gariep
## Kasane
## Samburu
## Mosiro
## Naivasha
## Kimana 0.000000000
## Soetdoring -0.047281324 -0.112640801
## Gariep 0.020972735 0.107980367 0.049820223
## Sandveld 0.033694810 0.007148531 -0.014263698 0.084685104
## Parys 0.000000000 0.000000000 0.004032258 0.087673689
## Zinkwazi 0.163301060 0.189121712 0.108393720 0.006833853
## Blythedale Beach 0.196261682 0.237656595 0.109523105 0.110862011
## Kwela 0.115671642 0.180878553 0.074468085 0.117444427
## Anerley 0.000000000 0.000000000 0.358974359 0.372542659
## Letsitele -0.222857143 -0.023771791 -0.082713478 0.099833145
## Shamwari 0.101218162 0.052869739 0.079412065 0.190474410
## Amakhala 0.246618106 0.276394422 0.155108973 0.243288045
## Addo 0.000000000 0.000000000 -0.144078144 0.032539221
## Gate 0.185127202 0.317603171 0.136553112 0.177410667
## Mukambi 0.289532294 0.368627451 0.097087379 0.167560133
## Lilayi 0.229129663 0.389121339 0.038212981 0.058559051
## Livingstone 0.183394161 0.269731489 0.110080186 0.042914770
## Sandveld Parys Zinkwazi Blythedale Beach
## Kasane
## Samburu
## Mosiro
## Naivasha
## Kimana
## Soetdoring
## Gariep
## Sandveld
## Parys -0.003388554
## Zinkwazi 0.049092518 0.209669142
## Blythedale Beach 0.052600473 0.099193548 0.097776406
## Kwela -0.006312603 -0.035502959 0.135299625 -0.030628501
## Anerley 0.204768125 0.000000000 0.434503361 0.247747748
## Letsitele 0.086255699 0.026282854 0.202975558 0.143712198
## Shamwari 0.206980034 0.125993972 0.267888255 0.263986992
## Amakhala 0.306992139 0.186934673 0.363450417 0.346774194
## Addo 0.069264069 0.000000000 0.229097857 0.208762887
## Gate 0.257716535 0.359645404 0.299693252 0.316385688
## Mukambi 0.199585147 0.414464534 0.271457511 0.287076127
## Lilayi 0.091380652 0.243027888 0.135773318 0.146062541
## Livingstone 0.091019417 0.248466258 0.067273054 0.072798487
## Kwela Anerley Letsitele Shamwari
## Kasane
## Samburu
## Mosiro
## Naivasha
## Kimana
## Soetdoring
## Gariep
## Sandveld
## Parys
## Zinkwazi
## Blythedale Beach
## Kwela
## Anerley 0.109375000
## Letsitele 0.099875156 0.483028721
## Shamwari 0.239948224 0.487153839 0.034851430
## Amakhala 0.343491480 0.616574586 0.103546205 0.070336354
## Addo 0.124452235 0.000000000 -0.446028513 -0.180581745
## Gate 0.366388723 0.670809077 0.134426230 0.230897704
## Mukambi 0.325503356 0.751099384 0.105159705 0.196299276
## Lilayi 0.142677298 0.609706775 0.032967033 0.169439687
## Livingstone 0.114726972 0.485425342 0.153104592 0.230138915
## Amakhala Addo Gate Mukambi
## Kasane
## Samburu
## Mosiro
## Naivasha
## Kimana
## Soetdoring
## Gariep
## Sandveld
## Parys
## Zinkwazi
## Blythedale Beach
## Kwela
## Anerley
## Letsitele
## Shamwari
## Amakhala
## Addo -0.403846154
## Gate 0.277538297 0.175138852
## Mukambi 0.299324654 0.317343173 0.029970760
## Lilayi 0.281333767 0.247376312 0.184278351 0.244705882
## Livingstone 0.313798220 0.204834606 0.242188319 0.265708368
## Lilayi
## Kasane
## Samburu
## Mosiro
## Naivasha
## Kimana
## Soetdoring
## Gariep
## Sandveld
## Parys
## Zinkwazi
## Blythedale Beach
## Kwela
## Anerley
## Letsitele
## Shamwari
## Amakhala
## Addo
## Gate
## Mukambi
## Lilayi
## Livingstone 0.145811252
Now that we have that, we can create a heatmap to visualize FST using {ggplot2}. The output that {hierfstat} gave us is a dist class file, so first we will use the {reshape2} package to make it a matrix.
library(reshape2)
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
NPYFst <- as.matrix(NPYFst)
NPYFst
## Unknown Kasane Samburu Mosiro Naivasha
## Unknown 0.000000000 0.11504425 0.26315789 -0.07581227 -0.05263158
## Kasane 0.115044248 0.00000000 0.36111111 -0.22270742 -0.10588235
## Samburu 0.263157895 0.36111111 0.00000000 0.00000000 0.00000000
## Mosiro -0.075812274 -0.22270742 0.00000000 0.00000000 0.00000000
## Naivasha -0.052631579 -0.10588235 0.00000000 0.00000000 0.00000000
## Kimana 0.056250000 0.14457831 0.00000000 0.00000000 0.00000000
## Soetdoring 0.009736330 0.03361162 0.08930426 -0.14230272 -0.04728132
## Gariep 0.133840051 0.08188843 0.19641388 0.00122399 0.02097274
## Sandveld 0.093729589 0.16466215 0.08419498 0.02122642 0.03369481
## Parys 0.185840708 0.14634146 0.00000000 0.00000000 0.00000000
## Zinkwazi 0.196373057 0.19017404 0.28883629 0.14139403 0.16330106
## Blythedale Beach 0.194668391 0.24360812 0.26716738 0.18033439 0.19626168
## Kwela 0.178505705 0.24893818 0.13135985 0.13063477 0.11567164
## Anerley 0.542288557 0.57425743 0.00000000 0.00000000 0.00000000
## Letsitele 0.004739336 -0.02645503 -0.01449275 -0.26838966 -0.22285714
## Shamwari 0.121543920 0.16879441 0.17359098 0.05529078 0.10121816
## Amakhala 0.253581856 0.23968777 0.33718779 0.20627063 0.24661811
## Addo 0.139175258 0.03658537 0.00000000 0.00000000 0.00000000
## Gate 0.186662784 0.02941176 0.47129477 0.04292779 0.18512720
## Mukambi 0.184128848 0.03212940 0.57063712 0.11235955 0.28953229
## Lilayi 0.165137615 0.13541667 0.48599269 0.12778905 0.22912966
## Livingstone 0.200241209 0.20004750 0.29952644 0.18493151 0.18339416
## Kimana Soetdoring Gariep Sandveld
## Unknown 0.056250000 0.009736330 0.133840051 0.093729589
## Kasane 0.144578313 0.033611623 0.081888429 0.164662147
## Samburu 0.000000000 0.089304258 0.196413884 0.084194978
## Mosiro 0.000000000 -0.142302717 0.001223990 0.021226415
## Naivasha 0.000000000 -0.047281324 0.020972735 0.033694810
## Kimana 0.000000000 -0.112640801 0.107980367 0.007148531
## Soetdoring -0.112640801 0.000000000 0.049820223 -0.014263698
## Gariep 0.107980367 0.049820223 0.000000000 0.084685104
## Sandveld 0.007148531 -0.014263698 0.084685104 0.000000000
## Parys 0.000000000 0.004032258 0.087673689 -0.003388554
## Zinkwazi 0.189121712 0.108393720 0.006833853 0.049092518
## Blythedale Beach 0.237656595 0.109523105 0.110862011 0.052600473
## Kwela 0.180878553 0.074468085 0.117444427 -0.006312603
## Anerley 0.000000000 0.358974359 0.372542659 0.204768125
## Letsitele -0.023771791 -0.082713478 0.099833145 0.086255699
## Shamwari 0.052869739 0.079412065 0.190474410 0.206980034
## Amakhala 0.276394422 0.155108973 0.243288045 0.306992139
## Addo 0.000000000 -0.144078144 0.032539221 0.069264069
## Gate 0.317603171 0.136553112 0.177410667 0.257716535
## Mukambi 0.368627451 0.097087379 0.167560133 0.199585147
## Lilayi 0.389121339 0.038212981 0.058559051 0.091380652
## Livingstone 0.269731489 0.110080186 0.042914770 0.091019417
## Parys Zinkwazi Blythedale Beach Kwela
## Unknown 0.185840708 0.196373057 0.19466839 0.178505705
## Kasane 0.146341463 0.190174039 0.24360812 0.248938178
## Samburu 0.000000000 0.288836292 0.26716738 0.131359852
## Mosiro 0.000000000 0.141394026 0.18033439 0.130634775
## Naivasha 0.000000000 0.163301060 0.19626168 0.115671642
## Kimana 0.000000000 0.189121712 0.23765660 0.180878553
## Soetdoring 0.004032258 0.108393720 0.10952311 0.074468085
## Gariep 0.087673689 0.006833853 0.11086201 0.117444427
## Sandveld -0.003388554 0.049092518 0.05260047 -0.006312603
## Parys 0.000000000 0.209669142 0.09919355 -0.035502959
## Zinkwazi 0.209669142 0.000000000 0.09777641 0.135299625
## Blythedale Beach 0.099193548 0.097776406 0.00000000 -0.030628501
## Kwela -0.035502959 0.135299625 -0.03062850 0.000000000
## Anerley 0.000000000 0.434503361 0.24774775 0.109375000
## Letsitele 0.026282854 0.202975558 0.14371220 0.099875156
## Shamwari 0.125993972 0.267888255 0.26398699 0.239948224
## Amakhala 0.186934673 0.363450417 0.34677419 0.343491480
## Addo 0.000000000 0.229097857 0.20876289 0.124452235
## Gate 0.359645404 0.299693252 0.31638569 0.366388723
## Mukambi 0.414464534 0.271457511 0.28707613 0.325503356
## Lilayi 0.243027888 0.135773318 0.14606254 0.142677298
## Livingstone 0.248466258 0.067273054 0.07279849 0.114726972
## Anerley Letsitele Shamwari Amakhala Addo
## Unknown 0.5422886 0.004739336 0.12154392 0.25358186 0.13917526
## Kasane 0.5742574 -0.026455026 0.16879441 0.23968777 0.03658537
## Samburu 0.0000000 -0.014492754 0.17359098 0.33718779 0.00000000
## Mosiro 0.0000000 -0.268389662 0.05529078 0.20627063 0.00000000
## Naivasha 0.0000000 -0.222857143 0.10121816 0.24661811 0.00000000
## Kimana 0.0000000 -0.023771791 0.05286974 0.27639442 0.00000000
## Soetdoring 0.3589744 -0.082713478 0.07941206 0.15510897 -0.14407814
## Gariep 0.3725427 0.099833145 0.19047441 0.24328804 0.03253922
## Sandveld 0.2047681 0.086255699 0.20698003 0.30699214 0.06926407
## Parys 0.0000000 0.026282854 0.12599397 0.18693467 0.00000000
## Zinkwazi 0.4345034 0.202975558 0.26788825 0.36345042 0.22909786
## Blythedale Beach 0.2477477 0.143712198 0.26398699 0.34677419 0.20876289
## Kwela 0.1093750 0.099875156 0.23994822 0.34349148 0.12445223
## Anerley 0.0000000 0.483028721 0.48715384 0.61657459 0.00000000
## Letsitele 0.4830287 0.000000000 0.03485143 0.10354621 -0.44602851
## Shamwari 0.4871538 0.034851430 0.00000000 0.07033635 -0.18058175
## Amakhala 0.6165746 0.103546205 0.07033635 0.00000000 -0.40384615
## Addo 0.0000000 -0.446028513 -0.18058175 -0.40384615 0.00000000
## Gate 0.6708091 0.134426230 0.23089770 0.27753830 0.17513885
## Mukambi 0.7510994 0.105159705 0.19629928 0.29932465 0.31734317
## Lilayi 0.6097068 0.032967033 0.16943969 0.28133377 0.24737631
## Livingstone 0.4854253 0.153104592 0.23013891 0.31379822 0.20483461
## Gate Mukambi Lilayi Livingstone
## Unknown 0.18666278 0.18412885 0.16513761 0.20024121
## Kasane 0.02941176 0.03212940 0.13541667 0.20004750
## Samburu 0.47129477 0.57063712 0.48599269 0.29952644
## Mosiro 0.04292779 0.11235955 0.12778905 0.18493151
## Naivasha 0.18512720 0.28953229 0.22912966 0.18339416
## Kimana 0.31760317 0.36862745 0.38912134 0.26973149
## Soetdoring 0.13655311 0.09708738 0.03821298 0.11008019
## Gariep 0.17741067 0.16756013 0.05855905 0.04291477
## Sandveld 0.25771653 0.19958515 0.09138065 0.09101942
## Parys 0.35964540 0.41446453 0.24302789 0.24846626
## Zinkwazi 0.29969325 0.27145751 0.13577332 0.06727305
## Blythedale Beach 0.31638569 0.28707613 0.14606254 0.07279849
## Kwela 0.36638872 0.32550336 0.14267730 0.11472697
## Anerley 0.67080908 0.75109938 0.60970677 0.48542534
## Letsitele 0.13442623 0.10515971 0.03296703 0.15310459
## Shamwari 0.23089770 0.19629928 0.16943969 0.23013891
## Amakhala 0.27753830 0.29932465 0.28133377 0.31379822
## Addo 0.17513885 0.31734317 0.24737631 0.20483461
## Gate 0.00000000 0.02997076 0.18427835 0.24218832
## Mukambi 0.02997076 0.00000000 0.24470588 0.26570837
## Lilayi 0.18427835 0.24470588 0.00000000 0.14581125
## Livingstone 0.24218832 0.26570837 0.14581125 0.00000000
colnames(NPYFst) <- c("Unknown","Kasane","Samburu","Mosiro","Naivasha","Kimana","Soetdoring","Gariep","Sandveld","Parys","Zinkwazi","Blythedale Beach","Kwela","Anerley","Letsitele","Shamwari","Amakhala","Addo","Gate","Mukambi","Lilayi","Livingstone") #rename columns from numbers to population names.
rownames(NPYFst) <- c("Unknown","Kasane","Samburu","Mosiro","Naivasha","Kimana","Soetdoring","Gariep","Sandveld","Parys","Zinkwazi","Blythedale Beach","Kwela","Anerley","Letsitele","Shamwari","Amakhala","Addo","Gate","Mukambi","Lilayi","Livingstone") #rename rows from numbers to population names.
NPYFst
## Unknown Kasane Samburu Mosiro Naivasha
## Unknown 0.000000000 0.11504425 0.26315789 -0.07581227 -0.05263158
## Kasane 0.115044248 0.00000000 0.36111111 -0.22270742 -0.10588235
## Samburu 0.263157895 0.36111111 0.00000000 0.00000000 0.00000000
## Mosiro -0.075812274 -0.22270742 0.00000000 0.00000000 0.00000000
## Naivasha -0.052631579 -0.10588235 0.00000000 0.00000000 0.00000000
## Kimana 0.056250000 0.14457831 0.00000000 0.00000000 0.00000000
## Soetdoring 0.009736330 0.03361162 0.08930426 -0.14230272 -0.04728132
## Gariep 0.133840051 0.08188843 0.19641388 0.00122399 0.02097274
## Sandveld 0.093729589 0.16466215 0.08419498 0.02122642 0.03369481
## Parys 0.185840708 0.14634146 0.00000000 0.00000000 0.00000000
## Zinkwazi 0.196373057 0.19017404 0.28883629 0.14139403 0.16330106
## Blythedale Beach 0.194668391 0.24360812 0.26716738 0.18033439 0.19626168
## Kwela 0.178505705 0.24893818 0.13135985 0.13063477 0.11567164
## Anerley 0.542288557 0.57425743 0.00000000 0.00000000 0.00000000
## Letsitele 0.004739336 -0.02645503 -0.01449275 -0.26838966 -0.22285714
## Shamwari 0.121543920 0.16879441 0.17359098 0.05529078 0.10121816
## Amakhala 0.253581856 0.23968777 0.33718779 0.20627063 0.24661811
## Addo 0.139175258 0.03658537 0.00000000 0.00000000 0.00000000
## Gate 0.186662784 0.02941176 0.47129477 0.04292779 0.18512720
## Mukambi 0.184128848 0.03212940 0.57063712 0.11235955 0.28953229
## Lilayi 0.165137615 0.13541667 0.48599269 0.12778905 0.22912966
## Livingstone 0.200241209 0.20004750 0.29952644 0.18493151 0.18339416
## Kimana Soetdoring Gariep Sandveld
## Unknown 0.056250000 0.009736330 0.133840051 0.093729589
## Kasane 0.144578313 0.033611623 0.081888429 0.164662147
## Samburu 0.000000000 0.089304258 0.196413884 0.084194978
## Mosiro 0.000000000 -0.142302717 0.001223990 0.021226415
## Naivasha 0.000000000 -0.047281324 0.020972735 0.033694810
## Kimana 0.000000000 -0.112640801 0.107980367 0.007148531
## Soetdoring -0.112640801 0.000000000 0.049820223 -0.014263698
## Gariep 0.107980367 0.049820223 0.000000000 0.084685104
## Sandveld 0.007148531 -0.014263698 0.084685104 0.000000000
## Parys 0.000000000 0.004032258 0.087673689 -0.003388554
## Zinkwazi 0.189121712 0.108393720 0.006833853 0.049092518
## Blythedale Beach 0.237656595 0.109523105 0.110862011 0.052600473
## Kwela 0.180878553 0.074468085 0.117444427 -0.006312603
## Anerley 0.000000000 0.358974359 0.372542659 0.204768125
## Letsitele -0.023771791 -0.082713478 0.099833145 0.086255699
## Shamwari 0.052869739 0.079412065 0.190474410 0.206980034
## Amakhala 0.276394422 0.155108973 0.243288045 0.306992139
## Addo 0.000000000 -0.144078144 0.032539221 0.069264069
## Gate 0.317603171 0.136553112 0.177410667 0.257716535
## Mukambi 0.368627451 0.097087379 0.167560133 0.199585147
## Lilayi 0.389121339 0.038212981 0.058559051 0.091380652
## Livingstone 0.269731489 0.110080186 0.042914770 0.091019417
## Parys Zinkwazi Blythedale Beach Kwela
## Unknown 0.185840708 0.196373057 0.19466839 0.178505705
## Kasane 0.146341463 0.190174039 0.24360812 0.248938178
## Samburu 0.000000000 0.288836292 0.26716738 0.131359852
## Mosiro 0.000000000 0.141394026 0.18033439 0.130634775
## Naivasha 0.000000000 0.163301060 0.19626168 0.115671642
## Kimana 0.000000000 0.189121712 0.23765660 0.180878553
## Soetdoring 0.004032258 0.108393720 0.10952311 0.074468085
## Gariep 0.087673689 0.006833853 0.11086201 0.117444427
## Sandveld -0.003388554 0.049092518 0.05260047 -0.006312603
## Parys 0.000000000 0.209669142 0.09919355 -0.035502959
## Zinkwazi 0.209669142 0.000000000 0.09777641 0.135299625
## Blythedale Beach 0.099193548 0.097776406 0.00000000 -0.030628501
## Kwela -0.035502959 0.135299625 -0.03062850 0.000000000
## Anerley 0.000000000 0.434503361 0.24774775 0.109375000
## Letsitele 0.026282854 0.202975558 0.14371220 0.099875156
## Shamwari 0.125993972 0.267888255 0.26398699 0.239948224
## Amakhala 0.186934673 0.363450417 0.34677419 0.343491480
## Addo 0.000000000 0.229097857 0.20876289 0.124452235
## Gate 0.359645404 0.299693252 0.31638569 0.366388723
## Mukambi 0.414464534 0.271457511 0.28707613 0.325503356
## Lilayi 0.243027888 0.135773318 0.14606254 0.142677298
## Livingstone 0.248466258 0.067273054 0.07279849 0.114726972
## Anerley Letsitele Shamwari Amakhala Addo
## Unknown 0.5422886 0.004739336 0.12154392 0.25358186 0.13917526
## Kasane 0.5742574 -0.026455026 0.16879441 0.23968777 0.03658537
## Samburu 0.0000000 -0.014492754 0.17359098 0.33718779 0.00000000
## Mosiro 0.0000000 -0.268389662 0.05529078 0.20627063 0.00000000
## Naivasha 0.0000000 -0.222857143 0.10121816 0.24661811 0.00000000
## Kimana 0.0000000 -0.023771791 0.05286974 0.27639442 0.00000000
## Soetdoring 0.3589744 -0.082713478 0.07941206 0.15510897 -0.14407814
## Gariep 0.3725427 0.099833145 0.19047441 0.24328804 0.03253922
## Sandveld 0.2047681 0.086255699 0.20698003 0.30699214 0.06926407
## Parys 0.0000000 0.026282854 0.12599397 0.18693467 0.00000000
## Zinkwazi 0.4345034 0.202975558 0.26788825 0.36345042 0.22909786
## Blythedale Beach 0.2477477 0.143712198 0.26398699 0.34677419 0.20876289
## Kwela 0.1093750 0.099875156 0.23994822 0.34349148 0.12445223
## Anerley 0.0000000 0.483028721 0.48715384 0.61657459 0.00000000
## Letsitele 0.4830287 0.000000000 0.03485143 0.10354621 -0.44602851
## Shamwari 0.4871538 0.034851430 0.00000000 0.07033635 -0.18058175
## Amakhala 0.6165746 0.103546205 0.07033635 0.00000000 -0.40384615
## Addo 0.0000000 -0.446028513 -0.18058175 -0.40384615 0.00000000
## Gate 0.6708091 0.134426230 0.23089770 0.27753830 0.17513885
## Mukambi 0.7510994 0.105159705 0.19629928 0.29932465 0.31734317
## Lilayi 0.6097068 0.032967033 0.16943969 0.28133377 0.24737631
## Livingstone 0.4854253 0.153104592 0.23013891 0.31379822 0.20483461
## Gate Mukambi Lilayi Livingstone
## Unknown 0.18666278 0.18412885 0.16513761 0.20024121
## Kasane 0.02941176 0.03212940 0.13541667 0.20004750
## Samburu 0.47129477 0.57063712 0.48599269 0.29952644
## Mosiro 0.04292779 0.11235955 0.12778905 0.18493151
## Naivasha 0.18512720 0.28953229 0.22912966 0.18339416
## Kimana 0.31760317 0.36862745 0.38912134 0.26973149
## Soetdoring 0.13655311 0.09708738 0.03821298 0.11008019
## Gariep 0.17741067 0.16756013 0.05855905 0.04291477
## Sandveld 0.25771653 0.19958515 0.09138065 0.09101942
## Parys 0.35964540 0.41446453 0.24302789 0.24846626
## Zinkwazi 0.29969325 0.27145751 0.13577332 0.06727305
## Blythedale Beach 0.31638569 0.28707613 0.14606254 0.07279849
## Kwela 0.36638872 0.32550336 0.14267730 0.11472697
## Anerley 0.67080908 0.75109938 0.60970677 0.48542534
## Letsitele 0.13442623 0.10515971 0.03296703 0.15310459
## Shamwari 0.23089770 0.19629928 0.16943969 0.23013891
## Amakhala 0.27753830 0.29932465 0.28133377 0.31379822
## Addo 0.17513885 0.31734317 0.24737631 0.20483461
## Gate 0.00000000 0.02997076 0.18427835 0.24218832
## Mukambi 0.02997076 0.00000000 0.24470588 0.26570837
## Lilayi 0.18427835 0.24470588 0.00000000 0.14581125
## Livingstone 0.24218832 0.26570837 0.14581125 0.00000000
melted_fst <- melt(NPYFst) #melt the data to be used for a heatmap
melted_fst
## Var1 Var2 value
## 1 Unknown Unknown 0.000000000
## 2 Kasane Unknown 0.115044248
## 3 Samburu Unknown 0.263157895
## 4 Mosiro Unknown -0.075812274
## 5 Naivasha Unknown -0.052631579
## 6 Kimana Unknown 0.056250000
## 7 Soetdoring Unknown 0.009736330
## 8 Gariep Unknown 0.133840051
## 9 Sandveld Unknown 0.093729589
## 10 Parys Unknown 0.185840708
## 11 Zinkwazi Unknown 0.196373057
## 12 Blythedale Beach Unknown 0.194668391
## 13 Kwela Unknown 0.178505705
## 14 Anerley Unknown 0.542288557
## 15 Letsitele Unknown 0.004739336
## 16 Shamwari Unknown 0.121543920
## 17 Amakhala Unknown 0.253581856
## 18 Addo Unknown 0.139175258
## 19 Gate Unknown 0.186662784
## 20 Mukambi Unknown 0.184128848
## 21 Lilayi Unknown 0.165137615
## 22 Livingstone Unknown 0.200241209
## 23 Unknown Kasane 0.115044248
## 24 Kasane Kasane 0.000000000
## 25 Samburu Kasane 0.361111111
## 26 Mosiro Kasane -0.222707424
## 27 Naivasha Kasane -0.105882353
## 28 Kimana Kasane 0.144578313
## 29 Soetdoring Kasane 0.033611623
## 30 Gariep Kasane 0.081888429
## 31 Sandveld Kasane 0.164662147
## 32 Parys Kasane 0.146341463
## 33 Zinkwazi Kasane 0.190174039
## 34 Blythedale Beach Kasane 0.243608119
## 35 Kwela Kasane 0.248938178
## 36 Anerley Kasane 0.574257426
## 37 Letsitele Kasane -0.026455026
## 38 Shamwari Kasane 0.168794412
## 39 Amakhala Kasane 0.239687767
## 40 Addo Kasane 0.036585366
## 41 Gate Kasane 0.029411765
## 42 Mukambi Kasane 0.032129404
## 43 Lilayi Kasane 0.135416667
## 44 Livingstone Kasane 0.200047498
## 45 Unknown Samburu 0.263157895
## 46 Kasane Samburu 0.361111111
## 47 Samburu Samburu 0.000000000
## 48 Mosiro Samburu 0.000000000
## 49 Naivasha Samburu 0.000000000
## 50 Kimana Samburu 0.000000000
## 51 Soetdoring Samburu 0.089304258
## 52 Gariep Samburu 0.196413884
## 53 Sandveld Samburu 0.084194978
## 54 Parys Samburu 0.000000000
## 55 Zinkwazi Samburu 0.288836292
## 56 Blythedale Beach Samburu 0.267167382
## 57 Kwela Samburu 0.131359852
## 58 Anerley Samburu 0.000000000
## 59 Letsitele Samburu -0.014492754
## 60 Shamwari Samburu 0.173590982
## 61 Amakhala Samburu 0.337187789
## 62 Addo Samburu 0.000000000
## 63 Gate Samburu 0.471294766
## 64 Mukambi Samburu 0.570637119
## 65 Lilayi Samburu 0.485992692
## 66 Livingstone Samburu 0.299526440
## 67 Unknown Mosiro -0.075812274
## 68 Kasane Mosiro -0.222707424
## 69 Samburu Mosiro 0.000000000
## 70 Mosiro Mosiro 0.000000000
## 71 Naivasha Mosiro 0.000000000
## 72 Kimana Mosiro 0.000000000
## 73 Soetdoring Mosiro -0.142302717
## 74 Gariep Mosiro 0.001223990
## 75 Sandveld Mosiro 0.021226415
## 76 Parys Mosiro 0.000000000
## 77 Zinkwazi Mosiro 0.141394026
## 78 Blythedale Beach Mosiro 0.180334395
## 79 Kwela Mosiro 0.130634775
## 80 Anerley Mosiro 0.000000000
## 81 Letsitele Mosiro -0.268389662
## 82 Shamwari Mosiro 0.055290785
## 83 Amakhala Mosiro 0.206270627
## 84 Addo Mosiro 0.000000000
## 85 Gate Mosiro 0.042927794
## 86 Mukambi Mosiro 0.112359551
## 87 Lilayi Mosiro 0.127789047
## 88 Livingstone Mosiro 0.184931507
## 89 Unknown Naivasha -0.052631579
## 90 Kasane Naivasha -0.105882353
## 91 Samburu Naivasha 0.000000000
## 92 Mosiro Naivasha 0.000000000
## 93 Naivasha Naivasha 0.000000000
## 94 Kimana Naivasha 0.000000000
## 95 Soetdoring Naivasha -0.047281324
## 96 Gariep Naivasha 0.020972735
## 97 Sandveld Naivasha 0.033694810
## 98 Parys Naivasha 0.000000000
## 99 Zinkwazi Naivasha 0.163301060
## 100 Blythedale Beach Naivasha 0.196261682
## 101 Kwela Naivasha 0.115671642
## 102 Anerley Naivasha 0.000000000
## 103 Letsitele Naivasha -0.222857143
## 104 Shamwari Naivasha 0.101218162
## 105 Amakhala Naivasha 0.246618106
## 106 Addo Naivasha 0.000000000
## 107 Gate Naivasha 0.185127202
## 108 Mukambi Naivasha 0.289532294
## 109 Lilayi Naivasha 0.229129663
## 110 Livingstone Naivasha 0.183394161
## 111 Unknown Kimana 0.056250000
## 112 Kasane Kimana 0.144578313
## 113 Samburu Kimana 0.000000000
## 114 Mosiro Kimana 0.000000000
## 115 Naivasha Kimana 0.000000000
## 116 Kimana Kimana 0.000000000
## 117 Soetdoring Kimana -0.112640801
## 118 Gariep Kimana 0.107980367
## 119 Sandveld Kimana 0.007148531
## 120 Parys Kimana 0.000000000
## 121 Zinkwazi Kimana 0.189121712
## 122 Blythedale Beach Kimana 0.237656595
## 123 Kwela Kimana 0.180878553
## 124 Anerley Kimana 0.000000000
## 125 Letsitele Kimana -0.023771791
## 126 Shamwari Kimana 0.052869739
## 127 Amakhala Kimana 0.276394422
## 128 Addo Kimana 0.000000000
## 129 Gate Kimana 0.317603171
## 130 Mukambi Kimana 0.368627451
## 131 Lilayi Kimana 0.389121339
## 132 Livingstone Kimana 0.269731489
## 133 Unknown Soetdoring 0.009736330
## 134 Kasane Soetdoring 0.033611623
## 135 Samburu Soetdoring 0.089304258
## 136 Mosiro Soetdoring -0.142302717
## 137 Naivasha Soetdoring -0.047281324
## 138 Kimana Soetdoring -0.112640801
## 139 Soetdoring Soetdoring 0.000000000
## 140 Gariep Soetdoring 0.049820223
## 141 Sandveld Soetdoring -0.014263698
## 142 Parys Soetdoring 0.004032258
## 143 Zinkwazi Soetdoring 0.108393720
## 144 Blythedale Beach Soetdoring 0.109523105
## 145 Kwela Soetdoring 0.074468085
## 146 Anerley Soetdoring 0.358974359
## 147 Letsitele Soetdoring -0.082713478
## 148 Shamwari Soetdoring 0.079412065
## 149 Amakhala Soetdoring 0.155108973
## 150 Addo Soetdoring -0.144078144
## 151 Gate Soetdoring 0.136553112
## 152 Mukambi Soetdoring 0.097087379
## 153 Lilayi Soetdoring 0.038212981
## 154 Livingstone Soetdoring 0.110080186
## 155 Unknown Gariep 0.133840051
## 156 Kasane Gariep 0.081888429
## 157 Samburu Gariep 0.196413884
## 158 Mosiro Gariep 0.001223990
## 159 Naivasha Gariep 0.020972735
## 160 Kimana Gariep 0.107980367
## 161 Soetdoring Gariep 0.049820223
## 162 Gariep Gariep 0.000000000
## 163 Sandveld Gariep 0.084685104
## 164 Parys Gariep 0.087673689
## 165 Zinkwazi Gariep 0.006833853
## 166 Blythedale Beach Gariep 0.110862011
## 167 Kwela Gariep 0.117444427
## 168 Anerley Gariep 0.372542659
## 169 Letsitele Gariep 0.099833145
## 170 Shamwari Gariep 0.190474410
## 171 Amakhala Gariep 0.243288045
## 172 Addo Gariep 0.032539221
## 173 Gate Gariep 0.177410667
## 174 Mukambi Gariep 0.167560133
## 175 Lilayi Gariep 0.058559051
## 176 Livingstone Gariep 0.042914770
## 177 Unknown Sandveld 0.093729589
## 178 Kasane Sandveld 0.164662147
## 179 Samburu Sandveld 0.084194978
## 180 Mosiro Sandveld 0.021226415
## 181 Naivasha Sandveld 0.033694810
## 182 Kimana Sandveld 0.007148531
## 183 Soetdoring Sandveld -0.014263698
## 184 Gariep Sandveld 0.084685104
## 185 Sandveld Sandveld 0.000000000
## 186 Parys Sandveld -0.003388554
## 187 Zinkwazi Sandveld 0.049092518
## 188 Blythedale Beach Sandveld 0.052600473
## 189 Kwela Sandveld -0.006312603
## 190 Anerley Sandveld 0.204768125
## 191 Letsitele Sandveld 0.086255699
## 192 Shamwari Sandveld 0.206980034
## 193 Amakhala Sandveld 0.306992139
## 194 Addo Sandveld 0.069264069
## 195 Gate Sandveld 0.257716535
## 196 Mukambi Sandveld 0.199585147
## 197 Lilayi Sandveld 0.091380652
## 198 Livingstone Sandveld 0.091019417
## 199 Unknown Parys 0.185840708
## 200 Kasane Parys 0.146341463
## 201 Samburu Parys 0.000000000
## 202 Mosiro Parys 0.000000000
## 203 Naivasha Parys 0.000000000
## 204 Kimana Parys 0.000000000
## 205 Soetdoring Parys 0.004032258
## 206 Gariep Parys 0.087673689
## 207 Sandveld Parys -0.003388554
## 208 Parys Parys 0.000000000
## 209 Zinkwazi Parys 0.209669142
## 210 Blythedale Beach Parys 0.099193548
## 211 Kwela Parys -0.035502959
## 212 Anerley Parys 0.000000000
## 213 Letsitele Parys 0.026282854
## 214 Shamwari Parys 0.125993972
## 215 Amakhala Parys 0.186934673
## 216 Addo Parys 0.000000000
## 217 Gate Parys 0.359645404
## 218 Mukambi Parys 0.414464534
## 219 Lilayi Parys 0.243027888
## 220 Livingstone Parys 0.248466258
## 221 Unknown Zinkwazi 0.196373057
## 222 Kasane Zinkwazi 0.190174039
## 223 Samburu Zinkwazi 0.288836292
## 224 Mosiro Zinkwazi 0.141394026
## 225 Naivasha Zinkwazi 0.163301060
## 226 Kimana Zinkwazi 0.189121712
## 227 Soetdoring Zinkwazi 0.108393720
## 228 Gariep Zinkwazi 0.006833853
## 229 Sandveld Zinkwazi 0.049092518
## 230 Parys Zinkwazi 0.209669142
## 231 Zinkwazi Zinkwazi 0.000000000
## 232 Blythedale Beach Zinkwazi 0.097776406
## 233 Kwela Zinkwazi 0.135299625
## 234 Anerley Zinkwazi 0.434503361
## 235 Letsitele Zinkwazi 0.202975558
## 236 Shamwari Zinkwazi 0.267888255
## 237 Amakhala Zinkwazi 0.363450417
## 238 Addo Zinkwazi 0.229097857
## 239 Gate Zinkwazi 0.299693252
## 240 Mukambi Zinkwazi 0.271457511
## 241 Lilayi Zinkwazi 0.135773318
## 242 Livingstone Zinkwazi 0.067273054
## 243 Unknown Blythedale Beach 0.194668391
## 244 Kasane Blythedale Beach 0.243608119
## 245 Samburu Blythedale Beach 0.267167382
## 246 Mosiro Blythedale Beach 0.180334395
## 247 Naivasha Blythedale Beach 0.196261682
## 248 Kimana Blythedale Beach 0.237656595
## 249 Soetdoring Blythedale Beach 0.109523105
## 250 Gariep Blythedale Beach 0.110862011
## 251 Sandveld Blythedale Beach 0.052600473
## 252 Parys Blythedale Beach 0.099193548
## 253 Zinkwazi Blythedale Beach 0.097776406
## 254 Blythedale Beach Blythedale Beach 0.000000000
## 255 Kwela Blythedale Beach -0.030628501
## 256 Anerley Blythedale Beach 0.247747748
## 257 Letsitele Blythedale Beach 0.143712198
## 258 Shamwari Blythedale Beach 0.263986992
## 259 Amakhala Blythedale Beach 0.346774194
## 260 Addo Blythedale Beach 0.208762887
## 261 Gate Blythedale Beach 0.316385688
## 262 Mukambi Blythedale Beach 0.287076127
## 263 Lilayi Blythedale Beach 0.146062541
## 264 Livingstone Blythedale Beach 0.072798487
## 265 Unknown Kwela 0.178505705
## 266 Kasane Kwela 0.248938178
## 267 Samburu Kwela 0.131359852
## 268 Mosiro Kwela 0.130634775
## 269 Naivasha Kwela 0.115671642
## 270 Kimana Kwela 0.180878553
## 271 Soetdoring Kwela 0.074468085
## 272 Gariep Kwela 0.117444427
## 273 Sandveld Kwela -0.006312603
## 274 Parys Kwela -0.035502959
## 275 Zinkwazi Kwela 0.135299625
## 276 Blythedale Beach Kwela -0.030628501
## 277 Kwela Kwela 0.000000000
## 278 Anerley Kwela 0.109375000
## 279 Letsitele Kwela 0.099875156
## 280 Shamwari Kwela 0.239948224
## 281 Amakhala Kwela 0.343491480
## 282 Addo Kwela 0.124452235
## 283 Gate Kwela 0.366388723
## 284 Mukambi Kwela 0.325503356
## 285 Lilayi Kwela 0.142677298
## 286 Livingstone Kwela 0.114726972
## 287 Unknown Anerley 0.542288557
## 288 Kasane Anerley 0.574257426
## 289 Samburu Anerley 0.000000000
## 290 Mosiro Anerley 0.000000000
## 291 Naivasha Anerley 0.000000000
## 292 Kimana Anerley 0.000000000
## 293 Soetdoring Anerley 0.358974359
## 294 Gariep Anerley 0.372542659
## 295 Sandveld Anerley 0.204768125
## 296 Parys Anerley 0.000000000
## 297 Zinkwazi Anerley 0.434503361
## 298 Blythedale Beach Anerley 0.247747748
## 299 Kwela Anerley 0.109375000
## 300 Anerley Anerley 0.000000000
## 301 Letsitele Anerley 0.483028721
## 302 Shamwari Anerley 0.487153839
## 303 Amakhala Anerley 0.616574586
## 304 Addo Anerley 0.000000000
## 305 Gate Anerley 0.670809077
## 306 Mukambi Anerley 0.751099384
## 307 Lilayi Anerley 0.609706775
## 308 Livingstone Anerley 0.485425342
## 309 Unknown Letsitele 0.004739336
## 310 Kasane Letsitele -0.026455026
## 311 Samburu Letsitele -0.014492754
## 312 Mosiro Letsitele -0.268389662
## 313 Naivasha Letsitele -0.222857143
## 314 Kimana Letsitele -0.023771791
## 315 Soetdoring Letsitele -0.082713478
## 316 Gariep Letsitele 0.099833145
## 317 Sandveld Letsitele 0.086255699
## 318 Parys Letsitele 0.026282854
## 319 Zinkwazi Letsitele 0.202975558
## 320 Blythedale Beach Letsitele 0.143712198
## 321 Kwela Letsitele 0.099875156
## 322 Anerley Letsitele 0.483028721
## 323 Letsitele Letsitele 0.000000000
## 324 Shamwari Letsitele 0.034851430
## 325 Amakhala Letsitele 0.103546205
## 326 Addo Letsitele -0.446028513
## 327 Gate Letsitele 0.134426230
## 328 Mukambi Letsitele 0.105159705
## 329 Lilayi Letsitele 0.032967033
## 330 Livingstone Letsitele 0.153104592
## 331 Unknown Shamwari 0.121543920
## 332 Kasane Shamwari 0.168794412
## 333 Samburu Shamwari 0.173590982
## 334 Mosiro Shamwari 0.055290785
## 335 Naivasha Shamwari 0.101218162
## 336 Kimana Shamwari 0.052869739
## 337 Soetdoring Shamwari 0.079412065
## 338 Gariep Shamwari 0.190474410
## 339 Sandveld Shamwari 0.206980034
## 340 Parys Shamwari 0.125993972
## 341 Zinkwazi Shamwari 0.267888255
## 342 Blythedale Beach Shamwari 0.263986992
## 343 Kwela Shamwari 0.239948224
## 344 Anerley Shamwari 0.487153839
## 345 Letsitele Shamwari 0.034851430
## 346 Shamwari Shamwari 0.000000000
## 347 Amakhala Shamwari 0.070336354
## 348 Addo Shamwari -0.180581745
## 349 Gate Shamwari 0.230897704
## 350 Mukambi Shamwari 0.196299276
## 351 Lilayi Shamwari 0.169439687
## 352 Livingstone Shamwari 0.230138915
## 353 Unknown Amakhala 0.253581856
## 354 Kasane Amakhala 0.239687767
## 355 Samburu Amakhala 0.337187789
## 356 Mosiro Amakhala 0.206270627
## 357 Naivasha Amakhala 0.246618106
## 358 Kimana Amakhala 0.276394422
## 359 Soetdoring Amakhala 0.155108973
## 360 Gariep Amakhala 0.243288045
## 361 Sandveld Amakhala 0.306992139
## 362 Parys Amakhala 0.186934673
## 363 Zinkwazi Amakhala 0.363450417
## 364 Blythedale Beach Amakhala 0.346774194
## 365 Kwela Amakhala 0.343491480
## 366 Anerley Amakhala 0.616574586
## 367 Letsitele Amakhala 0.103546205
## 368 Shamwari Amakhala 0.070336354
## 369 Amakhala Amakhala 0.000000000
## 370 Addo Amakhala -0.403846154
## 371 Gate Amakhala 0.277538297
## 372 Mukambi Amakhala 0.299324654
## 373 Lilayi Amakhala 0.281333767
## 374 Livingstone Amakhala 0.313798220
## 375 Unknown Addo 0.139175258
## 376 Kasane Addo 0.036585366
## 377 Samburu Addo 0.000000000
## 378 Mosiro Addo 0.000000000
## 379 Naivasha Addo 0.000000000
## 380 Kimana Addo 0.000000000
## 381 Soetdoring Addo -0.144078144
## 382 Gariep Addo 0.032539221
## 383 Sandveld Addo 0.069264069
## 384 Parys Addo 0.000000000
## 385 Zinkwazi Addo 0.229097857
## 386 Blythedale Beach Addo 0.208762887
## 387 Kwela Addo 0.124452235
## 388 Anerley Addo 0.000000000
## 389 Letsitele Addo -0.446028513
## 390 Shamwari Addo -0.180581745
## 391 Amakhala Addo -0.403846154
## 392 Addo Addo 0.000000000
## 393 Gate Addo 0.175138852
## 394 Mukambi Addo 0.317343173
## 395 Lilayi Addo 0.247376312
## 396 Livingstone Addo 0.204834606
## 397 Unknown Gate 0.186662784
## 398 Kasane Gate 0.029411765
## 399 Samburu Gate 0.471294766
## 400 Mosiro Gate 0.042927794
## 401 Naivasha Gate 0.185127202
## 402 Kimana Gate 0.317603171
## 403 Soetdoring Gate 0.136553112
## 404 Gariep Gate 0.177410667
## 405 Sandveld Gate 0.257716535
## 406 Parys Gate 0.359645404
## 407 Zinkwazi Gate 0.299693252
## 408 Blythedale Beach Gate 0.316385688
## 409 Kwela Gate 0.366388723
## 410 Anerley Gate 0.670809077
## 411 Letsitele Gate 0.134426230
## 412 Shamwari Gate 0.230897704
## 413 Amakhala Gate 0.277538297
## 414 Addo Gate 0.175138852
## 415 Gate Gate 0.000000000
## 416 Mukambi Gate 0.029970760
## 417 Lilayi Gate 0.184278351
## 418 Livingstone Gate 0.242188319
## 419 Unknown Mukambi 0.184128848
## 420 Kasane Mukambi 0.032129404
## 421 Samburu Mukambi 0.570637119
## 422 Mosiro Mukambi 0.112359551
## 423 Naivasha Mukambi 0.289532294
## 424 Kimana Mukambi 0.368627451
## 425 Soetdoring Mukambi 0.097087379
## 426 Gariep Mukambi 0.167560133
## 427 Sandveld Mukambi 0.199585147
## 428 Parys Mukambi 0.414464534
## 429 Zinkwazi Mukambi 0.271457511
## 430 Blythedale Beach Mukambi 0.287076127
## 431 Kwela Mukambi 0.325503356
## 432 Anerley Mukambi 0.751099384
## 433 Letsitele Mukambi 0.105159705
## 434 Shamwari Mukambi 0.196299276
## 435 Amakhala Mukambi 0.299324654
## 436 Addo Mukambi 0.317343173
## 437 Gate Mukambi 0.029970760
## 438 Mukambi Mukambi 0.000000000
## 439 Lilayi Mukambi 0.244705882
## 440 Livingstone Mukambi 0.265708368
## 441 Unknown Lilayi 0.165137615
## 442 Kasane Lilayi 0.135416667
## 443 Samburu Lilayi 0.485992692
## 444 Mosiro Lilayi 0.127789047
## 445 Naivasha Lilayi 0.229129663
## 446 Kimana Lilayi 0.389121339
## 447 Soetdoring Lilayi 0.038212981
## 448 Gariep Lilayi 0.058559051
## 449 Sandveld Lilayi 0.091380652
## 450 Parys Lilayi 0.243027888
## 451 Zinkwazi Lilayi 0.135773318
## 452 Blythedale Beach Lilayi 0.146062541
## 453 Kwela Lilayi 0.142677298
## 454 Anerley Lilayi 0.609706775
## 455 Letsitele Lilayi 0.032967033
## 456 Shamwari Lilayi 0.169439687
## 457 Amakhala Lilayi 0.281333767
## 458 Addo Lilayi 0.247376312
## 459 Gate Lilayi 0.184278351
## 460 Mukambi Lilayi 0.244705882
## 461 Lilayi Lilayi 0.000000000
## 462 Livingstone Lilayi 0.145811252
## 463 Unknown Livingstone 0.200241209
## 464 Kasane Livingstone 0.200047498
## 465 Samburu Livingstone 0.299526440
## 466 Mosiro Livingstone 0.184931507
## 467 Naivasha Livingstone 0.183394161
## 468 Kimana Livingstone 0.269731489
## 469 Soetdoring Livingstone 0.110080186
## 470 Gariep Livingstone 0.042914770
## 471 Sandveld Livingstone 0.091019417
## 472 Parys Livingstone 0.248466258
## 473 Zinkwazi Livingstone 0.067273054
## 474 Blythedale Beach Livingstone 0.072798487
## 475 Kwela Livingstone 0.114726972
## 476 Anerley Livingstone 0.485425342
## 477 Letsitele Livingstone 0.153104592
## 478 Shamwari Livingstone 0.230138915
## 479 Amakhala Livingstone 0.313798220
## 480 Addo Livingstone 0.204834606
## 481 Gate Livingstone 0.242188319
## 482 Mukambi Livingstone 0.265708368
## 483 Lilayi Livingstone 0.145811252
## 484 Livingstone Livingstone 0.000000000
library(ggplot2)
ggplot(data = melted_fst, aes(x=Var1, y=Var2, fill=value)) + ylab(NULL) + xlab(NULL) + coord_fixed() +
theme(axis.text=element_text(angle=45,vjust=0.5)) + geom_tile()
#this looks okay, but we'd rather not have the redundance of the same values being shown twice. So we will just use the lower half of the matrix:
get_lower_tri<-function(NPYFst){
NPYFst[lower.tri(NPYFst)] <- NA
return(NPYFst)
}
lower_tri <- get_lower_tri(NPYFst) #Fill half of the repeated values with "NA's"
lower_tri
## Unknown Kasane Samburu Mosiro Naivasha Kimana
## Unknown 0 0.1150442 0.2631579 -0.07581227 -0.05263158 0.0562500
## Kasane NA 0.0000000 0.3611111 -0.22270742 -0.10588235 0.1445783
## Samburu NA NA 0.0000000 0.00000000 0.00000000 0.0000000
## Mosiro NA NA NA 0.00000000 0.00000000 0.0000000
## Naivasha NA NA NA NA 0.00000000 0.0000000
## Kimana NA NA NA NA NA 0.0000000
## Soetdoring NA NA NA NA NA NA
## Gariep NA NA NA NA NA NA
## Sandveld NA NA NA NA NA NA
## Parys NA NA NA NA NA NA
## Zinkwazi NA NA NA NA NA NA
## Blythedale Beach NA NA NA NA NA NA
## Kwela NA NA NA NA NA NA
## Anerley NA NA NA NA NA NA
## Letsitele NA NA NA NA NA NA
## Shamwari NA NA NA NA NA NA
## Amakhala NA NA NA NA NA NA
## Addo NA NA NA NA NA NA
## Gate NA NA NA NA NA NA
## Mukambi NA NA NA NA NA NA
## Lilayi NA NA NA NA NA NA
## Livingstone NA NA NA NA NA NA
## Soetdoring Gariep Sandveld Parys Zinkwazi
## Unknown 0.00973633 0.13384005 0.093729589 0.185840708 0.196373057
## Kasane 0.03361162 0.08188843 0.164662147 0.146341463 0.190174039
## Samburu 0.08930426 0.19641388 0.084194978 0.000000000 0.288836292
## Mosiro -0.14230272 0.00122399 0.021226415 0.000000000 0.141394026
## Naivasha -0.04728132 0.02097274 0.033694810 0.000000000 0.163301060
## Kimana -0.11264080 0.10798037 0.007148531 0.000000000 0.189121712
## Soetdoring 0.00000000 0.04982022 -0.014263698 0.004032258 0.108393720
## Gariep NA 0.00000000 0.084685104 0.087673689 0.006833853
## Sandveld NA NA 0.000000000 -0.003388554 0.049092518
## Parys NA NA NA 0.000000000 0.209669142
## Zinkwazi NA NA NA NA 0.000000000
## Blythedale Beach NA NA NA NA NA
## Kwela NA NA NA NA NA
## Anerley NA NA NA NA NA
## Letsitele NA NA NA NA NA
## Shamwari NA NA NA NA NA
## Amakhala NA NA NA NA NA
## Addo NA NA NA NA NA
## Gate NA NA NA NA NA
## Mukambi NA NA NA NA NA
## Lilayi NA NA NA NA NA
## Livingstone NA NA NA NA NA
## Blythedale Beach Kwela Anerley Letsitele
## Unknown 0.19466839 0.178505705 0.5422886 0.004739336
## Kasane 0.24360812 0.248938178 0.5742574 -0.026455026
## Samburu 0.26716738 0.131359852 0.0000000 -0.014492754
## Mosiro 0.18033439 0.130634775 0.0000000 -0.268389662
## Naivasha 0.19626168 0.115671642 0.0000000 -0.222857143
## Kimana 0.23765660 0.180878553 0.0000000 -0.023771791
## Soetdoring 0.10952311 0.074468085 0.3589744 -0.082713478
## Gariep 0.11086201 0.117444427 0.3725427 0.099833145
## Sandveld 0.05260047 -0.006312603 0.2047681 0.086255699
## Parys 0.09919355 -0.035502959 0.0000000 0.026282854
## Zinkwazi 0.09777641 0.135299625 0.4345034 0.202975558
## Blythedale Beach 0.00000000 -0.030628501 0.2477477 0.143712198
## Kwela NA 0.000000000 0.1093750 0.099875156
## Anerley NA NA 0.0000000 0.483028721
## Letsitele NA NA NA 0.000000000
## Shamwari NA NA NA NA
## Amakhala NA NA NA NA
## Addo NA NA NA NA
## Gate NA NA NA NA
## Mukambi NA NA NA NA
## Lilayi NA NA NA NA
## Livingstone NA NA NA NA
## Shamwari Amakhala Addo Gate Mukambi
## Unknown 0.12154392 0.25358186 0.13917526 0.18666278 0.18412885
## Kasane 0.16879441 0.23968777 0.03658537 0.02941176 0.03212940
## Samburu 0.17359098 0.33718779 0.00000000 0.47129477 0.57063712
## Mosiro 0.05529078 0.20627063 0.00000000 0.04292779 0.11235955
## Naivasha 0.10121816 0.24661811 0.00000000 0.18512720 0.28953229
## Kimana 0.05286974 0.27639442 0.00000000 0.31760317 0.36862745
## Soetdoring 0.07941206 0.15510897 -0.14407814 0.13655311 0.09708738
## Gariep 0.19047441 0.24328804 0.03253922 0.17741067 0.16756013
## Sandveld 0.20698003 0.30699214 0.06926407 0.25771653 0.19958515
## Parys 0.12599397 0.18693467 0.00000000 0.35964540 0.41446453
## Zinkwazi 0.26788825 0.36345042 0.22909786 0.29969325 0.27145751
## Blythedale Beach 0.26398699 0.34677419 0.20876289 0.31638569 0.28707613
## Kwela 0.23994822 0.34349148 0.12445223 0.36638872 0.32550336
## Anerley 0.48715384 0.61657459 0.00000000 0.67080908 0.75109938
## Letsitele 0.03485143 0.10354621 -0.44602851 0.13442623 0.10515971
## Shamwari 0.00000000 0.07033635 -0.18058175 0.23089770 0.19629928
## Amakhala NA 0.00000000 -0.40384615 0.27753830 0.29932465
## Addo NA NA 0.00000000 0.17513885 0.31734317
## Gate NA NA NA 0.00000000 0.02997076
## Mukambi NA NA NA NA 0.00000000
## Lilayi NA NA NA NA NA
## Livingstone NA NA NA NA NA
## Lilayi Livingstone
## Unknown 0.16513761 0.20024121
## Kasane 0.13541667 0.20004750
## Samburu 0.48599269 0.29952644
## Mosiro 0.12778905 0.18493151
## Naivasha 0.22912966 0.18339416
## Kimana 0.38912134 0.26973149
## Soetdoring 0.03821298 0.11008019
## Gariep 0.05855905 0.04291477
## Sandveld 0.09138065 0.09101942
## Parys 0.24302789 0.24846626
## Zinkwazi 0.13577332 0.06727305
## Blythedale Beach 0.14606254 0.07279849
## Kwela 0.14267730 0.11472697
## Anerley 0.60970677 0.48542534
## Letsitele 0.03296703 0.15310459
## Shamwari 0.16943969 0.23013891
## Amakhala 0.28133377 0.31379822
## Addo 0.24737631 0.20483461
## Gate 0.18427835 0.24218832
## Mukambi 0.24470588 0.26570837
## Lilayi 0.00000000 0.14581125
## Livingstone NA 0.00000000
melted_NPY <- melt(lower_tri, na.rm = TRUE) #melt this half of the matrix
ggplot(data = melted_NPY, aes(x=Var1, y=Var2, fill=value)) + theme_minimal() + ylab(NULL) + xlab(NULL) + coord_fixed() + theme(axis.text=element_text(angle=45, vjust=0.5)) + geom_tile(color = "white")
FST matrices and heatmaps are another way of statistically showing us genetic differentiation within NPY between our populations. Our pairwise FST matrix and heatmap helpfully shows us pairwise levels of differentiation between our populations. For example, Mukambi and Anerley populations seem to have very few differences in their NPY regions, while the Addo and Letsitele populations are among the most different (note that these two populations are also the furthest apart in our DAPC/PCA plots).
In order to run analyses similar to structure, we must use this code:
install.packages(c("fields","RColorBrewer","mapplots"),repos = "http://cran.us.r-project.org")
## Installing packages into '/usr4/ugrad/masiedu/R/x86_64-pc-linux-gnu-library/4.1'
## (as 'lib' is unspecified)
BiocManager::install("LEA")
## Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01)
## Warning: package(s) not installed when version(s) same as current; use `force = TRUE` to
## re-install: 'LEA'
## Installation paths not writeable, unable to update packages
## path: /share/pkg.7/r/4.1.2/install/lib64/R/library
## packages:
## actuar, AdaptFitOS, ade4, admisc, affxparser, affy, affyio, affyPLM, akima,
## annotate, AnnotationFilter, AnnotationForge, AnnotationHub, antiword, ape,
## apeglm, aplot, argparse, arrow, ashr, assertive.properties, babelgene,
## batchelor, bayesplot, BBmisc, BDgraph, beachmat, BHC, bigleaf, bigmemory,
## bigsnpr, bigsparser, bigstatsr, Biobase, BiocFileCache, BiocGenerics,
## BiocIO, BiocNeighbors, BiocParallel, BiocSingular, biomaRt, Biostrings,
## biovizBase, blob, bluster, BMA, bnlearn, brew, Brobdingnag, broom,
## BSgenome, C50, Cairo, car, caret, Category, CDM, checkmate, circlize, cli,
## clipr, cluster, commonmark, ComplexHeatmap, conquer, crayon, cubature,
## Cubist, cytolib, DelayedArray, DelayedMatrixStats, densEstBayes,
## densityClust, DEoptimR, desc, DESeq2, deSolve, DiffBind, DNAcopy,
## doParallel, dplyr, DropletUtils, DT, edgeR, ensembldb, EpiModel, evaluate,
## evd, expint, extRemes, fansi, fastGHQuad, ff, fftw, fishpond, fitdistrplus,
## flexclust, float, foreach, formatR, future, future.apply, gam, gamlss,
## gamlss.dist, gcrma, gdsfmt, gdtools, gee, genefilter, geneplotter, GENESIS,
## GenomeInfoDb, GenomeInfoDbData, GenomicAlignments, GenomicRanges,
## geojsonsf, geometry, gert, ggdendro, ggfun, ggplot2, git2r, glmmML,
## glmmTMB, glmnet, glue, gmp, GO.db, GOstats, gower, gplots, graph,
## GreyListChIP, GSEABase, gstat, GSVA, GWASTools, haven, HDF5Array, Hmisc,
## httr, hwriter, igraph, impute, infotheo, IRanges, iterators, jomo,
## jsonlite, KEGGREST, Kendall, kernlab, knitr, ks, lars, lavaan, leiden, lfe,
## lhs, limma, linprog, lme4, LMest, lmtest, locfit, logspline, loo,
## lpsymphony, magic, magrittr, maptools, MASS, MassSpecWavelet, Matching,
## mathjaxr, MatrixGenerics, matrixStats, mbkmeans, MCMCpack, metap, metapod,
## mgcv, minpack.lm, mlapi, MLEcens, mlegp, msigdbr, multtest, networkDynamic,
## nimble, nlme, nloptr, officer, oligo, oligoClasses, openair, OpenMx,
## openssl, pander, parallelly, parsedate, party, pbdZMQ, pcaMethods, pcaPP,
## pdftools, pdist, penalized, peperr, plyr, polspline, polynom,
## preprocessCore, processx, ProtGenerics, ps, pspline, psych, qgraph,
## quanteda, quantmod, quantreg, quantsmooth, qvalue, ragg, RandomFieldsUtils,
## randomForest, RBGL, rbibutils, RColorBrewer, Rcpp, RcppArmadillo,
## RcppEigen, RcppGSL, RCurl, Rdpack, readxl, recipes, REddyProc,
## ResidualMatrix, Rfast, rgdal, rgenoud, Rgraphviz, rhdf5, rhdf5filters,
## Rhdf5lib, Rhtslib, RInside, rjags, rlang, rmarkdown, rmio, rms, robustbase,
## rprojroot, RProtoBufLib, rrcov, Rsamtools, RSpectra, RSQLite, rstan,
## rstanarm, rstantools, Rsubread, rsvg, rtracklayer, Rtsne, S4Vectors, sass,
## ScaledMatrix, scales, scater, scattermore, scran, scuttle, segmented,
## SeqArray, seqminer, SeqVarTools, seriation, sets, Seurat, SeuratObject, sf,
## sfsmisc, shinystan, ShortRead, SHT, SingleCellExperiment, SingleR, SKAT,
## sn, SNPRelate, sp, spaMM, sparseMatrixStats, spatstat.core, spatstat.data,
## spatstat.geom, spatstat.linnet, spatstat.sparse, spatstat.utils, spikeslab,
## splancs, statnet.common, subplex, SummarizedExperiment, survival, sva,
## svglite, systemfonts, systemPipeR, tclust, testthat, text2vec, TH.data,
## tibble, tidygraph, tidyselect, tidytree, tinytex, TMB, tmvtnorm, tseries,
## TSP, tzdb, udunits2, umap, units, unmarked, uuid, V8, VariantAnnotation,
## vctrs, vegan, VGAM, waldo, webshot, withr, xfun, xgboost, XML, XVector,
## yaml, zlibbioc
## Old packages: 'BiocManager', 'LDheatmap'
library(LEA)
##
## Attaching package: 'LEA'
## The following object is masked from 'package:lattice':
##
## barchart
seNPYgeno<-vcf2geno("seNPY.vcf")
##
## - number of detected individuals: 73
## - number of detected loci: 526
##
## For SNP info, please check ./seNPY.vcfsnp.
##
## 0 line(s) were removed because these are not SNPs.
## Please, check ./seNPY.removed file, for more informations.
seNPY.snmf = snmf(seNPYgeno, K = 1:8, ploidy = 2, entropy = T,
alpha = 100, project = "new")
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] 1197416926
## [1] "*************************************"
## [1] "* create.dataset *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -s (seed random init) 1197416926
## -r (percentage of masked data) 0.05
## -x (genotype file in .geno format) /projectnb2/vervpop/masiedu25/seNPY.geno
## -o (output file in .geno format) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
##
## Write genotype file with masked data, /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
## [1] "*************************************"
## [1] "* sNMF K = 1 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 1
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1197416926
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
##
## Main algorithm:
##
## Least-square error: 4211.151346
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 1
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY.snmf/K1/run1/seNPY_r1.1.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.200906
## Cross-Entropy (masked data): 0.246183
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 2 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 2
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1197416926
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=============]
## Number of iterations: 35
##
## Least-square error: 3703.736023
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 2
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY.snmf/K2/run1/seNPY_r1.2.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.169994
## Cross-Entropy (masked data): 0.219773
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 3
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1197416926
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [======]
## Number of iterations: 16
##
## Least-square error: 3328.029911
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 3
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.15091
## Cross-Entropy (masked data): 0.201915
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 4
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1197416926
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [========]
## Number of iterations: 22
##
## Least-square error: 3104.167940
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 4
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY.snmf/K4/run1/seNPY_r1.4.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.139231
## Cross-Entropy (masked data): 0.185925
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 5
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 9787351518
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [========]
## Number of iterations: 22
##
## Least-square error: 2961.967697
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 5
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY.snmf/K5/run1/seNPY_r1.5.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.132443
## Cross-Entropy (masked data): 0.185272
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 6 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 6
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 5492384222
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [============]
## Number of iterations: 32
##
## Least-square error: 2675.056545
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 6
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY.snmf/K6/run1/seNPY_r1.6.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.118842
## Cross-Entropy (masked data): 0.169688
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 7 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 7
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 1197416926
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=====================]
## Number of iterations: 55
##
## Least-square error: 2579.364127
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 7
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY.snmf/K7/run1/seNPY_r1.7.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.113686
## Cross-Entropy (masked data): 0.180468
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 8 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 8
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 44147089886
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==============]
## Number of iterations: 38
##
## Least-square error: 2420.827401
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 8
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY.snmf/K8/run1/seNPY_r1.8.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY.snmf/masked/seNPY_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.106535
## Cross-Entropy (masked data): 0.167179
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
plot(seNPY.snmf, col = "blue4", cex = 1.4, pch = 19)
Here we can see there is a gradual plateau down to 8, which should be the selected value for K. Typically, a gradual slide in values suggests that there is a potentially important role for isolation-by-distance processes in the history of this gene region for vervets (in other words, genetic drift might be a major explanation for the patterns of variation we see in this gene region).
To make things a bit simpler, let’s run a population structure analysis that includes K=3 clusters
seNPY.snmf = snmf(seNPYgeno, K = 3, alpha = 100, project = "new")
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 526
## -K (number of ancestral pops) 3
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 2120019740
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY.geno: OK.
##
##
## Main algorithm:
## [ ]
## [========]
## Number of iterations: 22
##
## Least-square error: 3255.507431
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY.snmf/K3/run1/seNPY_r1.3.G: OK.
##
## The project is saved into :
## seNPY.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY.snmfProject")
qmatrix = Q(seNPY.snmf, K = 3)
And let’s visualize this in a STRUCTURE-type barplot:
barplot(t(qmatrix), col = c("#F8766D","#00BA38","#619CFF"), border = 1, space = 0,
xlab = "Individuals", ylab = "Admixture coefficients", names.arg = pop.data$sample, las = 2, cex.names = 0.5)
Here we can kind of see that the C1 cluster predominates in part of FS North and KwaZulu-Nata along with C2, while the C3 is mostly dominant in Zambia and East Africa and FS South and Eastern Cape along with a bit of C2 in Zambia and East Africa.
Now we can repeat the Population Structure Analysis for NPY Y1!
setPop(NPY1_genlight) <- ~taxon
pca2 <- glPca(NPY1_genlight, nf = 2)
barplot(100*pca2$eig/sum(pca$eig), col = heat.colors(50), main="PCA Eigenvalues",ylab="% of Variance Explained",xlab="Eigenvalues")
This graph will show the amount of variation from the original dataset that is kept in the PCA. This will help us determine the amount of PC’s to keep, usually those that explain the most variation would want to be kept.
Once this is determines, we can graph with ggplot.
pca2.scores <- as.data.frame(pca2$scores)
pca2.scores$taxon <- pop(NPY1_genlight)
p <- ggplot(pca2.scores, aes(x=PC1, y=PC2, colour=taxon)) + geom_point(size=2) + stat_ellipse(level = 0.95, size = 1) + theme_bw()
print(p)
To see if these differences are significant, we can run the multivariate AMOVA:
adonis3 <- adonis(pca2$scores ~ taxon, data = pca2.scores, method='eu', na.rm = TRUE)
adonis3
##
## Call:
## adonis(formula = pca2$scores ~ taxon, data = pca2.scores, method = "eu", na.rm = TRUE)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## taxon 2 110.23 55.113 16.063 0.31458 0.001 ***
## Residuals 70 240.17 3.431 0.68542
## Total 72 350.39 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This shows that there is a significant difference between taxons for the NPY Y1 receptor (p<0.001), and the taxons explain about 31.5% of the geonmic variation according to the PCA.
setPop(NPY1_genlight) <- ~pop
pca2 <- glPca(NPY1_genlight, nf = 2)
barplot(100*pca2$eig/sum(pca$eig), col = heat.colors(50), main="PCA Eigenvalues")
title(ylab="Percent of variance\nexplained", line = 2)
title(xlab="Eigenvalues", line = 1)
This code repeats the process of seeing how much variation from the original data is seen in the PCA for population. This will help us determine what PC’s we should keep.
The results can then be graphed into a ggplot
pca2.scores <- as.data.frame(pca2$scores)
pca2.scores$pop <- pop(NPY1_genlight)
p2 <- ggplot(pca2.scores, aes(x=PC1, y=PC2, colour=pop)) + geom_point(size=2) + stat_ellipse(level = 0.95, size = 1) + theme_bw() + scale_color_manual(values=c("purple4","salmon","dark green","forest green","seagreen1","dodgerblue","green","pink","cyan","red","darkblue"))
print(p2)
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Too few points to calculate an ellipse
## Warning: Removed 6 row(s) containing missing values (geom_path).
ggsave("NPY1_PCA.png",plot=p,width=5.5,height=4,units=c("in"),dpi=300)
Now we can run the multivariate AMOVA using adonis to see if the differences are significant
adonis4 <- adonis(pca2$scores ~ pop, data = pca2.scores, method='eu', na.rm = TRUE)
adonis4
##
## Call:
## adonis(formula = pca2$scores ~ pop, data = pca2.scores, method = "eu", na.rm = TRUE)
##
## Permutation: free
## Number of permutations: 999
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## pop 10 236.76 23.6757 12.917 0.67569 0.001 ***
## Residuals 62 113.64 1.8329 0.32431
## Total 72 350.39 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This shows that there is a significant difference in populations (p<0.001), and population explains about 67.6% of the genomic variance in the NPY Y1 gene region.
Between significant variation found in NPY and NPY receptor Y1 populations, variation was commonly found in an Unknown population as well as the Kasane and Livingstone sub-populations. This may suggest NPY variation, leading to possible differences in BMI and body weight, may be explained by variation in the NPY Y1 receptor of these populations.
Now we can move onto the DAPC population structure analysis!
dapc1 <- dapc(NPY1_genind, n.pca = 2, n.da = 2)
dapc1$grp
## [1] Unknown Unknown Kasane Kasane Lowland
## [6] Lowland Highland Highland Free State Free State
## [11] Free State Free State Free State Free State Free State
## [16] Free State Free State Free State Free State Free State
## [21] Free State Free State Free State Free State Free State
## [26] Free State Free State Free State KwaZulu-Natal KwaZulu-Natal
## [31] KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal
## [36] KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal
## [41] KwaZulu-Natal KwaZulu-Natal KwaZulu-Natal Limpopo Limpopo
## [46] Eastern Cape Eastern Cape Eastern Cape Eastern Cape Eastern Cape
## [51] Eastern Cape Eastern Cape Eastern Cape Eastern Cape Eastern Cape
## [56] Eastern Cape Eastern Cape Kafue Kafue Kafue
## [61] Kafue Kafue Kafue Kafue Kafue
## [66] Kafue Lusaka Lusaka Livingston Livingston
## [71] Livingston Livingston Livingston
## 11 Levels: Unknown Kasane Lowland Highland Free State KwaZulu-Natal ... Livingston
scatter(dapc1, cex = 2, legend = TRUE, clabel = F, posi.leg = "bottomleft", scree.pca2 = TRUE,
posi.pca2 = "topright", cleg = 0.75, col=c("purple4","salmon","dark green","forest green","seagreen1","dodgerblue","green","pink","cyan","red","darkblue"))
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "scree.pca2" is not a
## graphical parameter
## Warning in plot.xy(xy.coords(x, y), type = type, ...): "posi.pca2" is not a
## graphical parameter
Here we can see a bit more clearly that the variation we see in South Africa appears to scale away from the rest of Africa along DA2, and within South Africa along DA1.
From looking at the PCAs and DAPC from NPY Y1, we can make a couple of observations. With the first graph, we see that pygerythrus is separating from hilgerti and cynosuros mainly along PC1 because of a single cluster of pygerythrus. When we look at the sub-populations of each species in a PCA, we see that the cluster of pygerythrus that is separating from the other two species is made up of four sub-populations: Free State, Limpopo, KwaZulu-Natal, and Eastern Cape. It is clear that the rest of the sub-populations are maintained only to hilgerti and cynosuros populations. The high significance of this separation is also shown by our Adonis results. In the DAPC, we again see the Free State, Limpopo, KwaZulu-Natal, and Eastern Cape populations clustering apart from the rest. In both graphs, it appears that there is very little to no overlap of variation between taxons and sub-populations for NPY Y1.
Now we can perform the AMOVA analysis:
library(poppr)
NPY1_genlight2 <- popsub(NPY1_genlight, exclude = c("Limpopo"))
amova2 <- poppr.amova(NPY1_genlight2, ~taxon/pop)
amova2
## $call
## ade4::amova(samples = xtab, distances = xdist, structures = xstruct)
##
## $results
## Df Sum Sq Mean Sq
## Between taxon 2 311.4445 155.72226
## Between pop Within taxon 7 432.8280 61.83258
## Between samples Within pop 61 1030.9528 16.90087
## Within samples 71 1172.5000 16.51408
## Total 141 2947.7254 20.90585
##
## $componentsofcovariance
## Sigma %
## Variations Between taxon 2.9619781 12.8433255
## Variations Between pop Within taxon 3.3929391 14.7119999
## Variations Between samples Within pop 0.1933904 0.8385529
## Variations Within samples 16.5140845 71.6061218
## Total variations 23.0623920 100.0000000
##
## $statphi
## Phi
## Phi-samples-total 0.28393878
## Phi-samples-pop 0.01157508
## Phi-pop-taxon 0.16879946
## Phi-taxon-total 0.12843326
amova.test <- randtest(amova2)
plot(amova.test)
When we run the AMOVA on NPY looking for molecular variance, we can see that 12.8% of the variance is between taxa, 14.7% is between populations within taxa, 0.84% between samples within populations, and the remaining 71.4% is within samples outside these designations. This suggests that our population designations are relatively homogeneous within themselves, suggesting good mixing at that level, but significantly stratified by population and taxon.
library(hierfstat)
wc(NPY1_genind)
## $FST
## [1] 0.2244068
##
## $FIS
## [1] 0.0100416
NPY1Fst <- genet.dist(NPY1_genind[1:73,],method="WC84")
NPY1Fst
## Unknown Kasane Lowland Highland Free State
## Kasane 0.133903134
## Lowland -0.065616798 0.147453083
## Highland -0.063829787 0.157556270 0.016216216
## Free State 0.260688124 0.146034125 0.199827920 0.254480197
## KwaZulu-Natal 0.202412833 0.155824645 0.151510963 0.218634966 0.079864810
## Limpopo 0.135678392 0.108013937 0.052910053 0.183783784 -0.088922312
## Eastern Cape 0.539645291 0.488852380 0.471345589 0.535889346 0.336229791
## Kafue 0.123359760 0.082095762 0.076870378 0.136315790 0.189493503
## Lusaka 0.151041667 0.209302326 0.116883117 0.248677249 0.195942547
## Livingston 0.255438332 0.204191741 0.199184679 0.271502096 0.246229463
## KwaZulu-Natal Limpopo Eastern Cape Kafue Lusaka
## Kasane
## Lowland
## Highland
## Free State
## KwaZulu-Natal
## Limpopo -0.009731088
## Eastern Cape 0.180984767 0.352527298
## Kafue 0.163601101 0.116933297 0.438388424
## Lusaka 0.184555854 0.165165165 0.552815198 0.042428920
## Livingston 0.213107516 0.174884614 0.507877984 0.132050531 0.172304741
Now that we have that, we can create a heatmap to visualize FST using {ggplot2}. The output that {hierfstat} gave us is a dist class file, so first we will use the {reshape2} package to make it a matrix.
library(reshape2)
NPY1Fst <- as.matrix(NPY1Fst)
NPY1Fst
## Unknown Kasane Lowland Highland Free State
## Unknown 0.00000000 0.13390313 -0.06561680 -0.06382979 0.26068812
## Kasane 0.13390313 0.00000000 0.14745308 0.15755627 0.14603413
## Lowland -0.06561680 0.14745308 0.00000000 0.01621622 0.19982792
## Highland -0.06382979 0.15755627 0.01621622 0.00000000 0.25448020
## Free State 0.26068812 0.14603413 0.19982792 0.25448020 0.00000000
## KwaZulu-Natal 0.20241283 0.15582464 0.15151096 0.21863497 0.07986481
## Limpopo 0.13567839 0.10801394 0.05291005 0.18378378 -0.08892231
## Eastern Cape 0.53964529 0.48885238 0.47134559 0.53588935 0.33622979
## Kafue 0.12335976 0.08209576 0.07687038 0.13631579 0.18949350
## Lusaka 0.15104167 0.20930233 0.11688312 0.24867725 0.19594255
## Livingston 0.25543833 0.20419174 0.19918468 0.27150210 0.24622946
## KwaZulu-Natal Limpopo Eastern Cape Kafue Lusaka
## Unknown 0.202412833 0.135678392 0.5396453 0.12335976 0.15104167
## Kasane 0.155824645 0.108013937 0.4888524 0.08209576 0.20930233
## Lowland 0.151510963 0.052910053 0.4713456 0.07687038 0.11688312
## Highland 0.218634966 0.183783784 0.5358893 0.13631579 0.24867725
## Free State 0.079864810 -0.088922312 0.3362298 0.18949350 0.19594255
## KwaZulu-Natal 0.000000000 -0.009731088 0.1809848 0.16360110 0.18455585
## Limpopo -0.009731088 0.000000000 0.3525273 0.11693330 0.16516517
## Eastern Cape 0.180984767 0.352527298 0.0000000 0.43838842 0.55281520
## Kafue 0.163601101 0.116933297 0.4383884 0.00000000 0.04242892
## Lusaka 0.184555854 0.165165165 0.5528152 0.04242892 0.00000000
## Livingston 0.213107516 0.174884614 0.5078780 0.13205053 0.17230474
## Livingston
## Unknown 0.2554383
## Kasane 0.2041917
## Lowland 0.1991847
## Highland 0.2715021
## Free State 0.2462295
## KwaZulu-Natal 0.2131075
## Limpopo 0.1748846
## Eastern Cape 0.5078780
## Kafue 0.1320505
## Lusaka 0.1723047
## Livingston 0.0000000
colnames(NPY1Fst) <- c("Unknown","Kasane","Lowland","Highland","Free State","KwaZulu-Natal","Limpopo","Eastern Cape","Kafue","Lusaka","Livingston") #rename columns from numbers to population names.
rownames(NPY1Fst) <- c("Unknown","Kasane","Lowland","Highland","Free State","KwaZulu-Natal","Limpopo","Eastern Cape","Kafue","Lusaka","Livingston") #rename rows from numbers to population names.
NPY1Fst
## Unknown Kasane Lowland Highland Free State
## Unknown 0.00000000 0.13390313 -0.06561680 -0.06382979 0.26068812
## Kasane 0.13390313 0.00000000 0.14745308 0.15755627 0.14603413
## Lowland -0.06561680 0.14745308 0.00000000 0.01621622 0.19982792
## Highland -0.06382979 0.15755627 0.01621622 0.00000000 0.25448020
## Free State 0.26068812 0.14603413 0.19982792 0.25448020 0.00000000
## KwaZulu-Natal 0.20241283 0.15582464 0.15151096 0.21863497 0.07986481
## Limpopo 0.13567839 0.10801394 0.05291005 0.18378378 -0.08892231
## Eastern Cape 0.53964529 0.48885238 0.47134559 0.53588935 0.33622979
## Kafue 0.12335976 0.08209576 0.07687038 0.13631579 0.18949350
## Lusaka 0.15104167 0.20930233 0.11688312 0.24867725 0.19594255
## Livingston 0.25543833 0.20419174 0.19918468 0.27150210 0.24622946
## KwaZulu-Natal Limpopo Eastern Cape Kafue Lusaka
## Unknown 0.202412833 0.135678392 0.5396453 0.12335976 0.15104167
## Kasane 0.155824645 0.108013937 0.4888524 0.08209576 0.20930233
## Lowland 0.151510963 0.052910053 0.4713456 0.07687038 0.11688312
## Highland 0.218634966 0.183783784 0.5358893 0.13631579 0.24867725
## Free State 0.079864810 -0.088922312 0.3362298 0.18949350 0.19594255
## KwaZulu-Natal 0.000000000 -0.009731088 0.1809848 0.16360110 0.18455585
## Limpopo -0.009731088 0.000000000 0.3525273 0.11693330 0.16516517
## Eastern Cape 0.180984767 0.352527298 0.0000000 0.43838842 0.55281520
## Kafue 0.163601101 0.116933297 0.4383884 0.00000000 0.04242892
## Lusaka 0.184555854 0.165165165 0.5528152 0.04242892 0.00000000
## Livingston 0.213107516 0.174884614 0.5078780 0.13205053 0.17230474
## Livingston
## Unknown 0.2554383
## Kasane 0.2041917
## Lowland 0.1991847
## Highland 0.2715021
## Free State 0.2462295
## KwaZulu-Natal 0.2131075
## Limpopo 0.1748846
## Eastern Cape 0.5078780
## Kafue 0.1320505
## Lusaka 0.1723047
## Livingston 0.0000000
melted_fst1 <- melt(NPY1Fst) #melt the data to be used for a heatmap
melted_fst1
## Var1 Var2 value
## 1 Unknown Unknown 0.000000000
## 2 Kasane Unknown 0.133903134
## 3 Lowland Unknown -0.065616798
## 4 Highland Unknown -0.063829787
## 5 Free State Unknown 0.260688124
## 6 KwaZulu-Natal Unknown 0.202412833
## 7 Limpopo Unknown 0.135678392
## 8 Eastern Cape Unknown 0.539645291
## 9 Kafue Unknown 0.123359760
## 10 Lusaka Unknown 0.151041667
## 11 Livingston Unknown 0.255438332
## 12 Unknown Kasane 0.133903134
## 13 Kasane Kasane 0.000000000
## 14 Lowland Kasane 0.147453083
## 15 Highland Kasane 0.157556270
## 16 Free State Kasane 0.146034125
## 17 KwaZulu-Natal Kasane 0.155824645
## 18 Limpopo Kasane 0.108013937
## 19 Eastern Cape Kasane 0.488852380
## 20 Kafue Kasane 0.082095762
## 21 Lusaka Kasane 0.209302326
## 22 Livingston Kasane 0.204191741
## 23 Unknown Lowland -0.065616798
## 24 Kasane Lowland 0.147453083
## 25 Lowland Lowland 0.000000000
## 26 Highland Lowland 0.016216216
## 27 Free State Lowland 0.199827920
## 28 KwaZulu-Natal Lowland 0.151510963
## 29 Limpopo Lowland 0.052910053
## 30 Eastern Cape Lowland 0.471345589
## 31 Kafue Lowland 0.076870378
## 32 Lusaka Lowland 0.116883117
## 33 Livingston Lowland 0.199184679
## 34 Unknown Highland -0.063829787
## 35 Kasane Highland 0.157556270
## 36 Lowland Highland 0.016216216
## 37 Highland Highland 0.000000000
## 38 Free State Highland 0.254480197
## 39 KwaZulu-Natal Highland 0.218634966
## 40 Limpopo Highland 0.183783784
## 41 Eastern Cape Highland 0.535889346
## 42 Kafue Highland 0.136315790
## 43 Lusaka Highland 0.248677249
## 44 Livingston Highland 0.271502096
## 45 Unknown Free State 0.260688124
## 46 Kasane Free State 0.146034125
## 47 Lowland Free State 0.199827920
## 48 Highland Free State 0.254480197
## 49 Free State Free State 0.000000000
## 50 KwaZulu-Natal Free State 0.079864810
## 51 Limpopo Free State -0.088922312
## 52 Eastern Cape Free State 0.336229791
## 53 Kafue Free State 0.189493503
## 54 Lusaka Free State 0.195942547
## 55 Livingston Free State 0.246229463
## 56 Unknown KwaZulu-Natal 0.202412833
## 57 Kasane KwaZulu-Natal 0.155824645
## 58 Lowland KwaZulu-Natal 0.151510963
## 59 Highland KwaZulu-Natal 0.218634966
## 60 Free State KwaZulu-Natal 0.079864810
## 61 KwaZulu-Natal KwaZulu-Natal 0.000000000
## 62 Limpopo KwaZulu-Natal -0.009731088
## 63 Eastern Cape KwaZulu-Natal 0.180984767
## 64 Kafue KwaZulu-Natal 0.163601101
## 65 Lusaka KwaZulu-Natal 0.184555854
## 66 Livingston KwaZulu-Natal 0.213107516
## 67 Unknown Limpopo 0.135678392
## 68 Kasane Limpopo 0.108013937
## 69 Lowland Limpopo 0.052910053
## 70 Highland Limpopo 0.183783784
## 71 Free State Limpopo -0.088922312
## 72 KwaZulu-Natal Limpopo -0.009731088
## 73 Limpopo Limpopo 0.000000000
## 74 Eastern Cape Limpopo 0.352527298
## 75 Kafue Limpopo 0.116933297
## 76 Lusaka Limpopo 0.165165165
## 77 Livingston Limpopo 0.174884614
## 78 Unknown Eastern Cape 0.539645291
## 79 Kasane Eastern Cape 0.488852380
## 80 Lowland Eastern Cape 0.471345589
## 81 Highland Eastern Cape 0.535889346
## 82 Free State Eastern Cape 0.336229791
## 83 KwaZulu-Natal Eastern Cape 0.180984767
## 84 Limpopo Eastern Cape 0.352527298
## 85 Eastern Cape Eastern Cape 0.000000000
## 86 Kafue Eastern Cape 0.438388424
## 87 Lusaka Eastern Cape 0.552815198
## 88 Livingston Eastern Cape 0.507877984
## 89 Unknown Kafue 0.123359760
## 90 Kasane Kafue 0.082095762
## 91 Lowland Kafue 0.076870378
## 92 Highland Kafue 0.136315790
## 93 Free State Kafue 0.189493503
## 94 KwaZulu-Natal Kafue 0.163601101
## 95 Limpopo Kafue 0.116933297
## 96 Eastern Cape Kafue 0.438388424
## 97 Kafue Kafue 0.000000000
## 98 Lusaka Kafue 0.042428920
## 99 Livingston Kafue 0.132050531
## 100 Unknown Lusaka 0.151041667
## 101 Kasane Lusaka 0.209302326
## 102 Lowland Lusaka 0.116883117
## 103 Highland Lusaka 0.248677249
## 104 Free State Lusaka 0.195942547
## 105 KwaZulu-Natal Lusaka 0.184555854
## 106 Limpopo Lusaka 0.165165165
## 107 Eastern Cape Lusaka 0.552815198
## 108 Kafue Lusaka 0.042428920
## 109 Lusaka Lusaka 0.000000000
## 110 Livingston Lusaka 0.172304741
## 111 Unknown Livingston 0.255438332
## 112 Kasane Livingston 0.204191741
## 113 Lowland Livingston 0.199184679
## 114 Highland Livingston 0.271502096
## 115 Free State Livingston 0.246229463
## 116 KwaZulu-Natal Livingston 0.213107516
## 117 Limpopo Livingston 0.174884614
## 118 Eastern Cape Livingston 0.507877984
## 119 Kafue Livingston 0.132050531
## 120 Lusaka Livingston 0.172304741
## 121 Livingston Livingston 0.000000000
library(ggplot2)
ggplot(data = melted_fst1, aes(x=Var1, y=Var2, fill=value)) + ylab(NULL) + xlab(NULL) + coord_fixed() +
theme(axis.text=element_text(angle=45,vjust=0.5)) + geom_tile()
#this looks okay, but as before we'd rather not have the redundance of the same values being shown twice. So we will just use the lower half of the matrix:
get_lower_tri1<-function(NPY1Fst){
NPY1Fst[lower.tri(NPY1Fst)] <- NA
return(NPY1Fst)
}
lower_tri1 <- get_lower_tri1(NPY1Fst) #Fill half of the repeated values with "NA's"
lower_tri1
## Unknown Kasane Lowland Highland Free State KwaZulu-Natal
## Unknown 0 0.1339031 -0.0656168 -0.06382979 0.2606881 0.20241283
## Kasane NA 0.0000000 0.1474531 0.15755627 0.1460341 0.15582464
## Lowland NA NA 0.0000000 0.01621622 0.1998279 0.15151096
## Highland NA NA NA 0.00000000 0.2544802 0.21863497
## Free State NA NA NA NA 0.0000000 0.07986481
## KwaZulu-Natal NA NA NA NA NA 0.00000000
## Limpopo NA NA NA NA NA NA
## Eastern Cape NA NA NA NA NA NA
## Kafue NA NA NA NA NA NA
## Lusaka NA NA NA NA NA NA
## Livingston NA NA NA NA NA NA
## Limpopo Eastern Cape Kafue Lusaka Livingston
## Unknown 0.135678392 0.5396453 0.12335976 0.15104167 0.2554383
## Kasane 0.108013937 0.4888524 0.08209576 0.20930233 0.2041917
## Lowland 0.052910053 0.4713456 0.07687038 0.11688312 0.1991847
## Highland 0.183783784 0.5358893 0.13631579 0.24867725 0.2715021
## Free State -0.088922312 0.3362298 0.18949350 0.19594255 0.2462295
## KwaZulu-Natal -0.009731088 0.1809848 0.16360110 0.18455585 0.2131075
## Limpopo 0.000000000 0.3525273 0.11693330 0.16516517 0.1748846
## Eastern Cape NA 0.0000000 0.43838842 0.55281520 0.5078780
## Kafue NA NA 0.00000000 0.04242892 0.1320505
## Lusaka NA NA NA 0.00000000 0.1723047
## Livingston NA NA NA NA 0.0000000
melted_NPY1 <- melt(lower_tri1, na.rm = TRUE) #melt this half of the matrix
ggplot(data = melted_NPY1, aes(x=Var1, y=Var2, fill=value)) + theme_minimal() + ylab(NULL) + xlab(NULL) + coord_fixed() + theme(axis.text=element_text(angle=45, vjust=0.5)) + geom_tile(color = "white")
Here we can see that there are very few differences in the Lusaka and Eastern Cape populations for their NPY Y1 regions, but Limpopo and Free State shows that their populations are among the most different
In order to run analyses similar to structure, we must use this code:
install.packages(c("fields","RColorBrewer","mapplots"))
BiocManager::install("LEA")
library(LEA)
seNPY1geno<-vcf2geno("seNPY1.vcf")
##
## - number of detected individuals: 73
## - number of detected loci: 432
##
## For SNP info, please check ./seNPY1.vcfsnp.
##
## 0 line(s) were removed because these are not SNPs.
## Please, check ./seNPY1.removed file, for more informations.
Once again, we can figure out how many clusters we should look at in our sample using cross-entropy criterion:
seNPY1.snmf = snmf(seNPY1geno, K = 1:8, ploidy = 2, entropy = T,
alpha = 100, project = "new")
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] 281110398
## [1] "*************************************"
## [1] "* create.dataset *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -s (seed random init) 281110398
## -r (percentage of masked data) 0.05
## -x (genotype file in .geno format) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -o (output file in .geno format) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
##
## Write genotype file with masked data, /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
## [1] "*************************************"
## [1] "* sNMF K = 1 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 1
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 281110398
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
##
## Main algorithm:
##
## Least-square error: 4547.562109
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 1
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K1/run1/seNPY1_r1.1.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.258672
## Cross-Entropy (masked data): 0.269014
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 2 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 2
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 281110398
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [================]
## Number of iterations: 44
##
## Least-square error: 4031.080744
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 2
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K2/run1/seNPY1_r1.2.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.221146
## Cross-Entropy (masked data): 0.237834
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 3
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 281110398
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=========]
## Number of iterations: 24
##
## Least-square error: 3652.375951
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 3
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.198766
## Cross-Entropy (masked data): 0.223452
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 4 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 4
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 281110398
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [========================]
## Number of iterations: 65
##
## Least-square error: 3466.029958
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 4
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K4/run1/seNPY1_r1.4.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.187517
## Cross-Entropy (masked data): 0.221579
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 5 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 5
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 8871044990
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [===============]
## Number of iterations: 41
##
## Least-square error: 3333.170342
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 5
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K5/run1/seNPY1_r1.5.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.179007
## Cross-Entropy (masked data): 0.223992
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 6 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 6
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 4576077694
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=========================]
## Number of iterations: 68
##
## Least-square error: 3157.853409
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 6
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K6/run1/seNPY1_r1.6.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.166591
## Cross-Entropy (masked data): 0.21531
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 7 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 7
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 281110398
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [=======================]
## Number of iterations: 61
##
## Least-square error: 3080.779990
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 7
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K7/run1/seNPY1_r1.7.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.163003
## Cross-Entropy (masked data): 0.214531
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 8 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 8
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 281110398
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno: OK.
##
##
## Main algorithm:
## [ ]
## [============================]
## Number of iterations: 76
##
## Least-square error: 2816.073589
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.G: OK.
##
## [1] "*************************************"
## [1] "* cross-entropy estimation *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 8
## -x (genotype file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.Q
## -g (ancestral frequencies) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K8/run1/seNPY1_r1.8.G
## -i (with masked genotypes) /projectnb2/vervpop/masiedu25/seNPY1.snmf/masked/seNPY1_I.geno
## - diploid
##
## Cross-Entropy (all data): 0.149025
## Cross-Entropy (masked data): 0.217321
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
plot(seNPY1.snmf, col = "blue4", cex = 1.4, pch = 19)
Since there is not a clear gradual plateau present, it cannot be said that isolation-by-distance processes such as genetic drift explain variation among the NPY Y1 region, and it is possible that other population processes such as selection may be taking place.
To make things a bit simpler, let’s run a population structure analysis that includes K=3 clusters:
seNPY1.snmf = snmf(seNPY1geno, K = 3, alpha = 100, project = "new")
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
##
## [1] "*************************************"
## [1] "* sNMF K = 3 repetition 1 *"
## [1] "*************************************"
## summary of the options:
##
## -n (number of individuals) 73
## -L (number of loci) 432
## -K (number of ancestral pops) 3
## -x (input file) /projectnb2/vervpop/masiedu25/seNPY1.geno
## -q (individual admixture file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q
## -g (ancestral frequencies file) /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G
## -i (number max of iterations) 200
## -a (regularization parameter) 100
## -s (seed random init) 2010044859
## -e (tolerance error) 1E-05
## -p (number of processes) 1
## - diploid
##
## Read genotype file /projectnb2/vervpop/masiedu25/seNPY1.geno: OK.
##
##
## Main algorithm:
## [ ]
## [==========]
## Number of iterations: 26
##
## Least-square error: 3591.243243
## Write individual ancestry coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.Q: OK.
## Write ancestral allele frequency coefficient file /projectnb2/vervpop/masiedu25/seNPY1.snmf/K3/run1/seNPY1_r1.3.G: OK.
##
## The project is saved into :
## seNPY1.snmfProject
##
## To load the project, use:
## project = load.snmfProject("seNPY1.snmfProject")
##
## To remove the project, use:
## remove.snmfProject("seNPY1.snmfProject")
qmatrix1 = Q(seNPY1.snmf, K = 3)
And let’s visualize this in a STRUCTURE-type barplot:
barplot(t(qmatrix1), col = c("#F8766D","#00BA38","#619CFF"), border = 1, space = 0,
xlab = "Individuals", ylab = "Admixture coefficients", names.arg = pop.data$sample, las = 2, cex.names = 0.5)
Here we can kind of see that the C1 and C2 cluster largely predominates in part of FS North and KwaZulu-Nata along, while the C3 is mostly dominant in Zambia and East Africa and a little bit in FS South and Eastern Cape along with a bit of C1 in Zambia and East Africa.
In conclusion, it is clear that there is a significant difference in the variation of gene regions amongst the vervet sub-populations for both NPY and NPY Y1 genes (p<0.001). Though there was a significant difference in variation among taxons for the NPY Y1 receptor gene region (p<0.001), there was not significant difference in taxons for the NPY gene itself. This may imply that generally throughout the vervet species, NPY is highly conserved and therefore integral to their physiological functions. Seeing that the NPY Y1 receptor is integral in the regulation of feeding and body weight regulation, understanding how differences in receptor expression may correlate to overall expression of the NPY gene, and therefore orexigenic processes among the vervet species, may provide insight as to the significance for the type of mechanisms of selection that take place among the species. Analyzing the significance of linkage disequilibrium and the presentation of linkage blocks within both gene regions, which show that there are certainly genetic forces working on the gene regions, provides the space to inquire on why such forces are occurring.
From this research, it can be found that NPY Y1 receptor expresses a significant difference in variation only prevalent within certain taxon and populations. It was found in Mercer et. al. that NPY receptor Y1 knockout is correlated with a decrease in food intake, an increase in body weight, and a decrease in energy expenditure (2011). This shows that in individuals where Y1 variance leads to upregulation of the NPY Y1 gene, there may be a need to control functionality of the NPY gene to result in an increase in energy expenditure and food intake, and a decrease in body weight. In addition to this, the overlap of hypothalamic expression in the between receptor Y1 and Y5 (Mercer 2011) may correlate with the significance of the NPY gene function and its pathways. While this research details the significance of NPY and NPY Y1 gene variation among vervets, the role of variation among other receptors of interest is still unclear. With further analysis of the gene regions for the other prevalent NPY receptors that also commonly exist in humans (NPY Y2 and Y5), we will be able to gain insight as to how variation breaks down within and between vervet taxons and populations.
It is noted that NPY and NPY receptor expression and it’s interaction with corticotropin-releasing hormone (CRH) is linked to regulation of fear, stress and anxiety (Sajdyk 2004). When introduced to stressful or novel environments, some vervet monkeys display defensive or aggressive behavior, while others stop ongoing activities such as feeding and drinking (Elbejjani 2007). In continuation of this research, we may be able to answer why differences in gene expression, variation within the gene region and variation within receptor gene regions occurs in certain populations and taxons through field work analysis of activity, food availability/variability, stressors/external factors, and BMI among vervet populations of interest. This research may be expanded even further by forming understanding how the prevalence of anxiety disorders or other disorders related to stress, fear, and anxiety in human populations may be linked NPY and NPY receptor variation, which therefore correlates to variation of physiological functions related to food intake and body mass.
Abbott, C. R., Small, C. J., Kennedy, A. R., Neary, N. M., Sajedi, A., Ghatei, M. A., & Bloom, S. R. (2005). Blockade of the neuropeptide Y Y2 receptor with the specific antagonist BIIE0246 attenuates the effect of endogenous and exogenous peptide YY (3–36) on food intake. Brain research, 1043(1-2), 139-144.
Dumont, Y., Jacques, D., Bouchard, P., & Quirion, R. (1998). Species differences in the expression and distribution of the neuropeptide Y Y1, Y2, Y4, and Y5 receptors in rodents, guinea pig, and primates brains. Journal of Comparative Neurology, 402(3), 372-384.
Elbejjani, M. (2007). Dissecting anxiety in the vervet monkey: a search for association between polymorphisms in the corticotropin releasing hormone (CRH) and neuropeptide Y (NPY) genes and anxious behavior.
Mercer, R. E., Chee, M. J., & Colmers, W. F. (2011). The role of NPY in hypothalamic mediated food intake. Frontiers in neuroendocrinology, 32(4), 398-415.
Sajdyk, T. J., Shekhar, A., & Gehlert, D. R. (2004). Interactions between NPY and CRF in the amygdala to regulate emotionality. Neuropeptides, 38(4), 225-234.
Schmitt, C. A., Gagnon C. M. (2021). Vervet Selection Pipeline Guide: UCP1 Analysis.
Wahlestedt, C., Yanaihara, N., & Håkanson, R. (1986). Evidence for different pre-and post-junctional receptors for neuropeptide Y and related peptides. Regulatory peptides, 13(3-4), 307-318.
Wang, Q., Bing, C., Al-Barazanji, K., Mossakowaska, D. E., Wang, X. M., McBay, D. L., … & Williams, G. (1997). Interactions between leptin and hypothalamic neuropeptide Y neurons in the control of food intake and energy homeostasis in the rat. Diabetes, 46(3), 335-341.