Note 11/25/20: I see now this sort of clustering has been done before. I’ll leave this here for now, but will probably follow up using other tools such as those described in this blog post.
https://blog.kittycooper.com/2018/12/more-automated-dna-match-clustering/
GEDmatch shows an interesting pattern of matches between the following kits:
| DNA Kit | Name | Email | GEDCOM |
|---|---|---|---|
| YX8532528 | Richard Seiter | res_bulk@sonic.net | 6145717 |
| A627562 | *Nichols III | slohcinnhoj3@aol.com | 3980728 |
| A781825 | Ginger Carter | ginger.carter@ocps.net | |
| A389959 | Kathryn Hinton Lizenby | kdlizenby@gmail.com | |
| GN4430726 | Kathryn Lizenby | kdlizenby@gmail.com | |
| M101606 | Brian Barnes | brianb@fuse.net | 2221792 |
Kit GN4430726 appears to be a different test of the same person as kit A389959. The latter gives slightly closer matches, so I will focus on kit A389959 below.
The relationship can be seen most easily by looking at kits which match both of the first two kits above. Notice how the other four kits are the first four matches. I am including the rest of the list in case it turns out some of the others can be placed in this pattern as well.
Kits which match both YX8532528 and A627562
There is useful background information at ISOGG Wiki - Autosomal DNA statistics, including this graphic, which I thought was worth showing inline to give perspective on the match lengths seen below.
Below are a series of observations about how these kits match (or not) as a group.
Kit A781825 is the closest match for both kits used to generate the table above. The interesting observation here is the pattern of matching we see in the 3D Chromosome Browser.
3D Chromosome Browser results
This is a bit hard to interpret, but let’s give it a try.
On Chromosome 1 it appears that A781825 has a long stretch of the ancestral version (roughly from 768448 to 21015685) while both of the other kits match nearby but non-overlapping portions of that. For YX8532528 the match is from 768448 to 12626549 and for A627562 the match is from 13782197 to 21015685.
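The "nearby but non-overlapping" reading can be checked directly from the positions quoted above. This is a minimal sketch that only uses those three position ranges; the variable and function names are my own and are not part of any GEDmatch tool.

```python
# Segment positions (base pairs) on chromosome 1, as quoted in the text above.
a781825_stretch = (768448, 21015685)   # approximate long stretch in A781825
yx8532528_match = (768448, 12626549)   # A781825 vs YX8532528
a627562_match = (13782197, 21015685)   # A781825 vs A627562


def overlap(seg_a, seg_b):
    """Return the overlapping (start, end) of two segments, or None if disjoint."""
    start = max(seg_a[0], seg_b[0])
    end = min(seg_a[1], seg_b[1])
    return (start, end) if start <= end else None


# The two matches do not overlap each other.
print(overlap(yx8532528_match, a627562_match))  # None

# The gap between them, in base pairs.
gap_bp = a627562_match[0] - yx8532528_match[1]
print(gap_bp)  # 1155648

# Each match lies entirely inside the longer A781825 stretch.
print(overlap(a781825_stretch, yx8532528_match) == yx8532528_match)  # True
print(overlap(a781825_stretch, a627562_match) == a627562_match)      # True
```

The gap of about 1.16 million base pairs is small relative to either match, which is why the two segments look like neighboring pieces of one longer stretch.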
Chromosome 6 is even more interesting, but harder to interpret. There is a large (about 40 cM) segment where all three kits overlap from 90638963 to 136410563. The kits then vary on how far they extend past either end of that.
I think it is quite the coincidence that all three kits have such a large mutual overlap given the distance of the relationships. It accounts for over half of all of the pairwise match lengths between those three kits.
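The "over half" figure can be reproduced with a rough calculation. This sketch assumes the shared chromosome 6 segment contributes about 40 cM to each of the three pairwise comparisons (the true per-pair length will differ somewhat, since the kits extend past the common region by different amounts) and takes the pairwise totals from the Total cM table later in this post.

```python
# Assumption: the common chromosome 6 region is worth roughly 40 cM in each pair.
shared_segment_cm = 40.0

# Pairwise totals (Total cM) for the three kits, from the table later in this post.
pairwise_total_cm = {
    ("YX8532528", "A627562"): 85.3,
    ("YX8532528", "A781825"): 85.3,
    ("A627562", "A781825"): 68.3,
}

# The shared region is counted once per pair.
shared_cm = shared_segment_cm * len(pairwise_total_cm)
all_cm = sum(pairwise_total_cm.values())
fraction = shared_cm / all_cm

print(f"{shared_cm:.0f} cM of {all_cm:.1f} cM = {fraction:.1%}")
```

With these inputs the shared region comes to about 120 cM of roughly 239 cM, or just over 50%.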
The matches on chromosomes 11, 13, and 14 are relatively simple pairwise matches.
To understand this better it is worth taking a look at the pairwise results for the One-to-One Autosomal DNA Comparison tool.
If we look at comparisons for another pair of kits (YX8532528 and M101606), though, we see something strikingly different (note that I am leaving out GN4430726 as a duplicate). This pair has many more mutual matches than the pair above, so I have truncated the list. What is notable is that M101606 appears not to match A781825 at all (even after I lowered the match threshold to 4 cM!), while A627562 and A389959 are both matches.
Match both YX8532528 and M101606
Based on email discussions with the submitters of kits A627562 and M101606 I believe the following.
My kit YX8532528 is related to kit A627562 as a third cousin through William Perry Nichols and Lucinda Barnes.
My kit YX8532528 is related to kit M101606 as a fourth cousin through John Andrew Barnes (father of Lucinda) but perhaps with a different wife.
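For perspective on these two relationships, here is a minimal sketch of the average amount of DNA that cousins of a given degree are expected to share. It assumes the standard halving-per-generation model and a total of about 6800 cM (counting both copies of each chromosome), which I believe is the figure behind the ISOGG averages; treat both as assumptions. These are averages only, and actual sharing between distant cousins varies widely.

```python
# Assumption: approximate total autosomal length in cM, counting both copies.
TOTAL_CM = 6800.0


def expected_shared_cm(cousin_degree, half=False):
    """Average cM shared by n-th cousins.

    Full cousins descend from an ancestral couple; half cousins share
    only one common ancestor, which halves the expected amount.
    """
    fraction = 0.5 ** (2 * cousin_degree + 1)
    if half:
        fraction /= 2
    return TOTAL_CM * fraction


for label, degree, half in [
    ("third cousin", 3, False),
    ("fourth cousin", 4, False),
    ("half fourth cousin", 4, True),
]:
    print(f"{label}: {expected_shared_cm(degree, half):.1f} cM")
```

This gives roughly 53 cM for third cousins, 13 cM for fourth cousins, and 7 cM for half fourth cousins (the "different wife" case). The observed totals for these pairs in the Total cM table (85.3 cM and 22.8 cM) are both above the averages, which is not unusual at this distance.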
Since Lucinda Barnes Nichols 1840-1919 is a key person here, this is a link to her page on FamilySearch. It requires a free account to view.
https://www.familysearch.org/tree/person/details/KLL1-KZS
Based on the above, I think kit A781825 is matching through William Perry Nichols but not Lucinda Barnes. I am unsure how exactly that is happening (perhaps chance?). I think kit A389959 is matching more through Lucinda Barnes than William Perry Nichols (see previous graphic) due to chance.
It would be nice to sort out these relationships with genealogical evidence.
I think focusing on the matches between M101606 and the other two helps clarify what might be coming from Lucinda Barnes.
3D Chromosome Browser
Chromosome 11 has a three way match similar to the one we saw in the earlier 3D Chromosome Browser plot (perhaps this is not such an uncommon thing then?). The common portion is about 10 cM with varying degrees of overlap on either side.
Chromosome 21 is interesting because A627562 and A389959 have two nearby matches which appear to be separated by an intervening non-match.
There was a chromosome 11 match in the earlier plot as well, so let's look at that with different kits.
3D Chromosome Browser
Chromosome 11 is interesting here in that A627562 has a long section which matches in nearby but non-overlapping fashion with both of the other kits.
Given all of these 3D Chromosome Browser results I think it is likely that the chromosome 6 matches (kits YX8532528, A627562, and A781825) are from William Perry Nichols and the chromosome 11 matches (kits YX8532528, A627562, A389959, and M101606) are from John Andrew Barnes. The one generation difference seems to align with the chromosome 6 matches tending to be longer than those for chromosome 11. Note that chromosome 1 appears to have matches from both lineages in different locations.
This is an attempt to capture all of the pairwise matches (Total cM) between these kits in one graphic. I think the easiest way to do this is to combine the three kit versions from multiple different runs of the 3D chromosome browser. Alternatively, just compile the results from multiple runs of the one to many match tool.
| Kits | YX8532528 | A627562 | A781825 | A389959 | M101606 |
|---|---|---|---|---|---|
| YX8532528 | NA | 85.3 | 85.3 | 41.4 | 22.8 |
| A627562 | 85.3 | NA | 68.3 | 51.6 | 36.6 |
| A781825 | 85.3 | 68.3 | NA | 0.0 | 0.0 |
| A389959 | 41.4 | 51.6 | 0.0 | NA | 31.7 |
| M101606 | 22.8 | 36.6 | 0.0 | 31.7 | NA |
I interpret this as dividing the kits into three groups.
YX8532528 and A627562 - match each other strongly and the others to varying degrees
A781825 - matches YX8532528 and A627562 strongly but the others not at all
A389959 and M101606 - match each other as well as YX8532528 and A627562 to varying degrees, but A781825 not at all
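The same three groups fall out mechanically if each kit is labeled by the set of kits it does not match at all. This is a minimal sketch of that idea using the Total cM table above; it is not how GEDmatch or the automated clustering tools work, just a way to make the grouping rule explicit.

```python
from collections import defaultdict

kits = ["YX8532528", "A627562", "A781825", "A389959", "M101606"]

# Total cM from the table above; None marks the diagonal (NA).
total_cm = [
    [None, 85.3, 85.3, 41.4, 22.8],
    [85.3, None, 68.3, 51.6, 36.6],
    [85.3, 68.3, None, 0.0, 0.0],
    [41.4, 51.6, 0.0, None, 31.7],
    [22.8, 36.6, 0.0, 31.7, None],
]


def group_by_non_matches(kits, matrix):
    """Group kits that fail to match exactly the same set of other kits."""
    groups = defaultdict(list)
    for i, kit in enumerate(kits):
        non_matches = frozenset(
            kits[j] for j, cm in enumerate(matrix[i]) if j != i and cm == 0.0
        )
        groups[non_matches].append(kit)
    return list(groups.values())


for group in group_by_non_matches(kits, total_cm):
    print(group)
# ['YX8532528', 'A627562']
# ['A781825']
# ['A389959', 'M101606']
```

Grouping on zero versus non-zero ignores the match strengths, so it is only a first pass; a small match found at a lower threshold would change the groups.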
Based on this, and on the belief that YX8532528 and A627562 are third cousins through William Perry Nichols and Lucinda Barnes while both are fourth cousins with M101606 through John Andrew Barnes (father of Lucinda Barnes), I speculate that A389959 is on the John Andrew Barnes portion of the tree. Perhaps it descends from a different child of his, which would make another fourth cousin through John Andrew Barnes.
I’m less sure about A781825. The matches with YX8532528 and A627562 seem too close for the common ancestor to be one of William Perry Nichols’ parents (which would have been useful, since we don’t know who they are). But the only alternatives I see are connections more recently in the tree and there would have to be two different connections.
Does anyone have the genealogical data to check this?
The 3D Chromosome Browser is even more powerful than I realized. Here are the results for all five kits at once. The most notable difference from my analysis above is that this reveals a 7.4 cM match between A781825 and A389959.
3D Chromosome Browser All Five 1
It also supports looking at the individual chromosomes. Here are chromosomes 1, 6, and 11 which show the most overlaps above. These are a bit hard to interpret. It helps to compare to the table of overlaps just above this.
3D Chromosome Browser Chromosome 1
3D Chromosome Browser Chromosome 6
3D Chromosome Browser Chromosome 11
Another good use of the 3D Chromosome Browser five-way match shown above is to evaluate how additional kits fit into this pattern. I checked kits matching both YX8532528 and (match graphic at the very top) and found the following which appear to be related. I am attaching a summary plot and some comments, but the results are worth a look for yourself. Since the browser supports 10 kits at once, I did these as groups of five added to our original five. Overall it seems clear that there are more kits associated with our group of five, as well as other subgroups which might bear further inspection.
Here I am using thresholds of 10 cM and 7 cM for total and largest in the two kit match and 7 cM in the 3D Chromosome Browser.
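As a sketch of how those two-kit thresholds act as a filter: the kit labels and cM values below are hypothetical, purely for illustration, and whether GEDmatch treats the thresholds as inclusive is an assumption on my part.

```python
from dataclasses import dataclass


@dataclass
class PairMatch:
    kit: str
    total_cm: float    # total shared cM in the two-kit comparison
    largest_cm: float  # largest single shared segment in cM


def passes_thresholds(match, min_total_cm=10.0, min_largest_cm=7.0):
    """Keep a match only if both the total and the largest segment are big enough."""
    return match.total_cm >= min_total_cm and match.largest_cm >= min_largest_cm


# Hypothetical kits and values, not taken from GEDmatch.
candidates = [
    PairMatch("KIT_A", total_cm=24.1, largest_cm=12.3),  # passes both
    PairMatch("KIT_B", total_cm=11.0, largest_cm=6.2),   # largest segment too short
    PairMatch("KIT_C", total_cm=8.9, largest_cm=8.9),    # total too small
]

kept = [m.kit for m in candidates if passes_thresholds(m)]
print(kept)  # ['KIT_A']
```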
A744731 and FW8556328 look like good candidates to add to the group. Neither of these kits has a GEDCOM file associated with it.
3D Chromosome Browser All Five Plus Five 2
None of these look like candidates for our group.
3D Chromosome Browser All Five Plus Five 3
None of these look like candidates for our group.
Here are the results for the proposed seven kit group: YX8532528, A627562, A781825, A389959, M101606, A744731, and FW8556328. In particular notice the additional matches on chromosomes 1, 6, and 11 which we discussed above.
3D Chromosome Browser Seven Kits 1
3D Chromosome Browser Seven Kits 2
Look at chromosomes 1, 6, and 11 more closely below. Notice various overlapping match areas.
3D Chromosome Browser Chromosome 1
3D Chromosome Browser Chromosome 6
3D Chromosome Browser Chromosome 11
GEDCOM files for the kits discussed in this analysis are linked above. There are currently only three, but by comparing them you can see the obvious matches.
I also uploaded a GEDCOM with all the descendants of John Andrew Barnes in FamilySearch as GEDCOM 6180384
Hopefully this will facilitate looking for GEDmatch GEDCOM matches in this portion of our tree. It is a fairly small file (312 people) so should match quickly and present a minimum of less relevant matches.
Unfortunately, there are 128685 GEDCOMs at GEDmatch and as far as I can tell we are only able to compare starting by the most recent date (see below). Since the comparison times out after a few thousand matches this makes it difficult to compare with more than a fraction of the GEDCOMs they have.
In this case, the only match I saw was my own kit which was uploaded recently. Even with high standards for the match there were a fair number of false positives.
Select Threshold for Individual Name Match: 90% Match (default 75%)
Select Threshold for Place Name Match: 80% Match (default 60%)
Select Threshold for Date (Year) Match: 2 Years (default 5 years)
Regarding “as far as I can tell we are only able to compare starting by the most recent date”
The One to Many GEDCOM match page does have an option to start from earlier dates:
Comparisons are made first to the most recently uploaded GEDCOMs. If you want to start comparisons prior to some other point in time, select a date from this pulldown:
The problem is when you use it and look at the “Date uploaded” on the matches returned it is clearly not working.
I posted about this in the GEDmatch Forums.
My primary focus here is on sites which allow uploading DNA data from other sources.
https://www.familytreedna.com/
Supports free DNA data upload: https://www.familytreedna.com/autosomal-transfer
Reviews
https://blog.genomelink.io/posts/familytreedna-review-by-experts
https://www.myheritage.com/dna/
Supports free DNA data upload: https://www.myheritage.com/dna/upload/802370621
Ancestry does not support uploading DNA data from other sources so I won’t say much about it here. It is notable because it is one of the most popular sites and the source for many of the kits on GEDmatch and in other places.
Here is a link to their site, an FAQ, and a review.
https://www.ancestry.com/dna/
https://www.ancestry.com/dna/en/legal/us/faq
https://www.pcmag.com/reviews/ancestrydna
https://blog.nebula.org/ancestry-review/
This site is interesting because it automates what I have been trying to do above.
This blog post offers a detailed look at their clustering and alternative ways to do it.
https://blog.kittycooper.com/2018/12/more-automated-dna-match-clustering/#more-6528
Also see this video: https://youtu.be/--eruCeJ9_8?t=2540
More clustering. http://geneticaffairs.com/
Licensed to MyHeritage: https://blog.myheritage.com/2019/02/introducing-autoclusters-for-dna-matches/
Use genealogical data to better explain these relationships.
It might be worthwhile to search through other pairwise matches to see if any other kits fit into this pattern, but with shorter matches.
Investigate the following R resources for displaying genealogical relationships here directly.
GENLIB package (Houde et al. 2020)
Also see the associated paper: GENLIB: an R package for the analysis of genealogical data (Gauvin et al. 2015)
ggenealogy package for visualization requires R >= 3.6.0
https://cran.r-project.org/web/packages/ggenealogy/index.html
Reading GEDCOM files.
https://github.com/jjfitz/readgedcom
https://www.r-bloggers.com/2019/07/gedcom-reader-for-the-r-language-analysing-family-history/
https://www.r-bloggers.com/2019/12/can-genealogical-data-be-tidy/
Some useful ideas: https://jasonfitzgerald.netlify.app/2019/09/how-far-does-the-apple-fall/
There is a lot here. Hopefully it is not too overwhelming.
I think the bottom line is that these five kits are all related through a similar portion of our respective trees and this comes across in the DNA matches. It would be good to understand how the genealogy relates to the DNA more completely. It may also be beneficial for us to collaborate in researching the common portions of our trees.
Thanks for reading. I would appreciate any comments or criticisms you might have.
File initially created: Tuesday, November 24, 2020
File knitted: Wed Nov 25 15:22:10 2020
Gauvin, Héloïse, Jean-François Lefebvre, Claudia Moreau, Eve-Marie Lavoie, Damian Labuda, Hélène Vézina, and Marie-Hélène Roy-Gagnon. 2015. “GENLIB: An R Package for the Analysis of Genealogical Data.” BMC Bioinformatics 16 (1). https://doi.org/10.1186/s12859-015-0581-5.
Houde, Louis, Jean-Francois Lefebvre, Valery Roy-Lagace, and Sebastien Lemieux. 2020. GENLIB: Genealogical Data Analysis. https://CRAN.R-project.org/package=GENLIB.