Autosomal DNA analysis by Tim Janzen
Reposted with permission from Yahoo U106 discussion list on Oct. 5, 2013.
The section on PHASING is pulled from the very end of Tim Janzen’s DROPBOX article included in his comment below. I wanted to put it here so all the additional links are easily available. Links will also be added to the LINKS section of our discussion group.
I am not doing any intensive work with autosomal DNA for my own family, but wanted to provide any of our members with the tools to attempt their own work. I hope this helps.
Autosomal SNPs (and short or long phased autosomal haplotypes) must be carefully tracked back through time to a specific region. That can sometimes be done fairly easily, particularly where the SNPs have medical implications (such as the SNPs that cause lactase persistence, Factor V Leiden thrombophilia, hemochromatosis, etc.). However, in most cases it is a laborious process to trace phased autosomal haplotypes back through time. I think that it is particularly hard to trace phased autosomal haplotypes to a specific location in Europe because people have been moving all over Europe a great deal over the past 7000 years or so. The ideal would be to have frequency maps by country for many relatively short phased autosomal haplotypes (haplotypes between 0.1 cM and 1 cM). I believe that 23andMe and Ancestry.com could generate those if they were willing to, but I suspect that both companies will keep this information locked up in proprietary databases for a long time. I think that the best that any one person can do is to map their own chromosomes as well as they possibly can, using techniques such as those found in the basic guide that Emily Aulicino and I wrote at
Establishing the precise time when any one autosomal SNP first occurred is more challenging. Looking at the regional distribution of that SNP can be helpful in this regard, but for “private” autosomal SNPs we simply don’t have databases large enough to check this. The autosomal SNPs that are included on the larger SNP chips such as the Omni Express chip and Geno 2.0 are likely to be at least 2000 years old.
Phasing is the process of determining which allele values (A, T, C, or G) in an unphased autosomal DNA SNP dataset came from one parent and which came from the other parent. While phasing is not necessary for the purposes of chromosome mapping, it can still be helpful to phase your data.
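For the simplest case, a child whose two parents have both been tested, the idea can be sketched in a few lines of Python. This is only an illustrative sketch and not the logic of any particular tool mentioned in this article. The function names, the rsID labels, and the input layout (two-letter genotype strings such as "AG") are assumptions made for the example.

```python
# Minimal sketch of trio phasing, assuming each genotype is a two-letter
# string such as "AG". All names and sample values here are hypothetical.

VALID = set("ACGT")


def phase_snp(child, mother, father):
    """Return (maternal_allele, paternal_allele) for the child at one SNP.

    Returns None when the SNP cannot be phased: a no-call, a genotype that
    is inconsistent with the parents, or all three people heterozygous.
    """
    for genotype in (child, mother, father):
        if len(genotype) != 2 or not set(genotype) <= VALID:
            return None  # no-call or unexpected value

    a, b = child
    options = set()
    # Try both assignments of the child's two alleles to the two parents.
    for from_mother, from_father in ((a, b), (b, a)):
        if from_mother in mother and from_father in father:
            options.add((from_mother, from_father))

    # Exactly one consistent assignment means the SNP is phased.
    return options.pop() if len(options) == 1 else None


def phase_trio(rows):
    """Phase a list of (snp_id, child, mother, father) tuples."""
    return [(snp_id, phase_snp(c, m, f)) for snp_id, c, m, f in rows]


if __name__ == "__main__":
    sample = [
        ("rs_example_1", "AG", "AA", "AG"),  # A from mother, G from father
        ("rs_example_2", "CT", "TT", "CC"),  # T from mother, C from father
        ("rs_example_3", "AG", "AG", "AG"),  # ambiguous: all heterozygous
        ("rs_example_4", "--", "AA", "AG"),  # no-call in the child
    ]
    for snp_id, result in phase_trio(sample):
        print(snp_id, result)
```

SNPs where the child and both parents are all heterozygous cannot be resolved by this rule alone, which is why the sketch leaves them unphased.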
David Pike has a phasing tool at http://www.math.mun.ca/~dapike/FF23utils/. In March 2011 Tim Janzen wrote a program in Excel that will phase either 23andMe or Family Finder data from two parents and one of their children. That program may be found on Tim Janzen’s Dropbox account at
Instructions on how to use the program may be found at
Tim Janzen has also uploaded a small version of the program that includes sample data from two parents and one of their children for 500 SNPs to give genetic genealogists an example of what the output looks like on a small scale. This data can be found at:
GEDmatch (www.gedmatch.com) is now phasing data as well. However, it does not provide you with the phased data files. Using phased data files for running the comparisons significantly reduces the number of identical by state (IBS) matches that you will receive in your match list. At this time none of the three major DNA companies that perform autosomal DNA tests for genealogical purposes (Family Tree DNA, 23andMe, or AncestryDNA) allows customers to upload phased data files into their databases. To use the GEDmatch phasing tool, see http://ww2.gedmatch.com:8006/autosomal/phase1.php.
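A toy example may help show why phased comparisons produce fewer identical by state matches. The sketch below is an assumption-laden illustration, not a description of how GEDmatch or any testing company actually compares files. The function names and the four-SNP haplotypes are invented for the example, and real comparisons work over much longer runs of SNPs with centimorgan thresholds.

```python
# Toy comparison of unphased versus phased matching over a short run of SNPs.
# All names and data are hypothetical.


def genotypes_from_haplotypes(hap_a, hap_b):
    """Combine two haplotype strings into a list of unphased genotypes."""
    return [a + b for a, b in zip(hap_a, hap_b)]


def half_identical(g1, g2):
    """True if two unphased genotypes share at least one allele."""
    return bool(set(g1) & set(g2))


def unphased_run_matches(person1, person2):
    """True if every SNP in the run is at least half-identical."""
    return all(half_identical(a, b) for a, b in zip(person1, person2))


def phased_run_matches(haps1, haps2):
    """True if some haplotype of one person equals one of the other's."""
    return any(h1 == h2 for h1 in haps1 for h2 in haps2)


# Two people who share no haplotype across this run of four SNPs.
person1_haps = ("AAAA", "GGGG")
person2_haps = ("AGAG", "GAGA")

person1 = genotypes_from_haplotypes(*person1_haps)
person2 = genotypes_from_haplotypes(*person2_haps)

if __name__ == "__main__":
    print("unphased match:", unphased_run_matches(person1, person2))
    print("phased match:  ", phased_run_matches(person1_haps, person2_haps))
```

In this example both people are heterozygous at every SNP, so the unphased comparison sees a continuous half-identical run even though neither of the first person's haplotypes matches either of the second person's. The phased comparison correctly reports no match.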
There is more about phasing at http://www.isogg.org/wiki/Phasing. Information about phasing the X-chromosome is at
© Aulicino and Janzen, August 2013