  • Bill Howard
    Jul 7, 2014
      I just sent this off to the Rootweb Genealogy-DNA site. Comments appreciated.
      From: Bill Howard <weh8@...>
      Subject: Using Y-DNA Haplotypes to Estimate Their Dates of Origin --- Pitfalls and Prospects ---
      Date: July 7, 2014 at 7:55:31 PM EDT

      A year or more ago, I wrote to FTDNA about how my RCC correlation technique could be used to date groups of haplotypes. The reply I received was cordial and suggested that their interests had begun to focus on trying to date SNPs. They pointed out that one might only be able to put a lower limit on the date, since the earliest junction point on a phylogenetic tree refers to a pair of haplotypes, not to the progenitor who first carried the SNP earlier in time. While the date of the earliest pair might be determined, the date of origin of the SNP (the point where the mutation occurred in the progenitor) would be earlier.

      In a paper on dating the origin of the M222 SNP that John McLaughlin and I wrote in 2011 (see: https://dl.dropboxusercontent.com/u/59120192/Genealogy/Papers/M222.pdf), I developed an approach that attempted to estimate a date of origin. That approach came from a realization that the sequence of mutations dating back to the time of origin was very regular and could be predicted by an exponential function that could be fit to the run of junction points on an RCC-dated phylogenetic tree. Coalescence theory predicts that an exponential function will result from the mutations that are occurring at each DYS site along the haplotype string. And that is exactly what we see.
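
      To make the idea of fitting an exponential to a run of junction points concrete, here is a minimal Python sketch. It is not the procedure from the M222 paper: the data are made up, the function names are hypothetical, and reading a date off the fitted curve at a count of 1 is only an illustrative assumption about how such a fit might be extrapolated.

```python
import math

def fit_exponential(times, counts):
    """Least-squares fit of counts ~ a * exp(b * t), done as a straight line in log space."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    b = sxy / sxx                      # slope of log(count) against time
    a = math.exp(mean_y - b * mean_t)  # value of the fitted curve at t = 0
    return a, b

def time_at_count(a, b, target=1.0):
    """Time at which the fitted curve reaches `target` (assumed extrapolation rule)."""
    return math.log(target / a) / b

# Made-up illustration: time before present (years) and the number of
# junction points older than that time. Not real haplotype data.
times = [200, 400, 600, 800, 1000, 1200]
counts = [100 * math.exp(-0.003 * t) for t in times]

a, b = fit_exponential(times, counts)
origin_estimate = time_at_count(a, b, target=1.0)
print(f"a = {a:.2f}, b = {b:.5f}, extrapolated time = {origin_estimate:.0f} years")
```

      With real junction points the fit will not be exact, and the size of the departures from the fitted curve is itself informative about the sample.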

      As a result of a discussion that I was having with a colleague, we both used very different methods to date a very large group of L21-R1b haplotypes that had been SNP-tested. While I used that SNP as an example, the dating process can be applied to any well-defined group of haplotypes. Over the course of the past few months, I have written a paper resulting from many discussions with colleagues. The paper can be found at the following web site:

      The paper emphasizes that you need to carefully prepare samples prior to analysis, paying particular attention to: (1) Type A and Type B outliers and RecLOHs, (2) the relationships between genetic distance and predictions using Poisson statistics, (3) the importance of the sequence of junction points on the dated RCC phylogenetic tree, (4) the statistics of departures from an exponential function that expresses the number of junction points as a function of time, and (5) the purity and biases in the sample. The paper also suggests a way to determine relative ages of SNPs and subclades and points to areas of exploration that require more research, particularly when dating methods by other analysts are not in agreement. No matter how sophisticated a method looks, it might give wrong results if the approach to dating gives insufficient emphasis to important features of the sample being dated.
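
      As a rough illustration of item (2), the sketch below computes Poisson probabilities for the number of mutations separating two haplotypes. It is a textbook simplification, not the treatment in the paper: it assumes mutations occur independently at one constant rate per marker per generation, and the rate, marker count, and generation count are hypothetical values chosen only for the example. Observed genetic distance can fall below the true mutation count when back or parallel mutations occur, which is one reason the two need to be compared with care.

```python
import math

def expected_mutations(generations, markers, rate):
    """Expected mutations separating two haplotypes that each descend
    `generations` generations from a common ancestor (two lines of descent)."""
    return 2 * generations * markers * rate

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical inputs: 37 markers, 0.002 mutations per marker per generation,
# common ancestor 20 generations back.
lam = expected_mutations(generations=20, markers=37, rate=0.002)
for k in range(6):
    print(f"P({k} mutations) = {poisson_pmf(k, lam):.3f}")
```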

      I would be pleased to hear any comments you may have on this effort.

      - Bye from Bill Howard

