- Dec 30, 2013
Jim,
In today's posting, I did not go into where you start on the tree, only that I have found that the fit is to an exponential. And you agreed to that. So, I do not understand why you think that mentioning that we are dealing with an exponential is SIGNIFICANTLY flawed. That seems a bit harsh to me.
In my work so far, I have seen no significant difference in the analysis whether you sample cuts through the tree at regular intervals or only at the branch points. In fact, in one comparison I made between the two ways of counting, I found that the R-squared of the fit for the cuts at regular intervals was slightly higher than for the branch point cuts, but the difference, although it favors my approach, is not significant.
I also fail to see why you say that the result is significantly different depending on whether you count forward or backward in time. If we differ, I think it is because I believe that the progenitor lived before the TMRCA of the group and you believe that the progenitor is the MRCA. We apparently disagree here. I remember an exchange I had with a staff member at FTDNA a few years ago who seemed to understand what I was doing (dealing only with the TMRCA point), and they stated that the challenge was to date the progenitor of a Haplogroup (or a SNP) who lived before the MRCA. That is what I believe I am doing here. We can only directly determine the earliest branch point, but that is where the mutation occurred, not when the progenitor lived. One of his sons was responsible for the branching.
-- Bye from Bill
Sent from Bill Howard's iPad
On Dec 30, 2013, at 18:42, "J. J. (Jim) Logan" <jjlnv@...> wrote:
Bill:
I'm sorry you did that before we had hashed out our discussion about crossovers vs branch points. I believe your public statement is premature and significantly flawed. I thought we agreed to continue our discussions after the holidays.
My position is that you should interpret your tree by progressing from the root of the tree (i.e., from the progenitor) and moving forward in time counting branch points, instead of starting from the present and working backward counting crossovers. The difference is subtle, but it yields a significant difference in the result. I suggest that an estimate for the date of the most recent common ancestor (MRCA) is directly related to the first branch point on the tree and does not need to be extrapolated as you do graphically or from your derived exponential equation. Although your RCC approach uses clustering to develop a tree, the purpose is (or should be) to characterize clades from real past populations. And in the real world, clades are defined by mutations that cause branch points. If you wish to make some kind of adjustment to account for the difference in dates between the progenitor and MRCA for the entire set of genotypes you are working with, then that adjustment also applies to all other branch points that define clades and subclades, each of which has its own progenitor and MRCA.
As I have mentioned to you before, within population genetics, there is a well-developed coalescence theory. This theory is based on well-stated assumptions, such as a constant mutation rate and that the effect of mutations is neutral. The coalescence theory is now relatively mature, taking into consideration other factors such as population bottlenecks and interbreeding between populations. You implicitly make similar simplifying assumptions in your RCC approach to developing a tree and then calibrating it. However, once you have actually drawn a tree, your interpretation of that tree is significantly different and, I believe, incorrect. Subtle but significant.
====================
J. J. (Jim) Logan
Logan DNA Project, GenGen-NV, ISOGG, GOONS, CWG/VASSAR
===================================================================
On 12/30/2013 3:01 PM, weh8@... wrote:
I have just posted this note to the genealogy-dna@... forum:
I have had an interesting insight that I am working on. I can derive a dated Y-DNA phylogenetic tree, given a set of haplotypes that can be arbitrarily long. For a haplotype length of 37 markers, we can derive a time scale from a large number of pedigrees that gives the result that 10 RCC is about 433 years. The insight follows:
In our dated Y-DNA phylogenetic tree, if you count (along a constant RCC line on the tree) the number of times that a descendant line is crossed, that number N is related to RCC by an exponential of the form: N equals K times e to the power ax, where:
• N is the number of times a descendant line is crossed at each value of RCC on the tree,
• K is the number of testees in the sample of haplotypes we use to form the tree,
• x is RCC (a time scale derived from over 100 testee pedigrees),
• e is 2.71828..., the base of the natural logarithm,
• and 'a' is a constant of the set. Let's call 'a' the "tree factor".
Call this relation "the tree equation".
For our phylogenetic trees, 'a' is a negative number and probably is composed of factors that include:
• the average number of sons along the descendant lines
• the average rate at which descendant lines die out
• the average mutation rate in the set of testees
• characteristics of the testee set chosen for the tree, etc.
The tree equation is not a perfect exponential. It has glitches in it, but the quality of the exponential relationship can be quite high, with values of R-squared (the fraction of variance explained by the fit) exceeding 0.9.
The fact that the relationship is exponential is not unexpected, since the growth of the world's population is also exponential.
This insight provides additional impetus to understanding what a well-defined set of inputs to the phylogenetic tree can tell us about the evolution of haplotypes, the dates of origin of family surname clusters and SNPs, and subhaplogroups. That date of origin is where N=1 in the tree equation, and it can be found either from the equation or from an extrapolation of the graph of N vs. RCC from which the tree was derived!
===============================
Sincerely, Bill Howard
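For readers who want to try the tree equation numerically, here is a minimal sketch in Python of how the N=1 point could be located. It is not Bill's actual procedure. The crossing counts are invented purely for illustration, the function names are hypothetical, and K is fitted from the data rather than fixed to the number of testees. The fit is an ordinary least-squares line through ln N versus RCC, so the R-squared it reports refers to that log scale. The only figure taken from the post is the calibration of about 433 years per 10 RCC for 37 markers.

```python
import math

# From the post: 10 RCC is about 433 years for 37-marker haplotypes.
YEARS_PER_RCC = 43.3


def fit_tree_equation(rcc, counts):
    """Fit N = K * exp(a * x) by least squares on ln(N) versus x.

    Returns (K, a, r_squared). K is fitted here, which is an assumption;
    the post defines K as the number of testees in the sample.
    """
    logs = [math.log(n) for n in counts]
    size = len(rcc)
    mean_x = sum(rcc) / size
    mean_y = sum(logs) / size
    sxx = sum((x - mean_x) ** 2 for x in rcc)
    syy = sum((y - mean_y) ** 2 for y in logs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rcc, logs))
    a = sxy / sxx                      # slope of ln(N) vs. x, the "tree factor"
    ln_k = mean_y - a * mean_x         # intercept, ln(K)
    r_squared = (sxy * sxy) / (sxx * syy)
    return math.exp(ln_k), a, r_squared


def origin_rcc(k, a):
    """RCC at which N = 1: solving 1 = K * exp(a * x) gives x = -ln(K) / a."""
    return -math.log(k) / a


# Hypothetical data: number of descendant lines crossed at each RCC cut.
rcc_values = [0, 5, 10, 15, 20, 25, 30]
crossings = [40, 29, 20, 15, 10, 7, 5]

k_fit, a_fit, r2 = fit_tree_equation(rcc_values, crossings)
x_origin = origin_rcc(k_fit, a_fit)

print(f"K = {k_fit:.1f}, a = {a_fit:.4f}, R-squared = {r2:.3f}")
print(f"N = 1 at RCC = {x_origin:.1f}, about {x_origin * YEARS_PER_RCC:.0f} years")
```

Because 'a' is negative, the N=1 point lies at a larger RCC than any of the sampled cuts, so the result is an extrapolation and is sensitive to how well the counts follow the exponential.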