
320 Re: Clarifying SNP's and STR's

  • Emil
    Jun 5, 2012
      Very clear. Thank you.

      --- In I-M223@yahoogroups.com, Aaron Salles Torres <sallfertorr@...> wrote:
      >
      > Hello, all
      >
      > I notice there is a great deal of confusion regarding SNP's and STR's. I also realize that even those who are on the right track with the basics still have some serious misconceptions. So let's try to set the record straight.
      >
      > SNP stands for "single-nucleotide polymorphism" (http://en.wikipedia.org/wiki/Single-nucleotide_polymorphism). What does that mean? Well, note that whenever I share news about a "Walk Through the Y" test, I always mention how many base pairs were analyzed. That is exactly what it refers to. A base pair is, for instance, C-G or A-T. It is the most basic element of our DNA. Millions and millions of humans share exactly the same values at the vast majority of base pairs. However, sometimes there is a mutation in a single base pair. These mutations are very, very rare, and once one happens it is even more unlikely that the base pair will ever revert to what it was before. So these single base pair mutations are considered "permanent" (there are reported cases of reversions, some in our own project, but let's not consider them for now). In the overwhelming majority of cases these mutations are permanent. So, when we look at the population as a whole, we can say there exists a polymorphism at that base pair site. That is to say, the vast majority of humans have the standard combination at that specific site (C-G, for instance), whereas a few have a mutation (A-T). So that single base pair (or single nucleotide) comes in two possible versions. The less common version is therefore called an SNP (referring to it being "another existing version" of that site).
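To make the idea of a single differing base pair concrete, here is a minimal Python sketch. It assumes two already-aligned sequences of equal length; the function name `snp_sites` and both example sequences are made up for illustration and do not come from any real test result.

```python
def snp_sites(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each single-base difference.

    Assumes both strings are aligned, so position i in one corresponds
    to position i in the other.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Hypothetical sequences: they differ at exactly one site (position 5, G versus A).
reference = "ACGTTGCA"
sample = "ACGTTACA"
print(snp_sites(reference, sample))  # [(5, 'G', 'A')]
```

Only one strand is compared here; the paired base on the other strand follows from it.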
      >
      > Now, we also have areas in our DNA where a number of base pairs repeat themselves in a very distinguishable pattern (which also distinguishes the area). By convention, scientists look at only one side of the strand (for instance, A pairs with T, but we'll just look at the A side of the "zipper" and see what comes before and after the A. The T side can be inferred, as A only pairs with T and C only pairs with G in our DNA). So there is a region that is defined by: ACACACATACACT. The sequence itself is arbitrary, but it is repeated, let's say, 15 times in the majority of human beings. So the whole area is defined by: ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT ACACACATACACT. The other side of the "zipper" would be: TGTGTGTATGTGA. And it would look like this (15 times also): TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA TGTGTGTATGTGA. As these areas of repeated base pairs form a pattern, they are referred to as an STR (http://en.wikipedia.org/wiki/Short_tandem_repeat). This specific area receives a specific name, DYS888888 (let's say). Well, as we evolve and change, individuals start to arise with 14 repeats rather than 15, and then 13 repeats. Other lines evolve to have 16 or 17 repeats. But these changes are much less permanent than SNP's. So the son of someone who has 17 repeats may revert back to 15 repeats, etc. That is why, where STR's are concerned, we look at a group of STR's. This way we have a larger sample to compare. Having a larger number of STR's with values that differ from the norm leads one to consider shared ancestry. Yes, one STR may have reverted back. But there are 5 others in that same group of individuals that still look different from the norm. So we may classify that group as "Cont1c." They have some STR's in common with Cont1 but some others that are restricted to their smaller group.
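The pairing rule and the repeat count described above can be sketched in Python. This is only an illustration: `complement` and `count_repeats` are hypothetical helper names, and the motif is the made-up one from the post, not a real marker. The complement is written base by base in the same left-to-right order the post uses.

```python
# Pairing rule: A only pairs with T, C only pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the paired base for every position, in the same order."""
    return "".join(COMPLEMENT[base] for base in strand)


def count_repeats(region: str, motif: str) -> int:
    """Count how many times motif occurs back to back from the start of region."""
    if not motif:
        raise ValueError("motif must not be empty")
    count = 0
    while region.startswith(motif, count * len(motif)):
        count += 1
    return count


motif = "ACACACATACACT"          # made-up repeat unit from the post
region = motif * 15              # the "15 repeats" case

print(complement(motif))             # TGTGTGTATGTGA
print(count_repeats(region, motif))  # 15
```

A line with 14 or 16 repeats would simply report a different count for the same marker.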
      >
      > Because STR's change back and forth, we may still be tricked by convergence even when we look at a larger number of STR's. Convergence is when individuals end up sharing similar STR values even though they have very different ancestry and evolutionary histories. That is why it is important to combine our knowledge of a line's STR pattern with our knowledge of that line's SNP history. SNP's are much more reliable because they are considered permanent. So two individuals may look widely different in their STR patterns. Still, if they share the same "line" of SNP's (for instance M223>Z161>L801>Z76>Z78>L1198>Z79), we can be confident they share a real common ancestry up to the point where the last known SNP in their line (the terminal SNP) arose.
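As a rough illustration of reading shared ancestry off two SNP lines, the sketch below walks both lines from the oldest SNP forward and stops at the first difference. The function name `shared_snp_path` is an assumption, and the second line uses placeholder SNP names (X1, X2) that are purely hypothetical.

```python
def shared_snp_path(line_a: str, line_b: str) -> list[str]:
    """Return the SNP's two lines have in common, oldest first.

    Each line is written like "M223>Z161>L801", oldest SNP on the left.
    """
    shared = []
    for snp_a, snp_b in zip(line_a.split(">"), line_b.split(">")):
        if snp_a != snp_b:
            break  # the lines diverge here
        shared.append(snp_a)
    return shared


line_a = "M223>Z161>L801>Z76>Z78>L1198>Z79"  # example line from the post
line_b = "M223>Z161>L801>X1>X2"              # X1 and X2 are hypothetical placeholders

print(shared_snp_path(line_a, line_b))  # ['M223', 'Z161', 'L801']
```

The last element of the result is the most recent SNP the two lines are known to share.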
      >
      > Until we are able to discover SNP's that distinguish a specific line, we have to continue looking at STR patterns to group individuals, because a number of highly unusual STR values indicates shared ancestry. This will be confirmed when we discover the SNP's to back it up. By grouping individuals based on STR patterns, we also know where to look for SNP's, and where to test them. STR patterns indicate shared ancestry, and SNP's prove it. Sometimes STR patterns are misleading (convergence), in which case SNP's come to prove us wrong.
      >
      > Both STR's and SNP's define male lines. An STR may have mutated today, when you made a specific sperm cell. And an SNP may also have arisen today in that same sperm cell. So if a son comes to be born from that single sperm, he'll have an SNP of his own that he'll pass on to his sons (and only his male-line descendants will have it). It's believed that about one new SNP arises per generation. Some known SNP's are very young and can distinguish the descendants of a Most Recent Common Ancestor born not long ago. Other SNP's are very old, so there are millions of individuals who share them (because the human population has grown so much in recent years). It is easier to find older SNP's because more people have them. But to find a "private-like SNP" (one that is shared by a specific family), one has to test members of that specific family, as it won't be found in other families that are more distantly related. So, an SNP is not necessarily old.
      >
      > By grouping individuals who share an old SNP (M223) and comparing their STR values, we come to understand what the standard STR pattern for that haplogroup is. The same can be said about that haplogroup's subgroups.
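One simple way to picture a "standard STR pattern" is to take the most common value at each marker across a group. The sketch below does that under stated assumptions: the marker names (DYS_A, DYS_B, DYS_C) and all repeat values are invented for illustration, and `modal_pattern` and `off_modal_markers` are hypothetical helpers, not how the project itself computes anything.

```python
from collections import Counter


def modal_pattern(results: list[dict[str, int]]) -> dict[str, int]:
    """Return the most common repeat count at each marker across a group."""
    markers = results[0].keys()
    return {
        marker: Counter(person[marker] for person in results).most_common(1)[0][0]
        for marker in markers
    }


def off_modal_markers(person: dict[str, int], modal: dict[str, int]) -> list[str]:
    """List the markers where one person differs from the group's modal value."""
    return [marker for marker, value in person.items() if value != modal[marker]]


# Invented results for three testers who share the same old SNP.
group = [
    {"DYS_A": 15, "DYS_B": 12, "DYS_C": 23},
    {"DYS_A": 15, "DYS_B": 13, "DYS_C": 23},
    {"DYS_A": 14, "DYS_B": 12, "DYS_C": 23},
]

modal = modal_pattern(group)
print(modal)                               # {'DYS_A': 15, 'DYS_B': 12, 'DYS_C': 23}
print(off_modal_markers(group[1], modal))  # ['DYS_B']
```

Several testers who share the same off-modal markers would be candidates for a subgroup, pending SNP confirmation.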
      >
      > An STR is not an SNP (an STR is a number of repetitions of a group of base pairs, while an SNP refers to a single base pair). But an SNP may exceptionally be contained within an STR (such is the case of L484, for example, which makes it difficult to test). And a primer is a short piece of synthetic DNA used to target the specific base pairs being tested.
      >
      > You can compare all SNP results from our project (http://www.familytreedna.com/public/M223-Y-Clan/default.aspx?section=ysnp) to all STR results from our project (http://www.familytreedna.com/public/M223-Y-Clan/default.aspx?section=yresults) and see how STR's relate to SNP's and why we have separated the groups the way we did.
      >
      > I hope this clarified some questions.
      >
      > Thanks,
      > Aaron Torres
      >