Loading ...
Sorry, an error occurred while loading the content.

Re: [ssahausers] SSAHA Psuedocode documentation

Expand Messages
  • Adam Spargo
    Hi, The algorithms are very simple. The key set is very small (when compared to the sequence data). We map ALL kmers whether they appear in the sequence or
    Message 1 of 2 , Nov 28, 2005
    • 0 Attachment
      Hi,
      The algorithms are very simple. The key set is very small (when compared
      to the sequence data). We map ALL kmers whether they appear in the
      sequence or not. There is a 2-bit encoding:

      A = 00
      C = 01
      G = 10
      T = 11

      Then each kmer is a 2*k bit 'integer'. Once k is defined you process each
      kmer by looping over the K bases, assign the 2 bits for each bases and
      bit-shift 2 to the left. So for example if k=12 you use 24 bits.

      There are (|A|)^k permuatations for an alphabet A, so for DNA there are
      4^k permutations, for k=12 there are 4^12 = 16777216 keys. The hashtable
      consists of a head, with one entry for each kmer, and a body.

      head[i] points to an entry in body, so that body[head[i]] is the first
      occurrance of kmer i in the subject sequence, body[head[i]+1] is the next
      occurrance etc, until you reach body[head[i+1]] which is the first
      occurrance of the next kmer.

      To process a query you loop over the (overlapping) kmers, once you have
      calculated i for a kmer, body[head[i]] ... body[head[i+1]-1] are all the
      locations of that kmer in the subject.

      Let me know if you need any clarification.

      Cheers,

      Adam.



      On Mon, 28 Nov 2005, greenexile4568 wrote:

      > Hi,
      > Is there a pseudocode representation of the SSAHA algorithm ?
      >
      > I'm particulary interested in learning what approaches were taken to
      > implement the various key steps in the algorithm. For example did you
      > use a decrease and conquer approach when generating the possible kmer
      > permutations..
      >
      > Be grateful for any information you can provide
      > Thanks
      > Brian Wilson
      >
      >
      >
      >
      >
      >
      >
      > Yahoo! Groups Links
      >
      >
      >
      >
      >
      >
      >

      --
      Dr Adam Spargo
      High Performance Assembly Group email: aws@...
      Wellcome Trust Sanger Institute Tel: +44 (0)1223 834244 x7728
      Hinxton, Cambridge CB10 1SA Fax: +44 (0)1223 494919
    Your message has been successfully submitted and would be delivered to recipients shortly.