Loading ...
Sorry, an error occurred while loading the content.

problem in code

Expand Messages
  • rohitgole2003
    hi to all; please if anybody can figure out the problem in the code,i will be very thankful: problem is: actually i want to print the CDS starting number of
    Message 1 of 5 , Jul 2, 2005
    • 0 Attachment
      hi to all;

      please
      if anybody can figure out the problem in the code,i will be very thankful:

      problem is: actually i want to print the "CDS" starting number of a
      'genbank' file and then printing the corresponding accession number,
      (the single file contains more then 20 thousand of genbank file means
      lots of genbank file is copied to a single file)

      but on running the program i am not getting the corresponding CDS
      number with the accession number

      but when i run the program to find out the size of array containing
      the CDS number i got less than the lenth of array containing accession
      number



      code is :

      #!/usr/bin/perl -w

      use Bio::Perl;
      use warnings;
      @files = ( );
      $folder = 'unzipped';

      # open the folder
      unless(opendir(FOLDER, $folder))
      {
      print "Cannot open folder $folder!\n";
      exit;
      }

      # read the contents of the folder
      @files = readdir(FOLDER);
      # close the folder
      closedir(FOLDER);
      #to open the file

      $pointer = 2;
      while($pointer<scalar @files){

      unless(open(MYFILE,$files[$pointer]))
      {
      print "File can't be opened due to wrong name or non availability of
      file\n";
      }
      print $files[$pointer],"\n\n";
      #collects all data of a file in an array
      @file_data = <MYFILE>;
      $p=0;
      $count=1;
      @cds = ( );
      @accession = ( );
      print scalar @file_data,"\n";
      #it reads the sequence of a single genebank file
      @seq_object = read_all_sequences($files[$pointer],'genbank');
      foreach my $line (@file_data)
      {
      #it checks the presence of line containing CDS
      if($line =~ /\s{5}CDS\s{13}/)
      {
      $line =~ s/[\sCDS>]//g;
      $line =~ s/[\<]//g;
      @cds = split('',$line);
      @temp = ( );
      $true = 1;
      $n=0;
      #removes '.' and the digit[0-9] after '.'
      while($true)
      {
      if($cds[$n] eq '.'){$true=0;}
      else{ $temp[$n]=$cds[$n];
      $n++;
      }
      }
      $num = join('',@temp);
      $cds[$p] = $num;
      print $num,"\n";
      $accession[$p] = $seq_object[$p]->accession_number();
      $p++;
      }

      $count++;
      }
      print scalar @cds,"\n";
      print scalar @accession,"\n";
      $pointer++;
      print "\n\n\n\n";
      }
      exit;


      thanx
    • Charles K. Clarkson
      ... First we need to see is examples of what a line of data might look like. Then we would need to know what you are trying to pull out of those sample lines.
      Message 2 of 5 , Jul 3, 2005
      • 0 Attachment
        rohitgole2003 <> wrote:

        : please
        : if anybody can figure out the problem in the code,i will be very
        : thankful:
        :
        : problem is: actually i want to print the "CDS" starting number
        : of a 'genbank' file and then printing the corresponding
        : accession number, (the single file contains more then 20
        : thousand of genbank file means lots of genbank file is copied to
        : a single file)
        :
        : but on running the program i am not getting the corresponding
        : CDS number with the accession number
        :
        : but when i run the program to find out the size of array
        : containing the CDS number i got less than the lenth (sic) of
        : array containing accession number

        First we need to see is examples of what a line of data might
        look like. Then we would need to know what you are trying to pull
        out of those sample lines. Can you provide those sample lines of
        data?


        HTH,

        Charles K. Clarkson
        --
        Mobile Homes Specialist
        254 968-8328
      • rohitgole2003
        this is the sample of file: like that i have lots of file with in the single file LOCUS NM_015698 1669 bp mRNA linear PRI
        Message 3 of 5 , Jul 3, 2005
        • 0 Attachment
          this is the sample of file:

          like that i have lots of file with in the single file


          LOCUS NM_015698 1669 bp mRNA linear PRI
          23-APR-2005
          DEFINITION Homo sapiens G patch domain and KOW motifs (GPKOW), mRNA.
          ACCESSION NM_015698
          VERSION NM_015698.3 GI:39725663
          KEYWORDS .
          SOURCE Homo sapiens (human)
          ORGANISM Homo sapiens
          Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
          Euteleostomi;
          Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
          Hominidae; Homo.
          REFERENCE 1 (bases 1 to 1669)
          AUTHORS Schindelhauer,D., Hellebrand,H., Grimm,L., Bader,I.,
          Meitinger,T.,
          Wehnert,M., Ross,M. and Meindl,A.
          TITLE Long-range map of a 3.5-Mb region in Xp11.23-22 with a
          sequence-ready map from a 1.1-Mb gene-rich interval
          JOURNAL Genome Res. 6 (11), 1056-1069 (1996)
          PUBMED 8938429
          COMMENT PROVISIONAL REFSEQ: This record has not yet been subject
          to final
          NCBI review. The reference sequence was derived from
          BC000397.2.
          On Dec 11, 2003 this sequence version replaced gi:15811781.
          FEATURES Location/Qualifiers
          source 1..1669
          /organism="Homo sapiens"
          /mol_type="mRNA"
          /db_xref="taxon:9606"
          /chromosome="X"
          /map="Xp11.23"
          gene 1..1669
          /gene="GPKOW"
          /note="synonym: T54"
          /db_xref="GeneID:27238"
          /db_xref="HPRD:HPRD_06737"
          CDS 8..1438
          /gene="GPKOW"
          /note="T54 protein;
          go_component: intracellular [goid 0005622]
          [evidence IEA];
          go_function: nucleic acid binding [goid 0003676]
          [evidence
          IEA];
          go_process: biological_process unknown [goid 0000004]
          [evidence ND]"
          /codon_start=1
          /product="G patch domain and KOW motifs"
          /protein_id="NP_056513.2"
          /db_xref="GI:15811782"
          /db_xref="GeneID:27238"
          /db_xref="HPRD:HPRD_06737"

          /translation="MADSKEGVLPLTAASTAPISFGFTRTSARRRLADSGDGAGPSPE

          EKDFLKTVEGRELQSVKPQEAPKELVIPLIQNGHRRQPPARPPGPSTDTGALADGVVS

          QAVKELIAESKKSLEERENAGVDPTLAIPMIQKGCTPSGEGADSEPRAETVPEEANYE

          AVPVEAYGLAMLRGMGWKPGEGIGRTFNQVVKPRVNSLRPKGLGLGANLTEAQALTPT

          GPSRMPRPDEEQEKDKEDQPQGLVPGGAVVVLSGPHRGLYGKVEGLDPDNVRAMVRLA

          VGSRVVTVSEYYLRPVSQQEFDKNTLDLRQQNGTASSRKTLWNQELYIQQDNSERKRK

          HLPDRQDGPAAKSEKAAPRSQHWLHRDLRVRFVDNMYKGGQYYNTKMIIEDVLSPDTC

          VCRTDEGRVLEGLREDMLETLVPKAEGDRVMVVLGPQTGRVGHLLSRDRARSRALVQL
          PRENQVVELHYDAICQYMGPSDTDDD"
          misc_feature 491..631
          /gene="GPKOW"
          /note="G_patch; Region: glycine rich nucleic binding
          domain"
          /db_xref="CDD:7365"
          variation complement(1302..1303)
          /gene="GPKOW"
          /replace="c"
          /replace="t"
          /db_xref="dbSNP:12012419"
          ORIGIN
          1 gagcaagatg gctgactcca aagagggtgt tttgccgctg acggctgctt
          ccactgcccc
          61 aatttcattc ggcttcactc gcacgtccgc acggaggcgg ctggccgact
          cgggagacgg
          121 cgcggggcca tctccggagg agaaggattt cttgaaaacc gtggaaggga
          gggagctgca
          181 gagtgtgaag ccccaggagg cccccaagga actcgtcatc cctttgatcc
          agaatggcca
          241 tcgcaggcag ccaccagccc ggccccctgg gccatccaca gatactgggg
          ccttggcgga
          301 tggggtggtg tcccaggctg tgaaggagct cattgcggaa tccaagaagt
          ctctggaaga
          361 gagagagaat gcgggtgtcg accccacgct cgctatcccc atgatccaga
          aaggatgcac
          421 ccccagcggg gaaggggcag acagcgaacc ccgggcagag acagtgccag
          aggaggctaa
          481 ttatgaggcg gtccccgtgg aggcctatgg gctggccatg ctgcggggca
          tgggctggaa
          541 acctggcgag ggcatcggcc gcaccttcaa tcaagtagtg aagccccgtg
          tcaactcact
          601 gaggcccaag gggttagggc tgggtgccaa cctgaccgag gcccaggcct
          tgacccccac
          661 tggcccctcc cgcatgccaa gaccagatga ggagcaagag aaagataagg
          aagatcagcc
          721 tcaagggctg gtgcctggag gagctgtggt ggttctttct ggccctcacc
          gaggcctcta
          781 tgggaaggtg gaaggccttg atcctgacaa tgttcgggcc atggttcgtc
          tggctgtggg
          841 gagccgggtg gtgactgtta gtgagtacta cctgcggcct gtctcccagc
          aggagtttga
          901 caagaacacc ttggatctca ggcaacagaa cggaactgcc tcatcacgga
          agaccctctg
          961 gaatcaagaa ctctacatcc agcaggacaa ctcagagagg aagcggaaac
          accttccaga
          1021 ccgacaggat gggcctgcag ccaagagtga gaaagcagcc cccagaagtc
          agcactggtt
          1081 gcacagggac ctgcgtgtgc ggtttgtgga caacatgtac aaaggaggcc
          aatattacaa
          1141 caccaagatg ataattgaag atgtcctaag cccagatacc tgtgtatgtc
          ggacagatga
          1201 aggccgagtc ctggaaggcc tgagggaaga catgctggag accctggttc
          ccaaggcaga
          1261 gggtgaccgt gtgatggtgg tgctgggccc acagactgga agggtgggac
          atttgctgag
          1321 ccgggacaga gcacggagcc gggctttggt gcaactgcca agagaaaatc
          aggtggtgga
          1381 gcttcactac gatgccatct gccagtacat gggccctagt gacacagatg
          atgactgacc
          1441 catgggactc ctcccatccc ccaggctggt accagttctg taccatatga
          gaaagttgcc
          1501 ttcagaaggt gggaagatca ttgttccatc ctctacttct ggtgcagtcc
          tgggacaagg
          1561 acaagggaaa gggatgggtg aaccagtagg gaagctagaa acaaacccaa
          tatttaccaa
          1621 aatttagggt ataataaaaa ccatttcaag taaaaaaaaa aaaaaaaaa
          //
        • Charles K. Clarkson
          ... Alrighty, that answers part of my question. Assuming some lines have wrapped, we now know what the file might look like. Now take a line that matches what
          Message 4 of 5 , Jul 4, 2005
          • 0 Attachment
            rohitgole2003 <> wrote:
            : this is the sample of file:
            :
            : like that i have lots of file with in the single file

            Alrighty, that answers part of my question. Assuming some lines
            have wrapped, we now know what the file might look like. Now take a
            line that matches what you are looking for and tells us what you
            expect to get when done.

            I think this is the only one you are trying to match. What do
            you want to get out of this line (or any other line you are trying
            to match)?

            : CDS 8..1438



            HTH,

            Charles K. Clarkson
            --
            Mobile Homes Specialist
            254 968-8328


            P.S. The subject to your reply post should be
            "RE: [PBML] problem in code" or
            "RE: problem in code". It should not restart the thread.
          • rohit gole
            i have solved my problem actually there is problem with the genebank file actually after finish one genbank file ,the file ends with the sign // And so
            Message 5 of 5 , Jul 4, 2005
            • 0 Attachment
              i have solved my problem
              actually there is problem with the genebank file

              actually after finish one genbank file ,the file ends with the sign "//"
              And so somewhere it is missing and that is why my program was not able to read that and that is why i got the wrong total cds start position

              thanx

              rohit singh
              Biological science and Bio-engineering
              IIT Kanpur



              "Charles K. Clarkson" <cclarkson@...> wrote:
              rohitgole2003 <> wrote:
              : this is the sample of file:
              :
              : like that i have lots of file with in the single file

              Alrighty, that answers part of my question. Assuming some lines
              have wrapped, we now know what the file might look like. Now take a
              line that matches what you are looking for and tells us what you
              expect to get when done.

              I think this is the only one you are trying to match. What do
              you want to get out of this line (or any other line you are trying
              to match)?

              : CDS 8..1438



              HTH,

              Charles K. Clarkson
              --
              Mobile Homes Specialist
              254 968-8328


              P.S. The subject to your reply post should be
              "RE: [PBML] problem in code" or
              "RE: problem in code". It should not restart the thread.




              Unsubscribing info is here: http://help.yahoo.com/help/us/groups/groups-32.html



              SPONSORED LINKS
              C programming language Computer programming languages The c programming language C programming language List of programming languages Programming languages

              ---------------------------------
              YAHOO! GROUPS LINKS


              Visit your group "perl-beginner" on the web.

              To unsubscribe from this group, send an email to:
              perl-beginner-unsubscribe@yahoogroups.com

              Your use of Yahoo! Groups is subject to the Yahoo! Terms of Service.


              ---------------------------------




              ---------------------------------
              Free antispam, antivirus and 1GB to save all your messages
              Only in Yahoo! Mail: http://in.mail.yahoo.com

              [Non-text portions of this message have been removed]
            Your message has been successfully submitted and would be delivered to recipients shortly.