RE: [PBML] Process binary files

  • Nathan.Jeffrey@dhs.vic.gov.au
    Message 1 of 6 , Oct 31, 2002
      Hmm. Reading binary files always makes me think of unpack(). Although
      I've never managed to use it successfully, I hear it's very good.

      N
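For anyone curious what unpack() looks like in practice, here is a minimal sketch. The record layout (a 6-byte tag, a 20-byte space-padded name and a 32-bit big-endian count) is made up for illustration and is not the original poster's format.

```perl
use strict;
use warnings;

# Hypothetical layout: 6-byte tag, 20-byte space-padded name,
# 32-bit big-endian unsigned count.
my $record = pack 'a6 A20 N', 'HEADER', 'SN12345', 42;

# 'a6'  returns 6 raw bytes,
# 'A20' returns 20 bytes with trailing spaces stripped,
# 'N'   reads an unsigned 32-bit big-endian integer.
my ($tag, $name, $count) = unpack 'a6 A20 N', $record;

print "$tag $name $count\n";
```

The same template string drives both pack and unpack, which makes it easy to build test data for a format before pointing the code at a real file.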





      "Charles K. Clarkson" <cclarkson@...>
      01/11/2002 03:44 PM
      Please respond to perl-beginner


      To: <perl-beginner@yahoogroups.com>
      cc:
      Subject: RE: [PBML] Process binary files


      matt_johnsson [matt_johnsson@...] wrote:
      :
      : I have a perl script that processes a binary file with
      : variable length records in it. I'm trying to split it
      : into several binary files, one per record found (using
      : a part of the data to build the filename).

      : Each record starts with the string "HEADER". My script
      : does that but it does it in a very slow and ugly way,
      : I'm sure you will all agree... Could I please have
      : some ideas on how to do it better and faster!
      :
      : The bulk of the processing is.... :
      :
      : open IN,"< $filename" or die "can not open $filename to read";
      : binmode IN;
      : # Read the entire file byte by byte into a array called items
      : while (read IN,$buffer,1) {
      : push @items, $buffer;
      : }

      I'm not sure splitting $buffer into characters is the
      best approach. In any case, each of the following two
      lines seemed efficient when I looked at them
      separately. (I didn't benchmark them.)

      read IN, my $buffer, -s IN;
      my @items = split //, $buffer;
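A self-contained sketch of the single-read idea, using a lexical filehandle and the three-argument form of open. The temporary sample file and its contents are assumptions so the example can run on its own; substitute your real $filename.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small binary sample file so the sketch is self-contained.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "HEADER\x00\x01abc";
close $tmp_fh or die "Cannot close $filename: $!";

# Read the whole file with one read() call instead of one call per byte.
open my $in, '<', $filename or die "Cannot open $filename to read: $!";
binmode $in;
my $size = -s $in;                        # file size in bytes
my $got  = read $in, my $buffer, $size;   # number of bytes actually read
defined $got or die "Cannot read $filename: $!";
close $in;

printf "read %d of %d bytes\n", $got, $size;
```

Checking the return value of read matters for binary data, because a short read would otherwise silently truncate the last record.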

      : my $index=0;
      : while ($index < $#items+1){
      [snip]
      : $index++;
      : }

      foreach my $index ( 0 .. $#items ) {
      }

      Using 'foreach' instead of 'while' keeps $index in
      the scope of the block that uses it. It is also a lot
      easier to read. Whenever you find yourself writing code
      that manipulates a single array by its indexes,
      consider using a different algorithm. Index juggling
      avoids some of perl's most powerful features.

      : if ($items[$index] eq 'H') {
      : if ( $items[$index+1] eq 'E' &&
      : $items[$index+2] eq 'A' &&
      : $items[$index+3] eq 'D' &&
      : $items[$index+4] eq 'E' &&
      : $items[$index+5] eq 'R' ) {
      [snip]
      : }
      : }

      Why separate these two 'if' blocks?

      if ( $items[$index] eq 'H'
      && $items[$index+1] eq 'E'
      && $items[$index+2] eq 'A'
      && $items[$index+3] eq 'D'
      && $items[$index+4] eq 'E'
      && $items[$index+5] eq 'R' ) {

      : my $serial =
      [snipped gruesomely long string]

      my $serial = join '', @items[$index + 28 .. $index + 47];

      Here's a powerful clue to inefficiency. First the file
      is split by character then it is joined back together
      in the algorithm. Why split it into an array in the first
      place?
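To make the point concrete, here is a small sketch comparing the split-and-join route with a single substr call. The sample record (22 filler bytes after "HEADER", then a 20-byte space-padded serial) is invented to match the offsets 28 .. 47 used in the thread.

```perl
use strict;
use warnings;

# Made-up record: "HEADER", 22 filler bytes, then a 20-byte padded serial.
my $buffer = 'HEADER' . ('x' x 22) . sprintf('%-20s', 'SN0001');
my $index  = 0;    # offset of the 'H' of "HEADER"

# Array route: split into characters, then slice and join them again.
my @items    = split //, $buffer;
my $by_slice = join '', @items[ $index + 28 .. $index + 47 ];

# String route: one substr call, no intermediate array.
my $by_substr = substr $buffer, $index + 28, 20;

print $by_slice eq $by_substr ? "same\n" : "different\n";
```

Both routes yield the same 20 bytes, but the substr version never builds a one-element-per-byte array.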

      Another 'split' solution might look like this.
      Check out 'substr' for an efficient method to extract
      the serial from each $item. You'll need to put "HEADER"
      back into your print statements.

      foreach my $item ( split /HEADER/, $buffer ) {
      print "HEADER$item\n";
      }
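A variation on the same idea, offered as a sketch rather than a drop-in fix: splitting on a zero-width lookahead keeps "HEADER" attached to each record, so nothing has to be glued back on. The sample buffer is made up.

```perl
use strict;
use warnings;

# Made-up data: some leading junk, then two records.
my $buffer = "junkHEADERaaaHEADERbbbb";

# (?=HEADER) matches the position just before each "HEADER" without
# consuming it, so every piece after the first starts with the marker.
# The grep drops any leading piece that is not a real record.
my @records = grep { /\AHEADER/ } split /(?=HEADER)/, $buffer;

print scalar(@records), " records\n";
print "$_\n" for @records;
```

With plain split /HEADER/ the marker is discarded and the text before the first marker comes back as an extra leading element, which is why it has to be handled separately.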

      : # Strip out trailing blanks from the SERIAL
      : # so the new filename does not end in blanks
      : $serial =~ s/ *$//;

      The FAQ shows this standard form. It covers
      all trailing white space, not just blanks. Using
      '+' instead of '*' means the substitution only
      happens when there is at least one character to
      remove.

      $serial =~ s/\s+$//;
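A quick sketch of the difference between the two patterns, using an invented serial that ends in two spaces and a tab:

```perl
use strict;
use warnings;

my $raw = "SN0001  \t";    # made-up value: two spaces, then a tab

# Original pattern: only removes a run of spaces that reaches the end
# of the string, so the trailing tab blocks it here.
(my $spaces_only = $raw) =~ s/ *$//;

# FAQ pattern: removes any trailing white space (spaces, tabs, newlines).
(my $any_space = $raw) =~ s/\s+$//;

printf "spaces only: %d chars, any white space: %d chars\n",
    length $spaces_only, length $any_space;
```

For a filename built from binary data, the second form is usually the safer choice because stray tabs or newlines are removed as well.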

      : print "$serial\n";
      :
      : # Build the new filename
      : $outfile = $outfilebase.".".$serial;

      $outfile = "$outfilebase.$serial";

      : # If the file already exists do not
      : # create a new one
      : if ( ! -e $outfile ) {
      : #no such file found, creating a new one
      : close OUT;
      : open OUT,"> $outfile"
      : or die "can not open file $outfile to write\n";
      : binmode OUT;
      : $fileisopen = 'Y';
      : }

      What if the file does exist? We don't create a
      new one, but we also don't change to a new file
      either. We just keep printing to the currently
      open one. 'close OUT' is unnecessary. 'open' will
      close the old file for you.

      : }
      :}
      : if ($fileisopen eq 'Y') {
      : print OUT $items[$index];
      : }
      : $index++;
      : }

      Perl is excellent at manipulating strings. That's
      one reason why Larry wrote it. Once the file has been
      opened in binmode, why not think of it as a big long
      string?
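One way to treat the file as a single string is to walk it with index() and cut records out with substr(). This is only a sketch: split_records is a hypothetical helper name, and writing each record to its own file is left out.

```perl
use strict;
use warnings;

# Return every record in $buffer; each record starts with $marker and
# runs up to the next $marker or the end of the buffer.
sub split_records {
    my ($buffer, $marker) = @_;
    my @records;
    my $start = index($buffer, $marker);
    while ($start >= 0) {
        # Look for the next marker after the current one.
        my $next = index($buffer, $marker, $start + length($marker));
        my $len  = $next >= 0 ? $next - $start : length($buffer) - $start;
        push @records, substr($buffer, $start, $len);
        $start = $next;
    }
    return @records;
}

my @records = split_records("xxHEADERoneHEADERtwo!", 'HEADER');
print "$_\n" for @records;
```

Bytes before the first marker are skipped, and because index() does plain string matching, regex metacharacters in the marker are not a concern.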


      HTH,

      Charles K. Clarkson
      --
      Head Bottle Washer,
      Clarkson Energy Homes, Inc.
      254 968-8328





      Unsubscribing info is here:
      http://help.yahoo.com/help/us/groups/groups-32.html

      Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/







    • Hans Ginzel
      Message 2 of 6 , Nov 1, 2002
        On Thu, Oct 31, 2002 at 02:43:46PM -0000, matt_johnsson wrote:
        > I have a perl script that processes a binary file with
        > variable length records in it. I'm trying to split it
        > into several binary files, one per record found (using a part of
        > the data to build the filename).
        > Each record starts with the string "HEADER".

        Did not benchmark it, but you can try:

        ($WhereIam, $CmdName) = $0 =~ m#^(.*[/\\]|[A-Z]\:)?(.+)$#i;

        $opt{debug}++;
        $BASNAME = "out_";
        $SPLIT_STRING = "HEADER";

        # perldoc perlvar
        undef $/; # enable "slurp" mode
        @parts = split /$SPLIT_STRING/o, <>; # whole file now here
        shift @parts; # throw out the first empty part

        foreach (@parts) {
        ($filename = substr($_, 22, 20)) =~ s/\s*$//; # check indexes yourself
        warn "filename = `$filename'" if $opt{debug};
        -e ($filename="$BASNAME.$filename") and next; # build filename; skip writing the file if it exists

        open OUT, ">$filename" or die "$CmdName: Cannot open output file `$filename': $!.\n";
        # binmode OUT; # depends on your OS, see perldoc -f binmode
        print OUT "$SPLIT_STRING$_"; # do not forget the "HEADER" string cut off by splitting
        close OUT or die "$CmdName: Cannot close output file `$filename': $!.\n";
        }

        # Take care of endianness (perldoc -f pack).
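As a small illustration of the byte-order remark, the same four bytes decode to different integers depending on the unpack template. The byte string is invented; which template is right depends on how the file was written.

```perl
use strict;
use warnings;

my $bytes = "\x00\x00\x01\x02";    # made-up 4-byte field

my $big_endian    = unpack 'N', $bytes;    # most significant byte first
my $little_endian = unpack 'V', $bytes;    # least significant byte first

printf "big-endian: %d, little-endian: %d\n", $big_endian, $little_endian;
```

'N' and 'V' are fixed byte orders, so the result does not depend on the machine running the script.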

        > Regards

        Hans
      • Triphuong Nguyen
        Message 3 of 6 , Nov 1, 2002
          -----Original Message-----
          From: Hans Ginzel [mailto:hans@...]
          Sent: Friday, November 01, 2002 1:31 AM
          To: perl-beginner@yahoogroups.com
          Subject: Re: [PBML] Process binary files





        • prakash
          Message 4 of 6 , Nov 1, 2002
            This code works fine, but I am interested in writing the cleanest,
            best code I can. I would never claim my code is the best; there is
            always room for improvement. Please pass along any suggestions for
            improving it. Thank you very much.


            #!/usr/bin/perl -w
            $|=1;

            use lib '/home/aebolts/cgi-bin/PM';
            use strict;
            use Pros;
            use CGI::Carp qw(fatalsToBrowser);
            use CGI qw(:standard);

            my $username = cookie('user');
            if ($username) {
                my $cgi   = CGI->new;
                my $aPros = Pros->new($username);
                if ($cgi->param('add')) {
                    if (!$cgi->param('view')) {
                        if (!$aPros->addInfo($cgi)) { $aPros->printAll(); }
                    }
                    $aPros->printAddNew();
                }
                elsif ($cgi->param('update')) {
                    $aPros->update($cgi);
                }
                elsif ($cgi->param('aprospect')) {
                    if (!$cgi->param('view')) {
                        if ($aPros->uploadInfo($cgi)) { $aPros->printAll(); }
                    }
                    $aPros->printViewProspect();
                }
                elsif ($cgi->param('upload')) {
                    if ($cgi->param('file')) {
                        if ($aPros->uploadProspects($cgi)) { $aPros->printAll(); }
                    }
                    $aPros->printUploadNew();
                }
                else {
                    $aPros->printAll($cgi->param('sort'));
                }
            }
            ######## PROGRAM HAS NO COOKIE TO EAT ##############################
            else {
                print header();
                print qq~<meta http-equiv="refresh" target="new"
            content="0;URL=http://www.asdfasdf.com">~;
            }

            exit(0);