Process binary files

  • matt_johnsson
    Message 1 of 6 , Oct 31, 2002
      Hello all,

      I have a perl script that processes a binary file with
      variable length records in it. I'm trying to split it
      into several binary files, one per record found (using a part of
      the data to build the filename).
      Each record starts with the string "HEADER".
      My script does that, but it does it in a very slow and ugly way, as
      I'm sure you will all agree...
      Could I please have some ideas on how to do it better and faster?

      Regards
      /Matt

      The bulk of the processing is.... :

      open (IN,"< $filename") or die "can not open $filename to read";
      binmode(IN);
      # Read the entire file byte by byte into an array called items
      while (read IN,$buffer,1) {
          push (@items, $buffer);
      }
      my $index=0;
      while ($index < $#items+1){
          if ($items[$index] eq 'H') {
              if ( $items[$index+1] eq 'E' &&
                   $items[$index+2] eq 'A' &&
                   $items[$index+3] eq 'D' &&
                   $items[$index+4] eq 'E' &&
                   $items[$index+5] eq 'R' ) {
                  my $SERIAL="$items[$index+28]$items[$index+29]$items[$index+30]$items[$index+31]$items[$index+32]".
                      "$items[$index+33]$items[$index+34]$items[$index+35]$items[$index+36]$items[$index+37]".
                      "$items[$index+38]$items[$index+39]$items[$index+40]$items[$index+41]$items[$index+42]".
                      "$items[$index+43]$items[$index+44]$items[$index+45]$items[$index+46]$items[$index+47]";
                  # Strip out trailing blanks from the SERIAL so the new filename does not end in blanks
                  $SERIAL =~ s/ *$//;
                  print "$SERIAL\n";
                  # Build the new filename
                  $outfile=$outfilebase.".".$SERIAL;
                  # If the file already exists do not create a new one
                  if ( ! -e $outfile ) {
                      # no such file found, creating a new one
                      close (OUT);
                      open (OUT,"> $outfile") or die "can not open file $outfile to write\n";
                      binmode(OUT);
                      $fileisopen = 'Y';
                  }
              }
          }
          if ($fileisopen eq 'Y') {
              print OUT $items[$index];
          }
          $index++;
      }
    • Charles K. Clarkson
      Message 2 of 6 , Oct 31, 2002
        matt_johnsson [matt_johnsson@...] wrote:
        :
        : I have a perl script that processes a binary file with
        : variable length records in it. I'm trying to split it
        : into several binary files, one per record found (using
        : a part of the data to build the filename).

        : Each record starts with the string "HEADER". My script
        : does that but it does it in a very slow and ugly way,
        : I'm sure you will all agree... Could I please have
        : some ideas on how to do it better and faster!
        :
        : The bulk of the processing is.... :
        :
        : open IN,"< $filename" or die "can not open $filename to read";
        : binmode IN;
        : # Read the entire file byte by byte into a array called items
        : while (read IN,$buffer,1) {
        : push @items, $buffer;
        : }

        I'm not sure splitting the $buffer is the best
        approach. In any case, the following two lines were
        each considered efficient when I looked into them
        separately. (I didn't benchmark them.)

        read IN, my $buffer, -s IN;
        my @items = split //, $buffer;

        : my $index=0;
        : while ($index < $#items+1){
        [snip]
        : $index++;
        : }

        foreach my $index ( 0 .. $#items ) {
        }

        Using 'foreach' instead of 'while' keeps $index in
        the scope of the block that uses it. It is also a lot
        easier to read. Whenever you find yourself writing code
        that manipulates a single array by its indexes,
        consider using a different algorithm. This one avoids
        some of perl's most powerful features.

        : if ($items[$index] eq 'H') {
        : if ( $items[$index+1] eq 'E' &&
        : $items[$index+2] eq 'A' &&
        : $items[$index+3] eq 'D' &&
        : $items[$index+4] eq 'E' &&
        : $items[$index+5] eq 'R' ) {
        [snip]
        : }
        : }

        Why separate these two 'if' blocks?

        if (   $items[$index]   eq 'H'
            && $items[$index+1] eq 'E'
            && $items[$index+2] eq 'A'
            && $items[$index+3] eq 'D'
            && $items[$index+4] eq 'E'
            && $items[$index+5] eq 'R' ) {

        : my $serial =
        [snipped gruesomely long string]

        my $serial = join '', @items[$index + 28 .. $index + 47];

        Here's a powerful clue to inefficiency. First the file
        is split by character then it is joined back together
        in the algorithm. Why split it into an array in the first
        place?

        Another 'split' solution might look like this.
        Check out 'substr' for an efficient method to extract
        the serial from each $item. You'll need to put "HEADER"
        back into your print statements.

        foreach my $item ( split /HEADER/, $buffer ) {
            print "HEADER$item\n";
        }
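One way to avoid putting the marker back by hand is to split on a zero-width lookahead, so that "HEADER" stays at the front of each piece. This is only a minimal sketch: it assumes the whole file is already in a scalar, that the serial sits at bytes 28 to 47 of each record (as in the original script), and that "HEADER" never occurs inside record data. The helper names split_records and serial_of and the demo data are made up for illustration.

```perl
use strict;
use warnings;

# Split a buffer into records, each still starting with "HEADER".
# The lookahead (?=HEADER) matches a position rather than characters,
# so the split consumes nothing. Anything before the first marker is dropped.
sub split_records {
    my ($buffer) = @_;
    return grep { /\AHEADER/ } split /(?=HEADER)/, $buffer;
}

# Pull the 20-byte serial field (offset 28, assumed from the original
# script) out of one record and strip trailing whitespace.
sub serial_of {
    my ($record) = @_;
    return undef if length($record) < 48;    # too short to hold the field
    my $serial = substr( $record, 28, 20 );
    $serial =~ s/\s+\z//;
    return $serial;
}

# In-memory demonstration with two fabricated records.
my $demo = 'HEADER' . ( 'x' x 22 ) . sprintf( '%-20s', 'ABC123' ) . 'payload-1'
         . 'HEADER' . ( 'y' x 22 ) . sprintf( '%-20s', 'XYZ789' ) . 'payload-2';

for my $record ( split_records($demo) ) {
    printf "%s: %d bytes\n", serial_of($record), length($record);
}
```

Because nothing is consumed by the split, joining the pieces gives back the original data, which makes it easy to check that no bytes were lost.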

        : # Strip out trailing blanks from the SERIAL
        : # so the new filename does not end in blanks
        : $serial =~ s/ *$//;

        The FAQ shows this standard form, which covers
        all trailing white space. Using '+' instead of
        '*' means the substitution happens only when there
        is at least one whitespace character to remove.

        $serial =~ s/\s+$//;
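A small sketch of the difference, using a made-up serial value: the original pattern / *$/ removes only spaces, while \s also matches tabs, newlines and carriage returns. Note that \s does not match NUL bytes, so a NUL-padded field would need separate handling. The helper name rtrim is hypothetical.

```perl
use strict;
use warnings;

# Return a copy of the string with trailing whitespace removed.
sub rtrim {
    my ($text) = @_;
    $text =~ s/\s+\z//;    # \z anchors at the very end of the string
    return $text;
}

my $serial = "SN-0042 \t  ";    # fabricated value padded with spaces and a tab
printf "[%s]\n", rtrim($serial);
```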

        : print "$serial\n";
        :
        : # Build the new filename
        : $outfile = $outfilebase.".".$serial;

        $outfile = "$outfilebase.$serial";

        : # If the file already exists do not
        : # create a new one
        : if ( ! -e $outfile ) {
        : #no such file found, creating a new one
        : close OUT;
        : open OUT,"> $outfile"
        : or die "can not open file $outfile to write\n";
        : binmode OUT;
        : $fileisopen = 'Y';
        : }

        What if the file does exist? We don't create a
        new one, but we also don't change to a new file
        either. We just keep printing to the currently
        open one. 'close OUT' is unnecessary. 'open' will
        close the old file for you.

        : }
        :}
        : if ($fileisopen eq 'Y') {
        : print OUT $items[$index];
        : }
        : $index++;
        : }

        Perl is excellent at manipulating strings. That's
        one reason why Larry wrote it. Once the file has been
        opened in binmode, why not think of it as one big long
        string?
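As a rough sketch of the "one big string" idea: slurp the file, locate each "HEADER" with index(), and write each record out with substr(). The assumptions here are that the serial sits at bytes 28 to 47 of a record, that "HEADER" never occurs inside record data, and that a record whose output file already exists is simply skipped (the original script instead keeps writing to the previously opened file). The names record_spans and split_file are hypothetical.

```perl
use strict;
use warnings;

my $MARKER = 'HEADER';

# Walk the buffer with index() and return [offset, length] for each record.
sub record_spans {
    my ($buffer) = @_;
    my @starts;
    my $pos = index( $buffer, $MARKER );
    while ( $pos >= 0 ) {
        push @starts, $pos;
        $pos = index( $buffer, $MARKER, $pos + length($MARKER) );
    }
    my @spans;
    for my $i ( 0 .. $#starts ) {
        # A record runs up to the next marker, or to the end of the buffer.
        my $end = $i < $#starts ? $starts[ $i + 1 ] : length($buffer);
        push @spans, [ $starts[$i], $end - $starts[$i] ];
    }
    return @spans;
}

# Write each record to "$outfilebase.$serial"; returns the files written.
sub split_file {
    my ( $filename, $outfilebase ) = @_;
    open my $in, '<', $filename or die "cannot open $filename: $!";
    binmode $in;
    my $buffer = do { local $/; <$in> };    # slurp the whole file
    close $in;

    my @written;
    for my $span ( record_spans($buffer) ) {
        my ( $offset, $length ) = @$span;
        next if $length < 48;               # too short to hold the serial
        my $serial = substr( $buffer, $offset + 28, 20 );
        $serial =~ s/\s+\z//;
        my $outfile = "$outfilebase.$serial";
        next if -e $outfile;                # assumption: skip existing files
        open my $out, '>', $outfile or die "cannot open $outfile: $!";
        binmode $out;
        print {$out} substr( $buffer, $offset, $length );
        close $out or die "cannot close $outfile: $!";
        push @written, $outfile;
    }
    return @written;
}
```

It would be called as split_file($filename, $outfilebase). No per-byte array is built, so memory use stays close to the size of the file itself.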


        HTH,

        Charles K. Clarkson
        --
        Head Bottle Washer,
        Clarkson Energy Homes, Inc.
        254 968-8328
      • Nathan.Jeffrey@dhs.vic.gov.au
        Message 3 of 6 , Oct 31, 2002
          Hmm. Reading binary files always makes me think of unpack(). Although
          I've never managed to use it successfully, I hear it's very good.
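For what it is worth, here is a minimal sketch of how unpack() could pull the serial out of a single record. It assumes the layout implied by the original script (a 20-byte text field starting at byte 28); the helper name serial_from_record and the sample record are made up.

```perl
use strict;
use warnings;

# Extract the serial from one record with unpack().
#   x28  skip the first 28 bytes (the "HEADER" marker plus 22 more)
#   A20  read 20 bytes as text, stripping trailing whitespace and NULs
sub serial_from_record {
    my ($record) = @_;
    return undef if length($record) < 48;    # unpack would die on a short record
    my ($serial) = unpack 'x28 A20', $record;
    return $serial;
}

# Fabricated record: marker, 22 filler bytes, space-padded serial, payload.
my $record = 'HEADER' . ( "\0" x 22 ) . sprintf( '%-20s', 'SN-0042' ) . 'rest';
print serial_from_record($record), "\n";
```

The 'A' format does the trailing-blank stripping itself, so no separate substitution is needed for this field.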

          N
        • Hans Ginzel
          Message 4 of 6 , Nov 1, 2002
            On Thu, Oct 31, 2002 at 02:43:46PM -0000, matt_johnsson wrote:
            > I have a perl script that processes a binary file with
            > variable length records in it. I'm trying to split it
            > into several binary files, one per record found (using a part of
            > the data to build the filename).
            > Each record starts with the string "HEADER".

            Did not benchmark it, but you can try:

            ($WhereIam, $CmdName) = $0 =~ m#^(.*[/\\]|[A-Z]\:)?(.+)$#i;

            $opt{debug}++;
            $BASNAME = "out_";
            $SPLIT_STRING = "HEADER";

            # perldoc perlvar
            undef $/; # enable "slurp" mode
            @parts = split /$SPLIT_STRING/o, <>; # whole file now here
            shift @parts; # throw out the first, empty part

            foreach (@parts) {
                ($filename = substr($_, 22, 20)) =~ s/\s*$//; # check indexes yourself
                warn "filename = `$filename'" if $opt{debug};
                -e ($filename="$BASNAME.$filename") and next; # build filename; skip writing the file if it exists

                open OUT, ">$filename" or die "$CmdName: Cannot open output file `$filename': $!.\n";
                # binmode OUT; # depends on your OS, see perldoc -f binmode
                print OUT "$SPLIT_STRING$_"; # do not forget the "HEADER" string cut off by the split
                close OUT or die "$CmdName: Cannot close output file `$filename': $!.\n";
            }

            # Take care of endianness (perldoc -f pack).
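The thread does not say whether the records contain multi-byte integers, but if they do, byte order matters when decoding them. A small sketch with made-up bytes, showing the same four bytes read as an unsigned 32-bit integer in both orders:

```perl
use strict;
use warnings;

# Four fabricated bytes, decoded as an unsigned 32-bit integer in both orders.
my $bytes = "\x00\x00\x01\x02";

my ($big)    = unpack 'N', $bytes;    # big-endian ("network") order
my ($little) = unpack 'V', $bytes;    # little-endian ("VAX") order

printf "big-endian: %d, little-endian: %d\n", $big, $little;
```

Copying a record byte for byte, as the splitting code does, is not affected by byte order; it only matters once individual numeric fields are interpreted.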

            > Regards

            Hans
          • Triphuong Nguyen
            Message 5 of 6 , Nov 1, 2002
              -----Original Message-----
              From: Hans Ginzel [mailto:hans@...]
              Sent: Friday, November 01, 2002 1:31 AM
              To: perl-beginner@yahoogroups.com
              Subject: Re: [PBML] Process binary files



            • prakash
              Message 6 of 6 , Nov 1, 2002
                This code is working fine, but I am interested in writing clean
                code that follows best practice.
                I would never claim my code is the best; there is always room for
                improvement.
                Please pass on your suggestions for improving the code. Thank you very
                much.


                #!/usr/bin/perl -w
                $|=1;

                use lib '/home/aebolts/cgi-bin/PM';
                use strict;
                use Pros;
                use CGI::Carp qw(fatalsToBrowser);
                use CGI qw(:standard);

                my $username = cookie('user');
                if($username){
                    my $cgi = new CGI;
                    my $aPros = Pros->new($username);
                    if($cgi->param('add')) {
                        if(!$cgi->param('view'))
                        { if(!$aPros->addInfo($cgi)){$aPros->printAll();} }
                        $aPros->printAddNew();
                    }
                    elsif($cgi->param('update')) {
                        $aPros->update($cgi);
                    }
                    elsif($cgi->param('aprospect')) {
                        if(!$cgi->param('view'))
                        { if($aPros->uploadInfo($cgi)){ $aPros->printAll(); } }
                        $aPros->printViewProspect();
                    }
                    elsif($cgi->param('upload')) {
                        if($cgi->param('file'))
                        { if($aPros->uploadProspects($cgi)){ $aPros->printAll(); } }
                        $aPros->printUploadNew();
                    }
                    else{
                        $aPros->printAll($cgi->param('sort'));
                    }
                }
                ######## PROGRAM HAS NO COOKIE TO EAT ##############################
                else{
                    print header();
                    print qq~<meta http-equiv="refresh" target="new" content="0;URL=http://www.asdfasdf.com">~;
                }

                exit(0);