
Re: [PBML] A File Parse Beginner Question

  • Charles K. Clarkson
    Message 1 of 3 , Mar 29, 2002
      "Ofir" <ofirb1@...> asked:

      : Hi all,
      :
      : I have a file I need to read with the following structure:
      :
      : Line 1: URL (still need to split values in it)
      : Line 2: Time and Date (still need to split values in it)
      : Line 3: News Item (still need to split values in it)
      : Line 4: Item Source (still need to split values in it)
      : Line 5: Empty Line(!)
      : Line 6: URL (still need to split values in it)
      : Line 7: Time and Date (still need to split values in it)
      : Line 8: News Item (still need to split values in it)
      : Line 9: Item Source (still need to split values in it)
      : Line 10: Empty Line(!)
      : ...
      : ...
      :
      : How can I parse the file, read the first four lines, split data
      : on each line, move next to the next four line , skipping the
      : blank lines, and come up with the news?

      How big is the file? Do you want to read it in all at once or
      one story at a time? How are the news stories arranged? Are
      they categorized, etc.? What do you want to do once you get
      the news? How will it be displayed? Is there some field which
      can uniquely identify a story, or do we have to provide our
      own unique id?

      Think of the ten lines as 2 records separated by "\n\n".
      'perlvar' describes $/ as the input record separator. Setting
      it to "\n\n" makes each read return one whole news story, so
      this will get all the news stories and place them in records:

      sub get_records {
          my $file_name = shift;
          open my $fh, '<', $file_name
              or die "Cannot open $file_name: $!";
          local $/ = "\n\n";    # a blank line ends each record
          return <$fh>;         # list context: all records at once
      }

      my @records = get_records( $news_file );

      Now each record must be split into its various fields:

      use Data::Dumper;

      my @news;
      foreach ( @records ) {
          my %news;
          # hash slice: the four lines of a record become named fields
          @news{ qw/url date news source/ } = split /\n/;
          push @news, \%news;
      }

      print Dumper \@news;

      I got:

      [
        {
          'url' => 'Line 1: URL (still need to split values in it)',
          'date' => 'Line 2: Time and Date (still need to split values in it)',
          'news' => 'Line 3: News Item (still need to split values in it)',
          'source' => 'Line 4: Item Source (still need to split values in it)'
        },
        {
          'url' => 'Line 6: URL (still need to split values in it)',
          'date' => 'Line 7: Time and Date (still need to split values in it)',
          'news' => 'Line 8: News Item (still need to split values in it)',
          'source' => 'Line 9: Item Source (still need to split values in it)'
        }
      ];

      Of course there's no reason to keep it all separate:

      sub get_records {
          my $file_name = shift;
          open my $fh, '<', $file_name
              or die "Cannot open $file_name: $!";
          local $/ = "\n\n";    # a blank line ends each record
          my @news;
          while ( <$fh> ) {
              my %news;
              @news{ qw/url date news source/ } = split /\n/;
              push @news, \%news;
          }
          return @news;
      }

      my @news = get_records( 'in.txt');

      print Dumper \@news;
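
      A possible variation on the routine above, offered only as a sketch:
      setting $/ to the empty string puts Perl in "paragraph mode", where
      any run of blank lines ends a record. With "\n\n", a stray extra
      blank line can produce empty or misaligned records. The name
      parse_news and the sample data are made up for illustration, and an
      in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# parse_news() is a hypothetical helper, not part of the original reply.
sub parse_news {
    my ($fh) = @_;
    local $/ = "";    # paragraph mode: one or more blank lines end a record
    my @news;
    while ( my $record = <$fh> ) {
        chomp $record;    # in paragraph mode, chomp strips all trailing newlines
        my %item;
        @item{ qw/url date news source/ } = split /\n/, $record;
        push @news, \%item;
    }
    return @news;
}

# Made-up sample data with three blank lines between the two records.
my $sample = "u1\nd1\nn1\ns1\n\n\n\nu2\nd2\nn2\ns2\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @news = parse_news($fh);
close $fh;

print scalar(@news), " records; second url: $news[1]{url}\n";
```

      Passing an open filehandle rather than a file name is a choice made
      here so the sketch can be tried without a file on disk.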



      HTH,

      Charles K. Clarkson
      --
      Clarkson Energy Homes, Inc.
      CJ Web Work - Domains for Real Estate Investors.

      E Pluribus Unum -- One from many.
    • Ofir
      Message 2 of 3 , Mar 30, 2002
        Thank you, dear Charles!
        Now, what is the routine for parsing a simpler file, i.e.
        a file with four-line records, WITHOUT the blank line which
        separates the records?
        For example:
        Line 1
        Line 2
        Line 3
        Line 4
        Line 1
        Line 2
        Line 3
        Line 4
        I know it's basic, but I appreciate any help!
        TIA!
        Ofir

      • Charles K. Clarkson
        Message 3 of 3 , Mar 30, 2002
          "Ofir" <ofirb1@...> asked:

          : Now, what is the routine for parsing a simpler file, i.e.
          : a file with four-line records, WITHOUT the blank line which
          : separates the records?
          : For example:
          : Line 1
          : Line 2
          : Line 3
          : Line 4
          : Line 1
          : Line 2
          : Line 3
          : Line 4
          : I know it's basic, but I appreciate any help!

          Perl keeps track of the current line number of the file being
          read in $. (see 'perlvar'). In a 'while' block, ($. % 4) is
          false on every fourth line. So,

              push @records, $_;
              next if $. % 4;

          will fall through with 4 more lines collected each time.
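
          To see the modulo test on its own, here is a small sketch that
          only records the line numbers where ($. % 4) is false. The
          eight sample lines are made up, and an in-memory filehandle is
          used so nothing has to exist on disk.

```perl
use strict;
use warnings;

# Eight lines of hypothetical sample data in an in-memory filehandle.
my $sample = join '', map { "line $_\n" } 1 .. 8;
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my @boundaries;
while ( my $line = <$fh> ) {
    # $. is the line number of the most recently read filehandle.
    # $. % 4 cycles 1, 2, 3, 0, so it is false on lines 4, 8, ...
    push @boundaries, $. unless $. % 4;
}
close $fh;

print "Record boundaries at lines: @boundaries\n";
```

          This should print "Record boundaries at lines: 4 8".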

          use Data::Dumper;

          print Dumper get_records('new.txt');

          sub get_records {
              my $file_name = shift;
              open my $fh, '<', $file_name
                  or die "Cannot open $file_name: $!";
              my( @records, @news );
              while ( <$fh> ) {
                  chomp;
                  push @records, $_;
                  next if $. % 4;    # keep collecting until a fourth line

                  my %news;
                  @news{ qw/url date news source/ } = @records;
                  push @news, \%news;
                  @records = ();
              }

              return @news;
          }
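
          One thing the routine above does not report is a file whose
          line count is not a multiple of four: the leftover lines are
          silently dropped. A hedged variation follows. It counts the
          buffered lines instead of testing $. and warns about an
          incomplete final record. The name group_lines and the sample
          data are hypothetical.

```perl
use strict;
use warnings;

# group_lines() is a hypothetical helper, not part of the original reply.
sub group_lines {
    my ($fh) = @_;
    my ( @lines, @news );
    while ( my $line = <$fh> ) {
        chomp $line;
        push @lines, $line;
        next if @lines < 4;    # keep collecting until four lines are buffered

        my %item;
        @item{ qw/url date news source/ } = @lines;
        push @news, \%item;
        @lines = ();
    }
    # Anything still buffered here is a partial record.
    warn "Ignoring incomplete record of ", scalar(@lines), " line(s)\n"
        if @lines;
    return @news;
}

# Made-up sample: one full record followed by two leftover lines.
my $sample = "u1\nd1\nn1\ns1\nu2\nd2\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @news = group_lines($fh);
close $fh;

print scalar(@news), " complete record(s)\n";
```

          Whether to warn, die, or keep the partial record depends on
          what the data is used for; warning is just one option.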


          HTH,

          Charles K. Clarkson
          --
          Clarkson Energy Homes, Inc.
          CJ Web Work - Domains for Real Estate Investors.

          E Pluribus Unum -- One from many.