
Re: [PBML] transforming text to collection of individual words

  • Jeff 'japhy' Pinyan
    Message 1 of 6, Apr 2, 2002
      On Apr 2, I G T -- 5 said:

      >is this a correct way of transforming a text
      >into an array of its individual words?

      @words = split ' ', $string;

      That's assuming "words" are groups of non-spaces. If you need a more
      appropriate definition, maybe you want

      @words = split /\W+/, $string;
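A small sketch of how the two definitions differ, using a made-up sample sentence. `split ' '` is the special case that splits on runs of whitespace and skips leading whitespace, so punctuation stays attached to the words. `split /\W+/` splits on runs of non-word characters (anything but letters, digits and underscore), so punctuation is dropped, but contractions are broken apart and a string that starts with a non-word character yields an empty first field.

```perl
use strict;
use warnings;

my $string = "  Hello, world! It's 2002.";

# split ' ': whitespace-delimited fields; leading whitespace is ignored,
# punctuation stays glued to the words.
my @by_space = split ' ', $string;

# split /\W+/: fields are runs of word characters; punctuation is dropped,
# but the leading spaces produce an empty first field and "It's" splits in two.
my @by_nonword = split /\W+/, $string;

print join('|', @by_space), "\n";
print join('|', @by_nonword), "\n";
```

If the empty leading field is a nuisance, `grep { length } split /\W+/, $string` filters it out, or the words can be matched directly with `$string =~ /\w+/g`.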

      >and how can one scramble that final array?

      See `perldoc -q shuffle` for the answer.
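The FAQ entry that `perldoc -q shuffle` turns up points to the `shuffle` function in the core `List::Util` module and also describes the Fisher-Yates algorithm. A minimal sketch of both follows; the word list is made up and the sub name `fisher_yates_shuffle` is only illustrative.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);   # core module since Perl 5.8

my @words = qw(the quick brown fox jumps over the lazy dog);

# Easiest route: shuffle returns a new list in random order.
my @scrambled = shuffle(@words);

# Hand-written in-place Fisher-Yates shuffle (illustrative name).
sub fisher_yates_shuffle {
    my ($deck) = @_;                     # reference to the array to shuffle
    for (my $i = $#$deck; $i > 0; $i--) {
        my $j = int rand($i + 1);        # random index in 0..$i
        @$deck[$i, $j] = @$deck[$j, $i]; # swap elements $i and $j
    }
    return;
}

my @copy = @words;
fisher_yates_shuffle(\@copy);
print "@scrambled\n@copy\n";
```

Both produce a permutation of the original list: the same words, in a random order.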

      --
      Jeff "japhy" Pinyan japhy@... http://www.pobox.com/~japhy/
      RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
      ** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
      <stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
      [ I'm looking for programming work. If you like my work, let me know. ]
    • Jeff Eggen
      Message 2 of 6, Apr 2, 2002
        >>> idia_goes_t@... 04/02/02 01:13pm >>>
        >are there better/other ways?

        Here's one that I use in a web site indexer. Probably not better than Japhy's, just different. Instead of dumping the file's contents into an array, this puts them in one scalar, which is then split up.

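        # Start Code
        open FILE, "</some/file.txt" or die "Bleat: $!";
        my $contents = join ('', <FILE>);   # read every line, glue them into one scalar
        $contents =~ tr/A-Z/a-z/;           # lowercase (maps ASCII letters only)
        my @words = ($contents =~ /\w+/g);  # //g in list context returns every match
        # End Code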
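To try the same steps without a file on disk, a sketch can read from an in-memory filehandle (opening a reference to a scalar, available since Perl 5.8). The sample text is made up; the steps mirror the code above: read all lines, join them into one scalar, lowercase it, and collect the `\w+` runs.

```perl
use strict;
use warnings;

my $text = "The Quick brown fox.\nThe LAZY dog!\n";

# Treat the string as if it were a file (in-memory filehandle, Perl 5.8+).
open my $fh, '<', \$text or die "can't open in-memory file: $!";

my $contents = join '', <$fh>;   # list of lines joined into one scalar
close $fh;

$contents =~ tr/A-Z/a-z/;        # ASCII-only lowercasing, as in the original
my @words = $contents =~ /\w+/g; # every run of word characters

print "@words\n";
```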

        HTH,
        Jeff Eggen
      • Jeff 'japhy' Pinyan
        Message 3 of 6, Apr 2, 2002
          On Apr 2, Jeff Eggen said:

          >Here's one that I use in a web site indexer. Probably not better than
          >Japhy's, just different. Instead of dumping the file's contents into an
          >array, this puts them in one scalar, which is then split up.

          I don't suggest using an array to store the ENTIRE file, but your method
          ends up doing almost the exact same thing. You read the entire file at
          once.

          Instead of reading all of the file as a LIST of lines and then joining
          them together, why not read the entire file as one string?

          >open FILE, "</some/file.txt" or die "Bleat: $!";
          >my $contents = join ('', <FILE>);
          >$contents =~ tr/A-Z/a-z/;
          >my @words = ($contents =~ /\w+/g);

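          open FILE, "< $file" or die "can't read $file: $!";
          my ($contents, @words);
          {
            local $/;               # undefine the input record separator (slurp mode)
            $contents = lc <FILE>;  # a single read now returns the whole file, lowercased
          }                         # $/ gets its old value back here
          @words = $contents =~ /\w+/g; # or however you wish to get them
          close FILE;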
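The same slurp can also be written with a lexical filehandle, three-argument `open`, and the common `do { local $/; <$fh> }` idiom. To stay self-contained, this sketch first writes a throwaway file using the core `File::Temp` module; the file contents and the helper name `slurp_words` are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file to read back (illustrative contents).
my ($out, $file) = tempfile(UNLINK => 1);
print {$out} "First line, with Words.\nSecond LINE.\n";
close $out or die "can't close $file: $!";

# Hypothetical helper: slurp a file and return its lowercased words.
sub slurp_words {
    my ($path) = @_;
    open my $in, '<', $path or die "can't read $path: $!";
    # Inside the do block $/ is undef, so one readline returns the whole file.
    my $contents = lc do { local $/; <$in> };
    close $in;
    return $contents =~ /\w+/g;   # every run of word characters
}

my @words = slurp_words($file);
print "@words\n";
```

The `do` block limits the `local $/` to the single read, so later line-by-line reads elsewhere in the program are unaffected.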

        • Jeff Eggen
          Message 4 of 6, Apr 3, 2002
            >I don't suggest using an array to store the ENTIRE file, but your method
            >ends up doing almost the exact same thing. You read the entire file at
            >once.
            >Instead of reading all of the file as a LIST of lines and then joining
            >them together, why not read the entire file as one string?

            > open FILE, "< $file" or die "can't read $file: $!";
            > my ($contents, @words);
            > {
            > local $/;
            > $contents = lc <FILE>;
            > }
            > @words = $contents =~ /\w+/g; # or however you wish to get them
            > close FILE;

            Hey, I never thought of locally undefining the input record separator ($/) to read the whole file at once before. And doing the lc in the same step too...neat. Thanks Japhy! Always handy to see a different way of doing things.
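One detail worth pinning down: `local $/;` leaves `$/` undefined (slurp mode), which is not the same as setting it to the empty string. `$/ = ""` enables paragraph mode, where each read returns a block of text ending at one or more blank lines. A minimal sketch contrasting the two, using an in-memory filehandle and made-up text:

```perl
use strict;
use warnings;

my $text = "para one, line 1\npara one, line 2\n\n\npara two\n";

# Undefined $/: slurp mode, the whole input comes back as one record.
open my $fh, '<', \$text or die "can't open in-memory file: $!";
my @slurped = do { local $/; <$fh> };
close $fh;

# Empty-string $/: paragraph mode, one record per blank-line-separated block.
open $fh, '<', \$text or die "can't open in-memory file: $!";
my @paras = do { local $/ = ""; <$fh> };
close $fh;

printf "slurp: %d record(s), paragraph mode: %d record(s)\n",
    scalar @slurped, scalar @paras;
```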

            Jeff Eggen
          • Gregory Matthews
            Message 5 of 6, Apr 4, 2002
              At 07:36 AM 4/3/2002 -0600, you wrote: