
transforming text to collection of individual words

  • I G T -- 5
    Message 1 of 6, Apr 2, 2002
      Hi,
      is this a correct way of transforming a text
      into an array of its individual words?

      Are there better/other ways?

      And how can one scramble that final array?


      #!/usr/bin/perl
      use strict;
      use warnings;

      # 'or' binds loosely, so die runs only if open itself fails
      open(FILE, "<", "text.txt") or die "not there: $!";
      my @text = <FILE>;
      my @complete = (); # the complete text as individual words
      close(FILE);
      foreach my $text (@text) # every line
      {
          my @thetext = split(/ /, $text);
          push(@complete, @thetext);
      }

      thanks for your time
      idia

    • Jeff 'japhy' Pinyan
      Message 2 of 6, Apr 2, 2002
        On Apr 2, I G T -- 5 said:

        >is this a correct way of transforming a text
        >into an array of its individual words ?

        @words = split ' ', $string;

        That's assuming "words" are groups of non-spaces. If you need a more
        appropriate definition, maybe you want

        @words = split /\W+/, $string;
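To make the difference between these split patterns concrete, here is a small sketch run on a made-up sample line (the string is hypothetical, chosen to have leading spaces, doubled spaces, punctuation, and the trailing newline that <FILE> keeps). It also shows why the original split(/ /, $text) is fragile, and that matching /\w+/g avoids the empty leading field that split /\W+/ can produce.

```perl
use strict;
use warnings;

# Hypothetical sample line, with the trailing newline that <FILE> keeps.
my $line = "  Hello, world!  It's  a test.\n";

# split ' ' is special-cased: it skips leading whitespace and splits on
# runs of any whitespace (spaces, tabs, newlines), like awk does.
my @by_space = split ' ', $line;

# split / / splits on each single space, so doubled spaces give empty
# fields and the newline stays attached to the last word.
my @by_literal = split / /, $line;

# split /\W+/ splits on runs of non-word characters; because the line
# starts with non-word characters, the first field is empty.
my @by_nonword = split /\W+/, $line;

# Matching the words themselves never yields empty fields.
my @by_match = $line =~ /\w+/g;

for my $result ([ "split ' '" => \@by_space ], [ 'split / /' => \@by_literal ],
                [ 'split /\W+/' => \@by_nonword ], [ 'm/\w+/g' => \@by_match ]) {
    my ($label, $fields) = @$result;
    # Show each field in brackets so empty fields and newlines are visible.
    print "$label: ", join(' ', map { "[$_]" } @$fields), "\n";
}
```

Note that both \W-based approaches break "It's" into "It" and "s", since the apostrophe is not a word character; which definition of "word" is right depends on the text.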

        >and how can one scramble that final array ?

        See `perldoc -q shuffle` for the answer.
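For reference, the perlfaq4 entry that perldoc -q shuffle finds describes two approaches, both sketched below on a hypothetical word list: the shuffle function from List::Util (a core module since Perl 5.8; older perls need it installed from CPAN), and an in-place Fisher-Yates shuffle written out by hand.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical word list standing in for @complete / @words.
my @words = qw(the quick brown fox jumps over the lazy dog);

# Option 1: List::Util's shuffle returns a new, randomly ordered list.
my @scrambled = shuffle(@words);

# Option 2: in-place Fisher-Yates shuffle. Walk from the last index down,
# swapping each element with a randomly chosen element at or before it.
sub fisher_yates_shuffle {
    my ($array) = @_;    # array reference, shuffled in place
    for (my $i = @$array - 1; $i > 0; $i--) {
        my $j = int rand($i + 1);              # 0 <= $j <= $i
        @$array[$i, $j] = @$array[$j, $i];     # swap via array slices
    }
    return;
}

my @copy = @words;
fisher_yates_shuffle(\@copy);

print "@scrambled\n";
print "@copy\n";
```

Both leave the set of words unchanged and only reorder it; List::Util is the simpler choice when a new list is fine, while the hand-written version avoids copying a large array.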

        --
        Jeff "japhy" Pinyan japhy@... http://www.pobox.com/~japhy/
        RPI Acacia brother #734 http://www.perlmonks.org/ http://www.cpan.org/
        ** Look for "Regular Expressions in Perl" published by Manning, in 2002 **
        <stu> what does y/// stand for? <tenderpuss> why, yansliterate of course.
        [ I'm looking for programming work. If you like my work, let me know. ]
      • Jeff Eggen
        Message 3 of 6, Apr 2, 2002
          >>> idia_goes_t@... 04/02/02 01:13pm >>>
          >are there better/other ways ?

          Here's one that I use in a web site indexer. Probably not better than Japhy's, just different. Instead of dumping the file's contents into an array, this puts them in one scalar, which is then split up.

          # Start Code
          open FILE, "</some/file.txt" or die "Bleat: $!";
          my $contents = join ('', <FILE>);
          $contents =~ tr/A-Z/a-z/;
          my @words = ($contents =~ /\w+/g);
          # End Code

          HTH,
          Jeff Eggen
        • Jeff 'japhy' Pinyan
          Message 4 of 6, Apr 2, 2002
            On Apr 2, Jeff Eggen said:

            >Here's one that I use in a web site indexer. Probably not better than
            >Japhy's, just different. Instead of dumping the file's contents into an
            >array, this puts them in one scalar, which is then split up.

            I don't suggest using an array to store the ENTIRE file, but your method
            ends up doing almost the exact same thing. You read the entire file at
            once.

            Instead of reading all of the file as a LIST of lines and then joining
            them together, why not read the entire file as one string?

            >open FILE, "</some/file.txt" or die "Bleat: $!";
            >my $contents = join ('', <FILE>);
            >$contents =~ tr/A-Z/a-z/;
            >my @words = ($contents =~ /\w+/g);

            open FILE, "< $file" or die "can't read $file: $!";
            my ($contents, @words);
            {
            local $/;
            $contents = lc <FILE>;
            }
            @words = $contents =~ /\w+/g; # or however you wish to get them
            close FILE;
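The same slurp-and-match idea can be packaged as a subroutine. This is a minimal sketch, assuming a lexical filehandle and three-argument open are acceptable; read_words is a hypothetical name, not something from the thread. The do { local $/; <$fh> } form confines the undefined record separator to that block, just as the bare block above does.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file and return its words, lower-cased.
sub read_words {
    my ($file) = @_;
    open my $fh, '<', $file or die "can't read $file: $!";
    # $/ is undef only inside this do block, so the read slurps the whole file.
    my $contents = do { local $/; <$fh> };
    close $fh;
    $contents = '' unless defined $contents;    # guard against undef
    my @words = lc($contents) =~ /\w+/g;        # or however you wish to get them
    return @words;
}

# Usage, assuming the file name is given on the command line:
if (@ARGV) {
    my @words = read_words($ARGV[0]);
    print scalar(@words), " words\n";
}
```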

          • Jeff Eggen
            Message 5 of 6, Apr 3, 2002
              >I don't suggest using an array to store the ENTIRE file, but your method
              >ends up doing almost the exact same thing. You read the entire file at
              >once.
              >Instead of reading all of the file as a LIST of lines and then joining
              >them together, why not read the entire file as one string?

              > open FILE, "< $file" or die "can't read $file: $!";
              > my ($contents, @words);
              > {
              > local $/;
              > $contents = lc <FILE>;
              > }
              > @words = $contents =~ /\w+/g; # or however you wish to get them
              > close FILE;

              Hey, I never thought of locally changing the input line separator to nothing before. And the lc at once too...neat. Thanks Japhy! Always handy to see a different way of doing things.

              Jeff Eggen
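A precise gloss on "changing the input line separator to nothing": $/ is the input record separator, and local $/; sets it to undef, which is slurp mode. Setting it to the empty string is different: that is paragraph mode, where records are separated by blank lines. The sketch below compares the three settings on a hypothetical in-memory string (opening a reference to a scalar as a filehandle needs Perl 5.8 or later; read_records is a hypothetical helper name).

```perl
use strict;
use warnings;

# Hypothetical sample text held in memory.
my $text = "line one\nline two\n\nsecond paragraph\n";

# Read $text with the given input record separator and return the records.
sub read_records {
    my ($separator) = @_;
    open my $fh, '<', \$text or die "can't open in-memory handle: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @lines      = read_records("\n");    # default: one record per line
my @paragraphs = read_records("");      # empty string: paragraph mode
my @whole      = read_records(undef);   # undef: slurp the whole input

printf "%d lines, %d paragraphs, %d whole-file record\n",
    scalar(@lines), scalar(@paragraphs), scalar(@whole);
```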
            • Gregory Matthews
              Message 6 of 6, Apr 4, 2002
                At 07:36 AM 4/3/2002 -0600, you wrote:
                > >I don't suggest using an array to store the ENTIRE file, but your method
                > >ends up doing almost the exact same thing. You read the entire file at
                > >once.
                > >Instead of reading all of the file as a LIST of lines and then joining
                > >them together, why not read the entire file as one string?
                >
                > > open FILE, "< $file" or die "can't read $file: $!";
                > > my ($contents, @words);
                > > {
                > > local $/;
                > > $contents = lc <FILE>;
                > > }
                > > @words = $contents =~ /\w+/g; # or however you wish to get them
                > > close FILE;
                >
                >Hey, I never thought of locally changing the input line separator to
                >nothing before. And the lc at once too...neat. Thanks Japhy! Always
                >handy to see a different way of doing things.
                >
                >Jeff Eggen