Loading ...
Sorry, an error occurred while loading the content.

Re: [Clip] text processing with a Perl script

Expand Messages
  • Alec Burgess
    Ling: I tried going to: http://www.corpus4u.com/forum_view.asp?forum_id=38&view_id=688 to have a look at your friend s perl script but get redirected to a
    Message 1 of 4 , Aug 5, 2005
    • 0 Attachment
      Ling:

      I tried going to:
      http://www.corpus4u.com/forum_view.asp?forum_id=38&view_id=688 to have a
      look at your friend's perl script but get redirected to a chinese page:
      http://www.corpus4u.com/error.asp

      Permissions or something? Maybe you could put the script somewhere more
      accessible?

      In principle the following clip would do the job:
      H="direct Perl"
      ^!Runscript perl C:\TEMP\simpletest.pl
      ^!replace "" >> "" wsait


      H="_use with directPerl above"
      #simpletest.pl
      undef $/; # Enter "file slurp" mode
      while (<>) {
      s /((.*\n)+)/<p>$1<\/p>/;
      print;
      }

      ie. execute script "direct Perl" above and replace C:\TEMP\simpletest.pl by
      your desired script. It should pass either the the selected part of the
      current file or the entire file if nothing selected to the script. Sounds
      like you need *both* the current file *and* a selected word or phrase so
      some tinkering will probably be neccessary.

      Note: the line ^!replace "" >> "" wsait appears to be neccessary because
      any invocation of a Perl script from Notetab seems to drop a "" at the end
      of the last line when the output gets returned.

      undef $/; # Enter "file slurp" mode
      This is required if you want the file processed in slurp mode, but you
      probably know more about that than I do :-)


      See Help Clips - Using Perl, Gawk, and Other Scripts and a fairly
      complicated example in the Library Sample Code for handling the stderror
      file if any. When testing you can simply load it as one of your open files
      and rely on Notetab's "File changed - do you want to reload it?" to check
      for errors etc.

      Regards ... Alec
      --
      ; ( ) { } [ ] \ | 9 0 + = () {} []


      ---- Original Message ----
      From: "H Tao" <ht_ling@...>
      To: <ntb-Clips@yahoogroups.com>
      Sent: Thursday, August 04, 2005 19:59
      Subject: [gla: [Clip] text processing with a Perl script

      > Hello everyone,
      >
      > A novice user here. Got a question about Perl and
      > NoteTab, Hope someone can help me out.
      >
      > A friend of mine wrote a Perl script (KWIC) that can
      > do
      > concordancing (searching a word and displaying each
      > and every instances of its use in a
      > text).
      >
      > (http://www.corpus4u.com/forum_view.asp?forum_id=38&view_id=688)
      >
      > I'd like to be able to call upon this script and use
      > it on the currently open text in
      > NoteTab and search the use of the currently
      > highlighted word. Since the execution of that
      > script is command line based (under MS Windows, XP in
      > my case), I don't know how to do
      > it.
      >
      > Alternatuively, if the Perl script can be part of a
      > Clip, that'd be cool, too.
    • H Tao
      Thanks so much! I ll try out what you said in the mail, and meanwhile here is the perl script that I was referring to (also in the attachment). Your help is
      Message 2 of 4 , Aug 5, 2005
      • 0 Attachment
        Thanks so much! I'll try out what you said in the mail, and meanwhile here is the perl script that I was referring to (also in the attachment). Your help is very much appreciated. -Ling

        ---------------------beginning of script---------------
        use warnings;
        use strict;
        use IO::File;
        use File::Find;
        use Getopt::Std;
        use vars qw(
        $usage
        $SearchPattern
        $filnm
        $context
        %kwicc
        @kwic
        $opt_l
        $opt_r
        $opt_c
        $opt_m
        $opt_t
        $i);

        $usage=<<EOF;
        usage: skwic [ -lrct ][-m NUM] <pattern> <path>
        -l: Sort by left context
        -r: Sort by right context
        -c: Sort by pattern
        -m: Line limitation
        -t: Display collocations
        (In this version if you use option -t, search pattern
        is restricted to a single word.)
        EOF

        die $usage unless scalar(@ARGV)>1;
        $SearchPattern = shift;
        CheckReMsg($SearchPattern);
        $filnm=shift;
        getopts( "lrctm:" );
        if ($opt_t && $SearchPattern=~ /[^\w\d'-\\]/){die "In this version if you use option -t, search pattern is restricted to a single word.";}
        if(($opt_l && $opt_r)||($opt_l && $opt_c)||($opt_c && $opt_r))
        {die "argument error.";}
        $context=40;
        $i=0;

        sub CheckReMsg{
        my $pattern = shift;
        die "Illegal pattern: $@
        " unless my $re = compile_re($pattern);
        }

        sub compile_re{
        my $pattern = shift;
        my $re;
        eval {$re = qr/$pattern/;};
        return $re;
        }

        sub parsewL {
        if ($opt_m){
        if ($i==$opt_m){goto L;}}
        my $rawstr=shift;
        my @wordsb=();
        while ($rawstr=~/\b[\w\d'-]+\b/gi){
        push @wordsb, $&."_$i";
        }
        $i++;
        return $wordsb[$#wordsb];
        }
        sub parsewR {
        if ($opt_m){
        ; if ($i==$opt_m){goto L;}}
        my $rawstr=shift;
        my @wordsb=();
        while ($rawstr=~/\b[\w\d'-]+\b/gi){
        push @wordsb, $&."_$i";}
        $i++;
        return $wordsb[0];
        }

        sub domatch
        {
        my $myfh=shift;
        my $name=shift;
        warn "$name is a directory
        " and return if -d $name;
        local $/=undef;
        while(<$myfh>)
        {
        $_ =~ s/
        / /g;
        while($_=~/$SearchPattern/g){
        my $prepttn="";
        my $postpttn="";
        if(length($`)<=$context){$prepttn=$`;}
        elsif(length($`)>$context){$prepttn=substr($`,-$context);}
        if (length($')<=$context){$postpttn=$';}
        elsif(length($')>$context){$postpttn=substr($',0,$context);}
        no warnings;
        if ($opt_l){$kwicc{parsewL($prepttn)}=" "x($context-length($prepttn)).$prepttn.$&.$postpttn."
        ";
        }
        elsif ($opt_r){$kwicc{parsewR($postpttn)}=" "x($context-length($prepttn)).$prepttn.$&.$postpttn."
        ";
        }
        elsif ($opt_c) {$kwicc{$&."_$i"}=" "x($context-length($prepttn)).$prepttn.$&.$postpttn."
        ";
        $i++;
        if ($i==$opt_m){goto L;}
        }

        else {print " "x($context-length($prepttn)).$prepttn.$&.$postpttn."
        "
        push @kwic," "x($context-length($prepttn)).$prepttn.$&.$postpttn."
        ";
        $i++;
        if ($i==$opt_m){exit;}
          }
        }
        }
        close $myfh;
        }

        if ( -d $filnm )
        {
        find(
        sub {
        return unless -T;
        my $fh = IO::File->new( $_ );
        domatch $fh, $File::Find::name;
        },
        $filnm
        );
        }
         else
        {
        use warnings;
        open(my $fh, $filnm) or die "Unable to open: $!
        ";
        domatch $fh, $filnm;
        }

        L:
        if ($opt_l || $opt_r || $opt_c){
        foreach(sort keys %kwicc){
        push @kwic, $kwicc{$_};
        print $kwicc{$_};

        }
        }



        if ($opt_t){collocate($SearchPattern, @kwic);}
        sub collocate {
        no warnings;
        my ($keyword, @kwicfile)=@_;
        my @token=();
        my %count=();
        my @gset;
        foreach (@kwicfile){
        $_ =~ s/\s+/ /g;
        while (/(\b[\w\d'-]+\b)\s(\b[\w\d'-]+\b\s){0}\b$keyword\b/gi){
        my $buff=$1;
        $buff=~tr/A-Z/a-z/;
        $count{"$buff..."}++;}


        while (/\b$keyword\b(\s\b[\w\d'-]+\b){0}\s(\b[\w\d'-]+\b)/gi){
        my $buff=$2;
        $buff=~tr/A-Z/a-z/;
        $count{"...$buff"}++;}


        }
        print"
        --------------------------
        ";

        while ((my $key, my $value) = each %count) {
        push @gset,$value."_"."$key
        ";
        };
        @gset=sort {int($a)<=>int($b)or $a cmp $b} @gset;
        print reverse @gset;
        }

        -----------------------------end of script-------------------------------


        Alec Burgess <buralex@...> wrote:
        Ling:

        I tried going to:
        http://www.corpus4u.com/forum_view.asp?forum_id=38&view_id=688 to have a
        look at your friend's perl script but get redirected to a chinese page:
        http://www.corpus4u.com/error.asp

        Permissions or something? Maybe you could put the script somewhere more
        accessible?

        In principle the following clip would do the job:
        H="direct Perl"
        ^!Runscript perl C:\TEMP\simpletest.pl
        ^!replace "" >> "" wsait


        H="_use with directPerl above"
        #simpletest.pl
        undef $/; # Enter "file slurp" mode
        while (<>) {
        s /((.*\n)+)/
        $1<\/p>/;
        print;
        }

        ie. execute script "direct Perl" above and replace C:\TEMP\simpletest.pl by
        your desired script. It should pass either the the selected part of the
        current file or the entire file if nothing selected to the script. Sounds
        like you need *both* the current file *and* a selected word or phrase so
        some tinkering will probably be neccessary.

        Note: the line ^!replace "" >> "" wsait appears to be neccessary because
        any invocation of a Perl script from Notetab seems to drop a "" at the end
        of the last line when the output gets returned.

        undef $/; # Enter "file slurp" mode
        This is required if you want the file processed in slurp mode, but you
        probably know more about that than I do :-)


        See Help Clips - Using Perl, Gawk, and Other Scripts and a fairly
        complicated example in the Library Sample Code for handling the stderror
        file if any. When testing you can simply load it as one of your open files
        and rely on Notetab's "File changed - do you want to reload it?" to check
        for errors etc.

        Regards ... Alec
        --
        ; ( ) { } [ ] \ | 9 0 + = () {} []


        ---- Original Message ----
        From: "H Tao"
        To:
        Sent: Thursday, August 04, 2005 19:59
        Subject: [gla: [Clip] text processing with a Perl script

        > Hello everyone,
        >
        > A novice user here. Got a question about Perl and
        > NoteTab, Hope someone can help me out.
        >
        > A friend of mine wrote a Perl script (KWIC) that can
        > do
        > concordancing (searching a word and displaying each
        > and every instances of its use in a
        > text).
        >
        > (http://www.corpus4u.com/forum_view.asp?forum_id=38&view_id=688)
        >
        > I'd like to be able to call upon this script and use
        > it on the currently open text in
        > NoteTab and search the use of the currently
        > highlighted word. Since the execution of that
        > script is command line based (under MS Windows, XP in
        > my case), I don't know how to do
        > it.
        >
        > Alternatuively, if the Perl script can be part of a
        > Clip, that'd be cool, too.




        Fookes Software: http://www.fookes.us, http://www.fookes.com
        Fookes Software Mailing Lists: http://www.fookes.us/maillist.htm

        Yahoo! Groups Links









        ----------

        use warnings;
        use strict;
        use IO::File;
        use File::Find;
        use Getopt::Std;
        use vars qw(
        $usage
        $SearchPattern
        $filnm
        $context
        %kwicc
        @kwic
        $opt_l
        $opt_r
        $opt_c
        $opt_m
        $opt_t
        $i);

        $usage=<<EOF;
        usage: skwic [ -lrct ][-m NUM] <pattern> <path>
        -l: Sort by left context
        -r: Sort by right context
        -c: Sort by pattern
        -m: Line limitation
        -t: Display collocations
        (In this version if you use option -t, search pattern
        is restricted to a single word.)
        EOF

        die $usage unless scalar(@ARGV)>1;
        $SearchPattern = shift;
        CheckReMsg($SearchPattern);
        $filnm=shift;
        getopts( "lrctm:" );
        if ($opt_t && $SearchPattern=~ /[^\w\d'-\\]/){die "In this version if you use option -t, search pattern is restricted to a single word.";}
        if(($opt_l && $opt_r)||($opt_l && $opt_c)||($opt_c && $opt_r))
        {die "argument error.";}
        $context=40;
        $i=0;

        sub CheckReMsg{
        my $pattern = shift;
        die "Illegal pattern: $@ \n" unless my $re = compile_re($pattern);
        }

        sub compile_re{
        my $pattern = shift;
        my $re;
        eval {$re = qr/$pattern/;};
        return $re;
        }

        sub parsewL {
        if ($opt_m){
        if ($i==$opt_m){goto L;}}
        my $rawstr=shift;
        my @wordsb=();
        while ($rawstr=~/\b[\w\d'-]+\b/gi){
        push @wordsb, $&."_$i";
        }
        $i++;
        return $wordsb[$#wordsb];
        }
        sub parsewR {
        if ($opt_m){
        if ($i==$opt_m){goto L;}}
        my $rawstr=shift;
        my @wordsb=();
        while ($rawstr=~/\b[\w\d'-]+\b/gi){
        push @wordsb, $&."_$i";}
        $i++;
        return $wordsb[0];
        }

        sub domatch
        {
        my $myfh=shift;
        my $name=shift;
        warn "$name is a directory\n" and return if -d $name;
        local $/=undef;
        while(<$myfh>)
        {
        $_ =~ s/\n/ /g;
        while($_=~/$SearchPattern/g){
        my $prepttn="";
        my $postpttn="";
        if(length($`)<=$context){$prepttn=$`;}
        elsif(length($`)>$context){$prepttn=substr($`,-$context);}
        if (length($')<=$context){$postpttn=$';}
        elsif(length($')>$context){$postpttn=substr($',0,$context);}
        no warnings;
        if ($opt_l){$kwicc{parsewL($prepttn)}=" "x($context-length($prepttn)).$prepttn.$&.$postpttn."\n";
        }
        elsif ($opt_r){$kwicc{parsewR($postpttn)}=" "x($context-length($prepttn)).$prepttn.$&.$postpttn."\n";
        }
        elsif ($opt_c) {$kwicc{$&."_$i"}=" "x($context-length($prepttn)).$prepttn.$&.$postpttn."\n";
        $i++;
        if ($i==$opt_m){goto L;}
        }

        else {print " "x($context-length($prepttn)).$prepttn.$&.$postpttn."\n" ;
        push @kwic," "x($context-length($prepttn)).$prepttn.$&.$postpttn."\n";
        $i++;
        if ($i==$opt_m){exit;}
        }
        }
        }
        close $myfh;
        }

        if ( -d $filnm )
        {
        find(
        sub {
        return unless -T;
        my $fh = IO::File->new( $_ );
        domatch $fh, $File::Find::name;
        },
        $filnm
        );
        }
        else
        {
        use warnings;
        open(my $fh, $filnm) or die "Unable to open: $! \n";
        domatch $fh, $filnm;
        }

        L:
        if ($opt_l || $opt_r || $opt_c){
        foreach(sort keys %kwicc){
        push @kwic, $kwicc{$_};
        print $kwicc{$_};

        }
        }



        if ($opt_t){collocate($SearchPattern, @kwic);}
        sub collocate {
        no warnings;
        my ($keyword, @kwicfile)=@_;
        my @token=();
        my %count=();
        my @gset;
        foreach (@kwicfile){
        $_ =~ s/\s+/ /g;
        while (/(\b[\w\d'-]+\b)\s(\b[\w\d'-]+\b\s){0}\b$keyword\b/gi){
        my $buff=$1;
        $buff=~tr/A-Z/a-z/;
        $count{"$buff..."}++;}


        while (/\b$keyword\b(\s\b[\w\d'-]+\b){0}\s(\b[\w\d'-]+\b)/gi){
        my $buff=$2;
        $buff=~tr/A-Z/a-z/;
        $count{"...$buff"}++;}


        }
        print"\n--------------------------\n";

        while ((my $key, my $value) = each %count) {
        push @gset,$value."_"."$key\n";
        };
        @gset=sort {int($a)<=>int($b)or $a cmp $b} @gset;
        print reverse @gset;
        }



        [Non-text portions of this message have been removed]
      • acumming@cwnet.com
        On Thu, 4 Aug 2005 16:59 , H Tao sent: [ . . ] ... (BTW, cc d to ntb-scripts since material herein is more so on topic at
        Message 3 of 4 , Aug 5, 2005
        • 0 Attachment
          On Thu, 4 Aug 2005 16:59 , H Tao <ht_ling@...> sent:
          [ . . ]
          >A friend of mine wrote a Perl script (KWIC) that can
          >do
          >concordancing (searching a word and displaying each
          >and every instances of its use in a
          >text).
          >
          >(http://www.corpus4u.com/forum_view.asp\?forum_id=38&view_id=688)
          >
          >I'd like to be able to call upon this script and use
          >it on the currently open text in
          >NoteTab and search the use of the currently
          >highlighted word.

          (BTW, cc'd to ntb-scripts since material herein is more so on topic at
          ntb-scripts@yahoogroups.com than it is here at ntb-clips@yahoogroups.com)

          a clip can_do/able to: 1. gettext the selected word into a clip variable. 2.
          launch and hand_off/pass the commandline to perl

          >Alternatuively, if the Perl script can be part of a
          >Clip, that'd be cool, too.

          In order to run a Perl script with such script's target is a Ntab doc then need
          to do like number 1 and 2 that I enclosed above.

          The version of script you shared has bits and pieces missing and many syntax
          errors. Part of a heredoc in it was missing. And,

          if ($i==$opt_m){goto L;}

          is in it (goto is frowned upon by those who are structured programming language
          programmers) -- Perl is a structured programming language.

          So you may need to upload it to the files area in hope of avoiding this the
          missing pieces etc.

          The script is not very well documented. I'm not argueing about whether the
          script works or not -- these are just firsthand observations on my part at this
          point, that's all. The script will not run here and it's too messed up, too much
          missing etc.

          die $usage unless scalar(@ARGV)>1;
          $SearchPattern = shift;
          CheckReMsg($SearchPattern);
          $filnm=shift;

          shift brings one item off. search word is first to come off, file name comes off
          2nd.

          (@ARGV)>1 means commandline *must* have more than 1 item such said items *must*
          be a space delimited list. (if U have Win filepaths with space in them, you are
          in for some fun) must either not do that or 1st pass program~files/file~path then
          first thing in Perl substitute each tilde (~) for a space (with space in, as
          another alternative, getshort in ntab might make it work). I just set mine up
          without space in any file path is how I did it. But I once had space and
          overcame it with tilde substitute to space as a workaround. then set all up
          without space character.

          current doc in ntab clip code is: ^**

          ^!RunPerl is reserved for to *operate* on and modify the current doc.

          Not operating and/or modifying current doc in this case. Just searching instead.

          Which is why I agree with Alec ie the ^!RunScript.

          So, item 1 of commandline is file_path_name_of_your_perl_script

          then come the two shift mentioned above. Which leave us with something in ntab
          clip like:

          ^!RunScript $getperlexe$ file_path/kwic search_word ^**

          And when ntab runs that it evaluates/equates to:

          Perl file_path/kwic search_word path_filename_of_current_doc

          each parameter separated by a space, that's four parameters

          those get passed from ntab to Perl (insert image of something working, beast of
          burden or tractor etc.) Perl *works* and then, where Perl's output goes in this
          case I do not know but it rather likely goes to STDOUT

          which leads to: I slightly goofed on the command above. To capture STDOUT with ntab:

          $GetDosOutput$

          Ya don't need ^!runscript. All U need is the *evaluated* command and to capture
          the output from that:

          ^!Set %curdoc%=^**
          ^!Toolbar "new document"
          ^!Insert $GetDosOutput($getperlexe$ file_path/kwic search_word ^%curdoc%)$

          ^^that^^ 3 lines ntab clip code brings evaluated command passed to Perl and
          captures Perl's output in ntab. And, note that the entire command is wrapped
          with the ntab function GetDosOutput. But if U want output to go to command
          console I leave as another exercise for another time.

          Alan.


          ---- Msg sent via CWNet - http://www.cwnet.com/
        • Jody
          Hi Alan C, ... You did good Alan!!! :) mailto:ntb-scripts-subscribe@yahoogroups.com ntb-scripts-subscribe@yahoogroups.com Happy Script n! Jody Adair The
          Message 4 of 4 , Aug 11, 2005
          • 0 Attachment
            Hi Alan C,

            >(BTW, cc'd to ntb-scripts since material herein is more so on topic at
            >ntb-scripts@yahoogroups.com than it is here at ntb-clips@yahoogroups.com)

            You did good Alan!!! :)

            mailto:ntb-scripts-subscribe@yahoogroups.com

            ntb-scripts-subscribe@yahoogroups.com

            Happy Script'n!
            Jody Adair

            The NoteTabbers Assistant Page & Clean-Funnies dot com
            http://www.notetab.net http://www.clean-funnies.com
            Other Fookes Software Mailing Lists:
            http://www.fookes.us/maillist.htm
          Your message has been successfully submitted and would be delivered to recipients shortly.