
Re: [NTO] Extract Text

  • Wayne VanWeerthuizen
    Message 1 of 5 , Apr 6, 2002
      Srish Agrawal wrote:

      >Could somebody recommend a free utility that can extract words from a .txt
      >file based on a given regular expression and save the extracted words in
      >another .txt file?

      Well, you have precisely described the function of a small command-line
      utility called grep. It was originally written for Unix machines, but
      versions have been made for most other operating systems, including
      Windows. Grep for Windows can be found at:

      http://unxutils.sourceforge.net/

      The file "UnxUtils.zip" (found at the link above) contains a whole
      variety of command-line utilities. (It also includes a powerful version
      of sort that has many more options than the sort program that comes with
      Windows, or the sort option built into NoteTab.)

      Once you extract grep.exe from that archive, copy it to where your text
      files are (or add its folder to your Windows PATH variable), then type
      something like this at the command line:

      grep "RegularExp" <Input.txt >Output.txt


      Personally, I do not use grep. I tend to use Perl as a one-size-fits-all
      tool whenever I need more power (or speed) than NoteTab provides.

      If you download and install ActivePerl (see www.activeperl.com), then
      you can use a command line similar to this to extract lines:

      perl -n -e "print if m/RegularExp/;" <Input.txt >Output.txt
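
For readers who have neither grep nor Perl installed, the same line filter can be sketched in Python. This is only an illustration and not part of the original post; the function name matching_lines and the sample lines are made up for the example.

```python
import re


def matching_lines(lines, pattern):
    """Return the lines that contain a match for the regular expression."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Hypothetical sample input; in practice the lines would come from Input.txt.
sample = ["free beer\n", "hot tea\n", "green tree\n"]
for line in matching_lines(sample, "ee"):
    print(line, end="")
```

Like grep and the Perl one-liner, this keeps whole lines, not individual words. To work on real files, the list of lines could be read from Input.txt and the result written to Output.txt.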

      Perl has a better regular expression language than grep or NoteTab, and
      its regular expression engine is highly optimized. Perl can often do
      search and replace operations in under a minute that would take NoteTab
      over an hour. No kidding. I LOVE NoteTab, but one should still use the
      correct tool for any job! NoteTab and Perl coexist perfectly on my
      system. Perl may also be faster than grep, sed, or awk, but that depends
      on the program versions used and the length of the input files.

      Before I learned to use Perl, and before NoteTab itself had good regular
      expression support, I wrote a clipbook called NoteAwk. It helped
      NoteTab users with many regular expression based tasks. It basically
      used NoteTab clipbook wizards to automatically create and run gawk
      scripts for the task. It also has a wizard interface for the GNU sort
      command. You may still be able to find my NoteAwk clipbook at the
      NoteTab.com clipbook page, but I have not updated it for a while. It was
      last tested with NoteTab version 4.6.
    • Wayne VanWeerthuizen
      Message 2 of 5 , Apr 6, 2002
        >Srish Agrawal wrote:
        >
        >>Could somebody recommend a free utility that can extract words
        >>from a .txt file based on a given regular expression and save
        >>the extracted words in another .txt file?

        I wrote:
        >
        >grep "RegularExp" <Input.txt >Output.txt
        >

        If you want to use grep from within NoteTab to extract lines from
        one open document into a new document, this clip will do it:

        H="Grep Test"
        ; Make sure this points to your path to grep.exe
        ^!Set %Grep%="C:\grep.exe"
        ^!Set %Grep%=^$GetShort("^%Grep%")$

        ; Prompts for search expression
        ^!Set %RegEx%="^?{Search for Regular Expression}"

        ; Do the magic!
        ^!Set %Temp%="^$GetInputOutput(^%Grep% "^%RegEx%")$"

        ; Pastes results into a new document.
        ^!Toolbar "New Document"
        ^!InsertText ^%Temp%



        I may post a Perl example later, but I am out of time for now.

        Footnote:

        The Set and InsertText commands may fail if the document you
        are searching has lines that too closely resemble NoteTab
        clip code. In that case you can change them to SetCode
        and InsertCode. But then the clip may fail if the total results
        returned exceed NoteTab's maximum paragraph size.
      • future.com@vsnl.com
        Message 3 of 5 , Apr 7, 2002
          Sir/Madam

          Thanks for the very quick reply.

          But grep returns the lines containing the pattern, while I want only
          the words matching the given pattern. Any suggestions?

          s k agrawal



        • Wayne VanWeerthuizen
          Message 4 of 5 , Apr 7, 2002
            On Sun, 07 Apr 2002 19:31:41 +0500, you wrote:

            >
            >Sir/Madam
            >
            >Thanks for a very quick and instant reply.
            >
            >But GREP returns lines containing the pattern while I want words matching the
            >given pattern. Any suggestion?
            >

            Here is an example of how to do this using Perl and NoteTab's ^!RunPerl
            command. Remember to set $find to your own regular expression.



            H="RunMyPerlScript"
            ^!Toolbar "Copy All"
            ^!Toolbar "Paste New"
            ^!RunPerl "MyPerlScript"


            H="_MyPerlScript"

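            #perl

            # Pattern for a single word; change this to your own regular expression.
            $find = "[a-z]*ee[a-z]*";

            # Read the input line by line and record every whole-word match.
            # Storing matches as hash keys removes duplicates.
            while (<>) {
                while ( m/\b($find)\b/ig ) { $found{$1} = 1 }
            }

            # Print each distinct match once, in sorted order.
            foreach $key (sort keys %found) { print $key, "\n" }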
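
For comparison, the same word extraction can be sketched in Python for anyone who has it instead of Perl. Treat it as a rough equivalent under stated assumptions, not as part of the clip above: the function name extract_words and the sample sentence are invented for the example.

```python
import re


def extract_words(text, word_pattern):
    """Return the distinct whole-word matches of word_pattern, sorted."""
    # \b on both sides restricts matches to whole words; (?:...) keeps the
    # caller's pattern grouped without adding a capture group.
    regex = re.compile(r"\b(?:" + word_pattern + r")\b", re.IGNORECASE)
    # A set drops duplicates, like the %found hash in the Perl script.
    return sorted({m.group(0) for m in regex.finditer(text)})


# Hypothetical sample text; in practice this would be the document contents.
sample = "I see three green trees; we SEE the bees."
print("\n".join(extract_words(sample, "[a-z]*ee[a-z]*")))
```

As in the Perl script, matching ignores case but the stored words keep their original case, so "SEE" and "see" are listed separately.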