[Clip] Re: url then find email ...

  • Wayne VanWeerthuizen
    Message 1 of 24, Jul 2, 1999
      David Seidman <seidmand@...> wrote:

      >Since GAWK is a much smaller download than PERL, you may wonder about my
      >use of perl rather than gawk here. One reason is that gawk does not have a
      >built-in sort capability. A second is that gawk does not provide a
      >variable (like $& in perl) that contains what the regex matched, so you
      >have to get that in somewhat more complicated ways. A third is that perl
      >makes it easier to get pattern matches beyond the first in a line, by using
      >the g option attached to the regex. But the real reason is that I was
      >trying to learn perl when I wrote the original version of this program.

      By the way, when I get back to working on NoteAwk, I will convert all
      its awk scripts to Perl. The gawk support will remain for using the
      user's own gawk scripts, but I need Perl to make the built-in scripts
      work as well as I'd like. Gawk is just too limited.

      --
      Wayne M. VanWeerthuizen
      ICQ: 15117288
      Homepage: http://landru.myhome.net/wayne
    • David Seidman
      Message 2 of 24, Aug 1, 1999
        At 08:16 AM 7/31/1999 -0500, Jody wrote:
        >H=Get eMail Addies
        <SNIP>

        >^!Find "@" S
        >^!IfError Sort
        >^!Set %Add%=^$GetBlock$

        I got lost in the thread and so I don't know what input data this is
        supposed to find email addresses in, but for some input data, there could
        be a problem. GetBlock returns, as I understand the HLP file, a sequence
        of bytes delimited by tab or space (or, I assume, new line). In general,
        there is no reason to assume one of those will be the delimiter. Regular
        expressions make it relatively easy to handle larger sets of delimiters.
        I'm not exactly sure what bytes are legal in an Internet email address, but
        I have constructed a regex based on some assumptions about that. It
        appears between the opening and closing / in the little perl program below.
        The program reads a text file from standard input and writes to standard
        output a sorted list of the email addresses in the file (after conversion
        to all lower case), one per line, with duplicates omitted. If you run it
        using Wayne's NoteAwk (as I generally do), the text of the input file is
        replaced by the output. The print line could easily be changed to
        accommodate different needs. For example, if you want each line of output
        to begin with mailto:, you can rewrite the print line as:

        print "mailto:$_\n";

        =========================
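        # Read text from standard input (or from files named on the command line).
        while (<>) {
            chomp;
            # Each match is a run of non-delimiter characters on both sides of an "@".
            # \s (rather than a literal space) also treats tabs as delimiters.
            while (/[^\s,`'":;()<>[\]]+@[^\s,`'":;()<>[\]]+/g) {
                my $addr = lc $&;       # $& holds the text the regex just matched
                $addr =~ s/\.$//;       # drop a sentence-ending period
                $addresses{$addr}++;    # hash keys discard duplicates
            }
        }
        # Print the unique addresses in sorted order, one per line.
        foreach (sort keys %addresses) {
            print "$_\n";
        }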
        ==========================

        Since GAWK is a much smaller download than PERL, you may wonder about my
        use of perl rather than gawk here. One reason is that gawk does not have a
        built-in sort capability. A second is that gawk does not provide a
        variable (like $& in perl) that contains what the regex matched, so you
        have to get that in somewhat more complicated ways. A third is that perl
        makes it easier to get pattern matches beyond the first in a line, by using
        the g option attached to the regex. But the real reason is that I was
        trying to learn perl when I wrote the original version of this program.
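
For readers more at home in Python, here is a minimal sketch of the same approach. It is an illustration, not something the original poster supplied. It assumes the delimiter set from the Perl regex plus all whitespace, and the names `DELIMS`, `ADDRESS_RE`, and `extract_addresses` are made up for this example.

```python
import re

# Delimiter set borrowed from the Perl regex, plus all whitespace (an assumption).
DELIMS = r"""\s,`'":;()<>\[\]"""
ADDRESS_RE = re.compile(rf"[^{DELIMS}]+@[^{DELIMS}]+")


def extract_addresses(text: str) -> list:
    """Return the unique, lower-cased addresses found in text, sorted."""
    found = set()
    for match in ADDRESS_RE.finditer(text):  # finditer plays the role of Perl's /g
        address = match.group(0).lower()     # group(0) plays the role of $&
        if address.endswith("."):            # drop a sentence-ending period
            address = address[:-1]
        found.add(address)                   # a set discards duplicates
    return sorted(found)


if __name__ == "__main__":
    sample = "Write to <Joe@Example.com>, or (sue@example.org); joe@example.com."
    print("\n".join(extract_addresses(sample)))
```

Splitting the same sample on whitespace alone would produce tokens such as `<Joe@Example.com>,`, which is the delimiter problem described above.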
      • Jody
        Message 3 of 24, Aug 1, 1999
          Hi David,

          >> ^!Find "@" S
          >> ^!IfError Sort
          >> ^!Set %Add%=^$GetBlock$

          > I got lost in the thread and so I don't know what input data
          > this is supposed to find email addresses in, but for some input
          > data, there could be a problem. GetBlock returns, as I
          > understand the HLP file, a sequence of bytes delimited by tab
          > or space (or, I assume, new line). In general, there is no
          > reason to assume one of those will be the delimiter.

          I had mentioned in one of the first posts that the Clip was
          specific and would fail if there was text on either side of the
          address. Both Bong's and Bill's source files had the addresses
          delimited by many spaces or had a hard return on the right end.
          They were even on lines by themselves, so I used the code that was
          the shortest to write and, most of all, the fastest in achieving the
          end results desired.

          Thanks for your perl contribution to the list! Your method, if I
          understand it correctly, will get most of them regardless of
          the delimiter. I generally try to do it all in NoteTab though
          because it is faster, but sometimes it just can't be done
          reasonably. I like to make specific clips so I can make them do
          the job quicker than trying to incorporate all the different
          possibilities there may be, which slows the script down. Usually, most
          of the specific-task clips can be slightly edited to do different
          variations of something general. For instance, this one will get
          all the eMail addresses on a web page in the mailto format and
          put them into eMail format and I only changed a few things from
          previous clips posted. The change of tag names was not necessary
          - just did that because... :)

          ^!ClearVariable %Addresses%
          :Loop
          ^!Find "mailto:" SI
          ^!IfError Format
          ^!Select Url
          ^!Set %Address%=^$GetSelection$
          ^!Append %Addresses%=^%Address%^%nl%
          ^!Goto Loop

          :Format
          ^!Set %fmt%=^$StrSort("^%Addresses%";False;True;True)$
          ^!Set %fmt%=^$StrReplace("mailto:";"^%Empty%";"^%fmt%";False;False)$
          ^!Set %Addresses%=^$StrReplace("^%nl%";",^%Space%";"^%fmt%";False;False)$
          ^!Jump 1
          ^!Info ^%Addresses%

          With others it is just a matter of changing the search criteria,
          finding the start coordinates, and selecting back to it. Thanks again for
          your perl contributions. I consider you our resident pro. :)
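
The clip's logic can also be sketched in Python, as a rough equivalent rather than a description of how NoteTab works internally. The sketch assumes that a mailto URL ends at whitespace, a quote, or an angle bracket (what `^!Select Url` actually selects may differ), and that duplicates are dropped without regard to case. `MAILTO_RE` and `mailto_addresses` are hypothetical names.

```python
import re

# Assumption: a mailto URL runs until whitespace, a quote, or an angle bracket.
MAILTO_RE = re.compile(r"mailto:[^\s\"'<>]+", re.IGNORECASE)


def mailto_addresses(page: str) -> str:
    """Return the page's mailto addresses as a sorted, comma-separated string."""
    unique = {}
    for url in MAILTO_RE.findall(page):
        address = url[len("mailto:"):]               # strip the scheme prefix
        unique.setdefault(address.lower(), address)  # keep the first spelling seen
    # Sort ascending without regard to case, then join with ", ".
    return ", ".join(unique[key] for key in sorted(unique))


if __name__ == "__main__":
    html = ('<a href="mailto:Bob@example.com">Bob</a> '
            '<a href="MAILTO:alice@example.com">Alice</a>')
    print(mailto_addresses(html))
```

Unlike the clip, which appends a newline after every address before replacing newlines with ", ", this sketch joins the list directly and so leaves no trailing separator.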

          Happy Camper,
          Jody Adair
          Prov. 15:15

          Clean-Funnies
          mailto:CF@...?Subject=subscribe
        • David Seidman
          Message 4 of 24, Aug 2, 1999
            At 01:18 AM 8/2/1999 -0500, Jody wrote:
            <SNIP>

            >the delimiter. I generally try to do it all in NoteTab though
            >because it is faster, but sometimes it just can't be done
            >reasonably.

            And I generally try to do it using awk or perl because (a) I tend to think
            of those as part of NoteTab, given how nicely Eric has integrated them and
            how handy Wayne's NoteAwk is; and (b) it saves me the trouble of actually
            learning how to write clips. But sometimes I use the awk or perl script
            simply to illustrate the use of a regex, and the regex itself is also
            useful if you prefer clips to awk or perl scripts.

            >I like to make specific clips so I can make them do
            >the job quicker than trying to incorporate all the different
            >possibilities there may be, which slows the script down. Usually, most
            >of the specific-task clips can be slightly edited to do different
            >variations of something general.

            I use perl and awk scripts the same way. The one I posted in this thread,
            for example, has been used with various different regexes and some minor
            differences in the print statement to extract all sorts of different things.

            <SNIP>

            >^!Select Url

            But there are clear advantages to learning more about clips. For example,
            I didn't realize until I read your message that you could just "Select Url".

            <SNIP>

            > Thanks again for
            >your perl contributions. I consider you our resident pro. :)
            >

            If I'm the resident perl pro here, we are all in deep trouble.
            Fortunately, it appears that a number of contributors here are better
            qualified than I am to claim the title.
          • Lawrence M Hamilton, Jr.
            Message 5 of 24, Aug 2, 1999
              On Fri, 30 Jul 1999 15:16:20 -0500 Jody <KJB1611@...> writes:
              > Hi Larry and Bong,
              >
              > How do you know when you are a NoteTab junkie?
              >

              When you stay up until the sun starts to rise trying to get a clip to
              work....

              Jody & I have a joke about starting CPA - Clip Programmers Anonymous.

              .......

              Hi, My Name's Larry and I am a NoteTab junkie. ;)



              Larry Hamilton, Jr. lmhamilton@...
              Hamilton National Genealogical Society, Inc.
              http://www.HamiltonGenSociety.org/
              My Web Site: http://members.tripod.com/notlimaH/

              ___________________________________________________________________
              Get the Internet just the way you want it.
              Free software, free e-mail, and free Internet access for a month!
              Try Juno Web: http://dl.www.juno.com/dynoget/tagj.
            • Lawrence M Hamilton, Jr.
              Message 6 of 24, Aug 2, 1999
                On Sat, 31 Jul 1999 14:54:06 +0300 "BONG" <bong@...> writes:
                > What's the public opinion: is the clip language easy?
                >
                > BONG

                I find it to be much easier than it at first appears. Some things I first
                tried to do with clips were like using a sledgehammer to crack an egg. It
                works but takes a lot of energy to do it right.

                Many of my first clips that I asked for help on went from twenty-plus
                lines to under ten, from suggestions on the list.

                Once you get the hang of it, it starts to make sense. You just have to
                play with it.


                Larry Hamilton, Jr. lmhamilton@...
                Hamilton National Genealogical Society, Inc.
                http://www.HamiltonGenSociety.org/
                My Web Site: http://members.tripod.com/notlimaH/

              • Jody
                Message 7 of 24, Aug 3, 1999
                  Hi David,

                  > And I generally try to do it using awk or perl because (a) I
                  > tend to think of those as part of NoteTab, given how nicely
                  > Eric has integrated them and how handy Wayne's NoteAwk is; and
                  > (b) it saves me the trouble of actually learning how to write
                  > clips.

                  And scripts save me the trouble of learning perl. I find
                  perl/gawk a lot harder than NoteTab's scripting.

                  > If I'm the resident perl pro here, we are all in deep trouble.
                  > Fortunately, it appears that a number of contributors here are
                  > better qualified than I am to claim the title.

                  Perhaps, but you always seem to have an answer for us. :) I know
                  Wayne knows it and Wren, hmmm, where is he, does well also.

                  Bye for now,
                  Jody Adair
                  Prov. 3:5-7; 4:23

                  http://www.sureword.com/sojourner
                  http://www.sureword.com/kjb1611
                  http://www.sureword.com/notetab
                • Nicole Simon
                  Message 8 of 24, Aug 3, 1999
                    David Seidman wrote:
                    > If I'm the resident perl pro here, we are all in deep trouble.
                    > Fortunately, it appears that a number of contributors here are better
                    > qualified than I am to claim the title.

                    Fortunately, we all have our own little projects to work on, so a good
                    big perl library is up to you to make ;))

                    Nicole

                    --
                    »So, you're searching for alien life forms? Don't you meet
                    enough strange people in discussion lists like this one? ;o)«
                    Anthony V. Vitale
                  • David Seidman
                    Message 9 of 24, Aug 3, 1999
                      At 02:42 AM 8/3/1999 -0500, Jody wrote:

                      >And scripts save me the trouble of learning perl. I find
                      >perl/gawk a lot harder than NoteTab's scripting.
                      >
                      Perl is big, messy, complicated, and extremely powerful. I pretty much
                      restrict myself to a fairly small subset of what was available in version
                      3. I really don't even try to understand the things that differentiate
                      version 5 from version 4.

                      Awk, on the other hand, is pretty small, orders of magnitude less
                      complicated, and a good deal less powerful (Gawk adds a little bit to awk,
                      but not really anything you have to worry about, although one or two of the
                      added functions are extremely handy). The whole thing is set out and
                      explained in one 45-page chapter of The Awk Programming Language, by Aho,
                      Kernighan, and Weinberger (a book which I would recommend highly to anyone
                      interested in computer programming). I doubt you would find it a lot
                      harder than NoteTab's scripting if you spent a few hours with the book.
                      Not that I'm recommending that you do -- if you can do what you want in
                      NoteTab scripting, you don't need awk.
                    • David Seidman
                      Message 10 of 24, Aug 3, 1999
                        At 10:38 AM 8/3/1999 +0200, Nicole Simon wrote:

                        >Fortunately, we all have our own little projects to work on, so a good
                        >big perl library is up to you to make ;))
                        >
                        But wouldn't I need to learn the clip language to do that?
                      • Nicole Simon
                        Message 11 of 24, Aug 4, 1999
                          David Seidman wrote:
                          > But wouldn't I need to learn the clip language to do that?

                          Nope. A library for _writing_ perl scripts is not that complicated ;o)
                          Taking my awk.clbs and making them suitable for perl would be a good start ;o)
                          (I have a newer version on my hard disk)

                          A typical entry from my 'awk special' library goes like this:
                          d. comma --> d. point
                          sub(",", ".", ^?[variable]) ^?[With comment ==_No^=|Yes^=# replaces decimal comma with point]


                          awk_f for function:
                          for
                          for(^?[initialization=i=1];^?[condition=i<=];^?[increment=i++]) {
                          ^&
                          } # for (^?[initialization];^?[condition];^?[increment])

                          awk_v for variables
                          ARGV command-line arguments
                          # ARGV is indexed from zero to ARGC - 1
                          ARGV

                          You see, a little timesaving, a little lookup and a little helpfile.
                          :o)

                          Nicole

                          --
                          »So, you're searching for alien life forms? Don't you meet
                          enough strange people in discussion lists like this one? ;o)«
                          Anthony V. Vitale
                        • David Seidman
                          Message 12 of 24, Aug 4, 1999
                            At 03:33 PM 8/4/1999 +0200, Nicole Simon wrote:
                            >David Seidman wrote:
                            >> But wouldn't I need to learn the clip language to do that?
                            >
                            >Nope. A library for _writing_ perl scripts is not that complicated ;o)

                            "Nope," she says, but "Yes" she means. Yes, I would have to learn more of
                            the clip language than I've ever used before, but no, I would not have to
                            learn the whole thing.

                            >Taking my awk.clbs and making them suitable for perl would be a good start ;o)

                            But Michael E. Schechter's Perl_clb.Zip already exists. Do you not think
                            that is useful?

                            >(I have a newer version on my hard disk)
                            >
                            Then that is the version that should be on Eric's web site. Surely when
                            you tell me to take your awk.clbs and make them suitable for perl, I should
                            be using the newest version of your awk.clbs.
                          • Nicole Simon
                            Message 13 of 24, Aug 5, 1999
                              David Seidman wrote:
                              > "Nope," she says, but "Yes" she means. Yes, I would have to learn more of
                              > the clip language than I've ever used before, but no, I would not have to
                              > learn the whole thing.

                              The parts you could use are _really_ basic clip language. If you want to
                              see a really big clip, take a look at Wayne's Projectmanager - but be
                              careful, it is very structured and well programmed ;o)


                              > But Michael E. Schechter's Perl_clb.Zip already exists. Do you not think
                              > that is useful?

                              No. The last time I saw this he had 5 or more libraries with different parts
                              - this is way too much. Like the special char library: Yes, you can have one
                              whole library for it, but you can also put this in one clip.

                              > Then that is the version that should be on Eric's web site. Surely when

                              Yes and no.

                              The newer one I am using has some 'special' gawk functions, little
                              snippets, special things which _I_ need but no one else does, and another
                              big problem:

                              NR display
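                              if(NR % ^?[In welchen Abständen Zeilen zählen?=100] == 0) { # Lebenszeichen geben
                              printf "\rVerarbeitete Zeilen: " NR
                              }

                              They are mostly in German. (The prompt above asks "At what interval should lines be counted?", the comment means "give a sign of life", and the message reads "Lines processed:".)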

                              The _basic_ set is complete, with all the most-needed functions and variables.
                              But I need time to separate the 'unwanted' special-Nicole-in-German clips
                              out of it. If you just want to have a look at it, it's okay, but not for the library.

                              I don't have the time to translate it properly, and I never had any real
                              feedback (add this, correct that) showing that someone really uses them ;o)

                              But, they are a good base for other languages. :o)

                              Nicole

                              --
                              »So, you're searching for alien life forms? Don't you meet
                              enough strange people in discussion lists like this one? ;o)«
                              Anthony V. Vitale