Re: [Clip] Need to extract surnames

  • Martha Hambrick Harrell
    Message 1 of 30, Jan 24, 2001
      Eb, Claes & Elizabeth,

      Thank you all for responding. I made Claes' corrections to Eb's clip
      but cannot get it to work at all. What am I doing wrong? (By the way,
      I might add that I still consider myself a newbie in the NoteTab Pro
      world!)

      :loop
      ; find (and select) the TD tag, that contains ","
      ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
      ^!IfError Done
      ; strip off the html
      ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
      ; copy the name to (including) the comma
      ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
      ; strip off the comma
      ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
      ^!Goto loop
      :Done


      I also tried Elizabeth's clip. It tried to work but didn't, probably
      because the number of lines between the names is not exactly the
      same. If that's the only problem, I can probably fix that.

      ^!SetCursor 1:1 -- or whatever the first line is

      :Loop
      ^!Select Line
      ^!AppendToFile "index.txt" ^$GetSelection$
      ^!SetCursor ^$Calc(^$GetRow$+12)$:1 -- the plus factor should be
      however far it is to the next name
      ^!If ^$GetRow$=^$GetLineCount$ Skip
      ^!GoTo Loop


      Martha

      Gauffin Claes wrote:

      > Hi Eb,
      >
      > Pretty clip. There are two typos though:
      > ^!Set %name%=^$StrStripHTML("^$GetSelection";0)$
      > should be
      > ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
      >
      > and
      > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1$)$
      > should be
      > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
      >
      > Regards /Claes
      >
    • Piotr Bienkowski
      Message 2 of 30, Feb 1, 2001
        Wonder why my message popped up with a few days' delay... :(

        On 25 Jan 2001, at 11:33, Piotr Bienkowski wrote:

        > On 18 Jan 2001, at 11:19, Grant wrote:
        >
        > > If you want to take a look, I posted a DOM way to do this on the
        > > NoteTab HTML list (subject: "Extracting table data with the DOM").
        > > I have heard 'reg exp' is losing favour because of the power of the
        > > DOM. It certainly is a lot more intuitive than writing a reg exp to
        > > do the same thing. So if you want to check it out, have a look.
        > >
        > >
        > Hi,
        >
        > I take interest in both DOM and regexes. DOM can get you the contents
        > of a tag, but can it check if these contents match a particular
        > pattern?
        >
        > Piotr
        >
      • Gauffin Claes
        Message 3 of 30, Feb 1, 2001
          Hello Martha,

          You wrote:

          > Thank you all for responding. I made Claes' corrections to Eb's clip
          > but cannot get it to work at all. What am I doing wrong?
          >
          > :loop
          > ; find (and select) the TD tag, that contains ","
          > ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
          > ^!IfError Done
          > ; strip off the html
          > ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
          > ; copy the name to (including) the comma
          > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
          > ; strip off the comma
          > ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
          > ^!Goto loop
          > :Done
          >

          First:
          Eb's clip is not quite complete. It uses a clever regular expression to
          catch the surnames from your data, but does not deal with what you do
          with the extracted names. One possible completion of his clip could be
          this, which places the extracted names in a new document:

          H="Extract surnames (version 1)"
          ^!ClearVariable %a%
          ^!Jump text_start
          :loop
          ; find (and select) the TD tag, that contains ","
          ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
          ^!IfError Done
          ; strip off the html
          ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
          ; copy the name to (including) the comma
          ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
          ; strip off the comma
          ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
          ^!append %a%=^%Lname%^p
          ^!Goto loop
          :Done
          ^!Toolbar New Document
          ^!Inserttext ^%a%
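
For readers who want to try the same find-and-trim logic outside NoteTab, here is a minimal Python sketch of the idea behind the clip above: find each TD cell whose text contains a comma and keep the part before the comma. It is not Eb's or Claes's code. The pattern is tightened to `[^,<]+` so that a match cannot run across cell boundaries, and the sample rows (attribute values, years, and the name "Parker, Ann") are made up for illustration, since the original data is not shown in this thread.

```python
import re

# A TD cell (with or without attributes) whose text contains a comma.
# Group 1 is the text before the comma, i.e. the surname.
# Assumption: cells hold plain text, with no nested tags inside them.
CELL = re.compile(r"<TD\b[^>]*>\s*([^,<]+),[^<]*</TD>", re.IGNORECASE)


def extract_surnames(html):
    """Return the surname from every 'Surname, Given' table cell, in order."""
    return [match.group(1).strip() for match in CELL.finditer(html)]


# Hypothetical sample data; only the Reitz names appear in the thread.
sample = (
    '<TR><TD width="100">Reitz, Ed. G.</TD><TD width="50">1901</TD></TR>\n'
    '<TR><TD width="100">Reitz, Ida A.</TD><TD width="50">1903</TD></TR>\n'
    '<TR><TD width="100">Parker, Ann</TD><TD width="50">1905</TD></TR>\n'
)

surnames = extract_surnames(sample)
print("\n".join(surnames))
```

Cells without a comma, such as the year column in the sample, are skipped, which is the same filtering the clip's ^!Find pattern is meant to do.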


          Second:
          The regular expression approach suffers from being rather slow when
          executing. Therefore I would like to push a bit for the strip-html approach.

          Third:
          There seems to be some confusion on what it is you really want.
          Your mails say "surnames" but in a previous mail you wrote
          >...
          >When what I want is this:
          >
          >Reitz, Ed. G.
          >Reitz, Ida A.
          >Reitz, Edward W.
          >...
          which indicates that you want the full names.
          Eb's clip extracts the surnames.

          The following is a clip using html-strip (therefore quite fast) which
          will extract the full names.
          The result will be sorted. When sorting, you can choose whether
          you want duplicates removed or not.
          This is done by checking or unchecking
          View > Options > Tools > Sort Removes Duplicates
          If you do want just the surnames, this can be done too.

          H="Extract surnames (version 2)"
          ^!SetHintInfo Working...
          ^!SetScreenUpdate Off
          ^!SetWordWrap OFF
          ^!Jump TEXT_START
          ^!Select ALL
          ^!Keyboard SHIFT+CTRL+T
          ^!Replace "^t" >> "^pzzzzz" TWSAI
          ^!ToolBar Sort Ascending
          ^!Find "zzzzz" TIWS
          ^!Set %r%=^$getrow$
          ^!Jump text_end
          ^!SelectTo ^%r%:1
          ^!Keyboard DELETE

          Regards /Claes
        • Grant
          Message 4 of 30, Feb 1, 2001
            > > If you want to take a look, I posted a DOM way to do this on the
            > > NoteTab HTML list (subject: "Extracting table data with the DOM").
            > > I have heard 'reg exp' is losing favour because of the power of the
            > > DOM. It certainly is a lot more intuitive than writing a reg exp to
            > > do the same thing. So if you want to check it out, have a look.

            > I take interest in both DOM and regexes. DOM can get you the contents
            > of a tag, but can it check if these contents match a particular
            > pattern?

            No it can't, but using reg exp to extract a table's first-column surnames
            in an html doc is like using a chainsaw to cut butter.
            In comparison, it took me about 5 minutes to write that DOM script to
            extract the table's first-column data, because it's the right tool for
            this job.
            The DOM provides an easy way to navigate text marked up with html, xhtml
            or xml, while reg expressions are good at finding patterns in
            unstructured text. They are not competing technologies but complementary.
            Working with the DOM I'm not pattern matching but working directly with
            the document's structured objects: the table's collection of rows and the
            first child of each row, to get the first td column.
            Having extracted the first column, if I want to find all the 'parkers' in
            that extracted data, then using reg ex is handy.
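
The division of labour described above can be sketched in Python. This is not the DOM script mentioned in this message (which is not shown in the thread), only an illustration of the same idea: walk the table's rows, take the first td child of each, and only then apply a regular expression to the extracted text. It assumes well-formed XHTML so that the standard-library `xml.etree.ElementTree` parser accepts it; real-world HTML often is not well formed. The sample table, including the years and the name "Parker, Ann", is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def first_column(xhtml):
    """Return the text of the first <td> in every <tr> of well-formed XHTML."""
    root = ET.fromstring(xhtml)
    cells = []
    for row in root.iter("tr"):
        cell = row.find("td")  # first <td> child of this row, or None
        if cell is not None:
            cells.append("".join(cell.itertext()).strip())
    return cells


# Hypothetical sample table; the original data is not shown in the thread.
sample = """<table>
  <tr><td>Reitz, Ed. G.</td><td>1901</td></tr>
  <tr><td>Parker, Ann</td><td>1903</td></tr>
  <tr><td>Reitz, Ida A.</td><td>1905</td></tr>
</table>"""

# Step 1: structure. Navigate rows and cells, no pattern matching.
names = first_column(sample)

# Step 2: pattern. A regular expression over the already extracted text.
parkers = [n for n in names if re.match(r"parker\b", n, re.IGNORECASE)]

print(names)
print(parkers)
```

The tree walk handles the structure and the regular expression only ever sees plain text, so neither tool has to do the other's job.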
          • Jody
            Message 5 of 30, Feb 1, 2001
              Hi Martha,

              >I tried this, too. It stripped the HTML tags but it left
              >everything in a single column. I could take out several of them
              >but not all, by using search and replace. This can't be what you
              >mean because it took me more than a few seconds. Would you
              >please be a little more specific about what I need to do?

              It has been so long now I forget what it was and can't find it.
              I know it worked on whatever you sent in. At the present I do
              not have time for it though. Maybe the others are not working
              for you either because what you are sending in is not the same as
              what you are running the Clip over.

              I just saw you got it another way, so whatever works! :)

              Happy Clip'n!
              Jody

              http://www.notetab.net

              Subscribe, UnSubscribe, Options
              mailto:Ntb-Clips-Subscribe@yahoogroups.com
              mailto:Ntb-Clips-UnSubscribe@yahoogroups.com
              http://www.egroups.com/group/ntb-clips
            • Piotr Bienkowski
              Message 6 of 30, Feb 3, 2001
                On 2 Feb 2001, at 10:22, Grant wrote:

                > No it can't, but using reg exp to extract a table's first-column
                > surnames in an html doc is like using a chainsaw to cut butter. In
                > comparison, it took me about 5 minutes to write that DOM script to
                > extract the table's first-column data, because it's the right tool
                > for this job.

                Righto! Chisels are not for fixing tractors. :)

                Piotr
              • Jody
                Message 7 of 30, Feb 5, 2001
                  Hi Piotr,

                  >Wonder why my message popped up with a few days' delay... :(

                  It appears a few of them just got spit out.

                  > > If you want to take a look, I posted a DOM way to do this on
                  > > the NoteTab HTML list (subject: "Extracting table data with
                  > > the DOM"). I have heard 'reg exp' is losing favour because of
                  > > the power of the DOM.

                  Happy Clip'n!
                  Jody

                • Luuk.Houwen@t-online.de
                  Message 8 of 30, Feb 5, 2001
                    I would like to count the number of times my program goes through a
                    loop. I tried the following line within the loop, but it does not
                    work. Any ideas about improving it?

                    ^!Set %Counter%=^$Calc(x=x+1)$

                    Luuk