Re: [Clip] Need to extract surnames

  • Michael Edmonson
    Message 1 of 30 , Jan 24, 2001
      Martha

      What about something like this? (This works; I tried it.) It extracts only
      the name field.
      Note that there is a space after the "^P" in the first Find statement. I
      believe that is correct, as I stripped the reply ">"s out of another message.

      Michael Edmonson

      ^!SetHintInfo Only Name Field (thanks to Jody for helping me on this, in another application)
      ^!SetScreenUpdate Off
      ^!ClearVariable %Addies%

      ^!ClearVariable %MyRowX%
      ^!ClearVariable %MyColX%

      ^!ClearVariable %MyRowY%
      ^!ClearVariable %MyColY%

      :Loop
      ^!Find "<TR ALIGN="left" BGCOLOR="#FFEFD5">^P <TD ALIGN="left" BGCOLOR="#FFEFD5">" S
      ^!IfError Output
      ^!Jump Select_End
      ^!set %MyRowX%=^$GetRow$; %MyColX%=^$GetCol$
      ^!Find "</TD>^P" S
      ^!Jump Select_Start
      ^!set %MyRowY%=^$GetRow$; %MyColY%=^$GetCol$

      ^!SetCursor ^%MyRowX%:^%MyColX%
      ^!SelectTo ^%MyRowY%:^%MyColY%

      ^!Set %Address%=^$StrTrim("^$GetSelection$")$
      ^!Append %Addies%=^%Address%^%nl%
      ^!Goto Loop

      :Output
      ;^!Info ^$StrSort("^%Addies%";0;1;1)$
      ^!Info ^%Addies%
      ;^!Toolbar New Document
      ; ^$StrSort("^%Addies%";0;1;1)$
      ; ^!SetWordWrap False
      ; ^!Jump 1
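A minimal Python sketch of the same steps, for checking the logic outside NoteTab: find the row-start marker, move past it, take everything up to the next </TD>, trim it, and collect one name per line. It assumes the tags are spelled exactly as in Martha's sample, and first_cell_names is a hypothetical helper name. Unlike the clip, the marker here has no space after the line break, matching Martha's unquoted data.

```python
# Minimal sketch of the clip's loop, assuming the tags are spelled exactly
# as in Martha's sample. first_cell_names is a hypothetical helper name.

ROW_START = ('<TR ALIGN="left" BGCOLOR="#FFEFD5">\n'
             '<TD ALIGN="left" BGCOLOR="#FFEFD5">')
CELL_END = "</TD>"


def first_cell_names(html):
    """Return the trimmed text between each row-start marker and the next </TD>."""
    names = []
    pos = 0
    while True:
        start = html.find(ROW_START, pos)
        if start == -1:              # nothing left to find (the clip's ^!IfError Output)
            break
        start += len(ROW_START)      # move past the marker (^!Jump Select_End)
        end = html.find(CELL_END, start)
        if end == -1:
            break
        names.append(html[start:end].strip())   # like ^$StrTrim(...)$
        pos = end + len(CELL_END)
    return names


SAMPLE = """<FONT FACE="Courier New, Courier" SIZE="2">
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>

<FONT FACE="Courier New, Courier" SIZE="2">
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Anna L.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
</TR>
"""

print("\n".join(first_cell_names(SAMPLE)))
```

If the real file has a space (or a tab) before each <TD, the marker must include it, which is the same caveat Michael raises about the space after "^P".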

      ----- Original Message -----
      From: "Martha Hambrick Harrell" <mehharrell@...>
      To: <ntb-clips@...>
      Sent: Monday, January 22, 2001 8:03 PM
      Subject: [Clip] Need to extract surnames


      > Can someone please tell me how to extract only the name from the
      > following sample so that I can make an index? I have several of these
      > to do. Thank you. Martha
      >
      > <FONT FACE="Courier New, Courier" SIZE="2">
      > <TR ALIGN="left" BGCOLOR="#FFEFD5">
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">21</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">KY</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">IN</TD>
      > </TR>
      >
      > <FONT FACE="Courier New, Courier" SIZE="2">
      > <TR ALIGN="left" BGCOLOR="#FFEFD5">
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Anna L.</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">3</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">AR</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > </TR>
      >
      > <FONT FACE="Courier New, Courier" SIZE="2">
      > <TR ALIGN="left" BGCOLOR="#FFEFD5">
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Pauline</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">7/12</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">AR</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > </TR>
    • Martha Hambrick Harrell
      Message 2 of 30 , Jan 24, 2001
        Thanks, Larry. I am working on doing just that right now.

        Martha


        Larry Hamilton wrote:

        > Martha,
        >
        > I understand. I know of no easy way to do it. One suggestion would
        > be to enter your data in such a way that future use of the data, in
        > whatever format, is easier to work with.
      • Martha Hambrick Harrell
        Message 3 of 30 , Jan 24, 2001
          Eb, Claes & Elizabeth,

          Thank you all for responding. I made Claes' corrections to Eb's clip
          but cannot get it to work at all. What am I doing wrong? (By the way,
          I might add that I still consider myself a newbie in the NoteTab Pro
          world!)

          :loop
          ; find (and select) the TD tag, that contains ","
          ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
          ^!IfError Done
          ; strip off the html
          ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
          ; copy the name to (including) the comma
          ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
          ; strip off the comma
          ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
          ^!Goto loop
          :Done


          I also tried Elizabeth's clip. It tried to work but didn't, probably
          because the number of lines between the names is not exactly the
          same? If that's the only problem, I can probably fix that.

          ; start at 1:1, or whatever the first line is
          ^!SetCursor 1:1

          :Loop
          ^!Select Line
          ^!AppendToFile "index.txt" ^$GetSelection$
          ; the plus factor should be however far it is to the next name
          ^!SetCursor ^$Calc(^$GetRow$+12)$:1
          ^!If ^$GetRow$=^$GetLineCount$ Skip
          ^!GoTo Loop
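The problem Martha suspects shows up clearly in a small Python sketch: a fixed jump (the +12 above) only lands on names if every record has the same number of lines, while taking the line right after each <TR line does not depend on record length. The helpers every_nth_line and line_after_marker and the RECORDS list are hypothetical; the records are deliberately uneven to mimic what Martha describes. Note also that in the sample as quoted in this thread, the name lines are 13 lines apart (counting the blank separator line), so the stride would need checking even there.

```python
# Sketch contrasting a fixed line stride (the idea behind the +12 in the clip)
# with taking the line that follows each <TR> line. Both helpers are
# hypothetical; RECORDS is made-up input whose records differ in length.


def every_nth_line(lines, first, step):
    """Return lines[first], lines[first + step], ... (0-based)."""
    return lines[first::step]


def line_after_marker(lines, marker):
    """Return every line that directly follows a line starting with marker."""
    return [lines[i + 1] for i in range(len(lines) - 1)
            if lines[i].startswith(marker)]


RECORDS = [
    '<TR ALIGN="left" BGCOLOR="#FFEFD5">',
    '<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>',
    '<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>',
    '</TR>',
    '',
    '<TR ALIGN="left" BGCOLOR="#FFEFD5">',
    '<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Anna L.</TD>',
    '<TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>',
    '<TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>',   # this record is one line longer
    '</TR>',
    '',
    '<TR ALIGN="left" BGCOLOR="#FFEFD5">',
    '<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Pauline</TD>',
    '</TR>',
]

# A stride of 5 fits the first record, so the third pick lands on a <TR> line.
print(every_nth_line(RECORDS, 1, 5))
# Anchoring on the marker finds all three name lines.
print(line_after_marker(RECORDS, "<TR"))
```

The marker-based version is the same idea the other clips in this thread rely on: search for something that identifies the name line instead of counting lines.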


          Martha

          Gauffin Claes wrote:

          > Hi Eb,
          >
          > Pretty clip. There are two typos though:
          > ^!Set %name%=^$StrStripHTML("^$GetSelection";0)$
          > should be
          > ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
          >
          > and
          > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1$)$
          > should be
          > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
          >
          > Regards /Claes
          >
        • Martha Hambrick Harrell
          Message 4 of 30 , Jan 24, 2001
            Jody,

            I tried this, too. It stripped the HTML tags, but it left everything in
            a single column. I could take out several of them, but not all, by using
            search and replace. This can't be what you mean, because it took me more
            than a few seconds. Would you please be a little more specific about
            what I need to do?

            Thank you,
            Martha


            Jody wrote:

            > Hi Martha,
            >
            > >We are talking about hundreds of names here. I had done the last
            > >two indexes the way Harvey suggested. It takes soooo long. I
            > >was just hoping there was an easier way.
            >
            > My method would take a few seconds; I speak figuratively.
            >
            > Bye for now,
            > Jody Adair
            > Prov. 3:5-7; 4:23
            >
            > http://www.purewords.org/sojourner
            > http://www.purewords.org/kjb1611
            > http://www.notetab.net
          • Piotr Bienkowski
            Message 5 of 30 , Jan 25, 2001
              On 18 Jan 2001, at 11:19, Grant wrote:

              > If you want to take a look, I posted a DOM way to do this on the Notetab
              > html list (subj: Extracting table data with the DOM). I have heard 'reg
              > exp' is losing favour because of the power of the DOM. It certainly is a
              > lot more intuitive than writing a reg exp to do the same thing. So if
              > you want to check it out, have a look.
              >
              >
              Hi,

              I take interest in both DOM and regexes. DOM can get you the contents
              of a tag, but can it check if these contents match a particular
              pattern?

              Piotr
            • Michael Edmonson
              Message 6 of 30 , Jan 25, 2001
                Martha

                What about something like this? (This works; I tried it.) It extracts only
                the name field.
                Note that there is a space after the "^P" in the first Find statement. I
                believe that is correct, as I stripped the reply ">"s out of another message.

                Michael Edmonson

                ^!SetHintInfo Only Name Field (thanks to Jody for helping me on this, in another application)
                ^!SetScreenUpdate Off
                ^!ClearVariable %Addies%

                ^!ClearVariable %MyRowX%
                ^!ClearVariable %MyColX%

                ^!ClearVariable %MyRowY%
                ^!ClearVariable %MyColY%

                :Loop
                ^!Find "<TR ALIGN="left" BGCOLOR="#FFEFD5">^P <TD ALIGN="left" BGCOLOR="#FFEFD5">" S
                ^!IfError Output
                ^!Jump Select_End
                ^!set %MyRowX%=^$GetRow$; %MyColX%=^$GetCol$
                ^!Find "</TD>^P" S
                ^!Jump Select_Start
                ^!set %MyRowY%=^$GetRow$; %MyColY%=^$GetCol$

                ^!SetCursor ^%MyRowX%:^%MyColX%
                ^!SelectTo ^%MyRowY%:^%MyColY%

                ^!Set %Address%=^$StrTrim("^$GetSelection$")$
                ^!Append %Addies%=^%Address%^%nl%
                ^!Goto Loop

                :Output
                ;^!Info ^$StrSort("^%Addies%";0;1;1)$
                ^!Info ^%Addies%
                ;^!Toolbar New Document
                ; ^$StrSort("^%Addies%";0;1;1)$
                ; ^!SetWordWrap False
                ; ^!Jump 1

                ----- Original Message -----
                From: "Martha Hambrick Harrell" <mehharrell@...>
                To: <ntb-clips@...>
                Sent: Monday, January 22, 2001 8:03 PM
                Subject: [Clip] Need to extract surnames


                > Can someone please tell me how to extract only the name from the
                > following sample so that I can make an index? I have several of these
                > to do. Thank you. Martha
                >
                > <FONT FACE="Courier New, Courier" SIZE="2">
                > <TR ALIGN="left" BGCOLOR="#FFEFD5">
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">21</TD>
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">KY</TD>
                > <TD ALIGN="left" BGCOLOR="#FFEFD5">IN</TD>
                > </TR>
              • Piotr Bienkowski
                Message 7 of 30 , Feb 1, 2001
                  Wondered why my message popped up with a few days' delay... :(

                  On 25 Jan 2001, at 11:33, Piotr Bienkowski wrote:

                  > On 18 Jan 2001, at 11:19, Grant wrote:
                  >
                  > > If you want to take a look, I posted a DOM way to do this on the
                  > > Notetab html list (subj: Extracting table data with the DOM). I have
                  > > heard 'reg exp' is losing favour because of the power of the DOM. It
                  > > certainly is a lot more intuitive than writing a reg exp to do the
                  > > same thing. So if you want to check it out, have a look.
                  > >
                  > >
                  > Hi,
                  >
                  > I take interest in both DOM and regexes. DOM can get you the contents
                  > of a tag, but can it check if these contents match a particular
                  > pattern?
                  >
                  > Piotr
                  >
                • Gauffin Claes
                  Message 8 of 30 , Feb 1, 2001
                    Hello Martha,

                    You wrote:

                    > Thank you all for responding. I made Claes' corrections to Eb's clip
                    > but cannot get it to work at all. What am I doing wrong?
                    >
                    > :loop
                    > ; find (and select) the TD tag, that contains ","
                    > ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
                    > ^!IfError Done
                    > ; strip off the html
                    > ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
                    > ; copy the name to (including) the comma
                    > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
                    > ; strip off the comma
                    > ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
                    > ^!Goto loop
                    > :Done
                    >

                    First:
                    Eb's clip is not quite complete. It uses a clever regular expression to
                    catch the surnames from your data, but does not deal with what you do
                    with the extracted names. One possible completion of his clip is the
                    following, which places the extracted names in a new document:

                    H="Extract surnames (version 1)"
                    ^!ClearVariable %a%
                    ^!Jump text_start
                    :loop
                    ; find (and select) the TD tag, that contains ","
                    ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
                    ^!IfError Done
                    ; strip off the html
                    ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
                    ; copy the name to (including) the comma
                    ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
                    ; strip off the comma
                    ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
                    ^!append %a%=^%Lname%^p
                    ^!Goto loop
                    :Done
                    ^!Toolbar New Document
                    ^!Inserttext ^%a%
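A rough Python equivalent of the completed clip, for experimenting with the pattern outside NoteTab. Instead of stripping the HTML and copying up to the comma, this sketch captures the text before the comma with a regex group; on data shaped like Martha's the effect is the same. One assumption to flag: in Python, a negated class such as [^,] also matches line breaks, so the sketch excludes \n and < as well to keep each match inside a single cell. Whether NoteTab's regex engine needs the same guard is not covered in this thread. The helper name surnames is hypothetical.

```python
import re

# Eb's pattern, adapted for Python: a <TD ...> cell whose text contains a
# comma. \n and < are excluded so a match cannot run across several cells.
NAME_CELL = re.compile(r'<TD [^>\n]+>([^,<\n]+),([^<\n]*)</TD>', re.IGNORECASE)


def surnames(html):
    """Return the text before the comma in every comma-containing cell."""
    return [m.group(1).strip() for m in NAME_CELL.finditer(html)]


SAMPLE = """<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Reitz, Ed. G.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
</TR>
"""

# Like appending each %Lname% plus ^p to %a% and inserting %a% at the end.
collected = "\n".join(surnames(SAMPLE))
print(collected)
```

Cells without a comma (such as "wife") fail at the comma and are skipped, which is what makes the pattern pick out only the name column.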


                    Second:
                    The regular-expression approach is rather slow to execute, so I would
                    like to push a bit for the strip-HTML approach.

                    Third:
                    There seems to be some confusion about what it is you really want.
                    Your mails say "surnames", but in a previous mail you wrote:
                    >...
                    >When what I want is this:
                    >
                    >Reitz, Ed. G.
                    >Reitz, Ida A.
                    >Reitz, Edward W.
                    >...
                    which indicates that you want the full names.
                    Eb's clip extracts the surnames.

                    The following is a clip using html-strip (therefore quite fast) which
                    will extract the full names.
                    The result will be sorted. When sorting, you can choose whether
                    you want duplicates removed or not.
                    This is done by checking or unchecking
                    View > Options > Tools > Sort Removes Duplicates
                    If you do want just the surnames, this can be done too.

                    H="Extract surnames (version 2)"
                    ^!SetHintInfo Working...
                    ^!SetScreenUpdate Off
                    ^!SetWordWrap OFF
                    ^!Jump TEXT_START
                    ^!Select ALL
                    ^!Keyboard SHIFT+CTRL+T
                    ^!Replace "^t" >> "^pzzzzz" TWSAI
                    ^!ToolBar Sort Ascending
                    ^!Find "zzzzz" TIWS
                    ^!Set %r%=^$getrow$
                    ^!Jump text_end
                    ^!SelectTo ^%r%:1
                    ^!Keyboard DELETE
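The sort trick in the clip above can be sketched in Python as follows. It assumes that after the HTML is stripped, each table row is one line with its cells separated by tabs, which is what the clip's ^!Replace "^t" step relies on; it does not reproduce the exact behaviour of Ctrl+Shift+T or NoteTab's sort, which the thread does not spell out. The function name is hypothetical, and the repeated Victoria row is added only to show the duplicate option.

```python
# Sketch of the sentinel-sort trick. Assumption: after stripping the HTML,
# each row is one line with its cells separated by tabs.

SENTINEL = "zzzzz"


def names_by_sentinel_sort(stripped_text, remove_duplicates=False):
    # Move every cell after the first onto its own line, tagged with the sentinel.
    text = stripped_text.replace("\t", "\n" + SENTINEL)
    # Blank lines are dropped here; the clip would leave them at the top.
    lines = [line for line in text.split("\n") if line.strip()]
    if remove_duplicates:              # like "Sort Removes Duplicates"
        lines = list(set(lines))
    lines.sort(key=str.lower)          # sentinel lines sort after the names
    # Cut everything from the first sentinel line to the end.
    for i, line in enumerate(lines):
        if line.startswith(SENTINEL):
            return lines[:i]
    return lines


STRIPPED = ("Parker, Victoria\twife\tf\tmu\t21\n"
            "\n"
            "Parker, Anna L.\tdau\tf\tmu\t3\n"
            "\n"
            "Parker, Victoria\twife\tf\tmu\t21\n")

print(names_by_sentinel_sort(STRIPPED))
print(names_by_sentinel_sort(STRIPPED, remove_duplicates=True))
```

The trick holds as long as no real name sorts after the sentinel, which is why a run of z's is a reasonable choice for name data.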

                    Regards /Claes
                  • Grant
                    Message 9 of 30 , Feb 1, 2001
                      > > If you want to take a look, I posted a DOM way to do this on the Notetab
                      > > html list (subj: Extracting table data with the DOM). I have heard 'reg
                      > > exp' is losing favour because of the power of the DOM. It certainly is a
                      > > lot more intuitive than writing a reg exp to do the same thing. So if
                      > > you want to check it out, have a look.

                      > I take interest in both DOM and regexes. DOM can get you the contents
                      > of a tag, but can it check if these contents match a particular
                      > pattern?

                      No, it can't, but using reg exps to extract a table's first-column
                      surnames in an HTML doc is like using a chainsaw to cut butter.
                      In comparison, it took me about 5 minutes to write that DOM script to
                      extract the table's first-column data, because it's the right tool for
                      this job.
                      The DOM provides an easy way to navigate text marked up with HTML, XHTML
                      or XML, while regular expressions are good at finding patterns in
                      unstructured text. They are not competing technologies but complementary ones.
                      Working with the DOM, I'm not pattern matching but working directly with
                      the document's structured objects: the table's collection of rows and the
                      first child of each row, to get the first TD column.
                      Having extracted the first column, if I want to find all the 'Parkers' in
                      that extracted data, then using a reg exp is handy.
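Grant's DOM script itself is not quoted in this thread. As a rough stand-in, the Python sketch below walks the table row by row and keeps the first cell of each, then runs a regular expression over the extracted column, which is the division of labour Grant describes. It uses the standard library's event-based html.parser rather than a real DOM, so it only approximates "the first child of each row"; FirstColumn and first_column are hypothetical names.

```python
import re
from html.parser import HTMLParser


class FirstColumn(HTMLParser):
    """Collect the text of the first <td> in every <tr>."""

    def __init__(self):
        super().__init__()
        self.first_cells = []
        self._cell_index = -1      # position of the current <td> within its row
        self._in_cell = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case.
        if tag == "tr":
            self._cell_index = -1
        elif tag == "td":
            self._cell_index += 1
            self._in_cell = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self._in_cell:
            self._in_cell = False
            if self._cell_index == 0:
                self.first_cells.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._in_cell:
            self._buffer.append(data)


def first_column(html):
    parser = FirstColumn()
    parser.feed(html)
    parser.close()
    return parser.first_cells


SAMPLE = """<TABLE>
<FONT FACE="Courier New, Courier" SIZE="2">
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Reitz, Ed. G.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
</TR>
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Anna L.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
</TR>
</TABLE>"""

names = first_column(SAMPLE)
# Structure first, patterns second: filter the extracted column with a regex.
parkers = [n for n in names if re.match(r"parker\b", n, re.IGNORECASE)]
print(names)
print(parkers)
```

Note that the comma inside the FONT tag's FACE attribute never reaches the filter, because the parser separates attributes from cell text before any pattern is applied.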
                    • Jody
                      Message 10 of 30 , Feb 1, 2001
                        Hi Martha,

                        >I tried this, too. It stripped the HTML tags but it left
                        >everything in a single column. I could take out several of them
                        >but not all, by using search and replace. This can't be what you
                        >mean because it took me more than a few seconds. Would you
                        >please be a little more specific about what I need to do?

                        It has been so long now I forget what it was and can't find it.
                        I know it worked on whatever you sent in. At present I do
                        not have time for it, though. Maybe the others are not working
                        for you either because what you are sending in is not the same as
                        what you are running the Clip over.

                        I just saw you got it another way, so whatever works! :)

                        Happy Clip'n!
                        Jody

                        http://www.notetab.net

                        Subscribe, UnSubscribe, Options
                        mailto:Ntb-Clips-Subscribe@yahoogroups.com
                        mailto:Ntb-Clips-UnSubscribe@yahoogroups.com
                        http://www.egroups.com/group/ntb-clips
                      • Piotr Bienkowski
                        Message 11 of 30 , Feb 3, 2001
                          On 2 Feb 2001, at 10:22, Grant wrote:

                          > No, it can't, but using reg exps to extract a table's first-column
                          > surnames in an HTML doc is like using a chainsaw to cut butter. In
                          > comparison, it took me about 5 minutes to write that DOM script to
                          > extract the table's first-column data, because it's the right tool for this job.

                          Righto! Chisels are not for fixing tractors. :)

                          Piotr
                        • Jody
                          Message 12 of 30 , Feb 5, 2001
                            Hi Piotr,

                            >Wondered why my message popped up with a few days' delay... :(

                            It appears a few of them just got spit out.

                            > > If you want to take a look, I posted a DOM way to do this on the
                            > > Notetab html list (subj: Extracting table data with the DOM).
                            > > I have heard 'reg exp' is losing favour because of the power of
                            > > the DOM.

                            Happy Clip'n!
                            Jody

                          • Luuk.Houwen@t-online.de
                            Message 13 of 30 , Feb 5, 2001
                              I would like to count the number of times my program goes through a loop. I
                              tried the following line within the loop, but it does not work. Any ideas
                              about improving it?

                              ^!Set %Counter%=^$Calc(x=x+1)$

                              Luuk