Re: [Clip] Need to extract surnames

  • Michael Edmonson
    Message 1 of 30 , Jan 24, 2001
      Martha

      What about something like this? (It works; I tried it.) It extracts only
      the name field.
      Note that there is a space after the "^P" in the first Find statement. I
      believe that is correct, since I stripped the reply ">" characters out of
      another message.

      Michael Edmonson

      ^!SetHintInfo Only Name Field (thanks to Jody for helping me on this, in another application)
      ^!SetScreenUpdate Off
      ^!ClearVariable %Addies%

      ^!ClearVariable %MyRowX%
      ^!ClearVariable %MyColX%

      ^!ClearVariable %MyRowY%
      ^!ClearVariable %MyColY%

      :Loop
      ^!Find "<TR ALIGN="left" BGCOLOR="#FFEFD5">^P <TD ALIGN="left" BGCOLOR="#FFEFD5">" S
      ^!IfError Output
      ^!Jump Select_End
      ^!set %MyRowX%=^$GetRow$; %MyColX%=^$GetCol$
      ^!Find "</TD>^P" S
      ^!Jump Select_Start
      ^!set %MyRowY%=^$GetRow$; %MyColY%=^$GetCol$

      ^!SetCursor ^%MyRowX%:^%MyColX%
      ^!SelectTo ^%MyRowY%:^%MyColY%

      ^!Set %Address%=^$StrTrim("^$GetSelection$")$
      ^!Append %Addies%=^%Address%^%nl%
      ^!Goto Loop

      :Output
      ;^!Info ^$StrSort("^%Addies%";0;1;1)$
      ^!Info ^%Addies%
      ;!Toolbar New Document
      ; ^$StrSort("^%Addies%";0;1;1)$
      ; ^!SetWordWrap False
      ; ^!Jump 1
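For readers working outside NoteTab, here is a minimal Python sketch of the same idea. It finds each row start tag, skips the line break after it, and keeps the first cell's text up to `</TD>`. This is not Michael's clip. The names `ROW_FIRST_CELL` and `first_cells` are hypothetical, and the pattern assumes every `<TR>` is followed directly by its name cell, as in Martha's sample.

```python
import re

# Trimmed copy of Martha's sample table (two rows, fewer cells).
SAMPLE = """<FONT FACE="Courier New, Courier" SIZE="2">
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>

<FONT FACE="Courier New, Courier" SIZE="2">
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Anna L.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
</TR>
"""

# A row start tag, optional whitespace (the clip's "^P "), then the first
# cell's start tag; the lazy group captures text up to the first </TD>.
ROW_FIRST_CELL = re.compile(r"<TR[^>]*>\s*<TD[^>]*>(.*?)</TD>",
                            re.IGNORECASE | re.DOTALL)


def first_cells(html):
    """Return the trimmed text of the first cell in every row."""
    return [m.group(1).strip() for m in ROW_FIRST_CELL.finditer(html)]


if __name__ == "__main__":
    print("\n".join(first_cells(SAMPLE)))
```

Anchoring on the row start, rather than on the comma inside the name, means a name without a comma would still be picked up.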

      ----- Original Message -----
      From: "Martha Hambrick Harrell" <mehharrell@...>
      To: <ntb-clips@...>
      Sent: Monday, January 22, 2001 8:03 PM
      Subject: [Clip] Need to extract surnames


      > Can someone please tell me how to extract only the name from the
      > following sample so that I can make an index? I have several of these
      > to do. Thank you. Martha
      >
      > <FONT FACE="Courier New, Courier" SIZE="2">
      > <TR ALIGN="left" BGCOLOR="#FFEFD5">
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">21</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">KY</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">IN</TD>
      > </TR>
      >
      > <FONT FACE="Courier New, Courier" SIZE="2">
      > <TR ALIGN="left" BGCOLOR="#FFEFD5">
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Anna L.</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">3</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">AR</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > </TR>
      >
      > <FONT FACE="Courier New, Courier" SIZE="2">
      > <TR ALIGN="left" BGCOLOR="#FFEFD5">
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Pauline</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">7/12</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">AR</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
      > </TR>
    • Martha Hambrick Harrell
      Message 2 of 30 , Jan 24, 2001
        Thanks, Larry. I am working on doing just that right now.

        Martha


        Larry Hamilton wrote:

        > Martha,
        >
        > I understand. I know of no easy way to do it. One suggestion would
        > be to enter your data in such a way that future use of the data, in
        > whatever format, is easier to work with.
      • Martha Hambrick Harrell
        Message 3 of 30 , Jan 24, 2001
          Eb, Claes & Elizabeth,

          Thank you all for responding. I made Claes' corrections to Eb's clip
          but cannot get it to work at all. What am I doing wrong? (By the way,
          I might add that I still consider myself a newbie in the NoteTab Pro
          world!)

          :loop
          ; find (and select) the TD tag, that contains ","
          ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
          ^!IfError Done
          ; strip off the html
          ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
          ; copy the name to (including) the comma
          ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
          ; strip off the comma
          ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
          ^!Goto loop
          :Done


          I also tried Elizabeth's clip. It tried to work but didn't, probably
          because the number of lines between the names is not always the
          same. If that's the only problem, I can probably fix that.

          ; set the cursor to 1:1, or to whatever the first line is
          ^!SetCursor 1:1

          :Loop
          ^!Select Line
          ^!AppendToFile "index.txt" ^$GetSelection$
          ; the plus factor should be however far it is to the next name
          ^!SetCursor ^$Calc(^$GetRow$+12)$:1
          ^!If ^$GetRow$=^$GetLineCount$ Skip
          ^!GoTo Loop


          Martha

          Gauffin Claes wrote:

          > Hi Eb,
          >
          > Pretty clip. There are two typos though:
          > ^!Set %name%=^$StrStripHTML("^$GetSelection";0)$
          > should be
          > ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
          >
          > and
          > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1$)$
          > should be
          > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
          >
          > Regards /Claes
          >
        • Martha Hambrick Harrell
          Message 4 of 30 , Jan 24, 2001
            Jody,

            I tried this, too. It stripped the HTML tags but it left everything in
            a single column. I could take out several of them but not all, by using
            search and replace. This can't be what you mean because it took me more
            than a few seconds. Would you please be a little more specific about
            what I need to do?

            Thank you,
            Martha


            Jody wrote:

            > Hi Martha,
            >
            > >We are talking about hundreds of names here. I had done the last
            > >two indexes the way Harvey suggested. It takes soooo long. I
            > >was just hoping there was an easier way.
            >
            > My method would take a few seconds (I speak figuratively).
            >
            > Bye for now,
            > Jody Adair
            > Prov. 3:5-7; 4:23
            >
            > http://www.purewords.org/sojourner
            > http://www.purewords.org/kjb1611
            > http://www.notetab.net
          • Martha Hambrick Harrell
            Message 5 of 30 , Jan 24, 2001
              Hi, Folks!

              I must still be doing something wrong here. I've tried all your suggestions but no matter what, I get this:

              Reitz, Ed. G. head m w 35 Laborer IL Ger Ger
              Reitz, Ida A. wife f w 28 _ AR IN IL
              Reitz, Edward W. son m w 9 _ AR IL AR
              Reitz, Earl A. son m w 1 5/12 _ AR IL AR
              Hoffman, John head m w 67 Own Income Switz/Ger Switz/Ger Switz/Ger
              Hoffman, Catherine wife f w 67 _ Switz/Ger Switz/Ger Switz/Ger
              Payer, Francis head m w 86 Own Income Hung/Magyer Hung/Magyer Hung/Magyer
              Payer, Frederica wife f w 74 _ Ger Ger Ger
              Reitz, Louisa, Mrs. dau f w 47 Retail Merchant - Grocery IL Hung/Magyer Ger
              Reitz, Louise F. Granddau f w 20 Saleswoman - Grocery AR IL IL

              When what I want is this:

              Reitz, Ed. G.
              Reitz, Ida A.
              Reitz, Edward W.
              Reitz, Earl A.
              Hoffman, John
              Hoffman, Catherine
              Payer, Francis
              Payer, Frederica
              Reitz, Louisa, Mrs.
              Reitz, Louise F.

              Thank you for your patience.

              Martha

              Jody wrote:

              > The problem with that method is that NoteTab does not use CR/LF as a line break. It treats it just like a browser does. So, the Clip would be:
              >
              > H=Make Index
              > ^!Jump 1
              > ^!SetPasteIndent Off
              > ^!Replace "<TD>" >> "</TD><BR>" WASI
              > ^!Keyboard Shift+Ctrl+T
              > ^!Replace "^t" >> "^p" WASI
            • Piotr Bienkowski
              Message 6 of 30 , Jan 25, 2001
                On 18 Jan 2001, at 11:19, Grant wrote:

                > If you want to take a look posted a DOM way to do this on the Notetab
                > html list. subj: Extracting table data with the DOM Have heard 'reg
                > exp' losing favour because of the power of the DOM. It certainly is a
                > lot more intuitive than writing a reg exp to do the same thing. So if
                > you want to check it out have a look.
                >
                >
                Hi,

                I take interest in both DOM and regexes. DOM can get you the contents
                of a tag, but can it check if these contents match a particular
                pattern?

                Piotr
              • Michael Edmonson
                Message 7 of 30 , Jan 25, 2001
                  Martha

                  What about something like this? (It works; I tried it.) It extracts only
                  the name field.
                  Note that there is a space after the "^P" in the first Find statement. I
                  believe that is correct, since I stripped the reply ">" characters out of
                  another message.

                  Michael Edmonson

                  ^!SetHintInfo Only Name Field (thanks to Jody for helping me on this, in another application)
                  ^!SetScreenUpdate Off
                  ^!ClearVariable %Addies%

                  ^!ClearVariable %MyRowX%
                  ^!ClearVariable %MyColX%

                  ^!ClearVariable %MyRowY%
                  ^!ClearVariable %MyColY%

                  :Loop
                  ^!Find "<TR ALIGN="left" BGCOLOR="#FFEFD5">^P <TD ALIGN="left" BGCOLOR="#FFEFD5">" S
                  ^!IfError Output
                  ^!Jump Select_End
                  ^!set %MyRowX%=^$GetRow$; %MyColX%=^$GetCol$
                  ^!Find "</TD>^P" S
                  ^!Jump Select_Start
                  ^!set %MyRowY%=^$GetRow$; %MyColY%=^$GetCol$

                  ^!SetCursor ^%MyRowX%:^%MyColX%
                  ^!SelectTo ^%MyRowY%:^%MyColY%

                  ^!Set %Address%=^$StrTrim("^$GetSelection$")$
                  ^!Append %Addies%=^%Address%^%nl%
                  ^!Goto Loop

                  :Output
                  ;^!Info ^$StrSort("^%Addies%";0;1;1)$
                  ^!Info ^%Addies%
                  ;!Toolbar New Document
                  ; ^$StrSort("^%Addies%";0;1;1)$
                  ; ^!SetWordWrap False
                  ; ^!Jump 1

                  ----- Original Message -----
                  From: "Martha Hambrick Harrell" <mehharrell@...>
                  To: <ntb-clips@...>
                  Sent: Monday, January 22, 2001 8:03 PM
                  Subject: [Clip] Need to extract surnames


                  > Can someone please tell me how to extract only the name from the
                  > following sample so that I can make an index? I have several of these
                  > to do. Thank you. Martha
                  >
                  > <FONT FACE="Courier New, Courier" SIZE="2">
                  > <TR ALIGN="left" BGCOLOR="#FFEFD5">
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">21</TD>
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">KY</TD>
                  > <TD ALIGN="left" BGCOLOR="#FFEFD5">IN</TD>
                  > </TR>
                • Piotr Bienkowski
                  Message 8 of 30 , Feb 1 6:08 AM
                    Wondered why my message popped up with a few days' delay... :(

                    On 25 Jan 2001, at 11:33, Piotr Bienkowski wrote:

                    > On 18 Jan 2001, at 11:19, Grant wrote:
                    >
                    > > If you want to take a look posted a DOM way to do this on the
                    > > Notetab html list. subj: Extracting table data with the DOM Have
                    > > heard 'reg exp' losing favour because of the power of the DOM. It
                    > > certainly is a lot more intuitive than writing a reg exp to do the
                    > > same thing. So if you want to check it out have a look.
                    > >
                    > >
                    > Hi,
                    >
                    > I take interest in both DOM and regexes. DOM can get you the contents
                    > of a tag, but can it check if these contents match a particular
                    > pattern?
                    >
                    > Piotr
                    >
                  • Gauffin Claes
                    Message 9 of 30 , Feb 1 8:36 AM
                      Hello Martha,

                      You wrote:

                      > Thank you all for responding. I made Claes' corrections to Eb's clip
                      > but cannot get it to work at all. What am I doing wrong?
                      >
                      > :loop
                      > ; find (and select) the TD tag, that contains ","
                      > ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
                      > ^!IfError Done
                      > ; strip off the html
                      > ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
                      > ; copy the name to (including) the comma
                      > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
                      > ; strip off the comma
                      > ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
                      > ^!Goto loop
                      > :Done
                      >

                      First:
                      Eb's clip is not quite complete. It uses a clever regular expression to
                      catch the surnames from your data, but does not deal with what you do
                      with the extracted names. One possible completion of his clip, which
                      places the extracted names in a new document, is this:

                      H="Extract surnames (version 1)"
                      ^!ClearVariable %a%
                      ^!Jump text_start
                      :loop
                      ; find (and select) the TD tag, that contains ","
                      ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
                      ^!IfError Done
                      ; strip off the html
                      ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
                      ; copy the name to (including) the comma
                      ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
                      ; strip off the comma
                      ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
                      ^!append %a%=^%Lname%^p
                      ^!Goto loop
                      :Done
                      ^!Toolbar New Document
                      ^!Inserttext ^%a%
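A rough Python equivalent of this completed clip is sketched below for readers who want to experiment outside NoteTab. It is an illustration under assumptions, not Eb's or Claes' code. The function name `extract_surnames` is hypothetical, and the pattern is a tightened variant of Eb's: it forbids `<` and `,` before the comma, so a cell without a comma cannot run on into the next row's name cell.

```python
import re

SAMPLE = """<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Anna L.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
</TR>
"""

# Variant of Eb's pattern: a TD start tag, text containing no "<" or ",",
# a comma, then any text up to </TD>. Group 1 is the surname.
NAME_CELL = re.compile(r"<TD[^>]*>([^<,]+),[^<]*</TD>", re.IGNORECASE)


def extract_surnames(html):
    """Return the text before the comma in every cell that contains one."""
    # Plays the role of the clip's ^!Append %a%=^%Lname%^p: one surname per match.
    return [m.group(1).strip() for m in NAME_CELL.finditer(html)]


if __name__ == "__main__":
    print("\n".join(extract_surnames(SAMPLE)))
```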


                      Second:
                      The regular expression approach suffers from being rather slow when
                      executing. Therefore I would like to push a bit for the strip-html approach.

                      Third:
                      There seems to be some confusion about what you really want.
                      Your mails say "surnames" but in a previous mail you wrote
                      >...
                      >When what I want is this:
                      >
                      >Reitz, Ed. G.
                      >Reitz, Ida A.
                      >Reitz, Edward W.
                      >...
                      which indicates that you want the full names.
                      Eb's clip extracts the surnames.

                      The following is a clip using html-strip (therefore quite fast) which
                      will extract the full names.
                      The result will be sorted. When sorting, you can choose whether
                      you want duplicates removed or not.
                      This is done by checking or unchecking
                      View > Options > Tools > Sort Removes Duplicates
                      If you do want just the surnames, this can be done too.

                      H="Extract surnames (version 2)"
                      ^!SetHintInfo Working...
                      ^!SetScreenUpdate Off
                      ^!SetWordWrap OFF
                      ^!Jump TEXT_START
                      ^!Select ALL
                      ^!Keyboard SHIFT+CTRL+T
                      ^!Replace "^t" >> "^pzzzzz" TWSAI
                      ^!ToolBar Sort Ascending
                      ^!Find "zzzzz" TIWS
                      ^!Set %r%=^$getrow$
                      ^!Jump text_end
                      ^!SelectTo ^%r%:1
                      ^!Keyboard DELETE
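The trick in version 2 relies on one property of the stripped text: each table row becomes one line whose cells are separated by tabs. The clip then turns every tab into a line break plus a marker, sorts, and deletes everything from the first marker line to the end, which leaves only the first cells. Below is a minimal Python sketch of that idea. It assumes the stripped text really is tab-separated and that names begin with a capital letter. The marker `zzzzz` comes from Claes' clip; the function name `names_by_sort` is hypothetical.

```python
MARKER = "zzzzz"


def names_by_sort(stripped_text, remove_duplicates=False):
    """Keep the first tab-separated field of each line, sorted."""
    # ^!Replace "^t" >> "^pzzzzz": every later cell moves to its own marked line.
    lines = stripped_text.replace("\t", "\n" + MARKER).splitlines()
    # ^!ToolBar Sort Ascending: marked lines sort after names that start with
    # a capital letter, because "z" comes after "A"-"Z" in ASCII.
    lines.sort()
    # ^!Find "zzzzz" ... ^!Keyboard DELETE: drop the first marked line and all after it.
    cut = next((i for i, line in enumerate(lines) if line.startswith(MARKER)),
               len(lines))
    # Blank lines sort to the top; skip them (the clip itself leaves them in).
    names = [line for line in lines[:cut] if line.strip()]
    if remove_duplicates:
        # Mirrors View > Options > Tools > Sort Removes Duplicates.
        names = sorted(set(names))
    return names


if __name__ == "__main__":
    rows = [
        "Reitz, Ed. G.\thead\tm\tw\t35\tLaborer\tIL\tGer\tGer",
        "Reitz, Ida A.\twife\tf\tw\t28\t_\tAR\tIN\tIL",
        "Hoffman, John\thead\tm\tw\t67\tOwn Income\tSwitz/Ger\tSwitz/Ger\tSwitz/Ger",
    ]
    print("\n".join(names_by_sort("\n".join(rows))))
```

This approach never has to recognize a name at all. It only relies on the name being the first cell, which is why it also keeps entries such as "Reitz, Louisa, Mrs." intact.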

                      Regards /Claes
                    • Grant
                      Message 10 of 30 , Feb 1 1:22 PM
                        > > If you want to take a look posted a DOM way to do this on the Notetab
                        > > html list. subj: Extracting table data with the DOM Have heard 'reg
                        > > exp' losing favour because of the power of the DOM. It certainly is a
                        > > lot more intuitive than writing a reg exp to do the same thing. So if
                        > > you want to check it out have a look.

                        > I take interest in both DOM and regexes. DOM can get you the contents
                        > of a tag, but can it check if these contents match a particular
                        > pattern?

                        No, it can't, but using a regexp to extract the surnames in a table's
                        first column in an HTML doc is like using a chainsaw to cut butter.
                        By comparison, it took me about 5 minutes to write that DOM script to
                        extract the table's first-column data, because it's the right tool for
                        this job. The DOM provides an easy way to navigate text marked up with
                        HTML, XHTML, or XML, while regular expressions are good at finding
                        patterns in unstructured text. They are not competing technologies but
                        complementary ones. Working with the DOM, I'm not pattern matching but
                        working directly with the document's structured objects: the table's
                        collection of rows, and the first child of each row, to get the first
                        TD column. Having extracted the first column, if I want to find all the
                        'Parkers' in that extracted data, then a regex is handy.
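Grant's DOM script is not reproduced in the thread. The sketch below illustrates the same structural idea in Python: walk the rows and keep the first cell of each. Python's standard library has no browser DOM, so this swaps in `html.parser.HTMLParser`, which is an event-driven parser rather than a DOM. The class `FirstCellParser` and the function `first_column` are hypothetical names. A regex is applied only afterwards, to the extracted column, as Grant suggests.

```python
import re
from html.parser import HTMLParser

SAMPLE = """<FONT FACE="Courier New, Courier" SIZE="2">
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>
<FONT FACE="Courier New, Courier" SIZE="2">
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Anna L.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">dau</TD>
</TR>
"""


class FirstCellParser(HTMLParser):
    """Collect the text of the first <td> in each <tr>."""

    def __init__(self):
        super().__init__()
        self.first_cells = []
        self._cell_index = -1       # position of the current <td> in its row
        self._in_first_cell = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        if tag == "tr":
            self._cell_index = -1
        elif tag == "td":
            self._cell_index += 1
            self._in_first_cell = self._cell_index == 0
            self._text = []

    def handle_data(self, data):
        if self._in_first_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._in_first_cell:
            self.first_cells.append("".join(self._text).strip())
            self._in_first_cell = False


def first_column(html):
    parser = FirstCellParser()
    parser.feed(html)
    parser.close()
    return parser.first_cells


if __name__ == "__main__":
    names = first_column(SAMPLE)
    # Pattern matching happens only on the extracted column.
    print([n for n in names if re.match(r"Parker\b", n)])
```

The comma in the `<FONT FACE=...>` tags cannot confuse this version, because the parser only collects text that sits inside a row's first cell.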
                      • Jody
                        Message 11 of 30 , Feb 1 11:46 PM
                          Hi Martha,

                          >I tried this, too. It stripped the HTML tags but it left
                          >everything in a single column. I could take out several of them
                          >but not all, by using search and replace. This can't be what you
                          >mean because it took me more than a few seconds. Would you
                          >please be a little more specific about what I need to do?

                          It has been so long now that I forget what it was and can't find it.
                          I know it worked on whatever you sent in. At present I do not have
                          time for it, though. Maybe the others are not working for you either
                          because what you are sending in is not the same as what you are
                          running the Clip over.

                          I just saw you got it another way, so whatever works! :)

                          Happy Clip'n!
                          Jody

                          http://www.notetab.net

                          Subscribe, UnSubscribe, Options
                          mailto:Ntb-Clips-Subscribe@yahoogroups.com
                          mailto:Ntb-Clips-UnSubscribe@yahoogroups.com
                          http://www.egroups.com/group/ntb-clips
                        • Piotr Bienkowski
                          Message 12 of 30 , Feb 3 5:28 AM
                            On 2 Feb 2001, at 10:22, Grant wrote:

                            > No it can't, but using reg exp to extract a tables first col surnames
                            > in an html doc is like using a chainsaw to cut butter. In comparison
                            > it took me about 5 minutes to write that dom script to extract the
                            > tables first collum data because it's the right tool for this job.

                            Righto! Chisels are not for fixing tractors. :)

                            Piotr
                          • Jody
                            Message 13 of 30 , Feb 5 12:02 PM
                              Hi Piotr,

                              >Wondered why my message popped up with a few days' delay... :(

                              It appears a few of them just got spit out.

                              > > If you want to take a look posted a DOM way to do this on the
                              > > Notetab html list. subj: Extracting table data with the DOM
                              > > Have heard 'reg exp' losing favour because of the power of the
                              > > DOM.

                              Happy Clip'n!
                              Jody

                              http://www.notetab.net

                            • Luuk.Houwen@t-online.de
                              Message 14 of 30 , Feb 5 1:42 PM
                                I would like to count the number of times my program goes through a
                                loop. I tried the following line within the loop, but it does not
                                work. Any ideas about improving it?

                                ^!Set %Counter%=^$Calc(x=x+1)$

                                Luuk