
Re: [Clip] Need to extract surnames

  • Martha Hambrick Harrell
    Message 1 of 30, Jan 24, 2001
      Thanks, Larry. I am working on doing just that right now.

      Martha


      Larry Hamilton wrote:

      > Martha,
      >
      > I understand. I know of no easy way to do it. One suggestion would
      > be to enter your data in such a way that future use of the data, in
      > whatever format, is easier to work with.
    • Martha Hambrick Harrell
      Message 2 of 30, Jan 24, 2001
        Eb, Claes & Elizabeth,

        Thank you all for responding. I made Claes' corrections to Eb's clip
        but cannot get it to work at all. What am I doing wrong? (By the way,
        I might add that I still consider myself a newbie in the NoteTab Pro
        world!)

        :loop
        ; find (and select) the TD tag, that contains ","
        ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
        ^!IfError Done
        ; strip off the html
        ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
        ; copy the name to (including) the comma
        ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
        ; strip off the comma
        ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
        ^!Goto loop
        :Done


        I also tried Elizabeth's clip. It tried to work but didn't, probably
        because the number of lines between the names is not exactly the
        same. If that's the only problem, I can probably fix that.

        ; start at the first line (or whatever line the first name is on)
        ^!SetCursor 1:1

        :Loop
        ^!Select Line
        ^!AppendToFile "index.txt" ^$GetSelection$
        ; the plus factor should be however far it is to the next name
        ^!SetCursor ^$Calc(^$GetRow$+12)$:1
        ^!If ^$GetRow$=^$GetLineCount$ Skip
        ^!GoTo Loop


        Martha

        Gauffin Claes wrote:

        > Hi Eb,
        >
        > Pretty clip. There are two typos though:
        > ^!Set %name%=^$StrStripHTML("^$GetSelection";0)$
        > should be
        > ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
        >
        > and
        > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1$)$
        > should be
        > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
        >
        > Regards /Claes
        >
      • Martha Hambrick Harrell
        Message 3 of 30, Jan 24, 2001
          Jody,

          I tried this, too. It stripped the HTML tags but it left everything in
          a single column. I could take out several of them but not all, by using
          search and replace. This can't be what you mean because it took me more
          than a few seconds. Would you please be a little more specific about
          what I need to do?

          Thank you,
          Martha


          Jody wrote:

          > Hi Martha,
          >
          > >We are talking about hundreds of names here. I had done the last
          > >two indexes the way Harvey suggested. It takes soooo long. I
          > >was just hoping there was an easier way.
          >
          > My method would take a few seconds, I speak figuratively.
          >
          > Bye for now,
          > Jody Adair
          > Prov. 3:5-7; 4:23
          >
          > http://www.purewords.org/sojourner
          > http://www.purewords.org/kjb1611
          > http://www.notetab.net
        • Michael Edmonson
          Message 4 of 30, Jan 25, 2001
            Martha

            What about something like this? (This works; I tried it.) It extracts
            only the name field.
            Note that there is a space after the "^P" in the first find statement. I
            believe that is correct, as I stripped the reply ">"s out of another message.

            Michael Edmonson

            ^!SetHintInfo Only Name Field (thanks to Jody for helping me on this, in another application)
            ^!SetScreenUpdate Off
            ^!ClearVariable %Addies%

            ^!ClearVariable %MyRowX%
            ^!ClearVariable %MyColX%

            ^!ClearVariable %MyRowY%
            ^!ClearVariable %MyColY%

            :Loop
            ^!Find "<TR ALIGN="left" BGCOLOR="#FFEFD5">^P <TD ALIGN="left" BGCOLOR="#FFEFD5">" S
            ^!IfError Output
            ^!Jump Select_End
            ^!set %MyRowX%=^$GetRow$; %MyColX%=^$GetCol$
            ^!Find "</TD>^P" S
            ^!Jump Select_Start
            ^!set %MyRowY%=^$GetRow$; %MyColY%=^$GetCol$

            ^!SetCursor ^%MyRowX%:^%MyColX%
            ^!SelectTo ^%MyRowY%:^%MyColY%

            ^!Set %Address%=^$StrTrim("^$GetSelection$")$
            ^!Append %Addies%=^%Address%^%nl%
            ^!Goto Loop

            :Output
            ;^!Info ^$StrSort("^%Addies%";0;1;1)$
            ^!Info ^%Addies%
            ;!Toolbar New Document
            ; ^$StrSort("^%Addies%";0;1;1)$
            ; ^!SetWordWrap False
            ; ^!Jump 1

            ----- Original Message -----
            From: "Martha Hambrick Harrell" <mehharrell@...>
            To: <ntb-clips@...>
            Sent: Monday, January 22, 2001 8:03 PM
            Subject: [Clip] Need to extract surnames


            > Can someone please tell me how to extract only the name from the
            > following sample so that I can make an index? I have several of these
            > to do. Thank you. Martha
            >
            > <FONT FACE="Courier New, Courier" SIZE="2">
            > <TR ALIGN="left" BGCOLOR="#FFEFD5">
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">f</TD>
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">mu</TD>
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">21</TD>
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">_</TD>
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">TN</TD>
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">KY</TD>
            > <TD ALIGN="left" BGCOLOR="#FFEFD5">IN</TD>
            > </TR>
          • Piotr Bienkowski
            Message 5 of 30, Feb 1, 2001
              Wonder why my message popped up with a few days' delay... :(

              On 25 Jan 2001, at 11:33, Piotr Bienkowski wrote:

              > On 18 Jan 2001, at 11:19, Grant wrote:
              >
              > > If you want to take a look, I posted a DOM way to do this on the
              > > NoteTab HTML list (subj: Extracting table data with the DOM). I have
              > > heard 'reg exp' is losing favour because of the power of the DOM. It
              > > certainly is a lot more intuitive than writing a reg exp to do the
              > > same thing. So if you want to check it out, have a look.
              > >
              > >
              > Hi,
              >
              > I take interest in both DOM and regexes. DOM can get you the contents
              > of a tag, but can it check if these contents match a particular
              > pattern?
              >
              > Piotr
              >
            • Gauffin Claes
              Message 6 of 30, Feb 1, 2001
                Hello Martha,

                You wrote:

                > Thank you all for responding. I made Claes' corrections to Eb's clip
                > but cannot get it to work at all. What am I doing wrong?
                >
                > :loop
                > ; find (and select) the TD tag, that contains ","
                > ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
                > ^!IfError Done
                > ; strip off the html
                > ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
                > ; copy the name to (including) the comma
                > ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
                > ; strip off the comma
                > ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
                > ^!Goto loop
                > :Done
                >

                First:
                Eb's clip is not quite complete. It uses a clever regular expression to
                catch the surnames from your data, but does not deal with what you do
                with the extracted names. One possible completion of his clip could be
                this, which places the extracted names in a new document:

                H="Extract surnames (version 1)"
                ^!ClearVariable %a%
                ^!Jump text_start
                :loop
                ; find (and select) the TD tag, that contains ","
                ^!Find <TD [^/]+>[^,]+,[^<]*</TD> RSI
                ^!IfError Done
                ; strip off the html
                ^!Set %name%=^$StrStripHTML("^$GetSelection$";0)$
                ; copy the name to (including) the comma
                ^!Set %Lname%=^$StrCopy(^%name%;1;^$StrPos(",";^%name%;1)$)$
                ; strip off the comma
                ^!Set %Lname%=^$StrDeleteRight(^%Lname%;1)$
                ^!append %a%=^%Lname%^p
                ^!Goto loop
                :Done
                ^!Toolbar New Document
                ^!Inserttext ^%a%
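
For readers who want to check the logic outside NoteTab, here is a rough Python sketch of what the version 1 clip does: find each TD cell that contains a comma, strip the tags, and keep the text before the comma. The function name `extract_surnames` and the sample rows are illustrative assumptions, and Python's `re` module stands in for NoteTab's regex engine, so treat it as an approximation of the clip rather than an equivalent. One deliberate change: the name part is matched with `[^,<]+` instead of `[^,]+`, so that a match cannot run on from a comma-free cell into a later one.

```python
import re

# Same idea as the clip's ^!Find pattern: a TD cell whose text contains a comma.
# Assumption: [^,<]+ (not the clip's [^,]+) keeps a match inside a single cell.
CELL_WITH_COMMA = re.compile(r"<TD [^/]+>([^,<]+),[^<]*</TD>", re.IGNORECASE)


def extract_surnames(html):
    """Return the text before the first comma in each matching TD cell."""
    return [m.group(1).strip() for m in CELL_WITH_COMMA.finditer(html)]


# Illustrative rows modelled on the sample posted in this thread.
sample = """<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Reitz, Ed. G.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">head</TD>
</TR>"""

print("\n".join(extract_surnames(sample)))
```

Cells without a comma ("wife", "head") are skipped, which is the same filtering the clip relies on.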


                Second:
                The regular expression approach suffers from being rather slow when
                executing. Therefore I would like to push a bit for the strip-html approach.

                Third:
                There seems to be some confusion on what it is you really want.
                Your mails say "surnames" but in a previous mail you wrote
                >...
                >When what I want is this:
                >
                >Reitz, Ed. G.
                >Reitz, Ida A.
                >Reitz, Edward W.
                >...
                which indicates that you want the full names.
                Eb's clip extracts the surnames.

                The following is a clip using html-strip (therefore quite fast) which
                will extract the full names.
                The result will be sorted. When sorting, you can choose whether
                you want duplicates removed or not.
                This is done by checking or unchecking
                View > Options > Tools > Sort Removes Duplicates
                If you do want just the surnames, this can be done too.

                H="Extract surnames (version 2)"
                ^!SetHintInfo Working...
                ^!SetScreenUpdate Off
                ^!SetWordWrap OFF
                ^!Jump TEXT_START
                ^!Select ALL
                ^!Keyboard SHIFT+CTRL+T
                ^!Replace "^t" >> "^pzzzzz" TWSAI
                ^!ToolBar Sort Ascending
                ^!Find "zzzzz" TIWS
                ^!Set %r%=^$getrow$
                ^!Jump text_end
                ^!SelectTo ^%r%:1
                ^!Keyboard DELETE

                Regards /Claes
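
The marker trick in version 2 may be easier to follow in a small Python sketch. It assumes the table has already been stripped to one tab-delimited line per row (the clip's replace on "^t" suggests that is what the strip-HTML step produces). The name `first_column_sorted` and the sample rows are made up for illustration, and Python's default sort stands in for NoteTab's Sort Ascending.

```python
MARKER = "zzzzz"  # sentinel assumed to sort after any real name, as in the clip


def first_column_sorted(rows, remove_duplicates=False):
    """Keep only the first tab-delimited field of each row, sorted."""
    # Step 1: push every later field onto its own line, prefixed with the marker.
    text = "\n".join(rows).replace("\t", "\n" + MARKER)
    # Step 2: sort all non-empty lines; the marked lines sink to the bottom.
    lines = sorted(line for line in text.split("\n") if line)
    # Step 3: cut everything from the first marked line to the end.
    cut = next((i for i, line in enumerate(lines) if line.startswith(MARKER)),
               len(lines))
    names = lines[:cut]
    if remove_duplicates:
        names = sorted(set(names))
    return names


# Illustrative rows: name, relationship, sex.
rows = [
    "Reitz, Ida A.\twife\tf",
    "Reitz, Ed. G.\thead\tm",
    "Parker, Victoria\twife\tf",
]
print("\n".join(first_column_sorted(rows)))
```

The approach only works while no real name sorts after the marker; a first-column entry starting with "zzzzz" or later would be cut along with the unwanted fields.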
              • Grant
                Message 7 of 30, Feb 1, 2001
                  > > If you want to take a look, I posted a DOM way to do this on the NoteTab
                  > > HTML list (subj: Extracting table data with the DOM). I have heard 'reg
                  > > exp' is losing favour because of the power of the DOM. It certainly is a
                  > > lot more intuitive than writing a reg exp to do the same thing. So if
                  > > you want to check it out, have a look.

                  > I take interest in both DOM and regexes. DOM can get you the contents
                  > of a tag, but can it check if these contents match a particular
                  > pattern?

                  No, it can't, but using a regexp to extract the surnames from a table's
                  first column in an HTML doc is like using a chainsaw to cut butter.
                  In comparison, it took me about 5 minutes to write that DOM script to
                  extract the table's first-column data, because it's the right tool for
                  this job.
                  The DOM provides an easy way to navigate text marked up with HTML, XHTML
                  or XML, while regular expressions are good at finding patterns in
                  unstructured text. They are not competing technologies but complementary.
                  Working with the DOM I'm not pattern matching but working directly with
                  the document's structured objects: the table's collection of rows and the
                  first child of each row, to get the first TD column.
                  Having extracted the first column, if I want to find all the 'Parkers' in
                  that extracted data, then using a regexp is handy.
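
The division of labour described above can be sketched in Python. This is not Grant's script (which is not shown in this thread); it is a minimal illustration that assumes the table markup is well formed enough for `xml.dom.minidom` to parse, which real-world HTML often is not. The function name `first_column` and the sample rows are hypothetical.

```python
import re
from xml.dom.minidom import parseString


def first_column(table_markup):
    """Return the text of the first TD in every TR of a well-formed table."""
    doc = parseString(table_markup)
    names = []
    for row in doc.getElementsByTagName("TR"):
        # Skip whitespace text nodes between tags; keep element children only.
        cells = [n for n in row.childNodes if n.nodeType == n.ELEMENT_NODE]
        if cells:
            text = "".join(n.data for n in cells[0].childNodes
                           if n.nodeType == n.TEXT_NODE)
            names.append(text.strip())
    return names


# Illustrative, well-formed sample modelled on the rows posted in this thread.
sample = """<TABLE>
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Parker, Victoria</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>
<TR ALIGN="left" BGCOLOR="#FFEFD5">
<TD ALIGN="left" BGCOLOR="#FFEFD5">Reitz, Ida A.</TD>
<TD ALIGN="left" BGCOLOR="#FFEFD5">wife</TD>
</TR>
</TABLE>"""

# DOM step: navigate the structure to get the first column.
names = first_column(sample)
# Regex step: pattern-match within the extracted data.
parkers = [n for n in names if re.match(r"parker\b", n, re.IGNORECASE)]
print(names)
print(parkers)
```

The structure walk needs no pattern at all; the regex only comes in once the column has been pulled out.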
                • Jody
                  Message 8 of 30, Feb 1, 2001
                    Hi Martha,

                    >I tried this, too. It stripped the HTML tags but it left
                    >everything in a single column. I could take out several of them
                    >but not all, by using search and replace. This can't be what you
                    >mean because it took me more than a few seconds. Would you
                    >please be a little more specific about what I need to do?

                    It has been so long now I forget what it was and can't find it.
                    I know it worked on whatever you sent in. At the present I do
                    not have time for it though. Maybe the others are not working
                    for you either because what you are sending in is not the same as
                    what you are running the Clip over.

                    I just saw you got it another way, so whatever works! :)

                    Happy Clip'n!
                    Jody

                    http://www.notetab.net

                    Subscribe, UnSubscribe, Options
                    mailto:Ntb-Clips-Subscribe@yahoogroups.com
                    mailto:Ntb-Clips-UnSubscribe@yahoogroups.com
                    http://www.egroups.com/group/ntb-clips
                  • Piotr Bienkowski
                    Message 9 of 30, Feb 3, 2001
                      On 2 Feb 2001, at 10:22, Grant wrote:

                      > No, it can't, but using a regexp to extract the surnames from a table's
                      > first column in an HTML doc is like using a chainsaw to cut butter. In
                      > comparison, it took me about 5 minutes to write that DOM script to
                      > extract the table's first-column data, because it's the right tool for
                      > this job.

                      Righto! Chisels are not for fixing tractors. :)

                      Piotr
                    • Jody
                      Message 10 of 30, Feb 5, 2001
                        Hi Piotr,

                        >Wonder why my message popped up with a few days' delay... :(

                        It appears a few of them just got spit out.

                        > > If you want to take a look, I posted a DOM way to do this on the
                        > > NoteTab HTML list (subj: Extracting table data with the DOM).
                        > > I have heard 'reg exp' is losing favour because of the power of
                        > > the DOM.

                        Happy Clip'n!
                        Jody

                        http://www.notetab.net

                      • Luuk.Houwen@t-online.de
                        Message 11 of 30, Feb 5, 2001
                          I would like to count the number of times my program goes through a loop. I
                          tried the following line within the loop, but it does not work. Any ideas
                          about improving it?

                          ^!Set %Counter%=^$Calc(x=x+1)$

                          Luuk