24079 RE: [Clip] Capitalization without a loop

  • John Shotsky
    Oct 17, 2013

      My approach was similar, except that I didn't use any extra characters. I lower-cased every letter after the first letter of each word (matching any letter, regardless of case, with \p{L}), then upper-cased the first letter of each word with the same letter-by-letter approach. A third pass through the titles applies a list of words that are not to be capitalized (using a farclip). All three passes are plain replaces. With screen updating turned off, the whole thing runs in just 30 seconds for the 13,500 upper-case titles.
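
For readers outside NoteTab, the three passes described above can be sketched in Python. This is only an illustration, not John's clip: the function name `title_case` and the `SMALL_WORDS` list are hypothetical (the real word list lives in his farclip and is not shown in the thread), words are assumed to be separated by whitespace, and keeping the first word of a title capitalized is an assumption.

```python
import re

# Hypothetical list of words to leave in lower case; the real list is not in the thread.
SMALL_WORDS = ("a", "an", "and", "by", "in", "of", "on", "the")

def title_case(title):
    # Pass 1: lower-case every letter that is not at the start of a word.
    title = re.sub(r"(?<=\S)[^\W\d_]", lambda m: m.group(0).lower(), title)
    # Pass 2: upper-case the first letter of each whitespace-delimited word.
    title = re.sub(r"(?<!\S)[^\W\d_]", lambda m: m.group(0).upper(), title)
    # Pass 3: lower-case the listed words, except at the very start (an assumption).
    pattern = r"(?<!^)\b(" + "|".join(SMALL_WORDS) + r")\b"
    return re.sub(pattern, lambda m: m.group(0).lower(), title, flags=re.IGNORECASE)

print(title_case("THE BIG EASY CRAWFISH OMELET"))
```

Each pass is a single substitution over the whole string, which mirrors the idea of replacing only where a change is needed rather than visiting every line in a loop.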

       

      Even though this works and is almost 50 times faster than the original loop method, I am considering Sheri's approach, since it represents a learning opportunity, and would be even faster.

       

      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/
      John's Mags Yahoo Group:  http://groups.yahoo.com/group/johnsmags/

       

      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Ian NTnerd
      Sent: Thursday, October 17, 2013 00:56
      To: ntb-clips@yahoogroups.com
      Subject: Re: [Clip] Capitalization without a loop

       

       

      This seems like a RegEx problem.
      I could find capitalised words in the Title with this RE: ( \p{Lu})(\p{Lu}+)(?<!^Title\:\:)
      But there seems to be no way to output $1$2 with $2 in lower case. Some other regex engines support \U and \L in the replacement string, which would seem to do it. That is a feature request.
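
As a point of comparison, here is roughly what a replacement of `$1` followed by a lower-cased `$2` would do, sketched in Python. Python's `re` module has no `\L` either, so the sketch uses a replacement function instead. It also swaps `\p{Lu}`, which `re` does not support, for the ASCII class `[A-Z]`, and it leaves out the `Title::` lookbehind. The names `lower_tail` and `lower_word_tails` are made up for illustration.

```python
import re

def lower_tail(match):
    # Keep the first capital ($1) and lower-case the rest of the run ($2).
    return match.group(1) + match.group(2).lower()

def lower_word_tails(text):
    # One pass over the text; every run of two or more capitals is rewritten.
    return re.sub(r"([A-Z])([A-Z]+)", lower_tail, text)

print(lower_word_tails("THE CLASSIC HOT BROWN"))
```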

      On dummy data of 9722 lines, a replace using the RE above took less than one second across 3402 titles, but it left the data unchanged. A selective loop took 74 seconds to execute with the correct result.

      But this also took under a second:

      ^!Set %start%=^$GetDate(hh:mm:ss)$
      ; Insert ## before each upper-case letter that follows another upper-case letter (run twice)
      ^!Replace "(\p{Lu})(\p{Lu})(?<!^Title\:\:)" >> "$1##$2" RAWS
      ^!Replace "(\p{Lu})(\p{Lu})(?<!^Title\:\:)" >> "$1##$2" RAWS
      ; Replace each tagged upper-case letter with its lower-case form
      ^!Replace "##A" >> "a" ATWS
      ^!Replace "##B" >> "b" ATWS
      ^!Replace "##C" >> "c" ATWS
      ^!Replace "##D" >> "d" ATWS
      ^!Replace "##E" >> "e" ATWS
      ^!Replace "##F" >> "f" ATWS
      ^!Replace "##G" >> "g" ATWS
      ^!Replace "##H" >> "h" ATWS
      ^!Replace "##I" >> "i" ATWS
      ^!Replace "##J" >> "j" ATWS
      ^!Replace "##K" >> "k" ATWS
      ^!Replace "##L" >> "l" ATWS
      ^!Replace "##M" >> "m" ATWS
      ^!Replace "##N" >> "n" ATWS
      ^!Replace "##O" >> "o" ATWS
      ^!Replace "##P" >> "p" ATWS
      ^!Replace "##Q" >> "q" ATWS
      ^!Replace "##R" >> "r" ATWS
      ^!Replace "##S" >> "s" ATWS
      ^!Replace "##T" >> "t" ATWS
      ^!Replace "##U" >> "u" ATWS
      ^!Replace "##V" >> "v" ATWS
      ^!Replace "##W" >> "w" ATWS
      ^!Replace "##X" >> "x" ATWS
      ^!Replace "##Y" >> "y" ATWS
      ^!Replace "##Z" >> "z" ATWS
      ; Report the start and end times
      ^!Info Started ^%start% - Ended ^$GetDate(hh:mm:ss)$

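      The RegEx needs to be run twice to catch all the occurrences, because each match consumes a pair of letters and matches cannot overlap. The remaining replaces only need to run once. The clip still needs to be extended for accented characters.
      The first RegEx pass turns: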
      Title::THE BIG EASY CRAWFISH OMELET
      into:
      Title::T##HE B##IG E##AS##Y C##RA##WF##IS##H O##ME##LE##T
      The second RegEx pass creates:
      Title::T##H##E B##I##G E##A##S##Y C##R##A##W##F##I##S##H O##M##E##L##E##T
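
The two tagging passes can be reproduced outside NoteTab to see why one pass is not enough. The following is a minimal Python sketch, not the clip itself: it uses the ASCII class `[A-Z]` in place of `\p{Lu}` (Python's `re` has no `\p{Lu}`), drops the `Title::` lookbehind, and the helper names `tag_once` and `untag` are invented for the example.

```python
import re

PAIR = re.compile(r"([A-Z])([A-Z])")  # ASCII stand-in for (\p{Lu})(\p{Lu})

def tag_once(text):
    # Each match consumes two letters, so only every other boundary gets tagged.
    return PAIR.sub(r"\1##\2", text)

def untag(text):
    # Equivalent of the 26 "##A" >> "a" replaces, done with one callback.
    return re.sub(r"##([A-Z])", lambda m: m.group(1).lower(), text)

line = "Title::THE BIG EASY CRAWFISH OMELET"
first = tag_once(line)
second = tag_once(first)
print(first)          # Title::T##HE B##IG E##AS##Y C##RA##WF##IS##H O##ME##LE##T
print(second)         # Title::T##H##E B##I##G E##A##S##Y C##R##A##W##F##I##S##H O##M##E##L##E##T
print(untag(second))  # Title::The Big Easy Crawfish Omelet
```

After the first pass, the untagged letters are separated from each other by `##`, so the second pass can only pair each one with the tagged letter before it, and a third pass finds nothing left to match.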

      Someone on this list used this kind of ## tagging somewhere else. I don't remember who, so I can't give proper credit.

      The tag can be something else if ##A already occurs elsewhere in your data.

      The rest is straightforward.

      Ian

      On 17/10/2013 12:02 AM, John Shotsky wrote:

      Yes, the titles are randomly located at the tops of the recipes. 
      So far, using replace statements only, and no loops (except 'IfError'), I have gotten all 13,500 titles from all upper case to
      capitalized on the first character of each word in 30 seconds flat. That is close to 40 times faster than the loops were, so it's
      looking good already. It's a large clip, because every letter and every high-order letter (accented, etc) must be handled
      individually, twice, so it's a little over 200 lines counting the IfError lines between each clip. The average file will have under
      100 titles, so that should now be an insignificant amount of time. The nice thing about the replaces is that they don't activate
      unless the action is needed. The loops process each line, whether needed or not.
        
      Regards,
      John
      RecipeTools Web Site: http://recipetools.gotdns.com/
      John's Mags Yahoo Group:  http://groups.yahoo.com/group/johnsmags/
        
        
      -----Original Message-----
      From: ntb-clips@yahoogroups.com [mailto:ntb-clips@yahoogroups.com] On Behalf Of Ian NTnerd
      Sent: Wednesday, October 16, 2013 08:31
      To: ntb-clips@yahoogroups.com
      Subject: Re: [Clip] Capitalization without a loop
        
      John,
        
      So in your data is every line a "Title"? If not then your data sample 
      does not help me test a solution.
        
      With your data sample, you just select the whole document and Capitalize 
      that. I doubt that is what you mean.
        
      I am thinking that between each title is a recipe or some other data.
        
      Title::THE BIG EASY CRAWFISH OMELET
      Ingredients:
      etc
      Title::THE CLASSIC HOT BROWN
        
        
        
      Ian
        
      On 16/10/2013 8:48 PM, John Shotsky wrote:
      RecipeClips work files have sections with titles (recipes). Each title is preceded by a tag of Title::
      I want to capitalize each word in each title using the Toolbar Capitalize command rather than a loop, which is how I currently do
      it.
      In order for the toolbar command to function, the text in the title needs to be selected.
      So, I would like to select only the titles, all at once, use the toolbar command and be done with it.
      Currently, each title is selected separately, then capitalized using
      ^!InsertText ^$StrCapitalize("^$GetSelection$")$
      It is slightly faster to use the InsertText command in a loop than the Toolbar Capitalize command, but that would still leave the
      whole process in a loop.
      I've been looking at ^$GetDocReplaceAll, but I can't see how to form a command that would do what I want.
        
      For testing, I have a list of about 13,000 tagged titles.
      Using a loop with the above InsertText command title cases 7200 of them in 10 minutes.
      Using the same loop with a Toolbar Capitalize command title cases 5100 of them in 10 minutes.
      Selecting them all, and using the Toolbar Capitalize command without a loop takes about 1 second.
      Clearly, if I can determine how to select them all at once, a single toolbar command would be sufficient.
      Any ideas?
        
      Here are some sample titles:
      Title::THE BIG EASY CRAWFISH OMELET
      Title::THE CLASSIC HOT BROWN
      Title::THE GORE CREEK BAGEL
      Title::THE GUADALAJARA
      Title::THE INVISIBLE SALMON WRAPPED IN GHOSTLY RICE PAPER LYING IN A TOMATO BLOOD PUDDLE ON BLACK BEAN PAVEMENT BY A BROCCOLI TREE
      Title::THE PERFECT CHEESECAKE
      Title::THE SANTA FE WRAP
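
To make the goal concrete, here is a hedged Python sketch of "capitalize only the `Title::` lines in one operation". It is not NoteTab code and does not use `^$StrCapitalize$`. The function name `capitalize_titles` is hypothetical, and `string.capwords` stands in for the Capitalize command. Note that `capwords` also collapses runs of whitespace, which the NoteTab command may not do.

```python
import re
import string

def capitalize_titles(text):
    # Rewrite only lines that start with the Title:: tag; other lines are untouched.
    def fix(match):
        return match.group(1) + string.capwords(match.group(2))
    return re.sub(r"^(Title::)(.*)$", fix, text, flags=re.MULTILINE)

sample = (
    "Title::THE GORE CREEK BAGEL\n"
    "Ingredients:\n"
    "BAGEL\n"
    "Title::THE GUADALAJARA\n"
)
print(capitalize_titles(sample))
```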
      Regards,
      John
        
        
        
        
        
      ------------------------------------
        
      Fookes Software: http://www.fookes.com/
      NoteTab website: http://www.notetab.com/
      NoteTab Discussion Lists: http://www.notetab.com/groups.php
        
      ***
      Yahoo! Groups Links
        
        
        
        
        
        
        

       
