
massive search and replace

  • Britton, Stephen
    Message 1 of 5 , Feb 8, 2001
      I was just given the task of converting 10,700 HTML documents. What I need
      to do is strip certain HTML tags out of them so the pages can be slapped
      into our template using a database.

      I know I can do much of the search and replace using the extended batch
      search and replace utility in HomeSite 4.5 (my second favorite HTML editor
      after NTB, but I would use NTB if I could figure out how to get it to do
      batch search and replaces on several thousand files at one time ;-).

      HomeSite is fine for replacing single tags like <html> and <head>, but I
      need to remove the tags plus the content between them, like
      <title>Welcome</title>. The content of the title tag is different in every
      file, so I need a program that can strip the text between the tags.

      I am open to all ideas and suggestions. I am sure there is a way to do it.

      Thanks,

      Steve

      Stephen Britton
      Internet Developer
      PersonalPath Systems Inc.
      10 Mountainview Road
      Upper Saddle River, NJ 07458
      ph: 201-512-7074
      brittons@... <mailto:brittons@...>




    • Jody
      Message 2 of 5 , Feb 8, 2001
        Hi Stephen,

        >I was just given the task of converting 10,700 html documents.
        >What I need to do is strip out certain html tags from them so the
        >pages can be slapped into our template using a database.
        >
        > I need to remove the tags, plus the content between them like
        > <title>Welcome</title>, and in every case the content in the
        > title tags are different, so I would need a program that would
        > be able to strip the type between the tags.

        Maybe something like this will help you get started. I just
        did some copy/paste from a couple of Clips and edited a thing
        here and there. You will need to edit a few things depending
        on exactly what you want. There are a couple of long lines.

        <--- Copy below this line --->
        H=Edit Title Tags
        ; Does folder and sub-folders, prompted to edit title tag, option
        ; to overwrite or save in new folder, the latter is preferred. ;)
        ; Last Updated 02/08/2001, Jody@...
        ; http://www.notetab.net

        ^!ClearVariables
        ; long line
        ^!Set %Dir%=^?{(T=D)&Edit what directory=E:\Test\}; %SaveTo%=^?{(T=D)&Save in what folder (Files will be overwritten if in same folder)=E:\Test\Converted\}; %Subs%=^?{&Include Sub-directories (v4.83 only)=Yes^=+|_No^=}; %Types%=^?{File &type (wildcards OK)=*.htm*}; %Attr%=^?{&Attributes, A: Archive, H: Hidden, R: Read-only, S: System=_All^=|AHRS}

        ^!SetHintInfo Getting Directory Information...
        ^!SetArray %FileList%=^$GetFiles("^%Subs%^%Dir%";^%Types%;^%Attr%)$
        ^!Set %Count%=^%FileList0%
        ^!Set %Index%=0; %FileCount%=0

        :Loop
        ^!Inc %Index%
        ^!If ^%Index% > ^%Count% Info
        ^!Open "^%FileList^%Index%%"
        ^!Delay 1
        ^!Find "<title>" SI
        ^!IfError NoTitle
        ^!Jump Select_End
        ^!Set %Start%=^$GetRow$:^$GetCol$
        ^!Find "</title>" SI
        ^!IfError NoTitle
        ^!SelectTo ^%Start%
        ^!InsertText ^?{New data=^$GetSelection$}
        ^!Delay 1
        ^!TextToFile "^%SaveTo%^$GetFileName(^%FileList^%Index%%)$" ^$GetText$
        ^!Inc %FileCount%
        ^!Close Discard
        ^!Goto Loop

        :NoTitle
        ^!Append %NoTitle%=^$GetFileName(^%FileList^%Index%%)$^%nl%
        ^!Goto Loop

        :Info
        ; long line
        ^!Info [L]^%FileCount% files were opened to edit. The following files, if any did not have the title tag in it or it was in an unexpected format and were left open in NoteTab - how 'bout it HomeSite. ;)^p^p^%NoTitle%

        <--- Copy above this line, right --->
        <--- click over a Library, and --->
        <--- choose "Add from Clipboard" --->

        Happy HTML'n!
        Jody

        http://www.notetab.net

        The NoteTab and Html List...
        mailto:Ntb-html-Subscribe@yahoogroups.com
        mailto:Ntb-html-UnSubscribe@yahoogroups.com
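For anyone who wants to try the same folder walk outside NoteTab, here is a minimal Python sketch of the flow of the clip above. It is not a port of the clip: instead of prompting for new title text it simply removes the first <title>...</title> pair, which is what the original question asked for. The names remove_title and convert_folder are made up for this illustration, the save folder is assumed to sit outside the source folder, and files are read as Latin-1 so that any byte sequence survives the round trip.

```python
import fnmatch
import os


def remove_title(text):
    """Return (new_text, found). Removes the first <title>...</title> pair,
    matching tag names case-insensitively, without regular expressions."""
    lower = text.lower()
    start = lower.find("<title>")
    if start == -1:
        return text, False
    end = lower.find("</title>", start)
    if end == -1:
        return text, False
    return text[:start] + text[end + len("</title>"):], True


def convert_folder(src_dir, save_to, pattern="*.htm*"):
    """Write title-free copies of matching files under src_dir into save_to.

    Returns (converted_count, files_without_title). Assumes save_to is
    outside src_dir, so converted copies are never picked up as input.
    """
    # Build the whole file list first, as the clip does with its file array.
    files = [
        os.path.join(folder, name)
        for folder, _subdirs, names in os.walk(src_dir)
        for name in fnmatch.filter(names, pattern)
    ]
    converted = 0
    no_title = []
    for path in files:
        # Latin-1 maps every byte to one character, so nothing is lost.
        with open(path, encoding="latin-1", newline="") as f:
            text = f.read()
        new_text, found = remove_title(text)
        if not found:
            # Leave the file untouched and report it, like the clip's NoTitle list.
            no_title.append(path)
            continue
        out_path = os.path.join(save_to, os.path.relpath(path, src_dir))
        os.makedirs(os.path.dirname(out_path), exist_ok=True)
        with open(out_path, "w", encoding="latin-1", newline="") as f:
            f.write(new_text)
        converted += 1
    return converted, no_title
```

Calling convert_folder with a source folder and a separate output folder returns the number of files written plus the list of files in which no complete title element was found, so those can be checked by hand. A title tag that carries attributes would not be matched by this plain string search.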
      • Banter (Stephen Boyle)
        Message 3 of 5 , Feb 8, 2001
          "Britton, Stephen" <brittons@...> wrote:

          >HomeSite is fine for replacing single tags like <html>
          >and <head>, but I need to remove the tags, plus the
          >content between them like <title>Welcome</title>, and
          >in every case the content in the title tags are
          >different, so I would need a program that would be
          >able to strip the type between the tags.
          >
          >I am open to all ideas and suggestions. I am sure there is a way to do it.

          I would use RegExp, thus
          Replace

          \<title\>{.*}\<\/title\>

          with nothing.


          So, just tick the Regular Exp box in the S&R dialog.

          I am not an expert on RegExp, but stripping the stuff between the
          tags of the title element looks simple enough, as long as the
          element is not broken by a line break. In that case I would have
          to stumble to the help file for more ideas.

          --
          Stephen
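The line-break case mentioned above can be illustrated outside NoteTab. The following is a small Python sketch, not NoteTab syntax: Python's re module writes the pattern differently from the expression shown in the message, and the names TITLE_RE and strip_title are invented for this example. The DOTALL flag lets "." match newlines, so a title element split across lines is still removed, and the non-greedy ".*?" stops at the first closing tag.

```python
import re

# Case-insensitive; DOTALL lets "." span line breaks inside the element.
TITLE_RE = re.compile(r"<title\b[^>]*>.*?</title\s*>", re.IGNORECASE | re.DOTALL)


def strip_title(html):
    """Return html with every <title>...</title> element removed."""
    return TITLE_RE.sub("", html)
```

For example, strip_title("<head>\n<title>Wel\ncome</title>\n</head>") leaves only the head tags and the two line breaks that surrounded the title element.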
        • Britton, Stephen
          Message 4 of 5 , Feb 9, 2001
            Thanks Jody!

            This is a good start. I'll try to play with it over
            the weekend.

            - Steve

          • Jody
            Message 5 of 5 , Feb 9, 2001
              Hi Stephen,

              >This is a good start. I'll try to play with it over the weekend.

              You're welcome!-) You can replace the little loop I made with
              Stephen B.'s RegExp, but I have found that when doing massive
              S&R you are better off staying away from RegExp because of the
              resources it uses - at least that is the way it is on my machine.

              Happy HTML'n!
              Jody
