Re: [Clip] fixup off http:// references to file:/// references

  • Vance E. Neff
    Message 1 of 8 , Oct 3, 2008
      Thanks Don,

      I knew that was what was basically required; I was just hoping someone
      had generated a clip to automate it.

      Anyhow,
      in all documents I replaced all occurrences of

      http://www.website.com/dir1

      with

      file:///C:\website_copy\dir1

      Then I repeatedly replaced all occurrences of
      "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
      with
      "file\:///C\:\\website_copy\\dir1

      The problem is that if there are no / in the reference,
      but there is another " on the same line with a / included
      it finds a match.
      How do I limit the match to the first " after the initial
      "file\:///C\:\\website_copy\\dir1
      match?

      Vance

      Don - HtmlFixIt.com wrote:

      >Just use search and replace with all documents open (make a backup
      >first). A better way is to use relative links in the first place.
      >
      >DP
      >
      >veneffy wrote:
      >
      >
      >>Does anyone have a clip that will convert web links to corresponding
      >>local disk links:
      >>
      >>such as from:
      >>"http://www.website.com/dir1/dir2/..."
      >>to:
      >>"file:///C:\website_copy\dir1\dir2\..."
      >>
      >>that takes care of converting any subsequent / to \
      >>and adding an \index.html to those references that do not specify a
      >>target file name (no extension at the end of the reference).
      >>
      >>Thanks for any info!
      >>Vance
      >>
      >>
      >>
      >
      >------------------------------------
      >
      >Fookes Software: http://www.fookes.com/
      >NoteTab website: http://www.notetab.com/
      >NoteTab Discussion Lists: http://www.notetab.com/groups.php
    • Vance E. Neff
      Message 2 of 8 , Oct 3, 2008
        Just a correction:

        Then I repeatedly replaced all occurrences of
        "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
        with
        "file\:///C\:\\website_copy\\dir1$1\\$2"

        Vance
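
The problem described in Message 1 can be reproduced outside NoteTab. The following is a minimal Python sketch, not something posted in the thread: the names PREFIX, GREEDY and swap_last_slash and the sample HTML line are assumptions made for illustration. The reference itself contains no / after dir1, yet the greedy (.+) runs past the closing quote and finds the / inside the title attribute.

```python
import re

# Hypothetical stand-in for the local prefix used in the thread.
PREFIX = 'file:///C:\\website_copy\\dir1'

# The greedy search from the message, anchored between double quotes.
GREEDY = re.compile('"' + re.escape(PREFIX) + '(.+)/(.+)"')


def swap_last_slash(match):
    # Equivalent of the replacement "...dir1$1\\$2" from the message.
    return '"' + PREFIX + match.group(1) + '\\' + match.group(2) + '"'


# Sample line (an assumption): no "/" in the link, but one in the title.
line = '<a href="file:///C:\\website_copy\\dir1\\page.html" title="a/b">'
result = GREEDY.sub(swap_last_slash, line)
print(result)
```

In this sketch the link is left as it was, but the unrelated title attribute is rewritten from a/b to a\b, which is the unwanted match the question is about.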



      • Don - HtmlFixIt.com
        Message 3 of 8 , Oct 3, 2008
          Vance E. Neff wrote:
          > Thanks Don,
          >
          > I knew that was what was basically required; I was just hoping someone
          > had generated a clip to automate it.
          >
          > Anyhow,
          > in all documents I replaced all occurrences of
          >
          > http://www.website.com/dir1
          >
          > with
          >
          > file:///C:\website_copy\dir1
          >
          > Then I repeatedly replaced all occurrences of
          > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
          > with
          > "file\:///C\:\\website_copy\\dir1
          >
          > The problem is that if there are no / in the reference,
          > but there is another " on the same line with a / included
          > it finds a match.
          > How do I limit the match to the first " after the initial
          > "file\:///C\:\\website_copy\\dir1
          > match?
          >
          > Vance

          Hi Vance,

          You are looking to make it non-greedy. Your magic character will be a ?
          after the .+ I think.

          Don

          Here are some slides from the wonderful help file that Sheri sent out a
          long time ago:
          greediness
          Quantifiers try to grab as much as possible by default.
          Applying <.+> to <i>greediness</i> matches the whole string
          rather than just <i>

          greediness
          If the entire match fails because they grabbed too much, then
          they are forced to give up as much as needed to make the rest
          of the regex succeed.

          greediness
          To find words ending in ness, you will probably use \w+ness
          On the first run \w+ takes the whole word.
          But since ness still has to match, it gives up the last 4
          characters and the match succeeds.
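
The two greedy examples from these slides can be tried directly. This is a small sketch using Python's re module rather than NoteTab's regex engine; the variable names are made up for illustration.

```python
import re

text = "<i>greediness</i>"

# Greedy: .+ first runs to the end of the string, then backs off only as
# far as it must to find a closing ">".
greedy_tag = re.search(r"<.+>", text).group(0)

# \w+ first takes the whole word, then gives back four characters so the
# literal "ness" can still match.
word = re.search(r"(\w+)ness", "greediness")
stem = word.group(1)

print(greedy_tag)  # <i>greediness</i>
print(stem)        # greedi
```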

          overcoming greediness
          The simplest solution is to make the repetition operators
          non-greedy, or lazy.
          Lazy quantifiers grab as little as possible.
          If the overall match fails, they grab a little more and the
          match is tried again.

          overcoming greediness
          To make a greedy quantifier lazy, append ?
          Note that this use of the question mark is different from its
          use as a regular quantifier.
          *?
          +?
          { , }?
          ??

          overcoming greediness
          *?
          +?
          { , }?
          ??
          Applying <.+?> to <i>greediness</i> gets us <i> and </i>
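
The lazy version of the same search can be checked with a short Python sketch (again Python's re module, used here only as an illustration; lazy_tags is a made-up name).

```python
import re

text = "<i>greediness</i>"

# Lazy: .+? takes one character at a time and stops at the first ">"
# that lets the match succeed, so each tag is found separately.
lazy_tags = re.findall(r"<.+?>", text)

print(lazy_tags)  # ['<i>', '</i>']
```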

          overcoming greediness
          Another option is to use negated character classes.
          More efficient and clearer than lazy repetition.

          overcoming greediness
          <.+?> can be turned into <[^>]+>
          Note that the second version will match tags spanning
          multiple lines.
          Single-line version: <[^>\r\n]+>
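
The difference between the three tag patterns on the last slide shows up as soon as a tag is split across two lines. A minimal Python sketch, with a made-up HTML fragment as input:

```python
import re

# Hypothetical input: the opening tag contains a line break.
html = '<a\nhref="x.html">link</a>'

# The dot does not match a line break by default, so the lazy pattern
# cannot get through the opening tag.
lazy = re.findall(r"<.+?>", html)

# A negated class matches any character except ">", line breaks included.
multi_line = re.findall(r"<[^>]+>", html)

# Excluding \r and \n as well restores the single-line behaviour.
single_line = re.findall(r"<[^>\r\n]+>", html)

print(lazy)
print(multi_line)
print(single_line)
```

Only the middle pattern finds the opening tag here; the other two report just the closing </a>.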
        • Don - HtmlFixIt.com
          Message 4 of 8 , Oct 3, 2008
            Vance E. Neff wrote:
            > Just a correction:
            >
            > Then I repeatedly replaced all occurrences of
            > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
            > with
            > "file\:///C\:\\website_copy\\dir1$1\\$2"
            >
            > Vance
            >
            Vance, you just dump those two things into a clip and you are off to the
            races. So if you plan to do something like this again ...

            But the issue is that next time your regex might not be quite the same.

            And why use a regex at all? You know the exact text of what you are
            replacing.
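
For the first step, where the text to replace is known exactly, a plain literal replacement is indeed enough. A minimal Python sketch of that idea follows; the function name swap_prefix and the sample line are assumptions, and the two prefixes are the simplified ones from the thread.

```python
OLD_PREFIX = "http://www.website.com/dir1"
NEW_PREFIX = "file:///C:\\website_copy\\dir1"


def swap_prefix(text):
    # Literal search and replace: no regex, so nothing needs escaping.
    return text.replace(OLD_PREFIX, NEW_PREFIX)


line = '<a href="http://www.website.com/dir1/dir2/page.html">'
converted = swap_prefix(line)
print(converted)
```

This only swaps the prefix. The remaining / characters after dir1 vary from link to link, which is the part a literal replacement cannot handle on its own.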
          • Sheri
            Message 5 of 8 , Oct 3, 2008
              --- In ntb-clips@yahoogroups.com, "Vance E. Neff" <veneff@...> wrote:
              >
              > Just a correction:
              >
              > Then I repeatedly replaced all occurrences of
              > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
              > with
              > "file\:///C\:\\website_copy\\dir1$1\\$2"
              >
              > Vance

              > >The problem is that if there are no / in the reference,
              > > but there is another " on the same line with a / included
              > >it finds a match.
              > >How do I limit the match to the first " after the initial
              > >"file\:///C\:\\website_copy\\dir1
              > >match?

              I think something got lost there, you're not actually searching for
              double quotes are you?

              In any case, I think the problem you're experiencing is because you
              are searching for dot plus, which can match any run of characters at
              all (except by default line break characters). (.+)/(.+) will match
              everything on a line up to the last / that has other characters after
              it, plus those other characters. If you excluded spaces from what can
              be matched, it would probably help a lot (since links can't have spaces
              in them). But when you make a negative character class, line break
              characters are not excluded by default, so you would want to exclude
              them too.

              Instead of dot plus, try [^\x20\r\n]+

              It will match up to the last / in the link instead of the last / on
              the line

              file\:///C\:\\website_copy\\dir1[^\x20\r\n]+/[^\x20\r\n]+

              Matching to the last one should be fine since you are repeatedly
              executing the replace until there are no more matches. You'd still
              have to do that in order to get all the /'s in the links (if there
              could be more than one) if you were using an ungreedy search. You can
              make it ungreedy (if you want to try it) by putting a question mark
              after those plus signs.

              Note \x20 is a space in hex. More obvious than empty space, but an
              empty space would also work fine in the character class.

              Regards,
              Sheri
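
Sheri's suggestion can be sketched in Python as a repeat-until-no-match loop. Several things here are assumptions rather than part of her message: the capture groups (her pattern has none, but the replacement needs the two halves), the names PREFIX, SHERI and replace_until_stable, and both sample lines.

```python
import re

PREFIX = 'file:///C:\\website_copy\\dir1'
NOT_SPACE = r'[^\x20\r\n]+'  # anything except space, CR and LF

# Sheri's pattern, with capture groups added so both halves can be reused.
SHERI = re.compile(
    '(' + re.escape(PREFIX) + NOT_SPACE + ')/(' + NOT_SPACE + ')'
)


def replace_until_stable(text):
    """Turn one "/" per link into a backslash per pass until nothing matches."""
    passes = 0
    while True:
        new_text, count = SHERI.subn(
            lambda m: m.group(1) + '\\' + m.group(2), text
        )
        if count == 0:
            return text, passes
        text, passes = new_text, passes + 1


no_slash = '<a href="file:///C:\\website_copy\\dir1\\page.html" title="a/b">'
nested = '<a href="file:///C:\\website_copy\\dir1/dir2/sub/page.html" title="a/b">'

print(replace_until_stable(no_slash))
print(replace_until_stable(nested))
```

Because the class stops at the space, the title attribute is no longer touched. One detail worth noticing in this sketch: the first class uses +, so it has to consume at least one character, and a / that sits directly after dir1 is therefore never the one being replaced. In the second sample that slash is still there after the loop finishes.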
            • Vance E. Neff
              Message 6 of 8 , Oct 3, 2008
                Sheri,

                Thanks for the response.
                Unfortunately the file reference can and does have spaces in some of the
                directory names. I had simplified the actual leading destination
                directory string with the website_copy term.
                But your approach gave me a good clue. I instead used
                [^"\r\n]+

                Thanks a lot!
                Vance
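
A sketch of how the [^"\r\n]+ class behaves on a path that contains spaces, written in Python rather than as a NoteTab clip. The names and the sample line are assumptions. The sketch also departs from the thread in one labeled way: the first class uses * instead of +, so that a / directly after dir1 is converted too.

```python
import re

PREFIX = 'file:///C:\\website_copy\\dir1'
IN_QUOTES = r'[^"\r\n]'  # anything except a double quote, CR and LF

# Assumption: "*" on the first class (the thread uses "+") so the "/"
# immediately after the prefix can also be matched.
VANCE = re.compile(
    '(' + re.escape(PREFIX) + IN_QUOTES + '*)/(' + IN_QUOTES + '+)'
)


def slashes_to_backslashes(text):
    """Repeat the replacement until no link has a "/" left to convert."""
    while True:
        text, count = VANCE.subn(
            lambda m: m.group(1) + '\\' + m.group(2), text
        )
        if count == 0:
            return text


line = ('<a href="file:///C:\\website_copy\\dir1/my docs/sub dir/page.html"'
        ' title="a/b">')
fixed = slashes_to_backslashes(line)
print(fixed)
```

Since the class cannot cross a double quote, the match stays inside the href value even though the directory names contain spaces, and the title attribute is left alone.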
