
Re: [Clip] fixup off http:// references to file:/// references

  • Don - HtmlFixIt.com
    Message 1 of 8 , Oct 2, 2008
      Just use search and replace with all documents open (make a backup
      first). A better way is to use relative links in the first place.

      DP

      veneffy wrote:
      > Does anyone have a clip that will convert web links to corresponding
      > local disk links:
      >
      > such as from:
      > "http://www.website.com/dir1/dir2/..."
      > to:
      > "file:///C:\website_copy\dir1\dir2\..."
      >
      > that takes care of converting any subsequent / to \
      > and adding an \index.html to those references that do not specify a
      > target file name (no extension at the end of the reference).
      >
      > Thanks for any info!
      > Vance
      >
    • Vance E. Neff
      Message 2 of 8 , Oct 3, 2008
        Thanks Don,

        I knew that was basically what was required; I was just hoping someone
        had written a clip to automate it.

        Anyhow,
        In all documents, I replaced all occurrences of

        http://www.website.com/dir1

        with

        file:///C:\website_copy\dir1

        Then I repeatedly replaced all occurrences of
        "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
        with
        "file\:///C\:\\website_copy\\dir1

        The problem is that if there are no / in the reference,
        but there is another " on the same line with a / included
        it finds a match.
        How do I limit the match to the first " after the initial
        "file\:///C\:\\website_copy\\dir1
        match?

        Vance

        >------------------------------------
        >
        >Fookes Software: http://www.fookes.com/
        >NoteTab website: http://www.notetab.com/
        >NoteTab Discussion Lists: http://www.notetab.com/groups.php
      • Vance E. Neff
        Message 3 of 8 , Oct 3, 2008
          Just a correction:

          Then I repeatedly replaced all occurrences of
          "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
          with
          "file\:///C\:\\website_copy\\dir1$1\\$2"

          Vance



        • Don - HtmlFixIt.com
          Message 4 of 8 , Oct 3, 2008
            Vance E. Neff wrote:
            > Thanks Don,
            >
            > I knew that was basically what was required; I was just hoping someone
            > had written a clip to automate it.
            >
            > Anyhow,
            > In all documents, I replaced all occurrences of
            >
            > http://www.website.com/dir1
            >
            > with
            >
            > file:///C:\website_copy\dir1
            >
            > Then I repeatedly replaced all occurrences of
            > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
            > with
            > "file\:///C\:\\website_copy\\dir1
            >
            > The problem is that if there are no / in the reference,
            > but there is another " on the same line with a / included
            > it finds a match.
            > How do I limit the match to the first " after the initial
            > "file\:///C\:\\website_copy\\dir1
            > match?
            >
            > Vance

            Hi Vance,

            You are looking to make it non-greedy. Your magic character will be a ?
            after the .+ I think.

            Don

            Here are some slides from the wonderful help file that Sheri sent out a
            long time ago:
            greediness
            Quantifiers try to grab as much as
            possible by default
            Applying <.+> to <i>greediness</i>
            matches the whole string rather than
            just <i>

            greediness
            If the entire match fails because they
            grabbed too much, then they are forced
            to give up as much as needed to make
            the rest of regex succeed

            greediness
            To find words ending in ness, you will
            probably use \w+ness
            On the first run \w+ takes the whole
            word
            But since ness still has to match, it gives
            up the last 4 characters and the match
            succeeds

            overcoming greediness
            The simplest solution is to
            make the repetition operators
            non-greedy, or lazy
            Lazy quantifiers grab as little
            as possible
            If the overall match fails, they
            grab a little more and the
            match is tried again

            overcoming greediness
            To make a greedy quantifier
            lazy, append ?
            Note that this use of the
            question mark is different from
            its use as a regular quantifier
            *?
            +?
            { , }?
            ??

            overcoming greediness
            *?
            +?
            { , }?
            ??
            Applying <.+?>
            to <i>greediness</i>
            gets us <i>
            and then </i>

            overcoming greediness
            Another option is to use
            negated character classes
            More efficient and clearer than
            lazy repetition

            overcoming greediness
            <.+?> can be turned into <[^>]+>
            Note that the second version
            will match tags spanning
            multiple lines
            Single-line version: <[^>\r\n]+>
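
The slides above can be tried out in a few lines of Python. This is only a sketch: it assumes Python's `re` module treats greedy quantifiers, lazy quantifiers, and negated character classes the same way as the regex engine in NoteTab for these simple patterns. The variable names are made up for illustration.

```python
import re

text = "<i>greediness</i>"

# Greedy: .+ grabs as much as it can, so the match runs to the last '>'.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? grabs as little as it can, so each tag is matched on its own.
lazy = re.findall(r"<.+?>", text)

# Negated character class: never crosses a '>' or a line break.
negated = re.findall(r"<[^>\r\n]+>", text)

# \w+ first takes the whole word, then gives back "ness" so the rest matches.
word = re.search(r"\w+ness", text).group()

print(greedy)   # ['<i>greediness</i>']
print(lazy)     # ['<i>', '</i>']
print(negated)  # ['<i>', '</i>']
print(word)     # greediness
```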
          • Don - HtmlFixIt.com
            Message 5 of 8 , Oct 3, 2008
              Vance E. Neff wrote:
              > Just a correction:
              >
              > Then I repeatedly replaced all occurrences of
              > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
              > with
              > "file\:///C\:\\website_copy\\dir1$1\\$2"
              >
              > Vance
              >
              Vance, you just dump those two things into a clip and you are off to the
              races. So if you plan to do something like this again ...

              but the issue is that next time your regex might not be quite the same.

              And why use a regex at all? You know the exact text you are replacing.
            • Sheri
              Message 6 of 8 , Oct 3, 2008
                --- In ntb-clips@yahoogroups.com, "Vance E. Neff" <veneff@...> wrote:
                >
                > Just a correction:
                >
                > Then I repeatedly replaced all occurrences of
                > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
                > with
                > "file\:///C\:\\website_copy\\dir1$1\\$2"
                >
                > Vance

                > >The problem is that if there are no / in the reference,
                > > but there is another " on the same line with a / included
                > >it finds a match.
                > >How do I limit the match to the first " after the initial
                > >"file\:///C\:\\website_copy\\dir1
                > >match?

                I think something got lost there; you're not actually searching for
                double quotes, are you?

                In any case, I think the problem you're experiencing is because you
                are searching for dot plus, which can match any run of characters at
                all (except, by default, line break characters). (.+)/(.+) will match
                everything on a line up to the last / that has other characters after
                it, plus those other characters. If you excluded spaces from what can
                be matched, it would probably help a lot (since links can't have spaces
                in them). But when you make a negative character class, line break
                characters are not excluded by default, so you would want to exclude
                them too.

                Instead of dot plus, try [^\x20\r\n]+

                It will match up to the last / in the link instead of the last / on
                the line

                file\:///C\:\\website_copy\\dir1[^\x20\r\n]+/[^\x20\r\n]+

                Matching to the last one should be fine since you are repeatedly
                executing the replace until there are no more matches. You'd still
                have to do that in order to get all the /'s in the links (if there
                could be more than one) if you were using an ungreedy search. You can
                make it ungreedy (if you want to try it) by putting a question mark
                after those plus signs.

                Note \x20 is a space in hex. More obvious than empty space, but an
                empty space would also work fine in the character class.

                Regards,
                Sheri
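
To see the difference Sheri describes, here is a rough Python sketch that runs both patterns against one line holding a link followed by unrelated text with its own slash. The sample line and the variable names are invented, and Python's `re` module is assumed to behave like NoteTab's engine here.

```python
import re

# Hypothetical line: one converted link, then other text containing a '/'.
line = r"file:///C:\website_copy\dir1/docs/page.html and also other/stuff"

# The literal prefix, escaped so the backslashes are taken literally.
prefix = re.escape(r"file:///C:\website_copy\dir1")

# Dot plus: the first group runs to the last '/' on the whole line.
dot = re.search(prefix + r"(.+)/(.+)", line)

# Negated class: the match cannot cross a space, so it stays inside the link.
cls = re.search(prefix + r"([^\x20\r\n]+)/([^\x20\r\n]+)", line)

print(dot.group(1), "|", dot.group(2))  # /docs/page.html and also other | stuff
print(cls.group(1), "|", cls.group(2))  # /docs | page.html
```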
              • Vance E. Neff
                Message 7 of 8 , Oct 3, 2008
                  Sheri,

                  Thanks for the response.
                  Unfortunately the file reference can and does have spaces in some of the
                  directory names. I had simplified the actual leading destination
                  directory string with the website_copy term.
                  But your approach gave me a good clue. I instead used
                  [^"\r\n]+

                  Thanks a lot!
                  Vance
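
For anyone wanting to do the same outside NoteTab, the repeated replacement with `[^"\r\n]` might look roughly like this in Python. It is a sketch under assumptions, not the clip that was actually used: `PREFIX`, `PATTERN`, `convert_slashes`, and the sample HTML are made up, and the first group uses `*` rather than the `+` from the thread so that a `/` directly after the prefix is also converted.

```python
import re

# Hypothetical prefix, as left behind by the first literal search and replace.
PREFIX = r"file:///C:\website_copy\dir1"

# A quoted link: the prefix, then anything except a double quote or a line
# break, split at a '/'. Spaces are allowed, so directory names may contain them.
PATTERN = re.compile('"' + re.escape(PREFIX) + r'([^"\r\n]*)/([^"\r\n]+)"')


def convert_slashes(text: str) -> str:
    """Repeat the replacement until no '/' is left after the prefix."""
    while True:
        # A function is used as the replacement so backslashes stay literal.
        text, count = PATTERN.subn(
            lambda m: '"' + PREFIX + m.group(1) + "\\" + m.group(2) + '"', text
        )
        if count == 0:
            return text


html = r'<a href="file:///C:\website_copy\dir1/my docs/page.html">docs</a> <a href="x/y">'
print(convert_slashes(html))
```

Each pass converts one `/` per link (the last one still present), which is why the replace has to be repeated. The second link and the closing `</a>` are left alone because neither starts with the quoted prefix.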
