
fixup of http:// references to file:/// references

  • veneffy
    Message 1 of 8 , Oct 2, 2008
      Does anyone have a clip that will convert web links to corresponding
      local disk links:

      such as from:
      "http://www.website.com/dir1/dir2/..."
      to:
      "file:///C:\website_copy\dir1\dir2\..."

      that takes care of converting any subsequent / to \
      and adding an \index.html to those references that do not specify a
      target file name (no extension at the end of the reference).

      Thanks for any info!
      Vance
    • Don - HtmlFixIt.com
      Message 2 of 8 , Oct 2, 2008
        Just use search and replace with all documents open (make a backup
        first). A better way is to use relative links in the first place.

        DP

      • Vance E. Neff
        Message 3 of 8 , Oct 3, 2008
          Thanks Don,

          I knew that was what was basically required; I was just hoping someone
          had generated a clip to automate it.

          Anyhow,
          in all documents I replaced all occurrences of

          http://www.website.com/dir1

          with

          file:///C:\website_copy\dir1

          Then I repeatedly replaced all occurrences of
          "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
          with
          "file\:///C\:\\website_copy\\dir1

          The problem is that if there is no / in the reference,
          but there is another " on the same line with a / included,
          it still finds a match.
          How do I limit the match to the first " after the initial
          "file\:///C\:\\website_copy\\dir1
          match?

          Vance

          ------------------------------------

          Fookes Software: http://www.fookes.com/
          NoteTab website: http://www.notetab.com/
          NoteTab Discussion Lists: http://www.notetab.com/groups.php
        • Vance E. Neff
          Message 4 of 8 , Oct 3, 2008
            Just a correction:

            Then I repeatedly replaced all occurrences of
            "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
            with
            "file\:///C\:\\website_copy\\dir1$1\\$2"

            Vance
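
The over-matching described in the previous message can be reproduced outside NoteTab. This is only a minimal sketch, assuming Python's `re` module treats this pattern the same way NoteTab's regex engine does; the sample line and the helper name `replace_once` are made up for illustration.

```python
import re

# Simplified local prefix from the thread; re.escape makes it a literal.
LOCAL = "file:///C:\\website_copy\\dir1"

# The search pattern from the thread: quote, prefix, (.+)/(.+), quote.
SEARCH = re.compile('"' + re.escape(LOCAL) + '(.+)/(.+)"')


def replace_once(line: str) -> str:
    """Apply the search/replace pair one time to a line."""
    return SEARCH.sub(
        lambda m: '"' + LOCAL + m.group(1) + "\\" + m.group(2) + '"',
        line,
    )


# Hypothetical line: the first reference has no "/" left to convert,
# but a second quoted string later on the same line contains one.
line = '<a href="' + LOCAL + '"><img src="images/logo.gif"></a>'
result = replace_once(line)

# The greedy (.+) runs past the closing quote of the first reference,
# so the "/" inside the unrelated second string is converted instead.
print(result)
# <a href="file:///C:\website_copy\dir1"><img src="images\logo.gif"></a>
```

The relative link `images/logo.gif` is rewritten even though it was never part of the `file:///` reference, which is the symptom reported above.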



          • Don - HtmlFixIt.com
            Message 5 of 8 , Oct 3, 2008

              Hi Vance,

              You are looking to make it non-greedy. Your magic character will be a ?
              after the .+ I think.

              Don

              Here are some slides from the wonderful help file that Sheri sent out a
              long time ago:
              Greediness

              Quantifiers try to grab as much as possible by default.
              Applying <.+> to <i>greediness</i> matches the whole string
              rather than just <i>.

              If the entire match fails because they grabbed too much, they
              are forced to give up as much as needed to make the rest of
              the regex succeed.

              To find words ending in "ness", you will probably use \w+ness.
              On the first run \w+ takes the whole word. But since ness still
              has to match, it gives up the last 4 characters and the match
              succeeds.

              Overcoming greediness

              The simplest solution is to make the repetition operators
              non-greedy, or lazy. Lazy quantifiers grab as little as
              possible. If the overall match fails, they grab a little more
              and the match is tried again.

              To make a greedy quantifier lazy, append ?. Note that this use
              of the question mark is different from its use as a regular
              quantifier. The lazy forms are:
              *?
              +?
              { , }?
              ??

              Applying <.+?> to <i>greediness</i> gets us <i>.

              Another option is to use negated character classes. They are
              more efficient and clearer than lazy repetition.

              <.+?> can be turned into <[^>]+>. Note that the second version
              will match tags spanning multiple lines.
              Single-line version: <[^>\r\n]+>
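
The behaviour described in these slides can be checked with a few lines of code. This is a sketch in Python, on the assumption that its `re` module handles these particular patterns the same way as NoteTab's engine; the variable names are only for illustration.

```python
import re

text = "<i>greediness</i>"

# Greedy: .+ grabs as much as it can, so the match is the whole string.
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? grabs as little as it can, so the match stops at the first ">".
lazy = re.search(r"<.+?>", text).group()

# Negated character class: same result as the lazy version here.
negated = re.search(r"<[^>]+>", text).group()

# \w+ first takes the whole word, then gives back "ness" so the rest matches.
word = re.search(r"\w+ness", "greediness").group()

# A negated class matches across line breaks unless they are excluded too.
multi_line_tag = "<a\nhref='x'>"
spans_lines = re.search(r"<[^>]+>", multi_line_tag) is not None
single_line_fails = re.search(r"<[^>\r\n]+>", multi_line_tag) is None

print(greedy)   # <i>greediness</i>
print(lazy)     # <i>
print(negated)  # <i>
print(word)     # greediness
print(spans_lines, single_line_fails)  # True True
```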
            • Don - HtmlFixIt.com
              Message 6 of 8 , Oct 3, 2008
                Vance E. Neff wrote:
                > Just a correction:
                >
                > Then I repeatedly replaced all occurrences of
                > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
                > with
                > "file\:///C\:\\website_copy\\dir1$1\\$2"
                >
                > Vance
                >
                Vance, you just dump those two things into a clip and you are off to the
                races. So if you plan to do something like this again ...

                But the issue is that next time your regex might not be quite the same.

                And why use a regex at all? You know the exact text of what you are replacing.
              • Sheri
                Message 7 of 8 , Oct 3, 2008
                  --- In ntb-clips@yahoogroups.com, "Vance E. Neff" <veneff@...> wrote:
                  >
                  > Just a correction:
                  >
                  > Then I repeatedly replaced all occurrences of
                  > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
                  > with
                  > "file\:///C\:\\website_copy\\dir1$1\\$2"
                  >
                  > Vance

                  > >The problem is that if there are no / in the reference,
                  > > but there is another " on the same line with a / included
                  > >it finds a match.
                  > >How do I limit the match to the first " after the initial
                  > >"file\:///C\:\\website_copy\\dir1
                  > >match?

                  I think something got lost there; you're not actually searching for
                  double quotes, are you?

                  In any case, I think the problem you're experiencing is because you
                  are searching for dot plus, which can match any run of characters at
                  all (except, by default, line break characters). (.+)/(.+) will match
                  everything on a line up to the last / that has other characters after
                  it, plus those other characters. If you excluded spaces from what can
                  be matched, it would probably help a lot (since links can't have spaces
                  in them). But when you make a negative character class, line break
                  characters are not excluded by default, so you would want to exclude
                  them too.

                  Instead of dot plus, try [^\x20\r\n]+

                  It will match up to the last / in the link instead of the last / on
                  the line

                  file\:///C\:\\website_copy\\dir1[^\x20\r\n]+/[^\x20\r\n]+

                  Matching to the last one should be fine since you are repeatedly
                  executing the replace until there are no more matches. You'd still
                  have to do that in order to get all the /'s in the links (if there
                  could be more than one) if you were using an ungreedy search. You can
                  make it ungreedy (if you want to try it) by putting a question mark
                  after those plus signs.

                  Note \x20 is a space in hex. More obvious than empty space, but an
                  empty space would also work fine in the character class.
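
The repeat-until-no-match idea with the space-excluding class can be sketched as follows. It assumes Python's `re` module as a stand-in for NoteTab's engine. The capture groups are added here so the replacement can reuse both halves, and the function name `convert_slashes` and the sample text are invented for the example.

```python
import re

LOCAL = "file:///C:\\website_copy\\dir1"   # simplified prefix from the thread
LINK_CHARS = r"[^\x20\r\n]+"               # anything but space, CR or LF

# Prefix, then two runs of link characters separated by the "/" to convert.
SEARCH = re.compile(
    re.escape(LOCAL) + "(" + LINK_CHARS + ")/(" + LINK_CHARS + ")"
)


def convert_slashes(text: str) -> str:
    """Repeat the replacement until a pass changes nothing."""
    while True:
        new_text = SEARCH.sub(
            lambda m: LOCAL + m.group(1) + "\\" + m.group(2), text
        )
        if new_text == text:
            return text
        text = new_text


sample = "see " + LOCAL + "/a/b/c.html and other/path"
converted = convert_slashes(sample)
print(converted)
# see file:///C:\website_copy\dir1/a\b\c.html and other/path
```

Each pass converts the last remaining / of the link, and the / in `other/path` is never touched because the class stops at the space. In this sketch the slash directly after `dir1` is left alone, since the first `+` needs at least one character in front of the slash it matches; using `*` there would be one way to include it.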

                  Regards,
                  Sheri
                • Vance E. Neff
                  Message 8 of 8 , Oct 3, 2008
                    Sheri,

                    Thanks for the response.
                    Unfortunately the file reference can and does have spaces in some of the
                    directory names. I had simplified the actual leading destination
                    directory string with the website_copy term.
                    But your approach gave me a good clue. I instead used
                    [^"\r\n]+

                    Thanks a lot!
                    Vance
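
Putting the pieces of the thread together, the whole conversion might look roughly like the sketch below. It is written in Python rather than as a NoteTab clip, and several things in it are assumptions rather than anything confirmed in the thread: references are taken to sit inside double quotes, the first group uses `*` instead of `+` so that the slash right after the prefix is converted too, a reference counts as having a file name only if it ends in a dot plus letters or digits, and the names `localize` and `add_index` are invented.

```python
import re

WEB = "http://www.website.com/dir1"        # web prefix from the thread
LOCAL = "file:///C:\\website_copy\\dir1"   # simplified local prefix
PART = r'[^"\r\n]'                         # no double quote, CR or LF

# Assumption: "*" (the thread uses "+") so a "/" right after the prefix
# and a trailing "/" are converted as well.
SLASH = re.compile(re.escape(LOCAL) + "(" + PART + "*)/(" + PART + "*)")

# A quoted reference that starts with the local prefix.
QUOTED = re.compile('"(' + re.escape(LOCAL) + PART + '*)"')

# Assumption: an extension is a dot plus letters/digits at the very end.
HAS_EXT = re.compile(r"\.[A-Za-z0-9]+$")


def add_index(match: re.Match) -> str:
    """Append \\index.html when the reference names no target file."""
    ref = match.group(1)
    if HAS_EXT.search(ref):
        return match.group(0)
    return '"' + ref.rstrip("\\") + '\\index.html"'


def localize(text: str) -> str:
    # Step 1: literal prefix swap, no regex needed.
    text = text.replace(WEB, LOCAL)
    # Step 2: convert one "/" per reference per pass until none are left.
    while True:
        new_text = SLASH.sub(
            lambda m: LOCAL + m.group(1) + "\\" + m.group(2), text
        )
        if new_text == text:
            break
        text = new_text
    # Step 3: add index.html to references without a file name.
    return QUOTED.sub(add_index, text)


page = (
    '<a href="http://www.website.com/dir1/my docs/sub/page.html"> '
    '<a href="http://www.website.com/dir1/my docs/"> '
    '<img src="images/logo.gif">'
)
print(localize(page))
```

Directory names with spaces are handled because only the closing quote and line breaks end a reference, and the relative link `images/logo.gif` is left as it was.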
