Re: [Clip] fixup off http:// references to file:/// references

  • Don - HtmlFixIt.com
    Message 1 of 8 , Oct 3, 2008
      Vance E. Neff wrote:
      > Thanks Don,
      >
      > I knew that was basically what was required; I was just hoping someone
      > had generated a clip to automate it.
      >
      > Anyhow,
      > In all documents, I replaced all occurrences of
      >
      > http://www.website.com/dir1
      >
      > with
      >
      > file:///C:\website_copy\dir1
      >
      > Then I repeatedly replaced all occurrences of
      > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
      > with
      > "file\:///C\:\\website_copy\\dir1
      >
      > The problem is that if there are no / in the reference,
      > but there is another " on the same line with a / included
      > it finds a match.
      > How do I limit the match to the first " after the initial
      > "file\:///C\:\\website_copy\\dir1
      > match?
      >
      > Vance

      Hi Vance,

      You are looking to make it non-greedy. Your magic character will be a ?
      after the .+ I think.

      Don

      Here are some slides from the wonderful help file that Sheri sent out a
      long time ago:
      greediness
      Quantifiers try to grab as much as
      possible by default
      Applying <.+> to <i>greediness</i>
      matches the whole string rather than
      just <i>

      greediness
      If the entire match fails because they
      grabbed too much, then they are forced
      to give up as much as needed to make
      the rest of regex succeed

      greediness
      To find words ending in ness, you will
      probably use \w+ness
      On the first run \w+ takes the whole
      word
      But since ness still has to match, it gives
      up the last 4 characters and the match
      succeeds

      overcoming greediness
      The simplest solution is to
      make the repetition operators
      non-greedy, or lazy
      Lazy quantifiers grab as little
      as possible
      If the overall match fails, they
      grab a little more and the
      match is tried again

      overcoming greediness
      To make a greedy quantifier
      lazy, append ?
      Note that this use of the
      question mark is different from
      its use as a regular quantifier
      *?
      +?
      { , }?
      ??

      overcoming greediness
      *?
      +?
      { , }?
      ??
      Applying <.+?>
      to <i>greediness</i>
      gets us <i>

      overcoming greediness
      Another option is to use
      negated character classes
      More efficient and clearer than
      lazy repetition

      overcoming greediness
      <.+?> can be turned into <[^>]+>
      Note that the second version
      will match tags spanning
      multiple lines
      Single-line version: <[^>\r\n]+>
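
The claims on the slides can be checked quickly outside NoteTab. The following is a minimal sketch using Python's `re` module, which is an assumption on my part (NoteTab uses its own regex engine), although the greedy, lazy, and negated-class behavior shown here is the same. The sample strings other than `<i>greediness</i>` are made up for illustration.

```python
import re

text = "<i>greediness</i>"

greedy = re.search(r"<.+>", text).group(0)      # grabs as much as possible
lazy = re.search(r"<.+?>", text).group(0)       # grabs as little as possible
negated = re.search(r"<[^>]+>", text).group(0)  # cannot run past the first >

# \w+ first takes the whole word, then gives back 4 characters so "ness" can match.
ness_words = re.findall(r"\w+ness", "kindness and greediness")

# A tag split across two lines: the dot does not match a line break by default,
# but a negated class does unless \r and \n are excluded explicitly.
multiline_tag = '<a\nhref="x">'
lazy_multi = re.search(r"<.+?>", multiline_tag)
negated_multi = re.search(r"<[^>]+>", multiline_tag)
single_line_only = re.search(r"<[^>\r\n]+>", multiline_tag)

print(greedy, lazy, negated, ness_words)
```

Here `greedy` is the whole string, while `lazy` and `negated` are both just `<i>`. On the two-line tag, only the plain negated class finds a match.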
    • Don - HtmlFixIt.com
      Message 2 of 8 , Oct 3, 2008
        Vance E. Neff wrote:
        > Just a correction:
        >
        > Then I repeatedly replaced all occurrences of
        > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
        > with
        > "file\:///C\:\\website_copy\\dir1$1\\$2"
        >
        > Vance
        >
        Vance, you just dump those two things into a clip and you are off to the
        races. So if you plan to do something like this again ...

        But the issue is that next time your regex might not be quite the same.

        And why use a regex at all? You know the exact text you are replacing.
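
For the first step (swapping the fixed `http://` prefix for the `file:///` prefix), a plain literal replace is indeed enough. A minimal sketch in Python, assuming the simplified prefixes quoted in this thread; the function name `localize` is hypothetical:

```python
# Prefixes as given (in simplified form) in the thread.
OLD_PREFIX = "http://www.website.com/dir1"
NEW_PREFIX = "file:///C:\\website_copy\\dir1"


def localize(text: str) -> str:
    """Literal, non-regex replacement of every occurrence of the old prefix."""
    return text.replace(OLD_PREFIX, NEW_PREFIX)


print(localize('<a href="http://www.website.com/dir1/a.html">'))
```

The forward slashes after the prefix are left untouched by this step, which is why the second, pattern-based replacement is still needed.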
      • Sheri
        Message 3 of 8 , Oct 3, 2008
          --- In ntb-clips@yahoogroups.com, "Vance E. Neff" <veneff@...> wrote:
          >
          > Just a correction:
          >
          > Then I repeatedly replaced all occurrences of
          > "file\:///C\:\\website_copy\\dir1(.+)/(.+)"
          > with
          > "file\:///C\:\\website_copy\\dir1$1\\$2"
          >
          > Vance

          > >The problem is that if there are no / in the reference,
          > > but there is another " on the same line with a / included
          > >it finds a match.
          > >How do I limit the match to the first " after the initial
          > >"file\:///C\:\\website_copy\\dir1
          > >match?

          I think something got lost there; you're not actually searching for
          double quotes, are you?

          In any case, I think the problem you're experiencing is because you
          are searching for dot plus, which can match any run of characters at
          all (except, by default, line break characters). (.+)/(.+) will match
          everything on a line up to the last / that has other characters after
          it, plus those other characters. If you excluded spaces from what can
          be matched, it would probably help a lot (since links can't have spaces
          in them). But when you make a negated character class, line break
          characters are not excluded by default, so you would want to exclude
          them too.

          Instead of dot plus, try [^\x20\r\n]+

          It will match up to the last / in the link instead of the last / on
          the line:

          file\:///C\:\\website_copy\\dir1[^\x20\r\n]+/[^\x20\r\n]+

          Matching to the last one should be fine since you are repeatedly
          executing the replace until there are no more matches. You'd still
          have to do that in order to get all the /'s in the links (if there
          could be more than one) if you were using an ungreedy search. You can
          make it ungreedy (if you want to try it) by putting a question mark
          after those plus signs.

          Note that \x20 is a space in hex. It is more obvious than a literal space,
          but a literal space would also work fine in the character class.

          Regards,
          Sheri
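
To see the difference between dot plus and the space-excluding class, here is a small sketch in Python's `re` module (an assumption; the thread itself is about NoteTab's regex engine). The capture groups are added so the match can feed a `$1\\$2` style replacement, and the sample lines and the `example.com` URL are invented for illustration.

```python
import re

# Simplified prefix from the thread, escaped so it is matched literally.
PREFIX = re.escape("file:///C:\\website_copy\\dir1")

dot_plus = re.compile(PREFIX + r"(.+)/(.+)")
no_space = re.compile(PREFIX + r"([^\x20\r\n]+)/([^\x20\r\n]+)")

# The local reference has no / left, but a later link on the same line does.
line = r'"file:///C:\website_copy\dir1\page.html" and "http://example.com/x"'

runaway = dot_plus.search(line)   # matches, running into the second link
contained = no_space.search(line)  # no match: the space stops the class

# A reference that still contains forward slashes is found by the class.
pending = r'"file:///C:\website_copy\dir1/sub/page.html" and "http://example.com/x"'
found = no_space.search(pending)

print(runaway.group(1))
print(contained)
print(found.group(1))
```

With dot plus, the first group swallows everything up to the last `/` on the line. With the negated class the match cannot cross the space, so only references that still contain a `/` are touched.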
        • Vance E. Neff
          Message 4 of 8 , Oct 3, 2008
            Sheri,

            Thanks for the response.
            Unfortunately the file reference can and does have spaces in some of the
            directory names. I had simplified the actual leading destination
            directory string with the website_copy term.
            But your approach gave me a good clue. I instead used
            [^"\r\n]+

            Thanks a lot!
            Vance
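
The quote-excluding class combined with the repeated replace can be sketched as follows. This is written in Python rather than as a NoteTab clip, and it is only an illustration under stated assumptions: the prefix is the simplified one from the thread, the function name `fix_slashes` and the sample line are hypothetical, and the first group uses `*` instead of `+` (my change) so that a slash directly after the prefix is converted as well.

```python
import re

# Simplified prefix from the thread; the real directory string differs.
PREFIX = "file:///C:\\website_copy\\dir1"

# [^"\r\n] never matches past the closing quote or a line break,
# so spaces inside directory names are allowed.
# Assumption: * (not +) on the first group, see the note above.
PATTERN = re.compile(re.escape(PREFIX) + r'([^"\r\n]*)/([^"\r\n]+)')


def fix_slashes(text: str) -> str:
    """Repeat the replace until no matching reference contains a forward slash."""
    while True:
        # Each pass converts the last remaining / in every matching reference.
        new_text, count = PATTERN.subn(
            lambda m: PREFIX + m.group(1) + "\\" + m.group(2), text
        )
        if count == 0:
            return new_text
        text = new_text


sample = (
    r'<a href="file:///C:\website_copy\dir1/my docs/sub/page.html">x</a> '
    r'<a href="http://example.com/a/b">y</a>'
)
print(fix_slashes(sample))
```

Each pass turns one `/` per reference into a backslash, working from the right, and the loop stops once a pass makes no replacement. The unrelated `http://` link on the same line is left alone because the match cannot cross the closing quote.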
