
Special Character Problems

  • letspetpuppies
    Message 1 of 4, May 4, 2007
      I receive files from clients as Word documents on a regular basis and
      cut and paste the text into HTML files (which I edit in SubEthaEdit).
      That's great except for the special characters that Word uses, like
      open and close single and double quotes, ellipses, and dashes.

      If I use a very basic file encoding (like ASCII) then I am warned
      about the special characters and if I insert as-is they are stripped
      out. That's not ideal. If I use something like Mac OS Roman, they
      are pasted into the document, but I have to try to find them and
      search-and-replace each one. Inevitably, I miss one or two (the
      dashes are particularly hard), and it adds a round of iteration with
      the client (the chars get translated to "?" marks or small squares).

      Ideally, I'd like it if there were a SubEthaEdit feature that was aware
      that the document has special characters that won't translate to the
      web and converted them to their ASCII counterparts en masse, without
      my having to specify them one at a time. Is there such a feature in
      SubEthaEdit, or could it be added with a script or plugin or something?

      Love the editor and thanks for your help!
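A minimal sketch of the en-masse conversion described in this message, written in Python rather than as a SubEthaEdit feature or plugin (no SubEthaEdit API is assumed). The table `WORD_TO_ASCII` and the helpers `asciify` and `unmapped` are hypothetical names, and the table only covers the characters mentioned here: curly single and double quotes, the ellipsis, and en/em dashes.

```python
# Hypothetical table mapping Word's "smart" punctuation to plain-ASCII stand-ins.
WORD_TO_ASCII = {
    "\u2018": "'",    # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK (also used as an apostrophe)
    "\u201c": '"',    # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',    # RIGHT DOUBLE QUOTATION MARK
    "\u2026": "...",  # HORIZONTAL ELLIPSIS
    "\u2013": "-",    # EN DASH
    "\u2014": "--",   # EM DASH
}

# str.maketrans accepts a dict of single characters -> replacement strings.
_TABLE = str.maketrans(WORD_TO_ASCII)


def asciify(text: str) -> str:
    """Replace every character listed in WORD_TO_ASCII; leave all others untouched."""
    return text.translate(_TABLE)


def unmapped(text: str) -> list[str]:
    """Return the distinct non-ASCII characters still present, sorted by code point."""
    return sorted({ch for ch in text if ord(ch) > 127})


if __name__ == "__main__":
    sample = "\u201cHi\u201d \u2014 it\u2019s\u2026 caf\u00e9"
    cleaned = asciify(sample)
    print(cleaned)
    print([f"U+{ord(c):04X}" for c in unmapped(cleaned)])
```

Mapping the em dash to two hyphens is a judgment call; a single hyphen would work too. The `unmapped` check reports anything the table does not cover, which targets the "I miss one or two" problem directly.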
    • dasgeniedotcom
      Message 2 of 4, May 4, 2007
        Thanks for your input. In the meantime (until we add HTML entity
        conversion), you can convert the text to ASCII via the bottom bar or the
        Format menu. If there are still non-ASCII characters in the text, the
        encoding doctor will come up showing all of them, and then you can change them.

        Best,
        dom
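The encoding doctor is part of SubEthaEdit itself; the Python sketch below only imitates the idea behind it, listing where every non-ASCII character sits so that nothing Word inserted goes unnoticed. `find_non_ascii` is a hypothetical helper, not part of any SubEthaEdit API.

```python
import unicodedata


def find_non_ascii(text: str) -> list[tuple[int, int, str, str]]:
    """List (line, column, character, Unicode name) for every non-ASCII character.

    Line and column numbers are 1-based, which is what you need to locate
    each character by hand before changing it.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch, unicodedata.name(ch, "<unnamed>")))
    return hits


if __name__ == "__main__":
    sample = "He said \u201chello\u201d\u2026\nA dash \u2014 here"
    for lineno, col, ch, name in find_non_ascii(sample):
        print(f"line {lineno}, col {col}: U+{ord(ch):04X} {name}")
```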

      • Cole Tierney
        Message 3 of 4, May 4, 2007
          At 1:01 PM +0000 5/4/07, letspetpuppies wrote:
          >If I use a very basic file encoding (like ASCII) then I am warned
          >about the special characters and if I insert as-is they are stripped
          >out. That's not ideal. If I use something like Mac OS Roman, they
          >are pasted into the document, but I have to try to find them and
          >search-and-replace each one. Inevitably, I miss one or two (the
          >dashes are particularly hard), and it adds a round of iteration with
          >the client (the chars get translated to "?" marks or small squares).

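          Before you paste the text into the document, try running the following
          AppleScript:

          -- LANG=en_US.UTF-8 makes pbpaste/pbcopy read and write UTF-8;
          -- perl -CS decodes it, so each character (not each byte) is converted.
          do shell script "
          export LANG=en_US.UTF-8
          pbpaste |
          perl -CS -pe 's/([^[:ascii:]])/sprintf(\"&#%03d;\", ord($1))/eg' |
          pbcopy
          "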

          That should convert all non-ASCII characters to their numeric
          character references (for example, a curly left double quote
          becomes &#8220;).

          --
          Cole
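For comparison, here is the same idea as Cole's Perl one-liner written in Python, assuming the text is already a decoded Unicode string. Because Python strings are sequences of characters, each curly quote becomes one reference rather than one per UTF-8 byte. `to_numeric_entities` is a hypothetical name; the `"xmlcharrefreplace"` error handler is built into Python.

```python
def to_numeric_entities(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference.

    The built-in "xmlcharrefreplace" error handler performs the substitution
    while encoding to ASCII; ASCII characters pass through unchanged.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_numeric_entities("\u201cQuoted\u201d \u2014 caf\u00e9\u2026"))
```

Like the Perl version, this leaves ASCII characters such as & and < alone, so escaping those for HTML is a separate step.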
        • Martin Pittenauer
          Message 4 of 4, May 4, 2007
            On 04.05.2007, at 15:01, letspetpuppies wrote:

            > Ideally, I'd like it if there were a SubEthaEdit feature that was aware
            > that the document has special characters that won't translate to the
            > web and converted them to their ASCII counterparts en masse, without
            > my having to specify them one at a time. Is there such a feature in
            > SubEthaEdit, or could it be added with a script or plugin or something?

            I'd recommend using UnicodeChecker[1] to convert characters to their
            HTML entities (and vice versa).
            Once it is installed, the SubEthaEdit > Services menu will have an
            entry called Unicode containing tools that work on the currently
            selected text.

            [1] http://earthlingsoft.net/UnicodeChecker/

            All the best,
            Martin