  • Marcelo Bastos
    Oct 10, 2012
      Interviewed by CNN on 10/10/2012 07:01, Axel Berger told the world:
      > Marcelo Bastos wrote:
      >> The problem: if there were Unicode characters there, you lost them.
      > Which is why that's not the way to do it. Hope the following is correct
      > (i.e. works first time), I really hate this "feature". You can
      > a) Open the file as codepage "UTF-8 (no conversion)" and possibly also
      > switch off document --> Read only.
      That's a very nice piece of clip programming, and yes, it DID work first
      time. (Well, after I fixed a couple statements that had been
      line-wrapped by the mail systems, that is.) Thank you, it will prove
      most useful in the coming weeks.
      I had a quick look at the logic, and it seems to be generic enough to
      tackle the entire Basic Multilingual Plane. Which is good, since I have
      to deal with a couple of text sources that just *love* to use obscure
      characters from languages you've never heard of for aesthetic effect.

      I'm already thinking about four or five ways I can integrate it into my
      workflow. It will probably end up as the main subroutine of a larger
      clip. I'm thinking of starting with an auto-reload of the file as "UTF-8
      (no conversion)," then a preprocessing search-and-replace to get rid of
      the most common cases, like "smart quotes" (not strictly needed, but it
      should speed up the process quite a bit), and a post-processing
      "cleanup" phase using a couple clips I already have in hand.
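
The workflow described above (read as UTF-8, pre-replace the common cases, then run the generic conversion) could be sketched roughly as follows. This is Python rather than NoteTab clip code, and it is not Axel's clip: the function names are hypothetical, and the choice of numeric character references such as &#239; as the output format is an assumption, since the thread does not show what the clip actually emits.

```python
# Hypothetical sketch of the described pipeline; not the original clip.
# Assumption: non-ASCII characters end up as numeric character references.

# Preprocessing table for the most common cases ("smart quotes").
SMART_QUOTES = {
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
}


def replace_common(text):
    """Search-and-replace pass for the frequent characters."""
    for fancy, plain in SMART_QUOTES.items():
        text = text.replace(fancy, plain)
    return text


def escape_non_ascii(text):
    """Generic pass: rewrite every remaining non-ASCII character."""
    return "".join(
        ch if ord(ch) < 128 else "&#%d;" % ord(ch)
        for ch in text
    )


def convert(raw_bytes):
    """Decode the bytes as UTF-8, then run both passes in order."""
    text = raw_bytes.decode("utf-8")
    return escape_non_ascii(replace_common(text))


if __name__ == "__main__":
    sample = "\u201cna\u00efve\u201d".encode("utf-8")
    print(convert(sample))
```

In this sketch the pre-replacement pass is optional, as in the post: the generic pass would handle the smart quotes too, only producing numeric references for them instead of plain ASCII quotes.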


