Re: [emacs-nxml-mode] Character entities

  • Sebastian Rahtz
    Message 1 of 25, Sep 30, 2003
      > I think there are legitimate reasons for not using Unicode
      > characters directly:

      agreed in principle, but I am not as convinced as you

      > - you might want to use an encoding that cannot encode a character that
      > you want

      that is the author's choice, to make life hard

      > - Emacs doesn't support all of Unicode yet (Unicode CJK and non-BMP
      > ranges are not supported)

      that's true, and beyond our control

      > - you might not have a font that contains an appropriate glyph for a
      > character

      so we get a font ...

      > - you might have a font with an appropriate glyph but it might be too
      > hard to distinguish from other characters (for example, in a fixed width
      > font an emdash might be hard to distinguish from an endash)

      maybe


      > Problem (a) is easily soluble by providing a command that allows you to
      > enter a character reference by specifying an entity name.
      Love's sgml-input could obviously be varied to do this in two shakes of
      a lamb's tail

      > Problem (b)
      > is potentially soluble by having a better display of character
      > references. For example, instead of displaying
      >
      > &#x2018;
      >
      > Emacs might display
      >
      > &#x2018[lsquo];

      I find this a bit desperate, to be honest. it perpetuates those
      short names which, while familiar to SGML english-speaking long-timers,
      mean nothing to people new to the field. it would help a bit, I suppose,
      if it's easy to implement

      by the way, are there emacs commands to switch from character entities
      to UTF-8 and vice-versa?

      --
      Sebastian Rahtz <sebastian.rahtz@...>
      OUCS
    • Sebastian Rahtz
      Message 2 of 25, Oct 1, 2003
        > > One is to try and show a
        > >glyph for the referenced character as well as/instead of the reference.
        > > Another possibility is to take advantage of the Unicode names. These
        > >are a bit long to display inline, but they could be used for input and
        > >for providing a tooltip over a character reference.

        I quite like the idea of seeing &#2016[X] where X is the
        actual character if available. but how do you tell if it will
        be available, and not just be a white box?
        The full Unicode name as tooltip would be good, though.

        > > by the way, are there emacs commands to switch from character entities
        > > to UTF-8 and vice-versa?
        >
        > Nope. This is not entirely trivial since character references/entities
        > aren't recognized in all contexts.
        ah, I had not considered that. this is attribute and element names?

        > Is this a feature request?
        Perhaps. The same thing can be done with an identity transform
        using some XML language, so it's not vital; but an emacs solution
        would be nice. If my file is full of white boxes, toggling to a
        display full of number codes might be a good alternative view.
        --
        Sebastian Rahtz <sebastian.rahtz@...>
        OUCS
      • James Clark
        Message 3 of 25, Oct 1, 2003
          Sebastian Rahtz wrote:

          > I find this a bit desperate, to be honest. it perpetuates those
          > short names which, while familiar to SGML english-speaking long-timers,
          > mean nothing to people new to the field.

          In an ideal world, everybody would have their environment set up to
          display all the characters they want and XML files wouldn't use
          character references. I still think this is some way off. Even if you
          have got yourself properly set up, you may need to exchange files with
          somebody who hasn't. So I would like nxml mode to help people out here.

          I think I would prefer to point people in the direction of character
          references rather than entities. Your point about the short SGML names
          is well-taken. So what's the alternative? One is to try and show a
          glyph for the referenced character as well as/instead of the reference.
          Another possibility is to take advantage of the Unicode names. These
          are a bit long to display inline, but they could be used for input and
          for providing a tooltip over a character reference.

          > by the way, are there emacs commands to switch from character entities
          > to UTF-8 and vice-versa?

          Nope. This is not entirely trivial since character references/entities
          aren't recognized in all contexts. Is this a feature request?

          James
        • Lars Marius Garshol
          Message 4 of 25, Oct 1, 2003
            * James Clark
            |
            | In an ideal world, everybody would have their environment set up to
            | display all the characters they want and XML files wouldn't use
            | character references. I still think this is some way off. Even if
            | you have got yourself properly set up, you may need to exchange
            | files with somebody who hasn't. So I would like nxml mode to
            | help people out here.
            |
            | I think I would prefer to point people in the direction of character
            | references rather than entities.

            I definitely agree with all of this. This is a real problem for
            people, and character entities are the wrong solution. Character
            references are much better, and the job of making them user-friendly
            effectively rests with the editor.

            | Your point about the short SGML names is well-taken. So what's the
            | alternative?

            I think having some form of name-to-character mapping is the way to
            go, but perhaps there should be support for different kinds of names?
            Some people might prefer the SGML entity names, others the LaTeX macro
            names, and still others the names from the Unicode character database.

            If there is a configurable mapping list with tab-completion I think
            that might do the trick. I'd be perfectly happy to insert — by
            typing C-c something EM SPC S TAB RET, for example.
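
A minimal sketch of the configurable name-to-reference mapping with tab-completion described above. The table entries, the `my-` names, and the choice of a plain alist are illustrative assumptions and not part of nxml-mode; `completing-read` is the standard Emacs completion function.

```elisp
;; Hypothetical, user-editable table: typist-readable name -> code point.
;; SGML entity names, LaTeX macro names, or Unicode names could all be
;; entries in the same table.
(defvar my-charref-names
  '(("EM DASH" . #x2014)
    ("EN DASH" . #x2013)
    ("lsquo"   . #x2018)
    ("rsquo"   . #x2019))
  "Alist mapping names to Unicode code points (illustrative entries only).")

(defun my-charref-for-name (name)
  "Return a hex character reference for NAME, or nil if NAME is unknown."
  (let ((entry (assoc name my-charref-names)))
    (and entry (format "&#x%X;" (cdr entry)))))

(defun my-insert-charref (name)
  "Prompt for NAME with completion and insert its character reference."
  (interactive
   (list (completing-read "Character name: " my-charref-names nil t)))
  (insert (my-charref-for-name name)))
```

Binding `my-insert-charref` to a key of your choice would give roughly the "C-c something" workflow; users who prefer other names only need to edit the table.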

            | One is to try and show a glyph for the referenced character as well
            | as/instead of the reference.

            The glyph is probably the best. My Emacs happily displays Chinese,
            Japanese, and Korean glyphs, so I guess the only problem would be
            non-BMP characters or ones missing from my fonts.

            | Another possibility is to take advantage of the Unicode names.
            | These are a bit long to display inline, but they could be used for
            | input and for providing a tooltip over a character reference.

            Yep. If the glyph is missing your best reference is really the Unicode
            code point and the Unicode name. The names are usually accurate and
            usually quite helpful.

            --
            Lars Marius Garshol, Ontopian <URL: http://www.ontopia.net >
            GSM: +47 98 21 55 50 <URL: http://www.garshol.priv.no >
          • James Clark
            Message 5 of 25, Oct 1, 2003
              Sebastian Rahtz wrote:

              > I quite like the idea of seeing &#2016[X] where X is the
              > actual character if available. but how do you tell if it will
              > be available, and not just be a white box?

              As far as I know, Emacs doesn't provide a way to tell, so nxml mode
              would have to guess. Based on the window-system, you can make a
              reasonable guess at a minimum set of Unicode characters that should be
              displayable. The user could customize this to augment with particular
              Unicode blocks. If the guess is occasionally wrong, it's not a big problem.
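
As a sketch of the show-the-glyph-or-fall-back idea: current Emacs releases provide `char-displayable-p`, which makes this kind of guess for the selected frame or terminal. The helper name `my-char-or-charref` is hypothetical, and the result is only as good as that guess.

```elisp
(defun my-char-or-charref (char)
  "Return CHAR as a string if Emacs reports it displayable, else a hex reference.
The answer from `char-displayable-p' is itself a best guess."
  (if (char-displayable-p char)
      (string char)
    (format "&#x%X;" char)))
```

An occasional wrong guess only means a white box appears where a reference would have been clearer, which matches the "not a big problem" assessment above.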

              >>>by the way, are there emacs commands to switch from character entities
              >>>to UTF-8 and vice-versa?
              >>
              >>Nope. This is not entirely trivial since character references/entities
              >>aren't recognized in all contexts.
              >
              > ah, I had not considered that. this is attribute and element names?

              And comments and processing instructions.
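
For plain text content only, the conversion in both directions can be sketched on strings as below. This deliberately ignores the context problem just described: applied blindly to a buffer it would also rewrite characters in element and attribute names, comments, and processing instructions, so a real command would need help from the parser. The `my-` names are hypothetical.

```elisp
(defun my-xml-charref-encode (string)
  "Return STRING with each non-ASCII character replaced by a hex reference."
  (replace-regexp-in-string
   "[^[:ascii:]]"
   (lambda (m) (format "&#x%X;" (aref m 0)))
   string t t))

(defun my-xml-charref-decode (string)
  "Return STRING with numeric character references replaced by characters.
Handles both decimal (&#8216;) and hexadecimal (&#x2018;) forms."
  (replace-regexp-in-string
   "&#\\(?:x\\([0-9A-Fa-f]+\\)\\|\\([0-9]+\\)\\);"
   (lambda (m)
     ;; Group 1 is the hex form, group 2 the decimal form.
     (let ((hex (match-string 1 m))
           (dec (match-string 2 m)))
       (string (if hex
                   (string-to-number hex 16)
                 (string-to-number dec)))))
   string t t))
```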

              James
            • Xavier Cazin
              Message 6 of 25, Oct 3, 2003
                Lars Marius Garshol <larsga@...> writes:
                >
                > I think having some form of name-to-character mapping is the way to
                > go, but perhaps there should be support for different kinds of names?
                > Some people might prefer the SGML entity names, others the LaTeX macro
                > names, and still others the names from the Unicode character database.

                I agree, provision for such

                [any typist-readable string] => [character reference]

                mappings would be great. That would imply that the user may provide nxml
                with arbitrary maps.


                > If there is a configurable mapping list with tab-completion I think
                > that might do the trick. I'd be perfectly happy to insert — by
                > typing C-c something EM SPC S TAB RET, for example.

                Me too.

                > Yep. If the glyph is missing your best reference is really the Unicode
                > code point and the Unicode name. The names are usually accurate and
                > usually quite helpful.

                Agreed again. But since code/names may be inconveniently long, I'd
                rather see a toggle for missing glyphs that either displays a one
                character long default glyph or the whole code+unicode name.

                Would it be bad to represent those unicode names as empty elements
                like <uni:zero-width-no-break-space code="65279" type="sepchar"/>? By
                the way, isn't that a favorite way to represent user-defined entities
                (like when James uses <point/> in the source doc for nxml)?

                X.
                --
                Posted from rue des Prairies
              • Vidar Gundersen
                Message 7 of 25, Oct 3, 2003
                  ===== Original message from James Clark | Wed, 01 Oct 2003:
                  > Emacs might display
                  > &#x2018[lsquo];
                  > where [lsquo] is in a special face (perhaps a gray background)

                  this choice of color requires that users have a background
                  color in emacs, that produces a distinguishable contrast.

                  i wonder about the fontification colors used with nxml-mode:
                  the colors used here are given colors, not faces defined in
                  emacs, like font-lock-function-name-face (which is default
                  for start tags in psgml-mode)?

                  can i customize colors in nxml?


                  Vidar___
                • Jason Rumney
                  Message 8 of 25, Oct 3, 2003
                    Vidar Gundersen wrote:

                    > i wonder about the fontification colors used with nxml-mode:
                    > the colors used here are given colors, not faces defined in
                    > emacs, like font-lock-function-name-face (which is default
                    > for start tags in psgml-mode)?
                    >
                    > can i customize colors in nxml?

                    They are customizable, but it would be better IMHO if the default values
                    were taken from the closest appropriate font-lock- face. I realise that
                    the font-lock face names are not directly applicable to XML, but having
                    nxml-mode colored in the author's preferred color scheme makes it clash
                    with the rest of my color scheme. Comments at least are directly
                    comparable, and it is not a huge leap to see a parallel between
                    function-name and element name, and variable-name and attribute name.
                  • James Clark
                    Message 9 of 25, Oct 3, 2003
                      Vidar Gundersen wrote:

                      > this choice of color requires that users have a background
                      > color in emacs, that produces a distinguishable contrast.
                      >
                      > i wonder about the fontification colors used with nxml-mode:
                      > the colors used here are given colors, not faces defined in
                      > emacs, like font-lock-function-name-face (which is default
                      > for start tags in psgml-mode)?
                      >
                      > can i customize colors in nxml?

                      M-x customize-group RET nxml-highlighting-faces RET

                      There are a lot of faces but they use inheritance to make it easier to
                      customize. Everything inherits from one of delimited data, name, ref,
                      delimiter, text or comment content face.

                      I guess the problem here is that you use a dark background and the
                      default colors are very hard to read. That's a bug: Emacs has a way of
                      defining faces depending on whether the background is light or dark, but
                      I haven't yet taken advantage of this.
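
The mechanism referred to is the display-spec form of `defface`, where each clause is selected by conditions such as `(background dark)`. A minimal sketch, assuming a hypothetical face name and arbitrarily chosen colors; the final clause falls back to a font-lock face, in the spirit of the earlier suggestion in this thread:

```elisp
(defface my-xml-name-face
  '((((class color) (background light)) (:foreground "dark green"))
    (((class color) (background dark))  (:foreground "pale green"))
    ;; Fallback for displays matching neither clause above.
    (t (:inherit font-lock-function-name-face)))
  "Hypothetical face for XML names that adapts to the frame background."
  :group 'faces)
```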

                      James
                    • Norman Walsh
                      Message 10 of 25, Oct 3, 2003
                        FWIW, I hacked a bit more at my Emacs code for dealing with Unicode:

                        * Added a function to insert characters by Unicode name. Don't
                        remember the ISO entity name for "triple prime"? No worries, hit
                        C-t u type "trip<tab>pr<tab><enter>" and in it goes.

                        * Added a similar function for ISO entity names.

                        * Added a glyph list. Inserting literal Unicode characters is
                        great, if they display properly. If not, I'd rather see the
                        numeric character reference.

                        * If the character occurs in an XML name, then I need the real
                        character even if I can't see it. For those cases, each of the
                        functions takes a prefix arg. In other words, C-u C-t u.

                        * Adapted sgml-input so that it's sensitive to the glyph list. My
                        new xml-input watches what you type and automatically replaces
                        ISO entity names with appropriate characters.

                        In other words, typing &eacute; automatically inserts an "é"
                        while typing &tprime; inserts &#x2034; because I don't have a
                        glyph for it in my emacs setup.

                        * The ISO entity names are all table driven; you can use any
                        mnemonics you like.

                        * I added code to construct a real Emacs pull-down menu (in
                        addition to or instead of the pop-up menu) for any special
                        characters that you'd like to access that way.

                        See http://nwalsh.com/emacs/xmlchars/ and/or
                        http://norman.walsh.name/2003/10/03/xmlunicode

                        I have the following in my .emacs file:

                        ;;; XML Characters

                        ;; Two #'s because I've got # bound to quoted-insert
                        (defvar unicode-charref-format "&##x%x;")

                        (setq unicode-character-list-file "/home/ndw/emacs/unichars.el")
                        (load-file "/home/ndw/emacs/xmlunicode.el")

                        (defun bind-nxml-mode-keys ()
                          (set-language-environment "utf-8")
                          (define-key nxml-mode-map "\"" 'unicode-smart-double-quote)
                          (define-key nxml-mode-map "\'" 'unicode-smart-single-quote)
                          (define-key nxml-mode-map [menu-bar unichar]
                            (cons "UniChar" unicode-character-menu-map))
                          (set-input-method 'xml))

                        (add-hook 'nxml-mode-hook 'rng-validate-mode)
                        (add-hook 'nxml-mode-hook 'bind-nxml-mode-keys)

                        (define-key ctl-t-map "c" 'unicode-character-menu-insert)
                        (define-key ctl-t-map "e" 'unicode-character-shortcut-insert)
                        (define-key ctl-t-map "u" 'unicode-character-insert)
                        (define-key ctl-t-map "i" 'iso8879-character-insert)

                        ;;; End of XML Characters

                        Be seeing you,
                        norm

                        --
                        Norman Walsh <normyahoo@...> | The Future is something which
                        http://nwalsh.com/ | everyone reaches at the rate of
                        | sixty minutes an hour, whatever
                        | he does, whoever he is.--C. S.
                        | Lewis
                      • Norman Walsh
                        Message 11 of 25, Oct 3, 2003
                          I know almost nothing about how fontification works, so this may be
                          impractical, but...

                          Is there any possibility of adding per-namespace colors? About the
                          only feature of xsl-ide that I miss is the ability to distinguish
                          between XSL instructions and literal result elements by color.

                          I suppose per-prefix, rather than true per-namespace, fontification
                          would be a reasonable fallback.

                          Be seeing you,
                          norm

                          --
                          Norman Walsh <normyahoo@...> | Look for the ridiculous in
                          http://nwalsh.com/ | everything and you will find
                          | it.--Jules Renard
                        • Xavier Cazin
                          Message 12 of 25, Oct 3, 2003
                            Norman Walsh <normyahoo@...> writes:

                            > ;;; XML Characters
                            >
                            > ;; Two #'s because I've got # bound to quoted-insert
                            > (defvar unicode-charref-format "&##x%x;")
                            >
                            > (setq unicode-character-list-file "/home/ndw/emacs/unichars.el")
                            > (load-file "/home/ndw/emacs/xmlunicode.el")

                            Is unichars.el the same as the xmlchars.el you provide at
                            http://nwalsh.com/emacs/xmlchars?

                            When I replace the filenames by those where I saved xmlchars.el and
                            xmlunicode.el, evaluating the code above complains that:

                            Debugger entered--Lisp error: (void-variable unicode-character-list)
                            (let ((ulist unicode-character-list)) (setq unicode-character-alist (list ...)) (setq ulist (cdr ulist)) (while ulist (nconc unicode-character-alist ...) (setq ulist ...)))
                            eval-buffer(#<buffer *load*> nil "/tmp/xmlunicode.el" nil t)
                            load-with-code-conversion("/tmp/xmlunicode.el" "/tmp/xmlunicode.el" nil nil)
                            load("/tmp/xmlunicode.el" nil nil t)
                            [...]

                            FWIW, my emacs version is 21.2.1.

                            -- Xavier.
                          • James Clark
                            Message 13 of 25, Oct 3, 2003
                              At Fri, 03 Oct 2003 13:24:41 -0400,
                              Norman Walsh wrote:
                              > I know almost nothing about how fontification works, so this may be
                              > impractical, but...
                              >
                              > Is there any possibility of adding per-namespace colors? About the
                              > only feature of xsl-ide that I miss is the ability to distinguish
                              > between XSL instructions and literal result elements by color.
                              >
                              > I suppose per-prefix, rather than true per-namespace, fontification
                              > would be a reasonable fallback.

                              Per-namespace URI is not practical. How about distinguishing two
                              kinds of elements, those that have the same prefix as the root element
                              and those that do not? For attributes, attributes with no prefix
                              would be treated the same as their parent element; attributes with a
                              non-null prefix would be treated according to whether that prefix was
                              the same as the root element prefix. That would be simple, but would
                              be useful for a lot of vocabularies.
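
The rule proposed here can be sketched as a pure classification on qualified names. This is only an illustration under stated assumptions: the function names and the `same-as-root` / `other` symbols are hypothetical, and nothing here is taken from nxml-mode's actual fontification code.

```elisp
(defun my-qname-prefix (qname)
  "Return the prefix of QNAME (the part before the colon), or nil if none."
  (let ((colon (string-match ":" qname)))
    (and colon (substring qname 0 colon))))

(defun my-element-kind (qname root-qname)
  "Classify element QNAME by whether its prefix equals that of ROOT-QNAME."
  (if (equal (my-qname-prefix qname) (my-qname-prefix root-qname))
      'same-as-root
    'other))

(defun my-attribute-kind (attr-qname parent-qname root-qname)
  "Classify an attribute: unprefixed ones follow their parent element,
prefixed ones are compared with the root element's prefix."
  (if (my-qname-prefix attr-qname)
      (my-element-kind attr-qname root-qname)
    (my-element-kind parent-qname root-qname)))
```

In a stylesheet whose root is `xsl:stylesheet`, `xsl:template` would be classified as `same-as-root` and a literal result element such as `html` as `other`, which is the distinction asked for above.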

                              James
                            • Norman Walsh
                              Message 14 of 25, Oct 4, 2003
                                / James Clark <jjc@...> was heard to say:
                                | Per-namespace URI is not practical. How about distinguishing two
                                | kinds of elements, those that have the same prefix as the root element
                                | and those that do not?

                                I think that would be sufficient at least 80% of the time.

                                | For attributes, attributes with no prefix
                                | would be treated the same as their parent element; attributes with a
                                | non-null prefix would be treated according to whether that prefix was
                                | the same as the root element prefix. That would be simple, but would
                                | be useful for a lot of vocabularies.

                                Absolutely!

                                Be seeing you,
                                norm

                                --
                                Norman Walsh <normyahoo@...> | The worst enemy of life, freedom
                                http://nwalsh.com/ | and the common decencies is total
                                | anarchy; their second worst enemy
                                | is total efficiency.--Aldous
                                | Huxley