Re: [emacs-nxml-mode] rendering PCDATA in xml documents

  • Eric Chastan
    Message 1 of 18 , Feb 23, 2004

      Hello All,

      Josh, I don't understand why you said that it's not the job of an XML editor to handle PCDATA. I think it is important for such a tool to render all text with a friendly layout, because a lot of XML files are quite obscure to read and a good layout helps a lot.
      It is true that in my mail I spoke only about "code", but in fact it could be any kind of text.
      I think that nxml is really better than psgml, and with the ability to render the full text in a pretty way it could be even better.

      You spoke about mmm-mode; did you really try it? mmm-mode is a good tool but it has a lot of drawbacks. One of them is that mmm-mode uses overlays intensively, and this leads to problems when there are a lot of embedded sections.

      Further thought about this extension leads me to something like this:
      - each time nxml finds an opening tag for an element, the parser can look in a list to see whether this element is associated with a function to call.
      - if so, nxml just lets this function parse the element. The function parses the element up to the end of the closing tag.
      There are a lot of pending questions, such as how to handle attributes.
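
The lookup described above can be sketched in a few lines. This is only an illustration in Python, not nxml-mode code: the `HANDLERS` table, the two handler functions and the sample document are hypothetical, and a real implementation would be written in Emacs Lisp inside nxml's own parser. Attributes are ignored here, which is exactly the open question mentioned above.

```python
import xml.etree.ElementTree as ET

# Hypothetical handlers: each receives the text content of one element.
def handle_script(text):
    return ("javascript", text.strip())

def handle_style(text):
    return ("css", text.strip())

# Hypothetical registry: element name -> function to call for that element.
HANDLERS = {"script": handle_script, "style": handle_style}

def dispatch(xml_source):
    """Walk the document and call the registered handler for each known element."""
    results = []
    for element in ET.fromstring(xml_source).iter():
        handler = HANDLERS.get(element.tag)
        if handler is not None:
            results.append(handler(element.text or ""))
    return results

doc = "<page><script>var a = 1 &lt; 2;</script><p>plain text</p></page>"
print(dispatch(doc))  # [('javascript', 'var a = 1 < 2;')]
```

Elements without an entry in the table (`p` in the sample) are simply left to the default XML handling.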

      I don't have enough time at the moment to work on it, and such a job can't be done without the help and agreement of James Clark himself.

      James, if you hear us, what do you think about this discussion?


      Eric.

      Josh Sled wrote:
      On Mon, Jan 26, 2004 at 11:00:45AM +0100, Eric Chastan wrote:

      |    I wonder if it would be possible to write some extension around nxml in
      |    order to render code embedded in XML. Of course I am thinking of JavaScript,
      |    PHP or JSP embedded in HTML, but also of any other PCDATA in XML that
      |    needs special indentation or special font rendering.

      It's not the job of nxml to handle such things.

      Take a look at mmm-mode [multi-mode-mode --
      http://mmm-mode.sourceforge.net/], which lets you define expressions
      that tell emacs to switch out of one mode [nxml-mode] and into another
      [php-mode]...

      ...jsled

    • James Clark
      Message 2 of 18 , Jul 25 10:18 PM
        On Mon, 2004-02-23 at 17:08, Eric Chastan wrote:
        > Hello All,
        >
        > Josh, I don't understand why you said that it's not the job of an xml
        > editor to handle PCDATA. I think that it is important for such tool to
        > render all text with a friendly layout because a lot of xml files a
        > quiet obscure to read and a good layout helps a lot.
        > It is true that in my mail I spoke only about "code" but it in fact it
        > could be every kind of text.
        > I think that nxlm is really better than psgml and with the ability to
        > render the full text in a pretty way it could be even better.

        If you've got an XML file containing JavaScript, it seems very
        reasonable to want to be able to use the facilities of javascript mode
        to edit the embedded JavaScript, and similarly for other kinds of PCDATA
        which have a specialized mode.

        > You spoke about mmm-mode did you really try it? mmm-mode is a good
        > tool but it has a lot of drawbacks, one of them is that mmm-mode used
        > overlays intensively and this leads to problem when there is are a lot
        > of embedded sections.

        I haven't yet tried mmm-mode.

        I can see a couple of problems with making a general purpose mode work
        for XML:

        a) I want to be able to specify which mode is used for PCDATA at the XML
        level rather than in terms of regexes in the buffer. For example, I want
        to be able to specify that the content of an element with a specific
        namespace URI and local name should use a particular mode.

        b) I don't want to be forced to use CDATA sections. I want Emacs to
        understand that the JavaScript code isn't simply a substring of the XML
        buffer, but rather a substring of the buffer after substitution of
        character/entity references. This seems not so easy. Perhaps you could
        have a separate, temporary buffer for each PCDATA fragment, which would
        use the appropriate mode for that fragment. You would arrange that you could
        edit either the XML or the temporary buffer and the temporary buffer
        would always be equal to the result of replacing character/entity
        references in the corresponding fragment of XML. Then you would have a
        command in XML mode to switch to the temporary buffer, and I guess a
        minor mode in the temporary buffer to maintain synchronization with the
        XML and to provide a command to switch back to the XML. Does mmm-mode
        deal with the escaping issue?
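
Point (a) amounts to a table keyed on (namespace URI, local name) rather than on buffer regexes. A minimal sketch in Python follows, for illustration only: `MODE_TABLE`, the mode names and the sample document are assumptions, not anything nxml-mode provides. It also touches on point (b): the text handed back by the XML parser has already had its character and entity references replaced, so it differs from the raw substring of the source.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: (namespace URI, local name) -> name of the mode to use.
MODE_TABLE = {
    ("http://www.w3.org/1999/xhtml", "script"): "javascript-mode",
    ("http://www.w3.org/1999/xhtml", "style"): "css-mode",
}

def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag

def fragments(xml_source):
    """Return (mode, decoded text) for each element that has a registered mode."""
    found = []
    for element in ET.fromstring(xml_source).iter():
        mode = MODE_TABLE.get(split_tag(element.tag))
        if mode is not None:
            found.append((mode, element.text or ""))
    return found

doc = ('<html xmlns="http://www.w3.org/1999/xhtml">'
       '<script>if (a &lt; b &amp;&amp; c) run();</script></html>')
found = fragments(doc)
print(found)  # [('javascript-mode', 'if (a < b && c) run();')]
```

The lookup ignores the prefix used in the document, so the same table entry applies whether the namespace is declared as the default or bound to a prefix.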

        What kind of UI would people like to see for dealing with embedded
        PCDATA which has its own Emacs major mode?

        James
        --
        To send me mail, replace auth-only by public in the from address.
      • drkm
        Message 3 of 18 , Jul 26 6:44 AM
          James Clark <jjc@...> writes:

          > Does mmm-mode
          > deal with the escaping issue?

          I don't think it does.

          > What kind of UI would people like to see for dealing with embedded
          > PCDATA which has its own Emacs major mode?

          What does UI mean?

          --drkm, looking for an internship: http://www.fgeorges.org/ipl/stage.html
        • James Clark
          Message 4 of 18 , Jul 26 10:05 AM
            On Mon, 2004-07-26 at 20:44, drkm wrote:

            > > What kind of UI would people like to see for dealing with embedded
            > > PCDATA which has its own Emacs major mode?
            >
            > What does mean UI ?

            User interface.

            James
          • Peter Heslin
            Message 5 of 18 , Jul 28 2:44 AM
              On 2004-07-26, James Clark <jjc@...> wrote:
              > If you've got an XML file containing JavaScript, it seems like a very
              > reasonable to want to be able to use the facilities of javascript mode
              > to edit the embedded JavaScript, and similarly for other kinds of PCDATA
              > which have a specialized mode.

              For what it's worth, I've written a package called nxml-script.el to
              help with this. It uses narrowing rather than mmm-mode, and it's
              nothing fancy, but it works for me.

              You can find it here:
              http://www.dur.ac.uk/p.j.heslin/emacs/download/nxml-script.el

              > I haven't yet tried mmm-mode.

              I found it to be very brittle. I tried to get mmm-mode working with
              nxml and failed, which led to my writing nxml-script.el.

              The current Emacs etc/TODO file says this:

              ** Implement a clean way to use different major modes for
              different parts of a buffer. This could be useful in editing
              Bison input files, for instance, or other kinds of text
              where one language is embedded in another language.

              This implies to me that the Emacs maintainers do not regard the
              current implementation of mmm-mode as "clean" and would like to
              provide something better.

              I would be wary of having nxml-mode depend on a third-party package
              that is notoriously fiddly, and that is implicitly deprecated.

              >
              > I can see a couple of problems with making a general purpose mode work
              > for XML:
              >
              > a) I want to be able to specify which mode is used for PCDATA at the XML
              > level rather than in terms of regexes in the buffer. For example, I want
              > to be able to specify that the content of an element with a specific
              > namespace URI and local name should use a particular mode.
              >
              > b) I don't want to be forced to use CDATA sections. I want Emacs to
              > understand that the JavaScript code isn't simply a substring of the XML
              > buffer, but rather a substring of the buffer after substitution of
              > character/entity references. This seems not so easy. Perhaps you could
              > have a separate, temporary buffer for each PCDATA fragment, which would
              > use the appropriate mode for that fragment. You would arrange you could
              > edit either the XML or the temporary buffer and the temporary buffer
              > would always be equal to the result of replacing character/entity
              > references in the corresponding fragment of XML. Then you would have a
              > command in XML mode to switch to the temporary buffer, and I guess a
              > minor mode in the temporary buffer to maintain synchronization with the
              > XML and to provide a command to switch back to the XML. Does mmm-mode
              > deal with the escaping issue?

              I very much doubt mmm-mode deals with escaping. Here's an idea
              suggested by the implementation of nxml-script.el. You have a
              function that narrows the buffer to the content of the element,
              unescapes it, and switches to the relevant major mode. Then another
              function escapes the narrowed text, widens to the whole buffer, and
              switches back to nxml-mode.

              It's not ideal, but better than temporary buffers, I think -- no
              synchronization issues.
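
The narrow / unescape / edit / escape / widen cycle can be sketched as a round trip on a string. This is a Python illustration of the idea only, not the code of nxml-script.el: `edit_region` and the sample text are hypothetical, and only the predefined references `&lt;`, `&gt;` and `&amp;` are handled.

```python
from xml.sax.saxutils import escape, unescape

def edit_region(buffer, start, end, edit):
    """Unescape buffer[start:end], apply `edit`, re-escape, and splice it back."""
    visible = unescape(buffer[start:end])   # "narrow" to the region and unescape it
    edited = edit(visible)                  # the user edits plain JavaScript here
    # Escape the edited text again and "widen" back to the whole document.
    return buffer[:start] + escape(edited) + buffer[end:]

xml_text = "<script>if (a &lt; b) go();</script>"
start = xml_text.index(">") + 1     # just after the opening tag
end = xml_text.rindex("</")         # just before the closing tag

result = edit_region(
    xml_text, start, end,
    lambda code: code.replace("a < b", "a < b && c"),
)
print(result)  # <script>if (a &lt; b &amp;&amp; c) go();</script>
```

Because escaping happens only at the two switch points, the text outside the region is never touched. One caveat of this sketch: `escape` also rewrites a literal `>` as `&gt;`, so the region is not guaranteed to come back byte-for-byte identical when nothing was edited.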

              >
              > What kind of UI would people like to see for dealing with embedded
              > PCDATA which has its own Emacs major mode?

              It may be that, since support for this sort of multiple major-mode
              functionality is marginal in Emacs, the implementation will be
              constrained by what is possible to achieve cleanly.

              Peter
            • david.pawson@rnib.org.uk
              Message 6 of 18 , Jul 28 5:09 AM
                -----Original Message-----
                From: James Clark

                What kind of UI would people like to see for dealing with
                embedded PCDATA which has its own Emacs major mode?

                How about a 'smart' way to switch to an already installed mode?
                It would leave nxml-mode completely, run the other mode as needed for
                the embedded content, and then have some way to 'return' to nxml-mode.

                Worst case
                M-x jscript-mode
                ....
                M-x nxml-mode

                Is that really too hard?

                regards DaveP

              • drkm
                Message 7 of 18 , Jul 28 9:52 AM
                  Peter Heslin <usenet@...> writes:

                  > On 2004-07-26, James Clark <jjc@...> wrote:

                  >> I haven't yet tried mmm-mode.

                  > I found it to be very brittle. I tried to get mmm-mode working with
                  > nxml and failed, which led to my writing nxml-script.el.

                  > The current Emacs etc/TODO file says this:

                  > ** Implement a clean way to use different major modes for
                  > different parts of a buffer. This could be useful in editing
                  > Bison input files, for instance, or other kinds of text
                  > where one language is embedded in another language.

                  > This implies to me that the Emacs maintainers do not regard the
                  > current implementation of mmm-mode as "clean" and would like to
                  > provide something better.

                  I tried MMM Mode a little, and it doesn't seem so bad.

                  I can't see a clean way to use different major modes without some
                  support 1/ in Emacs Lisp and 2/ in modes in general:

                  1/ I suppose supporting multiple major modes in the same buffer
                  requires a new kind of variable, like the buffer-local ones
                  currently used to implement modes.

                  2/ The narrow/widen mechanism requires code that doesn't use it
                  to take care of a few things: don't use (goto-char 0), but
                  (goto-char (point-min)). In the same way, I think having multiple
                  major modes in the same buffer requires changes to some
                  existing code, and to how modes are defined.

                  I don't think MMM Mode is so bad; it does what it can.

                • drkm
                  Message 8 of 18 , Jul 28 10:38 AM
                    James Clark <jjc@...> writes:

                    > On Mon, 2004-07-26 at 20:44, drkm wrote:

                    >> > What kind of UI would people like to see for dealing with embedded
                    >> > PCDATA which has its own Emacs major mode?

                    >> What does mean UI ?

                    > User interface.

                    I have never used Peter's nxml-script package, and I have tried MMM
                    Mode only a little. I think there are two major approaches,
                    corresponding respectively to nxml-script (going by the description
                    Peter gave here) and MMM Mode.

                    The first one, corresponding to nxml-script if I didn't
                    misunderstand Peter, is to switch explicitly between the two modes,
                    and possibly narrow to the submode region, or yank it to a temporary
                    buffer.

                    The second one, corresponding to MMM Mode, is to do everything in
                    place. The different submode regions have different font locking,
                    indentation, syntax tables, keymaps, etc. I think this is the most
                    intuitive. You don't have to do anything: depending on your position,
                    you edit code in one mode or the other, and you always see the code
                    highlighted correctly.

                    But as I said in another post, I think this is difficult (if not
                    impossible) to implement rigorously without support in Emacs Lisp and
                    in other modes. Still, I think MMM Mode proves it can be done
                    reasonably well.

                    The advantage of the other approach (as in nxml-script) is that
                    the switch points are privileged points where we can do some
                    computation (such as encoding/decoding). I think this is the simplest
                    to implement.

                    In any case, we have to rely on file names and file-local variables
                    to activate support for specific submodes.

                    I suppose we can do some work on nxml-script to enhance it, and in
                    parallel define some MMM classes (MMM classes define when to activate
                    submode support in a buffer, the delimiter strings, what to
                    do, etc.).

                    So we will have two ways: one requiring switching between modes,
                    but probably more robust; the other more intuitive and usable, but
                    IMHO not as robust. The user will normally use the second way, but
                    can fall back on the first if he has some trouble.

                    Peter, can you check that I didn't make any errors about nxml-script?
                    And maybe clarify some points.

                  • Peter Heslin
                    Message 9 of 18 , Jul 28 1:33 PM
                      On 2004-07-28, drkm <darkman_spam@...> wrote:
                      > So we will have two ways. One requiring switching between modes,
                      > but probably more robust. The other more intuitive and usable, but
                      > IMHO not so robust as can be the other way. The user will use
                      > normally the second way, but can use the first one if he have some
                      > trouble.
                      >
                      > Peter, can you verify I didn't say errors about nxml-script ? And
                      > maybe precise some points.

                      What you said is correct, and I agree with your assessment entirely.
                      It would be better if we had an mmm-mode style implementation. The
                      only problem is that this functionality is not currently supported in
                      the official Emacs distribution.

                      The only real advantage of narrowing/widening the buffer and switching
                      major-modes is that it can be done pretty easily and robustly (I am
                      supposing). I agree that the UI is not as nice. That's why I said
                      that the UI may depend on what it is possible to implement cleanly.

                      Peter
                    • Vincent Lefevre
                      Message 10 of 18 , Jul 28 1:46 PM
                        On 2004-07-28 19:38:46 +0200, drkm wrote:
                        > The first one, corresponding to nxml-script if I didn't
                        > misunderstand Peter, is to switch explicitely between the two modes.
                        > And eventually warrow to the submode region, or yank it to a temporary
                        > buffer.

                        The advantage is that some form of decoding can be performed before
                        yanking it to a temporary buffer, making the text more readable. And
                        after editing, reencoding can be performed before the text is put
                        back to the original buffer. This would be a bit like the way PO files
                        are edited.

                        --
                        Vincent Lefèvre <vincent@...> - Web: <http://www.vinc17.org/>
                        100% validated (X)HTML - Acorn / RISC OS / ARM, free software, YP17,
                        Championnat International des Jeux Mathématiques et Logiques, etc.
                        Work: CR INRIA - computer arithmetic / SPACES project at LORIA
                      • Peter Heslin
                        Message 11 of 18 , Jul 28 2:03 PM
                          On 2004-07-28, Vincent Lefevre <vincent@...> wrote:
                          > The advantage is that some form of decoding can be performed before
                          > yanking it to a temporary buffer, making the text more readable. And
                          > after editing, reencoding can be performed before the text is put
                          > back to the original buffer. This would be a bit like po files are
                          > edited.

                          Yes, and presumably the same kind of escaping/un-escaping could be
                          done when widening/narrowing, if you wanted to implement it that way.

                          Peter
                        • drkm
                          Message 12 of 18 , Jul 28 2:22 PM
                            Vincent Lefevre <vincent@...> writes:

                            > On 2004-07-28 19:38:46 +0200, drkm wrote:

                            >> The first one, corresponding to nxml-script if I didn't
                            >> misunderstand Peter, is to switch explicitely between the two modes.
                            >> And eventually warrow to the submode region, or yank it to a temporary
                            >> buffer.

                            > The advantage is that some form of decoding can be performed before
                            > yanking it to a temporary buffer, making the text more readable. And
                            > after editing, reencoding can be performed before the text is put
                            > back to the original buffer.

                            Yes. That's what I meant when I wrote that switching points are
                            privileged places to perform some tasks. More generally, I think
                            it's also easier to set up the context of the submode
                            precisely, in a clean way.

                            > This would be a bit like po files are
                            > edited.

                            PO files. Mmm ... They're related to gettext, aren't they? I don't
                            know how they are edited. I tried opening a "test.po" file, but the
                            mode was text-mode. I didn't find any po-mode or gette* functions.
                            What do you mean when you speak about PO files?

                          • Vincent Lefevre
                            Message 13 of 18 , Jul 28 2:41 PM
                              On 2004-07-28 23:22:53 +0200, drkm wrote:
                              > PO files. Mmm ... It's related to gettext, it isn't ? I don't
                              > know how they are edited. I tried open a "test.po" file, but the mode
                              > was text-mode. I didn't find any po-mode or gette* functions. What
                              > do you mean, when you speak about PO files ?

                              There's a po mode in Debian, provided by the gettext-el package.
                              When a po file is edited, it is in fact marked as read-only, and
                              the user can make a change / new translation by typing [Return]:
                              this opens a new Emacs window below the main one in fundamental
                              mode. When there is a double quote (") in a message, it must be
                              escaped with a backslash, as shown in the main window. But in
                              the temporary buffer, the message appears decoded: the double
                              quote isn't escaped. Ditto for tab characters (encoded as \t in
                              the po file). When the user has finished editing the message,
                              he types C-c C-c to return to the main window, and Emacs encodes
                              the double quote and tab characters as expected.

                              Something similar could be done for scripts embedded in XML, where
                              some characters must be encoded / escaped.
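
The decode-on-open, encode-on-return behaviour described above can be sketched as a pair of functions. This is a Python illustration under stated assumptions, not po-mode's actual implementation: `po_encode` and `po_decode` are hypothetical names, and only the backslash, double quote, tab and newline escapes are handled.

```python
# Hypothetical mapping from the character after a backslash to its decoded form.
_DECODE = {"t": "\t", "n": "\n", '"': '"', "\\": "\\"}

def po_decode(encoded):
    """Turn the escaped form stored in the file into the text shown for editing."""
    out = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "\\" and i + 1 < len(encoded):
            nxt = encoded[i + 1]
            # Unknown escapes are kept as written rather than guessed at.
            out.append(_DECODE.get(nxt, "\\" + nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

def po_encode(text):
    """Escape edited text again before it is written back to the file."""
    return (text.replace("\\", "\\\\")   # backslashes first, so later escapes survive
                .replace('"', '\\"')
                .replace("\t", "\\t")
                .replace("\n", "\\n"))

message = 'Say "hi"\tnow'
stored = po_encode(message)
print(stored)  # Say \"hi\"\tnow
```

For embedded scripts in XML, the same two hooks would replace character and entity references instead of backslash escapes.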

                            • drkm
                              Message 14 of 18 , Jul 28 3:33 PM
                                Peter Heslin <usenet@...> writes:

                                > On 2004-07-28, drkm <darkman_spam@...> wrote:

                                >> So we will have two ways. One requiring switching between modes,
                                >> but probably more robust. The other more intuitive and usable, but
                                >> IMHO not so robust as can be the other way. The user will use
                                >> normally the second way, but can use the first one if he have some
                                >> trouble.

                                >> Peter, can you verify I didn't say errors about nxml-script ? And
                                >> maybe precise some points.

                                > What you said is correct, and I agree with your assessment entirely.
                                > It would be better if we had an mmm-mode style implementation. The
                                > only problem is that this functionality is not currently supported in
                                > the official Emacs distribution.

                                Yes, I think it is a gap in Emacs. But as I said, I think adding it
                                to Emacs would be a non-trivial task, and would require modifications
                                in the emacs/lisp directory ... I don't read the Emacs devel ML, so I
                                don't know if someone is working on this.

                                > The only real advantage of narrowing/widening the buffer and switching
                                > major-modes is that it can be done pretty easily and robustly (I am
                                > supposing).

                                I think so too.

                                > I agree that the UI is not as nice.

                                Well, it's not a lot of work to switch to submodes. And with an
                                alternate binding to a function key, for example, it could be very
                                simple to use.

                                > That's why I said
                                > that the UI may depend on what it is possible to implement cleanly.

                                I think that's why we have to do two things: a clean way, like
                                nxml-script, and writing MMM classes (because MMM Mode provides enough
                                functionality to be usable, I think).

                              • drkm
                                Message 15 of 18 , Jul 28 3:47 PM
                                  Peter Heslin <usenet@...> writes:

                                  > On 2004-07-28, Vincent Lefevre <vincent@...> wrote:

                                  >> The advantage is that some form of decoding can be performed before
                                  >> yanking it to a temporary buffer, making the text more readable. And
                                  >> after editing, reencoding can be performed before the text is put
                                  >> back to the original buffer. This would be a bit like po files are
                                  >> edited.

                                  > Yes, and presumably the same kind of escaping/un-escaping could be
                                  > done when widening/narrowing, if you wanted to implement it that way.

                                  Yes. Using a temporary buffer or narrowing/widening is fundamentally
                                  equivalent. I mean that both require a trigger (an interactive function).
                                  This trigger can use a temporary buffer, narrowing, decoding, etc.

                                • drkm
                                  Message 16 of 18 , Jul 28 4:06 PM
                                    Vincent Lefevre <vincent@...> writes:

                                    [about editing PO files]

                                    OK. That's what I had in mind.

                                    > Something similar could be done for scripts embedded in XML, where
                                    > some characters must be encoded / escaped.

                                    BTW, more than providing an easy way to do {en,de}coding, this way (as
                                    opposed to the MMM Mode way) makes it clear when to {en,de}code. In the
                                    MMM Mode way, it's not so clear whether to do it or not. In the MMM Mode
                                    way, you always see the entire XML document, so it may be confusing to
                                    decode "&lt;" and co., IMHO.
