RE: [xml-doc] XML and technical writing

  • Sebastian Rahtz
    Message 1 of 17 , Feb 19, 2003
      On Wed, 2003-02-19 at 10:46, Sean Wheller wrote:

      > > true of most schemes. its not inherent in the DTD, but the support
      > > package
      >
      > This is a very cryptic statement. But if I decrypt it correctly, then I
      > cannot agree with that statement. My understanding is that DTD/Scheme's are
      > vocabularies to describe data stored in xml files, not how to display it.

      No, DTDs and Schemas define the structure of XML files; they do not
      describe the data. To say that an <a> may contain a <b> and a <c>, both
      of which should contain integer numbers, says absolutely nothing about
      what that data might represent.
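
A minimal sketch of this point, in Python. The `is_structurally_valid` helper is hypothetical and only stands in for a real DTD or schema validator; the sample documents are invented. It checks that an <a> holds one <b> and one <c> with integer content, and nothing in that check reveals what the numbers mean.

```python
import xml.etree.ElementTree as ET


def is_structurally_valid(xml_text: str) -> bool:
    """Hypothetical stand-in for a validator: <a> must contain exactly
    one <b> followed by one <c>, each holding an integer."""
    root = ET.fromstring(xml_text)
    if root.tag != "a":
        return False
    children = list(root)
    if [child.tag for child in children] != ["b", "c"]:
        return False
    for child in children:
        try:
            int((child.text or "").strip())
        except ValueError:
            return False
    return True


# Two documents with (possibly) very different meanings, same structure.
temperatures = "<a><b>12</b><c>31</c></a>"  # perhaps a min/max temperature
stock_levels = "<a><b>12</b><c>31</c></a>"  # perhaps stock in two warehouses
not_integers = "<a><b>twelve</b><c>31</c></a>"

print(is_structurally_valid(temperatures), is_structurally_valid(stock_levels))
print(is_structurally_valid(not_integers))
```

Both of the first two documents pass the same check, so any interpretation of the values has to come from somewhere other than the structural rules.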

      > The Opening statement of the OEBPS says "The Open eBook Publication
      > Structure (OEBPS) is an open, non-proprietary, XML-based specification for
      > the content, structure, and presentation of electronic books."
      >
      > OEBPS is also concerned with presentation, something most DTD/Schemas are
      > not concerned with "provides a great deal of new functionality in the area
      > of presentation control, including, among other things, improvements in the
      > Basic markup vocabulary (now a pure subset of XHTML 1.1), and greatly
      > expanded CSS support."

      Yes, OEB is about much more than a DTD; it's about a system,
      and an assumption about the meaning of XML elements.


      > Though eBook has its uses, as do DocBook, TEI and the DTD
      > provided with DITA, they are just tools. I would just like to be certain
      > of using the tool that enables me to obtain any format, electronic or print.

      And I think that is largely orthogonal to which DTD you use. You
      want a processing system, not just a DTD.

      > What I think the world wants is a format that can be used to obtain all
      > subsets. I am convinced that DocBook does not do that and I continue to
      > explore new options. I want a DTD that enables support across all
      > applications

      And I claim that DTDs have little or no influence on the processing
      application.

      > . So a document can be part USUPTO,
      > part NewsML, part DocBook, part DITA, part TEI, part VoXML, part WSDL, part
      > WML, part any other DTD or subset and still validate.

      Fine. Use schemas and
      namespaces, and you can create something that is technically a valid XML
      file. And then what? If your downstream application cannot grok all those
      elements, you are nowhere.

      What I am saying is "defining a vocabulary does not give you an end-user
      application."
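
To illustrate the "and then what?" problem, here is a small Python sketch. The namespace URIs, element names, and the `render_known` helper are all invented for this example. The document parses cleanly, yet a processor that only understands one vocabulary has nothing to do with the elements from the other.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this illustration only.
DOC_NS = "http://example.org/ns/doc"
NEWS_NS = "http://example.org/ns/news"

MIXED = f"""
<doc:article xmlns:doc="{DOC_NS}" xmlns:news="{NEWS_NS}">
  <doc:para>Structured prose.</doc:para>
  <news:headline>Breaking item</news:headline>
  <doc:para>More prose.</doc:para>
</doc:article>
"""


def render_known(xml_text: str, known_ns: str) -> tuple:
    """Return (rendered text, skipped tags) for a processor that only
    understands elements in known_ns."""
    rendered, skipped = [], []
    for element in ET.fromstring(xml_text.strip()).iter():
        if element.tag.startswith("{" + known_ns + "}"):
            text = (element.text or "").strip()
            if text:
                rendered.append(text)
        else:
            # Well-formed and namespaced, but meaningless to this processor.
            skipped.append(element.tag)
    return rendered, skipped


rendered, skipped = render_known(MIXED, DOC_NS)
print(rendered)
print(skipped)
```

The skipped list is the part of the document that the vocabulary definition alone does nothing to process.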

      --
      Sebastian Rahtz OUCS Information Manager
      13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431
    • Ian Tindale
      Message 2 of 17 , Feb 19, 2003
        On Monday 17 February 2003 08:25, Sean Wheller wrote:
        > Ian,
        >
        > Other than DocBook, DITA, TEI DTD's, can you suggest others that may work
        > well for Technical Documents. I have not found any for Technical Authoring
        > purpose and so, like thousands of others, reference a known standard with a
        > large support base of users. Mainly DocBook.

        Sorry, can't help there - technical documents ain't my bag, but it would seem
        from a distance that the sort of documentation that technical people create
        is a fairly close fit to the sort of schemes that can encapsulate that kind
        of editorial, as they're invariably created by quite similar kinds of people
        with similar mindsets. It's almost as if, and it wouldn't surprise me to find
        it to be the case, an awful lot of DocBook is used to write about DocBook
        (which I know is not the case, but it seems that way sometimes).

        Or, to put it another way, technical authors are precisely the sort of
        people who would have come up with the concepts behind something like DocBook
        in the first place. Others in other fields would have
        followed, because not everybody spends their time immersed in DTDs, XML, or
        even a conceptual awareness of the structure (if any) of their own subject
        area. That's what I mean about taking a look at DocBook, seeing how it
        approaches what it does, and then rolling your own.


        > DocBook was not designed for magazine publishing, story writing or poetry.
        > It is for computer based materials, the format of which is by nature very
        > structured. As there is a lack of DTD's in the niches you mention people do
        > tend to use DocBook for other purposes. I am not sure this is the solution,
        > but it certainly can serve as a base with which to build new DTD's.

        Exactly. And to add to that, I'd add that because DocBook became predominant
        in its own area pretty soonish, it's had time to work out paradigm bugs and
        general approaches, and by experts in documentation and document types. Other
        'normal' people in the rest of the world of topics are less likely to be as
        well versed in how to knock up a DTD or even how to factor down their own
        problem space - typically because they spend all their lives in it, so they
        never get to see the architectural structure of their own expertise from the
        outside. In a way, it was easier for the technical documentation
        practitioners, because they're only next door to the concept of literate
        markup and factoring of topic structure. Other people live further away, so
        the job is potentially harder - unless, like I say, they derive influence
        from DocBook and roll their own.


        > For more creative tasks I think that XML still has some way to go. I would
        > be interested in hearing of any other DTD's technical or creative. I also
        > like to keep tabs on eBook, but the problem is that it is very e-centric
        > and I don't think everyone will standardize on a single display format. So
        > for now, DocBook lets me transform to formats of all types, including
        > print.


        I'd say it's got to get to that stage, and soonish, otherwise it simply
        becomes an (admittedly widespread) domain of wizardry and esoterica. Well,
        maybe not that bad, but what's needed is the facility to feed a topic in and
        get document structure out the other end in the form of a document type
        thingy (is there a more abstract term for a topic or subject architecture
        or construction than directly referring to it as a DTD or Schema?).

        I mean, to give an example - if I were to create a special editorial
        encapsulation language for my own CV for job applications it might consist of
        a handful of elements: <name/> <contact-details/> and <job-description/>
        <bullshit/> - which is a pretty simple structure in and of itself.

        I very much advocate a roll your own stance for each individual need, but at
        the same time, I very much want mistakes that have already been made and
        ironed out in other cases elsewhere in time and space to be learned from,
        somehow. I suppose there might be a form of 'best practice' (I hate that term
        - why isn't there a 'worst practice'?) approach to factoring down a topic
        area into something that fits like a glove in places where DocBook clearly
        won't (i.e., the rest of the world). It's all good stuff.
        --
        Ian Tindale
      • Ian Tindale
        Message 3 of 17 , Feb 19, 2003
          Incidentally, is anyone here also on the newsml yahoo group?

          --
          Ian Tindale
        • Jon Noring
          Message 4 of 17 , Feb 19, 2003
            [Post to XML-Doc]

            Sean Wheller wrote to XML-Doc:

            > The Opening statement of the OEBPS says "The Open eBook Publication
            > Structure (OEBPS) is an open, non-proprietary, XML-based specification for
            > the content, structure, and presentation of electronic books."
            >
            > OEBPS is also concerned with presentation, something most DTD/Schemas are
            > not concerned with "provides a great deal of new functionality in the area
            > of presentation control, including, among other things, improvements in the
            > Basic markup vocabulary (now a pure subset of XHTML 1.1), and greatly
            > expanded CSS support."
            >
            > As a subset of XHTML, the OEBPS is more like its HTML parent, a
            > language to define how a user agent will display data. Though as an XHTML
            > subset it conforms to the XML compliance list, mainly it has to be well
            > formed. In the opening of the spec "Purpose and Scope" the document states
            > "In order for electronic-book technology to achieve widespread success in
            > the marketplace, Reading Systems must have convenient access to a large
            > number and variety of titles. The Open eBook Publication Structure (OEBPS)
            > is a specification for representing the content of electronic books."
            >
            > It continues to define the relationship to other technologies.
            >
            > <quote>
            > This specification combines subsets and applications of other
            > specifications. Together, these facilitate the construction, organization,
            > presentation, and unambiguous interchange of electronic documents: ...
            >
            > The definition of these relationships makes OEBPS very niche specific,
            > electronic, electronic, electronic. This is fine and I do agree that it has
            > its place, but I think I would like to find a DTD/schema that is at the top
            > of the system. Though eBook has its uses, as do DocBook, TEI and the DTD
            > provided with DITA, they are just tools. I would just like to be certain
            > of using the tool that enables me to obtain any format, electronic or print.
            > I think I am saying that I see OEBPS as an optional presentational format,
            > just like HTML, only once you are in OEBPS it's hard to move up the chain to
            > the parent format.
            >
            > What I think the world wants is a format that can be used to obtain all
            > subsets. I am convinced that DocBook does not do that and I continue to
            > explore new options. I want a DTD that enables support across all
            > applications, in effect a merger of all existing DTDs/Schemas and a few more
            > that I can think of. The trick is how to bring all this together and yet
            > keep it manageable. The answer I think lies in a loose technology that enables
            > the inclusion of all Root DTDs/Schemas. So a document can be part USUPTO,
            > part NewsML, part DocBook, part DITA, part TEI, part VoXML, part WSDL, part
            > WML, part any other DTD or subset and still validate.
            >
            > Naturally there would be large and small deltas between all the
            > DTDs/Schemas. This I would consider the user core. Those parts that remain
            > unique to one or other DTD/Schema shall possibly remain outside the
            > application of other DTDs/Schemas. So the common part of DocBook and TEI can
            > be seamlessly interchanged without having to create separate files.
            >
            > Complex, but then if it were easy we would have been doing it years ago.

            It may be easier to send men to Mars than to come up with "The
            One True and Universal Textual Markup XML Vocabulary"! <laugh/>


            As a member of the Open eBook Forum Publication Structure Working
            Group (PSWG), which is charged with maintaining and updating OEBPS,
            I am happy to see that there is some discussion on XML-Doc of OEBPS
            and its role(s) in ebook publishing -- it definitely is "on-topic"
            since ebooks (and ebook reading devices) are becoming more and more
            mainstream for XML-based document presentation.

            [Any thoughts and opinions I give below are mine alone, and do not
            necessarily represent those of OeBF and/or PSWG.]

            Yes, I agree with many (but not all) of the insightful points raised
            by Sean Wheller. For example, OEBPS is definitely focused more on
            direct (and indirect) electronic presentation than on transformation
            into other vocabularies and for direct conversion into print. (In
            fact, I've lately been stating my belief that OEBPS, combined with
            MathML, SVG and XLink, has the potential to become the best general
            purpose direct ebook presentation format, far surpassing all other
            direct electronic presentation formats, including PDF. This is a topic
            for a different discussion which probably is off-topic to this group.)

            One little known aspect about OEBPS makes it much more powerful as a
            markup vocabulary than at first glance: its extensibility. Sure, the
            Basic OEBPS 1.2 markup vocabulary is a pure, selected subset of XHTML
            1.1. However, document authors may use other tags in their OEBPS
            documents besides those in the Basic vocabulary. Documents using such
            custom tags are called "Extended OEBPS Documents". One restriction is
            that each custom element must be provided a CSS style rule (if CSS
            'display' is not defined, then it is assumed 'display:inline'), thus
            it is not possible to assign all possible XHTML presentation constructs
            (such as <a>, <object>, <img> and <ol>/<ul>) to non-HTML tags: all
            that is supported is CSS-display 'inline', 'block', 'none', and
            table-related functions. For 'inline' and 'block' presentation, this
            is essentially equivalent to using XHTML <div> and <span> with classes.
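
The equivalence with <div> and <span> plus classes can be sketched in Python. The `DISPLAY` table, the custom tag names, and the `to_basic_markup` helper are hypothetical; the sketch covers only 'block' and 'inline', and ignores 'none' and the table-related values.

```python
import xml.etree.ElementTree as ET

# Hypothetical style table: custom tag -> CSS 'display' value.
DISPLAY = {"stanza": "block", "speaker": "inline"}

# Tags treated as already-basic markup in this toy example.
BASIC_TAGS = {"body", "div", "span", "p"}


def to_basic_markup(element: ET.Element) -> ET.Element:
    """Rewrite custom elements in place as <div>/<span> with a class.
    Tags missing from DISPLAY default to 'inline', as the post describes."""
    for node in element.iter():
        if node.tag in BASIC_TAGS:
            continue
        display = DISPLAY.get(node.tag, "inline")
        node.set("class", node.tag)
        node.tag = "div" if display == "block" else "span"
    return element


source = "<body><stanza><speaker>Puck</speaker> If we shadows have offended</stanza></body>"
result = ET.tostring(to_basic_markup(ET.fromstring(source)), encoding="unicode")
print(result)
```

The custom names survive only as class values, which is why the two approaches render the same way for block and inline content.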

            Despite OEBPS being capable of markup extensibility, it by-and-large
            is XHTML in presentation orientation, and thus is incompatible in a
            few ways with other document markup vocabularies such as TEI and
            DocBook. That is, it is difficult if not impossible to make a document
            simultaneously Extended OEBPS and TEI (or DocBook) conformant without
            the need for significant transformation between the two (e.g., via
            XSLT).

            For example, one area of "incompatibility" is in the handling of notes
            (e.g., footnotes, endnotes). In TEI, a note can be placed inline with
            the main text using the <note> element. In DocBook, there is, for
            example, the <footnote> element which works similarly to TEI <note>.
            In XHTML/OEBPS, notes are placed elsewhere (either in the same
            document or in a separate document) and are linked to using an anchor
            <a> tag or using XLink. (One could in OEBPS use something like a TEI
            <note> tag in "Extended OEBPS" mode and declare it CSS 'display:none',
            but then proper rendering of this note content requires a specific
            OEBPS Reading System be designed to render this tag, thus the OEBPS
            Publication author is not guaranteed that all other OEBPS Reading
            Systems, and HTML web browsers, would render it as desired.)
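
As a rough illustration of the kind of transformation involved, the Python sketch below moves inline notes out of a paragraph and leaves numbered links behind. The `externalize_notes` helper, the `#noteN` link targets, and the sample text are assumptions made for this example; it handles only plain-text <note> elements that are direct children of the paragraph.

```python
import xml.etree.ElementTree as ET


def externalize_notes(paragraph_xml: str):
    """Replace each inline <note> with a numbered <a> link and return
    (rewritten paragraph, list of note texts)."""
    para = ET.fromstring(paragraph_xml)
    notes = []
    for child in list(para):
        if child.tag != "note":
            continue
        notes.append((child.text or "").strip())
        number = len(notes)
        # Reuse the element so its position and trailing text are kept.
        child.tag = "a"
        child.attrib.clear()
        child.set("href", f"#note{number}")
        child.text = f"[{number}]"
    return ET.tostring(para, encoding="unicode"), notes


inline = "<p>Markup matters<note>See the TEI guidelines.</note> to editors.</p>"
linked, notes = externalize_notes(inline)
print(linked)
print(notes)
```

The note texts returned separately would then be written to wherever the linked notes live, in the same document or another one.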

            Another major area of incompatibility is the general structure of the
            Publication itself. OEBPS essentially links (or more properly "knits")
            together one or more documents using a separate XML document (the
            Package file) to act as the "control center" for the Publication. In
            TEI and DocBook, as I best understand them, a lot of the features of
            the OEBPS Package are incorporated right into one content document
            itself, including publication metadata.
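
The "control center" idea can be sketched in Python with a deliberately simplified package. The element layout below (metadata, a manifest of files, a spine giving reading order), the file names, and the `reading_order` helper are illustrative assumptions and should be checked against the OEBPS specification rather than taken as its exact syntax.

```python
import xml.etree.ElementTree as ET

# A simplified, illustrative package: metadata, a manifest of files,
# and a spine giving the reading order. File names are hypothetical.
PACKAGE = """
<package>
  <metadata><title>Sample Publication</title></metadata>
  <manifest>
    <item id="ch2" href="chapter2.html"/>
    <item id="ch1" href="chapter1.html"/>
    <item id="notes" href="notes.html"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def reading_order(package_xml: str) -> list:
    """Resolve the spine's idrefs against the manifest."""
    root = ET.fromstring(package_xml.strip())
    hrefs = {item.get("id"): item.get("href") for item in root.iter("item")}
    return [hrefs[ref.get("idref")] for ref in root.iter("itemref")]


title = ET.fromstring(PACKAGE.strip()).findtext("metadata/title")
order = reading_order(PACKAGE)
print(title, order)
```

Here the publication metadata and the reading order live outside the content documents, and a file such as the notes can belong to the publication without being part of the linear sequence.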

            In fact, the Package is the real innovation in OEBPS which brings a
            new dimension to Publication structuring not found in the TEI and
            DocBook paradigms, and this goes beyond presentation
            purposes alone. For example, highly non-linear documents (such as
            experimental hypertext literature and general "help" type of
            documents) fit much better into OEBPS than into TEI/DocBook, the
            latter of which tend to be quite linear in orientation (which is
            understandable if the source for the markup, or the intended output,
            is highly linear such as print). And PSWG is now considering greatly
            improving the capability of OEBPS to handle such highly non-linear
            documents, as well as specifying how to "linearize" such non-linear
            publications for output on linear-only formats such as print, and
            to provide better navigation throughout such complex documents
            while meeting the important need for accessibility.

            Now, Sean Wheller brought up his thoughts that the current book
            markup vocabularies (including OEBPS, DocBook, and TEI) are not a good
            starting framework that a publisher or document producer would
            be able to use as "Source" to create all other needed document
            formats, both print and electronic.

            I admit that I go back and forth on this issue, and in the near-future
            will probably continue to do so. <laugh/> At least in principle,
            especially with the rise of transformation tools such as XSLT and
            XSL-FO, I see hope that a future version of OEBPS may actually fit the
            bill. For example, properly structured OEBPS, where the markup
            strictly separates styling from structure à la DocBook/TEI by using a
            pre-defined div/span/class library, or simply adding extended tags to
            give TEI/DocBook-like document structure, may be considered a possible
            Source format since OEBPS is friendly to both direct electronic
            rendering of non-linear documents and to repurposing into more linear
            formats such as print. (As a variation on this theme, one can build a
            new DTD which would fit into the current OEBPS framework and do the
            structural things that TEI and DocBook now support. This is an area
            I would seriously explore -- if need be the current OEBPS
            specification can be tweaked to provide for better flexibility here.)

            Certainly it is now possible to work in the opposite direction. For
            example, as I understand it DocBook publications can be transformed
            into high presentation quality XHTML+CSS (and from there into OEBPS
            with very little extra effort.) Properly structured documents using a
            selected subset from TEI can likewise be transformed.

            The bottom line is that I am in sympathy with Sean Wheller's desire
            for a "Source" XML markup vocabulary for books and documents, from
            which everything else can be created. The current markup vocabularies
            of OEBPS, TEI, DocBook, etc., all tackle different aspects of this
            issue, and each provides interesting insights into what this Source
            vocabulary must accomplish. DocBook and TEI rightly focus on marking
            up only the structure/semantics of documents with total separation of
            styling away from the documents (OEBPS can do this just as well, but
            OEBPS 1.2 still allows the "freedom" for crappy markup practice since
            some styling markup is still allowed in documents, such as the yucky
            <i> and <b> tags and the deprecated "style" attribute.) OEBPS brings
            to the table much better compatibility with direct electronic
            rendering, very good content modularity (which is especially useful
            for non-linear publications), as well as the innovative Package
            construct so Publication parameters are kept separate from the content
            documents (one can strongly argue that what describes a "Publication"
            should be kept separate from the textual content modules/documents
            which comprise the Publication.)

            Just my $0.02 worth.

            Jon Noring
          • Ian Tindale
            Message 5 of 17 , Feb 27, 2003
              In fact, this is of interest:
              http://www.gentoo.org/doc/en/xml-guide.xml

              (yes, I've just spent the last fortnight submersed in installing Gentoo Linux
              here).
              --
              Ian Tindale