Re: [xml-doc] Sample Word 2003 WordML XML mark-up

  • Oliver Meyer
    Message 1 of 16, Mar 31, 2003
      Hi.

      >>>>> "Paul" == Paul Tremblay <phthenry@...> writes:
      >>>>> "Tommie" == B Tommie Usdin <btusdin@...> writes:

      Paul> That means I will be stuck with Word's form of XML.

      Tommie> Not necessarily. You know what Word's form of XML is. You
      Tommie> can, once, write a transformation that makes it what you
      Tommie> want it to be. Then run that transform every time you get
      Tommie> a file. Running XSLT transformations is generally
      Tommie> painless.

      Do we know what Word's form is? Does/Will MS provide a schema/DTD or
      even documentation for their document format?

      If not, Paul won't be able to create an XSLT transformation that can
      convert every document he will receive. Then he'd need to
      reverse-engineer the syntax and semantics of the document format. Also, MS
      would be able to change even the syntax, making the transformer
      useless!

      Oliver

      --
      ------------ DISCLAIMER: I do not speak for anyone but myself. ----------
      Oliver Meyer omeyer@...-aachen.de
      phone: +49 (2 41) 80 - 2 13 13 Department of Computer Science III
      fax: +49 (2 41) 80 - 2 22 18 Aachen University of Technology
    • Paul Tremblay
      Message 2 of 16, Mar 31, 2003
        On Mon, Mar 31, 2003 at 05:15:43PM -0500, B. Tommie Usdin wrote:
        >
        > >Right now Open Office and
        > >Koffice are developing one schema for word processing. This schema may
        > >not give as much structure as a purist would like. For example, it won't
        > >require documents have an author name and revision dates, as docbook
        > >does. None-the-less, it is probable that this XML will make a lot of
        > >sense.
        >
        > If you like that schema better, transform Word's XML into that XML. It
        > is likely that tools to do that transformation will be freely available.
        > You won't even have to write the XSLT yourself.
        >
        > Or talk everyone in your organization into loading that schema into Word
        > before they start writing. Perhaps you can even get the powers-that-be to
        > install that schema as the default within your organization.

        Perhaps you are right about the transformation being painless. I am in
        the process of finishing up a script that transforms RTF to XML, and RTF
        is *such* a mess that I am suspicious that Word XML will also be so.

        (I know that a good sample was provided on the website, but I will have
        to see a lot more examples to have a good idea. For example, if you
        italicize a whole paragraph in Word, it handles it differently than
        otherwise. How will their XML handle this?)

        My impression is that having to look at siblings to figure out
        formatting properties is somewhat difficult, but that's only because I
        haven't tried it before. If one can use XSLT to transform Word XML, then
        I agree the XML would be pretty darn good. Easier to use an XSLT style
        sheet than 5,000 lines of python code.
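
To make the sibling lookup concrete, a minimal sketch follows. It is written in Python rather than XSLT, and it does not use Word's actual schema. It assumes a hypothetical flat format in which an empty <text-to-follow> element announces the formatting of the next <text> sibling. The element names text-to-follow, text, and para, and the function name nest, are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat input: formatting is announced by a preceding sibling.
FLAT = """<doc>
<text-to-follow><i/></text-to-follow>
<text>This is kind of what Word will look like</text>
<text>plain text</text>
</doc>"""


def nest(flat_root):
    """Build a new tree in which each <text> is wrapped in the formatting
    elements listed by the <text-to-follow> sibling just before it."""
    out = ET.Element(flat_root.tag)
    pending = []  # formatting tags waiting for the next <text>
    for child in flat_root:
        if child.tag == "text-to-follow":
            pending = [fmt.tag for fmt in child]
        elif child.tag == "text":
            parent = ET.SubElement(out, "para")
            for tag in pending:
                parent = ET.SubElement(parent, tag)  # e.g. <i> inside <para>
            parent.text = child.text
            pending = []  # assumption: formatting applies to one <text> only
    return out


nested = nest(ET.fromstring(FLAT))
print(ET.tostring(nested, encoding="unicode"))
```

The whole job is carrying state (pending) from one sibling to the next. In XSLT the same idea would normally be expressed with the preceding-sibling or following-sibling axes instead of a loop variable.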

        It would be nice to talk everyone in my organization into using docbook
        or TEI. That is a bit tricky. Most people think they know best when it
        comes to computers, or they just don't think these matters are a big
        deal. I can't even get one guy in the group to use ascii text in his
        emails--he uses some crazy emailer, which puts a "???" for every smart
        quote he uses. I have to weed out all these "???" when I want to post
        his articles on our website. Oh well!

        I see your point that there is not one good XML form. But as far as
        marking up word processing documents, don't you think some forms would
        be better than others?

        <!--these forms make sense to me-->

        <i>italics</i>

        <emph rend='italics'>italics</emph>

        <list>
        <item>
        in a list
        </item>
        </list>

        <!--these forms seem crazy- - though maybe that's just me?-->

        <text-to-follow>
        <i/>
        </text-to-follow>
        <text>This is kind of what Word will look like</text>

        <paragraph list='true' level='2'> text</paragraph>
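
For the flat list form, the consumer has to rebuild the nesting from the level attribute. A minimal sketch in Python, assuming the hypothetical paragraph/list/level markup shown above and levels that start at 1; the output names list, item, and para, and the function nest_lists, are illustrative and not taken from any real schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical flat input modeled on the <paragraph list= level=> form.
FLAT = """<doc>
<paragraph list="true" level="1">first</paragraph>
<paragraph list="true" level="2">nested</paragraph>
<paragraph list="true" level="1">second</paragraph>
</doc>"""


def nest_lists(flat_root):
    """Rebuild nested <list>/<item> markup from flat, level-tagged paragraphs."""
    out = ET.Element(flat_root.tag)
    open_lists = []  # open_lists[d - 1] is the <list> currently open at depth d
    for para in flat_root.iter("paragraph"):
        if para.get("list") != "true":
            open_lists.clear()  # an ordinary paragraph ends every open list
            ET.SubElement(out, "para").text = para.text
            continue
        level = int(para.get("level", "1"))  # assumption: levels start at 1
        del open_lists[level:]  # close lists deeper than this level
        while len(open_lists) < level:  # open lists until deep enough
            if open_lists:
                items = list(open_lists[-1])
                # a deeper list hangs off the last <item> of its parent list
                parent = items[-1] if items else ET.SubElement(open_lists[-1], "item")
            else:
                parent = out
            open_lists.append(ET.SubElement(parent, "list"))
        ET.SubElement(open_lists[-1], "item").text = para.text
    return out


nested = nest_lists(ET.fromstring(FLAT))
print(ET.tostring(nested, encoding="unicode"))
```

The stack of open lists is the bookkeeping that nested markup would have carried for free, which is one way to see the cost of the flat form.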

        I suppose we could debate this forever. But you made an intriguing point
        that some people prefer non-nested XML over nested. What is the
        reasoning behind that?

        Paul



        --

        ************************
        *Paul Tremblay *
        *phthenry@...*
        ************************
      • Dave Pawson
        Message 3 of 16, Apr 1, 2003
          At 17:15 31/03/2003 -0500, B. Tommie Usdin wrote:

          >I agree that most people don't know anything about XML, and don't care,
          >won't care, and shouldn't care.

          I'm just hoping that with M$ pushing XML, there will be at least
          more interest in XML, if not real curiosity. Fashion and all that?




          >I still think that it will be easier to
          >make something you want from any XML than from the dog's breakfast of
          >word processing formats out there now.

          +1


          > >That means I will be stuck with Word's form of XML.
          >
          >Not necessarily. You know what Word's form of XML is. You can,
          >once, write a transformation that makes it what you want it to
          >be. Then run that transform every time you get a file. Running
          >XSLT transformations is generally painless.

          If people repeatedly did the same thing this would be feasible Tommie.
          As it is, with any serious length document, styling seems to vary
          more and more as the document length grows!

          regards DaveP
        • ed nixon
          Message 4 of 16, Apr 4, 2003
            >>>>>>"Paul" == Paul Tremblay <phthenry@...> writes:
            >>>>>>"Tommie" == B Tommie Usdin <btusdin@...> writes:
            >
            > Paul> That means I will be stuck with Word's form of XML.
            >
            > Tommie> Not necessarily. You know what Word's form of XML is. You
            > Tommie> can, once, write a transformation that makes it what you
            > Tommie> want it to be. Then run that transform every time you get
            > Tommie> a file. Running XSLT transformations is generally
            > Tommie> painless.
            >
            >Do we know, what Word's form is? Does/Will MS provide a schema/DTD or
            >even documentation for their document format?
            >
            >
            As an old-style end user computing guy, I think there is a sense in
            which all of this discussion misses an important point. Arguably, the
            whole impetus for developing word processing software was originally to
            put sophisticated document creation and formatting power in the hands of
            a wider and larger group of workers. In one sense, it was a move to make
            a certain class of clerical worker -- secretaries, stenographers,
            typists, etc. -- redundant. Certainly, the clerical positions have gone
            the way of the dodo. The propagation strategy has largely succeeded;
            what has failed, in my opinion, is the actual realization or utilization
            of that capability by the new hands (or fingers). The skill levels of
            most word processor users are lamentable.

            Now, we are talking about adding (temporarily or not) a whole new class
            of function to the mix: the consultant, document expert, document
            technical support person, whatever? From a management and business
            perspective, is this going to look like progress? My experience with the
            current generation of word processing software is that management is
            generally not interested in supporting the tools with training and
            technical support beyond basic installation and configuration. This may
            be a result of management's uncritically swallowing the office
            productivity line or it may be just bad policy. I've found it very
            difficult to interest organizations in development of standards, of
            rudimentary templates or even to advocate layout and typography
            standards. I may be traveling with the wrong crowd, but I think you get
            my point. Maybe you've met the same class of people.

            XML technologies supporting document creation for the broadest audience
            are going to have to be much more transparent and function-rich, I think,
            before the potential synergies of semi-structured content can be
            realized. My sense, and I'm still very ignorant of these new tools, is
            that the Open Office approach (generally speaking, separating structure
            from cosmetics) is much more robust, if still very complex and poorly
            supported.

            Regards. ...edN
          • B. Tommie Usdin
            Message 5 of 16, Apr 4, 2003
              At 8:14 AM -0500 4/4/03, ed nixon wrote:
              >XML technologies supporting document creation for the broadest audience
              >are going to have to be much more transparent and function-rich, I think,
              >before the potential synergies of semi-structured content can be
              >realized. My sense, and I'm still very ignorant of these new tools, is
              >that the Open Office approach (generally speaking, separating structure
              >from cosmetics) is much more robust, if still very complex and poorly
              >supported.

              In terms of this discussion, I think there is a significant difference
              between the "Open Office approach" and the "Word 2003 XML approach" ONLY
              when the user is actively interested in the XML.

              When the user ignores the XML and the structure of the document and
              just "uses" the word processor, both tools will produce a significantly
              higher quality "junk" file than the non-XML word processors that
              dominate the market now. And that "junk" (not in terms of content but
              in terms of markup) will take significant cleanup before it can be
              used in a more rigorous environment (such as combined with other
              word processing files into a multi-author publication).

              The difference, as I see it, is that when the user is willing and
              able to pay attention to the XML, the Word approach will allow
              them to use a tag set appropriate to their content and make XML
              that is probably re-useable.

              -- Tommie
              --
              ======================================================================
              B. Tommie Usdin mailto:btusdin@...
              Mulberry Technologies, Inc. http://www.mulberrytech.com
              17 West Jefferson Street Phone: 301/315-9631
              Suite 207 Direct Line: 301/315-9634
              Rockville, MD 20850 Fax: 301/315-8285
              ----------------------------------------------------------------------
              Mulberry Technologies: A Consultancy Specializing in SGML and XML
              ======================================================================
            • Paul Tyson
              Message 6 of 16, Apr 4, 2003
                4/4/03 7:14:44 AM, ed nixon <ed.nixon@...> wrote:

                >As an old-style end user computing guy, I think there is a sense in
                >which all of this discussion misses an important point. Arguably, the
                >whole impetus for developing word processing software was originally to
                >put sophisticated document creation and formatting power in the hands of
                >a wider and larger group of workers. In one sense, it was a move to make
                >a certain class of clerical worker -- secretaries, stenographers,
                >typists, etc. -- redundant. Certainly, the clerical positions have gone
                >the way of the dodo. The propagation strategy has largely succeeded;
                >what has failed, in my opinion, is the actual realization or utilization
                >of that capability by the new hands (or fingers). The skill levels of
                >most word processor users are lamentable.
                >

                I will argue that it misses an even bigger point, which Barry
                Schaeffer has repeatedly mentioned in this forum. Office automation
                took a wrong turn when Doug Engelbart's early hypertext research was
                cut off. Engelbart and other early researchers in this field realized
                it was about communicating *ideas*, not about making things that look
                like paper documents. They knew that traditional paper documents were
                *means to an end*, not an end in themselves. The goal is to
                communicate ideas from the mind of one person to the mind of another.

                But people with more money than imagination took control after that,
                and decided to pursue "word processing" systems as you describe, Ed,
                and for those purposes.

                The same thing happened with computer-aided drafting tools. Not
                realizing that engineering drawings are *means to an end*, the whiz-
                bang programmers came up with neat ways to put lines on paper using
                computers. Starting down the wrong road, with both office automation
                and CAD, set the fields back at least 20 years.

                In office automation, SGML was a mighty attempt to get it back on the
                right track. In the CAD field, 3-D modeling tools are maturing, and
                have almost reached the point where the printed "drawing" is just an
                afterthought, a convenience when you want to unroll something on a big
                table to study it. All the "real" data is in the 3-D model, and you
                can get any kind of rendition you want of it.

                Unfortunately, the office automation folks haven't "gotten it" yet,
                and are still fixated on the "printed document". The printed
                document, or a pretty screen rendition, are just trivial aspects of
                the document. The substance is in the ideas that the document
                contains, and these are best represented with structural markup. For
                at least twenty years people have tried to make products that try to
                marry desktop publishing with structured documentation, and all have
                proven the futility of this approach. MSWord+XML will be no
                different.

                >Now, we are talking about adding (temporarily or not) a whole new class
                >of function to the mix: the consultant, document expert, document
                >technical support person, whatever? From a management and business
                >perspective, is this going to look like progress? My experience with the
                >current generation of word processing software is that management is
                >generally not interested in supporting the tools with training and
                >technical support beyond basic installation and configuration. This may
                >be a result of management's uncritically swallowing the office
                >productivity line or it may be just bad policy. I've found it very
                >difficult to interest organizations in development of standards, of
                >rudimentary templates or even to advocate layout and typography
                >standards. I may be traveling with the wrong crowd, but I think you get
                >my point. Maybe you've met the same class of people.
                >
                >XML technologies supporting document creation for the broadest audience
                >are going to have to be much more transparent and function-rich, I think,
                >before the potential synergies of semi-structured content can be
                >realized. My sense, and I'm still very ignorant of these new tools, is
                >that the Open Office approach (generally speaking, separating structure
                >from cosmetics) is much more robust, if still very complex and poorly
                >supported.
                >

                I have always maintained that the hardest part of all this is the part
                that requires thinking. That is irreducibly hard, no matter how many
                smart dancing paper clips you call on. To continue the analogy of CAD
                systems, it is hard to conceive of any non-trivial piece of machinery
                and represent it in such a way that the shop can build it, the stress
                engineer can analyze it, the government can approve it, etc., etc.
                Just so, it is hard to write something meaningful so that you get the
                point across without misunderstanding, especially if the topic is
                complex and the message important. But it is much harder to develop
                computer systems that represent the geometry and properties of
                machinery, or that represent your ideas and their relationships to one
                another, to the reader, and to the world at large. That is the
                important point of "office automation", but it is precisely the point
                that everyone ignores.

                Paul Tyson, Principal Consultant Precision Documents
                paul@... http://precisiondocuments.com
                "The art and science of document engineering."