
RE: [xml-doc] Conversion: Word documents to XML (Docbook )

  • Mike Feimster
    Message 1 of 9 , Jun 2, 2006
      Shruti,

      You can always use FrameMaker conversion tables to structure the content.
      The theoretically simple procedure is:

      1. Import the Word content into FrameMaker. The content will be unstructured.
      2. Create a conversion table to convert the unstructured Frame content into
      structured Frame/DocBook.
      3. Save as XML.

      In reality, you'll probably have to:

      1. Massage the content in Word so that it uses logical/semantic paragraph
      styles with no overrides. VBA works great for this.
      2. Ensure you have a template in Frame with the proper paragraph/character
      styles that match your current formatting and are semantically rich enough.
      3. Ensure that the elements in the DocBook EDD are mapped to the
      paragraph/character styles in your template.
      4. Import the Word content into FrameMaker.
      5. Clean up the content to remove some residual Word funkiness.
      6. Generate a conversion table and map the paragraph styles, character
      styles, table elements, etc. to the proper DocBook elements.
      7. Run the Frame file against the conversion table.
      8. Clean up the structured Frame document so that it is valid.
      9. Save as XML.

      Repeat any of the above steps as often as necessary. It can be a very
      iterative process.
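
The mapping idea behind steps 3 and 6 can be sketched outside FrameMaker. This is only an illustration in Python, not FrameMaker's conversion-table format: the style names, the element names, and the `apply_mapping` helper are all assumptions made for the example. It shows why the process tends to be iterative: any style without a rule (such as an override) surfaces as unmapped and has to be fixed before the next pass.

```python
# Hypothetical style-to-element map; style and element names are illustrative only.
STYLE_TO_ELEMENT = {
    "Heading 1": "title",
    "Body Text": "para",
    "List Bullet": "listitem",
    "Code": "programlisting",
}


def apply_mapping(paragraphs, mapping=STYLE_TO_ELEMENT):
    """Return (mapped, unmapped_styles).

    paragraphs: list of (style_name, text) tuples.
    mapped: list of (element_name, text) for styles that have a rule.
    unmapped_styles: sorted list of style names that still need a rule.
    """
    mapped = []
    unmapped = set()
    for style, text in paragraphs:
        element = mapping.get(style)
        if element is None:
            unmapped.add(style)  # e.g. an override left over from Word
        else:
            mapped.append((element, text))
    return mapped, sorted(unmapped)


sample = [
    ("Heading 1", "Installation"),
    ("Body Text", "Run the installer."),
    ("Normal + Bold", "Warning!"),  # an override, not a named style
]
mapped, unmapped = apply_mapping(sample)
print(mapped)
print(unmapped)
```

Anything reported in `unmapped` points back to step 1 (clean up the style in Word) or step 6 (add a rule to the mapping).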

      I don't use DocBook, so others can chime in on the strengths and weaknesses
      of FrameMaker's implementation of DocBook. But this process has worked
      fairly well for me, both in the past and on a current project.

      Mike Feimster
      IDD Technical Analyst

      ACS Technologies
      180 N. Dunbarton Drive
      Florence, SC 29501
      p / 843.413.8122
      f / 843.413.8122
      e / mike.feimster@...


      -----Original Message-----
      From: xml-doc@yahoogroups.com [mailto:xml-doc@yahoogroups.com] On Behalf Of
      Shruti bn
      Sent: Friday, June 02, 2006 5:38 AM
      To: xml-doc@yahoogroups.com
      Subject: [xml-doc] Conversion: Word documents to XML (Docbook )

      Hi All,

      I need to find a method of converting Word documents to DocBook, which
      will finally be exported to the FrameMaker format.

      Could anyone suggest a method for this process that avoids loss of
      formatting?

      Thanks,
      Shruti


    • Tom Crawford
      Message 2 of 9 , Jun 2, 2006
        Hi Shruti,
        You could try opening the Word documents in OpenOffice, which is
        supposed to have a built-in DocBook converter (or one can be added). I
        think there is some loss of formatting when opening a Word document in
        OpenOffice, but not much.
        As for going from DocBook to FrameMaker, that sounds a little strange to
        me, but then I don't know your processes.
        All the best,
        Tom.



      • Melanie Kendell
        Message 3 of 9 , Jun 2, 2006
          Hi Shruti

          You might find it easier to import to FrameMaker directly (you don't
          say whether you already have a FrameMaker to DocBook mechanism set
          up).

          Word to FrameMaker works pretty well as long as you have a FrameMaker
          template set up with the same styles as the Word doc (FM to Word is,
          unfortunately, not as successful).

          Just a thought.

          -Melanie

        • Eoin Campbell
          Message 4 of 9 , Jun 6, 2006
            There are a number of commercial Word to DocBook XML converters
            including UpCast, Logictran and (our own offering) YAWC Pro
            (www.yawcpro.com).

            With all of them, the key is to clean up the Word file before attempting
            to convert to XML. This means applying heading and character level styles
            consistently, and using named styles (e.g. List Bullet, Heading 1, etc.)
            rather than presentation-only formatting.

            We have developed a Word template which assists the editing process by
            making explicit a lot of the commonly used styles in Word, so that
            editors/authors find it easy to apply the required style. The template
            has an explicit menu item, toolbar icon and keyboard shortcut to apply
            the most common structural styles (e.g. <Ctrl>+1 = Heading 1).

            You can download it from
            http://www.yawconline.com/wordtemplates/yawcOnline.dot
            Feel free to use it as you wish.


            Once correctly formatted in Word, any Word to XML converter will do a
            reasonably good job of turning it into DocBook XML, although only a
            simple section hierarchy will be supported. If you want to automatically
            convert certain Word constructs to specific DocBook element structures,
            then you will need to customise the conversion process to a greater or
            lesser extent.
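
To show what a "simple section hierarchy" derived from consistently applied heading styles might look like, here is a minimal Python sketch. It is not how any of the converters named above work. It assumes the Word content has already been reduced to (style, text) pairs, that headings use the built-in "Heading N" style names, and that every other style becomes a plain `para`. The `build_docbook` function is a hypothetical helper, and the output is not checked against the DocBook DTD.

```python
import xml.etree.ElementTree as ET


def build_docbook(paragraphs):
    """Nest (style, text) paragraphs into DocBook <section> elements.

    A "Heading N" style opens a section at depth N; any other style
    becomes a <para> inside the innermost open section.
    """
    root = ET.Element("article")
    stack = [(0, root)]  # (heading level, element) pairs, innermost last
    for style, text in paragraphs:
        if style.startswith("Heading "):
            level = int(style.split()[1])
            # Close any open sections at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section")
            ET.SubElement(section, "title").text = text
            stack.append((level, section))
        else:
            ET.SubElement(stack[-1][1], "para").text = text
    return root


doc = build_docbook([
    ("Heading 1", "Setup"),
    ("Body Text", "Install the tool."),
    ("Heading 2", "Options"),
    ("Body Text", "Use defaults."),
    ("Heading 1", "Usage"),
    ("Body Text", "Run it."),
])
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Anything richer than this (for example, turning a particular Word construct into a specific DocBook element structure) would need extra rules on top of the heading logic, which is the customisation mentioned above.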



            --
            Eoin Campbell, Technical Director, XML Workshop Ltd.
            10 Greenmount Industrial Estate, Harolds Cross, Dublin, Ireland.
            Phone: +353 1 4547811; fax: +353 1 4496299.
            Email: ecampbell@...; web: www.xmlw.ie
            YAWC: One-click web publishing from Word!
            YAWC Pro: www.yawcpro.com
            YAWC Online: www.yawconline.com
          • Ryan Germann
            Message 5 of 9 , Jun 7, 2006
              --- In xml-doc@yahoogroups.com, Eoin Campbell <ecampbell@...> wrote:

              > With all of them, the key is to clean up the Word file before
              > attempting to convert to XML. This means applying heading and
              > character level styles consistently, and using named styles
              > (e.g. List Bullet, Heading 1, etc.) rather than presentation-only
              > formatting.

              Hello; if this sounds like too much work for you, and you're not
              inclined to spend the time doing the work yourself, the company I work
              for, Exegenix, uses a different approach: print the file to
              PostScript, and we analyse the formatting to infer the section
              hierarchy and automatically detect lists, tables, etc. by their
              layout on the page, regardless of the formatting codes used.

              If there are specific semantic tags desired, like "author" or
              "publishername", tagging can be handled during the quality assurance
              phase using our ECS Inspector tool (either by us, or by you) or, as
              Eoin suggests, you could pre-process the document... but instead of
              having to ensure every single FORMATTING construct is properly tagged,
              you JUST tag the handful of important semantic objects... the
              processing from that point is automated.

              The pricing model is based on volume of output, with
              cost-per-kilocharacter of output decreasing with higher volumes. If
              the time you spend doing tagging and cleanup is part of your cost
              consideration, Exegenix is very cost effective. Otherwise, you can
              spend hours or days of your own time cleaning up and fixing things,
              instead of being outside enjoying the summer weather. :-)

              Visit www.exegenix.com and submit a sample document to us and we can
              provide the sample output for you.

              Ryan Germann
              Exegenix Product Manager
            • Phil Caisley
              Message 6 of 9 , Jun 8, 2006
                Hi all,

                You could also take a look at Exegenix (exegenix.com), who can convert
                any styled PDF or PostScript file to DocBook XML and retain all the
                formatting as attributes of the DocBook XML elements.

                Cheers
                Phil

