Re: [XSL-FO] How to use properly?

  • W. Eliot Kimber
    Message 1 of 7 , Jan 19, 2005
      C. Myers wrote:

      > Hi Eliot,
      >
      > Could you provide me a little more information/clue(s)
      > how to implement the two-pass approach? Thanks.

      The exact mechanism will be entirely dependent on the FO engine you're
      using and what, if anything, they do to help in this case. Alternatively
      you can use the "put the data in PDF and extract it" approach, which is
      generic but can be a bit more trouble to implement (but not that hard).

      For any solution, the basic approach is:

      1. Figure out what layout-related information you need in order to get
      the effect you want. In your case you need to know what page each
      footnote reference falls on. That is, for each element that makes a
      footnote reference, you need to know the page number it falls on.

      Thus you need to create an association between the original input XML
      element and the page number of the page it eventually falls on. This is
      easiest if the original element has an ID or some other
      easily-referenceable identifier, but that's not a hard requirement. For
      example, if you are using Saxon, the generated IDs will be consistent
      for the same input document because the IDs directly reflect the
      document and tree organization of the elements [NOTE: XSLT doesn't
      require this and you should not depend on it as a general solution. The
      Saxon implementation could change at any time (and it may not even be
      true in Saxon 8, I don't know).]

      2. Generate the layout-related information. In the absence of a more
      direct extension, there are essentially two available approaches:

      A. Use Ken Holman's technique of creating leading or trailing pages in
      your PDF that contain the data you want in some convenient text format
      (e.g., as XML data or comma-delimited strings or something). You can
      then use any number of PDF page and text-extraction tools to get the
      text out of the pages. Note that it doesn't matter what the font size
      is, so you can make the text very small if you want. See
      www.cranesoftwrights.com for details on Ken's technique. This should
      work for any FO implementation.

      B. If your FO engine produces one, use the (proprietary) area tree
      serialization produced by your FO implementation. Both XEP and XSL
      Formatter provide the ability to dump the paginated area tree to an XML
      file (FOP might as well, I don't know). These trees are non-standard
      (there is no standard for area tree representation, nor should there be)
      but pretty obvious in their structure given an understanding of the FO
      specification. You can use an XSLT transform to process this tree in order
      to figure out which elements occur on which pages. The main downside
      with this approach is that these area trees can be quite large, easily
      10 times as big as the original XML document, which can make the total
      process time slow. This is one reason I would prefer the ability to
      generate only that information I actually need for a given process.

      3. In your second pass, use the information gathered in step 2 to
      reprocess the original input XML document. In this pass you will now
      know which pages your footnote references fall on and can therefore do
      things like only generate one reference per page or reset the callouts
      per page.
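
As a rough illustration of steps 1 through 3, the following Python sketch builds the element-to-page association and then restarts the footnote callouts on each page. The `<page>` and `<target>` elements are a made-up, simplified stand-in for an area tree dump, not the actual XEP or XSL Formatter format, and the function names and IDs are hypothetical. A real implementation would read whatever your engine emits and would more likely do the second pass in XSLT.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for an engine-specific area tree dump.
# Real area trees are proprietary and far more detailed than this.
AREA_TREE = """
<document>
  <page number="1">
    <target id="fnref-a"/>
    <target id="fnref-b"/>
  </page>
  <page number="2">
    <target id="fnref-c"/>
  </page>
</document>
"""


def page_map(area_tree_xml):
    """Return {source element id: page number} from the hypothetical dump."""
    root = ET.fromstring(area_tree_xml.strip())
    pages = {}
    for page in root.iter("page"):
        number = int(page.get("number"))
        for target in page.iter("target"):
            pages[target.get("id")] = number
    return pages


def callouts_per_page(ref_ids, pages):
    """Number footnote references in document order, restarting on each page."""
    callouts = {}
    current_page = None
    counter = 0
    for ref_id in ref_ids:
        page = pages[ref_id]
        if page != current_page:
            # First reference on a new page: reset the callout counter.
            current_page = page
            counter = 0
        counter += 1
        callouts[ref_id] = counter
    return callouts


# IDs of the footnote references in source-document order (assumed known
# from the original input XML).
refs = ["fnref-a", "fnref-b", "fnref-c"]
print(callouts_per_page(refs, page_map(AREA_TREE)))
```

The resulting mapping is what the second pass would consult when it generates each callout.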

      Unfortunately, in this particular example, because you will likely be
      changing which footnotes actually occur on which pages, you will likely
      change the pagination. This will require at least one more pass to
      settle out the footnote placement, and may require a 4th pass to ensure
      that there is no change from pass 3 to pass 4.
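
The "repeat until nothing changes" idea can be sketched as a small loop. This is only an outline under assumptions: `run_pass` is a hypothetical callable that formats the document using the page map from the previous pass and returns the new map, and `fake_pass` merely simulates a formatter so the loop can be run on its own. The `max_passes` cap is my addition, on the assumption that pagination could in principle oscillate rather than settle.

```python
def settle(run_pass, max_passes=5):
    """Repeat formatting passes until the id-to-page map stops changing.

    run_pass(previous_map) is assumed to format the document using the page
    map from the previous pass (None on the first pass) and return the new map.
    Returns the stable map and the number of the pass that confirmed it.
    """
    previous = None
    for pass_number in range(1, max_passes + 1):
        current = run_pass(previous)
        if current == previous:
            return current, pass_number
        previous = current
    raise RuntimeError("pagination did not settle in %d passes" % max_passes)


def fake_pass(previous):
    """Stand-in for a real FO run: the layout shifts once, then stays put."""
    if previous is None:
        return {"fnref-a": 1, "fnref-b": 1, "fnref-c": 2}
    return {"fnref-a": 1, "fnref-b": 2, "fnref-c": 2}


stable_map, passes = settle(fake_pass)
print(stable_map, passes)
```

With the simulated formatter, the map changes between the first and second passes, and the third pass confirms that it is stable.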

      Cheers,

      Eliot
      --
      W. Eliot Kimber
      Professional Services
      Innodata Isogen
      9390 Research Blvd, #410
      Austin, TX 78759
      (512) 372-8122

      ekimber@...
      www.innodata-isogen.com
    • C. Myers
      Message 2 of 7 , Jan 20, 2005
        Eliot,
        Thank you so much for your prompt reply and valuable
        information you have provided. We are using RenderX,
        and its intermediate file is called xep and is basically a
        text file. I will share your message with my colleague and
        decide what to do next.

        Thanks again.

        Sincerely,
        Ching
