
Linking to/from SVG and FO Content--How Should It Work?

  • Eliot Kimber
    Message 1 of 13 , Sep 20, 2005
      We have a client that currently uses Interleaf to create PDFs that
      contain vector graphics that are linked at the graphic object level to
      content within the document, creating graphic hot spots from which
      readers can navigate to the text. Interleaf can render these documents
      to PDF such that the graphic objects in the PDF are hot link anchors.
      They not unreasonably would like to have the same functionality in an
      XML- and XSL-FO-based system.

      An obvious response is "use SVG and its linking mechanism". That then
      raises the immediate question of how to implement this. A quick look at
      the XEP and XSL Formatter documentation indicated that, while they both
      support SVG-to-PDF for their respective direct-to-PDF outputs, they do
      not support links from SVGs. Hmm.

      So I started putting together a test document to test this and
      immediately ran into a problem that is, to a degree, fundamental to
      hyperlink authoring but that is especially clear in this instance.

      The basic problem is: what should be specified in the SVG so that the
      rendered result will work?

      PDF certainly has the raw functionality needed to create a hot spot for
      any part of a graphic so there's no technical barrier to implementing
      the link at the PDF level--you simply need to know where the graphic
      object is within the current page and generate the appropriate link
      annotation over the same place. If you're rendering the graphic to PDF
      drawing commands you obviously have sufficient information to generate
      the link annotation as well.
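      As a rough sketch of that bookkeeping, the Python below maps an SVG
      object's bounding box into a PDF link annotation rectangle. It assumes
      the formatter placed the graphic unrotated, with uniform scaling, and
      knows the graphic's top-left corner on the page in PDF points. The
      Placement class, the function names, and the sample numbers are
      hypothetical, not taken from any FO processor.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical record of where the formatter put the SVG on the page."""
    left: float   # x of the graphic's top-left corner, PDF points from the page's left edge
    top: float    # y of the graphic's top-left corner, PDF points from the page's bottom edge
    scale: float  # PDF points per SVG user unit (uniform scaling, no rotation assumed)


def link_rect(bbox, placement):
    """Map an SVG bounding box (x, y, width, height in user units, y pointing down)
    to a PDF annotation /Rect [llx, lly, urx, ury] in points (y pointing up)."""
    x, y, w, h = bbox
    s = placement.scale
    llx = placement.left + x * s
    urx = placement.left + (x + w) * s
    ury = placement.top - y * s          # SVG y grows downward, PDF y grows upward
    lly = placement.top - (y + h) * s
    return [llx, lly, urx, ury]


def link_annotation(rect, dest_name):
    """Build a minimal PDF link annotation dictionary that jumps to a named
    destination. Assumes dest_name needs no PDF string escaping."""
    coords = " ".join(f"{v:g}" for v in rect)
    return (f"<< /Type /Annot /Subtype /Link /Rect [{coords}] "
            f"/Border [0 0 0] /Dest ({dest_name}) >>")
```

      For example, a 200 by 80 unit object at (100, 40) in a graphic placed
      at left=72, top=720 with scale 0.75 gets the rectangle
      [147, 630, 297, 690].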

      But the problem is how to represent the addresses of the link targets?

      In the SVG you can specify an "a" element with an xlink:href attribute,
      which by the SVG spec must be a URL that points to the desired target.

      Assume that the FO instance and the SVG to be included in or processed
      with the FO instance are both the result of transforms applied to some
      source documents, which means that we need only deal with how things
      should be specified in the FO and SVG that will be the direct inputs to
      the PDF generation process, not the XML and SVG source *as authored*
      (because our transforms can rewrite the addresses in the generated files
      to reflect the details of those generated files, including their
      individual elements and locations relative to each other).

      There are two use cases: the SVG is external to the FO instance and the
      SVG is embedded in the FO instance (via instream-foreign-object).

      The question: what do I specify for the SVG's xlink:href= attribute of a
      given "a" element so that the FO processor can generate a working link
      to a target within the generated PDF document?

      I think the right answer is, in both use cases, the location of the FO
      instance and the ID of the target FO element, e.g. if the FO instance
      and SVG graphic are in the same directory:

      <svg>
      ...
      <a xlink:href="mydoc.fo#d354342">
      <g>...</g>
      </a>
      </svg>

      But I would be interested to know if anyone else has either thought
      about this issue or tried to implement it?

      In the case where the SVG is external to the FO this is the only
      approach that can make sense because the SVG has to point to the only
      thing it knows about at that time, which is the FO instance (it can't
      make presumptions about the PDF that will be generated from the FO
      instance, which is why it can't specify a link to a PDF resource--this
      is because the details of the generated PDF are outside its
      control--they are totally the purview of the PDF generation process).

      In the case where the SVG is inline in the FO instance, you could, in
      theory, omit the filename part of the URL, but it is still valid for a
      document to point to itself--it should produce the same result.

      Therefore you can have one form of address that is independent of how
      the SVG is accessed from the FO instance.
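      A minimal Python sketch of that idea: resolve the SVG's xlink:href
      against the SVG's own base URI (for inline SVG, that is the FO
      instance's URI) and check whether the result lands in the FO instance
      being rendered. The function name and the example.org locations are
      made up for illustration, and the sketch assumes the URIs are already
      absolute and normalized.

```python
from urllib.parse import urldefrag, urljoin


def resolve_svg_link(href, svg_base_uri, fo_uri):
    """Classify an SVG <a> element's xlink:href.

    Returns ("internal", fo_id) when the href addresses an element of the FO
    instance being rendered, otherwise ("external", absolute_uri).
    For SVG embedded via fo:instream-foreign-object, svg_base_uri is fo_uri.
    """
    absolute = urljoin(svg_base_uri, href)
    doc, fragment = urldefrag(absolute)
    if doc == fo_uri and fragment:
        return ("internal", fragment)
    return ("external", absolute)
```

      With fo_uri set to "http://example.org/pubs/mydoc.fo", both the
      external case (href "mydoc.fo#d354342" in
      "http://example.org/pubs/fig1.svg") and the inline case (href
      "#d354342" with the FO's own URI as base) yield
      ("internal", "d354342").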

      This approach presumes that the SVG is rendered to PDF commands as part
      of the rendition process. I don't know of any other way to get a vector
      result in a PDF from an SVG input in the context of an FO-based process
      (that is, Distiller 7 does not directly support SVG in PDFs as far as I
      know).

      It seems unlikely that FO implementors will be keen to implement linking
      from (or to) SVG objects until there is a standard practice for how the
      links are represented within FO and SVG instances.

      Comments? Thoughts?

      Cheers,

      Eliot
      --
      W. Eliot Kimber
      Professional Services
      Innodata Isogen
      9390 Research Blvd, #410
      Austin, TX 78759
      (512) 372-8155

      ekimber@...
      www.innodata-isogen.com
    • Jirka Kosek
      Message 2 of 13 , Sep 20, 2005
        Eliot Kimber wrote:

        > But the problem is how to represent the addresses of the link targets?

        Indeed, seems like a real problem.

        > I think the right answer is, in both use cases, the location of the FO
        > instance and the ID of the target FO element, e.g. if the FO instance
        > and SVG graphic are in the same directory:
        >
        > <svg>
        > ...
        > <a xlink:href="mydoc.fo#d354342">
        > <g>...</g>
        > </a>
        > </svg>
        >
        > But I would be interested to know if anyone else has either thought
        > about this issue or tried to implement it?

        I don't think this is the right answer. In many situations the FO file is
        just virtual: it is materialized only as a DOM tree in memory and doesn't
        have a filename. You will also be unable to reuse this image in several
        different documents.

        > In the case where the SVG is inline in the FO instance, you could, in
        > theory, omit the filename part of the URL, but it is still valid for a
        > document to point to itself--it should produce the same result.
        >
        > Therefore you can have one form of address that is independent of how
        > the SVG is accessed from the FO instance.

        IMHO if you want such links in SVG you should place the SVG inline in the FO
        code and use just a fragment identifier (#foobar) to identify the target of the
        link (or an absolute URL if you want to point to a completely different
        document). Doing it any other way will raise a lot of nasty questions.
        I think that this is not a big limitation, because you can very easily
        place an external SVG inside fo:instream-foreign-object by using XSLT.
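        The suggestion above is to do this in XSLT. As an illustration of
        the same transformation, the Python sketch below uses the standard
        library's ElementTree instead: it wraps a parsed external SVG in
        fo:instream-foreign-object and shortens links of the form
        "mydoc.fo#id" to "#id". The function name is hypothetical, and it
        assumes the SVG is namespace-qualified and its links already point
        at the FO file.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
SVG_NS = "http://www.w3.org/2000/svg"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def inline_svg(svg_text, fo_filename):
    """Wrap an SVG document in fo:instream-foreign-object, turning links of the
    form '<fo_filename>#id' into same-document '#id' links."""
    svg = ET.fromstring(svg_text)
    prefix = fo_filename + "#"
    for a in svg.iter(f"{{{SVG_NS}}}a"):
        href = a.get(XLINK_HREF, "")
        if href.startswith(prefix):
            a.set(XLINK_HREF, "#" + href[len(prefix):])
    wrapper = ET.Element(f"{{{FO_NS}}}instream-foreign-object")
    wrapper.append(svg)
    return wrapper
```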

        > It seems unlikely that FO implementors will be keen to implement linking
        > from (or to) SVG objects until there is a standard practice for how the
        > links are represented within FO and SVG instances.

        I think that if we stick to inlined SVG images with #foobar or
        absolute URL links we can reach consensus very quickly. OTOH, supporting
        links in standalone SVG images raises a lot of really hard questions;
        you have summarized some of them in your post.

        Jirka

        --
        ------------------------------------------------------------------
        Jirka Kosek e-mail: jirka@... http://www.kosek.cz
        ------------------------------------------------------------------
        Professional training and consulting in XML technologies.
        Take a look at our newly launched website http://DocBook.cz
        Detailed overview of training courses http://xmlguru.cz/skoleni/
        ------------------------------------------------------------------
        Upcoming training dates: DocBook 5.-7.12. * XSL-FO 19.-20.12.
        XSLT 17.-20.10. * XML schemas (including RELAX NG) 7.-9.11.
        ------------------------------------------------------------------



      • Eliot Kimber
        Message 3 of 13 , Sep 21, 2005
          Jirka Kosek wrote:
          > Eliot Kimber wrote:
          >>I think the right answer is, in both use cases, the location of the FO
          >>instance and the ID of the target FO element, e.g. if the FO instance
          >>and SVG graphic are in the same directory:

          [...]

          > I don't think this is the right answer. In many situations the FO file is
          > just virtual: it is materialized only as a DOM tree in memory and doesn't
          > have a filename.



          As discussed below, I don't think in-memory FO instances actually change
          the problem.

          > You will also be unable to reuse this image in several
          > different documents.

          I think I didn't make a key aspect of my scenario clear, which led
          to this misunderstanding.

          The SVG documents in my scenario are not the SVG documents *as authored*
          but are the result of a transformation that must be done as part of the
          generation of the FO instance.

          That is, at the time you author both the input XML and the SVG, the only
          thing you can know for sure is the location of the original target XML
          elements.

          For example, say you have this source xml:

          mydoc.xml:

          <?xml version="1.0"?>
          <mydoc>
          ...
          <sect id="sect-a">...</sect>
          ...
          </mydoc>

          And you want to author a link from an SVG to the <sect> element. All you
          know at this time is the location of the <sect> element, so your SVG
          instance must point to the target element *as authored* (assuming both
          are in the same directory for simplicity in this example):

          fig1.svg:

          <?xml version="1.0"?>
          <svg>
          ...
          <a xlink:href="mydoc.xml#sect-a">
          <g>...</g>
          </a>
          ...
          </svg>


          That is, you must *always* author links in terms of the data *as
          authored*, because that's all you know. The source element could be used
          in several different compound documents, the SVG graphic could be used
          in several different compound documents.

          You cannot author the links in terms of any intermediate or rendered
          form because of course you cannot predict where a given source element
          will be used or where it will be rendered. This is simply a hard fact of
          link authoring.

          [This is also why XInclude, per strict interpretation of the XInclude
          spec, is not appropriate for authoring, but that's a separate
          discussion, much of which is covered in the paper I gave at XML Europe
          2004: http://idealliance.org/papers/dx_xmle04/papers/03-05-01/03-05-01.html]

          Given the above two documents, in order to generate an FO with the SVG
          you must do both of the following:

          1. Generate an FO instance from mydoc.xml in which each original element
          that is a link target is reflected by a formatting object with an ID
          that is unique within the FO instance.

          2. Generate a new SVG from fig1.svg in which the xlink:href= values have
          been rewritten to reflect the location of the original target element in
          the generated FO.

          Thus, given the above two input documents, you would generate these
          intermediate objects (again in the same directory for simplicity):

          mydoc.fo:

          <?xml version="1.0"?>
          <fo:root>
          ...
          <fo:block id="d012">
          {rendered content of <sect> "sect-a" element}
          </fo:block>
          ...
          </fo:root>

          fig1-for-mydoc_fo.svg:

          <?xml version="1.0"?>
          <svg>
          ...
          <a xlink:href="mydoc.fo#d012">
          <g>...</g>
          </a>
          ...
          </svg>

          This new SVG (fig1-for-mydoc_fo.svg) now points to the intermediate FO
          instance.
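          A minimal Python sketch of the rewrite in step 2, assuming the
          XML-to-FO transform also emits a mapping from (source document,
          source ID) to the ID it generated in the FO. That mapping, its
          shape, and the function name are hypothetical; nothing here says
          how such a mapping is actually recorded.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def rewrite_svg_links(svg_text, id_map, fo_name):
    """Return the parsed SVG with xlink:href values that address source XML
    elements (e.g. 'mydoc.xml#sect-a') rewritten to address the generated FO
    (e.g. 'mydoc.fo#d012').

    id_map: {(source_doc, source_id): fo_id}, a hypothetical side output of
    the XML-to-FO transform. Links with no entry are left untouched.
    """
    svg = ET.fromstring(svg_text)
    for a in svg.iter(f"{{{SVG_NS}}}a"):
        href = a.get(XLINK_HREF)
        if not href or "#" not in href:
            continue
        doc, frag = href.split("#", 1)
        fo_id = id_map.get((doc, frag))
        if fo_id is not None:
            a.set(XLINK_HREF, f"{fo_name}#{fo_id}")
    return svg
```

          Serializing the result with ET.tostring(svg, encoding="unicode")
          would give the content of something like fig1-for-mydoc_fo.svg.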

          If you wanted to embed the SVG the transform would be the same but the
          address could be just a fragment ID, as suggested (which I think makes
          sense).

          [Given that you have to transform the SVG in any case, I agree that it
          probably makes sense to just always have it be inline in the FO
          instance. However, there might be practical reasons for not doing this
          in all cases, such as when an SVG graphic is very large or embeds a
          large bitmap.]

          Note that whether the SVG is external to the FO instance or embedded, it
          doesn't change the fact that the original SVG must be processed to
          create a new version specific to the FO instance being created.

          Also, in the case where the FO is only in memory, presumably the
          newly-generated SVG would also be, in which case the processor doing the
          work can do whatever it needs to in order to get the effect as if it had
          rewritten the href as in the example above, so I don't think that having
          the FO being in memory actually changes the problem or the solution.

          Note that in this discussion I've been focusing on linking from the SVG
          to formatting objects, but the other direction is also useful.

          In that case there's the wrinkle that whether the SVG is inline or
          external may determine how you construct the link in the FO instance.

          That is, XSL-FO provides the basic-link formatting object for creating
          navigable links. basic-link has two addressing attributes,
          internal-destination= and external-destination=.

          The internal-destination= attribute is an ID reference. The
          external-destination= attribute is a URI.

          So the question is how best to address SVG constructs using basic-link?

          I think I would expect internal-destination to work for instream SVGs.
          Obviously external-destination= is the only choice for external SVGs.

          But I think you should also be able to use external-destination to point
          to an instream SVG using the logic that pointing to yourself should be
          the same whether you do it by including a filename or by using just a
          fragment identifier.
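          A small Python sketch of the choice described above, building an
          fo:basic-link with ElementTree. It uses internal-destination
          whenever the target lives in the FO instance (which includes
          instream SVG) and external-destination otherwise, written in XSL's
          url(...) uri-specification form. The function and parameter names
          are hypothetical, and locations are compared as plain strings.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"


def make_basic_link(text, target_id, target_doc, fo_doc):
    """Build an fo:basic-link to element target_id in target_doc.

    Targets inside the FO instance itself (including instream SVG) get
    internal-destination; targets in external SVGs get external-destination.
    """
    if target_doc == fo_doc:
        attrs = {"internal-destination": target_id}
    else:
        # XSL writes URIs as a uri-specification: url(...)
        attrs = {"external-destination": f"url({target_doc}#{target_id})"}
    link = ET.Element(f"{{{FO_NS}}}basic-link", attrs)
    link.text = text
    return link
```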

          I think there might also be some potential ambiguity or disagreement
          about how to address elements by ID in the case of external-destination.

          In particular, the use of a bare fragment identifier as an ID reference,
          while common, is only supposed to work in the context of
          schema-aware/dtd-aware processing (in advance of XML ID being a
          recommendation). But there is no requirement that SVG documents point to
          schemas or DTDs--it is sufficient for them to simply declare the SVG
          namespace (and be correct SVGs, of course).

          But in the case of SVG, the semantics of SVG are clear in that the id=
          attribute of graphic objects is an XML ID and therefore I think that
          processors are certainly justified in acting as if processing of all SVG
          documents is schema-aware, regardless of whether the SVG schema is
          literally processed or not. I think this is so because the schema spec
          specifically provides for processors to associate schemas to documents
          based purely on the namespaces declared in the document.
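          To illustrate the point, here is a Python sketch of an ID index
          that treats id as an ID purely on the strength of the element's
          namespace, with no DTD or schema attached. The namespace set is an
          assumption for this example (SVG and XSL-FO); xml:id is honored on
          any element. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Vocabularies whose plain 'id' attribute is treated as an XML ID here.
ID_NAMESPACES = {
    "http://www.w3.org/2000/svg",
    "http://www.w3.org/1999/XSL/Format",
}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_id_index(root):
    """Map ID values to elements without consulting any DTD or schema."""
    index = {}
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # comments and processing instructions
        ns = elem.tag[1:].split("}", 1)[0] if elem.tag.startswith("{") else ""
        ident = elem.get(XML_ID)
        if ident is None and ns in ID_NAMESPACES:
            ident = elem.get("id")
        if ident is None:
            continue
        if ident in index:
            raise ValueError(f"duplicate ID {ident!r}")
        index[ident] = elem
    return index
```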

          Cheers,

          Eliot
        • Jirka Kosek
          Message 4 of 13 , Sep 21, 2005
            Eliot Kimber wrote:

            > You cannot author the links in terms of any intermediate or rendered
            > form because of course you cannot predict where a given source element
            > will be used or where it will be rendered. This is simply a hard fact of
            > link authoring.

            Indeed, I know these problems very well from olinks in DocBook. :-(

            > But I think you should also be able to use external-destination to point
            > to an instream SVG using the logic that pointing to yourself should be
            > the same whether you do it by including a filename or by using just a
            > fragment identifier.

            I'm not sure about this. How would you know that you are pointing to
            yourself? By external-destination="mydoc.fo" or
            external-destination="mydoc.pdf" or external-destination="mydoc.ps"? I
            think that this ambiguity is the reason why FO provides separate ways
            for internal and external links. Of course, if you use
            external-destination="mydoc.pdf#a123", you will get the same user
            experience in a PDF viewer as if you had used internal-destination="a123",
            assuming that the result of rendering was stored in the file mydoc.pdf. But as
            you surely know, even Acrobat Reader has problems with fragment
            identifiers unless it is running as a plugin inside a web browser. So to me it
            seems that using internal-destination would be much safer and more
            predictable.

            > But in the case of SVG, the semantics of SVG are clear in that the id=
            > attribute of graphic objects is an XML ID and therefore I think that
            > processors are certainly justified in acting as if processing of all SVG
            > documents is schema-aware, regardless of whether the SVG schema is
            > literally processed or not. I think this is so because the schema spec
            > specifically provides for processors to associate schemas to documents
            > based purely on the namespaces declared in the document.

            I agree here. I wouldn't call it schema-aware though, just "aware". It
            is the same for XSL-FO: the id attribute is treated as an ID even if you
            don't associate a schema or DTD with the FO instance.




          • Eliot Kimber
            Message 5 of 13 , Sep 21, 2005
              Jirka Kosek wrote:

              >>But I think you should also be able to use external-destination to point
              >>to an instream SVG using the logic that pointing to yourself should be
              >>the same whether you do it by including a filename or by using just a
              >>fragment identifier.
              >
              >
              > I'm not sure about this. How would you know that you are pointing to
              > yourself? By external-destination="mydoc.fo" or
              > external-destination="mydoc.pdf" or external-destination="mydoc.ps"?

              mydoc.fo

              That is the resource being processed.

              As part of the FO-to-PDF output, the FO processor must again rewrite the
              pointers to reflect what things become in the PDF, i.e., PDF anchors,
              with whatever names, and PDF link annotations to those anchors.

              Cheers,

              E.


            • Jirka Kosek
              Message 6 of 13 , Sep 21, 2005
                Eliot Kimber wrote:

                >>I'm not sure about this. How would you know that you are pointing to
                >>yourself? By external-destination="mydoc.fo" or
                >>external-destination="mydoc.pdf" or external-destination="mydoc.ps"?
                >
                >
                > mydoc.fo
                >
                > That is the resource being processed.
                >
                > As part of the FO-to-PDF output, the FO processor must again rewrite the
                > pointers to reflect what things become in the PDF, i.e., PDF anchors,
                > with whatever names, and PDF link annotations to those anchors.

                But AFAIK current FO processors don't do this. They just pass the URL in
                external-destination from the FO to the PDF file.




              • Eliot Kimber
                Message 7 of 13 , Sep 21, 2005
                  Jirka Kosek wrote:
                  > Eliot Kimber wrote:
                  >
                  >
                  >>>I'm not sure about this. How would you know that you are pointing to
                  >>>yourself? By external-destination="mydoc.fo" or
                  >>>external-destination="mydoc.pdf" or external-destination="mydoc.ps"?
                  >>
                  >>
                  >>mydoc.fo
                  >>
                  >>That is the resource being processed.
                  >>
                  >>As part of the FO-to-PDF output, the FO processor must again rewrite the
                  >>pointers to reflect what things become in the PDF, i.e., PDF anchors,
                  >>with whatever names, and PDF link annotations to those anchors.
                  >
                  > But AFAIK current FO processors don't do this. They just pass the URL in
                  > external-destination from the FO to the PDF file.

                  Hmm. That's a problem. The FO processor should be examining the value of
                  external-destination to determine if it is pointing into a resource it
                  knows something about (i.e., the input FO, included graphic objects,
                  etc.) or something it doesn't (anything that is not the FO instance or
                  an FO-defined or SVG-defined dependency on it). There is no other way
                  that FO processors could, for example, implement links from the FO to
                  SVG graphic objects in external SVGs.

                  Note the implication here: an FO processor that also does SVG
                  processing must build a list consisting of the FO instance itself,
                  any SVG documents it points to, and any SVG-related resources those SVG
                  documents point to (i.e., embedded rasters, components in other SVG
                  documents). All of these objects are part of the "compound document"
                  rooted at the FO instance and the FO engine must be prepared to directly
                  handle the processing of links to and from any of them.
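                  A rough Python sketch of that check, as it might sit inside an
                  FO renderer: strip XSL's url(...) wrapper, resolve the value
                  against the FO instance, and either turn it into a jump within
                  the generated PDF or pass it through untouched. The
                  compound-document set, the anchor_for callback, and the return
                  tuples are hypothetical stand-ins for whatever state a real
                  renderer keeps.

```python
from urllib.parse import urldefrag, urljoin


def resolve_destination(value, fo_uri, compound_docs, anchor_for):
    """Decide how to render an external-destination value.

    compound_docs: absolute URIs of the resources in the compound document
    rooted at fo_uri (the FO instance, the SVGs it uses, ...).
    anchor_for: callback mapping (doc_uri, element_id) to the PDF named
    destination the renderer created for that element.
    Returns ("goto", anchor) for targets inside the compound document,
    else ("uri", uri) to be written to the PDF unchanged.
    """
    uri = value.strip()
    if uri.startswith("url(") and uri.endswith(")"):
        uri = uri[4:-1].strip().strip("'\"")
    doc, fragment = urldefrag(urljoin(fo_uri, uri))
    if doc in compound_docs and fragment:
        return ("goto", anchor_for(doc, fragment))
    return ("uri", uri)
```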

                  But there are still cases where the only thing the FO engine should have
                  to do is pass the value straight through. But that then raises the
                  question of what value to pass through? Which I think was Jirka's
                  original question, applied to the more general case.

                  That is, in the specific case of links between an FO instance and SVG
                  graphics included in that FO instance, either instream or external,
                  there's no reason for an FO processor *not* to recognize that the
                  external-destination is in fact to that FO instance or one of its
                  included SVG graphics and do the right thing.

                  But when the external destination is to a resource that is entirely
                  outside the scope of the FO instance and its dependencies, then I agree
                  that there's not much it can do but pass it straight through. This is
                  because the responsibility for determining the right ultimate target
                  address lies with the processor that generates the FO, not the FO
                  renderer itself.

                  For example, consider the case where you have two "units of
                  publication", Doc_A and Doc_B. These units of publication represent,
                  logically, the final rendered result (for example, two manuals within a
                  set of manuals for a single product) in whatever form they might be
                  rendered (PDF, HTML, online help, etc.).

                  Each unit of publication will be composed of many separate components,
                  i.e., XML source for the text, SVG graphics, etc. Some of these
                  components may be used in multiple units of publication.

                  Again we quickly realize that the links must be constructed in the
                  source data in terms of the data *as authored*, because again that's all
                  we know about for sure and because there may be a one-to-many
                  relationship between source objects and renditions of those source objects.

                  For example, consider the scenario where you have a component of Doc_A that needs
                  to link to a component of Doc_B. The link as authored will be from an
                  element in the source for Doc_A to an element in the source for Doc_B, i.e.:

                  doc_a.xml:

                  <?xml version="1.0"?>
                  <doc>
                  ...
                  <link href="doc_b.xml" xpointer="sect-a">...</link>
                  ...
                  </doc>

                  doc_b.xml:

                  <?xml version="1.0"?>
                  <doc>
                  ...
                  <sect id="sect-a">...</sect>
                  ...
                  </doc>

                  In order to make this link from Doc_A to Doc_B a working link in the
                  final PDF you must know, at the time you generate the PDF for Doc_A,
                  what the PDF anchor for the PDF rendering of <sect> sect-a is. The FO
                  renderer cannot know this because it doesn't know anything about any PDF
                  rendered from Doc_B (it only knows about the FO instance it is rendering
                  at the moment).

                  So what will the processing of Doc_A look like?

                  First, we transform doc_a.xml into FO. As part of this we have to
                  transform the <link> element into an <fo:basic-link> element with an
                  external-destination= value.

                  The challenge here is that the working link in the final PDF for Doc_A
                  must point to a named anchor in the PDF rendering of Doc_B. How can we
                  know what the name of the anchor for the PDF rendering of <sect> sect-a is?

                  There are only three possibilities as far as I can determine:

                  1. The generation of PDF anchor names is deterministic based on some
                  value provided in the FO instance, either FO IDs or some
                  FO-engine-specific bit of information (e.g., an "anchor-name=" attribute).

                  2. The FO engine generates the anchor names but then writes out a "side
                  file" that maps IDs in the FO to anchor names in the PDF. From this you
                  can map back to the original XML elements by having your FO generation
                  process write out the mapping from elements in specific source files to
                  the IDs generated for them in a specific FO instance.

                  3. A variant of (2): You put the ID-to-name mapping in the PDF itself
                  and then extract it as a post process.

                  I'm pretty sure no FO implementation does (2) but they may do (1) and
                  (3) is always doable using the sort of technique Ken Holman uses for
                  back-of-the-book index generation.
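                  For option (2), the join needed to get from source elements to
                  PDF anchor names is straightforward once both side files are
                  loaded. This Python sketch assumes they have already been read
                  into dictionaries (no file format is specified here) and
                  produces rows shaped like the mapping table that follows; all
                  names are hypothetical.

```python
def compose_anchor_map(source_to_fo_id, fo_id_to_anchor, unit, rendition):
    """Join two hypothetical side files into mapping-table rows.

    source_to_fo_id: {(source_doc, element_id): fo_id}, written by the FO
        generation step.
    fo_id_to_anchor: {fo_id: pdf_anchor_name}, written by the FO engine.
    Returns {(source_doc, element_id): (unit, rendition, anchor_name)}.
    """
    table = {}
    for key, fo_id in source_to_fo_id.items():
        anchor = fo_id_to_anchor.get(fo_id)
        if anchor is not None:  # no anchor was generated for this FO ID: no row
            table[key] = (unit, rendition, anchor)
    return table
```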

                  So at this point we can presume that we have a mapping from original XML
                  source document/element ID pairs to PDF anchor names in specific
                  renderings of those XML elements (that is, the elements in the context
                  of specific renderings of specific units of publication).

                  In our simple case here, we will have this mapping entry, created as a
                  side effect of rendering unit of publication Doc_B to PDF:

                  Source Doc | Elem ID || Unit of Pub | Rendition | Anchor Name
                  -----------|---------||-------------|-----------|------------
                  doc_b.xml  | sect-a  || Doc_B       | Doc_B.pdf | anch_00234


                  Note that a basic aspect of XML is that document/ID pairs provide unique
                  addresses for every XML element in the universe (where by "document" I
                  mean a specific physical storage object containing an invariant XML
                  document). That is, every element instance exists in exactly one
                  document and every document is, by definition, an addressable object.
                  Therefore every element can be addressed in terms of its location within a
                  specific document.
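The mapping table above can be held in memory keyed on exactly that (source document, element ID) pair. One source element may be rendered in more than one unit of publication, so in this sketch each key maps to a list of renditions. This is a Python sketch: the `Rendition` type and the `lookup` function are hypothetical names, and the single entry is the example row from the table.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

class Rendition(NamedTuple):
    unit_of_pub: str  # logical publication, e.g. "Doc_B"
    file: str         # rendered file, e.g. "Doc_B.pdf"
    anchor: str       # named destination inside that file

# Key: (source document, element ID), the element's unique address.
# Value: every rendition of that element, one per unit of publication.
anchor_map: Dict[Tuple[str, str], List[Rendition]] = {
    ("doc_b.xml", "sect-a"): [Rendition("Doc_B", "Doc_B.pdf", "anch_00234")],
}

def lookup(source_doc: str, elem_id: str, unit_of_pub: str) -> Optional[Rendition]:
    """Find how an element was rendered in one unit of publication, if at all."""
    for r in anchor_map.get((source_doc, elem_id), []):
        if r.unit_of_pub == unit_of_pub:
            return r
    return None

print(lookup("doc_b.xml", "sect-a", "Doc_B"))
# Rendition(unit_of_pub='Doc_B', file='Doc_B.pdf', anchor='anch_00234')
```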

                  Note also that "Doc_B.pdf" really has to be either an absolute location or a
                  relative path that is in a known relationship to all other PDFs that
                  might be rendered and might link to it. Otherwise, short of
                  server-side redirection of the URLs embedded in the PDFs, or rewriting of
                  those URLs whenever a PDF is moved (which is doable), there is
                  no way to create reliable PDF-to-PDF pointers. For this example we can
                  assume that all the PDFs are published in the same directory, just to
                  keep it simple.
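If the PDFs are not all published to one directory, a relative link still works as long as the publish locations of both PDFs are known when the FO is generated. Below is a Python sketch using the standard `posixpath.relpath`. The `pubs/...` locations in the usage lines are hypothetical.

```python
import posixpath

def pdf_href(from_pdf: str, to_pdf: str, anchor: str) -> str:
    """Relative URL from one published PDF to a named anchor in another.

    Both paths are publish locations relative to a common root, POSIX style.
    """
    start = posixpath.dirname(from_pdf) or "."  # relpath rejects an empty start
    return f"{posixpath.relpath(to_pdf, start)}#{anchor}"

print(pdf_href("pubs/Doc_A.pdf", "pubs/Doc_B.pdf", "anch_00234"))
# Doc_B.pdf#anch_00234
print(pdf_href("pubs/a/Doc_A.pdf", "pubs/b/Doc_B.pdf", "anch_00234"))
# ../b/Doc_B.pdf#anch_00234
```

The result stays valid only while the two PDFs keep the same relative positions. Moving one of them means regenerating or rewriting the link.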

                  Given the mapping shown above, we can implement our XSLT process so that
                  we can generate a PDF-specific external destination value at FO
                  generation time:

                  doc_a.fo:

                  <?xml version="1.0"?>
                  <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
                  ...
                  <fo:basic-link
                  external-destination="Doc_B.pdf#anch_00234">...</fo:basic-link>
                  ...
                  </fo:root>

                  Now the FO engine can pass the value of external-destination straight
                  through and it will work as long as our mapping table was correct.
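Eliot does this lookup in XSLT. The same step is sketched below in Python only to keep the example self-contained. It turns an authored `<link href=... xpointer=...>` into an `fo:basic-link` whose `external-destination` comes from the mapping table. The `ANCHORS` dictionary and the `to_basic_link` function are hypothetical. The sketch treats the `xpointer` value as a bare element ID, as in the link to sect-a.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialize with the usual "fo:" prefix

# Hypothetical mapping table: (source doc, element ID) -> (rendition, anchor name).
ANCHORS = {("doc_b.xml", "sect-a"): ("Doc_B.pdf", "anch_00234")}

def to_basic_link(link: ET.Element) -> ET.Element:
    """Turn an authored <link href="..." xpointer="..."> into an fo:basic-link.

    Raises KeyError if the target has no entry in the mapping table yet.
    """
    rendition, anchor = ANCHORS[(link.get("href"), link.get("xpointer"))]
    fo_link = ET.Element(f"{{{FO_NS}}}basic-link",
                         {"external-destination": f"{rendition}#{anchor}"})
    fo_link.text = link.text  # keep the link text as authored
    return fo_link

src = ET.fromstring('<link href="doc_b.xml" xpointer="sect-a">see Section A</link>')
print(ET.tostring(to_basic_link(src), encoding="unicode"))
```

In XSL-FO the `external-destination` property is formally a `<uri-specification>`, written `url(...)`. Some processors also accept a bare URI like the one above, but generating the `url(...)` form is the safer choice.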

                  Cheers,

                  Eliot
                  --
                  W. Eliot Kimber
                  Professional Services
                  Innodata Isogen
                  9390 Research Blvd, #410
                  Austin, TX 78759
                  (512) 372-8155

                  ekimber@...
                  www.innodata-isogen.com
                • Jirka Kosek
                  Message 8 of 13 , Sep 21, 2005
                    Eliot Kimber wrote:

                    > The challenge here is that the working link in the final PDF for Doc_A
                    > must point to a named anchor in the PDF rendering of Doc_B. How can we
                    > know what the name of the anchor for the PDF rendering of <sect> sect-a is?

                    It seems that we are moving away from the original topic, but still in
                    an interesting direction.

                    > I'm pretty sure no FO implementation does (2) but they may do (1) and
                    > (3) is always doable using the sort of technique Ken Holman uses for
                    > back-of-the-book index generation.

                    XEP uses the id value as the PDF anchor name. AFAIK other FO implementations do the
                    same thing.
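Jirka reports that XEP turns the FO `id` value into the PDF anchor name. If that holds, possibility (1) reduces to generating FO IDs deterministically from the source document/element ID pair. Below is a Python sketch of one such naming rule. The rule itself (escape to ASCII NCName characters, join with `__`) is a hypothetical choice. Check each engine's own documentation to confirm that it carries the ID through unchanged.

```python
import re

_NOT_NAME_CHAR = re.compile(r"[^A-Za-z0-9._-]")

def fo_id(source_doc: str, elem_id: str) -> str:
    """Deterministic FO id for a (source document, element ID) pair.

    The same pair always yields the same id, so if the FO engine uses FO ids
    as PDF anchor names, the anchor name is known before rendering.
    Restricted to ASCII NCName characters for simplicity.
    """
    name = _NOT_NAME_CHAR.sub("_", f"{source_doc}__{elem_id}")
    if not re.match(r"[A-Za-z_]", name):  # an NCName must not start with a digit, "." or "-"
        name = "_" + name
    return name

print(fo_id("doc_b.xml", "sect-a"))               # doc_b.xml__sect-a
print(fo_id("manuals/2005/doc_b.xml", "sect-a"))  # manuals_2005_doc_b.xml__sect-a
```

This rule is not collision-free: `a/b.xml` and `a_b.xml` escape to the same prefix. A real system would need a stricter encoding or a uniqueness check.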

                    > Given the mapping shown above, we can implement our XSLT process so that
                    > we can generate a PDF-specific external destination value at FO
                    > generation time:

                    If anyone is curious, for DocBook this has already been implemented:

                    http://sagehill.net/docbookxsl/Olinking.html

                    BTW: Eliot, have you asked RenderX or Antenna House about implementing
                    links from SVG to FO at least in the simplest case when SVG images are
                    placed as instream-foreign-objects and links are using just fragment
                    identifiers? It shouldn't be hard to implement. Is any FO engine vendor
                    listening here?
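In the simplest case Jirka describes, the engine only has to notice two things: an SVG `<a>` inside an `fo:instream-foreign-object` links to a bare fragment identifier, and that ID exists in the same compound document. Below is a Python sketch of that resolution check. It does not render anything or produce PDF annotations. It assumes SVG 1.1-style `xlink:href` attributes and plain `id` attributes, and it takes a deliberately incomplete FO fragment as input.

```python
import xml.etree.ElementTree as ET

FO = "http://www.w3.org/1999/XSL/Format"
SVG = "http://www.w3.org/2000/svg"
XLINK = "http://www.w3.org/1999/xlink"

def svg_fragment_links(fo_root: ET.Element):
    """Yield (fragment, resolved) for each SVG <a> inside an instream-foreign-object.

    A link counts as resolved if some element in the compound document
    (FO or instream SVG) carries that id.
    """
    ids = {el.get("id") for el in fo_root.iter() if el.get("id") is not None}
    for ifo in fo_root.iter(f"{{{FO}}}instream-foreign-object"):
        for a in ifo.iter(f"{{{SVG}}}a"):
            href = a.get(f"{{{XLINK}}}href", "")
            if href.startswith("#"):  # bare fragment identifier only
                yield href[1:], href[1:] in ids

doc = ET.fromstring(f"""<fo:root xmlns:fo="{FO}" xmlns:svg="{SVG}" xmlns:xlink="{XLINK}">
  <fo:block id="part-list">Parts list</fo:block>
  <fo:block>
    <fo:instream-foreign-object>
      <svg:svg width="100" height="100">
        <svg:a xlink:href="#part-list"><svg:rect width="10" height="10"/></svg:a>
        <svg:a xlink:href="#missing"><svg:circle r="5"/></svg:a>
      </svg:svg>
    </fo:instream-foreign-object>
  </fo:block>
</fo:root>""")
print(list(svg_fragment_links(doc)))  # [('part-list', True), ('missing', False)]
```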

                    --
                    ------------------------------------------------------------------
                    Jirka Kosek e-mail: jirka@... http://www.kosek.cz
                    ------------------------------------------------------------------
                    Professional training and consulting in XML technologies.
                    Take a look at our newly launched website http://DocBook.cz
                    Detailed overview of our training courses: http://xmlguru.cz/skoleni/
                    ------------------------------------------------------------------
                    Upcoming training dates: DocBook Dec 5-7 * XSL-FO Dec 19-20 *
                    XSLT Oct 17-20 * XML schemas (including RELAX NG) Nov 7-9
                    ------------------------------------------------------------------



                    [Non-text portions of this message have been removed]
                  • Eliot Kimber
                    Message 9 of 13 , Sep 21, 2005
                      Jirka Kosek wrote:
                      > Eliot Kimber wrote:
                      >
                      >
                      >>The challenge here is that the working link in the final PDF for Doc_A
                      >>must point to a named anchor in the PDF rendering of Doc_B. How can we
                      >>know what the name of the anchor for the PDF rendering of <sect> sect-a is?
                      >
                      >
                      > It seems that we are moving away from the original topic, but still in
                      > an interesting direction.

                      Only a little bit. I think the FO and SVG case is just a special case of
                      the more general problem of how to render documents in which links work,
                      both within a single unit of publication and among units of publication.

                      >>I'm pretty sure no FO implementation does (2) but they may do (1) and
                      >>(3) is always doable using the sort of technique Ken Holman uses for
                      >>back-of-the-book index generation.
                      >
                      >
                      > XEP uses the id value as the PDF anchor name. AFAIK other FO implementations do the
                      > same thing.

                      That makes it a little easier.

                      > BTW: Eliot, have you asked RenderX or Antenna House about implementing
                      > links from SVG to FO

                      I have.

                      Cheers,

                      E.
                    • Altsoft Xml2PDF
                      Message 10 of 13 , Sep 22, 2005
                        > BTW: Eliot, have you asked RenderX or Antenna House about implementing
                        > links from SVG to FO at least in the simplest case when SVG images are
                        > placed as instream-foreign-objects and links are using just fragment
                        > identifiers? It shouldn't be hard to implement. Is any FO engine
                        > vendor listening here?

                        Altsoft Xml2PDF allows linking between embedded SVGs and the XSL-FO
                        document, and vice versa. We treat all IDs in all documents (XSL-FO and
                        embedded SVG) as one joint set of IDs; for Xml2PDF it makes no difference
                        which document the destination is in. Thus, if the author knows the
                        desired ID in the parent or embedded document, he just uses it as an
                        internal destination. However, the problem of overlapping sets of
                        IDs still exists, but it can be easily solved by the author.
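With one joint ID set, resolving a link is a plain name lookup. However, an ID defined both in the FO and in an embedded SVG becomes ambiguous. Below is a Python sketch of a collision check an author might run before rendering. It only looks at instream SVG; external SVG files referenced from `fo:external-graphic` would have to be parsed and added separately. It is not a description of how Xml2PDF itself works.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List

FO = "http://www.w3.org/1999/XSL/Format"
SVG = "http://www.w3.org/2000/svg"

def duplicate_ids(fo_root: ET.Element) -> List[str]:
    """IDs defined more than once across the FO instance and its instream SVG."""
    counts = Counter(el.get("id") for el in fo_root.iter() if el.get("id") is not None)
    return sorted(i for i, n in counts.items() if n > 1)

doc = ET.fromstring(
    f'<fo:root xmlns:fo="{FO}" xmlns:svg="{SVG}">'
    '<fo:block id="fig1">Figure 1</fo:block>'
    '<fo:block><fo:instream-foreign-object>'
    '<svg:svg><svg:g id="fig1"/><svg:g id="valve"/></svg:svg>'
    '</fo:instream-foreign-object></fo:block>'
    '</fo:root>')
print(duplicate_ids(doc))  # ['fig1']
```

An author could resolve a reported collision by renaming one of the IDs, for example by giving every ID inside each graphic a per-graphic prefix.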

                        You can download a free evaluation version of Altsoft Xml2PDF at
                        http://alt-soft.com/products_xml2pdf_download.jsp


                        Best regards,
                        Victor Vishnyakov
                        Altsoft NV
                        http://alt-soft.com/
                      • Peter B. West
                        Message 11 of 13 , Sep 23, 2005
                          Eliot,

                          Sorry about the delay. Address problems.

                          What happens when doc_a and doc_b cross-reference?

                          Peter
                          --
                          Peter B. West <http://cv.pbw.id.au/>
                          Folio <http://defoe.sourceforge.net/folio/>


                        • Eliot Kimber
                          Message 12 of 13 , Sep 28, 2005
                            Peter B. West wrote:
                            >
                            > What happens when doc_a and doc_b cross-reference?
                            >

                            I'm not sure I understand exactly what your question is.

                            But I think what you're alluding to is the fact that Doc B has to be
                            rendered before Doc A if Doc A has a cross reference to Doc B and wants
                            to resolve it, and likewise Doc A has to be rendered before Doc B if Doc
                            B has a cross reference to Doc A and wants to resolve it.

                            This looks like deadlock and it is.

                            The resolution is to do two passes: Render Doc A, deferring generation of
                            links to Doc B. This populates the XML ID to rendered PDF name map for
                            Doc A. Then render Doc B, which can now resolve all its links to Doc A.
                            This populates the XML ID to rendered PDF name map for Doc B. Now
                            re-render Doc A, resolving the links to Doc B.

                            This is another way of saying that Doc A and Doc B have to be processed
                            together, either literally as a single processing operation, or in
                            stages as described above.

                            In a set of inter-linked compound documents you essentially have to
                            render each doc twice.
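The two-pass scheme can be sketched in Python. Here `render` is a hypothetical stand-in for the whole XML-to-FO-to-PDF step. It records the anchors its document provides in a shared map and returns the cross-references it could not yet resolve. Only documents that deferred links in the first pass are rendered again, which reproduces the order: render Doc A, render Doc B, re-render Doc A. Document names, IDs and anchor values are invented for illustration.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (unit of publication, element ID)

def render(doc: str, targets: List[str], xrefs: List[Key],
           anchor_map: Dict[Key, str], log: List[str]) -> List[Key]:
    """Hypothetical render step: publish this doc's anchors, defer unknown links."""
    log.append(doc)
    for elem_id in targets:
        anchor_map[(doc, elem_id)] = f"{doc}.pdf#{elem_id}"
    return [x for x in xrefs if x not in anchor_map]

def render_all(docs: Dict[str, Tuple[List[str], List[Key]]]):
    """Render every doc once, then re-render the ones that deferred links."""
    anchor_map: Dict[Key, str] = {}
    log: List[str] = []
    deferred: Dict[str, List[Key]] = {}
    for name, (targets, xrefs) in docs.items():      # pass 1
        deferred[name] = render(name, targets, xrefs, anchor_map, log)
    for name, (targets, xrefs) in docs.items():      # pass 2
        if deferred[name]:
            deferred[name] = render(name, targets, xrefs, anchor_map, log)
    return deferred, log

docs = {
    "Doc_A": (["sect-1"], [("Doc_B", "sect-a")]),  # Doc A links to Doc B
    "Doc_B": (["sect-a"], [("Doc_A", "sect-1")]),  # Doc B links to Doc A
}
deferred, log = render_all(docs)
print(log)       # ['Doc_A', 'Doc_B', 'Doc_A']
print(deferred)  # {'Doc_A': [], 'Doc_B': []}
```

If no document renders a link's target, the link stays in the deferred list after the second pass. That gives a natural report of broken cross-references.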

                            Cheers,

                            Eliot

                          • Peter B. West
                            Message 13 of 13 , Sep 28, 2005
                              Eliot Kimber wrote:
                              > Peter B. West wrote:
                              >
                              >>What happens when doc_a and doc_b cross-reference?
                              >>
                              >
                              >
                              > I'm not sure I understand exactly what your question is.
                              >
                              > But I think what you're alluding to is the fact that Doc B has to be
                              > rendered before Doc A if Doc A has a cross reference to Doc B and wants
                              > to resolve it and likewise Doc A has to be rendered before Doc B if Doc
                              > B has a cross reference to Doc A and wants to resolve it.
                              >

                              That's what I was asking.

                              > This looks like deadlock and it is.
                              >

                              Peter