
Re: [XSL-FO] XSL-FO and foreign language.

  • W. Eliot Kimber
    Message 1 of 3 , Jan 12, 2005
      LuzErez wrote:

      > Hi all,
      > I have a general question regarding XSL-FO. The idea that the formatting
      > direction is set in the tags (unicode-bidi="embed") is not clear to me. I am
      > building a system that exports XSL-FO, or I use an XML stream as a data
      > source. How do I know whether my user typed Hebrew or Arabic into one of the
      > application's text boxes? Maybe my external XML has Arabic in it? Should I
      > use unicode-bidi in every text element?

      Definitely you should not use unicode-bidi in every text element.

      The first key is that Unicode has built-in directionality information
      such that a Unicode-aware, directionality-aware processor should be able
      to detect the inherent directionality of any sequence of characters.
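
As a rough illustration of that built-in directionality information, here is a minimal Python sketch that checks the Unicode bidi class of each character using the standard `unicodedata` module. The helper names `contains_rtl` and `first_strong_direction` are made up for this example, and treating only the classes "R" and "AL" as right-to-left is a simplifying assumption, not a full implementation of the Unicode bidi algorithm.

```python
import unicodedata

# Bidi classes that Unicode assigns to strongly right-to-left characters:
# "R" (for example Hebrew letters) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}


def contains_rtl(text):
    """Return True if any character has a strong right-to-left bidi class."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in text)


def first_strong_direction(text):
    """Return "rtl", "ltr", or "neutral" based on the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in RTL_CLASSES:
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    # Only digits, punctuation, whitespace, or nothing at all.
    return "neutral"
```

A check like this could be run on text-box input or on text nodes from an external XML source to find out whether any right-to-left content is present at all.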

      The second key is that Unicode defines a sophisticated "bidirectionality
      algorithm" by which applications can determine how to correctly render
      text that mixes left-to-right and right-to-left characters. Most of the
      time this algorithm produces correct results, assuming it's correctly
      implemented (which is a big if).

      With respect to getting the correct layout in FO, there are three
      possible cases with respect to directionality:

      1. A document consists of content that is all of one directionality
      (e.g., all Latin script, all Arabic script, all ideographic characters).

      2. A document consists of mostly left-to-right script with some
      right-to-left script, such as an English document with quoted Arabic words.

      3. A document consists of mostly right-to-left script with some
      left-to-right script, such as a Hebrew document with some English words.

      In the first case all that should be required is to set the appropriate
      writing-mode where necessary in order to get the correct layout. The
      Unicode directionality information will be sufficient for the processor
      to compose the character content correctly.
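
For the single-direction case, a sketch along these lines could choose the writing-mode when generating FO from Python. It assumes the `fo:` prefix is bound to the XSL-FO namespace by the enclosing document, uses `fo:block-container` as the element that carries writing-mode, and falls back to left-to-right when no strong character is found. The function names are hypothetical.

```python
import unicodedata
from xml.sax.saxutils import escape


def writing_mode_for(text):
    """Pick an XSL-FO writing-mode value from the first strong character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rl-tb"
        if bidi_class == "L":
            return "lr-tb"
    # Assumption: default to left-to-right when there is no strong character.
    return "lr-tb"


def block_container(text):
    """Wrap text in an fo:block-container carrying the chosen writing-mode."""
    return (
        '<fo:block-container writing-mode="%s">' % writing_mode_for(text)
        + "<fo:block>%s</fo:block>" % escape(text)
        + "</fo:block-container>"
    )
```

In practice the writing-mode would more often be set once, on the page sequence or page master, rather than per block; the per-block form here only keeps the example short.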

      In the second and third cases, most of the time you need do nothing
      special because, again, the Unicode directionality and the Unicode
      bidirectional algorithm will produce the correct result (at least in
      those FO processors that implement the Unicode bidi algorithm).

      The only place where bidi-override should be necessary is where you need
      to get a result that is different from the result that would be produced
      by the application of the Unicode bidi algorithm or different from what
      your renderer produces (for whatever reason).

      The biggest problems I've run into are when right-to-left text is mixed
      with both arabic digits and punctuation or bracketing characters--in
      these cases the default result is almost never right, either because of
      the way the bidi algorithm works or because of bugs in a particular
      renderer.
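
One reason these cases are fragile is that digits and most punctuation have no strong direction of their own: Unicode classes digits as weak ("EN" or "AN") and brackets as neutral ("ON"), so their placement is resolved from the surrounding text. The sketch below, using Python's standard `unicodedata` module, lists the bidi class of each character and flags strings that combine right-to-left letters with digits or punctuation. The name `needs_review` and the exact set of classes it checks are assumptions for illustration only.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}
DIGIT_CLASSES = {"EN", "AN"}              # European and Arabic-Indic digits
PUNCT_CLASSES = {"ON", "CS", "ES", "ET"}  # brackets, separators, terminators


def bidi_classes(text):
    """Pair each character with its Unicode bidi class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def needs_review(text):
    """Flag text that mixes right-to-left letters with digits or punctuation."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    has_rtl = bool(classes & RTL_CLASSES)
    has_weak_or_neutral = bool(classes & (DIGIT_CLASSES | PUNCT_CLASSES))
    return has_rtl and has_weak_or_neutral
```

Strings flagged this way are good candidates for the test documents you render and inspect by eye.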

      Also, the fo:bidi-override element is functionally equivalent to the
      Unicode directionality control characters, \u202a through \u202e, and in
      fact some, if not all, implementations simply translate fo:bidi-override
      into the equivalent control characters. Therefore it may be easier to
      just use these control characters where you need them, especially if the
      directionality control needs to span FO boundaries.
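
To make the control-character approach concrete, here is a small Python sketch that wraps a string in the Unicode embedding or override characters and closes it with POP DIRECTIONAL FORMATTING. The code points are the standard ones (U+202A through U+202E); the helper names `embed` and `override` are invented for this example.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (closes an embedding or override)
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE

_EMBED = {"ltr": LRE, "rtl": RLE}
_OVERRIDE = {"ltr": LRO, "rtl": RLO}


def embed(text, direction):
    """Wrap text in a directional embedding; direction is "ltr" or "rtl"."""
    return _EMBED[direction] + text + PDF


def override(text, direction):
    """Force every character in text to the given direction."""
    return _OVERRIDE[direction] + text + PDF
```

For example, `embed("(123) \u05e9\u05dc\u05d5\u05dd", "rtl")` marks the whole phrase, brackets and digits included, as a right-to-left run. In the FO source itself the same characters can be written as numeric character references such as `&#x202B;` and `&#x202C;`.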

      The best thing to do is create some test FO instances and see what you
      get--that will tell you whether you need to do more work to get the
      correct result in the output.
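
A test instance of this kind can be generated with a few lines of Python. The sketch below builds a minimal FO document containing one labelled block per case using the standard `xml.etree.ElementTree` module. The sample strings, page dimensions, and the name `build_test_document` are arbitrary choices for illustration, and your renderer may also need a font-family that covers Hebrew or Arabic.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name for an FO element."""
    return "{%s}%s" % (FO_NS, tag)


# Illustrative samples only; replace them with text from your own data.
SAMPLES = [
    ("All left-to-right", "Hello world"),
    ("All right-to-left", "\u05e9\u05dc\u05d5\u05dd \u05e2\u05d5\u05dc\u05dd"),
    ("Mostly LTR with RTL", "The word \u05e9\u05dc\u05d5\u05dd means peace."),
    ("Mostly RTL with LTR", "\u05e9\u05dc\u05d5\u05dd Hello \u05e2\u05d5\u05dc\u05dd"),
    ("RTL with digits and brackets", "\u05e9\u05dc\u05d5\u05dd (123) \u05e2\u05d5\u05dc\u05dd"),
]


def build_test_document(samples=SAMPLES):
    """Return a small FO document, as a string, with one block per sample."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page",
        "page-height": "297mm",
        "page-width": "210mm",
        "margin": "20mm",
    })
    ET.SubElement(page, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for label, text in samples:
        ET.SubElement(flow, fo("block"), {"font-weight": "bold"}).text = label
        ET.SubElement(flow, fo("block"), {"space-after": "6pt"}).text = text
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to a file and running it through each renderer you care about gives a quick side-by-side view of which cases come out right without any extra markup.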

      Cheers,

      Eliot
      --
      W. Eliot Kimber
      Professional Services
      Innodata Isogen
      9390 Research Blvd, #410
      Austin, TX 78759
      (512) 372-8122

      ekimber@...
      www.innodata-isogen.com
    • luzerez
      Message 2 of 3 , Jan 12, 2005
        Thank you for the detailed answer. In my tests (with several
        renderers), without specifying the correct tags, the mixed
        Hebrew-English text comes out wrong ;-( . Maybe in the future this
        step will not be needed.

