
Data localization

  • masaru20100
    Message 1 of 17 , Dec 8, 2011
      Hi,

      What can I do to move forward the localization of PCGen data?

      I have thought about it a bit, and I think any solution would need to meet these requirements:
      – The localized data should be the same data as the English set, so that data corrections do not have to be duplicated.
      – Data that has not been translated into a language (for example, books not yet translated) should stay in the language in which it was first published.
      – The interface language and the data language should be separate settings, so that people whose books are in a different language than their own can use the program more easily.
      – It should be possible to write data in another language, so that non-English speakers can develop custom content in their own language. That seems to conflict with separating the language from the data; one way around it is to provide an easy reference for creating custom content.
      – Lists should be sorted with a collator.
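
      The collator point can be illustrated with a small sketch. PCGen itself is Java, where java.text.Collator would be the natural tool; the Python below is only a rough stand-in that strips accents and case before comparing, to show why a plain code-point sort is not good enough for translated names. The function name collation_key and the sample names are made up for illustration.

```python
import unicodedata


def collation_key(name):
    """Approximate a locale-aware sort key (not a real collator).

    Accents and case are ignored for the primary comparison; the original
    string is kept as a tie-breaker so the ordering stays deterministic.
    """
    decomposed = unicodedata.normalize("NFD", name)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), name)


names = ["Zauber", "Äther", "Axt", "élan", "Elf"]

# Plain sort compares code points, so accented names end up after "Z".
print(sorted(names))

# Sorting with the approximate key interleaves accented names as a reader expects.
print(sorted(names, key=collation_key))
```

      A real collator also handles language-specific rules (for example, how a given language orders accented letters relative to each other), which this sketch does not attempt.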
      • Martijn Verburg
        Message 3 of 17 , Dec 9, 2011
          Hi there,

          It would be great if you could throw some of this up on
          http://wiki.pcgen.org/Internationalization

          We're still quite some way from being able to start our i18n work, but it's
          good to get the requirements down.

          K

        • Andrew
          Message 4 of 17 , Dec 9, 2011
            @Kar - The wiki is restricted access; I'd need to get his desired user name and then send him a password.



            --
            Andrew Maitland (LegacyKing)
            Admin Silverback - PCGen Board of Directors
            Data 2nd, Docs Tamarin, OS Lemur
            Unique Title "Quick-Silverback Tracker Monkey"
            Unique Title "The Torturer of PCGen"


          • thpr
            Message 5 of 17 , Dec 9, 2011
              So looking back at the earlier thread, I never actually came back with my suggestion, so here is my perspective on localization and how it should be done. This has been made easier by some changes in the 6.x line, and the new UI (with what I believe to be a lot more isolation of the code that displays items) is a huge boost toward making this practical.

              First a few base facts as background:
              (1) For almost every item (Class, Skill, etc.) there is a unique identifier, generally referred to as a Key. Note that if the "KEY" token is not used, then the Key is the name (the first entry on the line in the data). There is an exception to this, which we'll call problem #1.
              (2) There are basically 4 types of things that need translation:
              (2a) item names
              (2b) constants (like spell schools)
              (2c) variables (like meters/feet/etc.)
              (2d) Strings (like descriptions)
              [if anyone can think of more, let me know]
              (3) Most (but not all) tokens are "unique" or otherwise "addressable" in that they can only occur once per object. (Anyone notice that the tokens have started to be specific in the test code & docs about whether they overwrite, add, etc.? This is one reason why, and yes, I've been slowly trying to make progress on this even in the 2007-2009 work.) [The exception to this is problem #2.]

              A few principles about l10n:
              (1) We must not make producing a data set materially more complicated than it is today (no requirement to put %L10NNAME% type gunk into data)
              (2) We should target an ability to tell us if l10n is complete for any given data set

              With that:
              Following from fact #1, almost everything we have in the data today is "addressable". By that, I mean that the OUTPUTNAME for a Skill called "FooBar" can be uniquely called out. The name hierarchy is something like: SKILL//FooBar//OUTPUTNAME (exceptions are problems #1 and #2, to be addressed later).

              Given that, we can actually set principle #1 to "The data remains unchanged" (again, except for problem #2). The entire data set is produced (assume US English for a moment). We then have a separate l10n file that has things like:
              SKILL:FooBar|Oobarf-ay
              SKILL:FooBar:OUTPUTNAME|ooBarOutF-ay
              etc.

              This (to the first order) covers 2a, 2d.
              For 2b items, we simply have to expand the list of items (SKILL, SPELL) that we are familiar with, so we get things like:
              SPELLSCHOOL:Divination|Ivinationd-ay
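
              As a rough illustration of how such a file could be consumed, here is a minimal Python sketch. It assumes the line shape shown above (TYPE:Key|text for a name, TYPE:Key:TOKEN|text for a token value); the function names parse_l10n and translate, the handling of blank lines and "#" comments, and the fallback to the original text are all assumptions for illustration, not an agreed format. Keys that themselves contain ":" or "|" are not handled.

```python
def parse_l10n(lines):
    """Parse l10n lines into a {(type, key, token): text} lookup table.

    token is None when the line translates the item name itself.
    """
    table = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        address, sep, text = line.partition("|")
        if not sep:
            raise ValueError("missing '|' in l10n line: " + line)
        parts = address.split(":")
        if len(parts) == 2:
            kind, key = parts
            token = None
        elif len(parts) == 3:
            kind, key, token = parts
        else:
            raise ValueError("unexpected address in l10n line: " + line)
        table[(kind, key, token)] = text
    return table


def translate(table, kind, key, original, token=None):
    """Return the translated text, or the original when none exists."""
    return table.get((kind, key, token), original)


table = parse_l10n([
    "SKILL:FooBar|Oobarf-ay",
    "SKILL:FooBar:OUTPUTNAME|ooBarOutF-ay",
    "SPELLSCHOOL:Divination|Ivinationd-ay",
])
print(translate(table, "SKILL", "FooBar", "FooBar"))
print(translate(table, "SKILL", "Climb", "Climb"))  # untranslated: stays as-is
```

              Falling back to the original text means untranslated items simply stay in the language of the base data set, and the data files themselves never need to change.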

              2c is a bit more complicated, but I can't believe it's all that bad given it's a thing many applications already do (and it's a known thing)

              Each file could be named in such a way that it identifies its l10n, e.g.:
              srd_de_ch.l10n
              ...and probably placed in an L10N subfolder of the initial dataset.
              (Note: I'd recommend we be clear on where these go, AND ALSO require that NO PCC FILES are recognized if they are in the L10N folder. I'd recommend we say no PCC or LST, but the formal limit would be no PCC. That way the initial directory parse [which is probably one of the slower parts of our boot process] can immediately ignore the l10n directory and not have to look through its file list for .PCC files. Alternatively, we could have an l10n folder that is parallel to the data folder, but that adds complication: it needs new preferences to point at multiple l10n folders, and it requires a more complicated structure within the l10n folder to identify WHICH files map to which datasets. We can't simply say a given English word will always translate the same way; English is way too overloaded for that.)
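
              A minimal sketch of both ideas, under stated assumptions: the file name always ends in _language_country.l10n (as in srd_de_ch.l10n), and the subfolder is literally named L10N. The function names parse_l10n_filename and find_pcc_files are hypothetical and are not part of PCGen.

```python
import os
from pathlib import PurePath


def parse_l10n_filename(filename):
    """Split e.g. 'srd_de_ch.l10n' into (dataset, language, country).

    Assumes the last two underscore-separated parts of the stem are the
    language and country codes; everything before them names the dataset.
    """
    path = PurePath(filename)
    if path.suffix.lower() != ".l10n":
        raise ValueError("not an l10n file: " + str(filename))
    dataset, language, country = path.stem.rsplit("_", 2)
    return dataset, language.lower(), country.upper()


def find_pcc_files(data_root):
    """Collect .pcc files under data_root without descending into L10N folders."""
    found = []
    for folder, subdirs, files in os.walk(data_root):
        # Pruning in place stops os.walk from ever listing the L10N directory.
        subdirs[:] = [d for d in subdirs if d.upper() != "L10N"]
        found.extend(
            os.path.join(folder, name)
            for name in files
            if name.lower().endswith(".pcc")
        )
    return sorted(found)


print(parse_l10n_filename("srd_de_ch.l10n"))
```

              Pruning the directory during the walk is what makes the "no PCC files in the L10N folder" rule pay off: the scan never has to read that folder's file list at all.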

              I would recommend we keep things in a small subset of files, and not try to do 1:1 for each data file (that would produce a lot of file sprawl) - but that's not my call.

              Addressing principle #2:
              Since we have a set of items we know would need to be covered (names, certain tokens), we should be able to load those into memory, load the l10n file into memory, and compare. This should be able to produce warnings of 2 kinds:
              (W1) Cases where the base data set contains things that are not translated
              (W2) Cases where the translation file attempts to translate things not in the base data set
              I can't imagine that utility is all that hard to write (we would just need to make the list of todos)
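
              The comparison itself comes down to two set differences. The sketch below assumes each translatable thing has already been reduced to an address string such as "SKILL:FooBar:OUTPUTNAME"; the function name check_l10n and the sample addresses are made up for illustration.

```python
def check_l10n(base_addresses, translated_addresses):
    """Compare translatable addresses against those the l10n file covers.

    Returns (untranslated, orphaned):
      untranslated -> W1: present in the base data set, missing from the l10n file
      orphaned     -> W2: present in the l10n file, missing from the base data set
    """
    base = set(base_addresses)
    translated = set(translated_addresses)
    untranslated = sorted(base - translated)
    orphaned = sorted(translated - base)
    return untranslated, orphaned


base = ["SKILL:FooBar", "SKILL:FooBar:OUTPUTNAME", "SPELLSCHOOL:Divination"]
l10n = ["SKILL:FooBar", "SPELLSCHOOL:Divination", "SKILL:Removed"]

untranslated, orphaned = check_l10n(base, l10n)
for address in untranslated:
    print("W1 not translated:", address)
for address in orphaned:
    print("W2 not in base data:", address)
```

              An l10n file is complete for a data set exactly when the first list is empty; the second list catches stale entries left behind after the base data changes.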

              This brings us to problems #1 and #2:

              Problem #1: Non-unique names
              We glossed over this in 6.x, but the truth is that Spell names are not unique. Some of the *RDs have duplicate names (not all, but I forget which). Same is true for languages.
              (1a) Spells can theoretically be differentiated by evaluating the "TYPE" token for Divine, Arcane, or Psionic (those are "magical" items in our code).
              (1b) Languages can theoretically be differentiated between "Spoken" and "Written" as those are "magical" types.
              I believe both of those forms of magic are on our backlog of FREQs to clean up... and the reason they are on the cleanup list is really L10N (as much as it is just to clean up the overuse of TYPE).

              Problem #2: Non-unique tokens
              There are only a few tokens that are not unique. DESC is one of them, if I recall correctly. The probable solution here is to simply give an identifier to each token. This might require a change to LST, something like:
              DESC*Overall:x
              DESC*Second:x
              (Note: I'm not sure this syntax works or is by any means "good". DESC:OVERALL|x might be better as it avoids potential issues with using * as a reserved character. Take this as the principle of what would have to happen to the DESC token, not as a full-blown proposal.)

              Note that this naming of each DESC item (and other reusable items), while it breaks the "can't change the data" principle, actually helps as much as it hurts... it would give us the ability to do things like:
              DESC:.CLEARID.Overall
              or things similar to that, which is actually a neat benefit for the small overhead of pain it puts into datasets. (Which, by the way, could be converted to whatever we decide on anyway with our nifty converter, so this doesn't seem all that bad)
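
              To make the idea concrete, here is a small sketch of how identified DESC values could behave. It assumes the DESC:OVERALL|x form floated above plus the .CLEARID.<id> form, and it treats identifiers as case-insensitive; none of this is existing LST syntax, and the function name apply_desc_tokens is hypothetical.

```python
def apply_desc_tokens(values):
    """Apply DESC token values in order and return {identifier: text}.

    "ID|text"        adds or overwrites the description with that identifier.
    ".CLEARID.<id>"  removes only the description with that identifier.
    """
    clear_prefix = ".CLEARID."
    descriptions = {}
    for value in values:
        if value.startswith(clear_prefix):
            # Clearing an identifier that was never set is silently ignored.
            descriptions.pop(value[len(clear_prefix):].upper(), None)
            continue
        ident, sep, text = value.partition("|")
        if not sep:
            raise ValueError("DESC value needs an identifier: " + value)
        descriptions[ident.upper()] = text
    return descriptions


result = apply_desc_tokens([
    "OVERALL|A sturdy blade.",
    "SECOND|Glows faintly near orcs.",
    ".CLEARID.Overall",
])
print(result)
```

              With identifiers in place, each description is addressable for translation (for example as DESC plus its identifier), and a later data file can clear one section without wiping the rest.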


              So in my mind, the question really is: Is going slightly more than half way good enough? There are areas where we could do translation, and some areas where we need some pretty material core code changes to support it.

            • Andrew
              Message 6 of 17 , Dec 9, 2011
                Hi Tom,

                Welcome BACK!!!

                I don't mind the solutions put forth. Unique identifiers for DESC are no more absurd than the unique
                identifiers we'll need in order to use more than one CHOOSER on a single line. Plus, the ability to
                clear off a section of DESC would be awesome.

                I'd vote for as complete a job as possible.

                Here is where I'm not understanding - where are we going to put the translation stuff?

              • Martijn Verburg
                Message 7 of 17 , Dec 9, 2011
                  Hey Tom,

                  On 9 December 2011 19:03, thpr <thpr@...> wrote:

                  > So looking back at the earlier thread, I never actually came back with my
                  > suggestion, so here is my perspective on localization and how it should be
                  > done. This has been made easier by some changes we have in the 6.x line,
                  > and having new UI (with what I believe to be a lot more isolation of code
                  > that does display of items) is a huge boost here to making this practical.
                  >

                  There is still some way to go with the separation of UI and core, but it has
                  improved, and I think i18n will force it a little further. It also allows
                  us to swap UIs, say to something like JavaFX going forward, or even to have
                  the PCGen core running as a service somewhere with a DHTML front end. Lots of
                  interesting possibilities.

                  > First a few base facts as background:
                  > (1) For almost every item (Class, Skill, etc.) there is a unique
                  > identifier (generally referred to as a Key - note that if the "KEY" token
                  > is not used, then the Key is the name (first entry on the line in the data)
                  > (There is an exception to this we'll call problem #1)
                  > (2) There are basically 4 types of things that need translation:
                  > (2a) item names
                  > (2b) constants (like spell schools)
                  > (2c) variables (like meters/feet/etc.)
                  > (2d) Strings (like descriptions)
                  > [if anyone can think of more, let me know]
                  >
                  Numbers, if we need to support a non-Arabic numbering system (e.g. Kanji)

                  > (3) Most (but not all) tokens are "unique" or otherwise "addressable" in
                  > that they can only occur once per object. (Anyone notice that the tokens
                  > have started to be specific in the test code & docs about whether they
                  > overwrite, add, etc. - this is one reason why - and yes, I've been slowly
                  > trying to make progress on this even in the 2007-2009 work) [The exception
                  > to this is problem #2)
                  >
                  Yes and it's a good thing.

                  > A few principles about l10n:
                  > (1) We must not make producing a data set materially more complicated than
                  > it is today (no requirement to put %L10NNAME% type gunk into data)
                  >
                  Agreed

                  > (2) We should target an ability to tell us if l10n is complete for any
                  > given data set
                  >
                  Not sure what you mean by this?

                  > With that:
                  > Following from #1, almost everything we have in the data today is
                  > "addressable". By that, I mean that the OUTPUTNAME for a Skill called
                  > "FooBar" can be uniquely called out. The name hierarchy is something like:
                  > SKILL//Foo//OUTPUTNAME (exceptions are problems #1,2 to be addressed later)
                  >
                  > Given that, we can actually set principle #1 to "The data remains
                  > unchanged" (again, except for problem #2). The entire data set is produced
                  > (assume US English for a moment). We then have a unique file for l10n that
                  > has things like:
                  > SKILL:FooBar|Oobarf-ay
                  > SKILL:FooBar:OUTPUTNAME|ooBarOutF-ay
                  > etc.
                  >
                  > This (to the first order) covers 2a, 2d.
                  > For 2b items, we simply have to expand the list of items (SKILL, SPELL)
                  > that we are familiar with, so we get things like:
                  > SPELLSCHOOL:Divination|Ivinationd-ay
                  >
                  > 2c is a bit more complicated, but I can't believe it's all that bad given
                  > it's a thing many applications already do (and it's a known thing)
                  >
                  > Each file could be named in such a way that it identifies its l10n, e.g.:
                  > srd_de_ch.l10n
                  > ...and probably placed in a L10N subfolder of the initial dataset.
                  > (Note I'd recommend we be clear on where these go, AND ALSO require that
                  > NO PCC FILES (I'd recommend we say no PCC or LST, but the formal limit
                  > would be no PCC) are recognized if they are in the L10N folder... so that
                  > the initial directory parse [which is probably one of the slower parts of
                  > our boot process] can immediately ignore the l10n directory and not have to
                  > look through the file list looking for .PCC files... alternately, we have a
                  > l10n folder that is parallel to the data folder, but that then adds
                  > complication in needing new preferences to point at multiple l10n folders
                  > and requires more complicated structure within that l10n structure to
                  > identify WHICH files in that folder map to which datasets... because we
                  > can't simply say a given English word will always translate the same way -
                  > English is way too overloaded for that.)
                  >
                  Sure, that's all performance related stuff to take into account.

                  > I would recommend we keep things in a small subset of files, and not try
                  > to do 1:1 for each data file (that would produce a lot of file sprawl) -
                  > but that's not my call.
                  >
                  Up to the data folks I suppose, but if there's performance impact then we
                  should look at it again.

                  > Addressing principle #2:
                  > Since we have a set of items we know would need to be covered (names,
                  > certain tokens), we should be able to load those into memory, and load the
                  > l10n file into memory and compare. This should be able to produce warnings
                  > of 2 kinds:
                  > (W1) Errors where the base data set contains things that are not translated
                  > (W2) Errors where the translation file attempts to translate things not in
                  > the base data set
                  > I can't imagine that utility is all that hard to write (just would need to
                  > make the list of todos)
                  >
                  Sounds good. Java's regex parser however isn't always that great (it
                  improved with some Perl guru's work in Java 7, but still)

                  > This brings us to problems #1 and #2:
                  >
                  > Problem #1: Non-unique names
                  > We glossed over this in 6.x, but the truth is that Spell names are not
                  > unique. Some of the *RDs have duplicate names (not all, but I forget
                  > which). Same is true for languages.
                  > (1a) Spells can theoretically be differentiated by evaluating the "TYPE"
                  > token for Divine, Arcane, or Psionic (those are "magical" items in our code).
                  > (1b) Languages can theoretically be differentiated between "Spoken" and
                  > "Written" as those are "magical" types.
                  > I believe both of those forms of magic are things on our backlog of FREQs
                  > to clean up... and the reason they are on the cleanup list really for L10N
                  > (as much as it is to just cleanup the overuse of TYPE)
                  >
                  Sounds fixable, we can bump that up the roadmap - Devon is back as another
                  coding resource as well which helps.

                  > Problem #2: Non-unique tokens
                  > There are only a few tokens that are not unique. DESC is one of them, if I
                  > recall correctly. The probable solution here is to simply give an
                  > identifier to each token. This might require a change to LST. Something like:
                  > DESC*Overall:x
                  > DESC*Second:x
                  > (Note: I'm not sure this syntax works or is by any means "good".
                  > DESC:OVERALL|x might be better as it avoids potential issues with using *
                  > as a reserved character - take this as a principle of what would have to
                  > happen to DESC token, not as a full-blown proposal)
                  >
                  > Note that this naming of each DESC item (And other reusable items), while
                  > it breaks the "can't change the data" principle actually helps as much as
                  > it hurts... it would give us the ability to do things like:
                  > DESC:.CLEARID.Overall
                  > or things similar to that, which is actually a neat benefit for the small
                  > overhead of pain it puts into datasets. (Which, by the way, could be
                  > converted to whatever we decide on anyway with our nifty converter, so this
                  > doesn't seem all that bad)
                  >
                  > So in my mind, the question really is: Is going slightly more than half
                  > way good enough? There are areas where we could do translation, and some
                  > areas where we need some pretty material core code changes to support it.
                  >
                  I say we go the full way (or as full as possible). It could/should be the
                  one of the major projects for 6.2. One of my overriding goals for PCGen
                  is to get it into as many gamer machines as possible and having
                  translations does that.

                  Oh and welcome back Tom, the community has missed you :-)

                  K





                • Andrew
                  Message 8 of 17 , Dec 11, 2011
                    Hi,

                    I've added Tom's and masaru20100's notes here:

                    http://wiki.pcgen.org/Internationalization


                    --
                    Andrew Maitland (LegacyKing)
                    Admin Silverback - PCGen Board of Directors
                    Data 2nd, Docs Tamarin, OS Lemur
                    Unique Title "Quick-Silverback Tracker Monkey"
                    Unique Title "The Torturer of PCGen"


                  • masaru20100
                    Message 9 of 17 , Dec 13, 2011
                      --- In pcgen_international@yahoogroups.com, Martijn Verburg <martijnverburg@...> wrote:
                      > Numbers if we need to support a non Arabic numbering system (e.g. Kanji)

                      In Japanese, the classic number system is almost always used, especially in RPGs. In fact, translating numbers is not needed; what is needed is formatting numbers.
                      That means that 10000 gets formatted 10,000 in English, 10 000 in French. In Japanese, it would either be 10,000 or 1万.
                      In Java, this is usually done by using the NumberFormat class.
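A minimal sketch of the NumberFormat approach mentioned above. The class and method names (NumberFormatSketch, format) are made up for illustration and are not part of PCGen. Note that the French grouping separator is a non-breaking space whose exact code point depends on the JDK's locale data, and that the stock formatter gives 10,000 for Japanese; the 1万 style is not covered here.

```java
import java.text.NumberFormat;
import java.util.Locale;

public class NumberFormatSketch {

    /** Formats a whole number using the grouping rules of the given locale. */
    static String format(long value, Locale locale) {
        // getIntegerInstance returns a formatter that prints no fraction digits
        return NumberFormat.getIntegerInstance(locale).format(value);
    }

    public static void main(String[] args) {
        long gold = 10000;
        // English groups with a comma
        System.out.println(format(gold, Locale.US));
        // French groups with a (non-breaking) space
        System.out.println(format(gold, Locale.FRANCE));
        // Japanese groups with a comma as well
        System.out.println(format(gold, Locale.JAPAN));
    }
}
```

The point is that the stored value stays 10000 everywhere and only the display differs per locale.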

                      --
                      masaru20100
                    • masaru20100
                      Message 10 of 17 , Dec 13, 2011
                        > > (2) We should target an ability to tell us if l10n is complete for any
                        > > given data set
                        > >
                        > Not sure what you mean by this?

                        I’m only guessing here, but these are my thoughts. It is either to tell the user what is translated and what is not, or to let maintainers know what still needs translation.
                        It can also be both.
                      • masaru20100
                        Message 11 of 17 , Dec 13, 2011
                          --- In pcgen_international@yahoogroups.com, Andrew <drew0500@...> wrote:
                          >
                          > Hi,
                          >
                          > I've added Tom's and masaru20100's notes here:
                          >
                          > http://wiki.pcgen.org/Internationalization
                          >

                          I’ve updated the page with new thoughts, tried to clarify some points, and added others.

                          I’ve also tried to understand the PCGen-specific points that Tom addresses, but I’m a bit lost, because I have never gone into the details of how PCGen works.

                          If the file srd_de_ch.l10n contains “SKILL:FooBar|Oobarf-ay”, does that mean the skill FooBar should be displayed as Oobarf-ay in de_ch (Swiss German)?
                          Is there a reason not to use ResourceBundle? Or another already existing system, like the one used by GNU’s gettext (a Java library exists)?
                          The reason I mention those formats is that there are tools (meant for translators) that handle either format and ease translation work. With another format, those tools become useless.
                          I’m trying to update the existing PCGen ResourceBundle: I used JavaPM to produce an XLIFF file from the bundles and then edited it with Virtaal.

                          I thought that a properties file would be used, so the file srd_de_ch.properties would contain these lines:
                          SKILL:FooBar=Oobarf-ay
                          SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
                          The last line is the DESC translation of the Foobar skill.

                          I realise there could be a problem because of the : in the key (though escaping it as \: might make it disappear), but other than that, what would be the problem?
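To illustrate the colon question, here is a small sketch using the standard java.util.Properties loader. The class name PropertiesKeySketch and the helper parse are hypothetical. In the properties format an unescaped : (like =) ends the key, so the colon has to be written as \: in the file for the whole SKILL:FooBar string to become the key.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class PropertiesKeySketch {

    /** Parses properties-format text and returns the loaded table. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from a String should not fail, but load() declares IOException
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In Java source, "\\:" is the two characters \: as they would appear in the file
        String escaped = "SKILL\\:FooBar=Oobarf-ay\n"
                       + "SKILL\\:FooBar.DESC=Oobarf-ay allows you to blabla\n";
        Properties ok = parse(escaped);
        System.out.println(ok.getProperty("SKILL:FooBar"));
        System.out.println(ok.getProperty("SKILL:FooBar.DESC"));

        // Without the escape, the key stops at the first colon
        Properties broken = parse("SKILL:FooBar=Oobarf-ay\n");
        System.out.println(broken.getProperty("SKILL"));
    }
}
```

So the escape does work, at the cost of every key in the file carrying a backslash.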

                          Another point I didn’t know is that some spell names are not unique. Writing this, I think I get it: do you mean that one third-party publisher creates a spell “BuffMe”, another does too, and you end up with two spells with the same name?

                          --
                          masaru20100
                        • thpr
                          Message 12 of 17 , Dec 13, 2011
                            --- In pcgen_international@yahoogroups.com, Andrew <drew0500@...> wrote:
                            > I'd vote for as complete a job as possible.

                            That's a given, but that's not the question (It's not for you, it's for the folks that would do the translating). The point is if we start people down the path of actually doing any translation, then we need to clearly set the expectation of what will be possible. If we can't do any conversion for things like DESC, then it would only cover say 60% of the data sets... so the question is not do we want to finish this someday, it's do folks want to start when they know the near-term (measured as 12-18 months) end game is only 60% complete on translation. That may be demoralizing to start knowing that, so would we even tell people to get started?

                            > Here is where I'm not understanding - where are we going to put the translation stuff?

                            Idea #1 (bad idea IMHO) is to put it in a separate directory:
                            data/srd/...
                            l10n/srd/...

                            The problem with that is that you get into synchronization issues... so any time files/directory names change in one place they have to change in another. That's a contract on the data developer. (It would also require multiple l10n directories, since we support multiple data directories, so it's a bunch of code)

                            Idea #2: Implicit Subdirectories:
                            data/srd/*.lst
                            data/srd/l10n/*.l10n

                            That ensures that the l10n directory is associated with the dataset.

                            Idea #3: Explicit designation:
                            data/srd/srd.pcc contains LOCALIZATION:l10n/srd.l10n
                            then:
                            data/l10n/srd.l10n

                            (would support more than one .l10n file)


                            --- In pcgen_international@yahoogroups.com, Martijn Verburg <martijnverburg@...> wrote:
                            > > (2) We should target an ability to tell us if l10n is complete for any
                            > > given data set
                            > >
                            > Not sure what you mean by this?

                            If there is an object called "Dagger" we need to ensure the file contains:
                            EQUIPMENT:Dagger|Aggerd-ay

                            If it contains:
                            EQUIPMENT:Dgager|Aggerd-ay

                            That is also an error (just like the "unconstructed reference" items are errors). By capturing both type 1 and type 2 errors (things that aren't translated as well as things that shouldn't have been translated) we are capturing the vast majority of the simple problems.
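The two error types can be sketched as plain set differences. This is only an illustration under the assumption that both the data set and the translation file can be reduced to sets of address strings such as EQUIPMENT:Dagger; the class and method names are invented and nothing here reflects actual PCGen code.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class L10nCheckSketch {

    /** Type 1: addresses present in the base data but missing from the translation. */
    static Set<String> untranslated(Set<String> baseKeys, Set<String> translatedKeys) {
        Set<String> result = new TreeSet<>(baseKeys);
        result.removeAll(translatedKeys);
        return result;
    }

    /** Type 2: addresses the translation mentions that the base data does not contain. */
    static Set<String> unknown(Set<String> baseKeys, Set<String> translatedKeys) {
        Set<String> result = new TreeSet<>(translatedKeys);
        result.removeAll(baseKeys);
        return result;
    }

    public static void main(String[] args) {
        Set<String> base = new TreeSet<>(Arrays.asList("EQUIPMENT:Dagger"));
        // The translation file contains the misspelled address from the example
        Set<String> translated = new TreeSet<>(Arrays.asList("EQUIPMENT:Dgager"));

        System.out.println("Not translated: " + untranslated(base, translated));
        System.out.println("Not in data set: " + unknown(base, translated));
    }
}
```

With the misspelled entry, the same typo shows up in both reports: Dagger is untranslated and Dgager is unknown.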



                            --- In pcgen_international@yahoogroups.com, "masaru20100" <hooya.masaru20100@...> wrote:
                            >
                            > In the process of updating the translation, I found another potential problem: depending on the system, some elements are not translated the same way. A particular case is "rank", which is not translated the same way in D&D 3.x and Pathfinder.
                            > Doing system-dependent translation might complicate things quite a bit.

                            This depends on what is being translated. If it's data items, then the system I proposed will handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a more complicated fashion.


                            > If the file srd_de_ch.l10n contains “SKILL:FooBar|Oobarf-ay”, does that mean the skill FooBar should be displayed as Oobarf-ay in de_ch (Swiss German)?

                            Yes, obviously I was being a bit silly in using Pig Latin, but you have the idea.
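For concreteness, a rough sketch of how one line of such a file might be split apart. The TYPE:Key|Translation and TYPE:Key:TOKEN|Translation shapes come from the examples in this thread; everything else (class name, field names, the choice to split on the first | and on :) is an assumption, and it would break for keys that themselves contain a colon.

```java
public class L10nLineSketch {

    /** One translation entry: object type, key, optional token, translated text. */
    static final class Entry {
        final String type;
        final String key;
        final String token; // null when the entry translates the item name itself
        final String text;

        Entry(String type, String key, String token, String text) {
            this.type = type;
            this.key = key;
            this.token = token;
            this.text = text;
        }
    }

    /** Splits a line such as SKILL:FooBar|Oobarf-ay into its parts. */
    static Entry parse(String line) {
        int bar = line.indexOf('|');
        if (bar < 0) {
            throw new IllegalArgumentException("Missing '|' in: " + line);
        }
        String address = line.substring(0, bar);
        String text = line.substring(bar + 1);
        // At most three address parts: type, key, and an optional token name
        String[] parts = address.split(":", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Missing ':' in: " + line);
        }
        String token = parts.length == 3 ? parts[2] : null;
        return new Entry(parts[0], parts[1], token, text);
    }

    public static void main(String[] args) {
        Entry name = parse("SKILL:FooBar|Oobarf-ay");
        System.out.println(name.type + " / " + name.key + " -> " + name.text);

        Entry out = parse("SKILL:FooBar:OUTPUTNAME|ooBarOutF-ay");
        System.out.println(out.type + " / " + out.key + " / " + out.token + " -> " + out.text);
    }
}
```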

                            > Is there a reason not to use ResourceBundle? Or another already existing system, like the one used by GNU’s gettext (a Java library exists)?

                            A few additional background items as I answer this. I stated earlier the original data sets should not require weird behavior. That means we don't want a data file to be:
                            l10n.spell.Fireball <> SPELLSCHOOL:l10n.spellschool.Evocation <> ...

                            It makes the data files completely impossible to read or debug IMHO, and would drive people completely batty to require that in our home brew data. (Talk about no one wanting to develop data anymore for PCGen...)

                            Having the direct string in a file helps a LOT when there are issues we are trying to resolve. This is why you will often see developers put "message IDs" (independent of translation) in error messages, so that they can just ask the end user for the message ID/error ID and look THAT up in the code using grep (rather than having to figure out which resource bundle has the string, then search the code for that resource ID, etc.)

                            > The reason I mention those formats is that there are tools (meant for translators) that handle either format and ease translation work. With another format, those tools become useless.

                            We can evaluate whether we can get them to work without ambiguity. Also note that we would build the strings dynamically inside the code and there would be no list for the default language (English in most cases), so anything expecting a list of items from the default language (as a method of telling what is/is not completed) for the translator would not work. (Part of the investigation here should be how much of this would actually be usable by the translator given how we would be doing the work in the code)

                            > I thought that a properties file would be used, so the file srd_de_ch.properties would contain these lines:
                            > SKILL:FooBar=Oobarf-ay
                            > SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
                            > The last line is the DESC translation of the Foobar skill.

                            > I realise there could be a problem because of the : in the key (though escaping it as \: might make it disappear), but other than that, what would be the problem?

                            We'd have to investigate how much of the infrastructure in Java we could use. The ResourceBundle code has defaults that require files be in certain places (which they wouldn't be in the case of our files) and we'd have to figure out how to handle that. I just haven't used it outside of its default behavior before, so I haven't read the docs or tried. There is also a complication because the default language would not require a ResourceBundle (to me this is an unyielding requirement, as it would make homebrews a nightmare to make if you had to do this).
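One possible direction, offered only as a sketch and not as a tested design: java.util.PropertyResourceBundle can be constructed directly from a Reader, which sidesteps the classpath lookup that ResourceBundle.getBundle does by default, and a small wrapper can fall back to the text already in the data when there is no bundle or no entry. The names BundleFallbackSketch, load and translate are hypothetical; in real use the Reader would come from wherever the l10n file lives (for example via Files.newBufferedReader).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

public class BundleFallbackSketch {

    /** Builds a bundle from any Reader, so the file can live outside the classpath. */
    static ResourceBundle load(Reader reader) {
        try {
            return new PropertyResourceBundle(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns the translation if one exists; otherwise the text from the data itself.
     * A null bundle models the default language, which needs no file at all.
     */
    static String translate(ResourceBundle bundle, String key, String defaultText) {
        if (bundle != null && bundle.containsKey(key)) {
            return bundle.getString(key);
        }
        return defaultText;
    }

    public static void main(String[] args) {
        // A StringReader stands in for the real l10n file here
        ResourceBundle deCh = load(new StringReader("SKILL\\:FooBar=Oobarf-ay\n"));

        System.out.println(translate(deCh, "SKILL:FooBar", "FooBar"));
        // No entry for this key: the name from the data is shown unchanged
        System.out.println(translate(deCh, "SKILL:Climb", "Climb"));
        // Default language: no bundle needed
        System.out.println(translate(null, "SKILL:FooBar", "FooBar"));
    }
}
```

Whether translator tools can work with files loaded this way would still need the investigation described above.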

                            > Another point I didn’t know is that some spell names are not unique. Writing this, I think I get it: do you mean that one third-party publisher creates a spell “BuffMe”, another does too, and you end up with two spells with the same name?

                            No. I mean the SRD has "Foo" and "Foo", one of which is Arcane, one Psionic. (I don't remember the exact spell)

                            TP.
                          • Andrew
                            Message 13 of 17 , Dec 13, 2011
                              Hi,



                              On 12/13/2011 7:23 AM, thpr wrote:
                              >
                              >
                              >
                              >
                              > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>, Andrew
                              > <drew0500@...> wrote:
                              > > I'd vote for as complete a job as possible.
                              >
                              > That's a given, but that's not the question (It's not for you, it's for the folks that would do
                              > the translating). The point is if we start people down the path of actually doing any translation,
                              > then we need to clearly set the expectation of what will be possible. If we can't do any
                              > conversion for things like DESC, then it would only cover say 60% of the data sets... so the
                              > question is not do we want to finish this someday, it's do folks want to start when they know the
                              > near-term (measured as 12-18 months) end game is only 60% complete on translation. That may be
                              > demoralizing to start knowing that, so would we even tell people to get started?
                              >

                              I think realistic expectations should be explained up front: this is what we can accomplish today,
                              this is what we want to accomplish in 12-18 months, and this is the future outlook. Though I agree
                              it can be demoralizing, I'd rather folks know what we're doing than keep people in the dark.


                              >
                              > > Here is where I'm not understanding - where are we going to put the translation stuff?
                              >
                              > Idea #1 (bad idea IMHO) is to put it in a separate directory:
                              > data/srd/...
                              > l10n/srd/...
                              >
                              > The problem with that is that you get into synchronization issues... so any time files/directory
                              > names change in one place they have to change in another. That's a contract on the data developer.
                              > (It would also require multiple l10n directories, since we support multiple data directories, so
                              > it's a bunch of code)
                              >

                              Agreed, bad idea. Less work/upkeep is better

                              >
                              > Idea #2: Implicit Subdirectories:
                              > data/srd/*.lst
                              > data/srd/l10n/*.l10n
                              >
                              > That ensures that the l10n directory is associated with the dataset.
                              >
                              > Idea #3: Explicit designation:
                              > data/srd/srd.pcc contains LOCALIZATION:l10n/srd.l10n
                              > then:
                              > data/l10n/srd.l10n
                              >
                              > (would support more than one .l10n file)
                              >

                              I'm leaning towards #2 myself, I'm not sure why we'd need multiple files. Though, my understanding
                              of l10n is very limited, so if there are best practices I think we should defer to those. Besides -
                              earlier you stated PCC and LST should not be aware of the l10n files and #3 seems to contradict that.

                              From a data perspective, idea #2 gives me a better grasp of what files have been started on
                              translation. Idea #3 keeps it in one spot - so it really is a tough call.
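
A rough sketch of what the lookup for idea #2 could look like, assuming the files are named `<dataset>_<locale>.l10n` as in the `srd_de_ch.l10n` example used elsewhere in this thread. The class `L10nLocator` and the naming rule are assumptions for illustration, not anything PCGen defines.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical lookup of translation files kept inside the data set. */
public class L10nLocator {
    /** Returns the .l10n files for one locale under datasetDir/l10n, sorted. */
    public static List<Path> find(Path datasetDir, String locale) throws IOException {
        Path l10nDir = datasetDir.resolve("l10n");
        List<Path> result = new ArrayList<>();
        if (!Files.isDirectory(l10nDir)) {
            return result; // no l10n directory: default language only
        }
        String glob = "*_" + locale + ".l10n";
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(l10nDir, glob)) {
            for (Path file : stream) {
                result.add(file);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway data/srd layout to demonstrate the lookup.
        Path srd = Files.createTempDirectory("srd");
        Files.createDirectories(srd.resolve("l10n"));
        Files.createFile(srd.resolve("l10n").resolve("srd_de_ch.l10n"));
        Files.createFile(srd.resolve("l10n").resolve("srd_fr.l10n"));
        System.out.println(find(srd, "de_ch"));
    }
}
```

Because the directory is derived from the data set's own location, the PCC and LST files never have to mention it, and a data set without an `l10n` directory simply has no translations.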

                              >
                              > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>, Martijn
                              > Verburg <martijnverburg@...> wrote:
                              > > > (2) We should target an ability to tell us if l10n is complete for any
                              > > > given data set
                              > > >
                              > > Not sure what you mean by this?
                              >
                              > If there is an object called "Dagger" we need to ensure the file contains:
                              > EQUIPMENT:Dagger|Aggerd-ay
                              >
                              > If it contains:
                              > EQUIPMENT:Dgager|Aggerd-ay
                              >
                              > That is also an error (just like the "unconstructed reference" items are errors. By capturing both
                              > type 1 and type 2 error (things that aren't translated as well as things that shouldn't have been
                              > translated) we are capturing the vast majority of the simple problems.
                              >
                              > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>,
                              > "masaru20100" <hooya.masaru20100@...> wrote:
                              > >
                              > > In the process of updating the translation, I found another potential problem: depending on the
                              > system, some element is not translated the same way. This particular case is rank that is not
                              > translated the same way in D&D 3.x and Pathfinder.
                              > > Doing system dependant translation might complicate things quite a bit.
                              >
                              > This depends on what is being translated. If it's data items, then the system I proposed will
                              > handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a
                              > more complicated fashion.
                              >
                              > > If the file srd_de_ch.l10n contain “SKILL:FooBar|Oobarf-ay”, is the meaning the skill
                              > Foobar should be translated displayed as Oobarf-ay in de_ch (Swiss German?).
                              >
                              > Yes, obviously I was being a bit silly in using Pig Latin, but you have the idea.
                              >
                              > > Is there a reason not to use ResourceBundle? Or just another already existing system, like the
                              > one used by GNU’s gettext (a Java library exist)?
                              >
                              > A few additional background items as I answer this. I stated earlier the original data sets should
                              > not require weird behavior. That means we don't want a data file to be:
                              > l10n.spell.Fireball <> SPELLSCHOOL:l10n.spellschool.Evocation <> ...
                              >
                              > It makes the data files completely impossible to read or debug IMHO, and would drive people
                              > completely batty to require that in our home brew data. (Talk about no one wanting to develop data
                              > anymore for PCGen...)
                              >
                              > Having the direct string in a file helps a LOT when there are issues we are trying to resolve.
                              > This is why you will often see developers have "message IDs" (unique of translation) in error
                              > messages so that they can just ask the end user for the message ID/error ID and look THAT up in
                              > the code using grep (rather than having to figure out which resource bundle has the string, then
                              > search the code for that resource id, etc.)
                              >
                              > > The reason I mention those formats, is that they are tools (meant for translators) that manage
                              > either format and ease translating work. When using another format, those tools become useless.
                              >
                              > We can evaluate whether we can get them to work without ambiguity. Also note that we would build
                              > the strings dynamically inside the code and there would be no list for the default language
                              > (English in most cases), so anything expecting a list of items from the default language (as a
                              > method of telling what is/is not completed) for the translator would not work. (Part of the
                              > investigation here should be how much of this would actually be usable by the translator given how
                              > we would be doing the work in the code)
                              >
                              > > I thought that a properties file would be used, so the file srd_de_ch.properties would contain
                              > those lines:
                              > > SKILL:FooBar=Oobarf-ay
                              > > SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
                              > > The last line is the DESC translation of the Foobar skill.
                              >
                              > > I realise there could be problem because of the : in the key (but with a \: it might disappear),
                              > but other than that, what would be the problem.
                              >
                              > We'd have to investigate how much of the infrastructure in Java we could use. The resourcebundle
                              > code has defaults that require files be in certain places (which they wouldn't be in the case of
                              > our files) and we'd have to figure out how to handle that. I just haven't used it outside of its
                              > default behavior before, so just haven't read the docs or tried. There is also complication
                              > because the default language would not require a resourcebundle (to me this is an unyielding
                              > requirement as it would make homebrews a nightmare to make if you have to do this)
                              >
                              > > Another point, I didn’t know is that some spell name are not unique. Writing this, I
                              > think I got it, do you mean that some third party publisher create a spell “BuffMe” and
                              > another did too and you have two spells with the same name?
                              >
                              > No. I mean the SRD has "Foo" and "Foo", one of which is Arcane, one Psionic. (I don't remember the
                              > exact spell)
                              >

                              Oh, that - 'Telepathic Bond (Lesser)' is the one I think you're thinking of. I'd also point out that
                              WotC has multiple feats across its books that share the same name.


                              >
                              > TP.
                              >
                              >
                              > --
                              > Andrew Maitland (LegacyKing)
                              > Admin Silverback - PCGen Board of Directors
                              > Data 2nd, Docs Tamarin, OS Lemur
                              > Unique Title "Quick-Silverback Tracker Monkey"
                              > Unique Title "The Torturer of PCGen"


                            • hooya.masaru20100@antichef.net
                              Message 14 of 17 , Dec 18, 2011
                                In my opinion, the main problem is that the data cannot be localized at
                                all for now, so internationalization should be done first and is the
                                priority. Providing localizations is a second step and will take time;
                                but today it cannot be done at all, even by users, so internationalization
                                is the main problem.

                                I’ve updated the wiki with more details on many things, and also tried
                                to separate the internationalization problems from the localization ones.

                                On Tuesday 13 December 2011 at 15:23 +0000, thpr - thpr@... wrote:
                                > --- In pcgen_international@yahoogroups.com, "masaru20100" <hooya.masaru20100@...> wrote:
                                > >
                                > > In the process of updating the translation, I found another potential problem: depending on the system, some element is not translated the same way. This particular case is rank that is not translated the same way in D&D 3.x and Pathfinder.
                                > > Doing system dependant translation might complicate things quite a bit.
                                >
                                > This depends on what is being translated. If it's data items, then the system I proposed will handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a more complicated fashion.

                                I pointed out the problem because these are UI strings that depend on
                                the data set. In this case, both translations are probably
                                understandable, but using the correct term would be better.
                                Compared to data set internationalization, this is not that important,
                                but if the solution could somehow handle that aspect as well, it would
                                be good.

                                > > The reason I mention those formats, is that they are tools (meant for translators) that manage either format and ease translating work. When using another format, those tools become useless.
                                >
                                > We can evaluate whether we can get them to work without ambiguity. Also note that we would build the strings dynamically inside the code and there would be no list for the default language (English in most cases), so anything expecting a list of items from the default language (as a method of telling what is/is not completed) for the translator would not work. (Part of the investigation here should be how much of this would actually be usable by the translator given how we would be doing the work in the code)

                                If there is no list of strings to translate for the default language, a
                                tool to generate that list would be a must-have before starting a
                                translation into a language.

                                A tool that produces and updates a file in a standard format would be
                                a nice addition.
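
A small sketch of what the core of such a tool could do, assuming the object keys (in the `TYPE:Name` form used in this thread) have already been collected from the loaded data; how PCGen would expose them is not covered here, and the class `L10nTemplate` is hypothetical. It keeps existing translations, adds an empty entry for every untranslated key, and reports entries that match nothing in the data (for example a mistyped name).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical generator/updater for a translation template. */
public class L10nTemplate {
    /** Every data key, with its existing translation or "" when untranslated. */
    public static SortedMap<String, String> update(Collection<String> dataKeys,
                                                   Map<String, String> existing) {
        SortedMap<String, String> merged = new TreeMap<>();
        for (String key : dataKeys) {
            merged.put(key, existing.getOrDefault(key, ""));
        }
        return merged;
    }

    /** Translated keys that no longer (or never did) match an object in the data. */
    public static SortedSet<String> orphans(Collection<String> dataKeys,
                                            Map<String, String> existing) {
        SortedSet<String> result = new TreeSet<>(existing.keySet());
        result.removeAll(dataKeys);
        return result;
    }

    public static void main(String[] args) {
        List<String> dataKeys = Arrays.asList("EQUIPMENT:Dagger", "SKILL:FooBar");
        Map<String, String> existing = new HashMap<>();
        existing.put("EQUIPMENT:Dgager", "Aggerd-ay"); // mistyped key
        existing.put("SKILL:FooBar", "Oobarf-ay");
        System.out.println(update(dataKeys, existing));  // {EQUIPMENT:Dagger=, SKILL:FooBar=Oobarf-ay}
        System.out.println(orphans(dataKeys, existing)); // [EQUIPMENT:Dgager]
    }
}
```

The merged map could then be written out as a properties or gettext file so that existing translator tools can open it; that export step is left out of the sketch.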

                                --
                                Vincent
                              • masaru20100
                                Message 15 of 17 , Sep 2, 2012
                                  Hi,

                                  Is there an issue in JIRA about data localization? I created a wiki page (http://wiki.pcgen.org/Internationalization) with plenty of text on the matter (including the mails in this topic), but I have no idea whether it's advancing or not.

                                  Regards,
                                  --
                                  Vincent
                                • Martijn Verburg
                                  Message 16 of 17 , Sep 3, 2012
                                    Yes, James has fixed a number of CODE Jiras recently - take a look at the
                                    recently closed issues in that project. -K

                                    On 3 September 2012 06:35, masaru20100 <hooya.masaru20100@...> wrote:

                                    >
                                    >
                                    > Hi,
                                    >
                                    > Is there an issue in JIRA about data localization. I had created a wiki
                                    > page (http://wiki.pcgen.org/Internationalization) with plenty of text on
                                    > the matter (including the mails in this topic), but I have no idea if it's
                                    > advancing or not.
                                    >
                                    > Regards,
                                    > --
                                    > Vincent
                                    >
                                    >
                                    >


                                  • James Dempsey
                                    Message 17 of 17 , Sep 3, 2012
                                      Hi,

                                      Those CODE jiras were concerned with UI localisation. We aren't tracking
                                      data localisation in JIRA yet. That project will have to wait for after
                                      the 6.0 release.

                                      Cheers,
                                      James.

                                      On 3/09/2012 6:20 PM Martijn Verburg wrote
                                      > Yes, James has fixed a number of CODE Jiras recently - take a look at the
                                      > recently closed issues in that project. -K
                                      >
                                      > On 3 September 2012 06:35, masaru20100 <hooya.masaru20100@...> wrote:
                                      >
                                      >>
                                      >>
                                      >> Hi,
                                      >>
                                      >> Is there an issue in JIRA about data localization. I had created a wiki
                                      >> page (http://wiki.pcgen.org/Internationalization) with plenty of text on
                                      >> the matter (including the mails in this topic), but I have no idea if it's
                                      >> advancing or not.
                                      >>
                                      >> Regards,
                                      >> --
                                      >> Vincent