
Re: Data localization

  • masaru20100
    Message 1 of 17 , Dec 13, 2011
      --- In pcgen_international@yahoogroups.com, Andrew <drew0500@...> wrote:
      >
      > Hi,
      >
      > I've added Tom's and Masura20100 notes here:
      >
      > http://wiki.pcgen.org/Internationalization
      >

      I’ve updated the page with new thoughts, tried to clarify some points, and added a few.

      I’ve also tried to understand the PCGen-specific points that Tom raises, but I’m a bit lost, because I have never gone into the details of how PCGen works.

      If the file srd_de_ch.l10n contains “SKILL:FooBar|Oobarf-ay”, does that mean the skill FooBar should be displayed as Oobarf-ay in de_ch (Swiss German?)?
      Is there a reason not to use ResourceBundle? Or another existing system, like the one used by GNU’s gettext (a Java library exists)?
      The reason I mention those formats is that there are tools (meant for translators) that handle either format and ease the translation work. With any other format, those tools become useless.
      I’m trying to update the existing PCGen ResourceBundle; I used JavaPM to produce an XLIFF file from the bundles and then edited it with Virtaal.

      I thought that a properties file would be used, so the file srd_de_ch.properties would contain these lines:
      SKILL:FooBar=Oobarf-ay
      SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
      The last line is the DESC translation of the FooBar skill.

      I realise the : in the key could be a problem (though escaping it as \: might solve that), but other than that, what would the problem be?
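The escaping question can be checked directly against java.util.Properties. The sketch below is only an illustration of how the standard loader treats the colon; the class name EscapedKeyDemo and the helper parse are made up for this example and are not part of PCGen.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical demo class: shows how java.util.Properties handles ':' in keys.
class EscapedKeyDemo {
    /** Parses .properties text held in a String. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try (StringReader reader = new StringReader(text)) {
            props.load(reader);
        } catch (IOException e) {
            // Not expected when reading from memory.
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // In the file itself the key is written as SKILL\:FooBar (one backslash).
        Properties escaped = parse("SKILL\\:FooBar=Oobarf-ay\n"
                + "SKILL\\:FooBar.DESC=Oobarf-ay allows you to blabla\n");
        System.out.println(escaped.getProperty("SKILL:FooBar")); // prints Oobarf-ay

        // Without the backslash, the first ':' ends the key, so the key is just "SKILL".
        Properties unescaped = parse("SKILL:FooBar=Oobarf-ay\n");
        System.out.println(unescaped.getProperty("SKILL")); // prints FooBar=Oobarf-ay
    }
}
```

So the \: form does work, but an unescaped line silently produces the key "SKILL", and every SKILL line would then overwrite the previous one.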

      Another point I didn’t know is that some spell names are not unique. Writing this, I think I get it: do you mean that one third-party publisher created a spell “BuffMe”, another did too, and so you have two spells with the same name?

      --
      masaru20100
    • thpr
      Message 2 of 17 , Dec 13, 2011
        --- In pcgen_international@yahoogroups.com, Andrew <drew0500@...> wrote:
        > I'd vote for as complete a job as possible.

        That's a given, but that's not the question (It's not for you, it's for the folks that would do the translating). The point is if we start people down the path of actually doing any translation, then we need to clearly set the expectation of what will be possible. If we can't do any conversion for things like DESC, then it would only cover say 60% of the data sets... so the question is not do we want to finish this someday, it's do folks want to start when they know the near-term (measured as 12-18 months) end game is only 60% complete on translation. That may be demoralizing to start knowing that, so would we even tell people to get started?

        > Here is where I'm not understanding - where are we going to put the translation stuff?

        Idea #1 (bad idea IMHO) is to put it in a separate directory:
        data/srd/...
        l10n/srd/...

        The problem with that is that you get into synchronization issues... so any time files/directory names change in one place they have to change in another. That's a contract on the data developer. (It would also require multiple l10n directories, since we support multiple data directories, so it's a bunch of code)

        Idea #2: Implicit Subdirectories:
        data/srd/*.lst
        data/srd/l10n/*.l10n

        That ensures that the l10n directory is associated with the dataset.
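As a rough illustration of Idea #2, the lookup could be a pure path computation from the data set directory and the locale. Everything here is an assumption made for the example: the class name L10nFileLocator, the baseName_language_country.l10n naming (modelled on the srd_de_ch.l10n name used in this thread), and the fallback from de_ch to de.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: computes where l10n files would live under Idea #2.
class L10nFileLocator {
    /**
     * Candidate files inside the data set's own l10n subdirectory,
     * most specific first (language plus country, then language only).
     */
    static List<Path> candidates(Path dataSetDir, String baseName, Locale locale) {
        Path l10nDir = dataSetDir.resolve("l10n");
        String language = locale.getLanguage();
        String country = locale.getCountry().toLowerCase(Locale.ROOT);
        List<Path> result = new ArrayList<>();
        if (!language.isEmpty() && !country.isEmpty()) {
            result.add(l10nDir.resolve(baseName + "_" + language + "_" + country + ".l10n"));
        }
        if (!language.isEmpty()) {
            result.add(l10nDir.resolve(baseName + "_" + language + ".l10n"));
        }
        return result;
    }

    public static void main(String[] args) {
        Locale swissGerman = Locale.forLanguageTag("de-CH");
        for (Path p : candidates(Paths.get("data", "srd"), "srd", swissGerman)) {
            System.out.println(p);
        }
    }
}
```

Because the path is derived from the data set directory itself, renaming or moving the data set moves its translations with it, which is the synchronization problem Idea #1 runs into.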

        Idea #3: Explicit designation:
        data/srd/srd.pcc contains LOCALIZATION:l10n/srd.l10n
        then:
        data/l10n/srd.l10n

        (would support more than one .l10n file)


        --- In pcgen_international@yahoogroups.com, Martijn Verburg <martijnverburg@...> wrote:
        > > (2) We should target an ability to tell us if l10n is complete for any
        > > given data set
        > >
        > Not sure what you mean by this?

        If there is an object called "Dagger" we need to ensure the file contains:
        EQUIPMENT:Dagger|Aggerd-ay

        If it contains:
        EQUIPMENT:Dgager|Aggerd-ay

        That is also an error (just like the "unconstructed reference" items are errors). By capturing both type 1 and type 2 errors (things that aren't translated as well as things that shouldn't have been translated), we capture the vast majority of the simple problems.
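A minimal sketch of that two-sided check, assuming the KEY|translation line format used in the examples above. The class L10nCompletenessCheck, its method names, and the extra "EQUIPMENT:Longsword" entry are hypothetical and only serve the illustration; the real check would get its keys from the loaded data set.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical checker for the two kinds of l10n error described above.
class L10nCompletenessCheck {
    /** Parses lines such as "EQUIPMENT:Dagger|Aggerd-ay" into key -> translation. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> translations = new LinkedHashMap<>();
        for (String line : lines) {
            int bar = line.indexOf('|');
            if (bar > 0) {
                translations.put(line.substring(0, bar), line.substring(bar + 1));
            }
        }
        return translations;
    }

    /** Type 1: objects in the data set that have no translation entry. */
    static Set<String> missing(Set<String> dataKeys, Map<String, String> translations) {
        Set<String> result = new TreeSet<>(dataKeys);
        result.removeAll(translations.keySet());
        return result;
    }

    /** Type 2: translation entries that match no object in the data set. */
    static Set<String> unknown(Set<String> dataKeys, Map<String, String> translations) {
        Set<String> result = new TreeSet<>(translations.keySet());
        result.removeAll(dataKeys);
        return result;
    }

    public static void main(String[] args) {
        Set<String> dataKeys = Set.of("EQUIPMENT:Dagger", "EQUIPMENT:Longsword");
        Map<String, String> translations = parse(List.of("EQUIPMENT:Dgager|Aggerd-ay"));
        System.out.println("Not translated: " + missing(dataKeys, translations));
        System.out.println("No such object: " + unknown(dataKeys, translations));
    }
}
```

With the misspelled "Dgager" line, Dagger shows up as not translated and Dgager shows up as a translation for an object that does not exist.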



        --- In pcgen_international@yahoogroups.com, "masaru20100" <hooya.masaru20100@...> wrote:
        >
        > In the process of updating the translation, I found another potential problem: depending on the system, some element is not translated the same way. This particular case is rank that is not translated the same way in D&D 3.x and Pathfinder.
        > Doing system dependant translation might complicate things quite a bit.

        This depends on what is being translated. If it's data items, then the system I proposed will handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a more complicated fashion.


        > If the file srd_de_ch.l10n contain “SKILL:FooBar|Oobarf-ay”, is the meaning the skill Foobar should be translated displayed as Oobarf-ay in de_ch (Swiss German?).

        Yes, obviously I was being a bit silly in using Pig Latin, but you have the idea.

        > Is there a reason not to use ResourceBundle? Or just another already existing system, like the one used by GNU’s gettext (a Java library exist)?

        A few additional background items as I answer this. I stated earlier the original data sets should not require weird behavior. That means we don't want a data file to be:
        l10n.spell.Fireball <> SPELLSCHOOL:l10n.spellschool.Evocation <> ...

        It makes the data files completely impossible to read or debug IMHO, and would drive people completely batty to require that in our home brew data. (Talk about no one wanting to develop data anymore for PCGen...)

        Having the direct string in a file helps a LOT when there are issues we are trying to resolve. This is why you will often see developers put "message IDs" (independent of translation) in error messages, so that they can just ask the end user for the message ID/error ID and look THAT up in the code using grep (rather than having to figure out which resource bundle has the string, then search the code for that resource ID, etc.)

        > The reason I mention those formats, is that they are tools (meant for translators) that manage either format and ease translating work. When using another format, those tools become useless.

        We can evaluate whether we can get them to work without ambiguity. Also note that we would build the strings dynamically inside the code and there would be no list for the default language (English in most cases), so anything expecting a list of items from the default language (as a method of telling what is/is not completed) for the translator would not work. (Part of the investigation here should be how much of this would actually be usable by the translator given how we would be doing the work in the code)

        > I thought that a properties file would be used, so the file srd_de_ch.properties would contain those lines:
        > SKILL:FooBar=Oobarf-ay
        > SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
        > The last line is the DESC translation of the Foobar skill.

        > I realise there could be problem because of the : in the key (but with a \: it might disappear), but other than that, what would be the problem.

        We'd have to investigate how much of the infrastructure in Java we could use. The ResourceBundle code has defaults that require files to be in certain places (which they wouldn't be in the case of our files) and we'd have to figure out how to handle that. I haven't used it outside of its default behavior before, and haven't read the docs or tried. There is also a complication because the default language would not require a ResourceBundle (to me this is an unyielding requirement, as homebrews would be a nightmare to make if you had to do this).
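For what it is worth, java.util.PropertyResourceBundle can be built from any Reader, so the file does not have to sit on the classpath. The sketch below is an assumption-laden illustration, not a design: the class DataTranslator and its translate method are hypothetical, and the key format is the escaped SKILL\:FooBar form from earlier in the thread. It also falls back to the name in the data file when there is no bundle or no entry, so the default language needs no bundle at all.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical wrapper: translation lookup that falls back to the original name.
class DataTranslator {
    private final ResourceBundle bundle; // null means default language, no bundle needed

    DataTranslator(ResourceBundle bundle) {
        this.bundle = bundle;
    }

    /** Builds a translator from any Reader, for example one opened on a file next to the data. */
    static DataTranslator fromReader(Reader reader) {
        try {
            return new DataTranslator(new PropertyResourceBundle(reader));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the translation if one exists, otherwise the name as written in the data file. */
    String translate(String type, String name) {
        String key = type + ":" + name;
        if (bundle != null && bundle.containsKey(key)) {
            return bundle.getString(key);
        }
        return name;
    }

    public static void main(String[] args) {
        DataTranslator swissGerman = fromReader(new StringReader("SKILL\\:FooBar=Oobarf-ay\n"));
        DataTranslator english = new DataTranslator(null);
        System.out.println(swissGerman.translate("SKILL", "FooBar"));
        System.out.println(swissGerman.translate("SKILL", "Untranslated"));
        System.out.println(english.translate("SKILL", "FooBar"));
    }
}
```

The data files keep their plain strings in this arrangement; the key is built in code from the object type and name, which matches the point that strings would be built dynamically.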

        > Another point, I didn’t know is that some spell name are not unique. Writing this, I think I got it, do you mean that some third party publisher create a spell “BuffMe” and another did too and you have two spells with the same name?

        No. I mean the SRD has "Foo" and "Foo", one of which is Arcane, one Psionic. (I don't remember the exact spell)

        TP.
      • Andrew
        Message 3 of 17 , Dec 13, 2011
          Hi,



          On 12/13/2011 7:23 AM, thpr wrote:
          >
          >
          >
          >
          > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>, Andrew
          > <drew0500@...> wrote:
          > > I'd vote for as complete a job as possible.
          >
          > That's a given, but that's not the question (It's not for you, it's for the folks that would do
          > the translating). The point is if we start people down the path of actually doing any translation,
          > then we need to clearly set the expectation of what will be possible. If we can't do any
          > conversion for things like DESC, then it would only cover say 60% of the data sets... so the
          > question is not do we want to finish this someday, it's do folks want to start when they know the
          > near-term (measured as 12-18 months) end game is only 60% complete on translation. That may be
          > demoralizing to start knowing that, so would we even tell people to get started?
          >

          I think realistic expectations should be explained up front - this is what we can accomplish today,
          this is what we want to accomplish in 12-18 months, and this is the future outlook. Though I agree
          it can be demoralizing, I'd rather folks know what we're doing than keep people in the dark.


          >
          > > Here is where I'm not understanding - where are we going to put the translation stuff?
          >
          > Idea #1 (bad idea IMHO) is to put it in a separate directory:
          > data/srd/...
          > l10n/srd/...
          >
          > The problem with that is that you get into synchronization issues... so any time files/directory
          > names change in one place they have to change in another. That's a contract on the data developer.
          > (It would also require multiple l10n directories, since we support multiple data directories, so
          > it's a bunch of code)
          >

          Agreed, bad idea. Less work/upkeep is better

          >
          > Idea #2: Implicit Subdirectories:
          > data/srd/*.lst
          > data/srd/l10n/*.l10n
          >
          > That ensures that the l10n directory is associated with the dataset.
          >
          > Idea #3: Explicit designation:
          > data/srd/srd.pcc contains LOCALIZATION:l10n/srd.l10n
          > then:
          > data/l10n/srd.l10n
          >
          > (would support more than one .l10n file)
          >

          I'm leaning towards #2 myself, I'm not sure why we'd need multiple files. Though, my understanding
          of l10n is very limited, so if there are best practices I think we should defer to those. Besides -
          earlier you stated PCC and LST should not be aware of the l10n files and #3 seems to contradict that.

          From a data perspective, idea #2 gives me a better grasp of what files have been started on
          translation. Idea #3 keeps it in one spot - so it really is a tough call.

          >
          > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>, Martijn
          > Verburg <martijnverburg@...> wrote:
          > > > (2) We should target an ability to tell us if l10n is complete for any
          > > > given data set
          > > >
          > > Not sure what you mean by this?
          >
          > If there is an object called "Dagger" we need to ensure the file contains:
          > EQUIPMENT:Dagger|Aggerd-ay
          >
          > If it contains:
          > EQUIPMENT:Dgager|Aggerd-ay
          >
          > That is also an error (just like the "unconstructed reference" items are errors. By capturing both
          > type 1 and type 2 error (things that aren't translated as well as things that shouldn't have been
          > translated) we are capturing the vast majority of the simple problems.
          >
          > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>,
          > "masaru20100" <hooya.masaru20100@...> wrote:
          > >
          > > In the process of updating the translation, I found another potential problem: depending on the
          > system, some element is not translated the same way. This particular case is rank that is not
          > translated the same way in D&D 3.x and Pathfinder.
          > > Doing system dependant translation might complicate things quite a bit.
          >
          > This depends on what is being translated. If it's data items, then the system I proposed will
          > handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a
          > more complicated fashion.
          >
          > > If the file srd_de_ch.l10n contain “SKILL:FooBar|Oobarf-ay”, is the meaning the skill
          > Foobar should be translated displayed as Oobarf-ay in de_ch (Swiss German?).
          >
          > Yes, obviously I was being a bit silly in using Pig Latin, but you have the idea.
          >
          > > Is there a reason not to use ResourceBundle? Or just another already existing system, like the
          > one used by GNU’s gettext (a Java library exist)?
          >
          > A few additional background items as I answer this. I stated earlier the original data sets should
          > not require weird behavior. That means we don't want a data file to be:
          > l10n.spell.Fireball <> SPELLSCHOOL:l10n.spellschool.Evocation <> ...
          >
          > It makes the data files completely impossible to read or debug IMHO, and would drive people
          > completely batty to require that in our home brew data. (Talk about no one wanting to develop data
          > anymore for PCGen...)
          >
          > Having the direct string in a file helps a LOT when there are issues we are trying to resolve.
          > This is why you will often see developers have "message IDs" (unique of translation) in error
          > messages so that they can just ask the end user for the message ID/error ID and look THAT up in
          > the code using grep (rather than having to figure out which resource bundle has the string, then
          > search the code for that resource id, etc.)
          >
          > > The reason I mention those formats, is that they are tools (meant for translators) that manage
          > either format and ease translating work. When using another format, those tools become useless.
          >
          > We can evaluate whether we can get them to work without ambiguity. Also note that we would build
          > the strings dynamically inside the code and there would be no list for the default language
          > (English in most cases), so anything expecting a list of items from the default language (as a
          > method of telling what is/is not completed) for the translator would not work. (Part of the
          > investigation here should be how much of this would actually be usable by the translator given how
          > we would be doing the work in the code)
          >
          > > I thought that a properties file would be used, so the file srd_de_ch.properties would contain
          > those lines:
          > > SKILL:FooBar=Oobarf-ay
          > > SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
          > > The last line is the DESC translation of the Foobar skill.
          >
          > > I realise there could be problem because of the : in the key (but with a \: it might disappear),
          > but other than that, what would be the problem.
          >
          > We'd have to investigate how much of the infrastructure in Java we could use. The resourcebundle
          > code has defaults that require files be in certain places (which they wouldn't be in the case of
          > our files) and we'd have to figure out how to handle that. I just haven't used it outside of it's
          > default behavior before, so just haven't read the docs or tried. There is also complication
          > because the default language would not require a resourcebundle (to me this is an unyielding
          > requirement as it would make homebrews a nightmare to make if you have to do this)
          >
          > > Another point, I didn’t know is that some spell name are not unique. Writing this, I
          > think I got it, do you mean that some third party publisher create a spell “BuffMe” and
          > another did too and you have two spells with the same name?
          >
          > No. I mean the SRD has "Foo" and "Foo", one of which is Arcane, one Psionic. (I don't remember the
          > exact spell)
          >

          Oh, that - 'Telepathic Bond (Lesser)' is the one I think you're thinking of. I'd also point out that
          WotC has multiple feats across its books that share the same name.


          >
          > TP.
          >
          >
          > --
          > Andrew Maitland (LegacyKing)
          > Admin Silverback - PCGen Board of Directors
          > Data 2nd, Docs Tamarin, OS Lemur
          > Unique Title "Quick-Silverback Tracker Monkey"
          > Unique Title "The Torturer of PCGen"


        • hooya.masaru20100@antichef.net
          Message 4 of 17 , Dec 18, 2011
            In my opinion, the main problem is that data cannot be localized at all
            for now. So internationalization should come first and is the priority.
            Providing localizations is a second step, and it will take time; but right
            now nobody can do it, not even users, so internationalization is the main problem.

            I’ve updated the wiki with more details on many things, and also tried to
            separate the internationalization problems from the localization ones.

            On Tuesday, 13 December 2011 at 15:23 +0000, thpr - thpr@... wrote:
            > --- In pcgen_international@yahoogroups.com, "masaru20100" <hooya.masaru20100@...> wrote:
            > >
            > > In the process of updating the translation, I found another potential problem: depending on the system, some element is not translated the same way. This particular case is rank that is not translated the same way in D&D 3.x and Pathfinder.
            > > Doing system dependant translation might complicate things quite a bit.
            >
            > This depends on what is being translated. If it's data items, then the system I proposed will handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a more complicated fashion.

            I pointed out the problem because these are UI strings that are dataset
            dependent. In this case, both translations are probably understandable, but
            using the correct term for each system would be better.
            Compared to data set internationalization, this is not that important,
            but if the solution could somehow handle that aspect too, it would be
            good.

            > > The reason I mention those formats, is that they are tools (meant for translators) that manage either format and ease translating work. When using another format, those tools become useless.
            >
            > We can evaluate whether we can get them to work without ambiguity. Also note that we would build the strings dynamically inside the code and there would be no list for the default language (English in most cases), so anything expecting a list of items from the default language (as a method of telling what is/is not completed) for the translator would not work. (Part of the investigation here should be how much of this would actually be usable by the translator given how we would be doing the work in the code)

            If there is no list of strings to translate for the default language, a
            tool to generate that list would be a must-have before starting a
            translation into any language.

            A tool that produces and updates a file in a standard format would be a nice
            addition.
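Such a tool could be quite small. The following is only a sketch under assumptions: it takes the object keys as already extracted from the data set, uses the KEY|translation line format from earlier in the thread, and the class name L10nTemplateGenerator is hypothetical. It keeps existing translations and leaves the rest empty, so the same run both creates and updates a file.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical generator for the list of strings a translator has to fill in.
class L10nTemplateGenerator {
    /**
     * Produces one "KEY|translation" line per data key, sorted by key. Existing
     * translations are kept; keys without one get an empty part after the '|'.
     */
    static List<String> generate(Collection<String> dataKeys, Map<String, String> existing) {
        List<String> lines = new ArrayList<>();
        for (String key : new TreeSet<>(dataKeys)) {
            lines.add(key + "|" + existing.getOrDefault(key, ""));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("SKILL:FooBar", "EQUIPMENT:Dagger");
        Map<String, String> existing = Map.of("SKILL:FooBar", "Oobarf-ay");
        for (String line : generate(keys, existing)) {
            System.out.println(line);
        }
    }
}
```

Writing the same key list out as .properties or XLIFF instead would be the step that lets existing translator tools open the file.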

            --
            Vincent
          • masaru20100
            Message 5 of 17 , Sep 2, 2012
              Hi,

              Is there an issue in JIRA about data localization? I created a wiki page (http://wiki.pcgen.org/Internationalization) with plenty of text on the matter (including the mails in this topic), but I have no idea whether it's advancing or not.

              Regards,
              --
              Vincent
            • Martijn Verburg
              Message 6 of 17 , Sep 3, 2012
                Yes, James has fixed a number of CODE Jiras recently - take a look at the
                recently closed issues in that project. -K

                On 3 September 2012 06:35, masaru20100 <hooya.masaru20100@...> wrote:

                >
                >
                > Hi,
                >
                > Is there an issue in JIRA about data localization. I had created a wiki
                > page (http://wiki.pcgen.org/Internationalization) with plenty of text on
                > the matter (including the mails in this topic), but I have no idea if it's
                > advancing or not.
                >
                > Regards,
                > --
                > Vincent
                >
                >
                >


              • James Dempsey
                Message 7 of 17 , Sep 3, 2012
                  Hi,

                  Those CODE jiras were concerned with UI localisation. We aren't tracking
                  data localisation in JIRA yet. That project will have to wait for after
                  the 6.0 release.

                  Cheers,
                  James.

                  On 3/09/2012 6:20 PM Martijn Verburg wrote
                  > Yes, James has fixed a number of CODE Jiras recently - take a look at the
                  > recently closed issues in that project. -K
                  >
                  > On 3 September 2012 06:35, masaru20100 <hooya.masaru20100@...> wrote:
                  >
                  >>
                  >>
                  >> Hi,
                  >>
                  >> Is there an issue in JIRA about data localization. I had created a wiki
                  >> page (http://wiki.pcgen.org/Internationalization) with plenty of text on
                  >> the matter (including the mails in this topic), but I have no idea if it's
                  >> advancing or not.
                  >>
                  >> Regards,
                  >> --
                  >> Vincent