
Re: Data localization

  • masaru20100
    Message 1 of 17 , Dec 13, 2011
      > > (2) We should target an ability to tell us if l10n is complete for any
      > > given data set
      > >
      > Not sure what you mean by this?

      I’m only guessing here, but these are my thoughts. It is either to tell the user what is translated and what is not, or to let maintainers know what still needs translation.
      It can also be both.
    • masaru20100
      Message 2 of 17 , Dec 13, 2011
        --- In pcgen_international@yahoogroups.com, Andrew <drew0500@...> wrote:
        >
        > Hi,
        >
        > I've added Tom's and Masura20100 notes here:
        >
        > http://wiki.pcgen.org/Internationalization
        >

        I’ve updated the page with new thoughts, tried to clarify some points and added a few.

        I’ve also tried to understand the PCGen-specific points that Tom raised, but I’m a bit lost because I never went into the details of how PCGen works.

        If the file srd_de_ch.l10n contains “SKILL:FooBar|Oobarf-ay”, does that mean the skill FooBar should be displayed as Oobarf-ay in de_ch (Swiss German?)?
        Is there a reason not to use ResourceBundle? Or another existing system, like the one used by GNU’s gettext (a Java library exists)?
        The reason I mention those formats is that there are tools (meant for translators) that handle either format and ease the translation work. With another format, those tools become useless.
        I’m trying to update the existing PCGen ResourceBundle: I used JavaPM to produce an XLIFF file from the bundles and then edited it with Virtaal.

        I thought that a properties file would be used, so the file srd_de_ch.properties would contain lines like these:
        SKILL:FooBar=Oobarf-ay
        SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
        The last line is the DESC translation of the FooBar skill.

        I realise the : in the key could be a problem (though escaping it as \: might solve that), but other than that, what would the problem be?
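
        The colon question can be checked directly against java.util.Properties. The sketch below is only an illustration (the class name ColonKeyDemo is made up): it loads the example line once as written and once with the colon escaped. Without the escape, the first : ends the key, so both example lines above would collide on the key SKILL.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class, not part of PCGen.
final class ColonKeyDemo {
    /** Parses properties-format text and returns the resulting table. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Unescaped: the first ':' terminates the key, so the key is just "SKILL"
        // and the rest of the line ("FooBar=Oobarf-ay") becomes the value.
        Properties raw = load("SKILL:FooBar=Oobarf-ay\n");
        System.out.println(raw.getProperty("SKILL"));

        // Escaped: "\:" keeps the colon inside the key.
        Properties escaped = load("SKILL\\:FooBar=Oobarf-ay\n");
        System.out.println(escaped.getProperty("SKILL:FooBar"));
    }
}
```

        So the escaped form works, but every key in the file would need the backslash, either written by hand or produced by a tool.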

        Another point I didn’t know is that some spell names are not unique. Writing this, I think I get it: do you mean that one third-party publisher created a spell “BuffMe”, another did too, and you end up with two spells with the same name?

        --
        masaru20100
      • thpr
        Message 3 of 17 , Dec 13, 2011
          --- In pcgen_international@yahoogroups.com, Andrew <drew0500@...> wrote:
          > I'd vote for as complete a job as possible.

          That's a given, but that's not the question (It's not for you, it's for the folks that would do the translating). The point is if we start people down the path of actually doing any translation, then we need to clearly set the expectation of what will be possible. If we can't do any conversion for things like DESC, then it would only cover say 60% of the data sets... so the question is not do we want to finish this someday, it's do folks want to start when they know the near-term (measured as 12-18 months) end game is only 60% complete on translation. That may be demoralizing to start knowing that, so would we even tell people to get started?

          > Here is where I'm not understanding - where are we going to put the translation stuff?

          Idea #1 (bad idea IMHO) is to put it in a separate directory:
          data/srd/...
          l10n/srd/...

          The problem with that is that you get into synchronization issues... so any time files/directory names change in one place they have to change in another. That's a contract on the data developer. (It would also require multiple l10n directories, since we support multiple data directories, so it's a bunch of code)

          Idea #2: Implicit Subdirectories:
          data/srd/*.lst
          data/srd/l10n/*.l10n

          That ensures that the l10n directory is associated with the dataset.

          Idea #3: Explicit designation:
          data/srd/srd.pcc contains LOCALIZATION:l10n/srd.l10n
          then:
          data/l10n/srd.l10n

          (would support more than one .l10n file)
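
          As a rough illustration of Idea #2, the sketch below looks for *.l10n files in an l10n subdirectory of a dataset directory. It is an assumption-laden example, not PCGen code: the class and method names are invented, and only the directory and extension names come from the layout above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of PCGen.
final class ImplicitL10nLocator {
    /** Returns the *.l10n files in datasetDir/l10n, sorted; empty if that directory is absent. */
    static List<Path> findL10nFiles(Path datasetDir) {
        Path l10nDir = datasetDir.resolve("l10n");
        List<Path> found = new ArrayList<>();
        if (!Files.isDirectory(l10nDir)) {
            return found; // no translations: the dataset is used as-is
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(l10nDir, "*.l10n")) {
            for (Path p : stream) {
                found.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(found); // directory order is unspecified, so sort for stable results
        return found;
    }
}
```

          Because the lookup is relative to the dataset directory, renaming or moving the dataset moves its translations with it, which is the synchronization benefit over Idea #1.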


          --- In pcgen_international@yahoogroups.com, Martijn Verburg <martijnverburg@...> wrote:
          > > (2) We should target an ability to tell us if l10n is complete for any
          > > given data set
          > >
          > Not sure what you mean by this?

          If there is an object called "Dagger" we need to ensure the file contains:
          EQUIPMENT:Dagger|Aggerd-ay

          If it contains:
          EQUIPMENT:Dgager|Aggerd-ay

          That is also an error (just like the "unconstructed reference" items are errors). By capturing both type 1 and type 2 errors (things that aren't translated as well as things that shouldn't have been translated) we are capturing the vast majority of the simple problems.
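
          The two checks amount to two set differences. A minimal sketch, assuming the data set's objects and the translation file have both already been reduced to TYPE:Name keys (the class and method names are hypothetical):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical checker, not part of PCGen.
final class L10nCompleteness {
    /** Type 1: keys present in the data set but missing from the translation file. */
    static Set<String> untranslated(Set<String> dataKeys, Map<String, String> translations) {
        Set<String> missing = new TreeSet<>(dataKeys);
        missing.removeAll(translations.keySet());
        return missing;
    }

    /** Type 2: keys in the translation file that match no object in the data set. */
    static Set<String> orphaned(Set<String> dataKeys, Map<String, String> translations) {
        Set<String> extra = new TreeSet<>(translations.keySet());
        extra.removeAll(dataKeys);
        return extra;
    }
}
```

          With the data key EQUIPMENT:Dagger and a file that only contains EQUIPMENT:Dgager, the first method reports EQUIPMENT:Dagger and the second reports EQUIPMENT:Dgager, so the typo shows up as one error of each type.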



          --- In pcgen_international@yahoogroups.com, "masaru20100" <hooya.masaru20100@...> wrote:
          >
          > In the process of updating the translation, I found another potential problem: depending on the system, some element is not translated the same way. This particular case is rank that is not translated the same way in D&D 3.x and Pathfinder.
          > Doing system dependant translation might complicate things quite a bit.

          This depends on what is being translated. If it's data items, then the system I proposed will handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a more complicated fashion.


          > If the file srd_de_ch.l10n contain “SKILL:FooBar|Oobarf-ay”, is the meaning the skill Foobar should be translated displayed as Oobarf-ay in de_ch (Swiss German?).

          Yes, obviously I was being a bit silly in using Pig Latin, but you have the idea.
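
          For concreteness, here is one way such a line could be split. This is a sketch under assumptions: the first ":" is taken to separate the object type from the name, and the first "|" after it to separate the name from the translation. The proposal does not say how names containing those characters would be handled, and the class name is invented.

```java
// Hypothetical parser for lines such as "SKILL:FooBar|Oobarf-ay"; not part of PCGen.
final class L10nLine {
    final String type;        // object type, e.g. "SKILL"
    final String key;         // name as used in the data files, e.g. "FooBar"
    final String translation; // localized display name, e.g. "Oobarf-ay"

    private L10nLine(String type, String key, String translation) {
        this.type = type;
        this.key = key;
        this.translation = translation;
    }

    /** Splits a line into type, key and translation; rejects lines with an empty part. */
    static L10nLine parse(String line) {
        int colon = line.indexOf(':');
        int pipe = line.indexOf('|', colon + 1);
        if (colon <= 0 || pipe <= colon + 1 || pipe == line.length() - 1) {
            throw new IllegalArgumentException("Malformed l10n line: " + line);
        }
        return new L10nLine(
                line.substring(0, colon),
                line.substring(colon + 1, pipe),
                line.substring(pipe + 1));
    }
}
```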

          > Is there a reason not to use ResourceBundle? Or just another already existing system, like the one used by GNU’s gettext (a Java library exist)?

          A few additional background items as I answer this. I stated earlier the original data sets should not require weird behavior. That means we don't want a data file to be:
          l10n.spell.Fireball <> SPELLSCHOOL:l10n.spellschool.Evocation <> ...

          It makes the data files completely impossible to read or debug IMHO, and would drive people completely batty to require that in our home brew data. (Talk about no one wanting to develop data anymore for PCGen...)

          Having the direct string in a file helps a LOT when there are issues we are trying to resolve. This is why you will often see developers have "message IDs" (independent of translation) in error messages so that they can just ask the end user for the message ID/error ID and look THAT up in the code using grep (rather than having to figure out which resource bundle has the string, then search the code for that resource id, etc.)

          > The reason I mention those formats, is that they are tools (meant for translators) that manage either format and ease translating work. When using another format, those tools become useless.

          We can evaluate whether we can get them to work without ambiguity. Also note that we would build the strings dynamically inside the code and there would be no list for the default language (English in most cases), so anything expecting a list of items from the default language (as a method of telling what is/is not completed) for the translator would not work. (Part of the investigation here should be how much of this would actually be usable by the translator given how we would be doing the work in the code)

          > I thought that a properties file would be used, so the file srd_de_ch.properties would contain those lines:
          > SKILL:FooBar=Oobarf-ay
          > SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
          > The last line is the DESC translation of the Foobar skill.

          > I realise there could be problem because of the : in the key (but with a \: it might disappear), but other than that, what would be the problem.

          We'd have to investigate how much of the infrastructure in Java we could use. The ResourceBundle code has defaults that require files be in certain places (which they wouldn't be in the case of our files) and we'd have to figure out how to handle that. I just haven't used it outside of its default behavior before, so just haven't read the docs or tried. There is also a complication because the default language would not require a ResourceBundle (to me this is an unyielding requirement as it would make homebrews a nightmare to make if you have to do this).
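
          For what it's worth, java.util.PropertyResourceBundle can be constructed straight from a Reader, so a properties-format file does not have to sit on the classpath or follow the bundle naming scheme. The sketch below is only a possible shape (DataTranslator and its methods are invented names): it falls back to the name from the data file when there is no bundle or no entry, so the default language needs no file at all.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical wrapper, not part of PCGen.
final class DataTranslator {
    private final ResourceBundle bundle; // null means "default language, nothing to look up"

    private DataTranslator(ResourceBundle bundle) {
        this.bundle = bundle;
    }

    /** Translator for the default language: every name is displayed as written in the data. */
    static DataTranslator defaultLanguage() {
        return new DataTranslator(null);
    }

    /** Translator backed by properties-format text read from any location. */
    static DataTranslator from(Reader source) {
        try {
            return new DataTranslator(new PropertyResourceBundle(source));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the translation for TYPE:name, or the untranslated name if there is none. */
    String displayName(String type, String name) {
        String key = type + ":" + name;
        if (bundle != null && bundle.containsKey(key)) {
            return bundle.getString(key);
        }
        return name;
    }
}
```

          In practice the Reader could come from Files.newBufferedReader on whatever path the dataset points to. Note that a key such as SKILL:FooBar has to be written as SKILL\:FooBar in the file itself.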

          > Another point, I didn’t know is that some spell name are not unique. Writing this, I think I got it, do you mean that some third party publisher create a spell “BuffMe” and another did too and you have two spells with the same name?

          No. I mean the SRD has "Foo" and "Foo", one of which is Arcane, one Psionic. (I don't remember the exact spell)

          TP.
        • Andrew
          Message 4 of 17 , Dec 13, 2011
            Hi,



            On 12/13/2011 7:23 AM, thpr wrote:
            >
            >
            >
            >
            > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>, Andrew
            > <drew0500@...> wrote:
            > > I'd vote for as complete a job as possible.
            >
            > That's a given, but that's not the question (It's not for you, it's for the folks that would do
            > the translating). The point is if we start people down the path of actually doing any translation,
            > then we need to clearly set the expectation of what will be possible. If we can't do any
            > conversion for things like DESC, then it would only cover say 60% of the data sets... so the
            > question is not do we want to finish this someday, it's do folks want to start when they know the
            > near-term (measured as 12-18 months) end game is only 60% complete on translation. That may be
            > demoralizing to start knowing that, so would we even tell people to get started?
            >

            I think realistic expectations should be explained up front - this is what we can accomplish today,
            this is what we want to accomplish in 12-18 months, and this is the future outlook. Though I agree
            it can be demoralizing, I'd rather folks know what we're doing than keep people in the dark.


            >
            > > Here is where I'm not understanding - where are we going to put the translation stuff?
            >
            > Idea #1 (bad idea IMHO) is to put it in a separate directory:
            > data/srd/...
            > l10n/srd/...
            >
            > The problem with that is that you get into synchronization issues... so any time files/directory
            > names change in one place they have to change in another. That's a contract on the data developer.
            > (It would also require multiple l10n directories, since we support multiple data directories, so
            > it's a bunch of code)
            >

            Agreed, bad idea. Less work/upkeep is better

            >
            > Idea #2: Implicit Subdirectories:
            > data/srd/*.lst
            > data/srd/l10n/*.l10n
            >
            > That ensures that the l10n directory is associated with the dataset.
            >
            > Idea #3: Explicit designation:
            > data/srd/srd.pcc contains LOCALIZATION:l10n/srd.l10n
            > then:
            > data/l10n/srd.l10n
            >
            > (would support more than one .l10n file)
            >

            I'm leaning towards #2 myself; I'm not sure why we'd need multiple files. My understanding
            of l10n is very limited, though, so if there are best practices I think we should defer to those. Besides -
            earlier you stated PCC and LST should not be aware of the l10n files, and #3 seems to contradict that.

            From a data perspective, idea #2 gives me a better grasp of what files have been started on
            translation. Idea #3 keeps it in one spot - so it really is a tough call.

            >
            > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>, Martijn
            > Verburg <martijnverburg@...> wrote:
            > > > (2) We should target an ability to tell us if l10n is complete for any
            > > > given data set
            > > >
            > > Not sure what you mean by this?
            >
            > If there is an object called "Dagger" we need to ensure the file contains:
            > EQUIPMENT:Dagger|Aggerd-ay
            >
            > If it contains:
            > EQUIPMENT:Dgager|Aggerd-ay
            >
            > That is also an error (just like the "unconstructed reference" items are errors. By capturing both
            > type 1 and type 2 error (things that aren't translated as well as things that shouldn't have been
            > translated) we are capturing the vast majority of the simple problems.
            >
            > --- In pcgen_international@yahoogroups.com <mailto:pcgen_international%40yahoogroups.com>,
            > "masaru20100" <hooya.masaru20100@...> wrote:
            > >
            > > In the process of updating the translation, I found another potential problem: depending on the
            > system, some element is not translated the same way. This particular case is rank that is not
            > translated the same way in D&D 3.x and Pathfinder.
            > > Doing system dependant translation might complicate things quite a bit.
            >
            > This depends on what is being translated. If it's data items, then the system I proposed will
            > handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a
            > more complicated fashion.
            >
            > > If the file srd_de_ch.l10n contain “SKILL:FooBar|Oobarf-ay”, is the meaning the skill
            > Foobar should be translated displayed as Oobarf-ay in de_ch (Swiss German?).
            >
            > Yes, obviously I was being a bit silly in using Pig Latin, but you have the idea.
            >
            > > Is there a reason not to use ResourceBundle? Or just another already existing system, like the
            > one used by GNU’s gettext (a Java library exist)?
            >
            > A few additional background items as I answer this. I stated earlier the original data sets should
            > not require weird behavior. That means we don't want a data file to be:
            > l10n.spell.Fireball <> SPELLSCHOOL:l10n.spellschool.Evocation <> ...
            >
            > It makes the data files completely impossible to read or debug IMHO, and would drive people
            > completely batty to require that in our home brew data. (Talk about no one wanting to develop data
            > anymore for PCGen...)
            >
            > Having the direct string in a file helps a LOT when there are issues we are trying to resolve.
            > This is why you will often see developers have "message IDs" (unique of translation) in error
            > messages so that they can just ask the end user for the message ID/error ID and look THAT up in
            > the code using grep (rather than having to figure out which resource bundle has the string, then
            > search the code for that resource id, etc.)
            >
            > > The reason I mention those formats, is that they are tools (meant for translators) that manage
            > either format and ease translating work. When using another format, those tools become useless.
            >
            > We can evaluate whether we can get them to work without ambiguity. Also note that we would build
            > the strings dynamically inside the code and there would be no list for the default language
            > (English in most cases), so anything expecting a list of items from the default language (as a
            > method of telling what is/is not completed) for the translator would not work. (Part of the
            > investigation here should be how much of this would actually be usable by the translator given how
            > we would be doing the work in the code)
            >
            > > I thought that a properties file would be used, so the file srd_de_ch.properties would contain
            > those lines:
            > > SKILL:FooBar=Oobarf-ay
            > > SKILL:FooBar.DESC=Oobarf-ay allows you to blabla
            > > The last line is the DESC translation of the Foobar skill.
            >
            > > I realise there could be problem because of the : in the key (but with a \: it might disappear),
            > but other than that, what would be the problem.
            >
            > We'd have to investigate how much of the infrastructure in Java we could use. The resourcebundle
            > code has defaults that require files be in certain places (which they wouldn't be in the case of
            > our files) and we'd have to figure out how to handle that. I just haven't used it outside of it's
            > default behavior before, so just haven't read the docs or tried. There is also complication
            > because the default language would not require a resourcebundle (to me this is an unyielding
            > requirement as it would make homebrews a nightmare to make if you have to do this)
            >
            > > Another point, I didn’t know is that some spell name are not unique. Writing this, I
            > think I got it, do you mean that some third party publisher create a spell “BuffMe” and
            > another did too and you have two spells with the same name?
            >
            > No. I mean the SRD has "Foo" and "Foo", one of which is Arcane, one Psionic. (I don't remember the
            > exact spell)
            >

            Oh, that - 'Telepathic Bond (Lesser)' is the one I think you're thinking of. I'd also point out that
            WotC has multiple feats across its books that share the same name.


            >
            > TP.
            >
            >
            > --
            > Andrew Maitland (LegacyKing)
            > Admin Silverback - PCGen Board of Directors
            > Data 2nd, Docs Tamarin, OS Lemur
            > Unique Title "Quick-Silverback Tracker Monkey"
            > Unique Title "The Torturer of PCGen"


          • hooya.masaru20100@antichef.net
            Message 5 of 17 , Dec 18, 2011
              In my opinion, the main problem is that data cannot be localized at
              the moment. So internationalization has to be done first and is the
              priority. Providing localizations is a second step and will take time,
              but right now it cannot be done at all, even by users, so
              internationalization is the main problem.

              I’ve updated the wiki with more details on many things, and also tried
              to separate internationalization problems from localization ones.

              On Tuesday, 13 December 2011 at 15:23 +0000, thpr - thpr@... wrote:
              > --- In pcgen_international@yahoogroups.com, "masaru20100" <hooya.masaru20100@...> wrote:
              > >
              > > In the process of updating the translation, I found another potential problem: depending on the system, some element is not translated the same way. This particular case is rank that is not translated the same way in D&D 3.x and Pathfinder.
              > > Doing system dependant translation might complicate things quite a bit.
              >
              > This depends on what is being translated. If it's data items, then the system I proposed will handle that. If it's NOT (meaning it's things in the UI) then we will have to address that in a more complicated fashion.

              I pointed out the problem because these are UI strings that are dataset
              dependent. In this case, both translations are probably understandable,
              but using the correct term would be better.
              Compared to data set internationalization, this is not that important,
              but if the solution could somehow handle that aspect too, it would be
              good.
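
              One possible shape for this, purely as an illustration: look up a string specific to the game system first and fall back to the generic UI string. The key@system convention, the class name and the placeholder values used with it are all made up for the example; nothing like this exists in PCGen as far as this thread says.

```java
import java.util.Map;

// Hypothetical lookup, not part of PCGen.
final class GameModeStrings {
    /**
     * Returns the string for key in the given game system if one is defined
     * (stored under "key@gameMode"), otherwise the generic string for key.
     */
    static String lookup(Map<String, String> strings, String key, String gameMode) {
        String specific = strings.get(key + "@" + gameMode);
        return specific != null ? specific : strings.get(key);
    }
}
```

              With entries for "rank" and "rank@Pathfinder", a Pathfinder dataset would get its own term while every other system keeps the generic one.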

              > > The reason I mention those formats, is that they are tools (meant for translators) that manage either format and ease translating work. When using another format, those tools become useless.
              >
              > We can evaluate whether we can get them to work without ambiguity. Also note that we would build the strings dynamically inside the code and there would be no list for the default language (English in most cases), so anything expecting a list of items from the default language (as a method of telling what is/is not completed) for the translator would not work. (Part of the investigation here should be how much of this would actually be usable by the translator given how we would be doing the work in the code)

              If there is no list of strings to translate for the default language, a
              tool that generates that list would be a must-have before starting a
              translation into any language.

              A tool that produces and updates a file in a standard format would be
              a nice addition.
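
              A minimal sketch of such a generator, assuming the object names have already been collected per type from the loaded data (how PCGen would expose them is not covered here, and the class name is invented). It writes a properties-format template whose values are the untranslated names, using Properties.store so that the : in each key is escaped automatically.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical generator, not part of PCGen.
final class TemplateGenerator {
    /** Builds properties-format text with one TYPE:Name entry per object name. */
    static String generate(Map<String, List<String>> namesByType) {
        Properties template = new Properties();
        for (Map.Entry<String, List<String>> entry : namesByType.entrySet()) {
            for (String name : entry.getValue()) {
                // The value starts out as the source name; the translator replaces it.
                template.setProperty(entry.getKey() + ":" + name, name);
            }
        }
        StringWriter out = new StringWriter();
        try {
            template.store(out, "Translation template (values are the untranslated names)");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

              Properties.store does not guarantee any entry order, so a real tool would probably want to sort the keys to keep diffs between runs readable.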

              --
              Vincent
            • masaru20100
              Message 6 of 17 , Sep 2, 2012
                Hi,

                Is there an issue in JIRA about data localization? I created a wiki page (http://wiki.pcgen.org/Internationalization) with plenty of text on the matter (including the mails in this topic), but I have no idea whether it's advancing or not.

                Regards,
                --
                Vincent
              • Martijn Verburg
                Message 7 of 17 , Sep 3, 2012
                  Yes, James has fixed a number of CODE Jiras recently - take a look at the
                  recently closed issues in that project. -K

                  On 3 September 2012 06:35, masaru20100 <hooya.masaru20100@...> wrote:

                  > **
                  >
                  >
                  > Hi,
                  >
                  > Is there an issue in JIRA about data localization. I had created a wiki
                  > page (http://wiki.pcgen.org/Internationalization) with plenty of text on
                  > the matter (including the mails in this topic), but I have no idea if it's
                  > advancing or not.
                  >
                  > Regards,
                  > --
                  > Vincent
                  >
                  >
                  >


                • James Dempsey
                  Message 8 of 17 , Sep 3, 2012
                    Hi,

                    Those CODE jiras were concerned with UI localisation. We aren't tracking
                    data localisation in JIRA yet. That project will have to wait for after
                    the 6.0 release.

                    Cheers,
                    James.

                    On 3/09/2012 6:20 PM Martijn Verburg wrote
                    > Yes, James has fixed a number of CODE Jiras recently - take a look at the
                    > recently closed issues in that project. -K
                    >
                    > On 3 September 2012 06:35, masaru20100 <hooya.masaru20100@...> wrote:
                    >
                    >> **
                    >>
                    >>
                    >> Hi,
                    >>
                    >> Is there an issue in JIRA about data localization. I had created a wiki
                    >> page (http://wiki.pcgen.org/Internationalization) with plenty of text on
                    >> the matter (including the mails in this topic), but I have no idea if it's
                    >> advancing or not.
                    >>
                    >> Regards,
                    >> --
                    >> Vincent