
Leveraging Resource files

  • sghetti1971
    Message 1 of 6, Apr 1 12:38 AM
      Hi Yves,

      I have another challenge that I have to take care of now. I am pretty sure that with some clever filtering this can be done.

      I want to normalize and localize resource files. With the text filter, I ripped out all of the UI strings with a regular expression, (("[^"]*")|('[^']*')), so everything enclosed in double or single quotes was selected.

      I generated a wonderful monolingual XLIFF file and then corrected all the target entries. The problem is that I don't seem to be able to leverage this file or the translation memory.

      Now what I need to do is segment the resource file so I can leverage the translation memory matches. Does this have to be done from the segmentation rules for the translation package utility? How do I set the rules so that I can get a rule like "segment normally, and make a special segment for strings between quotes"? Or, where can I learn how to do this?

      Parallel to this post, I'll try to figure it out myself.

      Best,
      Scott
    • sghetti1971
      Message 2 of 6, Apr 1 3:58 AM
        Hi,

        Solved this one by myself. Looks like I was able to use my custom filter to do this all along. Still don't know if it is the perfect solution though.

        Thanks,
        Scott

      • Yves Savourel
        Message 3 of 6, Apr 1 4:08 AM
          Hi Scott,

          > I want to normalize and localize resource files. With the text
          > filter, I ripped out all of the UI strings with a regular expression
          > - (("[^"]*")|('[^']*')). So everything with a "" around it was selected.

          That regex looks dangerous if you have quotes within the text. For example, if your resource file looks like this:

          str3="text with \" for str3"
          str3='text with \' for str3'

          these lines will be extracted as ["text with \"] and ['text with \']

          If your strings contain no embedded quotes, that's fine; otherwise you may have to come up with another regex for the extraction.

          Another (minor) issue is that the quotes get extracted too.
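To make the problem concrete, here is a small Python sketch that runs the original pattern over the two example lines above and compares it with an escape-aware variant. The variant is an illustration only, assuming a backslash is the only escape character in the file; it is not what any Okapi filter does internally.

```python
import re

# The pattern from the original post: a double- or single-quoted run
# that cannot contain the quote character at all.
NAIVE = re.compile(r'''(("[^"]*")|('[^']*'))''')

# Hypothetical escape-aware variant: between the quotes, accept either a
# backslash followed by any character, or any character that is neither
# the closing quote nor a backslash.
ESCAPE_AWARE = re.compile(r'"(?:\\.|[^"\\])*"' + "|" + r"'(?:\\.|[^'\\])*'")

def extract(pattern, line):
    """Return every quoted string the pattern finds, quotes included."""
    return [m.group(0) for m in pattern.finditer(line)]

lines = [
    r'''str3="text with \" for str3"''',
    r"""str3='text with \' for str3'""",
]

for line in lines:
    print("naive:       ", extract(NAIVE, line))
    print("escape-aware:", extract(ESCAPE_AWARE, line))
```

The naive pattern stops at the escaped quote, while the variant keeps the whole literal. Both still return the surrounding quotes, which is the minor issue mentioned above.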

          One possibility to fix this is the Regex Filter, which understands strings containing \" and \'. For example, a single rule like this:

          =(.*?)\n

          with the action "Extract the strings within the source group", the source group set to 1, and the string start and end both defined as ["'] (Option tab), should give you the proper extraction.
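The rule works in two stages: the rule's expression captures everything after = up to the end of the line as group 1, and the filter then extracts the strings found between the start and end characters inside that group. The Python sketch below imitates that behavior outside Okapi, assuming a string must close with the same quote character it opened with and that a backslash escapes the next character. The function names are hypothetical and this is not the Regex Filter's code. It returns the text without the surrounding quotes and leaves escape sequences such as \" as they are.

```python
import re

# Rule from the message: group 1 is everything after "=" up to the end of the line.
RULE = re.compile(r"=(.*?)\n")
QUOTES = "\"'"  # start/end characters, as in the ["'] setting

def strings_in_group(group):
    """Yield the text between a quote and the next matching, unescaped quote.

    Assumptions: a string closes with the same character it opened with, and a
    backslash escapes the character after it. Quotes are not included.
    """
    i = 0
    while i < len(group):
        quote = group[i]
        if quote not in QUOTES:
            i += 1
            continue
        j = i + 1
        while j < len(group) and group[j] != quote:
            # Skip the escaped character so \" does not end the string.
            j += 2 if group[j] == "\\" else 1
        if j >= len(group):
            return  # unterminated string: stop scanning this group
        yield group[i + 1:j]
        i = j + 1

def extract(text):
    """Apply the rule, then pull the quoted strings out of each group 1."""
    found = []
    for match in RULE.finditer(text):
        found.extend(strings_in_group(match.group(1)))
    return found
```

Because the rule needs a trailing \n, a last line without a line break would be skipped; appending a newline to the text before extraction avoids that.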

          Depending on the format of your resource files, there may be other options too. What type of resource files are they? (RC, ResX, PO, Java properties, INI, custom ones?)


          > I generated a wonderful monolingual XLIFF file and
          > then corrected all the target entries. The problem is
          > that I don't seem to be able to leverage this file
          > or the translation memory.

          I'm not sure I follow here.
          What do you mean by "corrected all the target entries"?
          Do you have two resource files? One source and one target, and want to align them to create a TM that you can use to leverage a new source file?
          Could you explain a bit more the type of input you have, and the desired output? Sorry, I'm slow in the morning :)


          > Now what I need to do is segment the resource file
          > so I can leverage the translation memory matches.
          > Does this have to be done from the segmentation rules
          > for the translation package utility. How do I set
          > the rules so that I can get a rule like "Segment normal
          > and make a special segment for strings between " and ".
          > Or, where do I learn how to do this.

          You would define segmentation rules using SRX. There is a default SRX file (actually two) in the \config directory of the distribution.
          You can use Ratel ("Tools > Edit Segmentation Rules" from Rainbow) to create or modify an SRX file.
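SRX describes segmentation as an ordered list of rules, each pairing a "before break" and an "after break" regular expression and marked either as a break rule or as a no-break exception; at each position, the first rule whose two patterns match decides. The Python sketch below shows only that core cascade. The two German rules are invented examples, not the contents of the default SRX files in \config, and a real SRX engine also handles language maps and other options this sketch ignores.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    is_break: bool  # corresponds to break="yes" / break="no" in SRX
    before: str     # <beforebreak> pattern
    after: str      # <afterbreak> pattern

# Hypothetical rules: do not break after "z.B." (a German abbreviation),
# otherwise break after ., ! or ? followed by whitespace.
RULES = [
    Rule(False, r"\bz\.B\.", r"\s"),
    Rule(True, r"[.!?]", r"\s"),
]

def segment(text, rules):
    """Split text wherever the first rule matching at a position is a break rule."""
    breaks = []
    for pos in range(1, len(text)):
        for rule in rules:
            # The before-pattern must end exactly at pos; the after-pattern must start there.
            ends_here = re.search("(?:" + rule.before + r")\Z", text[:pos])
            starts_here = re.match(rule.after, text[pos:])
            if ends_here and starts_here:
                if rule.is_break:
                    breaks.append(pos)
                break  # first matching rule wins, break or not
    segments, start = [], 0
    for pos in breaks:
        segments.append(text[start:pos])
        start = pos
    segments.append(text[start:])
    return segments
```

With these rules, "Das ist z.B. ein Test. Noch ein Satz." yields two segments: the exception stops a break after "z.B.", and the general rule breaks after "Test.".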

          But before you do that, you may want to explain the part above in more detail. Without understanding what you are trying to achieve, I can't be sure the issue has to do with segmentation.

          Cheers,
          -yves
        • sghetti1971
          Message 4 of 6, Apr 1 5:07 AM
            Hi Yves,

            I would like to know more. All strings in the documents I am using are enclosed in "" in the resource files.

            > Depending on the format of your resource files, there may be other
            > options too. What type of resource files are they? (RC, ResX, PO,
            > Java properties, INI, custom ones?)

            I am parsing MS .rc files

            > I'm not sure I follow here.
            > What do you mean by "corrected all the target entries"?
            > Do you have two resource files? One source and one target, and want to align them to create a TM that you can use to leverage a new source file?
            > Could you explain a bit more the type of input you have, and the desired output? Sorry, I'm slow in the morning :)

            I'm going from German to German, so there is just one .rc file. I create an XLIFF file, go through it and make the changes to the German target text, and then use the manifest file to create the output file.

            Somehow this process works using my regex. But if there is a better way, I would like to know how.
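The extract, edit and merge cycle described above works because the tool remembers where each extracted string came from; in this workflow, the manifest file drives the merge step. As a rough illustration of the idea only, here is a Python sketch in which extraction records each string's offsets and merging writes the edited German text back at those offsets, leaving everything else unchanged. The function names are hypothetical, and it assumes simple double-quoted strings with no embedded quotes, which real RC files do not guarantee.

```python
import re

# Assumption: one double-quoted string with no embedded quotes (the original
# pattern restricted to double quotes). Group 1 is the text without the quotes.
STRING = re.compile(r'"([^"]*)"')

def extract(text):
    """Return (start, end, source) for each string; offsets exclude the quotes."""
    return [(m.start(1), m.end(1), m.group(1)) for m in STRING.finditer(text)]

def merge(text, units, targets):
    """Rebuild the file, writing each edited target at its source's offsets."""
    if len(units) != len(targets):
        raise ValueError("need exactly one target per extracted string")
    parts, last = [], 0
    for (start, end, _source), target in zip(units, targets):
        parts.append(text[last:start])  # untouched text before the string
        parts.append(target)            # edited string; the original quotes stay in place
        last = end
    parts.append(text[last:])           # untouched text after the last string
    return "".join(parts)
```

Merging the unchanged sources gives back the original file byte for byte, which is a quick sanity check before merging real edits.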

            Best,
            Scott



          • Yves Savourel
            Message 5 of 6, Apr 1 6:06 AM
              > I am parsing MS .rc files

              We have not ported the RC filter to the Java-based Okapi tools (too little demand for that yet).
              But there is a filter for RC files in Rainbow v5.

              Using a true RC filter instead of a regular-expression-based filter may be a lot safer: RC files can be quite complex if they contain more than string tables. For example, some translatable text, such as labels in combo boxes, is stored in binary blocks that are not accessible without decoding.

              The Rainbow v5 download is available here:
              http://okapi.sourceforge.net/downloads.html

              You can extract/merge to/from XLIFF as well.


              > I'm going from German to German. So there is just one .rc file.
              > I create an XLIFF, I go through and make the changes to the
              > German target text and then use the manifest file to create
              > the out file.

              I'm probably missing something, but if you are just editing the source text, wouldn't it be just as easy to edit the RC files directly? Another solution would be to use a translation editor that supports RC, like (I think) OmegaT, or one of the free RC editors available on Windows. I think MS Studio also has a free, dumbed-down edition with a full RC editor. But you would know what's best for your needs.

              -ys
            • Didier Briel
              Message 6 of 6, Apr 1 6:16 AM
                -----Original Message-----
                >From: okapitools@yahoogroups.com [mailto:okapitools@yahoogroups.com]On Behalf Of Yves Savourel
                >Sent: Friday, April 01, 2011 3:07 PM
                >To: okapitools@yahoogroups.com
                >Subject: RE: [okapitools] Re: Leveraging Resource files

                >I'm probably missing something, but if you are just editing the source text, wouldn't it be as easy to edit directly the RC files? Another solution would be to use a translator editor that supports RC, like (I think) OmegaT.

                Yes, OmegaT supports RC files since 2.1.3.

                The filter is not very complex, but not trivial either. I wouldn't try to do that with a single regexp (we're using 5 different ones).

                For instance, I see this in the code comments:
                /*
                * Some software produce escaped quotes, but valid are only
                * double quotes
                */
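The comment above concerns how a double quote is written inside an RC string literal: the form RC expects is a doubled quote (""), while some tools emit a backslash-escaped quote (\") instead. As an illustration only, and not OmegaT's filter (which uses several expressions), here is a Python sketch of a single pattern that tolerates both forms and normalizes them to a plain quote. Other escape sequences are deliberately left untouched, and the function names are hypothetical.

```python
import re

# One RC string literal. Inside the quotes we accept:
#   ""   the doubled quote, the form RC expects for an embedded quote
#   \.   a backslash escape, including the \" some tools produce
#   any other character except a lone quote or backslash
RC_STRING = re.compile(r'"((?:""|\\.|[^"\\])*)"')

def unescape_quotes(body):
    """Replace doubled and backslash-escaped quotes with a plain quote.

    Other escapes, such as a backslash followed by n, are kept as written.
    """
    return re.sub(r'""|\\.',
                  lambda m: '"' if m.group(0) in ('""', '\\"') else m.group(0),
                  body)

def rc_strings(text):
    """Return the text of every string literal, with embedded quotes normalized."""
    return [unescape_quotes(m.group(1)) for m in RC_STRING.finditer(text)]
```

On a line such as IDS_A "Click ""OK"" to continue", this returns one string, Click "OK" to continue, whereas the single-regex approach from the first message would split it into three quoted pieces.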

                Didier