
Message About Extraction from Bilingual XML Documents

  • Alan Amos
    Message 1 of 6 , Feb 28, 2012

      Hello,

      I would like to be able to use a program such as OmegaT to translate texts that have already been partly translated and are in a bilingual XML format.
      (Yes, I know that this is not considered good working practice, but the company owner wants it done this way.)

      Ideally, I would like to create an OmegaT project from the bilingual files in which the already-translated text elements are written to a file in the OmegaT “target” folder, as mentioned in the Rainbow documentation.

      At the moment my source document starts off like this:

       

      <?xml version="1.0" encoding="UTF-8"?>
      <documentation
          xmlns:xi="http://www.w3.org/2001/XInclude" xmlns:xhtml="http://www.w3.org/1999/xhtml">
        <!-- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:noNamespaceSchemaLocation="J:/Data/doku/sos_docu.xsd"
        -->
        <title language="de">JobScheduler - Quickstart</title>
        <title language="en">JobScheduler - Quickstart</title>
        <subtitle language="de">Einstieg in das Job Scheduling</subtitle>
        <subtitle language="en">Introduction to Job Scheduling</subtitle>
        <xi:include href="../../globals/author_oh.sosdoc" parse="xml" />
        <xi:include href="../../globals/company.sosdoc" parse="xml" />
        <topics>
          <topic id="quickstart_introduction">
            <title language="en">Introduction</title>
            <title language="de">Einleitung</title>
            <description>
              <p language="en">
                The <JobScheduler /> has been installed and can be started. What next?
              </p>
              <p language="de">
                Der <JobScheduler /> ist installiert und konnte gestartet werden. Was nun?
              </p>
            </description>

       

      Even if I replace all the language="xx" attributes with xml:lang="xx" and place all the source-language elements before the target ones, Rainbow does not associate the source and target elements with one another: it treats them all as source.

       

      If I use

      <its:translateRule selector="//*[@language='en']" translate="no" />

      (or …@xml:lang='en'…), I can generate a project with my source text elements in the OmegaT “source” folder, but this is a bit like throwing out half the baby with the bathwater: I lose the already-translated content.
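
      For reference, a translateRule like this does not stand alone: it sits inside an ITS rules document, which is what Okapi's XML filter reads from its parameters (.fprm) file, as with the test_extract.fprm file mentioned later in this thread. A minimal sketch of a complete extraction rules file, assuming ITS 1.0 and the thread's non-standard language attribute (the layout is illustrative, not copied from Alan's actual file):

      ```xml
      <?xml version="1.0" encoding="UTF-8"?>
      <!-- Sketch: extraction rules (protect English, extract German).
           Assumes ITS 1.0; element content not matched below stays
           translatable, which is the ITS default. -->
      <its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its">
        <!-- translate="no" is inherited by descendant elements, so any
             markup nested inside an English element is protected as well -->
        <its:translateRule selector="//*[@language='en']" translate="no"/>
        <!-- Variant if the files are switched to standard language tagging;
             the xml: prefix is always bound, so no extra declaration is needed:
        <its:translateRule selector="//*[@xml:lang='en']" translate="no"/>
        -->
      </its:rules>
      ```
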

       

      Can you suggest how I can get this done?

       

      Many thanks in advance

       

      Alan




      --
      Alan Amos
      Web Developer and Translator
      Borkumer Str. 13
      14199 Berlin
      T.  +49 30 70220129
      M. +49 178 6615112
      E. Alan-A@...
      I.  www.alanamos.de

    • Yves Savourel
      Message 2 of 6 , Feb 28, 2012
        Hi Alan,

        > ...I can generate a project with my source text
        > elements in the OmegaT “source” folder but this
        > is a bit like throwing out half of the baby with
        > the bathwater: I lose the already translated content.

        The rule <its:translateRule selector="//*[@language='en']" translate="no" /> should get you all the English entries protected. The only text to go to translation would be the part not with language='en', so the German.

        However, you'll see both "German" entries that are actually in English (like "JobScheduler - Quickstart") and German entries that are already in German (like "Einstieg in das Job Scheduling").

        You would have to go through the extracted entries, translate the English and copy the German. That shouldn't throw away your existing German; it would just create a useless TM for the future, since many entries will be German->German.

        ITS does not have an official way to deal with bilingual files. There is a proposal for a targetPointer rule that would allow you to define both a source and where its corresponding target element is (See http://www.w3.org/International/its/wiki/IssuesAndProposedFeatures#Proposal:_targetPointer)
        But I don't think any supported tool has implemented it so far.

        I see in our code that we've started to implement it in the XML filter, but so far haven't had the time to complete that feature.
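
        For readers finding this thread later: the targetPointer proposal was eventually standardized in ITS 2.0 as the its:targetPointerRule element. A minimal sketch for the de/en pairs in this thread, assuming the German element is the source and its English counterpart is the immediately following sibling (true for the <p> pair Alan quotes later, but not for the <title> pair in his first message, where the English comes first). Whether a particular Okapi release honors this rule is not established here:

        ```xml
        <?xml version="1.0" encoding="UTF-8"?>
        <!-- Sketch only: needs an ITS 2.0 processor that implements Target Pointer -->
        <its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its">
          <!-- selector: absolute XPath to the source (German) elements.
               targetPointer: XPath relative to each selected node, pointing at
               the element that holds (or will receive) the English text.
               Assumption: the English element is the next sibling. -->
          <its:targetPointerRule selector="//*[@language='de']"
                                 targetPointer="following-sibling::*[1][@language='en']"/>
        </its:rules>
        ```
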

        Cheers,
        -yves
      • Alan Amos
        Message 3 of 6 , Feb 29, 2012
          Hello Yves,

          Many thanks for your quick reply - you've helped me clear up my thoughts a bit.
          I'll concentrate on getting the source text correctly processed.

          Best wishes,

          Alan


        • Alan
          Message 4 of 6 , Apr 20 2:32 AM
            Hello Yves,

            When I first started this thread, I wanted to extract both the source and (where present) target texts from a single bilingual XML file to XLF so I could use them in a translation program like OmegaT. What I hadn't worked out was how to merge a translated text back into the appropriate tags in the bilingual file.

            I'm writing this now because I didn't find instructions for this merge-back anywhere else (please don't hesitate to tell me if I've simply overlooked them), and because it took me quite a bit of puzzling to work out how to do it.

            As I wrote in my first mail in this thread, the source and target text blocks in the bilingual files are identified by XML attributes (in my case the non-standard language="de" and language="en" respectively) and typically look like this:
            <p language="de">
            Der <JobScheduler /> ist installiert und konnte gestartet werden. Was nun?
            </p>
            <p language="en">
            The <JobScheduler /> has been installed and can be started. What next?
            </p>
            I'm using Tikal to extract the source text (German, in my case) from the bilingual file and convert it to XLF, which my translation program (OmegaT) can work with. I have the following translateRule in the test_extract.fprm file I use for the extraction:
            <its:translateRule selector="//*[@language='en']" translate="no" />
            This means that any already-translated English text in the XML file is ignored during extraction; Yves outlined a way around this in the second mail in this thread. (For a variety of reasons it hasn't been worthwhile for me to try this out yet.)

            However, the tricky bit was working out that once I'd done the translation, I had to use a different translateRule to merge the translated (English) XLF file into the English blocks of the bilingual file:
            <its:translateRule selector="//*[@language='de']" translate="no" />
            To keep things simple, I put this in a new FPRM file, which I named test_merge.fprm.
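
            For comparison with the extraction rules, the merge-time rules file might look like the sketch below, assuming (as for extraction) that the filter parameters are an ITS 1.0 rules document. This presumably works because the filter ignores language information, as Yves confirms in the next message: on merge the German is protected, so the English elements become the translatable units and the translated segments are written into them one after the other. That in turn assumes every German element has exactly one English counterpart, in the same order; a missing or reordered pair would shift every later segment.

            ```xml
            <?xml version="1.0" encoding="UTF-8"?>
            <!-- Sketch: merge rules (test_merge.fprm in this thread).
                 Mirrors the extraction rule, but protects the German instead,
                 so the English elements receive the translated text. -->
            <its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its">
              <its:translateRule selector="//*[@language='de']" translate="no"/>
            </its:rules>
            ```
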

            I now have a set-up in which I can carry out basic translation using a bilingual XML source/target document, a CAT tool such as OmegaT, and XLF as an intermediate file format. There are a few bugs, but I think they belong in a different thread.

            One question, however: am I right in stating that even if I change to standard xml:lang="xx" tags, Tikal simply "pours" the translated text segments into the XML file one after the other and ignores such potentially useful language tags in the target file when it does the merge? After all, it knows what the target language is from the "-tl" parameter.

            Best wishes,

            Alan


          • Yves Savourel
            Message 5 of 6 , Apr 20 4:41 AM
              Hi Alan,

              > One question, however: am I right in stating that
              > even if I change to standard xml:lang="xx" tags, Tikal simply
              > "pours" the translated text segments into the XML file
              > one-after-the-other and ignores such potentially useful
              > language tags in the target file when it is doing
              > the merge? After all, it knows what the target language
              > is from the "-tl" parameter.

              That is correct. The XML Filter is not looking at any language-specific information currently. The extraction/merge is solely based on the translateRules of the ITS file, which may or may not be related to language information.

              For example, your rules could be <its:translateRule selector="//*[@Status='Draft']" translate="no" /> or <its:translateRule selector="//*[@xml:lang='en']" translate="no" />; it makes no difference.

              I hope we'll be able to implement the ITS targetPointer proposal described here: http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#targetPointer
              Okapi is involved in this MultilingualWeb work, so it's likely to happen in the coming months.
              It would allow you to work more efficiently with multilingual documents.

              Cheers,
              -yves
            • Alan
              Message 6 of 6 , Apr 20 6:31 AM
                Hello Yves,

                Many thanks for the explanation,

                Best wishes,

                Alan
