Re: [OmT] How to convert XLS in TBX and produce TMX from bilingual HTML files

  • Martin Wunderlich
    Message 1 of 18, Mar 18 9:44 AM
      Hi Didier,

      >
      >
      > > As it turns out, the Java solution isn't actually all that complicated. I just
      > > started coding away and I am already half-way there to be able to provide a
      > > tool to convert Linguee tables to TMX. I will probably have this finished up in
      > > the next couple of days.
      > >
      > > A question to the other developers: Would this be an interesting enough
      > > feature to include in the core product, i.e. to be able to convert Linguee
      > > tables to TMX?
      >
      > Do you mean, once the tables are downloaded locally, or doing it online?
      >

      Both variants would work. The input would be a pointer to an HTML file.


      >
      > >The Java library Jsoup is published under the MIT license.
      > > Does anyone know if this would be compatible with the OmegaT license?
      >
      > Yes, it is.
      >
      > Didier
      >

      Good to know! :)

      Cheers,

      Martin



    • Didier Briel
      Message 2 of 18, Mar 26 10:13 AM
        > -----Original Message-----
        > From: OmegaT@yahoogroups.com [mailto:OmegaT@yahoogroups.com] On
        > Behalf Of Martin Wunderlich
        > Sent: Monday, March 18, 2013 5:45 PM
        > To: OmegaT@yahoogroups.com
        > Subject: Re: [OmT] How to convert XLS in TBX and produce TMX from
        > bilingual HTML files


        > > > As it turns out, the Java solution isn't actually all that
        > > > complicated. I just started coding away and I am already half-way
        > > > there to be able to provide a tool to convert Linguee tables to TMX.
        > > > I will probably have this finished up in the next couple of days.
        > > >
        > > > A question to the other developers: Would this be an interesting
        > > > enough feature to include in the core product, i.e. to be able to
        > > > convert Linguee tables to TMX?
        > >
        > > Do you mean, once the tables are downloaded locally, or doing it online?
        > >
        >
        > Both variants would work. The input would be a pointer to an HTML file.

        So, you would have a dialog to either enter a URL, or to browse a local file?

        Didier
      • Martin Wunderlich
        Message 3 of 18, Mar 30 7:48 AM
          Hi all,

          The Easter bunny has brought a new tool for the OmegaT toolbox: You can use "HTML2TMX" to convert any HTML table to a TMX file (v1.4). This can be useful, for instance, when you want to do a concordance search on Linguee for certain terms and import the sentence results with translations into your project. Or you might have some bilingual sentences in HTML format that need to be imported, as in Roman's case here.
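
A rough idea of what such a conversion involves: read the table row by row, take one cell as the source segment and the next as the target, and emit one TMX translation unit per row. The sketch below is not the HTML2TMX code (which, per this thread, is Java and uses Jsoup); it is a minimal Python illustration using only the standard library, assuming the first column holds the source text and the second the target. The class and function names and the `creationtool` value are hypothetical.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def _end_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse internal whitespace and line breaks to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        self._end_cell()
        if self._row is not None:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._end_row()   # tolerate an omitted </tr>
            self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # tolerate an omitted </td>
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()
        self._end_row()  # flush a row left open at the end of the input


def html_table_to_tmx(html, src_lang, tgt_lang):
    """Turn each row whose first two cells are non-empty into one <tu>."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()

    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "html-table-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "html",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for row in parser.rows:
        if len(row) < 2 or not row[0] or not row[1]:
            continue  # skip rows lacking a source or target cell
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, row[0]), (tgt_lang, row[1])):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

Call it as `html_table_to_tmx(page_html, "de", "en")` on the text of a saved HTML file. Rows without both cells are skipped, and the seven `header` attributes are the ones TMX 1.4 lists as required.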

          I have published the files for "HTML2TMX" here
          http://www.martinwunderlich.com/download/HTML2TMX.zip
          and here
          http://f1.grp.yahoofs.com/v1/qPpWUUzYC6Z2o-_-NzWpv9mLWrM5pyFMSoltE_gKv22YFl6wsbzJo7fgAboVqmCcR9XgzZZUhYYll2L65FsRFGrDx8JpQTM/5-%20Macros%20and%20tools/Other%20things/HTML2TMX.zip

          and the source code (under LGPL) here:
          https://github.com/mwunderlich/HTML2TMX/tree/master/src

          Looking forward to the feedback (also on the code). If there are enough people interested, I would also offer to integrate this feature in OmegaT directly and put a little GUI around it (at the moment, it only has a command line interface).

          Cheers,

          Martin





          On 17.03.2013 at 13:42, roman.mironov@... wrote:

          > Thank you very much for the feedback, everyone.
          >
          > > As it turns out, the Java solution isn't actually all that complicated. I just started coding away and I am already half-way there to be able to provide a tool to convert Linguee tables to TMX. I will probably have this finished up in the next couple of days.
          > Thank you, Martin. I appreciate this. I sent you a sample HTML off-list.
          >
          > >What about creating several CSV files out of your XLS, deleting the columns for extra languages, and using {tab} as the field delimiter and {nothing} as the text delimiter? Rename the resulting CSV files to TXT and you're good to go. Not as comprehensive as TBX, but quick and easy.
          >
          > >Regarding point 1:
          > > A csv2tbx utility is also part of the Translate Toolkit: http://docs.translatehouse.org/projects/translate-toolkit/en/latest/commands/csv2tbx.html
          >
          > These are both valid suggestions, thank you. What I'm looking for at this point is a multilingual TBX, though. As far as I understand, none of the suggested methods can make that possible. Perhaps, eventually, I would go with a simple TXT option, but right now, I don't want to give up on finding a converter that creates multilingual TBX files.
          >
          > Good luck,
          > Roman
          >
          >
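
For the multilingual TBX Roman is after, one workable route is to save the spreadsheet as CSV with the language codes in the first row and one concept per following row. The Python sketch below assumes that layout; it writes one `termEntry` per row and one `langSet` per non-empty language cell, following the `martif` element layout of TBX (ISO 30042). The function name and the `c1`, `c2`, ... entry IDs are hypothetical, and the output has not been checked against a particular TBX dialect such as TBX-Basic.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_LANG = "xml:lang"  # ElementTree writes this attribute name as-is


def csv_to_tbx(csv_text):
    """Build a multilingual TBX (martif) document from CSV text.

    Assumed layout: the first row holds language codes (e.g. de,en,fr);
    every following row is one concept with one term per language column.
    """
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    langs = [code.strip() for code in rows[0]]

    martif = ET.Element("martif", {"type": "TBX", XML_LANG: langs[0]})
    file_desc = ET.SubElement(ET.SubElement(martif, "martifHeader"), "fileDesc")
    source_desc = ET.SubElement(file_desc, "sourceDesc")
    ET.SubElement(source_desc, "p").text = "Converted from CSV"
    body = ET.SubElement(ET.SubElement(martif, "text"), "body")

    for n, row in enumerate(rows[1:], start=1):
        terms = [(lang, cell.strip()) for lang, cell in zip(langs, row) if cell.strip()]
        if not terms:
            continue  # blank line in the export
        entry = ET.SubElement(body, "termEntry", id=f"c{n}")  # hypothetical ID scheme
        for lang, term in terms:
            # A missing translation simply produces no langSet for that language.
            lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(martif, encoding="unicode")
```

Unlike the tab-delimited TXT route, this keeps every language column of the spreadsheet in a single file.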



        • Jean-Christophe Helary
          Message 4 of 18, Mar 30 8:04 AM
            On Mar 30, 2013, at 11:48 PM, Martin Wunderlich <martin_wu@...> wrote:

            > Looking forward to the feedback (also on the code). If there are enough people interested, I would also offer to integrate this feature in OmegaT directly and put a little GUI around it (at the moment, it only has a command line interface).

            Something I'd love to see is support for CSV files (UTF-8 as well as UTF-16) where the first line gives the language code and the other lines the source/target data...

            You can't imagine how many times I have to convert multilingual Excel files to UTF-8 CSV in LibreOffice (Excel mixes the encoding badly) after removing all the columns I don't need and then use CSVconverter to put that into a TMX that I can use in OmegaT...
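
A converter for the CSV layout described here (first row = language codes, following rows = aligned segments) could look roughly like this Python sketch. It assumes the first column is the source language, takes raw bytes so that UTF-8 (with or without a BOM) and UTF-16 files are decoded explicitly, and drops rows that lack the source segment or any translation. `csv_bytes_to_tmx` and its parameters are hypothetical, not an existing OmegaT or Heartsome API.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_bytes_to_tmx(data, encoding="utf-8-sig", delimiter=","):
    """Convert a multilingual CSV export into a TMX 1.4 string.

    Assumed layout: the first row holds language codes, the first column is
    the source language, and every following row is one aligned segment.
    "utf-8-sig" accepts UTF-8 with or without a BOM; use "utf-16" for
    UTF-16 files, where the BOM tells Python the byte order.
    """
    text = data.decode(encoding)
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    langs = [code.strip() for code in rows[0]]

    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "csv-tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "csv",
        "adminlang": "en",
        "srclang": langs[0],
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for row in rows[1:]:
        segs = [(lang, cell.strip()) for lang, cell in zip(langs, row) if cell.strip()]
        # Keep the row only if it has the source segment plus a translation.
        if len(segs) < 2 or segs[0][0] != langs[0]:
            continue
        tu = ET.SubElement(body, "tu")
        for lang, seg in segs:
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = seg
    return ET.tostring(tmx, encoding="unicode")
```

Read the file in binary mode and pass `encoding="utf-16"` for UTF-16 exports, or `delimiter="\t"` for tab-separated ones; unused language columns are simply carried along rather than having to be deleted first.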


            Jean-Christophe Helary
            ----------------------------------------
            fun: http://mac4translators.blogspot.com
            work: http://www.doublet.jp (ja/en > fr)
            tweets: http://twitter.com/brandelune
          • Martin Wunderlich
            Message 5 of 18, Apr 4, 2013
              Hi JC,

              I would have thought such a basic feature was already implemented, but I just checked, and it seems that OmegaT doesn't have an option to import glossary files. Am I missing something? You say that you convert the CSV to a TMX. Does that mean the CSV contains sentence data? Usually, CSV/Excel files contain glossary data or terminology. But if I understand you correctly, you're referring to using CSV as a source file format for translation?

              Cheers,

              Martin

            • Jean-Christophe Helary
              Message 6 of 18, Apr 4, 2013
                Martin,

                I am referring to the use of CSV as an intermediate format between an alignment process and a proper TMX.

                Simple (or not) alignments are often made in a two-column format, and exporting that to CSV is usually trivial. I mention CSV because we have the free CSVConverter utility from Heartsome, which converts such a data set to a usable TMX, but it could be anything that we have a converter for.

                Skipping the conversion step would be wonderful: just put the CSV file in /tm and work from there...

                Jean-Christophe

              • Didier Briel
                Message 7 of 18, Apr 4, 2013
                  > -----Original Message-----
                  > From: OmegaT@yahoogroups.com [mailto:OmegaT@yahoogroups.com] On
                  > Behalf Of Jean-Christophe Helary
                  > Sent: Thursday, April 04, 2013 5:40 PM
                  > To: OmegaT@yahoogroups.com
                  > Subject: Re: [OmT] How to convert XLS in TBX and produce TMX from
                  > bilingual HTML files
                  >
                  > Martin,
                  >
                  > I am referring to the use of CSV as an intermediate format between an
                  > alignment process and a proper TMX.
                  >
                  > Simple (or not) alignments are often made in a 2 column format and
                  > exporting that to CSV is usually trivial. I mention CSV because we have the
                  > free CSVConverter utility from Heartsome that allows us to convert such a
                  > data set to a usable TMX, but it could be anything that we have a converter
                  > for.
                  >
                  > Skipping the conversion step would be wonderful: just put the CSV file in /tm
                  > and work from there...

                  Create an RFE (request for enhancement).

                  That said, we can now search in glossaries (in the development version), which is one less reason to convert CSVs to TMX.

                  Didier
                • roman.mironov@ymail.com
                  Message 8 of 18, Apr 8, 2013
                    Thank you for the input, everyone!

                    Paul, thank you for the suggestion about Anchovy. It appears to be what I needed.

                    Martin, thank you for your fantastic help!

                    Best wishes,
                    Roman