
Re: [edict-jmdict] OT: epwing format

  • Darren Cook
    Message 1 of 8 , Feb 17, 2008
      > DC> Not directly related to jmdict, but does anyone know of any tools (*) to
      > DC> dump all the data from an epwing dictionary into some kind of text ...

      > http://www31.ocn.ne.jp/~h_ishida/EBDump.html

      Thanks, that program turns out to do exactly what I needed. It was
      confusing until I realized you have to say how many blocks you want to
      export.

      So, to export edict's epwing file [1], here are detailed instructions
      (Jim, I've tried to make them understandable even if you're seeing
      mojibake; let me know if something is unclear):

      1. Open EBDump

      2. Open edict's HONMON file

      3. In the main window in top-left you see "[00]本文". Click that. Just
      above (and to the right) it says "blks=10314".

      4. To the right is an input box (the one with little up/down arrows).
      It is defaulting to 1. Type 10314 in there.

      5. Beneath that (roughly in the middle of the app) is a choice between
      記述子(1), plain-text (2) and HTML. Choose plain-text.

      6. At the bottom you see the filename where it will write the dump
      file to. Just above that are a couple of checkboxes. Uncheck both (one
      says open the dumped file in notepad, which is unwise as it is big, the
      other says delete the file when closing).

      7. 2nd option in the 2nd menu (i.e. Alt-T, then T) to do the text dump.

      Then copy the file to a linux system and convert it from Shift-JIS to
      UTF-8 :-).
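For the conversion step, a minimal Python sketch follows. It assumes the dump really is Shift-JIS as produced on a Japanese Windows system, so it decodes with `cp932` (Microsoft's Shift-JIS superset); the function name and file names are made up for illustration.

```python
def sjis_to_utf8(src_path, dst_path, src_encoding="cp932"):
    """Re-encode a dump file from Shift-JIS (cp932) to UTF-8.

    Works on raw bytes so the original line endings are preserved.
    Returns the number of characters converted.
    """
    with open(src_path, "rb") as src:
        # Raises UnicodeDecodeError if the file is not valid cp932,
        # which is a useful signal that the dump used another encoding.
        text = src.read().decode(src_encoding)
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("utf-8"))
    return len(text)


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python sjis_to_utf8.py dump.txt dump.utf8.txt
    if len(sys.argv) == 3:
        print(sjis_to_utf8(sys.argv[1], sys.argv[2]), "characters converted")
```

On a Linux box, `iconv -f CP932 -t UTF-8` should do the same job; the strict decode above just makes it obvious when the dump is not actually Shift-JIS.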

      The format for the first 222,580 lines is 2 lines per entry:
      KANJI [HIRAGANA]
      Type English
      (where Type is "n", "adv", etc.)
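A rough sketch of reading that two-line layout, once the file is UTF-8. It assumes every entry is exactly two lines, that the reading sits in square brackets at the end of the first line, and that the type is the first whitespace-separated token of the second line. The function and field names are hypothetical, and entries with no bracketed reading are kept with `reading` set to `None`.

```python
import re

# "KANJI [HIRAGANA]" on the headword line (assumed layout).
HEAD_RE = re.compile(r"^(?P<kanji>.+?)\s*\[(?P<reading>[^\]]*)\]\s*$")


def parse_two_line_entries(lines):
    """Pair up consecutive lines into (headword line, gloss line) entries."""
    lines = [line.rstrip("\r\n") for line in lines]
    entries = []
    # Step through the lines two at a time; a trailing odd line is ignored.
    for i in range(0, len(lines) - 1, 2):
        head, body = lines[i].strip(), lines[i + 1].strip()
        match = HEAD_RE.match(head)
        if match:
            kanji, reading = match.group("kanji"), match.group("reading")
        else:
            # No bracketed reading found: keep the whole line as the headword.
            kanji, reading = head, None
        # First token is taken as the type ("n", "adv", ...), the rest as English.
        parts = body.split(None, 1)
        entry_type = parts[0] if parts else ""
        english = parts[1] if len(parts) > 1 else ""
        entries.append(
            {"kanji": kanji, "reading": reading, "type": entry_type, "english": english}
        )
    return entries
```

Only the first 222,580 lines of the dump should be fed to this, since the rest of the file uses the other layout.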

      The last 100,000 lines use a single line per entry:
      ... English1
      KANJI1 [HIRAGANA1] English2
      KANJI2 [HIRAGANA2] ...

      (i.e. English is for Japanese on next line.) These last 100,000 lines
      are the same data, so I think they are for the reverse lookup.

      Export of GG5 epwing format also worked, but the format is different.
      E.g. hiragana comes first then kanji in brackets then [zB667] then the
      romaji in parentheses.

      So it seems in epwing the format of the actual data is fairly free form,
      and structure is supplied by the hard-coded indices.

      Darren

      [1]: http://www.hloeffler.info/epwing/
    • Jim Breen
      Message 2 of 8 , Mar 3, 2008
        On 18/02/2008, Darren Cook <darren@...> wrote:
        > Thanks, that program turns out to do exactly what I needed. It was
        > confusing until I realized you have to say how many blocks you want to
        > export.
        >
        > So, to export edict's epwing file [1], here are detailed instructions
        > (Jim, I've tried to make them understandable even if you're seeing
        > mojibake; let me know if something is unclear):

        I tried it with a couple of EB disks I have.

        > 1. Open EBDump
        >
        > 2. Open edict's HONMON file
        >
        > 3. In the main window in top-left you see "[00]本文". Click that. Just
        > above (and to the right) it says "blks=10314".

        Well, I saw "[00]!@#$" so that's what I tried.

        > 4. To the right is an input box (the one with little up/down arrows).
        > It is defaulting to 1. Type 10314 in there.
        >
        > 5. Beneath that (roughly in the middle of the app) is a choice between
        > 記述子(1), plain-text (2) and HTML. Choose plain-text.
        >
        > 6. At the bottom you see the filename where it will write the dump
        > file to. Just above that are a couple of checkboxes. Uncheck both (one
        > says open the dumped file in notepad, which is unwise as it is big, the
        > other says delete the file when closing).
        >
        > 7. 2nd option in the 2nd menu (i.e. Alt-T, then T) to do the text dump.
        >
        > Then copy the file to a linux system and convert it from Shift-JIS to
        > UTF-8 :-).

        Total lack of success (this is running on Win98SE). The EB file
        was read and a file was written, but it wasn't in Shift-JIS; it was in
        the pseudo ISO2022JP coding that is used inside EBs. There was a menu
        item defaulting to "JIS", so I tried changing that to SHIFT-JIS, but it
        made no difference.

        Wonder why it works for you and not for me?

        Cheers

        Jim

        --
        Jim Breen
        Honorary Senior Research Fellow
        Clayton School of Information Technology,
        Monash University, VIC 3800, Australia
        http://www.csse.monash.edu.au/~jwb/
      • Darren Cook
        Message 3 of 8 , Mar 4, 2008
          >> So, to export edict's epwing file [1], here are detailed instructions
          >> (Jim, I've tried to make them understandable even if you're seeing
          >> mojibake; let me know if something is unclear):
          >
          > I tried it with a couple of EB disks I have.

          Hi Jim,
          Did you try it with the edict epwing file? The other one that worked for
          me was GG5 dictionary. I don't have any others to experiment with.

          > Total lack of success (this is running in on Win98SE). ...

          I was running on Windows XP, Japanese version. Perhaps it isn't
          supported on Win 98?

          Darren


          --
          Darren Cook
          http://dcook.org/mlsn/ (English-Japanese-German-Chinese free dictionary)
          http://dcook.org/work/ (About me and my work)
          http://dcook.org/work/charts/ (My flash charting demos)