> DC> Not directly related to jmdict, but does anyone know of any tools (*) to
> DC> dump all the data from an epwing dictionary into some kind of text ...
> http://www31.ocn.ne.jp/~h_ishida/EBDump.html
Thanks, that program turns out to do exactly what I needed. It was
confusing until I realized you have to say how many blocks you want to
export.
So, to export edict's epwing file [1], here are detailed instructions
(Jim, I've tried to make them understandable even if you're seeing
mojibake; let me know if something is unclear):
1. Open EBDump
2. Open edict's HONMON file
3. In the main window in top-left you see "[00]本文". Click that. Just
above (and to the right) it says "blks=10314".
4. To the right is an input box (the one with little up/down arrows).
It is defaulting to 1. Type 10314 in there.
5. Beneath that (roughly in the middle of the app) is a choice between
記述子(1), plain-text (2) and HTML. Choose plain-text.
6. At the bottom you see the filename where it will write the dump
file to. Just above that are a couple of checkboxes. Uncheck both (one
says open the dumped file in notepad, which is unwise as it is big, the
other says delete the file when closing).
7. 2nd option in the 2nd menu (i.e. Alt-T, then T) to do the text dump.
Then copy the file to a linux system and convert it from Shift-JIS to
UTF-8 :-).
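For anyone who would rather script that last conversion step, here is a minimal Python sketch. The function name and file names are made up for illustration, and it assumes the dump uses the Windows flavour of Shift-JIS (Python's "cp932" codec), which is a guess based on the dump having been made on Japanese Windows; pass "shift_jis" instead if that turns out to be wrong.

```python
def sjis_to_utf8(src, dst, src_encoding="cp932"):
    """Convert a Shift-JIS text dump to UTF-8, one line at a time.

    src_encoding defaults to cp932 (the Windows superset of Shift-JIS);
    that default is an assumption about what EBDump writes.
    """
    # Reading in text mode with the default newline handling turns CRLF
    # into "\n"; writing with newline="\n" keeps plain LF line endings.
    with open(src, "r", encoding=src_encoding, errors="replace") as fin, \
         open(dst, "w", encoding="utf-8", newline="\n") as fout:
        for line in fin:
            fout.write(line)
```

Usage would be something like sjis_to_utf8("honmon_dump.txt", "honmon_dump.utf8.txt"), where both names are placeholders. Bytes that cannot be decoded are replaced with U+FFFD rather than aborting, so it is worth searching the output for that character afterwards.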
The format for the first 222,580 lines is 2 lines per entry:
KANJI [HIRAGANA]
Type English
(where Type is "n", "adv", etc.)
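As a rough illustration of reading that two-line layout, here is a Python sketch. The Entry type and the function name are invented for this example, and it assumes that the Type is a single token with no spaces in it and that a headword without a bracketed reading is kana-only; neither has been checked against the whole dump.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    headword: str
    reading: Optional[str]  # None when the first line has no [..] part
    pos: str                # the "Type" field, e.g. "n" or "adv"
    gloss: str              # the English text


def parse_two_line_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Pair up lines as (KANJI [HIRAGANA], Type English) entries."""
    it = iter(lines)
    for head in it:
        body = next(it, None)
        if body is None:
            break  # odd trailing line with no partner; stop quietly
        head = head.strip()
        if head.endswith("]") and "[" in head:
            # Split at the last "[" so the reading is what sits in brackets.
            headword, _, reading = head[:-1].rpartition("[")
            headword = headword.strip()
        else:
            headword, reading = head, None
        # First whitespace-separated token is the type, the rest is English.
        parts = body.split(None, 1)
        pos = parts[0] if parts else ""
        gloss = parts[1].strip() if len(parts) > 1 else ""
        yield Entry(headword, reading, pos, gloss)
```

Since only the first 222,580 lines use this layout, the caller would need to limit the input, for example with itertools.islice(f, 222580) on the opened UTF-8 file.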
The last 100,000 lines use a single line per entry:
... English1
KANJI1 [HIRAGANA1] English2
KANJI2 [HIRAGANA2] ...
(i.e. English is for Japanese on next line.) These last 100,000 lines
are the same data, so I think they are for the reverse lookup.
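If the one-line-offset reading of that section is right, the pairs can be put back together by carrying the English from each line over to the Japanese on the following line. The Python sketch below shows the idea; the function name is made up, and it assumes every Japanese part ends with a bracketed reading, so that the first "]" on a line marks where the English starts. Lines with no "]" are treated as English only.

```python
from typing import Iterable, Iterator, Tuple


def realign_offset_entries(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (japanese, english) pairs, where the English for an entry
    is found on the line *before* the one holding its Japanese."""
    prev_english = None
    for raw in lines:
        line = raw.strip()
        before, sep, after = line.partition("]")
        if sep:
            # "KANJI [HIRAGANA] English-for-the-next-entry"
            japanese, english = before + sep, after.strip()
        else:
            # No bracketed reading: treat the whole line as English.
            japanese, english = None, line
        if japanese is not None and prev_english:
            yield japanese, prev_english
        prev_english = english or None
```

Comparing the pairs this produces against the two-line section would be one way to test the guess that both sections hold the same data.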
Export of GG5 epwing format also worked, but the format is different.
E.g. hiragana comes first then kanji in brackets then [zB667] then the
romaji in parentheses.
So it seems in epwing the format of the actual data is fairly free form,
and structure is supplied by the hard-coded indices.
Darren
[1]: http://www.hloeffler.info/epwing/
On 18/02/2008, Darren Cook <darren@...> wrote:
> Thanks, that program turns out to do exactly what I needed. It was
> confusing until I realized you have to say how many blocks you want to
> export.
>
> So, to export edict's epwing file [1], here are detailed instructions
> (Jim, I've tried to make them understandable even if you're seeing
> mojibake; let me know if something is unclear):
I tried it with a couple of EB disks I have.
> 1. Open EBDump
>
> 2. Open edict's HONMON file
>
> 3. In the main window in top-left you see "[00]本文". Click that. Just
> above (and to the right) it says "blks=10314".
Well, I saw "[00]!@#$" so that's what I tried.
> 4. To the right is an input box (the one with little up/down arrows).
> It is defaulting to 1. Type 10314 in there.
>
> 5. Beneath that (roughly in the middle of the app) is a choice between
> 記述子(1), plain-text (2) and HTML. Choose plain-text.
>
> 6. At the bottom you see the filename where it will write the dump
> file to. Just above that are a couple of checkboxes. Uncheck both (one
> says open the dumped file in notepad, which is unwise as it is big, the
> other says delete the file when closing).
>
> 7. 2nd option in the 2nd menu (i.e. Alt-T, then T) to do the text dump.
>
> Then copy the file to a linux system and convert it from Shift-JIS to
> UTF-8 :-).
Total lack of success (this is running on Win98SE). The EB file
was read and a file was written, but it wasn't in Shift-JIS; it was in
the pseudo ISO2022JP coding that is used inside EBs. There was a menu
item defaulting to "JIS", so I tried changing that to SHIFT-JIS, but it
made no difference.
Wonder why it works for you and not for me?
Cheers
Jim
--
Jim Breen
Honorary Senior Research Fellow
Clayton School of Information Technology,
Monash University, VIC 3800, Australia
http://www.csse.monash.edu.au/~jwb/
Hi Jim,
>> So, to export edict's epwing file [1], here are detailed instructions
>> (Jim, I've tried to make them understandable even if you're seeing
>> mojibake; let me know if something is unclear):
>
> I tried it with a couple of EB disks I have.
Did you try it with the edict epwing file? The other one that worked for
me was GG5 dictionary. I don't have any others to experiment with.
> Total lack of success (this is running on Win98SE). ...
I was running on Windows XP, Japanese version. Perhaps it isn't
supported on Win 98?
Darren
--
Darren Cook
http://dcook.org/mlsn/ (English-Japanese-German-Chinese free dictionary)
http://dcook.org/work/ (About me and my work)
http://dcook.org/work/charts/ (My flash charting demos)