[Jim Rose (Re: [edict-jmdict] Future EDICT/JMdict, etc. maintenance system) writes:]
>> Seems like yet another Perl application to me.
Is language the first issue? We tell our students to consider this
>> Maybe this
>> conversation would be more productive if you were to develop an RFP
>> with specific functions and file interactions mapped out. It seems
>> to me like a 10 or 20 feature system. Specified to the right level
>> of detail, maybe it could be reduced to a series of discrete and
>> independent scripts each of which are no more than 30 minutes of work.
RFP? As though I'm putting it out for contract? I was hoping for a bit
of collective brainstorming rather than me writing specs. 8-)}
Anyway, as a thumbnail sketch, what I had in mind was:
- a database to hold the information in the current JMdict/EDICT
database. The current database is in fact a big text-file which I edit
with an editor and the occasional bulk-update utility. What is needed is
a real database with record-locking, etc. etc. so that several people
could be updating at the same time. (Once it is going, hopefully it
could be extended/replicated to cover ENAMDICT, KANJIDIC, etc.)
- an update facility which could be accessed by a link from "edit" functions
in any number of online dictionaries (WWWJDIC, Jeffrey's, KanjiCafe,
etc.) This would bring up a screen with the editable parts of the
entry. After the edit, the edited version would go into a holding part
of the database until it could be reviewed and
approved/rejected/corrected/etc. by an editor. Once an edit is approved
it can go into the live database.
- a new-entry system which would enable new material to be submitted
and passed through a review arrangement similar to the one applying to
edits.
- a user management system - really an ID & password system so that
approved editors could log on and then have the rights to approve/etc.
edits and new entries.
- a comment/logging facility associated with each entry, so that a
summary of previous changes, along with the date and who made them, would
be available.
- a facility to view the new and newly changed entries
- a bulk-addition facility to roll in multiple new entries
- generator utilities to make the current distribution formats (EDICT,
EDICT_SUB, EDICT2, JMdict), plus any other format that may be appropriate.
I am sure there are other things I have not thought of.
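The review cycle in the list above (edits and new entries held until an editor approves them, with a per-entry log of who changed what and when) can be sketched in a few lines. This is a minimal in-memory model in Python, not a proposed design: the class and field names (`Entry`, `PendingEdit`, `DictionaryStore`, `seq`, `history`) are hypothetical, and in a real system the `live` and `pending` dictionaries would be tables in the MySQL database.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Entry:
    """One dictionary entry as it might sit in the live data (hypothetical fields)."""
    seq: int
    kanji: str
    kana: str
    glosses: list[str]
    history: list[str] = field(default_factory=list)  # per-entry change log


@dataclass
class PendingEdit:
    """A submitted edit or new entry waiting for an editor's decision."""
    edit_id: int
    submitter: str
    proposed: Entry


class DictionaryStore:
    def __init__(self) -> None:
        self.live: dict[int, Entry] = {}          # stand-in for the live table
        self.pending: dict[int, PendingEdit] = {}  # stand-in for the holding area
        self._next_edit_id = 1

    def submit(self, submitter: str, proposed: Entry) -> int:
        """Put an edit (or new entry) in the holding area; live data is untouched."""
        edit_id = self._next_edit_id
        self._next_edit_id += 1
        self.pending[edit_id] = PendingEdit(edit_id, submitter, proposed)
        return edit_id

    def approve(self, edit_id: int, editor: str) -> None:
        """Move a pending edit into the live data and log who did it and when."""
        edit = self.pending.pop(edit_id)
        entry = edit.proposed
        old = self.live.get(entry.seq)
        # Carry the existing history forward so the log survives the replacement.
        entry.history = (old.history if old else []) + [
            f"{datetime.now(timezone.utc):%Y-%m-%d} submitted by {edit.submitter}, "
            f"approved by {editor}"
        ]
        self.live[entry.seq] = entry

    def reject(self, edit_id: int) -> None:
        """Discard a pending edit without touching the live data."""
        del self.pending[edit_id]
```

In this sketch an "edit" link from WWWJDIC or another dictionary would end in a call to `submit`, and the editor's review screen would call `approve` or `reject`; nothing reaches `live` without an editor's decision.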
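For the user-management item, the core is storing a salted password hash per editor rather than the password itself. A minimal sketch using only Python's standard library (`hashlib.pbkdf2_hmac`, `secrets.token_bytes`, `hmac.compare_digest`); the iteration count and the idea of an "editors" table holding the salt and digest are assumptions, and a Perl or PHP implementation would use its own equivalents.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for the actual server


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in a hypothetical editors table."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The approve/reject rights would then hang off the editor's ID once `check_password` succeeds at log-on.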
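The generator utilities in the last item of the list mostly reduce to formatting each database record into a distribution line. A sketch for the basic EDICT layout only, assuming the familiar `KANJI [KANA] /gloss1/gloss2/` form, with kana-only words written as `KANA /gloss1/`; the function name `to_edict_line` is hypothetical, and it assumes part-of-speech tags are already part of the gloss text. EDICT2 and JMdict carry more structure (for example multiple headwords) and would need their own generators.

```python
def to_edict_line(kanji: str, kana: str, glosses: list[str]) -> str:
    """Format one entry as an EDICT-style line: 'KANJI [KANA] /gloss1/gloss2/'.

    Kana-only words have no bracketed reading: 'KANA /gloss1/'.
    """
    if not glosses:
        raise ValueError("an entry needs at least one gloss")
    headword = f"{kanji} [{kana}]" if kanji else kana
    return f"{headword} /{'/'.join(glosses)}/"
```

For example, `to_edict_line("日本", "にほん", ["(n) Japan"])` gives `日本 [にほん] /(n) Japan/`.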
In terms of packages and languages, my only firm feeling is that the
database should be MySQL, as it is the dominant free high-quality
DBMS in the *n*x world.
I think we should do the DB in UTF-8 ab initio. The legacy EDICT/EDICT2 files
can be converted to EUC at generation time.
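Converting at generation time is a single encoding step once entries come out of the database as Unicode strings. A sketch in Python, assuming the generator holds `str` values decoded from the UTF-8 database and uses Python's standard `euc_jp` codec; the function name `encode_legacy` is hypothetical. Encoding with `errors="strict"` matters here: a UTF-8 database can hold characters that EUC-JP cannot represent, and it is better for the generator to stop and report them than to emit a damaged file.

```python
def encode_legacy(lines: list[str]) -> bytes:
    """Encode Unicode EDICT lines as EUC-JP bytes for a legacy distribution file."""
    text = "".join(line + "\n" for line in lines)
    # "strict" raises UnicodeEncodeError for any character EUC-JP cannot
    # represent, rather than silently dropping or replacing it.
    return text.encode("euc_jp", errors="strict")
```

The result can then be written out with, for example, `pathlib.Path("edict").write_bytes(encode_legacy(lines))`.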
As for programming languages, well whatever is suitable. AFAICR the main
scripting languages for working with MySQL are Perl and PHP, with maybe
Python and Java as options too. The C API seems to be pretty bad. If I were
doing it, I'd probably use PHP (since it's C-like code embedded in HTML),
but I have no firm views. As long as it works and is portable and
Anyway, that's my mental image of what is needed/desired. Is there
anyone out there used to building MySQL/Perl|PHP|etc. online systems?
Jim Breen http://www.csse.monash.edu.au/~jwb/
Clayton School of Information Technology, Tel: +61 3 9905 9554
Monash University, VIC 3800, Australia Fax: +61 3 9905 5146
(Monash Provider No. 00008C) ジム・ブリーン@モナシュ大学