
Draft for a multilingual conlang dictionary web-application (was: Conlanging Software Wish List)

  • Jan Strasser
    Message 1 of 1 , Jul 31, 2013
      As mentioned in my previous message (http://listserv.brown.edu/archives/cgi-bin/wa?A2=conlang;23cd1b5a.1307e), here's a rough draft of what I'd like to see in a multilingual conlang dictionary. It's mostly a list of entity types with attributes; some are still incomplete but most of them are already similar to what the actual database might look like, with entity types corresponding to tables, and attributes corresponding to columns in these tables. (FK) indicates a Foreign Key (i.e. a field that references another field, usually from a different table); (n) indicates that the field must allow for several items entered into it.
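
A minimal sketch of how the (FK) and (n) notation could map onto actual tables, using Python's built-in sqlite3 module. The table names, column names, and demo rows are assumptions chosen for illustration; they are not part of the draft. A plain (FK) becomes a column referencing another table, while an (FK, n) field becomes a separate junction table with one row per pair, which also gives a natural home to comments on the word/sense relationship.

```python
import sqlite3

# Hypothetical table and column names, loosely following the attribute lists in this draft.
SCHEMA = """
CREATE TABLE language (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE sense (
    id    INTEGER PRIMARY KEY,
    gloss TEXT NOT NULL
);
CREATE TABLE word (
    id          INTEGER PRIMARY KEY,
    orthography TEXT NOT NULL,
    language_id INTEGER NOT NULL REFERENCES language(id)  -- (FK): one value per word
);
-- (FK, n): several senses per word, stored as one row per word/sense pair
CREATE TABLE word_sense (
    word_id  INTEGER NOT NULL REFERENCES word(id),
    sense_id INTEGER NOT NULL REFERENCES sense(id),
    comment  TEXT,  -- comment on the relationship between word and sense
    PRIMARY KEY (word_id, sense_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO language (id, name) VALUES (1, 'Demo language')")
    conn.executemany(
        "INSERT INTO sense (id, gloss) VALUES (?, ?)",
        [(1, "bird"), (2, "feather")],
    )
    conn.execute(
        "INSERT INTO word (id, orthography, language_id) VALUES (1, 'numi', 1)"
    )
    conn.executemany(
        "INSERT INTO word_sense (word_id, sense_id, comment) VALUES (?, ?, ?)",
        [(1, 1, None), (1, 2, "by metonymy")],
    )
    conn.commit()
    return conn


def glosses_for(conn: sqlite3.Connection, orthography: str) -> list:
    """Return all glosses attached to a word, following the junction table."""
    rows = conn.execute(
        """
        SELECT s.gloss
        FROM word w
        JOIN word_sense ws ON ws.word_id = w.id
        JOIN sense s ON s.id = ws.sense_id
        WHERE w.orthography = ?
        ORDER BY s.id
        """,
        (orthography,),
    )
    return [row[0] for row in rows]
```

With this layout, `glosses_for(build_demo_db(), "numi")` walks from the word through the junction table to its senses, and inserting a word with a nonexistent `language_id` is rejected by the database.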

      All of this should be read with one thing in mind: I'm envisioning this dictionary management system as a web application that every conlanger could run on his own webserver. Near the bottom of the message are some keywords regarding the user interface, both the web frontend (for readers) and the web backend (for the conlanger himself to enter and manage data). An important feature is import from and export to various formats; this is to be handled via plugins that use a simple markup language to define how the output of a lexicon entry should be formatted. I personally want to use the software as a reference for all lexical and etymological data in the collaborative conworld Akana, and also as a tool for finding suitable sources for loanwords when I come across a concept I don't have a word for yet while translating a text into one of my conlangs.

      I haven't actually started implementing any of this, but:
      - I would like to hear some opinions from other conlangers. Do you think this architecture would work for a multilingual dictionary application? Would you like to use something like this yourself? Which features, requirements, or use cases am I missing?
      - Some or all of this might be relevant to the current discussion in the thread "Conlanging Software Wish List". I think I'd eventually be able to code my dictionary project on my own if somebody helps me choose a suitable programming language (I have experience in Java and JavaScript only, and I don't want to use Java for this) and set up the programming environment (something I've never done so far). However, there might be more benefit for me and everyone else if these ideas were integrated into one of the other conlanging software projects: Patrick VanDusen's "Wishlist program", Logan Kearsley's dictionary software, a possible revival of Sai and Alex Fink's Kura2 project, or something else. Alternatively, if it seems more reasonable to run this as a separate project, maybe one of you would be interested in collaborating with me.



      - ID
      - orthography
      - phonemic (optional, ideally auto-generated)
      - phonetic (highly optional, ideally auto-generated)
      - (FK) language
      - (FK) part_of_speech (language-specific, possibly 1:n - but what's the hierarchy between PoS and sense then???)
      - (FK, n) senses
      - (FK, n) sense_comments (applies to the relationship between word and sense!)
      - (FK) source_language
      - (FK) source_word
      - (FK) source_relationship
      - etymology_comment
      - (FK, n) examples
      - morphology_comment (some languages require multiple variants for this, based on PoS!)
      - (FK) register (optional; default: general)
      - usage_comment

      - ID
      - name
      - abbreviation (free abbreviation)
      - iso_code (for natlangs)
      - time_early
      - time_late
      - (FK) parent_language
      - family (name of the language family descending from this node, e.g. Latin > "Romance")
      - sort_order (a string that defines the default sort order for the lexicon, allowing for polygraphs)
      - (FK, n) parts_of_speech
      - (FK, n) morphology_details (e.g. principal parts, irregular forms, inflection class...)
      - orthography_to_phonemic (optional; SCA ruleset for converting <word> to /word/)
      - phonemic_to_phonetic (optional; SCA ruleset for converting /word/ to [word])
      - phonemic_to_orthography (optional; SCA ruleset for converting /word/ to <word>)
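
A possible way to apply a sort_order string that allows for polygraphs, sketched in Python. The format of the string (graphemes in lowercase, separated by spaces) and the function name `make_sort_key` are assumptions; the draft only says that the string defines the default sort order. The idea is to split each word into graphemes, preferring the longest match, and to sort by the position of each grapheme in the list.

```python
def make_sort_key(sort_order: str):
    """Build a sort key function from a space-separated list of graphemes.

    The space-separated format is an assumption made for this sketch.
    """
    graphemes = sort_order.split()
    rank = {g: i for i, g in enumerate(graphemes)}
    # Try longer graphemes first so that "ch" wins over "c" followed by "h".
    by_length = sorted(graphemes, key=len, reverse=True)

    def key(word: str) -> list:
        lower = word.lower()
        ranks = []
        i = 0
        while i < len(lower):
            for g in by_length:
                if lower.startswith(g, i):
                    ranks.append(rank[g])
                    i += len(g)
                    break
            else:
                # Characters missing from the list sort after all listed graphemes.
                ranks.append(len(graphemes) + ord(lower[i]))
                i += 1
        return ranks

    return key


# Made-up alphabet in which the digraph "ch" is a letter of its own after "c".
demo_key = make_sort_key("a b c ch d e h i o")
demo_sorted = sorted(["chi", "da", "cob", "ca"], key=demo_key)
```

Here `demo_sorted` places "cob" before "chi", because "ch" is treated as a single letter that follows "c".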

      - ID
      - (FK) supertype (e.g. "v.it" has supertype "v", "n.anim" has supertype "n")
      - (FK, n) language (possibly the other way around, i.e. every language tracks which PoS exist in it?)
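
The supertype reference makes the parts of speech a small tree. A minimal sketch of how a query such as "is this an intransitive verb a kind of verb?" could be answered by walking up the chain; the dictionary `SUPERTYPE` stands in for the database table, and the function names are made up for this example.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for the part-of-speech table: label -> supertype label.
# Top-level categories have no supertype.
SUPERTYPE: Dict[str, Optional[str]] = {
    "v": None,
    "v.it": "v",
    "n": None,
    "n.anim": "n",
}


def ancestry(pos: str, table: Dict[str, Optional[str]] = SUPERTYPE) -> List[str]:
    """Return the part of speech followed by all of its supertypes, nearest first."""
    chain = []
    seen = set()
    current: Optional[str] = pos
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in supertype chain at {current!r}")
        seen.add(current)
        chain.append(current)
        current = table[current]
    return chain


def is_a(pos: str, candidate: str, table: Dict[str, Optional[str]] = SUPERTYPE) -> bool:
    """True if `candidate` is `pos` itself or one of its supertypes."""
    return candidate in ancestry(pos, table)
```

A search for all verbs could then include entries tagged "v.it" by testing `is_a(entry_pos, "v")`.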

      - ID
      - name
      - abbreviation (glossing abbreviation of the relevant form, e.g. "pfv.sg" for "perfective singular")
      - explanation
      - (FK, n) part_of_speech (i.e. where is this type of morphological detail relevant?)

      SENSE (maybe distinguish between lexicographic "sense" and taxonomic "concept"???)
      - ID
      - gloss
      - definition
      - (FK) buck_list_id
      - (FK) rapidwords_classification (+ other semantic categorizations)
      - (FK, n, ???) semantic_fields (i.e. additional semantic fields not covered by categorizations)
      - (FK, n) synonyms
      - (FK, n) antonyms
      - (FK) hypernym
      - (FK, n) other_semantic_relations (e.g. metaphor, metonymy etc., language-specific)
      - (FK, n) domain (e.g. specialized vocabulary; differs from semantic field mostly by when it should be displayed)

      SOURCE_RELATIONSHIP (maybe distinguish between diachronic and synchronic sources???)
      - ID
      - name (e.g. reflex, old_loan, contemporary_loan, calque, derivation, compound,...)
      - explanation (optional)

      - ID
      - (FK) language
      - orthography
      - phonemic (optional)
      - phonetic (optional)
      - morphemic_breakdown (should be able to include links to individual morphemes)
      - functional_gloss
      - semantic_gloss (optional)
      - literal_translation (optional)
      - free_translation

      - ID
      - code (i.e. the actual code, including the initial letter; fixed)
      - sortkey (i.e. just the number, free [and thus extensible!])
      - (FK) semantic_field

      - ID
      - code
      - sortkey
      - (FK) semantic_field

      - ID
      - abbreviation (e.g. jur., biol., math.)
      - explanation (optional)

      - ID
      - name (e.g. metaphor, metonymy, ...)

      - ID
      - name (e.g. elevated, formal, polite, general, colloquial, derogatory, vulgar)
      - sortkey


      PERMISSION (for each language, for words in general, for export scripts...)
      EXPORT_SCRIPT (for generating output as HTML, PDF, wikicode, LaTeX, Toolbox MDF, JSON, CSV...)



      - simple lookup
      - regex search
      - advanced search (combination of several features; filtering; some features should come as a list from the DB)
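
A rough sketch of how regex search and a simple form of advanced search might be combined. Entries are plain dictionaries here, and the field names (`orthography`, `pos`, `gloss`) as well as the demo words are assumptions; a real implementation would probably push most of the filtering into the database query.

```python
import re
from typing import Dict, Iterable, List, Optional


def search(
    entries: Iterable[Dict[str, str]],
    pattern: Optional[str] = None,
    field: str = "orthography",
    **exact: str,
) -> List[Dict[str, str]]:
    """Filter entries by an optional regex on one field plus exact-match criteria."""
    compiled = re.compile(pattern) if pattern is not None else None
    results = []
    for entry in entries:
        if compiled is not None and not compiled.search(entry.get(field, "")):
            continue
        if any(entry.get(name) != value for name, value in exact.items()):
            continue
        results.append(entry)
    return results


# Made-up demo lexicon.
LEXICON = [
    {"orthography": "numi", "pos": "n", "gloss": "bird"},
    {"orthography": "kata", "pos": "v", "gloss": "see"},
    {"orthography": "nuka", "pos": "v", "gloss": "fly"},
]
```

For example, `search(LEXICON, r"^nu", pos="v")` keeps only verbs whose spelling starts with "nu".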

      - basic view (one or many entries, some details, possibly configurable)
      - text view (one or many entries, formatted as in a typical printed dictionary)
      - detail view (one entry only, all important details)
      - mass view (many entries [up to full lexicon] as a sortable table, basic information only)

      - automatic linking to related words (both semantic relations and etymological relations)
      - option to toggle interlinears for example sentences (requires parsing spaces for alignment)
      - various sorting options
      - easily themable with CSS
      - highly configurable with export scripts



      - clean & simple layout
      - tab & arrowkey navigation
      - shortcuts for accessing specific fields
      - listboxes for selecting predefined data (populated from DB)
      - in some cases, a lookup-box should be used instead (combined text field & listbox) (AJAX)
      - automatic parsing of gloss/definition fields to retrieve suitable sense suggestions (AJAX)
      - automatic suggestion of suitable source words, e.g. by semantic field (AJAX)
      - one-to-many associations handled by widgets added/removed on demand
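
One way the suggestion of source words by semantic field could work on the server side, sketched under simple assumptions: every candidate word carries a set of semantic field labels, and candidates are ranked by how many labels they share with the concept being looked up. The field names and demo data are made up; the AJAX part (sending the request and displaying the result) is not shown.

```python
from typing import Dict, Iterable, List, Set


def suggest_source_words(
    target_fields: Set[str],
    candidates: Iterable[Dict],
    limit: int = 5,
) -> List[Dict]:
    """Rank candidate words by the number of semantic fields shared with the target."""
    scored = []
    for candidate in candidates:
        overlap = len(target_fields & set(candidate["semantic_fields"]))
        if overlap:
            scored.append((overlap, candidate))
    # Highest overlap first; ties are broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]["orthography"]))
    return [candidate for _, candidate in scored[:limit]]


# Made-up candidates from other languages in the database.
CANDIDATES = [
    {"orthography": "paro", "language": "B", "semantic_fields": {"animal"}},
    {"orthography": "numi", "language": "A", "semantic_fields": {"animal", "bird"}},
    {"orthography": "kata", "language": "A", "semantic_fields": {"perception"}},
]
```

Calling `suggest_source_words({"animal", "bird"}, CANDIDATES)` returns "numi" before "paro" and omits "kata".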

      - show alphabetically (or semantically) close words in the background (for inspiration)
      - entry mode: quick navigation to new word while saving data (optionally keeping certain fields filled)
      - edit mode: quick navigation to next/previous word while saving/discarding data
      - optionally: auto-generation of phone(m/t)ic representation from orthography or vice versa
      - optionally: auto-generation of phonemic representation from etymological source word

      - if there are built-in SCA scripts, they should use syntax highlighting and automatic processing of user-defined example words
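
A very small stand-in for a built-in SCA, to illustrate the automatic processing of user-defined example words. Real sound change appliers have their own rule syntax with categories and environments; this sketch simply assumes an ordered list of regular-expression replacements, and the rules and example words are invented.

```python
import re
from typing import Dict, Iterable, List, Tuple

Rule = Tuple[str, str]  # (regex pattern, replacement)

# Made-up orthography-to-phonemic rules; order matters.
RULES: List[Rule] = [
    (r"ch", "tʃ"),          # digraph first, before single "c" is handled
    (r"c(?=[ei])", "s"),    # "c" before a front vowel
    (r"c", "k"),            # any remaining "c"
]


def apply_rules(word: str, rules: Iterable[Rule]) -> str:
    """Apply each rule in order to the whole word."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


def run_examples(rules: List[Rule], examples: Iterable[str]) -> Dict[str, str]:
    """Show what the current ruleset does to the user's example words."""
    return {word: apply_rules(word, rules) for word in examples}
```

In the backend, `run_examples(RULES, ["chaci", "coca"])` could be re-run after every edit to the ruleset, so that the effect of a change is visible immediately.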

      - handle spaces for aligning interlinears
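
Aligning interlinears by parsing spaces could look roughly like this: every line is split on whitespace, and the n-th tokens of all lines are padded to a common width. The function name and the example sentence are made up. Note that `len()` counts code points, so combining diacritics or wide characters would need extra handling in a real implementation.

```python
from typing import List


def align_interlinear(*lines: str) -> List[str]:
    """Pad whitespace-separated tokens so that they line up in columns."""
    if not lines:
        return []
    rows = [line.split() for line in lines]
    columns = max(len(row) for row in rows)
    # Fill short rows with empty tokens so every row has the same length.
    rows = [row + [""] * (columns - len(row)) for row in rows]
    widths = [max(len(row[i]) for row in rows) for i in range(columns)]
    return [
        "  ".join(token.ljust(width) for token, width in zip(row, widths)).rstrip()
        for row in rows
    ]


# Made-up example: orthography line and gloss line.
ALIGNED = align_interlinear("ka-ta numi", "1SG-see bird")
```

Printed in a monospaced font, the two lines in `ALIGNED` have "numi" directly above "bird".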

      - easy-to-use markup language (~"API")
      - syntax highlighting
      - placeholders that can be resolved to links in the output file
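
To make the idea of an export markup with placeholders more concrete, here is a minimal sketch. The syntax is invented for this example: `{field}` is replaced by the value of an entry field, and `[[word:ID]]` is a placeholder that is resolved to a link by a callback, so the same template could yield HTML links, wiki links, or plain text depending on the output format.

```python
import re
from html import escape
from typing import Callable, Dict

# Hypothetical template syntax: {field} for entry fields, [[word:ID]] for links.
TEMPLATE = "<b>{orthography}</b> <i>{pos}</i> {gloss} (from [[word:{source_id}]])"


def render(template: str, entry: Dict, link_for: Callable[[int], str]) -> str:
    """Fill in entry fields, then resolve link placeholders via `link_for`."""
    text = re.sub(
        r"\{(\w+)\}",
        lambda m: escape(str(entry[m.group(1)])),
        template,
    )
    return re.sub(
        r"\[\[word:(\d+)\]\]",
        lambda m: link_for(int(m.group(1))),
        text,
    )


def html_link(word_id: int) -> str:
    """One possible link resolver; the URL scheme is an assumption."""
    return f'<a href="/word/{word_id}">#{word_id}</a>'


ENTRY = {"orthography": "numi", "pos": "n", "gloss": "bird", "source_id": 7}
RENDERED = render(TEMPLATE, ENTRY, html_link)
```

Swapping `html_link` for a function that returns wikicode would produce a wiki version of the same entry without touching the template.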



      - built-in filters for CSV, Toolbox MDF, Lexical Markup Framework,...
      - optional: possibility to define additional import scripts via an API
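
A built-in CSV import filter might be little more than a mapping from column headers to entry fields. The sketch below uses Python's csv module; the column names, field names, and sample data are assumptions. Filters for Toolbox MDF or Lexical Markup Framework would need real parsers and are not attempted here.

```python
import csv
import io
from typing import Dict, List


def import_csv(text: str, column_map: Dict[str, str]) -> List[Dict[str, str]]:
    """Read CSV text and rename columns to entry fields.

    `column_map` maps CSV column headers to field names; unmapped columns are ignored.
    """
    reader = csv.DictReader(io.StringIO(text))
    entries = []
    for row in reader:
        entries.append(
            {
                field: (row.get(column) or "").strip()
                for column, field in column_map.items()
            }
        )
    return entries


# Made-up sample file with one column that the import ignores.
SAMPLE = "Word,Gloss,Notes\nnumi,bird,\nkata, to see,irregular\n"
IMPORTED = import_csv(SAMPLE, {"Word": "orthography", "Gloss": "gloss"})
```

The result is a list of dictionaries that could then be validated and written to the database.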

      - predefined export scripts (written in a custom markup language):
      -- standard HTML views: basic, text, detail, mass
      -- traditional Akana wikicode
      -- CSV lexicon overview
      -- dictionary file output (i.e. "HTML text view" converted to PDF and/or LaTeX source code)
      - all of the above, with GUI-configurable options
      - possibility to define additional export scripts (probably best implemented as an interplay of simple custom markup with plugins for conversion to HTML, PDF, wikicode, LaTeX, Toolbox MDF, Lexical Markup Framework, JSON, CSV etc.)
      - possibility to add new plugins
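
The plugin idea could be sketched as a simple registry: each export plugin is a function that turns a list of entries into a string, registered under a format name. The decorator, the registry, and the two demo plugins below are illustrative assumptions, not a description of any existing software.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Entry = Dict[str, str]
EXPORTERS: Dict[str, Callable[[List[Entry]], str]] = {}


def exporter(name: str):
    """Decorator that registers an export plugin under a format name."""
    def register(func: Callable[[List[Entry]], str]):
        EXPORTERS[name] = func
        return func
    return register


@exporter("json")
def to_json(entries: List[Entry]) -> str:
    return json.dumps(entries, ensure_ascii=False, indent=2)


@exporter("csv")
def to_csv(entries: List[Entry]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["orthography", "gloss"],  # assumed overview columns
        extrasaction="ignore",                # skip fields not listed above
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


def export(entries: List[Entry], fmt: str) -> str:
    """Look up the plugin for `fmt` and run it."""
    if fmt not in EXPORTERS:
        raise ValueError(f"no export plugin registered for {fmt!r}")
    return EXPORTERS[fmt](entries)
```

Adding a new output format then means writing one more decorated function, for example for wikicode or LaTeX, without changing `export` itself.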