Re: Conlanging Software Wish List

  • H. S. Teoh
    Message 1 of 29, Jul 31 7:36 AM
      On Tue, Jul 30, 2013 at 07:03:33PM -0400, Patrick VanDusen wrote:
      > On Tue, 30 Jul 2013 14:52:52 -0700, H. S. Teoh <hsteoh@...> wrote:
      > >My interest is mainly in building reusable libraries that can be
      > >easily customized and incorporated into conlang/conlanger-specific
      > >tooling. I'm a hardcore command-line interface person, so GUI
      > >interfaces are actually detrimental to me. I'd rather just have
      > >backend libraries and I can hook into scripts and such, instead of
      > >being forced to use a GUI that doesn't integrate well into the way I
      > >work.
      >
      > This should be doable. Obviously what I'm working on is a big,
      > integrated web-based application, but there's no reason that
      > individual pieces of functionality shouldn't be separated out into
      > reusable libraries which can be used as command-line tools. I can see
      > us ending up with a whole suite of individual python modules for
      > different purposes, that anyone can import into their own application
      > (including my own Django one).

      The code doesn't necessarily have to be usable as command-line tools,
      but the point was that they should have properly-designed APIs so that
      anyone who wants to can write a CLI frontend for them, or, for that
      matter, an automated script that does a particular task.
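
      To make the intent concrete, here's a rough Python sketch (the module
      and function names are made up for illustration):

          # lexicon.py -- backend library; no UI assumptions baked in
          import re

          def search(entries, pattern):
              """Return all entries whose headword matches the regex."""
              rx = re.compile(pattern)
              return [e for e in entries if rx.search(e["headword"])]

          # cli.py -- a thin frontend anyone could write against that API
          import json, sys
          from lexicon import search

          if __name__ == "__main__":
              entries = json.load(open(sys.argv[1]))
              for e in search(entries, sys.argv[2]):
                  print(e["headword"])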


      [...]
      > We should probably define standards for data interchange in multiple
      > common formats (JSON and XML both would qualify).

      Good idea, though I'd rather we standardize on a single format.
      Otherwise you might end up with out-of-sync code between the JSON parser
      and the XML parser, which just adds lots of coding effort for not much
      gain.

      OTOH, maybe what's needed is to (for example) standardize on JSON
      internally, then have a tool that converts between JSON/XML, so that
      you only need one parser for extracting the actual data; the tool
      would basically just be for transcoding.
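
      A minimal sketch of such a transcoder (assuming simple flat records;
      real data would need a richer mapping):

          import json
          import xml.etree.ElementTree as ET

          def json_to_xml(data, root_tag="lexicon"):
              """Convert a list of flat dicts into an XML document."""
              root = ET.Element(root_tag)
              for record in data:
                  entry = ET.SubElement(root, "entry")
                  for key, value in record.items():
                      ET.SubElement(entry, key).text = str(value)
              return ET.tostring(root, encoding="unicode")

          print(json_to_xml(json.loads('[{"headword": "tara", "pos": "n"}]')))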


      > As far as data-storage goes, that's probably a somewhat domain-
      > specific problem (although my first thought would be to go to
      > something like sqlite).
      [...]

      I like sqlite: it's lightweight, doesn't require a full-fledged SQL
      server, and is pretty good at what it does.


      On Tue, Jul 30, 2013 at 05:00:10PM -0600, Logan Kearsley wrote:
      > On 30 July 2013 15:52, H. S. Teoh <hsteoh@...> wrote:
      [...]
      > > My interest is mainly in building reusable libraries that can be
      > > easily customized and incorporated into conlang/conlanger-specific
      > > tooling. I'm a hardcore command-line interface person, so GUI
      > > interfaces are actually detrimental to me. I'd rather just have
      > > backend libraries and I can hook into scripts and such, instead of
      > > being forced to use a GUI that doesn't integrate well into the way I
      > > work.
      >
      > We are of one mind in this. However, a front-end interface is one of
      > those re-useable components that's kind of essential to most people
      > using the system.

      Sure. I just don't like the back-end code being inextricably bound to
      the GUI front-end. Ideally, the system architecture should look
      something like this:

      GUI <-> internal data format <-> algorithms

      Basically, the GUI code works with the common internal data format that
      is UI-independent, and the back-end algorithms operate on this internal
      format. That way, another front-end can be written that understands the
      internal format, then it can reuse the same algorithms without caring
      about the GUI code.

      In fact, this architecture may be useful even for (multiple) GUI
      front-ends: the algorithms could have, say, a JSON interface for
      transmitting/receiving the internal data format, then you could have a
      client-side Javascript app that can talk to the back-end, as well as a
      native GUI app (say a native windows app) that also talks to the
      back-end in the same way, or even a server-side app (if we want, say, a
      thin client implementation).
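
      A rough sketch of that boundary (the function names are hypothetical):

          import json

          def lookup(lexicon, headword):
              """Pure back-end algorithm: no UI, no I/O."""
              return [e for e in lexicon if e["headword"] == headword]

          def handle_request(lexicon, request_json):
              """JSON shim callable by a web client, native GUI, or script."""
              request = json.loads(request_json)
              return json.dumps(lookup(lexicon, request["headword"]))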

      Additionally, we can implement an automated unit-testing framework by
      writing test drivers that talk directly to the back-end; that way we
      can build up a suite of unit tests to ensure that bugfixes / new
      features don't break existing functionality. (It would be a far bigger
      pain if we had to write scripts to interact through the GUI!)
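
      For instance, with Python's stock unittest module (a sketch written
      against the hypothetical lookup() above):

          import unittest

          def lookup(lexicon, headword):  # the back-end sketch from above
              return [e for e in lexicon if e["headword"] == headword]

          class TestLookup(unittest.TestCase):
              def test_finds_existing_headword(self):
                  lexicon = [{"headword": "tara", "gloss": "river"}]
                  self.assertEqual(lookup(lexicon, "tara"), lexicon)

              def test_missing_headword_returns_empty(self):
                  self.assertEqual(lookup([], "tara"), [])

          if __name__ == "__main__":
              unittest.main()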

      [...]
      > I absolutely agree. That's why I currently have a really, really nice
      > database and API design, and actual software that does about 5% of
      > what it's planned to do. Hopefully I can get the next 55% done in the
      > time it took to do all of that architectural work (and hopefully I
      > won't find out that the current architecture doesn't work when I start
      > adding more components).

      Now you've piqued my interest. What kind of database design do you have?


      > >> On 30 July 2013 10:46, H. S. Teoh <hsteoh@...> wrote:
      [...]
      > > I'm a critic of using XML-based formats for data storage, actually.
      > > I'm all for them in terms of data *interchange*, but they're
      > > dreadfully inefficient formats for *storage*. Ideally, any
      > > conlanging tool we build, whether collaboratively or independently,
      > > should be able to "speak" the common XML-based format, so that it's
      > > easy to exchange data with each other; but for internal storage, I'd
      > > use something optimized for whatever the tool needs to do.
      >
      > Oh, so am I. LMF, however, does not mandate the use of XML- a sample
      > XML serialization is provided as an informative, but not normative,
      > addition to a subset of the LMF standard. LMF is all about abstract
      > data modelling for lexicographic information. That's useful for
      > designing storage formats as well as a common (potentially but not
      > necessarily XML-based) interchange format. And making sure that all of
      > your data conforms to LMF models ensures that whatever format you end
      > up using, it's fairly easy to translate into any other specific format
      > that's also LMF-based.

      Good point.


      > Contrast with TBX, which spends very little time on the abstract TMF
      > model and then goes whole-hog on normative XML implementation....
      > Yuck.

      Talk about premature standardization. :)


      [...]
      > >> > - A lexicon database tool:
      > >> >
      > >> > (1) Support searching by head word or definition, filtered by
      > >> > entry type, regex pattern searching. Entries should include all
      > >> > affixes and morphemes, not just full-on words.
      > >>
      > >> Easy to do, hard to do efficiently. Not a problem for most
      > >> conlangers, would be a problem for managing natural-language lexica
      > >> with hundreds of thousands of entries, but fortunately it's a
      > >> mostly-solved problem.
      > >
      > > Regexes on a large corpus is a mostly-solved problem? I didn't know
      > > that. :) Last I heard, only a subset of search functionality is
      > > provided. But sometimes you just *have* to use a regex to find that
      > > one almost-forgotten word in your conlang, so if anything, a linear
      > > search fallback to support full regexing is a requirement for me. I
      > > don't care if it's slow, as long as it's *possible*. Most search
      > > patterns are trivial, anyway, so most of the time we can use the
      > > fast searching algorithms.
      >
      > Well, not the regex part; efficiently searching large databases on
      > arbitrary entries, though- that's mostly solved, insofar as the
      > relevant algorithms and the tradeoffs involved in them are all known
      > and have reference implementations.

      OK, makes sense.


      [...]
      > > I'd use dedicated data storage formats, but have import/export
      > > capabilities for interchange. Then my custom scripts can just output
      > > XML and feed it into, say, a HTML renderer that you wrote, and
      > > things would Just Work with a minimum of hassle. There's no need for
      > > a common data storage format as long as there's a common interchange
      > > format.
      >
      > That is exactly the architecture I have in mind. For "storage" in this
      > case I perhaps should've said "archival", the idea being that you want
      > to be able to save your work and still open it ten years later with
      > different or upgraded software, which is a somewhat different use case
      > from transferring data between different people's systems.

      Ah, I see. Yeah, for archival I agree a standardized format would be
      good; otherwise you may find, 10 years down the road, that all the
      precious archived data you carefully backed up over the years can't be
      understood by present-day software anymore!


      > >> > (3) Sort entries by any arbitrary order defined by the conlang,
      > >> > not just ASCII or Unicode order. Be able to verify that entries
      > >> > actually follow this order (not relevant if everything is in a
      > >> > SQL database where you can just sort search results).
      > >>
      > >> This is surprisingly difficult, but also a solved problem.
      > >
      > > Really? Why is it difficult? Isn't it just a matter of defining a
      > > linear order on the entries? It could be just a 3-way comparison
      > > function that returns the order of two strings (à la strcmp).
      >
      > Well, yes, at a certain level of abstraction. But the implementation
      > of that comparison function is not trivial. And the most efficient
      > string sorting algorithms (like trie-based burstsort) require more
      > detailed sorting information than that.

      I see. Still, I don't see it as an inherently hard problem. One just has
      to be able to define the desired ordering in a precise manner.


      > >> Any viable conlang or arbitrary-natlang documentation software
      > >> *must* realistically operate entirely in Unicode (of which ASCII is
      > >> a subset). Fortunately, the Unicode spec provides a system for
      > >> defining arbitrary custom sort orders (including weirdnesses like
      > >> sorting backwards for Canadian French diacritics) and an algorithm
      > >> for performing collation on them. See
      > >> http://www.unicode.org/reports/tr10/ (I have had that tab open in
      > >> my browser for easy reference continuously for the last 8 months).
      > >
      > > Unfortunately, ;-) I've read that TR before. Unicode collation is an
      > > extremely complicated beast. For conlang purposes, though, I
      > > wouldn't bother with natlang sorting algorithms -- that's just
      > > needless complexity. Instead, I'd focus on how the conlanger can
      > > define custom sorting orders on Unicode strings.
      >
      > I don't imagine conlangers expect to limit themselves to sorting rules
      > any less complex than what natlangs make use of.

      Yes, but I doubt many of them would reuse the exact natlang sorting
      rules specified in TR10. So personally, I would just create a generic
      sorting framework for expressing arbitrary ordering rules, and
      implement the sorting algorithms for that, instead of implementing
      TR10 and then trying to figure out how to make it extensible to
      conlangs.
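
      For simple cases such a framework can be tiny (a sketch that ignores
      multi-character graphemes and secondary weights, which a real
      implementation would need):

          CONLANG_ORDER = "aábdeéfghiíklmnoprstuú"   # made-up alphabet
          RANK = {ch: i for i, ch in enumerate(CONLANG_ORDER)}

          def sort_key(word):
              """Key function encoding the conlang's collation order."""
              return [RANK[ch] for ch in word]

          print(sorted(["úli", "aba", "ába"], key=sort_key))
          # -> ['aba', 'ába', 'úli']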


      > >> > (4) Automatically verify that head words conform to conlang's
      > >> > morphological rules (see below on automatic parsing). The idea
      > >> > is to prevent accidental addition of entries that violate rules
      > >> > of phonology that you have set down.
      > >>
      > >> This is *really hard*.
      > >
      > > Really? I thought phonological rules should be relatively easy to check
      > > against a prospective entry: just run the string through the rules to
      > > see that all of them are satisfied.
      >
      > Oh, phonological rules, sure (as long as you provide an appropriate
      > representation; I wouldn't want to try it on standard English
      > orthography). Verifying morphology, on the other hand, is much more
      > difficult, unless you're willing to allow a large number of
      > false-accepts.

      Well, I don't see it as a fully-automatic thing, though. It's more like
      when I'm adding a new entry to the lexicon, it would warn me if my head
      word obviously violates phonological constraints. It could also show me
      all possible morphemic breakdowns of the head word, so that I know that
      it makes sense according to current morphological rules. Obviously, when
      there is some ambiguity as to how to break things down, the computer
      can't decide for me whether that's a valid word, or whether it should be
      put on the exceptions list.
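
      E.g., for a toy (C)V(n) phonology the warning could be driven by a
      single regex (a sketch; a real phonology would want a rule list
      rather than one pattern):

          import re

          SYLLABLE = r"[ptkmnsl]?[aeiou]n?"
          VALID_WORD = re.compile(f"(?:{SYLLABLE})+")

          def check_headword(word):
              """Warn if a prospective entry has no valid syllabification."""
              if not VALID_WORD.fullmatch(word):
                  print(f"warning: {word!r} violates phonological constraints")

          check_headword("tanka")   # fine: tan.ka
          check_headword("tankra")  # warning: no parse under (C)V(n)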


      [...]
      > > What I have in mind is basically a semantically-marked up format,
      > > sorta like docbook, where you lay out the logical structure of the
      > > text and then various export tools can convert it into various
      > > target formats. So it's quite a generic problem, in that you want to
      > > be able to express the logical structure of a document, but it's
      > > also specific in the sense that for conlangers, we're interested in
      > > including things like IPA, interlinear glosses, orthographic
      > > fragments, etc., in the text.
      > >
      > > Better yet, if the document itself can define additional logical
      > > elements, so that you can, for example, define placeholders for a
      > > particular glyph/morpheme you haven't quite decided on yet, and use
      > > these placeholders throughout, then when you finally settle on the
      > > Unicode character to be used, you can just change the definition of
      > > the placeholder and the entire document will be updated
      > > automatically, without the painstaking and error-prone process of
      > > manual revision.
      > >
      > > (The importance of using *logical* placeholders rather than physical
      > > entities is that you have a logical, unique name for these things,
      > > even if their surface realization may be identical to other logical
      > > entities, so that when you decide that, say, U is better written as
      > > Ø, a single change replaces all relevant U's without also changing,
      > > incorrectly, all English U's in your grammar's text, as would happen
      > > if you only had search-n-replace at your disposal.)
      > >
      > > Having logical markup also frees your attention from the
      > > nitty-gritty and distraction of visual formatting to concentrate on
      > > the content of your text.
      >
      > This is the kind of thing that XML was made for, and is actually good
      > at. I bet we could probably knock out a suitable format in a month or
      > so. And then there's the multi-year ISO standardization process...
      > (somebody will complain loudly if it doesn't conform to TEI, which I
      > think is largely a useless bit of overengineering...).

      I can live with not being standardized. ;-) As long as the DTD /
      documentation / whatever is published online, it's good enough in my
      book.

      I glanced over the TEI... looks like a nasty piece of overengineering,
      all right.

      I'm no fan of dealing with XML by hand, though. It's overly verbose, and
      more suitable for machine parsing. But that's not a problem; there are
      various other isomorphic (and more human-readable) syntaxes that can be
      easily mapped to/from XML, so I can live with it. :) In fact, with the
      help of a few clever regexes, one could even invent one's own format
      quite easily.
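
      E.g., a placeholder expander is only a few lines (the &name; syntax
      here is invented for illustration):

          import re

          DEFINITIONS = {"vowel2": "Ø"}   # change the glyph here once...

          def expand(text):
              # ...and every &vowel2; in the document updates with it
              return re.sub(r"&(\w+);", lambda m: DEFINITIONS[m.group(1)], text)

          print(expand("The vowel &vowel2; contrasts with plain u."))
          # -> The vowel Ø contrasts with plain u.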


      [...]
      > > As long as the interchange format itself is expressive enough to
      > > represent all the possibilities, then each component can eventually
      > > be improved to handle all the features. It's far worse to have a
      > > deficient common interchange format, then you're stuck when you need
      > > to extend it to cover a new feature.
      >
      > Exactly the problem I have- I'm stuck with W3C standard interchange
      > models that have no correspondence at all to the vast array of
      > completely independently developed proprietary formats and rendering
      > models.

      Yeah, that's a problem.


      [...]
      > >> This is really hard. I know a professor who works on corpus search
      > >> tools (Mark Davies, the guy who created the Corpus of Contemporary
      > >> American English [http://corpus.byu.edu/coca/] and Corpus of
      > >> Historical American English [http://corpus.byu.edu/coha/]), and the
      > >> current state of the art in corpus creation is pretty much the same
      > >> as the state of the art in professional dictionary publication- you
      > >> start over and write custom software for pretty much every project.
      > >
      > > It's really that hard? What I'm looking for is really just a
      > > database of texts with tags on them -- conlang text "abc def ghi":
      > > covered_in_grammar(section 1.2); category(short phrases).
      >
      > What tag set do you use, and how do you apply them? Tagsets tend to be
      > very language-specific, and there's very little work on defining a
      > language-agnostic tagging framework.

      *Can* there be a language-agnostic tagging framework, though? Esp. if
      you're talking about *conlangs*, which can range anywhere from your tame
      SAE clone to alienlangs like Rikchik to wild untameable beasts that defy
      all linguistic categorization.

      I see this mainly as a conlanger-specific tool, something to help the
      conlanger organize notes, etc., so I don't think it's an issue for the
      tags to be language-specific or conlanger-specific.
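
      Such a tagged-text database needs very little schema (a sqlite
      sketch; the table and tag names are made up):

          import sqlite3

          db = sqlite3.connect("corpus.db")
          db.executescript("""
              CREATE TABLE IF NOT EXISTS texts(id INTEGER PRIMARY KEY, body TEXT);
              CREATE TABLE IF NOT EXISTS tags(text_id INTEGER, tag TEXT);
          """)
          db.execute("INSERT INTO texts VALUES (1, 'abc def ghi')")
          db.execute("INSERT INTO tags VALUES (1, 'covered_in_grammar:1.2')")
          db.execute("INSERT INTO tags VALUES (1, 'category:short phrases')")
          for (body,) in db.execute(
                  "SELECT body FROM texts JOIN tags ON texts.id = tags.text_id"
                  " WHERE tag = ?", ("category:short phrases",)):
              print(body)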


      [...]
      > > The automatic stemming part is the harder task. A possible solution
      > > may be to prompt the user to disambiguate parses when submitting the
      > > text to the database. Or maybe treat all possible parses equally and
      > > index it that way -- you'll get some false positives sometimes, but
      > > it should be liveable. (Although, on second thoughts, that could
      > > result in exponential index growth with exponential number of false
      > > positives, given sufficiently long texts. Maybe it's intractable
      > > after all. :-/)
      >
      > Yeah, you don't want to index all possible parses. You do want to
      > index multiple possible parses, but not *all* possible parses. The
      > approach we (in my lab) were working towards was human-assisted
      > tagging/parsing, where the computer does as much as it can above a
      > certain confidence level, and then asks for human assistance to add
      > tagging information when it doesn't feel sure enough to proceed
      > (basically, constraining the paths through a tag lattice that can be
      > considered by the Viterbi algorithm).

      Interesting. I've never heard of the algorithm before... time to do some
      reading when I get some free time.


      [...]
      > >> > One thing obvious that I didn't include is automatic voice
      > >> > generation for conlangs. It *would* be very nice to have, but
      > >> > unfortunately, it appears to be far more complex than it seems at
      > >> > first glance.
      > >>
      > >> Incredibly difficult for doing straight text-to-speech, but maybe
      > >> fudgeable if you include IPA transcriptions in dictionary entries.
      > >
      > > I had in mind IPA transcriptions. Straight text-to-speech is a bit
      > > too high a pie in the sky. :) Unless, of course, you have a 1-to-1
      > > mapping from orthography to IPA, then there's a slight chance.
      > >
      > > But even with IPA, it's extremely difficult, because in real speech,
      > > pure phonemes are rarely ever attained. Rather, the vocal apparatus
      > > is constantly moving towards successive pure positions, cutting
      > > corners where the resulting sound won't be noticeably different,
      > > etc.. If you like visualizing things, phonemes are like landmarks in
      > > the configuration space of the vocal apparatus, each dimension
      > > corresponding to a parameter, and speech is like a winding path
      > > through this space. It would've been simple if it were a polygonal
      > > path from landmark to landmark, but in reality, it's a smoothed-out
      > > curve that passes near, but not quite at, each landmark, with the
      > > parts before and after each landmark influencing how the curve
      > > winds. To synthesize realistic speech, you have to be able to model
      > > the vocal apparatus close enough to be able to trace out a curve in
      > > this configuration space that's close enough to sound realistic. A
      > > rather tall order!
      >
      > Praat does that:
      > http://www.fon.hum.uva.nl/praat/manual/Articulatory_synthesis.html It
      > may not be perfect, but it's something.

      Interesting indeed! I'll have to check that out sometime.


      [...]
      > On 30 July 2013 16:51, Patrick VanDusen <pdusen@...> wrote:
      > > Hi Logan,
      > >
      > > You seem to have made significantly more progress in this direction
      > > than I have. Plus, coincidentally, Django is the exact platform I'd
      > > planned on using. It would be pretty silly of me to turn down your
      > > offer for collaboration ;)
      > >
      > > I'm sorry I don't have a more concrete suggestion (I didn't expect
      > > any actual offers for help with the project), but please find me on
      > > github (https://github.com/pdusen) and we will hash out a plan!
      >
      > Will do!

      I followed you on github, but it doesn't seem to show up on your page?

      Anyway, you can find me here: https://github.com/quickfur


      > >>Rather than using personal accounts, I'd suggest setting up an
      > >>organizational account for collaboration; it's really easy to do on
      > >>GitHub (and free). I've got one currently set up for the
      > >>lexicography project I've been working on for the last year, but if
      > >>you want to start a fresh project, maybe the LCS could set up an
      > >>organization account for conlanging-related stuff?
      > >
      > > Git's native workflow involves every collaborator having a personal
      > > account and repository, and then pulling the appropriate changes
      > > from one another; that's how I'd like to work with it.
      >
      > Well, yes. But it's also nice to have an "official" repository for
      > people to get the software from for use. When it's one person's
      > project that other people happen to be helping out on, then it makes
      > sense for that to be part of a personal account. For
      > from-the-ground-up collaborations, though, I like having
      > organizational accounts.
      [...]

      I agree. The official repo should be the organization's, and we can
      individually fork the repo to work on whatever code we're working on,
      and merge them back when they're ready. It helps to have a single place
      to look for the latest changes and evaluate the current state of things.


      T

      --
      What do you mean the Internet isn't filled with subliminal messages?
      What about all those buttons marked "submit"??
    • Jan Strasser
      Message 2 of 29, Jul 31 9:07 AM
        >> On 29 July 2013 16:09, Patrick VanDusen wrote:
        >> Hi all,
        >>
        >> I'm a software developer who would like to (in my spare time, of
        >> course) create a large-scale web application to support conlanging.

        I'm also very much interested in this project, and willing to help. I'm not a professional developer by any means, but I have some experience in frontend programming in Java/GWT (which I *don't* recommend using!) and JavaScript. A project like this would be a welcome opportunity to familiarize myself with other programming languages, environments, and database access. I'd need some help in setting up things on my computer before I'd be able to contribute much, but I hope some of you can help me with that.

        I've been making plans for a multilingual, database-driven, web-based conlang dictionary lately, which might have a few interesting points to contribute to a larger-scale conlanging suite. Some of these are in turn based on discussions by Sai, Alex and Arthaey on the Kura2 mailing list in 2009: http://lists.conlang.org/pipermail/kura2-conlang.org/ I'll talk about my dictionary ideas in a separate e-mail though, so that I can focus on commenting about earlier suggestions here.

        >> On 30 July 2013 15:52, H. S. Teoh wrote:
        >>> On 30 July 2013 12:39, Logan Kearsley wrote:
        >>> I've been working on a research project in computational
        >>> lexicography for almost the last year to develop software for
        >>> documenting languages in dictionary and translation termbase forms.
        >>> [...]
        >>> Would you be at all interested in collaborating on that?
        >>> I've done a *ton* of work on database design and the basic API for
        >>> programmatic data manipulation, but there's still a lot to be done in
        >>> terms of adding new features and actually making the darn thing
        >>> Usable.
        >>> [...]
        >>> You'll note that a lot of the feature requests are things like "I want
        >>> to be able to output stuff in any format- PDF, LaTeX, HTML, ODF,
        >>> etc."; easily supporting that kind of things requires good
        >>> architecture to allow for different display and formatting plugins,
        >>> which in turn requires extensively documented and standardized data
        >>> formats that you can give to plugins or other programs to work with,
        >>> and that's what's been taking up most of my time on the LexTerm
        >>> project.

        That's one of the most important requirements for my own multilingual dictionary too. I want to be able to use it as a central repository for lexical and etymological data in the collaborative conworld Akana, which means that it'll have to support a wide variety of use cases: detailed single-word dictionary entries, full-lexicon overview tables, semantically ordered wordlists, generation of text for wiki pages (including automatic wikilinks) and more. All this should be relatively easy to do *if* the software architecture allows retrieving the relevant information and handing it over to customizable output plugins. Logan, please have a look at my upcoming e-mail about the dictionary database in order to see whether we have been thinking in a similar direction.

        > On 30 July 2013 17:00, Logan Kearsley wrote:
        >> On 30 July 2013 15:52, H. S. Teoh wrote:
        >> My interest is mainly in building reusable libraries that can be easily
        >> customized and incorporated into conlang/conlanger-specific tooling. I'm
        >> a hardcore command-line interface person, so GUI interfaces are actually
        >> detrimental to me. I'd rather just have backend libraries and I can hook
        >> into scripts and such, instead of being forced to use a GUI that doesn't
        >> integrate well into the way I work.
        >
        > We are of one mind in this. However, a front-end interface is one of
        > those re-useable components that's kind of essential to most people
        > using the system.

        I have little experience with the command line myself, apart from using it to run a sound change applier. But whatever one's background, a front-end interface wouldn't just make things more accessible. A well-designed, interactive interface can be a legitimate goal in and of itself. In a dictionary management application, I can easily imagine an interface that actively helps with developing the lexicon, e.g. by automatically providing suggestions for things like semantic field, synonyms, antonyms, etymology, phone(m/t)ic representation etc., perhaps using AJAX in a web-based environment.

        That said, planning out the application mostly as a collection of individual reusable modules/libraries is definitely a good idea!

        > On 30 July 2013 17:00, Logan Kearsley wrote:
        >> On 30 July 2013 10:46, H. S. Teoh wrote:
        >>> On 30 July 2013 12:39, Logan Kearsley wrote:
        >>> In any case, the idiosyncratic, not-easily-shareable, format issue is
        >>> one of the things I've been working really hard to solve over the last
        >>> year. I can give a rather extensive presentation on the problems and
        >>> what I've done to solve them if anybody's interested, but I *strongly*
        >>> recommend anybody looking to tackle the issue of creating a dictionary
        >>> storage format from scratch to check out LMF (Lexical Markup
        >>> Framework, http://www.lexicalmarkupframework.org/).
        >>
        >> I'm a critic of using XML-based formats for data storage, actually. I'm
        >> all for them in terms of data *interchange*, but they're dreadfully
        >> inefficient formats for *storage*. Ideally, any conlanging tool we
        >> build, whether collaboratively or independently, should be able to
        >> "speak" the common XML-based format, so that it's easy to exchange data
        >> with each other; but for internal storage, I'd use something optimized
        >> for whatever the tool needs to do.
        >
        > Oh, so am I. LMF, however, does not mandate the use of XML- a sample
        > XML serialization is provided as an informative, but not normative,
        > addition to a subset of the LMF standard. LMF is all about abstract
        > data modelling for lexicographic information. That's useful for
        > designing storage formats as well as a common (potentially but not
        > necessarily XML-based) interchange format. And making sure that all of
        > your data conforms to LMF models ensures that whatever format you end
        > up using, it's fairly easy to translate into any other specific format
        > that's also LMF-based.

        I've been thinking about a database-driven web application instead of something that saves its data as a file of any sort. This is mostly because I eventually want to publish everything about my conlangs to the web, and a db-driven application allows the visitor to perform the same searches that the application needs to be able to do for my own purposes anyway, so saving in a database provides more features while requiring less effort for publishing. However, the program should of course be able to import and export from/to LMF and a variety of other file formats.

        > On 30 July 2013 17:00, Logan Kearsley wrote:
        >> On 30 July 2013 10:46, H. S. Teoh wrote:
        >>> On 30 July 2013 12:39, Logan Kearsley wrote:
        >>> Another related project that I started doing planning work on but then
        >>> never quite got around to finishing is a comparative database of
        >>> dictionary formats. E.g., each entry is a sample page from some
        >>> published dictionary, with the image annotated with where all of the
        >>> fields are and what their purposes are. The LCS offered to provide
        >>> webspace for that if it ever does get done.
        >>
        >> That would be interesting to know, how to lay out conlang lexicons.
        >>
        >> OTOH, if we have software libraries for dealing with lexicons, we could
        >> simply have a handful of alternative layout styles that can be easily
        >> selected and exported. One could even export the same lexicon in
        >> multiple output formats with no further effort. :)
        >
        > Indeed. Help in designing such styles was the main point of that project.

        And also one of the main points in my own collection of ideas for a dictionary application. See my other mail for details.

        > On 30 July 2013 10:46, H. S. Teoh wrote:
        > But you're asking for wishlists, so here is mine (admittedly rather
        > idealistic and may not be feasible to implement):
        >
        > - A lexicon database tool:
        >
        > (1) Support searching by head word or definition, filtered by entry
        > type, regex pattern searching. Entries should include all affixes and
        > morphemes, not just full-on words.

        Yes. Searching by part of speech, semantic field, etymology, and various other things should be supported too. Basically, every field in the lexicon database should be usable as a search domain. Logical combinations of several search conditions should also be possible. (And maybe even more complex requests, but it might be difficult to pass through the full power of SQL to the user without requiring him to actually type SQL...)
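
        One way to combine conditions without exposing raw SQL is to build a
        parameterized WHERE clause from (field, value) pairs (a sketch; the
        field names are hypothetical and would need whitelisting against the
        real schema):

            def build_query(conditions, joiner="AND"):
                """conditions: list of (field, value) pairs."""
                where = f" {joiner} ".join(f"{field} = ?" for field, _ in conditions)
                params = [value for _, value in conditions]
                return f"SELECT * FROM lexicon WHERE {where}", params

            sql, params = build_query([("pos", "noun"), ("semantic_field", "fish")])
            # sql -> "SELECT * FROM lexicon WHERE pos = ? AND semantic_field = ?"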

        > (2) Automatic rendering into LaTeX and/or HTML. With automatic
        > hyperlinks between cross-referenced entries.

        For both of these, various view options should be possible. Which fields should be included in which order, how should the output be formatted, etc. Also, export to wikicode, LMF, Toolbox, CSV, JSON etc.

        > (3) Sort entries by any arbitrary order defined by the conlang, not
        > just ASCII or Unicode order. Be able to verify that entries actually
        > follow this order (not relevant if everything is in a SQL database
        > where you can just sort search results).

        Yes.

        > (4) Automatically verify that head words conform to conlang's
        > morphological rules (see below on automatic parsing). The idea is to
        > prevent accidental addition of entries that violate rules of phonology
        > that you have set down.

        Nice idea, but this would be of relatively low importance for me personally. And I believe it'd be hard to implement; many languages have morphophonological rules that are not phonologically predictable.

        > (5) Support tagging of entries by category, so that you can, e.g.,
        > search for all words for animals, or all words related to thinking,
        > names of vegetables, furniture, body parts, etc..

        Yes, this is very important. Optimally, the list of tags to choose from would both (a) be open-ended and user-definable, and (b) incorporate one or more "standard" semantic classifications, e.g. the Buck wordlist: http://cals.conlang.org/word/list/buck/ and/or SIL's Rapid Words classification: http://rapidwords.net/resources/files/list-domains

        If you take this to its logical extreme, it might be useful to separate the concepts of "lexicographic sense" and "taxonomic concept" altogether, and build a full taxonomy of concepts into the program which can be used to automatically determine semantic field, synonyms, and antonyms etc. (because these relationships apply first of all to the concept, which simply happens to be included in the semantic range of the word). I don't know how feasible this is, and of course we'd have to make sure that the eventual taxonomy is as culture-neutral as possible, but the opportunities in the context of conlanging would be fascinating. (This idea was first suggested by Alex during the discussions for the Kura2 database schema: http://lists.conlang.org/pipermail/kura2-conlang.org/2009-April/000048.html)

        > (6) Optionally, support searching by stemming (see below on automatic
        > conlang parsing), very important if your conlang has complex mutation
        > rules between adjacent morphemes, which make it hard to find things
        > just by text search.

        Yes... but as you say, this might better be treated as optional, or as a later-version feature.

        For my own purposes, a lexicon management tool should absolutely have the following additional feature too:

        (7) Support for multiple languages and for the relationships between them. This includes things like linking the etymology field of each individual word to the etymon in the parent language, linking loanwords to their source word in the source language, possibly integrating a sound change applier to generate the reflex of a particular word or phrase in a daughter language, and being able to query the database for, let's say, all words for different types of fish in several languages at once (so that the conlanger can decide which of these words to use as the source of a loanword, for instance).
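
        In database terms, most of these links boil down to a few
        self-referencing foreign keys (a sketch of one possible schema, not
        a finished design):

            import sqlite3

            db = sqlite3.connect(":memory:")
            db.executescript("""
                CREATE TABLE language(
                    id INTEGER PRIMARY KEY,
                    name TEXT,
                    parent_id REFERENCES language(id));     -- daughter -> parent
                CREATE TABLE word(
                    id INTEGER PRIMARY KEY,
                    language_id REFERENCES language(id),
                    headword TEXT,
                    etymon_id REFERENCES word(id),          -- inherited etymon
                    loan_source_id REFERENCES word(id));    -- borrowing source
            """)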

        >- A document representation format suitable for writing conlang grammars
        > that is:
        >
        > (1) Semantically-encoded, i.e., no such nonsense as "bold" or
        > "italics", but rather "this word is in the conlang's orthography",
        > "this word is an IPA rendering", etc., which are then mapped to a
        > physical rendering using a stylesheet of some sort.
        >
        > (2) Supports complex interlinears. Just yesterday, in fact, I
        > discovered that somebody made a linguistic package for LaTeX called
        > ExPex, which supports very complex interlinears. I spent all day
        > typesetting my alien conlang's grammar, and it's looking pretty good,
        > even though there is still work to do. What I'd like for interlinears
        > is: a 3-part rendering, following ExPex's example: (a) a preamble
        > (e.g., for a conlang snippet in native writing); (b) a morphemic
        > breakdown (used as the "index" for vertical alignment of glossing
        > elements), followed by one or more glossing lines; (c) a free
        > translation.
        >
        > (3) Multi-targetable: I'd like to be able to publish, from the same
        > source, conlang grammars in both LaTeX and HTML formats without any
        > manual intervention.

        Very good idea. As with the lexicon database, I think all of the export functionality should rely on an extensible collection of (configurable?) output plugins so that it would theoretically be possible to output the data in *any* format with a bit of fiddling with the plugin definitions.
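
        A plugin registry along those lines can be quite small (a sketch):

            RENDERERS = {}

            def renderer(fmt):
                """Decorator registering an output plugin for a format."""
                def register(fn):
                    RENDERERS[fmt] = fn
                    return fn
                return register

            @renderer("html")
            def to_html(entry):
                return f"<dt>{entry['headword']}</dt><dd>{entry['gloss']}</dd>"

            print(RENDERERS["html"]({"headword": "tara", "gloss": "river"}))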

        >- Fully-customizable, automated conlang text analyser. Basically, you
        > tell it the rules of morphology, then type in any arbitrary conlang
        > word, and it should be able to automatically parse it and give you the
        > morphemic breakdown. If there are multiple possible parses (as any
        > sufficiently naturalistic conlang must have), then it should present
        > all possibilities and give you a choice as to which one was intended.
        > The purpose of this is to (1) explore the rules of morphology as
        > you're developing it, by seeing how things would parse; (2) serve as
        > the basis of a conlang text spellchecker: check that what you wrote is
        > actually what you meant! Ideally, this should be keyed to the lexicon
        > tool so that all morphemes can be accounted for (and possibly generate
        > HTML with hyperlinks to lexicon entries for each morpheme -- see
        > Arthaey's conlang page
        > [http://www.arthaey.com/conlang/ashaille/writing/relayInverse2.html]
        > for an excellent example of interlinears with auto hyperlinks to
        > lexicon entries).

        It's worth noting that SIL's Toolbox and Fieldworks software can already do semi-automated parsing, and fairly well at that, so maybe a sensible approach would be to simply provide data interfaces for those applications.
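
        For the fully-custom route, though, the core segmentation step is a
        small recursive search (a sketch that ignores mutation rules, which
        are the genuinely hard part):

            def parses(word, morphemes):
                """Yield every way to split word into known morphemes."""
                if not word:
                    yield []
                for m in morphemes:
                    if word.startswith(m):
                        for rest in parses(word[len(m):], morphemes):
                            yield [m] + rest

            print(list(parses("tanka", ["tan", "ka", "tanka", "t", "anka"])))
            # -> [['tan', 'ka'], ['tanka'], ['t', 'anka']]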

        >- An idea database, to keep track of various conlang ideas that may or
        > may not be "officially" part of the conlang yet. This one is hard to
        > implement... basically, I'd like to be able to jot down ideas as they
        > come to me, and the software should somehow know to categorize it in
        > such a way that it's easy to find afterwards. At the very least, it
        > should allow searching by (1) date, (2) keywords, or (3) regular
        > expressions (as a last resort if you, say, forgot the keyword but know
        > roughly what it was about).

        Good idea, and it shouldn't be too difficult with a tag system similar to that of a blog.

        > There should be some way of tracking these idea snippets such that you
        > can check them off as officially approved, or rejected (optionally
        > with a reason why), etc.. If approved, associate it to a particular
        > section of the grammar / phonology / etc., that incorporates it.

        Can be solved with tags too. IMO a sensible approach would be to have two separate types of tags, for content on the one hand and for the conlanger's workflow on the other hand.

        >- A corpus-management tool: basically a database of texts, ranging
        > anywhere from short phrases, example sentences, to entire books
        > written in the conlang (if you ever get that far). These texts should
        > be readily searchable, preferably with a search engine that can handle
        > arbitrary conlang stemming (non-trivial, esp. in conlangs with very
        > complex morphology). Furthermore, there should be multiple indexes
        > defined on these texts; the idea is that if I wrote an example
        > sentence in the conlang to illustrate a particular point of grammar,
        > or a particular phonological mutation, then that sentence should be
        > keyed to that point of grammar / phonological mutation, such that 5
        > months later I can say "hmm I wonder what I wrote that related to this
        > point of grammar", then I can search for it and get all relevant
        > entries instantly, instead of having to flip through 50 pages of
        > scattered conlang notes. As to how such an indexing should be done,
        > probably it's good enough to maintain a list of topics, and then each
        > sample text can be tagged with one or more topics (sorta like blog
        > tagging, if you will).

        A database of texts is possible with SIL's software too; the disadvantage is that it's not easy to export these texts for uses other than analysing their lexical and grammatical structure. This is of course an artifact of targeting language *documentation* as opposed to language *creation*, and might be a point for re-implementing a corpus parser after all. Especially if it's meant to be tied into grammar writing. I don't see how the latter wouldn't also be useful for language documentation, but I don't think it's included in the current version of SIL Fieldworks...

        >- A conlang font-creation and native writing typesetting program.
        > [...]

        Not important for me personally, but a valid wishlist item! This is a fairly different thing though, so it should be a separate application IMHO.
      • Nicole Valicia Thompson-Andrews
        Message 3 of 29, Jul 31 9:56 AM
          That website is screen-reader friendly.

          Mellissa Green


          @GreenNovelist

          -----Original Message-----
          From: Constructed Languages List [mailto:CONLANG@...] On Behalf Of taliesin the storyteller
          Sent: Wednesday, July 31, 2013 9:11 AM
          To: CONLANG@...
          Subject: Re: Conlanging Software Wish List

          On 2013-07-30 00:09, Patrick VanDusen wrote:
          > I'm a software developer who would like to (in my spare time, of course)
          > create a large-scale web application to support conlanging. Some ideas I've
          > had for features, off the top of my head, include
          >
          > * Public/Private languages
          > * Sound Inventory (for each language)
          > * Language Families
          > * A uniform sound change applier (for language families)
          > * Lexicon

          I develop and run CALS: http://cals.conlang.org/

          It's got:

          * ✓ Public/Private languages
          * ✓ Language Families

          Not done but in alpha:

          * Sound Inventory (for each language)
          * Lexicon

          I am interested in co-developers and patch-submitters. I haven't worked
          much on it lately as moving it from SVN to GIT (and github) and going to
          django 1.3 was more work than I thought. It's grown organically and it
          shows... Plus... Bionade is no longer available in my country :'( More
          eyes and hands are a good incentive to finish it though, Bionade or not :)

          See http://cals.conlang.org/about/ for the tech.

          Future plans/wishes, in no particular order:
          * Django 1.6, and clean up the user-mess
          * Better registration/support for openid, oauth etc.
          * Finish lexicon-bits and minimal sound entry
          * REST API (I've experimented with tastypie)
          * Add neat UI for adding and changing sounds
          * Better scalability on small devices (not that a galaxy note 2 is small
          but I do check CALS on it)
          * Support for moderators
          * Somehow integrated bugtracker/feature requester
          * User-submitted translation exercises, with a voting-system to serve as
          a sort of moderation-queue, a bit à la Stack Exchange
          * Tests, CI and better modularization


          t. aka kaleissin
        • Patrick VanDusen
          Message 4 of 29, Jul 31 10:48 AM
            On Wed, Jul 31, 2013 at 10:36 AM, H. S. Teoh <hsteoh@...> wrote:
            >
            > The code doesn't have to necessarily be usable as command-line tools,
            > but the point was that they should have properly-designed APIs so that
            > anyone who wants to can write a CLI frontend for them, or, for that
            > matter, an automated script that does a particular task.


            Making each module individually usable as a CLI application is
            actually quite trivial in Python, in my experience, so I would
            expect that to be done.
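
            For instance, the usual argparse-plus-main() pattern costs only a
            handful of lines per module (a sketch; search() stands in for the
            module's real API):

                import argparse

                def search(pattern):
                    """Stand-in for the module's real API."""
                    print(f"searching for {pattern!r}")

                def main():
                    parser = argparse.ArgumentParser(description="Search the lexicon")
                    parser.add_argument("pattern", help="regex to match headwords")
                    args = parser.parse_args()
                    search(args.pattern)

                if __name__ == "__main__":
                    main()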


            > [...]
            > > We should probably define standards for data interchange in multiple
            > > common formats (JSON and XML both would qualify).
            >
            > Good idea, though I'd rather we standardize on a single format.
            > Otherwise you might end up with out-of-sync code between the JSON parser
            > and the XML parser, which just adds lots of coding effort for not much
            > gain.
            >

            I would agree with making JSON the primary interchange format with other
            options being secondary, but I think you should be able to use a single
            spec for at least JSON and XML. The differences would just be syntactical.


            >
            > OTOH, maybe what's needed is to (for example) standardize on JSON
            > internally, then have a tool that converts between JSON/XML so that you
            > only need one parser for extracting the actual data, the tool would
            > basically be just for transcoding.
            >

            If we're dealing with python, we could just standardize on certain python
            classes for internal use and then let python's native serialization
            facilities handle all of the exporting. They're quite good, in my experience.
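
            E.g., plain dicts round-trip through pickle for internal use, with
            json for interchange (a sketch):

                import json
                import pickle

                entry = {"headword": "tara", "pos": "n", "gloss": "river"}

                blob = pickle.dumps(entry)          # internal serialization
                assert pickle.loads(blob) == entry  # round-trips exactly

                print(json.dumps(entry))            # export for interchange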


            > I like sqlite, it's lightweight, doesn't require a full-fledged SQL
            > server, and pretty good for what it does.
            >

            Indeed, I've used SQLite DBs to create "custom" file formats for more than
            one commercial application.
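
            The trick is just opening a sqlite database under your own file
            extension (a sketch; the extension and schema are made up):

                import sqlite3

                db = sqlite3.connect("mylang.conlex")  # any filename works
                db.execute("CREATE TABLE IF NOT EXISTS lexicon"
                           "(headword TEXT PRIMARY KEY, gloss TEXT)")
                db.execute("INSERT OR REPLACE INTO lexicon VALUES ('tara', 'river')")
                db.commit()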


            > Additionally, we can implement an automated unit-testing framework by
            > writing test drivers that talk directly to the back-end, that way we can
            > build up a suite of unittests for ensuring that bugfixes / new features
            > don't break existing functionality. (It would be a far bigger pain if we
            > had to write scripts to interact through the GUI!)
            >

            Quite right. Since we'd ostensibly be dealing with a large number of
            discrete modules, I'd expect unit tests to be in place for each individual
            one.

            > Well, yes. But it's also nice to have an "official" repository for
            > > people to get the software from for use. When it's one person's
            > > project that other people happen to be helping out on, then it makes
            > > sense for that to be part of a personal account. For
            > > from-the-ground-up collaborations, though, I like having
            > > organizational accounts.
            > [...]
            >
            > I agree. The official repo should be the organization's, and we can
            > individually fork the repo to work on whatever code we're working on,
            > and merge them back when they're ready. It helps to have a single place
            > to look for the latest changes and evaluate the current state of things.


            Fair enough. I have already created a discussion group for this purpose (
            conlang-software-dev@...), so I can go ahead and set up some
            sort of "organizational" github repo for everyone to work off of.

            Patrick
          • Patrick VanDusen
            Message 5 of 29, Jul 31 10:52 AM
              I mentioned it in another message, but for anyone who missed it, I have set
              up a separate mailing list for discussion on the topic of conlang software
              development (and particularly the projects we've been discussing).

              conlang-software-dev@...

              I've already invited a few of you, but anyone else who wants to is more
              than welcome.

              Patrick
            • Nicole Valicia Thompson-Andrews
              Message 6 of 29, Jul 31 11:42 AM
                Since I'll be testing it, I'll join, what's the subscription link?

                Mellissa Green


                @GreenNovelist

                -----Original Message-----
                From: Constructed Languages List [mailto:CONLANG@...] On
                Behalf Of Patrick VanDusen
                Sent: Wednesday, July 31, 2013 1:52 PM
                To: CONLANG@...
                Subject: Re: Conlanging Software Wish List

                I mentioned it in another message, but for anyone who missed it, I have set
                up a separate mailing list for discussion on the topic of conlang software
                development (and particularly the projects we've been discussing).

                conlang-software-dev@...

                I've already invited a few of you, but anyone else who wants to is more
                than welcome.

                Patrick
              • James Kane
                Message 7 of 29, Jul 31 5:13 PM
                  Hi

                  I haven't read every post quite yet and a lot of them get pretty technical and go over my head a little. But I must say that Teoh's very extensive list would be ideal for most conlangers. One thing that I personally would like is that making new entries in a dictionary should be very very quick.

                  This is the reason I use Excel or Word for my lexicons, even though they aren't really suited to it: making new entries is incredibly quick. I once tried using Lexique and some of the other SIL software, but they are terribly clunky; some days I coin a few dozen new lexical items, and I don't want it to take half an hour just to get them into my lexicon. Keyboard shortcuts would be very handy.

                  I have another idea which might not be suitable for everyone; it would probably be most useful for conlangs with large lexicons, or for helping people unfamiliar with a conlang look up words in it. Anyway, it's sort of based on how Google Translate works. Each entry in the conlang has a number of key words or phrases. Each time a word is searched for, any entry with that word amongst its keywords comes up.

                  Each entry also has usage, etymology, and derivatives sections etc. Clicking on any keyword brings up all the entries which have that keyword in the definition. To make it even more Google-Translate-ish, there could be values assigned to each keyword, or the keywords arranged in a hierarchy to show which translation is more common/better, as Google Translate does with its corpus statistics.
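
                  (A keyword lookup like that is essentially an inverted index. A rough Python sketch of the lookup side, with made-up words:)

                      from collections import defaultdict

                      entries = {
                          "kala": {"keywords": ["fish", "food"]},
                          "miru": {"keywords": ["water", "fish"]},
                      }

                      index = defaultdict(list)
                      for headword, entry in entries.items():
                          for kw in entry["keywords"]:
                              index[kw].append(headword)

                      print(index["fish"])  # ['kala', 'miru']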


                  James

                  On 31/07/2013, at 4:46 AM, "H. S. Teoh" <hsteoh@...> wrote:

                  > On Mon, Jul 29, 2013 at 06:09:20PM -0400, Patrick VanDusen wrote:
                  >> Hi all,
                  >>
                  >> I'm a software developer who would like to (in my spare time, of
                  >> course) create a large-scale web application to support conlanging.
                  >> Some ideas I've had for features, off the top of my head, include
                  >>
                  >> * Public/Private languages
                  >> * Sound Inventory (for each language)
                  >> * Language Families
                  >> * A uniform sound change applier (for language families)
                  >> * Lexicon
                  >>
                  >> However, while I find conlanging fascinating, I'm at best a rank amateur.
                  >> So let me ask you, the professionals; what sort of functionality would you
                  >> look for in conlanging software? What features would appeal to you?
                  > [...]
                  >
                  > First of all, I'm no professional. None of my conlangs have turned out
                  > "normal". :-P
                  >
                  > As for conlanging software... I've in fact just (re)written my Tatari
                  > Faran dictionary tool to work with my new alien conlang as well. The
                  > tool supports searching headwords by arbitrary regular expressions,
                  > searching definition bodies by arbitrary regular expressions, counting
                  > search results, filtering by entry type (word, phrase, affix,
                  > pseudo-entry), verifying correct word ordering in the lexicon, checking
                  > cross-references between entries, etc.. However, my tool is nowhere near
                  > fit for public consumption, because it's dependent on my idiosyncratic
                  > text-based lexicon file format, which probably doesn't work well with
                  > the way other people conlang.
                  >
                  > But you're asking for wishlists, so here is mine (admittedly rather
                  > idealistic and may not be feasible to implement):
                  >
                  > - A lexicon database tool:
                  >
                  > (1) Support searching by head word or definition, filtered by entry
                  > type, regex pattern searching. Entries should include all affixes and
                  > morphemes, not just full-on words.
                  >
                  > (2) Automatic rendering into LaTeX and/or HTML. With automatic
                  > hyperlinks between cross-referenced entries.
                  >
                  > (3) Sort entries by any arbitrary order defined by the conlang, not
                  > just ASCII or Unicode order. Be able to verify that entries actually
                  > follow this order (not relevant if everything is in a SQL database
                  > where you can just sort search results).
                  >
                  > (4) Automatically verify that head words conform to conlang's
                  > morphological rules (see below on automatic parsing). The idea is to
                  > prevent accidental addition of entries that violate rules of phonology
                  > that you have set down.
                  >
                  > (5) Support tagging of entries by category, so that you can, e.g.,
                  > search for all words for animals, or all words related to thinking,
                  > names of vegetables, furniture, body parts, etc..
                  >
                  > (6) Optionally, support searching by stemming (see below on automatic
                  > conlang parsing); this is very important if your conlang has complex
                  > mutation rules between adjacent morphemes, which make it hard to find
                  > things just by text search.
                  >
                  > - A document representation format suitable for writing conlang grammars
                  > that is:
                  >
                  > (1) Semantically-encoded, i.e., no such nonsense as "bold" or
                  > "italics", but rather "this word is in the conlang's orthography",
                  > "this word is an IPA rendering", etc., which are then mapped to a
                  > physical rendering using a stylesheet of some sort.
                  >
                  > (2) Supports complex interlinears. Just yesterday, in fact, I
                  > discovered that somebody made a linguistic package for LaTeX called
                  > ExPex, which supports very complex interlinears. I spent all day
                  > typesetting my alien conlang's grammar, and it's looking pretty good,
                  > even though there is still work to do. What I'd like for interlinears
                  > is: a 3-part rendering, following ExPex's example: (a) a preamble
                  > (e.g., for a conlang snippet in native writing); (b) a morphemic
                  > breakdown (used as the "index" for vertical alignment of glossing
                  > elements), followed by one or more glossing lines; (c) a free
                  > translation.
                  >
                  > (3) Multi-targetable: I'd like to be able to publish, from the same
                  > source, conlang grammars in both LaTeX and HTML formats without any
                  > manual intervention.
                  >
                  > - Fully-customizable, automated conlang text analyser. Basically, you
                  > tell it the rules of morphology, then type in any arbitrary conlang
                  > word, and it should be able to automatically parse it and give you the
                  > morphemic breakdown. If there are multiple possible parses (as any
                  > sufficiently naturalistic conlang must have), then it should present
                  > all possibilities and give you a choice as to which one was intended.
                  > The purpose of this is to (1) explore the rules of morphology as
                  > you're developing it, by seeing how things would parse; (2) serve as
                  > the basis of a conlang text spellchecker: check that what you wrote is
                  > actually what you meant! Ideally, this should be keyed to the lexicon
                  > tool so that all morphemes can be accounted for (and possibly generate
                  > HTML with hyperlinks to lexicon entries for each morpheme -- see
                  > Arthaey's conlang page
                  > [http://www.arthaey.com/conlang/ashaille/writing/relayInverse2.html]
                  > for an excellent example of interlinears with auto hyperlinks to
                  > lexicon entries).
                  >
                  > - An idea database, to keep track of various conlang ideas that may or
                  > may not be "officially" part of the conlang yet. This one is hard to
                  > implement... basically, I'd like to be able to jot down ideas as they
                  > come to me, and the software should somehow know to categorize them in
                  > such a way that they're easy to find afterwards. At the very least, it
                  > should allow searching by (1) date, (2) keywords, or (3) regular
                  > expressions (as a last resort if you, say, forgot the keyword but know
                  > roughly what it was about).
                  >
                  > There should be some way of tracking these idea snippets such that you
                  > can check them off as officially approved, or rejected (optionally
                  > with a reason why), etc.. If approved, associate it to a particular
                  > section of the grammar / phonology / etc., that incorporates it.
                  >
                  > - A corpus-management tool: basically a database of texts, ranging
                  > anywhere from short phrases, example sentences, to entire books
                  > written in the conlang (if you ever get that far). These texts should
                  > be readily searchable, preferably with a search engine that can handle
                  > arbitrary conlang stemming (non-trivial, esp. in conlangs with very
                  > complex morphology). Furthermore, there should be multiple indexes
                  > defined on these texts; the idea is that if I wrote an example
                  > sentence in the conlang to illustrate a particular point of grammar,
                  > or a particular phonological mutation, then that sentence should be
                  > keyed to that point of grammar / phonological mutation, such that 5
                  > months later I can say "hmm I wonder what I wrote that related to this
                  > point of grammar", then I can search for it and get all relevant
                  > entries instantly, instead of having to flip through 50 pages of
                  > scattered conlang notes. As to how such an indexing should be done,
                  > probably it's good enough to maintain a list of topics, and then each
                  > sample text can be tagged with one or more topics (sorta like blog
                  > tagging, if you will).
                  >
                  > - A conlang font-creation and native writing typesetting program. This
                  > is probably the most total pie-in-the-sky idea of them all, but my
                  > ideal font-creation program should be able to handle:
                  >
                  > (1) Both alphabetic and non-alphabetic writing. Handle creation of
                  > arbitrary glyph shapes and hinting (kerning, etc.).
                  >
                  > (2) Arbitrary diacritics and nesting of diacritics/ligatures to
                  > arbitrary levels.
                  >
                  > (3) Writing in any direction (not just L-to-R or R-to-L, but
                  > top-to-bottom, say, or boustrophedon).
                  >
                  > (4) Automatically handle arbitrary ligature rules, e.g., f + i -> fi.
                  > Ligatures across multiple characters should be supported.
                  >
                  > (5) Fully customizable input method, e.g., if your conlang has very
                  > complex glyphs, it should be possible to tell the program "when I type
                  > `xyz' it means glyph #571", so that you don't have to manually pick
                  > out the glyphs by selecting from a 10-level deep character map
                  > (seriously, I hate deeply-nested pull-down menus, they are slow and
                  > unnavigable).
                  >
                  > (6) Able to render output to any format, probably some graphics format
                  > since many conlang scripts have features that no ordinary
                  > Unicode-based text display system can handle. :) LaTeX output would be
                  > nice to have, but may not be practical depending on how complex things
                  > get. HTML output should be supported if the script is simple enough to
                  > represent with Unicode.
                  >
                  > Well, these are the things I can think of right now. I'm sure there are
                  > more. :)
                  >
                  > One obvious thing that I didn't include is automatic voice generation
                  > for conlangs. It *would* be very nice to have, but unfortunately, it
                  > appears to be far more complex than it seems at first glance. It's far
                  > from being just a matter of stringing together a series of recorded
                  > clips of phone(me)s. In actual speech, complex interactions between
                  > adjacent phones happen, such that consonants and vowels are actually
                  > only the extreme points of the continuum of speech, and interpolating
                  > between these extrema is far from trivial, as it requires essentially
                  > the simulation of the physics of how the entire vocal apparatus works,
                  > layered on top of a sound synthesis system. Automated voice generation
                  > for specific natlangs like English is an entire field of research in
                  > the industry, let alone voice generation for *conlangs*!
                  >
                  >
                  > T
                  >
                  > --
                  > People demand freedom of speech to make up for the freedom of thought which they avoid. -- Soren Aabye Kierkegaard (1813-1855)
                • Christophe Grandsire-Koevoets
                  Message 8 of 29 , Jul 31 11:28 PM
                  • 0 Attachment
                    On 1 August 2013 02:13, James Kane <kanejam@...> wrote:

                    > Hi
                    >
                    > I haven't read every post quite yet and a lot of them get pretty technical
                    > and go over my head a little. But I must say that Teoh's very extensive
                    > list would be ideal for most conlangers. One thing that I personally would
                    > like is that making new entries in a dictionary should be very very quick.
                    >
                    > This is the reason I use Excel or Word for my lexicons, even though they
                    > aren't really suited to it: making new entries is incredibly quick. I
                    > once tried using Lexique and some of the other SIL software, but they are
                    > terribly clunky, and some days I might make a few dozen new lexical items;
                    > I don't want it to take half an hour just to put them in my lexicon.
                    > Keyboard shortcuts would be very handy.
                    >
                    >
                    Actually, adding entries in a SIL Toolbox dictionary is very fast and easy,
                    *provided you have created an entry template* (which is also very easy:
                    create an entry with all the fields you want the entries to have, then
                    choose Database > Template... and click OK :P). Nowadays I just hit Ctrl+N
                    and I've got an empty record with all the needed fields already present.
                    You can even have default contents in those fields.

                    Anyway, templating is functionality that needs to be present in any
                    software we conlangers will use, and it goes further than just lexicon
                    entry templates.
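
                    To give a rough idea of what generic templating means outside Toolbox,
                    here is a hypothetical Python sketch (the field names and sample word
                    are invented; this is not how Toolbox works internally):

                        import copy

                        # A template is just a named set of fields, optionally with
                        # default contents, instantiated afresh for each new record.
                        # (All field names here are invented for illustration.)
                        ENTRY_TEMPLATE = {
                            "headword": "",
                            "pos": "",                # part of speech
                            "gloss": "",
                            "etymology": "",
                            "examples": [],
                            "dialect": "standard",    # default contents
                        }

                        def new_entry(**fields):
                            # Deep copy, so records never share mutable defaults.
                            entry = copy.deepcopy(ENTRY_TEMPLATE)
                            entry.update(fields)
                            return entry

                        record = new_entry(headword="sarap", pos="v", gloss="to eat")

                    The same mechanism generalizes to grammar sections, interlinears, and
                    so on, which is what I mean by going further than lexicon entries.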


                    > I have another idea which might not suit everyone; it would probably be
                    > most useful for conlangs with large lexicons, or for people unfamiliar
                    > with a conlang who want to look up words in it. It's loosely based on
                    > how Google Translate works: each entry in the conlang has a number of
                    > key words or phrases, and each time a word is searched for, any entry
                    > with that word amongst its keywords comes up.
                    >
                    > Each entry also has a usage section, an etymology section, a derivatives
                    > section, and so on. Clicking on any keyword brings up all the entries
                    > which have that keyword in their definitions. To make it even more
                    > Google-Translate-like, values could be assigned to each keyword, or the
                    > keywords could be arranged in a hierarchy, to show which translation is
                    > more common or better, as Google Translate does with its corpus
                    > statistics.
                    >
                    >
                    All good ideas.
                    --
                    Christophe Grandsire-Koevoets.

                    http://christophoronomicon.blogspot.com/
                    http://www.christophoronomicon.nl/
                  • Arnt Richard Johansen
                    Message 9 of 29 , Aug 5, 2013
                    • 0 Attachment
                      On Mon, Jul 29, 2013 at 06:09:20PM -0400, Patrick VanDusen wrote:

                      > So let me ask you, the professionals: what sort of functionality would you
                      > look for in conlanging software? What features would appeal to you?

                      I would like a tool to help with non-agglutinative morphology.

                      A framework I use a lot to help make the regularity less obvious is Word and Paradigm Morphology, as presented in David Peterson's LCC1 talk.
                      Video: http://www.youtube.com/watch?v=-z6lYZzLN-A
                      and slides: http://conference.conlang.org/lcc1/Peterson-6slides.pdf

                      This means that you have a set of rules, and then you apply those rules to a lexical item to make up a paradigm of inflections or derivations.

                      Doing this by hand is rather time-consuming and error-prone, so it would be helpful to be able to automate it somehow.
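
                      As a very rough sketch of what such automation might look like in
                      Python (the rules and the stem below are invented examples, not
                      anything from the talk):

                          import re

                          # Word-and-Paradigm sketch: each paradigm cell is a
                          # whole-word transformation of the lexical item, rather
                          # than a neatly segmentable affix. (Rules are invented
                          # for illustration.)
                          RULES = {
                              ("sg", "nom"): lambda stem: stem,
                              ("sg", "gen"): lambda stem: re.sub(r"a$", "e", stem) + "n",
                              ("pl", "nom"): lambda stem: stem + "i",
                              ("pl", "gen"): lambda stem: re.sub(r"a$", "o", stem) + "jen",
                          }

                          def paradigm(stem):
                              # Apply every rule to the lexical item to fill out
                              # its whole paradigm at once.
                              return {cell: rule(stem) for cell, rule in RULES.items()}

                          for cell, form in paradigm("kala").items():
                              print(cell, form)
                          # ('sg', 'nom') kala    ('sg', 'gen') kalen
                          # ('pl', 'nom') kalai   ('pl', 'gen') kalojen

                      A real tool would let you define the transformations in some rule
                      notation rather than as code, but the principle is the same.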

                      --
                      Arnt Richard Johansen http://arj.nvg.org/
                      Löylyä lisää, ei tunnu missää. ("More steam; can't feel a thing.")