Linguistic Database?

  • Boris Shapiro
    Message 1 of 16, Jul 20, 2002
      Aiya!

      Have you ever thought of creating a linguistic database of Tolkien's
      languages, at least for Quenya? That would greatly improve the scholarly
      process, and it would be especially useful for Tolkien's languages, where
      we have the extra vital parameter of the chronology of Quenya's evolution.

      If none of you knows of any existing database suitable for our purposes,
      it would be a great idea to create one.


      Namaarie! S.Y., Elenhil Laiquendo


      : masse sii nar i nuunatani · elessar · elessar? :

      [A comprehensive database of all of Tolkien's languages (limiting it to
      Quenya only would be unnecessarily limiting of its utility), at all stages
      of their development, would be a powerful tool indeed. It would, of
      course, require the permission of the Tolkien Estate for its publication.
      Carl]
    • Kai MacTane
      Message 2 of 16, Jul 20, 2002
        At 7/20/02 10:27 AM , Boris Shapiro wrote:

        >Have you ever thought of creating a linguistic database of Tolkien's
        >languages, at least for Quenya? That would greatly improve the scholarly
        >process, and it would be especially useful for Tolkien's languages, where
        >we have the extra vital parameter of the chronology of Quenya's evolution.

        Oddly enough, I thought of this idea about 4-6 months ago. I've since been
        implementing it, and it's nearly ready for comments and testing.

        >[A comprehensive database of all of Tolkien's languages (limiting it to
        >Quenya only would be unnecessarily limiting of its utility), at all stages
        >of their development, would be a powerful tool indeed. It would, of
        >course, require the permission of the Tolkien Estate for its publication.
        >Carl]

        The necessity of TE permission has not escaped me -- not only is it alluded
        to in a couple of FAQ questions in my database system, and in the copyright
        notice at the bottom of every page it serves up, but I've also kept the
        data-set as limited as I can at the moment, loading in just enough
        information for testing purposes and to fill the various categories and
        parts of speech.

        Your suggestion that it should cover all of Tolkien's languages is a good
        one, and one I'll have to ponder. I *think* that expanding this database
        system (which I currently call _Quettahostanie_) to include other languages
        might be fairly easy.

        I'm tidying up a few loose ends, and have a busy weekend ahead of me, but I
        should have a system fairly soon that will enable remote users (i.e., the
        rest of you) to look at the back-end, and even edit database entries if
        desired. ("Fairly soon" means "probably by Monday evening, Pacific-coast-US
        time".) In the meantime, I think the front-end is in good enough condition
        to be seen by the world (or at least by a bunch of _lambengolmor_), with
        the following notes:

        1) This is still something of a work in progress. In particular, I'd
        love to have people's reactions on the documentation, the usability,
        the interface, and whatnot. Is it easy to figure out how to use? Does
        it have sufficient functionality? Can it do everything you think it
        should be able to? (Aside from handling languages besides Quenya, or
        allowing you to edit the contents.)
        2) I have always intended that this should eventually become a
        collaborative tool, usable by multiple scholars across the world. I am
        open to suggestions on the mechanics of validating editors, maintaining
        updates, and so on.
        3) As described in various places in the system, I am planning to seek
        the Tolkien Estate's permission before loading more of their
        intellectual property into the database. Suggestions for approaches to
        take, approaches to avoid, and alterations in the system that might
        make their permission more likely are also quite welcome.
        4) There are undoubtedly cases where I've missed one or more
        attestations, or where there should be a green check mark in the
        Diachrony display. I do not have all of Tolkien's works, and so I only
        marked things that I could be sure of. (This is one reason for
        eventually making it collaborative -- it will allow greater sharing of
        scholarly knowledge, and greater accuracy than I can provide on my own.)
        Please do not discount Quettahostanie on the basis of its creator's
        lack of material and knowledge! (And feel free to email me pointing
        out such inaccuracies.)
        5) Since I wanted Quettahostanie to also be usable by the "average"
        Elfling member, and even to be comprehensible by a random Web
        surfer who wanders in, some of the documentation is extremely
        simple. (For example, descriptions of what Quenya is.) The
        documentation was intended for a broad-based population, and is *not*
        what I would have written if I were only targeting members of this
        list! (In other words, if you feel like the docs are "talking down
        to you", those parts are intended for other people, like folks who
        have just seen Peter Jackson's movie and done a Web search on
        "Tolkien".)

        That being said, you can see the Quettahostanie database at:

        http://www.freaknation.com/quenya/

        I will probably not be available to answer questions (or read comments) for
        roughly the next 24 hours -- perhaps more like 36 -- but I will be happy to
        catch up on list mail then. Please use your own judgement on whether to
        send comments to this list (where they can be picked over and usefully
        added to by others), or to my personal inbox (where they won't be
        cluttering up this list). No need to send to both places.

        And thank you all in advance for whatever feedback you can provide.

        --Kai MacTane
        ----------------------------------------------------------------------
        "And the Devil in a black dress watches over,
        My guardian angel walks away..."
        --Sisters of Mercy,
        "Temple of Love"
      • Boris Shapiro
        Message 3 of 16, Jul 23, 2002
          [Folks: let's not use needless abbreviations, especially where they are
          not very widely known, and where they aren't spelled out in full at least
          at their first usage in a post, and especially not in subject lines.
          Thanks, Carl.]

          Aiya!

          So, about LDB [linguistic databases]. I suppose we all want to make
          clear what features we want it to possess. Probably the best way to
          do it is to analyze a Quenya phrase providing all the linguistic
          information we need this database to store.

          Let's use "Elen siila luumenn' omentielvo". I confess I may be short
          of knowledge to undertake an all-encompassing analysis of it.
          Perhaps the venerable lambengolmor would give us a valuable lesson?


          Namaarie! S.Y., Elenhil Laiquendo


          : eressea, eldamar i laa fiirimo tuvitas pole :


          [Engaging in the close analysis of a phrase that Boris is inviting is
          great. However, I'm of mixed mind about whether this forum is the right
          one for the broader topic of laying the groundwork for a proposed
          database; but I'll allow it for now at least, so long as it doesn't drift
          off topic. (It's not that the topic itself is necessarily unsuitable for a
          linguistics mailing list, I just fear that it will either overwhelm the
          list, or quickly drift off topic, or both.) I would ask that everyone
          wanting to participate in a broader discussion of linguistic database
          issues please prefix all of your posts with [LDB] (yes, I know what I just
          said above, and I appreciate the irony), so that those not interested can
          easily avoid them. Thanks, Carl]
        • Kai MacTane
          Message 4 of 16, Jul 23, 2002
            At 7/23/02 10:39 AM , Boris Shapiro wrote:

            > So, about LDB [linguistic databases]. I suppose we all want to make
            > clear what features we want it to possess. Probably the best way to
            > do it is to analyze a Quenya phrase providing all the linguistic
            > information we need this database to store.

            As you've probably noticed in Quettahostanie (QH), my approach is to store
            individual elements in the database rather than entire phrases.

            Of course, the structure of QH already encodes some of my ideas about
            what's important to track and what's not -- I spent some time ruminating
            about the database architecture before implementing it, thinking to myself
            "It would be good if it kept track of *this*... oh, and *that* would be
            useful, too."

            Nonetheless, I'll see if I can throw out most of that (as if it were a
            pre-conception), and try to analyze and answer from a fresh start --
            thinking like a scholar rather than a database architect.

            > Let's use "Elen siila luumenn' omentielvo". I confess I may be short
            > of knowledge to undertake an all-encompassing analysis of it.
            > Perhaps the venerable lambengolmor would give us a valuable lesson?

            I'm only a would-be or wanna-be _lambengolmo_, but here's my analysis of
            the phrase.

            To start with, a quick interlinear translation:

            Elen síla lumenn' omentielvo.
            star shine (contin. sg.) hour (allat.) two+meeting+1pl poss. (gen.)
            [A] star is shining [the] hour-onto meeting-ours-of

            Further notes, word-by-word:

            [A]: Not necessary (or possible) in Quenya; no indefinite article exists in
            Quenya. Necessary in translation into English to conform with English
            grammar, which requires articles.

            _Elen_: "star", from the root EL-. This is related to _Elda_ and _ele/ela_
            (see the Silmarillion appendix entry; the original first utterance of the
            Elves is given there as _ele_ but in Q&E as _ela_).

            [For a truly complete analysis, I'd add a note on the first appearance of
            _elen_ in the Quenya Corpus. I don't feel like looking it up now, since I
            get the impression that the style of this analysis is more important than
            the specifics for each word, for purposes of this discussion. I'll throw in
            similar notes about where there should be more complete references as this
            analysis continues.]

            The word is expressed in the nominative singular.

            _síla_: continuative singular of _sil-_ "to shine". The continuative of
            primary verbs in Quenya is apparently formed by lengthening the stem-vowel
            (except before consonant clusters) and affixing _-a_ (see references to
            other examples of continuative verbs in the Corpus). If no pronominal
            suffix is appended, the verb is apparently assumed to be 3sg.

            Hence, a star "is shining" (currently, at this moment; it may not be the
            star's usual activity).
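
The rule for forming the continuative can be written out mechanically. The following is only a minimal sketch in Python, assuming the rule exactly as stated in the post; the function name `continuative` and the vowel table are illustrative assumptions and are not taken from Quettahostanie.

```python
# Illustrative only: a mechanical reading of the rule stated in the post.
# Long counterparts of the short vowels (the acute accent marks length).
LONG_VOWEL = {"a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú"}


def continuative(stem: str) -> str:
    """Return the continuative of a primary verb stem such as 'sil-'.

    Lengthen the stem vowel unless a consonant cluster follows it,
    then affix -a. No pronominal suffix is added.
    """
    stem = stem.rstrip("-")
    # Positions of short vowels; the last one is taken as the stem vowel.
    positions = [i for i, ch in enumerate(stem) if ch in LONG_VOWEL]
    if not positions:
        raise ValueError(f"no short stem vowel found in {stem!r}")
    idx = positions[-1]
    following = stem[idx + 1:]
    # Two or more letters after the vowel are treated as a cluster.
    if len(following) < 2:
        stem = stem[:idx] + LONG_VOWEL[stem[idx]] + following
    return stem + "a"


print(continuative("sil-"))  # síla
```

The sketch counts letters, not sounds, so a single consonant spelled with two letters would wrongly be treated as a cluster; a real implementation would need a proper phoneme inventory.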

            [the]: Possible but not necessary in Quenya -- could be (or could have
            been) represented by the (definite) article _i_, attested in many other
            locations (_Namárie_, _Markirya_, big long list here...).

            _lumenn'_: Elided form of _lumenna_, with the final vowel dropped to avoid
            conflict with the initial vowel of the following word. (Cite other examples
            of this in the Corpus.) _lumenna_ is the allative declension of _lume_
            "hour, time", and represents that the star (or, notionally, its light) is
            shining *toward* (or "at, into, or onto") the hour. An English translation
            might be "on the hour", "onto the hour", or "upon the hour". (Cite other
            attestations of both the allative case declension and _lume_.)
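
The elision seen in _lumenn'_ can be sketched in a few lines of Python. This assumes, purely for illustration, that elision is applied whenever a word-final vowel meets a word-initial vowel; the post only describes this one instance, and the names `elide` and `join_phrase` are hypothetical.

```python
# Illustrative only: elision is applied unconditionally here, which is an
# assumption of this sketch and not a claim about the Corpus.
VOWELS = set("aeiouáéíóú")


def elide(word: str, next_word: str) -> str:
    """Replace the final vowel of `word` with an apostrophe when
    `next_word` begins with a vowel; otherwise return `word` unchanged."""
    if (word and next_word
            and word[-1].lower() in VOWELS
            and next_word[0].lower() in VOWELS):
        return word[:-1] + "'"
    return word


def join_phrase(words):
    """Join words into a phrase, eliding each word against its successor."""
    if not words:
        return ""
    out = [elide(w, nxt) for w, nxt in zip(words, words[1:])]
    out.append(words[-1])
    return " ".join(out)


print(join_phrase(["Elen", "síla", "lumenna", "omentielvo"]))
# Elen síla lumenn' omentielvo
```

Storing the full form _lumenna_ and deriving the elided form on output would let a database link both spellings to the same entry.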

            _omentielvo_: First, note that this word appears in LR 1st Ed. as
            _omentielmo_ (and in some American editions as the typo _omentilmo_). The
            glossed English meaning does not change from one edition to the next.

            This word consists of the prefix _o-_, the base word _mentie_ "meeting",
            and a 1st-person plural possessive suffix _-lva_, declined in the genitive
            case. Alternatively, you could parse it as the prefix _o-_, the base word
            _mentie_ "meeting", a first-person plural possessive particle _-lv-_, and
            the genitive ending _-o_.

            The _o-_ prefix denotes a confluence of two things (contrasting with the
            prefix _yo-_, denoting a confluence of three or more things). (See Quendi &
            Eldar.)

            The possessive _-lva_ or _-lv-_ apparently denotes the inclusive "we", in
            which the person addressed is included in the group referred to. There is
            some controversy over whether this might, at some time, have been the
            marker for the dual "we", denoting a group consisting *solely* of the
            speaker and the person addressed. (Insert references to various discussions
            of this -- it could be a quite long list.)

            The genitive case is used to associate the meeting with the
            previously-referenced hour: "the hour of our meeting". (Also insert
            references on the genitive case, including discussions of when to use it
            versus the possessive/compositive case, and attestations of other genitive
            declensions in the Corpus, such as _rámar aldaron_.)
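
The second parse of _omentielvo_ given above (prefix, base, possessive particle, case ending) suggests one way a database might store a word as a sequence of morphemes. This is a rough Python sketch; the `Morpheme` class, its field names, and the `surface` helper are assumptions for illustration, with glosses taken from the analysis above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Morpheme:
    form: str   # written with hyphens marking where it attaches
    role: str   # hypothetical role label
    gloss: str  # meaning as given in the analysis above


# The four-part parse of _omentielvo_ described in the post.
OMENTIELVO: List[Morpheme] = [
    Morpheme("o-", "prefix", "confluence of two things"),
    Morpheme("mentie", "base", "meeting"),
    Morpheme("-lv-", "possessive", "inclusive 'we'"),
    Morpheme("-o", "case ending", "genitive"),
]


def surface(morphemes: List[Morpheme]) -> str:
    """Concatenate morphemes into the written word, dropping hyphens."""
    return "".join(m.form.strip("-") for m in morphemes)


print(surface(OMENTIELVO))  # omentielvo
```

Keeping each morpheme as its own record would make it possible to attach references (for example, the discussions of _-lva_) to the morpheme and not to every word that contains it.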

            The entire phrase dates to 19?? (when was Tolkien actually *writing* Book
            I? the early '40s? date could be ascertained with reference to _Letters_,
            which I don't own), and was maintained, with the exception of the change
            from _omentielmo_ to _omentielvo_, in the re-publication of LotR in 1965.

            Whew.

            There may well be other aspects of this phrase that would be valuable to
            note in an analysis; I welcome other people's comments, both on the
            analysis itself and on any bearing it might have on a potential database
            structure (or on the already-existing/proposed structure of Quettahostanie).

            >[Engaging in the close analysis of a phrase that Boris is inviting is
            >great.

            It's an interesting exercise -- it left me flexing slightly different
            muscles than I've found myself using in entering the sample data in
            Quettahostanie lately. (Indeed, the bits where I didn't bother to look up
            various references, but instead simply wrote "insert reference to
            such-and-so", are partly because looking up references in my inadequate
            Tolkien library is an activity that I *have* been doing lately, and I'm a
            bit tired of it!)

            >However, I'm of mixed mind about whether this forum is the right
            >one for the broader topic of laying the groundwork for a proposed
            >database; but I'll allow it for now at least, so long as it doesn't drift
            >off topic. (It's not that the topic itself is necessarily unsuitable for a
            >linguistics mailing list, I just fear that it will either overwhelm the
            >list, or quickly drift off topic, or both.)

            An understandable concern. I do hope that Quettahostanie can be a useful
            tool on a scholarly level, and that this discussion will therefore prove
            beneficial and topical for this group. But, since I'm a strong candidate
            for "person here who's most likely to want to overdo the database
            discussion", I'll try to keep myself in check on that score.

            --Kai MacTane
            ----------------------------------------------------------------------
            "Soft and only you, lost and only you,
            Strange as angels."
            --The Cure,
            "Just Like Heaven"
          • Fredrik
            Message 5 of 16, Jul 23, 2002
              >[the]: Possible but not necessary in Quenya -- could be (or could have
              >been) represented by the (definite) article _i_, attested in many other
              >locations (_Namárie_, _Markirya_, big long list here...).

              Using the definite article may be ungrammatical in Quenya when the noun
              phrase is already made definite by a genitival qualifier. So _lambe
              Eldaron_ translates as 'THE language of the Eldar', and 'THE splendour of
              Orome' is _alkar Oromeo_, without the definite article in Quenya (WJ:368f).
              (In my native Swedish, "alvernas språk" means 'the language of the Elves',
              while **"alvernas språket", using the definite article, would be
              impossible.)

              It is interesting to note the use of 'those who' in the literal translation
              of _i arani Eldaron_ (WJ:369): 'those among the Eldar who were kings'; and
              to compare the constructions _mi nínaron_ (VT43:31), _mi wenderon_
              (VT44:18), and the ablative sense of _Oiolosseo_ 'from Oiolosse' in
              Galadriel's Lament.

              Perhaps _i_ has a determinative sense in _i yave mónalyo Yésus_ (VT43:28)
              'the fruit of thy womb: Jesus' (or 'that fruit of thy womb that is Jesus');
              while "Blessed is the fruit of thy womb." would be, simply, _Aistana yave
              mónalyo_? (In which case _aistana i yave mónalyo_, unless demonstrative
              [someone pointing to a certain child], would sound truncated, hanging in
              mid-air as it were: blessed is the fruit of thy womb that...)

              /Fredrik
            • Boris Shapiro
              Message 6 of 16, Jul 24, 2002
                Aiya!

                Wednesday, July 24, 2002, 12:27:58 AM, Kai MacTane wrote:

                First, I have to say that I haven't had the chance to see QH for
                myself, so I'll rely on your answers and patience :)

                >> So, about LDB [linguistic databases]. I suppose we all want to make
                >> clear what features we want it to possess. Probably the best way
                >> to do it is to analyze a Quenya phrase providing all the linguistic
                >> information we need this database to store.
                >>
                KM> As you've probably noticed in Quettahostanie (QH), my approach is to store
                KM> individual elements in the database rather than entire phrases.

                Does that make sense? The real question is what you regard as an
                individual element, and whether such elements are stored entirely
                independently of their context.

                I suppose I lack the proper vocabulary and knowledge of programming, but
                in my view the desired LDB [linguistic database] (or should we call it
                _ELDA_ for "Elvish Linguistic DAtabase"? :) should be object-oriented,
                and have a nested structure, with multiple levels of objects like a
                nesting doll. In my view an object is a linguistically important
                element of a given text stored in the LDB, one which possesses the
                required linguistic description. But there are different types of
                objects: two words could be two individual lexical objects, but at the
                same time they could form a single syntactical object! And a sentence
                could itself be a clause, a part of a complex sentence, thus being a
                syntactical object, too! And all these objects, viewed on different
                levels, should possess different descriptions.
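
The nested structure described above could be sketched roughly as follows in Python. This is only one possible reading of the idea: the class name `LinguisticObject`, its fields, and the level and description labels are all assumptions made for illustration, and nothing here reflects how Quettahostanie is actually built.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class LinguisticObject:
    text: str
    level: str                                        # e.g. "word", "clause"
    description: Dict[str, str] = field(default_factory=dict)
    parts: List["LinguisticObject"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "LinguisticObject"]]:
        """Yield this object and everything nested inside it, with depth."""
        yield depth, self
        for part in self.parts:
            yield from part.walk(depth + 1)


# Two lexical objects that together form one syntactical object.
elen = LinguisticObject("elen", "word", {"part of speech": "noun"})
sila = LinguisticObject("síla", "word", {"part of speech": "verb"})
clause = LinguisticObject("elen síla", "clause",
                          {"type": "predicative"}, [elen, sila])

for depth, obj in clause.walk():
    print("  " * depth + f"{obj.level}: {obj.text} {obj.description}")
```

Each level carries its own description, so the same two words are described once as lexical objects and once, jointly, as a syntactical object.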

                I'd like to know how your QH deals with such information.

                KM> Of course, the structure of QH already encodes some of my ideas
                KM> about what's important to track and what's not -- I spent some
                KM> time ruminating about the database architecture before
                KM> implementing it, thinking to myself "It would be good if it kept
                KM> track of *this*... oh, and *that* would be useful, too."

                I know that the problem in creating an optimized DB is how to design
                an optimized architecture before the actual programming. That's why I
                regard the proposed analysis (intended to work out the desired
                structure of the linguistic data to be included in ELDA) as being of
                great importance.


                Namaarie! S.Y., Elenhil Laiquendo


                : raavannar vantar · tuile loctuva : i yulma carne miru quanta peltuvar :
              • Rich Alderson
                Message 7 of 16, Jul 24, 2002
                  On Wednesday, July 24, 2002, Boris Shapiro wrote:

                  > In my view an object is a linguistically important element of a given text
                  > stored in LDB which possesses the required linguistic description. But there
                  > are different types of objects: two words could be two individual lexical
                  > objects, but at the same time they could be a sole syntactical object! And a
                  > sentence could itself be a clause, a part of a complex sentence, thus being a
                  > syntactical object, too! And all these objects viewed on different levels
                  > should possess different descriptions.

                  The following call for participation came out on the Linguist mailing list
                  (issue 13.1964) on Monday; the statement of motivation seems appropriate at
                  this stage of the discussion of Kai MacTane's database:

                  Date: Mon, 22 Jul 2002 13:54:46 +0300
                  From: "Kiril Simov" <kivs@...>
                  Subject: Treebanks and Linguistic Theories 2002 - Call for Participation

                  Treebanks and Linguistic Theories 2002
                  20th and 21st September 2002, Sozopol, Bulgaria
                  http://www.BulTreeBank.org/TLT2002.html

                  Call for Participation

                  Workshop motivation and aims:

                  Treebanks are a language resource that provides annotations of natural
                  languages at various levels of structure: at the word level, the phrase
                  level, the sentence level, and sometimes also at the level of function-
                  argument structure. Treebanks have become crucially important for the
                  development of data-driven approaches to natural language processing, human
                  language technologies, grammar extraction and linguistic research in
                  general. There are a number of on-going projects on compilation of
                  representative treebanks for languages that still lack them (Spanish,
                  Bulgarian, Portuguese, Turkish) and a number of on-going projects on
                  compilation of treebanks for specific purposes for languages that already
                  have them (English).

                  The practices of building syntactically processed corpora have proved that
                  aiming at more detailed description of the data becomes more and more
                  theory-dependent (Prague Dependency Treebank and other dependency-based
                  treebanks as the Italian treebank (TUT) or the Turkish treebank (METU);
                  Verbmobil HPSG Treebanks, Polish HPSG Treebank, Bulgarian HPSG-based
                  Treebank etc.). Therefore the development of treebanks and formal
                  linguistic theories need to be more tightly connected in order to ensure
                  the necessary information flow between them.
                • Beregond. Anders Stenström
                  Message 8 of 16, Jul 24, 2002
                    Boris Shapiro wrote:

                    > two words could be two individual lexical objects, but at the
                    > same time they could be a sole syntactical object! And a sentence
                    > could itself be a clause, a part of a complex sentence, thus being a
                    > syntactical object, too! And all these objects viewed on different
                    > levels should possess different descriptions.

                    The general idea of having collocations registered in the
                    database seems sound. But as Rich Alderson's reply indicated,
                    this could easily become too theory-dependent for my taste.
                    It seems to me that the best idea would be to register
                    all 'contexts', from two-word constructions like _Minas Tirith_ up
                    to long texts like "Namárie" (with full references, or 'attestation
                    details' for each), and then link words to all contexts they occur in.
                    The syntactical analysis can be left to fora outside the database.
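
The proposal above (register contexts, then link each word to every context it occurs in) amounts to a simple inverted index. The following is a rough Python sketch under that assumption; the names `Context` and `ContextIndex` are hypothetical, and the reference strings are deliberate placeholders, not real citations.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Context:
    text: str
    reference: str  # attestation details; placeholders in this sketch


class ContextIndex:
    def __init__(self) -> None:
        self._by_word: Dict[str, List[Context]] = defaultdict(list)

    def register(self, context: Context) -> None:
        """Store a context and link every distinct word in it back to it."""
        seen = set()
        for token in context.text.lower().split():
            word = token.strip(".,;:!?\"'")
            if word and word not in seen:
                seen.add(word)
                self._by_word[word].append(context)

    def contexts_for(self, word: str) -> List[Context]:
        """Return every registered context in which `word` occurs."""
        return list(self._by_word.get(word.lower(), []))


index = ContextIndex()
index.register(Context("Minas Tirith", "reference to be supplied"))
index.register(Context("Elen síla lumenn' omentielvo", "reference to be supplied"))

for ctx in index.contexts_for("elen"):
    print(ctx.text)
```

No syntactic analysis is stored at all: the index only records where a word occurs, which matches the suggestion that analysis be left to fora outside the database.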

                    Meneg suilaid,

                    Beregond
                  • Boris Shapiro
                    Message 9 of 16, Jul 24, 2002
                      Aiya!

                      In this letter I mostly address Kai because he is the author of QH and
                      of the quoted analysis, but everyone is invited, especially to correct
                      my errors and add new aspects to the analysis.

                      First, let me make myself clear: the purpose of this analysis is to
                      use an example to collect all the linguistic description we need from
                      a piece of text, in order to: 1) compare it against the abilities of
                      any available LDB (like Kai's "Quettahostanie") to determine its
                      applicability to our task; 2) help create an outline of the
                      architecture of linguistic data to be included in a hypothetical ELDA
                      (Elvish Linguistic DAtabase), if it is to be created.

                      For that purpose we don't need to get into the details of the current
                      phrase. What we need is an outline of what data we need to store to
                      describe a phrase.

                      Wednesday, July 24, 2002, 12:27:58 AM, Kai MacTane wrote:

                      >> Let's use "Elen siila luumenn' omentielvo". I confess I may be short
                      >> of knowledge to undertake an all-encompassing analysis of it.
                      >> Perhaps the venerable lambengolmor would give us a valuable lesson?
                      >>
                      KM> I'm only a would-be or wanna-be _lambengolmo_, but here's my analysis
                      KM> of the phrase.
                      ...
                      KM> [A]: Not necessary (or possible) in Quenya; no indefinite article
                      KM> exists in Quenya. Necessary in translation into English to conform
                      KM> with English grammar, which requires articles.

                      That is why any noun as a syntactic object in ELDA should have as one
                      of its descriptions the indication of its definite/indefinite status,
                      linked to the word it is defined by (not necessarily an article), and
                      Q _i_ (when used as the article) should be linked to the noun it
                      describes; the same applies to virtually any word that defines
                      another.

                      KM> _Elen_: "star", from the root EL-. This is related to _Elda_ and
                      KM> _ele/ela_ (see the Silmarillion appendix entry; the original first
                      KM> utterance of the Elves is given there as _ele_ but in Q&E as _ela_).

                      KM> [For a truly complete analysis, I'd add a note on the first appearance
                      KM> of _elen_ in the Quenya Corpus. I don't feel like looking it up now,
                      KM> since I get the impression that the style of this analysis is more
                      KM> important than the specifics for each word, for purposes of this
                      KM> discussion. I'll throw in similar notes about where there should be
                      KM> more complete references as this analysis continues.]

                      That's why any object (presently, a word-object) should not be stored
                      independently of its context (on which it obviously does depend),
                      and should share a date-description with the text-object it is
                      included in. Thus one should be able to search for every occurrence of
                      the word "elen", with chronology and other contextual conditions as
                      search criteria.
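
A word-object that inherits its date from the text-object containing it, and a search filtered by chronology, might look roughly like this in Python. All names (`TextObject`, `WordObject`, `tokenize`, `search`) are assumptions for illustration. The years are publication years used as a stand-in for dates: 1965 is the revised edition mentioned earlier in the thread, and 1954 is assumed here as the first-edition publication year; composition dates would have to be researched separately.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TextObject:
    content: str
    year: int  # publication year, a stand-in for a proper date-description


@dataclass(frozen=True)
class WordObject:
    form: str
    text: TextObject  # a word is never stored without its context

    @property
    def year(self) -> int:
        """The word shares the date of the text it occurs in."""
        return self.text.year


def tokenize(text: TextObject) -> List[WordObject]:
    """Split a text into word-objects that point back to it."""
    return [WordObject(tok.strip(".,'").lower(), text)
            for tok in text.content.split()]


def search(words: List[WordObject], form: str,
           earliest: Optional[int] = None,
           latest: Optional[int] = None) -> List[WordObject]:
    """Find occurrences of `form`, optionally limited to a span of years."""
    hits = []
    for w in words:
        if w.form != form.lower():
            continue
        if earliest is not None and w.year < earliest:
            continue
        if latest is not None and w.year > latest:
            continue
        hits.append(w)
    return hits


first = TextObject("Elen síla lumenn' omentielmo", 1954)
revised = TextObject("Elen síla lumenn' omentielvo", 1965)
words = tokenize(first) + tokenize(revised)

for hit in search(words, "elen", earliest=1960):
    print(hit.form, hit.year, "in:", hit.text.content)
```

Because each word-object points to its text, other contextual conditions (source, edition, language stage) could be added as further fields on the text and filtered in the same way.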

                      Next, a lexical word-object should definitely have a vocabulary
                      description for referential purposes. That was outlined in your lines
                      three paragraphs above. Probably we'll need a dictionary module.

                      KM> The word is expressed in the nominative singular.

                      The case is a grammar category of a word with shows its syntactical
                      relations to other words in a phrase. That reveals a very important
                      element in the structure of description: the syntactical one. For
                      scholarly purposes it is not enough to indicate the case of a noun. It
                      should be presented in a syntactical context.

                      So first comes the sentence itself as a syntactical object. It has
                      certain characteristics to be described, such as being declarative,
                      simple, etc.

                      Next come the members of the sentence. They too have their own
                      descriptions, like _elen_ being the subject of the phrase. It is its
                      role as the subject that places this noun in the nominative case.

                      A side note: this matter brings us one level deeper - to
                      morphological objects, like the zero ending in _Elen_ which shows
                      it being in the nominative. Such elements have their own
                      descriptions.

                      Next the members of the sentence are grouped in various syntagmas.
                      Each syntagma has its own description, like "elen siila" being an
                      external syntagma, and a predicative one. So, depending on the syntagma
                      we are analyzing, its members should be described as the definitive or
                      the defined element. The members of a syntagma can be related to each
                      other differently. For example, "siila luumenn[a]" has - well, I don't
                      know what it is called in English, in Russian it is "upravlenie", so in
                      English it could be "control" - a controlling relation. So syntagma
                      member-objects should be linked to the counterparts to which they
                      are related.

                      A member of a sentence usually belongs to several syntagmas in which it
                      plays different parts. For example, "siila luumenn[a]" is an objective
                      syntagma, where the object "luumenn[a]" defines the verbal part which
                      is definable. While "luumenn[a] omentielvo" is an objective syntagma,
                      too, but here "luumenn[a]" is defined by "omentielvo". So members of
                      syntagmas define or are defined in several syntagmas, and only the
                      subject of a sentence comprises a single syntagma in which it is the
                      definitive. That's why it is called absolute definitive. Here "elen"
                      is not defined by anything.

                      And so on. I hope that gives you some idea of the nested structure we
                      need. Objects in objects in various hypostases with different
                      descriptions.

                      Kai, forgive me for skipping most of your own analysis. I've seen that
                      in some aspects I simply repeat yours, but I've tried to present it
                      in a more systematic and complex way.


                      Namaarie! S.Y., Elenhil Laiquendo [Boris Shapiro]


                      : linde nar i oomar tolesse vanwa yaamala :

                      [In addition to all the _theoretical_ analytical information of the sort
                      that Boris outlines above, there should be a means of distinguishing
                      Tolkien's own statements about such matters from those that are non-
                      Tolkienian conjecture (however clever and/or well-informed). Carl]
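
                      One way to keep that distinction, sketched in Python as an assumption
                      rather than a design anyone in this thread has proposed: every
                      annotation carries a source label. The `Source` and `Annotation` names
                      are hypothetical, and the claims and citations are placeholder
                      strings, not rulings on any real statement.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    TOLKIEN = "Tolkien's own statement"
    CONJECTURE = "non-Tolkienian conjecture"


@dataclass(frozen=True)
class Annotation:
    target: str      # the element the annotation is about
    claim: str       # what is asserted about it
    source: Source   # who stands behind the assertion
    cited_from: str  # reference or contributor (placeholder text here)


# Placeholder annotations (hypothetical content).
ANNOTATIONS = [
    Annotation("elen", "placeholder claim 1", Source.TOLKIEN,
               "placeholder reference"),
    Annotation("elen", "placeholder claim 2", Source.CONJECTURE,
               "placeholder contributor"),
]


def by_source(annotations, source):
    """Keep only the annotations with the given source label."""
    return [a for a in annotations if a.source is source]


for a in by_source(ANNOTATIONS, Source.TOLKIEN):
    print(a.target, "-", a.claim, "-", a.source.value)
```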
                    • Boris Shapiro
                      Message 10 of 16 , Jul 25, 2002
                        Aiya!

                        Oops, an error of mine:

                        > So members of syntagmas define or are defined in several syntagmas,
                        > and only the subject of a sentence comprises a single syntagma in
                        > which it is the definitive. That's why it is called absolute
                        > definitive. Here "elen" is not defined by anything.

                        Vice versa: the subject of a sentence is the member of a syntagma that
                        does not define anything (in any of the syntagmata it is included in)
                        and therefore it is called absolute defined (or -able, I'm short of
                        English terminology).
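
                        The linked structure can be sketched in Python, with the caveat
                        that the `Syntagma` class and the helper names are hypothetical.
                        The three links follow the analysis of "elen siila luumenn[a]
                        omentielvo" given in this thread, with the predicative link
                        oriented as in the correction above (the subject is the defined
                        member).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syntagma:
    kind: str      # e.g. "predicative" or "objective"
    defined: str   # the member being defined
    defining: str  # the member that defines it


MEMBERS = ["elen", "siila", "luumenn[a]", "omentielvo"]

SYNTAGMAS = [
    Syntagma("predicative", defined="elen", defining="siila"),
    Syntagma("objective", defined="siila", defining="luumenn[a]"),
    Syntagma("objective", defined="luumenn[a]", defining="omentielvo"),
]


def roles(member, syntagmas):
    """Collect every role a member plays across all its syntagmas."""
    found = set()
    for s in syntagmas:
        if s.defined == member:
            found.add("defined")
        if s.defining == member:
            found.add("defining")
    return found


def absolute_defined(members, syntagmas):
    """Members that are defined somewhere but never define anything."""
    return [m for m in members if roles(m, syntagmas) == {"defined"}]


print(absolute_defined(MEMBERS, SYNTAGMAS))
```

                        Only "elen" is defined without defining anything, while "siila"
                        and "luumenn[a]" each play both roles in different syntagmas.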

                        Namaarie! S.Y., Elenhil Laiquendo

                        : linde nar i oomar tolesse vanwa yaamala :
                      • Fredrik
                        Message 11 of 16 , Jul 25, 2002
                          Are we talking about a lexical database, or an annotated corpus, or what?
                          I'm not sure that we need or want to encode the syntactical structure of
                          sentences or clauses in a database, since they are not given things. In
                          many cases the structural analyses are precisely what we're after: Tolkien
                          did not provide them. There are bound to be disagreements on how to parse a
                          certain sentence; often, two or more analyses are equally possible. Whose
                          analysis should be in the database? I think that the best tool in this case
                          would be one that helps us find all the data we need, telling us exactly
                          where in the texts they are, and where any other (possible) occurrences of
                          the word/ morpheme are, so that we can go there and see for ourselves.
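
                          A tool of that kind is essentially a concordance. A minimal
                          Python sketch, assuming whitespace tokenization and the
                          hypothetical reference labels "ref-1" and "ref-2" in place of
                          real citations; the two phrases are ones quoted in this thread.

```python
from collections import defaultdict

# Two phrases quoted in the thread, keyed by placeholder references.
CORPUS = {
    "ref-1": "elen siila luumenn' omentielvo",
    "ref-2": "aiya earendil elenion ancalima",
}


def build_index(corpus):
    """Map each token to the (reference, word position) pairs where it occurs."""
    index = defaultdict(list)
    for ref, line in corpus.items():
        for pos, token in enumerate(line.split()):
            index[token].append((ref, pos))
    return index


def possible_occurrences(index, fragment):
    """Every token containing the fragment, with all of its locations.

    Plain substring matching over-generates on purpose: it only points
    at places to look, leaving the analysis to the reader.
    """
    return {tok: locs for tok, locs in index.items() if fragment in tok}


index = build_index(CORPUS)
for token, locations in possible_occurrences(index, "elen").items():
    print(token, locations)
```

                          The index records where things are and nothing more, so no
                          one's parse of a sentence is built into it.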

                          /Fredrik


                          [I just want to voice my strong agreement with what Fredrik has said
                          here. Simply recording the occurrence of every "foreign language"
                          element in Tolkien's writings will be an enormous undertaking. If
                          analysis is to be incorporated into such a compilation at all, it is
                          best left until after the compilation is complete. Having the compilation
                          alone, if fully and properly indexed to the corpus, will be enormously
                          useful. So long as the database is designed with extensibility and
                          expansion in mind, analytical information can always be added later. Carl]
                        • Boris Shapiro
                          Message 12 of 16 , Jul 26, 2002
                            Aiya!

                            Thursday, July 25, 2002, 1:03:55 AM, "Beregond. Anders Stenström" wrote:

                            BAS> The general idea of having collocutions registered in the
                            BAS> database seems sound. But as Rich Alderson's reply indicated,
                            BAS> this could easily become too theory-dependent to look quite good
                            BAS> to me.

                            But the problem of theory dependence seems to me a problem for
                            real-world language treebanks only - when there are multiple treebanks
                            that need to cooperate but are having problems with that because of
                            the different linguistic theories used in their architecture.

                            Do you think ELDA would need to be connected with other LDBs?

                            BAS> It seems to me that the best idea would be to register all
                            BAS> 'contexts', from two-word constructions like _Minas Tirith_ up to
                            BAS> long texts like "Namárie" (with full references, or 'attestation
                            BAS> details' for each), and then link words to all contexts they
                            BAS> occur in. The syntactical analysis can be left to fora outside
                            BAS> the database.

                            To me that seems a regrettable direction of development. It
                            abolishes every use (every extended search query) that I've imagined.
                            What is left then? Just basic number/gender/case descriptions? Is this
                            price good enough, and for what?

                            Thursday, July 25, 2002, 2:11:22 PM, Fredrik wrote:

                            F> I'm not sure that we need or want to encode the syntactical
                            F> structure of sentences or clauses in a database, since they are not
                            F> given things. In many cases the structural analyses are precisely
                            F> what we're after: Tolkien did not provide them. There are bound to
                            F> be disagreements on how to parse a certain sentence; often, two or
                            F> more analyses are equally possible. Whose analysis should be in the
                            F> database?

                            Carl wrote:

                            C> [I just want to voice my strong agreement with what Fredrik has
                            C> said here. Simply recording the occurrence of every "foreign
                            C> language" element in Tolkien's writings will be an enormous
                            C> undertaking. If analysis is to be incorporated into such a
                            C> compilation at all, it is best left until after the compilation is
                            C> complete. Having the compilation alone, if fully and properly
                            C> indexed to the corpus, will be enormously useful. So long as the
                            C> database is designed with extensibility and expansion in mind,
                            C> analytical information can always be added later. Carl]

                            There is one vital aspect of planning the database. As far as I know,
                            the only way to create an optimized database is to thoroughly design
                            its architecture from the very beginning; otherwise, adding more and
                            more elements to it will greatly degrade its speed and inflate its
                            size. I'm afraid trying to extend an indexed corpus database to a
                            full-scale LDB would be a failure.

                            The problem of workload could be solved by sharing the tasks,
                            provided that there is a unitary analysis scheme. Such a scheme is
                            to be implemented in the programme/interface itself: imagine a
                            template with given description variants. For example, a user enters
                            "Elen siila luumenn' omentielvo" and starts the analysis "wizard". At
                            the lexical analysis step, describing each word, the user would have
                            to choose among predefined fields, like noun/verb/adjective/adverb etc,
                            sg/pl, m/fem, nom/acc/gen/poss/dat/loc/abl/all/inst/resp, and so on.
                            Given a comprehensive, universal and unitary scheme, entering the
                            analysis results would be much easier.
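
                            A minimal Python sketch of such a template, assuming
                            hypothetical names (`PartOfSpeech`, `Number`, `Case`,
                            `describe`). The allowed values are the abbreviations listed
                            above; anything outside them is rejected, which is what keeps
                            several contributors on one scheme.

```python
from enum import Enum


class PartOfSpeech(Enum):
    NOUN = "noun"
    VERB = "verb"
    ADJECTIVE = "adjective"
    ADVERB = "adverb"


class Number(Enum):
    SG = "sg"
    PL = "pl"


class Case(Enum):
    NOM = "nom"
    ACC = "acc"
    GEN = "gen"
    POSS = "poss"
    DAT = "dat"
    LOC = "loc"
    ABL = "abl"
    ALL = "all"
    INST = "inst"
    RESP = "resp"


def describe(form, pos, number=None, case=None):
    """Build one word description from the predefined fields only.

    Looking a value up in its Enum raises ValueError if the value is
    not one of the predefined variants.
    """
    return {
        "form": form,
        "pos": PartOfSpeech(pos).value,
        "number": Number(number).value if number is not None else None,
        "case": Case(case).value if case is not None else None,
    }


print(describe("elen", "noun", number="sg", case="nom"))

try:
    describe("elen", "noun", number="sg", case="vocative")
except ValueError as err:
    print("rejected:", err)
```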

                            Namaarie! S.Y., Elenhil Laiquendo [Boris Shapiro]

                            : avartuvan i tauri ni ontar : an luumenya tyeela ar loanyar sintar :
                          • Kai MacTane
                            Message 13 of 16 , Jul 26, 2002
                              At 7/24/02 10:49 AM , Boris Shapiro wrote:

                              >First, I have to say that I didn't have the possibility of seeing QH
                              >by myself, so I'll rely on your answers and patience :)

                              Sorry I've taken so long. Do you have email but not Web access? Or do you
                              not have a graphical browser?

                              >Does it make sense? But the question should be what do you regard as
                              >an individual element and are they stored absolutely independently of
                              >their context?

                              Elements are things like "parma" or "-uva-" or "-llo". OTOH, "A ná X lá B"
                              is also listed as one single element. They're generally stored
                              context-independent, though the attestations field lists all places where
                              the element is attested in use, so that people can look up the various
                              contexts in which Tolkien used it.

                              >I suppose I lack proper vocabulary and knowledge in programming, but
                              >in my view the desired LDB [linguistic database] (or should we call it
                              >_ELDA_ for "Elvish Linguistic DAtabase"? :) should be object-oriented,
                              >and have a nested structure so that there are multiple levels of
                              >objects like a nested doll. In my view an object is a linguistically
                              >important element in of a given text stored in LDB which possesses the
                              >required linguistic description. But there are different types of
                              >objects: two words could be two individual lexical objects, but at the
                              >same time they could be a sole syntactical object! And a sentence
                              >could itself be a clause, a part of a complex sentense, thus being a
                              >syntactical object, too! And all these objects viewed on different
                              >levels should possess different descriptions.
                              >
                              >I'd like to know how does your QH deal with such information

                              It doesn't. It stores things pretty much only at the morphological level,
                              and leaves it to humans to do higher-level stuff.

                              The sort of multi-level analysis you suggest, and which also seems to be
                              suggested by Rich Alderson's mention of treebanks, might be valuable and
                              useful, but it is certainly beyond the level of something I could write.

                              --Kai MacTane
                              ----------------------------------------------------------------------
                              "But every night I burn,/Every night I call your name.
                              Every night I burn,/Every night I fall again..."
                              --The Cure,
                              "Burn"
                            • Kai MacTane
                              Message 14 of 16 , Jul 26, 2002
                                At 7/24/02 02:03 PM , Beregond. Anders Stenström wrote:

                                > The general idea of having collocutions registered in the
                                >database seems sound. But as Rich Alderson's reply indicated,
                                >this could easily become too theory-dependent to look quite
                                >good to me. It seems to me that the best idea would be to register
                                >all 'contexts', from two-word constructions like _Minas Tirith_ up
                                >to long texts like "Namárie" (with full references, or 'attestation
                                >details' for each), and then link words to all contexts they occur in.
                                >The syntactical analysis can be left to fora outside the database.

                                I suppose we could add a category somewhere for "phrases". I agree that
                                syntactic analysis should be left to the humans, not machines -- I'm
                                honestly not sure they can handle it at all yet; I know I personally can't
                                make them do it. (Consider the current state of Babelfish, which has had
                                years of research and the efforts of a large number of people poured into
                                it. It can give you the general idea of what something means, but it's
                                painfully obvious that it's not about to put professional translators out
                                of business any time soon, *especially* regarding poetic and artistic works.)

                                --Kai MacTane
                                ----------------------------------------------------------------------
                                "Deadly angels for reality and passion..."
                                --Shriekback,
                                "Gunning for the
                                Buddha"
                              • Kai MacTane
                                Message 15 of 16 , Jul 26, 2002
                                  At 7/24/02 09:59 PM , Boris Shapiro wrote:

                                  >KM> [A]: Not necessary (or possible) in Quenya; no indefinite article
                                  >KM> exists in Quenya. Necessary in translation into English to conform
                                  >KM> with English grammar, which requires articles.
                                  >
                                  >That is why any noun as a syntactic object in ELDA should have as one
                                  >of its descriptions the indication of its definite/indefinite status,
                                  >linked to the word it is defined by (not necessarily and article), and
                                  >Q _i_ (when used as the article) should be linked to the noun it
                                  >describes; the same applies to virtually any word that defines
                                  >another.

                                  Interesting point. Though I think this means that nearly any noun in ELDA
                                  would be entered at least twice: once in definite form, and then again in
                                  indefinite form. (After all, most nouns can be used both definitely and
                                  indefinitely.)

                                  >That's why any object (presently, a word-object) should not be stored
                                  >independently from its context (on which he obviously does depend),
                                  >and share a date-description with the text-object it is included in.
                                  >Thus one should be able to search for every case of the word "elen"
                                  >used with chronology and other contextual conditions for search.

                                  Ouch! While I agree that a context-dependent database would be an
                                  interesting and probably very useful thing, I must admit I'm a bit confused
                                  about how one would use it. Would searches be things like: "_elen_, where
                                  used as subject (not object) and only where indefinite", and so on? (I can
                                  sort of see how that search should at least return "_elen síla lumenn'
                                  omentielvo_", while not returning "_Aiya Earendil elenion ancalima_".)
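
                                  Such a search could look like the Python sketch below. It
                                  assumes every occurrence has already been annotated by
                                  hand; the `Occurrence` fields, the `search` helper, and
                                  the role and definiteness labels are illustrative
                                  assumptions, not an agreed analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    lemma: str      # dictionary form
    form: str       # form as attested in the phrase
    role: str       # hand-entered syntactic role (illustrative label)
    definite: bool  # hand-entered definiteness (illustrative label)
    phrase: str     # the context the word occurs in


OCCURRENCES = [
    Occurrence("elen", "elen", "subject", False,
               "elen síla lumenn' omentielvo"),
    Occurrence("elen", "elenion", "modifier", False,
               "Aiya Earendil elenion ancalima"),
]


def search(occurrences, lemma, role=None, definite=None):
    """Return the phrases where `lemma` occurs under the given conditions."""
    hits = []
    for occ in occurrences:
        if occ.lemma != lemma:
            continue
        if role is not None and occ.role != role:
            continue
        if definite is not None and occ.definite != definite:
            continue
        hits.append(occ.phrase)
    return hits


print(search(OCCURRENCES, "elen", role="subject", definite=False))
```

                                  Dropping the `role` condition returns both phrases, so
                                  the same records also serve a plain attestation lookup.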

                                  At the moment, QH's means of dealing with context is simply to provide
                                  references to all attested uses of the element in the "Attestations" field.

                                  >Next, a lexical word-object should definitely have a vocabulary
                                  >description for referential purposes. That was outlined in your lines
                                  >three paragraphs above. Probably we'll need a dictionary module.

                                  Which, to figure out homonyms, will need to be able to carry out some
                                  actual syntactic analysis. (Which you do explicitly call for elsewhere in
                                  your post.) Unfortunately, I'm afraid I don't know how to get software to
                                  do that, and I'm especially wary of the concept of getting software to be
                                  able to carry out accurate syntactic analysis on poetic material.

                                  >And so on. I hope that gives you some idea of the nested structure we
                                  >need. Objects in objects in various hypostases with different
                                  >descriptions.

                                  It does give me some idea of it, yes. I think that what you propose is an
                                  impressive and worthwhile project, but it is one which is utterly beyond my
                                  abilities. I'm sorry.

                                  >Kai, forgive me for skipping most of your own analysis, I've seen that
                                  >in some aspects I simply repeat your one, but I've tried to present it
                                  >in a more systematic and complex way.

                                  No problem there; it was, after all, just an example analysis. I think it
                                  served its purpose, and you did right to skip large chunks of it.

                                  --Kai MacTane
                                  ----------------------------------------------------------------------
                                  "Then, when they spill the demon seed
                                  Turn and face into the wind.
                                  All along you still believed...
                                  Believed you were immune."
                                  --Thomas Dolby,
                                  "The Flat Earth"
                                • Kai MacTane
                                  Message 16 of 16 , Jul 26, 2002
                                    At 7/26/02 01:09 AM , Boris Shapiro wrote:

                                    >For me that seems to be a regrettable way of development. That
                                    >abolishes every use (every extended search query) that I've imagined.
                                    >What is left then? Just basic number/gender/case descriptions? Is this
                                    >price good enough, and for what?

                                    What sorts of search queries do you envision? Can you give me some examples?

                                    --Kai MacTane
                                    ----------------------------------------------------------------------
                                    "In another life I see you/As an angel flying high,
                                    And the hands of time will free you/You will cast your chains aside,
                                    And the dawn will come and kiss away
                                    Every tear that's ever fallen from your eyes...
                                    --Concrete Blonde,
                                    "Caroline"