Re: [Lambengolmor] [LDB] Elements or phrases?

  • Beregond. Anders Stenström
    Message 1 of 16 , Jul 24 2:03 PM
      Boris Shapiro wrote:

      > two words could be two individual lexical objects, but at the
      > same time they could be a sole syntactical object! And a sentence
      > could itself be a clause, a part of a complex sentence, thus being a
      > syntactical object, too! And all these objects viewed on different
      > levels should possess different descriptions.

      The general idea of having collocutions registered in the
      database seems sound. But as Rich Alderson's reply indicated,
      this could easily become too theory-dependent to look quite
      good to me. It seems to me that the best idea would be to register
      all 'contexts', from two-word constructions like _Minas Tirith_ up
      to long texts like "Namárie" (with full references, or 'attestation
      details' for each), and then link words to all contexts they occur in.
      The syntactical analysis can be left to fora outside the database.
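
The registration scheme described above can be sketched roughly as follows. This is only a minimal Python illustration, not a design anyone in this thread has proposed in code: the names `Context`, `ContextIndex`, `register` and `contexts_for` are assumptions, and the reference strings are placeholders rather than real attestation details.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """An attested stretch of text, from a two-word name up to a whole poem."""
    text: str
    reference: str  # attestation details; placeholder strings in this sketch


class ContextIndex:
    """Hypothetical registry linking every word to the contexts it occurs in."""

    def __init__(self):
        self.contexts = []
        self._by_word = defaultdict(list)  # lowercased word -> context ids

    def register(self, context):
        context_id = len(self.contexts)
        self.contexts.append(context)
        # Naive whitespace tokenisation; no analysis of any kind is stored.
        for word in set(context.text.lower().split()):
            self._by_word[word].append(context_id)
        return context_id

    def contexts_for(self, word):
        ids = self._by_word.get(word.lower(), [])
        return [self.contexts[i] for i in ids]


index = ContextIndex()
index.register(Context("Minas Tirith", "reference placeholder 1"))
index.register(Context("Minas Morgul", "reference placeholder 2"))

for context in index.contexts_for("Minas"):
    print(context.text, "|", context.reference)
```

Nothing here depends on a syntactic theory: the index only records where a word occurs, and any analysis stays outside the database.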

      Meneg suilaid,

      Beregond
    • Boris Shapiro
      Message 2 of 16 , Jul 24 9:59 PM
        Aiya!

        In this letter I mostly address Kai because he is the author of QH and
        of the quoted analysis, but everyone is invited, especially to correct
        my errors and add new aspects to analysis.

        First, let me make myself clear: the purpose of this analysis is to
        use an example to collect all the linguistic description we need from
        a piece of text, in order 1) to compare it with the abilities of any
        available LDB (like Kai's "Quettahostanie") and so determine whether
        that LDB suits our task; and 2) to help create an outline of the
        architecture of linguistic data to be included in a hypothetical ELDA
        (Elvish Linguistic DAtabase), if it is to be created.

        For that purpose we don't need to go into the details of the current
        phrase. What we need is an outline of the data we need to store when
        describing a phrase.

        Wednesday, July 24, 2002, 12:27:58 AM, Kai MacTane wrote:

        >> Let's use "Elen siila luumenn' omentielvo". I confess I may be short
        >> of knowledge to undertake an all-encompassing analysis of it.
        >> Perhaps the venerable lambengolmor would give us a valuable lesson?
        >>
        KM> I'm only a would-be or wanna-be _lambengolmo_, but here's my analysis
        KM> of the phrase.
        ...
        KM> [A]: Not necessary (or possible) in Quenya; no indefinite article
        KM> exists in Quenya. Necessary in translation into English to conform
        KM> with English grammar, which requires articles.

        That is why any noun as a syntactic object in ELDA should have as one
        of its descriptions the indication of its definite/indefinite status,
        linked to the word it is defined by (not necessarily an article), and
        Q _i_ (when used as the article) should be linked to the noun it
        describes; the same applies to virtually any word that defines
        another.

        KM> _Elen_: "star", from the root EL-. This is related to _Elda_ and
        KM> _ele/ela_ (see the Silmarillion appendix entry; the original first
        KM> utterance of the Elves is given there as _ele_ but in Q&E as _ela_).

        KM> [For a truly complete analysis, I'd add a note on the first appearance
        KM> of _elen_ in the Quenya Corpus. I don't feel like looking it up now,
        KM> since I get the impression that the style of this analysis is more
        KM> important than the specifics for each word, for purposes of this
        KM> discussion. I'll throw in similar notes about where there should be
        KM> more complete references as this analysis continues.]

        That's why any object (presently, a word-object) should not be stored
        independently of its context (on which it obviously does depend),
        and should share a date-description with the text-object it is included
        in. Thus one should be able to search for every occurrence of the word
        "elen", with chronology and other contextual conditions as search criteria.
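
One way to read this requirement, as a minimal Python sketch: each occurrence points at its containing text and takes its date from there. The names `Text`, `Occurrence` and `find` are assumptions made for illustration, and the titles and years below are invented placeholders, not real datings of any text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    title: str
    year: int  # placeholder dating; real values would come from the editors


@dataclass(frozen=True)
class Occurrence:
    form: str   # the word as written in the text
    lemma: str  # the headword it is filed under
    text: Text  # the containing text; the occurrence has no date of its own

    @property
    def year(self):
        # The date-description is shared with the text-object.
        return self.text.year


def find(occurrences, lemma, earliest=None, latest=None):
    """Return occurrences of `lemma` whose text falls inside the year range."""
    hits = []
    for occ in occurrences:
        if occ.lemma != lemma:
            continue
        if earliest is not None and occ.year < earliest:
            continue
        if latest is not None and occ.year > latest:
            continue
        hits.append(occ)
    return hits


text_a = Text("Text A (placeholder)", 1930)
text_b = Text("Text B (placeholder)", 1955)
occurrences = [
    Occurrence("elen", "elen", text_a),
    Occurrence("elenion", "elen", text_b),
    Occurrence("ancalima", "ancalima", text_b),
]

late_hits = find(occurrences, "elen", earliest=1950)
print([occ.form for occ in late_hits])
```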

        Next, a lexical word-object should definitely have a vocabulary
        description for referential purposes. That was outlined in your lines
        three paragraphs above. Probably we'll need a dictionary module.

        KM> The word is expressed in the nominative singular.

        Case is a grammatical category of a word which shows its syntactical
        relations to other words in a phrase. That reveals a very important
        element in the structure of description: the syntactical one. For
        scholarly purposes it is not enough to indicate the case of a noun. It
        should be presented in a syntactical context.

        So first comes the sentence itself as a syntactical object. It has
        certain characteristics by which it is to be described, such as being
        declarative, simple, etc.

        Next come the members of the sentence. They too have their own
        descriptions, like _elen_ being the subject of the phrase. It is its
        role as the subject that places this noun in the nominative case.

        A side note: this matter brings us one level deeper - to
        morphological objects, like the zero ending in _Elen_ which shows
        it being in the nominative. Such elements have their own
        descriptions.

        Next the members of the sentence are grouped in various syntagmas.
        Each syntagma has its own description, like "elen siila" being an
        external syntagma, and a predicative one. So depending on the syntagma
        we are analyzing, its members should be described as the definitive or
        the defined element. The members of a syntagma can be related to each
        other differently. For example, "siila luumenn[a]" has - well, I don't
        know what it is called in English; in Russian it is "upravlenie", so in
        English it could be "control" - a controlling relation. So syntagma
        member-objects should be linked to the counterparts with which they
        are related.

        A member of a sentence usually comprises several syntagmas in which it
        plays different parts. For example, "siila luumenn[a]" is an objective
        syntagma, where the object "luumenn[a]" defines the verbal part which
        is definable. While "luumenn[a] omentielvo" is an objective syntagma,
        too, but here "luumenn[a]" is defined by "omentielvo". So members of
        syntagmas define or are defined in several syntagmas, and only the
        subject of a sentence comprises a single syntagma in which it is the
        definitive. That's why it is called absolute definitive. Here "elen"
        is not defined by anything.

        And so on. I hope that gives you some idea of the nested structure we
        need. Objects in objects in various hypostases with different
        descriptions.
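
The nested structure might be modelled along these lines. This is a rough Python sketch under stated assumptions: the class names `Word`, `Syntagma` and `Sentence` and the helper methods are hypothetical, and the links encode the reading in which the subject _elen_ is defined and defines nothing, as in the correction posted in the follow-up message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    form: str
    role: Optional[str] = None  # member of the sentence, if an analyst set one


@dataclass(frozen=True)
class Syntagma:
    kind: str         # e.g. "predicative", "objective"
    defined: Word     # the element being defined
    definitive: Word  # the element that defines it


@dataclass
class Sentence:
    kind: str
    words: list
    syntagmas: list

    def defined_by(self, word):
        """Words that define `word` in some syntagma."""
        return [s.definitive for s in self.syntagmas if s.defined == word]

    def defines(self, word):
        """Words that `word` defines in some syntagma."""
        return [s.defined for s in self.syntagmas if s.definitive == word]


elen = Word("elen", "subject")
siila = Word("siila", "predicate")
luumenna = Word("luumenn[a]", "object")
omentielvo = Word("omentielvo")

sentence = Sentence(
    kind="declarative, simple",
    words=[elen, siila, luumenna, omentielvo],
    syntagmas=[
        Syntagma("predicative", defined=elen, definitive=siila),
        Syntagma("objective", defined=siila, definitive=luumenna),
        Syntagma("objective", defined=luumenna, definitive=omentielvo),
    ],
)

# A word can play different parts in different syntagmas.
print([w.form for w in sentence.defines(luumenna)])
print([w.form for w in sentence.defined_by(luumenna)])
```

The same word-object appears in several syntagma-objects, each of which carries its own description, which is the "objects in objects" idea in miniature.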

        Kai, forgive me for skipping most of your own analysis. I see that
        in some aspects I simply repeat yours, but I've tried to present it
        in a more systematic and complex way.


        Namaarie! S.Y., Elenhil Laiquendo [Boris Shapiro]


        : linde nar i oomar tolesse vanwa yaamala :

        [In addition to all the _theoretical_ analytical information of the sort
        that Boris outlines above, there should be a means of distinguishing
        Tolkien's own statements about such matters from those that are non-
        Tolkienian conjecture (however clever and/or well-informed). Carl]
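
A minimal Python sketch of how that distinction could be recorded, assuming each analytical statement is stored as its own record with a mandatory source tag. The names `Source`, `Annotation` and `by_source` are hypothetical, and the entry marked as Tolkien's is a placeholder, not a real citation.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    TOLKIEN = "Tolkien's own statement"
    CONJECTURE = "non-Tolkienian conjecture"


@dataclass(frozen=True)
class Annotation:
    target: str      # the word or phrase being described
    claim: str       # the statement made about it
    source: Source   # mandatory: every claim is tagged with its origin
    cited_from: str  # a reference, or the name of the analyst


def by_source(annotations, source):
    """Keep only the annotations of the given origin."""
    return [a for a in annotations if a.source is source]


annotations = [
    Annotation("elen", "statement placeholder", Source.TOLKIEN,
               "reference placeholder"),
    Annotation("elen", "nominative singular", Source.CONJECTURE,
               "Kai MacTane, this thread"),
    Annotation("elen", "subject of the phrase", Source.CONJECTURE,
               "Boris Shapiro, this thread"),
]

for a in by_source(annotations, Source.CONJECTURE):
    print(a.claim, "|", a.cited_from)
```

Because several records can describe the same target, competing analyses can sit side by side without any of them being presented as Tolkien's.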
      • Boris Shapiro
        Message 3 of 16 , Jul 25 1:15 AM
          Aiya!

          Oops, an error of mine:

          > So members of syntagmas define or are defined in several syntagmas,
          > and only the subject of a sentence comprises a single syntagma in
          > which it is the definitive. That's why it is called absolute
          > definitive. Here "elen" is not defined by anything.

          Vice-versa: the subject of a sentence is the member of a syntagma that
          does not define anything (in any of the syntagmata it is included in)
          and therefore it is called absolute defined (or -able, I'm short of
          English terminology).

          Namaarie! S.Y., Elenhil Laiquendo

          : linde nar i oomar tolesse vanwa yaamala :
        • Fredrik
          Message 4 of 16 , Jul 25 3:11 AM
            Are we talking about a lexical database, or an annotated corpus, or what?
            I'm not sure that we need or want to encode the syntactical structure of
            sentences or clauses in a database, since they are not given things. In
            many cases the structural analyses are precisely what we're after: Tolkien
            did not provide them. There are bound to be disagreements on how to parse a
            certain sentence; often, two or more analyses are equally possible. Whose
            analysis should be in the database? I think that the best tool in this case
            would be one that helps us find all the data we need, telling us exactly
            where in the texts they are, and where any other (possible) occurrences of
            the word/morpheme are, so that we can go there and see for ourselves.

            /Fredrik


            [I just want to voice my strong agreement with what Fredrik has said
            here. Simply recording the occurrence of every "foreign language"
            element in Tolkien's writings will be an enormous undertaking. If
            analysis is to be incorporated into such a compilation at all, it is
            best left until after the compilation is complete. Having the compilation
            alone, if fully and properly indexed to the corpus, will be enormously
            useful. So long as the database is designed with extensibility and
            expansion in mind, analytical information can always be added later. Carl]
          • Boris Shapiro
            Message 5 of 16 , Jul 26 1:09 AM
              Aiya!

              Thursday, July 25, 2002, 1:03:55 AM, "Beregond. Anders Stenström" wrote:

              BAS> The general idea of having collocutions registered in the
              BAS> database seems sound. But as Rich Alderson's reply indicated,
              BAS> this could easily become too theory-dependent to look quite good
              BAS> to me.

              But the problem of theory dependence seems to me a problem for
              real-world language treebanks only - when there are multiple treebanks
              that need to cooperate but are having problems with that because of
              different linguistic theories used in their architecture.

              Do you think ELDA would need to be connected with other LDBs?

              BAS> It seems to me that the best idea would be to register all
              BAS> 'contexts', from two-word constructions like _Minas Tirith_ up to
              BAS> long texts like "Namárie" (with full references, or 'attestation
              BAS> details' for each), and then link words to all contexts they
              BAS> occur in. The syntactical analysis can be left to fora outside
              BAS> the database.

              For me that seems to be a regrettable way of development. That
              abolishes every use (every extended search query) that I've imagined.
              What is left then? Just basic number/gender/case descriptions? Is this
              price good enough, and for what?

              Thursday, July 25, 2002, 2:11:22 PM, Fredrik wrote:

              F> I'm not sure that we need or want to encode the syntactical
              F> structure of sentences or clauses in a database, since they are not
              F> given things. In many cases the structural analyses are precisely
              F> what we're after: Tolkien did not provide them. There are bound to
              F> be disagreements on how to parse a certain sentence; often, two or
              F> more analyses are equally possible. Whose analysis should be in the
              F> database?

              Carl wrote:

              C> [I just want to voice my strong agreement with what Fredrik has
              C> said here. Simply recording the occurrence of every "foreign
              C> language" element in Tolkien's writings will be an enormous
              C> undertaking. If analysis is to be incorporated into such a
              C> compilation at all, it is best left until after the compilation is
              C> complete. Having the compilation alone, if fully and properly
              C> indexed to the corpus, will be enormously useful. So long as the
              C> database is designed with extensibility and expansion in mind,
              C> analytical information can always be added later. Carl]

              There is one vital aspect of planning the database. As far as I know,
              the only way to create an optimized database is to thoroughly design
              its architecture from the very beginning, otherwise adding more and
              more elements to it will greatly decrease its performance in speed and
              size. I'm afraid trying to extend an indexed corpus database to a
              full-scale LDB would be a failure.

              The problem of workload could be solved by sharing the tasks,
              provided that there is a unitary analysis scheme. Such a scheme is
              to be implemented in the programme/interface itself: imagine a
              template with given description variants. For example, a user enters
              "Elen siila luumenn' omentielvo" and starts the analysis "wizard". At
              the lexical analysis step, describing each word, he would have to
              choose between predefined fields, like noun/verb/adjective/adverb etc.,
              sg/pl, m/fem, nom/acc/gen/poss/dat/loc/abl/all/inst/resp, and so on.
              Given a comprehensive, universal and unitary scheme, entering the
              analysis results would be much easier.
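
The predefined fields of such a "wizard" could be expressed as closed value sets, so that an entry outside the scheme is rejected at once. The Python sketch below is only an illustration: the enum and function names are assumptions, the gender field is left out for brevity, and the abbreviations are the ones listed above.

```python
from enum import Enum


class PartOfSpeech(Enum):
    NOUN = "noun"
    VERB = "verb"
    ADJECTIVE = "adjective"
    ADVERB = "adverb"


class Number(Enum):
    SINGULAR = "sg"
    PLURAL = "pl"


class Case(Enum):
    NOMINATIVE = "nom"
    ACCUSATIVE = "acc"
    GENITIVE = "gen"
    POSSESSIVE = "poss"
    DATIVE = "dat"
    LOCATIVE = "loc"
    ABLATIVE = "abl"
    ALLATIVE = "all"
    INSTRUMENTAL = "inst"
    RESPECTIVE = "resp"


def describe(form, pos, number=None, case=None):
    """Validate one lexical-analysis step of the hypothetical wizard.

    Looking a value up in an Enum raises ValueError if it is not one of
    the predefined choices, so free-form descriptions cannot slip in.
    """
    entry = {"form": form, "pos": PartOfSpeech(pos)}
    if number is not None:
        entry["number"] = Number(number)
    if case is not None:
        entry["case"] = Case(case)
    return entry


entry = describe("elen", "noun", number="sg", case="nom")
print(entry["pos"].name, entry["number"].name, entry["case"].name)

try:
    describe("elen", "noun", case="vocative")  # not among the listed cases
except ValueError as error:
    print("rejected:", error)
```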

              Namaarie! S.Y., Elenhil Laiquendo [Boris Shapiro]

              : avartuvan i tauri ni ontar : an luumenya tyeela ar loanyar sintar :
            • Kai MacTane
              Message 6 of 16 , Jul 26 12:34 PM
                At 7/24/02 10:49 AM , Boris Shapiro wrote:

                >First, I have to say that I didn't have the possibility of seeing QH
                >by myself, so I'll rely on your answers and patience :)

                Sorry I've taken so long. Do you have email but not Web access? Or do you
                not have a graphical browser?

                >Does it make sense? But the question should be what do you regard as
                >an individual element and are they stored absolutely independently of
                >their context?

                Elements are things like "parma" or "-uva-" or "-llo". OTOH, "A ná X lá B"
                is also listed as one single element. They're generally stored
                context-independent, though the attestations field lists all places where
                the element is attested in use, so that people can look up the various
                contexts in which Tolkien used it.
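
For comparison with the other proposals in this thread, the storage model just described might look something like the Python sketch below. It is an assumption-laden illustration, not QH's actual schema: `Element`, `attestations` and `attested_in` are hypothetical names, and the reference strings are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A context-independent entry: a word, an affix, or a whole pattern."""
    form: str
    attestations: list = field(default_factory=list)  # reference strings


def attested_in(elements, reference):
    """Forms of all elements whose attestations include `reference`."""
    return [e.form for e in elements.values() if reference in e.attestations]


forms = ["parma", "-uva-", "-llo", "A ná X lá B"]
elements = {form: Element(form) for form in forms}

# Context is not stored with the element; only pointers to it are.
elements["parma"].attestations.append("reference placeholder 1")
elements["-llo"].attestations.append("reference placeholder 1")
elements["-llo"].attestations.append("reference placeholder 2")

print(attested_in(elements, "reference placeholder 1"))
```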

                >I suppose I lack proper vocabulary and knowledge in programming, but
                >in my view the desired LDB [linguistic database] (or should we call it
                >_ELDA_ for "Elvish Linguistic DAtabase"? :) should be object-oriented,
                >and have a nested structure so that there are multiple levels of
                >objects like a nested doll. In my view an object is a linguistically
                >important element of a given text stored in LDB which possesses the
                >required linguistic description. But there are different types of
                >objects: two words could be two individual lexical objects, but at the
                >same time they could be a sole syntactical object! And a sentence
                >could itself be a clause, a part of a complex sentence, thus being a
                >syntactical object, too! And all these objects viewed on different
                >levels should possess different descriptions.
                >
                >I'd like to know how does your QH deal with such information

                It doesn't. It stores things pretty much only at the morphological level,
                and leaves it to humans to do higher-level stuff.

                The sort of multi-level analysis you suggest, and which also seems to be
                suggested by Rich Alderson's mention of treebanks, might be valuable and
                useful, but it is certainly beyond the level of something I could write.

                --Kai MacTane
                ----------------------------------------------------------------------
                "But every night I burn,/Every night I call your name.
                Every night I burn,/Every night I fall again..."
                --The Cure,
                "Burn"
              • Kai MacTane
                Message 7 of 16 , Jul 26 12:45 PM
                  At 7/24/02 02:03 PM , Beregond. Anders Stenström wrote:

                  > The general idea of having collocutions registered in the
                  >database seems sound. But as Rich Alderson's reply indicated,
                  >this could easily become too theory-dependent to look quite
                  >good to me. It seems to me that the best idea would be to register
                  >all 'contexts', from two-word constructions like _Minas Tirith_ up
                  >to long texts like "Namárie" (with full references, or 'attestation
                  >details' for each), and then link words to all contexts they occur in.
                  >The syntactical analysis can be left to fora outside the database.

                  I suppose we could add a category somewhere for "phrases". I agree that
                  syntactic analysis should be left to the humans, not machines -- I'm
                  honestly not sure they can handle it at all yet; I know I personally can't
                  make them do it. (Consider the current state of Babelfish, which has had
                  years of research and the efforts of a large number of people poured into
                  it. It can give you the general idea of what something means, but it's
                  painfully obvious that it's not about to put professional translators out
                  of business any time soon, *especially* regarding poetic and artistic works.)

                  --Kai MacTane
                  ----------------------------------------------------------------------
                  "Deadly angels for reality and passion..."
                  --Shriekback,
                  "Gunning for the
                  Buddha"
                • Kai MacTane
                  Message 8 of 16 , Jul 26 1:28 PM
                    At 7/24/02 09:59 PM , Boris Shapiro wrote:

                    >KM> [A]: Not necessary (or possible) in Quenya; no indefinite article
                    >KM> exists in Quenya. Necessary in translation into English to conform
                    >KM> with English grammar, which requires articles.
                    >
                    >That is why any noun as a syntactic object in ELDA should have as one
                    >of its descriptions the indication of its definite/indefinite status,
                    >linked to the word it is defined by (not necessarily an article), and
                    >Q _i_ (when used as the article) should be linked to the noun it
                    >describes; the same applies to virtually any word that defines
                    >another.

                    Interesting point. Though I think this means that nearly any noun in ELDA
                    would be entered at least twice: once in definite form, and then again in
                    indefinite form. (After all, most nouns can be used both definitely and
                    indefinitely.)

                    >That's why any object (presently, a word-object) should not be stored
                    >independently from its context (on which it obviously does depend),
                    >and share a date-description with the text-object it is included in.
                    >Thus one should be able to search for every case of the word "elen"
                    >used with chronology and other contextual conditions for search.

                    Ouch! While I agree that a context-dependent database would be an
                    interesting and probably very useful thing, I must admit I'm a bit confused
                    about how one would use it. Would searches be things like: "_elen_, where
                    used as subject (not object) and only where indefinite", and so on? (I can
                    sort of see how that search should at least return "_elen síla lumenn'
                    omentielvo_", while not returning "_Aiya Earendil elenion ancalima_".)
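
A search of that kind could be sketched in Python as below, assuming the role and definiteness of each word have been entered by hand beforehand. The names `Token`, `Phrase` and `search` are hypothetical, only the forms of _elen_ are annotated, and the annotations are illustrative rather than authoritative analyses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    form: str
    lemma: str
    role: str       # syntactic role assigned by a human analyst
    definite: bool  # definiteness as judged by that analyst


@dataclass(frozen=True)
class Phrase:
    text: str
    tokens: tuple


def matches(token, lemma, role, definite):
    if token.lemma != lemma:
        return False
    if role is not None and token.role != role:
        return False
    if definite is not None and token.definite != definite:
        return False
    return True


def search(phrases, lemma, role=None, definite=None):
    """Phrases containing `lemma` under the given contextual conditions."""
    return [
        phrase for phrase in phrases
        if any(matches(t, lemma, role, definite) for t in phrase.tokens)
    ]


phrases = [
    Phrase("elen síla lumenn' omentielvo",
           (Token("elen", "elen", "subject", False),)),
    Phrase("Aiya Earendil elenion ancalima",
           (Token("elenion", "elen", "modifier", False),)),
]

hits = search(phrases, "elen", role="subject", definite=False)
print([phrase.text for phrase in hits])
```

The software here only filters on annotations that people supplied; it performs no syntactic analysis of its own.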

                    At the moment, QH's means of dealing with context is simply to provide
                    references to all attested uses of the element in the "Attestations" field.

                    >Next, a lexical word-object should definitely have a vocabulary
                    >description for referential purposes. That was outlined in your lines
                    >three paragraphs above. Probably we'll need a dictionary module.

                    Which, to figure out homonyms, will need to be able to carry out some
                    actual syntactic analysis. (Which you do explicitly call for elsewhere in
                    your post.) Unfortunately, I'm afraid I don't know how to get software to
                    do that, and I'm especially wary of the concept of getting software to be
                    able to carry out accurate syntactic analysis on poetic material.

                    >And so on. I hope that gives you some idea of the nested structure we
                    >need. Objects in objects in various hypostases with different
                    >descriptions.

                    It does give me some idea of it, yes. I think that what you propose is an
                    impressive and worthwhile project, but it is one which is utterly beyond my
                    abilities. I'm sorry.

                    >Kai, forgive me for skipping most of your own analysis, I've seen that
                    >in some aspects I simply repeat your one, but I've tried to present it
                    >in a more systematic and complex way.

                    No problem there; it was, after all, just an example analysis. I think it
                    served its purpose, and you did right to skip large chunks of it.

                    --Kai MacTane
                    ----------------------------------------------------------------------
                    "Then, when they spill the demon seed
                    Turn and face into the wind.
                    All along you still believed...
                    Believed you were immune."
                    --Thomas Dolby,
                    "The Flat Earth"
                  • Kai MacTane
                    Message 9 of 16 , Jul 26 2:13 PM
                      At 7/26/02 01:09 AM , Boris Shapiro wrote:

                      >For me that seems to be a regrettable way of development. That
                      >abolishes every use (every extended search query) that I've imagined.
                      >What is left then? Just basic number/gender/case descriptions? Is this
                      >price good enough, and for what?

                      What sorts of search queries do you envision? Can you give me some examples?

                      --Kai MacTane
                      ----------------------------------------------------------------------
                      "In another life I see you/As an angel flying high,
                      And the hands of time will free you/You will cast your chains aside,
                      And the dawn will come and kiss away
                      Every tear that's ever fallen from your eyes..."
                      --Concrete Blonde,
                      "Caroline"