Message 1 of 16, Jul 26, 2002
Aiya!
Thursday, July 25, 2002, 1:03:55 AM, "Beregond. Anders Stenström" wrote:
BAS> The general idea of having collocutions registered in the
BAS> database seems sound. But as Rich Alderson's reply indicated,
BAS> this could easily become too theory-dependent to look quite good
BAS> to me.
But the problem of theory dependence seems to me a problem for
real-world language treebanks only - when there are multiple treebanks
that need to cooperate but are having problems with that because of
different linguistic theories used in their architecture.
Do you think ELDA would need to be connected with other LDBs?
BAS> It seems to me that the best idea would be to register all
BAS> 'contexts', from two-word constructions like _Minas Tirith_ up to
BAS> long texts like "Namárie" (with full references, or 'attestation
BAS> details' for each), and then link words to all contexts they
BAS> occur in. The syntactical analysis can be left to fora outside
BAS> the database.
For me that seems to be a regrettable way of development. That
abolishes every use (every extended search query) that I've imagined.
What is left then? Just basic number/gender/case descriptions? Is this
price good enough, and for what?
Thursday, July 25, 2002, 2:11:22 PM, Fredrik wrote:
F> I'm not sure that we need or want to encode the syntactical
F> structure of sentences or clauses in a database, since they are not
F> given things. In many cases the structural analyses are precisely
F> what we're after: Tolkien did not provide them. There are bound to
F> be disagreements on how to parse a certain sentence; often, two or
F> more analyses are equally possible. Whose analysis should be in the
C> [I just want to voice my strong agreement with what Fredrik has
C> said here. Simply recording the occurrence of every "foreign
C> language" element in Tolkien's writings will be an enormous
C> undertaking. If analysis is to be incorporated into such a
C> compilation at all, it is best left until after the compilation is
C> complete. Having the compilation alone, if fully and properly
C> indexed to the corpus, will be enormously useful. So long as the
C> database is designed with extensibility and expansion in mind,
C> analytical information can always be added later. Carl]
There is one vital aspect of planning the database. As far as I know,
the only way to create an optimized database is to thoroughly design
its architecture from the very beginning, otherwise adding more and
more elements to it will greatly decrease its performance in speed and
size. I'm afraid trying to extend an indexed corpus database to a
full-scale LDB would be a failure.
The problem of work load could be solved by sharing the tasks,
provided that there is a unitary analysis scheme. Such a scheme is
to be implemented in the programme/interface itself: imagine a
template with given description variants. For example, a user enters
"Elen siila luumenn' omentielvo" and starts the analysis "wizard". At
the lexical analysis step, he would describe each word by choosing
between predefined fields, like noun/verb/adjective/adverb etc,
sg/pl, m/fem, nom/acc/gen/poss/dat/loc/abl/all/inst/resp, and so on.
Given a comprehensive, universal and unitary scheme, entering the
analysis results would be much easier.
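[Editorial sketch: one way the "predefined fields" of such a wizard could be pinned down is as closed enumerations, so that every contributor picks from the same list and anything outside the scheme is rejected. The class and function names below (PartOfSpeech, WordAnalysis, analyse) are assumptions made for illustration; nothing like them is specified in the thread, and the example annotation of "elen" is only a sample entry.]

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PartOfSpeech(Enum):
    NOUN = "noun"
    VERB = "verb"
    ADJECTIVE = "adjective"
    ADVERB = "adverb"


class Number(Enum):
    SINGULAR = "sg"
    PLURAL = "pl"


class Gender(Enum):
    MASCULINE = "m"
    FEMININE = "fem"


class Case(Enum):
    NOMINATIVE = "nom"
    ACCUSATIVE = "acc"
    GENITIVE = "gen"
    POSSESSIVE = "poss"
    DATIVE = "dat"
    LOCATIVE = "loc"
    ABLATIVE = "abl"
    ALLATIVE = "all"
    INSTRUMENTAL = "inst"
    RESPECTIVE = "resp"


@dataclass(frozen=True)
class WordAnalysis:
    """One word as described at the lexical step of the hypothetical wizard."""
    form: str
    part_of_speech: PartOfSpeech
    number: Optional[Number] = None
    gender: Optional[Gender] = None
    case: Optional[Case] = None


def analyse(form, part_of_speech, number=None, gender=None, case=None):
    """Build a record from the abbreviations a user picks.

    Enum lookup by value raises ValueError for anything outside the scheme,
    which is what keeps the analyses unitary.
    """
    return WordAnalysis(
        form=form,
        part_of_speech=PartOfSpeech(part_of_speech),
        number=Number(number) if number else None,
        gender=Gender(gender) if gender else None,
        case=Case(case) if case else None,
    )


# Sample entry only; the annotation is illustrative.
elen = analyse("elen", "noun", number="sg", case="nom")
```

[A choice such as case="xyz" fails immediately instead of ending up in the database, which is the point of fixing the scheme before the work is shared out.]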
Namaarie! S.Y., Elenhil Laiquendo [Boris Shapiro]
: avartuvan i tauri ni ontar : an luumenya tyeela ar loanyar sintar :
Message 2 of 16, Jul 26, 2002
At 7/24/02 10:49 AM, Boris Shapiro wrote:
>First, I have to say that I didn't have the possibility of seeing QH
>by myself, so I'll rely on your answers and patience :)
Sorry I've taken so long. Do you have email but not Web access? Or do you
not have a graphical browser?
>Does it make sense? But the question should be what do you regard as
>an individual element and are they stored absolutely independently of
Elements are things like "parma" or "-uva-" or "-llo". OTOH, "A ná X lá B"
is also listed as one single element. They're generally stored
context-independent, though the attestations field lists all places where
the element is attested in use, so that people can look up the various
contexts in which Tolkien used it.
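[Editorial sketch: the storage model described here, context-independent elements that each carry a list of attestations, can be pictured roughly as below. This is not QH's actual code; the names Element and ElementStore are assumptions, and the "REF-n" strings are placeholders rather than real citations.]

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """A context-independent element: a word, an affix, or a whole pattern."""
    form: str
    attestations: List[str] = field(default_factory=list)


class ElementStore:
    """Hypothetical store keyed by the element's form."""

    def __init__(self) -> None:
        self._elements: Dict[str, Element] = {}

    def attest(self, form: str, reference: str) -> Element:
        # Create the element on first sight, then record where it occurs.
        element = self._elements.setdefault(form, Element(form))
        if reference not in element.attestations:
            element.attestations.append(reference)
        return element

    def attestations_of(self, form: str) -> List[str]:
        element = self._elements.get(form)
        return list(element.attestations) if element else []


store = ElementStore()
store.attest("parma", "REF-1")        # placeholder references throughout
store.attest("parma", "REF-2")
store.attest("-uva-", "REF-2")
store.attest("A ná X lá B", "REF-3")  # a whole pattern stored as one element
```

[Looking up "parma" then yields every recorded place of use, while the element itself carries no syntactic context.]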
>I suppose I lack proper vocabulary and knowledge in programming, but
>in my view the desired LDB [linguistic database] (or should we call it
>_ELDA_ for "Elvish Linguistic DAtabase"? :) should be object-oriented,
>and have a nested structure so that there are multiple levels of
>objects like a nested doll. In my view an object is a linguistically
>important element of a given text stored in LDB which possesses the
>required linguistic description. But there are different types of
>objects: two words could be two individual lexical objects, but at the
>same time they could be a sole syntactical object! And a sentence
>could itself be a clause, a part of a complex sentence, thus being a
>syntactical object, too! And all these objects viewed on different
>levels should possess different descriptions.
>I'd like to know how does your QH deal with such information
It doesn't. It stores things pretty much only at the morphological level,
and leaves it to humans to do higher-level stuff.
The sort of multi-level analysis you suggest, and which also seems to be
suggested by Rich Alderson's mention of treebanks, might be valuable and
useful, but it is certainly beyond the level of something I could write.
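[Editorial sketch: the "nested doll" structure quoted above, where two words are each a lexical object and together a single syntactical object, amounts to a small tree in which every node has its own level and its own description. The names LingObject, walk and objects_at are assumptions for illustration, and the description values are placeholders, not an agreed analysis.]

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class LingObject:
    """An object at some level ("lexical", "syntactic", ...) with its own description."""
    level: str
    label: str
    description: Dict[str, str] = field(default_factory=dict)
    children: List["LingObject"] = field(default_factory=list)


def walk(obj: LingObject) -> Iterator[LingObject]:
    """Yield the object and everything nested inside it, outermost first."""
    yield obj
    for child in obj.children:
        yield from walk(child)


def objects_at(root: LingObject, level: str) -> List[LingObject]:
    """Collect the nested objects that sit at one given level."""
    return [o for o in walk(root) if o.level == level]


# Two lexical objects that together form one syntactic object.
minas = LingObject("lexical", "Minas")
tirith = LingObject("lexical", "Tirith")
phrase = LingObject(
    "syntactic",
    "Minas Tirith",
    {"kind": "two-word construction"},  # placeholder description
    [minas, tirith],
)
```

[The same words can thus be viewed either as two lexical objects or as one syntactic object, each view with a separate description, without storing the text twice.]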
"But every night I burn,/Every night I call your name.
Every night I burn,/Every night I fall again..."
Message 3 of 16, Jul 26, 2002
At 7/24/02 02:03 PM, Beregond. Anders Stenström wrote:
> The general idea of having collocutions registered in the
>database seems sound. But as Rich Alderson's reply indicated,
>this could easily become too theory-dependent to look quite
>good to me. It seems to me that the best idea would be to register
>all 'contexts', from two-word constructions like _Minas Tirith_ up
>to long texts like "Namárie" (with full references, or 'attestation
>details' for each), and then link words to all contexts they occur in.
>The syntactical analysis can be left to fora outside the database.
I suppose we could add a category somewhere for "phrases". I agree that
syntactic analysis should be left to the humans, not machines -- I'm
honestly not sure they can handle it at all yet; I know I personally can't
make them do it. (Consider the current state of Babelfish, which has had
years of research and the efforts of a large number of people poured into
it. It can give you the general idea of what something means, but it's
painfully obvious that it's not about to put professional translators out
of business any time soon, *especially* regarding poetic and artistic works.)
"Deadly angels for reality and passion..."
"Gunning for the
Message 4 of 16, Jul 26, 2002
At 7/24/02 09:59 PM, Boris Shapiro wrote:
>KM> [A]: Not necessary (or possible) in Quenya; no indefinite article
>KM> exists in Quenya. Necessary in translation into English to conform
>KM> with English grammar, which requires articles.
>That is why any noun as a syntactic object in ELDA should have as one
>of its descriptions the indication of its definite/indefinite status,
>linked to the word it is defined by (not necessarily an article), and
>Q _i_ (when used as the article) should be linked to the noun it
>describes; the same applies to virtually any word that defines
Interesting point. Though I think this means that nearly any noun in ELDA
would be entered at least twice: once in definite form, and then again in
indefinite form. (After all, most nouns can be used both definitely and
>That's why any object (presently, a word-object) should not be stored
>independently from its context (on which it obviously does depend),
>and share a date-description with the text-object it is included in.
>Thus one should be able to search for every case of the word "elen"
>used with chronology and other contextual conditions for search.
Ouch! While I agree that a context-dependent database would be an
interesting and probably very useful thing, I must admit I'm a bit confused
about how one would use it. Would searches be things like: "_elen_, where
used as subject (not object) and only where indefinite", and so on? (I can
sort of see how that search should at least return "_elen síla lumenn'
omentielvo_", while not returning "_Aiya Earendil elenion ancalima_".)
At the moment, QH's means of dealing with context is simply to provide
references to all attested uses of the element in the "Attestations" field.
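[Editorial sketch: the kind of search asked about above ("_elen_, where used as subject and only where indefinite") presupposes that each occurrence has already been annotated by a person. Given such annotations, the query itself is a plain filter. The Occurrence fields and the annotation values below are assumptions entered by hand for illustration; they are not an agreed analysis of either line.]

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Occurrence:
    """One hand-annotated use of a word in a context."""
    lemma: str
    form: str
    role: str                 # e.g. "subject"; the label set is hypothetical
    definite: Optional[bool]  # None where no judgement has been recorded
    text: str


OCCURRENCES = [
    Occurrence("elen", "elen", "subject", False,
               "elen síla lumenn' omentielvo"),
    Occurrence("elen", "elenion", "other", None,
               "Aiya Earendil elenion ancalima"),
]


def search(occurrences: List[Occurrence], **conditions) -> List[Occurrence]:
    """Return the occurrences whose fields equal every given condition."""
    return [
        o for o in occurrences
        if all(getattr(o, name) == value for name, value in conditions.items())
    ]


hits = search(OCCURRENCES, lemma="elen", role="subject", definite=False)
```

[With these sample annotations the search returns the first line and not the second, matching the expectation stated above; the hard part remains producing the annotations, not querying them.]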
>Next, a lexical word-object should definitely have a vocabulary
>description for referential purposes. That was outlined in your lines
>three paragraphs above. Probably we'll need a dictionary module.
Which, to figure out homonyms, will need to be able to carry out some
actual syntactic analysis. (Which you do explicitly call for elsewhere in
your post.) Unfortunately, I'm afraid I don't know how to get software to
do that, and I'm especially wary of the concept of getting software to be
able to carry out accurate syntactic analysis on poetic material.
>And so on. I hope that gives you some idea of the nested structure we
>need. Objects in objects in various hypostases with different
It does give me some idea of it, yes. I think that what you propose is an
impressive and worthwhile project, but it is one which is utterly beyond my
abilities. I'm sorry.
>Kai, forgive me for skipping most of your own analysis, I've seen that
>in some aspects I simply repeat yours, but I've tried to present it
>in a more systematic and complex way.
No problem there; it was, after all, just an example analysis. I think it
served its purpose, and you did right to skip large chunks of it.
"Then, when they spill the demon seed
Turn and face into the wind.
All along you still believed...
Believed you were immune."
"The Flat Earth"
Message 5 of 16, Jul 26, 2002
At 7/26/02 01:09 AM, Boris Shapiro wrote:
>For me that seems to be a regrettable way of development. That
>abolishes every use (every extended search query) that I've imagined.
>What is left then? Just basic number/gender/case descriptions? Is this
>price good enough, and for what?
What sorts of search queries do you envision? Can you give me some examples?
--Kai MacTane
"In another life I see you/As an angel flying high,
And the hands of time will free you/You will cast your chains aside,
And the dawn will come and kiss away
Every tear that's ever fallen from your eyes...