
Handling inflected languages (Latin, Finnish, Russian) with DCGs, e. g. in Prolog

  • Antti Ylikoski
    Message 1 of 1 , Jul 4, 2012
      Definite Clause Grammars (DCGs) are an elegant and concise
      formalism for Natural Language Processing. In the literature that I
      have seen, they are applied primarily to English, and I have
      not seen applications to inflected languages such as Latin, Greek,
      Finnish, Japanese, or Russian. (I am not claiming that no such
      applications exist -- I did not carry out a literature search.)

      Recently I discovered that, in Prolog, the DCG formalism can be
      applied to inflected languages as well, by means of the name/2
      ("name(Word, CharCodeList)") and atom_codes/2 predicates. These
      convert an atom into a list of character codes ("explode") and a
      list of character codes back into an atom ("implode"); atoms are
      the tokens that the DCG formalism works with. In between, the
      code lists can be manipulated arbitrarily to inflect word stems.
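
      The explode / manipulate / implode cycle described above can be
      sketched as follows. This is only an illustration: add_suffix/3 is
      a hypothetical helper name that does not appear in my grammar, and
      the sketch assumes a Prolog system that provides atom_codes/2 and
      append/3 (for example via library(lists)).

```prolog
:- use_module(library(lists)).   % append/3

% add_suffix(+Stem, +Suffix, -Word)
% Hypothetical helper: glue a suffix onto a stem via code lists.
add_suffix(Stem, Suffix, Word) :-
    atom_codes(Stem, StemCodes),               % "explode" the stem
    atom_codes(Suffix, SuffixCodes),           % "explode" the suffix
    append(StemCodes, SuffixCodes, WordCodes), % manipulate the lists
    atom_codes(Word, WordCodes).               % "implode" the result

% ?- add_suffix(talo, ssa, Word).
% Word = talossa.
```

      Here "talo" (house) plus the inessive ending "ssa" gives "talossa"
      (in the house). Any list operation could take the place of
      append/3, which is what makes stem changes possible.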

      As to DCGs for systems written in LISP: the EXPLODE and IMPLODE
      functions existed in MACLISP, but I understand that for efficiency
      reasons they were left out of Common LISP. "Premature
      optimization is the root of all evil!" EXPLODE converts a symbol
      into a list of single-character symbols, and IMPLODE does the
      reverse. They can be defined in Common LISP as follows:


      ;;; MACLISP had EXPLODE and IMPLODE..................
      ;;; AJY 2011-12-03.

      (defun explode (sym)
        (map 'list #'(lambda (c)
                       (intern (make-string 1 :initial-element c)))
             (symbol-name sym)))

      (defun implode (lst)
        (intern (with-output-to-string (s)
                  (loop for item in lst
                        do (princ item s)))))


      As a practical example of an inflected language, consider Finnish.


      (Finnish is a Uralic language, more precisely a Finno-Ugric one.
      It is closely related to Estonian and more distantly to Hungarian.
      The older "Ural-Altaic" hypothesis, which would also connect it to
      Turkish and Japanese, is no longer generally accepted.)

      Now for DCGs for inflected languages in Prolog. The Finnish
      nominative plural of a noun is formed by appending the letter "t"
      to the word, and in certain cases also altering the noun stem.
      This is easy to do by exploding atoms into code lists, processing
      the lists, imploding them again, and interfacing the resulting
      atoms to the DCG. The trick is standard Prolog:


      % AJY 2012-07-03.
      % Some Definite Clause Grammars in Prolog.

      sentence --> noun_phrase, verb_phrase.

      noun_phrase --> determiner, noun.
      noun_phrase --> proper_noun.
      % .........................

      proper_noun --> ['Antti'].
      proper_noun --> ['Hanna'].
      proper_noun --> ['John'].
      proper_noun --> ['Richard'].

      % ..................

      % ........................

      % ..............................

      % Handle inflected languages with DCG's.
      % AJY 2012-07-03.

      finnish_plural(SingularWord, Plural) :-
          atom_codes(SingularWord, List1),
          atom_codes(t, LetterT),
          append(List1, LetterT, PluralList),
          atom_codes(Plural, PluralList), !.

      finnish_singular(Plural, SingularWord) :-
          atom_codes(Plural, List1),
          atom_codes(t, LetterT),
          append(SingularList, LetterT, List1),
          atom_codes(SingularWord, SingularList), !.
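
      One possible way to interface these predicates to a DCG is to call
      them inside a {}/1 goal of a noun rule. The following is a minimal
      sketch and not part of the grammar above: noun_stem/1, noun//2 and
      the three-word lexicon are hypothetical names chosen for
      illustration, and finnish_singular/2 is repeated only so that the
      fragment loads on its own.

```prolog
:- use_module(library(lists)).   % append/3

% Repeated from the listing above so the sketch is self-contained.
finnish_singular(Plural, SingularWord) :-
    atom_codes(Plural, List1),
    atom_codes(t, LetterT),
    append(SingularList, LetterT, List1),
    atom_codes(SingularWord, SingularList), !.

% Hypothetical toy lexicon of nominative singular nouns.
noun_stem(talo).    % house
noun_stem(auto).    % car
noun_stem(kala).    % fish

% noun(Number, Stem): a singular noun is a lexicon entry as it stands;
% a plural noun is a token that becomes a lexicon entry once the
% final "t" has been removed.
noun(singular, Stem) --> [Stem], { noun_stem(Stem) }.
noun(plural, Stem)   --> [Word], { finnish_singular(Word, Stem),
                                   noun_stem(Stem) }.

% ?- phrase(noun(Number, Stem), [talot]).
% Number = plural, Stem = talo.
```

      The sketch is meant for parsing only: with an unbound token,
      atom_codes/2 raises an instantiation error. It also ignores stem
      changes, so a word such as "katu" (street), whose plural is
      "kadut", would need an extra list-rewriting step before the
      lexicon lookup.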


      regards, Antti J Ylikoski
      Helsinki, Finland, the E.U.
      the Microsoft PowerPoint slides series that I created to lecture about
      my doctoral research:
      the so far unfinished manuscript: