
Re: [Clip] Re: Concept-based Search

  • Alan C
    Message 1 of 34 , Jan 15, 2008
      On Jan 15, 2008 9:23 AM, Flo <flo.gehrke@...> wrote:

      --- In ntb-clips@yahoogroups.com, "ebbtidalflats" <ebbtidalflats@...>
      wrote:
      >

      A specific problem is how to deal with basic words (lemmata) and word
      forms (flexions), also with compounds etc. So far, the clip searches
      for whole words only (using \b in the Replace Command). Consequently,
      the word list must contain the search term in any word form that it
      is searched. If we make the clip search for substrings, we possibly
      get some nonsense comments because the search term as a whole could
      have a different meaning than a substring of that term.

      "Build" an alternation regex on the fly? That way the resulting regex
      can be customized each time you do a "build".

      Store the forms and compounds for each word on one line, separated by a
      delimiter, and put them into an array when needed.

      (some of my syntax is not exactly correct but is hopefully close enough so
      as to portray the meaning intended from or based upon my example).

      example storage line:

      myword:wordform1:wordform2:compound1

      To use, just find that doc for myword and select line/get line into array

      ^!SetArray %forms%=getline

      Then check ^%forms0% (array element 0, I believe) to see how many forms
      or compounds there are. Then,

      well, I guess you'd need to loop, finding one form at a time and
      incrementing the array index on each iteration:

      ^!Find "^%forms%^%indx%" ris

      Instead, if you were replacing:

      ^!Replace "^%forms1%|^%forms2%|^%forms3%" >> "whatever" riswa

      | meaning "or" (any of those 3 get replaced)
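
The "build on the fly" idea can be sketched outside NoteTab. The following is a minimal Python illustration, not clip code: it assumes the colon-delimited storage line from the example above, and the function name `build_alternation` is hypothetical. It splits the line into forms, joins them with `|`, and wraps the result in `\b` so only whole words match.

```python
import re

def build_alternation(storage_line, delimiter=":"):
    """Build a whole-word alternation regex from one storage line."""
    # Split "myword:wordform1:..." into its individual forms, skipping empties.
    forms = [form for form in storage_line.split(delimiter) if form]
    # Try longer forms first so a short form never shadows a longer one.
    forms.sort(key=len, reverse=True)
    # Escape each form so punctuation in a word is matched literally.
    alternation = "|".join(re.escape(form) for form in forms)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

pattern = build_alternation("myword:wordform1:wordform2:compound1")
text = "A wordform1 here, a Compound1 there, but not mywordy."
result = pattern.sub("whatever", text)
print(result)
```

Because the pattern is rebuilt from the storage line on each call, adding or removing a form only means editing that line.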

      Just an idea.

      --
      Alan.


    • dracorat
      Message 34 of 34 , Jan 18, 2008
        I should also point out that including the plural "es" form is
        sometimes not desired. For example, if we had car|automobile we would get:

        Sally cares {automobile} about her dog.

        But for that matter, the "s" form has the same issue, just less
        often. Thus, it's a question of what's better: a smaller dictionary,
        or a dictionary with every valid permutation.

        (Or even, just allow only "s")

        His cat let out a loud hiss.

        his|owner

        His {owner} cat let out a loud hiss {owner}.

        The "s" case would be pretty rare, however. (The "es" not so rare.)
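
The trade-off can be reproduced with a short Python sketch. This is an illustration only, not the clip from this thread: the helper name `annotate` and its `suffixes` parameter are hypothetical, and the pattern (term, then an optional plural suffix in a second capture group, all between `\b` word boundaries) is an assumption based on the description quoted below.

```python
import re

def annotate(text, term, comment, suffixes=("s",)):
    """Append {comment} after each whole-word match of term plus an optional suffix."""
    if suffixes:
        # Second capture group: which plural ending, if any, was found.
        plural = "(" + "|".join(re.escape(s) for s in suffixes) + ")?"
    else:
        # Empty group keeps the group numbering stable when no suffix is allowed.
        plural = "()"
    pattern = re.compile(r"\b(" + re.escape(term) + ")" + plural + r"\b",
                         re.IGNORECASE)

    def repl(match):
        # Re-emit both captures so the plural ending is kept, not dropped.
        return match.group(1) + (match.group(2) or "") + " {" + comment + "}"

    return pattern.sub(repl, text)

# Allowing "es" treats "cares" as a plural of "car".
with_es = annotate("Sally cares about her dog.", "car", "automobile", ("es", "s"))
# Allowing only "s" leaves "cares" alone.
only_s = annotate("Sally cares about her dog.", "car", "automobile", ("s",))
# The "s" form still misfires on "hiss".
hiss = annotate("His cat let out a loud hiss.", "his", "owner", ("s",))
print(with_es)
print(only_s)
print(hiss)
```

Passing an empty `suffixes` tuple corresponds to the "dictionary with every valid permutation" option: nothing is matched unless the exact form is listed as a term.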

        --Keith

        --- In ntb-clips@yahoogroups.com, "dracorat" <dracorat@...> wrote:
        >
        > I forgot to include the trailing question mark. Sorry 'bout that.
        > (Because it's optional to be plural)
        >
        > If you leave off the $2, the plural form will be changed to the
        > singular. The second capture is what plural form we found.
        >
        > --Keith
        > (Happy to help. - I LOVE regular expressions)
        >
        >