
Concept-based Search

  • Flo
    Message 1 of 34 , Jan 12, 2008

      Regarding the Find command, I think that we often don't search for a
      string as such; we use strings in order to find meanings. That's why
      it would be useful to have a search that finds A when we search for A
      but also finds B or C when they stand in a semantic relation to A.

      A search like this would find synonyms: when searching for "freedom"
      it would also find "liberty". It could equally cover antonyms like
      "war" and "peace", or related concepts like "father", "mother", and
      "parents". We could also think of negative relations: finding words
      in a database that must not be used as index terms (non-descriptors).

      Could we make NoteTab an instrument for executing such a "concept-
      based search"?

      I played around a little and, as a first approach, came up with
      the following solution:

      1. First we have to define some synonyms and save them as
      SYNONYMS.TXT (find test data below).

      2. Open a document containing the sample text mentioned below.

      3. Keep SYNONYMS.TXT closed and run the Synonyms clip on that text.

      The clip prompts you for a search word. Enter "horse", for example,
      and watch the cursor move through the document, highlighting the search
      word AND its synonyms (if found). You can start the clip from any
      cursor position and interrupt it by pressing the Shift key. After an
      interruption, the clip resumes from the last cursor position
      using the search word you entered before, that is, without
      prompting you for a search word again.
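
The lookup-then-search idea described above can be sketched outside NoteTab. This is a minimal Python illustration, not the clip itself. It assumes SYNONYMS.TXT holds one synonym group per line with terms separated by "|" (the test data is not shown here, so the three groups below are hypothetical). The function names are made up for the example, and the fallback to a plain search when no group matches is an addition of the sketch.

```python
import re

# Hypothetical contents of SYNONYMS.TXT: one group per line, terms separated by "|".
SYNONYMS = """\
horse|foal|colt
freedom|liberty
car|automobile
"""

def find_group(term, synonyms_text):
    """Return the first line containing the term (case-insensitive), or None."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    for line in synonyms_text.splitlines():
        if pattern.search(line):
            return line
    return None

def concept_search(term, synonyms_text, document):
    """Return (position, matched word) pairs for the term and its synonyms."""
    group = find_group(term, synonyms_text)
    if group is None:
        # Assumption of this sketch: fall back to a plain search for the term.
        group = re.escape(term)
    # The whole line is used as a regular expression, so "|" means alternation.
    return [(m.start(), m.group(0)) for m in re.finditer(group, document)]

text = "A young horse is a foal, not necessarily a colt."
hits = concept_search("horse", SYNONYMS, text)
print([word for _, word in hits])  # ['horse', 'foal', 'colt']
```

Because the whole line doubles as the search pattern, adding a synonym only means editing the text file; the search code itself stays unchanged.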

      No doubt this approach could be improved. Has anyone created a
      similar solution? I would be glad to hear about your experiences
      with this issue.


      *** Test data and clip ***

      Save as SYNONYMS.TXT...


      Sample text for testing the clip (partly from Wiki)...

      Synonyms are different words with identical or at least similar
      meanings. Words that are synonyms are said to be synonymous, and the
      state of being a synonym is called synonymy. An example of synonyms
      is the words car and automobile. Similarly, if we talk about a long
      time or an extended time, long and extended become synonyms. In the
      figurative sense, two words are often said to be synonymous if they
      have the same connotation. Example: "A young horse is a foal, not
      necessarily a colt. A colt refers only to a young male horse. A horse
      that looks pure white is, in most cases, actually a middle-aged or
      older gray." More examples of English synonyms are: baby and infant,
      student and pupil, buy and purchase, pretty and attractive, sick and
      ill, quickly and speedily, freedom and liberty, dead and deceased.
      Note that the synonyms are defined with respect to certain senses of
      words; for instance, pupil as the "aperture in the iris of the eye"
      is not synonymous with student. Antonyms are words with opposite or
      nearly opposite meanings. For example: dead and alive, near and far,
      war and peace, increase and decrease. The words synonym and antonym
      are themselves antonyms. Hypernyms and hyponyms are words that refer
      to, respectively, a general category and a specific instance of that
      category. For example, vehicle is a hypernym of car, and car is a
      hyponym of vehicle.

      Synonyms clip...

      ; Check if search has been interrupted before
      ^!IfTrue=^%Break% Search Else Next
      ; Get Search Term and its Synonyms from SYNONYMS.TXT
      ^!SetScreenUpdate Off
      ^!Set %Term%=^?{Enter a Search Term:}
      ^!Open ^$GetDocumentPath$SYNONYMS.TXT
      ^!Jump Doc_Start
      ^!Find "^%Term%" RSI
      ^!Set %Search%=^$GetParagraph$
      ^!Close SYNONYMS.TXT,Discard
      ; Find Search Term and Synonyms in Document
      ^!SetScreenUpdate On
      ^!SetHintInfo Press SHIFT to stop search
      ^!Jump Doc_Start

      :Search
      ^!Find ^%Search% RS
      ^!IfError Message
      ^!Delay 15
      ^!Set %Break%=^$IsShiftKeyDown$
      ^!IfTrue=^%Break% End Else Next
      ^!MoveCursor +1
      ^!Goto Search

      :Message
      ^!Info No more matches found!
      ^!Set %Break%=0
      ^!Goto End

      Quotation: Why use concepts...

      "Concept-based access to information promises important benefits over
      keyword-based access. One of these benefits is the ability to take
      advantage of semantic relationships among concepts in finding
      relevant documents. Another benefit is the elimination of irrelevant
      documents by identifying conceptual mismatches. Concepts are mental
      structures. Words and phrases are the linguistic representatives of
      concepts. Due to the inherent conciseness of natural language, words
      can represent multiple concepts and different words may represent the
      same or very similar concepts."
    • dracorat
      Message 34 of 34 , Jan 18, 2008
        I should also point out that including the plural "es" form is
        sometimes not desired. For example, if we had car|automobile we would get:

        Sally cares {automobile} about her dog.

        But for that matter, the "s" form has the same issue, just less
        often. Thus, it's a question of which is better: a smaller dictionary,
        or a dictionary with every valid permutation.

        (Or even just handle only the "s" form.)

        His cat let out a loud hiss.


        His {owner} cat let out a loud hiss {owner}.

        The "s" case would be pretty rare, however. (The "es" not so rare.)
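
The two false positives above can be reproduced with a small Python sketch. The regular expression actually used in the thread is not shown in this message, so the pattern below (word boundary, the word, an optional plural suffix, word boundary) is an assumption, and the function name `annotate` and its parameters are made up for the example.

```python
import re

def annotate(text, word, synonym, suffixes=("es", "s")):
    """Append {synonym} after each match of word plus an optional plural suffix."""
    plural = "|".join(re.escape(s) for s in suffixes)
    # Group 1 is the word itself, group 2 the plural suffix if one was found.
    pattern = re.compile(r"\b(%s)(%s)?\b" % (re.escape(word), plural), re.IGNORECASE)
    return pattern.sub(lambda m: m.group(0) + " {" + synonym + "}", text)

# "cares" is read as "car" + "es", so it is wrongly annotated.
print(annotate("Sally cares about her dog.", "car", "automobile"))

# With only the "s" suffix allowed, "cares" no longer matches.
print(annotate("Sally cares about her dog.", "car", "automobile", suffixes=("s",)))

# The "s" suffix alone still produces a false hit: "hiss" is read as "his" + "s".
print(annotate("His cat let out a loud hiss.", "his", "owner", suffixes=("s",)))
```

Restricting the suffix list trades recall for precision: fewer plural forms are caught, but fewer unrelated words are mistaken for plurals.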


        --- In ntb-clips@yahoogroups.com, "dracorat" <dracorat@...> wrote:
        > I forgot to include the trailing question mark. Sorry about that.
        > (Because it's optional to be plural)
        > If you leave off the $2, the plural form will be changed to the
        > singular. The second capture is what plural form we found.
        > --Keith
        > (Happy to help. - I LOVE regular expressions)