Loading ...
Sorry, an error occurred while loading the content.

Simple full-text search dictionary

Expand Messages
  • Kontra, Gergely
    Hi! I want to implement a simple full-text search dictionary. I have a textfile with a list of the words. The main question of mine is how can I (easily)
    Message 1 of 5 , Sep 23, 2002
    • 0 Attachment
      Hi!

      I want to implement a simple full-text search dictionary.
      I have a textfile with a list of the words.

      The main question of mine is how can I (easily) collect words, that
      match a regexp. I've found Str.search_forward, but this is seems
      complicated when using with List.filter (raises Exception...).

      Another question, whether to store the dictionary in a say: list or
      not. If yes, how to fill the list efficiently.

      Gergo

      +-[Kontra, Gergely @ Budapest University of Technology and Economics]-+
      | Email: kgergely@..., kgergely@... |
      | URL: turul.eet.bme.hu/~kgergely Mobile: (+36 20) 356 9656 |
      +-------"Olyan langesz vagyok, hogy poroltoval kellene jarnom!"-------+
      .
      Magyar php mirror es magyar php dokumentacio: http://hu.php.net
    • Stalkern 2
      ... Well, if no matching substrings are found an exception Not_found sounds reasonable, doesn t it? ... As far as I know, to have something really performant
      Message 2 of 5 , Sep 23, 2002
      • 0 Attachment
        Il Monday 23 September 2002 16:11, Kontra, Gergely ha scritto:
        > Hi!
        >
        > I want to implement a simple full-text search dictionary.
        > I have a textfile with a list of the words.
        >
        > The main question of mine is how can I (easily) collect words, that
        > match a regexp. I've found Str.search_forward, but this is seems
        > complicated when using with List.filter (raises Exception...).

        Well, if no matching substrings are found an exception Not_found sounds
        reasonable, doesn't it?

        >
        > Another question, whether to store the dictionary in a say: list or
        > not. If yes, how to fill the list efficiently.

        As far as I know, to have something really performant -but not regexp-like-
        you could use the search tree structure that you can find in the exercises at
        http://caml.inria.fr/oreilly-book/html/book-ora020.html.
        Open the page with lynx if your browser gives garbled output here (Konqueror
        and Galeon do)

        Bye
        Ernesto
      • Gianluca Moro
        ... In my opinion, to implement a dictionary, a good solution is to use an hash-table. There s a module in Caml which implements this structure, called
        Message 3 of 5 , Sep 23, 2002
        • 0 Attachment
          --- In ocaml_beginners@y..., "Kontra, Gergely" <kgergely@m...> wrote:
          > The main question of mine is how can I (easily) collect words, that
          > match a regexp. I've found Str.search_forward, but this is seems
          > complicated when using with List.filter (raises Exception...).
          >
          > Another question, whether to store the dictionary in a say: list or
          > not. If yes, how to fill the list efficiently.

          In my opinion, to implement a dictionary, a good solution
          is to use an hash-table.
          There's a module in Caml which implements this structure,
          called Hashtbl: worth trying.

          By the way, the hash module has primitives to extract data from the
          table, so you should not use List.filter, anyway, if the
          word is not found, you got an exception (you must catch it!)

          bye
          Gianluca
        • Kontra, Gergely
          ... I know about the exception. That s why I asked about anything easier to implement. BUT, a full-text search dictionary shouldn t use more complex datatype,
          Message 4 of 5 , Sep 23, 2002
          • 0 Attachment
            >> Another question, whether to store the dictionary in a say: list or
            >> not. If yes, how to fill the list efficiently.
            >
            >In my opinion, to implement a dictionary, a good solution
            >is to use an hash-table.
            >There's a module in Caml which implements this structure,
            >called Hashtbl: worth trying.
            >
            >By the way, the hash module has primitives to extract data from the
            >table, so you should not use List.filter, anyway, if the
            >word is not found, you got an exception (you must catch it!)

            I know about the exception. That's why I asked about anything easier to
            implement. BUT, a full-text search dictionary shouldn't use more complex
            datatype, while the searching algorithm cannot be better than linear
            search :(

            Gergo
            +-[Kontra, Gergely @ Budapest University of Technology and Economics]-+
            | Email: kgergely@..., kgergely@... |
            | URL: turul.eet.bme.hu/~kgergely Mobile: (+36 20) 356 9656 |
            +-------"Olyan langesz vagyok, hogy poroltoval kellene jarnom!"-------+
            .
            Magyar php mirror es magyar php dokumentacio: http://hu.php.net
          • Stalkern 2
            ... I don t know whether this is true theoretically, I don t think so. Both lexical trees and hashtables should be better, because uin the first case you only
            Message 5 of 5 , Sep 24, 2002
            • 0 Attachment
              Il Monday 23 September 2002 20:11, Kontra, Gergely ha scritto:
              > the searching algorithm cannot be better than linear
              > search

              I don't know whether this is true theoretically, I don't think so. Both
              lexical trees and hashtables should be better, because uin the first case you
              only check the previous letters and then the corresponding cluster of keys,
              in the second case you only search the keys.

              By the way, if by "full-text search dictionary" you mean that you have got
              some text and will look in all of it *regardless of keys and definitions*,
              neither the lexical tree nor the hash table figure should apply, because it
              is NOT a key-value "dictionary" from a programming point of view, but just
              text.

              For what concerns exceptions,
              try (yourSearchFunction yourArguments) with e -> [ ]
              is so easy that I can hardly imagine anything easier.
              I'm curious about what result would you expect to get otherwise in case the
              word is not found.


              Ciao
              Ernesto
            Your message has been successfully submitted and would be delivered to recipients shortly.