Re: "ocaml_beginners"::[] Unix "locate" database access ?

  • Fabrice Marchant
    Message 1 of 13 , Mar 11, 2007
      Thanks a lot Rich and Grant !

      Rich wrote :
      > ... The best bet is to download the source code (in
      > findutils) and try to reverse engineer the format from the source --
      > it uses a weird type of compression. From locate/code.c:
      I've apt-got the source and looked at this code. It is short.
      The locatedb man page explains how "front compression" of lists works.
      However, I'm considering another approach, maybe re-scanning the disks from the OCaml program.
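
To illustrate the idea behind "front compression" mentioned above, here is a minimal sketch. It assumes a simplified in-memory representation in which each entry is a pair (number of leading characters shared with the previous path, remaining suffix). This is not the on-disk byte layout of the locate database, which encodes its counts differently; the name `decode_front` is purely illustrative.

```ocaml
(* Decode a front-compressed list.
   Each entry is (shared, suffix): [shared] is how many leading characters
   to reuse from the previously decoded string, [suffix] is the text that
   follows them. Hypothetical representation, not the real locatedb format. *)
let decode_front (entries : (int * string) list) : string list =
  let step (prev, acc) (shared, suffix) =
    (* String.sub raises Invalid_argument if [shared] exceeds [prev]'s length *)
    let s = String.sub prev 0 shared ^ suffix in
    (s, s :: acc)
  in
  let _, acc = List.fold_left step ("", []) entries in
  List.rev acc

let () =
  decode_front [ (0, "/usr/bin/ocaml"); (9, "ocamlc"); (5, "lib/ocaml") ]
  |> List.iter print_endline
```

Because sorted path lists share long prefixes, storing only the count and the differing tail is what keeps such a database small.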

      Grant wrote :
      > If you're trying to do something simple that you could do in a shell, you
      > can use one of the Unix process commands and "screen scrape" the results.
      I've already used Sys.command, but I do not know how to get the results back into my OCaml program:
      I have no idea how to go about the "screen scrape"...
      I'd be grateful for any further explanation. Thanks.
      I also wonder whether this can be considered good programming practice?

      Best regards

      Fabrice
    • William Neumann
      Message 2 of 13 , Mar 11, 2007
        On Mar 11, 2007, at 12:04 PM, Fabrice Marchant wrote:

        > I've already used Sys.command, but I do not know how to get the
        > results back into my OCaml program:
        > I have no idea how to go about the "screen scrape"...

        Look at the Unix.open_process* functions:
        <http://caml.inria.fr/pub/docs/manual-ocaml/libref/Unix.html#6_Highlevelprocessandredirectionmanagement>

        William D. Neumann

        "I eat T-bone steaks, I lift barbell plates, I'm sweeter than a
        German chocolate cake. I'm the reflection of perfection, the number
        one selection. I'm the man of the hour, the man with the power, too
        sweet to be sour. The ladies' pet, the men's regret, where what you
        see is what you get, and what you don't see, is better yet."

        --Superstar Billy Graham
      • Fabrice Marchant
        Message 3 of 13 , Mar 12, 2007
          Thanks a lot William !

          > Look at the Unix.open_process* functions <http://caml.inria.fr/pub/
          > docs/manual-ocaml/libref/
          > Unix.html#6_Highlevelprocessandredirectionmanagement>

          I've seen the doc and can try these process functions now.
          But that isn't obvious for me. If you know a "screen scrape" example somewhere...

          Regards

          Fabrice
        • Robert Roessler
          Message 4 of 13 , Mar 12, 2007
            Fabrice Marchant wrote:
            > Thanks a lot William !
            >
            > > Look at the Unix.open_process* functions <http://caml.inria.fr/pub/
            > > docs/manual-ocaml/libref/
            > > Unix.html#6_Highlevelprocessandredirectionmanagement>
            >
            > I've seen the doc and can try these process functions now.
            > But that isn't obvious for me. If you know a "screen scrape" example
            > somewhere...

            All that is really meant here is that you must examine the text being
            generated/returned by running commands and parse it enough to be able
            to recognize and extract the data of interest to you.

            Typically, this could mean using simple pattern-matching to "see"
            lines that have useful data, and ignore ones that don't.
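
As a small sketch of this kind of line-level matching: the function below tries to read the size field from a line shaped like `ls -l` output and returns `None` for anything that does not fit. The field layout (permissions, link count, owner, group, size) is an assumption about the command's output, and `entry_size` is a made-up name.

```ocaml
(* Try to extract the size (5th field) from an "ls -l"-style line.
   Lines that do not match the assumed layout yield None. *)
let entry_size (line : string) : int option =
  try
    Scanf.sscanf line "%s %d %s %s %d"
      (fun _perms _links _owner _group size -> Some size)
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

let () =
  [ "total 14216";
    "-rw-r--r-- 1 oliver oliver 28295 14 Feb 23:30 style.html" ]
  |> List.iter (fun line ->
         match entry_size line with
         | Some n -> Printf.printf "size %d\n" n
         | None -> print_endline "(ignored)")
```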

            The next level of complexity (for information spread across multiple
            lines) is to remember what you have seen and are therefore expecting
            to see next - essentially simulating a simple finite state machine.
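
A minimal sketch of such a state machine, assuming a hypothetical command whose output spreads each record over a `Name:` line followed by a `Size:` line. The format and all names here are invented for illustration.

```ocaml
type state =
  | Expect_name
  | Expect_size of string  (* a name was seen; waiting for its size line *)

(* If [line] starts with [prefix], return the trimmed remainder. *)
let strip_prefix prefix line =
  let n = String.length prefix in
  if String.length line >= n && String.sub line 0 n = prefix
  then Some (String.trim (String.sub line n (String.length line - n)))
  else None

(* One transition: consume a line, possibly emit a (name, size) record. *)
let step (state, acc) line =
  match state, strip_prefix "Name:" line, strip_prefix "Size:" line with
  | Expect_name, Some name, _ -> (Expect_size name, acc)
  | Expect_size name, _, Some size ->
      (match int_of_string_opt size with
       | Some n -> (Expect_name, (name, n) :: acc)
       | None -> (Expect_name, acc))
  | _ -> (state, acc)  (* ignore lines that do not fit the current state *)

let parse (lines : string list) : (string * int) list =
  let _, acc = List.fold_left step (Expect_name, []) lines in
  List.rev acc
```

The state records what has been seen so far, so a stray `Size:` line with no preceding `Name:` is simply skipped.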

            At the extreme end of this approach, you might actually create a
            grammar and employ lexical analysis and generator tools.

            Robert Roessler
            robertr@...
            http://www.rftp.com
          • Martin Jambon
            Message 5 of 13 , Mar 12, 2007
              On Mon, 12 Mar 2007, Fabrice Marchant wrote:

              > Thanks a lot William !
              >
              >> Look at the Unix.open_process* functions <http://caml.inria.fr/pub/
              >> docs/manual-ocaml/libref/
              >> Unix.html#6_Highlevelprocessandredirectionmanagement>
              >
              > I've seen the doc and can try these process functions now.
              > But that isn't obvious for me. If you know a "screen scrape" example
              > somewhere...

              See "slurp_command" on that page:
              http://martin.jambon.free.fr/toolbox.html#programs

              Micmatch.Text.iter_lines_of_channel can also be useful when combined with
              Unix.open_process_in.
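
For readers without Micmatch at hand, the combination can be approximated with the standard library alone. The sketch below is not the code of `slurp_command` or of `Micmatch.Text.iter_lines_of_channel`; `iter_lines` and `iter_command_lines` are hypothetical helpers, and the program must be linked with the unix library.

```ocaml
(* Apply [f] to every line read from [ic] until end of file. *)
let iter_lines (f : string -> unit) (ic : in_channel) : unit =
  let rec loop () =
    match input_line ic with
    | line -> f line; loop ()
    | exception End_of_file -> ()
  in
  loop ()

(* Run [cmd] through the shell and apply [f] to each line of its stdout.
   The channel is closed even if [f] raises; the exit status is ignored here. *)
let iter_command_lines (f : string -> unit) (cmd : string) : unit =
  let ic = Unix.open_process_in cmd in
  Fun.protect
    ~finally:(fun () -> ignore (Unix.close_process_in ic))
    (fun () -> iter_lines f ic)

let () = iter_command_lines print_endline "echo one; echo two"
```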


              Martin

              --
              Martin Jambon
              http://martin.jambon.free.fr
            • Fabrice Marchant
              Message 6 of 13 , Mar 12, 2007
                Thanks Robert for your abstract but useful explanations.

                I am just discovering this way of working. There is no reason for it to be specific to OCaml, though.

                Regards

                Fabrice
              • Grant Olson
                Message 7 of 13 , Mar 12, 2007
                  Sorry, I guess 'screen scrape' might be an American term. But yes, this
                  isn't specific to OCaml.



                  Robert pretty much summed it up: you issue a command and process the
                  returned text content to get the information you're looking for. Opinions
                  vary as to whether this is a good way to do things. In some ways it fits in
                  with the Unix tradition of processing info by piping text output into the
                  input of another program like grep or awk or sed. But like Unix piping, you
                  can suddenly have your program break when you do a system update or move to
                  another OS or do something else that changes an expected input. Sometimes
                  it's the ONLY way to get what you need.



                  For something quick, screen scraping may be easier than reverse engineering
                  or writing a library for locatedb, if you're trying to get some simple
                  information. But if you're going to be doing significant work with the
                  database, it is probably worth writing a proper access library to manipulate
                  the db file.
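
As a rough sketch of the quick route for the locate case: the code below assumes a `locate` executable is on the PATH with an up-to-date database, and uses `Filename.quote` so that the search pattern reaches the shell as a single argument. `lines_of_program` and `locate` are illustrative names; link with the unix library.

```ocaml
(* Run "[prog] [arg]" through the shell and return the lines of its stdout.
   [arg] is shell-quoted so spaces or metacharacters are passed literally. *)
let lines_of_program (prog : string) (arg : string) : string list =
  let ic = Unix.open_process_in (prog ^ " " ^ Filename.quote arg) in
  let rec read acc =
    match input_line ic with
    | line -> read (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = read [] in
  ignore (Unix.close_process_in ic);
  lines

(* Assumption: a "locate" command exists on this system.
   Usage: locate "ocaml.pdf" |> List.iter print_endline *)
let locate (pattern : string) : string list =
  lines_of_program "locate" pattern
```

If `locate` is missing or its output format changes, this silently returns an empty or unexpected list, which is exactly the fragility described above.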



                  As usual, wikipedia has more info than you wanted to know. ;-)



                  http://en.wikipedia.org/wiki/Screen_scraping



                  -Grant



                  _____

                  From: ocaml_beginners@yahoogroups.com
                  [mailto:ocaml_beginners@yahoogroups.com] On Behalf Of Fabrice Marchant
                  Sent: Monday, March 12, 2007 7:18 PM
                  To: ocaml_beginners@yahoogroups.com
                  Subject: Re: "ocaml_beginners"::[] Unix "locate" database access ?



                  Thanks Robert for your abstract but useful explanations.

                  I discover this way of working. There is no reason though to be specific to
                  OCaml.

                  Regards

                  Fabrice



                  [Non-text portions of this message have been removed]
                • Oliver Bandel
                  Message 8 of 13 , Mar 13, 2007
                    On Mon, Mar 12, 2007 at 11:35:16PM +0100, Fabrice Marchant wrote:
                    > Thanks a lot William !
                    >
                    > > Look at the Unix.open_process* functions <http://caml.inria.fr/pub/
                    > > docs/manual-ocaml/libref/
                    > > Unix.html#6_Highlevelprocessandredirectionmanagement>
                    >
                    > I've seen the doc and can try these process functions now.
                    > But that isn't obvious for me. If you know a "screen scrape" example somewhere...
                    >
                    [...]

                    Unix.open_process_in

                    is what you can use.

                    The term "screen scrape", when used for the following, is a misnomer,
                    because you read directly from the process you indirectly invoked via
                    open_process_in:

                    ===============================================================================
                    first:~/Desktop/OCAML-Programmierung-Dokus oliver$ ocaml unix.cma
                    Objective Caml version 3.09.3

                    # open Unix;;
                    # let channel = open_process_in "ls -lt";;
                    val channel : in_channel = <abstr>
                    # while true do print_endline (input_line channel) done ;;
                    total 14216
                    drwxr-xr-x 59 oliver oliver 2006 28 Feb 11:45 OCAML-htmlman-Reference-Manual
                    -rw-r--r-- 1 oliver oliver 124188 21 Feb 19:32 ocamldoc-doku.pdf
                    drwxr-xr-x 4 oliver oliver 136 14 Feb 23:30 style Files
                    -rw-r--r-- 1 oliver oliver 28295 14 Feb 23:30 style.html
                    drwxr-xr-x 5 oliver oliver 170 12 Feb 10:45 Camlp4-Tutorial
                    drwxr-xr-x 8 oliver oliver 272 12 Feb 10:41 OCAMl-Infos-Web-OCamlP3l-und-anderes
                    -rw-r--r-- 1 oliver oliver 415384 4 Feb 07:11 javavsocaml.pdf
                    drwxr-xr-x 3 oliver oliver 102 24 Aug 2006 GoF-DesignPatterns
                    drwxr-xr-x 36 oliver oliver 1224 7 May 2006 OCAML-COCOA
                    drwxr-xr-x 27 oliver oliver 918 7 May 2006 Ocaml--Format-Module
                    drwxr-xr-x 3 oliver oliver 102 7 May 2006 diverses
                    -rw-r--r-- 1 oliver oliver 71171 7 Feb 2006 Ocaml-two-forms-of-LET.pdf
                    -rw-r--r-- 1 oliver oliver 1880564 19 Nov 2005 ocaml-3.09-refman.pdf
                    drwxr-xr-x 5 oliver oliver 170 25 Oct 2005 81bbc08defeb05351c2e0e3164dca32c.en Files
                    -rw-r--r-- 1 oliver oliver 10907 25 Oct 2005 81bbc08defeb05351c2e0e3164dca32c.en.html
                    drwxr-xr-x 5 oliver oliver 170 21 Oct 2005 OCaml-vs-other-Languages
                    drwxr-xr-x 22 oliver oliver 748 21 Oct 2005 OCAML-diverses
                    -rw-r--r-- 1 oliver oliver 21316 15 May 2005 not a bug.html
                    drwxr-xr-x 3 oliver oliver 102 15 May 2005 not a bug_files
                    -rw-r--r-- 1 oliver oliver 100525 15 May 2005 FAQ_EXPERT-eng.html
                    -rw-r--r-- 1 oliver oliver 17976 2 May 2005 Polymorphic-Variants.pdf
                    drwxr-xr-x 4 oliver oliver 136 29 Mar 2005 Ocaml-an-introduction
                    drwxr-xr-x 4 oliver oliver 136 29 Mar 2005 OCaml-concise-introduction
                    drwxr-xr-x 30 oliver oliver 1020 29 Mar 2005 OCAML-Tutorial
                    -rw-r--r-- 1 oliver oliver 360389 29 Mar 2005 book.pdf
                    -rw-r--r-- 1 oliver oliver 302580 27 Feb 2005 chapter1.pdf
                    -rw------- 1 oliver oliver 5096 10 Feb 2005 Ocaml-Extlib.url
                    -rw-r--r-- 1 oliver oliver 188570 22 Dec 2004 ocamllex-tutorial.pdf
                    -rw-r--r-- 1 oliver oliver 458918 22 Dec 2004 ocamlyacc-tutorial.pdf
                    -rw-r--r-- 1 oliver oliver 2875150 20 Dec 2004 ocaml-OReilly-Book.pdf
                    -rw-r--r-- 1 oliver oliver 129871 13 May 2003 recursive-modules-note.pdf
                    -rwxr-xr-x 1 oliver oliver 126386 11 Jan 2003 KOPIE-camlp4-3.06-tutorial.ps.gz
                    -rwxr-xr-x 1 oliver oliver 126386 20 Aug 2002 camlp4-3.06-tutorial.ps.gz
                    Exception: End_of_file.
                    #
                    ===============================================================================

                    open_process_in is the equivalent of a Unix popen(3) call
                    with OCaml's channels on top of it.

                    You give that function the command you would type in a shell
                    and get back the stdout of the process - the stuff you
                    normally get to screen, when calling the command from
                    the shell.

                    BTW: after you have finished, you have to call close_process_in; I didn't do that in the
                    above example.
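
A possible tidier variant of the toplevel session, sketched under the assumption that a non-zero exit status should count as failure: it stops cleanly at End_of_file, calls close_process_in, and reports the status. The function name `run` is illustrative; link with the unix library.

```ocaml
(* Run [cmd], collect its stdout lines, then close the channel and
   turn the process status into a result. *)
let run (cmd : string) : (string list, string) result =
  let ic = Unix.open_process_in cmd in
  let rec read acc =
    match input_line ic with
    | line -> read (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  let lines = read [] in
  (* close_process_in waits for the command and returns its status *)
  match Unix.close_process_in ic with
  | Unix.WEXITED 0 -> Ok lines
  | Unix.WEXITED n -> Error (Printf.sprintf "%s: exit code %d" cmd n)
  | Unix.WSIGNALED n -> Error (Printf.sprintf "%s: killed by signal %d" cmd n)
  | Unix.WSTOPPED n -> Error (Printf.sprintf "%s: stopped by signal %d" cmd n)

let () =
  match run "ls -lt" with
  | Ok lines -> List.iter print_endline lines
  | Error msg -> prerr_endline msg
```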

                    Best wishes,
                    Oliver Bandel
                  • Fabrice Marchant
                    Message 9 of 13 , Mar 14, 2007
                      Thanks to Martin and Oliver for their code examples, which I've experimented with: absolutely perfect!

                      Oliver said "screen scraping" was a misnomer. I think so too! Before your explanations, I was close to imagining we had to feed an OCR system with pixel data...

                      Thanks to Grant for his interesting explanations of "screen scraping" and its possible drawbacks, and for his advice on when to choose this programming method.

                      The quality of the answers on this list is incredible.

                      I'm very sorry that I am only now thanking you for these clever answers.

                      Regards