
Using the Unix module in portable code

  • talex5
    Message 1 of 7, Jul 8 12:23 PM
      Hi,

      I'm looking into rewriting 0install in a compiled language, for better speed and type-safety (http://roscidus.com/blog/blog/2013/06/09/choosing-a-python-replacement-for-0install/).

      OCaml seemed to be the best choice, and I've converted about 2000 lines without too much trouble. But distributing the result is proving tricky. In particular, it seems that bytecode that uses the Unix module is not portable between Unix and Windows (the source code is, just not the bytecode):

      http://stackoverflow.com/questions/17315402/how-to-make-ocaml-bytecode-that-works-on-windows-and-linux

      I see from the OCaml source code that there are actually two implementations of the same unix.mli interface: otherlibs/unix/unix.ml and otherlibs/win32unix/unix.ml. Since my program should run on either, I want to include both versions in my executable and choose the correct one at runtime.

      Getting both is easy enough. To test, I defined a portable_unix.ml file with:

      module type UnixType =
        sig
          [ contents of unix/unix.mli ]
        end

      module Unix : UnixType =
        struct
          [ contents of unix/unix.ml ]
        end

      module Win32 : UnixType =
        struct
          [ contents of win32unix/unix.ml ]
        end

      That's all fine. But how can I select the right module at runtime? I can see that the last thing the bytecode does is:

      PUSHACC0
      PUSHACC2
      SETGLOBAL Portable_unix

      I want to do something like this instead:

      SETGLOBAL Portable_unix =
      if windows then GETGLOBAL Portable_unix.Win32
      else GETGLOBAL Portable_unix.Unix

      This doesn't need to work for native code; I can just use the preprocessor to compile in the right version there. It only needs to work with bytecode.

      Of course, using classes rather than modules would be cleaner, but then I'd have to change all the code which uses Unix (including third-party libraries), which seems like it would be a lot of work.

      Any suggestions on how to do this, or better ideas? I can distribute a modified ocamlrun if necessary (at least then I only need to provide multiple binaries once, not once for every release of every tool written in OCaml).

      Thanks,
    • Gabriel Scherer
      Message 2 of 7, Jul 8 2:17 PM
        This looks like a genuinely good use case for first-class modules, a
        language extension described in
        http://caml.inria.fr/pub/docs/manual-ocaml/manual021.html#toc81

        Code example below:

        module type S = sig
          type t
          val default : t
          val to_string : t -> string
        end

        module A = struct
          type t = int
          let default = 3
          let to_string = string_of_int
        end

        module B = struct
          type t = bool
          let default = false
          let to_string = string_of_bool
        end

        let () = Random.self_init ()

        module C =
          (val (if Random.bool () then (module A : S) else (module B : S)) : S)

        let () =
          print_endline (C.to_string C.default)
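
Applied to the original question, the same pattern can key the choice on the platform instead of a random bit. The following is a minimal sketch, not code from the thread: the signature PLATFORM and the modules Unix_impl and Win32_impl are hypothetical stand-ins for the two unix.ml implementations, while Sys.os_type is the standard-library string that reports "Unix", "Win32" or "Cygwin".

```ocaml
(* Hypothetical common interface; the real one would be the contents of unix.mli. *)
module type PLATFORM = sig
  val name : string
  val dir_sep : string
end

(* Stand-in for otherlibs/unix/unix.ml. *)
module Unix_impl : PLATFORM = struct
  let name = "unix"
  let dir_sep = "/"
end

(* Stand-in for otherlibs/win32unix/unix.ml. *)
module Win32_impl : PLATFORM = struct
  let name = "win32"
  let dir_sep = "\\"
end

(* Pick the implementation once, at program start-up. *)
module Platform =
  (val (if Sys.os_type = "Win32"
        then (module Win32_impl : PLATFORM)
        else (module Unix_impl : PLATFORM)) : PLATFORM)

let () = Printf.printf "selected %s implementation\n" Platform.name
```

Code that refers to Platform afterwards does not need to know which implementation was selected.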


      • talex5
        Message 3 of 7, Jul 9 8:33 AM
          Ah, perfect - that looks like just what I need!

        • Gerd Stolpmann
          Message 4 of 7, Jul 9 11:17 AM
            On Monday, 08.07.2013, at 19:23 +0000, talex5 wrote:

            > In particular, it seems that bytecode that uses the Unix module is not
            > portable between Unix and Windows (the source code is, just not the
            > bytecode):
            >
            > http://stackoverflow.com/questions/17315402/how-to-make-ocaml-bytecode-that-works-on-windows-and-linux
            >
            > I see from the OCaml source code that there are actually two
            > implementations of the same unix.mli interface: otherlibs/unix/unix.ml
            > and otherlibs/win32unix/unix.ml. Since my program should run on
            > either, I want to include both versions in my executable and choose
            > the correct one at runtime.

            This phenomenon is not limited to the unix library. In the OCaml
            distribution the same problem also occurs for threads, and if you are
            using third-party libraries, you'll discover more implementations that
            vary with the platform (not only Unix vs. Windows, but also Linux vs.
            BSD, 32 vs. 64 bits, etc.).

            A relatively easy workaround is not to link the bytecode files into an
            executable, but to leave them as .cma archives and load them from a
            script. That is, compile everything so that you get mylib.cma, and then
            create (or generate) a script like

            #!/usr/bin/env ocaml
            #directory "/where/my/stuff/is/installed";;
            #load "unix.cma";;
            #load "mylib.cma";;

            (For Windows, provide a .bat driver side-by-side.)

            Note that the loaded modules are executed by #load, so make sure you
            link your main program as the last module into mylib.cma (or have a
            separate mymain.cmo), and explicitly call "exit" to terminate the
            program (otherwise you'll end up in the interactive top loop after
            your program is done).
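
For the script approach, the module loaded last has to end the process itself. A minimal sketch of such an entry point follows; the file name mymain.ml and the function run are assumptions made for illustration, and the final exit call is shown only as a comment so that the snippet can be loaded without ending the session.

```ocaml
(* mymain.ml: hypothetical entry point, linked as the last module of mylib.cma. *)

(* Do the work of the program and return a process exit status. *)
let run (args : string list) : int =
  List.iter (fun a -> print_endline ("argument: " ^ a)) args;
  0

(* In the deployed module the last top-level phrase would be

     let () = exit (run (List.tl (Array.to_list Sys.argv)))

   so that the evaluation triggered by #load ends the process instead of
   falling through to the interactive top loop. *)
```

Keeping the work in a function that returns a status code means the exit call appears in exactly one place, at the very end of the last module.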

            Gerd


            --
            ------------------------------------------------------------
            Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany
            gerd@... http://www.gerd-stolpmann.de
            Phone: +49-6151-153855 Fax: +49-6151-997714
            ------------------------------------------------------------
          • talex5
            Message 5 of 7, Jul 9 1:41 PM
              --- In ocaml_beginners@yahoogroups.com, Gerd Stolpmann <gerd@...> wrote:
              >
              > On Monday, 08.07.2013, at 19:23 +0000, talex5 wrote:
              >
              > > In particular, it seems that bytecode that uses the Unix module is not
              > > portable between Unix and Windows (the source code is, just not the
              > > bytecode):
              > >
              > > http://stackoverflow.com/questions/17315402/how-to-make-ocaml-bytecode-that-works-on-windows-and-linux
              > >
              > > I see from the OCaml source code that there are actually two
              > > implementations of the same unix.mli interface: otherlibs/unix/unix.ml
              > > and otherlibs/win32unix/unix.ml. Since my program should run on
              > > either, I want to include both versions in my executable and choose
              > > the correct one at runtime.
              >
              > This phenomenon is not limited to the unix library. In the ocaml
              > distribution the same problem occurs also for threads, and if you are
              > using third-party libraries, you'll discover more implementations that
              > vary with the platform (not only unix vs. windows, but also linux vs.
              > bsd, 32 vs. 64 bits, etc.).

              I tried compiling my program (which uses Unix, Xmlm and Yojson libraries) on a 32-bit Ubuntu system and then ran the bytecode on a 64-bit Arch system and it worked. What kind of problems are likely to occur?

              > A relatively easy workaround is not to link the bytecode files to an
              > executable, but leave them as cma's, and just load them into a script.
              > That means compile all your stuff so you get mylib.cma, and then create
              > (or generate) a script like
              >
              > #!/usr/bin/env ocaml
              > #directory "/where/my/stuff/is/installed";;
              > #load "unix.cma";;
              > #load "mylib.cma";;
              >
              > (For Windows, provide a .bat driver side-by-side.)

              (no need for a .bat script; 0install provides a cross-platform replacement for #! lines)

              But I don't see how this would help. Either unix.cma is shipped with my program (in which case it's the unix.cma for the build system, not for the platform running the program), or it's the unix.cma shipped with the platform, in which case the hashes won't match.

              As far as I can tell, when you compile something like:

              Foo.bar ()

              the name "bar" isn't included in the bytecode. Instead, you get something like "call function#3 from Foo". OCaml prevents that from calling the wrong function by refusing to link if anything in the interface has changed ("inconsistent assumptions over interface").
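
The positional access described above can be made visible with the unsafe Obj module. This is only an illustrative sketch: the module Foo and the signature FOO are hypothetical, and it relies on the implementation detail that a module is represented as a block holding its values in signature order, which is not a documented guarantee.

```ocaml
module type FOO = sig
  val bar : unit -> string
  val baz : unit -> string
end

module Foo : FOO = struct
  let bar () = "bar"
  let baz () = "baz"
end

(* Pack the module as a first-class value and view it as a raw block. *)
let block : Obj.t = Obj.repr (module Foo : FOO)

(* Fetch the values by slot number, without mentioning their names. *)
let first : unit -> string = Obj.obj (Obj.field block 0)
let second : unit -> string = Obj.obj (Obj.field block 1)

let () = print_endline (second ())
```

If a value were inserted before bar in the interface, slot 1 would refer to something else, which is why the linker insists that the interface hashes match.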

              Has the cma trick worked for you?
            • Gerd Stolpmann
              Message 6 of 7, Jul 9 4:23 PM
                On Tuesday, 09.07.2013, at 20:41 +0000, talex5 wrote:
                >
                > --- In ocaml_beginners@yahoogroups.com, Gerd Stolpmann <gerd@...> wrote:
                > >
                > > On Monday, 08.07.2013, at 19:23 +0000, talex5 wrote:
                > >
                > > > In particular, it seems that bytecode that uses the Unix module is not
                > > > portable between Unix and Windows (the source code is, just not the
                > > > bytecode):
                > > >
                > > > http://stackoverflow.com/questions/17315402/how-to-make-ocaml-bytecode-that-works-on-windows-and-linux
                > > >
                > > > I see from the OCaml source code that there are actually two
                > > > implementations of the same unix.mli interface: otherlibs/unix/unix.ml
                > > > and otherlibs/win32unix/unix.ml. Since my program should run on
                > > > either, I want to include both versions in my executable and choose
                > > > the correct one at runtime.
                > >
                > > This phenomenon is not limited to the unix library. In the ocaml
                > > distribution the same problem occurs also for threads, and if you are
                > > using third-party libraries, you'll discover more implementations that
                > > vary with the platform (not only unix vs. windows, but also linux vs.
                > > bsd, 32 vs. 64 bits, etc.).
                >
                > I tried compiling my program (which uses Unix, Xmlm and Yojson
                > libraries) on a 32-bit Ubuntu system and then ran the bytecode on a
                > 64-bit Arch system and it worked. What kind of problems are likely to
                > occur?

                For example, you cannot load an int that lies outside the range
                supported by the platform running the bytecode. Some libraries have
                such issues, although they are rare (e.g. they want to pack as many
                bits as possible into an int, and make compile-time optimizations).
                Here is an extreme example where even the representation is different
                (well, it's from me, but other developers have probably had similar
                ideas):

                http://docs.camlcity.org/docs/godisrc/ocamlnet-3.6.5.tar.gz/ocamlnet-3.6.5/src/netstring/netnumber.mlp

                I'm currently not even sure whether the 32 bit version would run on 64
                bits.
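
To make the int-range issue concrete, here is a small sketch (not taken from Ocamlnet) that reports the native int width of the running system and checks whether a value would also be representable by a 32-bit runtime, where an int has 31 bits. The names min_31, max_31 and fits_in_31_bits are hypothetical.

```ocaml
(* Number of bits in a native int: 31 on 32-bit systems, 63 on 64-bit ones. *)
let int_bits = Sys.word_size - 1

(* Bounds of a 31-bit OCaml int. *)
let min_31 = -0x4000_0000
let max_31 = 0x3FFF_FFFF

(* True if n can be represented as an int by a 32-bit OCaml runtime. *)
let fits_in_31_bits n = n >= min_31 && n <= max_31

let () =
  Printf.printf "int has %d bits here; max_int fits in 31 bits: %b\n"
    int_bits (fits_in_31_bits max_int)
```

A program built on a 64-bit system that embeds an int constant for which fits_in_31_bits returns false is a candidate for the kind of problem described above when its bytecode is run on a 32-bit system.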
                >
                > > A relatively easy workaround is not to link the bytecode files to an
                > > executable, but leave them as cma's, and just load them into a script.
                > > That means compile all your stuff so you get mylib.cma, and then create
                > > (or generate) a script like
                > >
                > > #!/usr/bin/env ocaml
                > > #directory "/where/my/stuff/is/installed";;
                > > #load "unix.cma";;
                > > #load "mylib.cma";;
                > >
                > > (For Windows, provide a .bat driver side-by-side.)
                >
                > (no need for a .bat script; 0install provides a cross-platform
                > replacement for #! lines)
                >
                > But I don't see how this would help. Either unix.cma is shipped with
                > my program (in which case it's the unix.cma for the build system, not
                > for the platform running the program), or it's the unix.cma shipped
                > with the platform, in which case the hashes won't match.

                Right. But on the other hand, there is absolutely no guarantee that
                bytecode produced with one version of ocaml can be run on other versions
                of ocaml. E.g. the representation of lazy values changed several times.
                Recently the hash algorithm was changed. You would have to go through
                all changes to ensure that your bytecode is still compatible.

                > As far as I can tell, when you compile something like:
                >
                > Foo.bar ()
                >
                > the name "bar" isn't included in the bytecode.

                You just access the n-th value of the module, so the name turns into a
                number.

                > Instead, you get something like "call function#3 from Foo". OCaml
                > prevents that from calling the wrong function by refusing to link if
                > anything in the interface has changed ("inconsistent assumptions over
                > interface").

                Yes, you can view the hashes with ocamlobjinfo.

                > Has the cma trick worked for you?

                I'm actually not using it anymore - many years ago I used that for
                running ocaml on a machine I did not have a compile environment for
                (essentially, no shell access). In this case, I was able to create a
                compatible environment with exactly the same ocaml version on another
                machine.

                I don't think it is possible to produce universally runnable bytecode
                executables without shipping the matching version of ocamlrun together
                with the bytecode. It is just not made for this use case - don't mix
                this up with strictly controlled environments like the JVM.

                Gerd

              • talex5
                Message 7 of 7, Jul 11 8:52 AM
                  --- In ocaml_beginners@yahoogroups.com, Gerd Stolpmann <gerd@...> wrote:
                  >
                  > On Tuesday, 09.07.2013, at 20:41 +0000, talex5 wrote:
                  > >
                  > > --- In ocaml_beginners@yahoogroups.com, Gerd Stolpmann <gerd@> wrote:
                  > > >
                  > > > On Monday, 08.07.2013, at 19:23 +0000, talex5 wrote:
                  > > >
                  > > > > In particular, it seems that bytecode that uses the Unix module is not
                  > > > > portable between Unix and Windows (the source code is, just not the
                  > > > > bytecode):
                  > > > >
                  > > > > http://stackoverflow.com/questions/17315402/how-to-make-ocaml-bytecode-that-works-on-windows-and-linux
                  [...]
                  > > > A relatively easy workaround is not to link the bytecode files to an
                  > > > executable, but leave them as cma's, and just load them into a script.
                  > > > That means compile all your stuff so you get mylib.cma, and then create
                  > > > (or generate) a script like
                  > > >
                  > > > #!/usr/bin/env ocaml
                  > > > #directory "/where/my/stuff/is/installed";;
                  > > > #load "unix.cma";;
                  > > > #load "mylib.cma";;
                  > > >
                  > > > (For Windows, provide a .bat driver side-by-side.)
                  > >
                  > > (no need for a .bat script; 0install provides a cross-platform
                  > > replacement for #! lines)
                  > >
                  > > But I don't see how this would help. Either unix.cma is shipped with
                  > > my program (in which case it's the unix.cma for the build system, not
                  > > for the platform running the program), or it's the unix.cma shipped
                  > > with the platform, in which case the hashes won't match.
                  >
                  > Right. But on the other hand, there is absolutely no guarantee that
                  > bytecode produced with one version of ocaml can be run on other versions
                  > of ocaml. E.g. the representation of lazy values changed several times.
                  > Recently the hash algorithm was changed. You would have to go through
                  > all changes to ensure that your bytecode is still compatible.

                  OK, I've done some tests and your solution seems to work well:

                  http://roscidus.com/blog/blog/2013/07/07/ocaml-binary-compatibility/#windows--linux-compatibility

                  I was able to compile 0install to bytecode on Linux, and then run that bytecode on both Windows and Linux.

                  [...]
                  > I don't think it is possible to produce universally runnable bytecode
                  > executables without shipping the version of ocamlrun together with the
                  > bytecode. It is just not made for this use case - don't mix this up with
                  > strictly controlled environments like jvm.

                  OK, this is what I'll do to start with. OCaml changes slowly enough that we can probably stick with a single version for a few years.

                  Hopefully some kind of dynamic linker can be added later to allow a bit more flexibility. I guess changes to e.g. the hash algorithm wouldn't affect other code, since the interface hasn't changed, unless ocamlc saves pre-built hash tables in the binaries or something. But isn't it designed to allow implementation changes, as long as the interface stays the same?

                  Thanks!