Re: non-traditional use of threads

  • Mike Austin
    Message 1 of 7, Dec 1, 2003
      What if the runtime system profiled the program to see which
      functions could be made asynchronous?

      For example,

      x = longCalculation()
      y = shortCalculation()

      If the average running time of longCalculation() were greater than
      that of the other calls, the runtime would start making those calls
      asynchronous. The order of operations in the source does matter --
      if you call your time-consuming functions first, there is a better
      chance of parallelism. Or at least I would think so.

      -- Mike
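
      A minimal sketch of this profiling-driven dispatch, in Python. The
      maybe_async decorator, the 50 ms threshold and the example bodies
      are made up for illustration; a real runtime would also have to
      check that deferring a call is actually safe.

      import time
      from concurrent.futures import ThreadPoolExecutor, Future

      _pool = ThreadPoolExecutor()
      _avg_runtime = {}          # running average per function, filled by profiling
      ASYNC_THRESHOLD = 0.05     # hypothetical cutoff: 50 ms average -> go async

      def maybe_async(fn):
          # Profile fn; once its average time exceeds the threshold,
          # return a Future from a worker thread instead of blocking.
          def wrapper(*args, **kwargs):
              if _avg_runtime.get(fn, 0.0) > ASYNC_THRESHOLD:
                  return _pool.submit(fn, *args, **kwargs)   # asynchronous call
              start = time.perf_counter()
              result = fn(*args, **kwargs)                   # synchronous, timed call
              elapsed = time.perf_counter() - start
              prev = _avg_runtime.get(fn, elapsed)
              _avg_runtime[fn] = 0.9 * prev + 0.1 * elapsed  # exponential moving average
              done = Future()
              done.set_result(result)        # wrap so callers always see a future
              return done
          return wrapper

      @maybe_async
      def longCalculation():
          time.sleep(0.2)
          return 42

      @maybe_async
      def shortCalculation():
          return 7

      x = longCalculation()            # may run on a worker once profiled as "long"
      y = shortCalculation()
      print(x.result(), y.result())    # values are forced only where they are needed

      Calling the long function first, as suggested, gives the worker
      thread the most time to overlap with the rest of the caller's work.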

      --- In langsmiths@yahoogroups.com, Kyle Hayes <kyle@s...> wrote:
      >
      > <delurk>
      >
      > Intel did some interesting research using HT machines to speed up
      > single-threaded code. The second thread 'pipe' is used to do things
      > like localized prefetching, cache heating, precalculation etc.
      > Apparently they got some fairly good results from it. I am
      > frantically trying to find the papers now on CiteSeer without
      > success. Anyone have a link?
      >
      > While not the "typical" use of threading, this kind of stuff was
      > surprisingly good for single-threaded apps. As most processors seem
      > to be heading in the HT or multi-core direction, this might make
      > more and more sense in a language.
      >
      > Even in languages that have great parallel support (Occam for
      > instance), many people find it hard to program in such a way as to
      > efficiently use multiple parallel resources. For some classes of
      > problem, parallel programs are fairly easy to do and very efficient
      > (many numeric array processing problems fall into this category).
      > For others, it is a lot harder. I have yet to see any parallel
      > paradigms that seem to be easy (read no-thought-required) to use
      > and program for non-trivial cases of parallelism. Some OO stuff
      > with implicit async messages and futures _might_ be one way.
      >
      > For some interesting, if somewhat dated, notes on this, look at the
      > TimeWarp OS from JPL. It was a system for "extracting concurrency
      > from parallel simulations". Fully buzzword compliant. However, one
      > of the most fascinating things about it was the "eager" parallelism.
      > Different processors would simply run ahead with their parts of the
      > simulation as fast as possible and then roll back when they
      > received messages in the "past". A language that did this could
      > make parallel code much easier to write and possibly more
      > efficient. We did see super-linear speed-ups more often than we
      > thought we would in TimeWarp. (I was a college student and did some
      > work on simulations designed to test TimeWarp back in the 1980's.)
      >
      > Best,
      > Kyle
      >
      > </delurk>
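
      A toy, single-process illustration of the "eager" execution and
      rollback described in the quote above, in Python. It assumes nothing
      about the real TimeWarp OS beyond what is said there: events are
      executed optimistically, state is snapshotted, and a message that
      arrives in the local past forces a rollback and re-execution.
      Anti-messages, GVT and multiple processes are deliberately left out.

      import copy

      class OptimisticProcess:
          # Run events eagerly; if a straggler arrives with a timestamp
          # earlier than local virtual time, restore an older snapshot
          # and re-execute.

          def __init__(self):
              self.state = {"total": 0}
              self.lvt = 0                  # local virtual time
              self.snapshots = [(0, copy.deepcopy(self.state))]
              self.pending = []             # events not yet executed
              self.processed = []           # executed events, kept for re-execution

          def handle(self, ts, value):
              if ts < self.lvt:             # message from the "past": roll back
                  self._rollback(ts)
              self.pending.append((ts, value))
              self.pending.sort()
              self._run_ahead()

          def _run_ahead(self):
              while self.pending:
                  ts, value = self.pending.pop(0)
                  self.state["total"] += value          # the actual simulation step
                  self.lvt = ts
                  self.processed.append((ts, value))
                  self.snapshots.append((ts, copy.deepcopy(self.state)))

          def _rollback(self, ts):
              # restore the newest snapshot strictly older than ts
              while len(self.snapshots) > 1 and self.snapshots[-1][0] >= ts:
                  self.snapshots.pop()
              self.lvt, state = self.snapshots[-1]
              self.state = copy.deepcopy(state)
              # everything executed at or after ts must be redone
              redo = [e for e in self.processed if e[0] >= ts]
              self.processed = [e for e in self.processed if e[0] < ts]
              self.pending.extend(redo)

      p = OptimisticProcess()
      p.handle(10, 1)
      p.handle(20, 2)        # runs ahead optimistically
      p.handle(15, 5)        # straggler: rolls back past t=15, then re-runs
      print(p.lvt, p.state)  # 20 {'total': 8}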
    • Kyle Hayes
      Message 2 of 7, Dec 1, 2003
        On Monday 01 December 2003 11:22, Mike Austin wrote:
        > What if the runtime system profiled the program to see which
        > functions could be made asynchronous?
        >
        > For example,
        >
        > x = longCalculation()
        > y = shortCalculation()
        >
        > If the average running time of longCalculation() were greater than
        > that of the other calls, the runtime would start making those calls
        > asynchronous. The order of operations in the source does matter --
        > if you call your time-consuming functions first, there is a better
        > chance of parallelism. Or at least I would think so.

        The Intel stuff tended to be lower level than that. Cache prefetching, some
        address precalc and that sort of thing is all I remember. I wish I could
        find the links...

        They also did some conditional unfolding (executing both paths of an if for
        instance) I think.

        The idea was to see if the basic concepts behind predicated execution a la ARM
        or IA64 would work when applied to systems with HT. They found that it did
        and worked rather well. Instead of one or two instructions, a small number
        of instructions could be executed "out of band" via the other virtual
        processor. Since these instructions tended to be located near the ones
        currently executing on the main virtual processor, the cache effects were
        quite minimal (HT shares most of the resources of the processor except
        registers).

        This isn't a language specific thing, but if a language made this easy to do
        under the hood I think it would be quite interesting to try.

        Best,
        Kyle
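
        A software-level caricature of the conditional-unfolding idea above,
        in Python, assuming both branches are side-effect free. While the
        (slow) condition is still being computed, both arms run speculatively
        on worker threads and the loser is simply discarded. The function
        names and timings are invented; the Intel work happened at the
        instruction level inside one physical core, not in a thread pool.

        import time
        from concurrent.futures import ThreadPoolExecutor

        pool = ThreadPoolExecutor(max_workers=2)

        def slow_condition():
            time.sleep(0.1)              # stands in for a long-latency test
            return True

        def then_branch():
            return sum(range(1000))

        def else_branch():
            return -1

        def speculative_if(cond_fn, then_fn, else_fn):
            # Evaluate both arms while the condition is in flight; keep the
            # winner. Only safe if neither arm has side effects.
            then_f = pool.submit(then_fn)
            else_f = pool.submit(else_fn)
            cond = cond_fn()             # the "main" thread works on the condition
            return then_f.result() if cond else else_f.result()

        print(speculative_if(slow_condition, then_branch, else_branch))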