Re: "Stackless" implementation methods
--- In firstname.lastname@example.org, Kyle Hayes <kyle@s...> wrote:
> On Sunday 28 March 2004 13:48, Chuck Adams wrote:
> > Jason Evans wrote:
> > > So, stackless implementations are almost always slower as a
> > > result of this impedance mismatch, right?
> > Slower in some ways perhaps, stack slicing always creates some
> > overhead, and heap allocation of frames just busts cache all
> > around. On the other hand, it's faster to switch tasklets in
> > stackless than it is to call a method, so it really depends on
> > what part you want to call "slower".
> I think Chuck answered this well. It is not that it is necessarily
> slower, but that things are different. Most of the implementations
> I've seen are definitely slower. But, those tend to be in
> interpreted languages anyway so the difference in speed on this
> might be overshadowed by other factors.
> One thing to think about with the cache problem is that a
> generational collector will tend to put the most accessed objects
> into a relatively small area. So, with modern caches now regularly
> around 1MB for L2, I don't see this as the problem it once was. Of
> course, the extremely simple access patterns of a stack make
> excellent caching easy.
> As with everything, a sufficient supply of elbow grease and
> inspiration will make even stackless systems fast.
> P.S. does stackless Python really munge the C stack? Weird. That
> sounds dangerous.
My 'squirrel' is stackless. I switched from 'stackful' to 'stackless'
while I was prototyping. I personally haven't noticed any performance
hit (and I'm very careful about that); the only thing is that I
gained a lot of flexibility.
My approach consists of having a single stack per coroutine (I call
coroutines cooperative threads) where I store the call frames,
variables and virtual registers (Squirrel is register-based).
Squirrel has both generators and coroutines.
Generators can only yield from a single call (no nested calls), so I
preallocate a buffer as big as their stack frame. When they yield
I just memcpy the frame from the stack to this buffer and memset the
frame in the main stack to 0. No allocation occurs during execution,
and usually a call frame is about 5 to 10 stack positions, so the
copy is small (and fast).
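To make the generator scheme above concrete, here is a minimal sketch in
C. It is not taken from Squirrel's source: the names VMStack, Generator,
gen_suspend and gen_resume, the Slot type and the fixed sizes are all
made up for illustration, and a real VM would store tagged values rather
than plain longs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SLOTS   256   /* hypothetical size of the VM value stack */
#define GEN_FRAME_MAX 16    /* hypothetical upper bound on a frame     */

typedef long Slot;          /* stand-in for a VM value or register */

typedef struct {
    Slot   slots[STACK_SLOTS];
    size_t top;             /* index one past the highest live slot */
} VMStack;

typedef struct {
    Slot   saved[GEN_FRAME_MAX]; /* buffer preallocated with the generator */
    size_t frame_size;           /* slots used by the generator's frame    */
    int    suspended;
} Generator;

/* Yield: copy the topmost frame into the generator's buffer, then
 * zero the vacated slots on the main stack. No allocation happens. */
void gen_suspend(VMStack *st, Generator *g)
{
    assert(g->frame_size <= GEN_FRAME_MAX && st->top >= g->frame_size);
    Slot *frame = &st->slots[st->top - g->frame_size];
    memcpy(g->saved, frame, g->frame_size * sizeof(Slot));
    memset(frame, 0, g->frame_size * sizeof(Slot));
    st->top -= g->frame_size;
    g->suspended = 1;
}

/* Resume: copy the saved frame back on top of whatever the stack
 * holds now, so the generator continues where it left off. */
void gen_resume(VMStack *st, Generator *g)
{
    assert(g->suspended && st->top + g->frame_size <= STACK_SLOTS);
    memcpy(&st->slots[st->top], g->saved, g->frame_size * sizeof(Slot));
    st->top += g->frame_size;
    g->suspended = 0;
}
```

Because the frame is only a handful of slots, both operations are a
short memcpy, which fits the claim that the copy is cheap.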
For coroutines I allocate a new stack that grows independently from
the main one, but this was an implementation choice (because it came
for free, 10 lines of additional code); I could use the same approach
that I use for generators.
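The coroutine variant can be sketched the same way. Again this is only
an illustration under my own assumptions, not Squirrel's code: Stack,
VM, stack_push and vm_switch are hypothetical names. Each coroutine owns
a heap-allocated stack that grows on demand, and switching is just a
matter of pointing the interpreter at a different stack.

```c
#include <assert.h>
#include <stdlib.h>

typedef long Slot;          /* stand-in for a VM value or register */

typedef struct {
    Slot  *slots;           /* heap storage, grown on demand */
    size_t top;
    size_t capacity;
} Stack;

typedef struct {
    Stack *current;         /* stack the interpreter loop runs on */
} VM;

void stack_init(Stack *s)
{
    s->slots = NULL;
    s->top = 0;
    s->capacity = 0;
}

/* Push a value, doubling the storage when full. Returns 0 on
 * allocation failure, 1 otherwise. Other stacks are untouched. */
int stack_push(Stack *s, Slot v)
{
    if (s->top == s->capacity) {
        size_t ncap = s->capacity ? s->capacity * 2 : 8;
        Slot *p = realloc(s->slots, ncap * sizeof(Slot));
        if (p == NULL)
            return 0;
        s->slots = p;
        s->capacity = ncap;
    }
    s->slots[s->top++] = v;
    return 1;
}

/* Switch to another coroutine's stack. Returns the previously
 * active stack so the caller can switch back later. */
Stack *vm_switch(VM *vm, Stack *target)
{
    assert(target != NULL);
    Stack *prev = vm->current;
    vm->current = target;
    return prev;
}

void stack_free(Stack *s)
{
    free(s->slots);
    stack_init(s);
}
```

Nothing is copied on a switch here, at the price of one extra
allocation per coroutine; the generator-style copy avoids the
allocation but limits what can be suspended to a single frame.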
My advice is to go stackless if you can; there isn't any particular
problem compared to the C-stack-based approach, and it opens a lot of
Steve: I took a look at Io's coroutines and I'm having a hard time
understanding what is going on.
Could you explain to me what all the setjmp() hacking is about? I'm
The programming language Squirrel
- This doesn't bear directly on what you are asking, but I think it's
an interesting aside to note that the language ToonTalk does not
have any built-in stack. There is no native subroutine call
facility. On the other hand, it has built-in queues.