Re: How fast is your GCD code? Here's mine...
- From: WarrenS
> Yes, as per Brennan's bug fix, theI think Bob S posted his highly tuned code to sci.math, or just possibly the Mersenne Forum, about 5 years ago. It had special cases for removing small multiples of the smaller number, which looked like it would annoy branch prediction units, which in theory he should have been taking into account. (I was on an Alpha in those days, and branch prediction was everything, so I was more hypersensitive to such bubbles. However, perhaps the code was good because the ifs were actually quite predictable; you'd need to try it out on a modern machine.)
> ctz's should be ctzll's.
> That also speeds it up to 139 nanosec.
- From: Jack Brennen <jfb@...>
> Apologies once again, but I don'tA noble aim. So why did you overlook this:
> want to leave broken code out there...
> >> int64 d;Which is classic UB.
> >> d&= d>>63; //where 63+1=wordsize of uint64s
- From: WarrenS <warren.wds@...>
> > > b -= d+d+a;It's a sparse enough inner loop that I can easily imagine the increased dependency makes it slower. What's the latency of an add nowadays? I know it's crept up to about 6 in the past decade (at least on the SIMD units). Something like that's a huge bubble, and should definitely be avoided.
> > > a += d; //the obvious "optimization" of this
> & previous line... makes it slower!
> > > (...)
> > I can't imagine that
> > a += d ; b -= d+a
> > would be slower.
> --it is slower! On my computer, anyhow.