--- In langsmiths@yahoogroups.com, Marcin 'Qrczak' Kowalczyk
<qrczak@k...> wrote:
> In a message of Sun, 01-08-2004, 02:56 +0000, cr88192 wrote:
>
> > I would probably just need to call rand though (or maybe a few
> > times so I have a little wider range of bits).
>
> I use Mersenne Twister
> <http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html>
> This is what Python and Ruby use.
>
well, then you have good random numbers (actually, it is similar if you
use linux...).
rand on cygwin is crappy, but it is close enough to random (err, umm,
even if it does quite often generate the same sequence of numbers...).

> > don't have lazy lists.
> > I have considered them, but a strong reason to implement them is
> > still absent.
>
> A lazy list allows processing a long list incrementally as it is
> produced, without keeping the whole list in memory at once.
>
this is what they are, but a reason would be why one might want to
implement them.
though they seem cool, I don't know of any real use for them...
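as a concrete sketch of the incremental-processing point above: a lazy list in C can be a generator that produces one element at a time on demand, so even a very long sequence is consumed without ever being stored whole. this is just an illustration, not code from the thread; `count_up` and `sum_first` are made-up names.

```c
#include <assert.h>

/* A lazy list as a generator: next() produces one element at a time from
 * some state, so a long sequence can be consumed incrementally without
 * ever being stored whole in memory. */
struct lazy_list {
    int state;                        /* generator state: the next value */
    int (*next)(struct lazy_list *);  /* produce one element on demand */
};

static int count_up(struct lazy_list *l)
{
    return l->state++;                /* each call yields the next natural */
}

/* Consume the first n elements as they are produced. */
static long sum_first(struct lazy_list *l, int n)
{
    long s = 0;
    for (int i = 0; i < n; i++)
        s += l->next(l);
    return s;
}
```

e.g. summing the first million naturals this way touches one element at a time instead of building a million-element array first.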

> > was also writing stuff for bignums (and finally now understand
> > division...).
>
> I use GMP <http://www.swox.com/gmp/>. AFAIK it's quite fast compared to
> others. It uses some non-obvious algorithms when appropriate, e.g.
> Karatsuba multiplication and FFT for multiplying really huge numbers,
> various tricks for division etc. (all documented). It uses specialized
> assembly for various processors for critical operations.
>
yes.
mine is probably slow, oh well...
mine uses pretty basic algos for the most part.

> > someone I know was claiming that the 'short' type was non-standard.
>
> It is standard in all versions of C and C++ in the last 20 years or so.
>
yes, it seems so.
I don't know why that dude was claiming otherwise...
yes, the size is variable, but so is the size of everything else, it
just means a little tweaking will be needed for an arch where
sizeof(short)!=2, but afaik on most major architectures anymore
sizeof(short)=2...

> > on a 32 bit system anyways, short is 16 bit, int is 32 bit, and long
> > is still 32 bit. I can't use int for bignums, since I could not
> > preserve overflow.
>
> I'm not sure if I understand.
>
assuming I have 2 values I want to multiply, for example:

int a, b, c;
...
c=a*b;

a*b may be larger than what can be represented in c, in which
case it will be cut off (this is called overflow).
to implement bignums, I have to preserve any bits that overflow,
which means either:
a and b have to be in the range of -32768..32767 to make sure an
overflow is not possible;
a, b, and c need to be long longs, with a and b in the range
-2147483648..2147483647;
or I have to write the function in assembler;
...
I opted for using shorts for the individual parts of the number, and
using ints for the intermediate values.
the bignums are represented internally as length-prefixed arrays of
unsigned shorts in low-high order, which seemed most convenient (I
can append or crop off high-order bits as needed and such).
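the short-limbs-with-int-intermediates scheme described above can be sketched like this (a hypothetical helper, not the actual code being discussed): each 16x16 product is formed in a 32-bit int, the low 16 bits stay in the limb, and the overflow bits become the carry into the next limb.

```c
#include <assert.h>

/* Multiply a bignum (n limbs of unsigned short, low-to-high order, as in
 * the representation described above) by a small non-negative int m,
 * using a wider int intermediate so the overflow bits become the carry.
 * m must fit in 16 bits so the 32-bit intermediate cannot overflow. */
static unsigned bignum_mul_small(unsigned short *limbs, int n, unsigned m)
{
    unsigned carry = 0;
    for (int i = 0; i < n; i++) {
        unsigned t = (unsigned)limbs[i] * m + carry; /* 16x16+16 fits in 32 */
        limbs[i] = (unsigned short)(t & 0xFFFF);     /* low bits stay here */
        carry = t >> 16;                             /* overflow carried up */
    }
    return carry;   /* nonzero means the result needs one more limb */
}
```

the returned carry is exactly the "preserved overflow" from the discussion: if it is nonzero, the length-prefixed array grows by one limb.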

> GMP uses the same internal representation for all numbers, with a
> separately allocated array of "limbs" (digits in some large base,
> usually a limb is a C long). It's thus more efficient to use a separate
> representation for small numbers (I use 31 or 64 bit ints tagged in the
> least significant bit). GMP provides various functions working on big
> ints mixed with small ints.
>
yes.

> In this setting you have to detect overflow for various operations to
> know when to switch to big ints if arguments are small ints but the
> result doesn't fit. It's also wise to check whether some resulting
> big int can be represented in a small int, so integers have unique
> representations. Of course GMP takes care of overflows during big int
> computation (carry propagation).
>
ok.

> The detection of overflow of small ints might be conservative: you don't
> have to detect all cases, because you always have a slower option to
> perform the computation on big ints and check whether it fits in a small
> int afterwards.
>
yes.

> The method of overflow checking I use:
> - For addition: if ((~(x ^ y) & (x_plus_y ^ x)) >= 0) then there was
> no overflow.
> - For subtraction: if (((x ^ y) & (x_minus_y ^ x)) >= 0) then there was
> no overflow.
> In both of the above it would be a bit more efficient to use inline
> assembly, because C doesn't provide direct access to add-with-carry.
> I didn't bother.
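the two sign-bit formulas above can be packaged like this (a sketch; note the sum/difference is formed in unsigned arithmetic first, since signed overflow itself is undefined behavior in C, then the sign-bit test from the text is applied):

```c
#include <assert.h>
#include <stdint.h>

/* Addition overflows iff x and y have the same sign and the sum's sign
 * differs from x's; the AND of the two sign tests captures exactly that. */
static int add_overflows(int32_t x, int32_t y)
{
    int32_t s = (int32_t)((uint32_t)x + (uint32_t)y); /* well-defined sum */
    return (~(x ^ y) & (s ^ x)) < 0;
}

/* Subtraction overflows iff x and y have different signs and the
 * difference's sign differs from x's. */
static int sub_overflows(int32_t x, int32_t y)
{
    int32_t d = (int32_t)((uint32_t)x - (uint32_t)y); /* well-defined diff */
    return ((x ^ y) & (d ^ x)) < 0;
}
```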

> - For multiplication: if an integer type of the size of twice of my
> small int is available (e.g. long long), I perform the multiplication
> in this type and check the range of the result (on x86 GCC is smart
> enough to use 32bit × 32bit -> 64bit multiplication). If it's not
> available, I check for arguments being in the safe range of using
> only half of bits, which is only a conservative approximation.
> OCaml counts bits and checks whether the sum of bit counts is small
> enough - this is more accurate but slower.
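the double-width variant of this check can be sketched as follows (`mul_fits_int32` is a made-up name; the caller falls back to bignums when it returns 0):

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 32-bit small ints in a 64-bit type and check whether the
 * product still fits in 32 bits, as described in the text. Returns 1 and
 * stores the product on success, 0 on overflow (caller switches to bignums). */
static int mul_fits_int32(int32_t x, int32_t y, int32_t *out)
{
    int64_t p = (int64_t)x * (int64_t)y;   /* 32x32 -> 64 bit multiply */
    if (p < INT32_MIN || p > INT32_MAX)
        return 0;                          /* overflow: result needs a bignum */
    *out = (int32_t)p;
    return 1;
}
```

on x86 a compiler can indeed turn this into a single widening multiply plus a range check, which is the point made above.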

> - For negation, integer division, absolute value, GCD - there are only
> edge cases dealing with the most negative small int.
> - Integer remainder, bitwise and, or, xor, not, bit shift right -
> don't overflow.
> - Bit shift left - if an integer type of the size of twice of my small
> int is available, and the shift is small enough, I use this type and
> check the range of the result. Otherwise I just use big ints.

> - Power - I use big ints and check the result afterwards even if both
> arguments are small ints (GMP provides a function for raising a
> C unsigned long to the power of a C unsigned long, with the result in
> a big int).

> You can see what other language implementations do, e.g. Glasgow Haskell
> Compiler, OCaml, or some Lisps/Schemes.
>
I have an idea already how to do it.
I was mostly talking about writing the code for working with
bignums...

> > however, I am unsure how exactly to approach printing a bignum in
> > base 10 efficiently.
>
> GMP provides a function for that (for any base 2..36).
>
yes.
I am just thinking recursive division could be slow, since, eg, I
will have to do a fair number of them to extract all the digits.
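for reference, the straightforward division approach on the short-limb representation looks like this (a hypothetical helper, not code from the thread): one long division by 10 across the limbs per digit, which is exactly the cost being worried about above (one pass over the whole number per digit, so quadratic overall).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract base-10 digits of a bignum stored as n low-to-high uint16_t limbs
 * (the representation described earlier) by repeated division by 10.
 * Destroys limbs; writes digits most-significant first into buf (which
 * must be large enough) and returns the digit count. */
static int bignum_to_decimal(uint16_t *limbs, int n, char *buf)
{
    char tmp[64];                            /* enough for this sketch */
    int len = 0;
    do {
        uint32_t rem = 0;
        for (int i = n - 1; i >= 0; i--) {   /* one long division by 10 */
            uint32_t cur = (rem << 16) | limbs[i];
            limbs[i] = (uint16_t)(cur / 10);
            rem = cur % 10;
        }
        tmp[len++] = (char)('0' + rem);      /* remainder = next digit */
        while (n > 0 && limbs[n - 1] == 0)   /* crop high zero limbs */
            n--;
    } while (n > 0);
    for (int i = 0; i < len; i++)            /* digits came out backwards */
        buf[i] = tmp[len - 1 - i];
    buf[len] = '\0';
    return len;
}
```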

> > (I am also endlessly running into the little issues related to use of
> > 2's complement math for everything, probably keeping everything
> > positive and using a sign bit would have been less of a hassle, oh
> > well...).
>
> It depends. Small ints are probably better kept in a native format,
> i.e. 2's complement. GMP uses sign-magnitude internally.
>
ok.

I used two's complement for the internal workings of the bignums.
this means that annoyingly often I have to check if the value is
negative, in which case I negate it, and change it back when done.
when padding out numbers or doing many operations, it is necessary to
check whether the pad value should be all 0 or all 1, which is a
hassle. similarly, there are lots of other sign-related issues...
with flags I would just have to twiddle them occasionally, or
occasionally vary the operations depending on the signs of the
inputs, but not much else (I could just assume everything were
positive).
oh well, too late now probably, it is only a minor hassle.
flags or other such features would probably also complicate my
length-prefixed array of shorts style design anyways...

In a message of Sun, 01-08-2004, 10:09 +0000, cr88192 wrote:

> yes, the size is variable, but so is the size of everything else, it
> just means a little tweaking will be needed for an arch where sizeof
> (short)!=2, but afaik on most major architectures anymore sizeof
> (short)=2...

There is a guarantee that short holds at least -32767..32767. Usually it
holds -32768..32767. There are exotic architectures where it has 32 bits
because they don't provide efficient arithmetic on smaller sizes.

> to implement bignums, I have to preserve any bits that overflow,
> which means either:
> a and b have to be in the range of -32768..32767 to make sure an
> overflow is not possible;
> a, b, and c need to be long longs, with a and b in the range
> -2147483648..2147483647;
> I have to write the function in assembler;

GMP usually uses limbs of the size of the native word, so it must find
a way to multiply them into a number composed of two parts of that size.
It does it either in assembler for a plethora of architectures it knows
about, or as a portable fallback by splitting each number into halves
and multiplying them separately.
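the split-into-halves fallback works like this (a sketch in the spirit of what is described, not GMP's actual code): four half-width partial products are recombined with explicit carry handling, producing the full double-width result without ever needing a wider multiply.

```c
#include <assert.h>
#include <stdint.h>

/* Portable 32x32 -> 64 bit unsigned multiply built only from 32-bit
 * operations on 16-bit halves. a*b = ll + (lh + hl)<<16 + hh<<32. */
static void mul32_via_halves(uint32_t a, uint32_t b,
                             uint32_t *hi, uint32_t *lo)
{
    uint32_t al = a & 0xFFFF, ah = a >> 16;   /* split into halves */
    uint32_t bl = b & 0xFFFF, bh = b >> 16;

    uint32_t ll = al * bl;                    /* four partial products, */
    uint32_t lh = al * bh;                    /* each 16x16 -> 32 bits  */
    uint32_t hl = ah * bl;
    uint32_t hh = ah * bh;

    /* Middle column: can't overflow 32 bits (three 16-bit terms). */
    uint32_t mid = (ll >> 16) + (lh & 0xFFFF) + (hl & 0xFFFF);

    *lo = (mid << 16) | (ll & 0xFFFF);
    *hi = hh + (lh >> 16) + (hl >> 16) + (mid >> 16);
}
```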

There is no need to reinvent this. There is a huge amount of people's
knowledge about mathematics and about various processors put into GMP,
lots of tricks and parameters tried and tested and tuned over years.
It is there to be used.

> > GMP provides a function for that (for any base 2..36).
>
> yes.
> I am just thinking recursive division could be slow, since, eg, I
> will have to do a fair number of them to extract all the digits.

http://www.swox.com/gmp/manual/Binary-to-Radix.html
Really, there is no need to reinvent it, just use it.

--
__("< Marcin Kowalczyk
\__/ qrczak@...
^^ http://qrnik.knm.org.pl/~qrczak/

--- In langsmiths@yahoogroups.com, Marcin 'Qrczak' Kowalczyk
<qrczak@k...> wrote:
> In a message of Sun, 01-08-2004, 10:09 +0000, cr88192 wrote:

> > yes, the size is variable, but so is the size of everything else, it
> > just means a little tweaking will be needed for an arch where sizeof
> > (short)!=2, but afaik on most major architectures anymore sizeof
> > (short)=2...
>
> There is a guarantee that short holds at least -32767..32767. Usually it
> holds -32768..32767. There are exotic architectures where it has 32 bits
> because they don't provide efficient arithmetic on smaller sizes.

ok.

>

> > to implement bignums, I have to preserve any bits that overflow,
> > which means either:
> > a and b have to be in the range of -32768..32767 to make sure an
> > overflow is not possible;
> > a, b, and c need to be long longs, with a and b in the range
> > -2147483648..2147483647;
> > I have to write the function in assembler;
>
> GMP usually uses limbs of the size of the native word, so it must find
> a way to multiply them into a number composed of two parts of that size.
> It does it either in assembler for a plethora of architectures it knows
> about, or as a portable fallback by splitting each number into halves
> and multiplying them separately.
>
yes, either is possible.

actually, given the fact that x86 is little endian, I can also treat
pairs of shorts as ints as well, though there would be issues here
(possibility of an odd length count, or that in most cases they will
not be properly aligned, but this could be fixed easily enough...).
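the alignment issue mentioned above has a simple portable workaround: read the pair through memcpy rather than a pointer cast, which is legal at any alignment and avoids aliasing trouble. a sketch, assuming a little-endian machine as stated:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* On a little-endian machine, two adjacent low-to-high uint16_t limbs form
 * one uint32_t value. memcpy avoids the alignment and strict-aliasing
 * problems of casting the limb pointer directly. (Little-endian only.) */
static uint32_t pair_as_u32(const uint16_t *limbs)
{
    uint32_t v;
    memcpy(&v, limbs, sizeof v);   /* safe even if limbs is unaligned */
    return v;
}
```

the odd-length-count case still needs handling separately (the last lone limb can't be paired).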

> There is no need to reinvent this. There is a huge amount of people's
> knowledge about mathematics and about various processors put in GMP,
> lots of tricks and parameters tried and tested and tuned over years.
> It is there to use it.
>
well, I am still one of those "write everything myself" type people.

> > > GMP provides a function for that (for any base 2..36).
> >
> > yes.
> > I am just thinking recursive division could be slow, since, eg, I
> > will have to do a fair number of them to extract all the digits.
>
> http://www.swox.com/gmp/manual/Binary-to-Radix.html
> Really, there is no need to reinvent it, just use it.
>
ok.
the idea of multiplying to extract digits is interesting. I would
need to figure out how to approach this.
probably what could work would be a fixed length/position
multiply-by-integer type function which extracts any overflow values.
this could be ok...
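a toy version of that multiply-and-take-the-overflow idea, for a 32-bit binary fraction (this is only an illustration of the principle, not GMP's binary-to-radix algorithm): multiplying the fraction by 10 pushes the next decimal digit into the bits above bit 32, exactly the "extracted overflow values" described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Treat frac as the binary fraction frac/2^32 and emit its first n decimal
 * digits: each multiply by 10 makes the overflow bits (above bit 32) the
 * next digit, and the low 32 bits remain the fractional part. */
static void frac_digits(uint32_t frac, int n, char *buf)
{
    uint64_t f = frac;
    for (int i = 0; i < n; i++) {
        f *= 10;                           /* overflow bits = next digit */
        buf[i] = (char)('0' + (f >> 32));
        f &= 0xFFFFFFFFu;                  /* keep the fractional part */
    }
    buf[n] = '\0';
}
```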

all for now.

In a message of Sun, 01-08-2004, 12:55 +0000, cr88192 wrote:

> well, I am still one of those "write everything myself" type people.

I know - that's why I keep mentioning things which are already done,

and done well, ready to be used :-)

Why do you write everything yourself?

I try to reinvent only things which can't be easily reused. That is,

when either the design space is large and nobody did it the way I wanted

it to be done (e.g. the language itself), or when external code for

a particular purpose is hard to integrate for technical reasons (e.g.

garbage collector), or to avoid dependency on a heavy and unpopular

external library if a lightweight solution is wanted (e.g. Unicode

stuff).

For generating executables I rely on GCC. This solves a huge portability

problem.

When I don't reuse code, I reuse ideas and algorithms. This is what

open source is about. This is actually more common:

- borrowed a nice implementation of merge sort from OCaml;

- borrowed a representation of hash tables from Python;

- borrowed the basic algorithm of garbage collection from Qish;

- translated Parsec parser combinator library from Haskell;

- googled for an implementation of rationalize for Scheme and

translated it to Kogut;

- for some integer overflow handling code I looked at GHC and OCaml.

--

__("< Marcin Kowalczyk

\__/ qrczak@...

^^ http://qrnik.knm.org.pl/~qrczak/

--- In langsmiths@yahoogroups.com, Marcin 'Qrczak' Kowalczyk
<qrczak@k...> wrote:
> In a message of Sun, 01-08-2004, 12:55 +0000, cr88192 wrote:

> > well, I am still one of those "write everything myself" type people.
>
> I know - that's why I keep mentioning things which are already done,
> and done well, ready to be used :-)
>
> Why do you write everything yourself?
>
usually because this is how I typically approach things.
I don't like lib dependencies, and I am not that fond of using other
people's code. I am not sure why exactly, but this is typically how I
operate...

> I try to reinvent only things which can't be easily reused. That is,
> when either the design space is large and nobody did it the way I wanted
> it to be done (e.g. the language itself), or when external code for
> a particular purpose is hard to integrate for technical reasons (e.g.
> garbage collector), or to avoid dependency on a heavy and unpopular
> external library if a lightweight solution is wanted (e.g. Unicode
> stuff).
>
ok.

> For generating executables I rely on GCC. This solves a huge portability
> problem.
>
yes, similar here actually.
I don't write the compiler since it already exists and is a separate
entity.

> When I don't reuse code, I reuse ideas and algorithms. This is what
> open source is about. This is actually more common:
> - borrowed a nice implementation of merge sort from OCaml;
> - borrowed a representation of hash tables from Python;
> - borrowed the basic algorithm of garbage collection from Qish;
> - translated Parsec parser combinator library from Haskell;
> - googled for an implementation of rationalize for Scheme and
> translated it to Kogut;
> - for some integer overflow handling code I looked at GHC and OCaml.
>
ok.
usually I find it a hassle to look up info, so I come up with crap by
myself...

eg: once I knew what hash tables were, they were trivial to implement
(previously I had been confused, and implemented binary trees).
similar for many other things, a vague description is enough to give
one an understanding of the algo in most cases (except maybe dct,
which I still don't get, but this may be because of my weak math
skills...).
but really, what does a few hundred or a few k lines cost to write?...
not that much really, and one may have gained some understanding of
the problem domain and have the knowledge that it was achieved by
one's own work (instead of ripping off someone else).
hell, I need something to do, given I am somewhat uncreative.

hmm, ever notice how often bugs don't survive almost drowning in

coffee? usually they might sort of survive for a little while, then

start twitching out and die. or they will look like they might

survive, then jump in the air and be dead by the time they land...

insects probably get a similar caffeine rush to humans, but much
stronger since they almost drown in it...

I once drank about 1 liter of coffee at once (maybe more, I don't

remember), and then passed out. maybe that is sort of like what the

bugs experience.

it is hard to describe the effect really (everything turns into noise

with some general images forming and then disintegrating, so many

things happen at once but so chaotic that it looks like tv static).

when I fully regained consciousness I felt like crap (ultra-headache),

and I never did that again (actually, I drink much less coffee now

than I did around then, this was about 97 or 98).

hmm, didn't puke then though. I remember a while before that when I

had puked from eating a whole bunch of coffee grounds, don't remember

why though... (this was about 97).

actually, around then I also learned why not to combine soda and

wine, err, and why not to do a lot of things food or drink related...

or something...