
Re: "ocaml_beginners"::[] Ugly floats in Ocaml

  • Brian Hurt
    Message 1 of 8 , Jan 9, 2006
      On Mon, 9 Jan 2006, roparzhhemon wrote:

      >
      >
      > Hello all,
      >
      > my problem can be summed up in one line :
      >
      > #10.87+.0.01;;
      > - : float = 10.879999999999999
      >
      > Of course I can redefine my own float type and operations,
      > but isn't it a pity that Caml behaves like that ?

      Short answer: No.

      Long answer: here's the problem. Floating point is of finite precision.
      For any base, *any* base, there are simple fractions which can not be
      represented exactly in a finite number of digits. Base 10, for example,
      can't represent 1/3rd exactly without an infinite number of 3's. Ocaml
      floats happen to be IEEE-754 double precision floating point numbers,
      which work in binary. Which means that in addition to 1/3rd, they also
      can't exactly represent 1/5 or 1/10 without an infinite number of bits.

      Which means that with any finite amount of precision, you will simply hold
      an approximation. Instead of 1/3, you get 0.3333. Of course, 1/3 * 3 =
      1, but 0.3333 * 3 = 0.9999. Welcome to numeric roundoff. Doesn't matter
      what you do, with finite precision, you will hit numeric round off.
      Decimal vs. binary floats: both hit round off. Say you have rational
      arithmetic, where you represent the number a/b as the pair of two
      integers (a, b). Then someone asks you to normalize the vector
      [1, 1, 0], which is simply the vector [1/sqrt(2), 1/sqrt(2), 0]. Except
      there are no two finite integers a and b such that a/b = sqrt(2). And
      so you get numeric roundoff.

      So we have to deal with numeric roundoff no matter what we do. We can't
      escape from the problem. So why not use an arithmetic that the computer
      can do fast, and gives us as many digits as possible? This is why the
      hardware manufacturers went with binary- as did C, Ocaml, Java, C++, Perl,
      Python, Ruby, Fortran, and pretty much every other language except Cobol.
      64 bit floats give us 15 decimal digits of precision (actually a little
      more)- enough digits to measure from here to the moon in micrometers. And
      that isn't an exaggeration. The moon is a little bit less than 400,000
      kilometers from the earth. That's six digits of precision to measure the
      distance in kilometers. Nine digits gets you meters. 12 digits gets you
      millimeters. 15 digits, micrometers. The distance from here to the moon,
      with no round off, in micrometers.
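
      If you want to put a number on that yourself, a one-liner will do (a
      sketch using the standard library's epsilon_float, the gap between 1.0
      and the next representable float):

      let () =
        Printf.printf "epsilon_float = %g (about %.1f decimal digits)\n"
          epsilon_float (0. -. log10 epsilon_float)

      That prints roughly 2.2e-16, a hair under 16 decimal digits.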

      Surely, that's enough precision, isn't it? I mean, for any real life use.
      So long as we don't lose too many bits of precision. Because the problem
      is that the errors can accumulate. There are, broadly speaking, two
      different sorts of numeric algorithms (the situation isn't precisely this
      clear, but I'm simplifying here). The first type is the unstable
      algorithms. With these algorithms, errors accumulate. If you're a little
      high this step, next step you'll be even higher. The other type is stable
      algorithms. With these algorithms, errors cancel- a little high this
      round, a little low next round. I liken it to a marble and a bowl. Put
      a marble in a bowl, and move it away from the bottom of the bowl, and the
      marble will roll around the bottom of the bowl- but it won't leave the
      bowl. The position of the marble will always be reasonably near the bottom
      of the bowl. Now take the bowl and turn it upside down and put the marble
      on top. So long as it's exactly on the top of the bowl, balanced
      precariously, it'll stay there. But move it away from the top of the
      overturned bowl, and it'll start rolling away from the top of the bowl,
      accelerating away as it does.

      With an unstable algorithm, any finite number of bits is insufficient-
      because you lose them to roundoff error, sooner or later. I've seen
      algorithms that double the number of bits of roundoff error every
      iteration. Start with one bit wrong, and six steps later your entire
      64-bit floating point number is wrong. The newbie solution to this is to
      just add more bits- but remember the number of bits being eaten doubles
      every step. So a 128 bit floating point number is totally garbage at
      seven steps. A 1 Kbyte floating point number is wrong after 13 steps. A
      1 Mbyte number is wrong after 23 steps. Even if you went to 1 GByte
      numbers, they'd be wrong after 33 steps. Meanwhile, the stable algorithm
      just got done with a million iterations, and only has a couple of bits
      wrong.
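
      If you want to watch an unstable computation eat its bits, here's a toy
      of my own (a sketch, not anything from a library): psi = (sqrt 5 - 1)/2
      satisfies psi^(n+1) = psi^(n-1) - psi^n, so in exact arithmetic the
      recurrence below just reproduces the shrinking powers of psi. In
      floating point, the initial rounding error grows by a factor of about
      1.6 every step and eventually swamps the true answer.

      let psi = (sqrt 5. -. 1.) /. 2.

      let () =
        let a = ref 1. and b = ref psi in      (* psi^0 and psi^1 *)
        for n = 2 to 60 do
          let c = !a -. !b in                  (* psi^n = psi^(n-2) - psi^(n-1) *)
          a := !b; b := c;
          if n mod 10 = 0 then
            Printf.printf "n = %2d  recurrence = %.6e  direct = %.6e\n"
              n c (psi ** float_of_int n)
        done

      Somewhere around n = 40 the recurrence column stops agreeing with the
      direct one in any digit, and by n = 60 it is off by many orders of
      magnitude- even though the exact answer it is chasing only gets smaller.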

      Nor, I comment, does interval arithmetic help. The idea of interval
      arithmetic is that you represent each number as an interval a to b. Two
      numbers. I can then calculate the bounds of any operation. Say I'm
      calculating x - y, and I know that x is somewhere in the bounds of x1 to
      x2 (x1 < x2), and y is between y1 and y2. Then I know that x - y is
      somewhere between x1 - y2 and x2 - y1. And (assuming everything involved
      is positive) I know that x/y is somewhere between x1/y2 and x2/y1.
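
      A bare-bones interval type in Ocaml might look like this (a sketch of my
      own, not any particular library; the division rule, as above, assumes
      both intervals are strictly positive):

      type interval = { lo : float; hi : float }

      (* x - y: smallest when x is low and y is high, largest the other way *)
      let sub x y = { lo = x.lo -. y.hi; hi = x.hi -. y.lo }

      (* x / y, assuming 0 < lo <= hi for both intervals *)
      let div x y = { lo = x.lo /. y.hi; hi = x.hi /. y.lo }

      Notice that the width of sub x y is the sum of the widths of x and y-
      the bounds only ever get wider, which is exactly the problem described
      next.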

      A counter example of why this doesn't work is Newton's method. You
      remember Newton's method from Calculus, right? If you're trying to find a
      solution to f(x) = 0, you calculate x' = x - f(x)/f'(x). If x is at all
      close to the right answer, then x' is even closer to the right answer.
      You feed x' in as the next x, and you converge on the right answer. Were
      you to run interval arithmetic on this algorithm, you'd find that in each
      step, the bounds became larger- when, in reality, the answer you're
      getting is becoming more accurate. If you were to start with 1 correct
      bit in your initial guess, after one iteration you'd have two correct bits.
      After the second iteration, you'd have four correct bits, then 8, then 16.
      After six steps, you could quit because you have the right answer. Errors
      in one step tend to be canceled in the next step- Newton's method is very
      stable. Overshoot a little this step, next round you'll undershoot a
      little- but you're still converging and gaining correct bits.
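
      A quick sketch (my own toy, finding sqrt(2) as the root of
      f(x) = x^2 - 2) makes the convergence visible:

      let () =
        let f x = x *. x -. 2. and f' x = 2. *. x in
        let x = ref 1. in                      (* a rough initial guess *)
        for i = 1 to 6 do
          x := !x -. f !x /. f' !x;            (* x' = x - f(x)/f'(x) *)
          Printf.printf "step %d: x = %.17g  error = %.3e\n"
            i !x (abs_float (!x -. sqrt 2.))
        done

      The error roughly squares at each step; after five or six iterations it
      sits at the limit of the representation, even though every intermediate
      operation was rounded.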

      So interval arithmetic doesn't work- and neither do bigger floats, nor
      rational arithmetic, nor a couple of dozen other ideas that have been
      invented and tried out in the fifty-plus years we've been struggling
      with this (this isn't a new problem- one of the candidates for the first
      digital computer was built by Professor Atanasoff at Iowa State to solve
      partial differential equations, and had this problem). The solution is to
      use stable algorithms, and to be aware that floating point is only
      accurate to 15 digits or so.

      I bring up interval arithmetic because Sun has recently rediscovered it,
      at least according to their press releases. Bad news, Sun: interval
      arithmetic had been *re*discovered at least twice before I was born- it
      ain't new. And it doesn't work- didn't work then, doesn't work now.

      So what's a stable algorithm, and what isn't? Especially when
      conceptually small changes make big differences. Gaussian elimination-
      unstable. Gaussian elimination with partial pivoting- stable.

      Well, there are two different ways you can tell this. Way number one is
      to get a PhD in Mathematics specializing in numerical analysis, and analyze
      the algorithms yourself. Or you can do what I do, which is buy a book
      on numerical analysis and skip the sections where they actually derive
      the error bounds for the algorithms, except for the conclusions.
      They generally make the conclusions fairly obvious. "As we can see from
      the above text, the error bounds for Gaussian elimination without partial
      pivoting accumulate at an unacceptable rate, while the error bounds for
      Gaussian elimination with partial pivoting accumulate at an asymptotically
      slower rate." Right. Mental note to self: always partially pivot when
      doing Gaussian elimination.
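
      To make "partial pivoting" concrete, here's a rough sketch of my own (not
      a library routine) of solving a*x = b by elimination, where the pivot row
      for each column is the one with the largest entry in absolute value:

      (* a is an n x n float array array, b a float array; both are clobbered *)
      let solve a b =
        let n = Array.length b in
        let swap arr i j =
          let t = arr.(i) in arr.(i) <- arr.(j); arr.(j) <- t in
        for k = 0 to n - 1 do
          (* partial pivoting: pick the row with the largest |a.(r).(k)| *)
          let p = ref k in
          for r = k + 1 to n - 1 do
            if abs_float a.(r).(k) > abs_float a.(!p).(k) then p := r
          done;
          swap a k !p; swap b k !p;
          (* eliminate column k below the pivot *)
          for r = k + 1 to n - 1 do
            let m = a.(r).(k) /. a.(k).(k) in
            for c = k to n - 1 do a.(r).(c) <- a.(r).(c) -. m *. a.(k).(c) done;
            b.(r) <- b.(r) -. m *. b.(k)
          done
        done;
        (* back substitution *)
        let x = Array.make n 0. in
        for i = n - 1 downto 0 do
          let s = ref b.(i) in
          for j = i + 1 to n - 1 do s := !s -. a.(i).(j) *. x.(j) done;
          x.(i) <- !s /. a.(i).(i)
        done;
        x

      Delete the pivot-selection loop and you have plain Gaussian elimination,
      which happily divides by whatever tiny value sits on the diagonal- and
      those huge multipliers are where the instability comes from.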

      I hope this helps.

      Brian
    • Mike Meyer
      Message 2 of 8 , Jan 9, 2006
        In <Pine.LNX.4.63.0601091031200.10377@...>, Brian Hurt <bhurt@...> typed:
        > > #10.87+.0.01;;
        > > - : float = 10.879999999999999
        >
        > Long answer: here's the problem. Floating point is of finite precision.
        > For any base, *any* base, there are simple fractions which can not be
        > represented exactly in a finite number of digits. Base 10, for example,
        > can't represent 1/3rd exactly without an infinite number of 3's. Ocaml
        > floats happen to be IEEE-754 double precision floating point numbers,
        > which work in binary. Which means that in addition to 1/3rd, they also
        > can't exactly represent 1/5 or 1/10 without an infinite number of bits.

        Ok, what's up with:

        # 0.1 ;;
        - : float = 0.1

        I know you can't represent .1 exactly in a float. Why does it round
        the value to one decimal place here?

        And:

        # 10.87 +. 0.01 ;;
        - : float = 10.879999999999999
        # 0.87 +. 0.01 ;;
        - : float = 0.88
        #

        Surely if the binary representation of .88 terminates, the
        representation of 10.88 terminates?

        Or, if you prefer, what heuristic does whatever function prints values
        in the REPL loop use to decide when to print a value "exactly", and
        when to print a "rounded" version? Better yet, what are my choices for
        printing floats, and where are their heuristics documented?

        Thanks,
        <mike
        --
        Mike Meyer <mwm@...> http://www.mired.org/consulting.html
        Independent Network/Unix/Perforce consultant, email for more information.
      • Lars Nilsson
        Message 3 of 8 , Jan 9, 2006
          On 1/9/06, Mike Meyer <mwm-keyword-ocaml.8afb38@...> wrote:
          > Surely if the binary representation of .88 terminates, the
          > representation of 10.88 terminates?
          >
          > Or, if you prefer, what heuristic does whatever function prints values
          > in the REPL loop use to decide when to print a a value "exactly", and
          > when to print a "rounded" version? Better yet, what are my choices for
          > printing floats, and where are their heuristics documented?

          "Termination" is driven by what can be represented as sums 1/2^n,
          where n runs from 0 (or one, I forget) up to the number of bits in the
          mantissa of a floating point number, if the bit is set in that
          particular position. If you had 3 bits of mantissa, you'd be able to
          represent 8 different floats [1] (not counting the exponent shifting
          of the value), with an infinite number of floats that cannot be
          accurately represented. The 64/80 bit IEEE floats are more accurate,
          but are still limited to a finite number of floats that can be
          exactly represented.

          Assuming no shifting due to the exponent:

          000 -> 0
          100 -> 0.5
          010 -> 0.25
          110 -> 0.75
          001 -> 0.125
          101 -> 0.625
          011 -> 0.375
          111 -> 0.875

          Shifting due to a different exponent would simply multiply the value by 2^n.
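
          A little throwaway Ocaml loop (just a sketch to reproduce the table
          above, nothing more) enumerates the same eight values:

          let () =
            for bits = 0 to 7 do
              let value =
                (if bits land 1 <> 0 then 0.5   else 0.) +.
                (if bits land 2 <> 0 then 0.25  else 0.) +.
                (if bits land 4 <> 0 then 0.125 else 0.)
              in
              Printf.printf "%d%d%d -> %g\n"
                (bits land 1) ((bits lsr 1) land 1) ((bits lsr 2) land 1) value
            done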

          Lars Nilsson
        • Brian Hurt
          Message 4 of 8 , Jan 9, 2006
            On Mon, 9 Jan 2006, Mike Meyer wrote:

            > In <Pine.LNX.4.63.0601091031200.10377@...>, Brian Hurt <bhurt@...> typed:
            >>> #10.87+.0.01;;
            >>> - : float = 10.879999999999999
            >>
            >> Long answer: here's the problem. Floating point is of finite precision.
            >> For any base, *any* base, there are simple fractions which can not be
            >> represented exactly in a finite number of digits. Base 10, for example,
            >> can't represent 1/3rd exactly without an infinite number of 3's. Ocaml
            >> floats happen to be IEEE-754 double precision floating point numbers,
            >> which work in binary. Which means that in addition to 1/3rd, they also
            >> can't exactly represent 1/5 or 1/10 without an infinite number of bits.
            >
            > Ok, what's up with:
            >
            > # 0.1 ;;
            > - : float = 0.1
            >
            > I know you can't represent .1 exactly in a float. Why does it round
            > the value to one decimal place here?

            Because the printing routine recognizes that the real value isn't 0.1, but
            close enough to 0.1 that it prints it out as such. If you do:

            # 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1;;
            - : float = 0.999999999999999889
            #

            You learn that 0.1 isn't really 1/10. Adding 0.1 to itself magnifies the
            error enough that the result is no longer close enough to 1.0 for the
            printing routine to display it as 1.0.

            On the other hand, if you saw:

            # 0.1;;
            - : float = 0.09999999999999998
            #

            I.e. if the printing routine didn't do the rounding, we'd be having
            essentially the same conversation.
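
            If you ask printf for more digits, you can see the value that is
            actually stored (here %.20f requests 20 digits after the point,
            more than the 52-bit fraction really carries):

            # Printf.printf "%.20f\n" 0.1;;
            0.10000000000000000555
            - : unit = ()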


            >
            > And:
            >
            > # 10.87 +. 0.01 ;;
            > - : float = 10.879999999999999
            > # 0.87 +. 0.01 ;;
            > - : float = 0.88
            > #
            >
            > Surely if the binary representation of .88 terminates, the
            > representation of 10.88 terminates?

            No.

            The floating point format is effectively three parts, the fractional part
            (f), the exponent (e), and the sign (s). The fractional part is 52 bits,
            and the exponent part 11 bits, plus a sign bit for 64 bits. The number
            represented by a given f, e, and s is:

            -1^s * (1 + (f/2^52)) * 2^(e - 1023)

            This is why it's called floating point- because the exponent can vary,
            the position of the (binary) point within the bits can float.
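
            If you're curious, you can pull those three fields out of a float
            directly- a quick sketch (mine, using Int64.bits_of_float; the
            field names follow the formula above):

            let decode x =
              let bits = Int64.bits_of_float x in
              let s = Int64.to_int (Int64.shift_right_logical bits 63) in
              let e = Int64.to_int
                        (Int64.logand (Int64.shift_right_logical bits 52) 0x7FFL) in
              let f = Int64.logand bits 0xFFFFFFFFFFFFFL in
              Printf.printf "%.17g : s=%d  e=%d (e - 1023 = %d)  f=%Ld\n"
                x s e (e - 1023) f

            let () = List.iter decode [1.0; 0.88; 10.88]

            Neither 0.88 nor 10.88 has a terminating fraction in binary, so both
            f fields you see are rounded; 1.0, on the other hand, comes out as
            s=0, f=0, e=1023 exactly.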

            So given this format, consider: how do you add two numbers with different
            exponents, different "sizes" if you will? You shift the smaller number to
            the right until the exponents are equal. But when the operation is done,
            the result still has to fit into 64 bits, so the extra bits just get
            rounded off.

            What this means is that when adding two numbers of significantly
            different sizes, you get a much larger error than when adding two
            numbers of about the same size. The situation can get so bad that the
            entire result of adding two non-zero representable numbers is simply the
            larger of the two numbers. Consider:

            # 1. +. 1e-16;;
            - : float = 1.
            # 0.1 +. 1e-16;;
            - : float = 0.100000000000000103
            #

            This isn't the print routine being flaky:

            # let x = 1. +. 1e-16;;
            val x : float = 1.
            # x +. x +. x +. x +. x +. x +. x +. x +. x +. x;;
            - : float = 10.
            #

            When we try to add 1e-16 to 1.0, the 1e-16 gets shifted so far to the
            right that when we go to put the result back into a 64 bit float, its
            entire contribution is rounded away and we get 1.0 right back. Note that
            1.0 is precisely representable (s=0, f=0, and e=1023) in floating point
            arithmetic, so adding 1.0 to itself 10 times gives 10.0 precisely.

            I'm skipping a whole boatload of details here- for example the "special"
            FP numbers like denormalized numbers, infinities, and NaN. I'm also
            skipping the fact that the x86 has an 80-bit float format that values are
            kept in so long as they stay in registers, so things get
            rounded to different values (different numbers of significant bits)
            depending upon when things get moved from registers out to memory (also,
            this 80-bit format isn't supported by SSE).

            >
            > Or, if you prefer, what heuristic does whatever function prints values
            > in the REPL loop use to decide when to print a value "exactly", and
            > when to print a "rounded" version? Better yet, what are my choices for
            > printing floats, and where are their heuristics documented?

            I believe this is the algorithm used:
            http://citeseer.ist.psu.edu/28233.html

            Although hopefully someone with more knowledge can comment on this.

            If you want the gory details: the basic idea is that every floating point
            number actually represents a range of numbers. A given floating point
            number x (ignoring accumulated round off error) could be any number
            between (1 - 2^-53)*x and (1 + 2^-53)*x, because all numbers in that
            range get rounded off to x. So what the algorithm does is look
            at all the rational numbers in that range for the "smallest" one, the one
            with the smallest numerator and denominator. Thus it is more inclined to
            pick 1/10 instead of 9999999999999998/10000000000000000, even though both
            numbers have the same representation due to rounding.
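
            As for your choices when printing floats yourself, the ones I'd
            reach for (illustrative, not an exhaustive list) are the default
            string conversion and an explicit printf format with enough digits
            to round-trip a double:

            let x = 10.87 +. 0.01

            let () =
              print_endline (string_of_float x);  (* the stdlib's default conversion *)
              Printf.printf "%.17g\n" x;          (* 17 significant digits: enough to round-trip *)
              Printf.printf "%e\n" x              (* scientific notation *)

            The conversion directives themselves are documented with the Printf
            module in the manual.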

            Brian