
Re: "ocaml_beginners"::[] Ugly floats in Ocaml

  • Mike Meyer
    Message 1 of 8, Jan 9, 2006
      In <Pine.LNX.4.63.0601091031200.10377@...>, Brian Hurt <bhurt@...> typed:
      > > #10.87+.0.01;;
      > > - : float = 10.879999999999999
      >
      > Long answer: here's the problem. Floating point is of finite precision.
      > For any base, *any* base, there are simple fractions which can not be
      > represented exactly in a finite number of digits. Base 10, for example,
      > can't represent 1/3rd exactly without an infinite number of 3's. Ocaml
      > floats happen to be IEEE-754 double precision floating point numbers,
      > which work in binary. Which means that in addition to 1/3rd, they also
      > can't exactly represent 1/5 or 1/10 without an infinite number of bits.

      Ok, what's up with:

      # 0.1 ;;
      - : float = 0.1

      I know you can't represent .1 exactly in a float. Why does it round
      the value to one decimal place here?

      And:

      # 10.87 +. 0.01 ;;
      - : float = 10.879999999999999
      # 0.87 +. 0.01 ;;
      - : float = 0.88
      #

      Surely if the binary representation of .88 terminates, the
      representation of 10.88 terminates?

      Or, if you prefer, what heuristic does whatever function prints values
      in the REPL loop use to decide when to print a value "exactly", and
      when to print a "rounded" version? Better yet, what are my choices for
      printing floats, and where are their heuristics documented?

      Thanks,
      <mike
      --
      Mike Meyer <mwm@...> http://www.mired.org/consulting.html
      Independent Network/Unix/Perforce consultant, email for more information.
    • Lars Nilsson
      Message 2 of 8, Jan 9, 2006
        On 1/9/06, Mike Meyer <mwm-keyword-ocaml.8afb38@...> wrote:
        > Surely if the binary representation of .88 terminates, the
        > representation of 10.88 terminates?
        >
        > Or, if you prefer, what heuristic does whatever function prints values
        > in the REPL loop use to decide when to print a value "exactly", and
        > when to print a "rounded" version? Better yet, what are my choices for
        > printing floats, and where are their heuristics documented?

        "Termination" is driven by what can be represented as sums of 1/2^n,
        where n runs from 0 (or one, I forget) up to the number of bits in the
        mantissa of a floating-point number, if the bit is set in that
        particular position. If you had 3 bits of mantissa, you'd be able to
        represent 8 different floats [1] (not counting the exponent shifting
        of the value), with an infinite number of floats that cannot be
        accurately represented. The 64/80 bit IEEE floats are more accurate,
        but are still limited to a finite number of floats that can be exactly
        represented.

        Assuming no shifting due to the exponent:

        000 -> 0
        100 -> 0.5
        010 -> 0.25
        110 -> 0.75
        001 -> 0.125
        101 -> 0.625
        011 -> 0.375
        111 -> 0.875

        Shifting due to a different exponent would simply multiply the value by 2^n.
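        The table above can be reproduced mechanically. A small Python sketch
        (Python floats are IEEE-754 doubles just like OCaml's, so it serves as
        a neutral illustration; none of this code is from the original post):

```python
# Enumerate the eight values a 3-bit mantissa can represent, assuming no
# exponent shift.  Bit i (reading left to right) carries weight 1/2**(i + 1).
for bits in ["000", "100", "010", "110", "001", "101", "011", "111"]:
    value = sum(int(b) / 2 ** (i + 1) for i, b in enumerate(bits))
    print(bits, "->", value)  # e.g. "110 -> 0.75"
```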

        Lars Nilsson
      • Brian Hurt
        Message 3 of 8, Jan 9, 2006
          On Mon, 9 Jan 2006, Mike Meyer wrote:

          > In <Pine.LNX.4.63.0601091031200.10377@...>, Brian Hurt <bhurt@...> typed:
          >>> #10.87+.0.01;;
          >>> - : float = 10.879999999999999
          >>
          >> Long answer: here's the problem. Floating point is of finite precision.
          >> For any base, *any* base, there are simple fractions which can not be
          >> represented exactly in a finite number of digits. Base 10, for example,
          >> can't represent 1/3rd exactly without an infinite number of 3's. Ocaml
          >> floats happen to be IEEE-754 double precision floating point numbers,
          >> which work in binary. Which means that in addition to 1/3rd, they also
          >> can't exactly represent 1/5 or 1/10 without an infinite number of bits.
          >
          > Ok, what's up with:
          >
          > # 0.1 ;;
          > - : float = 0.1
          >
          > I know you can't represent .1 exactly in a float. Why does it round
          > the value to one decimal place here?

          Because the printing routine recognizes that the real value isn't 0.1, but
          close enough to 0.1 that it prints it out as such. If you do:

          # 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1 +. 0.1;;
          - : float = 0.999999999999999889
          #

          You learn that 0.1 isn't really 1/10. Adding 0.1 to itself ten times
          magnifies the error enough that the printing routine no longer
          recognizes the result as 1.0.

          On the other hand, if you saw:

          # 0.1;;
          - : float = 0.09999999999999998
          #

          I.e. if the printing routine didn't do the rounding, we'd be having
          essentially the same conversation.
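          The same drift can be checked in any language whose floats are
          IEEE-754 doubles; a Python illustration (not from the thread):

```python
# Summing 0.1 ten times accumulates the representation error of 0.1;
# the result is close to, but not exactly, 1.0.
total = 0.0
for _ in range(10):
    total += 0.1
print(total)  # prints 0.9999999999999999
```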


          >
          > And:
          >
          > # 10.87 +. 0.01 ;;
          > - : float = 10.879999999999999
          > # 0.87 +. 0.01 ;;
          > - : float = 0.88
          > #
          >
          > Surely if the binary representation of .88 terminates, the
          > representation of 10.88 terminates?

          No.

          The floating point format effectively has three parts: the
          fractional part (f), the exponent (e), and the sign (s). The
          fractional part is 52 bits and the exponent 11 bits, plus a sign
          bit, for 64 bits total. The number represented by a given f, e, and
          s is:

          (-1)^s * (1 + f/2^52) * 2^(e - 1023)

          This is why it's called floating point: because the exponent can
          vary, the position of the binary point can float.
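          The three fields can be seen directly by reinterpreting a double's
          bits. A Python sketch (Python floats are also IEEE-754 doubles; the
          helper name `decompose` is made up for illustration):

```python
import struct

def decompose(x):
    # Reinterpret the 64 bits of an IEEE-754 double as (sign, exponent, fraction).
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    return s, e, f

s, e, f = decompose(10.88)
# Reconstruct the value from the formula above (valid for normal numbers):
value = (-1) ** s * (1 + f / 2 ** 52) * 2 ** (e - 1023)
print(s, e, f, value)
```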

          So given this format, consider how to add two numbers with
          different exponents, different "sizes" if you will. You shift the
          smaller number to the right until the exponents are equal. But when
          the operation is done, the result still has to fit into 64 bits, so
          the extra bits just get rounded off.

          What this means is that adding two numbers of significantly
          different sizes produces a much larger error than adding two
          numbers of closer to the same size. The situation can get so bad
          that the entire result of adding two non-zero representable numbers
          is simply the larger of the two. Consider:

          # 1. +. 1e-16;;
          - : float = 1.
          # 0.1 +. 1e-16;;
          - : float = 0.100000000000000103
          #

          This isn't the print routine being flaky:

          # let x = 1. +. 1e-16;;
          val x : float = 1.
          # x +. x +. x +. x +. x +. x +. x +. x +. x +. x;;
          - : float = 10.
          #

          When we try to add 1e-16 to 1.0, the 1e-16 gets shifted so far to
          the right that when we put the result back into a 64 bit float, its
          entire contribution is rounded away and we get 1.0 right back. Note
          that 1.0 is precisely representable (s=0, f=0, and e=1023) in
          floating point arithmetic, so adding 1.0 to itself 10 times gives
          exactly 10.0.

          I'm skipping a whole boatload of details here; for example, the
          "special" FP numbers like denormalized numbers, infinities, and
          NaN. I'm also skipping the fact that the x86 has an 80-bit float
          format that values are stored in so long as they stay in registers,
          so things get rounded to different values (different numbers of
          significant bits) depending upon when they get moved from registers
          out to memory (this 80-bit format also isn't supported by SSE).

          >
          > Or, if you prefer, what heuristic does whatever function prints values
          > in the REPL loop use to decide when to print a value "exactly", and
          > when to print a "rounded" version? Better yet, what are my choices for
          > printing floats, and where are their heuristics documented?

          I believe this is the algorithm used:
          http://citeseer.ist.psu.edu/28233.html

          Although hopefully someone with more knowledge can comment on this.

          If you want the gory details: the basic idea is that every floating
          point number actually represents a range of numbers. A given
          floating point number x (ignoring accumulated round-off error)
          could be any number between (1 - 2^-53)*x and (1 + 2^-53)*x,
          because all numbers in that range get rounded to x. So the
          algorithm looks at all the rational numbers in that range and picks
          the "smallest" one, the one with the smallest numerator and
          denominator. Thus it is more inclined to pick 1/10 instead of
          9999999999999998/10000000000000000, even though both numbers have
          the same representation due to rounding.
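          The "range of reals per float" idea can be made concrete with exact
          rational arithmetic. A Python sketch (Python doubles are the same
          IEEE-754 binary64 as OCaml floats; illustration only):

```python
from fractions import Fraction

# Fraction(0.1) recovers the exact rational value of the stored double:
exact = Fraction(0.1)
print(exact)  # prints 3602879701896397/36028797018963968
assert exact != Fraction(1, 10)

# 1/10 nonetheless rounds to that very same double, which is why a printer
# may legitimately choose the short decimal "0.1" for it:
assert float(Fraction(1, 10)) == 0.1
```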

          Brian