Loading ...
Sorry, an error occurred while loading the content.

Re: [PBML] Why does Perl insist that 1 does not equal 1??

Expand Messages
  • chen li
    When you mean equal in computer, do you mean = or == ? ... ____________________________________________________________________________________ Access over
    Message 1 of 8 , Nov 2, 2006
    • 0 Attachment
      When you mean equal in computer, do you mean "=" or
      "=="?



      --- thisis_not_anapple <thisis_not_anapple@...>
      wrote:

      > I'm learning Perl and working through the O'Reilly
      > book Beginning Perl
      > for Bioinformatics. When working on an exercise I've
      > come across
      > something strange I can't quite figure out.
      >
      > I've copied the source and output below. I've looked
      > at O'Reilly's
      > answer to the exercise which is essentially the same
      > as mine and I've
      > tested it and found it behaves in exactly the same
      > way as mine so I
      > don't think there's any error in my code (unless
      > they made the same
      > error).
      >
      > Basically, the sub takes 4 numbers as arguments and
      > before proceeding
      > does a check to make sure that the 4 numbers add up
      > to 1. If they
      > don't, it prints an error and terminates the
      > program.
      >
      > The problem is that for some combinations of
      > numbers, even though they
      > should add to 1, and even though Perl claims they
      > add to 1 (see the
      > output to note that the expression adding the 4
      > numbers is displayed
      > as "1" in the output) it also claims that they don't
      > equal 1 and
      > therefore the test fails.
      >
      > This doesn't generally happen. There are lots of
      > combinations of 4
      > numbers that add to 1 and are interpreted correctly
      > by Perl. Even
      > changing the order of the variables for which these
      > 4 particular
      > numbers are assigned can make the program work. But
      > for some
      > particular combinations, it just doesn't.
      >
      > In the line that assigns 0.1 to $probT, if I delete
      > 0.1 and the
      > comment mark so the expression following it is
      > assigned to $probT,
      > then the program works fine. However, running in
      > debug mode and
      > printing the value assigned to $probT returns "0.1".
      > so why it behaves
      > this way is beyond me.
      >
      > Is this some sort of floating point rounding issue?
      >
      > I'm running ActivePerl 5.8.8 on Windows XP Pro SP2.
      >
      > Thanks!
      >
      > Here's the source:
      > ==================================================
      > use strict;
      > use warnings;
      >
      > # O'Reilly Beginning Perl for Bioinformatics
      > Excercise:
      >
      > # write sub that returns a random nucleotide, where
      > the probability of
      > each nucleutide is specified in the arguments
      > (ensure they add up to 1)
      >
      > my ($probA, $probG, $probC) = (.7, .1, .1); # define
      > probability for
      > each nt
      > my $probT = 0.1; # 1-$probA-$probG-$probC;
      > my %count = ('A'=>0, 'G'=>0, 'C'=>0, 'T'=>0); #
      > let's count the picks
      > to verify it worked correctly
      > my $pickno = 100;
      >
      > print "The probabilities are: A=$probA, G=$probG,
      > C=$probC,
      > T=$probT\nHere's $pickno picks:\n\n";
      > for (1..$pickno) {
      > my $nt = get_prob_nt($probA, $probG, $probC,
      > $probT);
      > $count{$nt}++;
      > print "$nt, ";
      > }
      > print "\nThe counts are $count{'A'}, $count{'G'},
      > $count{'C'},
      > $count{'T'}\n";
      >
      > # the sub to pick a random nt based on the given
      > probabilities
      > sub get_prob_nt {
      > my ($probA, $probG, $probC, $probT) = @_; # get
      > probabilities from
      > arguments
      > if (($probA + $probG + $probC + $probT) != 1) { #
      > make sure they add to 1
      > print '$probA + $probG + $probC + $probT =
      > ',"$probA + $probG +
      > $probC + $probT = ",($probA + $probG + $probC +
      > $probT),"\n";
      > print '(($probA + $probG + $probC + $probT) != 1)
      > = |',(($probA +
      > $probG + $probC + $probT) != 1),"|\n";
      > print '(($probA + $probG + $probC + $probT) == 1)
      > = |',(($probA +
      > $probG + $probC + $probT) == 1),"|\n";
      > exit; # exit program
      > }
      > my $pick = rand( 1); # picks a random number
      > between 0 and 1 (can
      > equal 0, can't equal 1)
      > if ($pick < $probA) {
      > return 'A';
      > } elsif ($pick < $probA+$probG) {
      > return 'G';
      > } elsif ($pick < $probA+$probG+$probC) {
      > return 'C';
      > } else {
      > return 'T';
      > }
      > }
      > ==================================================
      >
      > Here's the output:
      > ==================================================
      > The probabilities are: A=0.7, G=0.1, C=0.1, T=0.1
      > Here's 100 picks:
      >
      > $probA + $probG + $probC + $probT = 0.7 + 0.1 + 0.1
      > + 0.1 = 1
      > (($probA + $probG + $probC + $probT) != 1) = |1|
      > (($probA + $probG + $probC + $probT) == 1) =
      > ||==================================================
      >
      >
      >
      >
      >




      ____________________________________________________________________________________
      Access over 1 million songs - Yahoo! Music Unlimited
      (http://music.yahoo.com/unlimited)
    • Paul Archer
      ... You got it. You need to think in terms of precision. For example, a simplistic way to do things, multiply everything by 10, add your four numbers up, use
      Message 2 of 8 , Nov 2, 2006
      • 0 Attachment
        7:11pm, thisis_not_anapple wrote:

        >[massive snippage]
        > Is this some sort of floating point rounding issue?
        >
        You got it. You need to think in terms of precision. For example, a
        simplistic way to do things, multiply everything by 10, add your four
        numbers up, use int to make sure they're rounded off, and compare to 10
        instead of 1.

        Paul



        ----------------------
        | Wanted |
        | $10,000 reward |
        | Schroedinger's Cat |
        | Dead or Alive |
        ----------------------
      • thisis_not_anapple
        Rob, I looked at the document you linked to but it s a little over my head. There s a reasonably good explanation on floating points in Wikipedia (which also
        Message 3 of 8 , Nov 6, 2006
        • 0 Attachment
          Rob, I looked at the document you linked to but it's a little over my
          head. There's a reasonably good explanation on floating points in
          Wikipedia (which also links to the same article) so I think I got the
          gist of it.

          While, I was aware of the concept of floating point rounding errors, I
          guess I always assumed (incorrectly) that it was only an issue for
          complex calculations. I was also especially surprised to see that the
          order of addition makes a difference. However, I suppose that makes
          sense since changing the order of addition will change the
          intermediate results, some of which may contribute differently to
          roundoff errors.

          So is it fair to say, then, that one should NEVER do a direct
          comparison on floating point numbers but ALWAYS check only that
          they're the same to a given precision? If so, are there any accepted
          best practises for doing the check and for how many digits of
          precision to check for?

          From what I gathered, there should be 16 significant digits (in
          decimal) stored in a floating point value, but the 16th digit may be
          wrong due to roundoff error so you should never trust a floating point
          to equal a decimal to more than 15 digits. However, since roundoff
          errors can accumulate during calculations, any number that's a result
          of a calculation won't necessarily match it's mathematical result to
          15 digits.

          Anyway, here's an attempt to compare floating point numbers correctly
          that still has me a little mystified:

          The source:
          ======================================
          use strict;
          use warnings;

          my ($a, $b, $c, $d) = (.7, .1, .1, .1);
          my $sum = $a + $b + $c + $d;
          $\ = "\n";
          for my $sig (15..17) {
          print "\$sig = $sig";
          print "\$sum = $sum = ", sprintf("%.${sig}g", $sum);
          print "compf(\$sum,1,$sig) = ",compf($sum,1,$sig);
          print "\$sum eq 1 = ", ($sum eq 1),"\n";
          }

          sub compf {
          my ($f1, $f2, $sig) = @_;
          $f1 = sprintf("%.${sig}g", $f1);
          $f2 = sprintf("%.${sig}g", $f2);
          if ($f1 eq $f2) {
          return 0; # equal
          } elsif ($f1 < $f2) {
          return -1; # less-than
          } else {
          return 1; # greater-than
          }
          }
          ======================================

          The Output:
          ======================================
          $sig = 15
          $sum = 1 = 1
          compf($sum,1,15) = 0
          $sum eq 1 = 1

          $sig = 16
          $sum = 1 = 0.9999999999999999
          compf($sum,1,16) = 1
          $sum eq 1 = 1

          $sig = 17
          $sum = 1 = 0.99999999999999989
          compf($sum,1,17) = -1
          $sum eq 1 = 1

          ======================================

          So:
          1) When using 16 significant digits, my compf() function incorrectly
          claims that $sum is greater than 1, even though the sprintf() result
          indicates otherwise. Is this because '<' is the wrong operator when
          comparing numbers represented as strings? If not, what have I missed here?

          2) The sprintf function seems to return a value with 17 significant
          digits. I thought floating point numbers only contain 16...

          3) Why does the 'eq' operator return a true value when the '=='
          returns a false? From my experimenting, I find that the print
          statement rounds off floating point numbers to 15 digits (which makes
          sense if it's trying to hide roundoff errors in the last digit). When
          applying the 'eq' operator, does it also round off a floating point to
          15 digits prior to converting to a string?

          4) Is there a more efficient version of compf() that everyone other
          than me knows about? (Ok, I'm a little paranoid... ;) )

          Thanks for all your help!
        • Rob Biedenharn
          ... This is application dependent. For your example, checking within . 0001 would be more than enough. ... You re making things worse, not better. Using
          Message 4 of 8 , Nov 6, 2006
          • 0 Attachment
            On Nov 6, 2006, at 6:06 PM, thisis_not_anapple wrote:

            > Rob, I looked at the document you linked to but it's a little over my
            > head. There's a reasonably good explanation on floating points in
            > Wikipedia (which also links to the same article) so I think I got the
            > gist of it.
            >
            > While, I was aware of the concept of floating point rounding errors, I
            > guess I always assumed (incorrectly) that it was only an issue for
            > complex calculations. I was also especially surprised to see that the
            > order of addition makes a difference. However, I suppose that makes
            > sense since changing the order of addition will change the
            > intermediate results, some of which may contribute differently to
            > roundoff errors.
            >
            > So is it fair to say, then, that one should NEVER do a direct
            > comparison on floating point numbers but ALWAYS check only that
            > they're the same to a given precision? If so, are there any accepted
            > best practises for doing the check and for how many digits of
            > precision to check for?

            This is application dependent. For your example, checking within .
            0001 would be more than enough.

            >
            > From what I gathered, there should be 16 significant digits (in
            > decimal) stored in a floating point value, but the 16th digit may be
            > wrong due to roundoff error so you should never trust a floating point
            > to equal a decimal to more than 15 digits. However, since roundoff
            > errors can accumulate during calculations, any number that's a result
            > of a calculation won't necessarily match it's mathematical result to
            > 15 digits.
            >
            > Anyway, here's an attempt to compare floating point numbers correctly
            > that still has me a little mystified:

            ...code removed...

            > So:
            > 1) When using 16 significant digits, my compf() function incorrectly
            > claims that $sum is greater than 1, even though the sprintf() result
            > indicates otherwise. Is this because '<' is the wrong operator when
            > comparing numbers represented as strings? If not, what have I
            > missed here?

            You're making things worse, not better. Using numeric less-than (<)
            causes the strings to be coerced back to number before being
            compared. You need 'le' if you want to compare strings (but be
            careful as ("10" le "2") is true).

            > 2) The sprintf function seems to return a value with 17 significant
            > digits. I thought floating point numbers only contain 16...

            You ask for 17 and it will try to give them to you (I don't know if
            you can trust them to have any accuracy when you get that precise.)

            As an exercise, I tried:
            $ perl -le 'print sprintf("%.100g", log(2));'
            0.69314718055994528622676398299518041312694549560546875
            $ perl -le 'print length sprintf("%.100g", log(2));'
            55

            So it apparently only gives a max of 53 digits on my system. And they
            are probably not correct after about 15:
            $ perl -MMath::Trig -le 'print sprintf("%.100g", Math::Trig::acos(-1));'
            3.141592653589793115997963468544185161590576171875
            see http://
            3.141592653589793238462643383279502884197169399375105820974944592.com/
            or http://en.wikipedia.org/wiki/Pi
            or http://www.research.att.com/~njas/sequences/A00796

            > 3) Why does the 'eq' operator return a true value when the '=='
            > returns a false? From my experimenting, I find that the print
            > statement rounds off floating point numbers to 15 digits (which makes
            > sense if it's trying to hide roundoff errors in the last digit). When
            > applying the 'eq' operator, does it also round off a floating point to
            > 15 digits prior to converting to a string?
            >
            > 4) Is there a more efficient version of compf() that everyone other
            > than me knows about? (Ok, I'm a little paranoid... ;) )
            >
            > Thanks for all your help!

            Here's what I used recently when dealing with percentages (numbers
            between 0 and 1.0) that had about 6 significant digits.

            my $epsilon = 1.0e-8;

            sub approx_eq {
            my ($a, $b, $tolerance) = @_;
            return abs($a - $b) < $tolerance;
            }
            sub approx_ge {
            my ($a, $b, $tolerance) = @_;
            return($a > $b or approx_eq(@_));
            }

            Used as:
            push @error_msgs, "percentages don't sum to 1.0 (sum=$sum)"
            unless (approx_eq($sum, 1.0, $epsilon));

            or as:
            if (approx_ge($count, $nth, $epsilon)) {
            ...
            }

            You could make the tolerance optional and use a default in the sub,
            too. In my case, there was actually a command-line option to force a
            different value for $epsilon.

            -Rob

            Rob Biedenharn http://agileconsultingllc.com
            Rob@...
          • thisis_not_anapple
            ... ... I see your point. I didn’t want to use the ‘le’ operator since I wanted a numerical value comparison not an a string comparison which
            Message 5 of 8 , Nov 9, 2006
            • 0 Attachment
              --- In perl-beginner@yahoogroups.com, Rob Biedenharn <Rob@...> wrote:
              <snip>

              > You're making things worse, not better. Using numeric less-than (<)
              > causes the strings to be coerced back to number before being
              > compared. You need 'le' if you want to compare strings (but be
              > careful as ("10" le "2") is true).
              >

              I see your point. I didn’t want to use the ‘le’ operator since I
              wanted a numerical value comparison not an a string comparison which
              would fail in situations like the example you mentioned.

              In the example I gave, my function failed because after doing the
              initial string comparison for equality, it used the string in the ‘<’
              operation. Since one of the strings was ‘0.9999999999999999’ as a
              result of the sprintf operation, it got rounded up to ‘1’ when
              converted back to a number (presumably, due to roundoff error again).
              I modified the function to retain the original numerical values for
              use in the ‘<’ operation and now the function seems to work as expected:

              sub compf2 {
              my ($f1, $f2, $sig) = @_;
              my $s1 = sprintf("%.${sig}g", $f1);
              my $s2 = sprintf("%.${sig}g", $f2);
              if ($s1 eq $s2) {
              return 0; # equal
              } elsif ($f1 < $f2) {
              return -1; # less-than
              } else {
              return 1; # greater-than
              }
              }

              Doing some more experimentation I see that (at least on my system) the
              ‘print’ statement will round numbers to 15 significant digits and the
              same thing happens when numbers are automatically converted to
              strings. So using ‘eq’ instead of ‘==’ could be a quick way of
              checking for numerical equality to an accuracy of 15 significant
              digits. However, I suspect that’s too much accuracy to account for
              error accumulation from calculations and I’m not sure if this behavior
              is universal for all versions of Perl.

              > > 2) The sprintf function seems to return a value with 17 significant
              > > digits. I thought floating point numbers only contain 16...
              >
              > You ask for 17 and it will try to give them to you (I don't know if
              > you can trust them to have any accuracy when you get that precise.)
              >
              > As an exercise, I tried:
              > $ perl -le 'print sprintf("%.100g", log(2));'
              > 0.69314718055994528622676398299518041312694549560546875
              > $ perl -le 'print length sprintf("%.100g", log(2));'
              > 55
              >
              > So it apparently only gives a max of 53 digits on my system. And they
              > are probably not correct after about 15:
              > $ perl -MMath::Trig -le 'print sprintf("%.100g", Math::Trig::acos(-1));'
              > 3.141592653589793115997963468544185161590576171875
              > see http://
              > 3.141592653589793238462643383279502884197169399375105820974944592.com/
              > or http://en.wikipedia.org/wiki/Pi
              > or http://www.research.att.com/~njas/sequences/A00796
              >

              On my system, the sprintf function never returns a number with more
              than 17 significant digits. I tried it with the examples you gave.
              Typically the 17th digit is not accurate. In your pi example, above,
              although many additional digits are shown, it loses accuracy after the
              16th digit. I’m not clear on where the extra digits are pulled from. I
              suspect the differences in the maximum number of digits shown between
              are systems comes down to Perl version and/or OS. At first, I thought
              maybe on your system more memory is used to store the floating point,
              so it may have a higher accuracy. But if that were the case, pi should
              match its true value to far more digits than 16, which did not seem to
              be the case.

              In any case, it seems like nothing past 15 digits should ever be trusted…

              > Here's what I used recently when dealing with percentages (numbers
              > between 0 and 1.0) that had about 6 significant digits.
              >
              > my $epsilon = 1.0e-8;
              >
              > sub approx_eq {
              > my ($a, $b, $tolerance) = @_;
              > return abs($a - $b) < $tolerance;
              > }
              > sub approx_ge {
              > my ($a, $b, $tolerance) = @_;
              > return($a > $b or approx_eq(@_));
              > }
              >
              > Used as:
              > push @error_msgs, "percentages don't sum to 1.0 (sum=$sum)"
              > unless (approx_eq($sum, 1.0, $epsilon));
              >
              > or as:
              > if (approx_ge($count, $nth, $epsilon)) {
              > ...
              > }
              >
              > You could make the tolerance optional and use a default in the sub,
              > too. In my case, there was actually a command-line option to force a
              > different value for $epsilon.

              That approach seems reasonable when you have an expectation for what
              the magnitude of the numbers you’re comparing are. However, I’d like
              to have a function I can call for comparison that is independent of
              scale, which is why I like the idea of rounding to an arbitrary number
              of significant digits, which effectively gives you a percentage
              accuracy regardless of scale.

              I suppose, the same could be true of your approach, if you make
              $epsilon dependent on the other arguments. For instance calling
              something like: approx_eq($a, $b, $a*10**-8) would give you roughly
              the same precision, even if $a and $b were on the scale of 10-7. Or
              your functions could be modified to do this automatically. Something like:

              sub approx_eq {
              my ($a, $b, $tolerance) = @_;
              if (abs($a) > abs($b)) {
              $tolerance = abs($a) * 10**-$tolerance;
              } else {
              $tolerance = abs($b) * 10**-$tolerance;
              }
              return abs($a - $b) < $tolerance;
              }

              Finally, can you get into situations where the accuracy seems worse
              due to rounding effects? For instance if a calculation should
              theoretically give:
              0.1234999999999987 but due to roundoff error gives 0.1235000000000013.
              In this case if you round off the number to 4 to 14 significant digits
              they will be equal. But if you were to round off to only 3 significant
              digits, they wouldn’t. Although, I believe in this case comparing the
              absolute difference to a tolerance would always work while using the
              sprintf approach to round each number first would fail in rare cases.

              Is that right?
              Am I just over thinking this at this point?

              Thanks again!

              P.S. Sorry for the rambling…
            • Rob Biedenharn
              ... It s not roundoff error exactly. The problem is fundamentally that the computer is storing floating BINARY point values that represent floating DECIMAL
              Message 6 of 8 , Nov 10, 2006
              • 0 Attachment
                On Nov 9, 2006, at 7:29 PM, thisis_not_anapple wrote:

                > --- In perl-beginner@yahoogroups.com, Rob Biedenharn <Rob@...> wrote:
                > <snip>
                >
                >> You're making things worse, not better. Using numeric less-than (<)
                >> causes the strings to be coerced back to number before being
                >> compared. You need 'le' if you want to compare strings (but be
                >> careful as ("10" le "2") is true).
                >>
                >
                > I see your point. I didn’t want to use the ‘le’ operator since I
                > wanted a numerical value comparison not an a string comparison which
                > would fail in situations like the example you mentioned.
                >
                > In the example I gave, my function failed because after doing the
                > initial string comparison for equality, it used the string in the â
                > €˜<’
                > operation. Since one of the strings was ‘0.9999999999999999’ as a
                > result of the sprintf operation, it got rounded up to ‘1’ when
                > converted back to a number (presumably, due to roundoff error again).

                It's not "roundoff" error exactly. The problem is fundamentally that
                the computer is storing floating BINARY point values that represent
                floating DECIMAL point numbers. As an exercise, let's look at how to
                represent 0.3 (decimal) as binary:

                n 2^n bit cummulative
                0 1. 0.
                -1 0.5 0
                -2 0.25 1 0.25000000000000000
                -3 0.125 0
                -4 0.0625 0
                -5 0.03125 1 0.28125000000000000
                -6 0.015625 1 0.29687500000000000
                -7 0.0078125 0
                -8 0.00390625 0
                -9 0.001953125 1 0.29882812500000000
                -10 0.0009765625 1 0.29980468750000000
                -11 0.00048828125 0
                -12 0.000244140625 0
                -13 0.0001220703125 1 0.29992675781250000
                -14 0.00006103515625 1 0.29998779296875000
                -15 0.000030517578125 0

                so that's 0.010011001100110... (Trust me that it goes on forever
                like this.)

                At some point the limit of bits is set and some values are never
                going to be represented exactly. (But it's interesting to think that
                a number like 0.296875 is represented more accurately that 0.3 within
                the computer.)

                > In any case, it seems like nothing past 15 digits should ever be
                > trusted…
                >
                > .... Although, I believe in this case comparing the
                > absolute difference to a tolerance would always work while using the
                > sprintf approach to round each number first would fail in rare cases.
                >
                > Is that right?
                > Am I just over thinking this at this point?
                >
                > Thanks again!
                >
                > P.S. Sorry for the rambling…

                If you need that kind of accuracy, you probably need to deal with a
                lot more theory. Everything you'd need is in the paper I first cited
                (or probably in the Wikipedia article cited by someone else). One
                thing that quickly gets into a danger area is doing operations on
                values with vastly different magnitudes.

                If you know the number of digits and the scale, you can use fixed
                point arithmetic where an integer represents something like the
                number of pennies or 1/256th nautical mile (like an air traffic
                control system I worked on years ago with no floating point
                coprocessor). Believe me, things get much more complicated when you
                have to keep track of the scale yourself.

                -Rob

                Rob Biedenharn http://agileconsultingllc.com
                Rob@...
              Your message has been successfully submitted and would be delivered to recipients shortly.