- I'm learning Perl and working through the O'Reilly book Beginning Perl

for Bioinformatics. When working on an exercise I've come across

something strange I can't quite figure out.

I've copied the source and output below. I've looked at O'Reilly's

answer to the exercise which is essentially the same as mine and I've

tested it and found it behaves in exactly the same way as mine so I

don't think there's any error in my code (unless they made the same

error).

Basically, the sub takes 4 numbers as arguments and before proceeding

does a check to make sure that the 4 numbers add up to 1. If they

don't, it prints an error and terminates the program.

The problem is that for some combinations of numbers, even though they

should add to 1, and even though Perl claims they add to 1 (see the

output to note that the expression adding the 4 numbers is displayed

as "1" in the output) it also claims that they don't equal 1 and

therefore the test fails.

This doesn't generally happen. There are lots of combinations of 4

numbers that add to 1 and are interpreted correctly by Perl. Even

changing the order of the variables for which these 4 particular

numbers are assigned can make the program work. But for some

particular combinations, it just doesn't.

In the line that assigns 0.1 to $probT, if I delete 0.1 and the

comment mark so the expression following it is assigned to $probT,

then the program works fine. However, running in debug mode and

printing the value assigned to $probT returns "0.1". so why it behaves

this way is beyond me.

Is this some sort of floating point rounding issue?

I'm running ActivePerl 5.8.8 on Windows XP Pro SP2.

Thanks!

Here's the source:

==================================================

use strict;

use warnings;

# O'Reilly Beginning Perl for Bioinformatics Excercise:

# write sub that returns a random nucleotide, where the probability of

each nucleutide is specified in the arguments (ensure they add up to 1)

my ($probA, $probG, $probC) = (.7, .1, .1); # define probability for

each nt

my $probT = 0.1; # 1-$probA-$probG-$probC;

my %count = ('A'=>0, 'G'=>0, 'C'=>0, 'T'=>0); # let's count the picks

to verify it worked correctly

my $pickno = 100;

print "The probabilities are: A=$probA, G=$probG, C=$probC,

T=$probT\nHere's $pickno picks:\n\n";

for (1..$pickno) {

my $nt = get_prob_nt($probA, $probG, $probC, $probT);

$count{$nt}++;

print "$nt, ";

}

print "\nThe counts are $count{'A'}, $count{'G'}, $count{'C'},

$count{'T'}\n";

# the sub to pick a random nt based on the given probabilities

sub get_prob_nt {

my ($probA, $probG, $probC, $probT) = @_; # get probabilities from

arguments

if (($probA + $probG + $probC + $probT) != 1) { # make sure they add to 1

print '$probA + $probG + $probC + $probT = ',"$probA + $probG +

$probC + $probT = ",($probA + $probG + $probC + $probT),"\n";

print '(($probA + $probG + $probC + $probT) != 1) = |',(($probA +

$probG + $probC + $probT) != 1),"|\n";

print '(($probA + $probG + $probC + $probT) == 1) = |',(($probA +

$probG + $probC + $probT) == 1),"|\n";

exit; # exit program

}

my $pick = rand( 1); # picks a random number between 0 and 1 (can

equal 0, can't equal 1)

if ($pick < $probA) {

return 'A';

} elsif ($pick < $probA+$probG) {

return 'G';

} elsif ($pick < $probA+$probG+$probC) {

return 'C';

} else {

return 'T';

}

}

==================================================

Here's the output:

==================================================

The probabilities are: A=0.7, G=0.1, C=0.1, T=0.1

Here's 100 picks:

$probA + $probG + $probC + $probT = 0.7 + 0.1 + 0.1 + 0.1 = 1

(($probA + $probG + $probC + $probT) != 1) = |1|

(($probA + $probG + $probC + $probT) == 1) =

||================================================== - On Nov 9, 2006, at 7:29 PM, thisis_not_anapple wrote:

> --- In perl-beginner@yahoogroups.com, Rob Biedenharn <Rob@...> wrote:

It's not "roundoff" error exactly. The problem is fundamentally that

> <snip>

>

>> You're making things worse, not better. Using numeric less-than (<)

>> causes the strings to be coerced back to number before being

>> compared. You need 'le' if you want to compare strings (but be

>> careful as ("10" le "2") is true).

>>

>

> I see your point. I didnâ€™t want to use the â€˜leâ€™ operator since I

> wanted a numerical value comparison not an a string comparison which

> would fail in situations like the example you mentioned.

>

> In the example I gave, my function failed because after doing the

> initial string comparison for equality, it used the string in the â

> €˜<â€™

> operation. Since one of the strings was â€˜0.9999999999999999â€™ as a

> result of the sprintf operation, it got rounded up to â€˜1â€™ when

> converted back to a number (presumably, due to roundoff error again).

the computer is storing floating BINARY point values that represent

floating DECIMAL point numbers. As an exercise, let's look at how to

represent 0.3 (decimal) as binary:

n 2^n bit cummulative

0 1. 0.

-1 0.5 0

-2 0.25 1 0.25000000000000000

-3 0.125 0

-4 0.0625 0

-5 0.03125 1 0.28125000000000000

-6 0.015625 1 0.29687500000000000

-7 0.0078125 0

-8 0.00390625 0

-9 0.001953125 1 0.29882812500000000

-10 0.0009765625 1 0.29980468750000000

-11 0.00048828125 0

-12 0.000244140625 0

-13 0.0001220703125 1 0.29992675781250000

-14 0.00006103515625 1 0.29998779296875000

-15 0.000030517578125 0

so that's 0.010011001100110... (Trust me that it goes on forever

like this.)

At some point the limit of bits is set and some values are never

going to be represented exactly. (But it's interesting to think that

a number like 0.296875 is represented more accurately that 0.3 within

the computer.)

> In any case, it seems like nothing past 15 digits should ever be

If you need that kind of accuracy, you probably need to deal with a

> trustedâ€¦

>

> .... Although, I believe in this case comparing the

> absolute difference to a tolerance would always work while using the

> sprintf approach to round each number first would fail in rare cases.

>

> Is that right?

> Am I just over thinking this at this point?

>

> Thanks again!

>

> P.S. Sorry for the ramblingâ€¦

lot more theory. Everything you'd need is in the paper I first cited

(or probably in the Wikipedia article cited by someone else). One

thing that quickly gets into a danger area is doing operations on

values with vastly different magnitudes.

If you know the number of digits and the scale, you can use fixed

point arithmetic where an integer represents something like the

number of pennies or 1/256th nautical mile (like an air traffic

control system I worked on years ago with no floating point

coprocessor). Believe me, things get much more complicated when you

have to keep track of the scale yourself.

-Rob

Rob Biedenharn http://agileconsultingllc.com

Rob@...