## Percentile of inserted element

Expand Messages
• I want an efficient subroutine that inserts \$elt into @x and returns the percentile of \$elt in @x. For example, if @x is (2,2,3,3,3,4,5,6,7,8) and \$elt is 3,
Message 1 of 2 , Oct 11, 2008
• 0 Attachment
I want an efficient subroutine that inserts \$elt into @x and returns
the percentile of \$elt in @x.

For example, if @x is (2,2,3,3,3,4,5,6,7,8) and \$elt is 3, the
subroutine returns .363636... and @x is now
(2,2,3,3,3,3,4,5,6,7,8). Why .363636...?:

% After insertion, @x has 11 elements. \$elt (3) beats 2 of those
elements and ties 4 (including itself). Ties count as half-wins, so
the new element beats 2+(4/2) or 4 elements, and 4/11 is .3636...

Below is one way to do it. I'd turn it into a subroutine if I wanted
to use it, but it's really inefficient. If I insert 100 elements and
want the percentile of each as its inserted, it has to do something
like 50K comparisons. The subroutine's also ugly, but ignore that.

I'm guessing a subroutine that maintained a sorted list would work
much better.

My ugly inefficient solution:

@mylist = (2,2,3,3,3,4,5,6,7,8);
\$elt = 3;
push(@mylist,\$elt); #insert the element first

for \$i (@mylist) {
if (\$elt>\$i) {\$win++; next} # how many times does \$elt beat list values
if (\$elt<\$i) {\$lose++; next} # how many times does \$elt lose to list value
\$tie++; # ties count as half-wins, including the element I just inserted
}

print "WIN: \$win, TIE: \$tie, LOSE: \$lose\n";
# could use \$#mylist+1 for denominator below; 1. is to force floating point
print 1.*(\$win+\$tie/2)/(\$win+\$lose+\$tie);
print "\n";

--
We're just a Bunch Of Regular Guys, a collective group that's trying
to understand and assimilate technology. We feel that resistance to
new ideas and technology is unwise and ultimately futile.
• On Sat, Oct 11, 2008 at 12:31 PM, Kelly Jones ... I d agree. ... It s not THAT ugly, although keeping the array sorted would be very helpful. You needn t keep
Message 2 of 2 , Oct 13, 2008
• 0 Attachment
On Sat, Oct 11, 2008 at 12:31 PM, Kelly Jones
<kelly.terry.jones@...>wrote:

> I want an efficient subroutine that inserts \$elt into @x and returns the
> percentile of \$elt in @x.
>
> For example, if @x is (2,2,3,3,3,4,5,6,7,8) and \$elt is 3, the subroutine
> returns .363636... and @x is now (2,2,3,3,3,3,4,5,6,7,8). Why .363636?
>
> After insertion, @x has 11 elements. \$elt (3) beats 2 of those elements and
> ties 4 (including itself). Ties count as half-wins, so the new element beats
> 2+(4/2) or 4 elements, and 4/11 is .3636...
>
> Below is one way to do it. I'd turn it into a subroutine if I wanted to use
> it, but it's really inefficient. If I insert 100 elements and want the
> percentile of each as its inserted, it has to do something like 50K
> comparisons. The subroutine's also ugly, but ignore that.
>
> I'm guessing a subroutine that maintained a sorted list would work much
> better.
>

I'd agree.

> @mylist = (2,2,3,3,3,4,5,6,7,8);
> \$elt = 3;
> push(@mylist,\$elt); #insert the element first
>
> for \$i (@mylist) {
> if (\$elt>\$i) {\$win++; next} # how many times does \$elt beat list values
> if (\$elt<\$i) {\$lose++; next} # how many times does \$elt lose to list value
> \$tie++; # ties count as half-wins, including the element I just inserted
> }
>
> (\$win+\$tie/2)/(\$win+\$lose+\$tie);
>

It's not THAT ugly, although keeping the array sorted would be very
helpful. You needn't keep track of the number of \$lose, because you don't
need \$win + \$lose + \$tie (the size of the array): the size of the array can
be gotten from the array itself in scalar context, as my function below
shows.

my @numbers = (1,2,5,5,8,9,9,9,9,10);
my \$next = 5;

my \$pct = insert_and_get_percentile(\@numbers, \$next); # pass @numbers by
reference

# assumes that the input list is ALREADY SORTED
sub insert_and_get_percentile {
my (\$data, \$n) = @_;
my \$score = 0;

push @\$data, \$n; # \$n, the new value, is now at the END of the array

# an old-school for loop for going through the data
for (my \$i = 0; \$i < @\$data; \$i++) {

# if data[x] is less than the new value add 1
\$score++ if \$data->[\$i] < \$n;

# if data[x] is equal to the new value, add 1/2
\$score += 0.5 if \$data->[\$i] == \$n;

# if data[x] is greater than the new value:
# a) add 1/2 (for the new value itself),
# b) move the new value to right before data[x] (via pop() and
splice()),
# c) and stop looping through the array
\$score += 0.5, splice(@\$data, \$i, 0, pop @\$data), last if \$data->[\$i]
> \$n;

}

return \$score / @\$data;
}

--
[Mary said,] "Do whatever he tells you." ~ John 2:5
The Cross Reference - http://thecrossreference.blogspot.com/
Nos autem praedicamus Christum crucifixum (1 Cor 1:23)

[Non-text portions of this message have been removed]
Your message has been successfully submitted and would be delivered to recipients shortly.