Loading ...
Sorry, an error occurred while loading the content.

Percentile of inserted element

Expand Messages
  • Kelly Jones
    I want an efficient subroutine that inserts $elt into @x and returns the percentile of $elt in @x. For example, if @x is (2,2,3,3,3,4,5,6,7,8) and $elt is 3,
    Message 1 of 2 , Oct 11, 2008
    • 0 Attachment
      I want an efficient subroutine that inserts $elt into @x and returns
      the percentile of $elt in @x.

      For example, if @x is (2,2,3,3,3,4,5,6,7,8) and $elt is 3, the
      subroutine returns .363636... and @x is now
      (2,2,3,3,3,3,4,5,6,7,8). Why .363636...?:

      % After insertion, @x has 11 elements. $elt (3) beats 2 of those
      elements and ties 4 (including itself). Ties count as half-wins, so
      the new element beats 2+(4/2) or 4 elements, and 4/11 is .3636...

      Below is one way to do it. I'd turn it into a subroutine if I wanted
      to use it, but it's really inefficient. If I insert 100 elements and
      want the percentile of each as its inserted, it has to do something
      like 50K comparisons. The subroutine's also ugly, but ignore that.

      I'm guessing a subroutine that maintained a sorted list would work
      much better.

      My ugly inefficient solution:

      @mylist = (2,2,3,3,3,4,5,6,7,8);
      $elt = 3;
      push(@mylist,$elt); #insert the element first

      for $i (@mylist) {
      if ($elt>$i) {$win++; next} # how many times does $elt beat list values
      if ($elt<$i) {$lose++; next} # how many times does $elt lose to list value
      $tie++; # ties count as half-wins, including the element I just inserted
      }

      print "WIN: $win, TIE: $tie, LOSE: $lose\n";
      # could use $#mylist+1 for denominator below; 1. is to force floating point
      print 1.*($win+$tie/2)/($win+$lose+$tie);
      print "\n";

      --
      We're just a Bunch Of Regular Guys, a collective group that's trying
      to understand and assimilate technology. We feel that resistance to
      new ideas and technology is unwise and ultimately futile.
    • Jeff Pinyan
      On Sat, Oct 11, 2008 at 12:31 PM, Kelly Jones ... I d agree. ... It s not THAT ugly, although keeping the array sorted would be very helpful. You needn t keep
      Message 2 of 2 , Oct 13, 2008
      • 0 Attachment
        On Sat, Oct 11, 2008 at 12:31 PM, Kelly Jones
        <kelly.terry.jones@...>wrote:

        > I want an efficient subroutine that inserts $elt into @x and returns the
        > percentile of $elt in @x.
        >
        > For example, if @x is (2,2,3,3,3,4,5,6,7,8) and $elt is 3, the subroutine
        > returns .363636... and @x is now (2,2,3,3,3,3,4,5,6,7,8). Why .363636?
        >
        > After insertion, @x has 11 elements. $elt (3) beats 2 of those elements and
        > ties 4 (including itself). Ties count as half-wins, so the new element beats
        > 2+(4/2) or 4 elements, and 4/11 is .3636...
        >
        > Below is one way to do it. I'd turn it into a subroutine if I wanted to use
        > it, but it's really inefficient. If I insert 100 elements and want the
        > percentile of each as its inserted, it has to do something like 50K
        > comparisons. The subroutine's also ugly, but ignore that.
        >
        > I'm guessing a subroutine that maintained a sorted list would work much
        > better.
        >

        I'd agree.


        > @mylist = (2,2,3,3,3,4,5,6,7,8);
        > $elt = 3;
        > push(@mylist,$elt); #insert the element first
        >
        > for $i (@mylist) {
        > if ($elt>$i) {$win++; next} # how many times does $elt beat list values
        > if ($elt<$i) {$lose++; next} # how many times does $elt lose to list value
        > $tie++; # ties count as half-wins, including the element I just inserted
        > }
        >
        > ($win+$tie/2)/($win+$lose+$tie);
        >

        It's not THAT ugly, although keeping the array sorted would be very
        helpful. You needn't keep track of the number of $lose, because you don't
        need $win + $lose + $tie (the size of the array): the size of the array can
        be gotten from the array itself in scalar context, as my function below
        shows.

        my @numbers = (1,2,5,5,8,9,9,9,9,10);
        my $next = 5;

        my $pct = insert_and_get_percentile(\@numbers, $next); # pass @numbers by
        reference

        # assumes that the input list is ALREADY SORTED
        sub insert_and_get_percentile {
        my ($data, $n) = @_;
        my $score = 0;

        push @$data, $n; # $n, the new value, is now at the END of the array

        # an old-school for loop for going through the data
        for (my $i = 0; $i < @$data; $i++) {

        # if data[x] is less than the new value add 1
        $score++ if $data->[$i] < $n;

        # if data[x] is equal to the new value, add 1/2
        $score += 0.5 if $data->[$i] == $n;

        # if data[x] is greater than the new value:
        # a) add 1/2 (for the new value itself),
        # b) move the new value to right before data[x] (via pop() and
        splice()),
        # c) and stop looping through the array
        $score += 0.5, splice(@$data, $i, 0, pop @$data), last if $data->[$i]
        > $n;

        }

        return $score / @$data;
        }

        --
        [Mary said,] "Do whatever he tells you." ~ John 2:5
        The Cross Reference - http://thecrossreference.blogspot.com/
        Nos autem praedicamus Christum crucifixum (1 Cor 1:23)


        [Non-text portions of this message have been removed]
      Your message has been successfully submitted and would be delivered to recipients shortly.