
Call to vote: Return value of STRING 'index_of' and 'substring_index'

  • Roger Browne
    Message 1 of 1, Aug 7 12:13 PM
      We've now been kicking the features 'index_of' and 'substring_index'
      around for a while, and I think it's time to move towards a conclusion.

      The major issue is whether the result for a failed search should be "0"
      or "count + 1". I think there was enough diversity of opinion that we
      should put this one to a vote, and I'm initiating a poll now.

      There's a strong case to be made for retaining a result of "0", because
      it preserves compatibility with ELKS 95, and with all except one of the
      current implementations. Nevertheless, if "count + 1" is favoured
      strongly enough then it's better to change now than later.

      You should soon receive a message from the eGroups system inviting you
      to cast your vote. The eGroups poll will run for three days, and is open
      to all members of NICE. If you are unable to cast your vote within that
      time, or have problems with the eGroups polling system, feel free to
      email your vote to this list.


      For your convenience, here's a summary of some of the points made in the
      discussion. For full details, check the eGroups archives.

      === from Roger Browne, 12 July 2000 ===

      ELKS 95 includes these header comments:

      index_of (c: CHARACTER; start: INTEGER): INTEGER
      -- Position of first occurrence of c at or after start;
      -- 0 if none.

      substring_index (other: STRING; start: INTEGER): INTEGER
      -- Position of first occurrence of other at or after start;
      -- 0 if none.

      Both features are required to return a result of 0 if the requested
      CHARACTER or STRING is not found.

      ISE, HACT and VE follow this behaviour. However, SmallEiffel returns
      "count + 1" if the requested CHARACTER or STRING is not found.

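      [For illustration only: a minimal Python model of the two conventions,
      not code taken from any of the implementations named above. It assumes
      1-based positions as in Eiffel STRING and a `start` in 1 .. count + 1.
      The names index_of_elks, index_of_se and _find are hypothetical. Since
      Python's str.find accepts substrings as well as single characters, the
      same model also covers substring_index.]

```python
# Illustrative model of the two failed-search conventions (not real Eiffel).
# Positions are 1-based as in Eiffel STRING; `start` is assumed to lie in
# 1 .. count + 1. All function names here are hypothetical.


def _find(s: str, target: str, start: int) -> int:
    """1-based position of the first occurrence of target at or after start, or -1."""
    i = s.find(target, start - 1)  # str.find is 0-based and returns -1 on failure
    return -1 if i == -1 else i + 1


def index_of_elks(s: str, target: str, start: int) -> int:
    """ELKS 95 behaviour (ISE, HACT, VE): 0 if target is not found."""
    p = _find(s, target, start)
    return 0 if p == -1 else p


def index_of_se(s: str, target: str, start: int) -> int:
    """SmallEiffel behaviour: count + 1 if target is not found."""
    p = _find(s, target, start)
    return len(s) + 1 if p == -1 else p


if __name__ == "__main__":
    s = "a,b,c"
    print(index_of_elks(s, ",", 1), index_of_se(s, ",", 1))  # 2 2
    print(index_of_elks(s, ";", 1), index_of_se(s, ";", 1))  # 0 6
```

      The two models agree whenever the search succeeds and differ only in
      the value reported for a failed search.
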
      === from James McKim, 13 July 2000 ===

      I can and have lived with either, but I prefer SmallEiffel's version.
      To me it's just a whole lot more intuitive.

      === from Peter Horan, 23 July 2000 ===

      I have a strong preference for returning count + 1 on failure of
      index_of, based on the continuity of this choice.

      [Peter then posted an example of splitting a STRING into its delimited
      components, where a missing trailing delimiter was acceptable. Peter
      showed that it was necessary to add the code "if separator_position = 0
      then separator_position := s.count + 1 end" to make the code work.]
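
      [For illustration only: a hypothetical reconstruction in the spirit of
      Peter's example, not his actual code (which is in the archives). It
      models Eiffel's 1-based STRING positions in Python; the function names
      and the loop structure are assumptions. With the "count + 1"
      convention the last field needs no special case; with "0" it needs
      the extra line Peter described.]

```python
# Hypothetical reconstruction of a "split on separator" routine.
# Positions are 1-based as in Eiffel STRING; a missing trailing
# separator is acceptable. Names are illustrative only.
from typing import List


def index_of_elks(s: str, c: str, start: int) -> int:
    i = s.find(c, start - 1)
    return 0 if i == -1 else i + 1           # ELKS 95: 0 if none


def index_of_se(s: str, c: str, start: int) -> int:
    i = s.find(c, start - 1)
    return len(s) + 1 if i == -1 else i + 1  # SmallEiffel: count + 1 if none


def split_with_count_plus_one(s: str, separator: str) -> List[str]:
    fields = []
    field_start = 1
    while field_start <= len(s):
        separator_position = index_of_se(s, separator, field_start)
        # Equivalent of Eiffel s.substring (field_start, separator_position - 1)
        fields.append(s[field_start - 1:separator_position - 1])
        field_start = separator_position + 1
    return fields


def split_with_zero(s: str, separator: str) -> List[str]:
    fields = []
    field_start = 1
    while field_start <= len(s):
        separator_position = index_of_elks(s, separator, field_start)
        if separator_position == 0:          # the extra line needed for "0"
            separator_position = len(s) + 1
        fields.append(s[field_start - 1:separator_position - 1])
        field_start = separator_position + 1
    return fields
```

      Both functions produce the same fields; the difference is the single
      fix-up test in split_with_zero.
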

      === from Joachim Durchholz, 24 July 2000 ===

      I have a strong preference for returning 0, for various reasons...

      [Joachim's reasons included elegance and efficiency. He gave a code
      example where the "0" case was simpler:]

      ...if all that interests me is whether I can find a substring somewhere,
      I can write

      if (some_complicated_expression).substring_index = 0 then

      otherwise I have to write

      if (some_complicated_expression).substring_index
      = (some_complicated_expression).count + 1 then

      === from Arno Wagner, 24 July 2000 ===

      [Arno pointed out that it was very poor design to return an error flag
      as a "special value" of the result data. I think most of us would agree
      with this - but we are dealing with an existing library standard rather
      than a "greenfield" design exercise, so we may need to settle for less
      than the optimum design. Arno wrote:]

      All in all I am strongly against using INTEGERs to express
      BOOLEAN values if there is a reasonable way to use BOOLEANs directly.
      Maybe we could keep 'index_of' and 'substring_index' for
      compatibility (with a mandatory 'deprecated' warning) and
      put new features along the lines I suggested above
      into STRING.

      [Arno's suggestion was to provide for command-query separation within
      class STRING, by means of the following three features.]

      has_character: BOOLEAN
      has_substring: BOOLEAN
      last_found_index: INTEGER
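
      [For illustration only: a hypothetical Python sketch of the
      command-query separation idea, modelled as a wrapper class rather than
      a change to STRING itself. The attribute names has_character,
      has_substring and last_found_index come from Arno's list; the commands
      search_character and search_substring, the class name, and the choice
      to leave last_found_index unchanged after a failed search are
      assumptions. Positions are 1-based as in Eiffel.]

```python
# Hypothetical sketch of command-query separation for string searching.
# Only has_character, has_substring and last_found_index come from the post;
# the class and command names are assumptions.


class SearchableString:
    def __init__(self, text: str) -> None:
        self.text = text
        # Extra state carried by every instance (the cost Joachim objects to).
        self.has_character = False
        self.has_substring = False
        self.last_found_index = 0  # meaningful only after a successful search

    def search_character(self, c: str, start: int) -> None:
        """Command: look for c at or after 1-based start and record the outcome."""
        i = self.text.find(c, start - 1)
        self.has_character = i != -1
        if self.has_character:
            self.last_found_index = i + 1

    def search_substring(self, other: str, start: int) -> None:
        """Command: look for other at or after 1-based start and record the outcome."""
        i = self.text.find(other, start - 1)
        self.has_substring = i != -1
        if self.has_substring:
            self.last_found_index = i + 1


if __name__ == "__main__":
    s = SearchableString("key=value")
    s.search_character("=", 1)
    if s.has_character:              # ask a BOOLEAN query; no special value needed
        print(s.last_found_index)    # 4
```

      The client never inspects a sentinel INTEGER: success is a BOOLEAN
      query, and the position is read only when that query is true.
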

      === from Joachim Durchholz, 24 July ===

      ...This would require a serious redesign of STRING ... Unfortunately
      [the extra attribute(s)] would burden every single instance of STRING
      ... In particular, this would affect every single string literal in the
      language. STRING is already a relatively slow class; I don't think that
      we should do this.

      === from Peter Horan, 25 July ===

      [quoting an earlier message from Joachim Durchholz:]
      > 2) Returning "count + 1" is a rather arbitrary choice. For example, if I'm

      I would not say it was arbitrary - I would say it was a "continuation"
      of the function. The result "count + 1" is less unexpected than zero.

      === from Pierre Metras, 25 July ===

      0 or any well-defined constant `Not_Found' is better than count+1.

      === from Greg Compestine, 25 July ===

      Would it be sufficient for the contract on index_of/etc to specify
      that if the item is not in the string, then valid_index(Result) =
      false? This covers both alternative values, 0 and count+1 and
      eliminates the need for explicit range checks.

      [Greg, that option is not present in the vote that I am starting today -
      but it's the current way to write interoperable code, so in a sense it's
      the standard idiom unless we can get an identical implementation across
      all Eiffel compilers.]
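
      [For illustration only: a minimal Python model of the valid_index
      idiom. valid_index mirrors the Eiffel STRING query (true when
      1 <= i <= count); index_of_elks, index_of_se and found are hypothetical
      names. Because both 0 and count + 1 lie outside 1 .. count, one test
      covers either convention, and the search is evaluated only once.]

```python
# Illustrative model of the portable "not found" test via valid_index.
# valid_index mirrors Eiffel STRING.valid_index; other names are hypothetical.
from typing import Callable


def valid_index(s: str, i: int) -> bool:
    return 1 <= i <= len(s)


def index_of_elks(s: str, c: str, start: int) -> int:
    i = s.find(c, start - 1)
    return 0 if i == -1 else i + 1           # ELKS 95: 0 if none


def index_of_se(s: str, c: str, start: int) -> int:
    i = s.find(c, start - 1)
    return len(s) + 1 if i == -1 else i + 1  # SmallEiffel: count + 1 if none


def found(s: str, c: str, start: int,
          index_of: Callable[[str, str, int], int]) -> bool:
    # Portable: calls the search once and works under either convention.
    return valid_index(s, index_of(s, c, start))


if __name__ == "__main__":
    for index_of in (index_of_elks, index_of_se):
        print(found("abc", "b", 1, index_of), found("abc", "z", 1, index_of))
```
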

      === from Joachim Durchholz, 26 July ===

      Unfortunately, I see a very bad problem here: it's too easy to write
      code that seems to be correct (i.e. passes all tests) but is nonportable
      (i.e. it expects a zero result and will work fine until it's ported to
      an implementation that will return count+1).

      From this, I derive a pompously-named Unique Result Principle: Do not
      use multiple query results to mean the same thing...

      === from Peter Horan, 27 July ===

      [revisiting the example that he introduced on 23 July:]

      If I refactor the code, I might write in the body

      if has_from (s, field_start, separator) then
          separator_position := s.index_of (separator, field_start)
      else
          separator_position := s.count + 1
      end
      x.item_field := s.substring (field_start, separator_position - 1)


      has_from (s: STRING; field_start: INTEGER; separator: CHARACTER): BOOLEAN is
              -- Does s contain the separator at or after field_start?
          do
              Result := field_start <= s.count and then
                  s.index_of (separator, field_start) /= 0
          end

      Hmm... Refactoring, if this is right, improves the code. My requirement
      for count + 1 is a little weaker.
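
      [For illustration only: a Python rendering of the refactored code
      above, assuming the ELKS 95 "0 if none" convention inside has_from and
      Eiffel's 1-based positions. next_field is a hypothetical name for the
      refactored body; it returns the field and the separator position
      instead of assigning to x.item_field.]

```python
# Illustrative rendering of the has_from refactoring (ELKS 95 convention).
# Positions are 1-based; next_field is a hypothetical name.
from typing import Tuple


def index_of(s: str, c: str, start: int) -> int:
    i = s.find(c, start - 1)
    return 0 if i == -1 else i + 1  # ELKS 95: 0 if none


def has_from(s: str, field_start: int, separator: str) -> bool:
    """Does s contain the separator at or after field_start?"""
    # Python's `and` short-circuits like Eiffel's `and then`.
    return field_start <= len(s) and index_of(s, separator, field_start) != 0


def next_field(s: str, field_start: int, separator: str) -> Tuple[str, int]:
    """Return the field starting at field_start and the separator position."""
    if has_from(s, field_start, separator):
        separator_position = index_of(s, separator, field_start)
    else:
        # No separator left: the field runs to the end of s.
        separator_position = len(s) + 1
    return s[field_start - 1:separator_position - 1], separator_position
```

      The count + 1 fallback now appears once, in the else branch, at the
      cost of searching twice when the separator is present.
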

      === END ===

      Roger Browne - roger@... - Everything Eiffel
      19 Eden Park Lancaster LA1 4SJ UK - Phone +44 1524 32428