We've now been kicking the features 'index_of' and 'substring_index'
around for a while, and I think it's time to move towards a conclusion.

The major issue is whether the result for a failed search should be "0"
or "count + 1". I think there was enough diversity of opinion that we
should put this one to a vote, and I'm initiating a poll now.

There's a strong case to be made for retaining a result of "0", because
it preserves compatibility with ELKS 95, and with all except one of the
current implementations. Nevertheless, if "count + 1" is favoured
strongly enough then it's better to change now than later.

You should soon receive a message from the eGroups system inviting you
to cast your vote. The eGroups poll will run for three days, and is open
to all members of NICE. If you are unable to cast your vote within that
time, or have problems with the eGroups polling system, feel free to
email your vote to this list.

SUMMARY OF DISCUSSION

For your convenience, here's a summary of some of the points made in the
discussion. For full details, check the eGroups archives.
=== from Roger Browne, 12 July 2000 ===
ELKS 95 includes these header comments:
   index_of (c: CHARACTER; start: INTEGER): INTEGER
         -- Position of first occurrence of c at or after start;
         -- 0 if none.
   substring_index (other: STRING; start: INTEGER): INTEGER
         -- Position of first occurrence of other at or after start;
         -- 0 if none.
Both features are required to return a result of 0 if the requested
CHARACTER or STRING is not found.
ISE, HACT and VE follow this behaviour. However, SmallEiffel returns
"count + 1" if the requested CHARACTER or STRING is not found.
=== from James McKim, 13 July 2000 ===
I can and have lived with either, but I prefer SmallEiffel's version.
To me it's just a whole lot more intuitive.
=== from Peter Horan, 23 July 2000 ===
I have a strong preference for returning count + 1 on failure of
index_of based on the continuity of this choice.
[Peter then posted an example of splitting a STRING into its delimited
components, where a missing trailing delimiter was acceptable. Peter
showed how it was necessary to add the code "if separator_position = 0
then separator_position := s.count + 1 end" to make the code work.]
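[Peter's original code is in the archives and is not reproduced here. As a
rough illustration only, a loop of that general shape might look like the
sketch below. The feature name `print_fields' and all of its details are
my own assumptions, not Peter's code.]

```eiffel
print_fields (s: STRING; separator: CHARACTER) is
		-- Print each `separator'-delimited field of `s' on its own line.
		-- Hypothetical sketch: a missing trailing separator is accepted.
	require
		s_exists: s /= Void
	local
		field_start, separator_position: INTEGER
	do
		from
			field_start := 1
		until
			field_start > s.count
		loop
			separator_position := s.index_of (separator, field_start)
			if separator_position = 0 then
					-- Needed only where a failed search yields 0:
					-- treat the end of `s' as an implicit separator.
				separator_position := s.count + 1
			end
			print (s.substring (field_start, separator_position - 1))
			print ("%N")
			field_start := separator_position + 1
		end
	end
```

Where a failed search already yields "count + 1", the inner "if" never
fires, so a loop written this way should behave the same under either
convention.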
=== from Joachim Durchholz, 24 July 2000 ===
I have a strong preference for returning 0, for various reasons...
[Joachim's reasons included elegance and efficiency. He gave a code
example where the "0" case was simpler:]
...If all that interests me is whether I can find a substring somewhere,
I can write
if (some_complicated_expression).substring_index = 0 then
else I have to write
= (some_complicated_expression).count + 1
=== from Arno Wagner, 24 July 2000 ===
[Arno pointed out that it was very poor design to return an error flag
as a "special value" of the result data. I think most of us would agree
with this - but we are dealing with an existing library standard rather
than a "greenfield" design exercise, so we may need to settle for less
than the optimum design. Arno wrote:]
All in all I am strongly against using INTEGERS to express
BOOLEAN values if there is a reasonable way to use BOOLEANs directly.
Maybe we could keep 'index_of' and 'substring_index' for
compatibility (with a mandatory 'deprecated' warning) and
put new features along the lines I suggested above
[Arno's suggestion was to provide for command-query separation within
class STRING, by means of the following three features.]
=== from Joachim Durchholz, 24 July ===
...This would require a serious redesign of STRING ... Unfortunately
[the extra attribute(s)] would burden every single instance of STRING
... In particular, this would affect every single string literal in the
language. STRING is already a relatively slow class; I don't think that
we should do this.
=== from Peter Horan, 25 July ===
[quoting an earlier message from Joachim Durchholz:]
> 2) Returning "count + 1" is a rather arbitrary choice. For example, if I'm
I would not say it was arbitrary - I would say it was a "continuation"
function. The result "count + 1" is less unexpected than zero.
=== from Pierre Metras, 25 July ===
0 or any well defined constant `Not_Found' is better than count+1.
=== from Greg Compestine, 25 July ===
Would it be sufficient for the contract on index_of/etc to specify
that if the item is not in the string, then valid_index(Result) =
false? This covers both alternative values, 0 and count+1 and
eliminates the need for explicit range checks.
[Greg, that option is not present in the vote that I am starting today -
but it's the current way to write interoperable code, so in a sense it's
the standard idiom unless we can get an identical implementation across
all Eiffel compilers.]
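[A minimal sketch of that idiom follows. The feature name `found_at' is
hypothetical, and the guard on `start' is my own assumption, added because
implementations may differ on whether a start position of "count + 1" is
accepted.]

```eiffel
found_at (s: STRING; c: CHARACTER; start: INTEGER): BOOLEAN is
		-- Does `c' occur in `s' at or after `start'?
		-- Hypothetical helper: makes no assumption about whether
		-- a failed search yields 0 or `count + 1'.
	require
		s_exists: s /= Void
		start_large_enough: start >= 1
	do
		Result := start <= s.count and then
			s.valid_index (s.index_of (c, start))
	end
```

Both candidate failure values, 0 and "count + 1", lie outside 1..count, so
`valid_index' rejects either of them.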
=== from Joachim Durchholz, 26 July ===
Unfortunately, I see a very bad problem here: it's too easy to write code
that seems to be correct (i.e. passes all tests) but is nonportable (it
expects a zero result and will work fine until it's ported to an
implementation that will return count+1).
From this, I derive a pompously-named Unique Result Principle: Do not
allow multiple query results to mean the same thing...
=== from Peter Horan, 27 July ===
[revisiting the example that he introduced on 23 July:]
If I refactor the code, I might write in the body
   if has_from (s, field_start, separator) then
      separator_position := s.index_of (separator, field_start)
   else
      separator_position := s.count + 1
   end
   x.item_field := s.substring (field_start, separator_position - 1)
   has_from (s: STRING; field_start: INTEGER; separator: CHARACTER): BOOLEAN is
         -- Does s contain the separator at or after the field_start?
      do
         Result := field_start <= s.count and then
            s.index_of (separator, field_start) /= 0
      end
Hmm... Refactoring, if this is right, improves the code. My requirement
for count + 1 is a little weaker.
=== END ===
Roger Browne - roger@...
- Everything Eiffel
19 Eden Park Lancaster LA1 4SJ UK - Phone +44 1524 32428