'left_adjust' and 'right_adjust'

  • Roger Browne
    Message 1 of 25 , Nov 1, 2000
      Let's now look at 'left_adjust' and 'right_adjust'. The purpose of these
      features is to remove leading and trailing spaces. Here are the ELKS95
      versions:

      left_adjust
      -- Remove leading white space.
      ensure
      new_count: (count /= 0) implies (item (1) /= ' ')

      right_adjust
      -- Remove trailing white space.
      ensure
      new_count: (count /= 0) implies (item (count) /= ' ')

      To keep this message conveniently short, I'll only show 'left_adjust'
      from now on (in each case, 'right_adjust' follows the same pattern).
      Here is the ISE/HACT version:

      left_adjust
      -- Remove leading whitespace.
      ensure
      new_count: (count /= 0) implies ((item (1) /= ' ')
      and (item (1) /= '%T') and (item (1) /= '%R')
      and (item (1) /= '%N'))

      As you can see, ISE/HACT consider "whitespace" to be blank, tab, return
      or newline.

      SE rephrases the header comment and the postcondition:

      left_adjust
      -- Remove leading blanks.
      ensure
      stripped: empty or else item(1) /= ' '

      VE treats ASCII codes 0 through 32 as "whitespace":

      left_adjust
      -- Remove leading white space
      ensure
      new_count : (count /= 0) implies (item (1).code > (' ').code)

      None of these specifications is complete, nor is this one from the
      COLZEN and McKIM SEP 99 proposals:

      left_adjust
      -- Remove leading white space.
      ensure
      shorter: count <= old count
      first_not_white_space: not is_empty implies
      ("%T%R%N ").index_of (first) = 0

      There are several issues to discuss regarding 'left_adjust' and
      'right_adjust':

      DEFINITION OF WHITESPACE
      ========================

      Is "whitespace" a blank (as in ELKS95), or blank/tab/return/newline (as
      implemented by ISE/HACT and in the 1999 proposals), or "blank or control
      character" as implemented by VE?

      I don't think the answer is too important, provided it's standard. These
      features are convenience features, not fundamental features. Whatever
      answer we choose, the user will need to "code it themselves" if their
      application requires a different answer.

      I note, by the way, that a SmallEiffel user recently requested that
      FORMFEED be added to the codes recognized by CHARACTER.is_separator
      (which already recognizes blank/tab/newline/return/null). No doubt a
      similar argument could be made to add NULL and FORMFEED (%U and %F) to
      the "whitespace" recognized by 'left/right_adjust'.

      My inclination is to avoid all these arguments, now and in the future,
      by adopting the arbitrary but simple VE decision that all ASCII codes <=
      32 are "whitespace". This nicely addresses the basic purpose of these
      features, which I guess is to "clean up sloppy user input".
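
To make the competing definitions concrete, here is a minimal sketch in Python (used only as an executable model; the thread itself is about Eiffel). The predicate names, the helper `left_adjusted` and the sample string are assumptions made for illustration, not ELKS features.

```python
# Executable model of three "whitespace" definitions discussed above.
# All names here are hypothetical; none is an ELKS or vendor feature.

def is_ws_elks95(c: str) -> bool:
    """ELKS95 (and SE) postcondition: only the blank counts."""
    return c == " "

def is_ws_ise_hact(c: str) -> bool:
    """ISE/HACT and the 1999 proposals: blank, tab, return, newline."""
    return c in " \t\r\n"

def is_ws_ve(c: str) -> bool:
    """VE: every character whose code is <= the code of blank (32)."""
    return ord(c) <= ord(" ")

def left_adjusted(s: str, is_ws) -> str:
    """Copy of s without its leading whitespace, as judged by is_ws."""
    i = 0
    while i < len(s) and is_ws(s[i]):
        i += 1
    return s[i:]

# blank, tab, NUL, blank, then text
sample = " \t\x00 text"
for name, pred in [("ELKS95", is_ws_elks95),
                   ("ISE/HACT", is_ws_ise_hact),
                   ("VE", is_ws_ve)]:
    print(name, repr(left_adjusted(sample, pred)))
```

On this sample the three definitions stop at different places, which is the interoperability problem in miniature.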

      FEATURE NAME
      ============

      In his September 99 proposal, James McKim wrote:

      > I've never been sure why this feature isn't called
      > `remove_leading_white_space'.

      Yeah, I don't think any of us know why it was called 'left_adjust'. But
      is it important enough to need a change, when the existing features are
      mostly interoperable?

      MAKING THE SPECIFICATION COMPLETE
      =================================

      In his September 99 proposal, James McKim explored a few possibilities.
      He wrote:

      > ... I don't see a way to give a complete, compilable spec without some
      > help. I can give a complete spec as a comment. Something like
      >
      > -- there_exists k, 1..old count
      > -- ((old clone(Current)).is_equal(old substring(1, k) + Current) and
      > -- for_all i, 1..k (("%T%R%N ").index_of (old item (i)) > 0)) and
      > -- (k = old count or else (("%T%R%N ").index_of (old item (k + 1)) = 0))
      >
      > Whew!
      >
      > The help I would need to write a compilable spec is a feature like:
      >
      > first_dark_index : INTEGER
      > -- Index of first character that is not white space, count + 1
      > -- if no such character.
      > ensure
      > empty_case: count = 0 implies Result = 1
      > dark_anchor: count > 0 and then
      > ("%T%R%N ").index_of (item (1)) = 0
      > implies Result = 1
      > recurse: count > 0 and then
      > ("%T%R%N ").index_of (item (1)) > 0 implies
      > Result = 1 + substring (2, count).first_dark_index
      >
      > Now the postcondition for `left_adjust' is just...
      >
      > (old clone (Current)).is_equal
      > (old substring(1, first_dark_index-1) + Current)
      >
      > Trouble is, I'm not convinced `first_dark_index' is independently
      > useful enough to be in the kernel.

      Nor am I!

      So far, we have completely specified every feature that we have reviewed
      from ARRAY and STRING. This is fantastic! But there's no getting away
      from the fact that we are not going to be able to do this for all the
      ELKS features (and probably not even for all the STRING features).

      We seem to have a few options open to us:

      (a) incomplete specifications
      -----------------------------

      We could specify 'left_adjust' with a postcondition something like this:

      ensure
      shorter: count <= old count
      first_not_white_space: not is_empty implies
      ("%T%R%N ").index_of (first) = 0
      rest_unchanged: is_equal((old clone(current)).
      substring(old count - count + 1, old count))

      This is correct, but incomplete because it doesn't guarantee that we
      didn't remove any non-blank leading characters.
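
The gap can be shown with a minimal sketch in Python (an executable model only, not Eiffel). The function name `postcondition_a` and the sample strings are assumptions for illustration. It evaluates the three clauses against a candidate result and shows that an over-eager result, one that also drops a non-blank character, is still accepted.

```python
WHITE = "\t\r\n "  # the characters of "%T%R%N "

def postcondition_a(old: str, new: str) -> bool:
    """Model of the three clauses of option (a)."""
    shorter = len(new) <= len(old)
    if not shorter:
        return False
    # index_of (first) = 0 means: the first character is not in WHITE
    first_not_white_space = new == "" or new[0] not in WHITE
    # substring (old count - count + 1, old count) is the last len(new) characters
    rest_unchanged = new == old[len(old) - len(new):]
    return first_not_white_space and rest_unchanged

print(postcondition_a("  ab", "ab"))   # the intended result
print(postcondition_a("  ab", "b"))    # also removed 'a'
print(postcondition_a("  ab", " ab"))  # left one blank behind
```

The first two calls both satisfy the postcondition; only the third is rejected.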

      (b) complete but non-compilable specifications
      ----------------------------------------------

      Something like Jim's "there exists" suggestion (which I must admit I
      haven't checked yet).
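
One way to check the "there exists" comment is to model it, for instance in Python (an executable sketch, not Eiffel; the function name and the `first_k` parameter are assumptions). Read literally, with k ranging over 1..old count, the comment appears to reject a string that has no leading white space at all, because no k >= 1 fits; letting k start at 0 seems to repair this. That is an observation about this sketch, not something stated in the proposal.

```python
WHITE = "\t\r\n "  # the characters of "%T%R%N "

def there_exists_spec(old: str, new: str, first_k: int = 1) -> bool:
    """Model of the commented spec; k is the number of characters removed."""
    for k in range(first_k, len(old) + 1):
        # (old clone (Current)).is_equal (old substring (1, k) + Current)
        prefix_plus_new = old == old[:k] + new
        # for_all i, 1..k: old item (i) is white space
        prefix_all_white = all(c in WHITE for c in old[:k])
        # k = old count or else old item (k + 1) is not white space
        stops_at_dark = k == len(old) or old[k] not in WHITE
        if prefix_plus_new and prefix_all_white and stops_at_dark:
            return True
    return False

print(there_exists_spec("  ab", "ab"))           # accepted
print(there_exists_spec("  ab", "b"))            # rejected: 'a' was removed
print(there_exists_spec("ab", "ab"))             # rejected when k starts at 1
print(there_exists_spec("ab", "ab", first_k=0))  # accepted when k starts at 0
```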

      (c) a combination of (a) and (b)
      --------------------------------

      Both (a) and (b) can be included if this is considered desirable.

      (d) add 'first/last_dark_index' to ELKS
      ---------------------------------------

      This would make the specification compilable, but I really would be
      reluctant to clutter up ELKS STRING with such specialized features.

      (e) use 'first/last_dark_index' without adding it to ELKS
      ---------------------------------------------------------

      We could adopt an extended form of specification which allows for "local
      functions" that make the specification rigorous, without requiring that
      those features are actually implemented by an ELKS-compliant library.
      For example:

      ensure
      left_adjusted: (old clone (Current)).is_equal
      (old substring(1, first_dark_index-1) + Current)
      using
      first_dark_index: INTEGER
      -- Index of first character that is not white space, count + 1
      -- if no such character.
      ensure
      empty_case: count = 0 implies Result = 1
      dark_anchor: count > 0 and then
      ("%T%R%N ").index_of (item (1)) = 0
      implies Result = 1
      recurse: count > 0 and then
      ("%T%R%N ").index_of (item (1)) > 0 implies
      Result = 1 + substring (2, count).first_dark_index
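
The recursive definition of 'first_dark_index' and the postcondition built on it can be modelled as follows. This is a sketch in Python rather than Eiffel, and `left_adjust_postcondition` is a hypothetical helper name; indices returned are 1-based to match the Eiffel text.

```python
WHITE = "\t\r\n "  # the characters of "%T%R%N "

def first_dark_index(s: str) -> int:
    """1-based index of the first non-white-space character,
    or len(s) + 1 if there is none."""
    if len(s) == 0:
        return 1                          # empty_case
    if s[0] not in WHITE:
        return 1                          # dark_anchor
    return 1 + first_dark_index(s[1:])    # recurse on substring (2, count)

def left_adjust_postcondition(old: str, new: str) -> bool:
    """old = old substring (1, first_dark_index - 1) + Current"""
    k = first_dark_index(old)
    return old == old[:k - 1] + new

print(first_dark_index("  ab"))
print(left_adjust_postcondition("  ab", "ab"))
print(left_adjust_postcondition("  ab", "b"))
```

Unlike option (a), this postcondition pins down exactly one acceptable result for each input.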

      (f) provide a "specimen implementation"
      ---------------------------------------

      Here's how it might work:

      left_adjust
      -- Remove leading white space
      specimen
      local
      i: INTEGER;
      do
      from
      i := 1
      invariant i >= 1 and i <= count + 1
      variant count + 1 - i
      until i = count + 1 or else item(i) > ' ' loop
      i := i + 1
      end
      keep_tail (count - i + 1)
      ensure
      shorter: count <= old count
      first_not_white_space: not is_empty implies
      ("%T%R%N ").index_of (first) = 0
      rest_unchanged: is_equal((old clone(current)).
      substring(old count - count + 1, old count))

      This approach has the advantage of providing an implementation that any
      vendor may use to become instantly ELKS compliant. Naturally, the
      specimen implementation will be the cleanest algorithm rather than the
      fastest code, so there is still plenty of space for vendors to innovate
      if they wish to do so.
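
For readers who want to run the specimen, here is a close transliteration into Python (a sketch; a list of one-character strings stands in for the mutable Eiffel STRING, and the function name is an assumption). Note that the loop test `item(i) > ' '` treats every code up to 32 as white space, which is wider than the four characters named in the postcondition.

```python
def left_adjust_specimen(chars: list) -> None:
    """Remove leading white space from chars in place.
    chars is a list of one-character strings (a stand-in for STRING)."""
    count = len(chars)
    i = 1                                   # 1-based, as in the Eiffel text
    while not (i == count + 1 or chars[i - 1] > " "):
        assert 1 <= i <= count + 1          # loop invariant
        variant_before = count + 1 - i
        i += 1
        # the variant decreases and stays non-negative
        assert 0 <= count + 1 - i < variant_before
    # keep_tail (count - i + 1): drop the first i - 1 characters
    del chars[: i - 1]

buf = list(" \t hi ")
left_adjust_specimen(buf)
print(repr("".join(buf)))
```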

      I must admit I rather like this approach.

      (g) add a corresponding function to ELKS
      ----------------------------------------

      If we add a function 'left_adjusted' to ELKS (which returns a new STRING
      equal to the old one with leading white space removed), we can specify
      this new function with a recursive postcondition. Then, the
      postcondition to 'left_adjust' becomes straightforward:

      ensure
      leading_white_space_removed: is_equal(old left_adjusted)

      What do you think?

      Regards,
      Roger
      --
      Roger Browne - roger@... - Everything Eiffel
      19 Eden Park Lancaster LA1 4SJ UK - Phone +44 1524 32428
    • Arno Wagner
      Message 2 of 25 , Nov 1, 2000
        --- Roger Browne wrote:
        > Let's now look at 'left_adjust' and 'right_adjust'.
        [Rest of post snipped, as it is long]

        Wow what a pace! Let's have a look at this:

        Regarding whitespace
        --------------------
        My opinion is that the task of defining what whitespace is
        belongs to CHARACTER. As long as we define a STRING to be a
        sequence of CHARACTERs, the string does not need to know
        about the attributes of individual elements. I would however
        accept the "ASCII <= 32" proposal as a working solution.

        Feature Name
        ------------
        The name is strange, but what the hell. I see no potential for
        misunderstanding. Let us just keep it.

        Complete Specification
        ----------------------
        We could do a recursive specification like the following:

        left_adjust
        -- Remove leading white space
        require
        string_not_void: s /= Void
        ensure
        not_modified_if_no_leading_whitespace:
        (old empty or else not old item(1).is_whitespace) implies
        is_equal(old clone)
        recurse: (not old empty and then item(1).is_whitespace) implies
        is_equal(old substring(2,count).left_adjust)

        The right_adjust variant can be constructed in the same way.

        The postcondition will be only partially executed, as the
        recursive call will be unchecked. However it ensures the
        correctness in the sense that

        1. The postcondition directly ensures that strings with no
        leading whitespaces are unmodified.
        2. If we have leading whitespace, removing one leading
        whitespace does not change the result of 'left_adjust'.
        (But will bring us one step closer to 1., as the
        'substring' call will remove a leading whitespace.
        Formal proof left as an exercise :-)).

        So while that does not give a full check, it provides some
        check and a complete specification. I believe we had recursive
        postconditions before, e.g. in 'occurrences' and 'has'.

        I also see some merit in the specimen approach, but as a
        recursive approach is feasible and we already used the
        technique I would propose to use the recursive specification
        in order to keep the standard as homogeneous as possible.

        I definitely think we should not include new features just
        to make pre/postconditions work! If we need new features,
        we should have a genuine need for them on their own merits.

        Regards
        Arno
        --------------------------------------------------------------------
        Arno Wagner Dipl. Inform. ETH Zuerich wagner@...
        GnuPG: F0C049F1 FP: 8C E0 6F A5 CC B1 5A 11 ED C7 AD D2 05 5E BB 6F
        Sig of the week: "If you can keep your calm while all others panic,
        you probably don't understand the situation"
      • Roger Browne
        Message 3 of 25 , Nov 1, 2000
          Arno Wagner wrote:

          > We could do a recursive specification like the following:
          >
          > left_adjust
          > -- Remove leading white space
          > require
          > string_not_void: s /= Void
          > ensure
          > not_modified_if_no_leading_whitespace:
          > (old empty or else not old item(1).is_whitespace) implies
          > is_equal(old clone)
          > recurse: (not old empty and then item(1).is_whitespace) implies
          > is_equal(old substring(2,count).left_adjust)

          Unfortunately, this approach cannot work for 'left_adjust', because it
          is a command (i.e. a procedure).

          But firstly: the "string_not_void" precondition is meaningless, as there
          is no 's'. We are applying 'left_adjust' to 'current', which can never
          be void. The rules of the Eiffel language prohibit us from applying a
          feature call to a void reference.

          The problem with the postcondition is in the last subexpression:

          is_equal(old substring(2,count).left_adjust)

          Because 'left_adjust' is a procedure, not a function, there is no way to
          call it within an assertion (the body of which is an expression, not an
          instruction).

          So I think a recursive specification of 'left_adjust' is not possible,
          and we are back to the options described in my previous message.

          Regards,
          Roger
          --
          Roger Browne - roger@... - Everything Eiffel
          19 Eden Park Lancaster LA1 4SJ UK - Phone +44 1524 32428
        • Arno Wagner
          Message 4 of 25 , Nov 1, 2000
            --- Roger Browne <egroups@e...> wrote:
            > Arno Wagner wrote:
            >
            > > We could do a recursive specification like the following:
            > >
            [...]
            > > is_equal(old substring(2,count).left_adjust)
            >
            > Unfortunately, this approach cannot work for 'left_adjust',
            > because it is a command (i.e. a procedure).
            >
            > But firstly: the "string_not_void" precondition is meaningless,
            > as there is no 's'. We are applying 'left_adjust' to 'current',
            > which can never be void. The rules of the Eiffel language
            > prohibit us from applying a feature call to a void reference.

            Right, forget this part.

            > The problem with the postcondition is in the last subexpression:
            >
            > is_equal(old substring(2,count).left_adjust)
            >
            > Because 'left_adjust' is a procedure, not a function, there is
            > no way to call it within an assertion (the body of which is an
            > expression, not an instruction).

            Right again. I was obviously confused when I wrote the post.

            > So I think a recursive specification of 'left_adjust' is not
            > possible, and we are back to the options described in my
            > previous message.

            I should have thought a little longer on this one. Oh well.

            Here is another possibility that is not recursive.
            From Roger's original message:
            > (a) incomplete specifications
            > -----------------------------
            > We could specify 'left_adjust' with a postcondition something
            > like this:
            >
            > ensure
            > shorter: count <= old count
            > first_not_white_space: not is_empty implies
            > ("%T%R%N ").index_of (first) = 0
            > rest_unchanged: is_equal((old clone(current)).
            > substring(old count - count + 1, old count))
            >
            > This is correct, but incomplete because it doesn't guarantee
            > that we didn't remove any non-blank leading characters.

            The part about non-blank characters can be corrected, if in an
            inelegant way:

            only_blanks_removed:
            old occurrences('%T') + old occurrences('%R') +
            old occurrences('%N') + old occurrences(' ') =
            occurrences('%T') + occurrences('%R') +
            occurrences('%N') + occurrences(' ') + old count - count

            Of course this needs explicit enumeration. For a small set of
            whitespace characters it is OK. For "ASCII <= 32" it gets very cumbersome.
            I think I would like the specimen better.
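
The counting clause can be tried out with a small Python model (a sketch, not Eiffel; `white_count` and the function name `only_blanks_removed` as a Python function are assumptions). It checks that the number of white space characters dropped equals the number of characters dropped. On its own it says nothing about where the characters were removed, so it is meant to be read together with 'rest_unchanged' from option (a).

```python
WHITE = "\t\r\n "  # the four characters enumerated in the clause

def white_count(s: str) -> int:
    """Sum of occurrences of each white space character in s."""
    return sum(s.count(c) for c in WHITE)

def only_blanks_removed(old: str, new: str) -> bool:
    """old white count = new white count + old count - count"""
    return white_count(old) == white_count(new) + len(old) - len(new)

print(only_blanks_removed("  ab", "ab"))  # two blanks removed
print(only_blanks_removed("  ab", "b"))   # a non-blank was removed too
print(only_blanks_removed(" a b", "ab"))  # passes, but an inner blank went missing
```

The second call is the case that option (a) failed to reject; the third shows why 'rest_unchanged' is still needed alongside it.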

            Regards
            Arno
          • Alex Shuksto
            Message 5 of 25 , Nov 2, 2000
              > DEFINITION OF WHITESPACE
              > ========================
              I don't think that defining 'whitespace' is the task of class
              STRING. Whitespace is a property of a character, so all STRING needs is a
              routine in CHARACTER that checks whether a character is whitespace or not.
              I also don't think it is a good idea to define all characters from 000 to
              032 as whitespace.
              > FEATURE NAME
              > ============
              The name is "left_adjust" (or "right_adjust"). If we ever need to, we can
              always mark it as obsolete and invent a new name.
              > MAKING THE SPECIFICATION COMPLETE
              > =================================
              I think that no single option can solve this problem by itself. The most
              "powerful" option is to provide a specimen implementation. Maybe the right
              way is to combine a specimen implementation with a corresponding function.
              As for adding something like first_dark_character: it would be a useless
              feature. We may need a function that returns the string without its leading
              whitespace, but we don't need a function that merely knows about it.
              --
              Quo fas et gloria ducunt?
              Alex.
            • Ignacio Calvo
              Message 6 of 25 , Nov 2, 2000
                This is a hard problem! :)
                After thinking for a while, I can only get one solution: creating two new
                functions, 'left_adjusted' and 'right_adjusted', with easy recursive
                postconditions the way Arno did, and then we could get:

                left_adjust is
                ...
                ensure
                is_left_adjusted: is_equal (old clone.left_adjusted)

                right_adjust is
                ...
                ensure
                is_right_adjusted: is_equal (old clone.right_adjusted)


                The new functions can be very useful; often one has no interest in
                keeping the trimmed string itself and only wants, for example, to
                pass it as an argument:

                parse (s.right_adjusted)

                Let's work a proposal, following closely Arno's try:

                left_adjusted: STRING is
                -- The same without leading whitespaces
                ensure
                result_not_void: Result /= Void
                same_if_no_leading_whitespaces:
                (old empty or else not old item (1).is_whitespace) implies
                Result.is_equal (old clone)
                recurse:
                (not old empty and then item (1).is_whitespace) implies
                Result.is_equal (old substring(2,count).left_adjusted)

                right_adjusted: STRING is
                -- The same without trailing whitespaces
                ensure
                result_not_void: Result /= Void
                same_if_no_trailing_whitespaces:
                (old empty or else not old item (count).is_whitespace) implies
                Result.is_equal (old clone)
                recurse:
                (not old empty and then item (count).is_whitespace) implies
                Result.is_equal (old substring (1, count - 1).right_adjusted)

                The parentheses that enclose the 'or else'/'and then' antecedents aren't
                necessary but clear up things.

                Experimental thing
                ------------------
                This sets the ground for another useful procedure/function pair: 'adjust'
                and 'adjusted', traditionally called 'trim'/'trimmed', with a trivial
                specification:

                adjust is
                -- Remove leading and trailing whitespaces
                ensure
                is_adjusted: is_equal (old clone.adjusted)

                adjusted: STRING is
                -- The same without leading or trailing whitespaces
                ensure
                result_not_void: Result /= Void
                result_adjusted:
                Result.is_equal (old clone.left_adjusted.right_adjusted)

                I think these are the most required "whitespace elimination" features, and
                that's why I think we should consider adding it to ELKS (not now if there is
                a minimal opposition, but when STRING or ELKS is near to be finished).
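
The recursive function specifications above translate almost line for line into a runnable model. The sketch below is Python, not Eiffel, and `is_whitespace` is an assumed predicate (the thread notes elsewhere that ELKS95 CHARACTER has no such feature); the set of characters it accepts is chosen only for illustration.

```python
def is_whitespace(c: str) -> bool:
    """Assumed stand-in for a CHARACTER query; blank, tab, return, newline."""
    return c in " \t\r\n"

def left_adjusted(s: str) -> str:
    """The same string without leading whitespace."""
    if s == "" or not is_whitespace(s[0]):
        return s                        # same_if_no_leading_whitespaces
    return left_adjusted(s[1:])         # recurse on substring (2, count)

def right_adjusted(s: str) -> str:
    """The same string without trailing whitespace."""
    if s == "" or not is_whitespace(s[-1]):
        return s                        # same_if_no_trailing_whitespaces
    return right_adjusted(s[:-1])       # recurse on substring (1, count - 1)

def adjusted(s: str) -> str:
    """The same string without leading or trailing whitespace."""
    return right_adjusted(left_adjusted(s))

print(repr(left_adjusted(" \t hi ")))
print(repr(right_adjusted(" hi \n")))
print(repr(adjusted("  hi  ")))
```

Because these are side-effect-free functions, a procedure such as 'left_adjust' can then be specified simply by comparing the new state with the function applied to the old one.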

                Saludos from Spain!
                Ignacio Calvo
              • Ignacio Calvo
                Message 7 of 25 , Nov 2, 2000
                  Oh sorry, Roger, I didn't read the last of your suggestions, which was
                  precisely what I say in my message... anyway the specification work and
                  'adjust'/'adjusted' suggestion remain original ;-)

                  Saludos from Spain!
                  Ignacio Calvo
                • Arno Wagner
                  Message 8 of 25 , Nov 3, 2000
                    --- "Ignacio Calvo" <icalvo@i...> wrote:
                    > Oh sorry, Roger, I didn't read the last of your suggestions,
                    > which was precisely what I say in my message... anyway the
                    > specification work and 'adjust'/'adjusted' suggestion remain
                    > original ;-)
                    >
                    > Saludos from Spain!
                    > Ignacio Calvo

                    Strangely, I also overlooked the last suggestion. I think it is the
                    best option so far: 'left_adjusted' and 'right_adjusted' have
                    enough merit of their own, and this removes our specification
                    problem quite elegantly.

                    And, as an added benefit, we do _not_ have to specify within
                    STRING what whitespace is!

                    I am not sure we should include 'adjusted' now. There are further
                    whitespace removal features, e.g. a 'compact' that compresses
                    sequences of whitespace into just one and a 'unify_whitespace' that
                    maps every whitespace character to ' ', to name two I can think of.
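
As a rough illustration of what those two features might do, here is a sketch in Python (not Eiffel). Only the names 'compact' and 'unify_whitespace' come from the message; the behaviour shown is an assumption, in particular the choice that 'compact' keeps the first character of each run and that whitespace means blank, tab, return or newline.

```python
import re

def compact(s: str) -> str:
    """Compress each run of whitespace to a single character.
    Assumption: the first character of the run is the one kept."""
    return re.sub(r"[ \t\r\n]+", lambda m: m.group(0)[0], s)

def unify_whitespace(s: str) -> str:
    """Map every whitespace character to a blank."""
    return re.sub(r"[ \t\r\n]", " ", s)

print(repr(compact("a  \t b")))
print(repr(unify_whitespace("a\tb\nc")))
```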

                    Regards,
                    Arno
                    ---------------------------------------------------------------------
                    Arno Wagner Dipl. Inform. ETH Zuerich wagner@...
                    GnuPG: ID:F0C049F1 FP:8C E0 6F A5 CC B1 5A 11 ED C7 AD D2 05 5E BB 6F
                    Sig of the week: "If you can keep your calm while all others panic,
                    you probably don't understand the situation"
                  • Roger Browne
                    Message 9 of 25 , Nov 3, 2000
                      Alex Shuksto wrote:

                      > ... I don't think that it is a good idea to define
                      > all characters from 000 to 032 as whitespaces.

                      What is your preferred alternative, and why is it better?

                      Here are two other possibilities. Both are based on the definitions in
                      the standard C library <ctype.h>.

                      (1) Use the same definition as "isspace(c)", i.e. trim the following
                      characters: blank, formfeed, newline, return, tab, vertical tab.

                      (2) Use the same definition as "!isgraph(c)", i.e. trim the characters
                      with ASCII codes 0 to 31 inclusive (the control characters), and 32
                      (blank), and 127 (DEL).

                      The first option seems a good choice if 'left_adjust' and 'right_adjust'
                      are to be maximally useful for tasks such as extracting words from a
                      STRING.

                      The second option seems a good choice if 'left_adjust' and
                      'right_adjust' are to be maximally useful for tasks such as cleaning up
                      sloppy interactive user input.
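
The difference between the two options shows up as soon as a control character is involved. The sketch below is Python (an executable model, not Eiffel or C); the predicate and helper names are assumptions, and the sets are the ASCII definitions listed above.

```python
ISSPACE = set(" \f\n\r\t\v")   # option (1): blank, formfeed, newline, return, tab, vertical tab

def is_space(c: str) -> bool:
    """Model of isspace(c) for ASCII input."""
    return c in ISSPACE

def is_not_graph(c: str) -> bool:
    """Model of !isgraph(c) for ASCII input: codes 0..32 and 127."""
    return ord(c) <= 32 or ord(c) == 127

def left_adjusted(s: str, is_ws) -> str:
    """Copy of s without its leading whitespace, as judged by is_ws."""
    i = 0
    while i < len(s) and is_ws(s[i]):
        i += 1
    return s[i:]

sample = "\x04 \thello"        # ctrl-D, blank, tab, then text
print(repr(left_adjusted(sample, is_space)))      # stops at ctrl-D
print(repr(left_adjusted(sample, is_not_graph)))  # removes ctrl-D as well
```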

                      > I don't think that definition of 'whitespaces' is the
                      > task of class STRING. Whitespace is a character, so
                      > all STRING need is a routine in
                      > CHARACTER to check is it a whitespace or not.

                      I had a look at what is already in class CHARACTER. There are no
                      relevant features in ELKS95 CHARACTER or in the VE implementation.

                      ISE and HACT and SE implement:

                      is_alpha
                      is_digit
                      is_lower
                      is_upper

                      In addition, SE implements these:

                      is_letter (same as 'is_alpha')
                      is_decimal_digit (same as 'is_digit')
                      is_binary_digit
                      is_octal_digit
                      is_hexadecimal_digit
                      is_separator
                      is_letter_or_digit
                      is_ascii
                      is_bit (same as 'is_binary_digit')

                      Of these, the interesting one is 'is_separator' which (in the next SE
                      release) will return true for the following codes: blank, tab, newline,
                      return, null and formfeed.

                      TowerEiffel used to provide a class with features corresponding to those
                      in <ctype.h>, and I recall that it was a popular class. It made it easier
                      to port software from C. It also made it easier to wrap external C
                      software, because the Eiffel features would often be suitable for use as
                      preconditions to the external C features.

                      Here's a possible approach:

                      (1) In CHARACTER, provide classification functions. The definitions
                      would match those in <ctype.h>. The Eiffel names could be the
                      same, or could follow the Eiffel naming style and existing vendor
                      usage, e.g.:

                      <ctype.h> name possible name in class CHARACTER
                      isalnum(c)     is_letter_or_digit
                      isalpha(c)     is_letter
                      iscntrl(c)     is_control
                      isdigit(c)     is_digit
                      isgraph(c)     is_graph             -- a poor name, I know
                      islower(c)     is_lower
                      isprint(c)     is_print             -- another poor name
                      ispunct(c)     is_punctuation
                      isspace(c)     is_separator
                      isupper(c)     is_upper
                      isxdigit(c)    is_hexadecimal_digit

                      (2) In STRING, specify 'left_adjust' and 'right_adjust' in terms of
                      these.

                      Is there any interest in this approach?

                      If so, I have no objection to sorting out the names of the CHARACTER
                      features at this point in the ELKS process. But if people want to use
                      different definitions to those used by C, then it becomes a bigger task
                      and I'd rather get on with STRING first (perhaps just adding a comment
                      to 'left_adjust' and 'right_adjust' stating that we plan to refine these
                      features in a future ELKS after CHARACTER has been reviewed).

                      Regards,
                      Roger
                      --
                      Roger Browne - roger@... - Everything Eiffel
                      19 Eden Park Lancaster LA1 4SJ UK - Phone +44 1524 32428
                    • Alexander Kogtenkov
                      Message 10 of 25 , Nov 3, 2000
                        Arno Wagner wrote:

                        > ... There are further
                        > whitespace removal features, e.g. a 'compact' that compresses
                        > sequences of whitespace in just one and a 'unify_whitespace' that
                        > maps every whitespace character to ' ', to name two I can think of.

                        BTW, the features we are discussing might be useful not only for
                        whitespace. Instead we could drop the notion of whitespace
                        altogether and ask the user to specify which characters to remove.
                        E.g., for "left_adjust" we could have

                        left_adjust (characters_to_remove_from_the_left: STRING) is
                        ...

                        I _do not_ propose to make such a change for the existing features,
                        these are just ideas for the future.
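
For comparison, Python's built-in `str.lstrip` already works this way: it takes an optional string whose characters are treated as the set to remove. The wrapper below is only a sketch (Python, not Eiffel); it borrows the argument name from the signature above and returns a new string rather than modifying the target in place.

```python
def left_adjusted(s: str, characters_to_remove_from_the_left: str) -> str:
    """Copy of s with every leading character that occurs in the
    given set removed; stops at the first character not in the set."""
    return s.lstrip(characters_to_remove_from_the_left)

print(repr(left_adjusted(" \thi ", " \t")))   # classic whitespace trimming
print(repr(left_adjusted("0042", "0")))       # strip leading zeros
print(repr(left_adjusted("xxhixx", "x")))     # only the left side is touched
```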

                        Regards,
                        Alexander Kogtenkov
                        Object Tools, Moscow
                      • Arno Wagner
                        Message 11 of 25 , Nov 3, 2000
                          Roger Browne wrote:
                          > Alex Shuksto wrote:
                          >
                          > > ... I don't think that it is a good idea to define
                          > > all characters from 000 to 032 as whitespaces.
                          >
                          > What is your preferred alternative, and why is it better?
                          >
                          > Here are two other possibilities. Both are based on the
                          > definitions in the standard C library <ctype.h>.
                          >
                          > (1) Use the same definition as "isspace(c)", i.e. trim the
                          > following characters: blank, formfeed, newline, return,
                          > tab, vertical tab.
                          >

                          I think this would be a good choice.

                          > (2) Use the same definition as "!isgraph(c)", i.e. trim the
                          > characters with ASCII codes 0 to 31 inclusive (the control
                          > characters), and 32 (blank), and 127 (DEL).
                          >

                          I don't know. If there are control characters (other than those
                          that are an 'isspace' as well) in the STRING, maybe they serve
                          some purpose and should not just be ignored. One is not very
                          likely to type, e.g., a Ctrl-D by accident.
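To make the difference between the two options concrete, the following C sketch counts how many 7-bit ASCII codes each definition would trim. The helper names are hypothetical; the only library calls are `isspace` and `isgraph` from `<ctype.h>`, evaluated in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Option (1): trim what isspace() accepts. */
static int trims_under_isspace(int code) { return isspace(code) != 0; }

/* Option (2): trim what isgraph() rejects. */
static int trims_under_not_isgraph(int code) { return !isgraph(code); }

/* Count the 7-bit ASCII codes (0..127) accepted by a predicate. */
static int count_ascii(int (*pred)(int))
{
    assert(pred != NULL);
    int n = 0;
    for (int code = 0; code < 128; code++)
        if (pred(code))
            n++;
    return n;
}
```

In the "C" locale, option (1) covers 6 codes (blank, formfeed, newline, return, tab, vertical tab) and option (2) covers 34 (codes 0 to 32 plus 127). Ctrl-D (code 4) is trimmed only under option (2).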

                          > > I don't think that definition of 'whitespaces' is the
                          > > task of class STRING. Whitespace is a character, so
                          > > all STRING need is a routine in
                          > > CHARACTER to check is it a whitespace or not.
                          >
                          > I had a look at what is already in class CHARACTER. There are no
                          > relevant features in ELKS95 CHARACTER or in the VE implementation.
                          >
                          [...]
                          > Of these, the interesting one is 'is_separator' which (in the
                          > next SE release) will return true for the following codes:
                          > blank, tab, newline, return, null and formfeed.
                          >
                          [...]
                          > Here's a possible approach:
                          >
                          > (1) In CHARACTER, provide classification functions. The
                          > definitions would match those in <ctype.h>. The Eiffel
                          > names could be the same, or could follow the Eiffel
                          > naming style and existing vendor usage, e.g.:
                          >
                          > <ctype.h> name   possible name in class CHARACTER
                          > isalnum(c)       is_letter_or_digit
                          > isalpha(c)       is_letter
                          > iscntrl(c)       is_control
                          > isdigit(c)       is_digit
                          > isgraph(c)       is_graph   -- a poor name, I know
                          > islower(c)       is_lower
                          > isprint(c)       is_print   -- another poor name
                          > ispunct(c)       is_punctuation
                          > isspace(c)       is_separator
                          > isupper(c)       is_upper
                          > isxdigit(c)      is_hexadecimal_digit
                          >
                          > (2) In STRING, specify 'left_adjust' and 'right_adjust' in
                          > terms of these.
                          >
                          > Is there any interest in this approach?

                          I like this approach. We use something that has been used
                          successfully for a long time and get the benefit that we
                          don't have to make some definitions in STRING that don't
                          belong there IMHO. I would also propose to explicitly
                          name the basic C functions in the header comments, e.g.:

                          is_separator: BOOLEAN
                          -- Returns true on ' ','%F','%N','%R','%T','%/11/'
                          -- Follows the ANSI/ISO C function "isspace()"

                          There does not seem to be a short form for the vertical tab.
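As a rough illustration of specifying 'left_adjust' and 'right_adjust' in terms of a character classification, here is a C sketch built on `isspace`. The function names mirror the Eiffel features but are hypothetical C stand-ins that work in place on a NUL-terminated buffer; the asserts play the role of the postconditions. The vertical tab, written '%/11/' above, is `'\v'` in C.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for CHARACTER.is_separator, defined as isspace():
   ' ', '\f', '\n', '\r', '\t', '\v' in the "C" locale. */
static int is_separator(char c)
{
    /* The cast avoids undefined behaviour when char is signed and negative. */
    return isspace((unsigned char)c) != 0;
}

/* Remove leading separators in place. */
static void left_adjust(char *s)
{
    size_t skip = 0;
    while (is_separator(s[skip]))   /* '\0' is not a separator, so the loop ends */
        skip++;
    memmove(s, s + skip, strlen(s + skip) + 1);
    assert(s[0] == '\0' || !is_separator(s[0]));    /* postcondition */
}

/* Remove trailing separators in place. */
static void right_adjust(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && is_separator(s[n - 1]))
        n--;
    s[n] = '\0';
    assert(n == 0 || !is_separator(s[n - 1]));      /* postcondition */
}
```

Under this definition a leading Ctrl-D (code 4) is left alone, which matches the preference for option (1) expressed above.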

                          > If so, I have no objection to sorting out the names of the
                          > CHARACTER features at this point in the ELKS process. But
                          > if people want to use different definitions to those used
                          > by C, then it becomes a bigger task and I'd rather get on
                          > with STRING first (perhaps just adding a comment to 'left_adjust'
                          > and 'right_adjust' stating that we plan to refine these
                          > features in a future ELKS after CHARACTER has been reviewed).

                          I think we should just take these names and move on. I really
                          don't see how we could come up with something significantly
                          better for static classification. The proposal of Alexander is
                          something we could do in a future version of ELKS, but then
                          we should use a name like

                          left_adjust_with_set (set_of_separators: STRING)

                          Other opinions?
                          Could we have a vote on the whole package of features?

                          Regards,
                          Arno

                          ---------------------------------------------------------------------
                          Arno Wagner Dipl. Inform. ETH Zuerich wagner@...
                          GnuPG: ID:F0C049F1 FP:8C E0 6F A5 CC B1 5A 11 ED C7 AD D2 05 5E BB 6F
                          "The early bird gets the worm, but the second mouse gets the cheese."
                        • Simon Parker
                          Message 12 of 25 , Nov 3, 2000
                            Good afternoon.

                            On Friday, November 03, 2000 11:00 AM, Roger Browne
                            [SMTP:egroups@...] wrote:
                            > Alex Shuksto wrote:
                            >
                            > > ... I don't think that it is a good idea to define
                            > > all characters from 000 to 032 as whitespaces.
                            >
                            > What is your preferred alternative, and why is it better?
                            >
                            > Here are two other possibilities. Both are based on the definitions in
                            > the standard C library <ctype.h>.
                            >
                            > (1) Use the same definition as "isspace(c)", i.e. trim the following
                            > characters: blank, formfeed, newline, return, tab, vertical tab.

                            <aside>
                            I think this is the first mention of the word 'trim' in this discussion.
                            I've always known this operation as 'trimming', and I can't think of a
                            language or library other than Eiffel which uses the name 'adjust'. Have I
                            spent too long writing VAX COBOL and BASIC?

                            I might revisit this in a renaming proposal at the appropriate point...
                            </aside>

                            [...]
                            >
                            > Here's a possible approach:
                            >
                            > (1) In CHARACTER, provide classification functions. The definitions
                            > would match those in <ctype.h>. The Eiffel names could be the
                            > same, or could follow the Eiffel naming style and existing vendor
                            > usage, e.g.:
                            >
                            > <ctype.h> name   possible name in class CHARACTER
                            > isalnum(c)       is_letter_or_digit
                            > isalpha(c)       is_letter
                            > iscntrl(c)       is_control
                            > isdigit(c)       is_digit
                            > isgraph(c)       is_graph   -- a poor name, I know
                            > islower(c)       is_lower
                            > isprint(c)       is_print   -- another poor name
                            > ispunct(c)       is_punctuation
                            > isspace(c)       is_separator
                            > isupper(c)       is_upper
                            > isxdigit(c)      is_hexadecimal_digit
                            >
                            > (2) In STRING, specify 'left_adjust' and 'right_adjust' in terms of
                            > these.
                            >
                            > Is there any interest in this approach?

                            Spot on!

                            I think that means preserving the names with the standard modification of
                            'is*' to 'is_*'.
                            Now or later, we could add synonyms for the really ugly ones. How about
                            this:

                            > isalnum(c)       is_alnum, is_alphanumeric, is_letter_or_digit
                            > isalpha(c)       is_alpha, is_letter
                            > iscntrl(c)       is_cntrl, is_control
                            > isdigit(c)       is_digit
                            > isgraph(c)       is_graph   -- a poor name, I know
                            > islower(c)       is_lower
                            > isprint(c)       is_print, is_printable
                            > ispunct(c)       is_punct, is_punctuation
                            > isspace(c)       is_space, is_white, is_whitespace
                            > isupper(c)       is_upper
                            > isxdigit(c)      is_xdigit, is_hexadecimal_digit

                            I am at a loss for a sensible alternative to 'is_graph'. Perhaps if we
                            ignore it, it will go away...

                            I've dropped 'is_separator' because I think the notion of separators is too
                            context-dependent. What if I'm only interested in line separators? Or I
                            consider a hyphen '-' to be a word separator too? We should save that for a
                            scanner class.

                            >
                            > If so, I have no objection to sorting out the names of the CHARACTER
                            > features at this point in the ELKS process. But if people want to use
                            > different definitions to those used by C, then it becomes a bigger task
                            [...]

                            A final observation: a more lasting classification scheme may be that
                            associated with Unicode rather than ASCII. Let's not get stuck in this, but
                            grab these established, standardised names and move on. As I said, the
                            synonyms are optional.

                            >
                            > Regards,
                            > Roger

                            Regards,
                            Simon

                            Simon Parker +353 87 249 7859
                          • Jeff Clark
                            Message 13 of 25 , Nov 3, 2000
                              Roger Browne wrote:

                              > Here's a possible approach:
                              >
                              > (1) In CHARACTER, provide classification functions. The definitions
                              > would match those in <ctype.h>. The Eiffel names could be the
                              > same, or could follow the Eiffel naming style and existing vendor
                              > usage, e.g.:
                              >
                              > <ctype.h> name   possible name in class CHARACTER
                              > isalnum(c)       is_letter_or_digit
                              > isalpha(c)       is_letter
                              > iscntrl(c)       is_control
                              > isdigit(c)       is_digit
                              > isgraph(c)       is_graph   -- a poor name, I know
                              > islower(c)       is_lower
                              > isprint(c)       is_print   -- another poor name
                              > ispunct(c)       is_punctuation
                              > isspace(c)       is_separator
                              > isupper(c)       is_upper
                              > isxdigit(c)      is_hexadecimal_digit
                              >
                              > (2) In STRING, specify 'left_adjust' and 'right_adjust' in terms of
                              > these.
                              >
                              > Is there any interest in this approach?

                              I like this approach.

                              May I suggest "is_graphic" and "is_printable"? Also, "is_hexadecimal_digit"
                              is a little long-winded. How about "is_hex_digit"? The meaning is still
                              clear and it keeps the name to a reasonable length.
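The mapping from readable names to the `<ctype.h>` definitions can be sketched in C as thin wrappers. The wrapper names below are the ones suggested in this thread; writing them as C functions is purely illustrative, since the real features would live in Eiffel's class CHARACTER.

```c
#include <assert.h>
#include <ctype.h>

/* Each wrapper casts to unsigned char first, because passing a negative
   char value to a <ctype.h> function is undefined behaviour. */
static int is_letter_or_digit(char c) { return isalnum((unsigned char)c) != 0; }
static int is_graphic(char c)         { return isgraph((unsigned char)c) != 0; }
static int is_printable(char c)       { return isprint((unsigned char)c) != 0; }
static int is_hex_digit(char c)       { return isxdigit((unsigned char)c) != 0; }

/* Spot checks showing where the classifications differ. */
static void naming_self_check(void)
{
    assert(is_printable(' ') && !is_graphic(' '));  /* blank prints but is not graphic */
    assert(is_hex_digit('f') && !is_hex_digit('g'));
    assert(is_letter_or_digit('7') && !is_letter_or_digit('_'));
}
```

Only the names change; the behaviour stays exactly that of the C functions, which is the point of adopting the existing definitions.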

                              --
                              Jeff Clark | SDRC Metaphase Technology Group
                              "Too soon old, too late smart" | http://www.sdrc.com
                              mailto:jlc@... | http://www.linuxfan.com/~jeffclark
                              ------------------------------------------------------------------------------
                              You saw "The Matrix", right? Be advised that learning Eiffel is like
                              taking the red pill, and ignoring it is like taking the blue pill.
                            • Alex Shuksto
                              Message 14 of 25 , Nov 3, 2000
                                > What is your preferred alternative, and why is it better?
                                >
                                > Here are two other possibilities. Both are based on the definitions in
                                > the standard C library <ctype.h>.
                                >
                                > (1) Use the same definition as "isspace(c)", i.e. trim the following
                                > characters: blank, formfeed, newline, return, tab, vertical tab.
                                >
                                > (2) Use the same definition as "!isgraph(c)", i.e. trim the characters
                                > with ASCII codes 0 to 31 inclusive (the control characters), and 32
                                > (blank), and 127 (DEL).
                                >
                                > The first option seems a good choice if 'left_adjust' and 'right_adjust'
                                > are to be maximally useful for tasks such as extracting words from a
                                > STRING.
                                >
                                > The second option seems a good choice if 'left_adjust' and
                                > 'right_adjust' are to be maximally useful for tasks such as cleaning up
                                > sloppy interactive user input.
                                For me the choice is (1), because the task of cleaning up user input
                                depends on our main goal, and deleting characters such as <Ctrl-D> etc. by
                                default is the wrong approach (I think).
                                > I had a look at what is already in class CHARACTER. There are no
                                > relevant features in ELKS95 CHARACTER or in the VE implementation.
                                >
                                > ISE and HACT and SE implement:
                                >
                                > is_alpha
                                > is_digit
                                > is_lower
                                > is_upper
                                >
                                > In addition, SE implements these:
                                >
                                > is_letter (same as 'is_alpha')
                                > is_decimal_digit (same as 'is_digit')
                                > is_binary_digit
                                > is_octal_digit
                                > is_hexadecimal_digit
                                > is_separator
                                > is_letter_or_digit
                                > is_ascii
                                > is_bit (same as 'is_binary_digit')
                                >
                                > Of these, the interesting one is 'is_separator' which (in the next SE
                                > release) will return true for the following codes: blank, tab, newline,
                                > return, null and formfeed.
                                Really a good idea. There was an opinion that a separator is a notion which
                                the user must define himself. That's the Perl point of view and it's good, but I
                                have no idea how to implement it now.
                                >
                                > TowerEiffel used to provide a class with features corresponding to those
                                > in <ctype.h>, and I recall that it was a popular class. It made it easier
                                > to port software from C. It also made it easier to wrap external C
                                > software, because the Eiffel features would often be suitable for use as
                                > preconditions to the external C features.
                                >
                                > Here's a possible approach:
                                >
                                > (1) In CHARACTER, provide classification functions. The definitions
                                > would match those in <ctype.h>. The Eiffel names could be the
                                > same, or could follow the Eiffel naming style and existing vendor
                                > usage, e.g.:
                                >
                                > <ctype.h> name   possible name in class CHARACTER
                                > isalnum(c)       is_letter_or_digit
                                > isalpha(c)       is_letter
                                > iscntrl(c)       is_control
                                > isdigit(c)       is_digit
                                > isgraph(c)       is_graph   -- a poor name, I know
                                > islower(c)       is_lower
                                > isprint(c)       is_print   -- another poor name
                                > ispunct(c)       is_punctuation
                                > isspace(c)       is_separator
                                > isupper(c)       is_upper
                                > isxdigit(c)      is_hexadecimal_digit
                                >
                                > (2) In STRING, specify 'left_adjust' and 'right_adjust' in terms of
                                > these.
                                >
                                > Is there any interest in this approach?
                                Yes. It will be very useful.
                                >
                                > If so, I have no objection to sorting out the names of the CHARACTER
                                > features at this point in the ELKS process. But if people want to use
                                > different definitions to those used by C, then it becomes a bigger task
                                > and I'd rather get on with STRING first (perhaps just adding a comment
                                > to 'left_adjust' and 'right_adjust' stating that we plan to refine these
                                > features in a future ELKS after CHARACTER has been reviewed).
                                Of course. I'm confident in that.
                                --
                                Quo fas et gloria ducunt?
                                Alex.
                              • Alex Shuksto
                                Message 15 of 25 , Nov 3, 2000
                                  > <ctype.h> name   possible name in class CHARACTER
                                  > isalnum(c)       is_letter_or_digit
                                  > isalpha(c)       is_letter
                                  > iscntrl(c)       is_control
                                  > isdigit(c)       is_digit
                                  > isgraph(c)       is_graph   -- a poor name, I know
                                  > islower(c)       is_lower
                                  > isprint(c)       is_print   -- another poor name
                                  > ispunct(c)       is_punctuation
                                  > isspace(c)       is_separator
                                  > isupper(c)       is_upper
                                  > isxdigit(c)      is_hexadecimal_digit
                                  And about the names: all are OK (except perhaps is_upper & is_lower; I
                                  think is_upper_case & is_lower_case would be a better choice), but there are
                                  two poor names, is_graph & is_print. I can say that

                                  is_graph = is_print and not is_space

                                  (I know that there is no such routine, but...) so, do we really need
                                  is_graph? Why is space "higher" than other separators? If we introduce the
                                  notion of separator, we must use it! And if we substitute is_space for
                                  is_separator, do we really need a routine for one line of code?
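The identity "is_graph = is_print and not is_space" can be checked against the C definitions. The sketch below is an illustration only (the function name is hypothetical): it tests the reading where "space" means the blank character for every unsigned char value, and the reading where "space" means `isspace` for the 7-bit ASCII range in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>

/* Asserts that isgraph() agrees with "isprint() and not space". */
static void check_is_graph_identity(void)
{
    for (int c = 0; c <= UCHAR_MAX; c++) {
        int is_graph = isgraph(c) != 0;
        int is_print = isprint(c) != 0;

        /* Reading 1: "space" is the blank character ' '. */
        assert(is_graph == (is_print && c != ' '));

        /* Reading 2: "space" is isspace(); checked for ASCII in the "C" locale,
           where the blank is the only character both printable and isspace. */
        if (c < 128)
            assert(is_graph == (is_print && !isspace(c)));
    }
}
```

Both readings agree here because, in the "C" locale, the other `isspace` characters (tab, newline and so on) are control characters and are not printable in the first place.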

                                  And lastly, about is_print: I think is_printable is a good choice.

                                  BTW, all of this is in terms of class CHARACTER... not STRING...
                                  --
                                  Quo fas et gloria ducunt?
                                  Alex.
                                • mike corbeil ordinary user account
                                  Message 16 of 25 , Nov 3, 2000
                                    Simon Parker wrote:

                                    > Good afternoon.
                                    >
                                    > On Friday, November 03, 2000 11:00 AM, Roger Browne
                                    > [SMTP:egroups@...] wrote:
                                    > > Alex Shuksto wrote:
                                    > >
                                    > > > ... I don't think that it is a good idea to define
                                    > > > all characters from 000 to 032 as whitespaces.
                                    > >
                                    > > What is your preferred alternative, and why is it better?

                                    I've come across terminology before in which there was one term for white
                                    space characters (space and tab, and sometimes indent), another term for
                                    invisible characters, another for printable characters, and a term for
                                    special characters.

                                    White space may include line and page break characters, for example, since
                                    these effectively create white space; however, I don't know whether they do
                                    in the context of this thread. These invisible characters aren't white space
                                    in the sense that space is.



                                    > > Here are two other possibilities. Both are based on the definitions in
                                    > > the standard C library <ctype.h>.
                                    > >
                                    > > (1) Use the same definition as "isspace(c)", i.e. trim the following
                                    > > characters: blank, formfeed, newline, return, tab, vertical tab.
                                    >
                                    > <aside>
                                    > I think this is the first mention of the word 'trim' in this discussion.
                                    > I've always known this operation as 'trimming', and I can't think of a
                                    > language or library other than Eiffel which uses the name 'adjust'. Have I
                                    > spent too long writing VAX COBOL and BASIC?

                                    The reason most likely has to do with the type of work you've done, as opposed
                                    to what platforms and programming languages you've used. "Adjust" is a common
                                    term in word processing, but often not used in programming, unless you
                                    developed or used a word processor or typesetting language.

                                    Which term or name fits depends on the app. E.g., chomp and chop in Perl
                                    both trim something, but neither adjusts anything, unless a person sees
                                    adjusting and trimming as synonymous in this context.

                                    For what this thread seems to be about, I'd say that "adjust" is a more
                                    appropriate term, if it's for what I think it's for.

                                    Adjusting doesn't necessarily imply removing anything, whereas trimming does.
                                    You get a hair trim, and you lose hair. A person adjusts their hairdo
                                    (whatever it's called) and doesn't necessarily lose any hair. You trim
                                    or prune your hedges or bushes, but rarely is this referred to as adjusting
                                    them. If you have a bush or plant you want to grow straight and it's bent
                                    over, then you might provide stakes to help the plant adjust to the way you
                                    want it to grow (you wouldn't necessarily do any trimming, although you might
                                    need to). You adjust a TV picture horizontally and/or vertically, but never
                                    trim one. You adjust sound to remove noise, but we don't usually think of
                                    this as trimming; more as filtering.

                                    As for text, it may be either adjusted or trimmed, depending on what's being
                                    done with or to the text. If I write a script to read and parse a flat ASCII
                                    file, then I may have to do some trimming of extraneous spaces; however, if I
                                    write in a word processor and adjust text in one direction or another without
                                    removing any of the text, then I'd be adjusting the text, without trimming.
                                    If white space is removed in the process of adjusting text, then it's
                                    nonetheless thought of as adjusting, instead of trimming; the trimming being a
                                    consequence, as opposed to being thought of as the means to the goal. The
                                    means is adjusting, and this may or may not reduce the total white space.

                                    Context.


                                    > I might revisit this in a renaming proposal at the appropriate point...
                                    > </aside>
                                    >
                                    > [...]
                                    > >
                                    > > Here's a possible approach:
                                    > >
                                    > > (1) In CHARACTER, provide classification functions. The definitions
                                    > > would match those in <ctype.h>. The Eiffel names could be the
                                    > > same, or could follow the Eiffel naming style and existing vendor
                                    > > usage, e.g.:
                                    > >
                                    > > <ctype.h> name   possible name in class CHARACTER
                                    > > isalnum(c)       is_letter_or_digit
                                    > > isalpha(c)       is_letter
                                    > > iscntrl(c)       is_control
                                    > > isdigit(c)       is_digit
                                    > > isgraph(c)       is_graph   -- a poor name, I know
                                    > > islower(c)       is_lower
                                    > > isprint(c)       is_print   -- another poor name
                                    > > ispunct(c)       is_punctuation
                                    > > isspace(c)       is_separator
                                    > > isupper(c)       is_upper
                                    > > isxdigit(c)      is_hexadecimal_digit
                                    > >
                                    > > (2) In STRING, specify 'left_adjust' and 'right_adjust' in terms of
                                    > > these.
                                    > >
                                    > > Is there any interest in this approach?
                                    >
                                    > Spot on!
                                    >
                                    > I think that means preserving the names with the standard modification of
                                    > 'is*' to 'is_*'.
                                    > Now or later, we could add synonyms for the really ugly ones. How about
                                    > this:
                                    >
                                    > > isalnum(c)       is_alnum, is_alphanumeric, is_letter_or_digit
                                    > > isalpha(c)       is_alpha, is_letter
                                    > > iscntrl(c)       is_cntrl, is_control
                                    > > isdigit(c)       is_digit
                                    > > isgraph(c)       is_graph   -- a poor name, I know
                                    > > islower(c)       is_lower
                                    > > isprint(c)       is_print, is_printable
                                    > > ispunct(c)       is_punct, is_punctuation
                                    > > isspace(c)       is_space, is_white, is_whitespace
                                    > > isupper(c)       is_upper
                                    > > isxdigit(c)      is_xdigit, is_hexadecimal_digit
                                    >
                                    > I am at a loss for a sensible alternative to 'is_graph'. Perhaps if we
                                    > ignore it it will go away...

                                    I'm not sure what problem you see wrt is_graph. What makes this one stand out
                                    to you, whereas is_alpha doesn't?

                                    is_graph, based on the man page, would replace the need to say "is_printable()
                                    and not is_space()". If the longer alternative were used everywhere that
                                    is_graph could be called, and the definition ever changed (unlikely, but
                                    possible), then many modifications could be needed. If is_graph() exists,
                                    then only it would need to be modified and the app or system rebuilt, which
                                    means more reusable and easier to maintain code.

                                    is_graph() could make use of is_printable() and is_space(). This would not be
                                    necessary, though probably more convenient and quicker to implement than the
                                    alternative.
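
For reference, the relationship described above can be checked against the C library itself. This is a minimal sketch, assuming the default "C" locale and 7-bit ASCII codes only. The helper names is_graph_composed and check_graph_definitions_agree are hypothetical, introduced purely for illustration; isgraph, isprint and isspace are the standard <ctype.h> functions.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper: "printable but not space", composed from two
   standard <ctype.h> tests instead of calling isgraph() directly. */
static int is_graph_composed(int c)
{
    return isprint(c) && !isspace(c);
}

/* Assert that the composed test agrees with isgraph() for every 7-bit
   ASCII code (assumes the default "C" locale). */
static void check_graph_definitions_agree(void)
{
    int c;

    for (c = 0; c < 128; c++) {
        /* Compare truth values, since the <ctype.h> tests may return
           any nonzero value for "true". */
        assert((isgraph(c) != 0) == (is_graph_composed(c) != 0));
    }
}
```

If the two tests ever diverged in some locale, only the body of the single composed function would need to change, which is the maintenance argument made above.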

                                    If it's only the name is_graph that is being questioned, then are there other
                                    uses for this set of characters?

                                    If not, then I don't see an immediate problem with this name. If there is
                                    such a conflict, then another level of abstraction is due; however, that could
                                    be eliminated by naming the function is_printable_but_not_space(), say.
                                    is_graph seems better, if it can be or is defined in a portable or reusable
                                    manner.


                                    > I've dropped 'is_separator' because I think the notion of separators is too
                                    > context-dependent. What if I'm only interested in line separators? Or I
                                    > consider a hyphen '-' to be a word separator too? We should save that for a
                                    > scanner class.

                                    Scanners and parsers.


                                    > > If so, I have no objection to sorting out the names of the CHARACTER
                                    > > features at this point in the ELKS process. But if people want to use
                                    > > different definitions to those used by C, then it becomes a bigger task
                                    > [...]
                                    >
                                    > A final observation: a more lasting classification scheme may be that
                                    > associated with Unicode rather than ASCII. Let's not get stuck in this, but
                                    > grab these established, standardised names and move on. As I said, the
                                    > synonyms are optional.

                                    I'd only add that although some Eiffel compilers compile to C, not all do;
                                    Eiffel is supposed to be a language of its own; and design shouldn't reflect
                                    implementation, except when it's necessary or useful to do so. I'd therefore
                                    suggest that C may be used for some cross-reference purposes, but Eiffel
                                    shouldn't be modelled on C, except when necessary or appropriate; the latter
                                    meaning when C does something in a sufficiently general or generic manner and
                                    time is slim.

                                    This may mean more work being required to fully define and develop Eiffel, and
                                    given that this may impede the process of developing or completing Eiffel, the
                                    C approach may be valid to more strictly adopt, for some time. However, this
                                    should be done in a manner that would make it relatively easy to modify these
                                    aspects, the implementations, later on, thereby giving an immediate solution,
                                    but one that'd be easy enough to polish later on; while possibly needing to
                                    provide backwards compatibility, until everyone using prior versions of Eiffel
                                    libraries has upgraded to the new and current standard at that time (and for
                                    which some fairly simple script or scripts could be created for conversion
                                    tools).

                                    "Polarization" may or may not be helpful; it might be more productive to
                                    "cheat" a little in the shorter term, while noting or keeping track of
                                    objectives for the longer term or the eventual. However, it's always worth
                                    investing some time wrt design, before implementation. Perfect design isn't
                                    always easy, and often doesn't happen, up front; therefore, some iterative
                                    nature in the process makes sense. However, it's always good to design in a
                                    manner that makes making modifications easy to do, and this implies designing
                                    and implementing reusable code (may seem time consuming, but good design ends
                                    up saving time).

                                    Otoh, maybe I'm far off wrt what I think to have understood.

                                    mike


                                    >
                                    >
                                    > >
                                    > > Regards,
                                    > > Roger
                                    >
                                    > Regards,
                                    > Simon
                                    >
                                    > Simon Parker +353 87 249 7859
                                    >
                                    >
                                    > ---------------------------
                                    >
                                    > http://www.eiffel-nice.org/
                                    >
                                    > --------------------------
                                  • mike corbeil ordinary user account
                                    Message 17 of 25 , Nov 3, 2000
                                      Alex Shuksto wrote:

                                      > > <ctype.h> name   possible name in class CHARACTER
                                      > > isalnum(c)       is_letter_or_digit
                                      > > isalpha(c)       is_letter
                                      > > iscntrl(c)       is_control
                                      > > isdigit(c)       is_digit
                                      > > isgraph(c)       is_graph  -- a poor name, I know
                                      > > islower(c)       is_lower
                                      > > isprint(c)       is_print  -- another poor name
                                      > > ispunct(c)       is_punctuation
                                      > > isspace(c)       is_separator
                                      > > isupper(c)       is_upper
                                      > > isxdigit(c)      is_hexadecimal_digit
                                      > And about the names. All is ok (except is_upper & is_lower, maybe -
                                      > I think is_upper_case & is_lower_case would be a better choice), but there
                                      > are two poor names: is_graph & is_print. I can say that

                                      It's what follows the '_' that seems to be in debate here, but graph is not
                                      much worse than alpha, although it is somewhat worse, in the sense that graph
                                      doesn't represent a simple single unit, whereas alpha more or less does.

                                      "graph" is a noun or a verb, either a graph image or the action of creating
                                      one ("to graph" or "graph those stats, please"), whereas alpha means
                                      alphabetic (character) and is an adjective.

                                      So, there is that type of problem that I can see wrt the name is_graph.

                                      However, wrt is_upper, it's only a short-form adjective for a longer one,
                                      is_uppercase. In both cases, the effect is at least the same; testing for
                                      the upper form of the argument, using an adj. invocation. Besides, upper is
                                      never a noun.

                                      I don't know that mere abbreviations should constitute a problem, esp. when
                                      the abbreviations are clear and don't modify the meaning. E.g., when we want
                                      to clear the screen, we don't say clearscreen, but use abbreviations, like
                                      clear or clr. Whether the object is a screen or a desktop should and must not
                                      matter, for the meaning of "clear" depends on what the object is; therefore,
                                      the meaning depends on context, which means that there's some degree of
                                      genericity.

                                      Being able to work by that idea makes the difference between someone who's
                                      good at designing reusable code, and someone who isn't. To be good, one must
                                      be able to abstract, about as far as anyone could care to bother or think.

                                      Not using rare and/or non-generic abbreviations is good, for it permits the
                                      work to be more easily maintained, for users to transition more easily, ...;
                                      however, I don't shy from using Perl's default syntax when it's good, or at
                                      least not bad, to do so. The aim is for a balance between what makes
                                      maintainable and reusable code, and overdoing or underdoing a job.


                                      > And the last about is_print - I think is_printable is a good choice.

                                      That's better, for it respects the adjective rule. The methods must be either
                                      nouns, or adj's, but not a mish-mash mix. Otherwise, consistency in style is
                                      compromised, in a way which is sufficiently meaningful wrt learning that it's
                                      worth not doing to begin with. Where nouns should be used, nouns should be
                                      used. Etcetera.

                                      mike


                                      >
                                      > BTW, all that is in terms of class CHARACTER... Not STRING...



                                      > --
                                      > Quo fas et gloria ducunt?
                                      > Alex.
                                    • Alex Shuksto
                                      Message 18 of 25 , Nov 3, 2000
                                        On Fri, Nov 03, 2000 at 10:20:33PM -0500, mike corbeil ordinary user account wrote:
                                        > It's what follows the '_' that seems to be in debate here, but graph is not
                                        > much worse than alpha, although it is somewhat worse, in the sense that graph
                                        > doesn't represent a simple single unit, whereas alpha more or less does.
                                        >
                                        > "graph" is a noun or a verb, either a graph image or the action of creating
                                        > one ("to graph" or "graph those stats, please"), whereas alpha means
                                        > alphabetic (character) and is an adjective.
                                        >
                                        > So, there is that type of problem that I can see wrt the name is_graph.
                                        >
                                        > However, wrt is_upper, it's only a short-form adjective for a longer one,
                                        > is_uppercase. In both cases, the effect is at least the same; testing for
                                        > the upper form of the argument, using an adj. invocation. Besides, upper is
                                        > never a noun.
                                        >
                                        > I don't know that mere abbreviations should constitute a problem, esp. when
                                        > the abbreviations are clear and don't modify the meaning. E.g., when we want
                                        > to clear the screen, we don't say clearscreen, but use abbreviations, like
                                        > clear or clr. Whether the object is a screen or a desktop should and must not
                                        > matter, for the meaning of "clear" depends on what the object is; therefore,
                                        > the meaning depends on context, which means that there's some degree of
                                        > genericity.
                                        >
                                        > Being able to work by that idea makes the difference between someone who's
                                        > good at designing reusable code, and someone who isn't. To be good, one must
                                        > be able to abstract, about as far as anyone could care to bother or think.
                                        >
                                        > Not using rare and/or non-generic abbreviations is good, for it permits the
                                        > work to be more easily maintained, for users to transition more easily, ...;
                                        > however, I don't shy from using Perl's default syntax when it's good, or at
                                        > least not bad, to do so. The aim is for a balance between what makes
                                        > maintainable and reusable code, and overdoing or underdoing a job.
                                        >
                                        Don't you think that this is a topic for discussion? I cannot say that my
                                        names are just better because of that, that and maybe that. But, I must say
                                        again, sorry, this is a discussion about class STRING. Let's wait a little,
                                        until we discuss CHARACTER...

                                        > That's better, for it respects the adjective rule. The methods must be either
                                        > nouns, or adj's, but not a mish-mash mix. Otherwise, consistency in style is
                                        > compromised, in a way which is sufficiently meaningful wrt learning that it's
                                        > worth not doing to begin with. Where nouns should be used, nouns should be
                                        > used. Etcetera.
                                        Exactly.
                                        --
                                        Quo fas et gloria ducunt?
                                        Alex.
                                      • Peter Horan
                                        Message 19 of 25 , Nov 5, 2000
                                          Simon Parker wrote:

                                          > I am at a loss for a sensible alternative to 'is_graph'. Perhaps if we
                                          > ignore it it will go away...

                                          is_glyph

                                          Concise OED: (1) a sculptured character or symbol
                                          --
                                          Peter Horan School of Computing and Mathematics
                                          peter@... Deakin University
                                          +61-3-5227 1234 (Voice) Geelong, Victoria 3217, AUSTRALIA
                                          +61-3-5227 2028 (FAX) http://www.cm.deakin.edu.au/~peter

                                          -- The Eiffel guarantee: From specification to implementation
                                          -- (http://www.cetus-links.org/oo_eiffel.html)
                                        • Ulrich Windl
                                          Message 20 of 25 , Nov 5, 2000
                                            It seems we lack a CHARACTER.is_whitespace feature.

                                            On 1 Nov 2000, at 12:31, Roger Browne wrote:

                                            > As you can see, ISE/HACT consider "whitespace" to be blank, tab, return
                                            > or newline.

                                            > VE treats ASCII codes 0 through 32 as "whitespace":

                                            > DEFINITION OF WHITESPACE
                                            > ========================
                                            >
                                            > Is "whitespace" a blank (as in ELKS95), or blank/tab/return/newline (as
                                            > implemented by ISE/HACT and in the 1999 proposals), or "blank or control
                                            > character" as implemented by VE?
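
The three candidate definitions quoted above can be compared side by side. The following is only a sketch, written in C for convenience and assuming ASCII character codes. All function names are hypothetical labels for the three rules and do not correspond to any vendor's actual feature names.

```c
#include <assert.h>

/* ELKS95 rule: only the blank is white space. */
static int is_white_elks95(int c)
{
    return c == ' ';
}

/* ISE/HACT rule (also the 1999 proposals): blank, tab, return or newline. */
static int is_white_ise(int c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* VE rule: any character code from 0 through 32. */
static int is_white_ve(int c)
{
    return c >= 0 && c <= 32;
}

/* Count how many 7-bit codes a given rule accepts as white space. */
static int count_white(int (*is_white)(int))
{
    int c;
    int n = 0;

    for (c = 0; c < 128; c++) {
        if (is_white(c)) {
            n++;
        }
    }
    return n;
}

/* Assert that each rule accepts everything the narrower rule accepts
   (ELKS95 within ISE/HACT within VE), assuming ASCII. */
static void check_definitions_nest(void)
{
    int c;

    for (c = 0; c < 128; c++) {
        assert(!is_white_elks95(c) || is_white_ise(c));
        assert(!is_white_ise(c) || is_white_ve(c));
    }
}
```

Under these assumptions the three rules accept 1, 4 and 33 of the 7-bit codes respectively. A vertical tab, for example, is white space only under the VE rule, so the same 'left_adjust' call could give different results on different compilers.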

                                            Regards,
                                            Ulrich
                                          • Roger Browne
                                            Message 21 of 25 , Nov 6, 2000
                                              Arno Wagner wrote:

                                              > ... I am surprised that [character classification] is
                                              > such a controversial subject.

                                              It does seem to be a complex issue.

                                              I think it's going to take a long time to sort it out, and I think that
                                              we should leave character classification until we work on the ELKS 2002
                                              CHARACTER specification.

                                              Meanwhile, I suggest that we simply remove 'left_adjust' and
                                              'right_adjust' from the ELKS 2001 STRING specification. Vendors will
                                              probably retain their existing implementations, so no code will break.

                                              I'm going to run a poll on that issue. If it passes, we needn't consider
                                              the specifications of 'left_adjust' and 'right_adjust' further.

                                              You should soon receive an automatic message from the eGroups polling
                                              system inviting you to vote on the following proposal:

                                              Remove features 'left_adjust' and 'right_adjust'
                                              from ELKS 2001 STRING.

                                              Voting is open to all NICE members. The egroups poll will run for three
                                              days. If you are unable to vote within that time, or have problems with
                                              the egroups polling system, feel free to post your vote directly to this
                                              list.

                                              > After all we want solid, dependable semantics, not
                                              > anything fancy.

                                              I agree, and I suggest the following:

                                              1. Someone writes an Eiffel mixin class CTYPE_DOT_H to
                                              implement the C-style tests, e.g.:

                                              is_alnum(c: CHARACTER): BOOLEAN

                                              This way, we could very soon have a useful, 100% interoperable
                                              implementation.

                                              Contrast that with weeks or months to amend the specification of
                                              CHARACTER, and months or years for the vendors to provide interoperable
                                              implementations.

                                              2. Someone writes an Eiffel mixin class to implement
                                              STRING trimming operations, i.e.:

                                              left_trim(s: STRING)
                                              left_trimmed(s: STRING): STRING
                                              right_trim(s: STRING)
                                              right_trimmed(s: STRING): STRING
                                              trim(s: STRING)
                                              trimmed(s: STRING): STRING

                                              Again, we could very soon have a useful, 100% interoperable
                                              implementation - compared to the time and energy required to reach an
                                              interoperable solution within class STRING itself.
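
To illustrate how small such a stand-alone trimming library could be, here is a sketch of the three in-place operations. It is written in C rather than Eiffel and is not the proposed mixin class. It assumes that "white space" means whatever the standard isspace() accepts in the default "C" locale, which is only one of the definitions under discussion. The check_trimmed helper is hypothetical and mirrors the ELKS95-style postconditions. The function-style variants (left_trimmed and so on) are omitted.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Remove leading white space from s, in place. */
static void left_trim(char *s)
{
    size_t start = 0;

    /* The cast avoids undefined behaviour for negative char values;
       the loop stops at '\0' because isspace('\0') is false. */
    while (isspace((unsigned char)s[start])) {
        start++;
    }
    /* Move the rest of the string, including its '\0', to the front. */
    memmove(s, s + start, strlen(s + start) + 1);
}

/* Remove trailing white space from s, in place. */
static void right_trim(char *s)
{
    size_t len = strlen(s);

    while (len > 0 && isspace((unsigned char)s[len - 1])) {
        len--;
    }
    s[len] = '\0';
}

/* Remove both leading and trailing white space from s, in place. */
static void trim(char *s)
{
    right_trim(s);
    left_trim(s);
}

/* Analogue of the ELKS95 postconditions, using isspace(): a non-empty
   result neither starts nor ends with white space. */
static void check_trimmed(const char *s)
{
    size_t len = strlen(s);

    assert(len == 0 || !isspace((unsigned char)s[0]));
    assert(len == 0 || !isspace((unsigned char)s[len - 1]));
}
```

Because the white-space test is confined to the isspace() calls, switching to a different definition would touch only those lines.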

                                              Regards,
                                              Roger
                                              --
                                              Roger Browne - roger@... - Everything Eiffel
                                              19 Eden Park Lancaster LA1 4SJ UK - Phone +44 1524 32428
                                            • Simon Parker
                                              Message 22 of 25 , Nov 6, 2000
                                                Good morning.

                                                On Friday, November 03, 2000 11:27 PM, mike corbeil ordinary user account
                                                [SMTP:mcorbeil@...] wrote:
                                                > Simon Parker wrote:
                                                >
                                                > > Good afternoon.
                                                > >
                                                > > On Friday, November 03, 2000 11:00 AM, Roger Browne
                                                > > [SMTP:egroups@...] wrote:
                                                > > > Alex Shuksto wrote:
                                                > > >
                                                > > > > ... I don't think that it is a good idea to define
                                                > > > > all characters from 000 to 032 as whitespaces.
                                                > > >
                                                > > > What is your preferred alternative, and why is it better?
                                                >
                                                > I've come across terminology, before, where there was a term for white
                                                > space characters, space and tab (and sometimes for indent), another term
                                                > for invisible characters, another for printable, and a term for special
                                                > characters.
                                                >
                                                > White space may include line and page break characters, f.e., for these
                                                > effectively create white space; however, I don't know if they do in the
                                                > context of this thread. These invisible characters aren't ws in the sense
                                                > that space is.
                                                >

                                                I've always considered 'whitespace' to include the invisible characters
                                                whose purpose is formatting. That includes cr, lf, ff, vt, ht.
                                                I wouldn't include ASCII control characters with other meanings, such as
                                                FS, GS, EOD.

                                                This is irrelevant, however, as long as our standard is explicit and
                                                reasonable.

                                                >
                                                >
                                                > > > Here are two other possibilities. Both are based on the definitions
                                                > > > in the standard C library <ctype.h>.
                                                > > >
                                                > > > (1) Use the same definition as "isspace(c)", i.e. trim the following
                                                > > > characters: blank, formfeed, newline, return, tab, vertical tab.
                                                > >
                                                > > <aside>
                                                > > I think this is the first mention of the word 'trim' in this discussion.
                                                > > I've always known this operation as 'trimming', and I can't think of a
                                                > > language or library other than Eiffel which uses the name 'adjust'. Have
                                                > > I spent too long writing VAX COBOL and BASIC?
                                                >
                                                > The reason most likely has to do with the type of work you've done, as
                                                > opposed to what platforms and programming languages you've used. "Adjust"
                                                > is a common term in word processing, but often not used in programming,
                                                > unless you developed or used a word processor or typesetting language.
                                                >

                                                [discussion of other terms and scenarios omitted]

                                                > > I might revisit this in a renaming proposal at the appropriate point...
                                                > > </aside>


                                                Thanks for illuminating the broader context. 'Adjust' is fine.

                                                > >
                                                > > [...]
                                                > > >
                                                > > > Here's a possible approach:
                                                > > >
                                                > > > (1) In CHARACTER, provide classification functions. The definitions
                                                > > > would match those in <ctype.h>. The Eiffel names could be the
                                                > > > same, or could follow the Eiffel naming style and existing vendor
                                                > > > usage, e.g.:
                                                > > >
                                                > > > <ctype.h> name possible name in class CHARACTER
                                                > > > isalnum(c) is_letter_or_digit
                                                > > > isalpha(c) is_letter
                                                > > > iscntrl(c) is_control
                                                > > > isdigit(c) is_digit
                                                > > > isgraph(c) is_graph -- a poor name, I know
                                                > > > islower(c) is_lower
                                                > > > isprint(c) is_print -- another poor name
                                                > > > ispunct(c) is_punctuation
                                                > > > isspace(c) is_separator
                                                > > > isupper(c) is_upper
                                                > > > isxdigit(c) is_hexadecimal_digit
                                                > > >
                                                > > > (2) In STRING, specify 'left_adjust' and 'right_adjust' in terms
                                                of
                                                > > > these.
                                                > > >
                                                > > > Is there any interest in this approach?
                                                > >
                                                > > Spot on!
                                                > >
                                                > > I think that means preserving the names with the standard modification
                                                > > of 'is*' to 'is_*'.
                                                > > Now or later, we could add synonyms for the really ugly ones. How about
                                                > > this:
                                                > >
                                                > > > isalnum(c)    is_alnum, is_alphanumeric, is_letter_or_digit
                                                > > > isalpha(c)    is_alpha, is_letter
                                                > > > iscntrl(c)    is_cntrl, is_control
                                                > > > isdigit(c)    is_digit
                                                > > > isgraph(c)    is_graph -- a poor name, I know
                                                > > > islower(c)    is_lower
                                                > > > isprint(c)    is_print, is_printable
                                                > > > ispunct(c)    is_punct, is_punctuation
                                                > > > isspace(c)    is_space, is_white, is_whitespace
                                                > > > isupper(c)    is_upper
                                                > > > isxdigit(c)   is_xdigit, is_hexadecimal_digit
                                                > >
                                                > > I am at a loss for a sensible alternative to 'is_graph'. Perhaps if we
                                                > > ignore it it will go away...
                                                >
                                                > I'm not sure what problem you see wrt is_graph. What makes this one
                                                > stand out to you, whereas is_alpha doesn't?

                                                The name is less intuitive than the other names, that's all. I wasn't
                                                familiar with its purpose, but my flippant comment was unhelpful.

                                                >
                                                > is_graph, based on the man page, would replace the need to say
                                                > "is_printable() and not is_space()".

                                                Thanks. So in fact 'isprint' is the one which is not intuitive ;-)
                                                I still have no suggestions for a good synonym.
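
To make the relationship concrete: in C's default "C" locale, isgraph(c) is true exactly when isprint(c) is true and isspace(c) is false, because the space character is the only printable white-space character. Below is a minimal sketch that checks this over the 7-bit ASCII range. The name graph_mismatches is a hypothetical helper, not part of <ctype.h>, and the result is only claimed for the "C" locale.

```c
#include <ctype.h>

/* Hypothetical helper (not part of <ctype.h>): counts the 7-bit ASCII
   codes for which isgraph(c) disagrees with isprint(c) && !isspace(c).
   Assumes the program is running in the default "C" locale. */
int graph_mismatches(void)
{
    int mismatches = 0;
    for (int c = 0; c < 128; c++) {
        int graph = isgraph(c) != 0;
        int printable_not_space = (isprint(c) != 0) && (isspace(c) == 0);
        if (graph != printable_not_space)
            mismatches++;
    }
    return mismatches;
}
```

Under those assumptions the function returns 0, which is the sense in which 'is_graph' saves writing "is_printable and not is_space".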

                                                > [...]
                                                > > A final observation: a more lasting classification scheme may be that
                                                > > associated with Unicode rather than ASCII. Let's not get stuck in this,
                                                > > but grab these established, standardised names and move on. As I said,
                                                > > the synonyms are optional.
                                                >
                                                > I'd only add that although some Eiffel compilers compile to C, not all
                                                > do; Eiffel is supposed to be a language of its own; and design shouldn't
                                                > reflect implementation, except when it's necessary or useful to do so.
                                                > I'd therefore suggest that C may be used for some cross-reference
                                                > purposes, but Eiffel shouldn't be modelled on C, except when necessary
                                                > or appropriate; the latter meaning when C does something in a
                                                > sufficiently general or generic manner and time is slim.

                                                These classification features have merit in that they are defined by a
                                                formal standard, and widely understood in the programming community. The
                                                association with C is incidental.
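
As an illustration of specifying the two features in terms of the standard classification, here is a minimal C sketch in which "white space" means exactly the characters for which isspace() is true. The function names mirror the Eiffel features, but they are hypothetical C analogues that work in place on a NUL-terminated buffer; they are not taken from any vendor's STRING class.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical analogue of STRING.left_adjust: removes, in place, every
   leading character for which isspace() is true. */
void left_adjust(char *s)
{
    size_t i = 0;
    /* The cast avoids undefined behaviour for negative char values. */
    while (s[i] != '\0' && isspace((unsigned char)s[i]))
        i++;
    /* Shift the remainder, including the terminating NUL, to the front. */
    memmove(s, s + i, strlen(s + i) + 1);
}

/* Hypothetical analogue of STRING.right_adjust: removes, in place, every
   trailing character for which isspace() is true. */
void right_adjust(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        n--;
    s[n] = '\0';
}
```

With this definition, the postcondition for 'left_adjust' becomes "the result is empty, or its first character is not a white-space character", and the same classification function serves both the implementation and the specification.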

                                                By mentioning Unicode I wasn't intending to imply that ASCII is irrelevant
                                                or transitory, but that this issue will need revisiting if and when Eiffel
                                                moves that way. I agree it's out of scope.

                                                [...]


                                                Regards,
                                                Simon

                                                Simon Parker +353 87 249 7859
                                              • Simon Parker
                                                Message 23 of 25 , Nov 6, 2000
                                                  On Sunday, November 05, 2000 10:54 PM, Peter Horan [SMTP:peter@...] wrote:
                                                  > Simon Parker wrote:
                                                  >
                                                  > > I am at a loss for a sensible alternative to 'is_graph'. Perhaps if we
                                                  > > ignore it it will go away...
                                                  >
                                                  > is_glyph
                                                  >
                                                  > Concise OED: (1) a sculptured character or symbol

                                                  I like this. Confusing for those who specialise in characters and glyphs, though.

                                                  > --
                                                  > Peter Horan School of Computing and Mathematics
                                                  > peter@... Deakin University
                                                  > +61-3-5227 1234 (Voice) Geelong, Victoria 3217, AUSTRALIA
                                                  > +61-3-5227 2028 (FAX) http://www.cm.deakin.edu.au/~peter
                                                  >
                                                  > -- The Eiffel guarantee: From specification to implementation
                                                  > -- (http://www.cetus-links.org/oo_eiffel.html)
                                                  >
                                                  >
                                                  > ---------------------------
                                                  >
                                                  > http://www.eiffel-nice.org/
                                                  >
                                                  > --------------------------
                                                • Ulrich Windl
                                                  Message 24 of 25 , Nov 6, 2000
                                                    On 6 Nov 2000, at 12:16, Roger Browne wrote:

                                                    > 1. Someone writes an Eiffel mixin class CTYPE_DOT_H to
                                                    > implement the C-style tests, e.g.:
                                                    >
                                                    > is_alnum(c: CHARACTER): BOOLEAN
                                                    >
                                                    > This way, we could very soon have a useful, 100% interoperable
                                                    > implementation.
                                                    >

                                                    I suggest using a different name, like ANSI_C_CHARACTER, maybe with
                                                    "_CLASSIFICATION" appended.

                                                    Regards,
                                                    Ulrich
                                                  • Couder Christian
                                                    Message 25 of 25 , Nov 12, 2000
                                                      Hi everybody,

                                                      As Roger Browne suggested:

                                                      > Arno Wagner wrote:
                                                      >
                                                      > > After all we want solid, dependable semantics, not
                                                      > > anything fancy.
                                                      >
                                                      > I agree, and I suggest the following:
                                                      >
                                                      > 1. Someone writes an Eiffel mixin class CTYPE_DOT_H to
                                                      > implement the C-style tests, e.g.:
                                                      >
                                                      > is_alnum(c: CHARACTER): BOOLEAN
                                                      >
                                                      > This way, we could very soon have a useful, 100% interoperable
                                                      > implementation.
                                                      >
                                                      > Contrast that with weeks or months to amend the specification of
                                                      > CHARACTER, and months or years for the vendors to provide interoperable
                                                      > implementations.

                                                      I tried to do this. (I have known about Eiffel for a long time, but it
                                                      is only a few months since I really started using it.)

                                                      So here are the files. I compiled them with the new SmallEiffel -0.76
                                                      release on Mandrake Linux 7.0, and I am not sure they compile with
                                                      other compilers.

                                                      I didn't try to put in many preconditions or postconditions, so please
                                                      feel free to suggest some...

                                                      And tell me about any other problem or suggestion you have.

                                                      Bye,
                                                      Christian.

                                                      PS: if you want to reply to me privately, use an address made of
                                                      (my_first_name + <at-sign> + "alcove.fr")