Re: ELKS and Unicode

  • Arno Wagner
    Message 1 of 14 , Nov 6, 2000
      --- In eiffel-nice-library@egroups.com, "Ulrich Windl"
      <ulrich.windl@r...> wrote:
      > On 6 Nov 2000, at 10:30, Arno Wagner wrote:
      >
      > > So let's not overdo it: We have 7-bit ASCII, we have working
      > > classification functions in ANSI/ISO-C. I would really like
      > > if we could just use them instead of doing something half-baked.
      > >
      >
      > I hope you are not saying that "Using German Umlauts in Eiffel" is
      > "overdone". BTW: literal strings are 7 bit, but "%/255/" has 8 bits
      > internally. So we should handle it.
      >
      > Regards,
      > Ulrich
      In my opinion, German umlauts and other language-dependent
      features that require support for more than one character set
      don't have a place in ELKS at the moment. Of course, the
      implementations will usually use at least 8 bits per character,
      but for values above 127 Eiffel does not assign any meaning except
      that the value is a character. The advantage of this is that we have
      working libraries _now_. On the other hand, every developer
      is free to use the additional character codes in any way
      desired and supported by the target system.

      So yes, "Using German Umlauts in Eiffel" in a way that requires
      specific definitions in ELKS is overdone.
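
The "transparent" treatment of codes above 127 described here can be sketched as follows. This is only an illustration in Python, not Eiffel or ELKS code: the function name `upcase_ascii_only` is hypothetical, and the input is assumed to be ISO-8859-1 bytes purely so that 'ä' occupies a single byte.

```python
def upcase_ascii_only(data: bytes) -> bytes:
    """Uppercase the ASCII letters a-z; pass every other byte through unchanged."""
    # 0x61..0x7A is 'a'..'z'; subtracting 32 yields 'A'..'Z'.
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in data)


# Assumption for illustration: the text is encoded as ISO-8859-1,
# so 'ä' is the single byte 0xE4.
text = "Universität".encode("latin-1")
result = upcase_ascii_only(text)

# Bytes above 127 carry no meaning for the routine; they survive untouched.
high_bytes = bytes(range(128, 256))
untouched = upcase_ascii_only(high_bytes)
```

The routine never has to know which character set the high bytes belong to, which is the point of the argument: the library stays simple, and interpreting codes above 127 is left to the application and the target system.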

      Regards,
      Arno
    • Arno Wagner
      Message 2 of 14 , Nov 6, 2000
        --- In eiffel-nice-library@egroups.com, "Ulrich Windl"
        <ulrich.windl@r...> wrote:

        Oops, some miscommunication seems to have happened. O.K.,
        we should continue the issue here.

        > On 6 Nov 2000, at 12:11, Arno Wagner wrote:
        >
        > > On Mon, Nov 06, 2000 at 11:56:06AM +0100, Ulrich Windl wrote:
        > > > On 6 Nov 2000, at 10:30, Arno Wagner wrote:
        > > >
        > > > > So let's not overdo it: We have 7-bit ASCII, we have working
        > > > > classification functions in ANSI/ISO-C. I would really like
        > > > > if we could just use them instead of doing something
        > > > > half-baked.
        > > > >
        > > >
        > > > I hope you are not saying that "Using German Umlauts in Eiffel"
        > > > is "overdone". BTW: literal strings are 7 bit, but "%/255/"
        > > > has 8 bits internally. So we should handle it.
        > > >
        > > > Regards,
        > > > Ulrich
        >
        I emailed the following to Ulrich, because I didn't see his post,
        and he emailed me. I thought he wanted to make this a private
        conversation.
        After I saw the post, I replied to it too. So the following is
        redundant and was not intended to be posted here.
        > [[translated summary follows]]
        > > I would say German umlauts, and other language-specific
        > > things that need multi-character-table support, have no place
        > > in ELKS at the moment. Of course (at least) one byte is used
        > > per character, but the values >127 are handled transparently,
        > > i.e. Eiffel knows nothing about them except that they are
        > > characters. This has the great advantage that we have libraries
        > > that work now, and not only in a few years. On the other hand,
        > > every developer can do with the higher codes whatever he wants
        > > and his target system supports.
        >
        > [[ELKS does not need 8 bits right now. ELKS (Eiffel) does not treat
        > characters >127 in any special way right now]]
        >
        > However, I see a problem: with a very old C library, macros like
        > "islower()" could return the wrong value if used for 8-bit
        > characters (I know what I'm talking about). However, ANSI says
        > that the routines are only valid if "isascii()" returns "true".

        That is what I assumed it says. As Eiffel also only deals with
        ASCII (and not 8-bit ASCII extensions), where is the problem?
        After all if you allow characters > 127 you are supposed to handle
        them yourself. Don't forget that the final I in ASCII stands for
        'interchange'.

        > As ELKS also has some classifying features, I think we need an
        > equivalent of "isascii()", or Eiffel/ELKS actually has to care
        > about characters >127. What about "is_eiffel_character"? This
        > says a lot about the problem.
        >

        'is_iso_8859_1' to 'is_iso_8859_15' would be far worse. And where to
        stop? Full Unicode? Just to illustrate the problem: Here in
        Switzerland they do use umlauts but no capital umlauts. So a German
        capital umlaut would be something else entirely. I think we
        should stay away from this; we cannot solve this in a satisfying
        manner.

        We could add 'isascii()' as a temporary fix.
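
A minimal sketch of such a guard, written in Python rather than Eiffel. The names `is_ascii`, `is_lower` and `safe_is_lower` are hypothetical and not part of ELKS; the sketch only assumes that classification is defined for codes 0..127 and nothing else.

```python
def is_ascii(ch: str) -> bool:
    """True when the character code lies in the 7-bit ASCII range 0..127."""
    return ord(ch) < 128


def is_lower(ch: str) -> bool:
    """ASCII-only lowercase test; only defined when is_ascii(ch) holds."""
    if not is_ascii(ch):
        # Plays the role of a violated precondition.
        raise ValueError("classification is only defined for ASCII characters")
    return "a" <= ch <= "z"


def safe_is_lower(ch: str) -> bool:
    """Guarded variant: answers False for anything outside ASCII."""
    return is_ascii(ch) and is_lower(ch)
```

With this split, a caller who has already checked `is_ascii` gets a strict classification routine, while `safe_is_lower("\xff")` simply answers False for the character written as "%/255/" earlier in the thread.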

        >
        > BTW: Most people mean ISO-8859-something when they talk about ASCII
        > these days.
        A lot of people use words they don't really know the meaning of.
        Other examples are "firewall", "internet", "operating system", ...

        Not something we should start while defining a _standard_.

        Regards,
        Arno
        ---------------------------------------------------------------------
        Arno Wagner Dipl. Inform. ETH Zuerich wagnerATtik.ee.ethz.ch
        GnuPG:ID:F0C049F1 FP:8C E0 6F A5 CC B1 5A 11 ED C7 AD D2 05 5E BB 6F
        "The early bird gets the worm, but the second mouse gets the cheese."
      • Simon Parker
        Message 3 of 14 , Nov 6, 2000
          On Saturday, November 04, 2000 8:51 AM, Alexander Kogtenkov [SMTP:kwaxer@...] wrote:
          > Simon Parker wrote:
          >
          > > > A final observation: a more lasting classification scheme may be that
          > > > associated with Unicode rather than ASCII.
          >
          > Roger Browne wrote:
          >
          > > There's a lot that we can usefully do with standardised basic ASCII
          > > libraries (for example, we can process Eiffel lexical tokens or
          > > MIME-formatted email). So please don't consider the current effort
          > > wasted just because we are not considering Unicode at this time.
          >
          > We can take advantage of both worlds, Unicode and ASCII.
          > Unicode divides all the characters into 30 general categories. ASCII
          > characters fall into the following 12:
          >
          > Letter, Uppercase: ABCDEFGHIJKLMNOPQRSTUVWXYZ
          > Letter, Lowercase: abcdefghijklmnopqrstuvwxyz
          > Number, Decimal Digit: 0123456789
          > Separator, Space: code 20
          > Other, Control: codes 00-1F, 7F
          > Punctuation, Dash: -
          > Punctuation, Open: ([{
          > Punctuation, Close: )]}
          > Punctuation, Other: !"#%&'*,./:;?@\
          > Symbol, Math: +<=>|~
          > Symbol, Currency: $
          > Symbol, Modifier: ^`
          >
          > We can use this classification or combine some groups into
          > supergroups. That would allow a smooth migration to Unicode
          > when required.

          An excellent suggestion, thanks.
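
The grouping quoted above can be reproduced from the Unicode character database. The sketch below uses Python's standard `unicodedata` module; the helper name `ascii_by_category` is made up for this illustration. With current Unicode data the underscore, which the quoted list does not mention, is reported in an additional category, Punctuation, Connector (Pc).

```python
import unicodedata
from collections import defaultdict


def ascii_by_category() -> dict:
    """Group the 128 ASCII characters by their Unicode general category."""
    groups = defaultdict(list)
    for code in range(128):
        ch = chr(code)
        # unicodedata.category returns a two-letter code such as 'Lu' or 'Nd'.
        groups[unicodedata.category(ch)].append(ch)
    return {cat: "".join(chars) for cat, chars in groups.items()}


groups = ascii_by_category()
for cat in sorted(groups):
    # repr() keeps control characters readable when printed.
    print(cat, repr(groups[cat]))
```

Because the category codes are the same ones used for every other Unicode character, a classification built on them extends beyond ASCII without being redefined.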

          >
          > Regards,
          > Alexander Kogtenkov
          > Object Tools, Moscow
          >
          >
          >
          > ---------------------------
          >
          > http://www.eiffel-nice.org/
          >
          > --------------------------
        • Ulrich Windl
          Message 4 of 14 , Nov 6, 2000
            On 6 Nov 2000, at 12:00, Arno Wagner wrote:

            [...]
            > > However, I see a problem: with a very old C library, macros like
            > > "islower()" could return the wrong value if used for 8-bit
            > > characters (I know what I'm talking about). However, ANSI says
            > > that the routines are only valid if "isascii()" returns "true".
            >
            > That is what I assumed it says. As Eiffel also only deals with
            > ASCII (and not 8-bit ASCII extensions), where is the problem?
            > After all if you allow characters > 127 you are supposed to handle
            > them yourself. Don't forget that the final I in ASCII stands for
            > 'interchange'.

            The problem happens when you allow a user to type in STRINGs and you do
            some post-processing. You would not believe how much software can't
            deal with "Universität": some convert the 'ä' to something else,
            others omit it, others replace it with blanks...

            So if we want to write code for the real world, we cannot simply ignore
            the existence of accented characters, specifically not in the MS-
            Windows world.
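
The failure modes listed above are easy to reproduce. The following Python sketch is only an illustration; the variable names and the choice of ASCII, ISO-8859-1 and UTF-8 as example encodings are assumptions, not something taken from the software being complained about.

```python
word = "Universität"

# Non-ASCII character replaced by a substitute character.
replaced = word.encode("ascii", errors="replace").decode("ascii")

# Non-ASCII character silently omitted.
omitted = word.encode("ascii", errors="ignore").decode("ascii")

# Non-ASCII character replaced by a blank.
blanked = "".join(c if ord(c) < 128 else " " for c in word)

# Converted to "something else": bytes written in one encoding
# and read back in another.
garbled = word.encode("utf-8").decode("latin-1")

# Round trip through a single 8-bit character set keeps the text intact.
preserved = word.encode("latin-1").decode("latin-1")

for label, value in [("replaced", replaced), ("omitted", omitted),
                     ("blanked", blanked), ("garbled", garbled),
                     ("preserved", preserved)]:
    print(f"{label:10} {value}")
```

Only the last case, where the same 8-bit character set is used on both sides and the data is otherwise left alone, keeps the word intact.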

            I still admit that the issue is messy, and I wonder if Eiffel should
            redo what is done in ANSI-C locale and C library. If it wants to be
            independent, it should redo it, thereby duplicating a lot of work and
            bytes.

            Regards,
            Ulrich
          • Arno Wagner
            Message 5 of 14 , Nov 6, 2000
              --- In eiffel-nice-library@egroups.com, "Ulrich Windl"
              <ulrich.windl@r...> wrote:
              > On 6 Nov 2000, at 12:00, Arno Wagner wrote:
              >
              > [...]
              > > > However, I see a problem: with a very old C library, macros like
              > > > "islower()" could return the wrong value if used for 8-bit
              > > > characters (I know what I'm talking about). However, ANSI says
              > > > that the routines are only valid if "isascii()" returns "true".
              > >
              > > That is what I assumed it says. As Eiffel also only deals with
              > > ASCII (and not 8-bit ASCII extensions), where is the problem?
              > > After all if you allow characters > 127 you are supposed to
              > > handle them yourself. Don't forget that the final I in ASCII
              > > stands for 'interchange'.
              >
              > The problem happens when you allow a user to type in STRINGs
              > and you do some post-processing. You would not believe how much
              > software can't deal with "Universität":

              I do believe it. Usually I don't use umlauts at all (and I don't have
              them on my keyboard), but I have also experienced reactions ranging
              from arbitrary replacement to immediate crash.

              > Some convert the 'ä' to something else,
              > others omit it, others replace it with blanks...
              >
              > So if we want to write code for the real world, we cannot
              > simply ignore the existence of accented characters,
              > specifically not in the MS-Windows world.
              >
              I agree. The basic problem is that these characters _are_ on
              the keyboard and so can be typed in. And ordinary users don't know
              about the specific problems that this causes. On the other hand,
              typewriters with umlauts were not the standard for a long
              time, so the keyboard could just have stayed plain ASCII.

              > I still admit that the issue is messy, and I wonder if Eiffel
              > should redo what is done in ANSI-C locale and C library. If it
              > wants to be independent, it should redo it, thereby duplicating
              > a lot of work and bytes.
              >
              > Regards,
              > Ulrich

              My guess is that, at the moment, transparent handling of
              non-ASCII characters is sufficient. After all, if you can type it in,
              the C library on the machine should be able to handle it.

              Regards,
              Arno