
Re: [NH] Strange characters...

  • loro
    Message 1 of 28, Nov 9, 2008
      Axel Berger wrote:
      >loro wrote:
      > > There you see! He does declare a charset.
      >
      >In a way yes,

      In the only way that matters.

      > but it is in fact the provider doing for him and doing it
      >wrong for the actual content.

      The host shouldn't send that header at all, but we didn't know it was
      the host's doing. I still don't.

      > > Easy maybe, but wrong. Superscript 1 and 2 are not in the illegal range.
      >
      >For UTF-8 the ones Bob uses are.

      That's another matter than the illegal windows characters you were
      referring to.

      Lotta
    • Axel Berger
      Message 2 of 28, Nov 9, 2008
        loro wrote:
        > but we didn't know it was the host's doing. I still don't.

        Where in Bob's source did you find UTF-8? If not there it must be in the
        HTML headers. It has to be somewhere.

        > That's another matter than the illegal windows characters you were
        > referring to.

        No. The browser and the validator first look what encoding to expect and
        then parse the code using that. So what is illegal and what is not
        solely depends on that declaration. In another context those very same
        characters may be perfectly legal, but that doesn't matter.

        Axel
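A minimal Python sketch of the lookup order described above, where the charset in the HTTP `Content-Type` header wins and the META tag is only consulted when the header names none. The function name `declared_charset` and the sample header and page values are made up for illustration, and real browsers do more (byte order marks, sniffing), so treat this as a simplification rather than what any particular browser or validator does.

```python
import re
from email.message import Message


def declared_charset(http_content_type, html_bytes):
    """Return the charset a client would assume: HTTP header first, META second."""
    if http_content_type:
        msg = Message()
        msg["Content-Type"] = http_content_type
        charset = msg.get_content_charset()  # lower-cased, or None if absent
        if charset:
            return charset
    # Fallback: look for charset=... near the top of the document itself.
    match = re.search(rb'charset=["\']?([A-Za-z0-9_.:-]+)', html_bytes[:1024], re.I)
    return match.group(1).decode("ascii").lower() if match else None


# Hypothetical page that declares cp-1252 in a META tag.
page = b'<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">'

header_wins = declared_charset("text/html; charset=UTF-8", page)
meta_fallback = declared_charset("text/html", page)
nothing_declared = declared_charset(None, b"<html><body>hi</body></html>")
```

With a header that says UTF-8, the META declaration in the page never gets a say, which would match the symptoms in this thread.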
      • loro
        Message 3 of 28, Nov 9, 2008
          Axel Berger wrote:
          >loro wrote:
          > > but we didn't know it was the host's doing. I still don't.
          >
          >Where in Bob's source did you find UTF-8? If not there it must be in the
          >HTML headers. It has to be somewhere.

          I think you mean HTTP headers. That doesn't mean the host made that happen.

          > > That's another matter than the illegal windows characters you were
          > > referring to.
          >
          >No. The browser and the validator first look what encoding to expect and
          >then parse the code using that. So what is illegal and what is not
          >solely depends on that declaration. In another context those very same
          >characters may be perfectly legal, but that doesn't matter.

          Superscript 1 and 2, encoded the way they are, are not exclusive to
          cp-1252, and that was what you were talking about. They aren't
          "illegal", they just mean different things in ANSI and UTF-8. No
          validator refuses to parse Bob's page because of those two
          characters. They would have thrown an error had they been in the
          so-called illegal range though.

          Lotta
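To make the distinction concrete, here is a small Python sketch comparing cp-1252 with ISO 8859-1 (Latin-1). It assumes the superscripts were saved as the single bytes 0xB9 and 0xB2, which is what a Windows editor would normally write. The variable names are illustrative only.

```python
import unicodedata

# Superscript 1 and 2 as single bytes.
superscripts = bytes([0xB9, 0xB2])
as_cp1252 = superscripts.decode("cp1252")
as_latin1 = superscripts.decode("latin-1")

# The range 0x80-0x9F is where the two encodings part ways.
high_range = bytes(range(0x80, 0xA0))
# cp1252 leaves a few of these bytes undefined, hence errors="replace".
cp1252_view = high_range.decode("cp1252", errors="replace")
latin1_view = high_range.decode("latin-1")

# In Latin-1 every byte of that range is an invisible C1 control character.
latin1_all_controls = all(unicodedata.category(ch) == "Cc" for ch in latin1_view)
```

The superscripts decode identically under both encodings, while 0x80 is the euro sign in cp-1252 and a control character in Latin-1.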
        • loro
          Message 4 of 28, Nov 9, 2008
            Axel Berger wrote:
            >Bob Gorman wrote:
            > > Something in the Head part of my html files?
            >
            >Yes, that too. First you need to shut up that server. You might need
            >help from your provider, but this line in .htaccess ought to be the
            >first step:
            >
            > AddDefaultCharset Off

            This is so backwards. If Bob can use .htaccess, he should of course
            use it to make the server send the character encoding he prefers,
            which may very well be UTF-8 for all we know. Why in the whole world
            not use it as it is intended instead of relying solely on a fallback
            mechanism like Meta?

            ><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=US-ASCII">

            Except ASCII doesn't cover superscript 1 and 2, so that's hardly an
            improvement.

            Bob, just use the entities (&sup1; and &sup2;) for now and be done
            with it. You can read up about character encoding when you feel up to
            it. I fear this will just confuse you and I'm sorry for my part in that.

            Lotta
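For anyone who wants to automate the entity workaround, a rough Python sketch follows. The helper name `to_entities` is hypothetical. It leaves ASCII alone (so existing markup is untouched) and swaps everything else for a named entity where the standard library knows one, or a numeric reference otherwise.

```python
from html.entities import codepoint2name


def to_entities(text):
    """Replace every non-ASCII character with an HTML entity."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII, including tags, stays as-is
        elif code in codepoint2name:
            out.append("&%s;" % codepoint2name[code])  # e.g. &sup2;
        else:
            out.append("&#%d;" % code)  # numeric fallback
    return "".join(out)


# Plain ASCII cannot represent the superscripts directly.
try:
    "\u00b9".encode("ascii")
    ascii_can_encode = True
except UnicodeEncodeError:
    ascii_can_encode = False

squared = to_entities("m\u00b2")
first = to_entities("note\u00b9")
```

The result is pure ASCII, so it displays the same whichever charset the server or the META tag announces.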
          • Bob Gorman
            Message 5 of 28, Nov 9, 2008
              loro wrote:

              > Bob, just use the entities (&sup1; and &sup2;) for now and be done
              > with it.

              Yes, I did, & I'm happy for now.

              > You can read up about character encoding when you feel up to
              > it.

              I will.
              I obviously need to learn about character sets and this mysterious
              .htaccess, but it can wait till I get a good night's sleep.

              I have 30+ web pages now and plan over time to double that, so I want to
              learn good practices now, to avoid excessive fix-up later.

              Thanks, and good night.

              Bob
            • Axel Berger
              Message 6 of 28, Nov 9, 2008
                loro wrote:
                > I think you mean HTTP headers.

                Absolutely. It was way past midnight here, when I wrote that.

                > That doesn't mean the host made that happen.

                Well, someone who has control over the server. And that usually is not
                the customer or only to the very limited degree .htaccess allows.

                > Superscript 1 and 2, encoded the way they are, are not exclusive to
                > cp-1252, and that was what you were talking about. They aren't
                > "illegal", they just mean different things in ANSI and UTF-8.

                I have to admit to not being firm in UTF-8. I do know that (nearly?)
                everything that's in the upper 128 for other encodings is a two
                character sequence in UTF-8. And I have tried this: The validator said
                "illegal, no UTF-8" first and was satisfied when I overrode that with
                telling it "use cp-1252". I have not checked which characters were the
                offenders, but the ones that showed up wrong in the browser is a good
                guess IMHO.

                Axel
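The validator behaviour reported above can be reproduced in a few lines of Python, assuming the page holds the superscripts as single cp-1252 bytes. The helper `is_valid_utf8` is an illustrative name, and Python's decoder stands in for the validator here, so this is a sketch of the principle and not of the W3C tool itself.

```python
def is_valid_utf8(data):
    """True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Superscript 1 and 2 as a Windows editor would save them.
cp1252_bytes = "\u00b9\u00b2".encode("cp1252")
# The same two characters saved as real UTF-8 take two bytes each.
utf8_bytes = "\u00b9\u00b2".encode("utf-8")
# Not everything above 127 is two bytes: the euro sign needs three.
euro_utf8 = "\u20ac".encode("utf-8")

rejected_as_utf8 = not is_valid_utf8(cp1252_bytes)
accepted_as_cp1252 = cp1252_bytes.decode("cp1252")
```

A lone 0xB9 or 0xB2 is a continuation byte with nothing to continue, so a strict UTF-8 parser rejects it, while the same bytes read as cp-1252 are fine.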
              • loro
                Message 7 of 28, Nov 9, 2008
                  Axel Berger wrote:
                  > > That doesn't mean the host made that happen.
                  >
                  >Well, someone who has control over the server. And that usually is not
                  >the customer or only to the very limited degree .htaccess allows.

                  But you suggested Bob would use .htaccess. I'd say declaring the
                  charset is one of the most common uses people make of .htaccess.

                  >UTF-8. And I have tried this: The validator said
                  >"illegal, no UTF-8" first and was satisfied when I overrode that with
                  >telling it "use cp-1252".

                  You are absolutely right. The W3C validator does do that now (while
                  the WDG one does not). My bad.

                  Lotta
                • Axel Berger
                  Message 8 of 28, Nov 9, 2008
                    loro wrote:
                    > This is so backwards. If Bob can use .htaccess, he should of course
                    > use it to make the server send the character encoding he prefers,
                    > which may very well be UTF-8 for all we know.

                    We do know, Bob told us. He uses NoteTab and writes in his native
                    Windows charset.
                    Apart from that you're right of course, and I already said so.
                    If Bob can make the server send the correct HTTP headers, that's best.
                    Only his provider can tell him that. The two providers I'm using (one is
                    my own choice and with the other I'm webmaster for someone else) don't,
                    but at least allow me to stop them sending the wrong ones.

                    > Bob, just use the entities (&sup1; and &sup2;) for now and be done
                    > with it.

                    That is a possibility. It is the one I use on my own site for maximum
                    backwards compatibility and there I declare US-ASCII in step with what
                    I'm actually doing.

                    For the other site I made easy maintainability by others the priority and
                    declare cp-1252, meaning that they can just type whatever their Windows
                    computer allows them and need not bother about encoding.
                    Unless you want to restrict yourself to the lowest common denominator on
                    ideological grounds, like I do, that's the best choice. It means in
                    essence "whatever you can type and display correctly in NoteTab, the
                    server and browser will accept and display correctly too."

                    > You can read up about character encoding when you feel up to it.

                    Bob, there really is not much to it. Most computers use a 256 character
                    alphabet - I'm ignoring extensions like UTF for the moment. In all these
                    the first 128 characters are identical and standardized by ASCII. The
                    top 128 ones, your ä ö ü é ê € µ and so on, can be all over the place.
                    This used to be more of a problem when the Macs, Ataris, Amigas, DOS with
                    cp-437, DOS with cp-850 and so on all had sizeable market shares. As
                    long as you are using Windows and don't switch to Cyrillic, Greek,
                    Hebrew or something like that, everything you type and display will be
                    encoded as cp-1252 (of which terms like AnsiNew and others are synonyms,
                    but not ANSI, Latin-1 or ISO 8859-1). So if you go and tell that to the
                    browsers rendering your pages, you'll be fine. If you don't, they or the
                    server have to guess and may guess wrong. That's all there is to it.

                    Axel
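A short Python sketch of the "all over the place" point, using codec names that ship with Python. The selection of code pages is just a sample, and 0xE4 is picked because it is the byte cp-1252 uses for ä.

```python
CODECS = ["cp1252", "latin-1", "cp437", "cp850", "mac_roman"]

# The lower 128 byte values decode to the same text everywhere.
ascii_bytes = bytes(range(128))
ascii_views = {name: ascii_bytes.decode(name) for name in CODECS}
ascii_agrees = len(set(ascii_views.values())) == 1

# One byte from the upper half, read under each code page.
high_views = {name: bytes([0xE4]).decode(name) for name in CODECS}
distinct_readings = set(high_views.values())

for name in CODECS:
    print(name, repr(high_views[name]))
```

Running it prints one line per code page. cp-1252 and Latin-1 agree on ä for this byte, and the older DOS and Mac code pages read it as something else, which is why the declaration has to match what the editor actually saved.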
                  • loro
                    Message 9 of 28, Nov 10, 2008
                      >We do know, Bob told us. He uses NoteTab and writes in his native
                      >Windows charset.

                      Axel, it's only the so called illegal range that's unique to the
                      windows codepage. The rest, as the superscript characters at hand, are not.

                      >If Bob can make the server send the correct HTTP headers, that's best.
                      >Only his provider can tell him that.

                      Trial and error works pretty well too. ;-)

                      >The two providers I'm using (one is
                      >my own choice and with the other I'm webmaster for someone else) don't,
                      >but at least allow me to stop them sending the wrong ones.

                      Do they let you use .htaccess but they don't let you use it to
                      declare the character encoding? That sounds strange and unusual indeed.


                      >encoded as cp-1252 (of which terms like AnsiNew and others are synonyms,
                      >but not ANSI, Latin-1 or ISO 8859-1). So if you go and tell that to the
                      >browsers rendering your pages, you'll be fine.

                      So he will with an iso latin charset.

                      I'll be quiet now. This doesn't lead anywhere and has very little to
                      do with Bob's question. Again, I'm sorry for this bickering. It
                      really wasn't my intention but that's how it turned out. I just
                      wanted Bob to get an answer to his question.

                      Lotta