
Re: Editing iso-8859-8 files in UTF-8 locale (fwd)

  • Ron Aaron
    Message 1 of 14 , Feb 3 4:21 PM
      Zvi Har'El <rl@...> writes:
      >
      >I started recently to use vim in a UTF-8 xterm, and encountered the following
      >problem: I have several ISO8859-8 files which I have to maintain. The only way
      >to edit them is using the vim command
      >
      >:e ++enc=iso8859-8 filename
      >
      >There is no way that I know to put the ++enc in a modeline, or to do something
      >like
      >
      >vi ++enc=iso8859-8 filename
      >
      >so that I could define an alias for vim on this kind of files. Is there a
      >feature which I don't see, or should I ask to put it on the wish list? This is
      >of course not specific to iso8859-8, but to all iso8859-* encodings except
      >latin1, which is the default file encoding for that. BTW, I have
      >LC_CTYPE=he_IL.UTF-8 in my environment, so while vim recognizes automatically
      >that the file is iso8859-*, why is the default iso8859-1? It should be taken
      >from my locale, and set to be iso8859-8!

      Zvi -

      You can do this:

      set fencs=utf-8,iso8859-8,latin1

      which will make vim "see" a -8 encoded file (if possible) first. This means
      that any file with a char(224) etc will be seen as -8 instead of latin1, but
      that is probably appropriate for your usage.
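To see why the order of 'fencs' matters, here is a small Python model of the fallback Ron describes. It tries each encoding in turn in strict mode and keeps the first one that decodes the whole file. This is only a sketch of the idea, not vim's code. Python's codec names (`utf-8`, `iso8859-8`, `latin-1`) stand in for vim's, Python's codec tables may differ in detail from the iconv tables vim uses, and the helper name `guess_fenc` is hypothetical.

```python
# Hypothetical model of the 'fileencodings' fallback (not vim's implementation):
# the first encoding that decodes every byte of the file wins.
FENCS = ("utf-8", "iso8859-8", "latin-1")


def guess_fenc(data: bytes, fencs=FENCS) -> str:
    for enc in fencs:
        try:
            data.decode(enc)  # strict mode: any invalid byte raises
        except UnicodeDecodeError:
            continue  # like vim, move on to the next entry in the list
        return enc
    raise ValueError("no encoding in the list fits")


if __name__ == "__main__":
    print(guess_fenc("שלום".encode("utf-8")))      # utf-8
    print(guess_fenc("שלום".encode("iso8859-8")))  # iso8859-8
    print(guess_fenc("Été".encode("latin-1")))     # latin-1
```

latin-1 assigns a character to every byte value, so the last entry can never fail. That is what makes it a useful catch-all at the end of the list.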

      Best regards,

      Ron
    • Tomas Zellerin
      Message 2 of 14 , Feb 3 11:36 PM
        On Sun, Feb 03, 2002 at 04:21:08PM -0800, Ron Aaron wrote:
        > Zvi Har'El <rl@...> writes:
        > >
        > ...
        >
        > Zvi -
        >
        > You can do this:
        >
        > set fencs=utf-8,iso8859-8,latin1
        >
        > which will make vim "see" a -8 encoded file (if possible) first. This means
        > that any file with a char(224) etc will be seen as -8 instead of latin1, but
        > that is probably appropriate for your usage.
        >
        Just out of curiosity, what is the use of the latin1 at the end? Is there
        any valid latin-1 file that is not also valid iso-8859-8?

        Actually, I had a similar problem recently; most files I come into contact
        with are in latin2, but a few from my Windows-based coworkers came (and
        should stay) in cp1250. The fileencodings are not sufficient to
        distinguish them, so I wrote (as Bram suggested) a plugin with an autocommand that
        checks certain places (the last line in .txt files, the meta charset in HTML
        files) and re-reads the file in the proper encoding. I don't have it at hand
        just now, but can send it later if you are interested.
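Tomas's plugin is not reproduced in this thread. As a rough, hypothetical illustration of the HTML half of the idea, the Python sketch below looks for a `charset=` declaration in a `<meta>` tag before `</head>`. If it finds one, it decodes the raw bytes with that charset. The regex and the names `charset_from_meta` and `read_html` are assumptions and are not taken from the plugin. The .txt last-line convention is not modelled, because its format is not shown here.

```python
import re

# Hypothetical pattern: a charset= value inside a <meta> tag. The scan runs
# on raw bytes, since the file's encoding is what we are trying to learn;
# the declaration itself is plain ASCII.
META_CHARSET = re.compile(rb"""<meta[^>]*charset\s*=\s*["']?([\w.:-]+)""", re.IGNORECASE)


def charset_from_meta(data: bytes):
    """Return the charset declared in a <meta> tag before </head>, or None."""
    end = data.lower().find(b"</head>")
    head = data if end < 0 else data[:end]
    match = META_CHARSET.search(head)
    return match.group(1).decode("ascii").lower() if match else None


def read_html(data: bytes, default: str = "latin-1") -> str:
    """Decode with the declared charset, falling back to a default.

    Raises LookupError if the declared name is unknown to Python.
    """
    return data.decode(charset_from_meta(data) or default)
```

Limiting the scan to the head mirrors where a meta charset is supposed to appear. It also keeps a stray "charset=" in the body text from being taken as a declaration.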

        regards

        Tomas Zellerin
      • Zvi Har'El
        Message 3 of 14 , Feb 4 12:25 AM
          On Mon, 4 Feb 2002, Tomas Zellerin wrote:

          > Just out of curiosity, what is the use of the latin1 at the end? Is there
          > any valid latin-1 file that is not also valid iso-8859-8?

          Actually, iso8859-8 has almost no legal characters in the 0xc0..0xdf range.
          I don't know whether this is taken into account when choosing the iso8859-8 file encoding.
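Zvi's observation can be checked directly. The Python sketch below lists the high bytes that have no meaning in iso8859-8. It relies on Python's own iso8859-8 codec table, which may differ in detail from the iconv tables vim uses, and the helper name `undefined_bytes` is hypothetical.

```python
def undefined_bytes(encoding: str = "iso8859-8") -> list:
    """Bytes 0x80..0xff that a strict decoder for `encoding` rejects."""
    bad = []
    for b in range(0x80, 0x100):
        try:
            bytes([b]).decode(encoding)
        except UnicodeDecodeError:
            bad.append(b)
    return bad


if __name__ == "__main__":
    bad = undefined_bytes()
    print(" ".join(f"{b:#04x}" for b in bad))
    # How much of the 0xc0..0xdf range is unusable in iso8859-8?
    in_range = [b for b in bad if 0xC0 <= b <= 0xDF]
    print(f"{len(in_range)} of 32 bytes in 0xc0..0xdf are undefined")
```

In this table only 0xdf (DOUBLE LOW LINE) is defined in that range. The Hebrew letters start at 0xe0.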

          >
          > Actually, I had a similar problem recently; most files I come into contact
          > with are in latin2, but a few from my Windows-based coworkers came (and
          > should stay) in cp1250. The fileencodings are not sufficient to
          > distinguish them, so I wrote (as Bram suggested) a plugin with an autocommand that
          > checks certain places (the last line in .txt files, the meta charset in HTML
          > files) and re-reads the file in the proper encoding. I don't have it at hand
          > just now, but can send it later if you are interested.
          >
          > regards
          >
          > Tomas Zellerin
          >
          I'll be very grateful to get it, either here or off-list. The part about meta
          charset should, I believe, become a standard part of the html syntax file.

          --
          Dr. Zvi Har'El mailto:rl@... Department of Mathematics
          tel:+972-54-227607 Technion - Israel Institute of Technology
          fax:+972-4-8324654 http://www.math.technion.ac.il/~rl/ Haifa 32000, ISRAEL
          "If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
          Monday, 22 Shevat 5762, 4 February 2002, 10:18AM
        • Tomas Zellerin
          Message 4 of 14 , Feb 4 1:07 AM
            On Mon, Feb 04, 2002 at 10:25:10AM +0200, Zvi Har'El wrote:
            > > ... The fileencodings are not sufficient to
            > > distinguish them, so I wrote (as Bram suggested) a plugin with an autocommand that
            > > checks certain places (the last line in .txt files, the meta charset in HTML
            > > files) and re-reads the file in the proper encoding. I don't have it at hand
            > > just now, but can send it later if you are interested.
            > >
            > I'll be very grateful to get it, either here or off-list. The part about meta
            > charset should, I believe, become a standard part of the html syntax file.
            >
            Here it is; it was originally a self-use-only hack, so it may need further
            adjustments. Comments are welcome, of course. I would not put this into the
            standard syntax file unless it really matures, but I may put it on
            vim.sf.net.

            The last line of the script illustrates what should be on the last line of a
            .txt file.

            Tomas Zellerin
          • Zvi Har'El
            Message 5 of 14 , Feb 4 2:35 AM
              On Sun, 3 Feb 2002, Ron Aaron wrote:

              > Zvi Har'El <rl@...> writes:
              > >
              > >I started recently to use vim in a UTF-8 xterm, and encountered the following
              > >problem: I have several ISO8859-8 files which I have to maintain. The only way
              > >to edit them is using the vim command
              > >
              > >:e ++enc=iso8859-8 filename
              > >
              > >There is no way that I know to put the ++enc in a modeline, or to do something
              > >like
              > >
              > >vi ++enc=iso8859-8 filename
              > >
              > >so that I could define an alias for vim on this kind of files. Is there a
              > >feature which I don't see, or should I ask to put it on the wish list? This is
              > >of course not specific to iso8859-8, but to all iso8859-* encodings except
              > >latin1, which is the default file encoding for that. BTW, I have
              > >LC_CTYPE=he_IL.UTF-8 in my environment, so while vim recognizes automatically
              > >that the file is iso8859-*, why is the default iso8859-1? It should be taken
              > >from my locale, and set to be iso8859-8!
              >
              > Zvi -
              >
              > You can do this:
              >
              > set fencs=utf-8,iso8859-8,latin1
              >
              > which will make vim "see" a -8 encoded file (if possible) first. This means
              > that any file with a char(224) etc will be seen as -8 instead of latin1, but
              > that is probably appropriate for your usage.
              >
              > Best regards,
              >
              > Ron
              >

              Ron,

              Very good idea. I now use it, and it works great. I really cannot understand
              how vim (correctly) decides that certain files are latin1 and not iso8859-8,
              and any hint will help.

              Best,

              Zvi.

              --
              Monday, 22 Shevat 5762, 4 February 2002, 12:31PM
            • Ron Aaron
              Message 6 of 14 , Feb 4 6:41 AM
                Tomas Zellerin <zellerin@...> writes:
                >" charset.vim
                >" vim plugin for autosetting charset
                ... nice plugin snipped ...

                What's the problem with putting 'enc=cpxxxx' in the modeline? That's what I
                do, for obstinate files, and it works like a charm.

                Best regards,

                Ron
              • Ron Aaron
                Message 7 of 14 , Feb 4 6:43 AM
                  Zvi Har'El <rl@...> writes:
                  >On Sun, 3 Feb 2002, Ron Aaron wrote:
                  >
                  >> You can do this:
                  >>
                  >> set fencs=utf-8,iso8859-8,latin1
                  >>
                  >> which will make vim "see" a -8 encoded file (if possible) first. This means
                  >> that any file with a char(224) etc will be seen as -8 instead of latin1, but
                  >> that is probably appropriate for your usage.
                  >
                  >Ron,
                  >
                  >Very good idea. I now use it, and it works great. I really cannot understand
                  >how vim (correctly) decides that certain files are latin1 and not iso8859-8,
                  >and any hint will help.

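                  It is simple; -8 files have no valid characters between char(191) and
                  char(222), but latin1 files might (that is where latin1 keeps most of its
                  accented capital letters). Such a file fails to read as -8, so vim moves on to latin1.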

                  This autodetect won't always work, but mostly it does.
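Ron's caveat ("won't always work, but mostly it does") shows up as soon as a few latin1 words go through the same kind of fallback. The sketch below is a hypothetical Python model, again using Python's codec tables rather than vim's. A latin1 word whose accented letters fall in 0xe0..0xfa is silently taken as Hebrew. A word containing a capital accented letter (0xc0..0xde) correctly falls through to latin1.

```python
def guess_fenc(data: bytes, fencs=("utf-8", "iso8859-8", "latin-1")) -> str:
    """Hypothetical model of 'fileencodings': first encoding that decodes cleanly."""
    for enc in fencs:
        try:
            data.decode(enc)  # strict: raises on any invalid byte
            return enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no encoding in the list fits")


# Latin1 words: the first and last use only high bytes >= 0xe0, which are
# Hebrew letters in iso8859-8; the second contains 0xc9, undefined there.
for word in ("café", "Été", "naïve"):
    raw = word.encode("latin-1")
    enc = guess_fenc(raw)
    print(f"{word}: {raw.hex(' ')} -> {enc} -> {raw.decode(enc)}")
```

A latin1 file can therefore be taken for Hebrew if all its high bytes happen to be valid -8 codes, as with lowercase é or ï. A single capital such as É is enough to push the file to latin1.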

                  Best regards,

                  Ron
                • Tomas Zellerin
                  Message 8 of 14 , Feb 4 6:47 AM
                    On Mon, Feb 04, 2002 at 06:41:15AM -0800, Ron Aaron wrote:
                    > Tomas Zellerin <zellerin@...> writes:
                    > >" charset.vim
                    > >" vim plugin for autosetting charset
                    > ... nice plugin snipped ...
                    >
                    > What's the problem with putting 'enc=cpxxxx' in the modeline? That's what I
                    > do, for obstinate files, and it works like a charm.
                    >
                    I think 'enc' is a global option; it is the internal encoding used by
                    vim. Thus there can be only one per running vim, and I would even expect
                    strange things to happen when changing it mid-session. Am I
                    wrong; is it really that easy?
                  • Ron Aaron
                    Message 9 of 14 , Feb 4 8:31 AM
                      Tomas Zellerin <zellerin@...> writes:
                      >On Mon, Feb 04, 2002 at 06:41:15AM -0800, Ron Aaron wrote:
                      >> Tomas Zellerin <zellerin@...> writes:
                      >> >" charset.vim
                      >> >" vim plugin for autosetting charset
                      >> ... nice plugin snipped ...
                      >>
                      >> What's the problem with putting 'enc=cpxxxx' in the modeline? That's what I
                      >> do, for obstinate files, and it works like a charm.
                      >>
                      >I think 'enc' is a global option; it is the internal encoding used by
                      >vim. Thus there can be only one per running vim, and I would even expect
                      >strange things to happen when changing it mid-session. Am I
                      >wrong; is it really that easy?

                      Oh, my mistake: I meant to say, 'fenc=cpxxx'. <blush>

                      Ron
                    • Tomas Zellerin
                      Message 10 of 14 , Feb 4 11:37 PM
                        On Mon, Feb 04, 2002 at 08:31:40AM -0800, Ron Aaron wrote:
                        >
                        > Oh, my mistake: I meant to say, 'fenc=cpxxx'. <blush>
                        >
                        On my machine, fenc=xxx in modelines does this:
                        - first, the file is read into the buffer;
                        - then fenc is changed as if from command mode, i.e., the
                          file is flagged as modified and later saved (or rather,
                          the approximation based on the wrong encoding used when
                          reading it) in a different encoding.
                        Not what I need. Do you have a different experience?

                        As a side note, I uploaded a modified (I hope better) version of the
                        script to vim.sf.net. There is still one thing (probably among others)
                        that I do in a rather clumsy way. Can anybody give advice?

                        Problem: perform search() in a limited range of lines (e.g., 1-n, or
                        1-/<\/head>/)
                        My solution: jump to n or /<\/head>/ and use the flags "W" (no wrap)
                        and "b" (backwards) in search()
                        Proper solution: ???
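The behaviour Tomas wants is a search confined to lines 1..n, or to the lines before `</head>`. The Python sketch below models it as a hypothetical helper that follows vim's conventions: line numbers are 1-based, and the result is 0 when nothing matches. It is not a vim script answer. For reference, later Vim releases added an optional {stopline} argument to search() for this kind of bounded search.

```python
import re


def search_range(lines, pattern, first=1, last=None):
    """Return the 1-based number of the first line in first..last matching
    pattern, or 0 if none does (hypothetical helper, vim-style result)."""
    rx = re.compile(pattern)
    last = len(lines) if last is None else min(last, len(lines))
    for lnum in range(first, last + 1):
        if rx.search(lines[lnum - 1]):
            return lnum
    return 0


def head_end(lines):
    """Line number of the first </head>, or the last line if there is none."""
    return search_range(lines, r"</head>") or len(lines)


html = [
    "<html><head>",
    '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-8">',
    "</head><body>",
    "<p>charset=utf-8 mentioned in the body</p>",
    "</body></html>",
]
print(search_range(html, r"charset=", 1, head_end(html)))  # 2
```

Bounding the range explicitly makes the search independent of the cursor position. The jump-then-search-backwards approach does depend on it.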
                      • Zvi Har'El
                        Message 11 of 14 , Feb 5 12:43 AM
                          On Mon, 4 Feb 2002, Ron Aaron wrote:

                          > Tomas Zellerin <zellerin@...> writes:
                          > >On Mon, Feb 04, 2002 at 06:41:15AM -0800, Ron Aaron wrote:
                          > >> Tomas Zellerin <zellerin@...> writes:
                          > >> >" charset.vim
                          > >> >" vim plugin for autosetting charset
                          > >> ... nice plugin snipped ...
                          > >>
                          > >> What's the problem with putting 'enc=cpxxxx' in the modeline? That's what I
                          > >> do, for obstinate files, and it works like a charm.
                          > >>
                          > >I think 'enc' is a global option; it is the internal encoding used by
                          > >vim. Thus there can be only one per running vim, and I would even expect
                          > >strange things to happen when changing it mid-session. Am I
                          > >wrong; is it really that easy?
                          >
                          > Oh, my mistake: I meant to say, 'fenc=cpxxx'. <blush>
                          >
                          > Ron
                          >
                          fenc does not help in a modeline! The only thing that helps in a modeline is
                          fileencodings (plural), and then you need to do :e! to make it work. From
                          options.txt:

                          When reading a file 'fileencoding' will be set from 'fileencodings'.
                          To read a file in a certain encoding it won't work by setting
                          'fileencoding', use the |++enc| argument.


                          --
                          Tuesday, 23 Shevat 5762, 5 February 2002, 10:40AM
                        • Ron Aaron
                          Message 12 of 14 , Feb 5 10:07 AM
                            Zvi Har'El <rl@...> writes:
                            >On Mon, 4 Feb 2002, Ron Aaron wrote:
                            >
                            >> Oh, my mistake: I meant to say, 'fenc=cpxxx'. <blush>
                            >>
                            >> Ron
                            >>
                            >fenc does not help in a modeline! The only thing that helps in a modeline is
                            >fileencodings (plural), and then you need to do :e! to make it work. From
                            >options.txt:


                            I respectfully disagree. I have a file with:

                            /* vim: fenc=cp1255 tw=78 :
                            */

                            as the modeline, and it is read as I expect.

                            Try it! I changed the modeline to fenc=cp1252, saved the file and did :e!
                            and the fenc of the file was now cp1252. It does work...

                            Ron
                          • Zvi Har'El
                            Message 13 of 14 , Feb 5 12:54 PM
                              On Tue, 5 Feb 2002, Ron Aaron wrote:

                              > Zvi Har'El <rl@...> writes:
                              > >On Mon, 4 Feb 2002, Ron Aaron wrote:
                              > >
                              > >> Oh, my mistake: I meant to say, 'fenc=cpxxx'. <blush>
                              > >>
                              > >> Ron
                              > >>
                              > >fenc does not help in a modeline! The only thing that helps in a modeline is
                              > >fileencodings (plural), and then you need to do :e! to make it work. From
                              > >options.txt:
                              >
                              >
                              > I respectfully disagree. I have a file with:
                              >
                              > /* vim: fenc=cp1255 tw=78 :
                              > */
                              >
                              > as the modeline, and it is read as I expect.
                              >
                              > Try it! I changed the modeline to fenc=cp1252, saved the file and did :e!
                              > and the fenc of the file was now cp1252. It does work...
                              >
                              > Ron
                              >

                              I tried it. I put
                              <!--vim: set fenc=iso-8859-8 : -->
                              at the top of my HTML file and started vim (in a UTF-8 locale) *without* my
                              new setting of fileencodings (i.e., I typed ":set fencs=utf-8,latin1" to undo
                              my new setting, which is now "fencs=utf-8,iso-8859-8,latin1").

                              I then read in my HTML file using ":e". The result was that the file was
                              recognized as latin1, read in and re-encoded internally as UTF-8, so all the
                              Hebrew characters "turned French", but fenc was set to iso-8859-8 because of
                              my modeline. Saving the file now causes an error:

                              write error, conversion failed
                              WARNING: Original file may be lost or damaged
                              don't quit the editor until the file is successfully written!

                              The file was completely truncated (zero bytes)!
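The failure can be reproduced outside vim. The Python sketch below models the steps Zvi describes; it does not model vim internals. It reads Hebrew iso8859-8 bytes as latin1, which turns them into accented Latin letters. It then tries to write that text back out as iso8859-8. Those letters have no iso8859-8 code, so the conversion fails, which is consistent with the "conversion failed" error above.

```python
hebrew = "שלום"                   # "shalom"
raw = hebrew.encode("iso8859-8")  # the bytes stored on disk

# Step 1: the file is read with the wrong encoding (latin1): "turned French".
misread = raw.decode("latin-1")
print(misread)  # ùìåí

# Step 2: fenc=iso-8859-8 from the modeline only matters when writing.
try:
    misread.encode("iso8859-8")
except UnicodeEncodeError as err:
    print("write error, conversion failed:", err.reason)

# Decoding with the right encoding from the start (vim's ++enc) round-trips.
assert raw.decode("iso8859-8") == hebrew
```

This is why 'fileencoding' cannot repair a read: by the time the modeline is applied, the bytes have already been interpreted.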

                              I don't completely understand what worked for you, but as I quoted from the
                              help file options.txt,

                              "When reading a file 'fileencoding' will be set from 'fileencodings'.
                              To read a file in a certain encoding it won't work by setting
                              'fileencoding', use the |++enc| argument."

                              So the only methods I now have are either to use ":e ++enc=iso-8859-8" or
                              to set "fencs=utf-8,iso-8859-8,latin1" as you suggested. Modelines do not
                              work for me, at least.

                              Best,

                              Zvi.

                              --
                              Tuesday, 24 Shevat 5762, 5 February 2002, 10:31PM