internalizing some external preliminary jobs

  • Hiroshi Iwatani
    Message 1 of 6, May 29, 2002
      Hi all the venerable Vim seniors!

      The fileencodings option is a little bit shaky at least for
      the different Japanese encodings. Although Muraoka san has
      given a couple of nifty plugins to us, we'd like to have a
      more clear-cut way of discerning the type of Japanese charset
      used in a file.

      Would you please take a look at the attached bash shell
      script and C program? How could we implement an autocommand,
      a plugin, or whatever in order to embed similar functionality
      into Vim proper?

      :h autocmd and :h plugin do not give enough info on this issue.

      TIA
      Hiroshi Iwatani
      p.s. If you want to learn Japanese encodings quickly, see:
      http://web.lfw.org/text/jp.html
    • Colin Keith
      Message 2 of 6, May 29, 2002
        On Thu, May 30, 2002 at 08:49:52AM +0900, Hiroshi Iwatani wrote:
        > Hi all the venerable Vim seniors!

        :) Hi,

        Okay well I know _nothing_ of Japanese so please forgive any stupid
        comments I might make, but according to the following code, isn't it
        just matching one range of characters followed by another?


        if (ch >= 0x81 && ch <= 0x9F){
            // prev byte was Shift_JIS group I upper byte
            ch = fgetc(infile);
            if ((ch >= 0x40 && ch <= 0x7e) || // their lower byte values
                (ch >= 0x80 && ch <= 0xff)){
                return 0; // success return
            }
        } else if (ch >= 0xe0 && ch <= 0xea){ // group II upper byte
            ch = fgetc(infile);
            if (ch >= 0x40 && ch <= 0x7e){ // reliable lower byte values
                return 0;
            }
        } else if (ch >= 0xa1){
            ch = fgetc(infile);
            if (ch >= 0xa1){ // we can safely guess it as an EUC-JP
                return 1;
            }
        }

        I.e.

        function! IsSJIS()
          " \| separates the alternatives in a Vim pattern
          return search('[^V129-^V159][^V064-^V126^V128-^V255]\|' .
                \ '[^V224-^V234][^V064-^V126]\|' .
                \ '^V161^V161', 'W')
        endfunction


        Obviously you'll need to convert the ^ and V sequence that I've typed to
        ctrl-v ... oh, unless anyone knows a better way to include ctrl characters
        in a regexp? Neither \0x00 nor \000 notations seem to work.

        Alternatively, if you have a version of Vim with Perl built into it, you
        might find that you can do this more easily with Perl, as per the Perl
        FAQ:

        perldoc -q 'How can I match strings with multibyte characters'

        So the above function would be:

        function! IsSJIS()
        perl <<EOF
        ...
        <perlcode>
        ...
        EOF
        :endfunction

        (sorry, but I'm too tired to work it out, but you could do the same as above)


        Oh, and of course to put these to use you'll need something like:


        " source IsSJIS.vim first
        au FileType * if IsSJIS() | set fileencodings=..... | endif


        Or you could push the variable setting into the function defined as IsSJIS()
        so instead you'd have:

        function! SetEncoding()
          " \| separates the alternatives in a Vim pattern
          if search('[^V129-^V159][^V064-^V126^V128-^V255]\|' .
                \ '[^V224-^V234][^V064-^V126]\|' .
                \ '^V161^V161', 'W')
            set fileencoding=.....
            set fileencodings=.....
            if has('multi_lang')
              " :language is a command, not an option
              language ...
            endif
          endif
        endfunction


        au FileType * :call SetEncoding()

        Of course you could always call the external program with system()
        and see what it returns...

        Colin.
      • Hiroshi Iwatani
        Message 3 of 6, May 31, 2002
          Thanks!

          Do you mean that Vim can do a trial read of a file before
          opening it as the editing target?

          Anyway, I'll try your recommended method and report the
          result when I next have some free time.

          Hiroshi Iwatani

        • Colin Keith
          Message 4 of 6, May 31, 2002
            On Sat, Jun 01, 2002 at 10:18:20AM +0900, Hiroshi Iwatani wrote:
            > Do you mean that Vim can do a trial read of a file before
            > opening it as the editing target?

            Not as such. The autocmd event BufReadPre is called before the file is
            actually read, so you could cheat and use that to set the file encoding
            so that when vim loads the file, it knows how to handle the contents.


            au BufReadPre * :call IsSJIS()

            The problem I can see is that because you need to read the file to determine
            the language encoding, you'll have to open it. That of course will trigger
            the above. In order to prevent loops, you'd need a flag set:

            function! IsSJIS()
              " The flag exists only while a check is running, so a nested
              " BufReadPre triggered by our own read returns immediately.
              if exists('g:issjis_busy')
                return
              endif

              let g:issjis_busy = 1
              ... do the actual check ..
              unlet g:issjis_busy
            endfunction


            Of course if you have that external program there's no reason you can't use
            that rather than having vim open then reopen it. Call it from a BufReadPre
            autocmd event ... ?
          • Bram Moolenaar
            Message 5 of 6, Jun 1, 2002
              Hiroshi Iwatani wrote:

              > The fileencodings option is a little bit shaky at least for
              > the different Japanese encodings. Although Muraoka san has
              > given a couple of nifty plugins to us, we'd like to have a
              > more clear-cut way of discerning the type of Japanese charset
              > used in a file.

              Sounds like a very useful item.

              > Would you please take a look at the attached bash shell
              > script and C program? How could we implement an autocommand,
              > a plugin, or whatever in order to embed similar functionality
              > into Vim proper?

              What we would need for the 'fileencodings' option is a check if the
              proposed encoding fits with the bytes in the file. Thus if the next
              item in 'fileencodings' is "euc-jp", there would need to be a check if
              all bytes fall within this encoding. Same for shift-jis. Can you turn
              your issjis.c code into something that works like this? Adding this to
              fileio.c would then cause the next item in 'fileencodings' to be tried
              (rewinding the file) when the test fails.

              I would guess euc-jp can be tested by checking that after a byte in the
              range 0xA1-0xFE another byte in this range follows. It's not clear to
              me how to test for shift-jis.

              --
              hundred-and-one symptoms of being an internet addict:
              80. At parties, you introduce your spouse as your "service provider."

              /// Bram Moolenaar -- Bram@... -- http://www.moolenaar.net \\\
              /// Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim \\\
              \\\ Project leader for A-A-P -- http://www.a-a-p.org ///
              \\\ Help me helping AIDS orphans in Uganda - http://iccf-holland.org ///
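A rough C sketch of the "does the proposed encoding fit the bytes?" test described in the message above might look like the following. It is only an illustration, not code from issjis.c or fileio.c: the function names `fits_euc_jp` and `fits_shift_jis` are hypothetical, the EUC-JP test extends the 0xA1-0xFE guess with the 0x8E and 0x8F shift prefixes, and the Shift_JIS lead-byte range (0x81-0x9F, 0xE0-0xEF) assumes plain Shift_JIS without vendor extensions.

```c
#include <stddef.h>

/* Bytes 0xA1-0xFE carry JIS X 0208 / JIS X 0212 positions in EUC-JP. */
static int euc_hi(unsigned char b) { return b >= 0xA1 && b <= 0xFE; }

/* Returns 1 if every byte of buf[0..len) fits EUC-JP, else 0. */
int fits_euc_jp(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;                      /* bytes in this character */
        if (c < 0x80)       need = 1;     /* ASCII */
        else if (c == 0x8E) need = 2;     /* prefix for half-width katakana */
        else if (c == 0x8F) need = 3;     /* prefix for JIS X 0212 */
        else if (euc_hi(c)) need = 2;     /* JIS X 0208 */
        else return 0;                    /* byte never starts a character */
        if (len - i < need) return 0;     /* character cut off at the end */
        if (c == 0x8E && (buf[i + 1] < 0xA1 || buf[i + 1] > 0xDF)) return 0;
        if (c == 0x8F && !(euc_hi(buf[i + 1]) && euc_hi(buf[i + 2]))) return 0;
        if (euc_hi(c) && !euc_hi(buf[i + 1])) return 0;
        i += need;
    }
    return 1;
}

/* Returns 1 if every byte of buf[0..len) fits Shift_JIS, else 0. */
int fits_shift_jis(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        if (c < 0x80 || (c >= 0xA1 && c <= 0xDF)) {
            i += 1;                       /* ASCII or half-width katakana */
        } else if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xEF)) {
            unsigned char t;
            if (len - i < 2) return 0;    /* lead byte without a trail byte */
            t = buf[i + 1];
            if (!((t >= 0x40 && t <= 0x7E) || (t >= 0x80 && t <= 0xFC)))
                return 0;
            i += 2;
        } else {
            return 0;                     /* byte never starts a character */
        }
    }
    return 1;
}
```

Note that the two tests are not mutually exclusive. The EUC-JP pair 0xA4 0xA2, for example, also passes the Shift_JIS test as two half-width katakana bytes, so the order of the candidates still matters.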
            • Hiroshi Iwatani
              Message 6 of 6, Jun 2, 2002
                Though I have yet to fully decipher the Vim source code, it
                seems, vaguely, that the current implementation of the
                fileencodings option makes no effort of its own to discern
                the encoding of the file at hand, and simply relies on the
                return values from iconv instead. If so, the true culprit
                behind the option's clumsiness might be the GNU library.

                If Vim is to work around this, however, I believe a new
                general-purpose module should be established for finding out
                the encoding used in a file when Vim tries to open it as an
                editing target. Adding code specific to a particular language
                and its encodings, which happen to be Japanese and its two
                major encodings in this case, to an existing Vim source file
                does not seem the better choice for the editor's long-term
                health.

                Hiroshi Iwatani
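One way to picture the general-purpose module proposed above is a table of per-encoding checks that is walked in order, much as the entries of fileencodings are tried in order. The sketch below is a guess at the shape only and is not taken from the Vim source: `enc_check`, `first_fitting`, `jp_checks` and the simplified checks are all hypothetical names, and the byte ranges are reduced to the two-byte cases discussed in this thread.

```c
#include <stddef.h>

/* A check answers: could buf[0..len) be text in this encoding? */
typedef int (*fits_fn)(const unsigned char *buf, size_t len);

struct enc_check {
    const char *name;   /* illustrative encoding name */
    fits_fn fits;       /* language-specific test, kept out of the core loop */
};

/* Simplified EUC-JP: ASCII, or two bytes each in 0xA1-0xFE. */
static int fits_euc_jp_simple(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        if (buf[i] < 0x80) { i += 1; continue; }
        if (buf[i] < 0xA1 || buf[i] > 0xFE) return 0;
        if (len - i < 2 || buf[i + 1] < 0xA1 || buf[i + 1] > 0xFE) return 0;
        i += 2;
    }
    return 1;
}

/* Simplified Shift_JIS: ASCII, half-width katakana, or lead + trail byte. */
static int fits_sjis_simple(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i], t;
        if (c < 0x80 || (c >= 0xA1 && c <= 0xDF)) { i += 1; continue; }
        if (!((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xEF))) return 0;
        if (len - i < 2) return 0;
        t = buf[i + 1];
        if (!((t >= 0x40 && t <= 0x7E) || (t >= 0x80 && t <= 0xFC))) return 0;
        i += 2;
    }
    return 1;
}

/* Candidates in the order they should be tried. */
const struct enc_check jp_checks[] = {
    { "euc-jp", fits_euc_jp_simple },
    { "sjis",   fits_sjis_simple   },
};

/* Returns the name of the first encoding whose check passes, or NULL. */
const char *first_fitting(const struct enc_check *tbl, size_t n,
                          const unsigned char *buf, size_t len)
{
    size_t k;
    for (k = 0; k < n; k++) {
        if (tbl[k].fits(buf, len))
            return tbl[k].name;
    }
    return NULL;
}
```

With this layout, supporting another language would mean adding a row to the table rather than touching the loop that picks the encoding, which seems to be the separation the message is asking for.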
