Re: Vim can't use filename having MBYTE

  • Bram Moolenaar
    Message 1 of 10 , Aug 1, 2000
      Yasuhiro Matsumoto wrote:

      > The problem is not only in that part.
      > The cause is that "expand_wildcard" is called before
      > "set_init_3", which means the multi-byte table
      > has not been initialized yet.

      Ah, thus vim_isfilec() is called before the chartab[] table has been
      initialized. That's wrong.

      > Thus, when parsing file-name arguments, it cannot recognize multi-byte characters correctly.

      Right. But has_mbyte will also not be set. Unless you specified it on the
      command line, which is not likely to happen.

      > I attach a diff file to this mail (it changes the order of the function calls).
      > But I am not sure about it.
      > I would welcome anyone's suggestions.

      Changing the order of initializations is very dangerous. In this case the
      vimrc file can do just about anything with the files specified on the command
      line, thus it should be run with a valid set of arguments. But since 'isfname'
      can be set in the vimrc, expanding may work differently after it.

      This is a tough chicken-egg problem. I really don't know a good solution.
      Perhaps it would be sufficient to initialise chartab[] first, assuming that
      all characters above 0x80 are filename characters?
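
A minimal sketch of that fallback, assuming a 256-entry flag table. The names `CT_FNAME`, `early_chartab`, `early_init_chartab` and `early_isfilec`, and the set of ASCII punctuation accepted, are hypothetical and only for illustration; this is not Vim's actual initialization code. Every byte of 0x80 and above is treated as a filename character until the real options have been read.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define CT_FNAME 0x01   /* hypothetical flag: byte may appear in a file name */

unsigned char early_chartab[256];   /* stand-in for chartab[] */

/* Fill the table with a conservative default before any option is read. */
void early_init_chartab(void)
{
    int c;

    for (c = 0; c < 256; ++c) {
        early_chartab[c] = 0;
        if (c >= 0x80) {
            /* Assumption from the mail: all high bytes are file name bytes. */
            early_chartab[c] |= CT_FNAME;
        } else if (c != 0
                   && (isalnum(c) || strchr("/\\.-_+,#$%~=", c) != NULL)) {
            /* Assumed ASCII default, loosely modelled on 'isfname'. */
            early_chartab[c] |= CT_FNAME;
        }
    }
}

/* Lookup that is safe to call as soon as early_init_chartab() has run. */
int early_isfilec(int c)
{
    assert(c >= 0 && c < 256);
    return (early_chartab[c] & CT_FNAME) != 0;
}
```

With such a default in place, a later 'isfname' setting from the vimrc could still overwrite the table, so the chicken-egg problem for ASCII characters remains.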

      --
      hundred-and-one symptoms of being an internet addict:
      56. You leave the modem speaker on after connecting because you think it
      sounds like the ocean wind...the perfect soundtrack for "surfing the net".

      /// Bram Moolenaar Bram@... http://www.moolenaar.net \\\
      \\\ Vim: http://www.vim.org ICCF Holland: http://iccf-holland.org ///
    • mattn@mail.goo.ne.jp
      Message 2 of 10 , Aug 1, 2000
        Bram@... wrote:
        > Changing the order of initializations is very dangerous. In this case the
        > vimrc file can do just about anything with the files specified on the command
        > line, thus it should be run with a valid set of arguments. But since 'isfname'
        > can be set in the vimrc, expanding may work differently after it.

        I would like p_cc to be set before "rem_backslash" is called.
        I do "set fileencoding=japan" in my vimrc. Is that too late?

        > This is a tough chicken-egg problem. I really don't know a good solution.
        > Perhaps it would be sufficient to initialise chartab[] first, assuming that
        > all characters above 0x80 are filename characters?

        Japanese users deal with several character sets,
        such as "shift_jis" and "euc-jp".
        Japanese MS-Windows uses "shift_jis", while many UNIX systems use "euc-jp".
        Both are composed of a lead byte and a trail byte.

        With "euc-jp", the trail byte is never a backslash.
        But with "shift_jis", the trail byte may be a backslash (0x5C).
        Thus a correct check would have to skip the trail byte according to the
        lead byte and the encoding in use.
        However, both encodings have in common that the lead byte is non-ASCII.
        So how about the solution below?

        When a backslash is used as an escape character (not as a path separator),
        the next character should be a single-byte character, i.e. ASCII.

        ---------------------------------------------
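            int
        rem_backslash(str)
            char_u *str;
        {
        #ifdef BACKSLASH_IN_FILENAME
            /* The backslash is an escape only when an ASCII byte follows that is
             * a space, or is not NUL, not a wildcard and not a file name char. */
            return (str[0] == '\\'
        #ifdef FEAT_MBYTE
                    && isascii(str[1])  /* a non-ASCII lead byte: keep the backslash */
        #endif
                    && (str[1] == ' '
                        || (str[1] != NUL
                            && str[1] != '*'
                            && str[1] != '?'
                            && !vim_isfilec(str[1]))));
        #else
            return (str[0] == '\\' && str[1] != NUL);
        #endif
        }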
        ---------------------------------------------
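
For comparison, the "skip the trail byte by its lead byte" approach mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the text is taken to be Shift_JIS, with lead bytes in the ranges 0x81-0x9F and 0xE0-0xFC, and `sjis_is_lead` and `count_real_backslashes` are hypothetical helper names, not Vim functions. The example character 0x95 0x5C is a double-byte character whose trail byte has the same value as a backslash.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed Shift_JIS lead-byte ranges: 0x81-0x9F and 0xE0-0xFC. */
int sjis_is_lead(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Count the backslashes that are real single-byte characters, skipping
 * any 0x5C that is only the trail byte of a double-byte character. */
size_t count_real_backslashes(const char *path)
{
    const unsigned char *p = (const unsigned char *)path;
    size_t n = 0;

    assert(path != NULL);
    while (*p != '\0') {
        if (sjis_is_lead(*p) && p[1] != '\0') {
            p += 2;             /* skip lead byte and trail byte together */
            continue;
        }
        if (*p == '\\')
            ++n;
        ++p;
    }
    return n;
}
```

A byte-by-byte scan of `C:\` + 0x95 0x5C + `\a.txt` sees three backslash bytes, while this scan reports two. The catch is that it only works once the encoding is known, which is exactly what is missing while the arguments are being expanded.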

        Yasuhiro Matsumoto
      • Bram Moolenaar
        Message 3 of 10 , Aug 3, 2000
          Yasuhiro Matsumoto wrote:

          > Bram@... wrote:
          > > Changing the order of initializations is very dangerous. In this case the
          > > vimrc file can do just about anything with the files specified on the
          > > command line, thus it should be run with a valid set of arguments. But
          > > since 'isfname' can be set in the vimrc, expanding may work differently
          > > after it.
          >
          > I would like p_cc to be set before "rem_backslash" is called.
          > I do "set fileencoding=japan" in my vimrc. Is that too late?

          You would set 'fileencoding' or 'charcode' (which are really the same thing)
          in your vimrc file. But in your vimrc file you may also want to check which
          arguments Vim got and they need to be expanded before sourcing the vimrc file.
          That won't work.

          A very clever solution would be not to expand wildcards in the arguments until
          they are used. Thus if you would set 'charcode' in your vimrc and then access
          an argument, the expansion would be done right there. I'm not sure if this is
          really possible though.
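
A rough sketch of what such deferred expansion could look like, purely as an illustration: `lazy_arg`, `lazy_arg_get`, `expand_fn` and `demo_expand` are hypothetical names, and the stand-in expander only copies its input instead of doing real wildcard expansion. The point is that the raw argument is stored untouched and expanded on first access, by which time the vimrc settings would be in effect.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One command-line argument whose expansion is postponed. */
typedef struct {
    const char *raw;       /* argument exactly as given on the command line */
    char       *expanded;  /* NULL until the argument is first accessed */
} lazy_arg;

/* Hypothetical expander: returns a malloc'ed result, or NULL on failure. */
typedef char *(*expand_fn)(const char *raw);

/* Expand on first use only; later calls return the cached result. */
const char *lazy_arg_get(lazy_arg *arg, expand_fn expand)
{
    assert(arg != NULL && expand != NULL);
    if (arg->expanded == NULL)
        arg->expanded = expand(arg->raw);
    return arg->expanded;
}

/* Stand-in expander for the example: copies the input and counts calls. */
int demo_expand_calls = 0;

char *demo_expand(const char *raw)
{
    char *copy = malloc(strlen(raw) + 1);

    ++demo_expand_calls;
    if (copy != NULL)
        strcpy(copy, raw);
    return copy;
}
```

Whether every place that reads the argument list could go through such an accessor is the open question.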

          > > This is a tough chicken-egg problem. I really don't know a good solution.
          > > Perhaps it would be sufficient to initialise chartab[] first, assuming that
          > > all characters above 0x80 are filename characters?
          >
          > Japanese users deal with several character sets,
          > such as "shift_jis" and "euc-jp".
          > Japanese MS-Windows uses "shift_jis", while many UNIX systems use "euc-jp".
          > Both are composed of a lead byte and a trail byte.
          >
          > With "euc-jp", the trail byte is never a backslash.
          > But with "shift_jis", the trail byte may be a backslash (0x5C).
          > Thus a correct check would have to skip the trail byte according to the
          > lead byte and the encoding in use. However, both encodings have in common
          > that the lead byte is non-ASCII.

          I see the problem. It's nearly impossible to guess that a backslash is
          really the second byte of a multi-byte character. It might just as well
          have been an ISO-8859-1 character followed by a backslash.

          > So how about below solution?

          It might solve it in some cases, but not when the double-byte character that
          contains a backslash is followed by an ascii character. I don't think this is
          reliable enough.

          --
          hundred-and-one symptoms of being an internet addict:
          78. You find yourself dialing IP numbers on the phone.

          /// Bram Moolenaar Bram@... http://www.moolenaar.net \\\
          \\\ Vim: http://www.vim.org ICCF Holland: http://iccf-holland.org ///
        • mattn@mail.goo.ne.jp
          Message 4 of 10 , Aug 3, 2000
            Bram@... wrote:
            > > So how about below solution?
            >
            > It might solve it in some cases, but not when the double-byte character that
            > contains a backslash is followed by an ascii character. I don't think this is
            > reliable enough.

            ---------------------------------------------
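                int
            rem_backslash(str)
                char_u *str;
            {
            #ifdef BACKSLASH_IN_FILENAME
                /* The backslash is an escape only when an ASCII byte follows that is
                 * a space, or is not NUL, not a wildcard and not a file name char. */
                return (str[0] == '\\'
            #ifdef FEAT_MBYTE
                        && isascii(str[1])  /* a non-ASCII lead byte: keep the backslash */
            #endif
                        && (str[1] == ' '
                            || (str[1] != NUL
                                && str[1] != '*'
                                && str[1] != '?'
                                && !vim_isfilec(str[1]))));
            #else
                return (str[0] == '\\' && str[1] != NUL);
            #endif
            }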
            ---------------------------------------------

            I think this is enough.
            If I write the multi-byte characters as "[A][B][C][D][E][F]",
            it will work as shown below. ("[B]", "[C]" and "[F]" contain a backslash.)

            ex: C:\[A][B]\ [C]\I_love_[D][E][F]s.txt
            (1) (2) (3) (4) (5) (6)

            In this case there are six checkpoints, one for each backslash byte.
            (Is (6) the case you mean?)

            (1) The next character is ASCII, but it is not "*", "?" or " ".
            ---> This backslash is not an escape.

            (2) The next character is non-ASCII.
            ---> This backslash is not an escape.

            (3) The next character is ASCII, and it is " ".
            ---> This backslash is an escape.

            (4) The next character is ASCII, but it is not "*", "?" or " ".
            ---> This backslash is not an escape.

            (5) The next character is ASCII, but it is not "*", "?" or " ".
            ---> This backslash is not an escape.

            (6) The next character is ASCII, but it is not "*", "?" or " ".
            ---> This backslash is not an escape.
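
The kinds of cases listed above can be tried outside Vim with a small stand-alone version of the check. This is a sketch under assumptions: `demo_isfilec` is a hypothetical stand-in for vim_isfilec() that only knows ASCII file name characters (high bytes are unknown to it, as when the multi-byte table is not initialized), and `demo_rem_backslash` mirrors only the BACKSLASH_IN_FILENAME branch with the extra ASCII test.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Stand-in for vim_isfilec(): ASCII letters, digits and a few punctuation
 * characters only. Bytes of 0x80 and above are NOT file name characters. */
int demo_isfilec(int c)
{
    return c > 0 && c < 0x80
        && (isalnum(c) || strchr("\\/:._-", c) != NULL);
}

/* Non-zero when the backslash at str[0] is an escape to be removed. */
int demo_rem_backslash(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;

    assert(str != NULL);
    return s[0] == '\\'
        && s[1] < 0x80                  /* the proposed isascii() test */
        && (s[1] == ' '
            || (s[1] != '\0'
                && s[1] != '*'
                && s[1] != '?'
                && !demo_isfilec(s[1])));
}
```

With this stand-in, a backslash followed by a space is reported as an escape, while a backslash followed by a lead byte, by another backslash, or by a letter such as "I" or "s" is kept.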
          • Bram Moolenaar
            Message 5 of 10 , Aug 4, 2000
              Yasuhiro Matsumoto wrote:

              > I think this is enough.
              > If I write the multi-byte characters as "[A][B][C][D][E][F]",
              > it will work as shown below. ("[B]", "[C]" and "[F]" contain a backslash.)
              >
              > ex: C:\[A][B]\ [C]\I_love_[D][E][F]s.txt

              You are right, when [F] is a double-byte character and the second byte is a
              backslash: "[x\]s" the backslash would be kept.

              I see one remaining problem at [C], but I'm not sure if it actually happens:

              [x\]\

              The double backslash would be reduced to one. Hmm, the comment about this
              appears to be wrong, a double backslash would be kept. It would work OK then.

              --
              To be rich is not the end, but only a change of worries.

              /// Bram Moolenaar Bram@... http://www.moolenaar.net \\\
              \\\ Vim: http://www.vim.org ICCF Holland: http://iccf-holland.org ///