
Re: [patch] Handle files with mixed LF - CRLF line endings when doing tag search

  • Lech Lorens
    Message 1 of 16 , Apr 2, 2013
      On 02-Apr-2013 Nazri Ramliy <ayiehere@...> wrote:
      > On Mon, Apr 1, 2013 at 11:16 PM, Lech Lorens <lech.lorens@...> wrote:
      > > The problem is that if Vim reads such a file with ff=unix, it will fail
      > > to find tags if the tag pattern searched should match on a DOS-style
      > > line. The attached patch handles the problem in a naïve but surprisingly
      > > effective way: if a pattern search fails, Vim will try putting "\r\*"
      > > before the last "$" in the pattern and will retry the search.
      >
      > I'd rather have vim scream when it detects that a file is broken like
      > this so I can cure the file.
      >
      > Isn't it better to fix the problem, rather than the symptom?
      >
      > nazri

      For an answer please read what Bram wrote and my response to his points.

      --
      Lech Lorens

      --
      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php

      ---
      You received this message because you are subscribed to the Google Groups "vim_dev" group.
      To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
      For more options, visit https://groups.google.com/groups/opt_out.
    • glts
      Message 2 of 16 , Apr 2, 2013
        On Tue, Apr 2, 2013 at 1:59 AM, Lech Lorens <lech.lorens@...> wrote:
        > On 01-Apr-2013 Bram Moolenaar <Bram@...> wrote:
        >> How about this alternative: Filter your tags file to change the patterns
        >> to include an optional CR before the $: \r\=$
        >
        > This is something I would prefer to avoid. My current situation is that
        > to prepare a tags file I process a few million lines of code (this is
        > after filtering only the relevant parts from about 80 million LOC) and
        > it takes ages to run ctags and cscope. I would rather not cause this
        > step to take longer and would prefer a 200 ms search instead of a 100 ms
        > one whenever I execute ":tag foo".

        I understood Bram's suggestion to mean you could postprocess your tags
        file, after it has been generated. This would be fast, no matter the
        size of your codebase. For example for Exuberant ctags:

        /^\([^\t]\+\t[^\t]\+\t[^\t]\+\)\(\$\%[/;"].*\)
        :%s//\1\\r\\=\2/

        I feel this is a situation where a few lines of shell or Vim script
        would be most appropriate.
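
The postprocessing step suggested above can also be sketched outside Vim. The following is a minimal sketch in Python, not the substitution posted in the thread: it assumes Exuberant ctags output where the third tab-separated field is a forward search of the form `/^...$/`, optionally followed by `;"` and extension fields. The function name `add_optional_cr` is hypothetical. Backward searches (`?^...$?`) and line-number addresses are left untouched.

```python
import re

# An unescaped "$" followed by the closing "/" of the search command, which is
# in turn followed by the ;" extension marker or the end of the line.
_END_ANCHOR = re.compile(r'(?<!\\)\$/(?=;"|$)')


def add_optional_cr(tag_line):
    """Insert Vim's optional-CR atom (\\r\\=) before the pattern's final "$"."""
    # Pseudo-tags such as !_TAG_FILE_SORTED carry metadata, not patterns.
    if tag_line.startswith("!_TAG_"):
        return tag_line
    # Tag name and file name contain no tabs; everything after them is the
    # ex command plus any extension fields.
    parts = tag_line.split("\t", 2)
    if len(parts) < 3 or not parts[2].startswith("/^"):
        return tag_line
    # Rewrite only the first anchor, i.e. the end of the search pattern.
    parts[2] = _END_ANCHOR.sub(lambda m: r"\r\=" + m.group(0), parts[2], count=1)
    return "\t".join(parts)
```

Applied line by line to a tags file, this leaves every entry unchanged except for the added `\r\=`; whether it is fast enough for a very large file would have to be measured.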

      • Ben Fritz
        Message 3 of 16 , Apr 2, 2013
          On Monday, April 1, 2013 11:43:38 AM UTC-5, Bram Moolenaar wrote:
          > I wonder how many users actually run into files where only some lines
          > end in a CR. I would consider such a file broken, and first thing would
          > be to strip them all off.

          It's quite frequent where I work. I even have an autocmd that reloads the file in DOS format if it detects mixed line endings. It sets "nomodified" but doesn't save, so if I don't make any further changes, the file on-disk remains unchanged.

          The problem is that many other editors, including Visual Studio and UltraEdit, may read in Unix file format correctly, but depending on how they are configured, will insert Windows line endings on any *new* lines. UltraEdit will even preserve line ending style of any copy-pasted text. That *sounds* like a feature but in reality it is incredibly annoying.

          Do I understand correctly, that I won't see the problem, because I already force-reload the file to remove the mixed line endings?
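
The detection step described above (noticing that a file mixes LF and CRLF endings) can be illustrated with a short sketch. This is not the autocmd mentioned in the message, which was not posted; it is a Python sketch under the assumption that the file content is available as bytes, and the function names are hypothetical.

```python
def count_line_endings(data):
    """Return (crlf_count, bare_lf_count) for a bytes object."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    bare_lf = data.count(b"\n") - crlf
    return crlf, bare_lf


def has_mixed_endings(data):
    """True when both CRLF and bare-LF line endings occur in the data."""
    crlf, bare_lf = count_line_endings(data)
    return crlf > 0 and bare_lf > 0
```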

        • Bovy, Stephen
          Message 4 of 16 , Apr 2, 2013
            Windows has always been an incredibly annoying platform to work on (sigh).

            But this only becomes a headache when you need to support cross-platform code.

            We use ClearCase (and I think we use the property that saves everything in Unix format).

            But the editor problem described below is very much a problem.

            -----Original Message-----
            From: vim_dev@... [mailto:vim_dev@...] On Behalf Of Ben Fritz
            Sent: Tuesday, April 02, 2013 4:42 AM
            To: vim_dev@...
            Cc: Lech Lorens
            Subject: Re: [patch] Handle files with mixed LF - CRLF line endings when doing tag search

            On Monday, April 1, 2013 11:43:38 AM UTC-5, Bram Moolenaar wrote:
            > I wonder how many users actually run into files where only some lines
            > end in a CR. I would consider such a file broken, and first thing would
            > be to strip them all off.

            It's quite frequent where I work. I even have an autocmd that reloads the file in DOS format if it detects mixed line endings. It sets "nomodified" but doesn't save, so if I don't make any further changes, the file on-disk remains unchanged.

            The problem is that many other editors, including Visual Studio and UltraEdit, may read in Unix file format correctly, but depending on how they are configured, will insert Windows line endings on any *new* lines. UltraEdit will even preserve line ending style of any copy-pasted text. That *sounds* like a feature but in reality it is incredibly annoying.

            Do I understand correctly, that I won't see the problem, because I already force-reload the file to remove the mixed line endings?

          • Lech Lorens
            Message 5 of 16 , Apr 3, 2013
              On 02-Apr-2013 glts <676c7473@...> wrote:
              > On Tue, Apr 2, 2013 at 1:59 AM, Lech Lorens <lech.lorens@...> wrote:
              > > On 01-Apr-2013 Bram Moolenaar <Bram@...> wrote:
              > >> How about this alternative: Filter your tags file to change the patterns
              > >> to include an optional CR before the $: \r\=$
              > >
              > > This is something I would prefer to avoid. My current situation is that
              > > to prepare a tags file I process a few million lines of code (this is
              > > after filtering only the relevant parts from about 80 million LOC) and
              > > it takes ages to run ctags and cscope. I would rather not cause this
              > > step to take longer and would prefer a 200 ms search instead of a 100 ms
              > > one whenever I execute ":tag foo".
              >
              > I understood Bram's suggestion to mean you could postprocess your tags
              > file, after it has been generated. This would be fast, no matter the
              > size of your codebase. For example for Exuberant ctags:
              >
              > /^\([^\t]\+\t[^\t]\+\t[^\t]\+\)\(\$\%[/;"].*\)
              > :%s//\1\\r\\=\2/
              >
              > I feel this is a situation where a few lines of shell or Vim script
              > would be most appropriate.

              Vim takes about 20 seconds to fix my 200 MB tags file. Perl takes about
              6 seconds. That's still ages on an AFAICT high-end Core i7. Too long for
              my taste – I'll stick with my patched Vim.

              I wonder however, if all the people that tried Vim and decided it was
              not suitable for the task tried the simple and obvious most appropriate
              one-liner before giving up and choosing an editor that just worked.

              --
              Lech Lorens

            • Lech Lorens
              Message 6 of 16 , Apr 3, 2013
                On 02-Apr-2013 Ben Fritz <fritzophrenic@...> wrote:
                > On Monday, April 1, 2013 11:43:38 AM UTC-5, Bram Moolenaar wrote:
                > >
                > > I wonder how many users actually run into files where only some lines
                > >
                > > end in a CR. I would consider such a file broken, and first thing would
                > >
                > > be to strip them all off.
                > >
                >
                > It's quite frequent where I work. I even have an autocmd that reloads the file in DOS format if it detects mixed line endings. It sets "nomodified" but doesn't save, so if I don't make any further changes, the file on-disk remains unchanged.

                But it is a kind of a half-solution. When you modify a single line in
                the file and write it, you end up with a number of changed lines. How do
                you cope with that? In this case I can't simply check in the file
                – I have to undo the line endings modifications which is quite a tedious
                and annoying task.
                I'll be extremely happy to get a hint how I should go about this
                problem.

                > The problem is that many other editors, including Visual Studio and
                > UltraEdit, may read in Unix file format correctly, but depending on
                > how they are configured, will insert Windows line endings on any *new*
                > lines. UltraEdit will even preserve line ending style of any
                > copy-pasted text. That *sounds* like a feature but in reality it is
                > incredibly annoying.

                I beg to differ. In my opinion the real problem is that Vim refuses to
                cooperate. The compilers can deal with it, ctags and cscope can deal
                with it, all other editors can deal with it[1]. Vim can't and now that I'm
                trying to make Vim more friendly, I hear Vim is fine and I should fix
                the files. Vim may be my favourite editor but – sorry – in this
                particular case it is inferior to everything out there.

                > Do I understand correctly, that I won't see the problem, because I already force-reload the file to remove the mixed line endings?

                Yes, you are not going to see any improvements since you are already
                handling this one problem.

                --
                Lech Lorens

                [1] I checked GNU Emacs: I built a tags file and then added a few
                carriage returns at the ends of the lines. It still worked.

              • Benjamin Fritz
                Message 7 of 16 , Apr 3, 2013
                  On Wed, Apr 3, 2013 at 5:22 AM, Lech Lorens <lech.lorens@...> wrote:
                  >
                  > On 02-Apr-2013 Ben Fritz <fritzophrenic@...> wrote:
                  > > On Monday, April 1, 2013 11:43:38 AM UTC-5, Bram Moolenaar wrote:
                  > > >
                  > > > I wonder how many users actually run into files where only some lines
                  > > >
                  > > > end in a CR. I would consider such a file broken, and first thing would
                  > > >
                  > > > be to strip them all off.
                  > > >
                  > >
                  > > It's quite frequent where I work. I even have an autocmd that reloads the
                  > > file in DOS format if it detects mixed line endings. It sets "nomodified"
                  > > but doesn't save, so if I don't make any further changes, the file on-disk
                  > > remains unchanged.
                  >
                  > But it is a kind of a half-solution.

                  Agreed. And I didn't create the autocmd for this particular problem, I created
                  it because I was tired of seeing ^M characters in my file. My first attempt
                  was to use syntax highlighting to just hide them but special character
                  highlighting prevented that from working.

                  > When you modify a single line in
                  > the file and write it, you end up with a number of changed lines.

                  Yes, but I think I made my autocmd smart enough to use whichever fileformat
                  will change the fewest lines.
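
The "fewest changed lines" rule can be sketched as follows. The autocmd itself was not posted, so this Python sketch only illustrates the idea under stated assumptions: writing in DOS format is taken to rewrite every bare-LF line, writing in Unix format (with the CRs stripped) is taken to rewrite every CRLF line, and ties go to Unix. The name `pick_fileformat` is hypothetical.

```python
def pick_fileformat(data):
    """Return "dos" or "unix", whichever would rewrite fewer line endings."""
    crlf = data.count(b"\r\n")
    bare_lf = data.count(b"\n") - crlf
    # Choosing "dos" changes the bare-LF lines; choosing "unix" changes the
    # CRLF lines. Pick the format that matches the majority (tie: "unix").
    return "dos" if crlf > bare_lf else "unix"
```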

                  > How do
                  > you cope with that? In this case I can't simply check in the file
                  > – I have to undo the line endings modifications which is quite a tedious
                  > and annoying task.

                  I do just check in the file, without undoing the line ending modifications.
                  Where I work we're only really concerned that all changes get peer reviewed.
                  Plus, a good external diff tool will be able to ignore line ending changes.
                  Maybe that doesn't work for you.

                  > I'll be extremely happy to get a hint how I should go about this
                  > problem.
                  >

                  The best solution is probably to get everybody on your team to use a
                  consistent file format. But I agree Vim should not choke because they don't.

                  > > The problem is that many other editors, including Visual Studio and
                  > > UltraEdit, may read in Unix file format correctly, but depending on
                  > > how they are configured, will insert Windows line endings on any *new*
                  > > lines. UltraEdit will even preserve line ending style of any
                  > > copy-pasted text. That *sounds* like a feature but in reality it is
                  > > incredibly annoying.
                  >
                  > I beg to differ. In my opinion the real problem is that Vim refuses to
                  > cooperate. The compilers can deal with it, ctags and cscope can deal
                  > with it, all other editors can deal with it[1]. Vim can't and now that I'm
                  > trying to make Vim more friendly, I hear Vim is fine and I should fix
                  > the files. Vim may be my favourite editor but – sorry – in this
                  > particular case it is inferior to everything out there.
                  >

                  I may not have been clear. I meant the source of the bad files was other
                  editors acting stupidly. Vim could handle things fine if those editors would
                  just save the file in a consistent format, or allow you to see that the lines
                  are not consistent, as Vim does.

                  I think Vim should accept the output of stupid editors without jumping through
                  hoops. I would actually welcome this patch or a similar one. Rather than doing
                  two searches couldn't we just always insert \r\? before the $ in the search
                  pattern? Like Bram I don't like the idea of two searches whenever a tag is not
                  found but I doubt very much that an extra single optional character will cause
                  a big performance hit. Do you really need to match any number of \r
                  characters?
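
The effect of a single optional CR before the end-of-line anchor can be shown with a small sketch. It is written in Python rather than against Vim's regex engine, and it assumes the simplest case, where the tag pattern is a literal line anchored at both ends; the function name `find_line` is hypothetical. The text is split on LF only, mimicking a buffer read with ff=unix, so DOS-style lines keep their trailing CR.

```python
import re


def find_line(text, body, optional_cr):
    """Return the 1-based number of the first line equal to body, or None.

    With optional_cr=True a single trailing CR on the line is tolerated.
    """
    suffix = r"\r?" if optional_cr else ""
    pattern = re.compile(re.escape(body) + suffix)
    # Splitting on "\n" alone leaves any CR at the end of the line.
    for number, line in enumerate(text.split("\n"), start=1):
        if pattern.fullmatch(line):
            return number
    return None
```

For a buffer such as `"int foo(void)\r\nint bar(void)\n"`, the strict search misses the first line while the tolerant one finds it in the same single pass; lines without a CR match either way.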

                • Lech Lorens
                  Message 8 of 16 , Apr 5, 2013
                    On 03-Apr-2013 Benjamin Fritz <fritzophrenic@...> wrote:
                    > On Wed, Apr 3, 2013 at 5:22 AM, Lech Lorens <lech.lorens@...> wrote:


                    > > How do
                    > > you cope with that? In this case I can't simply check in the file
                    > > – I have to undo the line endings modifications which is quite a tedious
                    > > and annoying task.
                    >
                    > I do just check in the file, without undoing the line ending modifications.
                    > Where I work we're only really concerned that all changes get peer reviewed.
                    > Plus, a good external diff tool will be able to ignore line ending changes.
                    > Maybe that doesn't work for you.

                    But this makes discovering who the author of some code is (svn blame,
                    git blame, etc.) much harder. Which means it will not be allowed. I would
                    really love to go this path but I can't and the decision is not up to
                    me.

                    > > I'll be extremely happy to get a hint how I should go about this
                    > > problem.
                    > >
                    >
                    > The best solution is probably to get everybody on your team to use a
                    > consistent file format. But I agree Vim should not choke because they don't.

                    In my experience, an agreement that is not automatically enforced
                    doesn't work. At my last workplace we did have an agreement, and
                    there was always some strange editor that would sometimes ignore the
                    settings and screw things up.

                    > I think Vim should accept the output of stupid editors without jumping through
                    > hoops. I would actually welcome this patch or a similar one. Rather than doing
                    > two searches couldn't we just always insert \r\? before the $ in the search
                    > pattern? Like Bram I don't like the idea of two searches whenever a tag is not
                    > found but I doubt very much that an extra single optional character will cause
                    > a big performance hit. Do you really need to match any number of \r
                    > characters?

                    I would be afraid that there's too much potential here for breaking
                    things. The tag search pattern can be an arbitrarily complicated regular
                    expression. This means that in order not to break it I would have to
                    half-intelligently interpret it. I wouldn't like to do it for two
                    reasons:
                    1. not doing it right and breaking things for some people,
                    2. introducing some more than basic RE-related logic outside RE code.
                    Vim is already difficult to maintain because of all these global
                    variables. I don't want to make it any harder.

                    Thanks for this discussion. I'm afraid it leads nowhere as Bram already
                    decided the target audience is too small to bother.

                    --
                    Lech Lorens

                  • Sung Pae
                    Message 9 of 16 , Apr 5, 2013
                      On Fri, Apr 05, 2013 at 10:45:28AM +0200, Lech Lorens wrote:

                      > On 03-Apr-2013 Benjamin Fritz <fritzophrenic@...> wrote:
                      >
                      > > I do just check in the file, without undoing the line ending
                      > > modifications. Where I work we're only really concerned that all
                      > > changes get peer reviewed. Plus, a good external diff tool will be
                      > > able to ignore line ending changes. Maybe that doesn't work for you.
                      >
                      > But this makes discovering who the author of some code is (svn blame,
                      > git blame, etc.) much harder.

                      git blame has an "ignore whitespace" option: -w

                      hg blame has the same option: -w

                      svn blame apparently has the equivalent: -x -w

                      guns

                    • Ben Fritz
                      Message 10 of 16 , Apr 5, 2013
                        On Friday, April 5, 2013 3:45:28 AM UTC-5, Lech Lorens wrote:
                        > On 03-Apr-2013 Benjamin Fritz <fritzophrenic@...> wrote:
                        > > On Wed, Apr 3, 2013 at 5:22 AM, Lech Lorens <lech.lorens@...> wrote:
                        > > > How do
                        > > > you cope with that? In this case I can't simply check in the file
                        > > > – I have to undo the line endings modifications which is quite a tedious
                        > > > and annoying task.
                        > >
                        > > I do just check in the file, without undoing the line ending modifications.
                        > > Where I work we're only really concerned that all changes get peer reviewed.
                        > > Plus, a good external diff tool will be able to ignore line ending changes.
                        > > Maybe that doesn't work for you.
                        >
                        > But this makes discovering who the author of some code is (svn blame,
                        > git blame, etc.) much harder. Which means it will not be allowed. I would
                        > really love to go this path but I can't and the decision is not up to
                        > me.

                        Well, svn blame at least has an "ignore whitespace" option. I don't know about other systems, or whether this is TortoiseSVN magic instead of a built-in thing.
