Loading ...
Sorry, an error occurred while loading the content.

Re: Patch: Optimise ascii char classificaton

Expand Messages
  • Ben Fritz
    ... It s not just two comparisons, it s two comparisons plus an and , so there s some extra branch logic for the two-comparison method. I don t know whether
    Message 1 of 12 , Jun 12, 2013
    • 0 Attachment
      On Wednesday, June 12, 2013 1:08:22 PM UTC-5, LCD 47 wrote:
      > > You can do it in one comparison with this trick:
      > >
      > > if ((unsigned)c - '0' < 10)
      >
      > Actually, the second version is a bit slower than the first. :) A
      > comparison is exactly as fast as a subtraction of the same length, and
      > (unsigned)c - '0' needs to store the result in memory.
      >

      It's not just two comparisons, it's two comparisons plus an "and", so there's some extra branch logic for the two-comparison method. I don't know whether the extra branch is more expensive than the memory store using the comparison plus a subtraction method. You'd probably need profiling on a big test case to figure that out. Either way we're talking one or two cycles at most. Eliminating the isalpha() etc. calls was the biggest save here, and avoiding a big lookup table loaded in memory.

      --
      --
      You received this message from the "vim_dev" maillist.
      Do not top-post! Type your reply below the text you are replying to.
      For more information, visit http://www.vim.org/maillist.php

      ---
      You received this message because you are subscribed to the Google Groups "vim_dev" group.
      To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
      For more options, visit https://groups.google.com/groups/opt_out.
    • Dominique PellĂ©
      ... It depends on the machine of course, but often branching is more expensive because the processor can mispredict the outcome, causing it to discard in
      Message 2 of 12 , Jun 12, 2013
      • 0 Attachment
        LCD 47 <lcd047@...> wrote:

        > On 12 June 2013, Dominique Pellé <dominique.pelle@...> wrote:
        > [...]
        >> You can half the number of comparisons when doing range checking.
        >> Instead of doing 2 comparisons...
        >>
        >> if (c >= '0' && c <= '9')
        >>
        >> You can do it in one comparison with this trick:
        >>
        >> if ((unsigned)c - '0' < 10)
        > [...]
        >
        > Actually, the second version is a bit slower than the first. :) A
        > comparison is exactly as fast as a subtraction of the same length, and
        > (unsigned)c - '0' needs to store the result in memory.
        >
        > /lcd

        It depends on the machine of course, but often branching
        is more expensive because the processor can mispredict the
        outcome, causing it to discard in pipeline instruction(s) already
        partially processed:

        http://en.wikipedia.org/wiki/Branch_misprediction
        http://en.wikipedia.org/wiki/Instruction_pipeline#Branches

        That's why there are people who find clever branchless
        algorithms for all kind of things (min, max, abs...).
        The branchless abs() algorithm is even apparently
        patented (sigh):

        http://www.strchr.com/optimized_abs_function

        Having said that, I have compiled this function...

        int is_digit(int d)
        {
        #ifdef OPTIMIZE
        return ((unsigned)d - '0') <= 9;
        #else
        return d >= '0' && d <= '9';
        #endif
        }

        With gcc -O0 on x86_64, the code produced is shorter
        and faster with -DOPTIMIZE as I was hoping for.

        -O0 without -DOPTIMIZE (two comparisons):

        000000000040051c <is_digit>:
        40051c: 55 push %rbp
        40051d: 48 89 e5 mov %rsp,%rbp
        400520: 89 7d fc mov %edi,-0x4(%rbp)
        400523: 83 7d fc 2f cmpl $0x2f,-0x4(%rbp)
        400527: 7e 0d jle 400536 <is_digit+0x1a>
        400529: 83 7d fc 39 cmpl $0x39,-0x4(%rbp)
        40052d: 7f 07 jg 400536 <is_digit+0x1a>
        40052f: b8 01 00 00 00 mov $0x1,%eax
        400534: eb 05 jmp 40053b <is_digit+0x1f>
        400536: b8 00 00 00 00 mov $0x0,%eax
        40053b: 5d pop %rbp
        40053c: c3 retq

        -O0 -DOPTIMIZE (one comparison):

        000000000040051c <is_digit>:
        40051c: 55 push %rbp
        40051d: 48 89 e5 mov %rsp,%rbp
        400520: 89 7d fc mov %edi,-0x4(%rbp)
        400523: 8b 45 fc mov -0x4(%rbp),%eax
        400526: 83 e8 30 sub $0x30,%eax
        400529: 83 f8 09 cmp $0x9,%eax
        40052c: 0f 96 c0 setbe %al
        40052f: 0f b6 c0 movzbl %al,%eax
        400532: 5d pop %rbp
        400533: c3 retq

        But with gcc -O1, -O2, or -O3, the generated code is
        identical anyway whether OPTIMIZE is defined or not.
        For example gcc -O3 produces this code with/without
        -DOPTIMIZE:

        00000000004005f0 <is_digit>:
        4005f0: 83 ef 30 sub $0x30,%edi
        4005f3: 31 c0 xor %eax,%eax
        4005f5: 83 ff 09 cmp $0x9,%edi
        4005f8: 0f 96 c0 setbe %al
        4005fb: c3 retq
        4005fc: 0f 1f 40 00 nopl 0x0(%rax)

        Dominique

        --
        --
        You received this message from the "vim_dev" maillist.
        Do not top-post! Type your reply below the text you are replying to.
        For more information, visit http://www.vim.org/maillist.php

        ---
        You received this message because you are subscribed to the Google Groups "vim_dev" group.
        To unsubscribe from this group and stop receiving emails from it, send an email to vim_dev+unsubscribe@....
        For more options, visit https://groups.google.com/groups/opt_out.
      Your message has been successfully submitted and would be delivered to recipients shortly.