- [ Note: Unless making fast sort algorithm implementations in pure Vim

is your idea of fun, this message will probably bore you to tears. You

have been warned. ]

A few days ago, while trying to speed up the various quick sort

implementation available for Vim, i unexpectedly found another algorithm

that completely blows quick sort away on all fronts. I thought this was

cool enough to warrant posting here.

1. Quick description:

The algorithm (which you can find at the end of this mail) is a binary

insertion sort[1] implementation, with a Vim-specific optimization:

instead of doing a whole bunch of linear swaps to move the current line

to the insertion point, it moves the line in one step.

[1] ...which is like an insertion sort, except that a binary search is used

to find the insertion point

2. Detailed description:

In general, BIS minimized the amount of compares need to sort a list,

with the trade-off being the large amount of swaps needed to move

elements into place. In Vim, however, it's practically just as fast to

delete or insert lines as it is to swap them, so you eliminate this

disadvantage by replacing the O(n/4) (if my math isn't off) swaps needed

per element, on average, with one delete/insert operation, which is a

huge improvement, especially as the input size gets larger.

Add the fact that BIS is non-recursive, which avoids Vim's function

calling overhead (and has the added bonus of side-stepping stack

overflows), and you have an algorithm that not only catches up to

pure-Vim quick sorts, but leaves them way behind in the dust.

3. Benchmarks:

First, a description of the various implementations i compared:

- QSortGenUtils: Hari Krishna Dara's quick sort implementation from

genutils.vim, unmodified. Included only for interest's sake; it's not

really a fair comparison, as it uses abstracted accessors/mutators

which significantly slows it down.

- QSortWebb: Slightly edited version of Robert Webb's quick sort

implementation from :h eval.txt (mostly just some renamed variables,

and some minor optimizations such caching the comparison statement).

For some reason, Webb's original script overflows the stack input as

large as the test data i'm using, while this version doesn't. I

haven't managed to isolate what i seem to have accidentally improved,

yet, so i can't benchmark Webb's original implementation too,

unfortunately.

- QSort: Based on Hari Krishna Dara's algorithm above, but with all the

functionality that isn't needed to sort buffer text removed, and the

same minor optimizations as above applied. I used this as a basis for

experimenting with different pivot selection algorithms (pseudo-random,

median-of-3, Bentley-McKilroy 3-way partitioning) and other variations

on the quick sort theme, such as switching to other algorithms when

the partition size gets small, but these all turned out to be slower

than the plain quick sort (probably because the interpretation

overhead actually exceeds the time savings gained by the extra code).

It was while trying out various simpler algorithms for quick sort to

switch over to, however, that i found...

- BISort: The Binary Insertion Sort described at the top.

- QSort2/BISort2: These are the same as above two functions, but edited

to directly compare lines of text instead of going through a a

comparison function, for maximum speed. These should give a more

accurate representation of each algorithm's relative performance.

Test data used: the archives of a (completely randomly selected)

mailing list, concatenated into one file, weighing in at 41,726 lines

(1.65 MB) of text. I think this should provide good spread of different

input data characteristics.

Each sort was run over a copy of the file three times, and the run time

recorded. The results are listed as <total runtime>: <individual runs>

(both in seconds), with a nice graph at the end (everyone loves graphs):

QSortGenUtils 522: 174 174 174 *************************************

QSortWebb 378: 126 126 126 ***************************

QSort 372: 124 124 124 **************************

QSort2 267: 89 89 89 *******************

BISort 108: 36 36 36 *******

BISort2 70: 24 23 23 *****

As expected, the QSortGenUtils is slowest, but its simplified version

(QSort) manages to be slightly faster than QSortWebb (possibly due to

the slightly different underlying algorithms?). It pales in comparison

to BISort, though, which completes all three runs in less time than a

single QSort run.

There's still the question of how well BISort and and QSort scale

against each other, so i ran single tests (since there doesn't seem to

be significant variation between runs) on larger input sizes too:

Input Size QSort2 BISort2 Ratio

41,726 (from above) 89 23.3+ 3.81 : 1

62,389 (2.32 MB) 189 38 4.97 : 1

104,115 (3.97 MB) 536 72 7.44 : 1

IOW, QSort2 seems to degrade very fast indeed as the input size

increases, compared to BISort2.

4. Reality check

sort(1) sorts the 104,115 line file above in ~1.5 seconds, compared to

BISort2's 72. Nasty.

5. Code

Anyway, enough noise, here's the algorithm. If anyone else is

interested, i've attached a file containing the rest of the code i used

in the benchmarks above.

function BISort(start, end, cmp)

let compare_ival_mid = 'let diff = '.a:cmp.'(i_val, getline(mid))'

let i = a:start + 1

while i <= a:end

" find insertion point via binary search

let i_val = getline(i)

let lo = a:start

let hi = i

while lo < hi

let mid = (lo + hi) / 2

exec compare_ival_mid

if diff < 0

let hi = mid

else

let lo = mid + 1

if diff == 0 | break | endif

endif

endwhile

" do insert

if lo < i

exec i.'d_'

call append(lo - 1, i_val)

endif

let i = i + 1

endwhile

endfunction

function BISort2(start, end)

let i = a:start + 1

while i <= a:end

" find insertion point via binary search

let i_val = getline(i)

let lo = a:start

let hi = i

while lo < hi

let mid = (lo + hi) / 2

let mid_val = getline(mid)

if i_val < mid_val

let hi = mid

else

let lo = mid + 1

if i_val == mid_val | break | endif

endif

endwhile

" do insert

if lo < i

exec i.'d_'

call append(lo - 1, i_val)

endif

let i = i + 1

endwhile

endfunction

--

Piet Delport

Today's subliminal thought is: - Thus saith Piet Delport:
> [1] ...which is like an insertion sort, except that a binary search is used

---------------------------------------------------------------------

> to find the insertion point

Hello!

In thinking about coding a C version, I believe this new sort of yours

most strongly resembles the use of a red-black tree. The problem with

coding this in C/C++ is implementing the search and insertion capability

while retaining lg(n) behavior. (for n insertions, the binary-search

sort has o(n lg(n)) operations, like quicksort and other decent sorting

algorithms). For example, insertion can be done quickly and easily with

doubly-linkd lists but then the binary search can't be done.

A red-black tree (nearly) retains a balanced tree while undergoing

insertions (and deletions, for that matter). Thus with it both the

search and insertion retain lg(n) behavior. In fact, with red-black

trees one merely reads the list and dumps it into the associated

insertion procedure.

Vim hides the work needed to perform insertion while retaining the

ability to do quick binary searches by doing the data structure work

internally. Lisp, actually, would be another language which would

benefit from this sort; again, due to the behind-the-scenes list

management the insertion acts as a low-cost operation.

Still, its a good sort, especially for interpreted languages which

make the search&insertion easy.

Regards,

Chip Campbell

--

Charles E Campbell, Jr, PhD _ __ __

Goddard Space Flight Center / /_/\_\_/ /

cec@... /_/ \/_//_/

PGP public key: http://www.erols.com/astronaut/pgp.html - Apologies for responding very late to this posting, actually this got

lost in my pending mailbox and remembered only recently when there was

some reference to this email.

If Piet is still listening, I am curious how this new fast algorithm

will respond if a more generic interface is plugged over it (the way I

did in my original simple QSort). I was looking for your attachment with

the rest of the code that you mentioned below, but I think you forgot to

attach it. Is it still possible for you to send the code that you used

as a reference (ie the improved QSort) and the setup that you used to

benchmark different algorithms, if you haven't already lost it?

Thank you,

Hari

On Thu, 1 May 2003 at 11:19am, Piet Delport wrote:

> [ Note: Unless making fast sort algorithm implementations in pure Vim

> is your idea of fun, this message will probably bore you to tears. You

> have been warned. ]

>

> A few days ago, while trying to speed up the various quick sort

> implementation available for Vim, i unexpectedly found another algorithm

> that completely blows quick sort away on all fronts. I thought this was

> cool enough to warrant posting here.

>

> 1. Quick description:

>

> The algorithm (which you can find at the end of this mail) is a binary

> insertion sort[1] implementation, with a Vim-specific optimization:

> instead of doing a whole bunch of linear swaps to move the current line

> to the insertion point, it moves the line in one step.

>

> [1] ...which is like an insertion sort, except that a binary search is used

> to find the insertion point

>

> 2. Detailed description:

>

> In general, BIS minimized the amount of compares need to sort a list,

> with the trade-off being the large amount of swaps needed to move

> elements into place. In Vim, however, it's practically just as fast to

> delete or insert lines as it is to swap them, so you eliminate this

> disadvantage by replacing the O(n/4) (if my math isn't off) swaps needed

> per element, on average, with one delete/insert operation, which is a

> huge improvement, especially as the input size gets larger.

>

> Add the fact that BIS is non-recursive, which avoids Vim's function

> calling overhead (and has the added bonus of side-stepping stack

> overflows), and you have an algorithm that not only catches up to

> pure-Vim quick sorts, but leaves them way behind in the dust.

>

> 3. Benchmarks:

>

> First, a description of the various implementations i compared:

>

> - QSortGenUtils: Hari Krishna Dara's quick sort implementation from

> genutils.vim, unmodified. Included only for interest's sake; it's not

> really a fair comparison, as it uses abstracted accessors/mutators

> which significantly slows it down.

>

> - QSortWebb: Slightly edited version of Robert Webb's quick sort

> implementation from :h eval.txt (mostly just some renamed variables,

> and some minor optimizations such caching the comparison statement).

> For some reason, Webb's original script overflows the stack input as

> large as the test data i'm using, while this version doesn't. I

> haven't managed to isolate what i seem to have accidentally improved,

> yet, so i can't benchmark Webb's original implementation too,

> unfortunately.

>

> - QSort: Based on Hari Krishna Dara's algorithm above, but with all the

> functionality that isn't needed to sort buffer text removed, and the

> same minor optimizations as above applied. I used this as a basis for

> experimenting with different pivot selection algorithms (pseudo-random,

> median-of-3, Bentley-McKilroy 3-way partitioning) and other variations

> on the quick sort theme, such as switching to other algorithms when

> the partition size gets small, but these all turned out to be slower

> than the plain quick sort (probably because the interpretation

> overhead actually exceeds the time savings gained by the extra code).

> It was while trying out various simpler algorithms for quick sort to

> switch over to, however, that i found...

>

> - BISort: The Binary Insertion Sort described at the top.

>

> - QSort2/BISort2: These are the same as above two functions, but edited

> to directly compare lines of text instead of going through a a

> comparison function, for maximum speed. These should give a more

> accurate representation of each algorithm's relative performance.

>

> Test data used: the archives of a (completely randomly selected)

> mailing list, concatenated into one file, weighing in at 41,726 lines

> (1.65 MB) of text. I think this should provide good spread of different

> input data characteristics.

>

> Each sort was run over a copy of the file three times, and the run time

> recorded. The results are listed as <total runtime>: <individual runs>

> (both in seconds), with a nice graph at the end (everyone loves graphs):

>

> QSortGenUtils 522: 174 174 174 *************************************

> QSortWebb 378: 126 126 126 ***************************

> QSort 372: 124 124 124 **************************

> QSort2 267: 89 89 89 *******************

> BISort 108: 36 36 36 *******

> BISort2 70: 24 23 23 *****

>

> As expected, the QSortGenUtils is slowest, but its simplified version

> (QSort) manages to be slightly faster than QSortWebb (possibly due to

> the slightly different underlying algorithms?). It pales in comparison

> to BISort, though, which completes all three runs in less time than a

> single QSort run.

>

> There's still the question of how well BISort and and QSort scale

> against each other, so i ran single tests (since there doesn't seem to

> be significant variation between runs) on larger input sizes too:

>

> Input Size QSort2 BISort2 Ratio

> 41,726 (from above) 89 23.3+ 3.81 : 1

> 62,389 (2.32 MB) 189 38 4.97 : 1

> 104,115 (3.97 MB) 536 72 7.44 : 1

>

> IOW, QSort2 seems to degrade very fast indeed as the input size

> increases, compared to BISort2.

>

> 4. Reality check

>

> sort(1) sorts the 104,115 line file above in ~1.5 seconds, compared to

> BISort2's 72. Nasty.

>

> 5. Code

>

> Anyway, enough noise, here's the algorithm. If anyone else is

> interested, i've attached a file containing the rest of the code i used

> in the benchmarks above.

>

>

> function BISort(start, end, cmp)

> let compare_ival_mid = 'let diff = '.a:cmp.'(i_val, getline(mid))'

> let i = a:start + 1

> while i <= a:end

> " find insertion point via binary search

> let i_val = getline(i)

> let lo = a:start

> let hi = i

> while lo < hi

> let mid = (lo + hi) / 2

> exec compare_ival_mid

> if diff < 0

> let hi = mid

> else

> let lo = mid + 1

> if diff == 0 | break | endif

> endif

> endwhile

> " do insert

> if lo < i

> exec i.'d_'

> call append(lo - 1, i_val)

> endif

> let i = i + 1

> endwhile

> endfunction

>

> function BISort2(start, end)

> let i = a:start + 1

> while i <= a:end

> " find insertion point via binary search

> let i_val = getline(i)

> let lo = a:start

> let hi = i

> while lo < hi

> let mid = (lo + hi) / 2

> let mid_val = getline(mid)

> if i_val < mid_val

> let hi = mid

> else

> let lo = mid + 1

> if i_val == mid_val | break | endif

> endif

> endwhile

> " do insert

> if lo < i

> exec i.'d_'

> call append(lo - 1, i_val)

> endif

> let i = i + 1

> endwhile

> endfunction

>

>

__________________________________

Do you Yahoo!?

SBC Yahoo! DSL - Now only $29.95 per month!

http://sbc.yahoo.com