> > Example: The library is doing a "retrospective conversion" of its card
> > catalog to an OPAC ("Online Public Access Catalog"). The cards are
> > mechanically scanned. They are nominally in order, however, due to being
> > handled by humans, a lot, there "might be" some errors in sequence.
> > "Might" - yeah, right. Which sort is "better"? "Shell-Metzner" or
> > "shaker"?
On Sun, 12 Apr 2009, Roy J. Tellason wrote:
> Shell-Metzner I've at least heard of, but not that other one. I would
> suppose that this stuff is covered in some texts (that I don't happen to
> have), but do you know offhand if it's online anyplace? I'm still working
> on getting a better understanding of all of this stuff...
Good. I'd hate to meet anybody who thinks that they already understand
ALL of it!
Try page 110 of Knuth's "Sorting and Searching" (Volume 3 of "The Art of
Computer Programming", ISBN 0-201-03803-X).
Rather oversimplified:
Take a plain bubble sort; add a conditional test to end early if NO
changes were made during a pass. Modify the end points of the passes, so
that it doesn't bother going back through territory that has already
achieved final order. So far, that is just optimizing the bubble sort
algorithm. Now modify it so that it alternates the passes between going
left to right and going right to left. THAT is a "Shaker Sort". It is
just one variety of "Bubble Sort", and in the worst case (and almost as
bad with truly random data), it could end up needing N-1 passes, with an
average of roughly N/2 comparisons per pass, which is TERRIBLE
performance. BUT, . . . if the data happens to already be in order, it
will require ONE pass with N-1 comparisons, vs. Shell-Metzner, which still
goes through the same amount of work as for random data. Shaker will
require at most one more pass than the number of elements that are out of
order. That ability to take advantage of existing order can be rather
important in the real world.
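
If it helps to see it spelled out, here is a minimal sketch of that shaker
variant in C (my own illustration, not lifted from Knuth; the function name
and the sample data are made up):

#include <stdio.h>

/* Shaker (cocktail) sort: a bubble sort that alternates direction,
   quits as soon as a pass makes no swaps, and shrinks the working
   range so it never re-scans territory already in final order. */
void shaker_sort(int a[], int n)
{
    int left = 0, right = n - 1;

    while (left < right) {
        int swapped = 0;
        int i;

        /* left-to-right pass bubbles the largest element up to 'right' */
        for (i = left; i < right; i++) {
            if (a[i] > a[i + 1]) {
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                swapped = 1;
            }
        }
        right--;                  /* that slot is now in final order */
        if (!swapped)
            break;                /* nothing moved: data is in order */

        swapped = 0;
        /* right-to-left pass bubbles the smallest element down to 'left' */
        for (i = right; i > left; i--) {
            if (a[i - 1] > a[i]) {
                int t = a[i - 1]; a[i - 1] = a[i]; a[i] = t;
                swapped = 1;
            }
        }
        left++;                   /* that slot is now in final order too */
        if (!swapped)
            break;
    }
}

int main(void)
{
    int cards[] = { 2, 1, 3, 4, 5, 6, 7, 8 };  /* "nominally in order" */
    int n = sizeof cards / sizeof cards[0];
    int i;

    shaker_sort(cards, n);
    for (i = 0; i < n; i++)
        printf("%d ", cards[i]);
    printf("\n");
    return 0;
}

Feed it data that is already sorted and the first pass makes no swaps, so it
stops after one pass of N-1 comparisons, which is the whole point above.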
In addition to data that is partially in order, many managers who ought to
know better will, instead of merging data sets, append a small amount of
new data to the end of a sorted data set and then "sort" to put it back
into order. For a basic Bubble sort, that provides the WORST CASE, with
N-1 passes of ~N/2 comparisons per pass. Swapping the endpoints so that
the passes go right to left instead of left to right immediately reduces
it from N-1 passes to a maximum number of passes equal to the count of new
data elements added.
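
To make that concrete, here is a rough sketch (again my own, with made-up
numbers): a sorted list with two new records tacked onto the end, run
through right-to-left bubble passes until a pass makes no swaps.

#include <stdio.h>

/* One right-to-left bubble pass over a[0..n-1].
   Returns 1 if it swapped anything, 0 if the data was already in order. */
static int pass_right_to_left(int a[], int n)
{
    int i, swapped = 0;
    for (i = n - 1; i > 0; i--) {
        if (a[i - 1] > a[i]) {
            int t = a[i - 1]; a[i - 1] = a[i]; a[i] = t;
            swapped = 1;
        }
    }
    return swapped;
}

int main(void)
{
    /* a sorted data set with two new records appended to the end */
    int a[] = { 10, 20, 30, 40, 50, 60, 70, 80, 5, 45 };
    int n = sizeof a / sizeof a[0];
    int passes = 0;

    while (pass_right_to_left(a, n))
        passes++;

    printf("in order after %d working passes\n", passes);  /* 2, not N-1 */
    return 0;
}

The two appended elements settle into place in two working passes, because a
small element can travel all the way toward the front in a single
right-to-left pass; with left-to-right passes it would crawl one slot per
pass and you would be back to nearly N-1 passes.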
In an analogous situation, a compression algorithm that takes into account
the type of data being compressed might outperform an algorithm that is
theoretically significantly "better" for "random" data.
--
Grumpy Ol' Fred                  cisin at xenosoft.com