John Wilson <wilson(a)dbit.dbit.com> wrote:
On Wed, Mar 01, 2000 at 11:21:35AM -0800, Dwight Elvey
wrote:
Actually, the algorithm I used is a kn+c unlike
QuickSort's k(n*logn) ( c is a constant time overhead ).
OK, I'll bite -- Radix Sort?
John Wilson
D Bit
Hi John
Radixed sorts are just normal sorts where you break the
problem into a number of smaller tasks. Normally, if you
break the original problem at something like the square
root, or some other root, of the key range, it is more
efficient. In the example of 16 bit integers, breaking
the problem into two sorts on the 8 bit fields is quite
fast. In this case it would be called a 256 radixed sort.
Not all sorts gain a speedup from radixing: you need a
sort that doesn't destroy the partial order created by
the first pass.
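To make that concrete, here is a minimal sketch in C
( the language, function names, and 16 bit unsigned type
are my choices for illustration, not code from any
particular program ): a 256 radixed sort of 16 bit values
built from two stable distribution passes, low byte first,
then high byte.

#include <stdlib.h>

/* One stable distribution ( counting ) pass: move n 16-bit
   values from src[] to dst[], ordered on the 8-bit field
   selected by shift. */
static void pass8(const unsigned short *src, unsigned short *dst,
                  size_t n, int shift)
{
    size_t count[256] = {0};
    size_t start[256];
    size_t i, sum = 0;

    /* Tally how many items fall into each of the 256 baskets. */
    for (i = 0; i < n; i++)
        count[(src[i] >> shift) & 0xFF]++;

    /* Turn the tallies into starting offsets, one per basket. */
    for (i = 0; i < 256; i++) {
        start[i] = sum;
        sum += count[i];
    }

    /* Place each item directly; equal keys keep their original
       order, so the pass is stable. */
    for (i = 0; i < n; i++)
        dst[start[(src[i] >> shift) & 0xFF]++] = src[i];
}

/* 256 radixed sort of 16-bit values: sort on the low byte, then
   stably on the high byte, so the second pass doesn't destroy
   the partial order the first pass created. */
void radix256_sort(unsigned short *a, size_t n)
{
    unsigned short *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return;                /* out of memory; leave a[] unsorted */
    pass8(a, tmp, n, 0);       /* pass 1: low byte */
    pass8(tmp, a, n, 8);       /* pass 2: high byte */
    free(tmp);
}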
The trick with both the basket and the distribution
sort is that you don't make any comparisons; you just
put each item into its correct location. You don't need
to compare it to any other item ( all 5's go in the 5's
slot, regardless of how many 4's or other 5's or 6's
there are ).
The basket sort on a computer usually requires creating
a number of linked lists to build the results in. The
distribution sort instead requires building a tally of
the number of each item type in each basket location. This
has the advantage that you know ahead of time exactly
where to put each item being sorted. It does have the
overhead that you need to make an extra pass through the
data to build the tallies.
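Here is a matching sketch of the basket version, again in
C with invented names ( the tally version is just the
counting pass in the sketch above ): each of the 256
baskets is a linked list, items are appended in arrival
order, which keeps the result stable, and the baskets are
then read out in order.

#include <stdlib.h>

struct node {
    unsigned short val;
    struct node *next;
};

/* Basket sort of 16-bit values on their low byte: no
   comparisons, each item is simply appended to the list
   for its basket. Usable as one pass of a radix sort. */
void basket_sort_low_byte(unsigned short *a, size_t n)
{
    struct node *head[256] = {0}, *tail[256] = {0};
    struct node *nodes = malloc(n * sizeof *nodes);
    size_t i, out = 0;
    int b;

    if (nodes == NULL)
        return;

    /* Drop each item into its basket, preserving arrival order. */
    for (i = 0; i < n; i++) {
        struct node *p = &nodes[i];
        p->val = a[i];
        p->next = NULL;
        b = a[i] & 0xFF;
        if (tail[b])
            tail[b]->next = p;
        else
            head[b] = p;
        tail[b] = p;
    }

    /* Walk the baskets in order and copy the items back out. */
    for (b = 0; b < 256; b++) {
        struct node *p;
        for (p = head[b]; p; p = p->next)
            a[out++] = p->val;
    }

    free(nodes);
}

Compared with the tally version, this avoids the extra
counting pass but pays for it in list-pointer storage and
pointer chasing.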
Both of these methods require building data structures
that are used for bookkeeping by the sort algorithm.
This is where the overhead comes in. Creating these
separate data structures takes time and uses a fair
amount of space. This was a problem when computers
were considered large with 16K words of memory. Nowadays,
I have a machine in front of me with 1 GByte and I only
use a small fraction of it. Still, this setup time is
usually constant, and the time cost for each item is
small. Since the inner loops of both of these sorts do
the same thing for every item ( no conditional flow ),
they take a fixed amount of time to handle each data
item. This is even better on the newer pipelined machines,
because they hate conditional branches.
This means that for large enough data sets, k1*n+c
will be smaller than k2*n*log(n): specifically, when n
is big enough that k1 + c/n < k2*log(n). As you can see,
for large n, c/n gets quite small, so k1 < k2*log(n) is
a rough guide for how large a data set has to be before
it is best sorted this way.
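As a made-up example ( the constants here are pulled out
of the air to show the shape of the tradeoff, not measured
on any machine ): take k1 = 4, k2 = 1, use base-2 logs,
and ignore c/n. The distribution sort then wins once
log(n) > 4, that is, around n = 16. If the bookkeeping
per item is heavier, say k1 = 10, the crossover moves out
to about n = 1024.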
If there is enough interest, I can post code showing
both types of sorts.
Dwight