Partial sorting
  Home FAQ Contact Sign in
comp.lang.forth only
 
Advanced search
POPULAR GROUPS

more...

comp.lang.forth Profile…
 Up
Partial sorting         


Author: Brad Eckert
Date: Sep 3, 2008 07:05

Hi all,

What would be a good algorithm for finding the top 256 of a million
items?

I suppose the "top 256" list would remain sorted and any candidates
greater than the minimum would be inserted into the list. Would a
linear array be the best representation for this list?

-Brad
7 Comments
Re: Partial sorting         


Author: Thomas Pornin
Date: Sep 3, 2008 09:08

According to Brad Eckert tinyboot.com>:
> I suppose the "top 256" list would remain sorted and any candidates
> greater than the minimum would be inserted into the list. Would a
> linear array be the best representation for this list?

It sure is simple enough. If you have n elements and want to keep the
top t, you should have something like O(t*log n) insertions. I am mostly
making this up, but the bottom line is that insertions shall be
sufficiently rare so that the cost of an insertion, even if it involves
moving O(t) elements, will be negligible next to the O(n) comparisons.
Show full article (1.38Kb)
no comments
Re: Partial sorting         


Author: Icarus Sparry
Date: Sep 3, 2008 10:20

On Wed, 03 Sep 2008 07:05:21 -0700, Brad Eckert wrote:
> Hi all,
>
> What would be a good algorithm for finding the top 256 of a million
> items?
>
> I suppose the "top 256" list would remain sorted and any candidates
> greater than the minimum would be inserted into the list. Would a linear
> array be the best representation for this list?
>
> -Brad

Tony Hoare has a good algorithm, which is problem 9 in chapter 11 of Jon
Bently's Programming Pearls 2nd edition. It does require that you can
permute the elements, but it has expected O(n) running time.

It is a varient of quicksort.
Show full article (1.15Kb)
no comments
Re: Partial sorting         


Author: Marcel Hendrix
Date: Sep 3, 2008 10:23

Brad Eckert tinyboot.com> writes Re: Partial sorting
> What would be a good algorithm for finding the top 256 of a million items?
> I suppose the "top 256" list would remain sorted and any candidates
> greater than the minimum would be inserted into the list. Would a
> linear array be the best representation for this list?

I don't think the top 256 need to be sorted, as a just-in-time linear
search for the minimum might be more efficient. A CS type should be
able to spew Forth the relevant O(n) numbers right away.

-marcel
no comments
Re: Partial sorting         


Author: Jonah Thomas
Date: Sep 3, 2008 12:06

mhx@iae.nl (Marcel Hendrix) wrote:
> Brad Eckert tinyboot.com> writes
>
>> What would be a good algorithm for finding the top 256 of a million
>> items?
>
>> I suppose the "top 256" list would remain sorted and any candidates
>> greater than the minimum would be inserted into the list. Would a
>> linear array be the best representation for this list?
>
> I don't think the top 256 need to be sorted, as a just-in-time linear
> search for the minimum might be more efficient. A CS type should be
> able to spew Forth the relevant O(n) numbers right away.

Yes. By the time you've checked the first 25000 records then the ones
you have will be around the top 1%%. From then on you'll be replacing
items less than 1%% of the time. So if you can do something to speed up
the case when you compare-and-discard it will probably have more effect
than speeding up the compare-and-replace.
2 Comments
Re: Partial sorting         


Author: Marcel Hendrix
Date: Sep 3, 2008 13:04

Jonah Thomas gmail.com> writes Re: Partial sorting
> mhx@iae.nl (Marcel Hendrix) wrote:
>> Brad Eckert tinyboot.com> writes
>>> What would be a good algorithm for finding the top 256 of a million
>>> items?
>>> I suppose the "top 256" list would remain sorted and any candidates
>>> greater than the minimum would be inserted into the list. Would a
>>> linear array be the best representation for this list?
>> I don't think the top 256 need to be sorted, as a just-in-time linear
>> search for the minimum might be more efficient. A CS type should be
>> able to spew Forth the relevant O(n) numbers right away.
> Yes. By the time you've checked the first 25000 records then the ones
> you have will be around the top 1%%. From then on you'll be replacing
> items less than 1%% of the time. So if you can do something to speed up
> the case when you compare-and-discard it will probably have more effect
> than speeding up the compare-and-replace.

Let's see. Both schemes (a = keep sorted, b = search) can cache the
current minimum, so searching candidates is equally fast.
Show full article (1.75Kb)
no comments
Re: Partial sorting         


Author: Albert van der Horst
Date: Sep 3, 2008 12:34

In article <0e738bad-3556-447e-821f-56e9b4bf7ad2@59g2000hsb.googlegroups.com>,
Brad Eckert tinyboot.com> wrote:
>Hi all,
>
>What would be a good algorithm for finding the top 256 of a million
>items?
>
>I suppose the "top 256" list would remain sorted and any candidates
>greater than the minimum would be inserted into the list. Would a
>linear array be the best representation for this list?

A heap would be better, allowing for 8 instead of 128 moves.
But your algorithm would be order of a million anyway, and
the "top 256" list would affect only a small factor.
Most of the time you would be skipping numbers that would
not beat the worst of the "top 256" list anyway.

If you have a good sorting algorithm available to just sort
the million items, you may discover that it is hard to beat
it, even if it does more than necessary.
Show full article (1.06Kb)
no comments
Re: Partial sorting         


Author: Jonah Thomas
Date: Sep 3, 2008 16:09

mhx@iae.nl (Marcel Hendrix) wrote:
> Let's see. Both schemes (a = keep sorted, b = search) can cache the
> current minimum, so searching candidates is equally fast.

Yes, and if you can find any way to speed up that part it can
potentially make a big difference. It happens about a million times and
the rest happens much fewer.
> If scheme a keeps a sorted array, it will need 4 compares and about
> 128 memory moves to insert a new element.

Are you figuring a maximum of 8 compares and so 4 compares on average?
But half the time it takes 8 compares, 1/4 the time it takes 7, 1/8 the
time it takes 6, etc.
> If it uses a linked list,
> there would be no moves, but 128 compares and maybe 3 writes (are
> there algorithms to do a binary search in a linked list?).

You can build a binary tree with linked lists but then it isn't just a
linked list -- you have to rebuild the tree.

3 writes? Because you have the list ordered. That looks like a good
trade, 2 extra writes to avoid on average 128 compares.
> I guess the linked list will be faster than an array.
Show full article (3.13Kb)
no comments

RELATED THREADS
SubjectArticles qty Group
Re: Sorting order with sort()macromedia.labs.spry_framework_for_ajax_prerelease ·