|
|
Up |
|
|
  |
Author: JohnJohn Date: Jul 23, 2007 07:47
On Jul 21, 9:41 am, Erik Wikstr
|
| |
|
| | 8 Comments |
|
  |
Author: Roman.PerepelitsaRoman.Perepelitsa Date: Jul 24, 2007 02:28
> There is a set::insert where you can pass in a hint iterator, I wish
> there was a set::find version that took hint(s). Basically I have
> have two sets - one big set and one fairly small set. I am trying to
> do a set intersection. Both are sets of strings. The small set
> contains strings all with the same prefix. Eg. all have the prefix
> "foo" for instance. It would be great if I could pass a hint to find
> or even better if I could pass an upper and lower bounds - so find
> could possibly take advantage of the information that I have passed
> in.
You can try 2 optimizations.
1. Find intersection for your small set and subset of large set.
Subset is found with the help of lower_bound and upper_bound.
Unfortunately, you have to create temporary set in this case.
2. Use alternative intersection algorithm with complexity
O(log(size_of_set1) * size_of_set2).
Note that if you use sorted vector instead of set, then you can
use first approach (with subset) without the need for temporary
set.
Here is some code.
|
| Show full article (2.65Kb) |
|
| | no comments |
|
  |
Author: JohnJohn Date: Jul 24, 2007 11:37
On Jul 24, 6:14 am, "Roman.Perepeli...@ gmail.com"
gmail.com> wrote:
>> There is a set::insert where you can pass in a hint iterator, I wish
>> there was a set::find version that took hint(s). Basically I have
>> have two sets - one big set and one fairly small set. I am trying to
>> do a set intersection. Both are sets of strings. The small set
>> contains strings all with the same prefix. Eg. all have the prefix
>> "foo" for instance. It would be great if I could pass a hint to find
>> or even better if I could pass an upper and lower bounds - so find
>> could possibly take advantage of the information that I have passed
>> in.
>
> You can try 2 optimizations.
>
> 1. Find intersection for your small set and subset of large set.
> Subset is found with the help of lower_bound and upper_bound.
> Unfortunately, you have to create temporary set in this case.
> 2. Use alternative intersection algorithm with complexity
> O(log(size_of_set1) * size_of_set2).
> ...
|
| Show full article (3.33Kb) |
| no comments |
|
  |
Author: Roman.PerepelitsaRoman.Perepelitsa Date: Jul 25, 2007 10:47
> The above looks great - except I have one question here..
> The creation of:
>
> Set sub(
> big.lower_bound(*small.begin()),
> big.upper_bound(*small.rbegin()));
>
> This could be somewhat expensive.
>
> If C++ had idea of subsets of a set that did not have to copy
> to create the new set this would be perfect.
You are right, this copy can be expensive. But if 'small' contains
only a few objects, and sub contains only a few objects, then it
can be faster than other solutions.
|
| Show full article (2.80Kb) |
| no comments |
|
  |
Author: Carl BarronCarl Barron Date: Jul 25, 2007 16:57
> // complexity is O(log(e1 - b1) * (e2 - b2))
> template
> void log_intersection(RA1 b1, RA1 e1,
> RA2 b2, RA2 e2, Out out)
> {
> for (; b2 != e2; ++b2)
> if (binary_search(b1, e1, *b2))
> *out++ = *b2;
> }
what makes you think that this is faster std::set_intersection?
its slower!! set_intersection be faster since log2(N1) > 1.
|
| |
| no comments |
|
  |
Author: Roman.PerepelitsaRoman.Perepelitsa Date: Jul 26, 2007 07:17
>> // complexity is O(log(e1 - b1) * (e2 - b2))
>> template
>> void log_intersection(RA1 b1, RA1 e1,
>> RA2 b2, RA2 e2, Out out)
>> {
>> for (; b2 != e2; ++b2)
>> if (binary_search(b1, e1, *b2))
>> *out++ = *b2;
>> }
>
> what makes you think that this is faster std::set_intersection?
> its slower!! set_intersection be faster since log2(N1) > 1.
std::set_intersection complexity is O(N1 + N2). Complexity of
log_intersection is log(N1) * N2. If N2 < N1 / (log(N1) - 1) then
log_intersection is faster. It is the case when N2 is very small,
and N1 is very large.
|
| Show full article (0.94Kb) |
| no comments |
|
  |
Author: qjohnny2000qjohnny2000 Date: Jul 26, 2007 13:37
On Jul 25, 2:39 pm, "Roman.Perepeli...@ gmail.com"
gmail.com> wrote:
>> The above looks great - except I have one question here..
>> The creation of:
>
>> Set sub(
>> big.lower_bound(*small.begin()),
>> big.upper_bound(*small.rbegin()));
>
>> This could be somewhat expensive.
>
>> If C++ had idea of subsets of a set that did not have to copy
>> to create the new set this would be perfect.
>
> You are right, this copy can be expensive. But if 'small' contains
> only a few objects, and sub contains only a few objects, then it
> can be faster than other solutions.
>
> std::set is odd thing. Range [s.begin(), s.end()) is always sorted,
> but you can't use effective std::lower_bound or std::binary_search ...
|
| Show full article (3.56Kb) |
| no comments |
|
  |
|
|
  |
|
|
  |
Author: Francis GlassborowFrancis Glassborow Date: Jul 28, 2007 10:07
>>> // complexity is O(log(e1 - b1) * (e2 - b2))
>>> template
>>> void log_intersection(RA1 b1, RA1 e1,
>>> RA2 b2, RA2 e2, Out out)
>>> {
>>> for (; b2 != e2; ++b2)
>>> if (binary_search(b1, e1, *b2))
>>> *out++ = *b2;
>>> }
>> what makes you think that this is faster std::set_intersection?
>> its slower!! set_intersection be faster since log2(N1) > 1.
>
> std::set_intersection complexity is O(N1 + N2). Complexity of
> log_intersection is log(N1) * N2. If N2 < N1 / (log(N1) - 1) then
> log_intersection is faster. It is the case when N2 is very small,
> and N1 is very large. ...
|
| Show full article (1.48Kb) |
| no comments |
|
|
|
|