-
Notifications
You must be signed in to change notification settings - Fork 173
Some improvement ColumnMap #257
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add some description and rationale behind the changes?
clickhouse/columns/map.h
Outdated
const auto make_index = [](const auto& data) { | ||
std::vector<size_t> result{data.Size()}; | ||
std::generate(result.begin(), result.end(), [i = 0] () mutable { return i++; }); | ||
std::sort(result.begin(), result.end(), [&data](size_t l, size_t r) {return data[l] < data[r];}); | ||
return result; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely clear peace of code, so could you please add some comments to it?
Another thing: this allocates (2 * N)*sizeof(size_t)
extra memory on the heap. IMO it could be more beneficial to create a sorted array (of pointers/references) from left items and then do a binary-search lookups with items from the right.
That way the algorithmic complexity stays the same O(N*logN)
, but memory requirement drops significantly: only N*sizeof(size_t)
instead of 2N*sizeof(t)
.
Anyway, it would be really nice to see some numbers on that, could you please add an extra entry to clickhouse-cpp/ut/performance_tests.cpp
?
clickhouse/columns/map.h
Outdated
using Iterator = ProjectedIterator<Functor, BaseIter>; | ||
|
||
Functor functor = [](const BaseIter& i) { | ||
return std::make_tuple(i->first, i->second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a random idea: maybe use std::ref
/std::cref
here? Because otherwise you are still copying values, not in bulk, but one-by-one.
- return std::make_tuple(i->first, i->second);
+ return std::make_tuple(std::cref(i->first), std::cref(i->second));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not bad, but some changes are still required. Please see notes.
Closing to re-open and re-trigger CI/CD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
That changes have two main purposes.
First of all to refuse of allocating memory for an intermediate container in method
ColumnMap<K, V>:: Append(const T& value)
. AProjectedIterator
has been implemented for it. Implementation of this iterator is inspired byboost::transform_iterator
and has to be replaced bystd::indirectly_readable
iterator.The second purpose is to reduce memory allocation in
MapView::operator==(...)
. Vector integer indexes are used instead of copies of values.