Skip to content

Some improvement ColumnMap #257

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 15, 2022
Merged

Some improvement ColumnMap #257

merged 3 commits into from
Dec 15, 2022

Conversation

den818
Copy link
Contributor

@den818 den818 commented Nov 30, 2022

That changes have two main purposes.

First of all to refuse of allocating memory for an intermediate container in method ColumnMap<K, V>:: Append(const T& value). A ProjectedIterator has been implemented for it. Implementation of this iterator is inspired by boost::transform_iterator and has to be replaced by std::indirectly_readable iterator.

The second purpose is to reduce memory allocation in MapView::operator==(...). Vector integer indexes are used instead of copies of values.

Copy link
Contributor

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add some description and rationale behind the changes?

Comment on lines 194 to 199
const auto make_index = [](const auto& data) {
std::vector<size_t> result{data.Size()};
std::generate(result.begin(), result.end(), [i = 0] () mutable { return i++; });
std::sort(result.begin(), result.end(), [&data](size_t l, size_t r) {return data[l] < data[r];});
return result;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not entirely clear peace of code, so could you please add some comments to it?

Another thing: this allocates (2 * N)*sizeof(size_t) extra memory on the heap. IMO it could be more beneficial to create a sorted array (of pointers/references) from left items and then do a binary-search lookups with items from the right.

That way the algorithmic complexity stays the same O(N*logN), but memory requirement drops significantly: only N*sizeof(size_t) instead of 2N*sizeof(t).

Anyway, it would be really nice to see some numbers on that, could you please add an extra entry to clickhouse-cpp/ut/performance_tests.cpp ?

using Iterator = ProjectedIterator<Functor, BaseIter>;

Functor functor = [](const BaseIter& i) {
return std::make_tuple(i->first, i->second);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a random idea: maybe use std::ref/std::cref here? Because otherwise you are still copying values, not in bulk, but one-by-one.

-  return std::make_tuple(i->first, i->second);
+  return std::make_tuple(std::cref(i->first), std::cref(i->second));

Copy link
Contributor

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not bad, but some changes are still required. Please see notes.

@Enmk
Copy link
Contributor

Enmk commented Dec 10, 2022

Closing to re-open and re-trigger CI/CD

@Enmk Enmk closed this Dec 10, 2022
@Enmk Enmk reopened this Dec 10, 2022
Copy link
Contributor

@Enmk Enmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Enmk Enmk merged commit 0472da2 into ClickHouse:master Dec 15, 2022
@den818 den818 deleted the MapView branch December 19, 2022 21:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants