
Decrease TieredMergePolicy's default number of segments per tier to 8. #14823


Open: wants to merge 2 commits into base: main


@jpountz (Contributor) commented Jun 20, 2025

`TieredMergePolicy` currently allows 10 segments per tier. With Lucene increasingly deployed with separate indexing and search tiers that are kept up to date via segment-based replication, I believe it makes sense for Lucene to merge more aggressively by default: the extra merging cost is paid only once, on the indexing tier, while every search node that serves queries for the index benefits.

Note that this is still a somewhat conservative default: applications with low latency requirements and low update rates will likely want to go even further, with 4 or even 2 segments per tier.
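To make "segments per tier" concrete, here is a toy model of how a tiered policy budgets segments. It is a hypothetical sketch, not Lucene's code: it assumes each tier holds up to `segmentsPerTier` segments, that each tier's segment size is `segmentsPerTier` times the previous tier's (starting at a floor size and capped at a max size), and it ignores deletions and the real policy's other settings. The class `TierBudget`, the method `allowedSegments` and all sizes are made up for illustration; in Lucene itself the setting is `TieredMergePolicy#setSegmentsPerTier`.

```java
/**
 * Toy model (hypothetical, simplified) of a tiered merge policy's segment budget:
 * tier k holds up to segmentsPerTier segments of about floorMib * segmentsPerTier^k MiB,
 * capped at maxMib; segments at the cap are not limited per tier.
 */
class TierBudget {

  static int allowedSegments(double totalMib, double floorMib, double maxMib, int segmentsPerTier) {
    // Assumption: a merge combines one tier's worth of segments, so the
    // segment size grows by a factor of segmentsPerTier from tier to tier.
    int mergeFactor = segmentsPerTier;
    double levelSize = floorMib;
    double mibLeft = totalMib;
    int allowed = 0;
    while (true) {
      double segCountAtLevel = mibLeft / levelSize;
      if (segCountAtLevel < segmentsPerTier || levelSize == maxMib) {
        // Last tier: all remaining data, in segments of this size.
        return allowed + (int) Math.ceil(segCountAtLevel);
      }
      allowed += segmentsPerTier;             // one full tier
      mibLeft -= segmentsPerTier * levelSize; // data held by that tier
      levelSize = Math.min(maxMib, levelSize * mergeFactor);
    }
  }

  public static void main(String[] args) {
    // Hypothetical sizes: 100 GiB index, 2 MiB floor, 5 GiB cap.
    for (int spt : new int[] {10, 8, 4, 2}) {
      System.out.println(spt + " segments/tier -> "
          + allowedSegments(102400, 2, 5120, spt) + " segments");
    }
  }
}
```

With these made-up sizes, `main` prints budgets of 56, 51, 42 and 41 segments for 10, 8, 4 and 2 segments per tier. In this model the tiers below the cap hold 40 segments at 10 per tier versus 32 at 8 per tier; this is a property of the toy model and its sizes, not a reproduction of the simulation results quoted in this PR.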

`BaseMergePolicyTestCase#testSimulateAppendOnly` reports a write amplification increase from 3.4 to 3.8, while `BaseMergePolicyTestCase#testSimulateUpdates` reports an increase from 4.5 to 4.9. In exchange, the number of segments whose sizes fall between the floor segment size and the max segment size decreases by about 20%.
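Put in relative terms, the write-amplification figures quoted for the two simulations are increases of roughly 12% (append-only) and 9% (updates). A small sketch of that arithmetic, assuming write amplification is measured as total bytes written divided by bytes flushed, so the extra merge I/O for a given ingest volume is that volume times the difference. The class and method names are hypothetical, and the 1 TiB ingest volume is made up.

```java
/** Back-of-envelope arithmetic on the write-amplification figures quoted in this PR. */
class WriteAmpTradeoff {

  /** Relative increase from before to after, e.g. 0.10 for +10%. */
  static double relativeIncrease(double before, double after) {
    return (after - before) / before;
  }

  /**
   * Extra bytes written by merges for a given flushed volume, assuming
   * write amplification = total bytes written / bytes flushed.
   */
  static double extraBytesWritten(double flushedBytes, double before, double after) {
    return flushedBytes * (after - before);
  }

  public static void main(String[] args) {
    System.out.printf("append-only: %+.1f%%%n", 100 * relativeIncrease(3.4, 3.8));
    System.out.printf("updates:     %+.1f%%%n", 100 * relativeIncrease(4.5, 4.9));
    double oneTib = 1L << 40; // hypothetical ingest volume
    System.out.printf("extra writes per TiB flushed (append-only): %.2f TiB%n",
        extraBytesWritten(oneTib, 3.4, 3.8) / oneTib);
  }
}
```

Under that assumption, every TiB flushed costs about 0.4 TiB of extra merge writes on the indexing tier, which is the price paid once in exchange for fewer segments on every search node.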

This should especially help queries with a high per-segment overhead: primary-key (PK) lookups, point queries, multi-term queries and vector searches.
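Why segment count matters for these queries: roughly speaking, a lookup for one key checks segments one at a time until it finds a match, and a key that does not exist checks all of them, so the work grows linearly with the number of segments. The toy model here makes that dependence explicit. It is a hypothetical sketch: segments are plain sorted `long[]` arrays probed with `Arrays.binarySearch`, keys are spread round-robin, and the names `PkLookupModel`, `buildIndex`, `segmentsProbed` and `averageProbesForHits` are made up. In a real Lucene index the per-segment cost is a terms-dictionary seek rather than a binary search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a primary-key lookup across segments (hypothetical; not Lucene's code path). */
class PkLookupModel {

  /** Stand-in for one segment: just its primary keys, sorted ascending. */
  static final class Segment {
    final long[] sortedKeys;

    Segment(long[] sortedKeys) {
      this.sortedKeys = sortedKeys;
    }

    boolean contains(long key) {
      return Arrays.binarySearch(sortedKeys, key) >= 0;
    }
  }

  /** Distributes keys 0..numDocs-1 round-robin across numSegments segments. */
  static List<Segment> buildIndex(int numDocs, int numSegments) {
    List<List<Long>> buckets = new ArrayList<>();
    for (int i = 0; i < numSegments; i++) {
      buckets.add(new ArrayList<>());
    }
    for (long key = 0; key < numDocs; key++) {
      buckets.get((int) (key % numSegments)).add(key); // ascending order, so buckets stay sorted
    }
    List<Segment> segments = new ArrayList<>();
    for (List<Long> bucket : buckets) {
      segments.add(new Segment(bucket.stream().mapToLong(Long::longValue).toArray()));
    }
    return segments;
  }

  /** Segments visited until the key is found; a missing key visits every segment. */
  static int segmentsProbed(List<Segment> segments, long key) {
    int probes = 0;
    for (Segment segment : segments) {
      probes++; // each visit pays a fixed per-segment cost
      if (segment.contains(key)) {
        break;
      }
    }
    return probes;
  }

  /** Average number of segments probed when looking up every existing key once. */
  static double averageProbesForHits(List<Segment> segments, int numDocs) {
    long total = 0;
    for (long key = 0; key < numDocs; key++) {
      total += segmentsProbed(segments, key);
    }
    return (double) total / numDocs;
  }

  public static void main(String[] args) {
    for (int numSegments : new int[] {40, 32}) {
      List<Segment> index = buildIndex(960, numSegments);
      System.out.printf("%d segments: miss probes %d, average hit probes %.1f%n",
          numSegments, segmentsProbed(index, -1), averageProbesForHits(index, 960));
    }
  }
}
```

For the hypothetical 40- and 32-segment indexes, `main` reports 40 versus 32 probes for a missing key and 20.5 versus 16.5 probes on average for existing keys: in this model, 20% fewer segments means 20% less per-segment work per lookup.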

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@mikemccand (Member)

+1, this is a great idea -- more aggressive merging by default makes sense.

@mikemccand (Member) left a comment


Thanks @jpountz

@jpountz (Contributor, Author) commented Jun 20, 2025

Thanks @mikemccand! I'll wait a few days before merging to give others a chance to take a look.

2 participants