This script calculates appropriate values for the shards.rows parameter passed to SolrCloud, for an arbitrary SolrCloud cluster size. It is used to optimize SolrCloud sharded requests and improve performance, especially for deeply paginated queries.
Specifically, the script calculates the "shard factor" ratio, defined as shards.rows / (start + rows). A Monte Carlo approach simulates the random assignment of documents to shards that takes place during Solr indexing. For each page, the simulation proceeds as follows:
1. Create shard_count buckets.
2. Insert P = (current page * rows_per_page) elements into random buckets.
3. Determine the maximum number of elements, M, over all buckets.
4. Compute the resulting shard factor, M / P, which indicates the fraction of P we would need to request from each shard to produce a correct set of P elements.
5. Clear the buckets, then repeat steps 2-4 trial_count times and take the smallest shard factor that ensures full accuracy in accuracy percent of trials.
6. Repeat steps 2-5 for each page in the range [1, page_count].
The shard factor falls as depth increases, thanks to the law of large numbers: the more documents we draw, the more tightly the per-bucket counts concentrate around their mean.
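To make the steps concrete, here is a minimal sketch of the simulation (a hypothetical re-implementation for illustration, not the script's actual code; the parameter names mirror the command-line flags shown below):

    import math
    import random

    def estimate_shard_factors(trial_count, page_count, rows_per_page,
                               shard_count, accuracy):
        """Return a {depth: shard_factor} mapping, one entry per page."""
        factors = {}
        for page in range(1, page_count + 1):
            p = page * rows_per_page          # total rows needed at this depth
            trial_factors = []
            for _ in range(trial_count):
                buckets = [0] * shard_count   # step 1: one bucket per shard
                for _ in range(p):            # step 2: random doc-to-shard assignment
                    buckets[random.randrange(shard_count)] += 1
                m = max(buckets)              # step 3: the fullest bucket
                trial_factors.append(m / p)   # step 4: this trial's shard factor
            # Step 5: smallest factor sufficient in `accuracy` percent of trials.
            trial_factors.sort()
            idx = min(math.ceil(len(trial_factors) * accuracy / 100.0) - 1,
                      len(trial_factors) - 1)
            factors[p] = trial_factors[idx]
        return factors

    if __name__ == "__main__":
        table = estimate_shard_factors(trial_count=1000, page_count=10,
                                       rows_per_page=100, shard_count=12,
                                       accuracy=99.9)
        for depth, factor in table.items():
            print(f"{depth}\t{factor:.0%}")

The script itself exposes the same parameters as command-line flags: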
python estimate_shard_rows.py --trial_count 10000 --page_count 10 --rows_per_page 100 --shard_count 12 --accuracy 99.9
Depth    Shard factor to use at this depth
 100     20%
 200     16%
 300     15%
 400     14%
 500     13%
 600     12%
 700     12%
 800     12%
 900     11%
1000     11%
The above output tells us that with a shard count of 12 and a page size of 100, we need to fetch at least 20% of (start + rows) from each shard to be correct 99.9% of the time on page 1, 16% on page 2, 15% on page 3, and so on.
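In absolute terms: on page 1 (depth 100), a 20% shard factor means asking each of the 12 shards for only 0.20 * 100 = 20 candidate rows (240 across the cluster), rather than the full 100 rows per shard (1,200 in total) that a distributed request fetches by default.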
- Run the above code during cluster setup to generate a table of shard factors.
- Store the output in a table in your service that calls SolrCloud.
- At runtime, for each query, calculate start + rows and round down to the nearest page size in the stored table (e.g. start=150, rows=10: 150+10=160 => depth=100).
- Look up the query depth in the stored table (e.g. 100 => 20%).
- Append the shards.rows parameter, calculated as follows: shards.rows = shard_factor * (start + rows). (A sketch of these runtime steps follows this list.)
- Sit back and watch performance improve! In our perf tests, average latency dropped by 31% with 12 shards.
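Here is a minimal sketch of those runtime steps (hypothetical code; SHARD_FACTORS stands in for the stored table, populated here with the 12-shard example output above):

    import math

    # Shard-factor table generated offline by estimate_shard_rows.py
    # (depth -> factor); values are the 12-shard example output above.
    SHARD_FACTORS = {
        100: 0.20, 200: 0.16, 300: 0.15, 400: 0.14, 500: 0.13,
        600: 0.12, 700: 0.12, 800: 0.12, 900: 0.11, 1000: 0.11,
    }
    PAGE_SIZE = 100  # granularity of the stored table

    def shards_rows_param(start, rows):
        """Return the shards.rows value for a query, or None if out of range."""
        depth = start + rows
        # Round down to the nearest depth in the table (minimum one page).
        key = max((depth // PAGE_SIZE) * PAGE_SIZE, PAGE_SIZE)
        factor = SHARD_FACTORS.get(key)
        if factor is None:
            return None  # deeper than the table covers; use Solr's default
        return math.ceil(factor * depth)  # round up so we never under-fetch

    # e.g. start=150, rows=10: depth=160 -> key=100 -> 0.20 * 160 = 32
    params = {"q": "*:*", "start": 150, "rows": 10}
    shards_rows = shards_rows_param(params["start"], params["rows"])
    if shards_rows is not None:
        params["shards.rows"] = shards_rows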
See the SolrCloud Jira ticket for further context on this issue: https://issues.apache.org/jira/browse/SOLR-5611
See this blog post for a more complete explanation of how this works, and why it is useful: https://tech.shutterstock.com/2017/05/09/efficiently-handling-deep-pagination-in-a-distributed-search-engine