Description
Elasticsearch Version
8.9.1
Installed Plugins
No response
Java Version
20.0.2
OS Version
5.15.0-83-generic #92-Ubuntu SMP Mon Aug 14 09:30:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Problem Description
I'm trying to pull results from Elasticsearch with a sort. There can be millions of documents, and it's taking a very long time to fetch all of the results. I'm looking for ways to improve the speed.
I implemented sliced search with a point in time (PIT), which reduces the fetch time, but the results are no longer globally sorted. They are sorted only within each slice, whereas I need the combined results returned in sort order.
For example, this search for slice 1:
GET _search
{
  "slice": {
    "id": 1,
    "max": 5
  },
  "pit": {
    "id": "tOaGBAEXY29tYmluZWRfbW9sX2RlZmluaXRpb24WaVBkVG1IZ0NUeFNvT3gybmJXUVAxdwAWTUpXdDdzb1BUYXlUd1NSS0l4THFMUQAAAAAAAAOpjBZwLUZROWxiWVJycWtUQnRFWk1iek9nAAEWaVBkVG1IZ0NUeFNvT3gybmJXUVAxdwAA"
  },
  "_source": ["_none_"],
  "docvalue_fields": ["rcsb_id"],
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "rcsb_id": {
        "order": "asc"
      }
    }
  ]
}
returns a first document with rcsb_id "006":
{
  "pit_id": "tOaGBAEXY29tYmluZWRfbW9sX2RlZmluaXRpb24WaVBkVG1IZ0NUeFNvT3gybmJXUVAxdwAWTUpXdDdzb1BUYXlUd1NSS0l4THFMUQAAAAAAAAOpjBZwLUZROWxiWVJycWtUQnRFWk1iek9nAAEWaVBkVG1IZ0NUeFNvT3gybmJXUVAxdwAA",
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 8227,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "combined_mol_definition",
        "_id": "006-CHEM_COMP",
        "_score": null,
        "_source": {},
        "fields": {
          "rcsb_id": [
            "006"
          ]
        },
        "sort": [
          "006",
          38024
        ]
      },
      {
        "_index": "combined_mol_definition",
        "_id": "00B-CHEM_COMP",
        "_score": null,
        "_source": {},
        "fields": {
          "rcsb_id": [
            "00B"
          ]
        },
        "sort": [
          "00B",
          47562
        ]
      }
and the same search for slice 2 returns a first document with rcsb_id "001":
{
  "pit_id": "tOaGBAEXY29tYmluZWRfbW9sX2RlZmluaXRpb24WaVBkVG1IZ0NUeFNvT3gybmJXUVAxdwAWTUpXdDdzb1BUYXlUd1NSS0l4THFMUQAAAAAAAAOpjBZwLUZROWxiWVJycWtUQnRFWk1iek9nAAEWaVBkVG1IZ0NUeFNvT3gybmJXUVAxdwAA",
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 8257,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": "combined_mol_definition",
        "_id": "001-CHEM_COMP",
        "_score": null,
        "_source": {},
        "fields": {
          "rcsb_id": [
            "001"
          ]
        },
        "sort": [
          "001",
          56945
        ]
      },
      {
        "_index": "combined_mol_definition",
        "_id": "003-CHEM_COMP",
        "_score": null,
        "_source": {},
        "fields": {
          "rcsb_id": [
            "003"
          ]
        },
        "sort": [
          "003",
          63266
        ]
      }
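To make the ordering I'm after concrete, here is a minimal Python sketch of a client-side k-way merge. It assumes each slice's hits arrive already sorted ascending by their `sort` values, as in the two responses above. `merge_slices` is a hypothetical helper (not part of any Elasticsearch client), and the sample hits are trimmed copies of the ones shown above.

```python
import heapq
from typing import Iterable, Iterator


def merge_slices(slices: Iterable[Iterable[dict]]) -> Iterator[dict]:
    """Lazily merge per-slice hit streams, each sorted ascending by 'sort'.

    Hypothetical helper: it only relies on the 'sort' array that
    Elasticsearch returns with every hit when a sort is requested.
    """
    return heapq.merge(*slices, key=lambda hit: tuple(hit["sort"]))


# Trimmed hits copied from the slice 1 and slice 2 responses above.
slice_1 = [
    {"_id": "006-CHEM_COMP", "sort": ["006", 38024]},
    {"_id": "00B-CHEM_COMP", "sort": ["00B", 47562]},
]
slice_2 = [
    {"_id": "001-CHEM_COMP", "sort": ["001", 56945]},
    {"_id": "003-CHEM_COMP", "sort": ["003", 63266]},
]

merged_ids = [hit["_id"] for hit in merge_slices([slice_1, slice_2])]
print(merged_ids)
```

Because `heapq.merge` is lazy, the slices could in principle be generators that page through each slice, so only one pending hit per slice is held in memory. This is a workaround on the client, not something the search response does for me.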
Steps to Reproduce
Step 1: mappings
{
  "mappings": {
    "properties": {
      "rcsb_id": {
        "type": "keyword",
        "eager_global_ordinals": true,
        "fields": {
          "normalized": {
            "type": "keyword",
            "normalizer": "lowercase_normalizer"
          }
        }
      }
    }
  }
}
Step 2: index creation
Step 3: opening point-in-time
Step 4: requesting slices
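As a rough sketch of Step 4, the per-slice request bodies can be generated like this. The body mirrors the search shown in the problem description. `build_slice_request` and the `PIT_ID` placeholder are hypothetical names for illustration; the real id is whatever the open point-in-time request in Step 3 returns.

```python
import json


def build_slice_request(pit_id: str, slice_id: int, max_slices: int) -> dict:
    """Build the search body for one slice (hypothetical helper).

    The fields match the example request in the problem description.
    """
    return {
        "slice": {"id": slice_id, "max": max_slices},
        "pit": {"id": pit_id},
        "_source": ["_none_"],
        "docvalue_fields": ["rcsb_id"],
        "query": {"match_all": {}},
        "sort": [{"rcsb_id": {"order": "asc"}}],
    }


# Placeholder: substitute the id returned when the point in time is opened.
PIT_ID = "<pit id from Step 3>"
MAX_SLICES = 5  # same "max" as in the example request

# Slice ids run from 0 to MAX_SLICES - 1; each body is sent to GET _search.
slice_requests = [
    build_slice_request(PIT_ID, slice_id, MAX_SLICES)
    for slice_id in range(MAX_SLICES)
]
print(json.dumps(slice_requests[1], indent=2))
```

Each body is sent to `GET _search` without an index name, since the point in time already identifies the index.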
Logs (if relevant)
No response