2021 Jun 25 - 1 min read
By default, an Elasticsearch search returns only the top 10 hits. To retrieve a larger volume of results, we need to use the pagination options of the Search API.
The Search API provides the from and size parameters, which can be used to retrieve a predefined amount of data. However, from and size should not be used to request a very large volume at once. The reason: a search request usually spans multiple shards, and each shard must load its requested hits plus the hits of all previous pages into memory, which leads to high memory and CPU usage.
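To put that cost in numbers: with from and size, every shard produces its own top from + size hits, and the coordinating node merges all of them before discarding everything before from. The helper below is a rough, back-of-the-envelope sketch of that count (hitsSortedByCoordinator is a hypothetical name; it models only how many hits get sorted, not actual memory usage).

```javascript
// Rough estimate of how many hits a from/size request makes the cluster sort.
// Each shard returns its own top (from + size) hits; the coordinating node
// merges shards * (from + size) hits and keeps only `size` of them.
function hitsSortedByCoordinator(from, size, shards) {
  const perShard = from + size;
  return { perShard, total: shards * perShard, returned: size };
}

// Page 1 vs. page 900 (10 hits per page) on an index with 5 primary shards
console.log(hitsSortedByCoordinator(0, 10, 5));    // { perShard: 10, total: 50, returned: 10 }
console.log(hitsSortedByCoordinator(8990, 10, 5)); // { perShard: 9000, total: 45000, returned: 10 }
```

Both requests return 10 hits, but the deep page makes the cluster sort 900 times as many.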
Moreover, Elasticsearch does not let you paginate past 10,000 hits with the size and from parameters by default: from + size cannot exceed the index.max_result_window setting, which defaults to 10,000. It's a safeguard mechanism of ES; more info can be found in the ES docs.
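A minimal sketch of what this limit means for page numbers, assuming the default window of 10,000: it converts a 1-based page number into a from/size body and refuses pages that Elasticsearch would reject. The names buildPageBody and MAX_RESULT_WINDOW are hypothetical, and match_all is only a placeholder query.

```javascript
// Default value of the index.max_result_window index setting
const MAX_RESULT_WINDOW = 10000;

// Build a from/size request body for a 1-based page number, refusing pages
// whose from + size would exceed the result window.
function buildPageBody(page, pageSize, maxResultWindow = MAX_RESULT_WINDOW) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError('page must be a positive integer');
  }
  const from = (page - 1) * pageSize;
  if (from + pageSize > maxResultWindow) {
    throw new RangeError(
      `from + size = ${from + pageSize} exceeds max_result_window (${maxResultWindow})`
    );
  }
  return { from, size: pageSize, query: { match_all: {} } };
}

console.log(buildPageBody(3, 100)); // { from: 200, size: 100, query: { match_all: {} } }
try {
  buildPageBody(101, 100); // from = 10000, so from + size = 10100
} catch (err) {
  console.log(err.message); // from + size = 10100 exceeds max_result_window (10000)
}
```

With 100 hits per page, page 100 is the last one reachable this way.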
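A scenario can exist where we need to paginate through ES and retrieve a very large set of data. In such a case, the search_after parameter can be used: the results are sorted, and each request passes the sort values of the last hit from the previous page, so Elasticsearch continues from that point instead of skipping an offset.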
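Conceptually, search_after is keyset pagination: rather than "skip from hits", each request asks for "the next size hits whose sort value comes after this one". The sketch below simulates that idea on an in-memory array; nextPage is a hypothetical helper that mimics the concept, not the Elasticsearch implementation. It also shows why the sort field should be unique: any hit that ties with the last sort value of a page would be skipped by the strict "greater than" comparison.

```javascript
// Return the next page of `docs`, sorted ascending by `field`, starting strictly
// after `searchAfter` (undefined for the first page). No offset is involved.
function nextPage(docs, field, size, searchAfter) {
  return docs
    .filter((doc) => searchAfter === undefined || doc[field] > searchAfter)
    .sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0))
    .slice(0, size);
}

const docs = [5, 1, 4, 2, 3].map((id) => ({ id }));
let cursor;
let page;
while ((page = nextPage(docs, 'id', 2, cursor)).length > 0) {
  console.log(page.map((d) => d.id)); // [ 1, 2 ], then [ 3, 4 ], then [ 5 ]
  cursor = page[page.length - 1].id;   // sort value of the last hit on this page
}
```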
The code below retrieves all matching documents, page by page, via the Elasticsearch JavaScript client, until no hits are left.
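// Run inside an async function; assumes es.client is an @elastic/elasticsearch 7.x client
// (7.x wraps each response in .body)
let data = [];
try {
  const query = {
    index: 'index-name',
    body: {
      query: {
        bool: {
          // ...
        }
      },
      // sort on a unique field so every hit has a distinct sort value
      sort: [{ "unique-property-on-index": "asc" }], // can be both asc & desc
      size: 100 // number of hits to retrieve per request
    }
  };
  let lastHits;
  do {
    lastHits = (await es.client.search(query)).body.hits.hits;
    if (lastHits.length > 0) {
      data = data.concat(lastHits);
      // continue after the sort values of the last hit on this page
      // (reading them without pop() keeps lastHits.length intact for the loop check)
      query.body.search_after = lastHits[lastHits.length - 1].sort;
    }
  } while (lastHits.length > 0);
} catch (err) {
  // data holds only the pages fetched before the error
  console.error('Search failed:', err);
}
// you get your results in the "data" array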
This approach keeps requesting data until there is none left, which seems to be the easiest way to get all the data chunk by chunk. The most important thing to consider here is how much data you are paginating through: every hit is accumulated in the data array in memory, so a very large result set can exhaust the memory of the process.
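One way to follow that advice is to put an explicit ceiling on how many hits the loop may collect. The sketch below is hypothetical throughout: fetchAllCapped and maxHits are illustrative names, and createFakeClient is a tiny in-memory stand-in for es.client. It returns the same { body: { hits: { hits } } } shape as the 7.x client and supports only size, an ascending sort on one numeric field, and search_after, so the loop can run without a cluster.

```javascript
// In-memory stand-in for es.client: sorts on one numeric field ascending and
// supports size and search_after. Each hit carries a `sort` array, like real ES hits.
function createFakeClient(docs, field) {
  const sorted = [...docs].sort((a, b) => a[field] - b[field]);
  return {
    async search({ body }) {
      const after = body.search_after ? body.search_after[0] : -Infinity;
      const hits = sorted
        .filter((doc) => doc[field] > after)
        .slice(0, body.size)
        .map((doc) => ({ _source: doc, sort: [doc[field]] }));
      return { body: { hits: { hits } } };
    }
  };
}

// search_after loop that stops once maxHits hits have been collected.
async function fetchAllCapped(client, index, field, pageSize, maxHits) {
  const query = { index, body: { sort: [{ [field]: 'asc' }], size: pageSize } };
  const data = [];
  let hits;
  do {
    hits = (await client.search(query)).body.hits.hits;
    data.push(...hits);
    if (data.length >= maxHits) {
      return data.slice(0, maxHits); // stop early instead of growing without bound
    }
    if (hits.length > 0) {
      query.body.search_after = hits[hits.length - 1].sort;
    }
  } while (hits.length > 0);
  return data;
}

(async () => {
  const docs = Array.from({ length: 250 }, (_, i) => ({ n: i }));
  const client = createFakeClient(docs, 'n');
  console.log((await fetchAllCapped(client, 'index-name', 'n', 100, 1000)).length); // 250
  console.log((await fetchAllCapped(client, 'index-name', 'n', 100, 150)).length);  // 150
})();
```

With a real cluster, the cap is a deliberate trade-off: the caller gets a bounded, possibly incomplete result instead of an unbounded array.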