Elasticsearch-Java RestHighLevelClient-如何使用滚动api获取所有文档



在Elasticsearch的索引中,我保存了大约30000个实体。我想使用RestHighLevelClient获取它们的所有id。我已经读到,最好的方法是使用滚动api。然而,当我这样做的时候,我只收到了大约10个实体,而不是3万个。如何解决这个

final class ElasticRepo {
private final RestHighLevelClient restHighLevelClient;
List<ListingsData> getAllListingsDataIds() {
val request = new SearchRequest(ELASTICSEARCH_LISTINGS_INDEX);
request.types(ELASTICSEARCH_TYPE);
val searchSourceBuilder = new SearchSourceBuilder()
.query(matchAllQuery())
.fetchSource(new String[]{"listing_id"}, new String[]{"backoffice_data", "search_and_match_data"});
request.source(searchSourceBuilder);
request.scroll(TimeValue.timeValueMinutes(3));
return executeQuery(request);
}
private List<ListingsData> executeQuery(final SearchRequest searchQuery) {
try {
val hits = restHighLevelClient.search(searchQuery, RequestOptions.DEFAULT).getHits().getHits();
return Arrays.stream(hits).map(SearchHit::getSourceAsString).map(ElasticRepo::toListingsData).collect(Collectors.toList());
} catch (IOException e) {
e.printStackTrace();
throw new RuntimeException("");
}
}
}

当我这样做时,executeQuery只返回大约11个实体。如何解决这个问题,如何获取索引中的所有文档?

试着遵循这个例子,我使用的是这个代码,它很有效:

String query = "your query here";
QueryBuilder matchQueryBuilder = QueryBuilders.boolQuery().must(new QueryStringQueryBuilder(query));
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
searchSourceBuilder.size(5000); //max is 10000
searchRequest.indices("your index here");
searchRequest.source(searchSourceBuilder);
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10L));
searchRequest.scroll(scroll);
SearchResponse searchResponse = client.search(searchRequest);
String scrollId = searchResponse.getScrollId();
SearchHit[] allHits = new SearchHit[0];
SearchHit[] searchHits = searchResponse.getHits().getHits();
while (searchHits != null && searchHits.length > 0)
{
allHits = Helper.concatenate(allHits, searchResponse.getHits().getHits()); //create a function which concatenate two arrays
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
scrollRequest.scroll(scroll);
searchResponse = client.searchScroll(scrollRequest);
scrollId = searchResponse.getScrollId();
searchHits = searchResponse.getHits().getHits();
}
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);
ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest);

作为搜索API的一部分,除非指定了大小字段,否则默认情况下检索的最大文档数为10。

搜索滚动API文档作为Java REST高级文档的一部分,有一个很好的示例代码->https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-search-scroll.html

相关内容

  • 没有找到相关文章

最新更新