How to read a large data set from ES 1.7 and index it into ES 6.7



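I need to read data from ES 1.7 and index it into ES 6.7, because there is no upgrade path between the two. There are roughly 200 million records, nearly 5 TB of data, to index. We tried the search-and-scroll approach using the REST high-level client (6.7.2), but we cannot scroll with the scroll ID. The other approach we tried is paging with from and a batch size. Reads are fast at first, but as the from offset grows, read performance becomes really poor. What is the best way to do this?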

First approach, using search and scroll:

    SearchRequest searchRequest = new SearchRequest("indexname");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.size(10);
    searchRequest.source(searchSourceBuilder);
    // Open a scroll context and keep it alive for 2 minutes.
    searchRequest.scroll(TimeValue.timeValueMinutes(2));
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    String scrollId = searchResponse.getScrollId();
    boolean run = true;
    while (run) {
        // Fetch the next batch using the scroll ID from the previous response.
        SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
        scrollRequest.scroll(TimeValue.timeValueSeconds(60));
        SearchResponse searchScrollResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
        scrollId = searchScrollResponse.getScrollId();
        SearchHits hits = searchScrollResponse.getHits();
        // Stop once a batch comes back empty.
        if (hits.getHits().length == 0) {
            run = false;
        }
    }

    Exception in thread "main" ElasticsearchStatusException[Elasticsearch exception [type=exception, reason=ElasticsearchIllegalArgumentException[Failed to decode scrollId]; nested: IOException[Bad Base64 input character decimal 123 in array position 0]; ]]
        at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177)
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:2050)
        at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:2026)

Second approach:

    int offset = 0;
    boolean run = true;
    while (run) {
        SearchRequest searchRequest = new SearchRequest("indexname");
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Page through the index with from/size, 500 documents at a time.
        searchSourceBuilder.from(offset);
        searchSourceBuilder.size(500);
        searchRequest.source(searchSourceBuilder);
        // Time each request to see how latency grows with the offset.
        long start = System.currentTimeMillis();
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        long end = System.currentTimeMillis();
        SearchHits hits = searchResponse.getHits();
        System.out.println(" Total hits : " + hits.totalHits + " time : " + (end - start));
        offset += 500;
        // Stop once a page comes back empty.
        if (hits.getHits().length == 0) {
            run = false;
        }
    }

Is there any other way to read the data?

Generally the best solution is a remote reindex: https://www.elastic.co/guide/en/elasticsearch/reference/6.7/docs-reindex.html#reindex-from-remote

I'm not sure the REST client is still compatible with 1.x, whereas reindex from remote should support it.
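As a rough sketch of what a reindex-from-remote call could look like, the Java below builds the JSON body for POST _reindex on the 6.7 cluster and sends it over plain HTTP, so it does not depend on the high-level client being compatible with 1.x. The host http://old-es17-host:9200, the index name indexname, the batch size of 1000, and the class and method names are all placeholders for illustration, not values from this thread. The 1.7 host also has to be listed under reindex.remote.whitelist in the 6.7 cluster's elasticsearch.yml before the request is accepted.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class RemoteReindexSketch {

    // Builds the JSON body for POST _reindex with a remote source.
    // Note: values are concatenated without JSON escaping, so this only
    // suits simple host and index names (an assumption of this sketch).
    static String buildReindexBody(String remoteHost, String sourceIndex,
                                   String destIndex, int batchSize) {
        return "{"
            + "\"source\":{"
            + "\"remote\":{\"host\":\"" + remoteHost + "\"},"
            + "\"index\":\"" + sourceIndex + "\","
            + "\"size\":" + batchSize   // scroll batch size used when reading from the remote
            + "},"
            + "\"dest\":{\"index\":\"" + destIndex + "\"}"
            + "}";
    }

    // Sends the body to the destination (6.7) cluster and returns the HTTP status.
    // wait_for_completion=false makes the cluster run the reindex as a background task.
    static int submit(String destClusterUrl, String body) throws Exception {
        URL url = new URL(destClusterUrl + "/_reindex?wait_for_completion=false");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder host and index names; replace with real values.
        String body = buildReindexBody("http://old-es17-host:9200", "indexname", "indexname", 1000);
        System.out.println(body);
        // Only contact a cluster when its URL is passed as the first argument.
        if (args.length > 0) {
            System.out.println("HTTP " + submit(args[0], body));
        }
    }
}
```

With wait_for_completion=false the response contains a task ID that can be polled through the Tasks API. If the documents are large, a smaller batch size may be needed, because reindex from remote reads batches into a bounded on-heap buffer.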

Deep pagination is very expensive, which is why it should be avoided; you saw the reason in your own example.
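To make that cost concrete, here is a small back-of-the-envelope sketch. It assumes a 5-shard index (the shard count is not given in this thread) and relies on the fact that for a from/size request every shard has to collect its top from + size hits, after which the coordinating node merges one such list per shard. The class and method names are made up for illustration.

```java
class DeepPagingCost {

    // Number of sorted entries the coordinating node has to merge for one
    // from/size request: each shard contributes (from + size) entries.
    static long entriesMerged(long from, int size, int shards) {
        return (from + size) * shards;
    }

    public static void main(String[] args) {
        int size = 500;   // page size used in the second approach
        int shards = 5;   // assumed shard count, not stated in the thread
        long[] offsets = {0L, 1_000_000L, 100_000_000L};
        for (long from : offsets) {
            System.out.println("from=" + from + " -> "
                + entriesMerged(from, size, shards) + " entries merged for 500 returned hits");
        }
    }
}
```

The work per request grows linearly with the offset while the number of returned hits stays at 500, which matches the slowdown seen in the second approach.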
