Solr fetchIndex command fails inexplicably on sharded nodes

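I'm running into a strange problem when invoking the fetchIndex command via REST. I'm trying to use fetchIndex to propagate data from one SolrCloud instance to another. My reading of the documentation suggests this should be possible: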

fetchindex

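Forces the specified slave to fetch a copy of the index from its master. http://slave_host:port/solr/core_name/replication?command=fetchindex

If you like, you can pass an extra attribute such as masterUrl or compression (or any other parameter specified in the tag) to do a one-time replication from a master. This obviates the need for hard-coding the master in the slave.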
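
To make the URL shape described in that excerpt concrete, here is a minimal Python sketch that assembles such a request. The helper name `build_fetchindex_url` and the host names are hypothetical placeholders, not part of Solr; only the `command=fetchindex` and `masterUrl` parameters come from the quoted documentation. Note that the path segment is a core name.

```python
from urllib.parse import urlencode


def build_fetchindex_url(slave_base, core_name, master_url=None, **extra):
    """Build a one-off fetchindex URL for the replication handler of one core.

    slave_base is the Solr root of the slave, e.g. "http://slave_host:8983/solr"
    (placeholder host). master_url and any extra keyword arguments are appended
    as query parameters; urlencode percent-encodes their values.
    """
    params = {"command": "fetchindex"}
    if master_url is not None:
        params["masterUrl"] = master_url
    params.update(extra)
    return f"{slave_base.rstrip('/')}/{core_name}/replication?{urlencode(params)}"


if __name__ == "__main__":
    print(build_fetchindex_url(
        "http://slave_host:8983/solr",
        "core_name",
        master_url="http://master_host:8983/solr/core_name",
    ))
```

The resulting string can be pasted into any REST client or passed to `urllib.request.urlopen`.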

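The problem I'm running into is that a number of unexpected exceptions appear as soon as replication starts. For example, from the "slave" node: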

2020-12-15 00:17:17.442 INFO  (explicit-fetchindex-cmd) [   ] o.a.s.h.IndexFetcher Starting replication process
2020-12-15 00:17:17.445 INFO  (explicit-fetchindex-cmd) [   ] o.a.s.h.IndexFetcher Number of files in latest index in master: 17
2020-12-15 00:17:17.449 INFO  (explicit-fetchindex-cmd) [   ] o.a.s.u.DefaultSolrCoreState New IndexWriter is ready to be used.
2020-12-15 00:17:17.449 INFO  (explicit-fetchindex-cmd) [   ] o.a.s.h.IndexFetcher Starting download (fullCopy=false) to NRTCachingDirectory(MMapDirectory@C:\scratch\solr-7.7.3\example\cloud\node1\solr\techproducts_shard1_replica_n1\data\index.20201215001717446 lockFactory=org.apache.lucene.store.NativeFSLockFactory@5577fa1; maxCacheMB=48.0 maxMergeSizeMB=4.0)
2020-12-15 00:17:17.455 ERROR (explicit-fetchindex-cmd) [   ] o.a.s.h.IndexFetcher Error fetching file, doing one retry...:org.apache.solr.common.SolrException: Unable to download _0.si completely. Downloaded 551!=533
at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1700)
at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1580)
at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1550)
at org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:1030)
at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:569)
at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:346)
at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:425)
at org.apache.solr.handler.ReplicationHandler.lambda$fetchIndex$0(ReplicationHandler.java:346)
at java.lang.Thread.run(Thread.java:748)

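These exceptions cause the replication to abort. There are some questions on SO that reference errors like this (Solr ReplicationHandler - SnapPull failed to download files), but none of them seem relevant to this situation.

The problem is very easy to reproduce using only a basic Solr installation and no special data. I'm using Solr 7.7.3.

Steps to reproduce: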

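1. Unpack Solr on the "master" machine.
2. Run ./bin/solr -e cloud to deploy the example Solr cloud. Accept all defaults except the following:
- Name the collection "techproducts" instead of "gettingstarted"
- Choose the "sample_techproducts_configs" config set
3. Load the sample techproducts data into Solr: bin/post -c techproducts ./example/exampledocs/*
4. Repeat steps 1 & 2 on another machine or VM. Do not load the techproducts data; we want to use fetchIndex to replicate it.
5. Load Poster or the REST client of your choice and invoke the fetchIndex command on the second machine: GET http://<second machine>:8983/solr/techproducts/replication?command=fetchindex&masterUrl=http://<first machine>:8983/solr/techproducts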

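This should produce error output like that shown above in the logs of the "slave" machine. My task requires me to use Solr 7.7.3, but I have tried different JVMs as well as both Windows and Linux hosts. All combinations produce the same result.

I feel like I must be missing something, but I'm not sure what. Any advice or suggestions would be very helpful.

I'm also curious how to invoke this behavior properly from code via SolrJ, but that is probably best left for another question once this one is resolved.

EDIT: By reducing the number of shards/replicas in the example cloud to one, I have been able to replicate successfully using this procedure. I'm now looking into what is needed to perform these index replications on a per-shard basis, but I don't have an answer yet.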

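It turns out that early in this process I conflated collections and cores without noticing. In the REST URL provided,

GET http://<second machine>:8983/solr/techproducts/replication?command=fetchindex&masterUrl=http://<first machine>:8983/solr/techproducts

I was posting to the collection name rather than the core name. A proper example:

GET http://<second machine>:8983/solr/techproducts_shard1_replica_n1/replication?command=fetchindex&masterUrl=http://<first machine>:8983/solr/techproducts_shard1_replica_n1

Of course, to properly replicate an entire cloud instance, this REST request needs to be repeated for each core. It is odd that Solr does not produce an explicit error message when the replication endpoint is invoked with a collection rather than a core, but attempts the replication anyway. Naturally, when multiple shards are involved, this causes the target node to try to hit a "moving target": a query directed at the collection may land on any core, so the files will not match what was expected, resulting in the error messages above.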
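
Repeating the request per core can be scripted. The following is a rough Python sketch, not a tested replication tool: it lists cores through the CoreAdmin STATUS endpoint and issues one fetchindex call per core. The function names and host placeholders are hypothetical, and it assumes that each core on the slave node has an identically named core reachable under the given master base URL, which may not hold for every deployment. In the example cloud, where each machine runs more than one Solr node, it would need to be run once per node with the matching ports.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_names_from_status(status_json):
    """Extract core names from a CoreAdmin STATUS response (wt=json)."""
    return sorted(json.loads(status_json).get("status", {}))


def fetchindex_urls(slave_base, master_base, core_names):
    """Return one fetchindex URL per core, pairing identically named cores."""
    urls = []
    for core in core_names:
        query = urlencode({
            "command": "fetchindex",
            # Assumption: the master hosts a core with the same name.
            "masterUrl": f"{master_base.rstrip('/')}/{core}",
        })
        urls.append(f"{slave_base.rstrip('/')}/{core}/replication?{query}")
    return urls


def replicate_all(slave_base, master_base):
    """Network part: list the cores on the slave node, then trigger each fetch."""
    status_url = f"{slave_base.rstrip('/')}/admin/cores?action=STATUS&wt=json"
    with urlopen(status_url) as resp:
        cores = core_names_from_status(resp.read().decode("utf-8"))
    for url in fetchindex_urls(slave_base, master_base, cores):
        with urlopen(url) as resp:
            print(url, resp.status)
```

A call such as `replicate_all("http://<second machine>:8983/solr", "http://<first machine>:8983/solr")` would then cover every core on that one node; the fetch itself runs asynchronously on the slave, so the logs still need to be checked for errors.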
