Cannot distribute the Apache Ignite configuration file to Spark on Yarn executors

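We are trying to submit a Spark on Yarn job that imports data from HDFS into Apache Ignite. To do that, we need to give the Spark containers the path to the Ignite configuration file.

The examples on the Ignite website only define a path like "conf/cache.xml", and the Spark driver and executors then "magically" find the file. I don't understand how the Spark executors locate it.

We tried several approaches, but none of them worked: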

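Specifying the full path in code, e.g. "file:///disk1/conf/cache.xml"
Uploading the configuration file to HDFS and pointing to it, e.g. "hdfs:///hdfs_root/conf/cache.xml"
Specifying the full path in the spark.{driver,executor}.extraClassPath parameters in spark-defaults.conf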

Do we have to place the Ignite configuration file on every Yarn node in order to use Ignite with Spark on Yarn? Is there a better way?

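I'm not sure why Ignite can't read the configuration from "file:///disk1/conf/cache.xml" or "hdfs:///hdfs_root/conf/cache.xml". That may be a real issue and should be investigated.

In the meantime, you can try building the configuration dynamically in code, for example: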

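    import java.nio.file.FileSystems;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder;

    public static IgniteConfiguration getClientConfiguration(String igniteInstanceName) {
        IgniteConfiguration cfg = new IgniteConfiguration();
        if (igniteInstanceName != null) {
            cfg.setIgniteInstanceName(igniteInstanceName);
            cfg.setConsistentId(igniteInstanceName);
        }

        // Use the container's current directory as the Ignite work directory.
        cfg.setWorkDirectory(FileSystems.getDefault().getPath(".").toAbsolutePath().toString());
        // Join the cluster as a client node.
        cfg.setClientMode(true);

        // Static discovery: replace the address with your own Ignite server nodes.
        TcpDiscoveryVmIpFinder ipFinder = new TcpDiscoveryVmIpFinder();
        List<String> addrs = Arrays.asList("10.0.75.1:47500..47509");
        ipFinder.setAddresses(addrs);
        TcpDiscoverySpi discoSpi = new TcpDiscoverySpi();
        discoSpi.setIpFinder(ipFinder);
        cfg.setDiscoverySpi(discoSpi);
        return cfg;
    }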

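Alternatively, you can package the XML configuration inside your jar archive.

If you are going to use Ignite RDD, the following examples should work for the two cases above:

1) Dynamic configuration: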

    JavaIgniteContext<Long, Record> igniteContext = new JavaIgniteContext<>(
        sparkCtx, (IgniteOutClosure<IgniteConfiguration>)() -> {
            try {
                return IngniteConfigurationProvider.getClientConfiguration("ClientNode");
            }
            catch (Exception e) {
                return null;
            }
        });

2) Using the configuration from the jar:

    JavaIgniteContext<Long, Record> igniteContext = new JavaIgniteContext<>(
        sparkCtx, "client_config.xml");
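
If the jar-based variant still fails, it may help to check whether the XML file is actually visible on the classpath of the JVM that needs it. The sketch below uses only the standard Java class loader API and does not touch Ignite. It assumes that a relative name such as "client_config.xml" ends up being looked up as a classpath resource, which I have not verified against the Ignite sources. The class name ClasspathConfigCheck and its methods are hypothetical helpers written for this illustration.

```java
import java.net.URL;

/** Sketch: check whether a config resource is visible on this JVM's classpath. */
final class ClasspathConfigCheck {

    /** Returns the URL of the named classpath resource, or null if it is not visible. */
    static URL locate(String resourceName) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            // Fall back to the system class loader if no context loader is set.
            cl = ClassLoader.getSystemClassLoader();
        }
        return cl.getResource(resourceName);
    }

    /** Returns true if the named resource can be found on the classpath. */
    static boolean isOnClasspath(String resourceName) {
        return locate(resourceName) != null;
    }

    public static void main(String[] args) {
        // "client_config.xml" is the resource name used in the example above.
        String name = args.length > 0 ? args[0] : "client_config.xml";
        URL url = locate(name);
        if (url != null) {
            System.out.println("Found " + name + " at " + url);
        } else {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Calling ClasspathConfigCheck.locate("client_config.xml") from inside a Spark task and logging the result would show what the executor JVM sees, which can differ from what the driver sees.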
