Ignite SharedRDD code works in a single-server setup, but fails with an exception when another server node is added



I have 2 server nodes co-located with Spark workers. I am using a shared Ignite RDD to save my DataFrame. My code works fine when I start only one server node; when I start both server nodes, it fails with:

Grid is in invalid state to perform this operation. It either not started yet or has already being stopped [gridName=null, state=STOPPING]

The DiscoverySpi is configured as follows:

<property name="discoverySpi">
        <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
            <property name="ipFinder">
                <!--
                    Ignite provides several options for automatic discovery that can be used
                    instead os static IP based discovery. For information on all options refer
                    to our documentation: http://apacheignite.readme.io/docs/cluster-config
                -->
                <!-- Uncomment static IP finder to enable static-based discovery of initial nodes. -->
                <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
                <!--<bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder">-->
           <property name="shared" value="true"/>
                    <property name="addresses">
                        <list>
                            <!-- In distributed environment, replace with actual host IP address. -->
                            <value>v-in-spark-01:47500..47509</value>
            <value>v-in-spark-02:47500..47509</value>
                        </list>
                    </property>
                </bean>
            </property>
        </bean>
</property>
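
For reference, the programmatic equivalent of this XML would look roughly like the sketch below. It mirrors the Spring config above using the standard TcpDiscoverySpi/TcpDiscoveryVmIpFinder API; the surrounding method is illustrative, not my actual code:

    import java.util.Arrays

    import org.apache.ignite.configuration.IgniteConfiguration
    import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi
    import org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder

    // Builds the same discovery setup as the Spring XML above: a shared static
    // IP finder listing both server hosts on the default discovery port range.
    def igniteConfiguration(): IgniteConfiguration = {
      val ipFinder = new TcpDiscoveryVmIpFinder()
      ipFinder.setShared(true)
      ipFinder.setAddresses(Arrays.asList(
        "v-in-spark-01:47500..47509",
        "v-in-spark-02:47500..47509"))

      val discoverySpi = new TcpDiscoverySpi()
      discoverySpi.setIpFinder(ipFinder)

      val cfg = new IgniteConfiguration()
      cfg.setDiscoverySpi(discoverySpi)
      cfg
    }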

I understand that this exception generally means the Ignite instance either was not started yet or was already stopped when an operation was attempted on it. But I don't think that is the cause here, since the same code works with a single server node, and I never explicitly close the Ignite instance anywhere in my program.

In my code flow I do perform operations within a transaction, so it is like the following (a sketch follows the list):

  1. Create cache1: works fine
  2. Create cache2: works fine
  3. Put values into cache1: works fine
  4. igniteRDD.saveValues on cache2: this step fails with the exception above
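
Stripped down, the flow looks roughly like this (assuming the Ignite 1.x Spark integration; the config path, cache names, and sample data are placeholders, not my actual code):

    import org.apache.ignite.spark.IgniteContext
    import org.apache.spark.{SparkConf, SparkContext}

    object SharedRddFlow {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("shared-rdd-flow"))

        // Uses the same Spring XML the server nodes were started with.
        val ic = new IgniteContext[String, String](sc, "ignite-config.xml")

        val cache1 = ic.fromCache("cache1")   // step 1: works
        val cache2 = ic.fromCache("cache2")   // step 2: works

        // Step 3: put values into cache1 -- works.
        cache1.savePairs(sc.parallelize(1 to 100).map(i => (s"key$i", s"val$i")))

        // Step 4: saveValues on cache2 -- fails with the IllegalStateException below.
        cache2.saveValues(sc.parallelize(1 to 100).map(i => s"val$i"))
      }
    }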

The complete error trace is available at this link; the "Caused by" portion is also pasted below:

Caused by: java.lang.IllegalStateException: Grid is in invalid state to perform this operation. It either not started yet or has already being stopped [gridName=null, state=STOPPING]
      at org.apache.ignite.internal.GridKernalGatewayImpl.illegalState(GridKernalGatewayImpl.java:190)
      at org.apache.ignite.internal.GridKernalGatewayImpl.readLock(GridKernalGatewayImpl.java:90)
      at org.apache.ignite.internal.IgniteKernal.guard(IgniteKernal.java:3151)
      at org.apache.ignite.internal.IgniteKernal.getOrCreateCache(IgniteKernal.java:2739)
      at org.apache.ignite.spark.impl.IgniteAbstractRDD.ensureCache(IgniteAbstractRDD.scala:39)
      at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:164)
      at org.apache.ignite.spark.IgniteRDD$$anonfun$saveValues$1.apply(IgniteRDD.scala:161)
      at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883)
      at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:883)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
      at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
      at org.apache.spark.scheduler.Task.run(Task.scala:85)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
      ... 3 more

It looks like the node embedded in the executor process gets stopped for some reason while you are still trying to run the job. To my knowledge, the only way this can happen is when the executor process itself is stopped. Could that be the case? Is there anything in the logs?
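
As a side note, whether a node lives inside the executor process at all is controlled by the standalone flag of IgniteContext (in the 1.x API). A sketch, assuming that API:

    // standalone = true (the default) makes the Spark processes join as client
    // nodes of the externally started server cluster; standalone = false embeds
    // server nodes in the executor JVMs, where an executor exit stops the node
    // mid-job.
    val ic = new IgniteContext[String, String](sc, "ignite-config.xml", standalone = true)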

Latest update