Memory error in standalone Spark cluster: "shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Remote]"

After 140 iterations of my code, I get the following memory error in my standalone Spark cluster. How can I run my code without it failing with an out-of-memory error?

I have 7 nodes, each with 8 GB of RAM, 6 GB of which is allocated to all the workers. The master also has 8 GB of RAM.

[error] application - Remote calculator (Actor[akka.tcp://Remote@127.0.0.1:44545/remote/akka.tcp/NotebookServer@127.0.0.1:50778/user/$c/$a#872469007]) has been terminated !!!!!
[info] application - View notebook 'kamaruddin/PSOAANN_BreastCancer_optimized.snb', presentation: 'None'
[info] application - Closing websockets for kernel 6c8e8090-cbeb-430e-9d45-5710ce60b984
Uncaught error from thread [Remote-akka.actor.default-dispatcher-6] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Remote]
Exception in thread "Thread-36" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.jar.Attributes.read(Attributes.java:394)
    at java.util.jar.Manifest.read(Manifest.java:199)
    at java.util.jar.Manifest.<init>(Manifest.java:69)
    at java.util.jar.JarFile.getManifestFromReference(JarFile.java:186)
    at java.util.jar.JarFile.getManifest(JarFile.java:167)
    at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:779)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:416)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.bindError(SparkIMain.scala:1041)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1347)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at notebook.kernel.Repl$$anonfun$3.apply(Repl.scala:173)
    at notebook.kernel.Repl$$anonfun$3.apply(Repl.scala:173)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
    at scala.Console$.withOut(Console.scala:126)
    at notebook.kernel.Repl.evaluate(Repl.scala:172)
    at notebook.client.ReplCalculator$$anonfun$10$$anon$1$$anonfun$24.apply(ReplCalculator.scala:364)
    at notebook.client.ReplCalculator$$anonfun$10$$anon$1$$anonfun$24.apply(ReplCalculator.scala:361)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
Uncaught error from thread [Remote-akka.remote.default-remote-dispatcher-445] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Remote]
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuffer.append(StringBuffer.java:322)
    at java.io.StringWriter.write(StringWriter.java:94)
    at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator._flushBuffer(WriterBasedJsonGenerator.java:1879)
    at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator._writeString(WriterBasedJsonGenerator.java:916)
    at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator._writeFieldName(WriterBasedJsonGenerator.java:213)
    at com.fasterxml.jackson.core.json.WriterBasedJsonGenerator.writeFieldName(WriterBasedJsonGenerator.java:104)
    at play.api.libs.json.JsValueSerializer$$anonfun$serialize$2.apply(JsValue.scala:319)
    at play.api.libs.json.JsValueSerializer$$anonfun$serialize$2.apply(JsValue.scala:318)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at play.api.libs.json.JsValueSerializer.serialize(JsValue.scala:318)
    at play.api.libs.json.JsValueSerializer$$anonfun$serialize$1.apply(JsValue.scala:312)
    at play.api.libs.json.JsValueSerializer$$anonfun$serialize$1.apply(JsValue.scala:311)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at play.api.libs.json.JsValueSerializer.serialize(JsValue.scala:311)
    at play.api.libs.json.JsValueSerializer$$anonfun$serialize$2.apply(JsValue.scala:320)
    at play.api.libs.json.JsValueSerializer$$anonfun$serialize$2.apply(JsValue.scala:318)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at play.api.libs.json.JsValueSerializer.serialize(JsValue.scala:318)
    at play.api.libs.json.JsValueSerializer.serialize(JsValue.scala:302)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:128)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValue(ObjectMapper.java:1902)
    at play.api.libs.json.JacksonJson$.generateFromJsValue(JsValue.scala:494)
    at play.api.libs.json.Json$.stringify(Json.scala:51)
    at play.api.libs.json.JsValue$class.toString(JsValue.scala:80)
    at play.api.libs.json.JsObject.toString(JsValue.scala:166)
    at java.util.Formatter$FormatSpecifier.printString(Formatter.java:2838)
    at java.util.Formatter$FormatSpecifier.print(Formatter.java:2718)
Uncaught error from thread [Remote-akka.remote.default-remote-dispatcher-446] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Remote]
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "appclient-receive-and-reply-threadpool-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "appclient-receive-and-reply-threadpool-2" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "appclient-receive-and-reply-threadpool-4" java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "appclient-receive-and-reply-threadpool-6" java.lang.OutOfMemoryError: GC overhead limit exceeded
[error] application - Process exited with an error: 255 (Exit value: 255)
org.apache.commons.exec.ExecuteException: Process exited with an error: 255 (Exit value: 255)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.access$200(DefaultExecutor.java:48)
    at org.apache.commons.exec.DefaultExecutor$1.run(DefaultExecutor.java:200)
    at java.lang.Thread.run(Thread.java:745)

Maybe you can try using checkpointing.

Data checkpointing - saving the generated RDDs to reliable storage. This is necessary in some stateful transformations that combine data across multiple batches. In such transformations, the generated RDDs depend on the RDDs of previous batches, which causes the length of the dependency chain to keep growing over time. To avoid such unbounded increases in recovery time (proportional to the dependency chain), the intermediate RDDs of stateful transformations are periodically checkpointed to reliable storage (e.g., HDFS) to cut off the dependency chain.
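
A minimal sketch of what that looks like for an iterative job (the checkpoint directory, the interval, and the toy computation are illustrative assumptions, not taken from your code):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical setup; in a notebook the SparkContext `sc` already exists.
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-demo"))

    // Checkpoints must go to reliable storage such as HDFS (path is an assumption).
    sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

    var data = sc.parallelize(1 to 1000000).map(_.toDouble)

    for (i <- 1 to 140) {
      data = data.map(v => v * 0.99 + 0.01) // stand-in for one iteration of your algorithm
      if (i % 10 == 0) {
        data.checkpoint() // truncate the lineage so it does not grow without bound
        data.count()      // force an action so the checkpoint is actually written
      }
    }

Without the periodic checkpoint, every iteration adds another layer to the RDD lineage, and after ~140 iterations the growing DAG metadata alone can keep the JVM busy collecting garbage, which would be consistent with the "GC overhead limit exceeded" errors above.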
