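My Spark cluster has 6 nodes in total. Five nodes have 4 cores and 32 GB of RAM each, and one node (node 4) has 8 cores and 32 GB of RAM.
So in total I have 6 nodes, 28 cores, and 192 GB of RAM. (I want to use half of the memory but all of the cores.)
I plan to run 5 Spark applications on the cluster.
My spark-defaults.conf is as follows: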
spark.master spark://***:7077
spark.eventLog.enabled false
spark.driver.memory 2g
worker_max_heapsize 2g
spark.kryoserializer.buffer.max.mb 128
spark.shuffle.file.buffer.kb 1024
spark.cores.max 4
spark.dynamicAllocation.enabled true
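I want to use at most 16 GB on each node and run 4 worker instances on each machine by setting the configuration below. So I expect 24 workers on my cluster (4 instances * 6 nodes = 24). Together they should use up to 28 cores (all of them) and 96 GB of memory.
My spark-env.sh is as follows: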
export SPARK_WORKER_MEMORY=16g
export SPARK_WORKER_INSTANCES=4
SPARK_LOCAL_DIRS=/app/spark/spark-1.6.1-bin-hadoop2.6/local
SPARK_WORKER_DIR=/app/spark/spark-1.6.1-bin-hadoop2.6/work
But after my cluster started, the Spark UI shows the following running workers:
Worker Id / Address State Cores Memory
worker-node4-address ALIVE 8 (1 Used) 16.0 GB (0.0 GB Used)
worker-node4-address ALIVE 8 (1 Used) 16.0 GB (0.0 GB Used)
worker-node4-address ALIVE 8 (1 Used) 16.0 GB (0.0 GB Used)
worker-node4-address ALIVE 8 (0 Used) 16.0 GB (0.0 B Used)
worker-node4-address ALIVE 8 (1 Used) 16.0 GB (0.0 GB Used)
worker-node1-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node1-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node1-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node1-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node2-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node2-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node2-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node2-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node3-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node3-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node3-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node3-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node5-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node5-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node5-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node5-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node6-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node6-address ALIVE 4 (3 Used) 16.0 GB (0.0 GB Used)
worker-node6-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
worker-node6-address ALIVE 4 (0 Used) 16.0 GB (0.0 B Used)
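But the master UI shows (when no application is running):
Alive Workers: 25
Cores in use: 120 Total, 0 Used
Memory in use: 400.0 GB Total, 0 GB Used
Status: ALIVE
Why are there 25 workers when I expected 24 (4 per node)? The extra one is on node4, which has 8 cores.
Why does it show Memory in use: 400.0 GB Total when I allotted a maximum of 16 GB per node?
Why does the UI show 120 cores when my cluster has only 28?
Can you tell me what Spark configuration my system should have?
How many cores and how much executor memory should I specify when I submit a Spark job?
What does the spark.cores.max parameter mean? Is it per node or for the whole cluster?
I ran 3 applications with the configuration --executor-memory 2G --total-executor-cores 4. At least one of my applications failed with the following error: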
Exception in thread "main" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at scala.concurrent.forkjoin.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1672)
at scala.concurrent.forkjoin.ForkJoinPool.signalWork(ForkJoinPool.java:1966)
at scala.concurrent.forkjoin.ForkJoinPool.fullExternalPush(ForkJoinPool.java:1905)
at scala.concurrent.forkjoin.ForkJoinPool.externalPush(ForkJoinPool.java:1834)
at scala.concurrent.forkjoin.ForkJoinPool.execute(ForkJoinPool.java:2955)
at scala.concurrent.impl.ExecutionContextImpl.execute(ExecutionContextImpl.scala:120)
at scala.concurrent.impl.Future$.apply(Future.scala:31)
at scala.concurrent.Future$.apply(Future.scala:485)
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:232)
at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$postJson(RestSubmissionClient.scala:222)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:87)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$createSubmission$3.apply(RestSubmissionClient.scala:83)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
at org.apache.spark.deploy.rest.RestSubmissionClient.createSubmission(RestSubmissionClient.scala:83)
at org.apache.spark.deploy.rest.RestSubmissionClient$.run(RestSubmissionClient.scala:411)
at org.apache.spark.deploy.rest.RestSubmissionClient$.main(RestSubmissionClient.scala:424)
at org.apache.spark.deploy.rest.RestSubmissionClient.main(RestSubmissionClient.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
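As far as I know, you should start only one Worker per node:
http://spark.apache.org/docs/latest/hardware-provisioning.html
Running more than one worker per node is recommended only when a node has more than 200 GB of RAM, and your nodes do not have 200 GB each. Can you set the following in spark-env.sh on the nodes that have only 4 cores?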
export SPARK_EXECUTOR_CORES=4
export SPARK_EXECUTOR_MEMORY=16GB
export SPARK_MASTER_HOST=<Your Master-Ip here>
On the node that has 8 cores:
export SPARK_EXECUTOR_CORES=8
export SPARK_EXECUTOR_MEMORY=16GB
export SPARK_MASTER_HOST=<Your Master-Ip here>
On the master node, in spark-defaults.conf:
spark.driver.memory 2g
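I think you should try this and comment out the other settings for testing. Is this what you want? Your cluster should now use 96 GB and 28 cores in total. You can start your applications without --executor-memory 2G --total-executor-cores 4. However, a java.lang.OutOfMemoryError can occur even without a misconfiguration. It can also happen when you collect too much data to the driver.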
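As a rough check of the 96 GB and 28 core totals, here is a minimal Python sketch of the arithmetic. It assumes one worker per node, each offering all of the node's cores and 16 GB. The even split across the 5 planned applications is only an illustrative assumption, not something Spark does on its own. Note that spark.cores.max caps the cores one application may take across the whole cluster, not per node.

```python
# Node inventory from the question: (cores, memory in GB offered to Spark).
# Assumption: one worker per node, each offering 16 GB (half of the 32 GB).
nodes = [(4, 16)] * 5 + [(8, 16)]

total_cores = sum(cores for cores, _ in nodes)
total_memory_gb = sum(mem for _, mem in nodes)

# Hypothetical even split across the 5 planned applications.
apps = 5
cores_per_app, spare_cores = divmod(total_cores, apps)

print(f"cluster: {total_cores} cores, {total_memory_gb} GB")
print(f"per app: {cores_per_app} cores, {spare_cores} cores left over")
```

Under that assumption, a value of about 5 for spark.cores.max (or --total-executor-cores) per application would let all 5 applications run at the same time.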
Yes, in the current configuration each Worker has 16 GB of RAM, so 25 workers * 16 GB = 400 GB.
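The same reasoning accounts for the 120 cores. Here is a short Python sketch that uses the worker list from the UI table in the question: 5 worker entries on node4 reporting 8 cores each, and 4 workers on each of the other five nodes reporting 4 cores each.

```python
# Workers as listed in the Spark UI: (number of workers, cores each reports).
workers = [
    (5, 8),   # node4: 5 worker entries, each advertising all 8 cores
    (20, 4),  # nodes 1, 2, 3, 5, 6: 4 workers each, 4 cores per worker
]
memory_per_worker_gb = 16  # SPARK_WORKER_MEMORY=16g applies per worker

worker_count = sum(n for n, _ in workers)
ui_cores = sum(n * cores for n, cores in workers)
ui_memory_gb = worker_count * memory_per_worker_gb

print(worker_count, ui_cores, ui_memory_gb)  # 25 120 400
```

Because SPARK_WORKER_CORES is not set, every worker instance advertises all of its node's cores, and SPARK_WORKER_MEMORY is applied per worker rather than per node, so both totals are multiplied by the number of instances on each node.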