spark.shuffle.service.enabled=true — cluster.YarnScheduler: Initial job has not accepted any resources



I am trying to run a pyspark job on YARN with the spark.shuffle.service.enabled=true option, but the job never completes.

Without this option, the job works fine:

user@e7524bf7f996:~$ pyspark --master yarn                                                               
Using Python version 3.9.7 (default, Sep 16 2021 13:09:58)
Spark context Web UI available at http://e7524bf7f996:4040
Spark context available as 'sc' (master = yarn, app id = application_1644937120225_0004).
SparkSession available as 'spark'.
>>> sc.parallelize(range(10)).sum()
45       

With the option --conf spark.shuffle.service.enabled=true:

user@e7524bf7f996:~$ pyspark --master yarn --conf spark.shuffle.service.enabled=true
Using Python version 3.9.7 (default, Sep 16 2021 13:09:58)
Spark context Web UI available at http://e7524bf7f996:4040
Spark context available as 'sc' (master = yarn, app id = application_1644937120225_0005).
SparkSession available as 'spark'.
>>> sc.parallelize(range(10)).sum()
2022-02-15 15:10:14,591 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-15 15:10:29,590 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-15 15:10:44,591 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

Is there another option in Spark or YARN needed to make spark.shuffle.service.enabled work?

I am running Spark 3.1.2, Python 3.9.7, and Hadoop 3.2.

Thank you,

Bertrand

You need to configure the external shuffle service on your YARN cluster as follows:

  1. Build Spark with the YARN profile. Skip this step if you are using a pre-packaged distribution.
  2. Locate the spark-<version>-yarn-shuffle.jar. It should be under $SPARK_HOME/common/network-yarn/target/scala-<version> if you are building Spark yourself, and under yarn if you are using a distribution.
  3. Add this jar to the classpath of all NodeManagers in your cluster.
  4. In the yarn-site.xml on each node, add spark_shuffle to yarn.nodemanager.aux-services, then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService (a sketch of these properties follows this list).
  5. Increase the NodeManager heap size by setting YARN_HEAPSIZE (1000 by default) in etc/hadoop/yarn-env.sh, to avoid garbage collection issues during shuffle.
  6. Restart all NodeManagers in your cluster.
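
For step 4, the two yarn-site.xml properties look like the sketch below. This is a minimal example and assumes your existing aux-services list contains only the default mapreduce_shuffle; merge spark_shuffle into whatever value is already there.

<!-- yarn-site.xml: register Spark's external shuffle service as a YARN aux service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <!-- assumption: mapreduce_shuffle was the only pre-existing aux service -->
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>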

For more details, see https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service

If it still does not work, check the following:

  1. Check the YARN UI to make sure enough resources are available.
  2. Try --deploy-mode cluster to make sure the driver can communicate with the YARN cluster for scheduling (see the example after this list).
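
To test item 2, you can submit a trivial script in cluster mode; the pyspark shell itself only supports client deploy mode, so spark-submit is used here. A minimal sketch, where test_job.py is a hypothetical placeholder for any small pyspark script:

# run the driver inside the YARN cluster instead of locally
spark-submit --master yarn --deploy-mode cluster \
    --conf spark.shuffle.service.enabled=true \
    test_job.py  # hypothetical script, e.g. one running sc.parallelize(range(10)).sum()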

Thanks Warren for your help.

Here is the configuration that worked for me:

https://github.com/BertrandBrelier/SparkYarn/blob/main/yarn-site.xml

echo "export YARN_HEAPSIZE=2000" >> /home/user/hadoop-3.2.1/etc/hadoop/yarn-env.sh
ln -s /home/user/spark-3.1.2-bin-hadoop3.2/yarn/spark-3.1.2-yarn-shuffle.jar /home/user/hadoop-3.2.1/share/hadoop/yarn/lib/.
echo "spark.shuffle.service.enabled    true" >> /home/user/spark-3.1.2-bin-hadoop3.2/conf/spark-defaults.conf

Then restart Hadoop and Spark.

I was able to start a pyspark session:

pyspark --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true
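
The same two settings can also be applied programmatically when creating the session. A minimal sketch using the standard SparkSession builder API, equivalent to the command-line flags above:

from pyspark.sql import SparkSession

# equivalent to pyspark --master yarn --conf spark.shuffle.service.enabled=true \
#   --conf spark.dynamicAllocation.enabled=true
spark = SparkSession.builder \
    .master("yarn") \
    .config("spark.shuffle.service.enabled", "true") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .getOrCreate()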
