I am trying to run a pyspark job on yarn with the spark.shuffle.service.enabled=true option, but the job never completes:
Without this option, the job works fine:
user@e7524bf7f996:~$ pyspark --master yarn
Using Python version 3.9.7 (default, Sep 16 2021 13:09:58)
Spark context Web UI available at http://e7524bf7f996:4040
Spark context available as 'sc' (master = yarn, app id = application_1644937120225_0004).
SparkSession available as 'spark'.
>>> sc.parallelize(range(10)).sum()
45
With the option --conf spark.shuffle.service.enabled=true:
user@e7524bf7f996:~$ pyspark --master yarn --conf spark.shuffle.service.enabled=true
Using Python version 3.9.7 (default, Sep 16 2021 13:09:58)
Spark context Web UI available at http://e7524bf7f996:4040
Spark context available as 'sc' (master = yarn, app id = application_1644937120225_0005).
SparkSession available as 'spark'.
>>> sc.parallelize(range(10)).sum()
2022-02-15 15:10:14,591 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-15 15:10:29,590 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2022-02-15 15:10:44,591 WARN cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
Are there other options in Spark or Yarn needed to make spark.shuffle.service.enabled work?
I am running Spark 3.1.2, Python 3.9.7, hadoop-3.2
Thank you,
Bertrand
You need to configure the external shuffle service on your Yarn cluster as follows:
- Build Spark with the YARN profile. Skip this step if you are using a pre-packaged distribution.
- Locate the spark-<version>-yarn-shuffle.jar. This should be under $SPARK_HOME/common/network-yarn/target/scala-<version> if you are building Spark yourself, or under yarn if you are using a distribution.
- Add this jar to the classpath of all NodeManagers in your cluster.
- In the yarn-site.xml on each node, add spark_shuffle to yarn.nodemanager.aux-services, then set yarn.nodemanager.aux-services.spark_shuffle.class to org.apache.spark.network.yarn.YarnShuffleService.
- Increase the NodeManager's heap size by setting YARN_HEAPSIZE (1000 by default) in etc/hadoop/yarn-env.sh to avoid garbage collection issues during shuffle.
- Restart all NodeManagers in your cluster.
For details, see https://spark.apache.org/docs/latest/running-on-yarn.html#configuring-the-external-shuffle-service
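The yarn-site.xml step above can be sketched as the following fragment. This assumes no other auxiliary services are configured; if your cluster already lists services such as mapreduce_shuffle, append spark_shuffle to the existing comma-separated value instead of replacing it:

```xml
<configuration>
  <!-- Register Spark's external shuffle service as a NodeManager auxiliary service -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>spark_shuffle</value>
  </property>
  <!-- Tell the NodeManager which class implements the spark_shuffle service -->
  <property>
    <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
    <value>org.apache.spark.network.yarn.YarnShuffleService</value>
  </property>
</configuration>
```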
If it still does not work, check the following:
- Check the Yarn UI to make sure sufficient resources are available
- Try --deploy-mode cluster to make sure the driver can communicate with the Yarn cluster for scheduling
Thanks Warren for your help.
Here is the setup that worked for me:
https://github.com/BertrandBrelier/SparkYarn/blob/main/yarn-site.xml
echo "export YARN_HEAPSIZE=2000" >> /home/user/hadoop-3.2.1/etc/hadoop/yarn-env.sh
ln -s /home/user/spark-3.1.2-bin-hadoop3.2/yarn/spark-3.1.2-yarn-shuffle.jar /home/user/hadoop-3.2.1/share/hadoop/yarn/lib/.
echo "spark.shuffle.service.enabled true" >> /home/user/spark-3.1.2-bin-hadoop3.2/conf/spark-defaults.conf
restarted Hadoop and Spark
I was able to start a pyspark session:
pyspark --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true
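For reference, the three setup commands above can be exercised end to end against scratch copies of the config files. This is only a sketch: the mktemp directories stand in for the real hadoop-3.2.1 and spark-3.1.2-bin-hadoop3.2 install paths, and the empty jar file is a placeholder for the actual spark-3.1.2-yarn-shuffle.jar:

```shell
#!/bin/sh
set -e

# Scratch directories standing in for the real Hadoop and Spark install paths
HADOOP_CONF=$(mktemp -d)
SPARK_HOME_DIR=$(mktemp -d)
mkdir -p "$HADOOP_CONF/lib"
touch "$SPARK_HOME_DIR/spark-3.1.2-yarn-shuffle.jar"   # placeholder jar

# 1. Raise the NodeManager heap to avoid GC pressure during shuffle
echo "export YARN_HEAPSIZE=2000" >> "$HADOOP_CONF/yarn-env.sh"

# 2. Put the shuffle jar on the NodeManager classpath via a symlink
ln -s "$SPARK_HOME_DIR/spark-3.1.2-yarn-shuffle.jar" "$HADOOP_CONF/lib/"

# 3. Enable the shuffle service by default for Spark jobs
echo "spark.shuffle.service.enabled true" >> "$SPARK_HOME_DIR/spark-defaults.conf"

# Verify all three changes landed
grep -q "YARN_HEAPSIZE=2000" "$HADOOP_CONF/yarn-env.sh"
test -L "$HADOOP_CONF/lib/spark-3.1.2-yarn-shuffle.jar"
grep -q "spark.shuffle.service.enabled true" "$SPARK_HOME_DIR/spark-defaults.conf"
```

After making the real changes, all NodeManagers (and the Spark services) must be restarted for the new classpath and heap size to take effect.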