Spark tasks do not start executing

I am running a job in spark-shell with the following options:

--num-executors 15 
--driver-memory 15G 
--executor-memory 7G 
--executor-cores 8 
--conf spark.yarn.executor.memoryOverhead=2G 
--conf spark.sql.shuffle.partitions=500 
--conf spark.sql.autoBroadcastJoinThreshold=-1 
--conf spark.executor.memoryOverhead=800

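The job is stuck and does not start. The code performs a cross join with a filter condition on a large dataset of 270m rows. I have increased the number of partitions of the large table (270m rows) and the small table (100000 rows) to 16000, and I have converted the small table to a broadcast variable.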
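For a sense of scale, the row counts above can be turned into a rough count of candidate pairs. This is only a back-of-the-envelope sketch: it assumes "270m" means 270 million rows, that the filter has to be evaluated on every pair the cross join produces, and that the work is spread evenly over the 16000 partitions. The function names are illustrative and not part of the job.

```python
# Back-of-the-envelope size of the cross join described above.
# Assumptions: "270m" = 270 million rows; pairs are spread evenly over partitions.

def cross_join_pairs(large_rows: int, small_rows: int) -> int:
    """Number of candidate row pairs a cross join produces before filtering."""
    return large_rows * small_rows


def pairs_per_partition(total_pairs: int, partitions: int) -> float:
    """Average number of candidate pairs each partition (task) has to evaluate."""
    return total_pairs / partitions


large_rows = 270_000_000  # large table
small_rows = 100_000      # small (broadcast) table
partitions = 16_000       # partition count from the question

total = cross_join_pairs(large_rows, small_rows)
per_task = pairs_per_partition(total, partitions)

print(f"candidate pairs: {total:.3e}")
print(f"pairs per partition: {per_task:.3e}")
```

Under these assumptions each task still has to evaluate well over a billion pairs, so long-running tasks may be expected even when nothing is misconfigured.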

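I have attached screenshots of the Spark UI for this job.

Should I reduce the number of partitions or increase the number of executors? Any ideas?

Thanks for your help.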

![spark ui 1][1] ![spark ui 2][2] ![spark ui 3][3]

After 10 hours:

Status: tasks: 7341/16936 (16624 failed)

Checking the container error logs:

RM Home
NodeManager
Tools
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.

![50% completed ui 1][4] ![50% completed ui 2][5]

[1]: https://i.stack.imgur.com/nqcys.png
[2]: https://i.stack.imgur.com/S2vwL.png
[3]: https://i.stack.imgur.com/81FUn.png
[4]: https://i.stack.imgur.com/h5MTa.png
[5]: https://i.stack.imgur.com/yDfKF.png

It would be very helpful if you could mention your cluster configuration.

But since you are broadcasting the small table of 100000 rows, you may need to adjust the memory configuration.

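Based on your configuration, I assume you have a total of 15 * 7 = 105 GB of executor memory.

You can try `--num-executors 7 --executor-memory 15G`

This gives each executor more memory to hold the broadcast variable. Adjust `--executor-cores` accordingly so that the cluster's cores are still used properly.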
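The trade-off between the two layouts can be checked with simple arithmetic. The sketch below assumes that all cores of an executor run tasks concurrently and share its heap evenly, and it ignores memoryOverhead and Spark's internal memory fractions. The `Layout` class and its field names are made up for illustration, and the suggested layout keeps 8 cores only as a starting point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """A hypothetical executor layout: executor count, heap per executor (GB), cores per executor."""
    num_executors: int
    executor_memory_gb: int
    executor_cores: int

    def total_memory_gb(self) -> int:
        # Sum of executor heaps across the whole job.
        return self.num_executors * self.executor_memory_gb

    def memory_per_core_gb(self) -> float:
        # Rough heap share available to each concurrently running task.
        return self.executor_memory_gb / self.executor_cores

    def total_cores(self) -> int:
        # Maximum number of tasks that can run at the same time.
        return self.num_executors * self.executor_cores


current = Layout(num_executors=15, executor_memory_gb=7, executor_cores=8)
suggested = Layout(num_executors=7, executor_memory_gb=15, executor_cores=8)

for name, layout in (("current", current), ("suggested", suggested)):
    print(
        f"{name}: total={layout.total_memory_gb()} GB, "
        f"per core={layout.memory_per_core_gb():.3f} GB, "
        f"parallel tasks={layout.total_cores()}"
    )
```

Both layouts use the same 105 GB in total, but the suggested one roughly doubles the heap available per task while cutting the number of parallel task slots from 120 to 56. Because each executor keeps its own copy of a broadcast variable, fewer and larger executors also mean fewer copies overall.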
