For a simple example of this, we currently have three jobs running. Over time I have seen a number of "ghost ports" on cluster 1, where a port such as 4040 that had been in use for a long stretch is now permanently occupied by a ghost process.
The boxes in my cluster sit inside their own VLAN with all ports open between them.
Calling
spark-shell
outputs
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4047. Attempting port 4048.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4048. Attempting port 4049.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4049. Attempting port 4050.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4050. Attempting port 4051.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4051. Attempting port 4052.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4052. Attempting port 4053.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4053. Attempting port 4054.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4054. Attempting port 4055.
20/03/06 12:54:21 WARN util.Utils: Service 'SparkUI' could not bind on port 4055. Attempting port 4056.
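Each of those warnings means the SparkUI service found the port taken and moved on to the next one; by default Spark retries up to spark.port.maxRetries (16) ports before giving up, which is why the attempts stop around 4056. To see what is actually holding a given port, something like the following should work on these boxes (assuming lsof or ss is installed):

lsof -i :4040            # show the process bound to port 4040
ss -ltnp | grep ':4040'  # alternative if lsof is not available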
I have tried:
- Rebooting the boxes
- Finding the jobs by grepping for the Spark PIDs
- Using the "yarn application" CLI commands (sketched below)
- Killing the Cloudera server and agents on the Spark master/edge node
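For reference, the yarn CLI route I mean is roughly the following; the application ID here is a made-up placeholder:

yarn application -list -appStates RUNNING        # find the application ID
yarn application -kill application_1583000000000_0001

In my case none of this released the ports.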
Is there anything I can do to get these ports back?
Found a fix for the issue. It seems that in our environment the Windows VDIs get rotated and are never "fully" shut down over the weekend. Because of that, the Java side keeps its session open: YARN decides the application has finished, but the Java process itself never gets shut down.
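A quick way to confirm this mismatch is to list the JVMs actually alive on the node (jps ships with the JDK) and compare against YARN's view:

jps -lm | grep -i spark

Any driver JVM that shows up here but not in yarn application -list is a likely ghost.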
My janky workaround was to run:
ps -aux | grep spark > jobs.log
If you grep for SPARK instead, you get junk results, since grep is case-sensitive. When I went through the log I found Spark PIDs that had been open since January and February, and running the command below against them freed the ports. Word to the wise: if you are on Cloudera, do not exit your spark-shell with CTRL+C; you need to use CTRL+D.
kill -9 ${PID}
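If you want to pull the candidate PIDs out in one go, a slightly safer pattern is the following; the [s] keeps grep from matching its own process, and you should eyeball the list first, since it will include live jobs as well:

ps aux | grep '[s]park' | awk '{print $2}'   # candidate PIDs, without the grep process itself

Only the genuinely stale PIDs (the January and February ones in my case) should be fed to kill -9.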
Leaving this answer open in case anyone has a better workaround.