Jupyter Notebook PySpark in yarn-client mode: NameError: name 'sc' is not defined



I'm a beginner. When I run a basic piece of code in Jupyter Notebook, it shows this error:

NameError                                 Traceback (most recent call last)
<ipython-input-1-67f48183a30b> in <module>()
----> 1 sc.master
NameError: name 'sc' is not defined
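
(For context: sc is only pre-created when the notebook kernel is launched through pyspark itself. As a rough sketch, assuming the pyspark package is importable from the notebook, the context could also be built by hand; "notebook-test" is just a hypothetical app name:)

from pyspark import SparkConf, SparkContext

# In a correctly launched pyspark session this object already exists as `sc`;
# building it manually is only a fallback for checking the setup.
conf = SparkConf().setMaster("yarn-client").setAppName("notebook-test")  # "yarn-client" is the Spark 1.4.x master string
sc = SparkContext(conf=conf)
print(sc.master)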

I launched Jupyter Notebook in yarn-client mode with this command line:

tigerfish@master:~/桌面$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_IR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client
Below are the errors and warnings:
[I 10:45:25.079 NotebookApp] Writing notebook server cookie secret to /run/user/1000/jupyter/notebook_cookie_secret
[I 10:45:25.139 NotebookApp] Serving notebooks from local directory: /home/tigerfish/桌面
[I 10:45:25.139 NotebookApp] 0 active kernels 
[I 10:45:25.139 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/
[I 10:45:25.139 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
ATTENTION: default value of option mesa_glthread overridden by environment.
[I 10:45:43.265 NotebookApp] Kernel started: c6392ab5-ea7b-402e-ae26-6bc89e07791a
Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:239)
at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:216)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:103)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:106)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
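
(As a quick sanity check, a sketch using the path from the command above: verify that the variable actually reached the environment.)

echo "$HADOOP_CONF_DIR"               # should print /usr/local/hadoop/etc/hadoop; an empty line means it is not set
ls "$HADOOP_CONF_DIR"/yarn-site.xml   # the YARN config file Spark needs to find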

I'm using Ubuntu 20.04 with a downgraded Python 2.7, Hadoop 2.6.0, PySpark 1.4.0, and JDK 8.

How can I solve this problem? It's confusing.


I finally realized there was a problem with my launch command... a silly mistake.

The terminal showed this error:

[I 10:45:43.265 NotebookApp] Kernel started: c6392ab5-ea7b-402e-ae26-6bc89e07791a
Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.

This means that neither HADOOP_CONF_DIR nor YARN_CONF_DIR was set. Looking at my launch command:

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_IR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client

I had typed HADOOP_CONF_IR instead of HADOOP_CONF_DIR, which is why the YARN client could not start.
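
For the record, the corrected launch line is the same command with only the variable name fixed:

PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop pyspark --master yarn --deploy-mode client

To avoid retyping it every time, the variable can also be exported once, e.g. in ~/.bashrc (a sketch assuming the same Hadoop install path):

export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop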
