Unable to run PySpark in a Jupyter Notebook on Linux

I am trying to run PySpark locally in a Jupyter Notebook, on a server that is not connected to the internet. I installed PySpark and Java with:

conda install pyspark-3.3.0-pyhd8ed1ab_0.tar.bz2
conda install openjdk-8.0.332-h166bdaf_0.tar.bz2

When I run !java -version in the notebook, I get

openjdk version "1.8.0_332"
OpenJDK Runtime Environment (Zulu 8.62.0.19-CA-linux64) (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (Zulu 8.62.0.19-CA-linux64) (build 25.332-b09, mixed mode)

When I run !which java, I get

/root/anaconda3/bin/java

My code is as follows:

import os
os.environ['SPARK_HOME'] = "/root/anaconda3/pkgs/pyspark-3.3.0-pyhd8ed1ab_0/site_packages/pyspark"
os.environ['JAVA_HOME'] = "/root/anaconda3"
os.environ['PYSPARK_SUBMIT_ARGS'] = "--master local[2] pyspark-shell"
from pyspark import SparkConf, SparkContext
conf = SparkConf().set('spark.driver.host','127.0.0.1')
sc = SparkContext(master='local', appName='Test', conf=conf)

The error I get is (a snippet, since I am typing it out by hand):

Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
...
Caused by: java.net.UnknownHostException: abc: abc: Name or service not known
...
Caused by: java.net.UnknownHostException: abc: Name or service not known
...
RuntimeError: Java gateway process exited before sending its port number

"abc";是我的服务器的主机名。我在这里错过了什么?

I found the problem.

Based on the error message java.net.UnknownHostException: abc: abc: Name or service not known, I suspected that Java could not resolve my server's hostname abc. So I added it to /etc/hosts under the loopback IP 127.0.0.1, and now I can run PySpark.
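You can check this condition without launching Spark at all. Below is a minimal sketch (my own diagnostic, not part of Spark's API) that performs the same local-hostname lookup that fails inside spark-submit; if the lookup raises, you will hit the UnknownHostException above:

```python
import socket

# Spark resolves the machine's hostname during startup; if that
# lookup fails, spark-submit exits with UnknownHostException before
# the Py4J gateway can report its port back to Python.
hostname = socket.gethostname()
try:
    resolved_ip = socket.gethostbyname(hostname)
    print(f"{hostname} resolves to {resolved_ip}")
except socket.gaierror as exc:
    resolved_ip = None
    print(f"{hostname} does not resolve: {exc}")
    print(f"Consider adding '127.0.0.1  {hostname}' to /etc/hosts")
```

Once the /etc/hosts entry is in place, this lookup succeeds and SparkContext starts normally.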
