I am trying to run PySpark locally in a Jupyter Notebook, on a server that is not connected to the internet. I installed PySpark and Java with:
conda install pyspark-3.3.0-pyhd8ed1ab_0.tar.bz2
conda install openjdk-8.0.332-h166bdaf_0.tar.bz2
When I run !java -version in the notebook, I get:
openjdk version "1.8.0_332"
OpenJDK Runtime Environment (Zulu 8.62.0.19-CA-linux64) (build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (Zulu 8.62.0.19-CA-linux64) (build 25.332-b09, mixed mode)
When I run !which java, I get:
/root/anaconda3/bin/java
My code is as follows:
import os
os.environ['SPARK_HOME'] = "/root/anaconda3/pkgs/pyspark-3.3.0-pyhd8ed1ab_0/site_packages/pyspark"
os.environ['JAVA_HOME'] = "/root/anaconda3"
os.environ['PYSPARK_SUBMIT_ARGS'] = "--master local[2] pyspark-shell"
from pyspark import SparkConf, SparkContext
conf = SparkConf().set('spark.driver.host','127.0.0.1')
sc = SparkContext(master='local', appName='Test', conf=conf)
The error I get is (a snippet, since I am typing it in by hand):
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.apache.spark.deploy.SparkSubmitArguments.$anonfun$loadEnvironmentArguments$3(SparkSubmitArguments.scala:157)
...
Caused by: java.net.UnknownHostException: abc: abc: Name or service not known
...
Caused by: java.net.UnknownHostException: abc: Name or service not known
...
Runtime Error: Java gateway process exited before sending its port number
"abc" is my server's hostname. What am I missing?
I found the problem. Based on the error message java.net.UnknownHostException: abc: abc: Name or service not known, I suspected that Java could not resolve my server's hostname, abc.
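A quick way to confirm this kind of resolution failure from Python, before touching any Spark configuration, is to ask the OS resolver directly (a minimal diagnostic sketch; the JVM performs an equivalent lookup of the machine's own hostname when Spark starts, which is what raises the UnknownHostException):

```python
import socket

def can_resolve(hostname):
    """Return True if the OS resolver can map hostname to an IP address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # "Name or service not known" surfaces as gaierror in Python
        return False

print(can_resolve("localhost"))          # expected to succeed everywhere
print(can_resolve(socket.gethostname())) # False here until /etc/hosts is fixed
```

If the second call prints False, the hostname is not resolvable, and Spark will fail the same way regardless of how PySpark itself is configured.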
So I added the hostname to /etc/hosts under the loopback IP 127.0.0.1, and now PySpark runs.
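For reference, the resulting /etc/hosts entry looks like this (abc being my server's hostname; the exact layout is a sketch of what I added):

```
127.0.0.1   localhost
127.0.0.1   abc
```

With that line in place, the JVM can resolve the machine's own hostname without any DNS access, which is exactly what an offline server needs.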