PySpark commands in Jupyter: connecting to Spark on a remote server

I have configured Spark 2.1 on my remote Linux server (an IBM RHEL Z system). From a Jupyter notebook I am trying to create a SparkContext with the following code:

from pyspark import SparkContext, SparkConf

# Point the driver at the remote standalone master
master_url = "spark://<IP>:7077"
conf = SparkConf()
conf.setMaster(master_url)
conf.setAppName("App1")

# Reuse a running context if one exists, otherwise create a new one
sc = SparkContext.getOrCreate(conf)

I get the error below. When I run the same code in the PySpark shell on the remote server itself, it works fine.

The currently active SparkContext was created at:
(No active SparkContext.)
    at org.apache.spark.SparkContext.assertNotStopped(SparkContext.scala:100)
    at org.apache.spark.SparkContext.getSchedulingMode(SparkContext.scala:1768)
    at org.apache.spark.SparkContext.postEnvironmentUpdate(SparkContext.scala:2411)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:563)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:236)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

It sounds like you haven't set Jupyter up as the PySpark driver. Before you can drive PySpark from Jupyter, you first have to set PYSPARK_DRIVER_PYTHON=jupyter and PYSPARK_DRIVER_PYTHON_OPTS='notebook'. If I remember correctly, if you look at the code in libexec/bin/pyspark (on OSX), you will find the instructions for setting up a Jupyter notebook.
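
As a rough sketch, the shell setup might look like the following before launching; the Spark install path /path/to/spark is a placeholder for your own installation, and <IP> is the master address from the question:

# Tell the PySpark launcher to start Jupyter as the driver process
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
# Launch against the remote standalone master
/path/to/spark/bin/pyspark --master spark://<IP>:7077

With those variables set, bin/pyspark starts a Jupyter notebook server instead of the interactive shell, and a SparkContext created in a notebook cell should pick up the --master setting passed at launch.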
