PySpark on Windows - RuntimeError: Java gateway process exited before sending its port number



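Since yesterday I have been trying to install PySpark on Windows, but I keep running into this error. It has been more than 48 hours and I have tried everything I can think of to fix it. I have reinstalled PySpark from scratch several times, but I still cannot get it to work.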

Whenever I run:

spark = SparkSession.builder.getOrCreate()

I get this error:

RuntimeError                              Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_20592/2335384691.py in <module>
1 # create a spark session
----> 2 spark = SparkSession.builder.getOrCreate()
c:\users\bhola\appdata\local\programs\python\python38\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
226                             sparkConf.set(key, value)
227                         # This SparkContext may be an existing one.
--> 228                         sc = SparkContext.getOrCreate(sparkConf)
229                     # Do not update `SparkConf` for existing `SparkContext`, as it's shared
230                     # by all sessions.
c:\users\bhola\appdata\local\programs\python\python38\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
390         with SparkContext._lock:
391             if SparkContext._active_spark_context is None:
--> 392                 SparkContext(conf=conf or SparkConf())
393             return SparkContext._active_spark_context
394 
c:\users\bhola\appdata\local\programs\python\python38\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
142                 " is not allowed as it is a security risk.")
143 
--> 144         SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
145         try:
146             self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
c:\users\bhola\appdata\local\programs\python\python38\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
337         with SparkContext._lock:
338             if not SparkContext._gateway:
--> 339                 SparkContext._gateway = gateway or launch_gateway(conf)
340                 SparkContext._jvm = SparkContext._gateway.jvm
341 
c:\users\bhola\appdata\local\programs\python\python38\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
106 
107             if not os.path.isfile(conn_info_file):
--> 108                 raise RuntimeError("Java gateway process exited before sending its port number")
109 
110             with open(conn_info_file, "rb") as info:
RuntimeError: Java gateway process exited before sending its port number
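The message itself says little about the cause. As the last frame of the traceback shows, launch_gateway starts a JVM through Spark's spark-submit launcher and raises this error when no connection-info file appears, so the underlying Java error usually goes to the console that started Jupyter rather than to the notebook. A minimal diagnostic sketch, assuming SPARK_HOME is set as listed in this post and that the launcher lives at bin\spark-submit.cmd (the layout of the standard Spark binary distribution), runs `spark-submit --version` directly and prints whatever it reports. The helper names spark_submit_command and run_and_report are hypothetical.

```python
import os
import subprocess
import sys
from typing import List


def spark_submit_command(spark_home: str, on_windows: bool) -> List[str]:
    """Build the command for `spark-submit --version` (hypothetical helper)."""
    # Windows distributions ship a .cmd wrapper; other platforms use the shell script
    script = "spark-submit.cmd" if on_windows else "spark-submit"
    return [os.path.join(spark_home, "bin", script), "--version"]


def run_and_report(cmd: List[str]) -> int:
    """Run cmd, print its stdout/stderr, and return the exit code (-1 if it cannot start)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except FileNotFoundError as exc:
        print(f"Could not start {cmd[0]!r}: {exc}")
        return -1
    print("exit code:", result.returncode)
    print("stdout:\n" + result.stdout)
    print("stderr:\n" + result.stderr)
    return result.returncode


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if not spark_home:
        sys.exit("SPARK_HOME is not set")
    run_and_report(spark_submit_command(spark_home, on_windows=(os.name == "nt")))
```

Running this from a plain terminal (not Jupyter) keeps the JVM's own error text visible.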

I tried the solutions given in this Stack Overflow post and this second Stack Overflow post:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

On my Windows system, I set it as a system variable with variable name = PYSPARK_SUBMIT_ARGS and variable value = "--master local[2] pyspark-shell"

But it does not work.
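One detail worth checking: quotes typed into the Windows environment-variable dialog are stored literally as part of the value. PySpark 3.x's launcher (pyspark/java_gateway.py) reads PYSPARK_SUBMIT_ARGS and splits it with shlex.split, so a quoted value turns into a single argument instead of three. The sketch below only demonstrates that splitting with the standard library; it does not call PySpark.

```python
import shlex

# Value as stored if typed WITH the surrounding quotes in the Windows dialog
quoted = '"--master local[2] pyspark-shell"'
# Value as stored if typed WITHOUT quotes
unquoted = "--master local[2] pyspark-shell"

# shlex.split is the splitting PySpark 3.x applies to PYSPARK_SUBMIT_ARGS
print(shlex.split(quoted))    # ['--master local[2] pyspark-shell']  (one argument)
print(shlex.split(unquoted))  # ['--master', 'local[2]', 'pyspark-shell']
```

If the variable is meant to be set per notebook instead, assigning os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell" before the first SparkSession is created has the same effect, since the value is read when the gateway is launched.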

The other system variables I set on my machine during installation are:

SPARK_HOME = D:\spark\spark-3.2.0-bin-hadoop3.2

HADOOP_HOME = D:\spark\spark-3.2.0-bin-hadoop3.2

Path = D:\spark\spark-3.2.0-bin-hadoop3.2\bin

PYSPARK_DRIVER_PYTHON = jupyter

PYSPARK_DRIVER_PYTHON_OPTS = jupyter

JAVA_HOME = C:\Program Files\Java\jdk1.8.0_301
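A quick way to rule out typos in variables like these is to check that each directory exists and contains the executable Spark needs. A minimal sketch, assuming the standard layouts (java.exe under JAVA_HOME\bin, spark-submit.cmd under SPARK_HOME\bin); check_spark_env is a hypothetical helper, not part of PySpark.

```python
import os
from typing import List, Mapping


def check_spark_env(env: Mapping[str, str], on_windows: bool = True) -> List[str]:
    """Return human-readable problems found in env; an empty list means none were found."""
    exe = ".exe" if on_windows else ""
    script = ".cmd" if on_windows else ""
    # (variable name, file assumed to exist relative to that directory)
    expected = [
        ("JAVA_HOME", os.path.join("bin", "java" + exe)),
        ("SPARK_HOME", os.path.join("bin", "spark-submit" + script)),
    ]
    problems = []
    for var, rel_path in expected:
        value = env.get(var)
        if not value:
            problems.append(f"{var} is not set")
        elif '"' in value:
            # Quotes typed into the Windows dialog are stored as part of the value
            problems.append(f"{var}={value!r} contains literal quote characters")
        elif not os.path.isdir(value):
            problems.append(f"{var}={value!r} is not a directory")
        elif not os.path.isfile(os.path.join(value, rel_path)):
            problems.append(f"{var}: {rel_path!r} not found under {value!r}")
    return problems


if __name__ == "__main__":
    found = check_spark_env(os.environ, on_windows=(os.name == "nt"))
    for line in found or ["No problems found in JAVA_HOME / SPARK_HOME"]:
        print(line)
```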

Can anyone help me?

Did you download winutils from https://github.com/kontext-tech/winutils? You need to put it in \Hadoop\bin, add that folder to the path, and so on.
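The winutils suggestion can be verified with a short script. A minimal sketch, assuming the usual convention that winutils.exe sits in HADOOP_HOME\bin and that this bin folder is also on PATH; check_winutils is a hypothetical helper. With the setup in the question (HADOOP_HOME pointing at the Spark folder), it would look for winutils.exe next to spark-submit.cmd.

```python
import os
from typing import List, Mapping


def check_winutils(env: Mapping[str, str]) -> List[str]:
    """Check that HADOOP_HOME\\bin\\winutils.exe exists and HADOOP_HOME\\bin is on PATH."""
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        return ["HADOOP_HOME is not set"]
    problems = []
    bin_dir = os.path.join(hadoop_home, "bin")
    if not os.path.isfile(os.path.join(bin_dir, "winutils.exe")):
        problems.append(f"winutils.exe not found in {bin_dir!r}")
    # Normalise entries so letter case and trailing separators do not cause false alarms
    wanted = os.path.normcase(os.path.normpath(bin_dir))
    path_entries = [
        os.path.normcase(os.path.normpath(p))
        for p in env.get("PATH", "").split(os.pathsep)
        if p
    ]
    if wanted not in path_entries:
        problems.append(f"{bin_dir!r} is not on PATH")
    return problems


if __name__ == "__main__":
    for line in check_winutils(os.environ) or ["winutils.exe found and its folder is on PATH"]:
        print(line)
```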
