### RuntimeError: Java gateway process exited before sending its port number



I am trying to analyze this data with Python:

from pyspark.sql import SparkSession
from pyspark.sql.types import *
from pyspark.sql.functions import *

spark = SparkSession.builder.getOrCreate()
ds1 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202101-divvy-tripdata.csv",
                     header=True)
ds2 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202102-divvy-tripdata.csv",
                     header=True)
ds3 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202103-divvy-tripdata.csv",
                     header=True)
ds4 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202104-divvy-tripdata.csv",
                     header=True)
ds5 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202105-divvy-tripdata.csv",
                     header=True)
ds6 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202106-divvy-tripdata.csv",
                     header=True)
ds7 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202107-divvy-tripdata.csv",
                     header=True)
ds8 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202108-divvy-tripdata.csv",
                     header=True)
ds9 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202109-divvy-tripdata.csv",
                     header=True)
ds10 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202110-divvy-tripdata.csv",
                      header=True)
ds11 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202111-divvy-tripdata.csv",
                      header=True)
ds12 = spark.read.csv(r"C:\Users\User\Desktop\Trip_data\202112-divvy-tripdata.csv",
                      header=True)
ds_all = ds1.union(ds2).union(ds3).union(ds4).union(ds5).union(ds6) \
            .union(ds7).union(ds8).union(ds9).union(ds10).union(ds11).union(ds12)
print((ds_all.count(), len(ds_all.columns)))

This is my error:

Java not found and JAVA_HOME environment variable is not set.
Install Java and set JAVA_HOME to point to the Java installation directory.
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pythonProject\Case Study 1.py", line 4, in <module>
    spark = SparkSession.builder.getOrCreate()
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\sql\session.py", line 228, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\context.py", line 392, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\context.py", line 144, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\context.py", line 339, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Users\User\PycharmProjects\pythonProject\venv\lib\site-packages\pyspark\java_gateway.py", line 108, in launch_gateway
    raise RuntimeError("Java gateway process exited before sending its port number")
RuntimeError: Java gateway process exited before sending its port number

I have Googled this, but many of the solutions are confusing to me and I can't understand or follow them. Does anyone have any ideas about this problem? Or is there a more convenient package for coding in the PyCharm Community edition? Please give me some advice; I would be very grateful!

This problem is caused by the missing $JAVA_HOME variable. Just set it in your ~/.bashrc file (or ~/.zshrc on a Mac) by adding the line:

export JAVA_HOME="/path/to/java_home/"

On Windows, you instead need to add the environment variable JAVA_HOME under "System settings".
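Whichever way you set it, you can verify the variable is actually visible to your Python process before building the SparkSession. A minimal sketch (the helper name is mine, not part of PySpark, which also accepts a `java` binary on PATH):

```python
import os

def java_home_or_hint():
    """Return JAVA_HOME if set, otherwise a hint telling the user to set it."""
    java_home = os.environ.get("JAVA_HOME")
    if not java_home:
        return "JAVA_HOME is not set -- point it at your JDK install directory"
    return java_home
```

If this returns the hint, the PySpark launcher will fail for the same reason.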

Note that Spark/PySpark requires Java version >= 8. Here is how to check your Java version:

% $JAVA_HOME/bin/java -version
java version "1.8.0_301"
Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)
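One wrinkle: Java 8 and earlier report a legacy `1.x` scheme (`"1.8.0_301"` means Java 8), while Java 9+ puts the major version first (`"11.0.2"`, `"17"`). If you want to check the version programmatically, a small parser sketch (my own helper, not a PySpark API):

```python
import re

def java_major_version(version_line):
    """Parse the major version out of a `java -version` banner line.

    Handles both the legacy scheme ("1.8.0_301" -> 8) and the modern
    one ("11.0.2" -> 11).
    """
    m = re.search(r'"(\d+)(?:\.(\d+))?', version_line)
    if not m:
        raise ValueError(f"unrecognised version string: {version_line!r}")
    first = int(m.group(1))
    if first == 1 and m.group(2) is not None:
        return int(m.group(2))  # legacy "1.x" numbering
    return first
```

Anything returning 8 or higher should satisfy Spark's requirement.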

I managed to resolve this error by setting the environment variables for my Java installation from within the script.

Note: I am using Windows.

import os
os.environ['SPARK_HOME'] = r'C:\spark\spark-3.3.0-bin-hadoop3'
os.environ["JAVA_HOME"] = r'C:\Java\jdk1.8.0_321'
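Two caveats worth noting (my own additions; the install paths are examples from the answer above, so adjust them to your machine). First, these variables must be set *before* `SparkSession.builder.getOrCreate()` runs, because PySpark launches the JVM at session-creation time. Second, the raw-string `r'...'` prefix matters on Windows: without it, `"C:\Users\..."` is a SyntaxError in Python 3, since `\U` begins a unicode escape.

```python
import os

# Example paths -- adjust to wherever Spark and the JDK are installed.
# Raw strings keep the backslashes literal instead of being read as escapes.
os.environ["SPARK_HOME"] = r"C:\spark\spark-3.3.0-bin-hadoop3"
os.environ["JAVA_HOME"] = r"C:\Java\jdk1.8.0_321"

# Only after this point is it safe to build the session:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```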
