Python Kedro PySpark: py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext



This is my first project using Kedro together with PySpark, and I have run into a problem. I am on a new Mac (M1). Spark itself is installed correctly: when I run spark-shell in the terminal I get the expected output (the Spark 3.2.1 welcome banner with the logo). However, when I try to run the Kedro project that uses Spark, I hit an error. I have looked through Stack Overflow discussions for a solution, but found nothing related to this.

Versions:

  • Python: 3.8
  • Java: openjdk version "18" 2022-03-22
  • PySpark: 3.2.1

Spark conf:

spark.driver.maxResultSize: 3g
spark.hadoop.fs.s3a.impl: org.apache.hadoop.fs.s3a.S3AFileSystem
spark.sql.execution.arrow.pyspark.enabled: true

In my Kedro project, the context is:

# Imports assumed by this excerpt of context.py
import os
from pathlib import Path
from typing import Any, Dict, Union

from kedro.framework.context import KedroContext
from pyspark import SparkConf
from pyspark.sql import SparkSession


class ProjectContext(KedroContext):
    """A subclass of KedroContext to add Spark initialisation for the pipeline."""

    def __init__(
        self,
        package_name: str,
        project_path: Union[Path, str],
        env: str = None,
        extra_params: Dict[str, Any] = None,
    ):
        super().__init__(package_name, project_path, env, extra_params)
        if not os.getenv('DISABLE_SPARK'):
            self.init_spark_session()

    def init_spark_session(self) -> None:
        """Initialises a SparkSession using the config
        defined in project's conf folder.
        """
        parameters = self.config_loader.get("spark*", "spark*/**")
        spark_conf = SparkConf().setAll(parameters.items())

        # Initialise the spark session
        spark_session_conf = (
            SparkSession.builder.appName(self.package_name)
            .enableHiveSupport()
            .config(conf=spark_conf)
            .master("local[*]")
        )
        _spark_session = spark_session_conf.getOrCreate()

When I run it, I get the following error:

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x3c60b7e7) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x3c60b7e7
at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:213)
at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
at org.apache.spark.storage.BlockManagerMasterEndpoint.<init>(BlockManagerMasterEndpoint.scala:110)
at org.apache.spark.SparkEnv$.$anonfun$create$9(SparkEnv.scala:348)
at org.apache.spark.SparkEnv$.registerOrLookupEndpoint$1(SparkEnv.scala:287)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:336)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:191)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:460)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:499)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:480)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)

In my terminal, I adjusted the commands to match my Python path:

export HOMEBREW_OPT="/opt/homebrew/opt"
export JAVA_HOME="$HOMEBREW_OPT/openjdk/"
export SPARK_HOME="$HOMEBREW_OPT/apache-spark/libexec"
export PATH="$JAVA_HOME:$SPARK_HOME:$PATH"
export SPARK_LOCAL_IP=localhost

Thanks for your help.

Hi @Mathilde Roblot, thanks for the detailed report.

The specific error "cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module" stands out to me.

Googling suggests that you may be picking up the wrong Java version (not the Java 8 that Spark requires); see the links below and the quick check sketched after them:

  • https://stackoverflow.com/a/49453770/2010808
  • https://stackoverflow.com/a/69851663/2010808
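
One quick way to confirm which Java the PySpark driver would actually launch, run from the same shell/environment as the project (a minimal diagnostic sketch, nothing Kedro-specific):

import os
import shutil
import subprocess

# Spark 3.2.x is built for Java 8/11; running on Java 17/18 produces exactly this
# kind of IllegalAccessError unless the JDK module flags are passed explicitly.
print("JAVA_HOME    =", os.environ.get("JAVA_HOME"))
print("java on PATH =", shutil.which("java"))
subprocess.run(["java", "-version"])  # the version banner is printed to stderr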

You can use SparkConf to set the required --add-opens flags, see: https://stackoverflow.com/a/71855571/13547620.
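
For reference, a minimal sketch of that workaround (only the sun.nio.ch export named in this stack trace is shown; the linked answer lists the full set of --add-opens flags, and the same two keys could equally go into the spark config file that feeds SparkConf; the app name here is just a placeholder):

from pyspark import SparkConf
from pyspark.sql import SparkSession

# Pass the JDK module flag to both driver and executor JVMs. This only takes
# effect if it is set before the driver JVM is launched, i.e. before the first
# SparkSession/SparkContext is created in the Python process.
java_opts = "--add-exports=java.base/sun.nio.ch=ALL-UNNAMED"

conf = (
    SparkConf()
    .set("spark.driver.extraJavaOptions", java_opts)
    .set("spark.executor.extraJavaOptions", java_opts)
)

spark = (
    SparkSession.builder.appName("add-opens-test")
    .master("local[*]")
    .config(conf=conf)
    .getOrCreate()
)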

This can also happen when your Spark environment's libraries are not picked up by Kedro, or when Kedro cannot find Spark in your environment.
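
A quick way to rule that out is to check, from the same virtualenv that kedro run uses, that the pyspark package and the Spark/Java environment variables line up (a minimal sketch; the expected values are only what the exports above suggest):

import os
import pyspark

print("pyspark version:", pyspark.__version__)            # 3.2.1 in this setup
print("SPARK_HOME     :", os.environ.get("SPARK_HOME"))   # e.g. .../apache-spark/libexec
print("JAVA_HOME      :", os.environ.get("JAVA_HOME"))    # should point at a JDK Spark 3.2 supports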

Quick question: are you using an IDE like PyCharm? If so, you may need to go into Preferences and set your env variables there. I have run into the same problem, and setting the env variables in the project preferences helped me.

Hope this helps.
