Py4JJavaError: An error occurred while calling o41.load: java.lang.ClassNotFoundException



This is the code where I get the error:

df = spark.read.format("xml").option("rowTag", "Root").load("/content/xml")

I want to parse XML with PySpark without any other platform (i.e. Databricks or Azure). I also tried downloading the spark-xml jar file; the code for that is:

spark = (SparkSession.builder.appName("Apache spark using pyspark")
    .config("spark jars", "C:/Users/baps/Downloads/spark-xml_2.12-0.9.0.jar")
    .config("spark.executor.extraClassPath", "C:/Users/baps/Downloads/spark-xml_2.12-0.9.0.jar")
    .config("spark.executor.extraLibrary", "C:/Users/baps/Downloads/spark-xml_2.12-0.9.0.jar")
    .config("spark.driver.extraClassPath", "C:/Users/baps/Downloads/spark-xml_2.12-0.9.0.jar")
    .getOrCreate())

Here, too, I get the same error on the spark.read.format line.

This is the error I get every time:
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-22-59ae75a30984> in <module>
----> 1 df=spark.read.format("xml").option("rowTag","Root").load("/content/spark_/sample_corrupted.xml")
3 frames
/usr/local/lib/python3.8/dist-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
327                     "An error occurred while calling {0}{1}{2}.n".
328                     format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o125.load.
: java.lang.ClassNotFoundException: 
Failed to find data source: xml. Please find packages at
https://spark.apache.org/third-party-projects.html

at org.apache.spark.sql.errors.QueryExecutionErrors$.failedToFindDataSourceError(QueryExecutionErrors.scala:587)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:675)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:725)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:185)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassNotFoundException: xml.DefaultSource
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:476)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:589)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:661)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:661)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:661)
... 15 more

Is there any other way to do this?

It looks like you are missing the `.` in `spark.jars`. Try changing

.config("spark jars","C:/Users/baps/Downloads/spark-xml_2.12-0.9.0.jar")

to

.config("spark.jars","C:/Users/baps/Downloads/spark-xml_2.12-0.9.0.jar")

Hope this helps!
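As an alternative to pointing `spark.jars` at a local file, you can let Spark resolve the package from Maven with `spark.jars.packages`. A minimal sketch, assuming the Maven coordinate `com.databricks:spark-xml_2.12:0.9.0` (chosen to match the jar version in the question; pick the artifact that matches your Scala and Spark versions):

```python
from pyspark.sql import SparkSession

# Resolve spark-xml from Maven Central instead of a local jar path,
# so the driver and all executors get it on their classpath.
spark = (
    SparkSession.builder
    .appName("Apache spark using pyspark")
    .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.9.0")
    .getOrCreate()
)

# With the package on the classpath, the "xml" data source resolves
# and ClassNotFoundException: xml.DefaultSource goes away.
df = spark.read.format("xml").option("rowTag", "Root").load("/content/xml")
df.printSchema()
```

Note that these configs only take effect when the JVM starts: if a SparkSession already exists in your notebook, call `spark.stop()` (or restart the kernel) before creating the session with the new config, otherwise the setting is silently ignored.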
