Spark context problem


from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('QUEUEVQL').getOrCreate()
jsonStrings = {"Name": "SRIDHAR", "Author": "jangcy", "BlogEntries": 100, "Caller": "jangcy"}
dt = [jsonStrings]
# Build a DataFrame from the dict and pull the rows back to the driver
dfs = spark.createDataFrame(dt).collect()
# Redistribute the collected rows and turn them into a DataFrame again
dfs2 = spark.sparkContext.parallelize(dfs).toDF()
dfs2.createOrReplaceTempView("QVQL")
resDf = spark.sql("select Name from QVQL")
resDfPandas = resDf.toPandas()
print(resDfPandas)

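The code fails with this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o490.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 4 times, most recent failure: Lost task 0.3 in stage 27.0 (TID 98, 172.17.7.28, executor 1): java.io.IOException: Cannot run program "python3.6": CreateProcess error=2, The system cannot find the file specified
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)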

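This should be a two-liner. If you still get the error after that, the problem is in your environment rather than in the code: the message says the executor could not run the program "python3.6", so that Python interpreter is most likely missing on the worker, rather than a jar file.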

jsonStrings = {"Name": "SRIDHAR", "Author": "jangcy", "BlogEntries": 100, "Caller": "jangcy"}
dfs = spark.createDataFrame([jsonStrings]).toPandas()
print(dfs)
#    Author  BlogEntries  Caller     Name
# 0  jangcy          100  jangcy  SRIDHAR
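
The error in the question says the executor could not start a program named python3.6. One thing that may help is to point PySpark at an interpreter that is known to exist before the session is created. The following is a minimal sketch, not a confirmed fix for this cluster: the helper `configure_pyspark_python` and its `interpreter` parameter are hypothetical names introduced for illustration, and it assumes the workers can use the same interpreter path as the driver (true in local mode, not guaranteed on a multi-machine cluster).

```python
import os
import shutil
import sys


def configure_pyspark_python(interpreter=None):
    """Set PYSPARK_PYTHON to an interpreter that can actually be found.

    `interpreter` (hypothetical parameter) is a path or command name;
    it defaults to the interpreter running this script.
    """
    candidate = interpreter or sys.executable
    # shutil.which returns None when the command cannot be located
    resolved = shutil.which(candidate)
    if resolved is None:
        raise FileNotFoundError(f"Python interpreter not found: {candidate!r}")
    # PySpark reads PYSPARK_PYTHON when the SparkContext is created,
    # so this must run before SparkSession.builder...getOrCreate()
    os.environ["PYSPARK_PYTHON"] = resolved
    return resolved
```

Call `configure_pyspark_python()` at the top of the script, before the `SparkSession` is built. On a cluster with separate worker machines, the chosen path has to exist on every worker, so passing an explicit path that is valid there is probably the safer choice.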
