How to access a global temp view from another PySpark application



I have a spark-shell that invokes a PySpark script and creates a global temp view.

Here is what I do in my first spark-shell script:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Spark SQL Parllel load example") \
    .config("spark.jars", "/u/user/graghav6/sqljdbc4.jar") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .config("hive.exec.dynamic.partition", "true") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("spark.sql.shuffle.partitions", "50") \
    .config("hive.metastore.uris", "thrift://xxxxx:9083") \
    .config("spark.sql.join.preferSortMergeJoin", "true") \
    .config("spark.sql.autoBroadcastJoinThreshold", "-1") \
    .enableHiveSupport() \
    .getOrCreate()

# after doing some transformations I am trying to create a global temp view of the dataframe as:
df1.createGlobalTempView("df1_global_view")
spark.stop()
exit()

Here is my second spark-shell script:

from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Spark SQL Parllel load example") \
    .config("spark.jars", "/u/user/graghav6/sqljdbc4.jar") \
    .config("spark.dynamicAllocation.enabled", "true") \
    .config("spark.shuffle.service.enabled", "true") \
    .config("hive.exec.dynamic.partition", "true") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("spark.sql.shuffle.partitions", "50") \
    .config("hive.metastore.uris", "thrift://xxxx:9083") \
    .config("spark.sql.join.preferSortMergeJoin", "true") \
    .config("spark.sql.autoBroadcastJoinThreshold", "-1") \
    .enableHiveSupport() \
    .getOrCreate()

newSparkSession = spark.newSession()

# reading data from the global temp view
data_df_save = newSparkSession.sql(""" select * from global_temp.df1_global_view """)
data_df_save.show()

newSparkSession.stop()
exit()

I am getting the below error:

Stdoutput pyspark.sql.utils.AnalysisException: u"Table or view not found: `global_temp`.`df1_global_view`; line 1 pos 15;\n'Project [*]\n+- 'UnresolvedRelation `global_temp`.`df1_global_view`\n"

It looks like I am missing something. How can I share the same global temp view across multiple sessions? Am I stopping the Spark session incorrectly in the first spark-shell script? I have already found a couple of answers on Stack Overflow but could not figure out the cause.

You are using createGlobalTempView, so it is still a temporary view: it lives only as long as the Spark application that created it, and it is gone once that application stops.

In other words, it is available to another SparkSession within the same application, but not to another PySpark application.
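To illustrate that scoping, here is a minimal sketch run inside a single PySpark application, with a small stand-in DataFrame playing the role of df1 from the question: the view registered under global_temp is visible from a second session created with newSession(), but it disappears when the owning application stops, so a separate spark-submit or spark-shell job can never see it.

from pyspark.sql import SparkSession

# one application, one SparkContext: the global_temp database lives in this application
spark = SparkSession.builder.appName("global temp view demo").getOrCreate()

# stand-in for the df1 produced by the transformations in the question
df1 = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df1.createGlobalTempView("df1_global_view")

# a second session inside the SAME application can read the view
other_session = spark.newSession()
other_session.sql("select * from global_temp.df1_global_view").show()

# once the application stops, the view is gone; a new spark-submit or
# spark-shell is a brand-new application and starts with an empty global_temp
spark.stop()

If the second script really has to run as a separate application, the data would have to be persisted instead, for example written to a Hive table with df1.write.saveAsTable(...) (the question's config already enables Hive support) and read back with spark.table(...) in the second job.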
