Why can't I perform a UNION in Spark SQL using a dual table?
I am trying to run a UNION query against a "dual" table. My code is as follows:

from pyspark.sql.types import *
from pyspark.sql import Row    

df_dual.registerTempTable("dual")
result = sqlContext.sql("""select 1,[['Red',['ML', 100.0]],['Green'['Litre', 4.0]]] from dual
union
select 2,[['Red1',['M1L', 100.0]],['Green'['Litre', 4.0]]] from dual""")
result.show()

Expected output:

1   [[['Red',['ML', 100.0]],['Green'['Litre', 4.0]]]]
2   [[['Red1',['M1L', 100.0]],['Green'['Litre', 4.0]]]]

Error received: An error occurred while calling o207.sql.: org.apache.spark.sql.catalyst.parser.ParseException:

I just want to know how to create dummy rows so I can experiment with some functions that operate on nested data.

Your approach looks correct, but I noticed some syntax errors in your nested data: Spark SQL builds structs with parentheses, not square brackets, and a comma is missing between 'Green' and the inner value.

Please check the code below; it returns your expected output -

from pyspark.sql.types import *
from pyspark.sql import Row

# Create a one-row "dual" table to select constant expressions from.
df_dual = sc.parallelize([Row(r=Row("dummy"))]).toDF()
df_dual.printSchema()
df_dual.show()
df_dual.registerTempTable("dual")  # createOrReplaceTempView on Spark 2.x+

# Structs are built with parentheses (or struct(...)), not square brackets,
# and every struct field must be separated by a comma.
result = sqlContext.sql("""select 1 as first_col,
                                  (('Red',('ML', 100.0)),('Green',('Litre', 4.0))) as second_col from dual
                           union
                           select 2,
                                  (('Red1',('M1L', 100.0)),('Green',('Litre', 4.0))) from dual""")

result.show(truncate=False)
+---------+----------------------------------------+
|first_col|second_col                              |
+---------+----------------------------------------+
|2        |[[Red1,[M1L,100.0]],[Green,[Litre,4.0]]]|
|1        |[[Red,[ML,100.0]],[Green,[Litre,4.0]]]  |
+---------+----------------------------------------+
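
If you prefer to stay in the DataFrame API, the same nested structs can be built with struct() and lit() from pyspark.sql.functions. This is a minimal sketch assuming the df_dual frame created above; on Spark 1.x, use unionAll instead of union:

from pyspark.sql.functions import lit, struct

# Build each constant row by selecting literals from the one-row dual frame;
# struct() nests columns the same way the parenthesized SQL syntax above does.
row1 = df_dual.select(
    lit(1).alias("first_col"),
    struct(
        struct(lit("Red"), struct(lit("ML"), lit(100.0))),
        struct(lit("Green"), struct(lit("Litre"), lit(4.0))),
    ).alias("second_col"),
)
row2 = df_dual.select(
    lit(2).alias("first_col"),
    struct(
        struct(lit("Red1"), struct(lit("M1L"), lit(100.0))),
        struct(lit("Green"), struct(lit("Litre"), lit(4.0))),
    ).alias("second_col"),
)
row1.union(row2).show(truncate=False)  # unionAll on Spark 1.x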

That said, you don't actually need to create a dual table. You can build the desired DataFrame without one -

result = sqlContext.sql("""select 1 as first_col,
                                  (('Red',('ML', 100.0)),('Green',('Litre', 4.0))) as second_col
                           union
                           select 2,
                                  (('Red1',('M1L', 100.0)),('Green',('Litre', 4.0)))""")
result.show(truncate=False)
+---------+----------------------------------------+
|first_col|second_col                              |
+---------+----------------------------------------+
|2        |[[Red1,[M1L,100.0]],[Green,[Litre,4.0]]]|
|1        |[[Red,[ML,100.0]],[Green,[Litre,4.0]]]  |
+---------+----------------------------------------+
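
Since the original goal was just to create dummy rows for experimenting with nested data, you can also skip SQL entirely and let createDataFrame infer a nested schema from Row objects. A minimal sketch; the field names here (item, measure, unit, qty, etc.) are illustrative, not required by Spark:

from pyspark.sql import Row

# Nested Row objects are inferred as nested StructType columns.
data = [
    Row(first_col=1,
        second_col=Row(a=Row(item='Red',  measure=Row(unit='ML',  qty=100.0)),
                       b=Row(item='Green', measure=Row(unit='Litre', qty=4.0)))),
    Row(first_col=2,
        second_col=Row(a=Row(item='Red1', measure=Row(unit='M1L', qty=100.0)),
                       b=Row(item='Green', measure=Row(unit='Litre', qty=4.0)))),
]
nested_df = sqlContext.createDataFrame(data)
nested_df.printSchema()
nested_df.show(truncate=False)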
