I am trying to generate some dummy rows with nested data by using a "dual" table. My code is below:
from pyspark.sql.types import *
from pyspark.sql import Row
df_dual.registerTempTable("dual")
result = sqlContext.sql("""select 1,[['Red',['ML', 100.0]],['Green'['Litre', 4.0]]] from dual
union
select 2,[['Red1',['M1L', 100.0]],['Green'['Litre', 4.0]]] from dual""")
result.show()
Expected output:
1 [['Red',['ML', 100.0]],['Green',['Litre', 4.0]]]
2 [['Red1',['M1L', 100.0]],['Green',['Litre', 4.0]]]
Error received: An error occurred while calling o207.sql. : org.apache.spark.sql.catalyst.parser.ParseException:
I would like to know how to create dummy rows just to experiment with some functions that work on nested data.
Your approach looks correct, but I noticed some syntax errors in the nested Row data: Spark SQL builds structs with parentheses, not square brackets, and a comma is missing between 'Green' and its inner value. Check the code below, which returns your expected output -
from pyspark.sql.types import *
from pyspark.sql import Row
df_dual = sc.parallelize([Row(r=Row("dummy"))]).toDF()
df_dual.printSchema()
df_dual.show()
df_dual.registerTempTable("dual")
result = sqlContext.sql("select 1 as first_col,(('Red',('ML', 100.0)),('Green',('Litre', 4.0))) as second_col from dual union select 2,(('Red1',('M1L', 100.0)),('Green',('Litre', 4.0))) from dual")
result.show(truncate=False)
+---------+----------------------------------------+
|first_col|second_col |
+---------+----------------------------------------+
|2 |[[Red1,[M1L,100.0]],[Green,[Litre,4.0]]]|
|1 |[[Red,[ML,100.0]],[Green,[Litre,4.0]]] |
+---------+----------------------------------------+
However, you do not actually need to create a dual table. You can build the desired DataFrame without it, since Spark SQL allows a select without a from clause -
result = sqlContext.sql("select 1 as first_col,(('Red',('ML', 100.0)),('Green',('Litre', 4.0))) as second_col union select 2,(('Red1',('M1L', 100.0)),('Green',('Litre', 4.0)))")
result.show(truncate=False)
+---------+----------------------------------------+
|first_col|second_col |
+---------+----------------------------------------+
|2 |[[Red1,[M1L,100.0]],[Green,[Litre,4.0]]]|
|1 |[[Red,[ML,100.0]],[Green,[Litre,4.0]]] |
+---------+----------------------------------------+
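If you just want dummy nested rows to experiment with, another option is to build them as plain Python tuples and hand them to createDataFrame. A minimal sketch (the tuples below mirror the struct layout from the queries above; only the final createDataFrame call, shown in a comment, needs a running Spark session):

```python
# Each row is (id, ((name, (unit, qty)), (name, (unit, qty)))) -
# the same nested-struct shape the SQL above produces.
rows = [
    (1, (("Red", ("ML", 100.0)), ("Green", ("Litre", 4.0)))),
    (2, (("Red1", ("M1L", 100.0)), ("Green", ("Litre", 4.0)))),
]

# With a Spark session available, these tuples can be turned into a
# DataFrame directly, no dual table or SQL literal syntax required:
#   df = spark.createDataFrame(rows, ["first_col", "second_col"])
#   df.show(truncate=False)
for first_col, second_col in rows:
    print(first_col, second_col)
```

Building the rows in Python also makes it easy to tweak the nesting depth or field values when testing functions against different nested shapes.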