Spark SQL GROUPING SETS

I need to pass various combinations of column sets to my SQL query as parameters.

For example:

val result = sqlContext.sql("""select col1, col2, col3, col4, col5, count(col6) from table T1 GROUP BY col1, col2, col3, col4, col5 GROUPING SETS ((col1, col2), (col3, col4), (col4, col5))""")

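I need the aggregate values for several such combinations. Is there a way to pass these column sets to the SQL query as parameters instead of hard-coding them manually?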

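Currently I have written all the combinations into the SQL query, but whenever a new combination comes up I have to change the query again. I plan to put all the combinations in a file, read them, and pass them to the SQL query as parameters. Is that possible?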

Example table:

id  category  age  gender  cust_id
1   101       54   M       1111
1   101       54   M       2222
1   101       55   M       3333
1   102       55   F       4444
""" select id, category, age, gender, count(cust_id) from table T1 GROUP BY id, category, age, gender
GROUPING SETS ((id, category), (id, age), (id, gender)) """

It should produce the following result:

group by (id, category): count of cust_id
1 101 3
1 102 1
group by (id, age): count of cust_id
1 54 2
1 55 2
group by (id, gender): count of cust_id
1 M 3
1 F 1
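To make the expected semantics concrete, here is a minimal plain-Scala sketch (no Spark involved) that groups the four sample rows once per grouping set and counts the rows in each group, which is what GROUPING SETS does in a single query. The `Record` case class and `GroupingSetsDemo` object are hypothetical names for illustration, and `rs.size` stands in for count(cust_id) only because the sample has no null cust_id values. In the real Spark result all three groupings come back in one DataFrame, with null in the columns that are not part of a given set.

```scala
// Hypothetical in-memory copy of the sample table (plain Scala, no Spark).
case class Record(id: Int, category: Int, age: Int, gender: String, custId: Int)

object GroupingSetsDemo {
  val rows: Seq[Record] = Seq(
    Record(1, 101, 54, "M", 1111),
    Record(1, 101, 54, "M", 2222),
    Record(1, 101, 55, "M", 3333),
    Record(1, 102, 55, "F", 4444)
  )

  // Each grouping set is a label plus a function extracting that set's key columns.
  val groupingSets: Seq[(String, Record => Seq[Any])] = Seq(
    ("(id, category)", (r: Record) => Seq(r.id, r.category)),
    ("(id, age)", (r: Record) => Seq(r.id, r.age)),
    ("(id, gender)", (r: Record) => Seq(r.id, r.gender))
  )

  // For every grouping set, group all rows by its key and count them.
  // rs.size equals count(cust_id) here only because no cust_id is null.
  def counts: Seq[(String, Map[Seq[Any], Int])] =
    groupingSets.map { case (label, key) =>
      label -> rows.groupBy(key).map { case (k, rs) => k -> rs.size }
    }

  def main(args: Array[String]): Unit =
    counts.foreach { case (label, byKey) =>
      println(s"group by $label: count of cust_id")
      byKey.toSeq.sortBy(_._1.mkString(",")).foreach { case (k, n) =>
        println(s"${k.mkString(" ")} $n")
      }
    }
}
```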

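This is just an example: I need to pass various different combinations to GROUPING SETS (not all combinations), either all at once or individually as parameters.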

Any help would be greatly appreciated.

Thanks a lot.

You can build the SQL dynamically:

// original slices
var slices = List("(col1, col2)", "(col3, col4)", "(col4, col5)")
// adding new slice
slices = "(col1, col5)" :: slices 
// building SQL dynamically
val q =
s"""
with t1 as
(select 1 col1, 2 col2, 3 col3,
        4 col4, 5 col5, 6 col6)
select col1,col2,col3,col4,col5,count(col6)
  from t1
group by col1,col2,col3,col4,col5
grouping sets ${slices.mkString("(", ",", ")")}
"""
// output
spark.sql(q).show

Result:

scala> spark.sql(q).show
+----+----+----+----+----+-----------+
|col1|col2|col3|col4|col5|count(col6)|
+----+----+----+----+----+-----------+
|   1|null|null|null|   5|          1|
|   1|   2|null|null|null|          1|
|null|null|   3|   4|null|          1|
|null|null|null|   4|   5|          1|
+----+----+----+----+----+-----------+
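Extending the same idea to the file-based setup from the question, here is a minimal sketch, assuming a hypothetical text file with one grouping set per line (e.g. `id,category`). Column names are identifiers, not values, so they generally cannot be supplied through query parameters; generating the SQL text is the usual route, which is why the sketch checks every name against an allow-list before splicing it in. `GroupingSetsFromFile`, `parseSets`, `groupingSetsClause`, `buildQuery` and `readSets` are illustrative names, not part of Spark.

```scala
import scala.io.Source

object GroupingSetsFromFile {
  // Hypothetical file format: one grouping set per line, columns separated by
  // commas (e.g. "id,category"); blank lines and lines starting with '#' are skipped.
  def parseSets(lines: Seq[String]): Seq[Seq[String]] =
    lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#"))
      .map(_.split(",").map(_.trim).filter(_.nonEmpty).toList)

  // Column names are spliced into the SQL text, so reject anything not on the allow-list.
  def groupingSetsClause(sets: Seq[Seq[String]], allowed: Set[String]): String = {
    require(sets.nonEmpty, "at least one grouping set is required")
    val unknown = sets.flatten.filterNot(allowed).distinct
    require(unknown.isEmpty, s"unknown column(s): ${unknown.mkString(", ")}")
    sets.map(_.mkString("(", ", ", ")")).mkString("GROUPING SETS (", ", ", ")")
  }

  // Builds the full query; `table` and `agg` are assumed to be trusted, fixed strings.
  def buildQuery(table: String, sets: Seq[Seq[String]], allowed: Set[String], agg: String): String = {
    val clause = groupingSetsClause(sets, allowed)
    val cols = sets.flatten.distinct.mkString(", ")
    s"SELECT $cols, $agg FROM $table GROUP BY $cols $clause"
  }

  // Reads grouping sets from a text file in the format described above.
  def readSets(path: String): Seq[Seq[String]] = {
    val src = Source.fromFile(path)
    try parseSets(src.getLines().toList)
    finally src.close()
  }
}
```

In spark-shell the result of `buildQuery("t1", GroupingSetsFromFile.readSets("grouping_sets.txt"), Set("id", "category", "age", "gender"), "count(cust_id)")` (file name hypothetical) can be passed to `spark.sql(...)`; adding a new combination then only means adding a line to the file.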

"...combinations of column sets to my SQL query as parameters"

The SQL is executed by Spark, not by the source database; it never reaches MySQL at all.

"I have already provided all the combinations"

If you want all possible combinations, you don't need GROUPING SETS. Just use CUBE:

SELECT ... FROM table GROUP BY CUBE (col1, col2, col3, col4, col5)
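CUBE over n columns is equivalent to GROUPING SETS over all 2^n subsets of those columns, including the empty set `()` (the grand total). The plain-Scala sketch below (hypothetical `CubeExpansion` object, no Spark) spells out that expansion, which also shows why CUBE is a poor fit when only a few chosen combinations are wanted: five columns already give 32 grouping sets.

```scala
object CubeExpansion {
  // Every subset of the columns (2^n of them), full set first, empty set last.
  def subsets(cols: List[String]): List[List[String]] = cols match {
    case Nil => List(Nil)
    case head :: tail =>
      val rest = subsets(tail)
      rest.map(head :: _) ++ rest
  }

  // The GROUPING SETS clause that CUBE(cols) is equivalent to; () is the grand total.
  def cubeAsGroupingSets(cols: List[String]): String =
    subsets(cols).map(_.mkString("(", ", ", ")")).mkString("GROUPING SETS (", ", ", ")")
}
```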
