将客户数据汇总到一行中

我在Apache Spark中创建了一个查询，希望获取多行客户数据，并将其汇总为一行，显示他们打开的产品类型。所以数据看起来像这样：

Customer   Product
1           Savings
1           Checking
1           Auto

最后看起来是这样的：

Customer     Product
1            Savings/Checking/Auto

查询当前仍有多行。我试过分组，但这并不能显示客户拥有的多个产品，相反，它只显示一个产品。

有没有一种方法可以做到这一点，那就是Apache Spark或SQL(它与Apache非常相似(？不幸的是，我没有MYSQL，也不认为它会为我安装它。

SELECT
"ACCOUNT"."account_customerkey" AS "account_customerkey",
max(
concat(case when Savings=1 then ' Savings'end,
case when Checking=1 then ' Checking  'end,
case when CD=1 then ' CD /'end,
case when IRA=1 then ' IRA /'end,
case when StandardLoan=1 then ' SL /'end,
case when Auto=1 then ' Auto /'end,
case when Mortgage=1 then ' Mortgage /'end,
case when CreditCard=1 then ' CreditCard 'end)) AS Description
FROM "ACCOUNT" "ACCOUNT" 
inner join (
SELECT 
"ACCOUNT"."account_customerkey" AS "customerkey",

CASE WHEN "ACCOUNT"."account_producttype"  = 'Savings' THEN 1 ELSE NULL END AS Savings,
CASE WHEN "ACCOUNT"."account_producttype"  = 'Checking' THEN 1 ELSE NULL END AS Checking,
CASE WHEN "ACCOUNT"."account_producttype"  = 'CD' THEN 1 ELSE NULL END AS CD,
CASE WHEN "ACCOUNT"."account_producttype"  = 'IRA' THEN 1 ELSE NULL END AS IRA,
CASE WHEN "ACCOUNT"."account_producttype"  = 'Standard Loan' THEN 1 ELSE NULL END AS StandardLoan,
CASE WHEN "ACCOUNT"."account_producttype"  = 'Auto' THEN 1 ELSE NULL END AS Auto,
CASE WHEN "ACCOUNT"."account_producttype"  = 'Mortgage' THEN 1 ELSE NULL END AS Mortgage,
CASE WHEN "ACCOUNT"."account_producttype"  = 'Credit Card' THEN 1 ELSE NULL END AS CreditCard
FROM "ACCOUNT" "ACCOUNT" 
)a on "account_customerkey" =a."customerkey"
GROUP BY
"ACCOUNT"."account_customerkey"

请尝试一下。

scala> df.show()
+--------+--------+
|Customer| Product|
+--------+--------+
|       1| Savings|
|       1|Checking|
|       1|    Auto|
|       2| Savings|
|       2|    Auto|
|       3|Checking|
+--------+--------+

scala> df.groupBy($"Customer").agg(collect_list($"Product").as("Product")).select($"Customer",concat_ws(",",$"Product").as("Product")).show(false)
+--------+---------------------+
|Customer|Product              |
+--------+---------------------+
|1       |Savings,Checking,Auto|
|3       |Checking             |
|2       |Savings,Auto         |
+--------+---------------------+

scala>

请参阅https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/collect_list和相关功能

您需要使用collect_list，它可用于SQL或%SQL。

%sql 
select id, collect_list(num) 
from t1
group by id

我使用了我自己的数据，你需要定制。只是以更原生的SQL形式进行演示。

相关内容

最新更新

热门标签：