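I'd like to know how to summarize/group my dataset by the field Purch Org, and how to put the word "#multiple#" in a column whenever its values differ within a group.
My final result should be one unique row per Doc Number.
I'm working in Databricks, so the solution can be SQL or Python.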
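Doc Number | Purch Org | Material_Group | purchasing_document | Procurement_Class_2 | Procurement_Category
---|---|---|---|---|---
465503991 | DL08XYW | 29130000 (Household washing supplies) | Value contract WK | |
465503991 | DL08XYW | 32031600 (Filters (laboratory)) | Value contract WK | DE (Consumables) | L072 (Laboratory consumables)
465503991 | DL08XYW | 32169000 (Laboratory supplies) | Value contract WK | D8 (Equipment) | L072 (Laboratory supplies)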
-- Backticks quote identifiers in Databricks SQL; a column keeps its value only when min = max within the group
select `Doc Number`
,case when min(`Purch Org`) <> max(`Purch Org`) then '#multiple#' else min(`Purch Org`) end as `Purch Org`
,case when min(`Material_Group`) <> max(`Material_Group`) then '#multiple#' else min(`Material_Group`) end as `Material_Group`
,case when min(`purchasing_document`) <> max(`purchasing_document`) then '#multiple#' else min(`purchasing_document`) end as `purchasing_document`
,case when min(`Procurement_Class_2`) <> max(`Procurement_Class_2`) then '#multiple#' else min(`Procurement_Class_2`) end as `Procurement_Class_2`
,case when min(`Procurement_Category`) <> max(`Procurement_Category`) then '#multiple#' else min(`Procurement_Category`) end as `Procurement_Category`
from t
group by `Doc Number`
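One thing worth checking before relying on this: min and max skip NULLs, so a group that mixes a value with NULLs comes back as that value rather than as '#multiple#'. The sketch below uses Python's built-in sqlite3 purely as a local stand-in for Databricks. The table t with columns doc_number and purch_org, and the rows in it, are made up for illustration, and the purch_org_null_aware column is only one possible way to flag such groups.

```python
import sqlite3

# sqlite3 is only a local stand-in here; table, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table t (doc_number integer, purch_org text)")
conn.executemany(
    "insert into t values (?, ?)",
    [(1, "A"), (1, None), (2, "A"), (2, "B")],
)

query = """
select doc_number
      -- plain check: NULLs are ignored by min/max
      ,case when min(purch_org) <> max(purch_org) then '#multiple#'
            else min(purch_org) end as purch_org
      -- NULL-aware check: also flag groups that mix NULL and non-NULL values
      ,case when min(purch_org) <> max(purch_org)
              or count(purch_org) between 1 and count(*) - 1 then '#multiple#'
            else min(purch_org) end as purch_org_null_aware
from t
group by doc_number
order by doc_number
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'A', '#multiple#'), (2, '#multiple#', '#multiple#')]
```

Whether the extra condition is wanted depends on how missing values should be treated; if NULL just means "not filled in", the plain check is probably fine.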
Another option (cleaner syntax, but the min <> max solution should perform better):
-- Same result using count(distinct ...); backticks quote identifiers in Databricks SQL
select `Doc Number`
,case when count(distinct `Purch Org`) > 1 then '#multiple#' else min(`Purch Org`) end as `Purch Org`
,case when count(distinct `Material_Group`) > 1 then '#multiple#' else min(`Material_Group`) end as `Material_Group`
,case when count(distinct `purchasing_document`) > 1 then '#multiple#' else min(`purchasing_document`) end as `purchasing_document`
,case when count(distinct `Procurement_Class_2`) > 1 then '#multiple#' else min(`Procurement_Class_2`) end as `Procurement_Class_2`
,case when count(distinct `Procurement_Category`) > 1 then '#multiple#' else min(`Procurement_Category`) end as `Procurement_Category`
from t
group by `Doc Number`
Doc Number | Purch Org | Material_Group | purchasing_document | Procurement_Class_2 | Procurement_Category
---|---|---|---|---|---
465503991 | DL08XYW | #multiple# | Value contract WK | |
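Since the same case expression is repeated for every column, it may be easier to generate the query from a list of column names. This is a minimal sketch, not part of the original answer: build_collapse_query is a hypothetical helper, and sqlite3 is used only to check the generated SQL locally with a cut-down version of the sample rows (sqlite happens to accept backtick-quoted identifiers as well).

```python
import sqlite3


def build_collapse_query(table, key, columns, marker="#multiple#"):
    """Build a GROUP BY query that keeps a column's value when it is constant
    within the group and replaces it with `marker` otherwise."""
    def quote(name):
        # Backtick-quote an identifier, doubling any embedded backtick.
        return "`" + name.replace("`", "``") + "`"

    # Quote the marker as a SQL string literal.
    literal = "'" + marker.replace("'", "''") + "'"
    items = [quote(key)]
    for col in columns:
        c = quote(col)
        items.append(
            f"case when min({c}) <> max({c}) then {literal} else min({c}) end as {c}"
        )
    return f"select {', '.join(items)} from {quote(table)} group by {quote(key)}"


# Local check with a reduced copy of the sample data (two of the five columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table t (`Doc Number` integer, `Purch Org` text, `Material_Group` text)"
)
conn.executemany(
    "insert into t values (?, ?, ?)",
    [
        (465503991, "DL08XYW", "29130000"),
        (465503991, "DL08XYW", "32031600"),
        (465503991, "DL08XYW", "32169000"),
    ],
)

sql = build_collapse_query("t", "Doc Number", ["Purch Org", "Material_Group"])
rows = conn.execute(sql).fetchall()
print(rows)  # [(465503991, 'DL08XYW', '#multiple#')]
```

In a Databricks notebook the generated string could presumably be passed to spark.sql(sql) instead of the sqlite connection, assuming the table is registered under the name given to the helper.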