Fill a column with values that satisfy conditions on other columns



I have a table like this:

con_no     pr_no     flag_1  flag_2
con_002   1000234      1       0
con_002   1000345     -1       0
con_002   1100200      1       1
con_002   1005890      0       0
con_003   1100367      0       0
con_003   1000445      1       1
con_003   1200334     -1       0
con_003   1140567      1       0

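Now I want another column filled with the pr_no that has flag_2 = 1, on the rows where flag_1 = 1 within the same con_no. So the resulting table should look like this: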

con_no     pr_no     flag_1  flag_2  Parent_pr
con_002   1000234      1       0      1100200 <---This belongs to con_002
con_002   1000345     -1       0      
con_002   1100200      1       1      1100200 <---This belongs to con_002
con_002   1005890      0       0
con_003   1100367      0       0
con_003   1000445      1       1      1000445 <---This belongs to con_003
con_003   1200334     -1       0
con_003   1140567      1       0      1000445 <---This belongs to con_003

How can I achieve this using Python (pandas) or SQL? The table actually lives in a PostgreSQL database, so a SQL query string would also work.

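There is one catch: the con_no values are reshuffled every time the program runs. Some pr_no values may end up attached to a new con_no or to an existing (but different) one.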

You should join the table to itself. If your data is in a table named my_table, a query like this should work:

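with cte as (
    -- parent rows: those with flag_2 = 1
    select con_no, pr_no
    from my_table
    where flag_2 = 1
)
select t.con_no, t.pr_no, t.flag_1, t.flag_2, cte.pr_no as parent_pr
from my_table as t
-- only rows with flag_1 = 1 receive a parent; all other rows get NULL
left outer join cte on (cte.con_no = t.con_no and t.flag_1 = 1);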
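One way to sanity-check the self-join against the sample data from the question, without a Postgres instance, is to run it on an in-memory SQLite database from Python. This is only a sketch: SQLite stands in for PostgreSQL here (an assumption, although the query uses only syntax that both support), and the `order by t.rowid` clause is added purely to keep the output in insertion order.

```python
import sqlite3

# Sample rows copied from the question: (con_no, pr_no, flag_1, flag_2)
rows = [
    ("con_002", 1000234, 1, 0),
    ("con_002", 1000345, -1, 0),
    ("con_002", 1100200, 1, 1),
    ("con_002", 1005890, 0, 0),
    ("con_003", 1100367, 0, 0),
    ("con_003", 1000445, 1, 1),
    ("con_003", 1200334, -1, 0),
    ("con_003", 1140567, 1, 0),
]

# In-memory SQLite database used as a stand-in for the Postgres table
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table "
    "(con_no text, pr_no integer, flag_1 integer, flag_2 integer)"
)
conn.executemany("insert into my_table values (?, ?, ?, ?)", rows)

query = """
with cte as (
    select con_no, pr_no
    from my_table
    where flag_2 = 1
)
select t.con_no, t.pr_no, t.flag_1, t.flag_2, cte.pr_no as parent_pr
from my_table as t
left outer join cte on (cte.con_no = t.con_no and t.flag_1 = 1)
order by t.rowid
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)  # parent_pr is None where flag_1 is not 1

conn.close()
```

Note that if a con_no ever had more than one row with flag_2 = 1, the join would return one output row per matching parent, so the result could contain duplicates.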

You can use window functions and conditional logic:

select t.*,
(case when flag_1 = 1
then max(case when flag_1 = 1 and flag_2 = 1 then pr_no end) over (partition by con_no)
end) as parent_pr
from t;

Actually, Postgres supports the filter syntax, so I would write it as:

select t.*,
(case when flag_1 = 1
then max(pr_no) filter (where flag_1 = 1 and flag_2 = 1) over (partition by con_no)
end) as parent_pr
from t;
