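I can't figure out how to do this. As the title explains, I want to group a DataFrame by the column acquired_month, but only for rows where another column contains Closed Won. In the example I made a helper column that is set to True only when that condition is met, though I'm not sure that step is necessary. For the rows that meet the condition, I want to sum the values of a third column, but I don't know how. Here is my code so far: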
us_lead_scoring.loc[us_lead_scoring['Stage'].str.contains('Closed Won'), 'closed_won_binary'] = True
acquired_date = us_lead_scoring.groupby('acquired_month')['closed_won_binary'].sum()
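However, this only sums the True/False column itself. What I want is to sum the other column for the rows where the True/False column is True, after grouping by acquired_month. Any pointers are welcome.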
If you need to aggregate the column col, use Series.where to replace the values in rows that do not match with 0, then aggregate with sum:
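import pandas as pd

# Sample data: only the first two rows contain the substring 'Closed Won'.
us_lead_scoring = pd.DataFrame({'Stage': ['Closed Won1', 'Closed Won2', 'Closed', 'Won'],
                                'col': [1, 3, 5, 6],
                                'acquired_month': [1, 1, 1, 2]})

# Keep col where Stage matches, use 0 elsewhere, then sum per acquired_month.
out = (us_lead_scoring['col'].where(us_lead_scoring['Stage']
                                    .str.contains('Closed Won'), 0)
       .groupby(us_lead_scoring['acquired_month'])
       .sum()
       .reset_index(name='SUM'))
print(out)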
   acquired_month  SUM
0               1    4
1               2    0
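
For comparison, here is a minimal sketch of a different approach: filter the rows first with a boolean mask, then group. It reuses the same sample data. The names mask, filtered, all_months and filtered_full are made up for illustration, and na=False is an assumption that rows with a missing Stage should count as non-matching. Unlike the Series.where version, filtering first drops months that have no matching rows, so the optional reindex step puts them back with a sum of 0.

```python
import pandas as pd

us_lead_scoring = pd.DataFrame({
    'Stage': ['Closed Won1', 'Closed Won2', 'Closed', 'Won'],
    'col': [1, 3, 5, 6],
    'acquired_month': [1, 1, 1, 2],
})

# Boolean mask: True where Stage contains the substring 'Closed Won'.
# na=False (an assumption) treats a missing Stage as "no match".
mask = us_lead_scoring['Stage'].str.contains('Closed Won', na=False)

# Filter first, then group and sum. Months without any matching row
# are absent from this result.
filtered = us_lead_scoring[mask].groupby('acquired_month')['col'].sum()

# Optional: restore every month seen in the data, filling gaps with 0.
all_months = us_lead_scoring['acquired_month'].unique()
filtered_full = filtered.reindex(all_months, fill_value=0)

print(filtered_full)
```

If months with no Closed Won rows should simply be left out, filtered alone is enough and no helper column is needed either way.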