How to count rows in one DataFrame whose column values match those in another DataFrame

I have the following sample data:

import pandas as pd

df1 = [[52, '1', '10'], [54, '1', '4'],
       [55, '2', '3'], [52, '1', '10'],
       [55, '2', '10'], [52, '1', '4']]

df = pd.DataFrame(df1, columns=['Cow', 'Lact', 'Procedure'])
df2 = [['52', '1'], ['53', '9'],
       ['54', '2'], ['55', '2']]

df2 = pd.DataFrame(df2, columns=['Cow', 'Lact'])
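Note that this sample data mixes key types: in df, Cow holds integers, while in df2 both Cow and Lact hold strings. The quick check below rebuilds the same two frames so it runs on its own and shows that a direct match on Cow finds nothing until both sides share a type. This is an illustration of the data as given, not part of the original question:

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame([[52, '1', '10'], [54, '1', '4'], [55, '2', '3'],
                   [52, '1', '10'], [55, '2', '10'], [52, '1', '4']],
                  columns=['Cow', 'Lact', 'Procedure'])
df2 = pd.DataFrame([['52', '1'], ['53', '9'], ['54', '2'], ['55', '2']],
                   columns=['Cow', 'Lact'])

print(df.dtypes)   # Cow is an integer column here
print(df2.dtypes)  # Cow and Lact hold strings here

# 52 (int) never equals '52' (str), so no Cow from df2 is found in df
direct = df2['Cow'].isin(df['Cow'])
# Casting df's Cow to str first makes the values comparable
as_str = df2['Cow'].isin(df['Cow'].astype(str))
print(direct.tolist())  # [False, False, False, False]
print(as_str.tolist())  # [True, False, True, True]
```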

The tables look like this:

df:

   Cow  Lact  Procedure
0   52     1         10
1   54     1          4
2   55     2          3
3   52     1         10
4   55     2         10
5   52     1          4

df2:

   Cow  Lact
0   52     1
1   53     9
2   54     2
3   55     2

For each Cow-Lact combination in df2, I want to count the rows in df where Procedure = 10, and then add a column called Tproc to df2 that holds that count.

The output I'm looking for is:

   Cow  Lact  Tproc
0   52     1      2
1   53     9      0
2   54     2      0
3   55     2      1
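Put differently, the task is a grouped count followed by a lookup that defaults to 0. A minimal plain-Python sketch of that logic, without pandas; the tuples just restate the sample data, and normalizing the keys to strings is an assumption made to get around the int/str mismatch:

```python
from collections import Counter

# (Cow, Lact, Procedure) rows of df and (Cow, Lact) keys of df2
df_rows = [(52, '1', '10'), (54, '1', '4'), (55, '2', '3'),
           (52, '1', '10'), (55, '2', '10'), (52, '1', '4')]
df2_keys = [('52', '1'), ('53', '9'), ('54', '2'), ('55', '2')]

# Step 1: count Procedure '10' per (Cow, Lact), keys normalized to str
counts = Counter((str(cow), str(lact))
                 for cow, lact, proc in df_rows if proc == '10')

# Step 2: look up each df2 key, defaulting to 0 when it never occurs
tproc = [counts.get(key, 0) for key in df2_keys]
print(tproc)  # [2, 0, 0, 1]
```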

The following filter doesn't work:

filt = [(df['Cow']==df2['Cow'])&(df['Lact']==df2['Lact'])&(df['Procedure']==10)]

My plan was then to use .len to get the count:

df2['Tproc'] = df2.loc[filt].len

How can I filter a DataFrame based on the values in another DataFrame and count the rows that meet a condition?
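The filter fails for a specific reason: == between two Series aligns them on their index labels before comparing, and df (labels 0 to 5) and df2 (labels 0 to 3) do not have the same labels, so pandas raises an error instead of matching rows by value. The attempt also wraps the mask in a list and relies on a .len attribute that DataFrame does not have; len(frame) or frame.shape[0] gives the row count. A small reproduction with the question's data:

```python
import pandas as pd

df = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                   'Lact': ['1', '1', '2', '1', '2', '1'],
                   'Procedure': ['10', '4', '3', '10', '10', '4']})
df2 = pd.DataFrame({'Cow': ['52', '53', '54', '55'],
                    'Lact': ['1', '9', '2', '2']})

# Series == Series requires identical index labels; df has 0..5, df2 has 0..3
error_message = None
try:
    df['Cow'] == df2['Cow']
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# DataFrame has no .len attribute; use len() or .shape[0] for a row count
print(hasattr(df, 'len'), len(df), df.shape[0])  # False 6 6
```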

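You can use value_counts on the filtered first DataFrame and reindex the result to df2's Cow-Lact pairs before assigning the values to the Tproc column (this uses the setup shown after the output, where both frames have integer Cow and Lact):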

df2['Tproc'] = (
df1[df1['Procedure'] == '10'].value_counts(['Cow', 'Lact'])
.reindex(pd.MultiIndex.from_frame(df2[['Cow', 'Lact']]), fill_value=0).values
)
print(df2)
# Output
   Cow  Lact  Tproc
0   52     1      2
1   53     9      0
2   54     2      0
3   55     2      1

Setup:

import pandas as pd

df1 = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                    'Lact': [1, 1, 2, 1, 2, 1],
                    'Procedure': ['10', '4', '3', '10', '10', '4']})
df2 = pd.DataFrame({'Cow': [52, 53, 54, 55], 'Lact': [1, 9, 2, 2]})
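To make the value_counts + reindex steps visible, here is the same approach split into its intermediate results, using the setup above. The counts Series only has entries for (52, 1) and (55, 2); reindex then fills the other df2 pairs with 0. Taking the raw values (.values in the answer, .to_numpy() here) matters because the reindexed Series carries a MultiIndex, so assigning the Series itself would try to align on that index rather than on df2's rows:

```python
import pandas as pd

df1 = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                    'Lact': [1, 1, 2, 1, 2, 1],
                    'Procedure': ['10', '4', '3', '10', '10', '4']})
df2 = pd.DataFrame({'Cow': [52, 53, 54, 55], 'Lact': [1, 9, 2, 2]})

# Step 1: counts per (Cow, Lact) among rows with Procedure '10'
counts = df1[df1['Procedure'] == '10'].value_counts(['Cow', 'Lact'])
print(counts)

# Step 2: reorder to df2's pairs; pairs missing from counts become 0
keys = pd.MultiIndex.from_frame(df2[['Cow', 'Lact']])
aligned = counts.reindex(keys, fill_value=0)
print(aligned)

# Step 3: assign by position, not by aligning the MultiIndex to df2's index
df2['Tproc'] = aligned.to_numpy()
print(df2)
```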

You can use merge + groupby + sum:

tmp = df2.merge(df.astype(str), on=['Cow','Lact'], how='left')
out = tmp['Procedure'].eq('10').groupby([tmp['Cow'], tmp['Lact']]).sum().reset_index(name='Tproc')

Output:

   Cow  Lact  Tproc
0   52     1      2
1   53     9      0
2   54     2      0
3   55     2      1
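Why unmatched pairs come out as 0 here: the left merge keeps every df2 row, rows with no partner in df get Procedure = NaN, and comparing NaN with '10' gives False, so those groups sum to 0. A self-contained run with the question's data that also prints the merged intermediate frame (note that Cow and Lact in the result are strings, because df was cast with astype(str)):

```python
import pandas as pd

df = pd.DataFrame([[52, '1', '10'], [54, '1', '4'], [55, '2', '3'],
                   [52, '1', '10'], [55, '2', '10'], [52, '1', '4']],
                  columns=['Cow', 'Lact', 'Procedure'])
df2 = pd.DataFrame([['52', '1'], ['53', '9'], ['54', '2'], ['55', '2']],
                   columns=['Cow', 'Lact'])

# Left merge keeps every df2 key; unmatched keys get Procedure = NaN
tmp = df2.merge(df.astype(str), on=['Cow', 'Lact'], how='left')
print(tmp)

# A missing Procedure never equals '10', so those groups sum to 0
out = (tmp['Procedure'].eq('10')
       .groupby([tmp['Cow'], tmp['Lact']])
       .sum()
       .reset_index(name='Tproc'))
print(out)
```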

Use groupby() + size(), then merge():

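out = df2.merge(
    df[df['Procedure'] == '10']
    .groupby(['Cow', 'Lact']).size()
    .reset_index(name='Tproc')
    .astype({'Cow': str, 'Lact': str}),  # cast only the join keys so Tproc stays numeric
    how='left',
    on=['Cow', 'Lact']
).fillna({'Tproc': 0}).astype({'Tproc': int})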

Output:

   Cow  Lact  Tproc
0   52     1      2
1   53     9      0
2   54     2      0
3   55     2      1
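For the general form of the question (count the rows of one DataFrame that match keys in another and satisfy a condition), the groupby/size/merge steps can be wrapped in a small function. count_matching below is a hypothetical helper, not a pandas API. It assumes the keys can be compared safely as strings (so int 52 and '52' are the same key), which would not hold for float keys such as 52.0:

```python
import pandas as pd


def count_matching(df, keys_df, on, mask, name):
    """Hypothetical helper: for each row of keys_df, count the rows of df
    that have the same values in the `on` columns and where `mask` is True.

    Assumption: keys are compared as strings, so 52 and '52' match.
    """
    left = keys_df.copy()
    left[on] = left[on].astype(str)
    # Count qualifying rows per key combination
    hits = (df.loc[mask, on].astype(str)
            .groupby(on).size()
            .rename(name)
            .reset_index())
    # Left merge keeps every key; keys without hits get NaN, replaced by 0
    out = left.merge(hits, on=on, how='left')
    out[name] = out[name].fillna(0).astype(int)
    return out


df = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                   'Lact': ['1', '1', '2', '1', '2', '1'],
                   'Procedure': ['10', '4', '3', '10', '10', '4']})
df2 = pd.DataFrame({'Cow': ['52', '53', '54', '55'],
                    'Lact': ['1', '9', '2', '2']})

result = count_matching(df, df2, ['Cow', 'Lact'], df['Procedure'] == '10', 'Tproc')
print(result)
```

Because the key columns are cast inside the helper, the same call works whether df2 stores Cow and Lact as strings (as in the question) or as integers (as in the first answer's setup).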
