I have two sample DataFrames:
import pandas as pd

df1 = pd.DataFrame()
df1['a1'] = ['ABC','ACC','BCC','ABC']
df1['b1'] = ['ACC','AAC','BAC','ACC']
df2 = pd.DataFrame()
df2['a2'] = ['ACC','BCC','ABC']
df2['b2'] = ['AAC','BAC','ACC']
df2['types'] = ['t1','t2','t3']  # type labels are strings
>>> df2
    a2   b2 types
0  ACC  AAC    t1
1  BCC  BAC    t2
2  ABC  ACC    t3
>>> df1
    a1   b1
0  ABC  ACC
1  ACC  AAC
2  BCC  BAC
3  ABC  ACC
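I want to take each row of df1 and iterate over df2 looking for a match. If a1 matches a2 and b1 matches b2, I want to count the matching type so that I can compute the probability of each type. For example, the first row of df1 matches the third row of df2, so I add 1 to the count for t3. With more data, I would like an efficient way to do this.
I tried: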
for ind in df1:
    compare_item1 = df1['a1'][ind]
    compare_item2 = df1['b1'][ind]
    for i in df2:
        count = 0
        if compare_item1 == df2['a2'][i] and compare_item2 == df2['b2'][i]:
            df1['t_{}'.format(i)] = count + 1
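My idea was to create a dummy variable t_i on each iteration, which I could then use for counting and further calculations. However, I do not get the expected df1 with the dummy variables. Any suggestions on how to fix this? Or is there a more efficient way to find the probabilities?
Thanks!
IIUC, use: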
df = df1.merge(df2, left_on=['a1','b1'], right_on=['a2','b2'])
print (df)
    a1   b1   a2   b2 types
0  ABC  ACC  ABC  ACC    t3
1  ABC  ACC  ABC  ACC    t3
2  ACC  AAC  ACC  AAC    t1
3  BCC  BAC  BCC  BAC    t2
df = df.groupby(['a1','b1','types']).size().reset_index(name='count')
print (df)
    a1   b1 types  count
0  ABC  ACC    t3      2
1  ACC  AAC    t1      1
2  BCC  BAC    t2      1
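If the goal is a probability per type rather than a raw count, the counts can be normalized. The sketch below assumes "probability" means the share of matched df1 rows that fall into each type; that reading is a guess at what the question intends, and the names `merged` and `probs` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a1': ['ABC', 'ACC', 'BCC', 'ABC'],
                    'b1': ['ACC', 'AAC', 'BAC', 'ACC']})
df2 = pd.DataFrame({'a2': ['ACC', 'BCC', 'ABC'],
                    'b2': ['AAC', 'BAC', 'ACC'],
                    'types': ['t1', 't2', 't3']})

# The default inner merge keeps only df1 rows that have a match in df2.
merged = df1.merge(df2, left_on=['a1', 'b1'], right_on=['a2', 'b2'])

# Assumption: probability = count of a type / number of matched rows.
# normalize=True divides each count by the total number of rows.
probs = merged['types'].value_counts(normalize=True)
print(probs)
```

With the sample data, t3 accounts for two of the four matched rows, so its share is 0.5, and t1 and t2 get 0.25 each. If unmatched df1 rows should count in the denominator, dividing `merged['types'].value_counts()` by `len(df1)` is one alternative.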