Iterating the elements of a pandas column over the elements of another column in a different dataframe of unequal length

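I have two pandas dataframes of unequal length. The first contains data on predicted protein modifications; the second contains data on experimentally validated protein modifications.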

The first dataframe contains the following columns:

protein_id
position_predicted
modification_predicted

…and looks like this:

protein_id	position_predicted	modification_predicted
prot1	135	+
prot1	267	+
prot1	360	-
prot2	59	++
prot2	135	+++
prot3	308	-
…

You can use merge. Reference: Pandas Merging 101

I assume that the index of both dataframes is unique. If it is not, use: df.reset_index()
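As a small illustration of the index assumption, the sketch below builds a frame with a duplicated index and rebuilds it. The frame `df_dup` and its values are made up for the example, and `drop=True` is my addition: without it, `reset_index()` also keeps the old index as an extra column.

```python
import pandas as pd

# Hypothetical frame whose index contains a duplicate label (0 appears twice).
df_dup = pd.DataFrame(
    {
        "protein_id": ["prot1", "prot1", "prot2"],
        "position_predicted": [135, 267, 59],
    },
    index=[0, 0, 1],
)
print(df_dup.index.is_unique)  # False

# Replace the index with a fresh 0..n-1 range; drop=True discards the old labels.
df_clean = df_dup.reset_index(drop=True)
print(df_clean.index.is_unique)  # True
```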

# Inner merge:
intersection = df_pred.merge(
    df_real,
    left_on=['protein_id', 'position_predicted'],
    right_on=['protein_id', 'position']
)
TP = intersection['modification_predicted'].str.contains('+', regex=False).sum()
FN = intersection['modification_predicted'].eq('-').sum()
# FN = len(intersection) - TP  # alternative
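The `regex=False` argument matters here because `+` is a regular-expression quantifier and is not a valid pattern on its own. A minimal sketch, using a made-up series `labels` with the same kind of values as `modification_predicted`:

```python
import re

import pandas as pd

# Hypothetical labels in the same style as the modification_predicted column.
labels = pd.Series(["+", "++", "+++", "-"])

# Literal substring match: any label containing at least one '+' counts as positive.
is_positive = labels.str.contains("+", regex=False)
print(is_positive.tolist())  # [True, True, True, False]

# A bare '+' is rejected by the regex engine, which is why regex=False is used.
try:
    re.compile("+")
    plus_is_valid_regex = True
except re.error:
    plus_is_valid_regex = False
print(plus_is_valid_regex)  # False
```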

Now select the rows that are present in one dataframe but not in the other:

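import pandas as pd

# merge() returns a new 0..n-1 index, so intersection.index does not point
# back at rows of df_pred or df_real; compare on the key columns instead.
pred_keys = pd.MultiIndex.from_frame(df_pred[['protein_id', 'position_predicted']])
real_keys = pd.MultiIndex.from_frame(df_real[['protein_id', 'position']])
unique_pred = df_pred[~pred_keys.isin(real_keys)]
unique_real = df_real[~real_keys.isin(pred_keys)]

TN = unique_pred['modification_predicted'].eq('-').sum()
FP = unique_pred['modification_predicted'].str.contains('+', regex=False).sum()
# FP = len(unique_pred) - TN  # alternative
Misc = len(unique_real)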

Result:

>>> TP, FN, TN, FP, Misc
(1, 0, 2, 3, 2)
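As an alternative sketch, all five counts can be read from a single outer merge with `indicator=True`, which adds a `_merge` column saying whether each row came from both frames, only the left one, or only the right one. The `df_pred` below uses the six rows shown in the question. The `df_real` rows are hypothetical, since the validated data is not shown, and were picked only so that the counts can be checked by hand.

```python
import pandas as pd

# The six predicted rows shown in the question.
df_pred = pd.DataFrame({
    "protein_id": ["prot1", "prot1", "prot1", "prot2", "prot2", "prot3"],
    "position_predicted": [135, 267, 360, 59, 135, 308],
    "modification_predicted": ["+", "+", "-", "++", "+++", "-"],
})

# Hypothetical validated sites (an assumption, not the asker's data).
df_real = pd.DataFrame({
    "protein_id": ["prot1", "prot2", "prot4"],
    "position": [135, 60, 10],
})

# Outer merge keeps every row from both frames; _merge records its origin.
merged = df_pred.merge(
    df_real,
    left_on=["protein_id", "position_predicted"],
    right_on=["protein_id", "position"],
    how="outer",
    indicator=True,
)

# Rows that exist only in df_real have no prediction, hence na=False.
predicted_pos = merged["modification_predicted"].str.contains("+", regex=False, na=False)
predicted_neg = merged["modification_predicted"].eq("-")

in_both = merged["_merge"].eq("both")
pred_only = merged["_merge"].eq("left_only")
real_only = merged["_merge"].eq("right_only")

TP = int((in_both & predicted_pos).sum())
FN = int((in_both & predicted_neg).sum())
TN = int((pred_only & predicted_neg).sum())
FP = int((pred_only & predicted_pos).sum())
Misc = int(real_only.sum())

print((TP, FN, TN, FP, Misc))  # (1, 0, 2, 3, 2)
```

Because every row is classified from the `_merge` column, this version does not depend on the index of either dataframe.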
