Pandas - TypeError: unhashable type: 'Series' when trying to find duplicates in a filtered DataFrame



I'm trying to find duplicates in a filtered DataFrame. My DataFrame:

Document type   Application number
0       Return    1658
1       Sale      1658
2       Return    1659
3       Sale      1659
4       Return    1659
5       Return    1660
6       Return    1660
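To reproduce this, the sample frame can be built as below. The column names are copied from the table; storing 'Application number' as integers is an assumption, since the question doesn't say whether the numbers are ints or strings.

```python
import pandas as pd

# Sample data from the question; column names copied verbatim.
# Assumption: 'Application number' holds integers, not strings.
df = pd.DataFrame({
    'Document type': ['Return', 'Sale', 'Return', 'Sale', 'Return', 'Return', 'Return'],
    'Application number': [1658, 1658, 1659, 1659, 1659, 1660, 1660],
})
print(df)
```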

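I need to find duplicated application numbers only among the rows whose document type is "Return", and write the comment "Duplicate is found" into the Comment column of those rows. This is what I need: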

Document type   Application number    Comment
0       Return    1658                  
1       Sale      1658
2       Return    1659                  //Duplicate is found
3       Sale      1659
4       Return    1659                  //Duplicate is found
5       Return    1660                  //Duplicate is found
6       Return    1660                  //Duplicate is found

But when I try to filter the DataFrame, I get the error TypeError: unhashable type: 'Series'. Here is my code:

def check_duplicated_app_nums(df,
                              col_app_num,
                              col_doc_type,
                              col_comments,
                              comment='Duplicate is found'):
    mask_doc_type = df[col_doc_type] == 'Return'
    mask_duplicate = df[mask_doc_type].duplicated(subset=col_app_num, keep=False)
    df.loc[mask_duplicate, col_comments] = df.apply(lambda x: '%s//%s' % (x[col_comments], comment), axis=1)
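Whatever line the TypeError comes from, one problem with this code is that mask_duplicate is built from the filtered frame, so its index holds only the Return rows (0, 2, 4, 5, 6), not all seven rows of df. A minimal sketch of one way to keep the filter-first approach: reindex the mask onto the full frame with Series.reindex, filling the filtered-out rows with False. Pre-filling the Comment column with empty strings is an assumption made here so the column exists before assignment.

```python
import pandas as pd

df = pd.DataFrame({
    'Document type': ['Return', 'Sale', 'Return', 'Sale', 'Return', 'Return', 'Return'],
    'Application number': [1658, 1658, 1659, 1659, 1659, 1660, 1660],
})

mask_doc_type = df['Document type'] == 'Return'
# duplicated() on the filtered frame returns a mask indexed only by the Return rows.
mask_duplicate = df[mask_doc_type].duplicated(subset='Application number', keep=False)
print(mask_duplicate.index.tolist())  # [0, 2, 4, 5, 6]

# Align the mask with the full frame; rows dropped by the filter become False.
full_mask = mask_duplicate.reindex(df.index, fill_value=False)

df['Comment'] = ''  # assumption: start with an empty comment column
df.loc[full_mask, 'Comment'] = '//Duplicate is found'
print(df)
```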

It runs without an error if I build mask_duplicate like this instead:

mask_duplicate = df.duplicated(subset=col_app_num, keep=False)

But then every row gets the comment:

Document type   Application number    Comment
0       Return    1658                  //Duplicate is found
1       Sale      1658                  //Duplicate is found
2       Return    1659                  //Duplicate is found
3       Sale      1659                  //Duplicate is found
4       Return    1659                  //Duplicate is found
5       Return    1660                  //Duplicate is found
6       Return    1660                  //Duplicate is found
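The likely reason every row is flagged: with subset=col_app_num only the number is compared, and in this sample every number occurs at least twice (1658, for example, appears once as Return and once as Sale), so keep=False marks all seven rows. A quick check, rebuilding the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Document type': ['Return', 'Sale', 'Return', 'Sale', 'Return', 'Return', 'Return'],
    'Application number': [1658, 1658, 1659, 1659, 1659, 1660, 1660],
})

# How often each application number occurs, across both document types.
counts = df['Application number'].value_counts().sort_index()
print(counts)

# Comparing on the number alone: a Return and a Sale with the same number
# are treated as duplicates of each other.
mask_duplicate = df.duplicated(subset='Application number', keep=False)
print(mask_duplicate.all())  # True
```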

How can I mark duplicates only in the rows I need?

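Use DataFrame.duplicated with both columns in subset, so a row only counts as a duplicate when both its document type and its application number repeat: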

m = df.duplicated(subset=['Document type', 'Application number'], keep=False)
df.loc[m, 'Comment'] = '//Duplicate is found'
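Because subset includes 'Document type', this marks every repeated (type, number) pair. With the sample data those happen to be exactly the Return rows, but if two Sale rows shared a number they would be flagged as well. A sketch of one way to restrict the comment to Return rows explicitly, by combining the duplicate mask with a type mask; the extra Sale row at index 7 is hypothetical data added only to show the difference.

```python
import pandas as pd

# Sample data from the question plus one hypothetical extra 'Sale' row for 1659.
df = pd.DataFrame({
    'Document type': ['Return', 'Sale', 'Return', 'Sale', 'Return', 'Return', 'Return', 'Sale'],
    'Application number': [1658, 1658, 1659, 1659, 1659, 1660, 1660, 1659],
})

is_return = df['Document type'].eq('Return')
# True for every row whose (type, number) pair occurs more than once.
is_dup_pair = df.duplicated(subset=['Document type', 'Application number'], keep=False)

# Keep only the repeated pairs whose type is Return.
m = is_return & is_dup_pair
df.loc[m, 'Comment'] = '//Duplicate is found'
print(df)
```

Here rows 3 and 7 (the two Sale rows for 1659) are duplicated pairs but stay without a comment.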

As a function:

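def check_duplicated_app_nums(df,
                              col_app_num,
                              col_doc_type,
                              col_comments,
                              comment='Duplicate is found'):
    # A row is a duplicate only if its document type AND application number both repeat.
    m = df.duplicated(subset=[col_doc_type, col_app_num], keep=False)
    # Modifies df in place; rows not marked get NaN if the column is newly created.
    df.loc[m, col_comments] = f'//{comment}'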

check_duplicated_app_nums(df, 'Application number', 'Document type', 'Comment')

Output:

Document type  Application number               Comment
0        Return                1658                   NaN
1          Sale                1658                   NaN
2        Return                1659  //Duplicate is found
3          Sale                1659                   NaN
4        Return                1659  //Duplicate is found
5        Return                1660  //Duplicate is found
6        Return                1660  //Duplicate is found
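The question's original code appended the note to whatever was already in the comment column ('%s//%s'). If that behaviour is wanted, here is a sketch of a variant that keeps existing text, restricts the mark to Return rows, and returns a copy instead of modifying df in place. The function name mark_duplicated_returns and the return-a-copy design are assumptions, not part of the answer above.

```python
import pandas as pd


def mark_duplicated_returns(df, col_app_num, col_doc_type, col_comments,
                            comment='Duplicate is found'):
    """Hypothetical variant: return a copy with '//<comment>' appended to the
    comment column of Return rows whose (type, number) pair repeats."""
    out = df.copy()
    if col_comments not in out.columns:
        out[col_comments] = ''
    m = (out[col_doc_type].eq('Return')
         & out.duplicated(subset=[col_doc_type, col_app_num], keep=False))
    # Existing text is kept; the new note is appended after '//'.
    out.loc[m, col_comments] = out.loc[m, col_comments].fillna('').astype(str) + '//' + comment
    return out


df = pd.DataFrame({
    'Document type': ['Return', 'Sale', 'Return', 'Sale', 'Return', 'Return', 'Return'],
    'Application number': [1658, 1658, 1659, 1659, 1659, 1660, 1660],
    'Comment': ['', '', 'checked', '', '', '', ''],
})
result = mark_duplicated_returns(df, 'Application number', 'Document type', 'Comment')
print(result)
```

Row 2 ends up as 'checked//Duplicate is found', rows 4 to 6 as '//Duplicate is found', and the input frame is left unchanged.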