Finding non-unique rows in a Pandas DataFrame



Suppose I have a pandas DataFrame like this:

[Table with columns Doctor, Patient, Days]

You can use groupby and keep only the groups that contain more than one row:

d = {key: group for key, group in df.groupby(['Doctor', 'Patient']) if len(group) > 1}
d
Out[36]: 
{('Aaron', 'Josh'):   Doctor Patient  Days
1  Aaron    Josh    46
2  Aaron    Josh    71, ('Jess', 'Manny'):   Doctor Patient  Days
3   Jess   Manny    55
4   Jess   Manny    85
5   Jess   Manny    46}
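If a single DataFrame is more convenient than a dict of groups, GroupBy.filter can keep the same groups in one step. A minimal sketch, assuming a small made-up DataFrame: the Aaron/Josh and Jess/Manny rows mirror the output above, and the Mike/Anna row at index 0 is hypothetical, added only so that one pair is unique:

```python
import pandas as pd

# Hypothetical sample data: the Mike/Anna row is invented so that one
# (Doctor, Patient) pair is unique; the other rows mirror the output above.
df = pd.DataFrame({
    'Doctor':  ['Mike', 'Aaron', 'Aaron', 'Jess', 'Jess', 'Jess'],
    'Patient': ['Anna', 'Josh', 'Josh', 'Manny', 'Manny', 'Manny'],
    'Days':    [20, 46, 71, 55, 85, 46],
})

# Keep every row whose (Doctor, Patient) group has more than one row.
# filter() returns a single DataFrame and preserves the original index labels.
non_unique = df.groupby(['Doctor', 'Patient']).filter(lambda g: len(g) > 1)
print(non_unique)
```

Here the Mike/Anna row is dropped and the remaining rows keep their labels 1 to 5, so they can be matched back to df directly.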

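You can use pd.DataFrame.duplicated, as in df.loc[df.duplicated()], to select rows in which every column value is duplicated.

To compare only specific columns, set the subset parameter. Note that with the default keep='first', the first occurrence of each duplicate is not flagged; keep=False flags every row that belongs to a non-unique combination:

rows = df.loc[df.duplicated(subset=['Doctor', 'Patient'], keep=False)]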
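To make the effect of keep concrete, here is a small sketch comparing the default with keep=False. The data is hypothetical: the Mike/Anna row is invented so that one pair is unique.

```python
import pandas as pd

# Hypothetical sample data with one unique pair (Mike/Anna)
df = pd.DataFrame({
    'Doctor':  ['Mike', 'Aaron', 'Aaron', 'Jess', 'Jess', 'Jess'],
    'Patient': ['Anna', 'Josh', 'Josh', 'Manny', 'Manny', 'Manny'],
    'Days':    [20, 46, 71, 55, 85, 46],
})
cols = ['Doctor', 'Patient']

# Default keep='first': only the second and later occurrences are flagged,
# so the first Aaron/Josh and the first Jess/Manny rows are missing.
repeats_only = df.loc[df.duplicated(subset=cols)]

# keep=False: every occurrence of a repeated pair is flagged.
all_non_unique = df.loc[df.duplicated(subset=cols, keep=False)]

print(repeats_only)
print(all_non_unique)
```

With this data, repeats_only holds index labels 2, 4 and 5, while all_non_unique holds 1 through 5.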

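Here is one approach, which counts the rows per (Doctor, Patient) pair and merges the pairs that occur more than once back onto df: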

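# Count rows per (Doctor, Patient) pair; 'Days' becomes True where the count exceeds 1
df2 = (df.groupby(['Doctor', 'Patient'])['Days'].count() > 1).reset_index()
# Keep only the pairs flagged True
df2 = df2[df2['Days']]
# Inner-join those pairs back onto df, then drop the boolean helper column
df.merge(df2, on=['Doctor', 'Patient'], suffixes=('', '_y')).drop(columns='Days_y')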
Doctor  Patient     Days
0   Aaron   Josh        46
1   Aaron   Josh        71
2   Jess    Manny       55
3   Jess    Manny       85
4   Jess    Manny       46
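A merge-free variant is also possible. The sketch below assumes the same column names and uses hypothetical sample data (the Mike/Anna row is invented). GroupBy.transform('count') broadcasts each pair's row count back onto the original rows, which is the same quantity the merge approach computes, so a boolean mask can select the non-unique rows directly:

```python
import pandas as pd

# Hypothetical sample data with one unique pair (Mike/Anna)
df = pd.DataFrame({
    'Doctor':  ['Mike', 'Aaron', 'Aaron', 'Jess', 'Jess', 'Jess'],
    'Patient': ['Anna', 'Josh', 'Josh', 'Manny', 'Manny', 'Manny'],
    'Days':    [20, 46, 71, 55, 85, 46],
})

# Per-row count of non-missing 'Days' values in that row's (Doctor, Patient) group,
# aligned to df's index
pair_count = df.groupby(['Doctor', 'Patient'])['Days'].transform('count')

# Select rows whose pair occurs more than once
result = df[pair_count > 1]
print(result)
```

Unlike the merge, which returns a fresh 0-based index, this keeps the original index labels of df.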
