How to return rows in pandas based on a specific condition involving two columns



I have this dataframe:

user_id     status_id       date_created
1           1               2018-02-14 11:49:07.429000-02:00
1           4               2018-02-19 12:51:43.622000-03:00
1           3               2018-02-15 09:21:23.116000-02:00
2           3               2018-02-19 12:52:08.646000-03:00
3           3               2016-08-29 11:02:39.449000-03:00
4           4               2016-08-29 11:18:31.742000-03:00
4           2               2018-02-21 10:43:45.747000-03:00
5           3               2018-02-15 09:34:57.478000-02:00
5           2               2018-02-19 11:52:16.629000-03:00

I want to return only the users that have a specific status_id and no other status. So for status_id=3, it should return the following:

user_id     status_id       date_created
2           3               2018-02-19 12:52:08.646000-03:00
3           3               2016-08-29 11:02:39.449000-03:00

I tried filtering for all users that have the status_id I need, but that also returns users that have more than one status_id:

> df.loc[df.user_id.isin(df.user_id.loc[df.status_id == 3])]
user_id     status_id       date_created
1           1               2018-02-14 11:49:07.429000-02:00
1           4               2018-02-19 12:51:43.622000-03:00
1           3               2018-02-15 09:21:23.116000-02:00
2           3               2018-02-19 12:52:08.646000-03:00
3           3               2016-08-29 11:02:39.449000-03:00
5           3               2018-02-15 09:34:57.478000-02:00
5           2               2018-02-19 11:52:16.629000-03:00

You can do this with transform('nunique'):

df[df.groupby('user_id').status_id.transform('nunique').eq(1)].loc[lambda x: x['status_id'] == 3, :]

More info:

# Number of unique status_id values within each user_id group, repeated on every row of the group.
# After this we only need the groups that contain a single value, i.e. the rows at index 3 and 4.
df.groupby('user_id').status_id.transform('nunique')
Out[426]: 
0    3
1    3
2    3
3    1
4    1
5    2
6    2
7    2
8    2
Name: status_id, dtype: int64
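
To make the steps above easy to try, here is a minimal self-contained sketch. It rebuilds the question's data with only the user_id and status_id columns (date_created is left out for brevity), and the names n_status and result are my own, not from the original answer.

```python
import pandas as pd

# The question's data, without the date_created column (omitted for brevity).
df = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 3, 4, 4, 5, 5],
    "status_id": [1, 4, 3, 3, 3, 4, 2, 3, 2],
})

# Number of distinct status_id values per user, broadcast back onto every row.
n_status = df.groupby("user_id")["status_id"].transform("nunique")

# Keep rows whose user has exactly one distinct status, and that status is 3.
result = df[n_status.eq(1) & df["status_id"].eq(3)]
print(result)
```

Combining both conditions into one boolean mask should give the same rows as chaining .loc with a lambda; it is just a bit easier to read.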

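You can use df.loc[df['status_id'] == 3] to select the rows whose status_id is 3. Note that on its own this also returns the status 3 rows of users 1 and 5, who have other statuses as well.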
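
If you want to start from that plain mask, one possible way to get the "only this status" behaviour is to drop every user that also appears with a different status. This is just a sketch of an alternative, not the method from either answer; the names plain, users_with_other and only_3 are made up for illustration, and date_created is again left out.

```python
import pandas as pd

# The question's data, without the date_created column (omitted for brevity).
df = pd.DataFrame({
    "user_id":   [1, 1, 1, 2, 3, 4, 4, 5, 5],
    "status_id": [1, 4, 3, 3, 3, 4, 2, 3, 2],
})

# Plain mask: every row with status_id 3, whatever else the user has.
plain = df.loc[df["status_id"] == 3]

# user_id values that occur on at least one row with a different status.
users_with_other = df.loc[df["status_id"] != 3, "user_id"]

# Keep only the status 3 rows whose user never appears with another status.
only_3 = plain[~plain["user_id"].isin(users_with_other)]
print(only_3)
```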

