Select values in Python using a count() condition



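I want to select all rows for every id whose rows contain two entries of Type 'E'. A given id can have many rows of Type 'S', but normally it has only one row of Type 'E'.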

For example, ID 1114 has two rows with 'Type': 'E', so all rows for 1114 should be shown.

DataFrame 1:

id /date /origine /destination /horaire A /horaire B/ Type /Other data
1112 2021-03-11 Paris / Marseille/10:00/14:00/A / ..
1112 2021-03-11 Paris / Marseille/10:00/14:00/E /..
1112 2021-03-11 Paris / Marseille/10:00/14:00/S /..
1112 2021-03-11 Paris / Lyon/10:00/12:00/S/..
1112 2021-03-11 Paris / Marseille/10:00/14:00/S/..
1112 2021-03-11 Paris / Marseille/10:00/14:00/C/..
1114 2021-05-11 Paris / Bordeaux/09:00/13:00/A/..
1114 2021-05-11 Paris / Bordeaux/09:00/13:00/E/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:20/14:00/E/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/C/..

Expected output:

id /date /origine /destination /horaire A /horaire B/ Type /Other data
1114 2021-05-11 Paris / Bordeaux/09:00/13:00/A/..
1114 2021-05-11 Paris / Bordeaux/09:00/13:00/E/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:20/14:00/E/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/C/..

I wrote this code:

mask = df.groupby(['date','Id']).apply(lambda x: x['Type'].value_counts())
data_set = df[((df['Type']=='E').isin(mask.index[mask > 1]))]
data_set 

But my output is empty.

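To count the 'E' values, create a helper column tmp and use transform with 'sum' to broadcast each group's count back to its rows: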

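# Flag rows with Type 'E', count the flags per (date, Id) group,
# and keep only the rows of groups that have more than one 'E'.
df = df[
    df.assign(tmp=df['Type'] == 'E')
      .groupby(['date', 'Id'])['tmp']
      .transform('sum')
      .gt(1)
]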
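
One likely reason the original attempt comes back empty is that `df['Type']=='E'` is a boolean Series, so `.isin(...)` compares True/False against the group labels in `mask.index` and never finds a match. Here is a minimal, self-contained sketch of the approach above on made-up sample data that mirrors the question. The column names `Id`, `date` and `Type` are assumed, the other columns are left out, and the cast to `int` is only there to make the count explicit. The last line shows `groupby(...).filter(...)` as an alternative way to select the same rows.

```python
import pandas as pd

# Hypothetical sample data modelled on the question (only the relevant columns).
df = pd.DataFrame({
    "Id":   [1112] * 6 + [1114] * 8,
    "date": ["2021-03-11"] * 6 + ["2021-05-11"] * 8,
    "Type": ["A", "E", "S", "S", "S", "C",
             "A", "E", "S", "E", "S", "S", "S", "C"],
})

# 1 where the row is of Type 'E', 0 otherwise.
is_e = df["Type"].eq("E").astype(int)

# Number of 'E' rows in each (date, Id) group, repeated on every row of that group.
e_count = is_e.groupby([df["date"], df["Id"]]).transform("sum")

# Keep every row that belongs to a group with more than one 'E'.
result = df[e_count.gt(1)]
print(result)

# Alternative: let filter() keep or drop whole groups.
alt = df.groupby(["date", "Id"]).filter(lambda g: g["Type"].eq("E").sum() > 1)
```

With this sample, only the rows for Id 1114 remain, because that is the only group with two 'E' rows. `transform` is usually the faster choice on large frames since it avoids calling a Python lambda per group, while `filter` reads more directly.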
