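I want to select all rows for every id whose rows contain two values of Type 'E'. An id can have many rows of Type 'S', but normally only one row of Type 'E'.
For example, ID 1114 has two rows with Type 'E', so all rows for 1114 should be displayed.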
DataFrame 1:
id /date /origine /destination /horaire A /horaire B/ Type /Other data
1112 2021-03-11 Paris / Marseille/10:00/14:00/A / ..
1112 2021-03-11 Paris / Marseille/10:00/14:00/E /..
1112 2021-03-11 Paris / Marseille/10:00/14:00/S /..
1112 2021-03-11 Paris / Lyon/10:00/12:00/S/..
1112 2021-03-11 Paris / Marseille/10:00/14:00/S/..
1112 2021-03-11 Paris / Marseille/10:00/14:00/C/..
1114 2021-05-11 Paris / Bordeaux/09:00/13:00/A/..
1114 2021-05-11 Paris / Bordeaux/09:00/13:00/E/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:20/14:00/E/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/C/..
Expected output:
id /date /origine /destination /horaire A /horaire B/ Type /Other data
1114 2021-05-11 Paris / Bordeaux/09:00/13:00/A/..
1114 2021-05-11 Paris / Bordeaux/09:00/13:00/E/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:20/14:00/E/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/S/..
1114 2021-05-11 Paris / Bordeaux/10:00/14:00/C/..
I wrote this code:
mask = df.groupby(['date','Id']).apply(lambda x: x['Type'].value_counts())
data_set = df[((df['Type']=='E').isin(mask.index[mask > 1]))]
data_set
But my output is empty.
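To count the 'E' values, create a helper column tmp that is True where Type is 'E', sum it per group with transform('sum'), and keep the rows whose group count is greater than 1: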
df = (df[df.assign(tmp = df['Type']=='E')
.groupby(['date','Id'])['tmp'].transform('sum').gt(1)])
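To check the approach end to end, here is a minimal sketch on a small sample. The sample frame is an assumption: it mirrors only the Id, date and Type columns of the rows in the question, and the names is_e, e_count and result are made up for illustration. It uses the same idea as above (flag the 'E' rows, sum the flags per group with transform, filter on the count), written without overwriting df.

```python
import pandas as pd

# Hypothetical sample mirroring the question's rows (only the columns needed here).
df = pd.DataFrame({
    "Id":   [1112] * 6 + [1114] * 8,
    "date": ["2021-03-11"] * 6 + ["2021-05-11"] * 8,
    "Type": ["A", "E", "S", "S", "S", "C",
             "A", "E", "S", "E", "S", "S", "S", "C"],
})

# Boolean flag: True where the row has Type 'E'.
is_e = df["Type"].eq("E")

# Count the 'E' rows per (date, Id) group; transform broadcasts the
# group count back to every row, so it aligns with df's index.
e_count = is_e.groupby([df["date"], df["Id"]]).transform("sum")

# Keep every row that belongs to a group with more than one 'E'.
result = df[e_count.gt(1)]
print(result)
```

Because transform returns one value per original row, the comparison yields a boolean mask of the same length as df, which is what row filtering with df[...] expects. If exactly two 'E' rows are required rather than at least two, .eq(2) could replace .gt(1).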