Current DataFrame:
key status score
A initial 2
A in-between 3
A intermediate 6
A pre-final 2
A final 3
B initial 1
B intermediate 7
B final 8
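To reproduce the example, the sample DataFrame above can be built as follows. This is a minimal sketch: the column dtypes are assumed, and the default RangeIndex (0 to 7) matches the row labels printed in the answers.

```python
import pandas as pd

# Sample data from the question; the default RangeIndex (0..7)
# matches the row labels shown in the answers' printed output.
df = pd.DataFrame({
    "key": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "status": ["initial", "in-between", "intermediate", "pre-final",
               "final", "initial", "intermediate", "final"],
    "score": [2, 3, 6, 2, 3, 1, 7, 8],
})
print(df)
```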
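I want to get two separate DataFrames: DataFrame 1 should contain only the rows that come before the "intermediate" status within each key, and DataFrame 2 should contain only the rows that come after it.
Expected DataFrames:
DataFrame 1: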
key status score
A initial 2
A in-between 3
B initial 1
DataFrame 2:
key status score
A pre-final 2
A final 3
B final 8
Attempt:
cond = df.status.eq("intermediate")
mask = cond.groupby(df.key).cummax().shift(fill_value=False)
dataframe_1 = df[~mask]
dataframe_2 = df[mask]
With this, I don't get the expected DataFrames.
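Printing the intermediate masks makes the two problems with the attempt visible. This diagnostic sketch reruns the question's code on the sample data (the `per_key` name is introduced here only for illustration). `Series.shift` is applied to the whole column rather than within each `key`, so the True from A's last row spills into B's first row; and because the shift moves the start of each True run one row later, the "intermediate" rows themselves end up in `dataframe_1`.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "status": ["initial", "in-between", "intermediate", "pre-final",
               "final", "initial", "intermediate", "final"],
    "score": [2, 3, 6, 2, 3, 1, 7, 8],
})

cond = df.status.eq("intermediate")
# True from the first "intermediate" onward, computed per key
per_key = cond.groupby(df.key).cummax().astype(bool)
# Series.shift is NOT grouped: it shifts the whole column by one row,
# so A's trailing True leaks into B's first row (index 5).
mask = per_key.shift(fill_value=False)

print(pd.DataFrame({"status": df.status, "per_key": per_key, "mask": mask}))
print(df[~mask])  # still contains both "intermediate" rows (index 2 and 6)
print(df[mask])   # wrongly contains "B initial" (index 5)
```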
Remove the shift from the first mask, and for the second one also filter out the "intermediate" rows:
cond = df.status.eq("intermediate")
mask = cond.groupby(df.key).cummax()
dataframe_1 = df[~mask]
print (dataframe_1)
key status score
0 A initial 2
1 A in-between 3
5 B initial 1
dataframe_2 = df[mask & ~cond]
print (dataframe_2)
key status score
3 A pre-final 2
4 A final 3
7 B final 8
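The same fix can be wrapped in a small reusable function. `split_at_marker` and its parameters are hypothetical names chosen for this sketch, not a pandas API; it assumes, as in the expected output, that the marker row should be dropped from both results.

```python
import pandas as pd


def split_at_marker(df: pd.DataFrame, key: str, col: str, marker: str):
    """Hypothetical helper: split each `key` group at the first `marker` row.

    Returns (before, after); the marker row itself is in neither frame.
    """
    cond = df[col].eq(marker)
    # True from the first marker row onward, computed separately per key
    seen = cond.groupby(df[key]).cummax().astype(bool)
    before = df[~seen]
    after = df[seen & ~cond]  # drop the marker row(s) themselves
    return before, after


df = pd.DataFrame({
    "key": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "status": ["initial", "in-between", "intermediate", "pre-final",
               "final", "initial", "intermediate", "final"],
    "score": [2, 3, 6, 2, 3, 1, 7, 8],
})

dataframe_1, dataframe_2 = split_at_marker(df, "key", "status", "intermediate")
print(dataframe_1)
print(dataframe_2)
```

Because the per-key cumulative maximum is computed with `groupby`, no shift is needed and nothing crosses a key boundary.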
You can use groupby.cummin after marking the "intermediate" rows as False, then use boolean indexing:
m1 = df['status'].ne('intermediate')
m2 = m1.groupby(df['key']).cummin()
dataframe_1 = df[m2]
# key status score
# 0 A initial 2
# 1 A in-between 3
# 5 B initial 1
dataframe_2 = df[m1 & ~m2]
# key status score
# 3 A pre-final 2
# 4 A final 3
# 7 B final 8
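A quick check of how the cummin masks behave on inputs the question does not cover, using made-up data (keys C and D and the "middle" status are hypothetical). A key with no "intermediate" row keeps all of its rows in `dataframe_1`; for a key with several "intermediate" rows, the split happens at the first one and any later "intermediate" rows are removed by `m1`.

```python
import pandas as pd

# Hypothetical data (not from the question): key C has no "intermediate",
# key D has two of them.
df = pd.DataFrame({
    "key": ["C", "C", "D", "D", "D", "D", "D"],
    "status": ["initial", "final", "initial", "intermediate", "middle",
               "intermediate", "final"],
    "score": [1, 2, 3, 4, 5, 6, 7],
})

m1 = df["status"].ne("intermediate")
# cummin stays True only until the first "intermediate" within each key
m2 = m1.groupby(df["key"]).cummin().astype(bool)

dataframe_1 = df[m2]        # rows before the first "intermediate" (all of C)
dataframe_2 = df[m1 & ~m2]  # rows after it, minus any later "intermediate"
print(dataframe_1)
print(dataframe_2)
```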
import pandas as pd
def function1(dd: pd.DataFrame):
    # col2 counts the "intermediate" rows seen so far in this key; then drop the markers
    return (dd.assign(col2=dd.status.isin(['intermediate']).cumsum())
              .query("status!='intermediate'"))
(df.groupby('key').apply(function1)
   .reset_index(drop=True)
   .groupby('col2').apply(lambda dd: print(dd.iloc[:, :3])))
key status score
0 A initial 2
1 A in-between 3
4 B initial 1
key status score
2 A pre-final 2
3 A final 3
5 B final 8
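If the two frames are needed as variables rather than printed, the same segment-numbering idea can collect them in a dict. This sketch swaps the `groupby.apply` calls for a vectorized `groupby(...).cumsum()`, which also sidesteps the pandas 2.2 deprecation warning about `apply` operating on the grouping columns; `seg` and `parts` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "A", "A", "A", "A", "B", "B", "B"],
    "status": ["initial", "in-between", "intermediate", "pre-final",
               "final", "initial", "intermediate", "final"],
    "score": [2, 3, 6, 2, 3, 1, 7, 8],
})

is_marker = df["status"].eq("intermediate")
# Segment number per key: 0 before the first marker, 1 after it, ...
seg = is_marker.groupby(df["key"]).cumsum()
out = df.assign(col2=seg)[~is_marker]  # drop the marker rows themselves

# Collect each segment as its own DataFrame instead of printing it.
parts = {k: g.drop(columns="col2") for k, g in out.groupby("col2")}
dataframe_1 = parts[0]
dataframe_2 = parts[1]
print(dataframe_1)
print(dataframe_2)
```

Unlike the version above, this keeps the original row labels. If a key contained more than one "intermediate" row, additional entries (2, 3, ...) would appear in `parts`.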