Python: Need to reindex the rows of a DataFrame within each group w.r.t. column values



I have the following input DataFrame df:

Flow        Side        Row
Banana      Front       Row 1
PP Banana   Front       Row 1
Peach       Front       Row 1
Red Grape   Right End   Row 1
Kp Mango    Back        Row 1
PP Kp Mango Back        Row 1
Tomato      Left End    Row 1
Cucum       Front       Row 2
PP Cucum    Front       Row 2
Bunch       Right End   Row 2
Capsicum    Back        Row 2
Romo        Left End    Row 2

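Scenario: if df['Flow'].str.contains('PP') & (df['Side'] == 'Front'), that row needs to be moved so that it becomes the last row of Side "Back" within the same Row value, as in the examples of the flows "PP Banana" and "PP Cucum". Note: the order of the other flows should not change. Also, for every value in the "Row" column, the order of the "Side" column should always be Front, Right End, Back, Left End.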

Expected output:

Flow        Side        Row
Banana      Front       Row 1
Peach       Front       Row 1
Red Grape   Right End   Row 1
Kp Mango    Back        Row 1
PP Kp Mango Back        Row 1
PP Banana   Back        Row 1
Tomato      Left End    Row 1
Cucum       Front       Row 2
Bunch       Right End   Row 2
Capsicum    Back        Row 2
PP Cucum    Back        Row 2
Romo        Left End    Row 2

Any help would be greatly appreciated!

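You can use masks to extract the rows that need to be moved together with the trailing ("Left End") rows, then use concat to join everything back together and reorder by group.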

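import pandas as pd

# rows to move: 'PP' flows currently on the 'Front' side
m1 = df['Flow'].str.contains('PP') & df['Side'].eq('Front')
# trailing rows that must stay last within each group
m2 = df['Side'].eq('Left End')
mask = m1 | m2
out = (pd.concat([df[~mask], df[mask].replace({'Side': {'Front': 'Back'}})])
         # a stable sort keeps the concatenated order within each Row
         .sort_values(by='Row', kind='stable')
      )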

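Alternatively, if you don't want to sort the groups at the end and would rather process each group independently, you can use a custom function: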

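def move(d):
    # 'PP' flows on the 'Front' side, plus the trailing 'Left End' rows
    m1 = d['Flow'].str.contains('PP') & d['Side'].eq('Front')
    m2 = d['Side'].eq('Left End')
    mask = m1 | m2
    # untouched rows first, then the extracted rows with 'Front' relabelled as 'Back'
    return pd.concat([d[~mask], d[mask].replace({'Side': {'Front': 'Back'}})])

out = df.groupby('Row', group_keys=False).apply(move)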
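How `groupby(...).apply` treats the grouping column has changed across pandas releases, so a version-independent variant may be useful. The sketch below iterates over the groups explicitly, which yields each sub-frame with the `Row` column intact, and concatenates the results. The `df` construction is reproduced from the question so the block runs on its own; `sort=False` is my assumption that groups should stay in order of first appearance.

```python
import pandas as pd

# Input frame reproduced from the question
df = pd.DataFrame({
    'Flow': ['Banana', 'PP Banana', 'Peach', 'Red Grape', 'Kp Mango',
             'PP Kp Mango', 'Tomato', 'Cucum', 'PP Cucum', 'Bunch',
             'Capsicum', 'Romo'],
    'Side': ['Front', 'Front', 'Front', 'Right End', 'Back', 'Back',
             'Left End', 'Front', 'Front', 'Right End', 'Back', 'Left End'],
    'Row': ['Row 1'] * 7 + ['Row 2'] * 5,
})

def move(d):
    # 'PP' flows on the 'Front' side, plus the trailing 'Left End' rows
    m1 = d['Flow'].str.contains('PP') & d['Side'].eq('Front')
    m2 = d['Side'].eq('Left End')
    mask = m1 | m2
    # untouched rows first, then the extracted rows with 'Front' relabelled as 'Back'
    return pd.concat([d[~mask], d[mask].replace({'Side': {'Front': 'Back'}})])

# Iterate over the groups in order of first appearance and reassemble them
out = pd.concat([move(g) for _, g in df.groupby('Row', sort=False)])
print(out)
```

This should give the same row order as the `apply` version, with the original index labels preserved.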

Output:

           Flow       Side    Row
0        Banana      Front  Row 1
2         Peach      Front  Row 1
3     Red Grape  Right End  Row 1
4      Kp Mango       Back  Row 1
5   PP Kp Mango       Back  Row 1
1     PP Banana       Back  Row 1
6        Tomato   Left End  Row 1
7         Cucum      Front  Row 2
9         Bunch  Right End  Row 2
10     Capsicum       Back  Row 2
8      PP Cucum       Back  Row 2
11         Romo   Left End  Row 2
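Both approaches above rely on the input already being ordered Front, Right End, Back, Left End within each Row. If that cannot be assumed, one possible variant is to enforce the Side order explicitly with sort keys. This is only a sketch: the names `side_order`, `moved`, and `keys` are mine, and it assumes every Side value appears in `side_order` and that the Row labels sort correctly as strings (true for "Row 1" and "Row 2", not for "Row 10").

```python
import pandas as pd

# Input frame reproduced from the question
df = pd.DataFrame({
    'Flow': ['Banana', 'PP Banana', 'Peach', 'Red Grape', 'Kp Mango',
             'PP Kp Mango', 'Tomato', 'Cucum', 'PP Cucum', 'Bunch',
             'Capsicum', 'Romo'],
    'Side': ['Front', 'Front', 'Front', 'Right End', 'Back', 'Back',
             'Left End', 'Front', 'Front', 'Right End', 'Back', 'Left End'],
    'Row': ['Row 1'] * 7 + ['Row 2'] * 5,
})

# Required order of sides within each Row (taken from the question)
side_order = ['Front', 'Right End', 'Back', 'Left End']

# Rows to move: 'PP' flows currently on the 'Front' side
moved = df['Flow'].str.contains('PP') & df['Side'].eq('Front')

out = df.copy()
out.loc[moved, 'Side'] = 'Back'

# Sort keys: Row, then position of Side in side_order, then moved rows
# after the original ones, then original position as the final tie-breaker
keys = pd.DataFrame({
    'row': out['Row'],
    'side': pd.Categorical(out['Side'], categories=side_order, ordered=True).codes,
    'moved': moved,
    'pos': list(range(len(out))),
}, index=out.index)

out = out.loc[keys.sort_values(['row', 'side', 'moved', 'pos']).index]
print(out)
```

Because the original position is the last key, the result does not depend on the sort algorithm being stable, and the relative order of the flows that are not moved is kept.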
