I have a fairly complex question about how to add a new column, with conditions, for each group. Here is the example DataFrame:
df = pd.DataFrame({
'id': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB', 'BB',
'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC'],
'From_num': [80, 68, 751, 'Issued', 32, 68, 126, 'Issued', 'Missed', 105, 68, 114, 76, 68, 99, 'Missed'],
'To_num':[99, 80, 68, 751, 105, 32, 68, 126, 49, 324, 105, 68, 114, 76, 68, 99],
})
    id From_num  To_num
0   AA       80      99
1   AA       68      80
2   AA      751      68
3   AA   Issued     751
4   BB       32     105
5   BB       68      32
6   BB      126      68
7   BB   Issued     126
8   BB   Missed      49
9   CC      105     324
10  CC       68     105
11  CC      114      68
12  CC       76     114
13  CC       68      76
14  CC       99      68
15  CC   Missed      99
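I have a "flag" number, 68. Within each group, every row at or above the row where this flag number appears in the "From_num" column should be labelled "Forward" in a new column, and every row at or below the row where the flag number appears in the "To_num" column should be labelled "Back" in the same column. The hardest case, however, is when the flag number appears more than once in each column: the rows between those "From_num" and "To_num" occurrences should then be labelled "Forward&Back" in the new column. Please see the df and the expected result below.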
Expected result
    id From_num  To_num     Direction
0   AA       80      99       Forward
1   AA       68      80       Forward
2   AA      751      68          Back
3   AA   Issued     751          Back
4   BB       32     105       Forward
5   BB       68      32       Forward
6   BB      126      68          Back
7   BB   Issued     126          Back
8   BB   Missed      49          Back
9   CC      105     324       Forward
10  CC       68     105       Forward
11  CC      114      68  Forward&Back   # From row 11 to 13, flag number 68 appears more than once,
12  CC       76     114  Forward&Back   # so rows 11, 12 and 13 are labelled "Forward&Back"
13  CC       68      76  Forward&Back
14  CC       99      68          Back
15  CC   Missed      99          Back
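I have tried writing many loops, but they all failed to produce the expected result. If anyone has an idea, please help. I hope the question is clear. Many thanks!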
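I don't use any "real loops" here.
- keep the row number (reset_index())
- construct a new DataFrame containing only the records that include the flag (68)
- the simple "Forward" / "Back" logic depends on whether a row comes before or after the first sighting of 68
- "Forward&Back" applies when there are more than two sightings, to the rows from the 2nd sighting up to (but not including) the last one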
import pandas as pd

def direction(r):
    # Row numbers (from reset_index) of the flag rows in this row's group
    flagrow = df2[df2["id"] == r["id"]]["index"].values
    # Up to and including the first sighting: "Forward"; after it: "Back"
    if r["index"] <= flagrow[0]:
        val = "Forward"
    else:
        val = "Back"
    # More than two sightings: rows from the 2nd sighting up to (not including) the last
    if len(flagrow) > 2 and flagrow[1] <= r["index"] < flagrow[-1]:
        val = "Forward&Back"
    return val

df = pd.DataFrame({
    'id': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB', 'BB',
           'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC'],
    'From_num': [80, 68, 751, 'Issued', 32, 68, 126, 'Issued', 'Missed', 105, 68, 114, 76, 68, 99, 'Missed'],
    'To_num': [99, 80, 68, 751, 105, 32, 68, 126, 49, 324, 105, 68, 114, 76, 68, 99],
})
df = df.reset_index()  # keep the row number as an "index" column
df2 = df[(df.From_num == 68) | (df.To_num == 68)].copy()  # rows that contain the flag
df["Direction"] = df.apply(direction, axis=1)
df
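If you want to avoid apply as well, the same rules can probably be written with grouped cumulative counts. This is only a sketch of that idea, not a tested replacement: the names FLAG, is_flag, seen, before and total are my own, and I am assuming that a group with no sighting of the flag at all should end up entirely "Forward" (the function above would raise an IndexError for such a group).

```python
import pandas as pd

FLAG = 68  # the flag number from the question

df = pd.DataFrame({
    'id': ['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'BB', 'BB', 'BB',
           'CC', 'CC', 'CC', 'CC', 'CC', 'CC', 'CC'],
    'From_num': [80, 68, 751, 'Issued', 32, 68, 126, 'Issued', 'Missed',
                 105, 68, 114, 76, 68, 99, 'Missed'],
    'To_num': [99, 80, 68, 751, 105, 32, 68, 126, 49,
               324, 105, 68, 114, 76, 68, 99],
})

# 1 on rows that contain the flag in either column, 0 elsewhere
is_flag = ((df["From_num"] == FLAG) | (df["To_num"] == FLAG)).astype(int)

# Per group: sightings up to and including the current row
seen = is_flag.groupby(df["id"]).cumsum()
# Per group: sightings strictly before the current row
before = seen - is_flag
# Per group: total number of sightings
total = is_flag.groupby(df["id"]).transform("sum")

# Default is "Back"; rows up to and including the first sighting become "Forward"
result = pd.Series("Back", index=df.index)
result = result.mask(before == 0, "Forward")
# More than two sightings: from the 2nd sighting up to (not including) the last one
result = result.mask((total > 2) & (seen >= 2) & (seen < total), "Forward&Back")

df["Direction"] = result
print(df)
```

On the sample data this gives the same Direction column as the expected result in the question. It relies on the rows already being in the order shown, since the cumulative counts follow row order within each id.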