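I need to fill in some values in a DataFrame based on the ID and DATE_TWO columns. Once DATE_TWO >= DATE_ONE for a row, any subsequent DATE_TWO values for that ID should be filled with that first DATE_TWO value. Here is the original DataFrame: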
ID  EVENT  DATE_ONE   DATE_TWO
1   13     3/1/2021
1   20     3/5/2021   3/5/2021
1   32     3/6/2021
1   43     3/7/2021
2   1      3/3/2021
2   2      4/5/2021
3   1      3/1/2021
3   12     3/7/2021   3/7/2021
3   13     3/9/2021
3   15     3/14/2021
I changed your data slightly so we can see how it works.
Data
import pandas as pd
import numpy as np
data = {'ID': [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
        'EVENT': [12, 20, 32, 43, 1, 2, 1, 12, 13, 15],
        'DATE_ONE': ['3/1/2021', '3/5/2021', '3/6/2021', '3/7/2021', '3/3/2021', '4/5/2021',
                     '3/1/2021', '3/7/2021', '3/9/2021', '3/14/2021'],
        'DATE_TWO': ['', '3/5/2021', '', '', '', '', '3/7/2021', '', '3/7/2021', '']}
df = pd.DataFrame(data)
df["DATE_ONE"] = pd.to_datetime(df["DATE_ONE"])
df["DATE_TWO"] = pd.to_datetime(df["DATE_TWO"])
# Sort by ID and DATE_ONE so the forward fill below runs in chronological order
df = df.sort_values(["ID", "DATE_ONE"]).reset_index(drop=True)
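The condition in the next step tests DATE_TWO.notnull() explicitly. A small check shows why that matters: blank strings are parsed as NaT, and any comparison against NaT evaluates to False rather than to a missing value. The two-row sample here is made up for illustration and is not part of the data above.

```python
import pandas as pd

# Hypothetical two-row sample: one blank DATE_TWO, one real date.
dates_one = pd.to_datetime(pd.Series(["3/1/2021", "3/5/2021"]))
dates_two = pd.to_datetime(pd.Series(["", "3/5/2021"]))

# The blank string was parsed as NaT.
print(dates_two.isna().tolist())

# Comparisons against NaT are False in both directions, so "<=" being False
# does not by itself mean DATE_ONE is later than DATE_TWO.
print(dates_one.le(dates_two).tolist())
print(dates_one.gt(dates_two).tolist())
```

Because both comparisons are False for a missing DATE_TWO, such rows need a third state (NaN) so that a forward fill can carry over the flag from the previous row of the same ID.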
Fill with condition
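# COND is 1 where DATE_ONE <= DATE_TWO, 0 where DATE_TWO exists but is
# earlier than DATE_ONE, and NaN where DATE_TWO is missing.
df["COND"] = np.where(df["DATE_ONE"].le(df["DATE_TWO"]),
                      1,
                      np.where(df["DATE_TWO"].notnull() &
                               df["DATE_ONE"].gt(df["DATE_TWO"]),
                               0,
                               np.nan))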
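grp = df.groupby("ID")
# Carry the last known flag forward within each ID; rows before any flag get 0.
# (fillna(method='ffill') is deprecated in recent pandas; use ffill() instead.)
df["COND"] = grp["COND"].ffill().fillna(0)
# Carry the last known DATE_TWO forward within each ID.
df["FILL"] = grp["DATE_TWO"].ffill()
# Use the carried date only where the carried flag is 1.
df["DATE_TWO"] = np.where(df["COND"].eq(1), df["FILL"], df["DATE_TWO"])
df = df.drop(columns=["COND", "FILL"])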
Output
   ID  EVENT   DATE_ONE   DATE_TWO
0   1     12 2021-03-01        NaT
1   1     20 2021-03-05 2021-03-05
2   1     32 2021-03-06 2021-03-05
3   1     43 2021-03-07 2021-03-05
4   2      1 2021-03-03        NaT
5   2      2 2021-04-05        NaT
6   3      1 2021-03-01 2021-03-07
7   3     12 2021-03-07 2021-03-07
8   3     13 2021-03-09 2021-03-07
9   3     15 2021-03-14        NaT
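If you want to reuse this, the same steps can be wrapped in a function. The following is only a sketch under a few assumptions: the name fill_date_two is made up here, the input is expected to already have datetime DATE_ONE and DATE_TWO columns, and the temporary columns are replaced by local Series. It uses Series.where instead of np.where, which is a stylistic swap and not a change in logic.

```python
import pandas as pd


def fill_date_two(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: forward-fill DATE_TWO per ID once DATE_ONE <= DATE_TWO."""
    out = df.sort_values(["ID", "DATE_ONE"]).reset_index(drop=True)
    has_two = out["DATE_TWO"].notna()
    # 1.0 where the condition holds, 0.0 where DATE_TWO exists but is earlier
    # than DATE_ONE, NaN where DATE_TWO is missing.
    cond = out["DATE_ONE"].le(out["DATE_TWO"]).astype(float).where(has_two)
    # Carry the latest flag and the latest date forward within each ID.
    cond = cond.groupby(out["ID"]).ffill().fillna(0)
    filled = out.groupby("ID")["DATE_TWO"].ffill()
    # Rows whose carried flag is 1 take the carried date; others keep their own.
    out["DATE_TWO"] = out["DATE_TWO"].where(cond.ne(1), filled)
    return out


# Same sample data as in the answer above.
sample = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 3, 3, 3, 3],
    "EVENT": [12, 20, 32, 43, 1, 2, 1, 12, 13, 15],
    "DATE_ONE": pd.to_datetime(["3/1/2021", "3/5/2021", "3/6/2021", "3/7/2021",
                                "3/3/2021", "4/5/2021", "3/1/2021", "3/7/2021",
                                "3/9/2021", "3/14/2021"]),
    "DATE_TWO": pd.to_datetime(["", "3/5/2021", "", "", "", "",
                                "3/7/2021", "", "3/7/2021", ""]),
})
result = fill_date_two(sample)
print(result)
```

Note that a row with its own DATE_TWO earlier than DATE_ONE resets the flag to 0, so later rows of that ID are no longer filled. That is why the last row of ID 3 stays NaT.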