Compare dates and add columns with the previous transaction date



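I have two DataFrames: df_a contains the dates of interest for transactions between two parties, id_a and id_b, identified by the transaction ID tx_id; df_b contains all transactions between the parties along with their tx_id. I want to add two more columns to df_a that record, for each pair (id_a, id_b), the date and ID of the transaction that came immediately before the date of interest.

df_a can contain many transactions between the same pair of parties, and the previous transaction date needs to be added for each of them.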

> df_a
id_a    id_b    tx_id       date_of_interest
2222    3189    1138312.0   2020-04-01 18:55:36.629318
2222    3325    1138371.0   2020-04-01 19:15:33.341302
> df_b
id_a    id_b    tx_id   date_all
2222    3189    1045728 2020-02-13 00:18:18.840492
2222    3189    1138312 2020-04-01 18:55:36.629318
2222    3325    1052235 2020-02-17 19:56:07.809550
2222    3325    1138371 2020-04-01 19:15:33.341302

Expected output:

> df_a
id_a    id_b    tx_id       date                        prev_date                   prev_tx_id
2222    3189    1138312.0   2020-04-01 18:55:36.629318  2020-02-13 00:18:18.840492  1045728
2222    3325    1138371.0   2020-04-01 19:15:33.341302  2020-02-17 19:56:07.809550  1052235

I tried using groupby and then iterating over each date, but got nowhere.

g = df_a.groupby(['id_a', 'id_b'])
prev_date = []
for name, group in g:
    # iterate through each date in g,
    # prev_date_val = find the highest - 1 date in df_b dates for same id_a, id_b combinations
    # prev_date.append(prev_date_val)
    pass  # placeholder: the lookup above is the part I could not work out

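You can use .shift() to move the rows of df_b down by one, so that each row sits next to the transaction that preceded it, and then merge the DataFrames to pick up the previous date.

Here is a small example: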

>>> import pandas as pd
>>> df_a = pd.DataFrame(dict(id_a=[1, 1], id_b=[2, 3], tx_id=[101, 111], date_of_interest=["18:55", "19:15"]))
>>> df_a
   id_a  id_b  tx_id date_of_interest
0     1     2    101            18:55
1     1     3    111            19:15
>>> df_b = pd.DataFrame(dict(id_a=[1, 1, 1, 1], id_b=[2, 2, 3, 3], tx_id=[100, 101, 110, 111], date_all=["00:18", "18:55", "19:00", "19:15"]))
>>> df_b
   id_a  id_b  tx_id date_all
0     1     2    100    00:18
1     1     2    101    18:55
2     1     3    110    19:00
3     1     3    111    19:15

Then:

>>> df_b_shifted = df_b.sort_values(["id_a", "id_b", "tx_id"]).shift()
>>> df_b_shifted.columns = [c+"_shift" for c in df_b.columns]  # Rename columns
>>> df_b_shifted
   id_a_shift  id_b_shift  tx_id_shift date_all_shift
0         NaN         NaN          NaN            NaN
1         1.0         2.0        100.0          00:18
2         1.0         2.0        101.0          18:55
3         1.0         3.0        110.0          19:00
>>> df_b_concat = pd.concat((df_b, df_b_shifted), axis=1)
>>> df_b_concat
   id_a  id_b  tx_id date_all  id_a_shift  id_b_shift  tx_id_shift date_all_shift
0     1     2    100    00:18         NaN         NaN          NaN            NaN
1     1     2    101    18:55         1.0         2.0        100.0          00:18
2     1     3    110    19:00         1.0         2.0        101.0          18:55
3     1     3    111    19:15         1.0         3.0        110.0          19:00
>>> # Merge df_a with the concatenated frame (it carries the *_shift columns) on the real key columns
>>> df = df_a.merge(df_b_concat, on=["id_a", "id_b", "tx_id"])
>>> # Keep only those that correspond to the same id_a, id_b
>>> df = df[(df.id_a==df.id_a_shift) & (df.id_b==df.id_b_shift)]
>>> res = df.drop(columns=['date_all', 'id_a_shift', 'id_b_shift']).rename(columns=dict(tx_id_shift="tx_id_prev", date_all_shift="prev_date"))
>>> res
   id_a  id_b  tx_id date_of_interest  tx_id_prev prev_date
0     1     2    101            18:55       100.0     00:18
1     1     3    111            19:15       110.0     19:00
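
A possible variation, not part of the steps above: shifting inside each (id_a, id_b) group means a row is never paired with a transaction from a different pair, so the filter on the *_shift columns is no longer needed. The sketch below reuses the toy data from this answer; the names df_b_sorted, prev and df_b_prev are made up for illustration, and it assumes the time strings sort chronologically (with real timestamps you would convert them with pd.to_datetime first).

```python
import pandas as pd

df_a = pd.DataFrame(dict(id_a=[1, 1], id_b=[2, 3], tx_id=[101, 111],
                         date_of_interest=["18:55", "19:15"]))
df_b = pd.DataFrame(dict(id_a=[1, 1, 1, 1], id_b=[2, 2, 3, 3],
                         tx_id=[100, 101, 110, 111],
                         date_all=["00:18", "18:55", "19:00", "19:15"]))

# Sort so that "previous row" means "previous transaction" within each pair.
# Assumption: these HH:MM strings sort chronologically; use real datetimes otherwise.
df_b_sorted = df_b.sort_values(["id_a", "id_b", "date_all"])

# Shift within each (id_a, id_b) group; the first row of every group gets NaN.
prev = df_b_sorted.groupby(["id_a", "id_b"])[["tx_id", "date_all"]].shift()
df_b_prev = df_b_sorted.assign(prev_tx_id=prev["tx_id"], prev_date=prev["date_all"])

# A left merge keeps every row of df_a, even if no earlier transaction exists.
res = df_a.merge(
    df_b_prev.drop(columns="date_all"),
    on=["id_a", "id_b", "tx_id"],
    how="left",
)
print(res)
```

With how="left", a transaction in df_a that has no predecessor keeps its row and simply gets NaN in prev_tx_id and prev_date, instead of being filtered out.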

Hope that helps ;)
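
As a further alternative (a different technique from the shift-and-merge approach above), pandas also provides pd.merge_asof, which for each row on the left picks the closest earlier row on the right within matching groups. The sketch below applies it to the data from the question, assuming the date columns have been parsed as datetimes and that tx_id is stored as an integer; the name right is introduced only for illustration.

```python
import pandas as pd

df_a = pd.DataFrame({
    "id_a": [2222, 2222],
    "id_b": [3189, 3325],
    "tx_id": [1138312, 1138371],
    "date_of_interest": pd.to_datetime([
        "2020-04-01 18:55:36.629318",
        "2020-04-01 19:15:33.341302",
    ]),
})
df_b = pd.DataFrame({
    "id_a": [2222, 2222, 2222, 2222],
    "id_b": [3189, 3189, 3325, 3325],
    "tx_id": [1045728, 1138312, 1052235, 1138371],
    "date_all": pd.to_datetime([
        "2020-02-13 00:18:18.840492",
        "2020-04-01 18:55:36.629318",
        "2020-02-17 19:56:07.809550",
        "2020-04-01 19:15:33.341302",
    ]),
})

# Rename up front so the result already carries the desired column names.
right = df_b.rename(columns={"tx_id": "prev_tx_id", "date_all": "prev_date"})

# merge_asof requires both frames to be sorted by their time key.
res = pd.merge_asof(
    df_a.sort_values("date_of_interest"),
    right.sort_values("prev_date"),
    left_on="date_of_interest",
    right_on="prev_date",
    by=["id_a", "id_b"],          # only match rows for the same pair of parties
    direction="backward",         # look at earlier transactions only
    allow_exact_matches=False,    # skip the transaction of interest itself
)
print(res)
```

Because the match is made on dates rather than on tx_id, this variant does not depend on the transaction of interest being present in df_b.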
