Working out the difference between two values in one column while staying within the bounds of another column?



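I have a DataFrame and I'm trying to calculate the time difference between two different topics while staying within a single call, rather than spilling over into a new call (i.e. making sure it does not calculate the time difference between topics in different calls). Each interaction_id is a separate call.

Here is a sample DataFrame: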

import pandas as pd
from numpy import nan as NaN

df = pd.DataFrame([[1, 2, 'Cost'], [1, 5.72, NaN], [1, 8.83, 'Billing'], [1, 12.86, NaN], [2, 2, 'Cost'], [2, 6.75, NaN], [2, 8.54, NaN], [3, 1.5, 'Payments'], [3, 3.65, 'Products']], columns=['interaction_id', 'start_time', 'topic'])
interaction_id  start_time  topic
1               2           Cost
1               5.72        NaN
1               8.83        Billing
1               12.86       NaN
2               2           Cost
2               6.75        NaN
2               8.54        NaN
3               1.5         Payments
3               3.65        Products

This is the desired output:

df2 = pd.DataFrame([[1, 2, 'Cost',6.83], [1, 5.72, NaN, NaN], [1, 8.83, 'Billing',4.03], [1, 12.86, NaN,NaN], [2, 2, 'Cost',6.54], [2, 6.75, NaN, NaN], [2, 8.54, NaN, NaN], [3, 1.5, 'Payments', 2.15],[3, 3.65, 'Products','...']], columns=['interaction_id', 'start_time', 'topic','topic_length'])
interaction_id  start_time  topic     topic_length
1               2           Cost      6.83
1               5.72        NaN       NaN
1               8.83        Billing   4.03
1               12.86       NaN       NaN
2               2           Cost      6.54
2               6.75        NaN       NaN
2               8.54        NaN       NaN
3               1.5         Payments  2.15
3               3.65        Products  ....

I don't know whether there is a simpler solution, but this approach solves your problem:

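def custom_agg(group):
    # Work on positions 0..n-1 within this call
    group = group.reset_index(drop=True)
    max_ind = group.shape[0] - 1
    current_ind = -1    # position of the most recent boundary row (-1: none yet)
    current_val = None  # start_time of that row
    for ind, val in group.iterrows():
        # Skip rows without a topic, except the last row, which closes the final topic
        if pd.isna(val.topic) and ind != max_ind:
            continue
        if current_ind == -1:
            # First boundary in the call: just remember where it starts
            current_ind = ind
            current_val = val["start_time"]
        else:
            # Close the previous topic, then start tracking from this row
            group.loc[current_ind, "topic_length"] = val["start_time"] - current_val
            current_ind = ind
            current_val = val["start_time"]
    return group

# Sort so rows are in time order within each call, then process each call separately
df = df.sort_values(by=['interaction_id', 'start_time']).groupby("interaction_id").apply(custom_agg).reset_index(drop=True)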

Output:

   interaction_id  start_time     topic  topic_length
0               1        2.00      Cost          6.83
1               1        5.72       NaN           NaN
2               1        8.83   Billing          4.03
3               1       12.86       NaN           NaN
4               2        2.00      Cost          6.54
5               2        6.75       NaN           NaN
6               2        8.54       NaN           NaN
7               3        1.50  Payments          2.15
8               3        3.65  Products           NaN
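
If the row-by-row loop turns out to be slow on a large frame, the same rule can probably be expressed without iterrows. The following is only a minimal sketch of that idea, not the method above: it assumes a topic ends at the next row that has a topic, or at the last row of the call, and the helper names is_last, boundary, b and next_start are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame(
    [[1, 2, 'Cost'], [1, 5.72, np.nan], [1, 8.83, 'Billing'], [1, 12.86, np.nan],
     [2, 2, 'Cost'], [2, 6.75, np.nan], [2, 8.54, np.nan],
     [3, 1.5, 'Payments'], [3, 3.65, 'Products']],
    columns=['interaction_id', 'start_time', 'topic'],
)
df = df.sort_values(['interaction_id', 'start_time']).reset_index(drop=True)

# Boundary rows (assumption): every row that starts a topic,
# plus the last row of each call, which closes the final topic.
is_last = ~df.duplicated('interaction_id', keep='last')
boundary = df['topic'].notna() | is_last

# Looking only at boundary rows, fetch the next boundary's start_time
# within the same interaction_id (NaN for the last boundary of a call).
b = df.loc[boundary]
next_start = b.groupby('interaction_id')['start_time'].shift(-1)

# Index alignment leaves every non-boundary row as NaN.
df['topic_length'] = next_start - b['start_time']
print(df)
```

Because shift(-1) is applied per interaction_id, the last boundary in each call has no successor and stays NaN, so a difference is never taken across two calls.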
