How to find the distance from each row to the nearest row that satisfies a condition


import datetime
import pandas as pd
df = pd.DataFrame({'date': {0: datetime.date(2020, 8, 15),
1: datetime.date(2020, 8, 16),
2: datetime.date(2020, 8, 16),
3: datetime.date(2020, 8, 17),
4: datetime.date(2020, 8, 17),
5: datetime.date(2020, 8, 18),
6: datetime.date(2020, 8, 19),
7: datetime.date(2020, 8, 19)},
'sign_change': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0, 6: 1, 7: 1},
'distance (desired_output)': {0: 2, 1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 0, 7: 0}})

         date  sign_change  distance (desired_output)
0  2020-08-15            0                          2
1  2020-08-16            0                          1
2  2020-08-16            0                          1
3  2020-08-17            1                          0
4  2020-08-17            1                          0
5  2020-08-18            0                          1
6  2020-08-19            1                          0
7  2020-08-19            1                          0

For each row, I want to find the distance (in days) to the nearest row where sign_change == 1. I have manually entered the desired output in the DataFrame above.

Let's try broadcasting:

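import numpy as np

s = df['sign_change'] != 1                 # rows that still need a distance
dates = pd.to_datetime(df['date'])         # datetime64 so NumPy can subtract
# |date difference| for every (flagged row, unflagged row) pair, then the
# minimum over the flagged rows, converted to days
offset = (np.abs(dates[s].values[None, :] - dates[~s].values[:, None]).min(0)
          / pd.to_timedelta('1D'))
df['distance'] = 0.0                       # flagged rows are at distance 0
df.loc[s, 'distance'] = offset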

Output:

         date  sign_change  distance (desired_output)  distance
0  2020-08-15            0                          2       2.0
1  2020-08-16            0                          1       1.0
2  2020-08-16            0                          1       1.0
3  2020-08-17            1                          0       0.0
4  2020-08-17            1                          0       0.0
5  2020-08-18            0                          1       1.0
6  2020-08-19            1                          0       0.0
7  2020-08-19            1                          0       0.0
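
To see why the broadcast works, here is a minimal sketch on a tiny made-up frame. The names `flagged`, `unflagged`, and `pairwise` are illustrative and do not come from the answer. It prints the intermediate matrix shape (one row per flagged date, one column per unflagged date) and the column-wise minimum.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (assumed data, not the frame from the question).
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-08-15', '2020-08-16', '2020-08-17', '2020-08-18']),
    'sign_change': [0, 0, 1, 0],
})

mask = df['sign_change'] == 1
flagged = df.loc[mask, 'date'].to_numpy()      # shape (1,)
unflagged = df.loc[~mask, 'date'].to_numpy()   # shape (3,)

# flagged[:, None] has shape (1, 1) and unflagged[None, :] has shape (1, 3),
# so the subtraction broadcasts to one row per flagged date and one column
# per unflagged date.
pairwise = np.abs(flagged[:, None] - unflagged[None, :])

# Column-wise minimum: distance from each unflagged row to its nearest flagged row.
nearest_days = pairwise.min(axis=0) / np.timedelta64(1, 'D')

print(pairwise.shape)
print(nearest_days)
```

Note that the intermediate matrix holds one entry per (flagged, unflagged) pair, so memory grows with the product of the two counts; for large frames the fill-based approach is lighter.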

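You can use where, bfill() and ffill(). Essentially, .where the sign is 1 you return the date, and otherwise NaN. From there you can bfill, which fills that date backward over the preceding rows up to the previous 1, and you can ffill, which fills that date forward over the following rows up to the next 1. Then take the difference between the date and this filled date. Finally, .fillna(0) covers the last value in the DataFrame, which has no later date to fill from.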
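
The steps described above can be laid out one at a time. The sketch below uses a small made-up frame and illustrative column names (`masked`, `next_flag`, `prev_flag`, `days_to_next`, `days_since_prev`) to show what each intermediate Series holds. It assumes the rows are already sorted by date.

```python
import pandas as pd

# Small illustrative frame (assumed data), already sorted by date.
df = pd.DataFrame({
    'date': pd.to_datetime(['2020-08-15', '2020-08-16', '2020-08-17',
                            '2020-08-18', '2020-08-19']),
    'sign_change': [0, 0, 1, 0, 0],
})

steps = pd.DataFrame({'date': df['date']})
# Keep the date only where sign_change == 1; every other row becomes NaT.
steps['masked'] = df['date'].where(df['sign_change'] == 1)
# Back-fill: each row sees the next flagged date at or after it.
steps['next_flag'] = steps['masked'].bfill()
# Forward-fill: each row sees the last flagged date at or before it.
steps['prev_flag'] = steps['masked'].ffill()
# Day counts in each direction; NaN where no flagged date exists on that side.
steps['days_to_next'] = (steps['next_flag'] - steps['date']).dt.days
steps['days_since_prev'] = (steps['date'] - steps['prev_flag']).dt.days

print(steps)
```

Rows after the last flagged date have no `next_flag`, and rows before the first flagged date have no `prev_flag`, which is why a one-sided solution needs a fallback such as `.fillna(0)`.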


Solution #1 - looks forward only, to the next flagged date (see Solution #2 for the nearest date in either direction):

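df['date'] = pd.to_datetime(df['date'])  # convert to datetime64 so the difference supports .dt.days
df['distance (desired_output)'] = ((df['date'].where(df['sign_change'] == 1).bfill()
                                    - df['date']).dt.days).fillna(0)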
df
Out[1]: 
        date  sign_change  distance (desired_output)
0 2020-08-15            0                        2.0
1 2020-08-16            0                        1.0
2 2020-08-16            0                        1.0
3 2020-08-17            1                        0.0
4 2020-08-17            1                        0.0
5 2020-08-18            0                        1.0
6 2020-08-19            1                        0.0
7 2020-08-19            0                        0.0

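Solution #2: this solution compares the ffill() and bfill() series and returns the smallest number of days to the closest flagged date, whether it comes before or after.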

import datetime
import numpy as np
import pandas as pd
df = pd.DataFrame({'date': {0: datetime.date(2020, 8, 15),
1: datetime.date(2020, 8, 16),
2: datetime.date(2020, 8, 16),
3: datetime.date(2020, 8, 17),
4: datetime.date(2020, 8, 17),
5: datetime.date(2020, 8, 18),
6: datetime.date(2020, 8, 19),
7: datetime.date(2020, 8, 19),
8: datetime.date(2020, 8, 20),
9: datetime.date(2020, 8, 21)},
'sign_change': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0, 6: 1, 7: 1, 8: 0, 9: 0},
'distance (desired_output)': {0: 2, 1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 0, 7: 0}})
df['date'] = pd.to_datetime(df['date'])
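s = df['date'].where(df['sign_change'] == 1)  # flagged dates, NaT elsewhere
b = (s.bfill() - df['date']).dt.days          # days until the next flagged date
f = (df['date'] - s.ffill()).dt.days          # days since the previous flagged date
# np.fmin takes the element-wise minimum and ignores a NaN on either side
df['distance (desired_output)'] = np.fmin(b, f)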
df
Out[2]: 
        date  sign_change  distance (desired_output)
0 2020-08-15            0                        2.0
1 2020-08-16            0                        1.0
2 2020-08-16            0                        1.0
3 2020-08-17            1                        0.0
4 2020-08-17            1                        0.0
5 2020-08-18            0                        1.0
6 2020-08-19            1                        0.0
7 2020-08-19            1                        0.0
8 2020-08-20            0                        1.0
9 2020-08-21            0                        2.0
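
Both fill-based solutions depend on row order, so they implicitly assume the frame is sorted by date. As a hedged sketch, the helper below (a hypothetical `nearest_flag_distance`, not part of either answer) wraps the two-sided approach, sorts internally, and is compared against a brute-force check on deliberately shuffled rows. It assumes the index labels are unique.

```python
import numpy as np
import pandas as pd

def nearest_flag_distance(df, date_col='date', flag_col='sign_change'):
    """Days from each row to the nearest row where flag_col == 1 (hypothetical helper)."""
    dates = pd.to_datetime(df[date_col])
    order = dates.sort_values(kind='stable').index   # fills need chronological order
    dates_sorted = dates.loc[order]
    flagged = dates_sorted.where(df.loc[order, flag_col] == 1)
    ahead = (flagged.bfill() - dates_sorted).dt.days
    behind = (dates_sorted - flagged.ffill()).dt.days
    # np.fmin skips NaN, so a row with a flag on only one side still gets a value.
    nearest = pd.Series(np.fmin(ahead.to_numpy(), behind.to_numpy()), index=order)
    return nearest.reindex(df.index)                 # restore the caller's row order

# Deliberately unsorted illustrative data (assumed, not from the question).
shuffled = pd.DataFrame({
    'date': ['2020-08-21', '2020-08-15', '2020-08-19', '2020-08-16', '2020-08-18'],
    'sign_change': [0, 0, 1, 0, 0],
})
result = nearest_flag_distance(shuffled)

# Brute-force reference: minimum absolute gap to any flagged date.
all_dates = pd.to_datetime(shuffled['date'])
flag_dates = all_dates[shuffled['sign_change'] == 1]
expected = all_dates.apply(lambda d: (flag_dates - d).abs().dt.days.min())

print(result.tolist())
```

Sorting inside the helper keeps the caller's frame untouched, at the cost of one extra sort per call.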
