import datetime
import pandas as pd
df = pd.DataFrame({'date': {0: datetime.date(2020, 8, 15),
                            1: datetime.date(2020, 8, 16),
                            2: datetime.date(2020, 8, 16),
                            3: datetime.date(2020, 8, 17),
                            4: datetime.date(2020, 8, 17),
                            5: datetime.date(2020, 8, 18),
                            6: datetime.date(2020, 8, 19),
                            7: datetime.date(2020, 8, 19)},
                   'sign_change': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0, 6: 1, 7: 1},
                   'distance (desired_output)': {0: 2, 1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 0, 7: 0}})
date sign_change distance (desired_output)
0 2020-08-15 0 2
1 2020-08-16 0 1
2 2020-08-16 0 1
3 2020-08-17 1 0
4 2020-08-17 1 0
5 2020-08-18 0 1
6 2020-08-19 1 0
7 2020-08-19 1 0
For each row, I want to find the distance (in days) to the nearest row where sign_change == 1. I have entered the desired output manually in the dataframe above.
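Let's try broadcasting:
import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])  # datetime64, so differences are timedelta64
s = df.sign_change != 1                   # rows that still need a distance
# (n_ones, 1) - (1, n_others) -> matrix of pairwise gaps; min over axis 0 = nearest 1 per row
offset = (np.abs(df.loc[s, 'date'].values[None, :] - df.loc[~s, ['date']].values).min(0)
          / np.timedelta64(1, 'D'))
df['distance'] = 0.0                      # rows with sign_change == 1 are at distance 0
df.loc[s, 'distance'] = offset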
Output:
date sign_change distance (desired_output) distance
0 2020-08-15 0 2 2.0
1 2020-08-16 0 1 1.0
2 2020-08-16 0 1 1.0
3 2020-08-17 1 0 0.0
4 2020-08-17 1 0 0.0
5 2020-08-18 0 1 1.0
6 2020-08-19 1 0 0.0
7 2020-08-19 1 0 0.0
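The same broadcasting idea can be sketched in plain NumPy for every row at once, which avoids splitting the frame with a mask: rows with sign_change == 1 simply match themselves at distance 0. This is a minimal sketch, assuming the dates are already datetime64 and that at least one row has sign_change == 1 (otherwise the min over an empty axis raises). It builds an n × k matrix, so memory grows with the number of rows times the number of 1s.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question, with dates parsed to datetime64
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-08-15", "2020-08-16", "2020-08-16", "2020-08-17",
        "2020-08-17", "2020-08-18", "2020-08-19", "2020-08-19",
    ]),
    "sign_change": [0, 0, 0, 1, 1, 0, 1, 1],
})

dates = df["date"].to_numpy()                              # shape (n,), datetime64[ns]
ones = df.loc[df["sign_change"] == 1, "date"].to_numpy()   # shape (k,); assumes k >= 1

# (n, 1) - (1, k) broadcasts to an (n, k) matrix of absolute timedelta64 gaps
diff = np.abs(dates[:, None] - ones[None, :])

# Smallest gap per row, converted from timedelta64 to a float number of days
df["distance"] = diff.min(axis=1) / np.timedelta64(1, "D")
print(df)
```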
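You can use where, bfill() and ffill(). Essentially, .where() keeps the date where sign_change is 1 and returns NaN (NaT) otherwise. From there, you can bfill(), i.e. backward fill, each of those dates back onto the rows before it, up to the previous 1; and you can ffill(), i.e. forward fill, each date onto the rows after it, up to the next 1. Then take the difference between each row's date and this filled date. Finally, .fillna(0) handles the last rows in the dataframe, which have no later 1 to fill from.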
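To make the intermediate steps concrete, here is a small sketch on a hypothetical five-row series (1s only on 2020-08-17 and 2020-08-19). It prints what where, bfill() and ffill() produce and shows where the NaN values come from: rows after the last 1 have nothing to backward fill from, and rows before the first 1 have nothing to forward fill from.

```python
import pandas as pd

# Hypothetical example data: sign_change == 1 only on 2020-08-17 and 2020-08-19
dates = pd.Series(pd.to_datetime([
    "2020-08-15", "2020-08-17", "2020-08-18", "2020-08-19", "2020-08-20",
]))
sign_change = pd.Series([0, 1, 0, 1, 0])

# Keep the date only where sign_change == 1; NaT everywhere else
marked = dates.where(sign_change == 1)
next_one = marked.bfill()   # next 1 at or after each row (NaT after the last 1)
prev_one = marked.ffill()   # previous 1 at or before each row (NaT before the first 1)

days_forward = (next_one - dates).dt.days   # float; NaN where there is no later 1
days_back = (dates - prev_one).dt.days      # float; NaN where there is no earlier 1

print(pd.DataFrame({"date": dates, "next_one": next_one, "prev_one": prev_one,
                    "days_forward": days_forward, "days_back": days_back}))
```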
Solution #1 - looks only forward to the next date (see Solution #2 for the nearest date in either direction):
df['date'] = pd.to_datetime(df['date'])  # .dt.days needs datetime64, not datetime.date objects
df['distance (desired_output)'] = ((df['date'].where(df['sign_change'] == 1).bfill()
                                    - df['date']).dt.days).fillna(0)
df
Out[1]:
date sign_change distance (desired_output)
0 2020-08-15 0 2.0
1 2020-08-16 0 1.0
2 2020-08-16 0 1.0
3 2020-08-17 1 0.0
4 2020-08-17 1 0.0
5 2020-08-18 0 1.0
6 2020-08-19 1 0.0
7 2020-08-19 1 0.0
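Because Solution #1 only looks forward, rows that come after the last sign_change == 1 have no later date to fill from, and .fillna(0) turns that NaN into 0. The sketch below applies the same forward-only logic to a hypothetical frame with two trailing rows to show the effect: both report 0 even though they are 1 and 2 days after the last 1.

```python
import pandas as pd

# Hypothetical frame: two rows after the last sign_change == 1
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-08-17", "2020-08-19", "2020-08-20", "2020-08-21"]),
    "sign_change": [1, 1, 0, 0],
})

# Forward-only distance, as in Solution #1; trailing rows get NaN, then 0 from fillna
forward = (df["date"].where(df["sign_change"] == 1).bfill() - df["date"]).dt.days
df["forward_only"] = forward.fillna(0)
print(df)
```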
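Solution #2 (this solution compares the ffill() and bfill() series and returns the smaller number of days, i.e. the distance to the nearest date with sign_change == 1, whether it comes before or after):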
import datetime
import numpy as np
import pandas as pd
df = pd.DataFrame({'date': {0: datetime.date(2020, 8, 15),
1: datetime.date(2020, 8, 16),
2: datetime.date(2020, 8, 16),
3: datetime.date(2020, 8, 17),
4: datetime.date(2020, 8, 17),
5: datetime.date(2020, 8, 18),
6: datetime.date(2020, 8, 19),
7: datetime.date(2020, 8, 19),
8: datetime.date(2020, 8, 20),
9: datetime.date(2020, 8, 21)},
'sign_change': {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 0, 6: 1, 7: 1, 8: 0, 9: 0},
'distance (desired_output)': {0: 2, 1: 1, 2: 1, 3: 0, 4: 0, 5: 1, 6: 0, 7: 0}})
df['date'] = pd.to_datetime(df['date'])
s = (df['date'].where(df['sign_change'] == 1))
b = (s.bfill() - df['date']).dt.days
f = (s.ffill() - df['date']).dt.days.abs()
# Use b (days to the next 1) when it is not larger than f, or when there is no earlier 1 (f is NaN)
df['distance (desired_output)'] = np.where((b <= f) | f.isna(), b, f)
df
Out[2]:
date sign_change distance (desired_output)
0 2020-08-15 0 2.0
1 2020-08-16 0 1.0
2 2020-08-16 0 1.0
3 2020-08-17 1 0.0
4 2020-08-17 1 0.0
5 2020-08-18 0 1.0
6 2020-08-19 1 0.0
7 2020-08-19 1 0.0
8 2020-08-20 0 1.0
9 2020-08-21 0 2.0
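A different technique, not used in either answer above, is pandas.merge_asof with direction="nearest", which matches each row to the closest sign_change == 1 date in either direction. This is a sketch under the assumption that sorting by date is acceptable: merge_asof requires both key columns to be sorted, and the result comes back with a fresh index, so keep an explicit id column if the original order or index matters.

```python
import pandas as pd

# Same ten rows as in Solution #2, with dates parsed to datetime64
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-08-15", "2020-08-16", "2020-08-16", "2020-08-17", "2020-08-17",
        "2020-08-18", "2020-08-19", "2020-08-19", "2020-08-20", "2020-08-21",
    ]),
    "sign_change": [0, 0, 0, 1, 1, 0, 1, 1, 0, 0],
})

# Right-hand table: unique dates where sign_change == 1, renamed so both keys survive the merge
ones = (df.loc[df["sign_change"] == 1, ["date"]]
          .drop_duplicates()
          .rename(columns={"date": "nearest_one"}))

# Both sides must be sorted on their keys; direction="nearest" searches backward and forward
matched = pd.merge_asof(df.sort_values("date", kind="stable"), ones,
                        left_on="date", right_on="nearest_one",
                        direction="nearest")

# Absolute gap in whole days to the matched date
matched["distance"] = (matched["nearest_one"] - matched["date"]).abs().dt.days
print(matched)
```

On ties, such as 2020-08-18 sitting one day from both 2020-08-17 and 2020-08-19, merge_asof picks one of the two dates, but the distance is the same either way.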