熊猫:查找指定星期几的日期和其他日期相近

  • 本文关键字:日期 其他 查找 熊猫 pandas
  • 更新时间 :
  • 英文 :


对于以下问题,我希望用更快的方法取代apply方法:

具有day_of_weekclosest_date列,我需要找到指定day_of_weekfound_dates,它们在closest_date之前(向后(最接近,从而允许等于closest_date的结果。

初始df:

closest_date  day_of_week
0   2009-06-01    6
1   2014-09-02    0
2   2014-10-11    4
3   2015-01-02    3
4   2015-07-11    4

我需要加快以下工作代码:

from pandas.tseries.offsets import Week
def find_nearset_day_to_dayofweek(row):
return row['closest_date'] - Week(weekday=row['day_of_week'])
df['date'] = df.apply(find_nearset_day_to_dayofweek, axis=1)

以下只是修复found_date应该等于closest_date的位置,但在一周前返回。将numpy导入为np

df['closest_date_dayofweek'] = df['closest_date'].dt.dayofweek
df['found_date'] = np.where(df['closest_date_dayofweek']==df['day_of_week'], 
df['closest_date'],
df['found_date'])
df = df.drop(['closest_date_dayofweek'], axis=1)

返回以下df

closest_date    day_of_week found_date
0   2009-06-01      6           2009-05-31
1   2014-09-02      0           2014-09-01
2   2014-10-11      4           2014-10-10
3   2015-01-02      3           2015-01-01
4   2015-07-11      4           2015-07-10
5   2015-08-08      4           2015-08-07

上面代码的问题是apply方法,它很慢。有什么加速的想法吗?

谢谢!

因为可能只有7值,所以您可以使用循环,只过滤另一列匹配的行:

for i in range(7):
m = df['day_of_week'].eq(i)
df.loc[m, 'date'] = df.loc[m, 'closest_date'] - Week(weekday=i)

然后不需要新的列,使用:

df['date'] = np.where(df['closest_date'].dt.dayofweek==df['day_of_week'],
df['closest_date'], df['date'])

5000行的性能:

from pandas.tseries.offsets import Week
def find_nearset_day_to_dayofweek(row):
return row.closest_date - Week(weekday=row['day_of_week'])
df = pd.concat([df] * 1000, ignore_index=True)
In [137]: %timeit df['date'] = df.apply(find_nearset_day_to_dayofweek, axis=1)
550 ms ± 77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [138]: %%timeit
...: for i in range(7):
...:     m = df['day_of_week'].eq(i)
...:     df.loc[m, 'date1'] = df.loc[m, 'closest_date'] - Week(weekday=i)
...:     
38.1 ms ± 883 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

最新更新