I have the following pandas DataFrame:
print(df.to_dict())
{'Date_Installed': {11885: Timestamp('2018-11-15 00:00:00'), 111885: Timestamp('2018-11-15 00:00:00')}, 'days_from_instalation': {11885: 2, 111885: 3}}
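I want to create a new column that increments the 'Date_Installed' column by the number of days given in the 'days_from_instalation' column.
I know this can be done with the apply() method, as follows: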
from datetime import timedelta
df['desired_date']=df.apply(lambda row:row['Date_Installed']+timedelta(row['days_from_instalation']), axis=1)
This produces the output I want:
print(df.to_dict())
{'Date_Installed': {11885: Timestamp('2018-11-15 00:00:00'), 111885: Timestamp('2018-11-15 00:00:00')}, 'days_from_instalation': {11885: 2, 111885: 3}, 'desired_date': {11885: Timestamp('2018-11-17 00:00:00'), 111885: Timestamp('2018-11-18 00:00:00')}}
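However, this approach is very slow and is not practical for my full DataFrame.
I have come across a few questions about incrementing dates in pandas, such as this one:
Pandas increment datetime
But they all seem to deal with a constant increment, and none of them offers a vectorized way to add a per-row increment.
Is there a vectorized version of this kind of increment?
Thanks in advance!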
Add the timedeltas created by to_timedelta:
df['desired_date'] = df['Date_Installed'] + pd.to_timedelta(df['days_from_instalation'], unit='d')
print(df)
       Date_Installed  days_from_instalation desired_date
11885      2018-11-15                      2   2018-11-17
111885     2018-11-15                      3   2018-11-18
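As a self-contained check of the idea above, the following sketch rebuilds the two-row frame from the question and compares the vectorized sum with the row-wise apply() result. The names `vectorized` and `row_wise` are illustrative and do not appear in the original post; `unit="D"` is used here as the uppercase spelling of the day unit.

```python
from datetime import timedelta

import pandas as pd

# Rebuild the two-row frame from the question.
df = pd.DataFrame(
    {
        "Date_Installed": pd.to_datetime(["2018-11-15", "2018-11-15"]),
        "days_from_instalation": [2, 3],
    },
    index=[11885, 111885],
)

# Vectorized: turn the integer day counts into timedeltas, then add column-wise.
vectorized = df["Date_Installed"] + pd.to_timedelta(
    df["days_from_instalation"], unit="D"
)

# Row-wise version from the question, kept only for comparison.
row_wise = df.apply(
    lambda row: row["Date_Installed"]
    + timedelta(days=int(row["days_from_instalation"])),
    axis=1,
)

# Both approaches should agree on every row.
print((vectorized == row_wise).all())
```

The vectorized form avoids calling a Python function once per row, which is where apply(axis=1) spends most of its time on large frames.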
Another solution, using numpy, is faster, but it loses the time zone (if one is set):
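import numpy as np

# Convert both columns to int64 nanoseconds, add them, then convert the sum back to datetimes.
a = pd.to_timedelta(df['days_from_instalation'], unit='d').values.astype(np.int64)
df['desired_date1'] = pd.to_datetime(df['Date_Installed'].values.astype(np.int64) + a, unit='ns')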
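To illustrate the time zone caveat, here is a small sketch that assumes a tz-aware date column. The zone "Europe/Berlin" and all variable names are hypothetical choices for this example. The explicit casts to nanosecond resolution are an added precaution, because the integer arithmetic only works if both arrays count nanoseconds.

```python
import numpy as np
import pandas as pd

# Hypothetical tz-aware dates and per-row day offsets.
dates = pd.Series(pd.to_datetime(["2018-11-15", "2018-11-15"])).dt.tz_localize(
    "Europe/Berlin"
)
days = pd.Series([2, 3])

# pandas arithmetic keeps the time zone.
with_tz = dates + pd.to_timedelta(days, unit="D")

# Integer arithmetic on the raw values: .values drops the zone and yields UTC instants.
date_ns = dates.values.astype("datetime64[ns]").astype(np.int64)
delta_ns = (
    pd.to_timedelta(days, unit="D").values.astype("timedelta64[ns]").astype(np.int64)
)
without_tz = pd.Series(pd.to_datetime(date_ns + delta_ns, unit="ns"))

print(with_tz.dt.tz)     # the zone is preserved
print(without_tz.dt.tz)  # None: the result is tz-naive
```

In this sketch the naive result holds the UTC instants, so the wall-clock values differ from the tz-aware result by the zone's offset. If the zone matters, it would have to be reattached afterwards, for example with `.dt.tz_localize("UTC").dt.tz_convert(...)`.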
Performance:
# 20k rows
df = pd.concat([df] * 10000, ignore_index=True)
In [217]: %timeit df['desired_date1'] = pd.to_datetime(df['Date_Installed'].values.astype(np.int64) + pd.to_timedelta(df['days_from_instalation'], unit='d').values.astype(np.int64), unit='ns')
886 µs ± 9.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [218]: %timeit df['desired_date'] = df['Date_Installed'] + pd.to_timedelta(df['days_from_instalation'], unit='d')
1.53 ms ± 82.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)