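I am trying to compute some missing df['Roll_time'] values in my dataset. I have an avg_time_diff variable, which is of dtype timedelta64[ns], and df['Notif_date'], which holds datetime.time objects. For every row where 'Roll_time' is missing, I want to compute the sum of avg_time_diff and 'Notif_date'.
So far, I have this: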
avg_time_diff = df['Time_diff'].mean()
df['Time_diff'].fillna(avg_time_diff, inplace=True)
df['Roll_time'].fillna(avg_time_diff + df['Notif_time'])
When I run the code, I get this error:
TypeError: unsupported operand type(s) for +: 'Timedelta' and 'datetime.time'
You need to convert the datetime.time objects to timedelta as well, so that the arithmetic works. For example:
import datetime
import pandas as pd
# some dummy data:
df = pd.DataFrame({
    'Time_diff': [pd.Timedelta(hours=1), pd.Timedelta(hours=2), pd.NaT, pd.Timedelta(hours=4)],
    'Notif_time': [datetime.time(1,2,3), datetime.time(2,3,4), datetime.time(4,5,6), datetime.time(7,8,9)],
})
# Time_diff column and avg_time_diff are of dtype Timedelta...
avg_time_diff = df['Time_diff'].mean()
df['Time_diff'] = df['Time_diff'].fillna(avg_time_diff)
# need to cast Notif_time to Timedelta as well so that the arithmetic works out:
df['Roll_time'] = avg_time_diff + pd.to_timedelta(df['Notif_time'].astype(str))
# df['Roll_time']
# 0 0 days 03:22:03
# 1 0 days 04:23:04
# 2 0 days 06:25:06
# 3 0 days 09:28:09
# Name: Roll_time, dtype: timedelta64[ns]
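The snippet above assigns a value to every row. If the goal is to fill only the rows where Roll_time is missing, as the question describes, the same conversion can be passed to fillna. The following is a minimal sketch, assuming Roll_time is already stored as timedelta64[ns]; the sample values and the two-hour avg_time_diff are made up for illustration.

```python
import datetime
import pandas as pd

# Hypothetical data: Roll_time is assumed to be a timedelta column with one gap.
df = pd.DataFrame({
    "Notif_time": [datetime.time(1, 2, 3), datetime.time(2, 3, 4), datetime.time(4, 5, 6)],
    "Roll_time": [pd.Timedelta(hours=5), pd.NaT, pd.Timedelta(hours=8)],
})
avg_time_diff = pd.Timedelta(hours=2)  # stand-in for df['Time_diff'].mean()

# Convert datetime.time -> str -> Timedelta so the addition is defined.
notif_td = pd.to_timedelta(df["Notif_time"].astype(str))

# fillna aligns on the index, so only the NaT rows receive the computed sum.
df["Roll_time"] = df["Roll_time"].fillna(avg_time_diff + notif_td)
```

Rows that already had a Roll_time keep their value; only the missing one is replaced by avg_time_diff plus the converted Notif_time.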
If you want the output to be of dtype datetime (with all its formatting options, etc.), you can get there by adding a date:
# to get from timedelta to datetime, you can add the timedelta column to today's date:
df['roll_datetime'] = pd.Timestamp('now').floor('d') + df['Roll_time']
# df['roll_datetime']
# 0 2021-02-04 03:22:03
# 1 2021-02-04 04:23:04
# 2 2021-02-04 06:25:06
# 3 2021-02-04 09:28:09
# Name: roll_datetime, dtype: datetime64[ns]
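Once the column is datetime64[ns], the .dt accessor gives access to the formatting options mentioned above. A short sketch, using a fixed anchor date instead of today's date so the result is reproducible; the variable names and sample values here are illustrative only.

```python
import pandas as pd

# Illustrative timedelta values, matching the first two rows shown above.
roll_time = pd.Series(pd.to_timedelta(["03:22:03", "04:23:04"]), name="Roll_time")

# A fixed date is used as the anchor here (assumption; any date works).
anchor = pd.Timestamp("2021-02-04")
roll_datetime = anchor + roll_time

# Format as clock-time strings, or extract datetime.time objects again.
roll_str = roll_datetime.dt.strftime("%H:%M:%S")
roll_as_time = roll_datetime.dt.time
```

Note that a sum of 24 hours or more rolls over to the next day, so the time-of-day part alone no longer tells the whole story in that case.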
Further reading: Format timedelta to string