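I have a DataFrame with dates. I just want to fill a new column with the difference between the maximum end date and the minimum start date, i.e. the length in days. My calculation works, but if either column contains 0 or NaN values, it throws an error. Could someone take a look at the code and give some suggestions? Thanks in advance.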
# Here is the DataFrame:
end_d start_d
0 2021-09-11 00:00:00 2021-08-01 00:00:00
1 2021-08-29 00:00:00 2021-05-23 00:00:00
2 2021-09-04 00:00:00 2021-06-13 00:00:00
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 NaN NaN
9 NaN NaN
10 2021-09-04 00:00:00 2021-06-13 00:00:00
11
12
13
# When there aren't any zeros or NaN values, the code below works fine:
dsx['length'] = (dsx['end_d'] - dsx['start_d'] + pd.Timedelta(1, unit=freq)).max()
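For reference, here is a self-contained sketch of the working case. It assumes freq = 'D' (daily; freq is not defined in the snippet) and keeps only the rows that hold real dates. It also shows that the expression takes the maximum of the per-row spans: the 99 days in the desired output comes from row 1 (2021-05-23 to 2021-08-29, counted inclusively), whereas the latest end date minus the earliest start date would give 112 days.

```python
import pandas as pd

# Only the rows that hold real dates (rows 0, 1, 2 and 10 of the data above)
dsx = pd.DataFrame(
    {
        "end_d": pd.to_datetime(["2021-09-11", "2021-08-29", "2021-09-04", "2021-09-04"]),
        "start_d": pd.to_datetime(["2021-08-01", "2021-05-23", "2021-06-13", "2021-06-13"]),
    },
    index=[0, 1, 2, 10],
)

freq = "D"  # assumption: daily frequency; freq is not defined in the question

# The question's expression: the largest per-row span, counted inclusively (+1 day)
row_span = (dsx["end_d"] - dsx["start_d"] + pd.Timedelta(1, unit=freq)).max()

# For comparison: latest end date minus earliest start date, also inclusive
overall_span = dsx["end_d"].max() - dsx["start_d"].min() + pd.Timedelta(1, unit=freq)

dsx["length"] = row_span  # a scalar is broadcast to every row
print(row_span)      # 99 days 00:00:00
print(overall_span)  # 112 days 00:00:00
```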
# I want something like the DataFrame below. Any suggestions?
end_d start_d length
0 2021-09-11 00:00:00 2021-08-01 00:00:00 99 days
1 2021-08-29 00:00:00 2021-05-23 00:00:00 99 days
2 2021-09-04 00:00:00 2021-06-13 00:00:00 99 days
3 0 0 99 days
4 0 0 99 days
5 0 0 99 days
6 0 0 99 days
7 0 0 99 days
8 NaN NaN 99 days
9 NaN NaN 99 days
10 2021-09-04 00:00:00 2021-06-13 00:00:00 99 days
Thanks in advance.
You can filter out the rows with N/A values using dropna (see pandas.DataFrame.dropna):
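# Drop every row that has a NaN in any column
filtered_df = df.dropna()
# The max over the remaining rows is a scalar, which pandas broadcasts to every row of df
df['length'] = (filtered_df['end_d'] - filtered_df['start_d'] + pd.Timedelta(1, unit=freq)).max()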
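This solves your N/A problem, but you still have another one: your columns are filled with different data types (int and datetime), because dropna does not remove the 0 entries. I'm not sure where those come from, but you need to fix that.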
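One way to deal with the mixed types is to make both columns real datetimes instead of using dropna. This is a sketch under some assumptions: the data is rebuilt by hand from the printout (dates as strings, 0 as int, NaN as float), freq is taken to be 'D', and to_dates is a hypothetical helper. The idea is to treat the 0 entries as missing, convert both columns with pd.to_datetime(..., errors="coerce"), and let max() skip the resulting NaT rows. The zeros are masked first because pd.to_datetime would otherwise read 0 as the epoch, 1970-01-01.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of rows 0-10: dates as strings, 0 as int, NaN as float
raw = pd.DataFrame({
    "end_d": ["2021-09-11 00:00:00", "2021-08-29 00:00:00", "2021-09-04 00:00:00",
              0, 0, 0, 0, 0, np.nan, np.nan, "2021-09-04 00:00:00"],
    "start_d": ["2021-08-01 00:00:00", "2021-05-23 00:00:00", "2021-06-13 00:00:00",
                0, 0, 0, 0, 0, np.nan, np.nan, "2021-06-13 00:00:00"],
})


def to_dates(col: pd.Series) -> pd.Series:
    """Hypothetical helper: treat 0 as missing, then convert to datetime64 (unparseable -> NaT)."""
    # Mask the zeros first, otherwise pd.to_datetime turns 0 into 1970-01-01
    return pd.to_datetime(col.mask(col.eq(0)), errors="coerce")


dsx = raw.copy()
dsx["end_d"] = to_dates(dsx["end_d"])
dsx["start_d"] = to_dates(dsx["start_d"])

freq = "D"  # assumption: daily frequency, as in the question's pd.Timedelta call

# NaT rows give NaT differences, and Series.max() skips them (skipna=True by default)
dsx["length"] = (dsx["end_d"] - dsx["start_d"] + pd.Timedelta(1, unit=freq)).max()
print(dsx)
```

Every row, including the former 0 and NaN rows, then gets the same 99-day length as in the desired output.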