How to get NaN values when inserting missing business days into a Pandas DataFrame



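I am trying to insert missing business days into a Pandas time-series DataFrame. The inserted business days must have NaN in every data column. When I resample to add the missing business days, the new rows are filled with 0 instead of NaN. For example: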

import pandas as pd
df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num':  [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.resample('B').sum()  # Insert Friday and Monday.

However, df is now:

            num
date           
2022-10-06  123
2022-10-07    0
2022-10-10    0
2022-10-11  456

I get 0 instead of NaN. How can I get NaN? This is what I want:

            num
date           
2022-10-06  123
2022-10-07  NaN
2022-10-10  NaN
2022-10-11  456

(Pandas version 1.3.2, Python version 3.8.10)

Use .asfreq() instead of .sum():

df.resample('B').asfreq()

Output:

              num
date             
2022-10-06  123.0
2022-10-07    NaN
2022-10-10    NaN
2022-10-11  456.0
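
The trailing .0 in this output appears because NaN is a floating-point value, so the integer column is upcast to float64. If integer display matters, one option is to cast the result afterwards. This is not part of the answer above; it is a minimal sketch that assumes pandas' nullable Int64 extension dtype is acceptable for your data, in which case missing entries show as <NA> rather than NaN:

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num': [123, 456],
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Inserting the missing business days upcasts the column to float64.
filled = df.resample('B').asfreq()
print(filled['num'].dtype)

# Assumption: a nullable integer column is acceptable downstream.
nullable = filled.astype('Int64')
print(nullable)
```

Whether the cast is worthwhile depends on what consumes the data; some numeric libraries expect plain float64 with NaN.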
df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num':  [123, 456]
})
df['date'] = pd.to_datetime(df['date'])

df = df.set_index('date')

If the datetimes are unique, use DataFrame.asfreq:

df1 = df.asfreq('B')
print(df1)
              num
date             
2022-10-06  123.0
2022-10-07    NaN
2022-10-10    NaN
2022-10-11  456.0
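
To see why asfreq leaves NaN in the new rows, it can help to think of it as a reindex onto a regular business-day range: rows with no matching label in the original index have nothing to copy, so they come out missing. The sketch below reproduces that idea by hand with pd.bdate_range and DataFrame.reindex; it illustrates the concept and is not a claim about how pandas implements asfreq internally.

```python
import pandas as pd

df = pd.DataFrame(
    {'num': [123, 456]},
    index=pd.to_datetime(['2022-10-06', '2022-10-11']).rename('date'),
)

# Every business day between the first and last timestamp.
business_days = pd.bdate_range(df.index.min(), df.index.max(), name='date')

# Labels absent from the original index get NaN in every column.
reindexed = df.reindex(business_days)
print(reindexed)
```

Like asfreq, reindex raises an error when the original index contains duplicate labels, so this only applies to unique datetimes.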

If duplicates are possible and you need to aggregate with sum, add the parameter min_count=1:

df2 = df.resample('B').sum(min_count=1)
print(df2)
              num
date             
2022-10-06  123.0
2022-10-07    NaN
2022-10-10    NaN
2022-10-11  456.0
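
The reason min_count=1 matters is that the sum of an empty group is 0 by default, and the inserted business days are exactly such empty groups. min_count sets how many valid values a group needs before a sum is reported; below that threshold the result is NaN. A small sketch on plain Series, independent of resampling, shows the same behaviour:

```python
import pandas as pd

# An empty Series stands in for a resample bin that received no rows.
empty = pd.Series([], dtype='float64')

default_sum = empty.sum()             # default min_count=0
strict_sum = empty.sum(min_count=1)   # requires at least one valid value

print(default_sum)
print(strict_sum)

# A group holding only NaN behaves the same way, because NaN is skipped.
all_nan = pd.Series([float('nan'), float('nan')])
print(all_nan.sum())
print(all_nan.sum(min_count=1))
```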

df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'] * 2,  # Thursday and Tuesday.
    'num':  [123, 456, 10, 20]
})
df['date'] = pd.to_datetime(df['date'])
print(df)
        date  num
0 2022-10-06  123
1 2022-10-11  456
2 2022-10-06   10
3 2022-10-11   20

df = df.set_index('date')

df2 = df.resample('B').sum(min_count=1)
print(df2)
              num
date             
2022-10-06  133.0
2022-10-07    NaN
2022-10-10    NaN
2022-10-11  476.0

df1 = df.asfreq('B')
print(df1)

ValueError: cannot reindex from a duplicate axis
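
If it is not known in advance whether the dates are unique, the two approaches can be combined by checking Index.is_unique first. The helper name insert_missing_business_days below is hypothetical, and summing duplicates is an assumption carried over from the example above; another aggregation may suit your data better.

```python
import pandas as pd


def insert_missing_business_days(frame):
    """Hypothetical helper: add missing business days as NaN rows.

    Unique dates are passed through asfreq unchanged; duplicate dates
    are summed (an assumption), with min_count=1 so that days without
    any data stay NaN instead of becoming 0.
    """
    if frame.index.is_unique:
        return frame.asfreq('B')
    return frame.resample('B').sum(min_count=1)


unique = pd.DataFrame(
    {'num': [123, 456]},
    index=pd.to_datetime(['2022-10-06', '2022-10-11']),
)
dupes = pd.DataFrame(
    {'num': [123, 456, 10, 20]},
    index=pd.to_datetime(['2022-10-06', '2022-10-11'] * 2),
)

from_unique = insert_missing_business_days(unique)
from_dupes = insert_missing_business_days(dupes)
print(from_unique)
print(from_dupes)
```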
