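I'm trying to insert missing weekdays into a pandas time series DataFrame. The inserted weekdays must have NaN in every data column. When I insert the missing weekdays with resample, the new rows are filled with 0 instead of NaN. For example: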
import pandas as pd
df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num': [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
df = df.resample('B').sum() # Insert Friday and Monday.
However, df is now:
num
date
2022-10-06 123
2022-10-07 0
2022-10-10 0
2022-10-11 456
Instead of NaN, I get 0. How can I get NaN? This is what I want:
num
date
2022-10-06 123
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456
(pandas version 1.3.2, Python version 3.8.10)
Use .asfreq() instead of .sum():
df.resample('B').asfreq()
Output:
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
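To see why .sum() produces 0 here, it may help to look at how pandas sums an empty group. The following is a minimal sketch, not part of the original answer; the variable names (empty, default_sum, strict_sum, summed, reindexed) are illustrative only. It assumes the same two-row data as in the question.

```python
import pandas as pd

# The sum of an empty selection is 0 by default (min_count=0).
empty = pd.Series([], dtype="float64")
default_sum = empty.sum()             # 0.0
strict_sum = empty.sum(min_count=1)   # NaN: fewer than 1 valid value

# Same data as in the question, with the dates as the index.
df = pd.DataFrame(
    {"num": [123, 456]},
    index=pd.DatetimeIndex(["2022-10-06", "2022-10-11"], name="date"),
)

# .sum() aggregates each business-day bin; the empty bins sum to 0.
summed = df.resample("B").sum()

# .asfreq() only reindexes to the new frequency; the new rows stay NaN.
reindexed = df.resample("B").asfreq()

print(summed)
print(reindexed)
```

The Friday and Monday bins contain no rows, so summing them gives 0, whereas .asfreq() does no aggregation and leaves the inserted rows empty.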
df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'],  # Thursday and Tuesday.
    'num': [123, 456]
})
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
If the datetimes are unique, use DataFrame.asfreq:
df1 = df.asfreq('B')
print (df1)
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
If duplicates are possible and you need to aggregate with sum, add the parameter min_count=1:
df2 = df.resample('B').sum(min_count=1)
print (df2)
num
date
2022-10-06 123.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 456.0
df = pd.DataFrame({
    'date': ['2022-10-06', '2022-10-11'] * 2,  # Thursday and Tuesday.
    'num': [123, 456, 10, 20]
})
df['date'] = pd.to_datetime(df['date'])
print (df)
date num
0 2022-10-06 123
1 2022-10-11 456
2 2022-10-06 10
3 2022-10-11 20
df = df.set_index('date')
df2 = df.resample('B').sum(min_count=1)
print (df2)
num
date
2022-10-06 133.0
2022-10-07 NaN
2022-10-10 NaN
2022-10-11 476.0
df1 = df.asfreq('B')
print (df1)
ValueError: cannot reindex from a duplicate axis
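The two cases above could be combined into one small helper. This is only a sketch under the assumption that summing is the right way to merge duplicate dates; the function name insert_business_days is hypothetical and not part of pandas.

```python
import pandas as pd


def insert_business_days(df: pd.DataFrame) -> pd.DataFrame:
    """Insert missing business days as NaN rows (hypothetical helper).

    Assumes df has a DatetimeIndex. Duplicate dates are summed, which
    is an assumption of this sketch; another aggregation may fit better.
    """
    if df.index.is_unique:
        # No duplicates: a plain reindex to business-day frequency is enough.
        return df.asfreq("B")
    # Duplicates: aggregate per day; min_count=1 keeps empty days as NaN.
    return df.resample("B").sum(min_count=1)


unique = pd.DataFrame(
    {"num": [123, 456]},
    index=pd.DatetimeIndex(["2022-10-06", "2022-10-11"], name="date"),
)
duplicated = pd.DataFrame(
    {"num": [123, 456, 10, 20]},
    index=pd.DatetimeIndex(["2022-10-06", "2022-10-11"] * 2, name="date"),
)

out_unique = insert_business_days(unique)
out_duplicated = insert_business_days(duplicated)
print(out_unique)
print(out_duplicated)
```

Checking index.is_unique first avoids the ValueError that asfreq raises on a duplicated index, while still using the cheaper reindex when no aggregation is needed.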