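Hey, I've seen many answers to questions where the maximum and minimum dates are the same for every group in the output. But how do I fill in dates for each ID only between that ID's own minimum and maximum date? For example, suppose this is the data frame:
x = pd.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-09','2016-01-06'], 'val': [1,33,2,1,2]})
The desired output is:
dt          user   val
2016-01-01  a      1.0
2016-01-02  a     33.0
2016-01-03  a      0.0
2016-01-04  a      0.0
2016-01-05  a      0.0
2016-01-06  a      2.0
2016-01-05  b      2.0
2016-01-06  b      0.0
2016-01-07  b      0.0
2016-01-08  b      0.0
2016-01-09  b      1.0
Based on your solution, I just replaced min(x.dt) with min(d.index):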
import pandas as pd
x = pd.DataFrame({'user': ['a','a','b','b','a'], 'dt': ['2016-01-01','2016-01-02', '2016-01-05','2016-01-09','2016-01-06'], 'val': [1,33,2,1,2]})
x['dt'] = pd.to_datetime(x['dt'])
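# Reindex each user's rows onto a daily range that starts at that user's
# first date (min(d.index)) and ends at the overall last date (max(x.dt)).
filled_df = (x.set_index('dt')
              .groupby('user')
              .apply(lambda d: d.reindex(pd.date_range(min(d.index),
                                                       max(x.dt),
                                                       freq='D')))
              .drop('user', axis=1)   # this copy of 'user' is NaN on the added dates
              .reset_index('user')    # restore 'user' from the group key in the index
              .fillna(0))             # dates that were missing get val 0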
Output:
>>> filled_df
            user   val
2016-01-01     a   1.0
2016-01-02     a  33.0
2016-01-03     a   0.0
2016-01-04     a   0.0
2016-01-05     a   0.0
2016-01-06     a   2.0
2016-01-07     a   0.0
2016-01-08     a   0.0
2016-01-09     a   0.0
2016-01-05     b   2.0
2016-01-06     b   0.0
2016-01-07     b   0.0
2016-01-08     b   0.0
2016-01-09     b   1.0
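Note that the output above still runs user a through 2016-01-09, because the end of the range is the overall maximum max(x.dt). If each user should stop at their own last date, as the question asks, one possible variant is to take both ends of the range from the group's index. The sketch below is only an illustration under a few assumptions: the helper name fill_group and the result name per_user are made up here, 'val' is selected explicitly so the code does not depend on whether apply passes the grouping column to the function, and missing days are filled with 0 through reindex's fill_value.

```python
import pandas as pd

x = pd.DataFrame({
    'user': ['a', 'a', 'b', 'b', 'a'],
    'dt': ['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-09', '2016-01-06'],
    'val': [1, 33, 2, 1, 2],
})
x['dt'] = pd.to_datetime(x['dt'])


def fill_group(d):
    """Hypothetical helper: reindex one user's rows onto that user's own daily range."""
    # Both ends come from this group's index, not from the whole frame.
    days = pd.date_range(d.index.min(), d.index.max(), freq='D')
    # Days that were absent from the group get val 0.
    return d.reindex(days, fill_value=0)


per_user = (
    x.set_index('dt')
     .groupby('user', group_keys=True)[['val']]  # keep only 'val' inside each group
     .apply(fill_group)
     .rename_axis(['user', 'dt'])                # name both index levels
     .reset_index()                              # back to plain columns
)

print(per_user)
```

With the sample data this gives 11 rows: user a from 2016-01-01 to 2016-01-06 and user b from 2016-01-05 to 2016-01-09.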