Propagating values to missing dates in pandas



I have the following DataFrame of business-day data:

In [23]: d = pd.DataFrame({'date' : ['20070105', '20070105', '20070106', '20070106', '20070106', '20070109'], 's' : [1, 2, 1,2,3,1], 'i': ['a', 'b', 'a', 'b', 'c', 'a']})
In [26]: d['date'] = pd.to_datetime(d['date'], format='%Y%m%d')
In [27]: d
Out[27]: 
        date  i  s
0 2007-01-05  a  1
1 2007-01-05  b  2
2 2007-01-06  a  1
3 2007-01-06  b  2
4 2007-01-06  c  3
5 2007-01-09  a  1

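I want to fill in data for the missing dates (according to an "all days" calendar), so the output should look like the following. Basically, 20070107 and 20070108 are missing, and their data is copied from 20070106.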

Out[31]: 
         date  i  s
0  2007-01-05  a  1
1  2007-01-05  b  2
2  2007-01-06  a  1
3  2007-01-06  b  2
4  2007-01-06  c  3
5  2007-01-07  a  1
6  2007-01-07  b  2
7  2007-01-07  c  3
8  2007-01-08  a  1
9  2007-01-08  b  2
10 2007-01-08  c  3
11 2007-01-09  a  1

What is the best way to do this in pandas?

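Use:

  • set_index with cumcount to get a unique MultiIndex
  • unstack to reshape, which gives a unique DatetimeIndex
  • reindex over the full date range from the minimum to the maximum date, with method='ffill' to fill in the missing dates
  • stack to reshape back
  • finally, drop the second level of the MultiIndex and convert the DatetimeIndex to a column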

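df = (
    d.set_index(['date', d.groupby('date').cumcount()])  # unique (date, counter) MultiIndex
     .unstack()                                          # one row per date
     .reindex(pd.date_range(d['date'].min(), d['date'].max()), method='ffill')  # add missing days
     .stack()                                            # back to long format
     .reset_index(level=1, drop=True)                    # drop the counter level
     .rename_axis('date')
     .reset_index()                                      # DatetimeIndex -> 'date' column
)
print(df)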
         date  i    s
0  2007-01-05  a  1.0
1  2007-01-05  b  2.0
2  2007-01-06  a  1.0
3  2007-01-06  b  2.0
4  2007-01-06  c  3.0
5  2007-01-07  a  1.0
6  2007-01-07  b  2.0
7  2007-01-07  c  3.0
8  2007-01-08  a  1.0
9  2007-01-08  b  2.0
10 2007-01-08  c  3.0
11 2007-01-09  a  1.0
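
Note that s comes back as float: unstack pads dates that have fewer rows with NaN, which forces the column to float. If the integer dtype matters, one possible alternative (a sketch, not part of the approach above) is to map each calendar day to the most recent date that has data, then merge the original rows onto that mapping. The names all_days, observed, source, calendar and filled are only illustrative.

```python
import pandas as pd

d = pd.DataFrame({
    'date': ['20070105', '20070105', '20070106', '20070106', '20070106', '20070109'],
    's': [1, 2, 1, 2, 3, 1],
    'i': ['a', 'b', 'a', 'b', 'c', 'a'],
})
d['date'] = pd.to_datetime(d['date'], format='%Y%m%d')

# Every calendar day between the first and last observed date.
all_days = pd.date_range(d['date'].min(), d['date'].max(), freq='D')

# Sorted unique dates that actually have data.
observed = pd.DatetimeIndex(d['date'].unique()).sort_values()

# For each calendar day, the most recent observed date (forward fill).
source = pd.Series(observed, index=observed).reindex(all_days, method='ffill')

# Lookup table: calendar day -> date whose rows should be copied.
calendar = pd.DataFrame({'date': all_days, 'source_date': source.to_numpy()})

# Attach the original rows to every calendar day via its source date.
filled = (
    calendar.merge(d.rename(columns={'date': 'source_date'}),
                   on='source_date', how='left')
            .drop(columns='source_date')
)
print(filled)
```

Because every calendar day matches at least one original row, the merge introduces no NaN, so s should keep its integer dtype. The column order of the result follows the merge (date first, then the columns of d).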
