Resampling when the datetime index is not unique and the corresponding values are identical

I have the following pandas DataFrame df:

           attribute
2017-01-01         a
2017-01-01         a
2017-01-05         b
2017-02-01         a
2017-02-10         a

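The first column is a non-unique datetime index, and I would like to count the number of a and b values per week. If I try df.attribute.resample('W').count(), I get an error because of the duplicate entries.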

How can I do this?

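df = df.reset_index()  # move the datetime index into a column named 'index'
# Note: Series.dt.week was removed in pandas 2.0; there, use df['index'].dt.isocalendar().week
df.groupby([df['index'].dt.week, 'attribute']).count()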
Out[292]: 
                 index
index attribute       
1     b              1
5     a              1
6     a              1
52    a              2

Or

# On the original df (datetime index); DatetimeIndex.week was removed in pandas 2.0
df.groupby([df.index.get_level_values(0).week, 'attribute'])['attribute'].count()
Out[303]: 
    attribute
1   b            1
5   a            1
6   a            1
52  a            2
Name: attribute, dtype: int64
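On recent pandas versions the same week-of-year grouping can be sketched with isocalendar(). This is only an illustration on the question's sample data, not the answerer's original code; the variable names weeks and counts are made up for the example. Note that these are ISO week numbers, which is why 2017-01-01 (a Sunday) lands in week 52.

```python
import pandas as pd

# Sample data from the question: a non-unique DatetimeIndex
idx = pd.to_datetime(
    ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
)
df = pd.DataFrame({"attribute": ["a", "a", "b", "a", "a"]}, index=idx)

# ISO week number for each row, as a plain integer array so that
# groupby does not try to align it on the duplicated index
weeks = df.index.isocalendar().week.astype("int64").to_numpy()

# Count rows per (week number, attribute) pair
counts = df.groupby([weeks, "attribute"]).size()
print(counts)
```

Keep in mind that a bare week number does not distinguish years, so this is mainly useful when the data covers a single year.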

You may be interested in a two-step process: a groupby followed by a resample.

df.groupby(level=0).count().resample('W').sum()
            attribute
2017-01-01        2.0
2017-01-08        1.0
2017-01-15        NaN
2017-01-22        NaN
2017-01-29        NaN
2017-02-05        1.0
2017-02-12        1.0
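For anyone who wants to reproduce this, here is a minimal self-contained sketch of the two steps on the question's sample data. The intermediate names per_day and per_week are assumptions for illustration. On recent pandas versions the empty weeks should come out as 0 rather than NaN, since the sum of an empty bin is 0 there.

```python
import pandas as pd

# Sample data from the question: a non-unique DatetimeIndex
idx = pd.to_datetime(
    ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
)
df = pd.DataFrame({"attribute": ["a", "a", "b", "a", "a"]}, index=idx)

# Step 1: collapse duplicate timestamps into one row per date,
# holding the number of entries on that date
per_day = df.groupby(level=0).count()

# Step 2: the index is now unique, so sum the daily counts per week
per_week = per_day.resample("W").sum()
print(per_week)
```

This counts all entries per week regardless of attribute value; it does not split the counts into a and b.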

You can use pd.Grouper to group the index by weekly frequency:

In [83]: df.groupby(pd.Grouper(freq='W')).count()
Out[83]: 
            attribute
2017-01-01          2
2017-01-08          1
2017-01-15          0
2017-01-22          0
2017-01-29          0
2017-02-05          1
2017-02-12          1

To group by both weekly frequency and the attribute column, you can use:

In [87]: df.groupby([pd.Grouper(freq='W'), 'attribute']).size()
Out[87]: 
            attribute
2017-01-01  a            2
2017-01-08  b            1
2017-02-05  a            1
2017-02-12  a            1
dtype: int64
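If one column per attribute value is more convenient, the result above can be pivoted with unstack. The following is a rough sketch on the question's sample data; counts and wide are illustrative names, and fill_value=0 is a choice made here so that missing combinations show as 0 instead of NaN.

```python
import pandas as pd

# Sample data from the question: a non-unique DatetimeIndex
idx = pd.to_datetime(
    ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
)
df = pd.DataFrame({"attribute": ["a", "a", "b", "a", "a"]}, index=idx)

# Count rows per (week, attribute) pair, as in the answer above
counts = df.groupby([pd.Grouper(freq="W"), "attribute"]).size()

# Pivot the attribute level into columns: one row per week, one column per value
wide = counts.unstack(fill_value=0)
print(wide)
```

Only weeks that contain at least one row appear in this table; weeks with no data at all are not filled in.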

pd.Grouper also has a key parameter, which lets you group by datetimes stored in a column rather than in the index.
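As a small sketch of the key variant, suppose the dates live in a regular column. The column name "date" is an assumption made for this example; it does not appear in the question.

```python
import pandas as pd

# Same sample data, but with the dates in a column instead of the index.
# The column name "date" is hypothetical.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2017-01-01", "2017-01-01", "2017-01-05", "2017-02-01", "2017-02-10"]
        ),
        "attribute": ["a", "a", "b", "a", "a"],
    }
)

# key= tells Grouper which column holds the datetimes to bin by week
weekly = df.groupby([pd.Grouper(key="date", freq="W"), "attribute"]).size()
print(weekly)
```

This should give the same four (week, attribute) counts as the index-based version, without having to call set_index first.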
