Pandas Date MultiIndex with missing dates - Rolling sum

I have a pandas Series that looks like this:

Attribute      DateEvent     Value
Type A         2015-04-01    4
               2015-04-02    5
               2015-04-05    3
Type B         2015-04-01    1
               2015-04-03    4
               2015-04-05    1

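How can I convert the values into a rolling sum (say, over the past two days) while making sure that dates missing from my DateEvent index are accounted for, assuming the start and end dates give the full range? For example, Type A is missing 2015-04-03 and 2015-04-04, and Type B is missing 2015-04-02 and 2015-04-04.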

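I have made a few assumptions about what you want, so please clarify if any of them is wrong:

You want the rows for missing dates to be treated as rows with Value = NaN.
As a result, the 2-day rolling sum should return NaN whenever a missing date falls inside the rolling window.
You want the rolling sum computed within each group, Type A and Type B.

If I assumed correctly, the steps are as follows.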

Create a sample dataset

import pandas as pd
import numpy as np
import io
datastring = io.StringIO(
"""
Attribute,DateEvent,Value
Type A,2017-04-02,1
Type A,2017-04-03,2
Type A,2017-04-04,3
Type A,2017-04-05,4
Type B,2017-04-02,1
Type B,2017-04-03,2
Type B,2017-04-04,3
Type B,2017-04-05,4
""")
s = pd.read_csv(
    datastring,
    index_col=['Attribute', 'DateEvent'],
    # parse only the date level; 'Attribute' holds plain strings
    parse_dates=['DateEvent'])
print(s)

This is what it looks like. Both Type A and Type B are missing 2017-04-01.

                      Value
Attribute DateEvent        
Type A    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4
Type B    2017-04-02      1
          2017-04-03      2
          2017-04-04      3
          2017-04-05      4

Solution

According to this answer, you have to rebuild the index and then reindex your Series to get one that contains all the dates.

# reconstruct index with all the dates
dates = pd.date_range("2017-04-01","2017-04-05", freq="1D")
attributes = ["Type A", "Type B"]
# create a new MultiIndex
index = pd.MultiIndex.from_product([attributes,dates], 
        names=["Attribute","DateEvent"])
# reindex the series
sNew = s.reindex(index)

The missing dates have been added, with Value = NaN.

                      Value
Attribute DateEvent        
Type A    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0
Type B    2017-04-01    NaN
          2017-04-02    1.0
          2017-04-03    2.0
          2017-04-04    3.0
          2017-04-05    4.0
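
The dates and attribute names above are typed out by hand. A minimal sketch, assuming the first and last dates present in the data bound the full range (as the question states), derives both from the existing index instead. It uses the question's data, and the names s_full and full_index are only illustrative.

```python
import io
import pandas as pd

# The question's data: each group has gaps between 2015-04-01 and 2015-04-05.
csv = io.StringIO("""Attribute,DateEvent,Value
Type A,2015-04-01,4
Type A,2015-04-02,5
Type A,2015-04-05,3
Type B,2015-04-01,1
Type B,2015-04-03,4
Type B,2015-04-05,1
""")
s = pd.read_csv(csv, parse_dates=["DateEvent"]).set_index(["Attribute", "DateEvent"])

# Derive both index levels from the data instead of hard-coding them.
date_level = s.index.get_level_values("DateEvent")
dates = pd.date_range(date_level.min(), date_level.max(), freq="D")
attributes = s.index.get_level_values("Attribute").unique()

# Every (attribute, date) pair; pairs absent from the data become NaN rows.
full_index = pd.MultiIndex.from_product(
    [attributes, dates], names=["Attribute", "DateEvent"]
)
s_full = s.reindex(full_index)
print(s_full)
```

If the true range extends beyond the observed dates, the min/max assumption no longer holds and the endpoints have to be passed to pd.date_range explicitly.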

Now group the Series by the Attribute index level and apply a rolling window of size 2 with sum().

# group the series by the `Attribute` index level
grouped = sNew.groupby(level="Attribute")
# apply a 2-row rolling window (2 days, now that every date is present)
summed = grouped.rolling(2).sum()

Final output

                                Value
Attribute Attribute DateEvent        
Type A    Type A    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0
Type B    Type B    2017-04-01    NaN
                    2017-04-02    NaN
                    2017-04-03    3.0
                    2017-04-04    5.0
                    2017-04-05    7.0
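
This output depends on the assumption that a missing date should turn the window into NaN. If a missing date should instead count as zero, one possible variant (a sketch, not something the question confirms) is to fill the new rows with 0 while reindexing and to let the first, incomplete window through with min_periods=1. It is shown on the question's data; s_zero and summed_zero are illustrative names.

```python
import io
import pandas as pd

csv = io.StringIO("""Attribute,DateEvent,Value
Type A,2015-04-01,4
Type A,2015-04-02,5
Type A,2015-04-05,3
Type B,2015-04-01,1
Type B,2015-04-03,4
Type B,2015-04-05,1
""")
s = pd.read_csv(csv, parse_dates=["DateEvent"]).set_index(["Attribute", "DateEvent"])

dates = pd.date_range("2015-04-01", "2015-04-05", freq="D")
index = pd.MultiIndex.from_product(
    [["Type A", "Type B"], dates], names=["Attribute", "DateEvent"]
)

# Assumption: a date with no row means "nothing happened", so fill with 0.
s_zero = s.reindex(index, fill_value=0)

# min_periods=1 lets the first day of each group produce a sum on its own.
summed_zero = s_zero.groupby(level="Attribute").rolling(2, min_periods=1).sum()
print(summed_zero)
```

For the question's data this gives 4, 9, 5, 0, 3 for Type A and 1, 1, 4, 4, 1 for Type B.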

Final note: I'm not sure why there are now two Attribute index columns. Let me know if anyone figures it out.
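
As far as I can tell, the duplicate appears because groupby(...).rolling(...) prepends the group key to the original index, and here the original index already carries Attribute as a level. A minimal sketch that rebuilds the answer's sample data and drops the extra outer level with droplevel(0); the name tidy is only illustrative.

```python
import io
import pandas as pd

csv = io.StringIO("""Attribute,DateEvent,Value
Type A,2017-04-02,1
Type A,2017-04-03,2
Type A,2017-04-04,3
Type A,2017-04-05,4
Type B,2017-04-02,1
Type B,2017-04-03,2
Type B,2017-04-04,3
Type B,2017-04-05,4
""")
s = pd.read_csv(csv, parse_dates=["DateEvent"]).set_index(["Attribute", "DateEvent"])

dates = pd.date_range("2017-04-01", "2017-04-05", freq="D")
index = pd.MultiIndex.from_product(
    [["Type A", "Type B"], dates], names=["Attribute", "DateEvent"]
)
summed = s.reindex(index).groupby(level="Attribute").rolling(2).sum()

# The result has three levels: group key, then the original two levels.
print(summed.index.names)

# Drop the outermost (group key) level by position, since the name is duplicated.
tidy = summed.droplevel(0)
print(tidy)
```

Dropping by position rather than by name avoids any ambiguity between the two levels that are both called Attribute.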

Edit: A similar question was asked here. Check it out.

Source: How to fill in missing values with a MultiIndex
