使用多索引和多列重采样



我有一个具有以下结构的熊猫数据帧:

ID    date           m_1   m_2 
 1    2016-01-03     10    3.4
      2016-02-07     11    3.3
      2016-02-07     10.4  2.8
 2    2016-01-01     10.9  2.5
      2016-02-04     12    2.3
      2016-02-04     11    2.7
      2016-02-04     12.1  2.1

IDdate都是MultiIndex。数据表示某些传感器(在示例中为两个传感器(进行的一些测量。这些传感器有时每天创建多个测量值(如示例中所示(。

我的问题是:

  • 如何对此进行重新采样,以便每个传感器每天有一行,但一列带有mean,另一列带有max另一列带有min,等等?
  • 我如何"对齐"(也许这不是正确的词(两个时间序列,以便同时开始和结束(从 2016-01-012016-02-07 (添加带有 NA 的缺失天数?

您可以将groupbyDataFrameGroupBy.resample一起使用,并首先按dict中的函数聚合,然后按MultiIndex.from_product reindex

df = df.reset_index(level=0).groupby('ID').resample('D').agg({'m_1':'mean', 'm_2':'max'})
df = df.reindex(pd.MultiIndex.from_product(df.index.levels, names = df.index.names))
#alternative for adding missing start and end datetimes
#df = df.unstack().stack(dropna=False)
print (df.head())
               m_2   m_1
ID date                 
1  2016-01-01  NaN   NaN
   2016-01-02  NaN   NaN
   2016-01-03  3.4  10.0
   2016-01-04  NaN   NaN
   2016-01-05  NaN   NaN

对于二级PeriodIndex,请使用set_levels to_period

df.index = df.index.set_levels(df.index.get_level_values('date').to_period('d'), level=1)
print (df.index.get_level_values('date'))
PeriodIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
             '2016-01-05', '2016-01-06', '2016-01-07', '2016-01-08',
             '2016-01-09', '2016-01-10', '2016-01-11', '2016-01-12',
             '2016-01-13', '2016-01-14', '2016-01-15', '2016-01-16',
             '2016-01-17', '2016-01-18', '2016-01-19', '2016-01-20',
             '2016-01-21', '2016-01-22', '2016-01-23', '2016-01-24',
             '2016-01-25', '2016-01-26', '2016-01-27', '2016-01-28',
             '2016-01-29', '2016-01-30', '2016-01-31', '2016-02-01',
             '2016-02-02', '2016-02-03', '2016-02-04', '2016-02-05',
             '2016-02-06', '2016-02-07', '2016-01-01', '2016-01-02',
             '2016-01-03', '2016-01-04', '2016-01-05', '2016-01-06',
             '2016-01-07', '2016-01-08', '2016-01-09', '2016-01-10',
             '2016-01-11', '2016-01-12', '2016-01-13', '2016-01-14',
             '2016-01-15', '2016-01-16', '2016-01-17', '2016-01-18',
             '2016-01-19', '2016-01-20', '2016-01-21', '2016-01-22',
             '2016-01-23', '2016-01-24', '2016-01-25', '2016-01-26',
             '2016-01-27', '2016-01-28', '2016-01-29', '2016-01-30',
             '2016-01-31', '2016-02-01', '2016-02-02', '2016-02-03',
             '2016-02-04', '2016-02-05', '2016-02-06', '2016-02-07'],
            dtype='period[D]', name='date', freq='D')