由 agg(['sum','count']) 创建的拖放标签



我有一个可以重建为pd的df。DataFrame(dict(:

           city   d1      d2       d3       d4
date                    
2014-05-01  sfo 4.26    6.58     2.32    -4.87
2014-05-01  yyz 2.90    6.64    24.78   -50.55
2014-05-01  yvr 2.90    6.64    24.78   -50.55
2014-05-01  dfw 4.06    6.54     2.40    -4.06
2014-05-01  pdx 9.96    6.66    30.35    64.24

dict:

{'date': {0: pd.Timestamp('2014-05-01 00:00:00'),
  1: pd.Timestamp('2014-05-01 00:00:00'),
  2: pd.Timestamp('2014-05-01 00:00:00'),
  3: pd.Timestamp('2014-05-01 00:00:00'),
  4: pd.Timestamp('2014-05-01 00:00:00')},
 'city': {0: 'sfo', 1: 'yyz', 2: 'yvr', 3: 'dfw', 4: 'pdx'},
 'd1': {0: 4.2599999999999998,
  1: 2.8999999999999999,
  2: 2.8999999999999999,
  3: 4.0599999999999996,
  4: 9.9600000000000009},
 'd2': {0: 6.5800000000000001,
  1: 6.6399999999999997,
  2: 6.6399999999999997,
  3: 6.54,
  4: 6.6600000000000001},
 'd3': {0: 2.3199999999999998,
  1: 24.780000000000001,
  2: 24.780000000000001,
  3: 2.3999999999999999,
  4: 30.350000000000001},
 'd4': {0: -4.8700000000000001,
  1: -50.549999999999997,
  2: -50.549999999999997,
  3: -4.0599999999999996,
  4: 64.239999999999995}}
df.set_index(['date'], inplace=True)

我通过TimeGrouper执行以下聚合:

grouped = df.groupby(['city', pd.TimeGrouper('M')])
monthly_agg = grouped.agg(['sum', 'count'])

monthly_agg看起来像:

        d1  d2  d3  d4
                 sum    count    sum    count     sum   count      sum  count
city    date                                
dfw 2014-05-31  4.06        1   6.54        1    2.40       1    -4.06      1
pdx 2014-05-31  9.96        1   6.66        1   30.35       1    64.24      1
sfo 2014-05-31  4.26        1   6.58        1    2.32       1    -4.87      1
yvr 2014-05-31  2.90        1   6.64        1   24.78       1   -50.55      1
yyz 2014-05-31  2.90        1   6.64        1   24.78       1   -50.55      1

count标签列用于健全性检查,但一旦完成,我希望能够删除它

同样,d1、d2、d3、d4等下的sum标签不再是必要的

我想要的输出:

                  d1      d2       d3       d4
city    date                                
dfw 2014-05-31  4.06    6.54     2.40    -4.06
pdx 2014-05-31  9.96    6.66    30.35    64.24
sfo 2014-05-31  4.26    6.58     2.32    -4.87
yvr 2014-05-31  2.90    6.64    24.78   -50.55
yyz 2014-05-31  2.90    6.64    24.78   -50.55

我该怎么得到这个?

monthly_agg = monthly_agg.loc[:, pd.IndexSlice[:,'sum']]
monthly_agg.columns = monthly_agg.columns.droplevel(1)
monthly_agg
Out: 
                   d1    d2     d3     d4
city date                                
dfw  2014-05-31  4.06  6.54   2.40  -4.06
pdx  2014-05-31  9.96  6.66  30.35  64.24
sfo  2014-05-31  4.26  6.58   2.32  -4.87
yvr  2014-05-31  2.90  6.64  24.78 -50.55
yyz  2014-05-31  2.90  6.64  24.78 -50.55

或者使用xs:

monthly_agg.xs('sum', axis=1, level=1)
Out: 
                   d1    d2     d3     d4
city date                                
dfw  2014-05-31  4.06  6.54   2.40  -4.06
pdx  2014-05-31  9.96  6.66  30.35  64.24
sfo  2014-05-31  4.26  6.58   2.32  -4.87
yvr  2014-05-31  2.90  6.64  24.78 -50.55
yyz  2014-05-31  2.90  6.64  24.78 -50.55

最新更新