Pandas: slice the lower triangle, reindex each column independently, then realign and join the columns



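I have a DataFrame whose rows and columns are both indexed by date. A value is kept only if row-index-date >= column-index-date. Here is the code that builds the initial DataFrame: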

import pandas as pd
import numpy as np
np.random.seed(0)

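# Month-end dates; pandas >= 2.2 prefers the alias 'ME' over 'M'.
rng = pd.date_range('1/1/2011', periods=5, freq='M')
df = pd.DataFrame(np.random.random((len(rng), len(rng))), index=rng, columns=rng)
# Boolean mask: True where the row date is on or after the column date.
idx = df.apply(lambda x: x.index >= x.name, axis=0)
df = df[idx]
# .ix has been removed from pandas; use position-based .iloc instead.
df.iloc[4, 0:2] = np.nan
df.iloc[2, 1] = np.nan
print(df)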

            2011-01-31  2011-02-28  2011-03-31  2011-04-30  2011-05-31 
2011-01-31  0.548814    NaN         NaN         NaN         NaN
2011-02-28  0.645894    0.437587    NaN         NaN         NaN
2011-03-31  0.791725    NaN         0.568045    NaN         NaN
2011-04-30  0.087129    0.020218    0.832620    0.778157    NaN
2011-05-31  NaN         NaN         0.461479    0.780529    0.118274

I would like to reshape it into the following format:

    2011-01-31  2011-02-28  2011-03-31  2011-04-30  2011-05-31
0   0.548814    0.437587    0.568045    0.778157    0.118274
1   0.645894    NaN         0.832620    0.780529    NaN
2   0.791725    0.020218    0.461479    NaN         NaN
3   0.087129    NaN         NaN         NaN         NaN
4   NaN         NaN         NaN         NaN         NaN

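The new index represents the lag, row-index minus column-index, taken from the original DataFrame. Note that this new index is different for every column. I am struggling to assign a new index to each column and then realign the columns.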
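To make the lag definition concrete, here is a minimal sketch that computes row-index minus column-index in whole months for the five dates above. The names `months` and `lag_table` are only illustrative, and counting months as `year * 12 + month` is an assumption that fits this monthly example, not a general rule for other frequencies.

```python
import pandas as pd

# The same five month-end dates used for both the rows and the columns.
dates = pd.to_datetime(
    ["2011-01-31", "2011-02-28", "2011-03-31", "2011-04-30", "2011-05-31"]
)

# Express every date as a running month count so that a subtraction
# gives the lag in whole months.
months = (dates.year * 12 + dates.month).to_numpy()

# lag[i, j] = month of row i minus month of column j.
lag = months[:, None] - months[None, :]

lag_table = pd.DataFrame(lag, index=dates, columns=dates)
print(lag_table)
```

Cells with a non-negative lag are exactly the lower triangle (including the diagonal), and the lag value is the row each of those cells should end up in.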

This is how I did it:

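def align_columns_by_lag(x):
    """Keep the lower-triangular part of a column, re-indexed by lag."""
    xlen = len(x)
    # Keep only the rows dated on or after this column's own date.
    idx = x.index >= x.name
    # Renumber the kept rows 0, 1, 2, ... so the new index is the lag.
    newx = x[idx].reset_index(drop=True)
    # reindex returns a new object, so its result has to be returned;
    # lags with no data are filled with NaN by default.
    return newx.reindex(range(xlen))

df2 = df.apply(align_columns_by_lag, axis=0)
print(df2)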
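An alternative sketch of the same realignment, assuming the rows and columns carry the same sorted dates (so column i has its diagonal at row i): shifting column i up by i rows puts lag k in row k. The helper name `align_by_lag` is hypothetical, and the input values are copied from the printed table above rather than regenerated.

```python
import numpy as np
import pandas as pd

nan = np.nan
dates = pd.to_datetime(
    ["2011-01-31", "2011-02-28", "2011-03-31", "2011-04-30", "2011-05-31"]
)

# Values copied from the printed DataFrame in the question.
df = pd.DataFrame(
    [
        [0.548814, nan, nan, nan, nan],
        [0.645894, 0.437587, nan, nan, nan],
        [0.791725, nan, 0.568045, nan, nan],
        [0.087129, 0.020218, 0.832620, 0.778157, nan],
        [nan, nan, 0.461479, 0.780529, 0.118274],
    ],
    index=dates,
    columns=dates,
)


def align_by_lag(frame):
    """Shift column i up by i rows so that row k holds the value at lag k.

    Assumes frame.index and frame.columns are the same sorted labels.
    """
    shifted = {
        # shift(-i) drops the first i rows (the part above the diagonal)
        # and pads the bottom with NaN.
        col: frame[col].shift(-i).to_numpy()
        for i, col in enumerate(frame.columns)
    }
    # A plain 0..n-1 index now stands for the lag.
    return pd.DataFrame(shifted, index=range(len(frame)))


df2 = align_by_lag(df)
print(df2)
```

This avoids a Python-level function call per column inside `apply`, but it relies on the positional assumption above; if the row and column dates differ, the per-column comparison against `x.name` is the safer route.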
