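I have a DataFrame whose rows and columns are both indexed by date. A value is present only where row-index-date >= column-index-date. Here is the code to build the initial DataFrame: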
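import pandas as pd
import numpy as np

np.random.seed(0)
# 'M' means month-end frequency (newer pandas versions spell it 'ME')
rng = pd.date_range('1/1/2011', periods=5, freq='M')
df = pd.DataFrame(np.random.random((len(rng), len(rng))), index=rng, columns=rng)
# Keep only the cells where row date >= column date (the lower triangle)
idx = df.apply(lambda x: x.index >= x.name, axis=0)
df = df[idx]
# Blank out a few more cells by position (.iloc replaces the removed .ix)
df.iloc[4, 0:2] = np.nan
df.iloc[2, 1] = np.nan
print(df)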
This prints:
            2011-01-31  2011-02-28  2011-03-31  2011-04-30  2011-05-31
2011-01-31    0.548814         NaN         NaN         NaN         NaN
2011-02-28    0.645894    0.437587         NaN         NaN         NaN
2011-03-31    0.791725         NaN    0.568045         NaN         NaN
2011-04-30    0.087129    0.020218    0.832620    0.778157         NaN
2011-05-31         NaN         NaN    0.461479    0.780529    0.118274
I want to reshape it into the following format:
   2011-01-31  2011-02-28  2011-03-31  2011-04-30  2011-05-31
0    0.548814    0.437587    0.568045    0.778157    0.118274
1    0.645894         NaN    0.832620    0.780529         NaN
2    0.791725    0.020218    0.461479         NaN         NaN
3    0.087129         NaN         NaN         NaN         NaN
4         NaN         NaN         NaN         NaN         NaN
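The new index is the lag taken from the original DataFrame, that is, row-index - column-index counted in periods. Note that this index is different for each column. I am struggling to assign a new index to each column and then realign the columns.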
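To make the lag definition concrete, here is a minimal sketch. It assumes the row and column labels are the same sorted month-end dates, so the lag in periods is just row position minus column position. The names `dates`, `pos`, `lag` and `keep` are only for illustration.

```python
import numpy as np
import pandas as pd

# Same five month-end dates as in the example, written out explicitly
dates = pd.to_datetime(['2011-01-31', '2011-02-28', '2011-03-31',
                        '2011-04-30', '2011-05-31'])

# Lag in periods = row position - column position (assumes identical, sorted labels)
pos = np.arange(len(dates))
lag = pd.DataFrame(pos[:, None] - pos[None, :], index=dates, columns=dates)

# Cells with a non-negative lag are exactly those with row date >= column date
keep = lag >= 0
print(lag)
```

For example, the cell in row 2011-04-30 and column 2011-02-28 has lag 2, so its value should end up in row 2 of that column in the target layout.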
Here is what I did:
def align_columns_by_lag(x):
    """Keep the lower-triangular part of a column, re-indexed by lag."""
    xlen = len(x)
    idx = x.index >= x.name
    # reset_index returns a new Series, so the slice is not modified in place
    newx = x[idx].reset_index(drop=True)
    # reindex also returns a new Series; its result has to be returned
    return newx.reindex(range(xlen), fill_value=np.nan)

df2 = df.apply(align_columns_by_lag, axis=0)
df2
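For comparison, one possible alternative is to shift each column up by its own position in the row index, which puts lag 0 on the first row. This is only a sketch: it assumes every column label also appears exactly once in the row index, and `align_by_lag`, `dates` and `result` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame with explicit dates so the sketch is self-contained
dates = pd.to_datetime(['2011-01-31', '2011-02-28', '2011-03-31',
                        '2011-04-30', '2011-05-31'])
np.random.seed(0)
df = pd.DataFrame(np.random.random((5, 5)), index=dates, columns=dates)
df = df.where(dates.values[:, None] >= dates.values[None, :])
df.iloc[4, 0:2] = np.nan
df.iloc[2, 1] = np.nan

def align_by_lag(frame):
    """Shift each column up by the position of its label in the row index."""
    shifted = {
        # get_loc returns an integer position when the row labels are unique
        col: frame[col].shift(-frame.index.get_loc(col)).to_numpy()
        for col in frame.columns
    }
    # The new integer index is the lag: 0, 1, 2, ...
    return pd.DataFrame(shifted, index=range(len(frame)))

result = align_by_lag(df)
print(result)
```

Because `shift` pads with NaN at the bottom, every column keeps the full length and no explicit `reindex` is needed.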