Rename specific columns in a MultiIndex dataframe



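I have the following dataframe. It has 2 index levels for selecting rows (samples and epochs) and 2 index levels for selecting columns (kpi and model).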

kpi            Accuracy             Precision            Recall             Training time (sec)                 Model memory (MB)               HE Memory (GB)         
model                M0    M1    M2        M0   M1   M2      M0    M1    M2                  M0      M1      M2                M0     M1     M2             M0       M1
samples epochs                                                                                                                                                         
675     3          0.96  0.52  1.00       1.0  0.0  1.0  0.9166  0.00  1.00              0.2124  0.2083  0.2080             0.417  0.417  0.417       0.553547   6.2009
        4          0.96  0.52  1.00       1.0  0.0  1.0  0.9166  0.00  1.00              0.2066  0.2123  0.2137             0.417  0.417  0.417       0.553547   6.2009
1950    3          0.98  0.96  0.98       1.0  1.0  1.0  0.9600  0.92  0.96              0.2132  0.2139  0.2136             0.417  0.417  0.417       1.664447  12.3319
        4          0.98  0.90  0.98       1.0  1.0  1.0  0.9600  0.80  0.96              0.2064  0.2166  0.2152             0.417  0.417  0.417       1.664447  12.3319

The code that produces it is:

from itertools import zip_longest

import numpy as np
import pandas as pd

# shape_ind, epoch_ind, kpi_values and flatten_list are defined elsewhere (not shown)
tuples = list(zip_longest(shape_ind, epoch_ind))
flat_list = flatten_list(kpi_values)
df = pd.DataFrame(np.reshape(flat_list, (len(kpi_values), -1)))
df.index = pd.MultiIndex.from_tuples(tuples, names=['samples', 'epochs'])
# divmod(column number, number of models) -> (kpi index, model index)
df.columns = pd.MultiIndex.from_arrays(np.divmod(df.columns, len(kpi_values[0][0])), names=['kpi', 'model'])
# Level 1: 0, 1, 2 -> M0, M1, M2
df.rename(lambda x: f'M{x}',
          axis=1,
          level=1,
          inplace=True)
kpi = ['Accuracy', 'Precision', 'Recall', 'Training time (sec)', 'Model memory (MB)', 'HE Memory (GB)', 'HE gen. time (sec)']
# Level 0: kpi index -> kpi name
df.rename(mapper=lambda x: kpi[x],
          axis=1,
          level=0,
          inplace=True)
print(df)
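To see where the stray M0/M1 labels come from, here is a small self-contained reproduction with made-up numbers. The 4x17 array stands in for flatten_list(kpi_values), and three models are assumed. np.divmod maps flat column c to (c // 3, c % 3), so columns 15 and 16 both land on kpi index 5 (HE Memory (GB)), and 'HE gen. time (sec)' is never used:

```python
import numpy as np
import pandas as pd

n_models = 3                   # assumption: three models, M0..M2
n_cols = 5 * n_models + 2      # 5 per-model KPIs plus 2 HE values = 17 columns
values = np.arange(4 * n_cols, dtype=float).reshape(4, n_cols)  # made-up data

df = pd.DataFrame(values)
df.index = pd.MultiIndex.from_tuples(
    [(675, 3), (675, 4), (1950, 3), (1950, 4)], names=["samples", "epochs"]
)
# divmod(column number, n_models) -> (kpi index, model index)
df.columns = pd.MultiIndex.from_arrays(
    list(np.divmod(np.arange(n_cols), n_models)), names=["kpi", "model"]
)

kpi = ["Accuracy", "Precision", "Recall", "Training time (sec)",
       "Model memory (MB)", "HE Memory (GB)", "HE gen. time (sec)"]
df = df.rename(columns=lambda x: f"M{x}", level=1)
df = df.rename(columns=lambda x: kpi[x], level=0)

print(df.columns[-2:].tolist())
# [('HE Memory (GB)', 'M0'), ('HE Memory (GB)', 'M1')]
```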

I only want to rename the last 2 columns and put each of them in its own group, i.e. change this:

HE Memory (GB)         
M0         M1                                                                                                                                                         
0.553547   6.2009
0.553547   6.2009
1.664447  12.3319
1.664447  12.3319

to this:

HE Memory (GB)  HE gen. time (sec)      
                                   <--- note how M0 and M1 are gone                                                                                                                                    
0.553547        6.2009
0.553547        6.2009
1.664447        12.3319
1.664447        12.3319

How can I do this while keeping the rest of the dataframe's structure intact?
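rename applies its mapper label by label within a single level. Mapping 'HE Memory (GB)' at level 0 would therefore rename both of the last two columns, and mapping 'M0'/'M1' at level 1 would also hit every per-model KPI. One way around this is to copy the column tuples, overwrite only the last two, and rebuild the column index. Below is a minimal sketch on a cut-down frame that contains only Accuracy plus the two HE columns, with values copied from the table above. The empty string is used as a placeholder meaning "no model":

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [(675, 3), (675, 4), (1950, 3), (1950, 4)], names=["samples", "epochs"]
)
cols = pd.MultiIndex.from_tuples(
    [("Accuracy", "M0"), ("Accuracy", "M1"),
     ("HE Memory (GB)", "M0"), ("HE Memory (GB)", "M1")],
    names=["kpi", "model"],
)
df = pd.DataFrame(
    [[0.96, 0.52, 0.553547, 6.2009],
     [0.96, 0.52, 0.553547, 6.2009],
     [0.98, 0.96, 1.664447, 12.3319],
     [0.98, 0.90, 1.664447, 12.3319]],
    index=index, columns=cols,
)

# Copy the existing (kpi, model) tuples and overwrite only the last two
new_cols = df.columns.tolist()
new_cols[-2:] = [("HE Memory (GB)", ""), ("HE gen. time (sec)", "")]
df.columns = pd.MultiIndex.from_tuples(new_cols, names=df.columns.names)

print(df.columns.tolist())
# [('Accuracy', 'M0'), ('Accuracy', 'M1'), ('HE Memory (GB)', ''), ('HE gen. time (sec)', '')]
```

Only df.columns is reassigned, so the row index and the other columns are left as they were.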

In the end I went with this solution:

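from itertools import zip_longest

import numpy as np
import pandas as pd

# KPIs reported once per model (M0, M1, ...)
model_kpi = ['ACC', 'PRC', 'REC', 'TR_T', 'MM']
# KPIs that are not split by model
he_kpi = ['HE_M', 'HE_GEN_T']
# Top level: each per-model KPI repeated once per model, then the HE KPIs
kpi = [item for item in model_kpi for repetitions in range(len(kpi_values[0][0]))] + he_kpi
# Second level: M0, M1, ... for every per-model KPI, then '' (no model) for the two HE KPIs
model = ['M' + str(i) for i in range(len(kpi_values[0][0]))] * len(model_kpi) + ['', '']
col_ind = list(zip(kpi, model))
row_ind = list(zip_longest(shape_ind, epoch_ind))
flat_list = flatten_list(kpi_values)  # flatten_list, kpi_values, shape_ind, epoch_ind defined elsewhere
df = pd.DataFrame(np.reshape(flat_list, (len(kpi_values), -1)))
df.index = pd.MultiIndex.from_tuples(row_ind, names=['samples', 'epochs'])
df.columns = pd.MultiIndex.from_tuples(col_ind, names=['kpi', 'model'])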
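The same tuple-building idea can be run standalone. In this sketch, flatten_list, kpi_values and the index lists are replaced by a hypothetical np.arange array, with two models and two per-model KPIs. One side effect of the empty-string labels is that selecting a top-level key whose only column has '' at the second level returns a plain Series. df['ACC'], by contrast, still returns the per-model sub-frame:

```python
import numpy as np
import pandas as pd

n_models = 2                      # hypothetical: stands in for len(kpi_values[0][0])
model_kpi = ["ACC", "PRC"]        # shortened list of per-model KPIs
he_kpi = ["HE_M", "HE_GEN_T"]     # KPIs that are not split by model

# Top level: each per-model KPI repeated once per model, then the HE KPIs
kpi = [k for k in model_kpi for _ in range(n_models)] + he_kpi
# Second level: M0..M{n-1} for every per-model KPI, then "" for each HE KPI
model = [f"M{i}" for i in range(n_models)] * len(model_kpi) + [""] * len(he_kpi)

row_ind = [(675, 3), (675, 4)]
flat = np.arange(len(row_ind) * len(kpi), dtype=float)  # hypothetical values
df = pd.DataFrame(flat.reshape(len(row_ind), -1))
df.index = pd.MultiIndex.from_tuples(row_ind, names=["samples", "epochs"])
df.columns = pd.MultiIndex.from_tuples(list(zip(kpi, model)), names=["kpi", "model"])

print(type(df["HE_M"]).__name__)     # Series
print(df["ACC"].columns.tolist())    # ['M0', 'M1']
```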

You could try the droplevel method: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.droplevel.html

df.droplevel(1, axis=1)

should do the trick.
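DataFrame.droplevel operates on the row index by default, so for this frame df.droplevel(1) removes epochs from the rows. Dropping the column level requires axis=1. Even then, it drops the model level from every column, not only the last two, which leaves duplicate top-level labels. Here is a small check on a cut-down frame, with values copied from the question's table:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([(675, 3), (1950, 3)], names=["samples", "epochs"])
cols = pd.MultiIndex.from_tuples(
    [("Accuracy", "M0"), ("Accuracy", "M1"),
     ("HE Memory (GB)", "M0"), ("HE Memory (GB)", "M1")],
    names=["kpi", "model"],
)
df = pd.DataFrame(
    [[0.96, 0.52, 0.553547, 6.2009],
     [0.98, 0.96, 1.664447, 12.3319]],
    index=index, columns=cols,
)

rows_dropped = df.droplevel(1)          # default axis=0: removes "epochs" from the rows
cols_dropped = df.droplevel(1, axis=1)  # axis=1: removes "model" from the columns

print(list(rows_dropped.index.names))   # ['samples']
print(cols_dropped.columns.tolist())
# ['Accuracy', 'Accuracy', 'HE Memory (GB)', 'HE Memory (GB)']
```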
