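I have the following DataFrame. It has two index levels for selecting rows (samples and epochs) and two index levels for selecting columns (kpi and model).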
kpi             Accuracy          Precision      Recall              Training time (sec)     Model memory (MB)    HE Memory (GB)
model           M0    M1    M2    M0   M1   M2   M0      M1    M2    M0      M1      M2      M0     M1     M2     M0        M1
samples epochs
675     3       0.96  0.52  1.00  1.0  0.0  1.0  0.9166  0.00  1.00  0.2124  0.2083  0.2080  0.417  0.417  0.417  0.553547  6.2009
        4       0.96  0.52  1.00  1.0  0.0  1.0  0.9166  0.00  1.00  0.2066  0.2123  0.2137  0.417  0.417  0.417  0.553547  6.2009
1950    3       0.98  0.96  0.98  1.0  1.0  1.0  0.9600  0.92  0.96  0.2132  0.2139  0.2136  0.417  0.417  0.417  1.664447  12.3319
        4       0.98  0.90  0.98  1.0  1.0  1.0  0.9600  0.80  0.96  0.2064  0.2166  0.2152  0.417  0.417  0.417  1.664447  12.3319
The code that produces it is:
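from itertools import zip_longest

import numpy as np
import pandas as pd

# shape_ind, epoch_ind, kpi_values and flatten_list are defined elsewhere (not shown here).
tuples = list(zip_longest(shape_ind, epoch_ind))
flat_list = flatten_list(kpi_values)
df = pd.DataFrame(np.reshape(flat_list, (len(kpi_values), -1)))
df.index = pd.MultiIndex.from_tuples(tuples, names=['samples', 'epochs'])
# Split each flat column position into a (kpi position, model position) pair.
df.columns = pd.MultiIndex.from_arrays(np.divmod(df.columns, len(kpi_values[0][0])), names=['kpi', 'model'])
# Label the model level M0, M1, ...
df.rename(lambda x: f'M{x}', axis=1, level=1, inplace=True)
kpi = ['Accuracy', 'Precision', 'Recall', 'Training time (sec)', 'Model memory (MB)', 'HE Memory (GB)', 'HE gen. time (sec)']
# Replace the numeric kpi level with readable names.
df.rename(mapper=lambda x: kpi[x], axis=1, level=0, inplace=True)
print(df)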
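To see why the last two columns end up grouped under HE Memory (GB) as M0 and M1, it may help to look at what np.divmod returns for 17 flat column positions. This is only a small sketch; n_models and n_columns are names introduced here for illustration, with n_models standing in for len(kpi_values[0][0]), assumed to be 3 as in the printed frame.

```python
import numpy as np

n_models = 3    # assumption: len(kpi_values[0][0]) in the code above
n_columns = 17  # number of value columns in the printed frame

# The quotient is the kpi position, the remainder is the model position.
kpi_pos, model_pos = np.divmod(np.arange(n_columns), n_models)

pairs = list(zip(kpi_pos.tolist(), model_pos.tolist()))
print(pairs[:3])   # [(0, 0), (0, 1), (0, 2)] -> first kpi, models 0..2
print(pairs[-2:])  # [(5, 0), (5, 1)] -> both land in kpi 5 as models 0 and 1
```

Positions 15 and 16 both fall into kpi group 5, so the seventh name in the kpi list is never reached and the two HE values are treated as two models of the same kpi.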
I only want to rename the last two columns and regroup them, i.e. change this:
HE Memory (GB)
M0        M1
0.553547  6.2009
0.553547  6.2009
1.664447  12.3319
1.664447  12.3319
to this:
HE Memory (GB)  HE gen. time (sec)
                                    <--- note how M0 and M1 are gone
0.553547        6.2009
0.553547        6.2009
1.664447        12.3319
1.664447        12.3319
How can I achieve this while keeping the rest of the DataFrame structure intact?
I ended up with this solution:
model_kpi = ['ACC', 'PRC', 'REC', 'TR_T', 'MM']  # KPIs reported once per model
he_kpi = ['HE_M', 'HE_GEN_T']                    # KPIs reported once overall
n_models = len(kpi_values[0][0])
# Repeat each per-model KPI name once per model, then append the two HE names.
kpi = [item for item in model_kpi for _ in range(n_models)] + he_kpi
# M0, M1, ... under each per-model KPI; an empty label under the HE columns.
model = ['M' + str(i) for i in range(n_models)] * len(model_kpi) + ['', '']
col_ind = list(zip(kpi, model))
row_ind = list(zip_longest(shape_ind, epoch_ind))
flat_list = flatten_list(kpi_values)
df = pd.DataFrame(np.reshape(flat_list, (len(kpi_values), -1)))
df.index = pd.MultiIndex.from_tuples(row_ind, names=['samples', 'epochs'])
df.columns = pd.MultiIndex.from_tuples(col_ind, names=['kpi', 'model'])
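The solution above builds the whole column index up front. If the frame already exists with the original labels, the same idea can be sketched by swapping out only the last two column tuples. The frame below is a cut-down stand-in (three columns, two rows, values taken from the printed output), and new_cols is a name introduced here for illustration.

```python
import pandas as pd

# Stand-in frame: the last model-memory column plus the two HE columns.
cols = pd.MultiIndex.from_tuples(
    [("Model memory (MB)", "M2"), ("HE Memory (GB)", "M0"), ("HE Memory (GB)", "M1")],
    names=["kpi", "model"],
)
rows = pd.MultiIndex.from_tuples([(675, 3), (675, 4)], names=["samples", "epochs"])
df = pd.DataFrame(
    [[0.417, 0.553547, 6.2009], [0.417, 0.553547, 6.2009]],
    index=rows,
    columns=cols,
)

# Keep every column tuple except the last two, then append the new labels
# with an empty string in the model level.
new_cols = list(df.columns[:-2]) + [("HE Memory (GB)", ""), ("HE gen. time (sec)", "")]
df.columns = pd.MultiIndex.from_tuples(new_cols, names=df.columns.names)

print(df)
```

The column index keeps both levels, so selections such as df['Model memory (MB)'] continue to work for the per-model KPIs, while the two HE columns carry an empty model label.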
You can try the droplevel method: https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.droplevel.html
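df.droplevel(1, axis=1)
This returns a new DataFrame with the model level removed from the columns. Without axis=1 the call acts on the row index instead. Note that the level is dropped for every column, not only the last two.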
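A small sketch of what droplevel does on each axis may be useful when judging whether it fits the question. The frame is a cut-down stand-in with values from the printed output; by_rows and by_cols are names introduced here for illustration.

```python
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("Accuracy", "M0"), ("Accuracy", "M1"), ("HE Memory (GB)", "M0")],
    names=["kpi", "model"],
)
rows = pd.MultiIndex.from_tuples([(675, 3), (675, 4)], names=["samples", "epochs"])
df = pd.DataFrame(
    [[0.96, 0.52, 0.553547], [0.96, 0.52, 0.553547]],
    index=rows,
    columns=cols,
)

by_rows = df.droplevel(1)          # default axis=0: drops "epochs" from the row index
by_cols = df.droplevel(1, axis=1)  # drops "model" from every column

print(list(by_rows.index.names))  # ['samples']
print(list(by_cols.columns))      # ['Accuracy', 'Accuracy', 'HE Memory (GB)']
```

Dropping the model level from the columns leaves duplicate Accuracy labels, so droplevel on its own does not give the mixed layout asked for, where only the HE columns lose their model label.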