I want to sort a df according to two custom dictionaries:
custom_dict = {'HC': 0, 'AMG HC': 1, 'S': 2, 'AMG S': 3,
'HCA':4, 'AMG HCA':5, 'MUP':6, 'AMG MUP':7}
custom_dict2 = {'Offline': 0, 'Online': 1, 'Independent':2}
df4 = df4.sort_values(by=['category','segment'], key=lambda x: x.map(custom_dict, custom_dict2))
I run this code:
df4 = (df4.set_index(['category', 'segment'], append=True)
.unstack()
.swaplevel(axis=1)
.sort_index(level=0, axis=1, ascending=False)
.reindex([ytd1, ytd2, 'Evolution'], level=1, axis=1)
.reset_index('category')
# .sort_index(level=-1, axis=0)
)
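But it does not produce the expected output. I would like the following order: HC Offline, AMG HC Offline, HC Online, AMG HC Online, HC Independent, AMG HC Independent, S Online, AMF, and so on.
Could you tell me how to do this?
Sample dataset: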
df4 = {'category': {0: 'HC', 1: 'AMG HC', 2: 'S', 3: 'AMG MUP', 4: 'MUP', 5: 'S', 6: 'AMG S', 7: 'HCA'}, 'segment': {0: 'Offline', 1: 'Offline', 2: 'Offline', 3: 'Offline', 4: 'Online', 5: 'Online', 6: 'Offline', 7: 'Independent'}, 'ytd2020': {0: '101142', 1: '38541', 2: '55653', 3: '19561', 4: '84921', 5: '99301', 6: '80212', 7: '95731'}, 'ytd2021': {0: '105726', 1: '39463', 2: '57537', 3: '21402', 4: '90310', 5: '97283', 6: '87011', 7: '119289'}, 'Evolution': {0: '4.5%', 1: '2.4%', 2: '2.1%', 3: '4.3%', 4: '8.2%', 5: '-2.0%', 6: '12.4%', 7: '24.6%'}}
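You can convert the two columns to ordered categoricals. This has a couple of advantages once the categories are in place: lower memory consumption and faster sorting: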
df4['category'] = pd.Categorical(df4['category'],
categories=list(custom_dict),
ordered=True)
df4['segment'] = pd.Categorical(df4['segment'],
categories=list(custom_dict2),
ordered=True)
df4 = df4.sort_values(by=['category','segment'])
NB. This solution does not need the dictionaries; a list of the categories in the desired order is enough.
Example output (random input):
    category      segment
11        HC      Offline
14        HC      Offline
 1        HC       Online
 5        HC  Independent
16        HC  Independent
19    AMG HC      Offline
15    AMG HC       Online
 3         S      Offline
 4         S      Offline
 0         S  Independent
12         S  Independent
 9     AMG S       Online
10     AMG S  Independent
 2       HCA      Offline
 6       HCA      Offline
17       HCA  Independent
 7   AMG HCA      Offline
13   AMG HCA  Independent
 8       MUP       Online
18   AMG MUP  Independent
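For reference, here is a minimal runnable sketch of the categorical approach applied to the question's sample rows. The names category_order, segment_order and sorted_df are illustrative and not from the original post, and the numeric columns are left out for brevity.

```python
import pandas as pd

# Desired orders as plain lists (illustrative names; the dictionaries are not needed).
category_order = ['HC', 'AMG HC', 'S', 'AMG S', 'HCA', 'AMG HCA', 'MUP', 'AMG MUP']
segment_order = ['Offline', 'Online', 'Independent']

# The category/segment columns from the question's sample data.
df4 = pd.DataFrame({
    'category': ['HC', 'AMG HC', 'S', 'AMG MUP', 'MUP', 'S', 'AMG S', 'HCA'],
    'segment': ['Offline', 'Offline', 'Offline', 'Offline',
                'Online', 'Online', 'Offline', 'Independent'],
})

# Ordered categoricals sort by the position of each value in `categories`.
df4['category'] = pd.Categorical(df4['category'], categories=category_order, ordered=True)
df4['segment'] = pd.Categorical(df4['segment'], categories=segment_order, ordered=True)

sorted_df = df4.sort_values(by=['category', 'segment'])
print(sorted_df)
```

One thing to keep in mind: any value that is missing from the categories list becomes NaN during the conversion, so the lists should cover every label that occurs in the data.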
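The key function in sort_values is applied to the category and segment columns separately, one column at a time, but you are trying to map both columns in a single call, which produces incorrect output. To fix the code, we can create an additional order dictionary that maps each column name to its corresponding mapping dictionary: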
order = {'category': custom_dict, 'segment': custom_dict2}
df4.sort_values(['category', 'segment'], key=lambda s: s.map(order[s.name]))
   category      segment  ytd2020  ytd2021  Evolution
0        HC      Offline   101142   105726       4.5%
1    AMG HC      Offline    38541    39463       2.4%
2         S      Offline    55653    57537       2.1%
5         S       Online    99301    97283      -2.0%
6     AMG S      Offline    80212    87011      12.4%
7       HCA  Independent    95731   119289      24.6%
4       MUP       Online    84921    90310       8.2%
3   AMG MUP      Offline    19561    21402       4.3%
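To see why one dictionary cannot serve both columns, the sketch below first maps the segment column through the category dictionary (every lookup misses) and then applies the per-column key. The helper name rank_column and the variables wrong_ranks and sorted_df are illustrative assumptions; the key argument of sort_values needs pandas 1.1 or later.

```python
import pandas as pd

custom_dict = {'HC': 0, 'AMG HC': 1, 'S': 2, 'AMG S': 3,
               'HCA': 4, 'AMG HCA': 5, 'MUP': 6, 'AMG MUP': 7}
custom_dict2 = {'Offline': 0, 'Online': 1, 'Independent': 2}

# The category/segment columns from the question's sample data.
df4 = pd.DataFrame({
    'category': ['HC', 'AMG HC', 'S', 'AMG MUP', 'MUP', 'S', 'AMG S', 'HCA'],
    'segment': ['Offline', 'Offline', 'Offline', 'Offline',
                'Online', 'Online', 'Offline', 'Independent'],
})

# Segment labels are not keys of custom_dict, so every rank is NaN
# and carries no ordering information.
wrong_ranks = df4['segment'].map(custom_dict)
print(wrong_ranks.isna().all())

# Look up the right dictionary by column name instead.
order = {'category': custom_dict, 'segment': custom_dict2}

def rank_column(s):
    # sort_values calls the key once per column in `by`; s.name identifies it.
    return s.map(order[s.name])

sorted_df = df4.sort_values(['category', 'segment'], key=rank_column)
print(sorted_df)
```

Because the key only supplies sort ranks, the original labels in both columns are left unchanged in the result.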
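As I understand from your previous question, your main goal is to place every AMG XXX immediately after the corresponding XXX category within the same segment. For example: AMG HC immediately after HC in the Offline segment, and AMG S immediately after S in the Offline segment. So you can use: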
idx = (df4[['category','segment']].apply(tuple, axis=1)
.sort_values(key=lambda x: x.str[0].map(custom_dict) * 10 + x.str[1].map(custom_dict2))
.index
)
df5 = df4.loc[idx]
Note that I have stored the sorted DataFrame under a new name, df5, instead of overwriting df4. If you find it more convenient, you can change df5 to df4.
Result:
print(df5)
   category      segment  ytd2020  ytd2021  Evolution
0        HC      Offline   101142   105726       4.5%
1    AMG HC      Offline    38541    39463       2.4%
2         S      Offline    55653    57537       2.1%
5         S       Online    99301    97283      -2.0%
6     AMG S      Offline    80212    87011      12.4%
7       HCA  Independent    95731   119289      24.6%
4       MUP       Online    84921    90310       8.2%
3   AMG MUP      Offline    19561    21402       4.3%
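If the order listed in the question is read literally (HC Offline, AMG HC Offline, HC Online, AMG HC Online, ...), the segment ranks between the base category and its AMG variant, which is a different order from the one above. The following is only a sketch under that reading. It assumes that custom_dict always gives XXX an even rank and AMG XXX the next odd rank, and the column names base_rank, segment_rank and is_amg are hypothetical.

```python
import pandas as pd

custom_dict = {'HC': 0, 'AMG HC': 1, 'S': 2, 'AMG S': 3,
               'HCA': 4, 'AMG HCA': 5, 'MUP': 6, 'AMG MUP': 7}
custom_dict2 = {'Offline': 0, 'Online': 1, 'Independent': 2}

# The category/segment columns from the question's sample data.
df4 = pd.DataFrame({
    'category': ['HC', 'AMG HC', 'S', 'AMG MUP', 'MUP', 'S', 'AMG S', 'HCA'],
    'segment': ['Offline', 'Offline', 'Offline', 'Offline',
                'Online', 'Online', 'Offline', 'Independent'],
})

cat_rank = df4['category'].map(custom_dict)

# Assumption: XXX has an even rank and AMG XXX the following odd rank.
keys = pd.DataFrame({
    'base_rank': cat_rank // 2,                         # HC and AMG HC share 0, S and AMG S share 1, ...
    'segment_rank': df4['segment'].map(custom_dict2),
    'is_amg': cat_rank % 2,                             # 0 for XXX, 1 for AMG XXX
})

# Sort by base category, then segment, then plain-before-AMG.
idx = keys.sort_values(['base_rank', 'segment_rank', 'is_amg']).index
df5 = df4.loc[idx]
print(df5)
```

With the sample rows, this places AMG S Offline directly after S Offline and before S Online. If the dictionary does not follow the even/odd pairing, the base rank and the AMG flag would have to be derived some other way, for example from the label prefix.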