Unexpected ordering of column names in pandas



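When I export a DataFrame from pandas to an Excel spreadsheet, my columns come out in the order shown below, where "10 Largest Event" is placed right after "1 Largest Event" instead of "2 Largest Event". I would like them to appear in numerical order, i.e. "1 Largest Event", "2 Largest Event", "10 Largest Event".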

ID_1    Permit No.        ID_2           1 Largest Event    10 Largest Event    2 Largest Event
10220   To Be Permitted   0010001-24.1   4.0548             0.822               3.9611

Why does this happen? It is a minor formatting issue, but it can be quite an eyesore.

Using natsort with reindex:

from natsort import natsorted
l=['1 Largest Event','10 Largest Event','2 Largest Event']
natsorted(l)
Out[789]: ['1 Largest Event', '2 Largest Event', '10 Largest Event']
df=df.reindex(columns=natsorted(list(df)))

The problem is that your columns are sorted lexicographically, as strings.
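To make the lexicographic behaviour concrete, here is a minimal sketch using plain Python lists rather than a DataFrame. The variable names `as_strings` and `as_numbers` are only illustrative.

```python
cols = ['1 Largest Event', '2 Largest Event', '10 Largest Event']

# Strings are compared character by character, so '10 ...' sorts before
# '2 ...' because the character '1' is less than '2' at the first position.
as_strings = sorted(cols)

# Comparing the leading numbers as integers gives the numeric order instead.
as_numbers = sorted(cols, key=lambda name: int(name.split()[0]))

print(as_strings)
print(as_numbers)
```

The first list reproduces the order seen in the spreadsheet; the second is the order the question asks for.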

So you need to sort with a custom key function that splits each name and converts the first piece to an int:

df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]

Sample:

import pandas as pd

cols = ['1 Largest Event', 
'10 Largest Event', 
'2 Largest Event',
'3 Largest Event',
'4 Largest Event',
'5 Largest Event', 
'6 Largest Event', 
'7 Largest Event', 
'8 Largest Event', 
'9 Largest Event']
df = pd.DataFrame(0, columns=cols, index=[0])
print (df)
1 Largest Event  10 Largest Event  2 Largest Event  3 Largest Event  
0                0                 0                0                0   
4 Largest Event  5 Largest Event  6 Largest Event  7 Largest Event  
0                0                0                0                0   
8 Largest Event  9 Largest Event  
0                0                0  
df = df[sorted(df.columns, key=lambda x: int(x.split()[0]))]
print (df)
1 Largest Event  2 Largest Event  3 Largest Event  4 Largest Event  
0                0                0                0                0   
5 Largest Event  6 Largest Event  7 Largest Event  8 Largest Event  
0                0                0                0                0   
9 Largest Event  10 Largest Event  
0                0                 0  

EDIT:

You can also keep the first 3 columns in place and sort only the remaining ones:

df = df[df.columns[:3].tolist() + sorted(df.columns[3:], key=lambda x: int(x.split()[0]))]
print (df)
ID_1       Permit No.          ID_2  1 Largest Event  2 Largest Event  
0  10220  To Be Permitted  0010001-24.1           4.0548           3.9611   
10 Largest Event  
0             0.822  
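If the number of leading non-numbered columns is not fixed, slicing at position 3 becomes fragile. The following is a rough sketch of one alternative, not part of the original answer: `order_columns` is a hypothetical helper that detects numbered names with a regular expression and works on any iterable of column names. It assumes numbered columns look like "N label", that is, an integer followed by whitespace.

```python
import re

def order_columns(columns):
    """Return column names with 'N <label>' columns sorted by N.

    Names that do not start with an integer followed by whitespace keep
    their original relative order and are placed first.
    """
    leading_number = re.compile(r'(\d+)\s')
    plain, numbered = [], []
    for name in columns:
        match = leading_number.match(str(name))
        if match:
            numbered.append((int(match.group(1)), name))
        else:
            plain.append(name)
    # list.sort() is stable, so names sharing a number keep their order.
    numbered.sort(key=lambda pair: pair[0])
    return plain + [name for _, name in numbered]

cols = ['ID_1', 'Permit No.', 'ID_2',
        '1 Largest Event', '10 Largest Event', '2 Largest Event']
print(order_columns(cols))
```

With a DataFrame, the result could be applied as `df = df[order_columns(df.columns)]`.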
