How to sort the values of a pandas dataframe by a specific column in a custom way (using a lambda function, as with sorted in the stdlib)



Given the following data:

import pandas as pd
import io
df = pd.read_csv(
    io.StringIO(
        "bit,val\nbit_0,40.9\nbit_1,49.6\nbit_2,50.5\nbit_3,37.7\nbit_4,52.0\nbit_5,55.1\nbit_6,40.6\nbit_7,37.8\nbit_8,39.2\nbit_9,51.1\nbit_10,48.4\nbit_11,49.8\nbit_12,51.7\nbit_13,46.7\nbit_14,40.8\nbit_15,41.1\nbit_16,36.7\nbit_17,50.8\nbit_18,41.6\nbit_19,41.3\n"
    )
)
df = df.sample(len(df), random_state=1).reset_index(drop=True)

It looks like this:

bit   val
0    bit_3  37.7
1   bit_16  36.7
2    bit_6  40.6
3   bit_10  48.4
4    bit_2  50.5
5   bit_14  40.8
6    bit_4  52.0
7   bit_17  50.8
8    bit_7  37.8
9    bit_1  49.6
10  bit_13  46.7
11   bit_0  40.9
12  bit_19  41.3
13  bit_18  41.6
14   bit_9  51.1
15  bit_15  41.1
16   bit_8  39.2
17  bit_12  51.7
18  bit_11  49.8
19   bit_5  55.1

I want to sort the data by the bit column, according to the number at the end of each value.

If this were a standard Python list, the following would work:

sorted(df["bit"].to_list(), key=lambda x: int(x.split("_")[-1]))

However, I can't figure out how to apply this to a dataframe.

Try natsort:

from natsort import index_natsorted
df = df.iloc[index_natsorted(df.bit)]
df
Out[195]: 
bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3

Use df.sort_values with a key that calls .str.split("_", expand=True) and casts the numeric part to int with .astype(int), like this:

df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int))

Output:

bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3

If you need to reset the index, just add .reset_index(drop=True):

df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int)).reset_index(drop=True)

Output:

bit   val
0    bit_0  40.9
1    bit_1  49.6
2    bit_2  50.5
3    bit_3  37.7
4    bit_4  52.0
5    bit_5  55.1
6    bit_6  40.6
7    bit_7  37.8
8    bit_8  39.2
9    bit_9  51.1
10  bit_10  48.4
11  bit_11  49.8
12  bit_12  51.7
13  bit_13  46.7
14  bit_14  40.8
15  bit_15  41.1
16  bit_16  36.7
17  bit_17  50.8
18  bit_18  41.6
19  bit_19  41.3
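As a possible shortcut for the two steps above, sort_values also accepts ignore_index=True, which relabels the result 0..n-1 in the same call. A minimal sketch, assuming pandas 1.1 or later (needed for key); the four-row frame is a made-up stand-in for the question's data:

```python
import pandas as pd

# Small stand-in frame; only four of the question's rows, for illustration.
df = pd.DataFrame(
    {"bit": ["bit_10", "bit_2", "bit_0", "bit_1"], "val": [48.4, 50.5, 40.9, 49.6]}
)

out = df.sort_values(
    "bit",
    # key receives the whole "bit" column as a Series and returns the integer suffixes
    key=lambda s: s.str.split("_", expand=True)[1].astype(int),
    # relabel the sorted rows 0..n-1 instead of keeping the original labels
    ignore_index=True,
)
print(out)
```

This should behave the same as chaining .reset_index(drop=True), just without the extra call.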

With pandas >= 1.1.0 you can use key, much as you would with sorted.
In my solution I sort on the bit column, but strip the bit_ prefix for the comparison:

df.sort_values(
by='bit', 
key=lambda x: x.str.replace('bit_', '').astype(int),
)
bit     val
11  bit_0   40.9
9   bit_1   49.6
4   bit_2   50.5
0   bit_3   37.7
6   bit_4   52.0

Documentation for .sort_values():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
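One difference from sorted is worth spelling out: the key passed to sort_values is called once with the whole column as a Series and has to return a Series of the same length, rather than being called once per element. A scalar key like the one in the question can still be reused by wrapping it in Series.map. A minimal sketch, assuming pandas 1.1 or later; bit_number is a hypothetical helper name and the frame is a made-up stand-in:

```python
import pandas as pd

# Small stand-in frame; only four of the question's rows, for illustration.
df = pd.DataFrame(
    {"bit": ["bit_10", "bit_2", "bit_0", "bit_1"], "val": [48.4, 50.5, 40.9, 49.6]}
)

def bit_number(label):
    """Scalar key, the same kind of function one would pass to sorted()."""
    return int(label.split("_")[-1])

# Wrap the scalar key with Series.map so it is applied element by element.
by_number = df.sort_values("bit", key=lambda s: s.map(bit_number))

# Without a key the strings compare lexicographically, so bit_10 lands before bit_2.
plain = df.sort_values("bit")

print(by_number)
print(plain)
```

The map version is typically slower than the vectorized .str methods on large frames, but it lets any existing sorted-style key be reused unchanged.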

One efficient approach is to build a Series sorted the way you want, then pass its index back to the dataframe:

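# build a Series of the integer suffixes and sort it
bit_vals = df.bit.str.split("_", expand=True).loc[:, 1].astype(int)
sort_series = bit_vals.sort_values()
# sort_series.index holds row labels, so select with .loc
# (.iloc gives the same result only for a default 0..n-1 index)
df = df.loc[sort_series.index]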

Result:

bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3

You can reset the dataframe index if needed.
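A closely related variant, which may be handy on pandas versions older than 1.1 where sort_values has no key argument, is to attach the integers as a temporary column, sort on it, and drop it again. A minimal sketch; the column name _bit_num is a hypothetical choice and the frame is a made-up stand-in:

```python
import pandas as pd

# Small stand-in frame; only four of the question's rows, for illustration.
df = pd.DataFrame(
    {"bit": ["bit_10", "bit_2", "bit_0", "bit_1"], "val": [48.4, 50.5, 40.9, 49.6]}
)

out = (
    # temporary helper column holding the integer after the underscore
    df.assign(_bit_num=df["bit"].str.split("_").str[-1].astype(int))
    # sort on the helper column
    .sort_values("_bit_num")
    # remove the helper so only the original columns remain
    .drop(columns="_bit_num")
)
print(out)
```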

You can use str.extract, Series.argsort, and df.loc together:

In [1038]: ix = df.bit.str.extract(r'(\d+)', expand=False).astype(int).argsort().tolist()
In [1039]: df.loc[ix]
Out[1039]: 
bit   val
11   bit_0  40.9
9    bit_1  49.6
4    bit_2  50.5
0    bit_3  37.7
6    bit_4  52.0
19   bit_5  55.1
2    bit_6  40.6
8    bit_7  37.8
16   bit_8  39.2
14   bit_9  51.1
3   bit_10  48.4
18  bit_11  49.8
17  bit_12  51.7
10  bit_13  46.7
5   bit_14  40.8
15  bit_15  41.1
1   bit_16  36.7
7   bit_17  50.8
13  bit_18  41.6
12  bit_19  41.3
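Note that argsort returns integer positions, not index labels. With the default 0..n-1 index used in the question the two coincide, which is why df.loc works there. For a frame with any other index, df.iloc is probably the safer choice. A minimal sketch with a made-up, string-labelled stand-in frame:

```python
import pandas as pd

# Stand-in frame with a non-default index, for illustration only.
df = pd.DataFrame(
    {"bit": ["bit_10", "bit_2", "bit_0", "bit_1"], "val": [48.4, 50.5, 40.9, 49.6]},
    index=["a", "b", "c", "d"],
)

# Pull out the digits and convert them to integers.
nums = df["bit"].str.extract(r"(\d+)", expand=False).astype(int)

# argsort yields the positions that would sort nums.
order = nums.argsort().to_numpy()

# Positions go to .iloc; .loc would look for the labels 2, 3, 1, 0 and fail here.
out = df.iloc[order]
print(out)
```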
