Given the following data:
import pandas as pd
import io
df = pd.read_csv(
    io.StringIO(
        "bit,val\nbit_0,40.9\nbit_1,49.6\nbit_2,50.5\nbit_3,37.7\nbit_4,52.0\nbit_5,55.1\nbit_6,40.6\nbit_7,37.8\nbit_8,39.2\nbit_9,51.1\nbit_10,48.4\nbit_11,49.8\nbit_12,51.7\nbit_13,46.7\nbit_14,40.8\nbit_15,41.1\nbit_16,36.7\nbit_17,50.8\nbit_18,41.6\nbit_19,41.3\n"
    )
)
df = df.sample(len(df), random_state=1).reset_index(drop=True)
which looks like:
bit val
0 bit_3 37.7
1 bit_16 36.7
2 bit_6 40.6
3 bit_10 48.4
4 bit_2 50.5
5 bit_14 40.8
6 bit_4 52.0
7 bit_17 50.8
8 bit_7 37.8
9 bit_1 49.6
10 bit_13 46.7
11 bit_0 40.9
12 bit_19 41.3
13 bit_18 41.6
14 bit_9 51.1
15 bit_15 41.1
16 bit_8 39.2
17 bit_12 51.7
18 bit_11 49.8
19 bit_5 55.1
I want to sort the data by the bit column, using the number after the underscore.
If this were a standard Python list, the following would work:
sorted(df["bit"].to_list(), key=lambda x: int(x.split("_")[-1]))
However, I don't know how to apply this to a DataFrame.
Try natsort:
from natsort import index_natsorted
df = df.iloc[index_natsorted(df.bit)]
df
Out[195]:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
Use df.sort_values with .str.split("_", expand=True) in the key, and cast the result to int with .astype(int), like so:
df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int))
Output:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
If you need to reset the index, just add .reset_index(drop=True):
df.sort_values('bit',key=lambda x: x.str.split("_",expand=True)[1].astype(int)).reset_index(drop=True)
Output:
bit val
0 bit_0 40.9
1 bit_1 49.6
2 bit_2 50.5
3 bit_3 37.7
4 bit_4 52.0
5 bit_5 55.1
6 bit_6 40.6
7 bit_7 37.8
8 bit_8 39.2
9 bit_9 51.1
10 bit_10 48.4
11 bit_11 49.8
12 bit_12 51.7
13 bit_13 46.7
14 bit_14 40.8
15 bit_15 41.1
16 bit_16 36.7
17 bit_17 50.8
18 bit_18 41.6
19 bit_19 41.3
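As an alternative to chaining .reset_index(drop=True), sort_values also accepts ignore_index=True (available since pandas 1.0; the key argument needs 1.1.0 or later). A minimal sketch, assuming a small made-up frame rather than the data from the question:

```python
import pandas as pd

# Small illustrative frame (not the question's data), deliberately out of order.
df = pd.DataFrame({"bit": ["bit_10", "bit_2", "bit_0"], "val": [48.4, 50.5, 40.9]})

# Sort by the integer after the underscore and renumber the index in one call.
result = df.sort_values(
    "bit",
    key=lambda s: s.str.split("_", expand=True)[1].astype(int),
    ignore_index=True,
)
print(result)
```

Both spellings should give the same frame; ignore_index=True just avoids the extra method call.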
With pandas >= 1.1.0, you can use key just as you would in sorted.
In my solution I sort on the bit column, but for the sort itself I strip the bit_ prefix:
df.sort_values(
by='bit',
key=lambda x: x.str.replace('bit_', '').astype(int),
)
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
Documentation on .sort_values():
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html
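One difference from the built-in sorted is worth noting: the key passed to sort_values receives the whole column as a Series and must return a Series of the same length, rather than being called once per element. A minimal sketch, where the helper name bit_number and the small frame are made up for illustration:

```python
import pandas as pd


def bit_number(col: pd.Series) -> pd.Series:
    """Hypothetical helper: map 'bit_12' -> 12 for a whole column at once."""
    # regex=False treats 'bit_' as a literal string on every pandas version.
    return col.str.replace("bit_", "", regex=False).astype(int)


# Illustrative data only, not the frame from the question.
df = pd.DataFrame({"bit": ["bit_11", "bit_3", "bit_20"], "val": [49.8, 37.7, 1.0]})

ascending = df.sort_values(by="bit", key=bit_number)
descending = df.sort_values(by="bit", key=bit_number, ascending=False)
print(ascending)
print(descending)
```

Because the key is vectorized, the other sort_values options such as ascending=False keep working unchanged.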
An efficient approach is to build a Series sorted the way you want, then pass its index back to the DataFrame:
# create series of bit integers, sort them
bit_vals = df.bit.str.split("_", expand=True).loc[:, 1].astype(int)
sort_series = bit_vals.sort_values()
# pass back to dataframe
df = df.iloc[sort_series.index]
Result:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
You can reset the DataFrame index afterwards if needed.
You can use str.extract together with Series.argsort and df.loc:
In [1038]: ix = df.bit.str.extract(r'(\d+)', expand=False).astype(int).argsort().tolist()
In [1039]: df.loc[ix]
Out[1039]:
bit val
11 bit_0 40.9
9 bit_1 49.6
4 bit_2 50.5
0 bit_3 37.7
6 bit_4 52.0
19 bit_5 55.1
2 bit_6 40.6
8 bit_7 37.8
16 bit_8 39.2
14 bit_9 51.1
3 bit_10 48.4
18 bit_11 49.8
17 bit_12 51.7
10 bit_13 46.7
5 bit_14 40.8
15 bit_15 41.1
1 bit_16 36.7
7 bit_17 50.8
13 bit_18 41.6
12 bit_19 41.3
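A caveat on the two index-based answers: argsort returns integer positions, while sort_values().index returns labels. With the default 0..n-1 index used in the question the two coincide, but on a frame with any other index, positions belong with iloc and labels with loc. A minimal sketch, assuming a small made-up frame with a non-default index:

```python
import pandas as pd

# Illustrative frame with labels that differ from positions.
df = pd.DataFrame(
    {"bit": ["bit_10", "bit_2", "bit_0"], "val": [48.4, 50.5, 40.9]},
    index=[100, 200, 300],
)

# Pull out the digits and convert them to integers.
nums = df["bit"].str.extract(r"(\d+)", expand=False).astype(int)

# argsort yields 0-based positions, so pair it with iloc.
by_position = df.iloc[nums.to_numpy().argsort()]

# sort_values().index yields labels, so pair it with loc.
by_label = df.loc[nums.sort_values().index]

print(by_position)
print(by_label)
```

Both selections should produce the same ordering here; mixing them up (labels with iloc, or positions with loc) would raise or pick the wrong rows on this frame.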