I have the following dataset:
Name Year Date Value
x year1 date1 v1
x year1 date2 v2
x year1 date3 v3
x year2 date1 v4
x year2 date2 v5
x year2 date3 v6
z year1 date1 v7
z year1 date2 v8
z year1 date3 v9
z year2 date1 v10
z year2 date2 v11
z year2 date3 v12
y year1 date1 v13
y year1 date2 v14
y year1 date3 v15
y year2 date1 v16
y year2 date2 v17
y year2 date3 v18
I want the following output:
Name Year Date Value
x year1 date1 v1
x year2 date1 v4
x year1 date2 v2
x year2 date2 v5
x year1 date3 v3
x year2 date3 v6
z year1 date1 v7
z year2 date1 v10
z year1 date2 v8
z year2 date2 v11
z year1 date3 v9
z year2 date3 v12
y year1 date1 v13
y year2 date1 v16
y year1 date2 v14
y year2 date2 v17
y year1 date3 v15
y year2 date3 v18
I tried the following code, but it also sorts my 'Name' column into 'x, y, z'. I want the 'Name' column to keep the order 'x, z, y':
df.sort_values(['Name', 'Date'])
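Create an ordered categorical dtype that lists the names in the order you want, then sort on it. An ordered categorical sorts by its category order rather than alphabetically: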
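import pandas as pd

# Categories in the desired order: x, z, y
namedtype = pd.CategoricalDtype([*'xzy'], ordered=True)
df['Name'] = df['Name'].astype(namedtype)
# Sort by Name (category order), then Date, then Year
df.sort_values(['Name', 'Date', 'Year'])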
Output:
Name Year Date Value
0 x year1 date1 v1
3 x year2 date1 v4
1 x year1 date2 v2
4 x year2 date2 v5
2 x year1 date3 v3
5 x year2 date3 v6
6 z year1 date1 v7
9 z year2 date1 v10
7 z year1 date2 v8
10 z year2 date2 v11
8 z year1 date3 v9
11 z year2 date3 v12
12 y year1 date1 v13
15 y year2 date1 v16
13 y year1 date2 v14
16 y year2 date2 v17
14 y year1 date3 v15
17 y year2 date3 v18
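If the desired order is simply the order in which the names first appear, the categories need not be hard-coded. The following is a minimal, self-contained sketch of that idea; the construction of `df` and the variable names `rows`, `name_dtype` and `result` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data from the question (names appear in the order x, z, y).
rows = [
    (name, f"year{y}", f"date{d}")
    for name in "xzy"
    for y in (1, 2)
    for d in (1, 2, 3)
]
df = pd.DataFrame(rows, columns=["Name", "Year", "Date"])
df["Value"] = [f"v{i}" for i in range(1, len(df) + 1)]

# Take the category order from first appearance instead of hard-coding 'xzy'.
name_dtype = pd.CategoricalDtype(df["Name"].unique(), ordered=True)
df["Name"] = df["Name"].astype(name_dtype)

# Sort by Name (category order), then Date, then Year.
result = df.sort_values(["Name", "Date", "Year"])
print(result)
```

Note that this changes the dtype of 'Name' to categorical; convert it back with `astype(str)` afterwards if plain strings are needed.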
datar is a re-imagining of the pandas API.
This is easy to do with datar:
>>> from datar.all import f, tribble, arrange, match
>>> df = tribble(
... f.Name, f.Year, f.Date, f.Value,
... "x", "year1", "date1", "v1",
... "x", "year1", "date2", "v2",
... "x", "year1", "date3", "v3",
... "x", "year2", "date1", "v4",
... "x", "year2", "date2", "v5",
... "x", "year2", "date3", "v6",
... "z", "year1", "date1", "v7",
... "z", "year1", "date2", "v8",
... "z", "year1", "date3", "v9",
... "z", "year2", "date1", "v10",
... "z", "year2", "date2", "v11",
... "z", "year2", "date3", "v12",
... "y", "year1", "date1", "v13",
... "y", "year1", "date2", "v14",
... "y", "year1", "date3", "v15",
... "y", "year2", "date1", "v16",
... "y", "year2", "date2", "v17",
... "y", "year2", "date3", "v18",
... )
>>> df >> arrange(match(f.Name, f.Name), f.Date, f.Year)
Name Year Date Value
<object> <object> <object> <object>
0 x year1 date1 v1
3 x year2 date1 v4
1 x year1 date2 v2
4 x year2 date2 v5
2 x year1 date3 v3
5 x year2 date3 v6
6 z year1 date1 v7
9 z year2 date1 v10
7 z year1 date2 v8
10 z year2 date2 v11
8 z year1 date3 v9
11 z year2 date3 v12
12 y year1 date1 v13
15 y year2 date1 v16
13 y year1 date2 v14
16 y year2 date2 v17
14 y year1 date3 v15
17 y year2 date3 v18
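Assuming `match(f.Name, f.Name)` behaves like R's `match` and ranks each name by the position of its first occurrence, the same idea can be sketched in plain pandas with a temporary rank column. The helper column `_rank` and the names `first_pos` and `result` are hypothetical and chosen only for this example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Name": ["x"] * 6 + ["z"] * 6 + ["y"] * 6,
    "Year": (["year1"] * 3 + ["year2"] * 3) * 3,
    "Date": ["date1", "date2", "date3"] * 6,
    "Value": [f"v{i}" for i in range(1, 19)],
})

# Rank each name by the order of its first occurrence: x -> 0, z -> 1, y -> 2.
first_pos = {name: i for i, name in enumerate(df["Name"].unique())}

result = (
    df.assign(_rank=df["Name"].map(first_pos))  # temporary sort key
      .sort_values(["_rank", "Date", "Year"])
      .drop(columns="_rank")
)
print(result)
```

Unlike the categorical approach, this leaves the dtype of 'Name' unchanged.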