dask: create a combined column to simulate sorting by 2 columns

Updated: 2024-09-06

Currently, sort_values on a dask DataFrame only accepts sorting by a single column.

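I have a large file with the following structure:

Input data

I don't know how to sort it first by the integer column and then by the date, like this: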

2000-01-01;43000
2000-01-02;43000
2000-01-01;25000
2000-01-02;25000

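I think creating a combined column and sorting on it would be the best option. The problem is that I don't know how to build a column that achieves this. Maybe there is another way to do it in Dask without creating a combined column…

Thanks!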

Assuming d['col1'] is a datetime column and d['col2'] is an int column:

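import struct
import numpy as np
# Days since the earliest date, as an integer (d['col1'] is assumed to be datetime64)
d['col1_int'] = ((d['col1'] - d['col1'].min())
                 / np.timedelta64(1, 'D')).astype(int)
# Pack both keys big-endian so that byte order matches numeric order
# (holds for non-negative values only). The integer column goes first
# because it is the primary sort key; axis=1 applies the lambda per row.
d['sort_col'] = d.apply(
    lambda r: struct.pack(">qq", r.col2, r.col1_int),
    axis=1, meta=('sort_col', 'object'))
# set_index shuffles rows so that partitions are ordered by sort_col
d = d.set_index('sort_col')
# Make sure rows are also ordered inside each partition
d = d.map_partitions(lambda x: x.sort_index())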
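
Why the byte order of the packed key matters can be checked without Dask. The following is only a minimal sketch in plain Python, not the answer's exact method: the helper make_key, the sample rows, and the choice of unsigned big-endian fields (">QQ") are assumptions made for illustration. It shows that a big-endian packed key sorts the same way as the (integer, date) tuple, while a little-endian key does not preserve numeric order.

```python
import struct
from datetime import date

# Hypothetical sample rows (date, integer), mirroring the question's data.
rows = [
    (date(2000, 1, 1), 25000),
    (date(2000, 1, 2), 43000),
    (date(2000, 1, 2), 25000),
    (date(2000, 1, 1), 43000),
]


def make_key(day, value, origin=date(2000, 1, 1)):
    """Pack (value, days since origin) as two big-endian unsigned 64-bit ints.

    Fixed-width big-endian unsigned integers compare the same way as bytes
    and as numbers. Negative inputs raise struct.error instead of silently
    producing a wrong order.
    """
    return struct.pack(">QQ", value, (day - origin).days)


# Sorting on the packed key should match sorting on the (integer, date) tuple.
by_key = sorted(rows, key=lambda r: make_key(*r))
by_tuple = sorted(rows, key=lambda r: (r[1], r[0]))

# Little-endian packing breaks the ordering: 256 packs to bytes that
# compare lower than the bytes for 1.
little_endian_is_wrong = struct.pack("<Q", 256) < struct.pack("<Q", 1)
```

Here by_key and by_tuple hold the same order, which is why the combined column should be packed with an explicit big-endian format rather than the platform's native layout.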

Adapted from this answer
