
How to calculate a new column from a large DataFrame using a dictionary in a custom function?

Tags: python-3.x, pandas, dataframe, vectorization
Updated: 2023-09-21

I have a DataFrame df with 700 million rows and three columns, in the following format:

       key_x  key_y      num
    0      1      1  111.111
    1      1      2  222.222
    2      1      3  333.333
    ...

I also have a dictionary, dict, whose keys include every value that appears in key_x and key_y.

I need to create a new column such that, for every row in df:

    df['result'] = df['num'] / (dict[key_x] * dict[key_y])

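My current approach uses np.vectorize, as follows:

    import numpy as np

    def find_res(key_x, key_y, num):
        # Look up both keys in the dictionary and divide num by their product
        return num / (dict[key_x] * dict[key_y])

    df["result"] = np.vectorize(find_res)(df["key_x"], df["key_y"], df["num"])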

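However, this approach is too slow. I have about 500 GB of RAM, so memory is not a constraint. Is there a more efficient way to perform the same operation?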

You can use map:

    df['result'] = df['num'] / (df['key_x'].map(your_dict) * df['key_y'].map(your_dict))
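
The map approach can be tried on a small frame first. The following is a minimal sketch, not a benchmark: the three rows mirror the sample in the question, while the values in your_dict and the helper name lookup are made up for illustration. It relies on the fact that np.vectorize is essentially a Python-level loop, whereas Series.map performs the lookup for a whole column in one call.

```python
import pandas as pd

# Toy stand-in for the 700-million-row frame from the question.
df = pd.DataFrame({
    "key_x": [1, 1, 1],
    "key_y": [1, 2, 3],
    "num": [111.111, 222.222, 333.333],
})

# Hypothetical dictionary values; the question does not show the real ones.
your_dict = {1: 2.0, 2: 4.0, 3: 8.0}

# Build a Series from the dict once so both lookups reuse the same index.
lookup = pd.Series(your_dict)

# Map each key column to its dictionary value, then combine column-wise.
df["result"] = df["num"] / (df["key_x"].map(lookup) * df["key_y"].map(lookup))

print(df)
```

Any key that is missing from the dictionary becomes NaN after map, so the corresponding result is NaN rather than raising a KeyError. If every key is expected to be present, checking df["result"].isna().any() afterwards is a cheap sanity test.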
