I have three DataFrames and I want to use multiprocessing to apply a function to each of them in parallel (while keeping their names):
def loop(df):
    df = df.groupby('X').sum().reset_index()
    print('end of groupby')
    return

pool = multiprocessing.Pool(processes=16)
df1, df2, df3 = pool.map(loop, [df1, df2, df3])
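I would do it as follows. Note that the function has to return the DataFrame (a bare return gives None for every result), and the pool is created under an if __name__ == '__main__': guard: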
from multiprocessing import Pool

import pandas as pd


def loop(df):
    # Sum every column per value of 'X', then turn 'X' back into a column.
    df = df.groupby('X').sum().reset_index()
    print('end of groupby')
    return df


if __name__ == '__main__':
    df1 = pd.DataFrame({'Col': 1, 'X': [1, 1, 1, 2, 2]})
    df2 = pd.DataFrame({'Col': 1, 'X': [1, 1, 2, 2, 3]})
    df3 = pd.DataFrame({'Col': 1, 'X': [1, 2, 2, 2, 3]})

    # One worker per DataFrame; map returns results in input order.
    with Pool(processes=3) as p:
        df1, df2, df3 = p.map(loop, [df1, df2, df3])

    print(df1)
    print(df2)
    print(df3)
Output:
end of groupby
end of groupby
end of groupby
   X  Col
0  1    3
1  2    2
   X  Col
0  1    2
1  2    2
2  3    1
   X  Col
0  1    1
1  2    3
2  3    1
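
If "keeping their names" means tracking which result belongs to which frame, one option is to hold the frames in a dict keyed by name instead of unpacking into separate variables. This is only a sketch under that assumption: group_sum and apply_named are hypothetical helper names, not part of pandas or multiprocessing. It relies on Pool.map returning results in the same order as its input.

```python
from multiprocessing import Pool

import pandas as pd


def group_sum(df):
    """Sum every column per value of 'X' (the same work as loop above)."""
    return df.groupby('X').sum().reset_index()


def apply_named(func, frames, map_fn=map):
    """Apply func to each DataFrame in the dict `frames`, keeping the keys.

    map_fn defaults to the built-in map (serial); pass pool.map to run
    the calls in worker processes instead.
    """
    names = list(frames)
    results = map_fn(func, [frames[name] for name in names])
    # Results come back in input order, so zipping restores the names.
    return dict(zip(names, results))


if __name__ == '__main__':
    frames = {
        'df1': pd.DataFrame({'Col': 1, 'X': [1, 1, 1, 2, 2]}),
        'df2': pd.DataFrame({'Col': 1, 'X': [1, 1, 2, 2, 3]}),
        'df3': pd.DataFrame({'Col': 1, 'X': [1, 2, 2, 2, 3]}),
    }
    with Pool(processes=len(frames)) as pool:
        results = apply_named(group_sum, frames, map_fn=pool.map)
    for name, df in results.items():
        print(name)
        print(df)
```

Passing the map function in as a parameter keeps the helper easy to check without starting any processes: calling apply_named(group_sum, frames) runs the same logic serially.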