Updating a counter with Python multiprocessing

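I am using Python multiprocessing to call a function named "sql_fetch". As it iterates over my list (i.e. "test_propid_entid"), it should update the variable count so I can tell how many times I got good data from the query. Here, query_randomizer is my function that generates the query. I only want to print the value of count once, at the end of the multiprocessing run, to find out how many times the pandas DataFrame returned results (i.e. how many times the query returned records). How can I do this? count always prints 1 for me, because its value is reset on every call. This is what I use for multiprocessing: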

import multiprocessing
from time import time

from tqdm import tqdm

start_dt = time()
multi = []
with tqdm(total=len(test_propid_entid)) as pbar:
    for sub_prop_entid in test_propid_entid:
        t_sub = multiprocessing.Process(target=sql_fetch, args=(sub_prop_entid,))
        pbar.update()
        multi.append(t_sub)
        t_sub.start()
    for a in multi:
        a.join()
print('TOTAL TIME: ', time() - start_dt)

The sql_fetch function I call fetches data from an Oracle query engine:

import pandas as pd
from IPython.display import display  # display() comes from IPython/Jupyter

def sql_fetch(sub_prop_entid):
    count = 0  # local variable: reset to 0 on every call (and in every process)
    # query_randomizer, arg1, arg2, engine and tot are defined elsewhere
    data = pd.read_sql(query_randomizer(
        sub_prop_entid[0], sub_prop_entid[1], arg1, arg2), engine)
    df = pd.DataFrame(data)
    num_records = len(df.index)
    if num_records > 0:
        count += 1
        print("# Of Records............: ", num_records, '\n')
        df.insert(0, '# Of Records', num_records)
        df.insert(1, 'Exec Time', tot)
        display(df)
    print("Records with good data", count)
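Why count never goes above 1 can be reproduced without a database. In this sketch, fetch_and_count is a hypothetical stand-in for sql_fetch and a plain list plays the role of the DataFrame returned by pd.read_sql. Because count is a local variable, every call starts again from 0, so a total only exists if the caller collects and sums the return values.

```python
def fetch_and_count(rows: list) -> int:
    """Hypothetical stand-in for sql_fetch; `rows` plays the role of the query result."""
    count = 0  # local variable: a fresh 0 is created on every call
    if len(rows) > 0:
        count += 1
    print("Records with good data", count)  # prints 1 or 0, never a running total
    return count


query_results = [[("a", 1), ("b", 2)], [], [("c", 3)]]  # made-up sample results
flags = [fetch_and_count(rows) for rows in query_results]
print("Total queries with records:", sum(flags))  # Total queries with records: 2
```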

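You should understand the difference between multiprocessing and multithreading. I believe the latter is the right way to accomplish this task: the work is mostly waiting on database queries (I/O), and threads can run those queries concurrently despite the GIL.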

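Multiprocessing in Python runs tasks in parallel in separate processes, potentially on multiple CPUs. Each process has its own memory, so variables, data, and connections are not shared between them; arguments passed to a child process are pickled and copied, not shared. Think of each process as a brand-new Python program, and what you are trying to do is share one variable between different Python programs.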
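A minimal sketch of that isolation, using only the standard library: each child process increments a module-level counter, but the increment happens in the child's own copy, so the parent still sees 0 after all children finish. The names counter, increment_counter and run_processes are illustrative and not from the original code; the __main__ guard is needed on platforms that start processes with spawn (e.g. Windows and macOS).

```python
import multiprocessing

counter = 0  # module-level global; every process ends up with its own copy


def increment_counter() -> None:
    """Runs in a child process and bumps that child's copy of counter."""
    global counter
    counter += 1


def run_processes(n: int) -> int:
    """Start n children that each increment counter, then return the parent's value."""
    procs = [multiprocessing.Process(target=increment_counter) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter  # the parent's copy, untouched by the children


if __name__ == "__main__":
    print(run_processes(4))  # prints 0: the children's increments never reach the parent
```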

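So try multithreading instead, and make the counter a global variable (protected by a lock, because counter += 1 is not atomic).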

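import threading

counter = 0                      # module-level counter shared by all threads
counter_lock = threading.Lock()  # guards counter += 1, which is not atomic
# In sql_fetch, replace the local count with:
#     global counter
#     with counter_lock:
#         counter += 1
multi = []
for sub_prop_entid in test_propid_entid:
    t_sub = threading.Thread(target=sql_fetch, args=(sub_prop_entid,))
    multi.append(t_sub)
    t_sub.start()
for a in multi:
    a.join()
print("Records with good data", counter)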
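The thread-based approach can be tried without Oracle or pandas. In this sketch, fake_query is a hypothetical stand-in for pd.read_sql(query_randomizer(...), engine) that returns rows only when the second id is even, and the id pairs are made up. The points being illustrated are the global statement inside the worker and the lock around counter += 1.

```python
import threading

counter = 0                      # one variable shared by every thread in this process
counter_lock = threading.Lock()  # makes the read-modify-write "counter += 1" safe


def fake_query(prop_id: int, ent_id: int) -> list:
    """Hypothetical stand-in for pd.read_sql: returns rows only for even ent_id."""
    return [] if ent_id % 2 else [(prop_id, ent_id)]


def sql_fetch(sub_prop_entid: tuple) -> None:
    global counter  # refer to the module-level counter, not a new local
    rows = fake_query(sub_prop_entid[0], sub_prop_entid[1])
    if len(rows) > 0:
        with counter_lock:
            counter += 1


test_propid_entid = [(101, 1), (102, 2), (103, 3), (104, 4)]  # made-up sample ids

threads = [threading.Thread(target=sql_fetch, args=(pair,)) for pair in test_propid_entid]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("Records with good data", counter)  # Records with good data 2
```

If you would rather avoid global state altogether, a different technique from the one in this answer is to have sql_fetch return 1 or 0 and sum the results of concurrent.futures.ThreadPoolExecutor.map. Because only return values travel back to the caller, the same shape also works with ProcessPoolExecutor, where the results are pickled back to the parent process.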
