Why don't Python threads and processes work?

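I have a large list, jsn, containing many string elements, some of which may be duplicates. I need to check every element for similarity to the others and add the indices of the duplicate items to a dubs list, so that I can then remove those items from jsn.

Because jsn is so large, I decided to use threading to speed up the loop and reduce the waiting time.

But threads/processes don't work the way I expected.

The threaded code below shows no change in performance, and after the threads have been joined the dubs list is empty.

I tried .join() without success: I still get an empty dubs list and no change in performance.

Main problem -> the dubs list is empty before the deletion of duplicates starts.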

from threading import Thread
from multiprocessing import Process
from difflib import SequenceMatcher

# Search for duplicates in the list
def finddubs(jsn, dubs, a):
    for b in range(len(jsn)):
        if (jsn[a] == jsn[b]) or (SequenceMatcher(None, jsn[a], jsn[b]).ratio() > 40):
            dubs.append(b)  # add the index of the duplicate item to the dubs list

# Start threading
threads = []
for a in range(len(jsn)):
    t = Thread(target=finddubs, args=(jsn, dubs, a))
    threads.append(t)
    t.start()
for thr in threads:
    thr.join()

# Delete duplicate list items
for d in dubs:
    k = int(d)
    del jsn[k]

The code without threads works.

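If you want to speed up the computation, you need to use multiprocessing instead of threading. Read about the GIL (global interpreter lock) for the details: in standard CPython only one thread executes Python bytecode at a time, so threads do not speed up CPU-bound work such as string comparison.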
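A rough way to see this for yourself is to time the same CPU-bound comparison work sequentially and with a thread pool. This is only a sketch: `similarity_sum`, `timed`, and the job sizes are made up for illustration, and the timings depend on the machine and the interpreter build. On a standard (GIL-enabled) CPython the two numbers are typically close to each other.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher


def similarity_sum(pair_count):
    """CPU-bound work: run many pure-Python string comparisons."""
    total = 0.0
    for i in range(pair_count):
        total += SequenceMatcher(None, f"item-{i}", f"item-{i + 1}").ratio()
    return total


def timed(fn):
    """Return (result, elapsed seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


jobs = [300] * 4  # four identical chunks of work (hypothetical sizes)

# Same work, first one chunk after another, then spread over four threads
sequential, t_seq = timed(lambda: [similarity_sum(n) for n in jobs])
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded, t_thr = timed(lambda: list(pool.map(similarity_sum, jobs)))

print(f"sequential: {t_seq:.3f}s, 4 threads: {t_thr:.3f}s")
```

Both runs produce the same results; only the wall-clock time is of interest here.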

An example of how multiprocessing can be used for this task:

import multiprocessing
from difflib import SequenceMatcher
from uuid import uuid4

# Generate a large list of random data with a few known duplicates:
# "abc" at indices 0 and 1_001; "b" at indices 1_002 and 1_003
jsn = ['abc'] + [str(uuid4()) for _ in range(1_000)] + ['abc', 'b', 'b']

def compare_strings(a: int, b: int):
    # NOTE: ratio() returns a float in [0, 1], so "> 40" is never true and only
    # the equality test can match; use e.g. 0.4 if 40% similarity was intended
    if (jsn[a] == jsn[b]) or (SequenceMatcher(None, jsn[a], jsn[b]).ratio() > 40):
        return a, b

# The guard is required on platforms that start workers with "spawn" (Windows, macOS)
if __name__ == "__main__":
    # Compare all possible pairs using multiprocessing
    with multiprocessing.Pool(processes=10) as pool:
        results = pool.starmap(compare_strings, [(i, j) for i in range(len(jsn)) for j in range(i + 1, len(jsn))])
    for result in results:
        if result is not None:
            a, b = result
            print(f"Duplicated pair: {a} {b} {jsn[b]}")
    # delete duplicates

Your code should work with these modifications:

from difflib import SequenceMatcher
from threading import Thread
from uuid import uuid4

# Generate a list of random data with a few known duplicates:
# "abc" at indices 0 and 101; "b" at indices 102 and 103
jsn = ["abc"] + [str(uuid4()) for _ in range(100)] + ["abc", "b", "b"]
dubs = []

# Search for duplicates in the list
def finddubs(jsn, dubs, a):
    for b in range(a + 1, len(jsn)):
        # NOTE: ratio() is in [0, 1], so "> 40" never matches; only equality does
        if (jsn[a] == jsn[b]) or (SequenceMatcher(None, jsn[a], jsn[b]).ratio() > 40):
            print(a, b)
            dubs.append(b)  # add the index of the duplicate item to the dubs list

# Start threading
threads = []
for a in range(len(jsn)):
    t = Thread(target=finddubs, args=(jsn, dubs, a))
    threads.append(t)
    t.start()
for thr in threads:
    thr.join()

# Delete duplicate list items, highest index first so that earlier deletions
# do not shift the positions of the items still to be removed
print(dubs)
for k in sorted(set(dubs), reverse=True):
    del jsn[k]
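Two details are worth keeping in mind when collecting and deleting indices. `SequenceMatcher.ratio()` returns a float between 0 and 1, so any threshold has to be on that scale. And every `del` shifts the elements after it, so deleting unique indices in descending order avoids removing the wrong item or running past the end of the list. A minimal sketch, where `is_duplicate`, `remove_indices`, and the 0.9 threshold are hypothetical choices of mine:

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.9  # assumed cut-off on the 0.0-1.0 scale; tune for your data


def is_duplicate(first, second, threshold=SIMILARITY_THRESHOLD):
    """True when the strings are equal or their similarity exceeds the threshold."""
    return first == second or SequenceMatcher(None, first, second).ratio() > threshold


def remove_indices(items, indices):
    """Delete the given positions in place, highest index first."""
    for k in sorted(set(indices), reverse=True):
        del items[k]
    return items


jsn = ["abc", "xyz", "abc", "b", "b"]
dubs = [
    b
    for a in range(len(jsn))
    for b in range(a + 1, len(jsn))
    if is_duplicate(jsn[a], jsn[b])
]
print(remove_indices(jsn, dubs))  # prints ['abc', 'xyz', 'b']
```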
