Python: generating hashes with multiprocessing

I have some simple code that loads a list of passwords into a dictionary and writes the hash generated from each one to a new list:

import hashlib

def hash_one():
    hash_to_string = {}
    with open("wordlists/pass.txt", "r", encoding="ISO-8859-1") as file:
        for x in file:
            x = x.strip()
            # Double SHA-1 of the password, hex-encoded
            result = hashlib.sha1(hashlib.sha1(x.encode()).digest()).hexdigest()
            f_result = result
            # Key by the computed hash (the original used the builtin `hash` as the key)
            hash_to_string[result] = x
            with open("hashed.txt", "a") as final:
                final.write(f_result + "\n")

hash_one()

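How can I use ProcessPoolExecutor() to speed this up? Right now it processes the file line by line. I tried a few experiments but could not get them to work. I would like to use all 16 cores of my CPU.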

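You can parallelize the hashing, but you cannot really parallelize reading the file. To do that, you could try splitting the original file into 16 parts and running the original non-parallel code on each part (that is, starting 16 Python interpreters).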

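import hashlib
import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def do_hash(x):
    # Double SHA-1, hex-encoded
    return hashlib.sha1(hashlib.sha1(x.encode()).digest()).hexdigest()

def hash_each_in_list(l):
    return [do_hash(x) for x in l]

def hash_each_in_list_parallel(l):
    n = multiprocessing.cpu_count()
    # Split the list into n contiguous, nearly equal slices, one per core
    parts = [l[(i*len(l))//n : ((i+1)*len(l))//n] for i in range(n)]
    with ProcessPoolExecutor() as executor:
        # map() keeps the order of parts; sum(..., []) concatenates the result lists
        return sum(list(executor.map(hash_each_in_list, parts)), [])

# The guard is needed where worker processes are spawned (Windows, macOS)
if __name__ == "__main__":
    l = hash_each_in_list_parallel(open('so_2020-12-08_hashes.txt').read().splitlines())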

Example input file (one password per line):

password
123456
monkey

Resulting list l:

['2470c0c06dee42fd1618bb99005adca2ec9d1e19',
'6bb4837eb74329105ee4568dda7dc67ed2ca2ad9',
'a5892368ae83685440a1e27d012306b073bdf5b7']
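As a variation on the manual slicing above, executor.map can do the batching itself through its chunksize argument, which avoids building the parts list and stitching the results back together. The following is only a sketch: the function name hash_lines_chunked and the default chunk size of 10,000 are illustrative assumptions, not measured optima, and whether it is faster than the slicing approach would have to be tested.

```python
import hashlib
import sys
from concurrent.futures import ProcessPoolExecutor


def do_hash(x):
    """Double SHA-1 of one string, hex-encoded (same as do_hash above)."""
    return hashlib.sha1(hashlib.sha1(x.encode()).digest()).hexdigest()


def hash_lines_chunked(lines, chunksize=10_000, max_workers=None):
    """Hash every line in a process pool, sending `chunksize` lines per task.

    The name and the default chunk size are hypothetical, chosen for
    illustration only.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in input order, so they line up with `lines`.
        return list(executor.map(do_hash, lines, chunksize=chunksize))


def main(argv):
    if len(argv) != 2:
        print(f"usage: {argv[0]} WORDLIST")
        return
    with open(argv[1], encoding="ISO-8859-1") as f:
        lines = f.read().splitlines()
    for digest in hash_lines_chunked(lines):
        print(digest)


if __name__ == "__main__":
    main(sys.argv)
```

With the default chunksize of 1, every line would be sent to a worker as its own task, so the inter-process overhead would likely dominate for a function as cheap as do_hash; a larger chunk size amortizes that cost.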

Non-parallel call:

l = hash_each_in_list(open('so_2020-12-08_hashes.txt').read().splitlines())

Test results on a dummy file with 10 million lines, on a Ryzen 5 3600 6-core processor:

Non-parallel: 9.9 s
Parallel: 6.3 s
- 5.1 s if the lists are not stitched together with sum(parts, []) at the end
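The stitching cost mentioned above comes from sum(parts, []) building a new, longer list at every addition. A possible way to reduce it is to concatenate in a single pass with itertools.chain.from_iterable. The sketch below also includes a small timing helper for comparing the serial and parallel variants; split_evenly, flatten, hash_parallel and timed are hypothetical helper names, and no timings are claimed for it.

```python
import hashlib
import os
import sys
import time
from concurrent.futures import ProcessPoolExecutor
from itertools import chain


def do_hash(x):
    return hashlib.sha1(hashlib.sha1(x.encode()).digest()).hexdigest()


def hash_each_in_list(l):
    return [do_hash(x) for x in l]


def split_evenly(l, n):
    """Same slicing as in the answer: n contiguous, nearly equal parts."""
    return [l[(i * len(l)) // n : ((i + 1) * len(l)) // n] for i in range(n)]


def flatten(parts):
    """Concatenate the lists in one pass instead of using sum(parts, [])."""
    return list(chain.from_iterable(parts))


def hash_parallel(l, n):
    with ProcessPoolExecutor(max_workers=n) as executor:
        return flatten(executor.map(hash_each_in_list, split_evenly(l, n)))


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def main(argv):
    if len(argv) != 2:
        print(f"usage: {argv[0]} WORDLIST")
        return
    with open(argv[1], encoding="ISO-8859-1") as f:
        lines = f.read().splitlines()
    serial, t_serial = timed(hash_each_in_list, lines)
    # The parallel timing includes starting the worker processes.
    parallel, t_parallel = timed(hash_parallel, lines, os.cpu_count() or 1)
    assert serial == parallel
    print(f"non-parallel: {t_serial:.1f}s, parallel: {t_parallel:.1f}s")


if __name__ == "__main__":
    main(sys.argv)
```

Run with the word list as the only argument, this prints both timings for the same input, which makes it easy to see how much of the parallel time is left after the stitching step is made cheaper.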

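My guess is that copying data around in memory is the bottleneck here, while computing the hashes on the CPU is very fast, so there is not much of a speedup. (My CPU utilization never went above 40%.)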
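If copying lists between processes really is the bottleneck, one thing to try is the file-splitting idea without physically splitting the file: each worker gets a byte range of the input, reads its own lines, and writes its own output file, so only small tuples cross process boundaries. This is an untested-for-speed sketch under several assumptions: byte_ranges, hash_range, hash_file_parallel and the ".partN" file names are hypothetical, lines end with "\n", and the input is decoded as ISO-8859-1 as in the question.

```python
import hashlib
import os
import shutil
import sys
from concurrent.futures import ProcessPoolExecutor


def do_hash(x):
    return hashlib.sha1(hashlib.sha1(x.encode()).digest()).hexdigest()


def byte_ranges(size, n):
    """Split [0, size) into n contiguous (start, end) byte ranges."""
    return [((i * size) // n, ((i + 1) * size) // n) for i in range(n)]


def hash_range(task):
    """Hash every line that starts inside [start, end); return the line count."""
    path, start, end, out_path = task
    count = 0
    with open(path, "rb") as src, open(out_path, "w") as dst:
        if start > 0:
            # Step back one byte and read through the next newline: a line
            # starting exactly at `start` is kept, a partial line is skipped
            # because it belongs to the previous range.
            src.seek(start - 1)
            src.readline()
        pos = src.tell()
        while pos < end:
            line = src.readline()
            if not line:
                break
            pos += len(line)
            text = line.decode("ISO-8859-1").rstrip("\r\n")
            dst.write(do_hash(text) + "\n")
            count += 1
    return count


def hash_file_parallel(path, out_path, n=None):
    n = n or os.cpu_count() or 1
    ranges = byte_ranges(os.path.getsize(path), n)
    tasks = [(path, s, e, f"{out_path}.part{i}") for i, (s, e) in enumerate(ranges)]
    with ProcessPoolExecutor(max_workers=n) as executor:
        counts = list(executor.map(hash_range, tasks))
    # Concatenate the part files in range order, then remove them.
    with open(out_path, "w") as final:
        for _, _, _, part in tasks:
            with open(part) as f:
                shutil.copyfileobj(f, final)
            os.remove(part)
    return sum(counts)


def main(argv):
    if len(argv) != 3:
        print(f"usage: {argv[0]} WORDLIST OUTPUT")
        return
    print(hash_file_parallel(argv[1], argv[2]), "lines hashed")


if __name__ == "__main__":
    main(sys.argv)
```

Because the ranges are contiguous and the part files are joined in order, the output keeps the line order of the input. Disk throughput may still cap the speedup, so this would need to be measured on the real word list.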
