I'm building a search algorithm, and for that I need a lot of data, so I decided to create a file containing random words taken from the english-words module. To do that I wrote this small script, run from CMD:
def create_very_large_file(filename='very_large_file.txt'):
    from english_words import english_words_set as ews
    import random
    _ews = list(ews)
    print('Creating a very large file...')
    with open(filename, 'w') as large_file:
        large_file.close()
    with open(filename, 'a') as large_file:
        for i in range(1000000000):
            print(f'Work Done: {round((i/1000000000)*100, 8)}\t|{"▐" * int((i/1000000000)*100)}{" " * int(100 - ((i/1000000000)*100))}|', end='\r')
            if i % 10 == 0:
                large_file.write('\n')
            word = random.choice(_ews)
            large_file.write(word + ' ')
        large_file.close()
    print('A very large file created !!')
Right now this code appends roughly 3,600,000 words per minute to the file, so creating a file this large takes over an hour. And that's on a Ryzen 9; on other machines the process would be much slower.
Is there any way to do this kind of work faster?
You can reduce the amount of work done per iteration (and cut the number of iterations by a factor of 10) by building a complete line in each iteration.
with open(filename, 'w') as large_file:
    for i in range(100_000_000):
        words = random.choices(_ews, k=10)
        print(' '.join(words), file=large_file)
Even with the iterations that remain, there is no reason to try to update your progress bar on every one. Updating once every 10,000 or even 100,000 iterations is probably sufficient.
You are also doing several times more work than necessary on each progress-bar update: compute the completion percentage once per update, then reuse the result.
if i % 10_000 == 0:
    x = i / 1_000_000
    total = round(x, 8)
    done = int(x)
    remaining = int(100 - x)
    print(f'Work Done: {total}\t|{"▐" * done}{" " * remaining}|', end='\r')
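Putting the two changes together, here is a minimal sketch of the whole rewritten function. To keep it self-contained and quick to test, it substitutes a small hard-coded word list for `english_words_set` and takes the line count as a parameter; the function name and both parameters are illustrative, not from the original.

```python
import random

# Stand-in for english_words_set; swap in the real module if it is installed.
WORDS = ['apple', 'banana', 'cherry', 'delta', 'echo', 'foxtrot']


def create_large_file(filename='very_large_file.txt', n_lines=1_000, words=WORDS):
    """Write n_lines lines, each containing 10 random space-separated words."""
    with open(filename, 'w') as large_file:
        for i in range(n_lines):
            # Build a whole 10-word line per iteration instead of one word.
            print(' '.join(random.choices(words, k=10)), file=large_file)
            # Redraw the progress bar only every 10,000 lines.
            if i % 10_000 == 0:
                x = i / n_lines * 100  # percent complete, computed once
                print(f'Work Done: {round(x, 8)}\t|'
                      f'{"▐" * int(x)}{" " * (100 - int(x))}|', end='\r')
    print('\nDone.')
```

For example, `create_large_file('sample.txt', n_lines=1_000)` writes 1,000 lines of 10 words each; scale `n_lines` back up (e.g. to 100_000_000) for the original workload.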