How to optimize nested loops in Python 3

My code so far:

```python
import glob
import re

words = [x.strip() for x in open('words.txt').read().split('\n') if x]
paths = glob.glob('./**/*.text', recursive=True)
for path in paths:
    with open(path, "r+") as file:
        s = file.read()
        # Inner loop: one full re.sub pass over the file for every word.
        for word in words:
            s = re.sub(word, 'random_text', s)
        file.seek(0)
        file.write(s)
        file.truncate()
```

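I need to loop through the file paths, scan each file for the words, and replace every word found with some text. To be clear, this code works, but it is very slow (it takes over an hour) because there are about 23k words and 14k files. Can you give me some suggestions to speed it up?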

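I've looked at the map() and zip() functions, but I don't think they're what I need (I may be wrong). I've also looked into threading and multiprocessing, but I'm not sure how to apply them here. I also tried doing this in bash with sed, but that also took a long time and ran into the same nested-loop problem. Thanks in advance for your help! (I'm very new to coding, so go easy on me! :)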

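I think you can remove the second for loop: join all the words into a single regex and compile it once up front, instead of compiling a new regex for every word. Each file is then scanned in one pass rather than once per word (the original code makes about 23,000 re.sub calls per file, roughly 322 million calls across 14,000 files). I don't have much experience optimizing code, but this is where I'd start.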

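```python
import glob
import re

with open('words.txt') as f:
    # Skip blank lines: an empty alternative would match at every position.
    words = [x.strip() for x in f.read().split('\n') if x.strip()]
paths = glob.glob('./**/*.text', recursive=True)
# One pattern that matches any of the words, compiled once.
# re.escape makes each word match literally, even if it contains '.', '+', etc.
regex = re.compile('|'.join(map(re.escape, words)))
for path in paths:
    with open(path, 'r+') as file:
        contents = file.read()
        # A single pass replaces every occurrence of every word.
        contents = regex.sub('random_text', contents)
        file.seek(0)
        file.write(contents)
        file.truncate()
```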
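A small sketch of the single-pattern idea on plain strings, so it can be tried without any files. `replace_words` is a hypothetical helper, and one detail goes beyond the answer: the words are sorted longest first, because regex alternation takes the first alternative that matches at a position, so "cat" listed before "category" would leave "egory" behind.

```python
import re


def replace_words(text, words, repl='random_text'):
    """Hypothetical helper: replace any word in `words` with `repl`, in one pass."""
    # Longest words first, so 'category' is tried before its prefix 'cat'.
    ordered = sorted(words, key=len, reverse=True)
    # re.escape makes each word literal ('.' is not a wildcard here).
    pattern = re.compile('|'.join(map(re.escape, ordered)))
    return pattern.sub(repl, text)


print(replace_words('cat, dog and a.b but not axb', ['cat', 'dog', 'a.b']))
# prints: random_text, random_text and random_text but not axb
print(replace_words('category cat', ['cat', 'category']))
# prints: random_text random_text
```

Note that "axb" is left alone: without re.escape, the word "a.b" would be treated as a regex and match it.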

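This approach has limited applicability: it replaces every word with the same fixed string 'random_text'. If the replacement should depend on which word was matched, a plain replacement string won't work; you would need to pass a function as the replacement to regex.sub instead.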
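When each word needs its own replacement, `re.sub` (and `Pattern.sub`) also accepts a function: it is called once per match with the match object, and its return value is substituted. A minimal sketch, assuming a hypothetical `replacements` dict that maps each word to its replacement text (the original words.txt only lists words, so this mapping would have to come from somewhere else):

```python
import re

# Hypothetical mapping from each word to its own replacement text.
replacements = {'cat': 'feline', 'dog': 'canine', 'a.b': 'A-B'}

# Same single-pattern idea: escaped words, longest first, compiled once.
pattern = re.compile(
    '|'.join(map(re.escape, sorted(replacements, key=len, reverse=True)))
)


def substitute(text):
    # match.group(0) is the exact word that matched; look up its replacement.
    return pattern.sub(lambda match: replacements[match.group(0)], text)


print(substitute('the cat chased the dog past a.b'))
# prints: the feline chased the canine past A-B
```

Each file is still scanned only once; the only extra work is one dictionary lookup per match.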

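In addition to @Jacinator's good answer, using multiple processes can cut the runtime further, because each file can be processed independently of the others.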

```python
import glob
import re
from concurrent.futures import ProcessPoolExecutor

with open('words.txt') as f:
    words = [x.strip() for x in f.read().split('\n') if x.strip()]
regex = re.compile('|'.join(map(re.escape, words)))


def replace_in_one_file(path):
    # Runs in a worker process; each call handles one file.
    with open(path, 'r+') as file:
        contents = file.read()
        contents = regex.sub('random_text', contents)
        file.seek(0)
        file.write(contents)
        file.truncate()


if __name__ == '__main__':
    paths = glob.glob('./**/*.text', recursive=True)

    with ProcessPoolExecutor(max_workers=10) as executor:
        # list() consumes the results, so any exception raised in a worker is re-raised here.
        list(executor.map(replace_in_one_file, paths))
```
