Python subprocess: PIPE stores everything sent to stdout in a buffer until it is read. Is there a way to stream it instead?

import subprocess
from pathlib import Path

def check_file_wc_count(path: Path, regex: str):
    try:
        zgrep = subprocess.run(['zgrep', regex, path], check=True, stdout=subprocess.PIPE)
    except subprocess.CalledProcessError as e:
        return 0
    output = subprocess.run(['wc', '-l'], input=zgrep.stdout, capture_output=True, check=True)
    return int(output.stdout.decode('utf-8').strip())

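When reading large files (gzipped, hence zgrep), I observe high memory usage. I don't think this normally happens when the Linux utilities are used on their own. My guess is that it comes from how I use subprocess.PIPE: the stdout of the zgrep call is held in a buffer until it is passed as input to the wc call.

Is this assumption correct? Is there a way to avoid it in Python?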

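Using subprocess.PIPE the way I did in the posted example does hold stdout in an internal buffer: subprocess.run collects the child's entire output in memory before it returns, so all of zgrep's matches are held in Python before wc sees any of them. To avoid this, the solutions from @chepner and @Omer Dagri in the comments both appear to work. I think using zgrep -c {regex} {path} is the most direct solution: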

def check_num_quotes(path: Path, regex: str):
    try:
        output = subprocess.run(['zgrep', '-c', regex, path], capture_output=True, check=True)
    except subprocess.CalledProcessError as e:
        return 0
    return int(output.stdout.decode('utf-8').strip())
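If the two-process pipeline is still wanted, one way to stream between the processes is to connect them with subprocess.Popen so that the data flows through an OS pipe and never accumulates in Python. The comments are not reproduced here, so this is only a sketch of that general technique and not necessarily what was suggested there; the function name count_matches_streaming is hypothetical, and zgrep's exit status is deliberately not checked.

```python
import subprocess
from pathlib import Path


def count_matches_streaming(path: Path, regex: str) -> int:
    """Count matching lines by piping zgrep straight into wc -l."""
    # zgrep writes into an OS pipe; Python itself never reads from it.
    zgrep = subprocess.Popen(["zgrep", regex, str(path)], stdout=subprocess.PIPE)
    # wc reads directly from the read end of zgrep's pipe.
    wc = subprocess.Popen(["wc", "-l"], stdin=zgrep.stdout, stdout=subprocess.PIPE)
    # Close our copy of the pipe so zgrep gets SIGPIPE if wc exits early.
    zgrep.stdout.close()
    # Only wc's tiny output (a single number) is buffered in Python.
    out, _ = wc.communicate()
    zgrep.wait()
    return int(out.decode("utf-8").strip())
```

Only the final count is read into Python here, so memory use should stay flat regardless of how many lines match. When nothing matches, wc simply prints 0, so no exception handling is needed for that case.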

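Edit: my full use case is to search a directory, find every file whose name matches, count the regex matches in each of those files, and print each count alongside its file name. This can be done in a single command:

find . -type f -name "{file_regex}" -exec zgrep -cH "{match_regex}" {} \;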
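For cases where the result is needed inside Python, the same search can be approximated without any subprocess by decompressing each file incrementally with the standard gzip module. This is a rough sketch under several assumptions: the name count_matches_per_file is hypothetical, every matched file is assumed to be gzip-compressed (zgrep also accepts plain files, gzip.open does not), and Python's re syntax is not identical to grep's, so a pattern may need adjusting.

```python
import gzip
import re
from pathlib import Path
from typing import Dict


def count_matches_per_file(root: Path, file_glob: str, match_regex: str) -> Dict[Path, int]:
    """Map each gzip file under root matching file_glob to its count of matching lines."""
    pattern = re.compile(match_regex)
    counts: Dict[Path, int] = {}
    # rglob walks the tree recursively, like find with -name and a glob.
    for path in sorted(root.rglob(file_glob)):
        if not path.is_file():
            continue
        # gzip.open decompresses lazily, so only one line is held at a time.
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
            counts[path] = sum(1 for line in fh if pattern.search(line))
    return counts
```

A call such as count_matches_per_file(Path("."), "*.gz", "ERROR") would return one entry per file, including files with zero matches, which mirrors what zgrep -cH prints.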
