How to test a directory of files for gzip and decompress the gzipped files in Python using zcat

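I'm in my second week of Python, and I'm stuck on a directory of compressed and uncompressed log files that I need to parse and process.

Currently I'm doing this: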

import glob
import os
import sys
import operator
import zipfile
import zlib
import gzip
import subprocess
if sys.version.startswith("3."):
    import io
    io_method = io.BytesIO
else:
    import cStringIO
    io_method = cStringIO.StringIO
for f in glob.glob('logs/*'):
    file = open(f,'rb')        
    new_file_name = f + "_unzipped"
    last_pos = file.tell()
    # test for gzip
    if (file.read(2) == b'\x1f\x8b'):
        file.seek(last_pos)
    #unzip to new file
    out = open( new_file_name, "wb" )
    process = subprocess.Popen(["zcat", f], stdout = subprocess.PIPE, stderr=subprocess.STDOUT)
    while True:
      if process.poll() != None:
        break;
    output = io_method(process.communicate()[0])
    exitCode = process.returncode

    if (exitCode == 0):
      print "done"
      out.write( output )
      out.close()
    else:
      raise ProcessException(command, exitCode, output)

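I stitched this together from these answers (here) and a blog post (here).

However, it doesn't seem to work: my test file is 2.5 GB, the script has been chewing on it for more than 10 minutes, and I'm not sure whether what I'm doing is correct.

Question:
If I don't want to use the gzip module and need to decompress chunk by chunk (the real files are larger than 10 GB), how do I decompress and save to a file in Python using zcat and subprocess?

Thanks!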

This reads the first line of each file in the logs subdirectory, decompressing as needed:

#!/usr/bin/env python
import glob
import gzip
import subprocess
for f in glob.glob('logs/*'):
  if f.endswith('.gz'):
    # Open a compressed file. Here is the easy way:
    #   file = gzip.open(f, 'rb')
    # Or, here is the hard way:
    proc = subprocess.Popen(['zcat', f], stdout=subprocess.PIPE)
    file = proc.stdout
  else:
    # Otherwise, it must be a regular file
    file = open(f, 'rb')
  # Process file, for example:
  print(f, file.readline())
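
To save the decompressed data to a file instead of reading one line, a rough sketch might look like the following. It assumes Python 3 and a `zcat` on the PATH. The names `is_gzip`, `decompress_to_file`, the `cmd` parameter, and the 1 MiB chunk size are illustrative choices, not something from the question. The point is to copy the pipe to the output file in fixed-size chunks, rather than calling `communicate()`, which buffers the whole decompressed output in memory.

```python
import shutil
import subprocess

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def decompress_to_file(src, dst, cmd=("zcat",), chunk_size=1024 * 1024):
    """Run `cmd src` and stream its stdout into dst, chunk_size bytes at a time.

    `cmd` defaults to zcat; it is a parameter only so that another
    decompressor can be substituted (an assumption of this sketch).
    """
    argv = list(cmd) + [src]
    with open(dst, "wb") as out:
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
        try:
            # Copies in chunks, so memory use stays near chunk_size
            # no matter how large the decompressed data is.
            shutil.copyfileobj(proc.stdout, out, chunk_size)
        finally:
            proc.stdout.close()
            returncode = proc.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, argv)
    return dst
```

In the loop above this could be called as `if is_gzip(f): decompress_to_file(f, f + "_unzipped")`. If Python does not need to see the data at all, passing the open output file directly as `stdout=out` to `Popen` should also work and avoids the copy loop entirely.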
