Python 2.7: Untarring files in parallel (using threads)



I'm learning Python threading and, at the same time, trying to improve my old untarring script.

The main part looks like this:

import tarfile, os, threading
def untar(fname, path):
    print "Untarring " + fname
    try:
        ut = tarfile.open(os.path.join(path,fname), "r:gz")
        ut.extractall(path)
        ut.close()
    except tarfile.ReadError as e:          # in case it's not gzipped
        print e
        ut = tarfile.open(os.path.join(path,fname), "r:*")
        ut.extractall(path)
        ut.close()
def untarFolder(path):
    if path == ".":
        path = os.getcwd()
    print "path", path
    # serveMenu() scans the folder for .tar and .tar.gz files
    # and returns a list of their names
    ListTarFiles = serveMenu(path)
    print "ListTarFiles ", ListTarFiles 
    for filename in ListTarFiles:
        print "filename: ", filename
        t = threading.Thread(target=untar, args = (filename,path))
        t.daemon = True
        t.start()
        print "Thread:", t

So the goal is to untar all the files in a given folder simultaneously, in parallel, rather than one by one. Is that possible?

Output:

bogard@testlab:~/Toolz/untar$ python untar01.py -f .
path /home/bogard/Toolz/untar
ListTarFiles ['tar1.tgz', 'tar2.tgz', 'tar3.tgz']
filename:  tar1.tgz
Untarring tar1.tgz
 Thread: <Thread(Thread-1, started daemon 140042104731392)>
filename:  tar2.tgz
Untarring tar2.tgz
 Thread: <Thread(Thread-2, started daemon 140042096338688)>
filename:  tar3.tgz
Untarring tar3.tgz
 Thread: <Thread(Thread-3, started daemon 140042087945984)>

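The output shows that the script creates the threads, but it doesn't untar any files. What's the catch?

What is probably happening is that the script returns before the threads have actually finished. Because the threads are marked as daemon threads (t.daemon = True), the interpreter does not wait for them when the main thread exits; it simply terminates them. You can wait for a thread to finish with Thread.join(). Maybe try something like this: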

threads = []
for filename in ListTarFiles:
    t = threading.Thread(target=untar, args = (filename,path))
    t.daemon = True
    threads.append(t)
    t.start()
# Wait for each thread to complete
for thread in threads:
    thread.join()

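Also, depending on the number of files to untar, you may want to limit the number of jobs started at once, so that you don't try to untar 1000 files simultaneously. You could use multiprocessing.Pool for that.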
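As a rough sketch of that idea, the snippet below caps the number of concurrent extractions with a pool. Note the assumptions: it uses multiprocessing.dummy.Pool, the thread-backed pool that exposes the same interface as multiprocessing.Pool, so it stays close to the threading approach in the question; the names untar_job and untar_folder and the workers parameter are made up for illustration, and the list of archive names is passed in instead of coming from serveMenu().

```python
import os
import tarfile
from multiprocessing.dummy import Pool  # thread-backed pool, same API as multiprocessing.Pool


def untar_job(job):
    """Extract one archive. `job` is a (fname, path) tuple (hypothetical helper)."""
    fname, path = job
    # "r:*" lets tarfile detect the compression itself (gzip, bz2 or none),
    # so no gzip-specific fallback is needed here.
    archive = tarfile.open(os.path.join(path, fname), "r:*")
    try:
        archive.extractall(path)
    finally:
        archive.close()
    return fname


def untar_folder(path, tar_files, workers=4):
    """Extract every archive in tar_files, at most `workers` at a time."""
    pool = Pool(processes=workers)
    try:
        # map() blocks until all archives are done and keeps the input order.
        return pool.map(untar_job, [(fname, path) for fname in tar_files])
    finally:
        pool.close()
        pool.join()
```

Calling untar_folder(os.getcwd(), ListTarFiles, workers=4) would then return only after every archive has been extracted, so no separate join() loop is needed. Replacing the import with "from multiprocessing import Pool" should switch to worker processes; in that case the call is best kept under an if __name__ == "__main__": guard.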
