How to process the files inside a zip file



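I have a zip file that contains several files. I unzipped it and used one of the files to write the lines I need (cleaning the data, counting). Now I would like to know how to create a loop that applies my lines to all of the files in the zip file.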

from zipfile import ZipFile
import string
from collections import Counter

punctuations = '''!()-[]{};:'",<>./?@#$%^&*_~'''
stopwords = ['a', 'an', 'the', 'that', 'these', 'those', 'of', 'to', 'i', 'you',
             'your', 'yours', 'he', 'she', 'they', 'me', 'him', 'her', 'them',
             'his', 'is', 'are', 'was', 'were', 'at', 'dont', 'its']

with ZipFile('articles.zip', 'r') as zip:
    with zip.open('articles/document0001.txt') as file:
        file_text = file.read().decode('utf-8')

words = file_text.split()
# Strip punctuation from every word.
table = str.maketrans("", "", string.punctuation)
decode_file = [w.translate(table) for w in words]
# Keep only words longer than three characters, then lowercase them.
more3_file = [f for f in decode_file if len(f) > 3]
lower_file = [each_string.lower() for each_string in more3_file]
# Drop the stopwords (walking the list from the end).
new = []
for i in range(len(lower_file) - 1, -1, -1):
    if lower_file[i] not in stopwords:
        new.append(lower_file[i])
# Count the words that survived the stopword filter.
counts = Counter(new)

Use the namelist method to get the names of all the files in the zip file.

with ZipFile('articles.zip', 'r') as zip:
    for filename in zip.namelist()[1:]:
        with zip.open(filename) as file:
            file_text = file.read().decode('utf-8')
            # rest of the code

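The first element of the namelist output is the directory itself, articles in this case. We have to skip it, which is why the iteration starts from the second element: zip.namelist()[1:]. Note that this relies on the archive storing the directory entry first; an archive created without explicit directory entries has no such element, and slicing would then skip a real file.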
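
Putting the question's cleaning steps and the loop together might look like the sketch below. It is only one possible arrangement, not the original poster's code: the helper names count_words and count_words_in_zip are made up for illustration, the stopword set is abbreviated (substitute the full list from the question), and directory entries are skipped with ZipInfo.is_dir() instead of slicing, so the loop does not depend on where the directory appears in the listing.

```python
from collections import Counter
from zipfile import ZipFile
import string

# Abbreviated stopword set for illustration; substitute the full list from the question.
STOPWORDS = {'that', 'these', 'those', 'your', 'yours', 'they', 'them', 'were', 'dont'}
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def count_words(text):
    """Hypothetical helper: apply the question's cleaning steps to one document."""
    # Strip punctuation and lowercase each whitespace-separated word.
    words = (w.translate(PUNCT_TABLE).lower() for w in text.split())
    # Keep words longer than three characters that are not stopwords.
    return Counter(w for w in words if len(w) > 3 and w not in STOPWORDS)


def count_words_in_zip(path):
    """Hypothetical helper: map each file name in the archive to its word counts."""
    results = {}
    with ZipFile(path) as archive:
        for info in archive.infolist():
            if info.is_dir():  # skip directory entries such as 'articles/'
                continue
            with archive.open(info) as file:
                text = file.read().decode('utf-8')
            results[info.filename] = count_words(text)
    return results
```

Calling count_words_in_zip('articles.zip') would then return a dictionary with one Counter per file. If a single total is wanted instead, the Counters can be added together, since Counter supports the + operator.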
