Python: concatenating text files



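Using Python, I want to iteratively combine two sets of txt files to create a third set of txt files.

I have a directory of txt files that fall into two categories:

  1. text_[number].txt (e.g. text_0.txt, text_1.txt, text_2.txt ... text_20.txt)
  2. comments_[number].txt (e.g. comments_0.txt, comments_1.txt, comments_2.txt ... comments_20.txt)

I want to iteratively combine each text_[number] file with the matching comments_[number] file into a new category of files, feedback_[number].txt. The script would combine text_0.txt and comments_0.txt into feedback_0.txt, and continue through every pair in the directory. The number of text and comments files always matches, but the total number of text and comments files varies depending on an earlier script.

I can combine a single pair using the code below and a list of the two filenames: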

filenames = ['text_0.txt', 'comments_0.txt']
with open("feedback_0.txt", "w") as outfile:
    for filename in filenames:
        with open(filename) as infile:
            contents = infile.read()
            outfile.write(contents)

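However, I'm not sure how to structure the iteration for the rest of the files. I'm also curious how to generate the lists from the contents of the directory. Any advice or assistance would be greatly appreciated.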

It would be much simpler (and possibly faster) to just fork a cat process:

import subprocess

n = ...  # number of files
for i in range(n):
    with open(f'feedback_{i}.txt', 'w') as f:
        # The input names must be f-strings too, or cat receives the literal "text_{i}.txt"
        subprocess.run(['cat', f'text_{i}.txt', f'comments_{i}.txt'], stdout=f)

Or, if you already have lists of the filenames:

for text, comment, feedback in zip(text_files, comment_files, feedback_files):
    with open(feedback, 'w') as f:
        subprocess.run(['cat', text, comment], stdout=f)

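Unless these are all very small files, the cost of reading and writing the bytes will outweigh the cost of creating a new process for each pair.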
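If `cat` is not available (for example on Windows), the same pairing could be sketched in pure Python with `shutil.copyfileobj`, which copies in chunks instead of loading each file into memory. The helper name `concat_pair` and its parameters are illustrative and not part of the answer above.

```python
import shutil

def concat_pair(text_path, comments_path, feedback_path):
    """Write text_path followed by comments_path into feedback_path."""
    with open(feedback_path, "wb") as outfile:
        for src in (text_path, comments_path):
            with open(src, "rb") as infile:
                # Stream the source file into the output in chunks
                shutil.copyfileobj(infile, outfile)
```

It could be called once per pair, e.g. `concat_pair(text, comment, feedback)` inside the `zip` loop above.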

Maybe not the most elegant, but...

length = 10
txt = [f"text_{n}.txt" for n in range(length)]
com = [f"comments_{n}.txt" for n in range(length)]
feed = [f"feedback_{n}.txt" for n in range(length)]
for f, t, c in zip(feed, txt, com):
    with open(f, "w") as outfile:
        with open(t) as infile1:
            contents = infile1.read()
            outfile.write(contents)
        with open(c) as infile2:
            contents = infile2.read()
            outfile.write(contents)

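There are many ways to achieve this, but I don't see any solution here that is both beginner friendly and takes into account the file structure you described.

You can iterate through the files and, for every text_[num].txt, fetch the corresponding comments_[num].txt and write to feedback_[num].txt, as shown below. There is no need to add a counter or make any other assumptions about the files that might not always hold: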

import os

srcpath = 'path/to/files'
for f in os.listdir(srcpath):
    if f.startswith('text_') and f.endswith('.txt'):
        index = f[5:-4]  # extract the [num] part
        # Build the paths to the text, comments and feedback files
        txt_path = os.path.join(srcpath, f)
        cmnt_path = os.path.join(srcpath, f'comments_{index}.txt')
        fb_path = os.path.join(srcpath, f'feedback_{index}.txt')
        # Write the output; read in byte mode, following chepner's advice
        with open(fb_path, 'wb') as outfile:
            for path in (txt_path, cmnt_path):
                with open(path, 'rb') as infile:  # closes each input promptly
                    outfile.write(infile.read())

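The simplest approach is probably to iterate upward from the first index (0 in the question's examples) and stop at the first missing file. This assumes your files are numbered in increasing order with no gaps (e.g. you have 1, 2, 3 and not 1, 3).

import os
from itertools import count

for i in count(0):  # the question's files start at text_0.txt
    t = f'text_{i}.txt'
    c = f'comments_{i}.txt'
    if not os.path.isfile(t) or not os.path.isfile(c):
        break
    with open(f'feedback_{i}.txt', 'wb') as outfile:
        for path in (t, c):
            with open(path, 'rb') as infile:
                outfile.write(infile.read())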

You can try this:

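filenames = ['text_0.txt', 'comments_0.txt', 'text_1.txt', 'comments_1.txt',
             'text_2.txt', 'comments_2.txt', 'text_3.txt', 'comments_3.txt']
# Even positions hold the text files, odd positions the matching comments files
for i, pair in enumerate(zip(filenames[::2], filenames[1::2])):
    # 'w' rather than 'a+' so that rerunning the script does not duplicate content
    with open(f'feedback_{i}.txt', 'w') as outfile:
        for name in pair:
            with open(name, 'r') as infile:
                outfile.write(infile.read())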
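I've hard-coded the list here. Instead, you can build it from the directory (os.listdir returns names in arbitrary order, so they would need to be arranged into text/comments pairs before the slicing above works):
import os
filenames = os.listdir('path/to/folder')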
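As a rough sketch of how a directory listing could be turned into matched pairs: the hypothetical helper `find_pairs` below picks out `text_[number].txt` names with a regular expression, orders them numerically, and keeps only those whose comments file exists. The helper name and the exact filename pattern are assumptions based on the naming in the question.

```python
import os
import re

# Assumed naming scheme from the question: text_[number].txt
TEXT_NAME = re.compile(r"text_(\d+)\.txt")

def find_pairs(folder):
    """Return (text, comments, feedback) path triples, ordered by index."""
    matches = (TEXT_NAME.fullmatch(name) for name in os.listdir(folder))
    # Sort numerically so that 10 comes after 2, not after 1
    indices = sorted((m.group(1) for m in matches if m), key=int)
    triples = []
    for index in indices:
        comments = os.path.join(folder, f"comments_{index}.txt")
        if os.path.isfile(comments):  # skip text files with no partner
            triples.append((
                os.path.join(folder, f"text_{index}.txt"),
                comments,
                os.path.join(folder, f"feedback_{index}.txt"),
            ))
    return triples
```

Each triple can then be fed to whichever of the concatenation loops above you prefer.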
