Compare lines of two text files based on a single part of each line



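I have two text files, and I want to write two new text files based on whether the lines of the two original files share one common part.

The text files have the following format: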

commontextinallcases   uniquetext2   potentiallycommontext    uniquetext4

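There are more than 4 columns, but you get the idea. I want to check the "potentiallycommontext" part in each text file and, if they are the same, write the whole line from each text file to a new text file, so that each output file keeps its own unique text.

Splitting the lines is easy: just call .split() on each line while reading. I found the following code: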

with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        # Lines that appear verbatim in both files
        same = set(file1).intersection(file2)

same.discard('\n')  # ignore blank lines

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

But I'm not sure this works when I need to split the lines first. Is there a way to do this that I'm missing?

Thanks

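I don't think the set approach suits your case, because it compares whole lines rather than a single column.
I would try: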

with open('some_file_1.txt', 'r') as file1, open('some_file_2.txt', 'r') as file2, open('some_output_file.txt', 'w') as file_out:
    for line1, line2 in zip(file1, file2):
        if line1.split()[2] == line2.split()[2]:
            file_out.write(line1)
            file_out.write(line2)
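
The question asks for two output files, one per input, while the snippet above writes both lines into a single file. A minimal sketch of that variant follows. It assumes the column of interest is at index 2, as in the sample line, and that lines are compared pairwise by position. The function name `write_matching_pairs` and its parameters are made up for illustration.

```python
def write_matching_pairs(path_1, path_2, out_1, out_2, idx=2):
    """Compare two files line by line on column idx.

    Each matching pair is split across the outputs: the line from
    path_1 goes to out_1 and the line from path_2 goes to out_2.
    Returns the number of matching pairs.
    """
    matches = 0
    with open(path_1) as f1, open(path_2) as f2, \
            open(out_1, 'w') as o1, open(out_2, 'w') as o2:
        for line_1, line_2 in zip(f1, f2):
            cols_1, cols_2 = line_1.split(), line_2.split()
            # Skip blank or short lines that have no column at idx.
            if len(cols_1) <= idx or len(cols_2) <= idx:
                continue
            if cols_1[idx] == cols_2[idx]:
                o1.write(line_1)
                o2.write(line_2)
                matches += 1
    return matches
```

Note that `zip` stops at the end of the shorter file, so trailing lines of the longer file are never compared.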

There may be a shorter solution, but this should work:

PCT_IDX = 2  # index of potentiallycommontext in line.split(); 2 in the sample line, adjust for your files

def lines(filename):
    """Yield each line of the file without its trailing newline."""
    with open(filename, 'r') as file:
        for line in file:
            yield line.rstrip('\n')

lines_1 = lines('some_file_1.txt')
lines_2 = lines('some_file_2.txt')

with open('some_output_file.txt', 'w') as file_out:
    for line_1, line_2 in zip(lines_1, lines_2):
        maybe_cmn1 = line_1.split()[PCT_IDX]
        maybe_cmn2 = line_2.split()[PCT_IDX]
        if maybe_cmn1 == maybe_cmn2:
            # The newline was stripped in lines(), so add it back
            file_out.write(line_1 + '\n')
            file_out.write(line_2 + '\n')
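
Both answers pair line N of one file with line N of the other. If matching rows might sit at different positions in the two files, one alternative (not what the answers above do) is to collect the keys of each file first and then keep every line whose key also occurs in the other file. This is only a sketch, assuming whitespace-separated columns with the key at index 2. The names `key_of`, `keys_in`, and `filter_by_common_key` are hypothetical.

```python
def key_of(line, idx=2):
    """Return the column at idx, or None for blank or short lines."""
    cols = line.split()
    return cols[idx] if len(cols) > idx else None


def keys_in(path, idx=2):
    """Collect the set of keys that occur in a file."""
    with open(path) as f:
        return {k for k in (key_of(line, idx) for line in f) if k is not None}


def filter_by_common_key(path_1, path_2, out_1, out_2, idx=2):
    """Write the lines of each input whose key also occurs in the other input."""
    common = keys_in(path_1, idx) & keys_in(path_2, idx)
    for src, dst in ((path_1, out_1), (path_2, out_2)):
        with open(src) as fin, open(dst, 'w') as fout:
            for line in fin:
                if key_of(line, idx) in common:
                    fout.write(line)
    return common
```

This reads each input twice but only keeps the keys in memory, not the full lines. Each output preserves the line order of its own input.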
