Concatenating files that contain repeated words



I'm new to Python, and I want to do the following:

In text1.txt I have:

1
2
7
9

In text2.txt I have:

1
2
2
2
3
4

I'm looking for a solution that produces output in exactly this order:

1
2
7
9
2
2
3
4

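First I want to compare the two files and then get one concatenated file. However, if a word is common to both files but appears several times in one of them, I want it to appear in the output as many times as it does in the file where it is repeated most. In this example file2 contains 2 three times and file1 contains it once, so the output should contain 2 three times, with the extra occurrences of 2 added at the end. In addition, whatever is not common to the two files should be appended to the end of the output. So far I have the following, which compares the two files and concatenates them, but I don't know how to add the repeated common words: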

with open('3.txt', 'r') as file3:
    with open('1.txt', 'r') as file1:
        # Lines of 3.txt that are not in 1.txt (a set keeps only one copy of each line)
        same1 = set(file3).difference(file1)
same1.discard('\n')  # ignore blank lines
with open('output1.txt', 'w') as file_out:
    for line in same1:
        file_out.write(line)

# Concatenate the difference and 1.txt into one output file
filename = ['output1.txt', '1.txt']
with open('output_final.txt', 'w') as outfile:
    for fname in filename:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)
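For the two example files, a small in-memory sketch shows why the set-based code above cannot reproduce the repeated 2s: a set keeps only one copy of each line, whereas subtracting two collections.Counter objects keeps the counts. The two lists below are assumed stand-ins for the files' lines, used only to keep the example self-contained:

```python
from collections import Counter

# Assumed in-memory stand-ins for the lines of text1.txt and text2.txt.
file1_lines = ['1', '2', '7', '9']
file2_lines = ['1', '2', '2', '2', '3', '4']

# A set keeps one copy of each value, so the extra 2s disappear.
set_diff = set(file2_lines).difference(file1_lines)

# Counter subtraction keeps multiplicities and drops zero or negative counts.
counter_diff = Counter(file2_lines) - Counter(file1_lines)

print(sorted(set_diff))                  # ['3', '4']
print(sorted(counter_diff.elements()))   # ['2', '2', '3', '4']
```

The Counter result contains exactly the items that have to be appended after file 1: the two surplus 2s plus the values 3 and 4 that file 1 does not contain at all.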

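Is it possible to pick one file from the list of file names? Something like filename(1), to operate only on the first file or only on the second one? It's hard to explain, but I think the example is a better reference than my explanation.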
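Python lists are indexed with square brackets starting at 0, not with parentheses, so filename[0] is the first file name and filename[1] the second. A minimal sketch using the list from the code above (no files are opened here):

```python
filename = ['output1.txt', '1.txt']

# Square brackets index a list; counting starts at 0.
first_file = filename[0]    # 'output1.txt'
second_file = filename[1]   # '1.txt'
last_file = filename[-1]    # negative indices count from the end: '1.txt'

# enumerate() yields the index together with each name, so a loop can
# handle one particular file differently from the others.
for index, fname in enumerate(filename):
    if index == 0:
        print('first file:', fname)
    else:
        print('other file:', fname)
```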

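I think you can load both files as lists (I'm assuming they are not very large), iterate over the first list and remove each of its elements from the second list, then concatenate the two lists. Something like: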

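with open('text1.txt', 'r') as f:
    li1 = f.readlines()
with open('text2.txt', 'r') as f:
    li2 = f.readlines()
# Convert each line ('1\n', '2\n', ...) to an int
li1 = [int(t.strip()) for t in li1]
li2 = [int(t.strip()) for t in li2]
# Remove one matching copy from li2 for every element of li1,
# so only the surplus items remain in li2
for i in li1:
    try:
        li2.remove(i)
    except ValueError:  # i is not (or no longer) in li2
        pass
li1.extend(li2)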

li1 should now contain the desired output:

In [27]: print(li1)
[1, 2, 7, 9, 2, 2, 3, 4]
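The list.remove loop above rescans li2 for every element of li1, which is quadratic in the worst case. Assuming the lines are hashable (ints or strings), the same result can be computed in a single pass with collections.Counter. This is a swapped-in technique, not the answer's code, and merge_keep_max is a hypothetical helper name:

```python
from collections import Counter

def merge_keep_max(first, second):
    """Return all of `first`, followed by the items of `second` that are
    not matched by an item in `first`, in the order they appear in `second`."""
    remaining = Counter(first)   # copies of each item already present in `first`
    result = list(first)
    for item in second:
        if remaining[item] > 0:
            remaining[item] -= 1   # this copy is matched by one in `first`
        else:
            result.append(item)    # surplus copy, or an item missing from `first`
    return result

print(merge_keep_max([1, 2, 7, 9], [1, 2, 2, 2, 3, 4]))
# [1, 2, 7, 9, 2, 2, 3, 4]
```

Like the remove loop, it skips the first copies of each value in the second list that are matched by the first list, so both versions produce the same output order.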

If you need this kind of code, here it is:

import collections

# Lines of 3.txt that are not in 1.txt (a set keeps only one copy of each line)
with open('3.txt', 'r') as file3, open('1.txt', 'r') as file1:
    diff = set(file3).difference(file1)
diff.discard('\n')  # ignore blank lines
with open('difference.txt', 'w') as file_out:
    for line in diff:
        file_out.write(line)

# Read both files as lists of lines without trailing newlines
with open('1.txt', 'r') as f:
    lines1 = f.read().splitlines()
with open('3.txt', 'r') as f:
    lines2 = f.read().splitlines()

count1 = collections.Counter(lines1)
count2 = collections.Counter(lines2)

# Start with all of 1.txt, then append each distinct line of 3.txt as many
# extra times as 3.txt contains it beyond its count in 1.txt
final_lines = list(lines1)
seen = set()
for line in lines2:
    if line not in seen:
        seen.add(line)
        if count1[line] < count2[line]:
            final_lines += [line] * (count2[line] - count1[line])

with open('duplicate.txt', 'w') as gout:
    for line in final_lines:
        gout.write("%s\n" % line)
