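I've tried several approaches to this problem, but none of them worked, and they're too messy to post here, so I'll just ask my question. I have a .txt file that looks like this: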
Field1:
Something
Field2:
Something
Field3:
Field4:
Field1:
Something
Field2:
Field3:
Something
Field4:
Something
...
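The file contains 4 fields that repeat an unknown number of times, but it always ends with Field4. Under each field there is either a string or nothing, and whether a field has content is random. If it has none, I need to insert a line below it that says "Empty". So in the end it should look like this: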
Field1:
Something
Field2:
Something
Field3:
Empty
Field4:
Empty
Field1:
Something
Field2:
Empty
Field3:
Something
Field4:
Something
...
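My idea is to open the original text file for reading, open a second text file for writing, iterate over the lines of the original, and write each line to the output file. If a line contains Field1 and the next line contains Field2, add the string Empty below Field1, and keep doing this for every line.
Since a text file can't be edited in the middle, the program reads each line of readable.txt and appends it to writable.txt, together with the corrected lines.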
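# Read every line up front; fine for small files.
with open("readable.txt", "r") as src:
    lines = src.readlines()

# "a" appends to writable.txt; use "w" instead to start from an empty file on each run.
with open("writable.txt", "a") as out:
    n = 0
    while n < len(lines):
        if "Field" in lines[n]:
            out.write(lines[n].rstrip("\n") + "\n")
            # The field is empty if it is the last line or is directly followed by another field.
            if n + 1 >= len(lines) or "Field" in lines[n + 1]:
                out.write("Empty\n")
            else:
                out.write(lines[n + 1].rstrip("\n") + "\n")
        n = n + 1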
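If the goal is to end up with the corrected content under the original file name, one option is to write to a temporary file and then swap it in. The following is only a sketch under assumptions: the function name `fill_empty_fields_in_place` and its `marker` and `placeholder` parameters are made up for illustration, and it uses the same simple `"Field" in line` test as above, so a value line that itself contains "Field" would be misread as a header.

```python
import os
import tempfile


def fill_empty_fields_in_place(path, marker="Field", placeholder="Empty"):
    """Rewrite `path` so every field header is followed by a value line.

    Hypothetical helper: a line counts as a header if it contains `marker`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that os.replace
    # does not have to move data across filesystems.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tmp:
        prev_was_header = False
        for line in src:
            is_header = marker in line
            if is_header and prev_was_header:
                # Two headers in a row: the previous field had no value.
                tmp.write(placeholder + "\n")
            # Make sure every line ends with a newline, including the last one.
            tmp.write(line if line.endswith("\n") else line + "\n")
            prev_was_header = is_header
        if prev_was_header:
            # The file ended on a header, so the last field is empty too.
            tmp.write(placeholder + "\n")
    # Swap the corrected file in under the original name.
    os.replace(tmp.name, path)
```

Calling `fill_empty_fields_in_place("readable.txt")` would then overwrite readable.txt itself, so it is worth keeping a backup while trying this out.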
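If you have a large file, you don't want to read all of it into memory before processing it, so you can work through it line by line.
First, we can define a regex pattern that matches the word "Field", followed by one or more digits, followed by a colon. Read the file one line at a time: if the previous line matches this pattern and the current line matches it too, write an "Empty" line before writing the current line. Otherwise, just write the line: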
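import re

pattern = re.compile(r"Field\d+:")  # Field, followed by one or more digits (\d+), and a colon

with open("in.txt") as infile, open("out.txt", "w") as outfile:
    prev_line = ""
    for line in infile:
        if pattern.match(line) and pattern.match(prev_line):
            outfile.write("Empty\n")  # Write an Empty line if both lines match the pattern
        outfile.write(line)  # This is outside the if because we always write the current line
        prev_line = line
    # A field header on the very last line has no following line to trigger
    # the check above, so handle an empty final field here.
    if pattern.match(prev_line):
        if not prev_line.endswith("\n"):
            outfile.write("\n")
        outfile.write("Empty\n")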
For your input file, this gives:
Field1:
Something
Field2:
Something
Field3:
Empty
Field4:
Empty
Field1:
Something
Field2:
Empty
Field3:
Something
Field4:
Something
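To try the same idea without touching any files, the logic can be wrapped in a generator that accepts any iterable of lines. This is a minimal sketch, not part of the answer above: the names `fill_empty_fields` and `FIELD_PATTERN` and the `placeholder` parameter are assumptions chosen for illustration. It also covers the case where the input ends on a field header with nothing after it.

```python
import re

# Same pattern as above: "Field", one or more digits, then a colon.
FIELD_PATTERN = re.compile(r"Field\d+:")


def fill_empty_fields(lines, placeholder="Empty"):
    """Yield each line without its newline, adding `placeholder`
    after every field header that has no value line."""
    prev_was_header = False
    for raw in lines:
        line = raw.rstrip("\n")
        is_header = FIELD_PATTERN.match(line) is not None
        if is_header and prev_was_header:
            # Two headers in a row: the previous field was empty.
            yield placeholder
        yield line
        prev_was_header = is_header
    if prev_was_header:
        # The input ended on a header, so the last field is empty.
        yield placeholder


sample = ["Field1:\n", "Something\n", "Field2:\n", "Field3:\n", "Field4:\n"]
print(list(fill_empty_fields(sample)))
# ['Field1:', 'Something', 'Field2:', 'Empty', 'Field3:', 'Empty', 'Field4:', 'Empty']
```

Because an open file is itself an iterable of lines, something like `outfile.writelines(l + "\n" for l in fill_empty_fields(infile))` should work with the same `with open(...)` setup shown earlier, still reading only one line at a time.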