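I'm stuck here. I need to read a .txt file that contains a series of records, check those records, and copy the ones I want to a new file. The file content looks like this (this is just an example; the original file has more than 30,000 lines):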
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|316 #begin register
03000|SP|467
99999|33|130 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
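Record sets that have a line starting with 03000 and containing the characters "TO" must be written to the new file. Based on the example, the new file should look like this: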
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
Code:
file = open("file.txt",'r')
newFile = open("newFile.txt","w")
content = file.read()
file.close()
# here I need to check if the record exists 03000 characters 'TO', if it exists, copy the recordset 00000-99999 for the new file.
I've done many searches, but nothing helped. Thanks!
with open("file.txt", 'r') as inFile, open("newFile.txt", "w") as outFile:
    outFile.writelines(line for line in inFile
                       if line.startswith("03000") and "TO" in line)
If you also need the previous and the next line, you have to iterate over inFile in triads. First define:
def gen_triad(lines, prev=None):
    # yields (previous, current, next) for every line that has a successor
    after = current = next(lines)
    for after in lines:
        yield prev, current, after
        prev, current = current, after
Then do as before:
    outFile.writelines(''.join(triad) for triad in gen_triad(inFile)
                       if triad[1].startswith("03000") and "TO" in triad[1])
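The triad idea can be checked without touching the disk by running it over an in-memory list. This is only a sketch: filter_triads is a hypothetical helper (not part of the answer above), and keeping the first and last line as file header and footer is an assumption based on the expected output in the question. It also assumes the input has at least two lines.

```python
def gen_triad(lines, prev=None):
    """Yield (previous, current, next) for each line that has a successor."""
    lines = iter(lines)  # accept lists as well as file objects
    after = current = next(lines)
    for after in lines:
        yield prev, current, after
        prev, current = current, after


def filter_triads(lines):
    """Hypothetical helper: header + matching triads + footer."""
    lines = list(lines)
    out = [lines[0]]                      # assumed file header (AAAAA)
    for triad in gen_triad(lines):
        if triad[1].startswith("03000|TO|"):
            out.extend(triad)             # previous, matching and next line
    out.append(lines[-1])                 # assumed file footer (ZZZZZ)
    return out


sample = [
    "AAAAA|12|120\n",
    "00000|46|150\n",
    "03000|TO|460\n",
    "99999|35|436\n",
    "00000|46|316\n",
    "03000|SP|467\n",
    "99999|33|130\n",
    "ZZZZZ|15|111\n",
]
result = filter_triads(sample)
print("".join(result), end="")
```

Note that this only works when every record set is exactly three lines long; with extra lines between 00000 and 99999 the triad no longer covers the whole set.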
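import re
# each alternative matches one chunk to keep: a 00000/03000|TO/99999 set,
# the file header, or the file footer
pat = (r'^00000\|\d+\|\d+.*\n'
       r'^03000\|TO\|\d+.*\n'
       r'^99999\|\d+\|\d+.*\n'
       r'|'
       r'^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*')
rag = re.compile(pat, re.MULTILINE)
with open('fifi.txt', 'r') as f, \
     open('newfifi.txt', 'w') as g:
    g.write(''.join(rag.findall(f.read())))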
For files that have extra lines between the lines starting with 00000, 03000 and 99999, I didn't find simpler code than this:
import re
# group 1: a whole record set (00000 ... 99999); group 2: header or footer
pat = (r'(^00000\|\d+\|\d+.*\n'
       r'(?:.*\n)+?'
       r'^99999\|\d+\|\d+.*\n)'
       r'|'
       r'(^AAAAA\|\d+\|\d+.*\n'
       r'|'
       r'^ZZZZZ\|\d+\|\d+.*)')
rag = re.compile(pat, re.MULTILINE)
# a record set is kept only if it contains a 03000|TO| line
pit = (r'^00000\|.+?^03000\|TO\|\d+.+?^99999\|')
rig = re.compile(pit, re.DOTALL|re.MULTILINE)
def yi(text):
    for g1, g2 in rag.findall(text):
        if g2:
            yield g2
        elif rig.match(g1):
            yield g1
with open('fifi.txt', 'r') as f, \
     open('newfifi.txt', 'w') as g:
    g.write(''.join(yi(f.read())))
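To see how the block-wise regex approach behaves when a record set has extra lines, it can be tried on an in-memory string. The sketch below uses simplified patterns of my own rather than the exact ones above; block_re, wanted_re and keep_blocks are hypothetical names, and the 01000 and 02000 lines are made-up filler.

```python
import re

# One alternative per kind of chunk: a whole record set, or a header/footer line.
block_re = re.compile(
    r'(^00000\|.*\n(?:.*\n)*?^99999\|.*\n?)'   # group 1: 00000 ... 99999
    r'|(^(?:AAAAA|ZZZZZ)\|.*\n?)',             # group 2: file header or footer
    re.MULTILINE,
)
# A record set is wanted if any of its lines starts with 03000|TO|
wanted_re = re.compile(r'^03000\|TO\|', re.MULTILINE)


def keep_blocks(text):
    """Yield header/footer lines and the record sets that contain 03000|TO|."""
    for block, edge in block_re.findall(text):
        if edge or wanted_re.search(block):
            yield edge or block


text = (
    "AAAAA|12|120\n"
    "00000|46|150\n"
    "01000|xx|001\n"   # made-up extra line
    "03000|TO|460\n"
    "99999|35|436\n"
    "00000|46|316\n"
    "03000|SP|467\n"
    "02000|yy|002\n"   # made-up extra line
    "99999|33|130\n"
    "ZZZZZ|15|111\n"
)
filtered = "".join(keep_blocks(text))
print(filtered, end="")
```

Like the answer's code, this reads the whole file into memory, which should be fine for a file of about 30,000 lines.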
file = open("file.txt",'r')
newFile = open("newFile.txt","w")
content = file.readlines()
file.close()
newFile.writelines(filter(lambda x:x.startswith("03000") and "TO" in x,content))
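This seems to work. The other answers seem to write out only the lines containing '03000|TO|', but you also have to write out the records before and after them.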
import sys
# ---------------------------------------------------------------
# import file
file_name = sys.argv[1]
file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name
file = open(file_path, "r")
# ---------------------------------------------------------------
# create output files
output_file_path = 'C:\\DATA_SAVE\\pick_parts\\' + file_name + '.out'
output_file = open(output_file_path, "w")
# ---------------------------------------------------------------
# process file
temp = ''            # lines of the current record set seen so far
temp_out = ''        # everything that will be written to the output
good_write = False   # the next line closes a record set that is kept
bad_write = False    # the next line closes a record set that is dropped
for line in file:
    if line[:5] == 'AAAAA':
        temp_out += line
    elif line[:5] == 'ZZZZZ':
        temp_out += line
    elif good_write:
        temp += line
        temp_out += temp
        temp = ''
        good_write = False
    elif bad_write:
        bad_write = False
        temp = ''
    elif line[:5] == '03000':
        if line[6:8] != 'TO':
            temp = ''
            bad_write = True
        else:
            good_write = True
            temp += line
            temp_out += temp
            temp = ''
    else:
        temp += line
output_file.write(temp_out)
output_file.close()
file.close()
Output:
AAAAA|12|120 #begin file
00000|46|150 #begin register
03000|TO|460
99999|35|436 #end register
00000|46|778 #begin register
03000|TO|478
99999|33|457 #end register
ZZZZZ|15|111 #end file
Does it have to be Python? These shell commands will do the same thing in a pinch.
head -1 inputfile.txt > outputfile.txt
grep -C 1 "03000|TO" inputfile.txt >> outputfile.txt
tail -1 inputfile.txt >> outputfile.txt
# Whenever I have to parse text files I prefer to use regular expressions
# You can also customize the matching criteria if you want to
import re
what_is_being_searched = re.compile("^03000.*TO")
# don't use "file" as a variable name since it is (was?) a builtin
# function
with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    for this_line in source_file:
        if what_is_being_searched.match(this_line):
            destination_file.write(this_line)
For those who prefer a more compact representation:
import re
with open("file.txt", "r") as source_file, open("newFile.txt", "w") as destination_file:
    destination_file.writelines(this_line for this_line in source_file
                                if re.match("^03000.*TO", this_line))
Code:
fileName = '1'
fil = open(fileName, 'r')
##step 1: parse the file.
parsedFile = []
for i in fil:
    ##each line becomes a tuple of its three pipe-separated fields
    firstPipe = i.find('|')
    secondPipe = i.find('|', firstPipe+1)
    tuple1 = (i[:firstPipe],
              i[firstPipe+1:secondPipe],
              i[secondPipe+1:].rstrip('\n'))
    parsedFile.append(tuple1)
fil.close()
##search criteria:
searchFirst = '03000'
searchString = 'TO' ##can be changed if and when required
##step 2: use the parsed contents to write the new file
filout = open('newFile', 'w')
stringToWrite = parsedFile[0][0] + '|' + parsedFile[0][1] + '|' + parsedFile[0][2] + '\n'
filout.write(stringToWrite) ##to write the first entry
for i in range(1, len(parsedFile)):
    if parsedFile[i][1] == searchString and parsedFile[i][0] == searchFirst:
        ##write the previous, the matching and the next entry
        for j in range(-1, 2, 1):
            stringToWrite = parsedFile[i+j][0] + '|' + parsedFile[i+j][1] + '|' + parsedFile[i+j][2] + '\n'
            filout.write(stringToWrite)
stringToWrite = parsedFile[-1][0] + '|' + parsedFile[-1][1] + '|' + parsedFile[-1][2] + '\n'
filout.write(stringToWrite) ##to write the last entry
filout.close()
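I know this solution may be a bit long, but it is easy to understand and seems like an intuitive approach. I've checked it with the data you provided and it works fine.
If you need more explanation of the code, let me know and I'll add it.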
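The tips (Beasley and Joran Elias) were very interesting, but they only get the content of the 03000 line. I want to get everything from the 00000 line to the 99999 line. I even managed to do that here, but I'm not satisfied and would like something cleaner. Here is how I did it: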
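file = open(url, 'r')  # url: path of the input file, defined elsewhere
newFile = open("newFile.txt", 'w')
lines = file.readlines()
file.close()
lineTemp = []  # lines collected since the last 99999 line
state = ''
for line in lines:
    lineTemp.append(line)
    if line[0:5] == '03000':
        state = line[21:23]  # columns used in the original post; in the sample above the field is line[6:8]
    if line[0:5] == '99999':
        if state == 'TO':
            newFile.writelines(lineTemp)
        lineTemp = []  # start collecting the next record set
newFile.close()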
Any suggestions? Thank you all!
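One possibly cleaner variant is to buffer each record set from 00000 to 99999 and emit it only if one of its 03000 lines has the wanted state. The sketch below is not taken from any of the answers; filter_record_sets is a hypothetical name, and it assumes pipe-delimited fields as in the sample, with the state in the second field. Lines outside any record set (such as the AAAAA header and ZZZZZ footer) are passed through unchanged.

```python
def filter_record_sets(lines, code="03000", state="TO"):
    """Yield header/footer lines and every 00000..99999 set that contains
    a `code` line whose second pipe-delimited field equals `state`."""
    buffer = []       # lines of the record set currently being read
    keep = False      # becomes True once a matching line is seen in the set
    in_set = False    # True between a 00000 line and its 99999 line
    for line in lines:
        fields = line.split("|")
        if fields[0] == "00000":
            buffer, keep, in_set = [line], False, True
        elif in_set:
            buffer.append(line)
            if fields[0] == code and len(fields) > 1 and fields[1] == state:
                keep = True
            if fields[0] == "99999":
                if keep:
                    yield from buffer
                buffer, in_set = [], False
        else:
            yield line   # outside any record set: header, footer, etc.


sample = [
    "AAAAA|12|120 #begin file\n",
    "00000|46|150 #begin register\n",
    "03000|TO|460\n",
    "99999|35|436 #end register\n",
    "00000|46|316 #begin register\n",
    "03000|SP|467\n",
    "99999|33|130 #end register\n",
    "00000|46|778 #begin register\n",
    "03000|TO|478\n",
    "99999|33|457 #end register\n",
    "ZZZZZ|15|111 #end file\n",
]
result = list(filter_record_sets(sample))
print("".join(result), end="")

# With real files (file names as in the question):
# with open("file.txt") as src, open("newFile.txt", "w") as dst:
#     dst.writelines(filter_record_sets(src))
```

Because the function is a generator over lines, it reads the input one line at a time and holds only the current record set in memory.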