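I'm trying to parse two files, one pipe-delimited and one comma-delimited, and create a new entry in a third file whenever a particular field matches between them.
Here is the code: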
#! /usr/bin/python
fo = open("c-1.txt" , "r" )
for line in fo:
    #print line
    fields = line.split('|')
    src = fields[0]
    f1 = open("Airport.txt", 'r')
    f2 = open("b.txt", "a")
    #with open('c.csv', 'r') as f1:
    #    line1 = f1.read()
    for line1 in f1:
        reader = line1.split(',')
        hi = False
        target = reader[0]
        if target == src and fields[1] == 'ZHT':
            print target
            hi = True
            f2.write(fields[0])
            f2.write("|")
            f2.write(fields[1])
            f2.write("|")
            f2.write(fields[2])
            f2.write("|")
            f2.write(fields[3])
            f2.write("|")
            f2.write(fields[4])
            f2.write("|")
            f2.write(fields[5])
            f2.write("|")
            f2.write(reader[2])
        if hi == False:
            f2.write(line)
    f2.close()
    f1.close()
fo.close()
The matching entries are written to the new file twice. What is causing this?
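The problem seems to be that you reset `hi` to `False` on every iteration of the inner loop. Suppose the second line matches but the third does not. `hi` is set to `True` on the second line, but it is set back to `False` on the third, and then the original `line` is written as well.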
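To see the effect in isolation, here is a minimal sketch that replays the question's inner loop for a single source line on made-up in-memory data (the function name `buggy_merge` and the sample rows are hypothetical, chosen only for illustration) and collects what would be written:

```python
def buggy_merge(src_line, airport_lines):
    """Replay the question's inner loop for one source line and collect what it writes."""
    out = []
    fields = src_line.split('|')
    for line1 in airport_lines:
        reader = line1.split(',')
        hi = False  # reset on every iteration: this is the bug
        if reader[0] == fields[0] and fields[1] == 'ZHT':
            hi = True
            out.append('|'.join(fields[0:6] + [reader[2]]))
        if hi == False:
            out.append(src_line)  # written once per NON-matching airport row
    return out


# Hypothetical sample data: one source line, three airport rows, only the second matches.
src_line = "AAA|ZHT|a|b|c|d"
airport_lines = ["XXX,name,x1", "AAA,name,a1", "YYY,name,y1"]
written = buggy_merge(src_line, airport_lines)
print(len(written))  # 3
print(written)
```

The matching key ends up in the output once merged and twice more as the untouched original line, because every non-matching airport row triggers the `hi == False` branch.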
Try:
hi = False
for line1 in f1:
    reader = line1.split(',')
    target = reader[0]
    if target == src and fields[1] == 'ZHT':
        hi = True
        f2.write(stuff)  # "stuff" stands for the merged fields you were writing before
if hi == False:
    f2.write(line)
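Here is the same fix as a small self-contained function, again on hypothetical in-memory data (`fixed_merge` and the sample rows are illustrative names, not part of the original code). The flag is initialised once before the inner loop and checked once after it:

```python
def fixed_merge(src_line, airport_lines):
    """Collect the merged row for every match, or the original line once if nothing matched."""
    out = []
    fields = src_line.split('|')
    hi = False  # initialised once, before the inner loop
    for line1 in airport_lines:
        reader = line1.split(',')
        if reader[0] == fields[0] and fields[1] == 'ZHT':
            hi = True
            out.append('|'.join(fields[0:6] + [reader[2]]))
    if not hi:  # checked once, after the inner loop has finished
        out.append(src_line)
    return out


# Hypothetical sample airport rows
airport_lines = ["XXX,name,x1", "AAA,name,a1", "YYY,name,y1"]
print(fixed_merge("AAA|ZHT|a|b|c|d", airport_lines))  # ['AAA|ZHT|a|b|c|d|a1']
print(fixed_merge("BBB|ZHT|a|b|c|d", airport_lines))  # ['BBB|ZHT|a|b|c|d']
```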
Alternatively, if only one line can match, you can use `for`/`else`:
for line1 in f1:
    reader = line1.split(',')
    target = reader[0]
    if target == src and fields[1] == 'ZHT':
        f2.write(stuff)
        break
else:
    f2.write(line)
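The `else` clause of a `for` loop runs only when the loop completes without hitting `break`. A minimal sketch of the pattern on in-memory strings (the function name `merge_first_match` and the sample data are hypothetical):

```python
def merge_first_match(src_line, airport_lines):
    """Return the merged row for the first matching airport row, else the original line."""
    fields = src_line.split('|')
    for line1 in airport_lines:
        reader = line1.split(',')
        if reader[0] == fields[0] and fields[1] == 'ZHT':
            result = '|'.join(fields[0:6] + [reader[2]])
            break  # stop at the first match; later duplicates are ignored
    else:
        # Reached only when the loop finished without a break, i.e. nothing matched.
        result = src_line
    return result


print(merge_first_match("AAA|ZHT|a|b|c|d", ["XXX,n,x1", "AAA,n,first", "AAA,n,second"]))
# AAA|ZHT|a|b|c|d|first
print(merge_first_match("ZZZ|ZHT|a|b|c|d", ["XXX,n,x1"]))
# ZZZ|ZHT|a|b|c|d
```

Note that if the data does contain duplicate keys, this version silently keeps only the first one, which is why the "only one line can match" assumption matters.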
Also note that you can replace the whole series of `f2.write` calls with a single statement that joins the fields with `|`:
f2.write('|'.join(fields[0:6] + [reader[2]]))
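A quick check, on a made-up row and with `io.StringIO` standing in for `f2`, that the one-liner produces exactly the same text as the original chain of writes. It assumes `reader[2]` is the last column of its line, so it still carries the newline that ends the output row:

```python
import io

fields = ["AAA", "ZHT", "a", "b", "c", "d"]  # hypothetical pipe-split source row
reader = ["AAA", "name", "a1\n"]            # hypothetical comma-split airport row

# Original approach: one write per field, each followed by '|', then reader[2]
f_long = io.StringIO()
for i in range(6):
    f_long.write(fields[i])
    f_long.write("|")
f_long.write(reader[2])

# One-liner: take the first six fields, append reader[2], join everything with '|'
f_short = io.StringIO()
f_short.write('|'.join(fields[0:6] + [reader[2]]))

assert f_long.getvalue() == f_short.getvalue()
print(repr(f_short.getvalue()))  # 'AAA|ZHT|a|b|c|d|a1\n'
```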
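As mentioned, because you reset the flag inside the loop, it is easy to end up writing extra lines.
If you are sure only one line can match, break out of the loop as soon as you find it.
Finally, check your data to make sure it contains no duplicate matching lines.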
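One way to do that last check is to count the first-column keys with `collections.Counter`. A minimal sketch (the helper name `duplicate_keys` and the sample lines are hypothetical; blank lines are skipped):

```python
from collections import Counter


def duplicate_keys(lines, delimiter):
    """Return the first-column values that appear on more than one non-blank line."""
    counts = Counter(line.split(delimiter)[0] for line in lines if line.strip())
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical lines standing in for the contents of Airport.txt
airport_lines = ["AAA,name,a1\n", "BBB,name,b1\n", "AAA,other,a2\n", "\n"]
print(duplicate_keys(airport_lines, ","))  # ['AAA']
```

Since a file object iterates over its lines, the same helper should work directly on `open("Airport.txt")` (with `","`) or on `open("c-1.txt")` (with `"|"`).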
Beyond that, I have a few other suggestions for cleaning up your code and making it easier to debug:
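1) Use the `csv` library.
2) If the files fit in memory, load them once rather than repeatedly opening and closing them.
3) Use `with` to handle files (I see you already tried this in the commented-out lines).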
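Suggestions 1 and 2 in miniature: the sketch below uses `io.StringIO` with made-up contents in place of the real `c-1.txt` and `Airport.txt`, so it runs without any files. With real files you would pass `open(...)` inside a `with` block instead:

```python
import csv
import io

# Hypothetical file contents standing in for c-1.txt and Airport.txt
c1_text = "AAA|ZHT|a|b|c|d\nBBB|XYZ|e|f|g|h\n"
airport_text = "AAA,name,a1\nCCC,name,c1\n"

# str.split keeps the line ending attached to the last field...
split_last = c1_text.splitlines(True)[0].split("|")[-1]
print(repr(split_last))  # 'd\n'

# ...whereas csv.reader handles the delimiter and drops the line ending.
rows = list(csv.reader(io.StringIO(c1_text), delimiter="|"))
print(rows[0])  # ['AAA', 'ZHT', 'a', 'b', 'c', 'd']

# Read the lookup file once into a dict (key -> third column)
# instead of reopening it for every line of the other file.
airports = {row[0]: row[2] for row in csv.reader(io.StringIO(airport_text))}
print(airports)  # {'AAA': 'a1', 'CCC': 'c1'}
```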
The following should work.
#! /usr/bin/python
import csv

data_0 = {}
data_1 = {}
# On Python 3, csv files should be opened with newline="" (e.g. open("b.txt", "a", newline="")).
with open("c-1.txt", "r") as fo, open("Airport.txt", "r") as f1:
    fo_reader = csv.reader(fo, delimiter="|")
    f1_reader = csv.reader(f1)  # default delimiter is ','
    for line in fo_reader:
        # Keep every row, not only the 'ZHT' ones, so non-matching rows are still copied unchanged.
        try:  # Add to a list here in case keys are duplicated.
            data_0[line[0]].append(line)
        except KeyError:
            data_0[line[0]] = [line]
    for line in f1_reader:
        data_1[line[0]] = line[2]  # We only need the third column of this row to append to the data.

with open("b.txt", "a") as f2:
    writer = csv.writer(f2, delimiter="|")  # I would be tempted not to use a pipe here, but it's probably too late if you already have a pre-made file.
    for key in data_0:
        for row in data_0[key]:
            if row[1] == 'ZHT' and key in data_1:
                # Keep the first 6 columns and append the value from the other file.
                # data_1[key] is a string, so wrap it in a list before concatenating.
                writer.writerow(row[:6] + [data_1[key]])
            else:
                writer.writerow(row)
This should avoid the extra lines, and there is no True/False flag to rely on.