I am trying to use Python to convert a text file from Format A:
Key1
Key1value1
Key1value2
Key1value3
Key2
Key2value1
Key2value2
Key2value3
Key3...
Into Format B:
Key1 Key1value1
Key1 Key1value2
Key1 Key1value3
Key2 Key2value1
Key2 Key2value2
Key2 Key2value3
Key3 Key3value1...
Specifically, here is a short excerpt of the file itself (only one key is shown; the whole file contains thousands of keys):
chr22:16287243: PASS
patientID1 G/G
patientID2 G/G
patientID3 G/G
And here is the desired output:
chr22:16287243: PASS patientID1 G/G
chr22:16287243: PASS patientID2 G/G
chr22:16287243: PASS patientID3 G/G
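I have written the following code, which detects and prints the keys, but I am having trouble writing the code that stores the values associated with each key and then prints the key-value pairs. Can anyone help me with this?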
import sys
import re

records = []
with open('filepath', 'r') as infile:
    for line in infile:
        variant = re.search(r"\Achr\d", line, re.I)  # all variants start with "chr"
        if variant:
            records.append(line.replace("\n", ""))
            # parse lines until a new variant is encountered
for r in records:
    print(r)
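Do it in a single pass, without storing the lines:
with open("input") as infile, open("output", "w") as outfile:
    for line in infile:
        if line.startswith("chr"):
            key = line.strip()  # remember the current key
        else:
            # write the key followed by the value line (Python 3 print)
            print(key, line.rstrip("\n"), file=outfile)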
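This code assumes that the first line contains a key. Otherwise `key` is still undefined when the first value line is reached, and the code fails with a NameError.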
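If the input cannot be trusted to start with a key, one option is to track "no key seen yet" explicitly and report the offending line. The following is only a sketch: the function name `flatten_records`, the `key_prefix` parameter, and the choice to skip blank lines are assumptions for illustration, not part of the answer above.

```python
def flatten_records(lines, key_prefix="chr"):
    """Yield 'key value' strings from an iterable of lines (hypothetical helper)."""
    key = None  # no key has been seen yet
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # assumption: blank lines carry no data and are skipped
        if text.startswith(key_prefix):
            key = text  # a new key starts a new group
        elif key is None:
            # a value line appeared before any key line
            raise ValueError(f"line {number}: value found before any key: {text!r}")
        else:
            yield f"{key} {text}"


# Small in-memory sample based on the excerpt in the question.
sample = [
    "chr22:16287243: PASS\n",
    "patientID1 G/G\n",
    "\n",
    "patientID2 G/G\n",
]
result = list(flatten_records(sample))
for row in result:
    print(row)
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, and the thousands of keys are still processed one line at a time.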
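First, don't use a regular expression just to check whether a string starts with a given character sequence. This is simpler and easier to read:
if line.startswith("chr"):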
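One caveat worth checking before swapping the two: assuming the question's pattern was along the lines of `\Achr\d` with `re.I`, it is case-insensitive and requires a digit after "chr", while `startswith("chr")` is case-sensitive and accepts anything after the prefix. The sketch below compares the three behaviours; the helper names and the sample lines other than the first and last are invented for illustration.

```python
import re

# Assumed reconstruction of the question's pattern: "chr" plus a digit, any case.
VARIANT_RE = re.compile(r"\Achr\d", re.I)


def is_key_simple(line):
    """Case-sensitive prefix test, as suggested in the answer."""
    return line.startswith("chr")


def is_key_any_case(line):
    """Case-insensitive prefix test, still without a regular expression."""
    return line.lower().startswith("chr")


lines = [
    "chr22:16287243: PASS",  # from the question
    "CHR22:1: PASS",         # invented: upper-case prefix
    "chrX:5: PASS",          # invented: no digit after "chr"
    "patientID1 G/G",        # a value line
]
checks = {
    line: (bool(VARIANT_RE.search(line)), is_key_simple(line), is_key_any_case(line))
    for line in lines
}
for line, flags in checks.items():
    print(line, flags)
```

For a file whose keys all look like the excerpt in the question, the three tests agree, so the plain `startswith` check is enough.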
The next step is to use a very simple state machine that remembers the most recent key. Like this:
current_key = ""
for line in file:  # file: an already opened file object
    if line.startswith("chr"):
        current_key = line.strip()  # switch to the new key
    else:
        print(" ".join([current_key, line.strip()]))
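The question also asks how to store the values associated with each key. The same state machine can fill a dictionary instead of printing right away. This is a sketch under assumptions: `group_by_key` is a hypothetical name, value lines that appear before the first key are silently ignored, and the second key in the sample is made up for illustration.

```python
from collections import defaultdict


def group_by_key(lines, key_prefix="chr"):
    """Return {key: [value, ...]} in file order (hypothetical helper)."""
    groups = defaultdict(list)
    current_key = None
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        if text.startswith(key_prefix):
            current_key = text
            groups[current_key]  # register the key even if it has no values
        elif current_key is not None:
            groups[current_key].append(text)
        # value lines before the first key are ignored (assumption)
    return dict(groups)


sample = [
    "chr22:16287243: PASS\n",
    "patientID1 G/G\n",
    "patientID2 G/G\n",
    "chr22:16287300: PASS\n",  # made-up second key
    "patientID1 A/G\n",        # made-up value
]
grouped = group_by_key(sample)

# Print the stored pairs in the desired "key value" format.
for key, values in grouped.items():
    for value in values:
        print(key, value)
```

Holding everything in memory is only needed if the pairs are reused later. For a straight conversion, printing as you go uses far less memory.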
If every key always has the same number of values, then islice is useful:
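from itertools import islice

with open('input.txt') as fin, open('output.txt', 'w') as fout:
    for k in fin:                   # each key line...
        for v in islice(fin, 3):    # ...is followed by exactly 3 value lines
            # v keeps its trailing newline, so each pair ends its own line
            fout.write(' '.join((k.strip(), v)))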
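To try the islice idea without touching any files, the same loop can be run on an in-memory buffer. In this sketch `pair_fixed` and `values_per_key` are hypothetical names. It works only under the fixed-count assumption stated above: if a key has more or fewer values, the lines drift out of step and a value line ends up being treated as a key.

```python
from io import StringIO
from itertools import islice


def pair_fixed(lines, values_per_key):
    """Yield 'key value' strings, assuming a fixed number of values per key."""
    it = iter(lines)  # one shared iterator, so islice consumes from the same stream
    for key in it:
        for value in islice(it, values_per_key):
            yield f"{key.strip()} {value.strip()}"


# The excerpt from the question, held in memory instead of a file.
sample = StringIO(
    "chr22:16287243: PASS\n"
    "patientID1 G/G\n"
    "patientID2 G/G\n"
    "patientID3 G/G\n"
)
rows = list(pair_fixed(sample, 3))
for row in rows:
    print(row)
```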