I have a txt file in an unusual format that contains millions of lines.
My file format looks like this:
12122.AA.K IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393
12123.AA.K KKKS-93I3KD-OEPE-IE,9393,039033,09/12/2093,,N,EC,3838-38939-393
12122.AA.K PEOEP-93I3KD-OEPE-IE,9033,930392,09/12/2093,,N,EC,3838-38939-393
12124.AA.K MDJDK-93I3KD-OEPE-IE,3930,272882,09/12/2093,,N,EC,3838-38939-393
12125.AA.K EOEPE-93I3KD-OEPE-IE,8393,039393,09/12/2093,,N,EC,3838-38939-393
In Python, I want to split each line into a key and a value:
Key: 12122.AA.K
Value: IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393
As you can see, the key and the value are separated only by a single space.
What is an efficient way to do this in Python?
with open(filename) as f:
    # Drop the trailing newline, then split on the first space only
    mapping = dict(line.rstrip('\n').split(' ', 1) for line in f)
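As a variation on the same idea, here is a minimal sketch that uses str.partition, which splits at the first space only and does not raise on a line that has no space at all. The helper name parse_lines and the decision to skip such lines are my own assumptions, not something stated in the question.

```python
import io


def parse_lines(lines):
    """Build a dict from an iterable of 'key value' lines (hypothetical helper)."""
    mapping = {}
    for line in lines:
        # partition() splits at the first space only; the rest stays intact
        key, sep, value = line.rstrip("\n").partition(" ")
        if not sep:
            # Assumption: lines without a space (e.g. blank lines) are skipped
            continue
        mapping[key] = value
    return mapping


# Demo on in-memory data; pass an open file object for the real file
sample = io.StringIO(
    "12122.AA.K IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393\n"
    "\n"
    "12123.AA.K KKKS-93I3KD-OEPE-IE,9393,039033,09/12/2093,,N,EC,3838-38939-393\n"
)
result = parse_lines(sample)
print(len(result))
```

Because the file object is iterated lazily, only the resulting dict has to fit in memory, not the raw text of the file.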
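with open('file.txt', 'r') as file:
    # Split each line once, on the first space, after dropping the newline
    thedict = {k: v for k, v in (e.rstrip('\n').split(' ', 1) for e in file)}
You can try this dictionary comprehension.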
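One thing to watch for: in the sample data from the question, the key 12122.AA.K appears twice, and a plain dict keeps only the last value seen for a repeated key. If every value matters, a sketch along these lines collects them in lists instead. The helper name group_lines is hypothetical, and whether duplicates should be kept at all is an assumption about your use case.

```python
from collections import defaultdict
import io


def group_lines(lines):
    """Map each key to a list of all its values (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in lines:
        key, sep, value = line.rstrip("\n").partition(" ")
        if sep:
            # Append instead of overwrite, so repeated keys keep every value
            grouped[key].append(value)
    return dict(grouped)


# First three lines of the sample data from the question
sample = io.StringIO(
    "12122.AA.K IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393\n"
    "12123.AA.K KKKS-93I3KD-OEPE-IE,9393,039033,09/12/2093,,N,EC,3838-38939-393\n"
    "12122.AA.K PEOEP-93I3KD-OEPE-IE,9033,930392,09/12/2093,,N,EC,3838-38939-393\n"
)
grouped = group_lines(sample)
print(len(grouped["12122.AA.K"]))
```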
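This would be overkill, but you could also use the built-in csv module.
Although it is designed for comma-separated values by default, it does provide a way to register a custom dialect to match a custom file format, such as a file with space-separated values. The Dialects and Formatting Parameters include a delimiter attribute, which you can set to a space, " ".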
import csv
from pprint import pprint
csv.register_dialect("my_custom_dialect", delimiter=" ")
mapping1 = {}
# The csv docs recommend opening files with newline=""
with open("test.txt", newline="") as f:
    reader = csv.reader(f, dialect="my_custom_dialect")
    for row in reader:
        # Each row is a list of strings separated by the delimiter
        key, value = row
        mapping1[key] = value
pprint(mapping1)
{'12122.AA.K': 'IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393',
 '12123.AA.K': 'KKKS-93I3KD-OEPE-IE,9393,039033,09/12/2093,,N,EC,3838-38939-393',
 '12124.AA.K': 'PEOEP-93I3KD-OEPE-IE,9033,930392,09/12/2093,,N,EC,3838-38939-393',
 '12125.AA.K': 'MDJDK-93I3KD-OEPE-IE,3930,272882,09/12/2093,,N,EC,3838-38939-393',
 '12126.AA.K': 'EOEPE-93I3KD-OEPE-IE,8393,039393,09/12/2093,,N,EC,3838-38939-393'}
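If registering a named dialect feels heavy for a one-off script, csv.reader also accepts formatting parameters such as delimiter directly as keyword arguments. The sketch below does that and also rejoins any extra fields in case a value ever contains a space. That scenario, and the third sample line, are assumptions of mine: the data in the question has no spaces inside the values.

```python
import csv
import io

# In-memory stand-in for the file; the last line is made up to show a
# value that itself contains a space
sample = io.StringIO(
    "12122.AA.K IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393\n"
    "12123.AA.K KKKS-93I3KD-OEPE-IE,9393,039033,09/12/2093,,N,EC,3838-38939-393\n"
    "99999.ZZ.Z MADE UP,0000\n"
)

mapping = {}
# delimiter is passed straight to csv.reader; no register_dialect call needed
for row in csv.reader(sample, delimiter=" "):
    if not row:
        # csv.reader yields an empty list for a blank line
        continue
    key, *rest = row
    # Rejoin with the delimiter so spaces inside the value are preserved
    mapping[key] = " ".join(rest)

print(mapping["99999.ZZ.Z"])
```

With a plain `key, value = row` unpacking, the made-up third line would raise a ValueError instead, which may be the behavior you prefer if such lines indicate corrupt input.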
If your file has a header, you can take advantage of csv's DictReader to access the values of each row as a dictionary.
KEY VALUE
12122.AA.K IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393
12123.AA.K KKKS-93I3KD-OEPE-IE,9393,039033,09/12/2093,,N,EC,3838-38939-393
12124.AA.K PEOEP-93I3KD-OEPE-IE,9033,930392,09/12/2093,,N,EC,3838-38939-393
import csv
from pprint import pprint
csv.register_dialect("my_custom_dialect", delimiter=" ")
mapping2 = {}
with open("test_with_headers.txt", newline="") as f:
    reader = csv.DictReader(f, dialect="my_custom_dialect")
    for row in reader:
        # 'row' is a dictionary with the headers as the key
        mapping2[row["KEY"]] = row["VALUE"]
pprint(mapping2)
{'12122.AA.K': 'IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393',
 '12123.AA.K': 'KKKS-93I3KD-OEPE-IE,9393,039033,09/12/2093,,N,EC,3838-38939-393',
 '12124.AA.K': 'PEOEP-93I3KD-OEPE-IE,9033,930392,09/12/2093,,N,EC,3838-38939-393'}
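If the file has no header line, as in the original question, DictReader can still be used by supplying the column names through its fieldnames parameter. A minimal sketch, assuming the names KEY and VALUE are simply labels you pick yourself:

```python
import csv
import io

# In-memory stand-in for a header-less file like the one in the question
sample = io.StringIO(
    "12122.AA.K IRIR-93I3KD-OEPE-IE,6373,893939,09/12/2093,,N,EC,3838-38939-393\n"
    "12123.AA.K KKKS-93I3KD-OEPE-IE,9393,039033,09/12/2093,,N,EC,3838-38939-393\n"
)

# With fieldnames given, the first line is treated as data, not as a header
reader = csv.DictReader(sample, fieldnames=["KEY", "VALUE"], delimiter=" ")
mapping3 = {row["KEY"]: row["VALUE"] for row in reader}

print(len(mapping3))
```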