I have a text file with lines in the following order:

1 id:0  e1:"a"  e2:"b"
0 id:0  e1:"4"  e2:"c"
0 id:1  e1:"6"  e2:"d"
2 id:2  e1:"8"  e2:"f"
2 id:2  e1:"9"  e2:"f"
2 id:2  e1:"d"  e2:"k"

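I need to extract a list of lists of (e1, e2) pairs. The id, taken in line order, determines the index in the outer list, and the pairs for that id form the inner list. So for the example above, my output would be: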

[[("a","b"),("4","c")],[("6","d")],[("8","f"),("9","f"),("d","k")]]

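My problem is that to know where a new inner list starts, I need to check whether the id value has changed. There is no fixed number of elements per id: for example, id:0 has 2, id:1 has 1 and id:2 has 3. Is there an efficient way to check this condition on each next line while building the lists?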

You can use `itertools.groupby()` to do the job:

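import itertools

def split_by(
        items,
        key=None,
        processing=None,
        container=list):
    # groupby() starts a new group each time key(item) changes
    for key_value, grouping in itertools.groupby(items, key):
        if processing:
            # transform each item of the group (e.g. drop the id)
            grouping = (processing(group) for group in grouping)
        if container:
            # materialize the group (a list by default)
            grouping = container(grouping)
        yield grouping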

Call it like this:

from operator import itemgetter

list(split_by(items, itemgetter(0), itemgetter(slice(1, None))))
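To see what "a new group whenever the key changes" means in practice: `itertools.groupby()` only groups *consecutive* items with equal keys, so it relies on lines with the same id being adjacent, as they are in the question. A minimal sketch with made-up in-memory tuples (not read from the file), where id `'0'` reappears after id `'1'`:

```python
import itertools
from operator import itemgetter

# Hypothetical (id, e1, e2) tuples; id '0' appears again after id '1'.
items = [('0', 'a', 'b'), ('0', '4', 'c'), ('1', '6', 'd'), ('0', 'x', 'y')]

# groupby() starts a new group every time the key changes,
# so the trailing '0' item becomes a separate third group.
groups = [
    [item[1:] for item in grouping]
    for _, grouping in itertools.groupby(items, key=itemgetter(0))
]
print(groups)
# [[('a', 'b'), ('4', 'c')], [('6', 'd')], [('x', 'y')]]
```

If the ids might not be contiguous, sorting first (for example `sorted(items, key=itemgetter(0))`, which is stable) would merge them before grouping.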

`items` can easily be generated from the text above (assuming it is stored in the file `data.txt`):

def get_items(filename='data.txt'):
    # with io.StringIO(text) as file_obj:  # to read from `text`
    with open(filename, 'r') as file_obj:  # to read from `filename`
        for line in file_obj:
            if line.strip():  # skip blank lines
                vals = line.replace('"', '').split()
                # skip the leading column, keep the value after each ':'
                yield tuple(val.split(':')[1] for val in vals[1:])
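Instead of editing `get_items()` to switch between `open()` and `io.StringIO`, one option is to pass in any iterable of lines. The sketch below uses a hypothetical `parse_items(lines)` helper (not part of the original answer) that applies the same parsing rule, so it works with an open file, an `io.StringIO` or a plain list of strings:

```python
import io

def parse_items(lines):
    """Yield (id, e1, e2) tuples from lines like '1 id:0  e1:"a"  e2:"b"'."""
    for line in lines:
        if line.strip():  # skip blank lines
            vals = line.replace('"', '').split()
            # vals[0] is the leading column; keep only the value after ':'
            yield tuple(val.split(':')[1] for val in vals[1:])

sample = io.StringIO('1 id:0  e1:"a"  e2:"b"\n\n0 id:1  e1:"6"  e2:"d"\n')
print(list(parse_items(sample)))
# [('0', 'a', 'b'), ('1', '6', 'd')]
```

With a real file, consume the generator inside the `with open('data.txt') as f:` block so the file is still open while it is read.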

Finally, to test all the pieces (with `open(filename, 'r')` in `get_items()` replaced by `io.StringIO(text)`):

import io
import itertools
from operator import itemgetter

text = """
1 id:0  e1:"a"  e2:"b"
0 id:0  e1:"4"  e2:"c"
0 id:1  e1:"6"  e2:"d"
2 id:2  e1:"8"  e2:"f"
2 id:2  e1:"9"  e2:"f"
2 id:2  e1:"d"  e2:"k"
""".strip()

print(list(split_by(get_items(), itemgetter(0), itemgetter(slice(1, None)))))
# [[('a', 'b'), ('4', 'c')], [('6', 'd')], [('8', 'f'), ('9', 'f'), ('d', 'k')]]

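This loops over the input lazily (the file is read line by line and each group is built only when it is reached), so it avoids unnecessary memory allocation.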
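For comparison, here is the "start a new inner list whenever the id changes" check that `groupby()` performs for you, written out by hand. It is a minimal sketch that assumes the `(id, e1, e2)` tuples produced above, hardcoded so it runs on its own; `group_by_id` is a hypothetical name:

```python
def group_by_id(items):
    """Group consecutive (id, e1, e2) tuples into lists of (e1, e2) pairs."""
    result = []
    prev_id = object()  # sentinel that never equals a real id
    for item_id, e1, e2 in items:
        if item_id != prev_id:
            result.append([])  # id changed: open a new inner list
            prev_id = item_id
        result[-1].append((e1, e2))
    return result

items = [('0', 'a', 'b'), ('0', '4', 'c'), ('1', '6', 'd'),
         ('2', '8', 'f'), ('2', '9', 'f'), ('2', 'd', 'k')]
print(group_by_id(items))
# [[('a', 'b'), ('4', 'c')], [('6', 'd')], [('8', 'f'), ('9', 'f'), ('d', 'k')]]
```

Only the previous id is kept, so the check costs one comparison per line, regardless of how many elements each id has.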

No additional packages required

Load and parse the file:

Start from a text file formatted as shown in the question.

# parse the text file into a list of dicts
with open('test.txt', 'r') as f:
    # clean each line and split it into a list; line[2:] drops the leading column
    text = [line[2:].replace('"', '').strip().split() for line in f]
text = [[v.split(':') for v in t] for t in text]  # split each 'key:value' field into [key, value]
d = [{v[0]: v[1] for v in t} for t in text]  # convert each list to a dict

# text will appear as:
[[['id', '0'], ['e1', 'a'], ['e2', 'b']],
 [['id', '0'], ['e1', '4'], ['e2', 'c']],
 [['id', '1'], ['e1', '6'], ['e2', 'd']],
 [['id', '2'], ['e1', '8'], ['e2', 'f']],
 [['id', '2'], ['e1', '9'], ['e2', 'f']],
 [['id', '2'], ['e1', 'd'], ['e2', 'k']]]
# d appears as:
[{'id': '0', 'e1': 'a', 'e2': 'b'},
 {'id': '0', 'e1': '4', 'e2': 'c'},
 {'id': '1', 'e1': '6', 'e2': 'd'},
 {'id': '2', 'e1': '8', 'e2': 'f'},
 {'id': '2', 'e1': '9', 'e2': 'f'},
 {'id': '2', 'e1': 'd', 'e2': 'k'}]
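The per-line dict can also be built in one step by passing the `key:value` pairs straight to `dict()`. Splitting with `maxsplit=1` keeps any further `:` inside a value intact, whereas `v.split(':')` followed by `v[1]` drops everything after a second colon. A minimal sketch on one line in the question's format, with a hypothetical value `b:c` that contains a colon:

```python
line = '1 id:0  e1:"a"  e2:"b:c"'  # hypothetical line; e2 contains a ':'

# Drop the quotes, split on whitespace, skip the leading column,
# then split each field only on its first ':'.
fields = line.replace('"', '').split()[1:]
record = dict(field.split(':', 1) for field in fields)
print(record)
# {'id': '0', 'e1': 'a', 'e2': 'b:c'}
```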

Parse the list of dicts into the expected output:

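Use `.get` to check whether a key exists: it returns a specified value (`None` in this case) when the key is missing.
`dict.get` defaults to `None`, so this approach never raises a `KeyError`.
- If `None` can be a value in the dictionary, change the default that `.get` returns:
  - `test.get(v[0], 'something here')`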

test = dict()
for r in d:
    v = list(r.values())  # [id, e1, e2], in insertion order
    if test.get(v[0]) is None:
        test[v[0]] = [tuple(v[1:])]  # first time this id is seen
    else:
        test[v[0]].append(tuple(v[1:]))

# test dict appears as:
{'0': [('a', 'b'), ('4', 'c')],
'1': [('6', 'd')],
'2': [('8', 'f'), ('9', 'f'), ('d', 'k')]}
# final output
final = list(test.values())
[[('a', 'b'), ('4', 'c')], [('6', 'd')], [('8', 'f'), ('9', 'f'), ('d', 'k')]]
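The `if`/`else` around `.get()` can also be folded into one call with `dict.setdefault()`, which inserts the default (here a new empty list) only when the key is missing and returns the stored list either way. A minimal sketch, assuming `d` is the list of dicts shown above (hardcoded so it runs on its own):

```python
d = [{'id': '0', 'e1': 'a', 'e2': 'b'},
     {'id': '0', 'e1': '4', 'e2': 'c'},
     {'id': '1', 'e1': '6', 'e2': 'd'},
     {'id': '2', 'e1': '8', 'e2': 'f'},
     {'id': '2', 'e1': '9', 'e2': 'f'},
     {'id': '2', 'e1': 'd', 'e2': 'k'}]

test = {}
for r in d:
    # setdefault() creates the list the first time an id is seen,
    # then returns the existing list on later calls.
    test.setdefault(r['id'], []).append((r['e1'], r['e2']))

final = list(test.values())
print(final)
# [[('a', 'b'), ('4', 'c')], [('6', 'd')], [('8', 'f'), ('9', 'f'), ('d', 'k')]]
```

Looking the fields up by name (`r['id']`) rather than by position in `r.values()` also removes the dependence on the column order.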

---

Updated and reduced code:

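In this case, `text` is a list of lists, so there is no need to convert it into the dict `d` as above.
For each list `t` in `text`, index `[0]` is always the key and `[1:]` holds the values.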

with open('test.txt', 'r') as f:
    # clean each line and split it into a list; line[2:] drops the leading column
    text = [line[2:].replace('"', '').strip().split() for line in f]
text = [[v.split(':')[1] for v in t] for t in text]  # keep only the value after each ':'
# text appears as:
[['0', 'a', 'b'],
 ['0', '4', 'c'],
 ['1', '6', 'd'],
 ['2', '8', 'f'],
 ['2', '9', 'f'],
 ['2', 'd', 'k']]

test = dict()
for t in text:
    if test.get(t[0]) is None:
        test[t[0]] = [tuple(t[1:])]  # first time this id is seen
    else:
        test[t[0]].append(tuple(t[1:]))

final = list(test.values())
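One caveat with `line[2:]`: it assumes the leading column is at most two characters before the id field. That holds for the sample data, but a leading value with three or more characters would leave part of it in the line, and `v.split(':')[1]` would then raise an `IndexError`. Splitting on whitespace first and dropping the first field avoids that assumption; a minimal sketch with a hypothetical leading value `123`:

```python
lines = ['123 id:0  e1:"a"  e2:"b"', '0 id:1  e1:"6"  e2:"d"']  # hypothetical input

# split() on whitespace, then [1:] drops the leading column
# regardless of how many characters it has.
text = [[v.split(':')[1] for v in line.replace('"', '').split()[1:]]
        for line in lines if line.strip()]
print(text)
# [['0', 'a', 'b'], ['1', '6', 'd']]
```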

---

Using `defaultdict`

This saves a few lines of code. Use `text` as the list of lists from above:

from collections import defaultdict as dd

test = dd(list)  # missing keys start as an empty list
for t in text:
    test[t[0]].append(tuple(t[1:]))
final = list(test.values())
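Putting the parsing and the `defaultdict` grouping together, a self-contained sketch that reads the question's data from an `io.StringIO` (standing in for `test.txt`) could look like this. Because dicts keep insertion order (guaranteed since Python 3.7), the inner lists come out in the order each id first appears. Unlike `groupby()`, lines with the same id are merged even if they are not adjacent:

```python
import io
from collections import defaultdict

# io.StringIO stands in for open('test.txt')
data = io.StringIO('''1 id:0  e1:"a"  e2:"b"
0 id:0  e1:"4"  e2:"c"
0 id:1  e1:"6"  e2:"d"
2 id:2  e1:"8"  e2:"f"
2 id:2  e1:"9"  e2:"f"
2 id:2  e1:"d"  e2:"k"
''')

groups = defaultdict(list)  # missing keys start as an empty list
for line in data:
    if not line.strip():
        continue  # skip blank lines
    # drop quotes and the leading column, keep the value after each ':'
    key, *values = [v.split(':')[1] for v in line.replace('"', '').split()[1:]]
    groups[key].append(tuple(values))

final = list(groups.values())
print(final)
# [[('a', 'b'), ('4', 'c')], [('6', 'd')], [('8', 'f'), ('9', 'f'), ('d', 'k')]]
```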
