I have a dataclass defined as follows:

    from typing import List
    from dataclasses import dataclass, field

    @dataclass
    class Speaker:
        id: int
        name: str
        statements: List[str] = field(default_factory=list)

        def __eq__(self, other):
            return self.id == other.id and self.name == other.name

        def __hash__(self):
            return hash((self.id, self.name))

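I have a list of names and statements that I want to combine. Each item in the list has an id, and the same id can appear any number of times. I want to append the statement part of each item to the matching speaker in a set of speakers.
This is what I have written so far: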

    test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
    speakers = set()
    for i in test:
        id, name, statement = i
        Speaker(id, name)

        # This line needs to change
        speakers.add(Speaker(id, name, [statement]))
    print(speakers)

Current output:

    {Speaker(id=1, name='john', statements=['foo']), Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far'])}

What I want:

    {Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far'])}

If you have any suggestions, please let me know. The number of fields may change (I may add a title, etc.), so converting to a dictionary might not work.
Edit: added an extra field called name to clarify the situation.
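Instead of using a `set`, I think a `dict` makes more sense in this case. Lookup should be O(1) on average, and this way we avoid using `next` to scan for the element that matches a `Speaker`'s hash value. An example follows.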

    from pprint import pprint
    from typing import List
    from dataclasses import dataclass, field

    @dataclass
    class Speaker:
        id: int
        name: str
        statements: List[str] = field(default_factory=list)

        def __eq__(self, other):
            return self.id == other.id and self.name == other.name

        def __hash__(self):
            return hash((self.id, self.name))

    test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
    speakers = {}
    for id, name, statement in test:
        key = Speaker(id, name)
        # setdefault returns the Speaker already stored under an equal key, if any
        speaker = speakers.setdefault(key, key)
        speaker.statements.append(statement)
    print(speakers)
    print()
    pprint(list(speakers.values()))

Output:

    {Speaker(id=1, name='john', statements=['foo', 'bar']): Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='jane', statements=['near']): Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far']): Speaker(id=2, name='george', statements=['far'])}

    [Speaker(id=1, name='john', statements=['foo', 'bar']),
     Speaker(id=2, name='jane', statements=['near']),
     Speaker(id=2, name='george', statements=['far'])]

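Edit: Since the dataclass itself is hashable, I think it makes sense here to store the `Speaker` object itself (uniquely identified by `id`, `name`, and any other fields used in `__hash__`) as both the key and the value. This is a bit roundabout, so if you prefer, you can instead use a tuple of the values you want to hash, i.e. `(id, name)`, as the key, and that should work too. Either way, this should still be more efficient, because it uses `dict.setdefault`, which is still O(1) time on average.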
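The tuple-key alternative mentioned above could look roughly like this. It is only a sketch: it assumes the identifying fields are exactly `id` and `name`, and it leaves `Speaker` with the default dataclass `__eq__` since the object is no longer used as a key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)


test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]

# Key each speaker by the tuple of identifying fields (assumed here to be id and name).
speakers: Dict[Tuple[int, str], Speaker] = {}
for id, name, statement in test:
    # setdefault returns the Speaker already stored under this key,
    # or stores and returns the freshly created one.
    speaker = speakers.setdefault((id, name), Speaker(id, name))
    speaker.statements.append(statement)

result = list(speakers.values())
print(result)
```

If more identifying fields are added later, the key tuple has to be extended to match, which is the main cost of this variant compared with keying by the `Speaker` object.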
You can do it like this:

    for i in test:
        id, name, statement = i
        new_speaker = Speaker(id, name, [statement])
        # A set cannot hand back its stored element, so scan for the equal one
        match = next((x for x in speakers if x == new_speaker), None)
        if match:
            match.statements.append(statement)
        else:
            speakers.add(new_speaker)

I'm not sure whether there is a way to do this in less time; if you think of one, please comment.
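If a set covers all the use cases you have, except for merging one or more fields, you can customize a set so that every call that mutates it merges the attributes.
The simplest way is to write a class derived from `collections.abc.MutableSet`. This minimizes the number of methods to implement and ensures that all mutating calls go through your code (unlike, for example, subclassing Python's native `set`).
On second thought: having a set internally does not cut it. A mapping is needed so that a specific element can be retrieved and updated. This approach still lets us keep a set interface, and we can add "update" and "get" methods on top of it: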

    from collections.abc import MutableSet

    class MergingSet(MutableSet):
        def __init__(self):
            # do not allow an initial content to avoid corner cases.
            # call .update later to feed several instances at once
            self.data = dict()

        def __contains__(self, item):
            return item in self.data

        def __iter__(self):
            return iter(self.data)

        def __len__(self):
            return len(self.data)

        def discard(self, item):
            # like set.discard: no error if the item is absent
            self.data.pop(item, None)

        def add(self, item):
            if item in self:
                our_item = self.data[item]
                # hardcoded attribute merging:
                our_item.statements.extend(item.statements)
            else:
                self.data[item] = item

        def update(self, iterable):
            for item in iterable:
                self.add(item)

        def get(self, item, default=None):
            # returns the matching instance in the inner data,
            # with the mutable fields modified.
            return self.data.get(item, default)

        def __repr__(self):
            return repr(set(self.data.keys()))

Running this on your example:

    In [28]: test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]

    In [29]: speakers = MergingSet()

    In [30]: for i in test:
        ...:     id, name, statement = i
        ...:     Speaker(id, name)
        ...:
        ...:     # This line needs to change
        ...:     speakers.add(Speaker(id, name, [statement]))
        ...:

    In [31]: speakers
    Out[31]: {Speaker(id=2, name='jane', statements=['near']), Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='george', statements=['far'])}
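
Since the number of fields may change, the hardcoded merge of `statements` could be swapped for a callback. The following is a sketch of that idea, not part of the answer above: `CallbackMergingSet` and `merge_statements` are hypothetical names, and the merge function is assumed to mutate the stored item in place. It also shows the "update" and "get" methods in use.

```python
from collections.abc import MutableSet
from dataclasses import dataclass, field
from typing import List


@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

    def __eq__(self, other):
        return self.id == other.id and self.name == other.name

    def __hash__(self):
        return hash((self.id, self.name))


class CallbackMergingSet(MutableSet):
    """Hypothetical variant: the merge rule is passed in rather than hardcoded."""

    def __init__(self, merge):
        self.merge = merge  # called as merge(stored_item, new_item)
        self.data = {}

    def __contains__(self, item):
        return item in self.data

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)

    def discard(self, item):
        self.data.pop(item, None)

    def add(self, item):
        # Store the item if it is new; otherwise merge it into the stored one.
        stored = self.data.setdefault(item, item)
        if stored is not item:
            self.merge(stored, item)

    def update(self, iterable):
        for item in iterable:
            self.add(item)

    def get(self, item, default=None):
        return self.data.get(item, default)


def merge_statements(stored, new):
    # Assumed merge rule: append the new statements to the stored speaker.
    stored.statements.extend(new.statements)


test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]

speakers = CallbackMergingSet(merge_statements)
speakers.update(Speaker(id, name, [statement]) for id, name, statement in test)

# Look up the stored instance with a "probe" Speaker that has the same id and name.
john = speakers.get(Speaker(1, 'john'))
print(john.statements)
```

A new field such as a title would then only need a change in the merge function, not in the set class.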