Python add dataclass to set



I have a dataclass defined as follows:

from typing import List
from dataclasses import dataclass, field

@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

    def __eq__(self, other):
        return self.id == other.id and self.name == other.name

    def __hash__(self):
        return hash((self.id, self.name))

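I have a list of names and statements that I want to combine. Each item in the list has an id, and an id can be shared any number of times. I want to append the statement part of each item to the matching speaker in a set of speakers.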

Here is what I have so far:

test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
speakers = set()
for i in test:
    id, name, statement = i
    Speaker(id, name)

    # This line needs to change
    speakers.add(Speaker(id, name, [statement]))
print(speakers)

Current output:

{Speaker(id=1, name='john', statements=['foo']), Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far'])}

What I want:

{Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far'])}

Let me know if you have any suggestions. The number of fields may change (I might add a title, etc.), so converting to a dict may not work.

Edit: added an extra field called name to clarify the situation.

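Instead of a set, I think a dict makes more sense in this case. Dict lookups are O(1) on average, so we can find the existing element through the Speaker's hash instead of scanning for it with next(). See the example below.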

from pprint import pprint
from typing import List
from dataclasses import dataclass, field

@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

    def __eq__(self, other):
        return self.id == other.id and self.name == other.name

    def __hash__(self):
        return hash((self.id, self.name))

test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
speakers = {}
for id, name, statement in test:
    key = Speaker(id, name)
    # setdefault returns the stored Speaker if an equal key (same id and name) exists,
    # otherwise it stores `key` as both key and value and returns it
    speaker = speakers.setdefault(key, key)
    speaker.statements.append(statement)
print(speakers)
print()
pprint(list(speakers.values()))

Output:

{Speaker(id=1, name='john', statements=['foo', 'bar']): Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='jane', statements=['near']): Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far']): Speaker(id=2, name='george', statements=['far'])}

[Speaker(id=1, name='john', statements=['foo', 'bar']),
 Speaker(id=2, name='jane', statements=['near']),
 Speaker(id=2, name='george', statements=['far'])]

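Edit: Since the dataclass is itself hashable, I think it makes sense here to store the Speaker object itself (uniquely identified by id, name, and any other fields used in __hash__) as both the key and the value. This is a bit roundabout, so if you prefer, you can use a tuple of the values you want to hash, i.e. (id, name), as the key instead; that should work too. Either way this should still be more efficient than scanning the set with next(), because dict.setdefault runs in O(1) average time.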
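A minimal sketch of the tuple-key variant mentioned above, assuming the key is just (id, name). Because the key is a plain tuple, Speaker does not need the custom __eq__/__hash__ in this version; the loop variable id_ is my own choice to avoid shadowing the built-in id.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)


test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]

# Key each Speaker by the (id, name) tuple instead of by the Speaker object itself
speakers: Dict[Tuple[int, str], Speaker] = {}
for id_, name, statement in test:
    # setdefault builds a new Speaker on every iteration; it is simply discarded
    # when a Speaker for this (id, name) is already stored
    speaker = speakers.setdefault((id_, name), Speaker(id_, name))
    speaker.statements.append(statement)

result = list(speakers.values())
print(result)
```

Since dicts preserve insertion order (Python 3.7+), result lists the speakers in the order each (id, name) was first seen. If you add fields such as a title later, only the tuple used as the key needs to change.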

You could do it like this:

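speakers = set()
for i in test:
    id, name, statement = i
    new_speaker = Speaker(id, name, [statement])

    # Linear scan for a stored Speaker that compares equal (same id and name)
    match = next((x for x in speakers if x == new_speaker), None)
    if match is not None:
        match.statements.append(statement)
    else:
        speakers.add(new_speaker)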

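Each next(...) call scans the whole set, so this is O(n) per item and O(n²) overall. I'm not sure whether there is a way to do it in less time; please comment if you think of one.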

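If a set covers all your use cases except for merging one or more fields, you can customize a set so that every call that changes the set merges the attributes.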

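The simplest way is to write a class derived from collections.abc.MutableSet. This minimizes the number of methods you have to implement and ensures that every mutating call goes through your code (unlike, for example, subclassing Python's built-in set).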
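To see why subclassing the built-in set is not enough, here is a small sketch; the class names LoggingSetSubclass and LoggingMutableSet are hypothetical and exist only for the demonstration. The built-in set.update and |= are implemented in C and do not call an overridden add(), whereas the mixin methods of collections.abc.MutableSet, such as |=, are written in terms of add() and discard().

```python
from collections.abc import MutableSet


class LoggingSetSubclass(set):
    """Hypothetical subclass of the built-in set that tries to intercept add()."""

    def __init__(self):
        super().__init__()
        self.add_calls = 0

    def add(self, item):
        self.add_calls += 1
        super().add(item)


class LoggingMutableSet(MutableSet):
    """Hypothetical MutableSet: only the five abstract methods are implemented."""

    def __init__(self):
        self.data = set()
        self.add_calls = 0

    def __contains__(self, item):
        return item in self.data

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)

    def add(self, item):
        self.add_calls += 1
        self.data.add(item)

    def discard(self, item):
        self.data.discard(item)


native = LoggingSetSubclass()
native.update([1, 2, 3])  # built-in set.update bypasses the overridden add()
native |= {4, 5}          # so does the built-in in-place union

abc_based = LoggingMutableSet()
abc_based |= [1, 2, 3]    # MutableSet.__ior__ calls self.add() for each element

print(native.add_calls, abc_based.add_calls)  # 0 3
```

Note that MutableSet does not provide an update() method, so a class built on it has to define its own if you want one.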

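On second thought: having a set internally does not cut it. A mapping is needed so that the specific stored element can be retrieved in order to update it. This approach still lets us keep a set interface, and we can add "update" and "get" methods: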


from collections.abc import MutableSet


# Assumes the Speaker dataclass from the question (equality and hash on id and name).
class MergingSet(MutableSet):
    def __init__(self):
        # do not allow an initial content to avoid corner cases.
        # call .update later to feed several instances at once
        self.data = dict()

    def __contains__(self, item):
        return item in self.data

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)

    def discard(self, item):
        # MutableSet.discard must not raise if the item is missing
        self.data.pop(item, None)

    def add(self, item):
        if item in self:
            our_item = self.data[item]
            # hardcoded attribute merging:
            our_item.statements.extend(item.statements)
        else:
            self.data[item] = item

    def get(self, item, default=None):
        # returns the matching instance in the inner data, with the mutable fields modified
        return self.data.get(item, default)

    def update(self, iterable):
        for item in iterable:
            self.add(item)

    def __repr__(self):
        return repr(set(self.data.keys()))

Running this on your example:


In [28]:     
...: test = [(1, 'john', 'foo'),(1, 'john', 'bar'),(2, 'jane', 'near'),(2, 'george', 'far')]
...: 
In [29]: speakers = MergingSet()
In [30]: for i in test:
...:     id, name, statement = i
...:     Speaker(id, name)
...:     
...:     # This line needs to change
...:     speakers.add(Speaker(id, name, [statement]))
...:     
In [31]: speakers
Out[31]: {Speaker(id=2, name='jane', statements=['near']), Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='george', statements=['far'])}
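The reason the class keeps a dict internally rather than a set can be shown in isolation: a set can tell you that an equal Speaker is present, but it has no method that hands back the stored instance, whereas a dict that maps each item to itself does. A minimal sketch reusing the question's Speaker definition; the variable names are only illustrative.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

    def __eq__(self, other):
        return self.id == other.id and self.name == other.name

    def __hash__(self):
        return hash((self.id, self.name))


stored = Speaker(1, 'john', ['foo'])
probe = Speaker(1, 'john', ['bar'])  # equal to `stored`: same id and name

as_set = {stored}
print(probe in as_set)  # True: membership only needs __hash__ and __eq__
# ...but no set method returns `stored` when given `probe`

as_dict = {stored: stored}  # map each item to itself
found = as_dict[probe]      # lookup by equality returns the stored instance
found.statements.extend(probe.statements)
print(stored.statements)    # ['foo', 'bar']
```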
