Find duplicates in a list of dictionaries based on the value of a specific key



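I have the following list of dict records, from which I need to extract all duplicates (based on label), leaving one record per label in the original list. Also, when removing items for a label, always remove the items whose headings value is True rather than those whose headings value is False.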

Input:

records = [
    {"label": "x", "headings": False, "key": 300},
    {"label": "x", "headings": True, "key": 301},
    {"label": "x", "headings": False, "key": 302},
    {"label": "x", "headings": False, "key": 303},
    {"label": "y", "headings": False, "key": 304},
    {"label": "y", "headings": True, "key": 305},
    {"label": "z", "headings": True, "key": 306},
    {"label": "z", "headings": True, "key": 307},
]

Output: (duplicates)

[
    {"label": "x", "headings": False, "key": 300},
    {"label": "x", "headings": True, "key": 301},
    {"label": "x", "headings": False, "key": 302},
    {"label": "y", "headings": True, "key": 305},
    {"label": "z", "headings": True, "key": 306},
]

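You left several questions unanswered (see the comments). You also did not provide your own code or any unexpected output/errors you got with it, so we have nothing to work with or fix. That is bad form.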

But I found this an interesting exercise, so here is what I came up with:

from typing import TypedDict


class Record(TypedDict):
    label: str
    headings: bool
    key: int


def remove_duplicates(records: list[Record]) -> list[Record]:
    # First, decide which records (by index) _not_ to remove.
    # Map labels to 2-tuples of (index, headings boolean):
    keep: dict[str, tuple[int, bool]] = {}
    for idx, record in enumerate(records):
        label, headings = record["label"], record["headings"]
        # We keep it, if this is the first time we see that label OR
        # we did encounter it, but this record's `headings` value is `False`,
        # whereas the previous one was `True`:
        if label not in keep or (not headings and keep[label][1]):
            keep[label] = (idx, headings)
    # Combine all indices we want to keep into one set for easy lookup:
    keep_indices = {idx for idx, _ in keep.values()}
    # Iterate over all record indices in reverse order
    # and pop the corresponding records if necessary:
    removed = []
    for idx in reversed(range(len(records))):
        if idx not in keep_indices:
            removed.append(records.pop(idx))
    return removed

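The original list is changed in place, and a new list made up of the removed dictionaries (the duplicates) is returned. The algorithm creates a few auxiliary data structures, trading away some memory, but should be fairly efficient time-wise, i.e. roughly O(n) for the selection pass, where n is the number of records. The `pop` calls add some cost on top of that, because removing an item from the middle of a list shifts all later items.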
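A possible variation, not part of the answer above: if popping from the middle of the list is a concern, the same selection rule can be used to build two new lists in a single extra pass, leaving the input untouched. The name `split_duplicates` and the `(kept, removed)` return shape are made up for this sketch.

```python
from typing import TypedDict


class Record(TypedDict):
    label: str
    headings: bool
    key: int


def split_duplicates(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Hypothetical helper: return (kept, removed) without mutating `records`."""
    # Same rule as in remove_duplicates: the first record per label is kept,
    # unless the kept one has headings=True and a later one has headings=False.
    keep: dict[str, tuple[int, bool]] = {}
    for idx, record in enumerate(records):
        label, headings = record["label"], record["headings"]
        if label not in keep or (not headings and keep[label][1]):
            keep[label] = (idx, headings)
    keep_indices = {idx for idx, _ in keep.values()}
    # Build both result lists in original order instead of popping in place.
    kept = [r for i, r in enumerate(records) if i in keep_indices]
    removed = [r for i, r in enumerate(records) if i not in keep_indices]
    return kept, removed
```

If the in-place behaviour is still wanted, the caller could assign `records[:] = kept` afterwards. Here `removed` already comes back in the original order, so no `reverse()` call is needed.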

Test:

...
def main() -> None:
    from pprint import pprint
    records = [
        {"label": "x", "headings": False, "key": 300},
        {"label": "x", "headings": True, "key": 301},
        {"label": "x", "headings": False, "key": 302},
        {"label": "x", "headings": False, "key": 303},
        {"label": "y", "headings": False, "key": 304},
        {"label": "y", "headings": True, "key": 305},
        {"label": "z", "headings": True, "key": 306},
        {"label": "z", "headings": True, "key": 307},
    ]
    removed = remove_duplicates(records)  # type: ignore[arg-type]
    print("remaining:")
    pprint(records)
    removed.reverse()
    print("removed:")
    pprint(removed)


if __name__ == "__main__":
    main()

Output:

remaining:
[{'headings': False, 'key': 300, 'label': 'x'},
 {'headings': False, 'key': 304, 'label': 'y'},
 {'headings': True, 'key': 306, 'label': 'z'}]
removed:
[{'headings': True, 'key': 301, 'label': 'x'},
 {'headings': False, 'key': 302, 'label': 'x'},
 {'headings': False, 'key': 303, 'label': 'x'},
 {'headings': True, 'key': 305, 'label': 'y'},
 {'headings': True, 'key': 307, 'label': 'z'}]
