How to remove duplicates from a list of tuples based on the maximum value



I have a list of tuples that looks like this:

[('John', 53, "Yes", "No", "No", "No", 62.4, "Yes", "Yes"), ('Amy', 46, "No", "No", "No", "No", 52.2, "No", "No"), ('John', 53, "Yes", "No", "No", "No", 65, "No", "No")]

I want to remove the duplicates and keep only the entry with the maximum value at index 6, so that it looks like this:

[('Amy', 46, "No", "No", "No", "No", 52.2, "No", "No"), ('John', 53, "Yes", "No", "No", "No", 65, "Yes", "Yes")]

Use itertools.groupby together with the max function. operator.itemgetter helps to fetch the values, and you also need to sort the data first:

from itertools import groupby
from operator import itemgetter
get_name, get_val = itemgetter(0), itemgetter(6)
data = [('John', 53, "Yes", "No", "No", "No", 62.4, "Yes", "Yes"), ('Amy', 46, "No", "No", "No", "No", 52.2, "No", "No"), ('John', 53, "Yes", "No", "No", "No", 65, "No", "No")]
res = [max(g, key=get_val) for _, g in groupby(sorted(data, key=get_name), get_name)]

[('Amy', 46, 'No', 'No', 'No', 'No', 52.2, 'No', 'No'), ('John', 53, 'Yes', 'No', 'No', 'No', 65, 'No', 'No')]
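
A note on why the sort matters: groupby only merges adjacent items that share a key, so on unsorted input 'John' would end up in two separate groups. A minimal sketch of that behaviour, using shortened, hypothetical (name, value) rows; the names rows, unsorted_keys and sorted_keys are only for illustration:

```python
from itertools import groupby
from operator import itemgetter

get_name = itemgetter(0)

# Shortened, hypothetical rows: (name, value)
rows = [('John', 62.4), ('Amy', 52.2), ('John', 65)]

# Without sorting, the two 'John' rows are not adjacent,
# so groupby yields a separate group for each of them.
unsorted_keys = [k for k, _ in groupby(rows, get_name)]

# After sorting by name, each name forms exactly one group.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=get_name), get_name)]

print(unsorted_keys)  # ['John', 'Amy', 'John']
print(sorted_keys)    # ['Amy', 'John']
```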

Without imports:

data = [('John', 53, "Yes", "No", "No", "No", 62.4, "Yes", "Yes"), ('Amy', 46, "No", "No", "No", "No", 52.2, "No", "No"), ('John', 53, "Yes", "No", "No", "No", 65, "No", "No")]
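# Sort descending by index 6 so each name's largest value comes first
data.sort(key=lambda x: x[6], reverse=True)
seen = set(); seen_add = seen.add
# Keep only the first tuple seen for each name (the := operator needs Python 3.8+)
res = [t for t in data if not ((n := t[0]) in seen or seen_add(n))]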
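
The filter above relies on set.add returning None: for a name not yet seen, `n in seen` is False, so the `or` evaluates seen.add(n), which records the name and returns None, and `not None` is True. A small sketch of the idiom on its own, with a hypothetical list of names:

```python
# Hypothetical input, only to illustrate the "seen" idiom
names = ['John', 'Amy', 'John']

seen = set()
# First occurrence: `n in seen` is False, seen.add(n) runs and returns None,
# so the condition is `not None`, which is True, and the name is kept.
# Later occurrences: `n in seen` is True, so the name is skipped.
first_occurrences = [n for n in names if not (n in seen or seen.add(n))]

print(first_occurrences)  # ['John', 'Amy']
```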

Using a dictionary:

d = {}
for item in l:  # l is the input list of tuples
    name = item[0]
    # Keep the tuple with the largest value at index 6 for each name
    if name not in d or d[name][6] < item[6]:
        d[name] = item

new_l = list(d.values())
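
For reference, the dictionary approach can be wrapped in a function and run against the data from the question. This is only a sketch; the function name keep_max_by_name and its parameters are hypothetical. Since Python 3.7 dicts preserve insertion order, and reassigning an existing key does not move it, the result follows the order in which each name first appears:

```python
def keep_max_by_name(rows, key_index=0, value_index=6):
    """Return one row per key, keeping the row with the largest value."""
    best = {}
    for row in rows:
        name = row[key_index]
        # Store the row if the name is new or this row has a larger value
        if name not in best or best[name][value_index] < row[value_index]:
            best[name] = row
    return list(best.values())

data = [
    ('John', 53, "Yes", "No", "No", "No", 62.4, "Yes", "Yes"),
    ('Amy', 46, "No", "No", "No", "No", 52.2, "No", "No"),
    ('John', 53, "Yes", "No", "No", "No", 65, "No", "No"),
]

result = keep_max_by_name(data)
print(result)
# [('John', 53, 'Yes', 'No', 'No', 'No', 65, 'No', 'No'),
#  ('Amy', 46, 'No', 'No', 'No', 'No', 52.2, 'No', 'No')]
```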

If you need to preserve the order:

new_l = []
names = []
for item in l:
    name = item[0]
    if name not in names:
        new_l.append(item)
        names.append(name)
    else:
        # Replace the stored tuple if this one has a larger value at index 6
        i = names.index(name)
        if new_l[i][6] < item[6]:
            new_l[i] = item

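You can solve the problem you describe by sorting the list of tuples by name and then using groupby from itertools, but your expected output is not correct in any case: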

>>> [max(v, key=lambda x: x[6])
...  for _, v in groupby(sorted(lst, key=lambda x: x[0]),
...                      key=lambda x: x[0])]
Out[12]:
[('Amy', 46, 'No', 'No', 'No', 'No', 52.2, 'No', 'No'),
 ('John', 53, 'Yes', 'No', 'No', 'No', 65, 'No', 'No')]
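
The reason the expected output in the question cannot come from simply selecting rows is that its 'John' tuple pairs the value 65 with the trailing "Yes", "Yes" of the other 'John' row, so it is not one of the input tuples. A quick check, where expected_john and best_john are names introduced only for this illustration:

```python
data = [
    ('John', 53, "Yes", "No", "No", "No", 62.4, "Yes", "Yes"),
    ('Amy', 46, "No", "No", "No", "No", 52.2, "No", "No"),
    ('John', 53, "Yes", "No", "No", "No", 65, "No", "No"),
]

# The 'John' tuple as written in the question's expected output
expected_john = ('John', 53, "Yes", "No", "No", "No", 65, "Yes", "Yes")

# It does not appear in the input, so no selection of rows can return it
print(expected_john in data)  # False

# The row that actually holds the maximum at index 6 for 'John'
best_john = max((t for t in data if t[0] == 'John'), key=lambda t: t[6])
print(best_john)  # ('John', 53, 'Yes', 'No', 'No', 'No', 65, 'No', 'No')
```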

If the order of the final list doesn't matter, first sort by element 6 in descending order:

responses = [
    ('John', 53, "Yes", "No", "No", "No", 62.4, "Yes", "Yes"),
    ('Amy', 46, "No", "No", "No", "No", 52.2, "No", "No"),
    ('John', 53, "Yes", "No", "No", "No", 65, "No", "No")
]
responses.sort(key=lambda e: e[6], reverse=True)

Then filter, keeping only the first entry for each name:

filtered_responses = []
used_names = []
for r in responses:
    if r[0] not in used_names:
        filtered_responses.append(r)
        used_names.append(r[0])
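
For longer lists, a possible variation is to track the names in a set, which makes the membership test constant time on average instead of a linear scan of a list. A self-contained sketch under that assumption; it uses sorted() so the original list is left untouched, and the name ordered is introduced only here:

```python
responses = [
    ('John', 53, "Yes", "No", "No", "No", 62.4, "Yes", "Yes"),
    ('Amy', 46, "No", "No", "No", "No", 52.2, "No", "No"),
    ('John', 53, "Yes", "No", "No", "No", 65, "No", "No"),
]

# sorted() returns a new list, largest value at index 6 first
ordered = sorted(responses, key=lambda e: e[6], reverse=True)

filtered_responses = []
used_names = set()  # set instead of list for fast membership checks
for r in ordered:
    if r[0] not in used_names:
        filtered_responses.append(r)
        used_names.add(r[0])

print(filtered_responses)
# [('John', 53, 'Yes', 'No', 'No', 'No', 65, 'No', 'No'),
#  ('Amy', 46, 'No', 'No', 'No', 'No', 52.2, 'No', 'No')]
```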