Maximum value by group in Python



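I have a Python list of lists. Each entry has the form [category, type, item, score]. For each category and type, I want to return a list of the highest-scoring items.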

[["Edibles", "Fruit", "Apple", 3],
"Edibles", "Fruit", "Grapes", 8],
"Edible", "Candy", "Hershey", 4],
"Edible", "Candy", "Snickers", 6],
"NonEdible", "Bikes", "Yamaha", 5],
"NonEdible", "Bikes", "Suzuki", 7],
"NonEdible", "Cars", "Kia", 8],
"NonEdible", "Cars", "Toyota", 9]]

Expected output:

[["Edibles", "Fruit", "Grapes", 8],
"Edible", "Candy", "Snickers", 6],
"NonEdible", "Bikes", "Suzuki", 7],
"NonEdible", "Cars", "Toyota", 9]]

I was able to do this with multiple loops that build temporary lists, but it becomes very slow as the input grows. I am looking for an efficient solution.

You can use itertools.groupby, but you need to sort the list before grouping:

from itertools import groupby
lst = [["Edibles", "Fruit", "Apple", 3],
["Edibles", "Fruit", "Grapes", 8],
["Edible", "Candy", "Hershey", 4],
["Edible", "Candy", "Snickers", 6],
["NonEdible", "Bikes", "Yamaha", 5],
["NonEdible", "Bikes", "Suzuki", 7],
["NonEdible", "Cars", "Kia", 8],
["NonEdible", "Cars", "Toyota", 9]]
#if lst is already sorted, skip this step:
lst = sorted(lst, key=lambda k: (k[0], k[1]))
out = [max(g, key=lambda k: k[-1]) for _, g in groupby(lst, lambda k: (k[0], k[1]))]
from pprint import pprint
pprint(out)

Prints:

[['Edible', 'Candy', 'Snickers', 6],
['Edibles', 'Fruit', 'Grapes', 8],
['NonEdible', 'Bikes', 'Suzuki', 7],
['NonEdible', 'Cars', 'Toyota', 9]]
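
The sort-then-group idea can be wrapped in a small reusable helper. This is only a minimal sketch: the function name `max_per_group` and its parameters `key_cols` and `score_col` are hypothetical and not part of the answer above. Sorting makes this O(n log n) overall.

```python
from itertools import groupby
from operator import itemgetter


def max_per_group(rows, key_cols=(0, 1), score_col=-1):
    """Return the highest-scoring row for each group of key columns.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    group_key = itemgetter(*key_cols)  # e.g. (category, type)
    score = itemgetter(score_col)
    # groupby only merges adjacent rows, so sort by the same key first
    ordered = sorted(rows, key=group_key)
    return [max(group, key=score) for _, group in groupby(ordered, key=group_key)]


rows = [
    ["Edibles", "Fruit", "Apple", 3],
    ["Edibles", "Fruit", "Grapes", 8],
    ["Edible", "Candy", "Hershey", 4],
    ["Edible", "Candy", "Snickers", 6],
    ["NonEdible", "Bikes", "Yamaha", 5],
    ["NonEdible", "Bikes", "Suzuki", 7],
    ["NonEdible", "Cars", "Kia", 8],
    ["NonEdible", "Cars", "Toyota", 9],
]
result = max_per_group(rows)
```

Because of the sort, the groups come back in key order ("Edible" before "Edibles"), not in the order they first appear in the input.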

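A simple dict is quick and efficient!

(The list of lists in the question is malformed: the sublists are missing their opening brackets.)
You can do this in a single pass with a dictionary: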

input = [["Edibles", "Fruit", "Apple", 3],
["Edibles", "Fruit", "Grapes", 8],
["Edible", "Candy", "Hershey", 4],
["Edible", "Candy", "Snickers", 6],
["NonEdible", "Bikes", "Yamaha", 5],
["NonEdible", "Bikes", "Suzuki", 7],
["NonEdible", "Cars", "Kia", 8],
["NonEdible", "Cars", "Toyota", 9]
]
highest_val_dict = {}
for curr_list in input:
    curr_key = (curr_list[0], curr_list[1])  # (category, type) is the key
    curr_item = curr_list[2]
    curr_val = curr_list[3]
    # the default score of -1 assumes scores are never negative
    highest_pair = highest_val_dict.get(curr_key, (None, -1))
    if curr_val > highest_pair[1]:
        highest_val_dict[curr_key] = (curr_item, curr_val)

for key, val in highest_val_dict.items():
    print(f'{key[0]}, {key[1]}, {val[0]}, {val[1]}')

Output:

Edibles, Fruit, Grapes, 8
Edible, Candy, Snickers, 6
NonEdible, Bikes, Suzuki, 7
NonEdible, Cars, Toyota, 9
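
A variant of the same one-pass idea can store the whole row for each key, which yields the list-of-lists shape asked for in the question and avoids relying on a sentinel score. This is a sketch under assumptions: the function name `best_rows` is hypothetical, and ties are resolved in favor of the first row seen.

```python
def best_rows(rows):
    """One pass: keep the best full row seen so far for each (category, type)."""
    best = {}
    for row in rows:
        key = (row[0], row[1])
        # the first row of a group, or a strictly higher score, replaces the entry
        if key not in best or row[3] > best[key][3]:
            best[key] = row
    # dicts preserve insertion order (Python 3.7+), so groups appear
    # in the order they were first seen in the input
    return list(best.values())


rows = [
    ["Edibles", "Fruit", "Apple", 3],
    ["Edibles", "Fruit", "Grapes", 8],
    ["Edible", "Candy", "Hershey", 4],
    ["Edible", "Candy", "Snickers", 6],
    ["NonEdible", "Bikes", "Yamaha", 5],
    ["NonEdible", "Bikes", "Suzuki", 7],
    ["NonEdible", "Cars", "Kia", 8],
    ["NonEdible", "Cars", "Toyota", 9],
]
result = best_rows(rows)
```

This runs in O(n) time with one dictionary entry per group, and it also works when scores are negative.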

You can use the pandas library for this:

Install pandas, for example:

pip install pandas

Your code would then be:

In [2271]: import pandas as pd
In [2272]: l = [["Edibles", "Fruit", "Apple", 3], 
...: ["Edibles", "Fruit", "Grapes", 8], 
...: ["Edible", "Candy", "Hershey", 4], 
...: ["Edible", "Candy", "Snickers", 6], 
...: ["NonEdible", "Bikes", "Yamaha", 5], 
...: ["NonEdible", "Bikes", "Suzuki", 7], 
...: ["NonEdible", "Cars", "Kia", 8], 
...: ["NonEdible", "Cars", "Toyota", 9]] 
In [2275]: df = pd.DataFrame(l, columns=['category','type','item','score'])
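# idxmax selects the whole row with the top score in each group;
# a column-wise max would pair the item 'Yamaha' with the score 7
In [2284]: df.loc[df.groupby(['category','type'])['score'].idxmax()].values.tolist()
Out[2284]: 
[['Edible', 'Candy', 'Snickers', 6],
 ['Edibles', 'Fruit', 'Grapes', 8],
 ['NonEdible', 'Bikes', 'Suzuki', 7],
 ['NonEdible', 'Cars', 'Toyota', 9]]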
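
Since the goal is the entire row that holds the top score in each group, another way to sketch it in pandas is to sort by score and keep the last row per (category, type). The variable names below are illustrative, and this assumes pandas is installed.

```python
import pandas as pd

rows = [
    ["Edibles", "Fruit", "Apple", 3],
    ["Edibles", "Fruit", "Grapes", 8],
    ["Edible", "Candy", "Hershey", 4],
    ["Edible", "Candy", "Snickers", 6],
    ["NonEdible", "Bikes", "Yamaha", 5],
    ["NonEdible", "Bikes", "Suzuki", 7],
    ["NonEdible", "Cars", "Kia", 8],
    ["NonEdible", "Cars", "Toyota", 9],
]
df = pd.DataFrame(rows, columns=["category", "type", "item", "score"])

top = (
    df.sort_values("score")                                # ascending scores
      .drop_duplicates(["category", "type"], keep="last")  # last row = highest score
      .sort_values(["category", "type"])                   # stable, readable order
)
result = top.values.tolist()
```

Each row stays intact here, so an item is never paired with a score from a different row.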

You can use a regular dictionary, store all values for each unique key in a list, and then take the maximum:

data = [
["Edibles", "Fruit", "Apple", 3],
["Edibles", "Fruit", "Grapes", 8],
["Edible", "Candy", "Hershey", 4],
["Edible", "Candy", "Snickers", 6],
["NonEdible", "Bikes", "Yamaha", 5],
["NonEdible", "Bikes", "Suzuki", 7],
["NonEdible", "Cars", "Kia", 8],
["NonEdible", "Cars", "Toyota", 9]]
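dct = {}
# collect every (item, score) pair under its (category, type) key
for item in data:
    dct.setdefault((item[0], item[1]), []).append((item[-2], item[-1]))
# pick the pair with the highest score in each group
for k, v in dct.items():
    print(list(k) + list(max(v, key=lambda x: x[1])))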

Output:

['Edibles', 'Fruit', 'Grapes', 8]
['Edible', 'Candy', 'Snickers', 6]
['NonEdible', 'Bikes', 'Suzuki', 7]
['NonEdible', 'Cars', 'Toyota', 9]

Using pandas

  • DataFrames make it easy to manipulate, analyze, and visualize data.
import pandas as pd
# setup dataframe
data = [["Edibles", "Fruit", "Apple", 3],
["Edibles", "Fruit", "Grapes", 8],
["Edible", "Candy", "Hershey", 4],
["Edible", "Candy", "Snickers", 6],
["NonEdible", "Bikes", "Yamaha", 5],
["NonEdible", "Bikes", "Suzuki", 7],
["NonEdible", "Cars", "Kia", 8],
["NonEdible", "Cars", "Toyota", 9]]
df = pd.DataFrame(data)
# per group, keep the whole row with the highest score (column 3);
# a column-wise max would pair the item 'Yamaha' with the score 7
output = df.loc[df.groupby([0, 1])[3].idxmax()].reset_index(drop=True)
           0      1         2  3
0     Edible  Candy  Snickers  6
1    Edibles  Fruit    Grapes  8
2  NonEdible  Bikes    Suzuki  7
3  NonEdible   Cars    Toyota  9
# convert to a NumPy array (use output.values.tolist() for a list of lists)
output.to_numpy()
array([['Edible', 'Candy', 'Snickers', 6],
       ['Edibles', 'Fruit', 'Grapes', 8],
       ['NonEdible', 'Bikes', 'Suzuki', 7],
       ['NonEdible', 'Cars', 'Toyota', 9]], dtype=object)
