How to aggregate the values that share a key and take their average



How can I aggregate the values that share the same key and take the average of those values?

[['Absard', 140000.0], ['Absard', 150000.0], ['Absard', 150000.0], ['Absard', 173333.3], ['Abuzar', 28333.3], ['Abuzar', 34000.0], ['Abuzar', 90500.0], ['Afsarieh', 37333.3], ['Afsarieh', 44333.3], ['Afsarieh', 51666.6], ['Afsarieh', 55000.0], ['Afsarieh', 80000.0], ['Afsarieh', 105000.0], ['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]

I want to print this list like this:

NAME        AVG.
----------------------
Abazar      3033333.33
Ahang       2666666.67
Air force   2333333.33
Afsarieh    1916666.67

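These 5 values will be stored in a dictionary (the averages shown above are made up).

I wrote something like this, but it does not aggregate correctly: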

def takeAvg(addressList):
    resultDict = {}
    tot = 0
    startIndex = 0
    inner_i = 1
    for inner_i in range(len(addressList)):
        if(addressList[startIndex][0] == addressList[inner_i][0]):
            tot += float(addressList[inner_i][1])
        else:
            tot += float(addressList[startIndex][1])
            resultDict.update({addressList[startIndex][0]: format(float(tot / (inner_i-startIndex)), ".2f")})
            tot = 0
            startIndex = inner_i
    return resultDict

You can iterate over your data, collecting the values into a defaultdict of lists, and then compute the average of the values for each key:

from collections import defaultdict
data = [['Absard', 140000.0], ['Absard', 150000.0], ['Absard', 150000.0], ['Absard', 173333.3], ['Abuzar', 28333.3], ['Abuzar', 34000.0], ['Abuzar', 90500.0], ['Afsarieh', 37333.3], ['Afsarieh', 44333.3], ['Afsarieh', 51666.6], ['Afsarieh', 55000.0], ['Afsarieh', 80000.0], ['Afsarieh', 105000.0], ['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]
acc = defaultdict(list)
for name, value in data:
    acc[name].append(value)
result = { k : sum(v)/len(v) for k, v in acc.items() }

Output:

{
'Absard': 153333.325,
'Abuzar': 50944.43333333333,
'Afsarieh': 62222.200000000004,
'Ahang': 36666.6,
'Air force': 55666.65
}

For display purposes, you can use an f-string to format the values to 2 decimal places. For example:

print(*[f'{k:16}\t{v:.2f}\n' for k, v in result.items()], sep='', end='')

Output:

Absard                 153333.33
Abuzar                 50944.43
Afsarieh               62222.20
Ahang                  36666.60
Air force              55666.65
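
To get closer to the layout asked for in the question (a NAME / AVG. header followed by aligned rows), something along these lines may work. This is only a sketch: `format_table` is a hypothetical helper name, the column width is an arbitrary choice, and the `result` dict is hard-coded from the output above so the snippet runs on its own.

```python
# Averages copied from the output above; in practice this would be the
# `result` dict built with defaultdict.
result = {
    'Absard': 153333.325,
    'Abuzar': 50944.43333333333,
    'Afsarieh': 62222.200000000004,
    'Ahang': 36666.6,
    'Air force': 55666.65,
}

def format_table(averages):
    """Return the averages as a list of aligned text lines (hypothetical helper)."""
    # Pad the name column a little past the longest name.
    width = max(len(name) for name in averages) + 3
    lines = [f"{'NAME':<{width}}AVG.", "-" * (width + 10)]
    for name, avg in averages.items():
        # Left-align the name, then show the average with 2 decimal places.
        lines.append(f"{name:<{width}}{avg:.2f}")
    return lines

table_lines = format_table(result)
print("\n".join(table_lines))
```

Building the lines first and joining them at the end keeps the formatting separate from the printing, which makes the layout easier to adjust.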

You can use itertools.groupby:

l = [['Absard', 140000.0], ['Absard', 150000.0], ['Absard', 150000.0], ['Absard', 173333.3], ['Abuzar', 28333.3], ['Abuzar', 34000.0], ['Abuzar', 90500.0], ['Afsarieh', 37333.3], ['Afsarieh', 44333.3], ['Afsarieh', 51666.6], ['Afsarieh', 55000.0], ['Afsarieh', 80000.0], ['Afsarieh', 105000.0], ['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]
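from itertools import groupby

# groupby yields one group per run of consecutive rows with the same key
for k, g in groupby(l, key=lambda x: x[0]):
    values = [_[1] for _ in g]
    print(k, sum(values) / len(values), sep='\t')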

Absard  153333.325
Abuzar  50944.43333333333
Afsarieh        62222.200000000004
Ahang   36666.6
Air force       55666.65

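This assumes that all occurrences of a key are adjacent. For example, if "Absard" appeared again at the end of the list, you would get 2 separate averages for "Absard". You can guard against this by sorting the list before passing it to itertools.groupby: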

l = sorted(l, key=lambda x: x[0])
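
The effect of the sort can be seen on a small sample. The sketch below uses made-up rows in which "Absard" reappears at the end; `average_by_key` is a hypothetical helper name, and `operator.itemgetter(0)` is used in place of the lambda purely as a matter of taste.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample in which 'Absard' reappears after 'Abuzar'.
rows = [['Absard', 100.0], ['Abuzar', 30.0], ['Abuzar', 50.0], ['Absard', 200.0]]

def average_by_key(pairs):
    """Sort by key, then average each run of equal keys (hypothetical helper)."""
    first = itemgetter(0)
    averages = {}
    for key, group in groupby(sorted(pairs, key=first), key=first):
        values = [value for _, value in group]
        averages[key] = sum(values) / len(values)
    return averages

# Without sorting, groupby reports 'Absard' as two separate runs.
unsorted_runs = [key for key, _ in groupby(rows, key=itemgetter(0))]
averages = average_by_key(rows)

print(unsorted_runs)  # ['Absard', 'Abuzar', 'Absard']
print(averages)       # {'Absard': 150.0, 'Abuzar': 40.0}
```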

You can also convert the data into a dictionary with lists as values, then iterate over the dictionary to get the averages:

data = [['Absard', 140000.0], ['Absard', 150000.0], ['Absard', 150000.0], ['Absard', 173333.3], ['Abuzar', 28333.3], ['Abuzar', 34000.0], ['Abuzar', 90500.0], ['Afsarieh', 37333.3], ['Afsarieh', 44333.3], ['Afsarieh', 51666.6], ['Afsarieh', 55000.0], ['Afsarieh', 80000.0], ['Afsarieh', 105000.0], ['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]
res = {_lst[0]: [] for _lst in data}
for _lst in data:
    name, num = _lst
    res[name].append(num)
res = {k: round(sum(v) / len(v), 2) for k, v in res.items()}
print(res)
{'Absard': 153333.33, 'Abuzar': 50944.43, 'Afsarieh': 62222.2, 'Ahang': 36666.6, 'Air force': 55666.65}

This is not the most efficient solution in terms of the number of iterations, but I hope it is easy to follow.
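
If the extra pass over the data matters, one possible variant keeps a running sum and count per key, so the input is read only once and no per-key lists are stored. This is a sketch rather than a drop-in replacement: `running_average` is a hypothetical name, and the sample is a subset of the data above.

```python
def running_average(pairs):
    """Average values per key in one pass over the input (hypothetical helper)."""
    totals = {}  # name -> [running sum, count]
    for name, value in pairs:
        entry = totals.setdefault(name, [0.0, 0])
        entry[0] += value
        entry[1] += 1
    # Divide each sum by its count and round to 2 decimal places.
    return {name: round(total / count, 2) for name, (total, count) in totals.items()}

# Subset of the data from the question.
sample = [['Ahang', 26666.6], ['Ahang', 46666.6], ['Air force', 55000.0], ['Air force', 56333.3]]
averages = running_average(sample)
print(averages)
```

Unlike itertools.groupby, this does not require equal keys to be adjacent in the input.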
