How to sum dictionary values by key?



I have a list of dictionaries as follows:

data = [{'student_id': '1','mark': 7.8,'course_id': '1',},
{'student_id': '1','mark': 34.8,'course_id': '1'},
{'student_id': '1','mark': 12.8,'course_id': '2'},
{'student_id': '1','mark': 39.0,'course_id': '2'},
{'student_id': '1','mark': 70.2,'course_id': '3'},
{'student_id': '2','mark': 7.8,'course_id': '1'},
{'student_id': '2','mark': 34.8,'course_id': '1'}]

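I want to add up the marks of each student_id for a given course. For example, student 1's total mark for course 1 would be 42.6, and so on. Ideally, I would create a new, clean list containing only each student's total mark for each course.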

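One thing I thought of is writing a loop that adds the marks together whenever the student and course IDs of the previous item match those of the next item: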

for i in range(len(data)-1):
    if data[i]['course_id'] == data[i+1]['course_id'] and data[i]['student_id'] == data[i+1]['student_id']:
        data[i+1]['sum_mark'] = round(float(data[i]['mark'])+float(data[i+1]['mark']),3)

I don't think this is a good way to solve the problem.
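The "compare each item with the next" idea can work if whole runs are accumulated rather than pairs: the loop above only ever adds two neighbours, so a run of three marks would not be fully summed, and the first item of each run gets no sum_mark at all. A minimal sketch of the run-based version, assuming records with the same (student_id, course_id) are already adjacent; sum_adjacent_runs is a hypothetical helper name:

```python
def sum_adjacent_runs(rows):
    """Sum 'mark' over consecutive rows that share (student_id, course_id).

    Assumes rows with the same key are adjacent (e.g. pre-sorted); a key that
    reappears later starts a new, separate entry.
    """
    result = []
    for row in rows:
        key = (row['student_id'], row['course_id'])
        last = result[-1] if result else None
        if last is not None and (last['student_id'], last['course_id']) == key:
            # Same key as the previous row: extend the current run.
            last['sum_mark'] += row['mark']
        else:
            # New key: start a new run in a fresh dict (the input is not modified).
            result.append({'student_id': row['student_id'],
                           'course_id': row['course_id'],
                           'sum_mark': row['mark']})
    return result


data = [{'student_id': '1', 'mark': 7.8, 'course_id': '1'},
        {'student_id': '1', 'mark': 34.8, 'course_id': '1'},
        {'student_id': '1', 'mark': 12.8, 'course_id': '2'},
        {'student_id': '1', 'mark': 39.0, 'course_id': '2'},
        {'student_id': '1', 'mark': 70.2, 'course_id': '3'},
        {'student_id': '2', 'mark': 7.8, 'course_id': '1'},
        {'student_id': '2', 'mark': 34.8, 'course_id': '1'}]

for entry in sum_adjacent_runs(data):
    print(entry)
```

Building new dicts instead of writing sum_mark into the input keeps the original records untouched, but the adjacency assumption remains the weak point of this approach.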

If you use a defaultdict, you can use a (student_id, course_id) tuple as the key and add the marks as you go. If you want a list at the end, it's a simple list comprehension:

from collections import defaultdict

totals = defaultdict(float)
for d in data:
    totals[(d['student_id'], d['course_id'])] += d['mark']

[{'student_id': s_id, 'course_id': c_id, 'total': round(total, 3)}
 for (s_id, c_id), total in totals.items()]

This gives:

[{'student_id': '1', 'course_id': '1', 'total': 42.6},
{'student_id': '1', 'course_id': '2', 'total': 51.8},
{'student_id': '1', 'course_id': '3', 'total': 70.2},
{'student_id': '2', 'course_id': '1', 'total': 42.6}]
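Because the defaultdict version never compares neighbours, the input does not need to be sorted, and the grouping fields can be made parameters. A small sketch of that generalization; sum_by and its parameters are hypothetical names, not a library API:

```python
from collections import defaultdict


def sum_by(records, key_fields, value_field, ndigits=3):
    """Sum value_field per distinct combination of key_fields.

    Input order does not matter; the output keeps the order in which each
    group first appears (dicts preserve insertion order in Python 3.7+).
    """
    totals = defaultdict(float)  # a missing key starts at 0.0
    for rec in records:
        totals[tuple(rec[f] for f in key_fields)] += rec[value_field]
    return [{**dict(zip(key_fields, key)), 'total': round(total, ndigits)}
            for key, total in totals.items()]


data = [{'student_id': '1', 'mark': 7.8, 'course_id': '1'},
        {'student_id': '1', 'mark': 34.8, 'course_id': '1'},
        {'student_id': '1', 'mark': 12.8, 'course_id': '2'},
        {'student_id': '1', 'mark': 39.0, 'course_id': '2'},
        {'student_id': '1', 'mark': 70.2, 'course_id': '3'},
        {'student_id': '2', 'mark': 7.8, 'course_id': '1'},
        {'student_id': '2', 'mark': 34.8, 'course_id': '1'}]

print(sum_by(data, ('student_id', 'course_id'), 'mark'))
# Shuffled input still groups correctly; only the output order changes.
print(sum_by(list(reversed(data)), ('student_id', 'course_id'), 'mark'))
```

Passing ('student_id',) alone would give one total per student across all courses.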

Instead of getting bogged down in low-level Python, you can use the pandas data-manipulation library.

It supports group-by operations such as sum, mean, etc.

Pandas accepts a variety of inputs, including Python dicts, .csv files and many other formats.

data = [{'student_id': '1','mark': 7.8,'course_id': '1',},
{'student_id': '1','mark': 34.8,'course_id': '1'},
{'student_id': '1','mark': 12.8,'course_id': '2'},
{'student_id': '1','mark': 39.0,'course_id': '2'},
{'student_id': '1','mark': 70.2,'course_id': '3'},
{'student_id': '2','mark': 7.8,'course_id': '1'},
{'student_id': '2','mark': 34.8,'course_id': '1'}]
import pandas as pd

df = pd.DataFrame(data)
df.groupby(['student_id', 'course_id']).sum()
# output in IPython or Jupyter:
#                       mark
# student_id course_id
# 1          1          42.6
#            2          51.8
#            3          70.2
# 2          1          42.6

# often teachers/students need an average, not a sum...
df.groupby(['student_id', 'course_id']).mean()
#                       mark
# student_id course_id
# 1          1          21.3
#            2          25.9
#            3          70.2
# 2          1          21.3

If you don't mind sorting the data, you can use itertools.groupby:

data = [
{'student_id': '1', 'mark': 7.8, 'course_id': '1'},
{'student_id': '1', 'mark': 34.8, 'course_id': '1'},
{'student_id': '1', 'mark': 12.8, 'course_id': '2'},
{'student_id': '1', 'mark': 39.0, 'course_id': '2'},
{'student_id': '1', 'mark': 70.2, 'course_id': '3'},
{'student_id': '2', 'mark': 7.8, 'course_id': '1'},
{'student_id': '2', 'mark': 34.8, 'course_id': '1'}
]
def to_summed(data):
    from itertools import groupby
    from operator import itemgetter
    keys = ("student_id", "course_id")
    key = itemgetter(*keys)
    # groupby only merges adjacent items, so sort by the same key first
    for current_key, group in groupby(sorted(data, key=key), key=key):
        sum_mark = sum(map(itemgetter("mark"), group))
        # the dict | dict merge operator requires Python 3.9+
        yield dict(zip(keys, current_key)) | {"sum_mark": sum_mark}

for entry in to_summed(data):
    print(entry)

Output:

{'student_id': '1', 'course_id': '1', 'sum_mark': 42.599999999999994}
{'student_id': '1', 'course_id': '2', 'sum_mark': 51.8}
{'student_id': '1', 'course_id': '3', 'sum_mark': 70.2}
{'student_id': '2', 'course_id': '1', 'sum_mark': 42.599999999999994}
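The sorted() call matters: itertools.groupby only merges consecutive items with equal keys, so unsorted input yields the same key more than once. A small demonstration with hypothetical rows in which a course '2' record sits between two course '1' records:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical unsorted input: the course '1' rows are split by a course '2' row.
rows = [
    {'student_id': '1', 'mark': 7.8, 'course_id': '1'},
    {'student_id': '1', 'mark': 12.8, 'course_id': '2'},
    {'student_id': '1', 'mark': 34.8, 'course_id': '1'},
]
key = itemgetter('student_id', 'course_id')

unsorted_keys = [k for k, _ in groupby(rows, key=key)]
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key=key)]
print(unsorted_keys)  # [('1', '1'), ('1', '2'), ('1', '1')]
print(sorted_keys)    # [('1', '1'), ('1', '2')]
```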

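You can do this fairly easily in stock "low-level" Python by implementing the special __missing__() method on a custom dictionary subclass so that it sets and returns a new instance of whatever container type you want. This approach has been available (and documented) since Python 2.5. Note that a viable and frequently used alternative is the more general collections.defaultdict subclass in the standard library, but since the former is quite easy, I'll demonstrate it that way: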
from pprint import pprint

class CourseMarks(dict):
    def __missing__(self, course_id):
        value = self[course_id] = []
        return value

class StudentCourseMarks(dict):
    def __missing__(self, student_id):
        value = self[student_id] = CourseMarks()
        return value

data = [{'student_id': 'id 1','mark': 7.8,'course_id': 'crs 1',},
{'student_id': 'id 1','mark': 34.8,'course_id': 'crs 1'},
{'student_id': 'id 1','mark': 12.8,'course_id': 'crs 2'},
{'student_id': 'id 1','mark': 39.0,'course_id': 'crs 2'},
{'student_id': 'id 1','mark': 70.2,'course_id': 'crs 3'},
{'student_id': 'id 2','mark': 7.8,'course_id': 'crs 1'},
{'student_id': 'id 2','mark': 34.8,'course_id': 'crs 1'}]
scm = StudentCourseMarks()
for grade in data:
    scm[grade['student_id']][grade['course_id']].append(grade['mark'])
print('Student course marks:')
pprint(scm)
for courses in scm.values():
    for course in courses:
        courses[course] = round(sum(courses[course]), 1)
print()
print('Total marks per student per course:')
pprint(scm, compact=0)

Output:

Student course marks:
{'id 1': {'crs 1': [7.8, 34.8], 'crs 2': [12.8, 39.0], 'crs 3': [70.2]},
 'id 2': {'crs 1': [7.8, 34.8]}}

Total marks per student per course:
{'id 1': {'crs 1': 42.6, 'crs 2': 51.8, 'crs 3': 70.2},
 'id 2': {'crs 1': 42.6}}
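For comparison, the collections.defaultdict alternative mentioned above can build the same nested structure without custom classes, using a lambda as the outer factory. A sketch with the same sample data; the conversion to plain dicts at the end is an added choice for cleaner printing:

```python
from collections import defaultdict
from pprint import pprint

data = [{'student_id': 'id 1', 'mark': 7.8, 'course_id': 'crs 1'},
        {'student_id': 'id 1', 'mark': 34.8, 'course_id': 'crs 1'},
        {'student_id': 'id 1', 'mark': 12.8, 'course_id': 'crs 2'},
        {'student_id': 'id 1', 'mark': 39.0, 'course_id': 'crs 2'},
        {'student_id': 'id 1', 'mark': 70.2, 'course_id': 'crs 3'},
        {'student_id': 'id 2', 'mark': 7.8, 'course_id': 'crs 1'},
        {'student_id': 'id 2', 'mark': 34.8, 'course_id': 'crs 1'}]

# student_id -> course_id -> list of marks; each missing level is created on demand.
scm = defaultdict(lambda: defaultdict(list))
for grade in data:
    scm[grade['student_id']][grade['course_id']].append(grade['mark'])

# Build plain dicts of rounded totals (a defaultdict's repr is noisier to print).
totals = {student: {course: round(sum(marks), 1) for course, marks in courses.items()}
          for student, courses in scm.items()}
pprint(totals)
```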

You can create a temporary dictionary in which you add up the marks, then convert this dictionary into the format you want:

tmp = {}
for d in data:
    tmp.setdefault(d["student_id"], {}).setdefault(d["course_id"], 0)
    tmp[d["student_id"]][d["course_id"]] += d["mark"]
tmp = [
    {"student_id": k, "course_id": kk, "sum_mark": vv}
    for k, v in tmp.items()
    for kk, vv in v.items()
]
print(tmp)

Prints:

[
    {"student_id": "1", "course_id": "1", "sum_mark": 42.599999999999994},
    {"student_id": "1", "course_id": "2", "sum_mark": 51.8},
    {"student_id": "1", "course_id": "3", "sum_mark": 70.2},
    {"student_id": "2", "course_id": "1", "sum_mark": 42.599999999999994},
]
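The 42.599999999999994 above is ordinary binary floating-point rounding (7.8 and 34.8 have no exact binary representation), not a bug in the grouping. If exact decimal totals matter, one option, swapping in the standard decimal module, is to sum Decimal values built from the string form of each mark; otherwise round() for display is enough:

```python
from decimal import Decimal

marks = [7.8, 34.8]
print(sum(marks))                            # binary floating point: 42.599999999999994
print(sum(Decimal(str(m)) for m in marks))   # exact decimal arithmetic: 42.6
print(round(sum(marks), 1))                  # rounded float for display: 42.6
```

Converting via str() matters: Decimal(7.8) would capture the float's binary error exactly instead of the intended 7.8.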

You can also do this easily with the pandas library.

import pandas as pd
df = pd.DataFrame(data)
grouped = df.groupby(["student_id","course_id"]).sum()
new_df = grouped.reset_index()
new_df.to_dict(orient='records')
Output:
[{'course_id': '1', 'mark': 42.599999999999994, 'student_id': '1'},
{'course_id': '2', 'mark': 51.8, 'student_id': '1'},
{'course_id': '3', 'mark': 70.2, 'student_id': '1'},
{'course_id': '1', 'mark': 42.599999999999994, 'student_id': '2'}]

You can use pandas:

import pandas as pd
data = [{'student_id': '1','mark': 7.8,'course_id': '1',},
{'student_id': '1','mark': 34.8,'course_id': '1'},
{'student_id': '1','mark': 12.8,'course_id': '2'},
{'student_id': '1','mark': 39.0,'course_id': '2'},
{'student_id': '1','mark': 70.2,'course_id': '3'},
{'student_id': '2','mark': 7.8,'course_id': '1'},
{'student_id': '2','mark': 34.8,'course_id': '1'}]
df = pd.DataFrame(data)
result = df.groupby(by=["student_id", "course_id"], as_index=False).sum()
print(result)

Output:

  student_id course_id  mark
0          1         1  42.6
1          1         2  51.8
2          1         3  70.2
3          2         1  42.6

See also: Pandas group-by and sum


For completeness: use result.to_dict(orient="records") to convert the result back to a list of dictionaries. (Credit to Priya's answer!)
