I'm working with the Twitch API and ended up with the following dictionaries:
User data
{"data": [
{"display_name": "John", "id": "123"},
{"display_name": "Frank", "id": "456"},
{"display_name": "Billy", "id": "789"}]}
Stream data
{"data": [
{"id": "333444", "user_id": "456", "title": "Franks Stream"},
{"id": "555666", "user_id": "789", "title": "Billys Stream"},
{"id": "111222", "user_id": "123", "title": "Johns Stream"}]}
I can't guarantee that the two lists come back in the same order, so what I want is to combine them using id/user_id:
Merged data
{"data": [
{"id": "333444", "user_id": "456", "title": "Franks Stream", "display_name": "Frank"},
{"id": "555666", "user_id": "789", "title": "Billys Stream", "display_name": "Billy"},
{"id": "111222", "user_id": "123", "title": "Johns Stream", "display_name": "John"}]}
I'd like to merge more fields if possible, but is it possible to join these dictionaries like this?
Setup
user_data = {
"data": [
{"display_name": "John", "id": "123"},
{"display_name": "Frank", "id": "456"},
{"display_name": "Billy", "id": "789"}]}
stream_data = {
"data": [
{"id": "333444", "user_id": "456", "title": "Franks Stream"},
{"id": "555666", "user_id": "789", "title": "Billys Stream"},
{"id": "111222", "user_id": "123", "title": "Johns Stream"}]}
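Solution
Using a dictionary comprehension, you can build a new dictionary of the user data keyed by the id field, then use it to update the stream data.
new_user_data = {row.get("id"): row.get('display_name')
                 for row in user_data['data']}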
>>> new_user_data
{'123': 'John', '456': 'Frank', '789': 'Billy'}
Now use this dictionary to set display_name in the stream data (you may want to make a copy first if you don't want to modify the original data).
for row in stream_data['data']:
    id_ = row.get('user_id')
    row['display_name'] = new_user_data[id_]
>>> stream_data['data']
[{'display_name': 'Frank',
'id': '333444',
'title': 'Franks Stream',
'user_id': '456'},
{'display_name': 'Billy',
'id': '555666',
'title': 'Billys Stream',
'user_id': '789'},
{'display_name': 'John',
'id': '111222',
'title': 'Johns Stream',
'user_id': '123'}]
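If modifying stream_data in place is not acceptable, one option is to work on a deep copy. The following is a minimal sketch under that assumption, using the sample data from the setup; the names names_by_id and merged are illustrative and do not come from the original answer.

```python
import copy

user_data = {"data": [
    {"display_name": "John", "id": "123"},
    {"display_name": "Frank", "id": "456"},
    {"display_name": "Billy", "id": "789"}]}
stream_data = {"data": [
    {"id": "333444", "user_id": "456", "title": "Franks Stream"},
    {"id": "555666", "user_id": "789", "title": "Billys Stream"},
    {"id": "111222", "user_id": "123", "title": "Johns Stream"}]}

# Lookup table: user id -> display name (illustrative name)
names_by_id = {row["id"]: row["display_name"] for row in user_data["data"]}

# Deep copy so the rows inside stream_data are left untouched
merged = copy.deepcopy(stream_data)
for row in merged["data"]:
    row["display_name"] = names_by_id[row["user_id"]]
```

A shallow copy would not be enough here, because the row dictionaries inside the list would still be shared with stream_data.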
If you have several pieces of user data to store as the value (i.e. not just id as the key and display_name as the value), you can use a namedtuple to hold them.
user_data = {
"data": [
{"display_name": "John", "id": "123", "description": "boring"},
{"display_name": "Frank", "id": "456", "description": "smart"},
{"display_name": "Billy", "id": "789", "description": "funny"}]}
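from collections import namedtuple
UserData = namedtuple('UserData', ('display_name', 'description'))
# Build a fresh lookup table: user id -> UserData
new_user_data = {}
for row in user_data['data']:
    fields = dict(row)  # copy so that user_data itself is not modified
    id_ = fields.pop('id')
    new_user_data[id_] = UserData(**fields)
for row in stream_data['data']:
    id_ = row.get('user_id')
    # _asdict() turns the namedtuple back into a dict of its fields
    row.update(new_user_data[id_]._asdict())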
>>> new_user_data
{'123': UserData(display_name='John', description='boring'),
'456': UserData(display_name='Frank', description='smart'),
'789': UserData(display_name='Billy', description='funny')}
>>> stream_data
{'data': [
{'description': 'smart',
'display_name': 'Frank',
'id': '333444',
'title': 'Franks Stream',
'user_id': '456'},
{'description': 'funny',
'display_name': 'Billy',
'id': '555666',
'title': 'Billys Stream',
'user_id': '789'},
{'description': 'boring',
'display_name': 'John',
'id': '111222',
'title': 'Johns Stream',
'user_id': '123'}]}
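When the set of user fields is not known in advance, a namedtuple with a fixed field list may be awkward. As a rough alternative, the sketch below copies every user field except id into the matching stream row using plain dictionaries. The function name merge_user_fields is hypothetical, and the sketch assumes every user_id in the stream data has a matching user.

```python
def merge_user_fields(stream_data, user_data):
    """Return a new dict in which each stream row gains its user's fields."""
    # Lookup table: user id -> all other user fields
    users_by_id = {
        row["id"]: {k: v for k, v in row.items() if k != "id"}
        for row in user_data["data"]
    }
    # Build new rows; on a name clash the user field wins over the stream field
    return {"data": [
        {**stream, **users_by_id[stream["user_id"]]}
        for stream in stream_data["data"]
    ]}


user_data = {"data": [
    {"display_name": "John", "id": "123", "description": "boring"},
    {"display_name": "Frank", "id": "456", "description": "smart"},
    {"display_name": "Billy", "id": "789", "description": "funny"}]}
stream_data = {"data": [
    {"id": "333444", "user_id": "456", "title": "Franks Stream"},
    {"id": "555666", "user_id": "789", "title": "Billys Stream"},
    {"id": "111222", "user_id": "123", "title": "Johns Stream"}]}

result = merge_user_fields(stream_data, user_data)
```

Neither input is modified, and the stream's own id is preserved because the user's id is dropped before merging.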
You can use itertools.groupby:
import itertools
from functools import reduce
d = {'data': [{'display_name': 'John', 'id': '123'}, {'display_name': 'Frank', 'id': '456'}, {'display_name': 'Billy', 'id': '789'}]}
d1 = {'data': [{'id': '333444', 'user_id': '456', 'title': 'Franks Stream'}, {'id': '555666', 'user_id': '789', 'title': 'Billys Stream'}, {'id': '111222', 'user_id': '123', 'title': 'Johns Stream'}]}
combined = sorted(d['data']+d1['data'], key=lambda x:x.get('user_id', x['id']))
new_data = [list(b) for _, b in itertools.groupby(combined, key=lambda x:x.get('user_id', x['id']))]
final_results = {'data':[reduce(lambda x, y:{**x, **y}, i) for i in new_data]}
Output:
{'data': [
{'display_name': 'John', 'id': '111222', 'user_id': '123', 'title': 'Johns Stream'},
{'display_name': 'Frank', 'id': '333444', 'user_id': '456', 'title': 'Franks Stream'},
{'display_name': 'Billy', 'id': '555666', 'user_id': '789', 'title': 'Billys Stream'}
]}
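Let's use pandas here (d1 is the user data and d2 is the stream data):
import pandas as pd
df2 = pd.DataFrame(d2['data'])
# Rename the user table's 'id' so both frames share the 'user_id' key
df1 = pd.DataFrame(d1['data']).rename(columns={'id': 'user_id'})
{'data': df1.merge(df2).to_dict('records')}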
Out[150]:
{'data': [{'display_name': 'John',
'id': '111222',
'title': 'Johns Stream',
'user_id': '123'},
{'display_name': 'Frank',
'id': '333444',
'title': 'Franks Stream',
'user_id': '456'},
{'display_name': 'Billy',
'id': '555666',
'title': 'Billys Stream',
'user_id': '789'}]}
You can first convert the user data dictionary into an id to display_name mapping:
u = {"data": [
{"display_name": "John", "id": "123"},
{"display_name": "Frank", "id": "456"},
{"display_name": "Billy", "id": "789"}]}
s = {"data": [
{"id": "333444", "user_id": "456", "title": "Franks Stream"},
{"id": "555666", "user_id": "789", "title": "Billys Stream"},
{"id": "111222", "user_id": "123", "title": "Johns Stream"}]}
# Map each user id to its display name (does not depend on key order or count)
m = {i['id']: i['display_name'] for i in u['data']}
combined = {'data': [{**i, 'display_name': m[i['user_id']]} for i in s['data']]}
combined
becomes:
{'data': [{'id': '333444', 'user_id': '456', 'title': 'Franks Stream', 'display_name': 'Frank'}, {'id': '555666', 'user_id': '789', 'title': 'Billys Stream', 'display_name': 'Billy'}, {'id': '111222', 'user_id': '123', 'title': 'Johns Stream', 'display_name': 'John'}]}
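The lookup above raises a KeyError if a stream refers to a user that is missing from the user data. A possible way to tolerate that, sketched here with dict.get, is to fall back to None. The fourth stream row with user_id "000" is invented purely to illustrate the missing-user case, and names_by_id is an illustrative name.

```python
users = {"data": [
    {"display_name": "John", "id": "123"},
    {"display_name": "Frank", "id": "456"},
    {"display_name": "Billy", "id": "789"}]}
streams = {"data": [
    {"id": "333444", "user_id": "456", "title": "Franks Stream"},
    {"id": "555666", "user_id": "789", "title": "Billys Stream"},
    {"id": "111222", "user_id": "123", "title": "Johns Stream"},
    # Hypothetical row: no user with id "000" exists in users
    {"id": "777888", "user_id": "000", "title": "Unknown Stream"}]}

names_by_id = {row["id"]: row["display_name"] for row in users["data"]}

# .get returns None instead of raising when the user id is unknown
combined = {"data": [
    {**row, "display_name": names_by_id.get(row["user_id"])}
    for row in streams["data"]
]}
```

Whether None, a placeholder string, or dropping the row is the right fallback depends on how the merged data is used afterwards.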