我使用python3和我有数据集。它包含以下数据。我试图从这个数据列表中得到期望值。我已经尝试了很多方法,但就是想不出该怎么做。
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":525,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":524,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":553,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":550,
"user_id":2,
"time":"199322002",
"expire":"199322002"
}
]
# Desired output
# [
# {"user_id":1,"slots_ids":[551,552,553]}
# {"user_id":2,"slots_ids":[550]}
# {"user_id":3,"slots_ids":[524,525]}
# ]
我已经尝试了以下方式,显然这是不正确的。我想不出这个问题的解决方法:
final_list = []
for item in slots_data:
obj = obj.dict()
obj = {
"user_id":item["user_id"],
"slot_ids":item["id"]
}
final_list.append(obj)
print(set(final_list))
这里添加的另一个答案有一个很好的解决方案,但这里没有使用pandas
:
users = {}
for item in slots_data:
# Check if we've seen this user before,
if item['user_id'] not in users:
# if not, create a new entry for them
users[item['user_id']] = {'user_id': item['user_id'], 'slot_ids': []}
# Add their slot ID to their dictionary
users[item['user_id']]['slot_ids'].append(item['id'])
# We only need the values (dicts)
output_list = list(users.values())
这里有很多好的答案。
如果我这样做,我会基于我的答案setdefault
和/或collections.defaultdict
,可以以类似的方式使用。我认为defaultdict
版本是非常可读的,但如果你还没有导入集合,你可以没有它。
给定你的数据:
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
#....
]
你可以通过:
将它重塑成你想要的输出## -------------------
## get the value for the key user_id if it exists
## if it does not, set the value for that key to a default
## use the value to append the current id to the sub-list
## -------------------
reshaped = {}
for slot in slots_data:
user_id = slot["user_id"]
id = slot["id"]
reshaped.setdefault(user_id, []).append(id)
## -------------------
## -------------------
## take a second pass to finish the shaping in a sorted manner
## -------------------
reshaped = [
{
"user_id": user_id,
"slots_ids": sorted(reshaped[user_id])
}
for user_id
in sorted(reshaped)
]
## -------------------
print(reshaped)
这将给你:
[
{'user_id': 1, 'slots_ids': [551, 552, 553]},
{'user_id': 2, 'slots_ids': [550]},
{'user_id': 3, 'slots_ids': [524, 525]}
]
我会说尝试使用pandas
将用户id分组在一起并将其转换回字典
pd.DataFrame(slots_data).groupby('user_id')['id'].agg(list).reset_index().to_dict('records')
[{'user_id': 1, 'id': [551, 552, 553]},
{'user_id': 2, 'id': [550]},
{'user_id': 3, 'id': [525, 524]}]
through just simple loop way
>>> result = {}
>>> for i in slots_data:
... if i['user_id'] not in result:
... result[i['user_id']] = []
... result[i['user_id']].append(i['id'])
...
>>> output = []
>>> for i in result:
... dict_obj = dict(user_id=i, slots_id=result[i])
... output.append(dict_obj)
...
>>> output
[{'user_id': 1, 'slots_id': [551, 552, 553]}, {'user_id': 3, 'slots_id': [525, 524]}, {'user_id': 2, 'slots_id': [550]}]
您可以使用以下方法来完成它。纯Python。没有任何依赖。
slots_data = [
{
"id":551,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":552,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":525,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":524,
"user_id":3,
"time":"199322002",
"expire":"199322002"
},
{
"id":553,
"user_id":1,
"time":"199322002",
"expire":"199322002"
},
{
"id":550,
"user_id":2,
"time":"199322002",
"expire":"199322002"
}
]
user_wise_slots = {}
for slot_detail in slots_data:
if not slot_detail["user_id"] in user_wise_slots:
user_wise_slots[slot_detail["user_id"]] = {
"user_id": slot_detail["user_id"],
"slot_ids": []
}
user_wise_slots[slot_detail["user_id"]]["slot_ids"].append(slot_detail["id"])
print(user_wise_slots.values())
这可以在using listcomprehension语句中实现:
final_list = [{"user_id": user_id, "id":sorted([slot["id"] for slot in slots_data if slot["user_id"] == user_id])} for user_id in sorted(set([slot["user_id"] for slot in slots_data]))]
相同代码的更详细、格式更好的版本:
all_user_ids = [slot["user_id"] for slot in slots_data]
unique_user_ids = sorted(set(all_user_ids))
final_list = [
{
"user_id": user_id,
"id": sorted([slot["id"] for slot in slots_data if slot["user_id"] == user_id])
}
for user_id in unique_user_ids]
解释:
- 使用列表推导式获取所有用户id
- 通过创建集 获得唯一的用户id
- 使用列表推导式创建字典的最终列表。
- 每个字段
id
本身是一个具有列表推导式的列表。我们获取槽位的id,只有当用户id匹配 时才将其添加到列表中。
使用pandas可以很容易地实现这个结果。如果没有下列文件,请先安装pandas
pip install pandas
import pandas as pd
df = pd.DataFrame(slots_data) #create dataframe
df1 = df.groupby("user_id")['id'].apply(list).reset_index(name="slots_ids") #groupby on user_id and combine elements of id in list and give the column name is slots_ids
final_slots_data = df1.to_dict('records') # convert dataframe into a list of dictionary
final_slots_data
输出:
[{'user_id': 1, 'slots_ids': [551, 552, 553]},
{'user_id': 2, 'slots_ids': [550]},
{'user_id': 3, 'slots_ids': [525, 524]}]