Python: parse JSON lines, find the rows that match on a few keys, and return each matching set as a single JSON record



I have JSON data like the following (one JSON object per line).

{"processId":"p1","userId":"user1","reportName":"report1","threadId": "12234", "some_other_keys":"respective values"}
{"userId":"user1","processId":"p1","reportName":"report1","threadId":"12335", "some_other_keys":"respective values"}
{"reportName":"report2","processId":"p1","userId":"user1","threadId":"12434", "some_other_keys":"respective values"}
{"threadId":"12734", "some_other_keys":"respective values", "processId":"p1","userId":"user2","reportName":"report1"}
{"processId":"p1","reportName":"report1","threadId":"12534", "some_other_keys":"respective values","userId":"user2"}
{"processId":"p1","userId":"user1","reportName":"report2","threadId":"12934", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12834", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report2","threadId":"12634", "some_other_keys":"respective values"}

Goal: write a function that returns every distinct set of lines sharing the same values for "processId", "userId", and "reportName" as a single JSON record per set, with the remaining keys renamed as shown below.

In the example above, there are three matching sets.

Set 1 (for "processId":"p1","userId":"user1","reportName":"report1"):

{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12234", "some_other_keys":"respective values"}
{"userId":"user1","processId":"p1","reportName":"report1","threadId":"12335", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12834", "some_other_keys":"respective values"}

Set 2 (for "processId":"p1","userId":"user1","reportName":"report2"):

{"reportName":"report2","processId":"p1","userId":"user1","threadId":"12434", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report2","threadId":"12934", "some_other_keys":"respective values"}
{"processId":"p1","userId":"user1","reportName":"report2","threadId":"12634", "some_other_keys":"respective values"}

Set 3 (for "processId":"p1","userId":"user2","reportName":"report1"):

{"threadId":"12734", "some_other_keys":"respective values", "processId":"p1","userId":"user2","reportName":"report1"}
{"processId":"p1","reportName":"report1","threadId":"12534", "some_other_keys":"respective values","userId":"user2"}

So for this particular example, the function should return three distinct records, as shown below.

Set 1 (for "processId":"p1","userId":"user1","reportName":"report1"): {"processId":"p1","userId":"user1","reportName":"report1","threadId_1":"12234", "some_other_keys_1":"respective values", "threadId_2":"12335", "some_other_keys_2":"respective values", "threadId_3":"12834", "some_other_keys_3":"respective values"}

Set 2 (for "processId":"p1","userId":"user1","reportName":"report2"): {"processId":"p1","userId":"user1","reportName":"report2","threadId_1":"12934", "some_other_keys_1":"respective values","threadId_2":"12434", "some_other_keys_2":"respective values","threadId_3":"12634", "some_other_keys_3":"respective values"}

Set 3 (for "processId":"p1","userId":"user2","reportName":"report1"): {"threadId_1":"12734", "some_other_keys_1":"respective values", "processId":"p1","userId":"user2","reportName":"report1","threadId_2":"12534", "some_other_keys_2":"respective values"}

So the function returns three records here (the count could be more or fewer, depending on how many matching sets exist).

I need a solution to the above that is (a) performance-efficient and (b) short, because I will be processing a very large number of lines. I want the code to run fast and stay compact. Here is my attempt:

import json

# The input is one JSON object per line (NDJSON), so parse each line
# individually instead of calling json.load() on the whole file.
with open('data.json') as f:
    data = [json.loads(line) for line in f if line.strip()]

sets_of_processes = dict()
for item in data:
    set_id = (item['processId'], item['userId'], item['reportName'])
    if set_id not in sets_of_processes:
        sets_of_processes[set_id] = []
    thread_number = len(sets_of_processes[set_id]) + 1
    thread_data = {f'threadId_{thread_number}': item['threadId'],
                   f'some_other_keys_{thread_number}': item['some_other_keys']}
    sets_of_processes[set_id].append(thread_data)

for i, process_set in enumerate(sets_of_processes):
    print(f'Set {i + 1}:\n')
    processId, userId, reportName = process_set
    json_dict = {'processId': processId, 'userId': userId, 'reportName': reportName}
    for item in sets_of_processes[process_set]:
        json_dict.update(item)
    print(json.dumps(json_dict))
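A more compact grouping approach is a single pass with collections.defaultdict. This is a sketch, not the only answer: the function name group_records is my own, it reads an iterable of JSON strings rather than a file, and it assumes every record contains the three grouping keys. Unlike the attempt above, it suffixes every non-key field, so extra fields beyond threadId and some_other_keys are also carried over.

```python
import json
from collections import defaultdict

GROUP_KEYS = ('processId', 'userId', 'reportName')

def group_records(lines):
    """Group JSON lines by (processId, userId, reportName), merging each
    group into one record whose non-key fields get an _N suffix."""
    groups = defaultdict(list)
    for line in lines:
        rec = json.loads(line)
        groups[tuple(rec[k] for k in GROUP_KEYS)].append(rec)

    results = []
    for key, recs in groups.items():
        merged = dict(zip(GROUP_KEYS, key))
        for n, rec in enumerate(recs, 1):  # number records within the group
            for k, v in rec.items():
                if k not in GROUP_KEYS:
                    merged[f'{k}_{n}'] = v
        results.append(merged)
    return results

lines = [
    '{"processId":"p1","userId":"user1","reportName":"report1","threadId":"12234","some_other_keys":"v"}',
    '{"userId":"user1","processId":"p1","reportName":"report1","threadId":"12335","some_other_keys":"v"}',
    '{"processId":"p1","userId":"user2","reportName":"report1","threadId":"12734","some_other_keys":"v"}',
]
for record in group_records(lines):
    print(json.dumps(record))
```

For a large file you would pass the open file handle itself as `lines`, so nothing but the grouped output is held in memory beyond the groups dict; dict insertion order (Python 3.7+) keeps the sets in first-seen order.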
