Create a single dictionary from multiple JSON files, without repetition



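I have a set of 3 JSON files that all share the same layout. There will be more of them once the code goes to production; I'm only using 3 to keep the workflow fast.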

The JSON files are structured as follows:

{
    "results": [
        {
            "engagement": {
                "id": 2342,
                "portalId": 23423,
                "active": true,
                "createdAt": 1661855667536,
                "lastUpdated": 1661935264761,
                "modifiedBy": 3453
            },
            "associations": {
                "contactIds": [
                    00000
                ],
                "companyIds": [],
                "dealIds": []
            },
            "attachments": [],
            "scheduledTasks": [],
            "metadata": {
                "status": "COMPLETED",
                "forObjectType": "CONTACT",
                "subject": "DEMO"
            }
        },
    ],
    "hasMore": true,
    "offset": 520,
    "total": 10523
}

The "results" header can hold up to 250 records, each starting with "engagement".

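I'm trying to find a way to merge all 3 JSON files with Python so that I keep only the data inside "results" and drop the rest.

So far I can either add all 3 JSONs together, in which case they are still separated by their individual "results" headers, or the last JSON overwrites the file created before it, which gets me no further.

The expected result looks like this: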

[
    {
        "engagement": {
            "id": 2342,
            "portalId": 23423,
            "active": true,
            "createdAt": 1661855667536,
            "lastUpdated": 1661935264761,
            "modifiedBy": 3453
        },
        "associations": {
            "contactIds": [
                00000
            ],
            "companyIds": [],
            "dealIds": []
        },
        "attachments": [],
        "scheduledTasks": [],
        "metadata": {
            "status": "COMPLETED",
            "forObjectType": "CONTACT",
            "subject": "DEMO"
        }
    },
],
[
    {
        "engagement": {
            "id": 2342,
            "portalId": 23423,
            "active": true,
            "createdAt": 1661855667536,
            "lastUpdated": 1661935264761,
            "modifiedBy": 3453
        },
        "associations": {
            "contactIds": [
                00000
            ],
            "companyIds": [],
            "dealIds": []
        },
        "attachments": [],
        "scheduledTasks": [],
        "metadata": {
            "status": "COMPLETED",
            "forObjectType": "CONTACT",
            "subject": "DEMO"
        }
    },
],
[
    {
        "engagement": {
            "id": 2342,
            "portalId": 23423,
            "active": true,
            "createdAt": 1661855667536,
            "lastUpdated": 1661935264761,
            "modifiedBy": 3453
        },
        "associations": {
            "contactIds": [
                00000
            ],
            "companyIds": [],
            "dealIds": []
        },
        "attachments": [],
        "scheduledTasks": [],
        "metadata": {
            "status": "COMPLETED",
            "forObjectType": "CONTACT",
            "subject": "DEMO"
        }
    },
],

Any help is much appreciated.

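This is relatively simple, but I would restructure the resulting JSON a little, because the structure you are asking for doesn't make much sense.

The code below simply loads each file and adds all the list elements from its results dict to the final_result list. You end up with a single list in which every element is one of the records you need from the original JSON files.

That list is then saved to a new file.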

import json

filelist = ["file1.json", "file2.json", "file3.json"]
final_result = []

for filename in filelist:
    with open(filename) as infile:
        newdata = json.load(infile)
    # keep only the records under "results"; hasMore, offset and total are dropped
    final_result.extend(newdata["results"])

# write the merged list once, after all files have been read
with open("result.json", "w") as outfile:
    json.dump(final_result, outfile, indent=4)

result.json

[
    {
        "engagement": {
            "id": 1,
            "portalId": 23423,
            "active": true,
            "createdAt": 1661855667536,
            "lastUpdated": 1661935264761,
            "modifiedBy": 3453
        },
        "associations": {
            "contactIds": [
                21345
            ],
            "companyIds": [],
            "dealIds": []
        },
        "attachments": [],
        "scheduledTasks": [],
        "metadata": {
            "status": "COMPLETED",
            "forObjectType": "CONTACT",
            "subject": "DEMO"
        }
    },
    {
        "engagement": {
            "id": 2,
            "portalId": 23423,
            "active": true,
            "createdAt": 1661855667536,
            "lastUpdated": 1661935264761,
            "modifiedBy": 3453
        },
        "associations": {
            "contactIds": [
                21345
            ],
            "companyIds": [],
            "dealIds": []
        },
        "attachments": [],
        "scheduledTasks": [],
        "metadata": {
            "status": "COMPLETED",
            "forObjectType": "CONTACT",
            "subject": "DEMO"
        }
    },
    {
        "engagement": {
            "id": 3,
            "portalId": 23423,
            "active": true,
            "createdAt": 1661855667536,
            "lastUpdated": 1661935264761,
            "modifiedBy": 3453
        },
        "associations": {
            "contactIds": [
                21345
            ],
            "companyIds": [],
            "dealIds": []
        },
        "attachments": [],
        "scheduledTasks": [],
        "metadata": {
            "status": "COMPLETED",
            "forObjectType": "CONTACT",
            "subject": "DEMO"
        }
    }
]
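
As a side note on the restructuring: the output asked for in the question keeps one list per file, while the flat list comes from using `list.extend`. Swapping in `list.append` would give the list-of-lists shape instead. A minimal sketch of the difference, using small made-up in-memory data (the `pages` variable is hypothetical and stands in for the parsed files):

```python
# Hypothetical stand-ins for the parsed contents of three JSON files.
pages = [
    {"results": [{"engagement": {"id": 1}}], "hasMore": True},
    {"results": [{"engagement": {"id": 2}}], "hasMore": True},
    {"results": [{"engagement": {"id": 3}}], "hasMore": False},
]

flat = []    # one list holding every record
nested = []  # one inner list per file

for page in pages:
    flat.extend(page["results"])    # adds each record individually
    nested.append(page["results"])  # adds the whole list as a single element

print(len(flat), flat[0])
print(len(nested), nested[0])
```

The flat version is usually easier to work with afterwards, since every element is a record rather than a list of records.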

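For getting the files from a directory, I have this function. It takes a file path and an optional file extension, and returns a list of file names that you can use with the code above. If you need files from more than one directory, you can extend the list of file names as shown below.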

import os
from typing import List, Optional, Union


def get_files_from_path(path: str = ".", ext: Optional[Union[str, List[str]]] = None) -> List[str]:
    """Find files in path and return them as a list.

    Gets all files in folders and subfolders.
    See the answer on the link below for a ridiculously
    complete answer for this.
    https://stackoverflow.com/a/41447012/9267296

    Args:
        path (str, optional): Which path to start on.
            Defaults to '.'.
        ext (str/list, optional): Optional file extension.
            Defaults to None.

    Returns:
        list: list of full file paths
    """
    # Normalize ext to a tuple of lowercase suffixes; None means "match every file".
    if ext is None:
        suffixes = None
    elif isinstance(ext, str):
        suffixes = (ext.lower(),)
    else:
        suffixes = tuple(item.lower() for item in ext)

    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            # str.endswith accepts a tuple, so a file is added at most once
            # even if it matches several of the given extensions.
            if suffixes is None or fname.lower().endswith(suffixes):
                result.append(os.path.join(subdir, fname))
    return result

filelist = get_files_from_path("your/path/here/", ext=".json")
filelist.extend(get_files_from_path("another/path/here/", ext=".json"))
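
For comparison, the same "walk the folders, then merge" idea can be sketched with the standard library's `pathlib`. This is only an illustration under some assumptions: `merge_results` is a hypothetical helper name, the demo files are throwaway files created in a temporary directory, and `rglob("*.json")` may match case-sensitively depending on the platform, unlike the lowercase comparison in the function above.

```python
import json
import tempfile
from pathlib import Path


def merge_results(root, pattern="*.json"):
    """Collect the "results" lists of all matching files under root."""
    merged = []
    # rglob searches root and all of its subfolders; sorting keeps the order stable.
    for json_path in sorted(Path(root).rglob(pattern)):
        with json_path.open() as infile:
            merged.extend(json.load(infile)["results"])
    return merged


# Small demo with throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "sub"
    sub.mkdir()
    (Path(tmp) / "a.json").write_text(json.dumps({"results": [{"id": 1}], "hasMore": True}))
    (sub / "b.json").write_text(json.dumps({"results": [{"id": 2}], "hasMore": False}))
    demo = merge_results(tmp)

print(demo)
```

Calling `merge_results` once per directory and extending a shared list would cover the multi-directory case in the same way as the two `get_files_from_path` calls.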

With help from @Edo Aske, I found a solution to this problem. The final code is as follows:

import json
import os

path = '/content/extracted_data/'
json_files = [jfile for jfile in os.listdir(path) if jfile.endswith('.json')]
final_result = []

for filename in json_files:
    with open(os.path.join(path, filename)) as infile:
        newdata = json.load(infile)
    # keep only the list of records stored under the "results" key
    final_result.extend(newdata["results"])

with open("result.json", "w") as outfile:
    json.dump(final_result, outfile, indent=4)

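As a result, the records from all the JSON files end up as separate dicts in one list, and from there we can easily load them into a DataFrame with pd.json_normalize.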
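
A rough sketch of that last step, assuming pandas is installed. The two records below are made up and trimmed down; they only mimic the shape of the merged "results" list.

```python
import pandas as pd

# Made-up records in the same shape as the merged "results" list.
final_result = [
    {
        "engagement": {"id": 1, "active": True},
        "associations": {"contactIds": [21345], "companyIds": []},
        "metadata": {"status": "COMPLETED", "subject": "DEMO"},
    },
    {
        "engagement": {"id": 2, "active": True},
        "associations": {"contactIds": [21345], "companyIds": []},
        "metadata": {"status": "COMPLETED", "subject": "DEMO"},
    },
]

# Nested dicts become dotted column names such as "engagement.id";
# list values (e.g. contactIds) stay as lists inside a single column.
df = pd.json_normalize(final_result)
print(df.columns.tolist())
```

Each record becomes one row, so the row count of the DataFrame should match the length of the merged list.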

Thank you all for the help!
