Merge/update JSON objects from two different JSON files



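I have two JSON files, both containing the same number of objects, and every object has an ID key "DOCN". The problem is that some objects have different keys: for example, in file1 the object with "DOCN": "000093019" has 4 keys, while the same object in file2 has 5.

I'm trying to create a new file in which each object contains the keys from both files (find the keys missing in file1 and file2 and add them to the object).

Example: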

file1:

[
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla"
},
{
"DOCN": "000093085",
"B": "blabla",
"C": "blabla",
"D": "blabla"
}
]

file2:

[
{
"DOCN": "000093019",
"A": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"A": "blabla",
"B": "blabla",
"C": "blabla"
}
]

What I want to achieve, file3:

[
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"A": "blabla",
"B": "blabla",
"C": "blabla",
"D": "blabla"
}
]

I would read them into two separate arrays and map them into a new array.

// read file1 instead using `fs`
const arr1 = [
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla"
},
{
"DOCN": "000093085",
"B": "blabla",
"C": "blabla",
"D": "blabla"
}
]
// read file2 instead
const arr2 = [
{
"DOCN": "000093019",
"A": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"A": "blabla",
"B": "blabla",
"C": "blabla"
}
]
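// For each object in arr1, find the object in arr2 with the same DOCN and merge the two.
// Keys from arr2 win on conflict; spreading `undefined` (no match) adds nothing.
const arr3 = arr1.map(x => {
  const val = arr2.find(y => y.DOCN === x.DOCN)
  return { ...x, ...val }
})
// write arr3 to new file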

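This is a simple operation on dictionaries. I can't promise it will perform best on a large dataset, but you can merge the dictionaries on the key "DOCN". (Maybe there's a better way! ;-))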

f1 = [
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla"
},
{
"DOCN": "000093085",
"B": "blabla",
"C": "blabla",
"D": "blabla"
}
]
f2 = [
{
"DOCN": "000093019",
"A": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"A": "blabla",
"B": "blabla",
"C": "blabla"
}
]
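# Index each list by its "DOCN" value
f1 = {item["DOCN"]: item for item in f1}
f2 = {item["DOCN"]: item for item in f2}
# Union of IDs in first-seen order (a plain set would not preserve order)
keys = list(dict.fromkeys([*f1, *f2]))
output = []
for key in keys:
    # Fall back to {} so an ID present in only one file does not raise
    output.append({**f1.get(key, {}), **f2.get(key, {})})
print(output)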

The output is:

[
{
"DOCN": "000093019",
"A": "blabla",
"B": "blabla",
"C": "blabla",
"D": "blabla",
"E": "blabla"
},
{
"DOCN": "000093085",
"B": "blabla",
"C": "blabla",
"D": "blabla",
"A": "blabla"
}
]
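
For completeness, here is a minimal sketch of the same dictionary merge done end to end, reading the two files and writing the third. The function name `merge_by_docn` and the file names in the usage line are placeholders of mine, not something from the question, and it assumes each file holds a JSON array of objects that all carry the "DOCN" key.

```python
import json
from pathlib import Path


def merge_by_docn(path1, path2, out_path, key="DOCN"):
    """Merge the objects of two JSON array files on `key` and write the result."""
    records1 = json.loads(Path(path1).read_text(encoding="utf-8"))
    records2 = json.loads(Path(path2).read_text(encoding="utf-8"))

    merged = {}  # dicts keep insertion order, so IDs appear in first-seen order
    for record in [*records1, *records2]:
        # setdefault creates an empty object the first time an ID is seen;
        # update() then adds the record's keys (the later file wins on conflict).
        merged.setdefault(record[key], {}).update(record)

    result = list(merged.values())
    Path(out_path).write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result
```

Usage would look like `merge_by_docn("file1.json", "file2.json", "file3.json")`. Because it builds one dictionary keyed by ID, each record is visited once, which should scale better than a nested search when the files are large.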

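I would do it this way: load the 2 files with pandas, concat the dataframes, group by DOCN and take the first record (this picks the first non-null value in each column), then convert the result to a list of dicts and drop the null entries: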

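import pandas as pd

# Force DOCN to str so values like "000093019" keep their leading zeros
df1 = pd.read_json("my_file1.json", dtype={"DOCN": str})
df2 = pd.read_json("my_file2.json", dtype={"DOCN": str})
df = pd.concat([df1, df2])
# first() takes the first non-null value of each column within a DOCN group
grp = df.groupby("DOCN").first().reset_index()
# Drop keys that are still missing (NaN/None) after the merge
[{k: v for k, v in record.items() if pd.notna(v)} for record in grp.to_dict(orient='records')]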
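
One difference worth noting: when both files hold a value for the same key, the dictionary merge with `{**a, **b}` lets the second file win, while `groupby(...).first()` keeps the first non-null value, so the first file wins. The sketch below reproduces that "first non-null wins" rule with plain dictionaries, for cases where pandas is not available. The helper name `merge_first_non_null` is my own, and it assumes every record has the "DOCN" key.

```python
def merge_first_non_null(*record_lists, key="DOCN"):
    """Merge records sharing `key`; for each field the first non-None value wins."""
    merged = {}  # ID -> merged object, in first-seen order
    for records in record_lists:
        for record in records:
            target = merged.setdefault(record[key], {})
            for field, value in record.items():
                # Only fill a field that is absent (or None) so far,
                # and never store None, so null entries are dropped.
                if target.get(field) is None and value is not None:
                    target[field] = value
    return list(merged.values())
```

With the sample data from the question both rules give the same result, because the overlapping keys hold identical values; the choice only matters when the two files disagree.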