Combining JSON from multiple rows in Pandas



I have a DataFrame in the following format (Table 1). It has only col1 and col2, plus json_col.

id    col1   col2   json_col
1     a      b        json1
2     a      c        json2
3     b      d        json3
4     c      a        json4
5     d      e        json5

I have a new table (Table 2), and I want to add the combined JSON to it:

col1   col2   col3  col4  union_json
a      b                 json1
a      b      d          json1 and json3 union
a      b      d     e    json1, json3, and json5 union 
c      a                 json4

Here is an example of Table 1:

import pandas as pd

df1 = pd.DataFrame({'col1': ['a', 'a', 'b', 'c', 'd'],
                    'col2': ['b', 'c', 'd', 'a', 'e'],
                    'col3': [{"origin":"a","destination":"b", "arc":[{"Type":"763","Number":"20"}]},
                             {"origin":"a","destination":"c", "arc":[{"Type":"763","Number":"50"}]},
                             {"origin":"a","destination":"d", "arc":[{"Type":"723","Number":"40"}]},
                             {"origin":"c","destination":"a", "arc":[{"Type":"700","Number":"30"}]},
                             {"origin":"d","destination":"e", "arc":[{"Type":"700","Number":"40"}]}]})

Here is an example of Table 2:

df2 = pd.DataFrame({'col1': ['a', 'a', 'a', 'c'],
                    'col2': ['b', 'b', 'b', 'a'],
                    'col3': ['', 'd', 'd', ''],
                    'col4': ['', '', 'e', '']})

The union of json1 and json3 should look like this:

[[{"origin":"a","destination":"b", "arc":[{"Type":"763","Number":"20"}]}], 
[{"origin":"a","destination":"d", "arc":[{"Type":"723","Number":"40"}]}]]
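The nested shape shown above, with each record wrapped in its own single-element list, can be built from plain Python dicts. This is only a minimal sketch: `json1` and `json3` are hypothetical variable names for the records in rows 1 and 3 of Table 1.

```python
import json

# Hypothetical names for the records stored in rows 1 and 3 of Table 1.
json1 = {"origin": "a", "destination": "b", "arc": [{"Type": "763", "Number": "20"}]}
json3 = {"origin": "a", "destination": "d", "arc": [{"Type": "723", "Number": "40"}]}

# Wrap each record in its own list, then collect the wrapped records.
union = [[record] for record in (json1, json3)]

# Serialize to a JSON string if text output is needed.
print(json.dumps(union))
```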

I hope I've understood your question correctly:

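from itertools import combinations

# Expects Table 1 in df1 and Table 2 in df2.
def fn(x):
    # Keep only the non-empty cells of the current Table 2 row.
    out, non_empty_vals = [], x[x != ""]
    # Try every pair of values as a (col1, col2) key into Table 1.
    for c in combinations(non_empty_vals, 2):
        out.extend(df1.loc[df1[["col1", "col2"]].eq(c).all(axis=1), "col3"])
    return out

df2["union_json"] = df2.apply(fn, axis=1)
print(df2.to_markdown(index=False))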

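This prints Table 2 with a new union_json column; for the sample data, each row holds a list of the matching records: [json1], [json1, json3], [json1, json3, json5], and [json4] (labels as in Table 1).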
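A variation on the approach above, as a minimal sketch: instead of scanning Table 1 for every pair, it first builds a dictionary keyed by (col1, col2) and then looks each pair up. The names `lookup` and `union_json` are hypothetical, and the data is copied from the question so that the block runs on its own.

```python
from collections import defaultdict
from itertools import combinations

import pandas as pd

# Table 1 from the question (the JSON records live in "col3").
df1 = pd.DataFrame({
    "col1": ["a", "a", "b", "c", "d"],
    "col2": ["b", "c", "d", "a", "e"],
    "col3": [
        {"origin": "a", "destination": "b", "arc": [{"Type": "763", "Number": "20"}]},
        {"origin": "a", "destination": "c", "arc": [{"Type": "763", "Number": "50"}]},
        {"origin": "a", "destination": "d", "arc": [{"Type": "723", "Number": "40"}]},
        {"origin": "c", "destination": "a", "arc": [{"Type": "700", "Number": "30"}]},
        {"origin": "d", "destination": "e", "arc": [{"Type": "700", "Number": "40"}]},
    ],
})

# Table 2 from the question; empty strings mark unused columns.
df2 = pd.DataFrame({
    "col1": ["a", "a", "a", "c"],
    "col2": ["b", "b", "b", "a"],
    "col3": ["", "d", "d", ""],
    "col4": ["", "", "e", ""],
})

# Hypothetical helper structure: (col1, col2) -> list of JSON records.
# A list is used so that duplicate pairs in Table 1 are all kept.
lookup = defaultdict(list)
for c1, c2, record in zip(df1["col1"], df1["col2"], df1["col3"]):
    lookup[(c1, c2)].append(record)


def union_json(row):
    """Collect the records for every pair of non-empty values in a row."""
    values = [v for v in row if v != ""]
    out = []
    for pair in combinations(values, 2):
        out.extend(lookup.get(pair, []))
    return out


key_cols = ["col1", "col2", "col3", "col4"]
df2["union_json"] = [
    union_json(row) for row in df2[key_cols].itertuples(index=False, name=None)
]
```

Like the original, this tries every pair of non-empty values in a row, so it assumes that any pair matching a (col1, col2) row of Table 1 is meant to be included.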
