Export multiple pandas DataFrames in a single JSON object



I have several pandas.DataFrame objects and I want to dump them into a single JSON string.

Suppose I have the following two DataFrames:

import pandas as pd
import json

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
)

I want to export them as a single JSON string:

{"df1":
{"columns":
["col 1", "col 2"],
"index":
["row 1", "row 2"],
"data":
[["a", "b"], ["c", "d"]]
},
"df2":
{"columns":
["Col 1", "Col 2", "Col3"],
"index":
["Row 1", "Row 2"],
"data":
[["A", "B", "C"], ["D", "E", "F"]]
}
}

What I have tried

Attempt 1

If I build a Python dictionary containing the two DataFrames and pass it to json.dumps, I get a TypeError, because json does not know how to serialize a pandas.DataFrame:

out = {'df1': df1,
       'df2': df2
       }
out = json.dumps(out)  # <-- Raises TypeError: Object of type DataFrame is not JSON serializable

Attempt 2

If I serialize each DataFrame separately with the pandas.DataFrame.to_json method:

df1_jsonstr = df1.to_json(orient='split')
df2_jsonstr = df2.to_json(orient='split')
out = {'df1': df1_jsonstr,
       'df2': df2_jsonstr
       }
out = json.dumps(out)

The output looks like this:

{"df1": "{\"columns\":[\"col 1\",\"col 2\"],\"index\":[\"row 1\",\"row 2\"],\"data\":[[\"a\",\"b\"],[\"c\",\"d\"]]}", "df2": "{\"columns\":[\"Col 1\",\"Col 2\",\"Col3\"],\"index\":[\"Row 1\",\"Row 2\"],\"data\":[[\"A\",\"B\",\"C\"],[\"D\",\"E\",\"F\"]]}"}

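Both strings produced by pandas.DataFrame.to_json end up escaped and quoted. When I load the result back with data = json.loads(out), the two DataFrames are (correctly) treated as strings and loaded as such.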
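The double encoding described above can be checked with a minimal sketch. The variable names `nested`, `inner_is_string` and `decoded` are only illustrative and are not part of the original post:

```python
import json

import pandas as pd

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)

# to_json already returns a JSON *string*; json.dumps then encodes that
# string a second time, escaping its quotes.
nested = json.dumps({"df1": df1.to_json(orient="split")})

data = json.loads(nested)
inner_is_string = isinstance(data["df1"], str)  # the value is still a str

# A second json.loads is needed to reach the actual structure.
decoded = json.loads(data["df1"])
```

So a consumer of this output would have to call json.loads twice, which is what the later attempts try to avoid.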

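Attempt 3

The only way I have found to produce the JSON I want is to dump each DataFrame to JSON with pandas.DataFrame.to_json, load the results back into dictionaries with json.loads, and then dump them together again. This looks like: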

df1_json = df1.to_json(orient='split')
df2_json = df2.to_json(orient='split')
out = {'df1': json.loads(df1_json),
       'df2': json.loads(df2_json)
       }
out = json.dumps(out)
data = json.loads(out)

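This works, but if df1 and df2 have hundreds of thousands or millions of rows, performing the conversion three times (pd.DataFrame -> str -> dict -> str) becomes inefficient.

Question

Is there a way to get the same result as in the last example while performing the conversion only once?

I think you could do something like this: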

out = """
{
"df1": """ + df1.to_json(orient='split') + """,
"df2": """ + df2.to_json(orient='split') + """
}
"""

Or:

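# orient='split' yields the same columns/index/data layout requested in the question
df1_json = df1.to_dict(orient='split')
df2_json = df2.to_dict(orient='split')
out = {'df1': df1_json,
       'df2': df2_json
       }
out = json.dumps(out)
data = json.loads(out)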
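Once the combined JSON has been loaded, each entry can be turned back into a DataFrame. The sketch below assumes the entries use the split layout (keys "index", "columns" and "data"), which happen to match keyword arguments of the pd.DataFrame constructor. The names `restored1` and `restored2` are illustrative only:

```python
import json

import pandas as pd

df1 = pd.DataFrame(
    [["a", "b"], ["c", "d"]],
    index=["row 1", "row 2"],
    columns=["col 1", "col 2"],
)
df2 = pd.DataFrame(
    [["A", "B", "C"], ["D", "E", "F"]],
    index=["Row 1", "Row 2"],
    columns=["Col 1", "Col 2", "Col3"],
)

# Build the combined JSON from plain dicts in the split layout.
out = json.dumps({
    "df1": df1.to_dict(orient="split"),
    "df2": df2.to_dict(orient="split"),
})

# Load it back and rebuild each frame from its index/columns/data entries.
data = json.loads(out)
restored1 = pd.DataFrame(**data["df1"])
restored2 = pd.DataFrame(**data["df2"])
```

This assumes every cell value is something json.dumps can encode directly (strings, numbers, booleans, None). Values such as timestamps would likely need a `default=` handler passed to json.dumps.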
