"Trailing data" error when reading JSON into a pandas DataFrame



I have a Python 3.8.5 script that fetches JSON from an API, saves it to disk, and then reads the JSON into a DataFrame. That works.

df = pd.io.json.read_json('json_file', orient='records')

I wanted to try an IO buffer so that I don't have to read from and write to disk, but I ran into an error. The code is as follows:

import json
import pandas as pd
from io import StringIO
io = StringIO()
json_out = []
# some code to append API results to json_out
json.dump(json_out, io)
df = pd.io.json.read_json(io.getvalue())

On the last line, I get this error:

  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 199, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\util\_decorators.py", line 296, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 618, in read_json
    result = json_reader.read()
  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 755, in read
    obj = self._get_object_parser(self.data)
  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 777, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 886, in parse
    self._parse_no_numpy()
  File "C:\Users\chap\Anaconda3\lib\site-packages\pandas\io\json\_json.py", line 1119, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Trailing data

The JSON is in list format, so I'm not sure it counts as real JSON, but when I write it to disk it looks like this:

json = [
{"state": "North Dakota",
"address": "123 30th st E #206",
"account": "123"
},
{"state": "North Dakota",
"address": "456 30th st E #206",
"account": "456"
}
]

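Given that it works in the first case (writing to and reading from disk), I don't know how to troubleshoot this. How can I debug what is in the buffer? The real data is mostly text, but there are some numeric fields as well.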

I'm not sure what is going wrong for you; this works for me:

import json
import pandas as pd
from io import StringIO
json_out = [
{"state": "North Dakota",
"address": "123 30th st E #206",
"account": "123"
},
{"state": "North Dakota",
"address": "456 30th st E #206",
"account": "456"
}
]
io = StringIO()
json.dump(json_out, io)
df = pd.io.json.read_json(io.getvalue())
print(df)

That makes me think the problem is in the code that appends the API data...
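One hypothetical way the appending code could cause this (an assumption, since the question does not show that code): if json.dump is called more than once on the same StringIO, the buffer ends up holding several JSON documents back to back, and a parser that expects a single document rejects the leftover text. The sketch below uses only the standard library; the records and the names pages and buffer are made up for illustration.

```python
import json
from io import StringIO

# Hypothetical API results arriving in two batches.
pages = [
    [{"state": "North Dakota", "account": "123"}],
    [{"state": "North Dakota", "account": "456"}],
]

# Problematic pattern: one json.dump call per batch into the same buffer.
buffer = StringIO()
for page in pages:
    json.dump(page, buffer)

concatenated = buffer.getvalue()  # two JSON arrays back to back: [...][...]

try:
    json.loads(concatenated)
    error_message = None
except json.JSONDecodeError as exc:
    error_message = exc.msg  # the standard library reports this as "Extra data"

# Safer pattern: collect every record in one list and serialize exactly once.
json_out = [record for page in pages for record in page]
single_document = json.dumps(json_out)
records = json.loads(single_document)
```

The standard library words the failure as "Extra data"; the parser used by pandas in the traceback words it as "Trailing data", which, as far as I know, describes the same situation of content left over after the first complete JSON value.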

That said, if you already have a list of dictionaries, you don't need the IO step at all. You can just do:

pd.DataFrame(json_out)
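As a minimal, self-contained sketch of that shortcut, using the two sample records from the question in place of real API results:

```python
import pandas as pd

# Sample records copied from the question; real API results would go here.
json_out = [
    {"state": "North Dakota", "address": "123 30th st E #206", "account": "123"},
    {"state": "North Dakota", "address": "456 30th st E #206", "account": "456"},
]

# Each dictionary becomes one row; the dictionary keys become the column names.
df = pd.DataFrame(json_out)
print(df)
```

Because no JSON text is produced or parsed on this path, a serialization problem cannot surface here. Note that values keep their Python types, so "account" stays a string column in this sketch.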

Edit: I think I remember hitting this error when my JSON had a trailing comma, like this:

[
{
"hello":"world",
},
]
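To troubleshoot what is actually in the buffer, one option is to run the text through the standard library's json.loads before handing it to pandas, since its JSONDecodeError carries the position of the problem. The helper below is a rough sketch; the name locate_json_error and the context parameter are invented for this example.

```python
import json


def locate_json_error(text, context=20):
    """Return None if text is one valid JSON document, else a short report."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where parsing failed.
        start = max(exc.pos - context, 0)
        snippet = text[start:exc.pos + context]
        return f"{exc.msg} at line {exc.lineno}, column {exc.colno}: {snippet!r}"
    return None


good = '[{"hello": "world"}]'
trailing_comma = '[{"hello": "world",},]'

print(locate_json_error(good))
print(locate_json_error(trailing_comma))
```

In the question's setup, this would be called as locate_json_error(io.getvalue()) just before the read_json line. The exact wording of the message depends on the Python version, so treat the offset and the snippet as the useful part.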
