Cannot read JSON with Pandas from a file encoded in UCS-2 Little Endian


with open(filename+'.json') as json_file:
    data=pd.io.json.read_json(json_file,encoding='utf_16_be')

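I tried several encoding options, but it fails and returns an empty object. The conversion only works after I save the file in Notepad++ as UTF-8 without BOM. Then I can open it as usual, with the default encoding: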

with open(filename+'.json') as json_file:
    data=pd.io.json.read_json(json_file)

The file's default encoding is UCS-2 Little Endian. How can I read JSON with this encoding?

Read and follow import pandas as pd; help(pd.io.json.read_json). The following (partially commented) code snippet may help:

import pandas as pd

filename = r"D:PShellDataFiles61571258" # my test case
filepath = filename + ".json"
# define encoding while opening a file
with open(filepath, encoding='utf-16') as f:
    data = pd.io.json.read_json(f)
# or open file in binary mode and decode while converting to pandas object
with open(filepath, mode='rb') as f:
    atad = pd.io.json.read_json(f, encoding='utf-16')
# ensure that both above methods are equivalent
print((data == atad).values)

Output (.SO69537408.py):

[[ True  True  True  True  True  True  True  True  True  True]]
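
If the encoding of the file is not known in advance, one option is to look at its byte order mark (BOM) before choosing a codec. The following is only a minimal sketch using the standard library: sniff_encoding is a hypothetical helper (not part of pandas), the sample records are made up, and it assumes the file starts with a BOM, as files saved by Notepad++ as "UCS-2 LE BOM" do. It also shows that decoding little-endian bytes as utf_16_be, as in the question, yields unrelated characters, which is probably one reason that attempt did not work.

```python
import codecs
import json
import os
import tempfile


def sniff_encoding(path, default="utf-8"):
    """Hypothetical helper: pick a codec name from the file's leading BOM."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # this codec reads the BOM to decide the byte order
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default  # no BOM found: fall back to an assumed default


# Made-up sample data, written as UTF-16 little endian with a BOM.
records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
fd, sample_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF16_LE + json.dumps(records).encode("utf-16-le"))

# Detect the codec, then decode while reading.
detected = sniff_encoding(sample_path)
with open(sample_path, encoding=detected) as f:
    loaded = json.load(f)

# Forcing big endian on little-endian bytes: this ASCII-only sample decodes
# without an error, but the resulting text is not the original JSON.
with open(sample_path, "rb") as f:
    wrong = f.read().decode("utf_16_be")

os.remove(sample_path)
print(detected, loaded == records, wrong == json.dumps(records))
```

The detected name can then be passed on to pandas, for example pd.read_json(filepath, encoding=detected), instead of hard-coding 'utf-16'.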
