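I'm having trouble applying a function to all the leaves of a dictionary (loaded from a JSON file) in Python. The text is badly encoded, and I want to use the ftfy module to fix it.
Here is my function: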
```python
import ftfy.bad_codecs  # registers the "sloppy-windows-1252" codec used below


def recursive_decode_dict(e):
    if type(e) is dict:
        print('Dict: %s' % e)
        return {k: recursive_decode_dict(v) for k, v in e.items()}
    elif type(e) is list:
        print('List: %s' % e)
        return list(map(recursive_decode_dict, e))
    elif type(e) is str:
        print('Str: %s' % e)
        print('Transformed str: %s' % e.encode('sloppy-windows-1252').decode('utf-8'))
        return e.encode('sloppy-windows-1252').decode('utf-8')
    else:
        return e
```
I call it like this:
```python
import json

with open('test.json', 'r', encoding='utf-8') as f1:
    json_content = json.load(f1)
recursive_decode_dict(json_content)
with open('out.json', 'w', encoding='utf-8') as f2:
    json.dump(json_content, f2, indent=2)
```
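Note that recursive_decode_dict builds and returns new dicts, lists and strings instead of modifying json_content in place, so the call above discards the fixed result and json.dump writes the original, unfixed object. A minimal sketch of the call pattern that keeps the result is below. To stay self-contained it makes two assumptions: io.StringIO stands in for test.json and out.json, and the standard "cp1252" codec stands in for ftfy's "sloppy-windows-1252" (the two differ only on a handful of bytes that cp1252 leaves undefined); fix_mojibake and recursive_decode are illustrative names, not part of any library.

```python
import io
import json


def fix_mojibake(s):
    # Assumption: stdlib "cp1252" stands in for ftfy's "sloppy-windows-1252".
    try:
        return s.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Strings that don't round-trip (e.g. already-correct text) are kept as-is.
        return s


def recursive_decode(e):
    # Builds and returns NEW containers; the argument itself is never modified.
    if isinstance(e, dict):
        return {k: recursive_decode(v) for k, v in e.items()}
    if isinstance(e, list):
        return [recursive_decode(v) for v in e]
    if isinstance(e, str):
        return fix_mojibake(e)
    return e


# io.StringIO stands in for test.json / out.json to keep the sketch self-contained.
src = io.StringIO('[{"fields": {"field1": "the European-style caf\\u00c3\\u00a9 into a "}}]')
json_content = json.load(src)

fixed = recursive_decode(json_content)  # keep the return value...

out = io.StringIO()
json.dump(fixed, out, indent=2)  # ...and dump *that*, not json_content
print(out.getvalue())
```

With the default ensure_ascii=True the file will contain caf\u00e9, which is the correctly decoded é; passing ensure_ascii=False to json.dump writes the character literally instead.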
The console output looks fine:
```
> python fix_encoding.py
List: [{'fields': {'field1': 'the European-style cafÃ© into a '}}]
Dict: {'fields': {'field1': 'the European-style cafÃ© into a '}}
Dict: {'field1': 'the European-style cafÃ© into a '}
Str: the European-style cafÃ© into a
Transformed str: the European-style café into a
```
But my output file is not fixed:
```
[
  {
    "fields": {
      "field1": "the European-style caf\u00c3\u00a9 into a "
    }
  }
]
```
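The \u00c3\u00a9 in out.json is just json.dump's default ASCII escaping (ensure_ascii=True) of the two characters Ã and ©, i.e. the original mojibake written back unchanged; a correctly fixed é would have been escaped as \u00e9. A quick standard-library check, where broken and fixed are illustrative variable names:

```python
import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
broken = "caf\u00c3\u00a9"   # the mojibake "cafÃ©"
fixed = "caf\u00e9"          # the intended "café"

print(broken)                                 # cafÃ©
print(json.dumps(broken))                     # "caf\u00c3\u00a9"
print(json.dumps(fixed))                      # "caf\u00e9"
print(json.dumps(fixed, ensure_ascii=False))  # "café"
```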
If what you're dealing with is JSON data, you can instead hook into the JSON decoder and fix each string as it is parsed.
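This does require the slower pure-Python JSON scanner (the C-accelerated scanner does not consult an overridden parse_string), but for a one-off conversion that probably isn't a problem.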
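```python
import json
import json.decoder
import json.scanner

import ftfy.bad_codecs  # registers the "sloppy-windows-1252" codec

decoder = json.JSONDecoder()


def ftfy_parse_string(*args, **kwargs):
    # Parse the JSON string literal as usual; `end` is the index just past it.
    string, end = json.decoder.scanstring(*args, **kwargs)
    # Undo the mojibake: re-encode as Windows-1252, then decode as UTF-8.
    string = string.encode("sloppy-windows-1252").decode("utf-8")
    return (string, end)


# Override the string parser, then rebuild the pure-Python scanner so it
# picks up the override.
decoder.parse_string = ftfy_parse_string
decoder.scan_once = json.scanner.py_make_scanner(decoder)

print(decoder.decode(r"""[
  {
    "fields": {
      "field1": "the European-style caf\u00c3\u00a9 into a "
    }
  }
]"""))
```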
Output:

```
[{'fields': {'field1': 'the European-style café into a '}}]
```
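To use the same hook directly with json.load on a file, it can be wrapped in a JSONDecoder subclass and passed as cls=. A minimal sketch, assuming two things: MojibakeFixingDecoder and fixing_scanstring are hypothetical names, and the stdlib "cp1252" codec stands in for ftfy's "sloppy-windows-1252" so the example runs without ftfy. It also leaves strings that don't round-trip untouched instead of raising, so already-correct text such as "café" survives.

```python
import json
import json.decoder
import json.scanner


def fixing_scanstring(s, end, strict=True):
    # Parse the string literal exactly as the json module normally does...
    string, end = json.decoder.scanstring(s, end, strict)
    # ...then try to undo the mojibake. Assumption: stdlib "cp1252" stands in
    # for ftfy's "sloppy-windows-1252"; strings that don't round-trip are kept.
    try:
        string = string.encode("cp1252").decode("utf-8")
    except UnicodeError:
        pass
    return string, end


class MojibakeFixingDecoder(json.JSONDecoder):
    # Hypothetical helper: installs the hook so it can be passed as cls=.
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.parse_string = fixing_scanstring
        # Rebuild the pure-Python scanner so it uses the parse_string override.
        self.scan_once = json.scanner.py_make_scanner(self)


text = '[{"fields": {"field1": "the European-style caf\\u00c3\\u00a9 into a "}}]'
print(json.loads(text, cls=MojibakeFixingDecoder))
```

With a file, the call becomes json.load(f1, cls=MojibakeFixingDecoder). One caveat, based on reading CPython's json/decoder.py: object keys are parsed through the module-level scanstring rather than parse_string, so this hook fixes values but not keys.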