Recursively transforming the leaves of a dictionary in Python

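I'm having trouble applying a function to all the leaves of a dictionary (loaded from a JSON file) in Python. The text is badly encoded, and I want to use the ftfy module to fix it.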

Here is my function:

import json
import ftfy  # importing ftfy registers the 'sloppy-windows-1252' codec

def recursive_decode_dict(e):
    try:
        if type(e) is dict:
            print('Dict: %s' % e)
            return {k: recursive_decode_dict(v) for k, v in e.items()}
        elif type(e) is list:
            print('List: %s' % e)
            return list(map(recursive_decode_dict, e))
        elif type(e) is str:
            print('Str: %s' % e)
            fixed = e.encode('sloppy-windows-1252').decode('utf-8')
            print('Transformed str: %s' % fixed)
            return fixed
        else:
            return e
    except UnicodeError:
        # Assumed fallback: leave values that cannot be converted unchanged.
        return e

I call it like this:

with open('test.json', 'r', encoding='utf-8') as f1:
    json_content = json.load(f1)
    recursive_decode_dict(json_content)

with open('out.json', 'w', encoding='utf-8') as f2:
    json.dump(json_content, f2, indent=2)

The console output looks fine:

> python fix_encoding.py 
List: [{'fields': {'field1': 'the European-style cafÃ© into a '}}]
Dict: {'fields': {'field1': 'the European-style cafÃ© into a '}}
Dict: {'field1': 'the European-style cafÃ© into a '}
Str: the European-style cafÃ© into a 
Transformed str: the European-style café into a

But my output file is not fixed:

[
  {
    "fields": {
      "field1": "the European-style caf\u00c3\u00a9 into a "
    }
  }
]

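If it's JSON data you're working with, you could instead hook into the JSON decoder and fix the strings as they are encountered.

This does require using the slower, pure-Python JSON parser, but for a one-off conversion that is probably not a problem...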

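import json
import ftfy  # importing ftfy registers the 'sloppy-windows-1252' codec

decoder = json.JSONDecoder()

def ftfy_parse_string(*args, **kwargs):
    # scanstring returns the decoded string and the index just past it
    string, end = json.decoder.scanstring(*args, **kwargs)
    string = string.encode("sloppy-windows-1252").decode("utf-8")
    return (string, end)

# Route string values through the fixer, then rebuild the scanner in
# pure Python so that it picks up the replaced parse_string.
decoder.parse_string = ftfy_parse_string
decoder.scan_once = json.scanner.py_make_scanner(decoder)
print(decoder.decode(r"""[
  {
    "fields": {
      "field1": "the European-style cafÃ© into a "
    }
  }
]"""))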

Output:

[{'fields': {'field1': 'the European-style café into a '}}]
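
As for the recursive function in the question: it builds and returns new dicts, lists and strings rather than modifying its argument, and the calling code discards that return value, so `json_content` is written out unchanged. Here is a minimal sketch of the same idea with the result kept. The name `fix_mojibake` is hypothetical, and the standard-library `cp1252` codec stands in for ftfy's `sloppy-windows-1252` so the snippet runs without ftfy. That is an assumption which works for this sample but may not cover every byte ftfy's codec accepts.

```python
import json

def fix_mojibake(value, codec="cp1252"):
    """Return a copy of value with mis-decoded strings repaired.

    Hypothetical helper: 'cp1252' is a stdlib stand-in for ftfy's
    'sloppy-windows-1252' codec.
    """
    if isinstance(value, dict):
        return {k: fix_mojibake(v, codec) for k, v in value.items()}
    if isinstance(value, list):
        return [fix_mojibake(v, codec) for v in value]
    if isinstance(value, str):
        try:
            # Undo "UTF-8 bytes read as Windows-1252".
            return value.encode(codec).decode("utf-8")
        except UnicodeError:
            # Not repairable this way; leave the string unchanged.
            return value
    return value

# Same data as in the question; \u00c3\u00a9 are the characters "Ã©".
broken = [{"fields": {"field1": "the European-style caf\u00c3\u00a9 into a "}}]

fixed = fix_mojibake(broken)  # keep the returned structure
text = json.dumps(fixed, indent=2, ensure_ascii=False)
print(text)
```

By default `json.dump` and `json.dumps` escape non-ASCII characters, so even correctly repaired text would appear in the file as `caf\u00e9`. Passing `ensure_ascii=False` writes the character itself.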
