Python script not encoding to UTF-8



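I have this Python 3 script that reads a JSON file and saves it as CSV. It works fine except for special characters like \u00e9. So Montr\u00e9al should come out as Montréal, but it gives me MontrÃ©al.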

import csv
import json

ifilename = 'business.json'
ofilename = 'business.csv'

# One JSON object per line (JSON Lines format); skip blank lines
with open(ifilename, encoding='utf-8') as in_file:
    json_lines = [json.loads(l.strip()) for l in in_file if l.strip()]

with open(ofilename, "w", newline='', encoding='utf-8') as OUT_FILE:
    root = csv.writer(OUT_FILE)
    root.writerow(["business_id", "name", "neighborhood", "address", "city", "state"])
    json_no = 0
    for l in json_lines:
        root.writerow([l["business_id"], l["name"], l["neighborhood"], l["address"], l["city"], l["state"]])
        json_no += 1

print('Finished {0} lines'.format(json_no))
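A short sketch of where MontrÃ©al comes from, using one hypothetical JSON line rather than the real business.json: json.loads already turns the \u00e9 escape into é, writing with encoding='utf-8' stores é as the two bytes C3 A9, and a program that reads those bytes as Windows-1252 shows them as Ã and ©.

```python
import json

# Hypothetical JSON line containing the same \u00e9 escape as business.json
line = '{"city": "Montr\\u00e9al"}'
city = json.loads(line)["city"]
print(city)  # Montréal: json has already decoded the escape

# What encoding='utf-8' writes to disk for this string
utf8_bytes = city.encode("utf-8")
print(utf8_bytes)  # b'Montr\xc3\xa9al'

# The same bytes misread with the Windows-1252 code page
print(utf8_bytes.decode("cp1252"))  # MontrÃ©al
```

So the CSV data itself is correct UTF-8; the garbling happens in whatever program decodes the file with the wrong code page.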

It turns out the CSV file displays correctly when opened in Notepad++, just not in Excel. So I had to import the CSV file in Excel and specify 65001 : Unicode (UTF-8). Thanks for your help.
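An alternative to the Excel import dialog, sketched here as an assumption rather than a guaranteed fix: Python's utf-8-sig codec writes a UTF-8 byte order mark (EF BB BF) at the start of the file, and Excel commonly uses that mark to recognise a CSV as UTF-8 when opening it directly. The rows and the filename business_excel.csv below are hypothetical; whether a particular Excel version honours the BOM is worth checking.

```python
import csv

# Hypothetical sample rows; in the script above these would come from business.json
rows = [
    ["business_id", "name", "city"],
    ["b1", "Example Café", "Montréal"],
]

# 'utf-8-sig' prepends the UTF-8 byte order mark before writing the data
with open("business_excel.csv", "w", newline="", encoding="utf-8-sig") as out_file:
    csv.writer(out_file).writerows(rows)

# Inspect the first bytes on disk to confirm the BOM is there
with open("business_excel.csv", "rb") as f:
    raw = f.read()
print(raw[:3])  # b'\xef\xbb\xbf'
```

In the original script this amounts to changing encoding='utf-8' to encoding='utf-8-sig' in the open() call for the output file.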

Try putting this at the top of your file:

# -*- coding: utf-8 -*-
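A minimal sketch of what this declaration (PEP 263) controls: it tells the interpreter how to decode the bytes of the .py source file itself, which is separate from the encoding= argument passed to open(). In Python 3 the default source encoding is already UTF-8, so the declaration mainly matters for source saved in another encoding. The snippet compiles the same hypothetical one-line program from bytes in two ways.

```python
# The same one-line program stored as bytes in two different encodings
latin1_source = b"# -*- coding: latin-1 -*-\n" + "s = 'Montr\u00e9al'\n".encode("latin-1")
utf8_source = "s = 'Montr\u00e9al'\n".encode("utf-8")  # no declaration at all

results = {}
for name, source in [("latin-1 with declaration", latin1_source),
                     ("utf-8 without declaration", utf8_source)]:
    namespace = {}
    # compile() honours a coding declaration when given bytes,
    # and assumes UTF-8 when there is none
    exec(compile(source, "<" + name + ">", "exec"), namespace)
    results[name] = namespace["s"]
    print(name, "->", namespace["s"])
```

Both variants produce the string Montréal, because in each case the interpreter decodes the source with the correct encoding.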

Consider this example:

# -*- coding: utf-8 -*-    
import sys
print("my default encoding is : {0}".format(sys.getdefaultencoding()))
string_demo="Montréal"
print(string_demo)
reload(sys) # just in python2.x
sys.setdefaultencoding('UTF8') # just in python2.x
print("my default encoding is : {0}".format(sys.getdefaultencoding()))
print(str(string_demo.encode('utf8')), type(string_demo.encode('utf8')))

In my example, if I run it with Python 2.x, the output looks like this:

my default encoding is : ascii
Montréal
my default encoding is : UTF8
('Montr\xc3\xa9al', <type 'str'>)

But when I comment out the reload and setdefaultencoding lines, the output looks like this:

my default encoding is : ascii
Montréal
my default encoding is : ascii
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    print(str(string_demo.encode('utf8')), type(string_demo.encode('utf8')))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

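This is the biggest problem with the editor: whenever an encoding error occurs, Python raises an exception, as the UnicodeDecodeError above shows.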
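For comparison, a sketch of the same experiment under Python 3 (which the question uses): the default encoding is fixed to UTF-8, sys.setdefaultencoding() does not exist, so the reload(sys) workaround has nothing to restore, and str is always Unicode text, so .encode('utf8') simply returns bytes.

```python
import sys

# Python 3 fixes the default str <-> bytes encoding to UTF-8
print("my default encoding is : {0}".format(sys.getdefaultencoding()))

# setdefaultencoding() only exists in Python 2
print(hasattr(sys, "setdefaultencoding"))

string_demo = "Montréal"              # str holds Unicode text
encoded = string_demo.encode("utf8")  # explicit encoding produces bytes
print(str(encoded), type(encoded))
```

This prints `my default encoding is : utf-8`, then `False`, then `b'Montr\xc3\xa9al' <class 'bytes'>`, with no exception.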