How to read Chinese text and write Chinese characters to a CSV file in Python 3

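I have searched SO but could not find an answer to this specific problem. I am trying to read Chinese characters from a .txt file. When I try to write them to a .csv, the cell contents look like this: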

b'\xef\xbb\xbf\xe5'

instead of

山西襄汾

How can I get the latter form written to the .csv? The relevant code snippet is below:

infilehandle = open(infilepath, encoding = 'utf-8') # open .txt file
txtlines = infilehandle.read().replace('\n', '')
date_pattern = re.compile(r'(\d{4}.\d{1,2}.\d{1,2})')
date = date_pattern.findall(txtlines)[0]
title = txtlines.split(date)[0]
localrow = []
localrow.append(date.encode("utf-8-sig"))
localrow.append(title.encode("utf_8_sig"))
outfilehandle.writerow(localrow) # writes to .csv

First, make sure outfilehandle is created with encoding='utf-8', as Peter Wood suggested:

outfilehandle = csv.writer(open('outfile.csv', 'w', encoding='utf-8'))
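
A slightly fuller sketch of the same idea follows. The `with` block and `newline=''` are additions that are not in the answer above; the csv module documentation recommends `newline=''` for files handed to `csv.writer`. The helper name `write_rows` is hypothetical and only serves the illustration.

```python
import csv


def write_rows(path, rows):
    """Write rows of str values to a UTF-8 encoded CSV file."""
    # newline='' lets the csv module control the line endings itself.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        outfilehandle = csv.writer(f)
        for row in rows:
            # Each row is a list of plain str values, not bytes.
            outfilehandle.writerow(row)
```

Calling `write_rows('outfile.csv', [[date, title]])` would then write one row of text. If the file is meant to be opened in a spreadsheet program that relies on a byte order mark to detect UTF-8, passing encoding='utf-8-sig' to open() is one option to try, but the strings themselves should still not be encoded by hand.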

Then there is no need to call date.encode("utf-8-sig"); just change lines 7-8 of the snippet to:

localrow.append(date)
localrow.append(title)
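
As for why the original code put b'...' into the cells: csv.writer converts any non-string value with str(), and str() of a bytes object is its b'...' literal form. The sketch below shows both cases side by side. It writes to an in-memory io.StringIO buffer rather than a real file, which is an assumption made only to keep the example self-contained, and `row_as_csv` is a hypothetical helper name.

```python
import csv
import io


def row_as_csv(row):
    """Return the CSV text that csv.writer produces for a single row."""
    buf = io.StringIO(newline='')
    csv.writer(buf).writerow(row)
    return buf.getvalue()


title = '山西襄汾'

# Bytes are stringified with str(), so the b'...' literal lands in the cell.
broken = row_as_csv([title.encode('utf-8-sig')])

# A plain str is written as text; the file's encoding does the conversion.
fixed = row_as_csv([title])

print(broken)
print(fixed)
```

The first printed line shows the escaped byte literal, the second shows the readable characters, which matches the difference described in the question.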

It may also help to read the Python Unicode HOWTO and Processing Text Files in Python 3.
