How do I handle "_csv.Error: line contains NULL byte"?

I am trying to fix a null-byte problem in a CSV file.

The csv_file object is passed in from another function in my Flask application:

stream = codecs.iterdecode(csv_file.stream, "utf-8-sig", errors="strict")
dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")

for row in dict_reader:  # Error is thrown here
    ...

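The error thrown in the console is _csv.Error: line contains NULL byte.

So far, I have tried:

Different encodings (I checked the file's encoding, and it is utf-8-sig)
Using .replace('\x00', '')

But I can't seem to get rid of these null bytes.

I would like to strip the null bytes, replacing them with empty strings, but I would also be fine with skipping any line that contains a null byte. I am not able to share my CSV file.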

Edit: the solution I arrived at:

content = csv_file.read()
# Converting the above object into an in-memory byte stream
csv_stream = io.BytesIO(content)
# Iterating through the lines and replacing null bytes with empty strings
fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)

# Below remains unchanged, just passing in fixed_lines instead of csv_stream
stream = codecs.iterdecode(fixed_lines, 'utf-8-sig', errors='strict')
dict_reader = csv.DictReader(stream, skipinitialspace=True, restkey="INVALID")

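I think your question really needs to show an example of the byte stream you expect to get from csv_file.stream.

I like pushing myself to learn more about Python's approach to IO, encoding/decoding, and CSV, so I did this much work for myself, but you probably shouldn't expect others to do the same.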

import csv
from codecs import iterdecode
import io

# Flask's file.stream is probably BytesIO, see https://stackoverflow.com/a/18246385
# and the Gist in the comment, https://gist.github.com/lost-theory/3772472?permalink_comment_id=1983064#gistcomment-1983064
# Sample data: a UTF-8 BOM, a header row, and a trailing null byte on the last line
csv_bytes = b'''\xef\xbb\xbf C1, C2
r1c1, r1c2
r2c1, r2c2, r2c3\x00'''
# This is what Flask is probably giving you
csv_stream = io.BytesIO(csv_bytes)
# Fixed lines is another iterator, `(line.repl...)` vs. `[line.repl...]`
fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)
decoded_lines = iterdecode(fixed_lines, 'utf-8-sig', errors='strict')
reader = csv.DictReader(decoded_lines, skipinitialspace=True, restkey="INVALID")
for row in reader:
    print(row)

I get:

{'C1': 'r1c1', 'C2': 'r1c2'}
{'C1': 'r2c1', 'C2': 'r2c2', 'INVALID': ['r2c3']}
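
The question also says that skipping lines containing null bytes would be acceptable. Here is a minimal sketch of that variant. It filters the raw byte lines before decoding instead of rewriting them. The helper name rows_skipping_nul_lines and the sample bytes are made up for illustration, and it assumes the stream yields one CSV record per line (no quoted fields spanning several lines):

```python
import csv
import io
from codecs import iterdecode


def rows_skipping_nul_lines(byte_lines):
    """Yield DictReader rows, dropping every raw line that contains a NUL byte.

    Hypothetical helper: `byte_lines` is any iterable of bytes lines,
    such as an io.BytesIO object.
    """
    # Filter on the bytes level, before any decoding takes place
    clean_lines = (line for line in byte_lines if b"\x00" not in line)
    decoded_lines = iterdecode(clean_lines, "utf-8-sig", errors="strict")
    reader = csv.DictReader(decoded_lines, skipinitialspace=True, restkey="INVALID")
    yield from reader


# Illustrative data only: the third line carries a trailing NUL byte
sample = b"\xef\xbb\xbf C1, C2\nr1c1, r1c2\nr2c1, r2c2, r2c3\x00\nr3c1, r3c2\n"
rows = list(rows_skipping_nul_lines(io.BytesIO(sample)))
for row in rows:
    print(row)
```

One caveat: if the header line itself contains a null byte, this sketch drops it, and the next line is then treated as the header. In that case stripping the bytes, as shown above, is probably the safer choice.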
