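This should be simple, but... I have a badly formed CSV that has commas inside its fields. Fortunately, the CSV has only three columns and all the stray commas are in the middle column, so if I could remove every comma on each line except the first and the last, I should be fine. How do I get the csv reader to do that?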
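with open('bad.csv') as f, open('good.csv', 'w') as fout:
    for line in f:
        # Everything between the first and the last comma belongs to the middle field
        first, *middle, last = line.split(',')
        # Rejoin the middle pieces and wrap them in quotes so a CSV reader keeps them
        # together; `last` still carries the line's trailing newline
        fout.write(f'{first},"{",".join(middle)}",{last}')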
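The f-string above does not escape a double quote that already appears in the middle field, so such a line would still come out malformed. A minimal sketch that lets `csv.writer` do the quoting instead, assuming every non-blank line has at least two commas (the helper names `repair_lines` and `repair_csv` are hypothetical):

```python
import csv
import io

def repair_lines(lines):
    """Yield [first, middle, last] for each raw line, treating only the first
    and the last comma as delimiters (assumes exactly three real columns)."""
    for line in lines:
        line = line.rstrip('\r\n')
        if not line:
            continue  # skip blank lines
        first, rest = line.split(',', 1)    # split on the first comma only
        middle, last = rest.rsplit(',', 1)  # split on the last comma only
        yield [first, middle, last]

def repair_csv(src_path, dst_path):
    # newline='' on the output file is what the csv module docs recommend
    with open(src_path) as f, open(dst_path, 'w', newline='') as fout:
        # csv.writer quotes the middle field and doubles any embedded quotes
        csv.writer(fout).writerows(repair_lines(f))
```

Calling `repair_csv('bad.csv', 'good.csv')` writes a file that `csv.reader` reads back as three columns per row. Splitting with `maxsplit=1` from both ends avoids rejoining the middle pieces by hand.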
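Sometimes you need a pass-through solution that fixes the file on the fly as it is read, without writing out a "fixed" file, for example when you want to read the data directly with pandas.read_csv(...). In that case you can do something like this: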
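import io
import re

import pandas as pd

def fix_commas(csv_file):
    with open(csv_file) as f:
        buf = f.read()
    # Collapse every run of two or more commas into a single comma, line by line
    buf = '\n'.join([re.sub(r',,+', ',', s) for s in buf.splitlines()])
    # Wrap the cleaned text in a file-like object that pandas can read from
    return io.StringIO(buf)

# and then
df = pd.read_csv(fix_commas(filename), ...)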
Example:
txt = """
first,second,third
a,,b,bbbb
c,,,,,d,,,,,,,e
f,g,h
"""
with open('test.csv', 'w') as f:
    f.write(txt)
# now test:
df = pd.read_csv(fix_commas('test.csv'))
Result (in df):
  first second third
0     a      b  bbbb
1     c      d     e
2     f      g     h
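The regex in `fix_commas` collapses runs of repeated commas, which suits the sample data above but not the case in the question, where a single stray comma can sit inside the middle field. A minimal pass-through sketch for that case, assuming exactly three intended columns, stray commas only in the middle one, and at least two commas on every non-blank line (`fix_middle_commas` is a hypothetical name):

```python
import csv
import io

def fix_middle_commas(csv_file):
    """Return an in-memory file in which the middle field of every line is
    properly quoted, so a CSV reader sees exactly three columns."""
    out = io.StringIO()
    writer = csv.writer(out)
    with open(csv_file, newline='') as f:
        for line in f:
            line = line.rstrip('\r\n')
            if not line:
                continue  # skip blank lines
            first, rest = line.split(',', 1)    # first comma is a real delimiter
            middle, last = rest.rsplit(',', 1)  # last comma is a real delimiter
            writer.writerow([first, middle, last])  # quotes `middle` if needed
    out.seek(0)  # rewind so the reader starts at the beginning
    return out
```

Like `fix_commas`, the returned object can be handed straight to `csv.reader(...)` or to `pd.read_csv(...)`, so no fixed copy of the file is written to disk.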