Remove line breaks in a CSV

I have a CSV file with errors. The most common one is a line break that comes too early.

But I don't know how best to remove these breaks. If I read the file line by line:

with open("test.csv", "r") as reader:
    test = reader.read().splitlines()

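then the broken structure is already in my variable. Is this still the right approach? Should I loop over test with a for loop and build a copy, or can I modify the test variable directly while iterating over it?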
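A minimal illustration of the difference, using made-up sample lines (not the real test.csv): reassigning the loop variable only rebinds a local name and leaves the list untouched, so you either build a new list or write back through the index.

```python
# Hypothetical sample data; in the question this would come from test.csv.
lines = ["2021;Foobar;Bla ", ";Blub "]

# Reassigning the loop variable only rebinds `line`; the list is unchanged.
for line in lines:
    line = line.strip()
unchanged = list(lines)  # still contains the trailing spaces

# Option 1: build a new list (usually the simplest and safest approach).
cleaned = [line.strip() for line in lines]

# Option 2: modify the list in place by assigning through the index.
# This is fine as long as the list's length does not change inside the loop.
for i, line in enumerate(lines):
    lines[i] = line.strip()
```

Merging broken lines changes the number of entries, so building a new list (option 1) is the better fit for this problem.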

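I can identify the broken lines by their semicolons: some lines end with a ;, others start with one. Maybe counting the semicolons is another way to solve this?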
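The counting idea can be sketched like this, assuming 4 columns separated by ; and no ; inside quoted fields (a quoted separator would be miscounted). The function name `find_broken` is hypothetical.

```python
def find_broken(lines, sep=';', fields=4):
    """Return the indices of lines whose separator count is not fields - 1.

    Assumes the separator never appears inside a (quoted) field value.
    """
    expected = fields - 1
    return [i for i, line in enumerate(lines) if line.count(sep) != expected]


sample = ["Col_a;Col_b;Col_c;Col_d", "2021;Foobar;Bla", ";Blub"]
print(find_broken(sample))  # [1, 2]
```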

Edit: I replaced reader.read().splitlines() with reader.readlines(), so now I can handle the lines that end with ;

cleaned = []
for line in lines:
    if "Foobar" in line:
        line = line.replace("Foobar", "")
    if ";\n" in line:
        line = line.replace(";\n", ";")
    cleaned.append(line)  # rebinding `line` alone would not change `lines`

Now only the lines that start with a ; are left, and for those I need to go back one entry in the list.
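Going back one entry can be sketched as appending each line that starts with ; to the previous entry of a new list. This assumes a leading ; always means "continuation of the line above"; the function name `merge_leading_sep` is hypothetical.

```python
def merge_leading_sep(lines, sep=';'):
    """Append every line that starts with `sep` to the entry before it."""
    merged = []
    for line in lines:
        line = line.rstrip('\n')  # lines from readlines() keep their '\n'
        if line.startswith(sep) and merged:
            merged[-1] += line    # continuation: glue onto the previous entry
        else:
            merged.append(line)
    return merged


rows = ["Col_a;Col_b;Col_c;Col_d\n", "2021;Foobar;Bla\n", ";Blub\n"]
print(merge_leading_sep(rows))
```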

Example:

Col_a;Col_b;Col_c;Col_d 
2021;Foobar;Bla 
;Blub

The ;Blub line belongs to the line above it.

Here is a simple Python script that merges lines until you have the required number of fields.

import sys

sep = ';'
fields = 4
collected = []
for line in sys.stdin:
    new = line.rstrip('\n').split(sep)
    if collected:
        # continuation of a broken record: glue the first piece onto the last field
        collected[-1] += new[0]
        collected.extend(new[1:])
    else:
        collected = new
    if len(collected) < fields:
        continue  # record still incomplete, read the next line
    print(sep.join(collected))
    collected = []

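This just reads from standard input and prints to standard output. If the last line is incomplete, it is lost. The separator and the number of fields can be edited in the variables at the top; exposing them as command-line arguments is left as an exercise.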
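A sketch of the same merge wrapped in a function, assuming you would rather keep an incomplete last record than silently lose it. The name `merge_records` is hypothetical; it accepts any iterable of lines, so it works with `sys.stdin` or an open file.

```python
def merge_records(lines, sep=';', fields=4):
    """Join physical lines until each record has at least `fields` fields.

    Unlike the stdin script, an incomplete trailing record is returned
    as well instead of being dropped.
    """
    records = []
    collected = []
    for line in lines:
        new = line.rstrip('\n').split(sep)
        if collected:
            collected[-1] += new[0]      # glue the broken field back together
            collected.extend(new[1:])
        else:
            collected = new
        if len(collected) >= fields:
            records.append(sep.join(collected))
            collected = []
    if collected:
        records.append(sep.join(collected))  # incomplete last record
    return records
```

Usage: `for record in merge_records(sys.stdin): print(record)`. Reading `sep` and `fields` from the command line is still left open here.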

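If you want to keep the line breaks, it is not too hard to strip the newline only from the last field and use csv.writer to write the fields back out as properly quoted CSV.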
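A minimal sketch of that variant, assuming the same ; separator and 4 fields: the embedded newline stays inside the field, only the record's final newline is stripped, and `csv.writer` quotes any field that contains a newline. The function name `merge_keep_newlines` is hypothetical.

```python
import csv
import io


def merge_keep_newlines(lines, sep=';', fields=4):
    """Merge broken lines but keep the embedded line breaks, quoted by csv.writer."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=sep, lineterminator='\n')
    collected = []
    for line in lines:
        new = line.split(sep)            # keep '\n' for now
        if collected:
            collected[-1] += new[0]
            collected.extend(new[1:])
        else:
            collected = new
        if len(collected) < fields:
            continue
        # strip the line terminator only from the last field of a complete record
        collected[-1] = collected[-1].rstrip('\n')
        writer.writerow(collected)       # fields containing '\n' get quoted
        collected = []
    return out.getvalue()


result = merge_keep_newlines(["Col_a;Col_b;Col_c;Col_d\n", "2021;Foobar;Bla\n", ";Blub\n"])
print(result)
```

Reading `result` back with `csv.reader(..., delimiter=';')` yields `'Bla\n'` as a single field, so the original break survives without corrupting the row structure.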

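This is how I handled it. The function truncates lines that have more columns than expected and rejoins lines that were split by a line break in the middle.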

The function parameters are:

message - the file contents, i.e. reader.read()
columns - the expected number of columns
filename - the file name (I use it for logging)

def pre_parse(message, columns, filename):
    parsed_message = []
    i = 0
    temp_line = ''
    for line in message.splitlines():
        split = line.split(',')
        if len(split) == columns:
            # complete line, keep as is
            parsed_message.append(line)
        elif len(split) > columns:
            # too many columns: drop everything after the expected count
            print(f'Line {i} has been truncated in file {filename} - too many columns')
            split = split[:columns]
            line = ','.join(split)
            parsed_message.append(line)
        elif len(split) < columns and temp_line == '':
            # start of a broken line: remember it and wait for the rest
            temp_line = line.replace('\n', '')
            print(temp_line)
        elif temp_line != '':
            # continuation of a broken line
            line = temp_line + line
            if line.count(',') == columns - 1:
                print(f'Line {i} has been fixed in file {filename} - extra line feed')
                parsed_message.append(line)
                temp_line = ''
            else:
                temp_line = line.replace('\n', '')
        i += 1
    return parsed_message

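Make sure you use the correct split character and line-break character (the file in the question uses ; rather than ,).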
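To check which line-break characters a file actually contains, a small sketch like the following can help. It works on raw bytes, assuming the file is opened with `'rb'`, because text mode translates `\r\n` to `\n` by default and hides the difference. The function name `count_line_endings` is hypothetical.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw file bytes."""
    crlf = data.count(b'\r\n')
    return {
        'crlf': crlf,
        'lf': data.count(b'\n') - crlf,  # LF not preceded by CR
        'cr': data.count(b'\r') - crlf,  # CR not followed by LF
    }


print(count_line_endings(b'a\r\nb\nc\r\n'))  # {'crlf': 2, 'lf': 1, 'cr': 0}
```

Usage: `with open("test.csv", "rb") as f: print(count_line_endings(f.read()))`. A mix of CRLF and bare LF suggests the unwanted breaks are the bare LFs.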

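I ended up building a solution with the help of this post: Replacing CRLF with LF in Python 3.6. It also helped me get past the problem and understand what was happening under the hood. The idea is that in a Windows file the real record endings are CRLF, while the unwanted breaks are bare LFs.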

OldFile = r"c:\Test\input.csv"
NewFile = r"C:\Test\output.csv"
# reading it in as binary keeps the CR LF in Windows as is
# (parenthesized context managers need Python 3.10+)
with (
    open(OldFile, 'rb') as f_in,
    open(NewFile, 'wb') as f_out,
):
    FileContent = f_in.read()
    # removing all line breaks, including the ones after the carriage return
    oldLineFeed = b'\n'
    newLineFeed = b''
    FileContent = FileContent.replace(oldLineFeed, newLineFeed)
    # only a carriage return is left at the end of each true line; add the line break back in
    oldLineFeed = b'\r'
    newLineFeed = b'\r\n'
    FileContent = FileContent.replace(oldLineFeed, newLineFeed)
    f_out.write(FileContent)
# no explicit close() needed: the with block closes both files
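Under the same assumption (true record endings are CRLF, stray breaks are bare LF), the two replacements can be collapsed into one regular expression; this is a swapped-in alternative, not the method from the linked post. The negative lookbehind removes only LFs that are not preceded by a CR. One difference: a bare CR is left alone here, whereas the two-step replacement turns it into CRLF. The function name `drop_stray_lf` is hypothetical.

```python
import re


def drop_stray_lf(data: bytes) -> bytes:
    """Remove every LF byte that is not part of a CRLF pair."""
    # (?<!\r) is a negative lookbehind: match \n only if the byte before it is not \r
    return re.sub(rb'(?<!\r)\n', b'', data)


raw = b'Col_a;Col_b;Col_c;Col_d\r\n2021;Foobar;Bla\n;Blub\r\n'
print(drop_stray_lf(raw))
```

Usage: read the file with `'rb'`, pass the bytes through `drop_stray_lf`, and write the result with `'wb'`.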
