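I have a CSV file that I am trying to clean up with Python.
Some records are broken up by literal "\n" sequences, line breaks, or blank lines.
I would like every line that does not start with a double quote (") to be cut and pasted onto the end of the previous line.
Here is a concrete example to make this clearer: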
The CSV file I have:
"id","name","age","city","remark"
"1","kevin","27","paris","This is too bad"
"8","angel","18","london","Incredible !!!"
"14","maria","33","madrid","i can't believe it."
"16","john","28","new york","hey men,
\nhow do you did this"
"22","naima","35","istanbul","i'm sure it's false,
\nit can't be real"
"35","marco","26","roma","you'r my hero!"
"39","lili","37","tokyo","all you need to knows.
\n\nthe best way to upgrade easely"
...
The CSV file I want:
"id","name","age","city","remark"
"1","kevin","27","paris","This is too bad"
"8","angel","18","london","Incredible !!!"
"14","maria","33","madrid","i can't believe it."
"16","john","28","new york","hey men,how do you did this"
"22","naima","35","istanbul","i'm sure it's false, it can't be real"
"35","marco","26","roma","you'r my hero!"
"39","lili","37","tokyo","all you need to knows. the best way to upgrade easely"
...
How would you go about this?
Thanks in advance for your help!
This is the Python code I am currently trying:
text = open("input.csv", "r", encoding='utf-8')
text = ''.join([i for i in text])
text = text.replace("\n", "")
x = open("output.csv","w")
x.writelines(text)
x.close()
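with open("input.csv", encoding="utf-8") as read_file, \
        open("output.csv", "w", encoding="utf-8") as write_file:
    prev_row = ""
    for this_row in read_file:
        if not this_row.startswith('"'):
            # continuation (or blank) line: glue it onto the previous row
            prev_row = prev_row.rstrip("\n") + this_row
        else:
            write_file.write(prev_row)
            prev_row = this_row
    write_file.write(prev_row)  # do not forget the last buffered row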
This is only a rough draft. You could collect the pieces of each row in a list and use str.join to make it faster.
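The list-and-join idea could look roughly like the sketch below. The function name `merge_continuation_lines` is made up for illustration, and the code assumes that every real record starts with a double quote, as in the sample data. Unlike the draft above, it drops blank lines and returns the records without trailing newlines.

```python
def merge_continuation_lines(lines):
    """Join lines that do not start with a double quote onto the previous record."""
    records = []   # finished records, without trailing newlines
    parts = []     # pieces of the record currently being assembled
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith('"') and parts:
            # a new record begins: flush the one collected so far
            records.append("".join(parts))
            parts = []
        if line:   # blank lines contribute nothing
            parts.append(line)
    if parts:
        records.append("".join(parts))
    return records


sample = [
    '"16","john","28","new york","hey men,\n',
    'how do you did this"\n',
    '\n',
    '"35","marco","26","roma","you\'r my hero!"\n',
]
merged = merge_continuation_lines(sample)
```

Writing the result back is then a matter of `write_file.write("\n".join(merged) + "\n")`.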
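A few points need clarifying here:
- Your CSV file contains , characters in the remark field. This means the field must be enclosed in quotes (which it is).
- A CSV file is allowed to contain newlines inside a single field. This does not create an extra data row, but it does make the file look odd to a human reader.
- Python's CSV reader handles newlines inside fields automatically.
- Finally, your data appears to be oddly encoded, and you want to remove all of the extra newlines. Each line also has a trailing backslash character that should not be there.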
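To illustrate the point about newlines inside quoted fields, here is a small sketch that feeds one record spread over two physical lines to csv.reader. It reads from an in-memory io.StringIO buffer instead of a real file, purely to keep the example self-contained.

```python
import csv
import io

# Two physical lines, but only one logical CSV record
raw = '"16","john","28","new york","hey men,\nhow do you did this"\n'

rows = list(csv.reader(io.StringIO(raw)))

# The reader yields a single row whose last field still contains the newline
remark = rows[0][4]
```

So the line break is part of the field value, and it can be removed from the value after parsing rather than by patching the raw text.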
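I would suggest the following:
- Use Python's CSV reader to read the file correctly, one row at a time (you have 7 rows plus a header).
- Remove all newlines from the remark field.
For example: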
import csv

with open('input.csv') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)

    for row in csv_input:
        if len(row) == 5:   # skip blank lines
            # drop literal "\n" text, turn real newlines into spaces, strip stray backslashes
            row[4] = row[4].replace('\\n', '').replace('\n', ' ').replace('\\', '')
            csv_output.writerow(row)
This would give you:
id,name,age,city,remark
1,kevin,27,paris,This is too bad
8,angel,18,london,Incredible !!!
14,maria,33,madrid,i can't believe it.
16,john,28,new york,"hey men, how do you did this"
22,naima,35,istanbul,"i'm sure it's false, it can't be real"
35,marco,26,roma,you'r my hero!
39,lili,37,tokyo,all you need to knows. the best way to upgrade easely
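By default csv.writer only quotes a field when it has to, which is why most of the quotes are gone in this output. If every field should stay quoted, as in the file the question asks for, passing quoting=csv.QUOTE_ALL to the writer is one option. A minimal sketch, again using an in-memory buffer instead of real files:

```python
import csv
import io

buffer = io.StringIO()
# QUOTE_ALL wraps every field in double quotes, not just the ones containing commas
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["16", "john", "28", "new york", "hey men, how do you did this"])
writer.writerow(["35", "marco", "26", "roma", "you'r my hero!"])

quoted_output = buffer.getvalue()
```

With a real file you would keep `newline=''` on the output `open()` call and leave `lineterminator` at its default.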
Contents of input.csv:
"id","name","age","city","remark"
"1","kevin","27","paris","This is too bad"
"8","angel","18","london","Incredible !!!"
"14","maria","33","madrid","i can't believe it."
"16","john","28","new york","hey men,
how do you did this"
"22","naima","35","istanbul","i'm sure it's false,
nit can't be real"
"35","marco","26","roma","you'r my hero!"
"39","lili","37","tokyo","all you need to knows.
the best way to upgrade easely"
A possible (quick and simple) solution is the following:
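with open('input.csv', 'r', encoding='utf-8') as file:
    data = file.read()

# Protect real row breaks (quote, newline, quote) with a placeholder,
# remove every other newline, then put the row breaks back
clean_data = data.replace('"\n"', '"||"').replace("\n", "").replace('"||"', '"\n"')

with open('output.csv', 'w', encoding='utf-8') as file:
    file.write(clean_data)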
Resulting contents of output.csv:
"id","name","age","city","remark"
"1","kevin","27","paris","This is too bad"
"8","angel","18","london","Incredible !!!"
"14","maria","33","madrid","i can't believe it."
"16","john","28","new york","hey men,how do you did this"
"22","naima","35","istanbul","i'm sure it's false,nit can't be real"
"35","marco","26","roma","you'r my hero!"
"39","lili","37","tokyo","all you need to knows.the best way to upgrade easely"