Reading a CSV file that has commas inside double-quoted fields



I have a CSV file (comma-separated) in an S3 bucket. A few fields contain commas, and the CSV file looks like this:

Q,W,E,R
A,S,"D,F",G
Z,X,C,V

When I read it with pandas I should get 4 columns, with "D,F" in a single column, but I get an extra column.
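For reference, a standard CSV reader treats a comma inside double quotes as part of the field. A quick check with Python's built-in `csv` module, with the sample hard-coded as a stand-in for the S3 object:

```python
import csv
import io

# The sample from the question, hard-coded here instead of read from S3
sample = 'Q,W,E,R\nA,S,"D,F",G\nZ,X,C,V\n'

# A quotechar at the start of a field opens a quoted field,
# and delimiters inside that field are kept as literal text
rows = list(csv.reader(io.StringIO(sample), delimiter=",", quotechar='"'))
for row in rows:
    print(len(row), row)
```

Every row comes back with 4 fields, and the second one is `['A', 'S', 'D,F', 'G']`.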

My code; I tried different approaches, but none of them worked:

import io
import csv
import chardet
import pandas as pd
#reading from s3 bucket
self.raw_content = obj['Body'].read()
#encoding (detected from the raw bytes, so it must come after the read)
result = chardet.detect(self.raw_content)
self.encoding = result['encoding']
#csv_delimiter is being read from the DB (',' in this case)
#max_columns is the NUMBER of columns in the csv file
#Try 1
content = io.BytesIO(self.raw_content)
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, engine='python',
                         dtype=object, encoding=self.encoding, quotechar='"',
                         names=list(range(0, max_columns)))
#Try 2 (fresh buffer: the previous read_csv call consumed the old one)
content = io.BytesIO(self.raw_content)
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, engine='python',
                         dtype=object, encoding=self.encoding, quoting=csv.QUOTE_ALL,
                         names=list(range(0, max_columns)))
#Try 3
content = io.BytesIO(self.raw_content)
df_s3_file = pd.read_csv(content, delimiter=csv_delimiter, dtype=object,
                         encoding=self.encoding, quoting=csv.QUOTE_ALL,
                         names=list(range(0, max_columns)))
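Because `pd.read_csv` reads a buffer to the end, several attempts on the same bytes each need their own `io.BytesIO` (or a `content.seek(0)` first). A sketch under stated assumptions (hard-coded UTF-8 sample bytes instead of the S3 object, a hypothetical `max_columns = 4`) showing that, with a fresh buffer per attempt, both engines keep `D,F` in one column for the sample shown:

```python
import io
import pandas as pd

# Stand-in for obj['Body'].read(); assumed to be plain UTF-8 bytes
raw_content = b'Q,W,E,R\nA,S,"D,F",G\nZ,X,C,V\n'
max_columns = 4  # hypothetical value; the question reads it from the DB

frames = {}
for engine in ("python", "c"):
    # New buffer for every attempt, so each read starts at the first byte
    content = io.BytesIO(raw_content)
    frames[engine] = pd.read_csv(content, delimiter=",", engine=engine,
                                 dtype=object, encoding="utf-8",
                                 quotechar='"',
                                 names=list(range(0, max_columns)))
    print(engine, frames[engine].shape)
```

If the real file still produces an extra column with this setup, the difference is likely in the file's bytes rather than in these `read_csv` options.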

Current result:

0    1    2    3    4
Q    W    E    R    NaN
A    S    "D   F"   G
Z    X    C    V    NaN  

Expected result:

0    1    2    3
Q    W    E    R
A    S    D,F  G
Z    X    C    V
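In the current result the double quotes survive in `"D` and `F"`. A CSV parser keeps a quote as a literal character when it is not the first character of a field, for example when a space follows the delimiter. A minimal sketch, assuming (the sample above does not show this) that the real file has a space after each comma; pandas' `skipinitialspace=True` drops that space so the quote opens the field again:

```python
import io
import pandas as pd

# Hypothetical variant of the sample with a space after every comma
raw = 'Q, W, E, R\nA, S, "D,F", G\nZ, X, C, V\n'

# Without skipinitialspace the quote is not at the start of the field,
# so it is kept literally and "D,F" is split into two fields
naive = pd.read_csv(io.StringIO(raw), names=list(range(5)), dtype=object)

# With skipinitialspace=True the leading space is skipped and the
# quote starts a quoted field, so the embedded comma is kept
fixed = pd.read_csv(io.StringIO(raw), names=list(range(4)), dtype=object,
                    skipinitialspace=True)
print(naive)
print(fixed)
```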

You can handle it with the following code (adapted from https://stackoverflow.com/a/64456792/5660315):
from io import StringIO
import csv
import pandas as pd
s="""
Q,W,E,R
A,S,"D,F",G
Z,X,C,V
"""
df = pd.read_csv(StringIO(s),
                 names=range(4),          # column labels 0, 1, 2, 3
                 sep=',',
                 quoting=csv.QUOTE_ALL,
                 quotechar='"'            # text between "..." is one field, commas included
                 )
print(df)
#    0  1    2  3
# 0  Q  W    E  R
# 1  A  S  D,F  G
# 2  Z  X    C  V
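The `quoting=csv.QUOTE_ALL` argument is not what keeps `D,F` together here; the double-quote `quotechar` does that. A quick check on the same string, comparing pandas' default quoting with the answer's call:

```python
import csv
from io import StringIO
import pandas as pd

s = 'Q,W,E,R\nA,S,"D,F",G\nZ,X,C,V\n'

# Default quoting (csv.QUOTE_MINIMAL) and default quotechar '"'
df_default = pd.read_csv(StringIO(s), names=list(range(4)), sep=',')

# The call from the answer, with csv.QUOTE_ALL
df_quote_all = pd.read_csv(StringIO(s), names=list(range(4)), sep=',',
                           quoting=csv.QUOTE_ALL, quotechar='"')

print(df_default.equals(df_quote_all))
```

Both frames are identical for this input.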
