How to convert a text file (with "," as the separator) into a pandas DataFrame



I am trying to read a text file (about 1 million lines) with the following content:

First line: "column_header","column_header","column_header",...,"column_header"

Second line onward: "value","value","value",...,"value"

I tried the following:

import pandas as pd

# try 1: read all lines, then strip and split by hand
with open(file, 'rt') as f:
    contents = f.readlines()
for i in contents:
    print(i)  # ->> seeing the text as ," value ", " value ", "
    x = [_.strip().replace('""', '').split(',') for _ in i]
    print(str(x))  # ->> getting bytes

# try 2: read the whole file as one string
with open(file, 'rt') as f:
    contents = f.read()
for i in contents:
    print(str(i))  # ->> text, but cannot do anything with it

# try 3: let pandas parse the file
frame = pd.read_csv(file, sep=',', doublequote=True, skip_blank_lines=True)  # ->> utf parsing error
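
As a side note on try 1: splitting each line on "," by hand breaks as soon as a quoted value itself contains a comma. Python's standard csv module already understands quoted fields. The following is only a minimal sketch; the sample text and the helper name parse_quoted_lines are made up for illustration and are not taken from the actual report file.

```python
import csv
import io

# Hypothetical sample in the layout described above: every field is quoted.
sample = '"id","name","note"\n"1","Ann","likes a, b"\n"2","Bob",""\n'


def parse_quoted_lines(text):
    """Parse quoted, comma-separated text into a list of dicts keyed by header."""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar='"')
    header = next(reader)  # the first line holds the column headers
    return [dict(zip(header, row)) for row in reader]


rows = parse_quoted_lines(sample)
print(rows[0])
```

The comma inside "likes a, b" stays part of one field because the reader tracks the surrounding quotes, which a plain split(',') cannot do.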

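I found that the text file I received was not encoded as UTF-8, which is why none of the approaches above worked. My solution: open the file and save it again as .txt with UTF-8 encoding. Then use the following Python code: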

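import pandas as pd

file = folder_location + 'report.txt'  # the re-saved, UTF-8 encoded copy
# try 3 again; sep=',', doublequote=True and skip_blank_lines=True are the pandas defaults
frame = pd.read_csv(file, sep=',', doublequote=True, skip_blank_lines=True)
print(frame.head())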
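
Re-saving by hand may be avoidable, since pd.read_csv accepts an encoding argument. The sketch below assumes the original encoding is unknown: it checks for a UTF-16 byte order mark, then tries UTF-8, then a fallback. The helper name read_quoted_csv and the cp1252 fallback are assumptions for illustration; the post does not say which encoding the file really used.

```python
import codecs

import pandas as pd


def read_quoted_csv(path, fallback_encoding="cp1252"):
    """Read a quoted CSV, trying a few encodings (hypothetical helper).

    fallback_encoding is a guess; replace it with the file's real encoding
    if it is known.
    """
    # Peek at the first bytes to detect a UTF-16 byte order mark.
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        candidates = ["utf-16"]
    else:
        # utf-8-sig also accepts plain UTF-8 and drops a leading BOM if present.
        candidates = ["utf-8-sig", fallback_encoding]

    last_error = None
    for enc in candidates:
        try:
            return pd.read_csv(path, encoding=enc, quotechar='"')
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next candidate
    raise last_error
```

Usage would look like frame = read_quoted_csv(folder_location + 'report.txt'). Single-byte encodings such as cp1252 accept almost any byte sequence, so a successful read with the fallback does not prove the guess was right; it is worth checking a few rows with non-ASCII characters.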
