I want to use pandas to read a fixed-width .txt file that looks like this:
option19971675181 ACHILLE BLA BLA BLA1 blabla 88 498
option19971675182 ACHILLE BLA BLA BLA1 blabla 176 498
option19971675183 ACHILLE BLA BLA BLA1 blabla 191 498
option19971675184 ACHILLE BLA BLA BLA1 blabla 521 498
option19971675185 ACHILLE BLA BLA BLA1 blabla 919 498
option19971675186 ACHILLE BLA BLA BLA134234531 blabla 10 498
option19971675187 ACHILLE BLA BLA BLA134234531 7 65 blabla 0 0
option19971675188 ACHILLE BLA BLA BLA1342 90345 31 blabla 0 0
option19971675189 ACHILLE BLA BLA BLA 134 23N 094 87OP531 blabla 0 0
option19971675190 ACHILLE BLA BLA BLA 134 23N 094 87OP53 blabla 0 0
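I tried reading the file into a pandas DataFrame. The values in the file are separated by spaces,
but I don't know how to split the text option199716751810 into two columns.
I used the code from an answer; it works, but not for the first line: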
df = pd.read_csv("test.txt", delimiter=r"\s+", header=None, error_bad_lines=False)
df[df.columns[0]] = df[df.columns[0]].str.replace("option199716","")
>>> df
I got this output:
75181 ACHILLE BLA BLA BLA1 blabla 88 498
75182 ACHILLE BLA BLA BLA1 blabla 176 498
75183 ACHILLE BLA BLA BLA1 blabla 191 498
75184 ACHILLE BLA BLA BLA1 blabla 521 498
75185 ACHILLE BLA BLA BLA1 blabla 919 498
75186 ACHILLE BLA BLA BLA134234531 blabla 10 498
75187 ACHILLE BLA BLA BLA134234531 7 65 blabla 0 0
75188 ACHILLE BLA BLA BLA1342 90345 31 blabla 0 0
75189 ACHILLE BLA BLA BLA 134 23N 094 87OP531 blabla 0 0
75190 ACHILLE BLA BLA BLA 134 23N 094 87OP53 blabla 0 0
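But it still shows an error: Skipping line 16: Expected 5 fields in line 136, saw 6. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
Can someone help me with this, please?
Assuming the spacing in your text file is exactly the same as in the question, try the following: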
df = pd.read_csv("test.txt", delimiter=r"\s+", header=None)
df[df.columns[0]] = df[df.columns[0]].str.replace("option199716","")
>>> df
        0        1         2   3   4
0  751810   Pascal      Male  23  11
1  845087  Achille      Male  13  12
2  602183     Hera  Femelles   9  98
3  802183     Alma  Femelles  19  88
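If the middle part of a row can itself contain spaces, as in the sample rows in the question, splitting on whitespace gives a different number of fields per row, which would explain the "Expected 5 fields ... saw 6" message. One possible workaround is to parse each line with a regular expression that pins down the first token and the last three tokens and treats everything in between as free text. This is only a sketch: the layout (one word followed by two integers at the end of each row), the column names code, text, label, a, b, prefix and id, and the prefix "option199716" are all assumptions made for illustration, not something the question confirms.

```python
import io
import re

import pandas as pd

# A few rows copied from the question; in practice pass open("test.txt") instead.
SAMPLE = """\
option19971675181 ACHILLE BLA BLA BLA1 blabla 88 498
option19971675186 ACHILLE BLA BLA BLA134234531 blabla 10 498
option19971675187 ACHILLE BLA BLA BLA134234531 7 65 blabla 0 0
option19971675189 ACHILLE BLA BLA BLA 134 23N 094 87OP531 blabla 0 0
"""

# Assumed layout (hypothetical): first token, free text that may contain
# spaces, one word, then two integers at the end of the line.
LINE_RE = re.compile(
    r"^(?P<code>\S+)\s+(?P<text>.+?)\s+(?P<label>\S+)\s+(?P<a>\d+)\s+(?P<b>\d+)\s*$"
)


def parse_lines(handle):
    """Build a DataFrame from the lines that match LINE_RE; others are skipped."""
    rows = []
    for line in handle:
        match = LINE_RE.match(line)
        if match:
            rows.append(match.groupdict())
    return pd.DataFrame(rows)


df = parse_lines(io.StringIO(SAMPLE))

# Split the first token into the assumed prefix and the remaining digits.
parts = df["code"].str.extract(r"^(?P<prefix>option199716)(?P<id>\d+)$")
df = pd.concat([parts, df.drop(columns="code")], axis=1)

# The regex captures strings, so convert the two trailing numbers explicitly.
df[["a", "b"]] = df[["a", "b"]].astype(int)
print(df)
```

If the real file has truly fixed column widths, pd.read_fwf with explicit column positions is probably the cleaner route, but the widths cannot be recovered from the sample as posted here.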