How to keep leading whitespace in a pandas Series in Python?

I'm trying to read a text file with pandas read_csv in Python. My text file looks like this (all values are numbers):

35 61  7 1 0              # with leading white spaces
0 1 1 1 1 1              # with leading white spaces
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
1(01-02),2(02-03),3(03-04) # this line causes 'Error tokenizing data. C error: Expected 1 fields in line 5, saw 3'

My Python code is as follows:

import pandas as pd
df = pd.read_csv('example.txt', header=None)
df

The output is:

CParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3
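The error follows from read_csv's default comma separator: the first four lines contain no comma, so each is a single field, and with header=None the first line fixes the expected width at 1 field (hence "Expected 1 fields"); line 5 then splits into 3. A minimal sketch that counts fields per line, using the standard-library csv module as a stand-in for pandas' own C tokenizer (an approximation for illustration, not pandas internals), with the sample rows inlined:

```python
import csv
import io

# Sample rows from example.txt; leading spaces are omitted because they do not
# change how many commas a line contains.
SAMPLE = (
    "35 61  7 1 0\n"
    "0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233   2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)


def field_counts(text, delimiter=","):
    """Number of fields on each line when split on a literal one-character delimiter."""
    return [len(row) for row in csv.reader(io.StringIO(text), delimiter=delimiter)]


counts = field_counts(SAMPLE)
print(counts)  # [1, 1, 1, 1, 3]
```

With a delimiter that never occurs in the data, such as '^', every line collapses to one field: field_counts(SAMPLE, delimiter="^") gives [1, 1, 1, 1, 1].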

Before dealing with the leading whitespace, I need to fix the "Error tokenizing data" problem first, so I changed my code to:

import pandas as pd
# error_bad_lines was removed in pandas 2.0; in pandas >= 1.3 use on_bad_lines='skip'
df = pd.read_csv('example.txt', header=None, error_bad_lines=False)
df

I get the data with its leading whitespace as intended, but the data on line 5 is gone. The output is:

b'Skipping line 5: expected 1 fields, saw 3\n'
35 61  7 1 0              # with leading white spaces as intended
0 1 1 1 1 1              # with leading white spaces as intended
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
# 5th line disappeared (not my intention).

So I changed the code as below to get line 5 back.

import pandas as pd
df = pd.read_csv('example.txt', header=None, sep=':::', engine='python')
df

I got the data on line 5, but lines 1 and 2 lost their leading whitespace:

35 61  7 1 0               # without leading white spaces (not my intention)
0 1 1 1 1 1                # without leading white spaces (not my intention)
33 221 22 0 1              # without leading white spaces
233   2                    # without leading white spaces
1(01-02),2(02-03),3(03-04) # I successfully got this line as intended.

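I've seen several posts about keeping leading whitespace in strings, but I couldn't find one about keeping it for numeric data. Thanks for your help.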

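The key is the separator. Passing sep='^' makes this work. Although ^ is the regex start-of-line anchor, pandas takes a one-character sep literally (only separators longer than one character, other than '\s+', are interpreted as regular expressions), so the C engine splits on the caret character itself. Because '^' never occurs in the file, each whole line, leading spaces included, ends up as a single field.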

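import pandas as pd

# squeeze=True was removed in pandas 2.0; .squeeze('columns') returns the same Series
s = pd.read_csv('example.txt', header=None, sep='^').squeeze('columns')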

s
0                  35 61  7 1 0
1                   0 1 1 1 1 1
2                 33 221 22 0 1
3                       233   2
4    1(01-02),2(02-03),3(03-04)
Name: 0, dtype: object
s[1]
'  0 1 1 1 1 1'
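To check the answer without a file on disk, the sketch below feeds a hypothetical copy of example.txt through io.StringIO (the number of leading spaces on the first two rows is assumed, since the question's listing lost them). It also shows that a one-character sep is taken literally, since a caret inside the data does split fields, and compares the result with pd.Series(text.splitlines()), an alternative not in the original answer that bypasses the CSV parser altogether.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of example.txt; the leading spaces on the first
# two rows are assumed for illustration.
TEXT = (
    "    35 61  7 1 0\n"
    "  0 1 1 1 1 1\n"
    "33 221 22 0 1\n"
    "233   2\n"
    "1(01-02),2(02-03),3(03-04)\n"
)

# The answer's approach: a one-character separator that never occurs in the data,
# so each line lands whole in column 0; squeeze("columns") turns the
# one-column DataFrame into a Series.
s = pd.read_csv(io.StringIO(TEXT), header=None, sep="^").squeeze("columns")

# A one-character sep is literal, not a regex anchor: a caret in the data splits fields.
caret_demo = pd.read_csv(io.StringIO("a^b\nc^d\n"), header=None, sep="^")

# Alternative that skips read_csv entirely: one untouched string per line.
t = pd.Series(TEXT.splitlines())

print(repr(s[1]))        # '  0 1 1 1 1 1'
print(caret_demo.shape)  # (2, 2)
print(s.tolist() == t.tolist())
```

For a real file, the splitlines route becomes pd.Series(open('example.txt').read().splitlines()). It is the simpler choice when every line should stay one string; read_csv with sep='^' is convenient if you still want read_csv options such as skiprows or nrows.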
