Pandas DataFrame: adding and removing a prefix/suffix from all cell values of an entire DataFrame

To add a prefix/suffix to a DataFrame, I usually do the following.

For example, to add the suffix '@':

df = df.astype(str) + '@'

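This appends '@' to every cell value.

I would like to know how to remove this suffix. Is there a method available directly on the pandas.DataFrame class that removes a particular prefix/suffix character from the entire DataFrame?

I tried iterating over the rows (as Series) and calling rstrip('@') on each one: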

for index in range(df.shape[0]):
    row = df.iloc[index]
    row = row.str.rstrip('@')

Now, to build a DataFrame out of this Series:

new_df = pd.DataFrame(columns=list(df))
new_df = new_df.append(row)

However, this doesn't work. It gives an empty DataFrame.

Am I missing something really basic?

You can use applymap to apply a string method to each element:

df = df.applymap(lambda x: str(x).rstrip('@'))

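Note: I wouldn't expect this to be as fast as a vectorized approach, i.e. transforming each column separately with pd.Series.str.rstrip. In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.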
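As a rough sketch of the difference mentioned in the note, the snippet below removes the suffix both ways on a small made-up DataFrame (the column names and values are illustrative assumptions, not from the question). The element-wise version is written with Series.map so that it does not depend on which pandas version provides applymap or DataFrame.map.

```python
import pandas as pd

# Hypothetical sample data; column "b" is numeric on purpose.
df = pd.DataFrame({"a": ["dog", "lazy"], "b": [1, 2]})

# Add the suffix to every cell (non-string values are converted first).
suffixed = df.astype(str) + "@"

# Element-wise: one Python-level call per cell.
elementwise = suffixed.apply(lambda col: col.map(lambda x: str(x).rstrip("@")))

# Column-wise: one vectorized .str call per column.
columnwise = suffixed.apply(lambda col: col.str.rstrip("@"))

print(columnwise)
```

Both variants should give the same values; the column-wise one is the form that usually scales better on large frames. Note that the numeric column comes back as strings, because the suffix step converted it.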

You can use apply together with the str.strip method of pd.Series:

In [13]: df
Out[13]:
       a       b      c
0    dog   quick    the
1   lazy    lazy    fox
2  brown   quick    dog
3  quick     the   over
4  brown    over   lazy
5    fox   brown  quick
6  quick     fox    the
7    dog  jumped    the
8   lazy   brown    the
9    dog    lazy    the
In [14]: df = df + "@"
In [15]: df
Out[15]:
        a        b       c
0    dog@   quick@    the@
1   lazy@    lazy@    fox@
2  brown@   quick@    dog@
3  quick@     the@   over@
4  brown@    over@   lazy@
5    fox@   brown@  quick@
6  quick@     fox@    the@
7    dog@  jumped@    the@
8   lazy@   brown@    the@
9    dog@    lazy@    the@
In [16]: df = df.apply(lambda S:S.str.strip('@'))
In [17]: df
Out[17]:
       a       b      c
0    dog   quick    the
1   lazy    lazy    fox
2  brown   quick    dog
3  quick     the   over
4  brown    over   lazy
5    fox   brown  quick
6  quick     fox    the
7    dog  jumped    the
8   lazy   brown    the
9    dog    lazy    the
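One caveat worth illustrating: str.strip removes the given characters from both ends, and strip/rstrip remove every repeated occurrence rather than a single one. The sketch below compares them with Series.str.removesuffix and Series.str.removeprefix, which assume pandas 1.4 or newer and remove at most one literal occurrence. The sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical data with '@' at the start, the end, and doubled.
df = pd.DataFrame({"a": ["@dog@", "lazy@@"], "b": ["quick@", "@the"]})

# Strips '@' from both ends, as many times as it occurs.
both_ends = df.apply(lambda s: s.str.strip("@"))

# Strips '@' from the right end only, as many times as it occurs.
only_right = df.apply(lambda s: s.str.rstrip("@"))

# Removes exactly one trailing '@' (pandas >= 1.4).
one_suffix = df.apply(lambda s: s.str.removesuffix("@"))

# Removes exactly one leading '@' (pandas >= 1.4).
one_prefix = df.apply(lambda s: s.str.removeprefix("@"))

print(one_suffix)
```

If the values themselves may legitimately begin or end with '@', the removesuffix/removeprefix variants are probably the safer way to undo a single added suffix or prefix.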

Note that your approach doesn't work because of the following assignment in your for loop:

row = row.str.rstrip('@')

This merely assigns the result of row.str.rstrip to the name row, without mutating the DataFrame. Simple name assignment behaves this way for all Python objects:

In [18]: rows = [[1,2,3],[4,5,6],[7,8,9]]
In [19]: print(rows)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [20]: for row in rows:
    ...:     row = ['look','at','me']
    ...:
In [21]: print(rows)
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

To actually change the underlying data structure, you need to use a mutator method:

In [22]: rows
Out[22]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [23]: for row in rows:
    ...:     row.append("LOOKATME")
    ...:
In [24]: rows
Out[24]: [[1, 2, 3, 'LOOKATME'], [4, 5, 6, 'LOOKATME'], [7, 8, 9, 'LOOKATME']]

Note that slice assignment is just syntactic sugar for a mutator method:

In [26]: rows
Out[26]: [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
In [27]: for row in rows:
    ...:     row[:] = ['look','at','me']
    ...:
    ...:
In [28]: rows
Out[28]: [['look', 'at', 'me'], ['look', 'at', 'me'], ['look', 'at', 'me']]

This is analogous to pandas loc- or iloc-based assignment.
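To make the analogy concrete, here is a minimal sketch of the loop from the question, changed so that each stripped row is written back with iloc-based assignment instead of only being rebound to a local name. The two-row DataFrame is an assumed example, and a column-wise apply is still likely to be the faster option.

```python
import pandas as pd

# Hypothetical all-string DataFrame with the suffix already added.
df = pd.DataFrame({"a": ["dog@", "lazy@"], "b": ["quick@", "fox@"]})

for index in range(df.shape[0]):
    row = df.iloc[index]                   # a Series holding one row
    df.iloc[index] = row.str.rstrip("@")   # assignment through iloc mutates df

print(df)
```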

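You can make this really simple and just use the pandas.DataFrame.replace() method to replace every "@" with "". Pass regex=True so that the pattern is matched inside each string; without it, replace only matches cells whose entire value equals "@". The method returns a new DataFrame, so assign the result:

df = df.replace("@", "", regex=True)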

If you are worried about "@" being replaced anywhere other than at the end of a value, use a regex anchored with $:

df = df.replace("@$", "", regex=True)
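The following sketch compares how DataFrame.replace behaves with and without regex=True, and with the pattern anchored to either end of the string. The sample values are assumptions chosen to show each case.

```python
import pandas as pd

# Hypothetical values: a suffix, a cell that is exactly "@", and an inner "@".
df = pd.DataFrame({"a": ["dog@", "@", "a@b@"]})

# Without regex=True only cells whose whole value is "@" are replaced.
whole_value = df.replace("@", "")

# With regex=True every "@" inside each string is removed.
anywhere = df.replace("@", "", regex=True)

# Anchored at the end: only a trailing "@" is removed.
suffix_only = df.replace("@$", "", regex=True)

# Anchored at the start: only a leading "@" is removed.
prefix_only = df.replace("^@", "", regex=True)

print(suffix_only)
```

The same anchoring idea covers the prefix case from the question title: "^" targets the start of each value and "$" the end.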
