I have a DataFrame df:
name rank
A captain, general, soldier
B general, foo, major
C foo
D captain, major
E foo, foo, foo
I want to check whether any cell in the column rank contains foo, and if it does, replace the entire cell with foo.
Expected output:
name rank
A captain, general, soldier
B foo
C foo
D captain, major
E foo
How can I do this?
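To make the question reproducible, here is a minimal sketch that builds the sample frame. It assumes each rank cell is a single string with ", " separators; `expected` is a hypothetical name for the target result.

```python
import pandas as pd

# Sample data from the question; each "rank" cell is one comma-separated string
df = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "rank": [
        "captain, general, soldier",
        "general, foo, major",
        "foo",
        "captain, major",
        "foo, foo, foo",
    ],
})

# Target result: rows B, C and E (index 1, 2, 4) contain "foo" and collapse to "foo"
expected = df.copy()
expected.loc[[1, 2, 4], "rank"] = "foo"
print(expected)
```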
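# Regex replace: '.*foo.*' matches the whole cell, so the entire cell becomes 'foo'
df['rank'] = df['rank'].replace('.*foo.*', 'foo', regex=True)
# OR
# mask: put 'foo' wherever the condition is True
df['rank'] = df['rank'].mask(df['rank'].str.contains('foo'), 'foo')
# OR
# Boolean indexing with .loc on the matching rows
df.loc[df['rank'].str.contains('foo'), 'rank'] = 'foo'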
Output:
name rank
0 A captain, general, soldier
1 B foo
2 C foo
3 D captain, major
4 E foo
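A self-contained check that the three snippets above agree. It assumes pandas is installed; `make_df` is a hypothetical helper that rebuilds the sample data so each approach starts from a fresh frame.

```python
import pandas as pd

def make_df():
    # Fresh copy of the question's data for each approach
    return pd.DataFrame({
        "name": list("ABCDE"),
        "rank": ["captain, general, soldier", "general, foo, major",
                 "foo", "captain, major", "foo, foo, foo"],
    })

# 1. Regex replace: ".*foo.*" matches the whole cell, so the whole cell becomes "foo"
df1 = make_df()
df1["rank"] = df1["rank"].replace(".*foo.*", "foo", regex=True)

# 2. mask: substitute "foo" wherever the condition is True
df2 = make_df()
df2["rank"] = df2["rank"].mask(df2["rank"].str.contains("foo"), "foo")

# 3. Boolean indexing with .loc
df3 = make_df()
df3.loc[df3["rank"].str.contains("foo"), "rank"] = "foo"

print(df1["rank"].tolist() == df2["rank"].tolist() == df3["rank"].tolist())  # True
```

Assigning the result back (rather than calling with inplace=True on df['rank']) avoids chained assignment, which newer pandas versions with Copy-on-Write do not propagate to df.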
You can apply a lambda function to the column:
df["rank"] = df["rank"].apply(lambda x: "foo" if "foo" in x.split(", ") else x)
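Splitting on the separator lets you check whole words. For example, the word "foobar" will not trigger the replacement on its row.
Edit: thanks to BeRT2me for suggesting splitting on ", ".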
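To illustrate the whole-word point, a small sketch comparing a plain substring test, the split-based test, and a regex with word boundaries. The "captain, foobar" row is a hypothetical addition, not part of the question's data.

```python
import pandas as pd

# Hypothetical extra row "captain, foobar" to show substring vs whole-word matching
s = pd.Series(["general, foo, major", "captain, foobar", "foo"])

# Plain substring test: also fires on "foobar"
substring = s.str.contains("foo")

# Split on the separator, then test list membership (whole items only)
whole_item = s.apply(lambda x: "foo" in x.split(", "))

# Regex word boundaries give whole-word matching without splitting
word_regex = s.str.contains(r"\bfoo\b")

print(substring.tolist())   # [True, True, True]
print(whole_item.tolist())  # [True, False, True]
print(word_regex.tolist())  # [True, False, True]
```

The two whole-word tests are not identical: \b treats punctuation such as "-" as a boundary, so an item like "foo-bar" would match the regex but not the split test.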
mask = df['rank'].str.contains('foo')
df.loc[mask, 'rank'] = 'foo'
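If the column may contain missing values, str.contains can return NaN for those cells, and a mask with NaN cannot be used for boolean indexing. A minimal sketch, assuming a hypothetical frame with one missing rank, using the na=False parameter of str.contains:

```python
import pandas as pd

# Hypothetical frame with a missing rank value
df = pd.DataFrame({"name": ["A", "B", "C"],
                   "rank": ["general, foo, major", None, "captain, major"]})

# na=False makes missing cells count as "no match", so the mask is purely boolean
mask = df["rank"].str.contains("foo", na=False)
df.loc[mask, "rank"] = "foo"

print(mask.tolist())  # [True, False, False]
```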
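# Caution: .any() collapses the test to a single True/False for the whole column,
# so when it is True this assigns 'foo' to every row, not only the matching ones
if df['rank'].str.contains('foo').any():
    df['rank'] = 'foo'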
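To see how the .any() version differs from the row-wise approaches, a small sketch on a hypothetical three-row frame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "B", "D"],
                   "rank": ["captain, general, soldier", "foo", "captain, major"]})

# Column-level test: True because at least one row contains "foo"
print(df["rank"].str.contains("foo").any())  # True

# Whole-column assignment replaces every row, including rows without "foo"
overwritten = df.copy()
if overwritten["rank"].str.contains("foo").any():
    overwritten["rank"] = "foo"
print(overwritten["rank"].tolist())  # ['foo', 'foo', 'foo']

# Row-level alternative: only rows whose own cell contains "foo" change
rowwise = df.copy()
rowwise.loc[rowwise["rank"].str.contains("foo"), "rank"] = "foo"
print(rowwise["rank"].tolist())  # ['captain, general, soldier', 'foo', 'captain, major']
```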