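I have two columns in my DataFrame. If the text in the first column is a substring of the second column, I want to replace the first column's value with the second column's value.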
Example:
Input:
col1        col2
---------------------------
text1       text1 and text2
some text   some other text
text 3
text 4      this is text 4
Output:
col1              col2
---------------------------------
text1 and text2   text1 and text2
some text         some other text
text 3
this is text 4    this is text 4
As you can see, I replaced rows 1 and 4, because in those rows the text in col1 is a substring of col2.
How can I do this in pandas?
Try df.apply with axis=1.
This iterates over each row and checks whether col1 is a substring of col2. If it is, it returns col2; otherwise it returns col1.
df['col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)
Full code:
import pandas as pd

df = pd.DataFrame({'col1': ['text1', 'some text', 'text 3', 'text 4'],
                   'col2': ['text1 and text2', 'some other text', '', 'this is text 4']})
df['new_col1'] = df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)
df
        col1             col2         new_col1
0      text1  text1 and text2  text1 and text2
1  some text  some other text        some text
2     text 3                            text 3
3     text 4   this is text 4   this is text 4
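A quick illustration of why NaN needs special handling: if the empty cell in col2 is a real NaN rather than an empty string (an assumption about your data, not shown in the question), the one-liner above fails, because Python's `in` operator cannot search inside a float.

```python
import numpy as np
import pandas as pd

# Same data as above, except the third col2 value is NaN instead of ''
df = pd.DataFrame({'col1': ['text1', 'some text', 'text 3', 'text 4'],
                   'col2': ['text1 and text2', 'some other text', np.nan, 'this is text 4']})

try:
    df.apply(lambda row: row['col2'] if row['col1'] in row['col2'] else row['col1'], axis=1)
    raised = False
except TypeError as exc:
    # `'text 3' in nan` cannot search inside a float, so this row raises TypeError
    raised = True
    print('TypeError:', exc)
```

This is what the NaN-safe variants that follow guard against.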
A NaN-safe pure-Python option via zip:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
3: 'this is text 4'}
})
# Take col2 only when it is a string (not NaN) that contains col1
df['col1'] = [b if isinstance(b, str) and a in b else a
              for a, b in zip(df['col1'], df['col2'])]
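A variant of the same zip idea, sketched here as an alternative rather than as part of the original answer: build the boolean mask with the comprehension, then let `Series.where` choose between the two columns. It assumes col1 contains only strings (a NaN in col1 would make `a in b` raise).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['text1', 'some text', 'text 3 ', 'text 4'],
    'col2': ['text1 and text2', 'some other text', np.nan, 'this is text 4'],
})

# True where col2 is a string (not NaN) that contains col1; assumes col1 holds strings
mask = pd.Series(
    [isinstance(b, str) and a in b for a, b in zip(df['col1'], df['col2'])],
    index=df.index,
)

# Series.where keeps col2 where mask is True and falls back to col1 elsewhere
df['col1'] = df['col2'].where(mask, df['col1'])
print(df)
```

Keeping the mask separate from the assignment makes it easy to inspect which rows will change before overwriting anything.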
A NaN-safe pandas option via fillna + apply:
import numpy as np
import pandas as pd
df = pd.DataFrame({
'col1': {0: 'text1', 1: 'some text', 2: 'text 3 ', 3: 'text 4'},
'col2': {0: 'text1 and text2', 1: 'some other text', 2: np.nan,
3: 'this is text 4'}
})
# fillna('') only affects a temporary copy, so `in` never sees NaN
df['col1'] = df.fillna('').apply(
lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
axis=1
)
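One caveat worth checking with the fillna approach: every string contains the empty string, so if col1 itself has NaN values, `fillna('')` turns them into `''` and they always get replaced with col2. The small frame below is hypothetical data chosen to show this; the guarded lambda is an added condition, not part of the original answer, and it returns `''` (not NaN) for those cells.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row has NaN in col1
df = pd.DataFrame({'col1': [np.nan, 'text 4'],
                   'col2': ['some other text', 'this is text 4']})

# Original lambda: '' in 'some other text' is True, so row 0 takes col2
naive = df.fillna('').apply(
    lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
    axis=1,
)

# Guarded lambda: an empty col1 is never treated as a match
guarded = df.fillna('').apply(
    lambda x: x['col2'] if x['col1'] and x['col1'] in x['col2'] else x['col1'],
    axis=1,
)

print(naive.tolist())    # ['some other text', 'this is text 4']
print(guarded.tolist())  # ['', 'this is text 4']
```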
Another option via boolean indexing with isna + loc:
# Only rows where col2 is not NaN are passed to apply
m = ~df['col2'].isna()
df.loc[m, 'col1'] = df[m].apply(
lambda x: x['col2'] if x['col1'] in x['col2'] else x['col1'],
axis=1
)
df:
              col1             col2
0  text1 and text2  text1 and text2
1        some text  some other text
2          text 3               NaN
3   this is text 4   this is text 4