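I'm trying to write a for loop that first checks whether one column ('col1') contains duplicates and, if it does, appends the value of another column ('col2') to 'col1'.
The statement below runs, but every 'col1' value is treated as a duplicate. I'm sure only a few values in this column are duplicated, yet the check keeps returning True. I think the problem is in the second line of the loop, the one containing .duplicated():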
import pandas as pd
tuple = [['Jake','NY'],['Tom','Montana'],['Hannah','Cali'],['Jason','Boston'],['Tom','Washington'],['Hannah','Florida']]
df = pd.DataFrame(tuple, columns=('col1', 'col2'))
for i in df['col1']:
    if df['col1'].duplicated().any():
        df['col1'] = df['col1'] + ' - ' + df['col2']
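You can find the duplicated values with duplicated, as you did in your example, but .duplicated().any() yields a single True/False for the whole column, so when it is True the assignment rewrites every row. Instead, use the per-row result as a filter with loc; passing keep=False marks every occurrence of a repeated value, not only the later ones:
df.loc[df['col1'].duplicated(keep=False), 'col1']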
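To make the difference concrete, here is a small sketch using the data from the question. The variable names any_dup, later_only, and mask are only illustrative; they are not from the original post.

```python
import pandas as pd

df = pd.DataFrame(
    [['Jake', 'NY'], ['Tom', 'Montana'], ['Hannah', 'Cali'],
     ['Jason', 'Boston'], ['Tom', 'Washington'], ['Hannah', 'Florida']],
    columns=['col1', 'col2'],
)

# .any() collapses the per-row result to one bool for the whole column,
# so it says nothing about the row the loop is currently visiting.
any_dup = bool(df['col1'].duplicated().any())

# Default keep='first': only the second and later occurrences are flagged.
later_only = df['col1'].duplicated()

# keep=False: every row whose value occurs more than once is flagged.
mask = df['col1'].duplicated(keep=False)

print(any_dup)              # True
print(later_only.tolist())  # [False, False, False, False, True, True]
print(mask.tolist())        # [False, True, True, False, True, True]
```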
You can then replace those values with a new value:
temp = df['col1'] + ' - ' + df['col2']
df.loc[df['col1'].duplicated(keep=False), 'col1'] = temp
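As a rough illustration of why this assignment touches only the flagged rows: temp holds a candidate label for every row, and the loc assignment aligns it on the index and writes just the rows selected by the mask. The three-row frame below is made up for brevity.

```python
import pandas as pd

# Made-up miniature frame: 'Tom' repeats, 'Jake' does not.
df = pd.DataFrame({
    'col1': ['Tom', 'Jake', 'Tom'],
    'col2': ['Montana', 'NY', 'Washington'],
})

# Build the mask and the candidate labels before modifying col1.
mask = df['col1'].duplicated(keep=False)
temp = df['col1'] + ' - ' + df['col2']

# Only rows where mask is True receive the matching value from temp.
df.loc[mask, 'col1'] = temp

print(df['col1'].tolist())  # ['Tom - Montana', 'Jake', 'Tom - Washington']
```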
Using the OP's example:
import pandas as pd
tuple = [['Jake','NY'],['Tom','Montana'],['Hannah','Cali'], ['Jason','Boston'],['Tom','Washington'],['Hannah','Florida']]
df = pd.DataFrame(tuple, columns=('col1', 'col2'))
temp = df['col1'] + ' - ' + df['col2']
df.loc[df['col1'].duplicated(keep=False), 'col1'] = temp
df
               col1        col2
0              Jake          NY
1     Tom - Montana     Montana
2     Hannah - Cali        Cali
3             Jason      Boston
4  Tom - Washington  Washington
5  Hannah - Florida     Florida
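If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: label_duplicates and its parameters are hypothetical names, not part of pandas, and it uses Series.mask as an alternative to the loc assignment so that the input frame is left unchanged.

```python
import pandas as pd


def label_duplicates(df, key='col1', extra='col2', sep=' - '):
    """Return a copy of df where repeated `key` values get `extra` appended.

    Hypothetical helper for illustration; not a pandas function.
    """
    mask = df[key].duplicated(keep=False)    # True for every repeated value
    combined = df[key] + sep + df[extra]     # candidate labels for all rows
    out = df.copy()
    out[key] = df[key].mask(mask, combined)  # replace only where mask is True
    return out


data = [['Jake', 'NY'], ['Tom', 'Montana'], ['Hannah', 'Cali'],
        ['Jason', 'Boston'], ['Tom', 'Washington'], ['Hannah', 'Florida']]
df = pd.DataFrame(data, columns=['col1', 'col2'])

result = label_duplicates(df)
print(result)
```

Returning a copy keeps the original names available, which helps if you later need to group or join on the unmodified column.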