How do I create a for loop that checks whether a column in a Pandas DataFrame contains duplicates?



I'm trying to create a for loop that first checks whether a column ('col1') contains duplicates and, if so, appends the value of another column ('col2') to ('col1').

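The code below runs, but every ('col1') value is treated as a duplicate. I'm sure only a few values in this column are duplicated, yet the condition keeps returning True. I think the problem is in the line that contains .duplicated():
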
import pandas as pd
tuple = [['Jake','NY'],['Tom','Montana'],['Hannah','Cali'],['Jason','Boston'],['Tom','Washington'],['Hannah','Florida']]
df = pd.DataFrame(tuple, columns=('col1', 'col2'))
for i in df['col1']:
    if df['col1'].duplicated().any():
        df['col1'] = df['col1'] + ' - ' + df['col2']

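You can find the duplicated values with duplicated(), as you did in your example. Note that .any() reduces the whole column to a single True/False, and the assignment inside the if rewrites the entire column, so as soon as one duplicate exists every row is changed. Instead, use the row-by-row result of duplicated() as a filter with loc (keep=False marks every occurrence of a repeated value, not just the later ones):

df.loc[df['col1'].duplicated(keep=False), 'col1']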
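
To make the difference concrete, here is a minimal sketch on the question's data that compares the single boolean returned by .any() with the row-wise masks. The names `data`, `has_dupes`, `mask_default` and `mask_all` are only illustrative and do not appear in the original post.

```python
import pandas as pd

# Same rows as in the question; `data` is an illustrative name
data = [['Jake', 'NY'], ['Tom', 'Montana'], ['Hannah', 'Cali'],
        ['Jason', 'Boston'], ['Tom', 'Washington'], ['Hannah', 'Florida']]
df = pd.DataFrame(data, columns=['col1', 'col2'])

# .any() collapses the whole column into one bool, so it cannot
# tell individual rows apart
has_dupes = bool(df['col1'].duplicated().any())

# Default keep='first': only the second and later occurrences are flagged
mask_default = df['col1'].duplicated()

# keep=False: every occurrence of a repeated value is flagged
mask_all = df['col1'].duplicated(keep=False)

print(has_dupes)              # True
print(mask_default.tolist())  # [False, False, False, False, True, True]
print(mask_all.tolist())      # [False, True, True, False, True, True]
```

Only `mask_all` selects both 'Tom' rows and both 'Hannah' rows, which is why keep=False is used for the filter.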

You can then replace just those values with a new value:

temp = df['col1'] + ' - ' + df['col2']
df.loc[df['col1'].duplicated(keep=False), 'col1'] = temp

Using the OP's example:

import pandas as pd
# named `data` here to avoid shadowing the built-in `tuple`
data = [['Jake','NY'],['Tom','Montana'],['Hannah','Cali'], ['Jason','Boston'],['Tom','Washington'],['Hannah','Florida']]
df = pd.DataFrame(data, columns=('col1', 'col2'))
# build the combined value for every row, then assign it only where col1 is duplicated
temp = df['col1'] + ' - ' + df['col2']
df.loc[df['col1'].duplicated(keep=False), 'col1'] = temp
df
col1                col2
Jake                NY
Tom - Montana       Montana
Hannah - Cali       Cali
Jason               Boston
Tom - Washington    Washington
Hannah - Florida    Florida
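
If this is needed in more than one place, the same idea could be wrapped in a small function that leaves the input untouched. This is only a sketch: `disambiguate_duplicates` and its parameters `key`, `extra` and `sep` are hypothetical names introduced here, not part of pandas or of the answer above.

```python
import pandas as pd


def disambiguate_duplicates(df, key='col1', extra='col2', sep=' - '):
    """Hypothetical helper: return a copy of `df` in which every duplicated
    value in `key` has `sep` plus the row's `extra` value appended."""
    out = df.copy()  # work on a copy so the caller's DataFrame is not modified
    mask = out[key].duplicated(keep=False)  # True for all occurrences of repeats
    out.loc[mask, key] = out.loc[mask, key] + sep + out.loc[mask, extra]
    return out


df = pd.DataFrame(
    [['Jake', 'NY'], ['Tom', 'Montana'], ['Hannah', 'Cali'],
     ['Jason', 'Boston'], ['Tom', 'Washington'], ['Hannah', 'Florida']],
    columns=['col1', 'col2'],
)

result = disambiguate_duplicates(df)
print(result['col1'].tolist())
# ['Jake', 'Tom - Montana', 'Hannah - Cali', 'Jason', 'Tom - Washington', 'Hannah - Florida']
```

Returning a copy rather than modifying `df` in place is a design choice made for this sketch; assigning through loc directly on `df`, as in the answer, works just as well when in-place modification is acceptable.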
