我必须找到等于df1 col1
的df2 col1
值,然后将df1 col2
替换为同一行中的df2 col2
。
我已经尝试了.isin()
(可能不正确(和多种条件,即if (df1['col1'] == df2['col1']) & (df1['col3'] == 'x index')
i=0
for i in df1:
if df1['col1'].isin(df2['col1']):
df1['col2'] = df2['col2']
else df1['col1'].isin(df3):
df1['col2'] = df['col3']
i+=1
我是熊猫的新手,但使用Pythonic选项:
from pandas import DataFrame as DF
columns = ["col1", "col2"]
df1 = DF([
(1, "a"),
(2, "b"),
(3, "c")
], columns=columns)
df2 = DF([
(4, "x"),
(2, "y"),
(5, "z")
], columns=columns)
for i, z in enumerate(zip(df1.col1, df2.col1, df2.col2)):
compare_1, compare_2, value = z
if compare_1 == compare_2:
df1.col2[i] = value
pass
pass
print(df1)
# col1 col2
# 0 1 a
# 1 2 y <--- new value
# 2 3 c
enumerate
解释:
枚举生成(index, value_from_list)
元组
for i, value in enumerate(["x", "y", "z"]):
print(i, value, sep=": ")
# Output:
# 0: x
# 1: y
# 2: z
zip
解释:
Zip 生成可迭代对象(列表、字典等(中每个值的元组。
abc = ["a", "b", "c"]
xyz = ["x", "y", "z"]
num = [1, 2, 3]
for one_of_abc, one_of_xyz, one_of_num in zip(abc, xyz, num):
print(one_of_abc, one_of_xyz, one_of_num)
pass
# Output:
# a x 1
# b y 2
# c z 3
# Alternative without unpacking
for tuple_of_items in zip(abc, xyz, num):
print(tuple_of_items[0], tuple_of_items[1], tuple_of_items[2])
拜托,如果你找到一个不使用循环的解决方案,它总是更好。在您的情况下,查找其他列中的行可以通过内部连接来解决。我希望这是一个可以解决您问题的代码。
In [1]:
## Set the exemple with replicable code
import pandas as pd
cols = ['col1', 'col2']
data = [[100, 150],
[220, 240],
[80, 60]
]
df1 = pd.DataFrame(data=data, columns=cols).set_index('col1')
cols = ['col1', 'col2']
data = [[111, 0],
[220, 0],
[80, 0]
]
df2 = pd.DataFrame(data=data, columns=cols).set_index('col1')
## Get all the rows from df1 col1 that are in df2 col1
df_merge = df1.merge(df2, left_index=True, right_index=True, how='inner', suffixes=('_df1', '_df2'))
df_merge
Out [1]:
col2_df1 col2_df2
col1
220 240 0
80 60 0
然后执行左联接以将值从col2 df2
添加到col2 df1
In [2]:
df1 = df1.merge(df_merge, how='left', left_index=True, right_index=True)
df1.drop(axis=1, columns=['col2', 'col2_df1'], inplace=True)
df1.rename(columns={'col2_df2': 'df2'}, inplace=True)
df1
Out [2]:
df2
col1
100 NaN
220 0.0
80 0.0