Improving the efficiency of iloc: merging two tables using startswith()



I am trying to merge two tables. For example, df1:

ID
A1A1
A1A1A2
B1B1B1

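One solution is to build a temporary column for every possible prefix length. In your case, that means 6 columns. You can then convert df2 to a dictionary and simply look up the IDs in it, and finally use combine_first to combine the columns. Note that the order of the column list matters for combine_first.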
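To illustrate why the order matters, here is a minimal sketch of how combine_first behaves. The two Series (`short_prefix` and `long_prefix`) and their values are hypothetical, made up only for this illustration; they stand in for lookup results from a shorter and a longer prefix.

```python
import pandas as pd

# Hypothetical lookup results (not from the question's data):
# one from a short prefix, one from a longer prefix.
short_prefix = pd.Series(['Egypt', None, None], dtype=object)
long_prefix = pd.Series(['France', 'Spain', None], dtype=object)

# combine_first keeps the caller's non-null values and only fills its gaps
# from the argument, so the Series it is called on takes priority.
short_first = short_prefix.combine_first(long_prefix)
long_first = long_prefix.combine_first(short_prefix)

print(short_first.tolist())
print(long_first.tolist())
```

In row 0 both Series have a value, so the result depends on which one combine_first is called on. The code below processes the columns from the shortest prefix to the longest, so the shortest matching prefix wins.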

import pandas as pd
ids = ['A1A1A1', 'A1A1A2', 'B1B1B1']
df1 = pd.DataFrame(ids, columns=['ID'], index=range(3))
df2 = pd.DataFrame.from_dict({'ID': {0: 'A1A1A1', 1: 'B', 2: 'C1C'},
                              'Country': {0: 'France', 1: 'Egypt', 2: 'Egypt'}})
# build dictionary from df2 (dictionary is probably faster than .loc). Also it is cleaner
map_id_dict = df2.set_index('ID')['Country'].to_dict()
# Define target column
df1['Country'] = None
# Build temporary columns
cols = [f'ID_{i}' for i in range(1, 7)]
for i, col in enumerate(cols):
    # lookup ids in dictionary from df2
    df1[col] = df1['ID'].str[:i + 1].apply(lambda x: map_id_dict.get(x))
    df1['Country'] = df1['Country'].combine_first(df1[col])
# drop temporary columns
df1 = df1.drop(columns=cols)

Output:

       ID Country
0  A1A1A1  France
1  A1A1A2    None
2  B1B1B1   Egypt
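As a possible variation on the same idea, the temporary columns can be avoided by doing the prefix lookup inside a small helper. This is only a sketch, not part of the original answer: the helper `lookup_by_prefix` and the variable `prefix_lengths` are hypothetical names, and the sketch assumes, like the code above, that the shortest matching prefix should win.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['A1A1A1', 'A1A1A2', 'B1B1B1']})
df2 = pd.DataFrame({'ID': ['A1A1A1', 'B', 'C1C'],
                    'Country': ['France', 'Egypt', 'Egypt']})

# Same dictionary as in the answer: df2 ID -> Country
map_id_dict = df2.set_index('ID')['Country'].to_dict()

# Only prefix lengths that actually occur in df2 need to be tried.
prefix_lengths = sorted({len(key) for key in map_id_dict})


def lookup_by_prefix(value):
    """Return the country of the shortest df2 ID that is a prefix of value, else None."""
    for n in prefix_lengths:
        country = map_id_dict.get(value[:n])
        if country is not None:
            return country
    return None


# Hypothetical helper applied to each ID; no temporary columns are created.
df1['Country'] = df1['ID'].map(lookup_by_prefix)
print(df1)
```

Iterating `prefix_lengths` in reverse would instead prefer the longest matching prefix, which may be the better choice if the IDs in df2 can be prefixes of one another.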
