How can I make this DataFrame update more efficient?


for index, row in df.iterrows():
    print(index)
    alias = row['alias']
    synonym = row['synonym']
    original_name = row['original_name']
    new_alias = ''
    # append each non-empty field, separated by '|'
    if alias:
        new_alias += alias
    if synonym:
        new_alias += '|' + synonym
    if original_name:
        new_alias += '|' + original_name
    row['alias'] = new_alias
    # write the modified row back by its index label
    df.loc[index] = row

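This conditionally updates each row and writes it back to the DataFrame one row at a time. It works, but it is slow. Is there a better way?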

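Your code is slow mainly because it writes back to the DataFrame on every iteration. Instead, build the whole column in one pass with a list comprehension or with apply. Like this: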

# this joins only the non-empty values in each row (unique() also drops repeats)
my_join = lambda x: '|'.join([a for a in x.unique() if a])
df['alias'] = df[['alias','synonym', 'original_name']].apply(my_join, axis=1)
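
The list comprehension route mentioned above could look roughly like the sketch below. The helper names join_non_empty and combine_columns are made up for illustration and are not part of pandas. The sketch assumes the three columns hold strings, with '', None or NaN marking a missing value. Unlike the apply version, it does not drop repeated values.

```python
def join_non_empty(values, sep="|"):
    """Join the non-empty strings in values, skipping '', None and NaN."""
    # isinstance(..., str) filters out None and float NaN; `and v` drops ''.
    return sep.join(v for v in values if isinstance(v, str) and v)


def combine_columns(*columns, sep="|"):
    """Build one joined string per row from parallel column iterables."""
    return [join_non_empty(row, sep) for row in zip(*columns)]


# Plain lists stand in for the DataFrame columns here (made-up sample data).
alias = ["a1", "", "a3"]
synonym = ["s1", "s2", None]
original_name = ["o1", "", float("nan")]

new_alias = combine_columns(alias, synonym, original_name)
print(new_alias)  # ['a1|s1|o1', 's2', 'a3']

# With a DataFrame, the same call would be a single column assignment:
# df['alias'] = combine_columns(df['alias'], df['synonym'], df['original_name'])
```

Because the column is assigned once at the end, nothing is written back to the DataFrame inside the loop, which is the cost the original code pays on every row.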
