如何逐行迭代多个数据帧并修改其中一个数据帧



我遇到了一个不适合我的循环。我想要的是从基于最终数据帧的条件的数据帧中提取值。

我有:

最终数据帧:

final = {'code': ['A001','A002','A003'],
'reg': ['2234','3432', '6578'],
'name': ['Solutions BS', 'Flying 23', 'Fast Co'],
'df2_code': ['','',''],
'df2_name': ['', '', ''],
'df3_code': ['','',''],
'df3_name': ['', '', '']}

必须填充此数据帧。具体地,前缀为df2、df3、…的列,。。。

它必须填充其他数据帧的"code"one_answers"name"列,这些数据帧包含"最终数据帧"的相同的前三个列名(code、reg、name(。条件适用于填充,两个数据帧中的"reg"编号必须相同。

其他例子:

df2 = {'code': ['P001','A002','P003'],
'reg': ['2234','3432', '9978'],
'name': ['Chips 23', 'Flying 23', 'American99']}

所以,到目前为止,这个逻辑的产物是:

final = {'code': ['A001','A002','A003'],
'reg': ['2234','3432', '6578'],
'name': ['Solutions BS', 'Flying 23', 'Fast Co'],
'df2_code': ['P001','A002',''],
'df2_name': ['Chips 23', 'Flying 23', '']}

但是,问题有点复杂。df2中存在重复的"reg"编号,用作条件。所以"df2"实际上是:

df2 = {'code': ['P001','A002','P003', 'B004'],
'reg': ['2234','3432', '9978', '2234'],
'name': ['Chips 23', 'Flying 23', 'American99', NaN]}

这必须通过在同一单元格中添加两者的"代码"one_answers"名称"来考虑。产品将是:

final = {'code': ['A001','A002','A003'],
'reg': ['2234','3432', '6578'],
'name': ['Solutions BS', 'Flying 23', 'Fast Co'],
'df2_code': ['P001&B004','A002',''],
'df2_name': ['Chips 23', 'Flying 23', '']}

到目前为止,我只为一个数据帧(df2(编写了这段代码,因为最终的df有200000多行,所以需要花费太多时间(我有5个df要扫描,但这些更小(:

for i, row in final.iterrows():
for j, inrow in df2.iterrows():
if row['reg'] == inrow['reg']:
if final['df2_code'].iloc[i] == '':
final['df2_code'].iloc[i] = str(inrow['code'])
else:
final['df2_code'].iloc[i] += '&' + str(inrow['code'])
if inrow['name'] is None:
continue
else: 
if final['df2_name'].iloc[i] == '':
final['df2_name'].iloc[i] = str(inrow['name'])
else:
final['df2_name'].iloc[i] += '&' + str(inrow['name'])

考虑Series.cat+groupby:

df2 = pd.DataFrame({'code': ['P001','A002','P003', 'B004'],
'reg': ['2234','3432', '9978', '2234'],
'name': ['Chips 23', 'Flying 23', 'American99', float('nan')]})
agg_df = (df2.assign(name = lambda x: x["name"].fillna(""))
.groupby(['reg'])
.agg({'code': lambda g: g.str.cat(sep="&"), 
'name': 'max'})
.add_prefix("df2_")
)
agg_df
#        df2_code    df2_name
# reg
# 2234  P001&B004    Chips 23
# 3432       A002   Flying 23
# 9978       P003  American99

要在数据帧之间迭代,请从数据帧列表运行水平合并(在列表理解中使用元素zip作为前缀(。

# LIST OF AGGREGATED DATA FRAMES
dfs = [
(df.assign(name = lambda x: x["name"].fillna(""))
.groupby(['reg'])
.agg({'code': lambda g: g.fillna("").str.cat(sep="&"), 
'name': max})
.add_prefix(f"{nm}_")
) 
for df, nm 
in zip([df2, df3, df4, df5], ["df2", "df3", "df4", "df5"])
]
# HORIZONTAL MERGE ON "reg"
final_df = pd.concat(dfs, axis=1)

最新更新