How to populate a pandas DataFrame for a given parameter



I'm struggling with what should be a simple problem. I have a pandas DataFrame like this:

import pandas as pd

results = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                        ['sampling (i) run', '3+(i)', 3]],
                       columns=['operation', 'executions', 'result'])

So the input is:

In [1]: results
Out[1]: 
           operation executions  result
0  executing (i) run      2+(i)       3
1   sampling (i) run      3+(i)       3

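What I want to do is populate the results DataFrame for a given parameter and update the cell values. Assuming i = 4, the expected output is: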

In [2]: results_populated
Out[2]: 
           operation executions  result
0  executing (0) run      2+(0)       3
1  executing (1) run      2+(1)       3
2  executing (2) run      2+(2)       3
3  executing (3) run      2+(3)       3
4   sampling (0) run      3+(0)       3
5   sampling (1) run      3+(1)       3
6   sampling (2) run      3+(2)       3
7   sampling (3) run      3+(3)       3

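I know I can iterate over each row in a for loop or use iterrows (or similar), but that is far from ideal when the table has hundreds of different operations and i can be in the thousands. I found this: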

# regex=False so that "(i)" is matched as literal text
res_expanded = results.loc[results.index[results['operation'].str
                                         .contains("(i)", regex=False)]
                           .repeat(i)].reset_index(drop=True)

which returns:

           operation executions  result
0  executing (i) run      2+(i)       3
1  executing (i) run      2+(i)       3
2  executing (i) run      2+(i)       3
3  executing (i) run      2+(i)       3
4   sampling (i) run      3+(i)       3
5   sampling (i) run      3+(i)       3
6   sampling (i) run      3+(i)       3
7   sampling (i) run      3+(i)       3

But I can't find an optimal (vectorized?) way to update each cell. Any help would be appreciated.

Thanks in advance.
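
A side note on the row filter in the question, offered as a sketch rather than a definitive fix: when str.contains treats its pattern as a regular expression, "(i)" is a capture group around the letter i, so it also matches strings that merely contain an i. The second row below ("finalize") is a hypothetical operation invented for illustration.

```python
import re

import pandas as pd

# "finalize" is a made-up operation that has no "(i)" placeholder.
ops = pd.Series(["executing (i) run", "finalize"])

# As a regex, "(i)" is just a group around the letter i, so both rows match.
loose = ops.map(lambda s: re.search(r"(i)", s) is not None)

# Literal matching: either turn regex off or escape the pattern.
literal = ops.str.contains("(i)", regex=False)
escaped = ops.str.contains(re.escape("(i)"), regex=True)

print(loose.tolist(), literal.tolist(), escaped.tolist())
```

Either literal form keeps only the rows that really contain the placeholder.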

Step 1:

df = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                   ['sampling (i) run', '3+(i)', 3]],
                  columns=['operation', 'executions', 'result'])
# Repeat every row 4 times (the index labels are repeated as well)
df = df.apply(lambda x: x.repeat(4))
df
           operation executions  result
0  executing (i) run      2+(i)       3
0  executing (i) run      2+(i)       3
0  executing (i) run      2+(i)       3
0  executing (i) run      2+(i)       3
1   sampling (i) run      3+(i)       3
1   sampling (i) run      3+(i)       3
1   sampling (i) run      3+(i)       3
1   sampling (i) run      3+(i)       3

Step 2:

# Number the copies of each operation 0..3 in a helper column
df = df.assign(tag=[*range(4)] * df.groupby('operation').ngroups)
df
           operation executions  result  tag
0  executing (i) run      2+(i)       3    0
0  executing (i) run      2+(i)       3    1
0  executing (i) run      2+(i)       3    2
0  executing (i) run      2+(i)       3    3
1   sampling (i) run      3+(i)       3    0
1   sampling (i) run      3+(i)       3    1
1   sampling (i) run      3+(i)       3    2
1   sampling (i) run      3+(i)       3    3
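
Step 2 relies on every operation string being distinct, since ngroups counts unique values. A possible alternative, sketched here under the assumption that each original row should simply get its own 0..n-1 counter, is to number the copies by their repeated index label with groupby(level=0).cumcount(). The names n and expanded are introduced only for this example.

```python
import pandas as pd

n = 4  # number of copies per row (the "i" from the question)

df = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                   ['sampling (i) run', '3+(i)', 3]],
                  columns=['operation', 'executions', 'result'])

# Repeat each row n times; the original index label is kept on every copy.
expanded = df.loc[df.index.repeat(n)]

# Count the copies within each original row: 0, 1, ..., n-1.
tag = expanded.groupby(level=0).cumcount().to_numpy()
expanded = expanded.assign(tag=tag)

print(expanded)
```

Because the counter is derived from the index rather than from the operation text, it does not depend on the operation strings being unique.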

Step 3:

# Row by row, replace "(i)" in every string cell with that row's tag
df.apply(lambda ser: ser.map(lambda x: x.replace('(i)', f'({ser.tag})') if isinstance(x, str) else x), axis=1)
           operation executions  result  tag
0  executing (0) run      2+(0)       3    0
0  executing (1) run      2+(1)       3    1
0  executing (2) run      2+(2)       3    2
0  executing (3) run      2+(3)       3    3
1   sampling (0) run      3+(0)       3    0
1   sampling (1) run      3+(1)       3    1
1   sampling (2) run      3+(2)       3    2
1   sampling (3) run      3+(3)       3    3

Done!
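
For larger inputs, the row-wise apply in step 3 can become slow, because it builds a Series for every row. The following is a minimal sketch of one alternative, not a benchmarked solution: repeat the rows once, tile a counter with NumPy, and substitute with a plain list comprehension per text column. The function name populate and its parameters are hypothetical, and it assumes the placeholder is the literal text (i).

```python
import numpy as np
import pandas as pd


def populate(template, n, columns=("operation", "executions"), placeholder="(i)"):
    """Repeat each row n times and substitute 0..n-1 for the placeholder.

    Hypothetical helper: `columns` names the text columns to rewrite.
    """
    # Repeat every row n times and renumber the index 0..len-1.
    out = template.loc[template.index.repeat(n)].reset_index(drop=True)
    # Counter 0..n-1, restarted for each original row.
    counter = np.tile(np.arange(n), len(template))
    for col in columns:
        out[col] = [
            s.replace(placeholder, f"({k})") if isinstance(s, str) else s
            for s, k in zip(out[col], counter)
        ]
    return out


results = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                        ['sampling (i) run', '3+(i)', 3]],
                       columns=['operation', 'executions', 'result'])

results_populated = populate(results, 4)
print(results_populated)
```

This still loops in Python over the cells of the text columns, so it is not vectorized in the strict sense, but it avoids the helper column and the per-row Series construction.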
