I am struggling with a seemingly simple problem. I have a pandas DataFrame like this:
results = pd.DataFrame([['executing (i) run', '2+(i)', 3],
['sampling (i) run', '3+(i)', 3]],
columns=['operation', 'executions', 'result'])
So the input is:
In [1]: results
Out[1]:
           operation  executions  result
0  executing (i) run       2+(i)       3
1   sampling (i) run       3+(i)       3
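What I want to do is populate the results DataFrame for a given parameter and update the cell values accordingly. Assuming i = 4, the expected output is: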
In [2]: results_populated
Out[2]:
           operation  executions  result
0  executing (0) run       2+(0)       3
1  executing (1) run       2+(1)       3
2  executing (2) run       2+(2)       3
3  executing (3) run       2+(3)       3
4   sampling (0) run       3+(0)       3
5   sampling (1) run       3+(1)       3
6   sampling (2) run       3+(2)       3
7   sampling (3) run       3+(3)       3
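I know I can iterate over each row in a for loop or use iterrows (or similar), but that is far from ideal when the table has hundreds of different operations and i can be several thousand. I found that the following works well for repeating the rows: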
res_expanded = results.loc[results.index[results['operation'].str
.contains("(i)", regex=False)]
.repeat(i)].reset_index(drop=True)
It returns:
           operation  executions  result
0  executing (i) run       2+(i)       3
1  executing (i) run       2+(i)       3
2  executing (i) run       2+(i)       3
3  executing (i) run       2+(i)       3
4   sampling (i) run       3+(i)       3
5   sampling (i) run       3+(i)       3
6   sampling (i) run       3+(i)       3
7   sampling (i) run       3+(i)       3
However, I cannot find an efficient (vectorized?) way to update each cell. Any help would be appreciated.
Thanks in advance.
Step 1:
df = pd.DataFrame([['executing (i) run', '2+(i)', 3],
['sampling (i) run', '3+(i)', 3]],
columns=['operation', 'executions', 'result'])
df = df.apply(lambda x: x.repeat(4))
df
           operation  executions  result
0  executing (i) run       2+(i)       3
0  executing (i) run       2+(i)       3
0  executing (i) run       2+(i)       3
0  executing (i) run       2+(i)       3
1   sampling (i) run       3+(i)       3
1   sampling (i) run       3+(i)       3
1   sampling (i) run       3+(i)       3
1   sampling (i) run       3+(i)       3
Step 2:
df = df.assign(tag=[*range(4)] * df.groupby('operation').ngroups)
df
           operation  executions  result  tag
0  executing (i) run       2+(i)       3    0
0  executing (i) run       2+(i)       3    1
0  executing (i) run       2+(i)       3    2
0  executing (i) run       2+(i)       3    3
1   sampling (i) run       3+(i)       3    0
1   sampling (i) run       3+(i)       3    1
1   sampling (i) run       3+(i)       3    2
1   sampling (i) run       3+(i)       3    3
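A note on the counter column: building it from `ngroups` assumes every value in `operation` is unique, since otherwise the list length would not match the number of rows. A possible alternative, sketched here under the assumption that the original row index is unique, is to count within each repeated index label using `cumcount`. The names `repeated` and `tagged` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                   ['sampling (i) run', '3+(i)', 3]],
                  columns=['operation', 'executions', 'result'])

# Repeat every row 4 times; the original index label is kept on each copy.
repeated = df.loc[df.index.repeat(4)]

# Number the copies of each original row 0..3 by grouping on the index level.
# to_numpy() sidesteps index alignment on the duplicated labels.
tagged = repeated.assign(tag=repeated.groupby(level=0).cumcount().to_numpy())
```

This gives the same `tag` values as above without depending on the contents of `operation`.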
Step 3:
df.apply(lambda ser: ser.map(lambda x: x.replace('(i)', f'({ser.tag})') if isinstance(x, str) else x), axis=1)
           operation  executions  result  tag
0  executing (0) run       2+(0)       3    0
0  executing (1) run       2+(1)       3    1
0  executing (2) run       2+(2)       3    2
0  executing (3) run       2+(3)       3    3
1   sampling (0) run       3+(0)       3    0
1   sampling (1) run       3+(1)       3    1
1   sampling (2) run       3+(2)       3    2
1   sampling (3) run       3+(3)       3    3
Done!
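Since the question mentions thousands of values of i, the row-wise `apply` in the last step may become slow. Below is a rough sketch of the same three steps combined into one helper that avoids `apply(axis=1)`. It is not strictly vectorized (the replacement is a plain list comprehension per column), and the function name `populate` and its `columns` and `placeholder` parameters are made up for illustration.

```python
import numpy as np
import pandas as pd

def populate(df, n, columns=("operation", "executions"), placeholder="(i)"):
    """Repeat each row n times and replace the placeholder with (0)..(n-1)."""
    # Repeat the rows and renumber them 0..len(df)*n-1.
    out = df.loc[df.index.repeat(n)].reset_index(drop=True)
    # Counter 0..n-1, tiled once per original row.
    counter = np.tile(np.arange(n), len(df))
    for col in columns:
        # One pass per column; non-string cells are left untouched.
        out[col] = [
            s.replace(placeholder, f"({k})") if isinstance(s, str) else s
            for s, k in zip(out[col], counter)
        ]
    return out

results = pd.DataFrame([['executing (i) run', '2+(i)', 3],
                        ['sampling (i) run', '3+(i)', 3]],
                       columns=['operation', 'executions', 'result'])
results_populated = populate(results, 4)
```

With the sample data, `results_populated` has eight rows indexed 0 to 7, matching the layout requested in the question, and no helper `tag` column is left behind.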