Python DataFrame: process two list-valued columns and find the minimum

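I have a DataFrame whose cells contain lists. I want to subtract a value from each list and then find the index of the minimum. I then want to find the corresponding value in the matching list of another column.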

My code:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [[1, 2, 3], [1, 3, 5, 6]],
                   'B': [[10, 20, 30], [10, 30, 50, 60]]})
df
              A                 B
0     [1, 2, 3]      [10, 20, 30]
1  [1, 3, 5, 6]  [10, 30, 50, 60]
# Subtract 2 from A, find the index of the minimum absolute difference, and pick the corresponding element of B
val = 2
df['A_new_min'] = (df['A'].map(np.array) - val).map(abs).map(np.argmin)
df['B_new'] = df[['A_new_min', 'B']].apply(lambda x: x['B'][x['A_new_min']], axis=1)

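Current solution: this produces the correct result, but I don't want to store A_new_min, which is unnecessary. Is it possible to get this result in a single line of code?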

df = 
              A                 B  A_new_min  B_new
0     [1, 2, 3]      [10, 20, 30]          1     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]          0     10

Expected solution: how can I get the result below directly, without creating the extra, unnecessary column A_new_min? In short, I want:

df = 
              A                 B  B_new
0     [1, 2, 3]      [10, 20, 30]     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]     10

With apply:

df["B_new"] = df.apply(lambda row: row["B"][np.argmin(abs(np.array(row["A"])-val))], axis=1)
>>> df
A                 B  B_new
0     [1, 2, 3]      [10, 20, 30]     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]     10
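
To see what the lambda does for a single row, the expression can be unpacked step by step. This is only an illustrative sketch using the first row of the example data; the intermediate variable names (diffs, idx, picked) are made up for the walkthrough.

```python
import numpy as np

val = 2
a = [1, 2, 3]       # one cell of column A (first example row)
b = [10, 20, 30]    # the matching cell of column B

diffs = np.abs(np.array(a) - val)   # absolute distance of each element from val
idx = int(np.argmin(diffs))         # position of the smallest distance
picked = b[idx]                     # element of B at that position

print(diffs.tolist(), idx, picked)  # [1, 0, 1] 1 20
```

The apply call simply repeats these three steps for every row.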

In my opinion, the most efficient approach is simply to use a list comprehension.

B_new only:

df['B_new'] = [b[min(range(len(a)), key=lambda x: abs(a[x]-val))]
               for a,b in zip(df['A'], df['B'])]

Output:

A                 B  B_new
0     [1, 2, 3]      [10, 20, 30]     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]     10
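
If the one-liner is hard to read, the index search can be pulled into a small named function. The sketch below is one possible way to write it; closest_index is a hypothetical helper name, not part of pandas or of the answer above.

```python
import pandas as pd

def closest_index(values, target):
    """Hypothetical helper: index of the element of `values` nearest to `target`."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))

df = pd.DataFrame({'A': [[1, 2, 3], [1, 3, 5, 6]],
                   'B': [[10, 20, 30], [10, 30, 50, 60]]})
val = 2

# Same list comprehension as above, with the index search factored out
df['B_new'] = [b[closest_index(a, val)] for a, b in zip(df['A'], df['B'])]
print(df['B_new'].tolist())  # [20, 10]
```

The behavior should match the inline version; the extra function call may cost a little speed in exchange for readability.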

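Both columns:

# The walrus operator (:=) requires Python 3.8+.
# Assumes df holds only columns A and B; join raises if B_new already exists in df.
df2 = pd.DataFrame([[(i:=min(range(len(a)), key=lambda x: abs(a[x]-val))), b[i]]
                    for a,b in zip(df['A'], df['B'])], columns=['A_new_min', 'B_new'])
df.join(df2)

Output: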

A                 B  A_new_min  B_new
0     [1, 2, 3]      [10, 20, 30]          1     20
1  [1, 3, 5, 6]  [10, 30, 50, 60]          0     10

Timing (200k rows):

# @mozway (option #1)
290 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @enke (list comprehension)
340 ms ± 16.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @enke list comprehension + numpy
968 ms ± 17 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# @not_speshal
4.12 s ± 246 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
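
The benchmark setup is not shown above, so the following is only a guess at how such timings could be reproduced. It repeats the two example rows to build a larger frame and times two of the approaches with timeit. The names build_frame, with_apply and with_listcomp are hypothetical, and the row count is kept small here; raising it to 200_000 would approximate the size quoted above. Absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

def build_frame(n_rows):
    """Hypothetical setup: repeat the two example rows until n_rows is reached."""
    reps = n_rows // 2
    return pd.DataFrame({'A': [[1, 2, 3], [1, 3, 5, 6]] * reps,
                         'B': [[10, 20, 30], [10, 30, 50, 60]] * reps})

def with_apply(df, val):
    # Row-wise apply, as in the apply-based answer
    return df.apply(
        lambda row: row['B'][np.argmin(abs(np.array(row['A']) - val))], axis=1
    ).tolist()

def with_listcomp(df, val):
    # Pure-Python list comprehension, no NumPy conversion
    return [b[min(range(len(a)), key=lambda x: abs(a[x] - val))]
            for a, b in zip(df['A'], df['B'])]

big = build_frame(2_000)  # assumption: small size so the sketch runs quickly
val = 2
t_apply = timeit.timeit(lambda: with_apply(big, val), number=3)
t_list = timeit.timeit(lambda: with_listcomp(big, val), number=3)
print(f"apply: {t_apply:.3f} s, list comprehension: {t_list:.3f} s (3 runs each)")
```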

One option is to use a list comprehension:

df['new'] = [arr[i] for i, arr in zip(df['A'].map(np.array).sub(2).abs().map(np.argmin), df['B'])]

Another option is to never convert to NumPy arrays and stick with lists:

df['new'] = [b[min(enumerate([abs(x-2) for x in a]), key=lambda x:x[1])[0]] for a,b in zip(df['A'], df['B'])]

Output:

A                 B  new
0     [1, 2, 3]      [10, 20, 30]   20
1  [1, 3, 5, 6]  [10, 30, 50, 60]   10
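
One detail worth checking when comparing the NumPy and pure-list variants is what happens on ties. In the second example row, 1 and 3 are both at distance 1 from 2. Both np.argmin and the built-in min return the first minimal element, so the two variants should agree. A minimal check on that row:

```python
import numpy as np

a = [1, 3, 5, 6]      # 1 and 3 are both at distance 1 from 2
b = [10, 30, 50, 60]

# NumPy variant: argmin returns the first occurrence of the minimum
i_numpy = int(np.argmin(np.abs(np.array(a) - 2)))
# Pure-list variant: min also keeps the first minimal item it encounters
i_list = min(enumerate([abs(x - 2) for x in a]), key=lambda x: x[1])[0]

print(i_numpy, i_list, b[i_list])  # 0 0 10
```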
