Pandas groupby applying np.maximum.reduce to a column of arrays



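I need to group a DataFrame and apply np.maximum.reduce (which builds a new array holding the maximum value at each position) to a column that contains NumPy arrays.

For example: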

import pandas as pd
import numpy as np
df = pd.DataFrame([{'name': 'John', 'points' : [1,1,3,5]},{'name': 'John', 'points' : [2,0,1,5]},{'name': 'John', 'points' : [4,1,2,2]}])
df['points'] = df['points'].apply(lambda x : np.array(x)) # convert each list in the column to a np.array
df
   name        points
0  John  [1, 1, 3, 5]
1  John  [2, 0, 1, 5]
2  John  [4, 1, 2, 2]

If I try apply(np.maximum.reduce), I get the following error:

result = df.groupby(['name'])['points'].apply(np.maximum.reduce).reset_index()
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Expected result after the groupby:

   name        points
0  John  [4, 1, 3, 5]

If I try np.sum() instead, the groupby works fine:

result = df.groupby(['name'])['points'].apply(np.sum).reset_index()
result
   name         points
0  John  [7, 2, 6, 12]

But I need to apply the np.maximum.reduce function:

a = np.array([1,1,3,5])
b = np.array([2,0,1,5])
c = np.array([4,1,2,2])
test = np.maximum.reduce([a,b,c])
test
array([4, 1, 3, 5])

Is there a solution that performs this per-group reduction (something like maximum.reduce) while keeping the efficiency of NumPy arrays?

If you want to use np.maximum.reduce, first convert each group's values to a list:

df.groupby('name')['points'].apply(lambda x: np.maximum.reduce(list(x)))
name
John    [4, 1, 3, 5]
Name: points, dtype: object
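
As a rough illustration of why the list conversion helps: my understanding is that a plain list of equal-length arrays is turned by NumPy into an ordinary 2-D numeric array, which np.maximum.reduce can then reduce along axis 0, whereas an object-dtype Series makes NumPy compare whole arrays as Python objects, which is the likely source of the "truth value is ambiguous" error. The variable names below (s, as_2d, result) are only for this sketch and are not from the original post.

```python
import numpy as np
import pandas as pd

# Object-dtype Series holding one NumPy array per row, as in the question.
s = pd.Series([np.array([1, 1, 3, 5]),
               np.array([2, 0, 1, 5]),
               np.array([4, 1, 2, 2])])

# list(s) is a plain list of equal-length arrays; NumPy converts it
# to a 2-D integer array of shape (3, 4).
as_2d = np.asarray(list(s))
print(as_2d.shape)   # (3, 4)

# Reducing that list along axis 0 gives the maximum at each position.
result = np.maximum.reduce(list(s))
print(result)        # [4 1 3 5]
```

This assumes every array in a group has the same length; ragged arrays would not form a regular 2-D array.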

Alternative: stack each group's arrays with np.stack and take the max along axis 0:

df.groupby('name')['points'].apply(lambda x: np.stack(x).max(0))
name
John    [4, 1, 3, 5]
Name: points, dtype: object
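
To check that the two approaches agree when there is more than one group, here is a small sketch. The second group ("Mary") and its values are made up for illustration and are not part of the original question; both approaches assume all arrays within a group have the same length.

```python
import numpy as np
import pandas as pd

# Same layout as the question, plus a hypothetical second group.
df = pd.DataFrame({
    "name": ["John", "John", "John", "Mary", "Mary"],
    "points": [[1, 1, 3, 5], [2, 0, 1, 5], [4, 1, 2, 2],
               [0, 9, 0, 0], [7, 1, 1, 1]],
})
df["points"] = df["points"].apply(lambda x: np.array(x))

grouped = df.groupby("name")["points"]

# Approach 1: np.maximum.reduce over a list of the group's arrays.
by_reduce = grouped.apply(lambda x: np.maximum.reduce(list(x)))

# Approach 2: stack the group's arrays into 2-D, then max along axis 0.
by_stack = grouped.apply(lambda x: np.stack(x).max(axis=0))

# Both approaches should give the same array for every group.
for name in by_reduce.index:
    assert np.array_equal(by_reduce[name], by_stack[name])

print(by_reduce["John"])  # [4 1 3 5]
print(by_reduce["Mary"])  # [7 9 1 1]
```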
