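Using groupby.agg() with a dict-of-dicts argument to name the resulting columns has been deprecated in favor of the new named aggregation approach. However, I'm having trouble applying lambda functions that previously worked fine with the dict-of-dicts.
I'm using Python 3.7.4, NumPy 1.16.4, and Pandas 0.25.0.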
import numpy as np
import pandas as pd
data = [['tom', 10, 'blue', 1000, 'a'], ['nick', 15, 'blue', 2000, 'b'], ['julie', 14, 'green', 3000, 'a'], ['bob', 11, 'green', 4000, 'a'], ['cindy', 16, 'red', 5000, 'b']]
df = pd.DataFrame(data, columns = ['Name', 'Age', 'Color', 'Num', 'Letter'])
# Dict-style renaming seems to work fine:
df.groupby(by='Color').agg({'Num': {'SumNum' : np.sum, 'SumNumIfLetterA': lambda x: x[df.iloc[x.index].Letter=='a'].sum()}})
C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\groupby\generic.py:1455: FutureWarning: using a dict with renaming is deprecated and will be removed in a future version.
For column-specific groupby renaming, use named aggregation

    df.groupby(...).agg(name=('column', aggfunc))

  return super().aggregate(arg, *args, **kwargs)
Out[4]:
         Num
      SumNum SumNumIfLetterA
Color
blue    3000            1000
green   7000            7000
red     5000               0
# Named aggregation throws a KeyError:
df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))
Traceback (most recent call last):
  File "<ipython-input-5-9be7b560a3f5>", line 2, in <module>
    df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))
  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\groupby\generic.py", line 1455, in aggregate
    return super().aggregate(arg, *args, **kwargs)
  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\groupby\generic.py", line 264, in aggregate
    result = result[order]
  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\frame.py", line 2981, in __getitem__
    indexer = self.loc._convert_to_indexer(key, axis=1, raise_missing=True)
  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1271, in _convert_to_indexer
    return self._get_listlike_indexer(obj, axis, **kwargs)[1]
  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1078, in _get_listlike_indexer
    keyarr, indexer, o._get_axis_number(axis), raise_missing=raise_missing
  File "C:\Users\AppData\Local\Continuum\anaconda3\Lib\site-packages\pandas\core\indexing.py", line 1171, in _validate_read_indexer
    raise KeyError("{} not in index".format(not_found))
KeyError: "[('Num', '<lambda>')] not in index"
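I had a very similar problem. After digging through GitHub, I found a workaround: create a dummy column in the main DataFrame so that each aggregation targets a different column. In your code, the following should work: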
data = [['tom', 10, 'blue', 1000, 'a'], ['nick', 15, 'blue', 2000, 'b'], ['julie', 14, 'green', 3000, 'a'], ['bob', 11, 'green', 4000, 'a'], ['cindy', 16, 'red', 5000, 'b']]
df = pd.DataFrame(data, columns = ['Name', 'Age', 'Color', 'Num', 'Letter'])
# Dummy column: a copy of Num, so the two aggregations use different source columns
df['Num1'] = df['Num']
# Now group by with named aggregation on Num and Num1
df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num1', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))
Output from the IPython console:
df['Num1']=df['Num']
df.groupby(by='Color').agg(SumNum = ('Num', np.sum), SumNumIfLetterA = ('Num1', lambda x: x[df.iloc[x.index].Letter=='a'].sum()))
Out[46]:
       SumNum  SumNumIfLetterA
Color
blue     3000             1000
green    7000             7000
red      5000                0
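If you would rather not rely on a lambda that reaches back into df at all, another option is to compute the conditional values before grouping. The snippet below is only a sketch under that assumption: NumIfA is a helper column name I made up for illustration, and the mask replaces Num with 0 wherever Letter is not 'a'. Because the two aggregations still read from different columns, it follows the same idea as the dummy-column workaround.

```python
import pandas as pd

data = [['tom', 10, 'blue', 1000, 'a'], ['nick', 15, 'blue', 2000, 'b'],
        ['julie', 14, 'green', 3000, 'a'], ['bob', 11, 'green', 4000, 'a'],
        ['cindy', 16, 'red', 5000, 'b']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Color', 'Num', 'Letter'])

# Hypothetical helper column: keep Num where Letter == 'a', otherwise use 0.
masked = df.assign(NumIfA=df['Num'].where(df['Letter'] == 'a', 0))

# Named aggregation on two different columns, using plain string aggregators.
result = masked.groupby('Color').agg(
    SumNum=('Num', 'sum'),
    SumNumIfLetterA=('NumIfA', 'sum'),
)
print(result)
```

This should give the same totals as the output above (blue 3000/1000, green 7000/7000, red 5000/0), and it does not depend on df having a default integer index the way df.iloc[x.index] does.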
Hope this helps!