Dask groupby agg weighted average: "unknown aggregate lambda" error



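In Dask, I need to compute a weighted average, grouping by the values of two columns and using a third column as the weights. This is what I'm doing: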

dask_df = dd.from_pandas(df, npartitions = 10)
wm = lambda x: np.average(x, weights=dask_df.loc[x.index,"C"])
dask_df = dask_df.groupby(['A', 'B']).agg({'C' : wm}).reset_index()
output_df = dask_df.compute()

In pandas, I run out of memory. In Dask, I get:

File "<ipython-input-16-0beb32700c04>", line 3, in <module>
dask_df = dask_df.groupby(['A', 'B']).agg({'C' : wm}).reset_index()
File "/anaconda3/lib/python3.7/site-packages/dask/dataframe/groupby.py", line 1555, in agg
return self.aggregate(arg, split_every=split_every, split_out=split_out)
File "/anaconda3/lib/python3.7/site-packages/dask/dataframe/groupby.py", line 1550, in aggregate
arg, split_every=split_every, split_out=split_out
File "/anaconda3/lib/python3.7/site-packages/dask/dataframe/groupby.py", line 1355, in aggregate
chunk_funcs, aggregate_funcs, finalizers = _build_agg_args(spec)
File "/anaconda3/lib/python3.7/site-packages/dask/dataframe/groupby.py", line 659, in _build_agg_args
impls = _build_agg_args_single(result_column, func, input_column)
File "/anaconda3/lib/python3.7/site-packages/dask/dataframe/groupby.py", line 703, in _build_agg_args_single
raise ValueError("unknown aggregate {}".format(func))
ValueError: unknown aggregate lambda

You may be interested in custom aggregations, which are described here: https://docs.dask.org/en/latest/dataframe-groupby.html#aggregate

Clearly, that error message could be improved. I suggest opening an issue at https://github.com/dask/dask/issues/new
