I have a dataset that looks like this:
   entity_id transaction_date transaction_month  net_flow    inflow   outflow
0         51       2018-07-02        2018-07-01  10161.06  20161.06  10000.00
1         51       2018-07-03        2018-07-01   5823.73   5867.37     43.64
2         51       2018-07-05        2018-07-01  17835.79  24107.29   6271.50
3         51       2018-07-06        2018-07-01  -3544.72  31782.84  35327.56
4         51       2018-07-09        2018-07-01  18252.42  18332.42     80.00
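I am trying to use rolling and transform to compute rolling metrics across the entity_id field. I have several variables I want to create, and I would like to compute them all in a single call.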
For example, if I were creating these metrics with agg, I would do something like this:
transactions = (
    raw_transactions
    .groupby(['entity_id', 'transaction_month'])[['inflow', 'outflow']]
    .agg([
        'sum', 'skew',
        ('coef_var', lambda x: x.std() / x.mean()),
        ('kurtosis', lambda x: x.kurtosis())
    ])
    .reset_index()
)
However, I cannot replicate this with transform. When I try to pass the functions as a dict or a list, I get a TypeError because lists and dicts are unhashable.
>>> transactions.groupby(['entity_id'])[['inflow','outflow']].transform(['skew','mean'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-62-4ef49d836b3f> in <module>
----> 1 transactions.groupby(['entity_id'])[['inflow','outflow']].transform(['skew','mean'])
/jupyter/packages/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
1354
1355 # optimized transforms
-> 1356 func = self._get_cython_func(func) or func
1357
1358 if not isinstance(func, str):
/jupyter/packages/pandas/core/base.py in _get_cython_func(self, arg)
335 if we define an internal function for this argument, return it
336 """
--> 337 return self._cython_table.get(arg)
338
339 def _is_builtin_func(self, arg):
TypeError: unhashable type: 'list'
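I don't think this is possible with transform. There are at least two workarounds. The first is to merge the result of groupby.agg back onto the original DataFrame: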
tmp_ = (
    raw_transactions
    .groupby(['entity_id', 'transaction_month'])[['inflow', 'outflow']]
    .agg([
        'sum', 'skew',
        ('coef_var', lambda x: x.std() / x.mean()),
        ('kurtosis', lambda x: x.kurtosis())
    ])  # no reset_index here
)
# need to flatten the MultiIndex columns
tmp_.columns = ['_'.join(cols) for cols in tmp_.columns]
# then merge with the original dataframe
res = raw_transactions.merge(tmp_, on=['entity_id', 'transaction_month'])
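To make the merge approach easier to try, here is a minimal self-contained sketch built from the five sample rows in the question. The named helper coef_var, the keys list, and the explicit how="left" are my own choices for illustration, not part of the original snippet, and kurtosis is left out only to keep the example short.

```python
import pandas as pd

# Sample rows copied from the question
raw_transactions = pd.DataFrame({
    "entity_id": [51] * 5,
    "transaction_date": pd.to_datetime(
        ["2018-07-02", "2018-07-03", "2018-07-05", "2018-07-06", "2018-07-09"]
    ),
    "transaction_month": pd.to_datetime(["2018-07-01"] * 5),
    "net_flow": [10161.06, 5823.73, 17835.79, -3544.72, 18252.42],
    "inflow": [20161.06, 5867.37, 24107.29, 31782.84, 18332.42],
    "outflow": [10000.00, 43.64, 6271.50, 35327.56, 80.00],
})

keys = ["entity_id", "transaction_month"]


def coef_var(x):
    # Hypothetical helper: coefficient of variation; agg uses its name as the column label
    return x.std() / x.mean()


# One row of statistics per (entity_id, transaction_month) group
stats = (
    raw_transactions
    .groupby(keys)[["inflow", "outflow"]]
    .agg(["sum", "skew", coef_var])
)

# Flatten ('inflow', 'sum') into 'inflow_sum', and so on
stats.columns = ["_".join(cols) for cols in stats.columns]

# Broadcast the group-level statistics back onto every original row
res = raw_transactions.merge(stats.reset_index(), on=keys, how="left")
```

Every row of res keeps its original columns and gains the statistics of its group, which is the same shape transform would have produced.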
Alternatively, call transform once per function in a list comprehension and concat the results with the original data:
# group once
gr = raw_transactions.groupby(['entity_id'])[['inflow', 'outflow']]
# concat each transformed dataframe with the original data
res = pd.concat([raw_transactions] +
                [gr.transform(func)
                 for func in ('skew', 'mean', lambda x: x.std() / x.mean())],
                axis=1, keys=('', 'skew', 'mean', 'coef_var'))
The keys argument gives the result a two-level column index, which you can then rename or flatten as needed.
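As one possible way to handle the column names, the sketch below is a variant of the concat approach that names each transformed column with add_suffix instead of keys, so the result has flat columns from the start. The toy data and the funcs mapping are assumptions made for illustration, not taken from the question.

```python
import pandas as pd

# Toy data (not from the question): two entities, three rows each
df = pd.DataFrame({
    "entity_id": [1, 1, 1, 2, 2, 2],
    "inflow": [10.0, 20.0, 60.0, 5.0, 7.0, 9.0],
    "outflow": [1.0, 2.0, 3.0, 4.0, 4.0, 10.0],
})

# Hypothetical mapping of output suffix -> function passed to transform
funcs = {
    "skew": "skew",
    "mean": "mean",
    "coef_var": lambda x: x.std() / x.mean(),
}

# Group once and reuse the groupby object for every function
gr = df.groupby("entity_id")[["inflow", "outflow"]]

# One transformed frame per function, with columns such as 'inflow_mean'
pieces = [gr.transform(func).add_suffix(f"_{name}") for name, func in funcs.items()]

# All pieces share the original row index, so they align column-wise
res = pd.concat([df] + pieces, axis=1)
```

Because each transform call returns a frame indexed like the original, no merge keys are needed here. The trade-off is one pass over the groups per function.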