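There is a dataset with the columns ID and Feature_1. Feature_1 can be understood as the duration of a particular session, in seconds. There is also a custom function that computes a moving average and, for the leading positions that would otherwise be NaN because of the window width, fills in a simple (cumulative) average instead. Here it is: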
import numpy as np

def moving_average_mit_anfang(x, w):
    # First part - simple (cumulative) average
    first_part_result = np.cumsum(x)/np.cumsum(np.ones(len(x)))
    # If the number of the user's sessions is greater than the window width, we calculate the moving average
    if len(x)>w:
        # Second part - moving average with window w
        sec_part_result = np.convolve(x, np.ones(w), 'valid') / w
        return np.append(first_part_result[:-len(sec_part_result)],sec_part_result)
    # Otherwise we calculate only the simple average
    else:
        return first_part_result
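To make the behaviour concrete, here is a minimal sketch that restates the function and runs it on two small inputs. The input values and the window widths are chosen only for illustration; the first list happens to be the Feature_1 values of ID 3 in the sample data.

```python
import numpy as np

def moving_average_mit_anfang(x, w):
    # Cumulative average: mean of everything seen so far at each position
    first_part_result = np.cumsum(x) / np.cumsum(np.ones(len(x)))
    if len(x) > w:
        # Moving average over full windows of width w
        sec_part_result = np.convolve(x, np.ones(w), 'valid') / w
        # Keep the first w-1 cumulative averages, then the moving averages
        return np.append(first_part_result[:-len(sec_part_result)], sec_part_result)
    # Too few values for a single full window: cumulative average only
    return first_part_result

# Five values, window of 2: one cumulative value, then four window means
result = moving_average_mit_anfang([6, 2, 32, 9, 18], 2)
print(result)  # values: 6.0, 4.0, 17.0, 20.5, 13.5

# Two values, window of 3: no full window fits, so only the cumulative average
short = moving_average_mit_anfang([4, 21], 3)
print(short)  # values: 4.0, 12.5
```

In both cases the result has exactly as many entries as the input, which is what a group-wise transform needs.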
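We need to apply this function to the Feature_1 column so that, for each ID, we get the current average in the order in which that ID's rows appear.
Sample DataFrame: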
import pandas as pd

test_df = pd.DataFrame(data={'ID':[1,2,3,2,3,1,2,1,3,3,3,2,1],
                             'Feature_1':[4,5,6,73,2,21,13,45,32,9,18,45,39]})
I tried this:
test_df.groupby('ID')['Feature_1'].transform(lambda x: moving_average_mit_anfang(x,1))
and got this:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-38-6cc3e6c9b134> in <module>
----> 1 test_df.groupby('ID')['Feature_1'].transform(lambda x: moving_average_mit_anfang(x,1))
~/DS/RS/rs_env/lib/python3.8/site-packages/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
505
506 if not isinstance(func, str):
--> 507 return self._transform_general(func, *args, **kwargs)
508
509 elif func not in base.transform_kernel_allowlist:
~/DS/RS/rs_env/lib/python3.8/site-packages/pandas/core/groupby/generic.py in _transform_general(self, func, *args, **kwargs)
535 res = res._values
536
--> 537 results.append(klass(res, index=group.index))
538
539 # check for empty "results" to avoid concat ValueError
~/DS/RS/rs_env/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
346 try:
347 if len(index) != len(data):
--> 348 raise ValueError(
349 f"Length of passed values is {len(data)}, "
350 f"index implies {len(index)}."
ValueError: Length of passed values is 6, index implies 4.
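The ValueError at the end of the traceback says that, for at least one group, the function returned a different number of values than the group has rows, which transform does not accept. Below is a minimal sketch for locating such groups. The helper name check_transform_lengths, the small frame, and the deliberately broken drop_first function are all hypothetical and exist only for this illustration.

```python
import numpy as np
import pandas as pd

def check_transform_lengths(df, key, col, func):
    """Return {group key: (rows in group, values returned)} for mismatching groups."""
    mismatches = {}
    for name, group in df.groupby(key)[col]:
        returned = len(func(group))
        if returned != len(group):
            mismatches[name] = (len(group), returned)
    return mismatches

# Hypothetical data: ID 1 has three rows, ID 2 has two
df = pd.DataFrame({'ID': [1, 2, 1, 2, 1], 'Feature_1': [4, 5, 21, 73, 45]})

def drop_first(x):
    # Deliberately wrong for transform: returns one value fewer than it receives
    return np.asarray(x)[1:]

bad = check_transform_lengths(df, 'ID', 'Feature_1', drop_first)
good = check_transform_lengths(df, 'ID', 'Feature_1', np.cumsum)
print(bad)   # both groups are reported as mismatching
print(good)  # empty dict: np.cumsum preserves the length
```

Running the same check with the real function and window width shows which ID produces the wrong length before transform is called.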
The output should look like this:
ID Feature_1 Custom average
0 1 4 4.0
1 2 5 5.0
2 3 6 6.0
3 2 73 39.0
4 3 2 4.0
5 1 21 12.5
6 2 13 43.0
7 1 45 33.0
8 3 32 4.0
9 3 9 20.5
10 3 18 13.5
11 2 45 29.0
12 1 39 42.0
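Your new solution works. For a simpler version you can also drop the lambda and pass the window width to transform as an extra argument (the lambda works too):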
test_df['Custom average'] = test_df.groupby('ID')['Feature_1'].transform(moving_average_mit_anfang,2)
print (test_df)
ID Feature_1 Custom average
0 1 4 4.0
1 2 5 5.0
2 3 6 6.0
3 2 73 39.0
4 3 2 4.0
5 1 21 12.5
6 2 13 43.0
7 1 45 33.0
8 3 32 17.0
9 3 9 20.5
10 3 18 13.5
11 2 45 29.0
12 1 39 42.0
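As a possible alternative, not part of the solution above, pandas' built-in rolling with min_periods=1 should give the same values: while fewer than w values are available, the window mean is simply the average of everything seen so far, which is what the custom function computes for the leading positions. A sketch, assuming the sample frame from the question and a window of 2; the column name Rolling average is made up for the comparison.

```python
import pandas as pd

test_df = pd.DataFrame(data={'ID': [1, 2, 3, 2, 3, 1, 2, 1, 3, 3, 3, 2, 1],
                             'Feature_1': [4, 5, 6, 73, 2, 21, 13, 45, 32, 9, 18, 45, 39]})

w = 2  # window width, same as in the transform call above

# Per-ID rolling mean; min_periods=1 avoids NaN at the start of each group
test_df['Rolling average'] = (
    test_df.groupby('ID')['Feature_1']
           .transform(lambda s: s.rolling(w, min_periods=1).mean())
)
print(test_df)
```

Because rolling accumulates sums incrementally, tiny floating-point differences from the convolution-based version are possible on other data, so compare with a tolerance rather than exact equality.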